analysis of sequence from tem14
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV
LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
A
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
secondary structure prediction with PREDATOR
> gi|30366|emb|CAA38478.1|
. . . . .
1 MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF 50
___HHHHHHHHHHHHHHHHHHH__________________HHHHHHHHHH
. . . . .
51 AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN 100
HHHHHH_________HHHHHHHHHHHH_____EEEEEEEEEEEE______
. . . .
101 LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE 140
__________HHHHHHHHH_EEEEEE___HHHHHH_____
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
method : 1
alpha-contents : 75.8 %
beta-contents : 1.5 %
coil-contents : 22.7 %
class : alpha
method : 2
alpha-contents : 61.7 %
beta-contents : 0.0 %
coil-contents : 38.3 %
class : alpha
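The report does not name "method 1" and "method 2", so their percentages need not match the PREDATOR track above. As a minimal Python sketch, assuming a per-residue string where 'H' is helix, 'E' is strand and any other symbol is coil, helix/strand/coil contents are simply state counts divided by sequence length:

```python
from collections import Counter


def ss_content(prediction: str) -> dict[str, float]:
    """Percent helix ('H'), strand ('E') and coil (anything else), 1 decimal."""
    counts = Counter(prediction)
    n = len(prediction)
    helix = counts["H"]
    strand = counts["E"]
    coil = n - helix - strand
    return {
        "alpha": round(100 * helix / n, 1),
        "beta": round(100 * strand / n, 1),
        "coil": round(100 * coil / n, 1),
    }


# The three PREDATOR lines (residues 1-50, 51-100, 101-140), concatenated.
predator = (
    "___HHHHHHHHHHHHHHHHHHH__________________HHHHHHHHHH"
    "HHHHHH_________HHHHHHHHHHHH_____EEEEEEEEEEEE______"
    "__________HHHHHHHHH_EEEEEE___HHHHHH_____"
)
print(len(predator), ss_content(predator))
```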
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
GPI: learning from metazoa
-14.70 -5.84 -3.19 -5.14 -4.00 0.00 -32.00 0.00 -0.08 -6.61 -1.84 -12.00 -12.00 0.00 0.00 0.00 -97.40
-7.02 -5.79 -5.85 -2.15 0.00 0.00 -24.00 0.00 -0.06 -6.61 -1.84 -12.00 -12.00 0.00 -12.00 0.00 -89.32
ID: gi|30366|emb|CAA38478.1| AC: xxx Len: 140 1:I 129 Sc: -89.32 Pv: 9.270793e-01 NO_GPI_SITE
GPI: learning from protozoa
-14.45 -7.00 -3.85 -1.49 -4.00 0.00 -32.00 0.00 0.00 -5.73 -7.18 -12.00 -12.00 0.00 0.00 0.00 -99.70
-22.52 -2.64 -1.42 -2.38 -4.00 0.00 0.00 0.00 -1.28 -8.14 -7.18 -12.00 -12.00 -12.00 -12.00 0.00 -97.56
ID: gi|30366|emb|CAA38478.1| AC: xxx Len: 140 1:I 119 Sc: -97.56 Pv: 8.274222e-01 NO_GPI_SITE
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
# SignalP euk predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
gi|30366|em 0.842 21 Y 0.840 21 Y 0.979 11 Y 0.949 Y
# SignalP gram- predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
gi|30366|em 0.540 19 Y 0.650 19 Y 0.993 9 Y 0.956 Y
# SignalP gram+ predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
gi|30366|em 0.258 59 N 0.348 19 Y 0.997 9 Y 0.952 Y
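In the eukaryotic run, Cmax and Ymax both fall at position 21. A minimal sketch of cutting the sequence there, assuming SignalP's "pos" is the first residue of the mature chain (cleavage between positions 20 and 21); `split_signal_peptide` is a hypothetical helper, not part of SignalP:

```python
# Cystatin S sequence from the FASTA record above, written in 10-residue blocks.
SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())


def split_signal_peptide(seq: str, cleavage_pos: int) -> tuple[str, str]:
    """Split seq so that residue `cleavage_pos` (1-based) starts the mature chain."""
    return seq[: cleavage_pos - 1], seq[cleavage_pos - 1 :]


signal, mature = split_signal_peptide(SEQ, 21)  # Cmax/Ymax position, euk run
print(signal, len(signal))       # MARPLCTLLLLMATLAGALA 20
print(mature[:10], len(mature))  # SSSKEENRII 121
```

The predicted 20-residue signal peptide overlaps the stretch that SEG masks (5-20) and the segment TopPred scores as a membrane helix (4-24).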
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
low complexity regions: SEG 12 2.2 2.5
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
1-4 MARP
lctllllmatlagala 5-20
21-141 SSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGG
VNYFFDVEVGRTICTKSQPNLDTCAFHEQP
ELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
A
low complexity regions: SEG 25 3.0 3.3
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
1-141 MARPLCTLLLLMATLAGALASSSKEENRII
PGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
RTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
low complexity regions: SEG 45 3.4 3.75
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
1-141 MARPLCTLLLLMATLAGALASSSKEENRII
PGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
RTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
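By SEG's usual convention, the three numbers after "SEG" are window length W, trigger complexity K1 and extension complexity K2, with complexity measured as Shannon entropy in bits. The sketch below implements only the trigger step (windows of length 12 with entropy at or below 2.2); SEG's K2 extension and boundary optimisation are deliberately left out, so its output is not expected to reproduce the 5-20 mask exactly:

```python
import math
from collections import Counter

SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())


def entropy_bits(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def trigger_windows(seq: str, w: int = 12, k1: float = 2.2) -> list[int]:
    """1-based start positions of length-w windows whose entropy is <= k1."""
    return [i + 1 for i in range(len(seq) - w + 1)
            if entropy_bits(seq[i:i + w]) <= k1]


starts = trigger_windows(SEQ)
print(starts)
```

For example, the window at positions 8-19 (LLLLMATLAGAL: six L, three A, one each of M, T, G) has entropy 1 + 0.25 * log2(12), about 1.90 bits, which is below K1.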
low complexity regions: XNU
# Score cutoff = 21, Search from offsets 1 to 4
# both members of each repeat flagged
# lambda = 0.347, K = 0.200, H = 0.664
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTllllmatlagalassskeenriiPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
1 - 7 MARPLCT
8 - 30 lll lmatlagala ssskeenrii
31 - 141 PGGIYDADLN DEWVQRALHF AISEYNKATE DEYYRRPLQV LRAREQTFGG VNYFFDVEVG
RTICTKSQPN LDTCAFHEQP ELQKKQLCSF EIYEVPWEDR MSLVNSRCQE A
low complexity regions: DUST
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
coiled coil prediction for gi|30366|emb|CAA38478.1|
sequence: 140 amino acids, 0 residue(s) in coiled coil state
. | . | . | . | . | . 60
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local
. | . | . | . | . | . 120
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local
. | . |
EIYEVPWEDR MSLVNSRCQE
~~~~~~~~~~ ~~~~~~~~~~
---------- ----------
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
prediction of transmembrane regions with toppred2
***********************************
*TOPPREDM with eukaryotic function*
***********************************
tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___
(1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA
(p)rokaryotic or (e)ukaryotic: e
Charge-pair energy: 0
Length of full window (odd number!): 21
Length of core window (odd number!): 11
Number of residues to add to each end of helix: 1
Critical length: 60
Upper cutoff for candidates: 1
Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested
Candidate membrane-spanning segments:
Helix Begin End Score Certainity
1 4 24 1.888 Certain
----------------------------------------------------------------------
Structure 1
Transmembrane segments included in this structure:
Segment 1
Loop length 3 117
K+R profile 2.00
+
CYT-EXT prof -
-0.40
For CYT-EXT profile neg. values indicate cytoplasmic preference.
K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in
Charge-difference over N-terminal Tm (+-15 residues): 3.00
(NEG-POS)/(NEG+POS): -1.0000
NEG: 0.0000
POS: 1.0000
-> Orientation: N-in
CYT-EXT difference: 0.40
-> Orientation: N-out
----------------------------------------------------------------------
"tem14" 141
4 24 #t 1.8875
************************************
*TOPPREDM with prokaryotic function*
************************************
tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___
(1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA
(p)rokaryotic or (e)ukaryotic: p
Charge-pair energy: 0
Length of full window (odd number!): 21
Length of core window (odd number!): 11
Number of residues to add to each end of helix: 1
Critical length: 60
Upper cutoff for candidates: 1
Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested
Candidate membrane-spanning segments:
Helix Begin End Score Certainity
1 4 24 1.888 Certain
----------------------------------------------------------------------
Structure 1
Transmembrane segments included in this structure:
Segment 1
Loop length 3 117
K+R profile 2.00
+
CYT-EXT prof -
-0.40
For CYT-EXT profile neg. values indicate cytoplasmic preference.
K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in
Charge-difference over N-terminal Tm (+-15 residues): 3.00
(NEG-POS)/(NEG+POS): -1.0000
NEG: 0.0000
POS: 1.0000
-> Orientation: N-in
CYT-EXT difference: 0.40
-> Orientation: N-out
----------------------------------------------------------------------
"tem14" 141
4 24 #t 1.8875
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
NOW EXECUTING: /bio_software/1D/stat/saps/saps-stroh/SAPS.SSPA/saps /people/maria/tem14.___saps___
SAPS. Version of April 11, 1996.
Date run: Tue Oct 31 14:35:46 2000
File: /people/maria/tem14.___saps___
ID gi|30366|emb|CAA38478.1|
DE cystain S [Homo sapiens]
number of residues: 141; molecular weight: 16.2 kdal
1 MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
61 DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
121 EIYEVPWEDR MSLVNSRCQE A
--------------------------------------------------------------------------------
COMPOSITIONAL ANALYSIS (extremes relative to: swp23s)
A : 12( 8.5%); C : 5( 3.5%); D : 7( 5.0%); E : 14( 9.9%); F : 6( 4.3%)
G : 6( 4.3%); H : 2( 1.4%); I : 6( 4.3%); K : 5( 3.5%); L : 15(10.6%)
M : 3( 2.1%); N : 6( 4.3%); P : 6( 4.3%); Q : 8( 5.7%); R : 10( 7.1%)
S : 8( 5.7%); T : 7( 5.0%); V : 7( 5.0%); W : 2( 1.4%); Y : 6( 4.3%)
KR : 15 ( 10.6%); ED : 21 ( 14.9%); AGP : 24 ( 17.0%);
KRED : 36 ( 25.5%); KR-ED : -6 ( -4.3%); FIKMNY : 32 ( 22.7%);
LVIFM : 37 ( 26.2%); ST : 15 ( 10.6%).
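The composition table is a plain count over the 141 residues, with percentages rounded to one decimal. A minimal sketch that recomputes the counts and the group totals (KR-ED is the net charge):

```python
from collections import Counter

SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())

counts = Counter(SEQ)
n = len(SEQ)


def pct(k: int) -> float:
    """Percentage of the sequence length, rounded to one decimal."""
    return round(100 * k / n, 1)


for aa in sorted(counts):
    print(f"{aa} : {counts[aa]:2d} ({pct(counts[aa]):4.1f}%)")

# Residue groups summed as in the SAPS summary lines.
for members in ("KR", "ED", "AGP", "KRED", "FIKMNY", "LVIFM", "ST"):
    k = sum(counts[a] for a in members)
    print(f"{members} : {k} ({pct(k)}%)")

net = sum(counts[a] for a in "KR") - sum(counts[a] for a in "ED")
print(f"KR-ED : {net} ({pct(net)}%)")  # KR-ED : -6 (-4.3%)
```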
--------------------------------------------------------------------------------
CHARGE DISTRIBUTIONAL ANALYSIS
1 00+0000000 0000000000 000+--0+00 00000-0-00 --000+0000 000-00+00-
61 --00++0000 0+0+-00000 00000-0-00 +0000+0000 0-00000-00 -00++00000
121 -00-000--+ 000000+00- 0
A. CHARGE CLUSTERS.
Positive charge clusters (cmin = 9/30 or 13/45 or 15/60): none
Negative charge clusters (cmin = 12/30 or 16/45 or 19/60): none
Mixed charge clusters (cmin = 17/30 or 23/45 or 28/60): none
B. HIGH SCORING (UN)CHARGED SEGMENTS.
There are no high scoring positive charge segments.
There are no high scoring negative charge segments.
There are no high scoring mixed charge segments.
There are no high scoring uncharged segments.
C. CHARGE RUNS AND PATTERNS.
pattern (+)| (-)| (*)| (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0 4 | 5 | 7 | 28 | 8 | 9 | 12 | 9 | 11 | 14 | 5 | 7 |
lmin1 6 | 6 | 9 | 34 | 10 | 11 | 14 | 12 | 13 | 16 | 7 | 8 |
lmin2 7 | 7 | 10 | 37 | 11 | 13 | 16 | 13 | 15 | 18 | 8 | 10 |
(Significance level: 0.010000; Minimal displayed length: 6)
There are no charge runs or patterns exceeding the given minimal lengths.
Run count statistics:
+ runs >= 3: 0
- runs >= 3: 1, at 60;
* runs >= 5: 0
0 runs >= 19: 1, at 4;
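The charge track maps K/R to '+', E/D to '-' and every other residue to '0'; the run statistics then report maximal runs of one symbol at or above a minimum length. A minimal sketch reproducing both, where `charge_string` and `runs` are hypothetical helper names:

```python
from itertools import groupby

SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())


def charge_string(seq: str) -> str:
    """SAPS-style charge alphabet: K/R -> '+', E/D -> '-', anything else -> '0'."""
    table = {"K": "+", "R": "+", "E": "-", "D": "-"}
    return "".join(table.get(aa, "0") for aa in seq)


def runs(s: str, symbol: str, min_len: int) -> list[tuple[int, int]]:
    """(1-based start, length) of maximal runs of `symbol` at least min_len long."""
    out, pos = [], 1
    for key, grp in groupby(s):
        length = len(list(grp))
        if key == symbol and length >= min_len:
            out.append((pos, length))
        pos += length
    return out


charges = charge_string(SEQ)
print(charges[:30])
print(runs(charges, "+", 3))   # []
print(runs(charges, "-", 3))   # [(60, 3)]
print(runs(charges, "0", 19))  # [(4, 20)]
```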
--------------------------------------------------------------------------------
DISTRIBUTION OF OTHER AMINO ACID TYPES
1. HIGH SCORING SEGMENTS.
There are no high scoring hydrophobic segments.
____________________________________
High scoring transmembrane segments:
5.00 (LVIF) 2.00 (AGM) 0.00 (BZX) -1.00 (YCW) -2.00 (ST)
-6.00 (P) -8.00 (H) -10.00 (NQ) -16.00 (KR) -17.00 (ED)
Expected score/letter: -4.397
M_0.01= 45.75; M_0.05= 36.60; M_0.30= 25.71
1) From 5 to 20: length= 16, score=42.00 *
5 LCTLLLLMAT LAGALA
L: 7(43.8%); A: 4(25.0%); T: 2(12.5%);
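The segment score is the sum of the per-residue values in the table above: seven L (35), four A (8), M and G (2 each), C (-1) and two T (-4) give 42. SAPS reports every segment that passes its significance thresholds; the sketch below finds only the single highest-scoring segment, using Kadane's maximum-subarray algorithm as a stand-in for SAPS's own procedure:

```python
SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())

# Per-residue scores copied from the SAPS transmembrane-segment table.
SCORES = {}
for letters, value in [("LVIF", 5), ("AGM", 2), ("BZX", 0), ("YCW", -1),
                       ("ST", -2), ("P", -6), ("H", -8), ("NQ", -10),
                       ("KR", -16), ("ED", -17)]:
    for aa in letters:
        SCORES[aa] = value


def best_segment(seq: str) -> tuple[int, int, int]:
    """(1-based start, 1-based end, score) of the maximal-sum segment."""
    best = (0, 0, float("-inf"))
    run_start, run_sum = 0, 0
    for i, aa in enumerate(seq):
        if run_sum <= 0:  # a non-positive prefix never helps; restart here
            run_start, run_sum = i, 0
        run_sum += SCORES[aa]
        if run_sum > best[2]:
            best = (run_start + 1, i + 1, run_sum)
    return best


print(best_segment(SEQ))  # (5, 20, 42)
```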
2. SPACINGS OF C.
H2N-5-C-87-C-9-C-13-C-19-C-3-COOH
2*. SPACINGS OF C and H. (additional deluxe function for ALEX)
H2N-5-C-42-H-44-C-9-C-2-H-10-C-19-C-3-COOH
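Each number in the spacing strings is the count of residues between consecutive occurrences of the chosen residue types (C at 6, 94, 104, 118, 138; H at 49, 107), bracketed by the N- and C-termini. A minimal sketch that regenerates both lines; `spacing` is a hypothetical helper name:

```python
SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())


def spacing(seq: str, residues: str) -> str:
    """Gap lengths between residues of the given types, SAPS-style."""
    parts, gap = ["H2N"], 0
    for aa in seq:
        if aa in residues:
            parts += [str(gap), aa]
            gap = 0
        else:
            gap += 1
    parts += [str(gap), "COOH"]
    return "-".join(parts)


print(spacing(SEQ, "C"))   # H2N-5-C-87-C-9-C-13-C-19-C-3-COOH
print(spacing(SEQ, "CH"))  # H2N-5-C-42-H-44-C-9-C-2-H-10-C-19-C-3-COOH
```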
--------------------------------------------------------------------------------
REPETITIVE STRUCTURES.
A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet.
Repeat core block length: 4
B. SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet.
(i= LVIF; += KR; -= ED; s= AG; o= ST; n= NQ; a= YW; p= P; h= H; m= M; c= C)
Repeat core block length: 8
--------------------------------------------------------------------------------
MULTIPLETS.
A. AMINO ACID ALPHABET.
1. Total number of amino acid multiplets: 10 (Expected range: 0-- 17)
2. Histogram of spacings between consecutive amino acid multiplets:
(1-5) 5 (6-10) 2 (11-20) 1 (>=21) 3
3. Clusters of amino acid multiplets (cmin = 12/30 or 16/45 or 19/60): none
B. CHARGE ALPHABET.
1. Total number of charge multiplets: 6 (Expected range: 0-- 11)
2 +plets (f+: 10.6%), 4 -plets (f-: 14.9%)
Total number of charge altplets: 3 (Critical number: 12)
2. Histogram of spacings between consecutive charge multiplets:
(1-5) 1 (6-10) 0 (11-20) 4 (>=21) 2
--------------------------------------------------------------------------------
PERIODICITY ANALYSIS.
A. AMINO ACID ALPHABET (core: 4; !-core: 5)
Location Period Element Copies Core Errors
8- 11 1 L 4 4 0
B. CHARGE ALPHABET ({+= KR; -= ED; 0}; core: 5; !-core: 6)
and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core: 6; !-core: 8)
Location Period Element Copies Core Errors
There are no periodicities of the prescribed length.
--------------------------------------------------------------------------------
SPACING ANALYSIS.
There are no unusual spacings.
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Pfam (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/pfam/Pfam
Sequence file: tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value N
-------- ----------- ----- ------- ---
cystatin Cystatin domain 140.6 9.4e-40 1
SecA_protein SecA protein, amino terminal region -2.4 83 1
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
SecA_protein 1/1 41 52 .. 199 210 .. -2.4 83
cystatin 1/1 32 137 .. 1 111 [] 140.6 9.4e-40
Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
*->eElVQRpfnFAI<-*
+E VQR ++FAI
gi|30366|e 41 DEWVQRALHFAI 52
cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
*->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
Gg+ +ad nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
gi|30366|e 32 GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ 76
vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
+G+ nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
gi|30366|e 77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY 123
kkpwegelsvltktC<-*
++pwe ++ +l ++
gi|30366|e 124 EVPWE-DRMSLVNSR 137
//
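Only the cystatin domain (E = 9.4e-40, residues 32-137) is a convincing hit; the SecA match has E = 83. A minimal sketch for filtering rows of the HMMER 2 "Parsed for domains" table by E-value, assuming the ten whitespace-separated columns shown above; the 0.01 cutoff is an arbitrary example, not a HMMER default:

```python
def parse_domains(lines: list[str]) -> list[dict]:
    """Parse rows: model, domain, seq-f, seq-t, flag, hmm-f, hmm-t, flag, score, E."""
    rows = []
    for line in lines:
        f = line.split()
        rows.append({"model": f[0], "seq_from": int(f[2]), "seq_to": int(f[3]),
                     "score": float(f[8]), "evalue": float(f[9])})
    return rows


table = [
    "SecA_protein 1/1 41 52 .. 199 210 .. -2.4 83",
    "cystatin 1/1 32 137 .. 1 111 [] 140.6 9.4e-40",
]
hits = [d for d in parse_domains(table) if d["evalue"] < 0.01]
print(hits)
```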
Start with PfamFrag (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/pfam/PfamFrag
Sequence file: tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value N
-------- ----------- ----- ------- ---
cystatin Cystatin domain 140.6 9.4e-40 1
SecA_protein SecA protein, amino terminal region -2.4 83 1
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
SecA_protein 1/1 41 52 .. 199 210 .. -2.4 83
cystatin 1/1 32 137 .. 1 111 [] 140.6 9.4e-40
Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
*->eElVQRpfnFAI<-*
+E VQR ++FAI
gi|30366|e 41 DEWVQRALHFAI 52
cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
*->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
Gg+ +ad nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
gi|30366|e 32 GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ 76
vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
+G+ nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
gi|30366|e 77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY 123
kkpwegelsvltktC<-*
++pwe ++ +l ++
gi|30366|e 124 EVPWE-DRMSLVNSR 137
//
Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib
Sequence file: tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value N
-------- ----------- ----- ------- ---
[no hits above thresholds]
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
[no hits above thresholds]
Alignments of top-scoring domains:
[no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Prosite
---------------------------------------------------------
| ppsearch (c) 1994 EMBL Data Library |
| based on MacPattern (c) 1990-1994 R. Fuchs |
---------------------------------------------------------
PROSITE pattern search started: Tue Oct 31 14:37:32 2000
Sequence file: tem14
----------------------------------------
Sequence gi|30366|emb|CAA38478.1| (141 residues):
Matching pattern PS00005 PKC_PHOSPHO_SITE:
22: SSK
Total matches: 1
Matching pattern PS00006 CK2_PHOSPHO_SITE:
22: SSKE
23: SKEE
59: TEDE
Total matches: 3
Matching pattern PS00008 MYRISTYL:
17: GALASS
33: GIYDAD
Total matches: 2
Matching pattern PS00287 CYSTATIN:
75: EQTFGGVNYFFDVE
Total matches: 1
Total no of hits in this sequence: 7
========================================
1314 pattern(s) searched in 1 sequence(s), 141 residues.
Total no of hits in all sequences: 7.
Search time: 00:00 min
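PROSITE patterns translate directly into regular expressions: `x` is any residue, `[..]` is a set, `{..}` an excluded set and `(n)` a repeat count. A minimal sketch reproducing the PKC, CK2 and N-myristoylation hits, using a lookahead so overlapping matches (such as CK2 at 22 and 23) are all reported. The converter handles only this subset (no `<`/`>` anchors), and the PS00287 cystatin signature is left out because its pattern text is not given in this report:

```python
import re

SEQ = "".join("""
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
EIYEVPWEDR MSLVNSRCQE A
""".split())


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (x, [..], {..}, (n)) to a regex."""
    return (pattern.replace("{", "[^").replace("}", "]")
                   .replace("(", "{").replace(")", "}")
                   .replace("-", "").replace("x", "."))


def prosite_search(seq: str, pattern: str) -> list[tuple[int, str]]:
    """All (1-based position, matched text) hits, overlapping ones included."""
    rx = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(seq)]


PATTERNS = {
    "PS00005 PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PS00006 CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "PS00008 MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}
for name, pat in PATTERNS.items():
    print(name, prosite_search(SEQ, pat))
```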
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Profile Search
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with motif search against own library
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
argv[1]=P
argv[2]=-m /data/patterns/own/motif.fa
argv[4]=-seq tem14
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
SeqTyp=2 : PROTEIN search;
>APC D-Box is the MOTIF name
>STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 141 units
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with HMM-search search against own library
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/own/own-hmm.lib
Sequence file: tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value N
-------- ----------- ----- ------- ---
[no hits above thresholds]
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
[no hits above thresholds]
Alignments of top-scoring domains:
[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/own/own-hmm-f.lib
Sequence file: tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
Scores for sequence family classification (score includes all domains):
Model Description Score E-value N
-------- ----------- ----- ------- ---
[no hits above thresholds]
Parsed for domains:
Model Domain seq-f seq-t hmm-f hmm-t score E-value
-------- ------- ----- ----- ----- ----- ----- -------
[no hits above thresholds]
Alignments of top-scoring domains:
[no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
L. Aravind's signalling DB
IMPALA version 1.1 [20-December-1999]
Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting,
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999),
"IMPALA: Matching a Protein Sequence Against a Collection of
PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.
Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
(210 letters)
Searching..................................done
Results from profile search
Score E
Sequences producing significant alignments: (bits) Value
PAP Papain/bleomycin hydrolase like domain 20 4.2
DNASE1 DNASE-1/Sphingomyelinase like domain 19 4.5
ANK Ankyrin repeat 19 5.9
CATH Cathepsin like protease domain 19 7.2
CYCL cyclophilin like peptidyl prolyl isomerases 18 9.8
>PAP Papain/bleomycin hydrolase like domain
Length = 376
Score = 19.6 bits (40), Expect = 4.2
Identities = 9/15 (60%), Positives = 10/15 (66%)
Query: 2 ARPLCTLLLLMATLA 16
A P C L LL+A LA
Sbjct: 5 AHPSCLLALLVAGLA 19
>DNASE1 DNASE-1/Sphingomyelinase like domain
Length = 388
Score = 19.3 bits (39), Expect = 4.5
Identities = 3/44 (6%), Positives = 12/44 (26%), Gaps = 1/44 (2%)
Query: 63 YYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAF 106
+ + Q++ + + + +V + F
Sbjct: 197 FLQDRFQLVNSAKIRLSARTLKTN-QVAIAETLQCCETGRQLCF 239
>ANK Ankyrin repeat
Length = 323
Score = 19.1 bits (39), Expect = 5.9
Identities = 5/22 (22%), Positives = 10/22 (44%)
Query: 36 DADLNDEWVQRALHFAISEYNK 57
DL D+ + A A+ ++
Sbjct: 288 RTDLKDDADRTAADIAVQLGHR 309
>CATH Cathepsin like protease domain
Length = 371
Score = 18.7 bits (38), Expect = 7.2
Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)
Query: 77 TFGGVNY--------FFDVEVGRTICT 95
GG +Y G T+C
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325
Score = 18.7 bits (38), Expect = 7.2
Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)
Query: 147 TFGGVNY--------FFDVEVGRTICT 165
GG +Y G T+C
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325
>CYCL cyclophilin like peptidyl prolyl isomerases
Length = 165
Score = 18.2 bits (37), Expect = 9.8
Identities = 4/11 (36%), Positives = 5/11 (45%)
Query: 84 FFDVEVGRTIC 94
FFD+ V
Sbjct: 7 FFDIAVDGEPL 17
Score = 18.2 bits (37), Expect = 9.8
Identities = 4/11 (36%), Positives = 5/11 (45%)
Query: 154 FFDVEVGRTIC 164
FFD+ V
Sbjct: 7 FFDIAVDGEPL 17
Underlying Matrix: BLOSUM62
Number of sequences tested against query: 105
Number of sequences better than 10.0: 5
Number of calls to ALIGN: 7
Length of query: 210
Total length of test sequences: 20182
Effective length of test sequences: 16738.0
Effective search space size: 2978524.2
Initial X dropoff for ALIGN: 25.0 bits
Y. Wolf's SCOP PSSM
IMPALA version 1.1 [20-December-1999]
Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting,
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999),
"IMPALA: Matching a Protein Sequence Against a Collection of
PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.
Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
(210 letters)
Searching.................................................done
Results from profile search
Score E
Sequences producing significant alignments: (bits) Value
gi|118183 [37..144] Cystatin-like 146 4e-37
gi|1172449 [23..197] Purine and uridine phosphorylases 24 2.2
gi|999649 [1..162] Methane monooxygenase hydrolase, gamma su... 23 3.8
gi|2463096 [30..336] Trypsin-like serine proteases 23 4.6
gi|1707833 [43..332] Isoprenyl diphosphate synthases 23 5.5
gi|729418 [1..212] DNA-glycosylase 23 6.8
gi|2144317 [224..416] Lactate & malate dehydrogenases, C-ter... 22 9.2
>gi|118183 [37..144] Cystatin-like
Length = 108
Score = 146 bits (365), Expect = 4e-37
Identities = 65/108 (60%), Positives = 81/108 (74%)
Query: 32 GGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGR 91
GG DA + +E V+RAL FA+ EYNKA+ D Y+ R LQV+RAR+Q GVNYF DVE+GR
Sbjct: 1 GGPMDASVEEEGVRRALDFAVGEYNKASNDMYHSRALQVVRARKQIVAGVNYFLDVELGR 60
Query: 92 TICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQ 139
T CTK+QPNLD C FH+QP L++K CSF+IY VPW+ M+L S CQ
Sbjct: 61 TTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTMTLSKSTCQ 108
Score = 103 bits (254), Expect = 4e-24
Identities = 43/68 (63%), Positives = 52/68 (76%)
Query: 142 RAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRM 201
RAR+Q GVNYF DVE+GRT CTK+QPNLD C FH+QP L++K CSF+IY VPW+ M
Sbjct: 41 RARKQIVAGVNYFLDVELGRTTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTM 100
Query: 202 SLVNSRCQ 209
+L S CQ
Sbjct: 101 TLSKSTCQ 108
>gi|1172449 [23..197] Purine and uridine phosphorylases
Length = 175
Score = 24.0 bits (52), Expect = 2.2
Identities = 3/66 (4%), Positives = 14/66 (20%), Gaps = 10/66 (15%)
Query: 28 RIIPGGIYDAD----LNDEWVQRALH------FAISEYNKATEDEYYRRPLQVLRAREQT 77
G + + + A ++ + D + ++ + +
Sbjct: 84 GYEKGQLPANPAAFLSDKKLADLAQEIAEKQGQSVKRGLICSGDSFINSEDKIAQIKADF 143
Query: 78 FGGVNY 83
Sbjct: 144 PNVTGV 149
>gi|999649 [1..162] Methane monooxygenase hydrolase, gamma subunit
Length = 162
Score = 23.2 bits (49), Expect = 3.8
Identities = 10/28 (35%), Positives = 14/28 (49%), Gaps = 2/28 (7%)
Query: 33 GIYDADLNDEWVQRALHFAISEYNKATE 60
GI+ D D WV + H ++ KA E
Sbjct: 2 GIHSNDTRDAWVNKIAH--VNTLEKAAE 27
>gi|2463096 [30..336] Trypsin-like serine proteases
Length = 307
Score = 22.8 bits (48), Expect = 4.6
Identities = 4/36 (11%), Positives = 6/36 (16%), Gaps = 2/36 (5%)
Query: 6 CTLLLLMATLAGALASSSKEENRIIPGGIYDADLND 41
C + A + G D
Sbjct: 1 CGISDFEPEFPIAPKVEIGTNHIFK-GK-TIVDDKS 34
>gi|1707833 [43..332] Isoprenyl diphosphate synthases
Length = 290
Score = 22.5 bits (48), Expect = 5.5
Identities = 12/71 (16%), Positives = 31/71 (42%), Gaps = 8/71 (11%)
Query: 6 CTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAIS---EYNKATEDE 62
T+L++ L+ ++++E +I+ + + + E ++RA +Y +
Sbjct: 201 KTILVIK-----TLSEATEDEKKILVSTLGNKEAKKEDLERASEIIRKHSLQYAYDLAKK 255
Query: 63 YYRRPLQVLRA 73
Y ++ LR
Sbjct: 256 YSDLAIENLRE 266
>gi|729418 [1..212] DNA-glycosylase
Length = 212
Score = 22.5 bits (48), Expect = 6.8
Identities = 5/18 (27%), Positives = 6/18 (32%)
Query: 87 VEVGRTICTKSQPNLDTC 104
+ GR C P C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199
Score = 22.5 bits (48), Expect = 6.8
Identities = 5/18 (27%), Positives = 6/18 (32%)
Query: 157 VEVGRTICTKSQPNLDTC 174
+ GR C P C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199
>gi|2144317 [224..416] Lactate & malate dehydrogenases, C-terminal domain
Length = 193
Score = 22.0 bits (47), Expect = 9.2
Identities = 13/68 (19%), Positives = 17/68 (24%), Gaps = 12/68 (17%)
Query: 13 ATL----AGALASSS------KEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDE 62
ATL AG +I Y L D A +
Sbjct: 85 ATLSMAHAGYKCVVQFVSLLLGNIEQIHG--TYYVPLKDANNFPIAPGADQLLPLVDGAD 142
Query: 63 YYRRPLQV 70
Y+ PL +
Sbjct: 143 YFAIPLTI 150
Underlying Matrix: BLOSUM62
Number of sequences tested against query: 1187
Number of sequences better than 10.0: 7
Number of calls to ALIGN: 9
Length of query: 210
Total length of test sequences: 256703
Effective length of test sequences: 210706.0
Effective search space size: 36199790.6
Initial X dropoff for ALIGN: 25.0 bits