analysis of sequence from tem14
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV
LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
A
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

sec.str. with predator

> gi|30366|emb|CAA38478.1|
              .         .         .         .         .
1    MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF   50
     ___HHHHHHHHHHHHHHHHHHH__________________HHHHHHHHHH

              .         .         .         .         .
51   AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN  100
     HHHHHH_________HHHHHHHHHHHH_____EEEEEEEEEEEE______

              .         .         .         .
101  LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE            140
     __________HHHHHHHHH_EEEEEE___HHHHHH_____


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~


method         :         1
alpha-contents :      75.8 %
beta-contents  :       1.5 %
coil-contents  :      22.7 %
class          :     alpha


method         :         2
alpha-contents :      61.7 %
beta-contents  :       0.0 %
coil-contents  :      38.3 %
class          :     alpha


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

GPI: learning from metazoa
-14.70  -5.84  -3.19  -5.14  -4.00   0.00 -32.00   0.00  -0.08  -6.61  -1.84 -12.00 -12.00   0.00   0.00   0.00  -97.40
 -7.02  -5.79  -5.85  -2.15   0.00   0.00 -24.00   0.00  -0.06  -6.61  -1.84 -12.00 -12.00   0.00 -12.00   0.00  -89.32
ID: gi|30366|emb|CAA38478.1|	AC: xxx Len:  140 1:I   129 Sc:  -89.32 Pv: 9.270793e-01 NO_GPI_SITE
GPI: learning from protozoa
-14.45  -7.00  -3.85  -1.49  -4.00   0.00 -32.00   0.00   0.00  -5.73  -7.18 -12.00 -12.00   0.00   0.00   0.00  -99.70
-22.52  -2.64  -1.42  -2.38  -4.00   0.00   0.00   0.00  -1.28  -8.14  -7.18 -12.00 -12.00 -12.00 -12.00   0.00  -97.56
ID: gi|30366|emb|CAA38478.1|	AC: xxx Len:  140 1:I   119 Sc:  -97.56 Pv: 8.274222e-01 NO_GPI_SITE

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

# SignalP euk predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.842  21 Y  0.840  21 Y  0.979  11 Y  0.949 Y
# SignalP gram- predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.540  19 Y  0.650  19 Y  0.993   9 Y  0.956 Y
# SignalP gram+ predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.258  59 N  0.348  19 Y  0.997   9 Y  0.952 Y
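
Note: a minimal sketch of how the eukaryotic SignalP call above could be applied to the query. It assumes that the reported "pos" is the first residue of the mature protein, so Ymax at position 21 implies cleavage between residues 20 and 21. The variable names are illustrative and not part of SignalP.

```python
# Query sequence as listed at the top of this report (141 residues).
SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

# Assumption: "pos" is the first residue of the mature chain (1-based).
CLEAVAGE_POS = 21  # Ymax position from the euk network

signal_peptide = SEQ[:CLEAVAGE_POS - 1]
mature = SEQ[CLEAVAGE_POS - 1:]

print("signal peptide:", signal_peptide)
print("mature length :", len(mature))
print("mature start  :", mature[:10])
```

Under that assumption the predicted signal peptide covers residues 1-20, which overlaps the hydrophobic stretch that the other tools in this report flag at roughly residues 4-24.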

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

low complexity regions: SEG 12 2.2 2.5
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                                  1-4    MARP
              lctllllmatlagala    5-20   
                                 21-141  SSSKEENRIIPGGIYDADLNDEWVQRALHF
                                         AISEYNKATEDEYYRRPLQVLRAREQTFGG
                                         VNYFFDVEVGRTICTKSQPNLDTCAFHEQP
                                         ELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
                                         A

low complexity regions: SEG 25 3.0 3.3
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                                  1-141  MARPLCTLLLLMATLAGALASSSKEENRII
                                         PGGIYDADLNDEWVQRALHFAISEYNKATE
                                         DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
                                         RTICTKSQPNLDTCAFHEQPELQKKQLCSF
                                         EIYEVPWEDRMSLVNSRCQEA

low complexity regions: SEG 45 3.4 3.75
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                                  1-141  MARPLCTLLLLMATLAGALASSSKEENRII
                                         PGGIYDADLNDEWVQRALHFAISEYNKATE
                                         DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
                                         RTICTKSQPNLDTCAFHEQPELQKKQLCSF
                                         EIYEVPWEDRMSLVNSRCQEA
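
Note: the three SEG parameters are the window length, the trigger complexity and the extension complexity (in bits). The sketch below illustrates only the trigger step for the 12 / 2.2 run, using plain Shannon entropy per window. It is not SEG itself, which also extends and trims the triggered windows, and the function names are hypothetical.

```python
import math
from collections import Counter

SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

def entropy_bits(window):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def low_entropy_windows(seq, width=12, trigger=2.2):
    """1-based start positions of windows whose entropy is <= trigger."""
    return [
        i + 1
        for i in range(len(seq) - width + 1)
        if entropy_bits(seq[i:i + width]) <= trigger
    ]

starts = low_entropy_windows(SEQ)
print("triggered window starts:", starts)
```

The window starting at residue 5 (LCTLLLLMATLA) falls below the 2.2-bit trigger, consistent with the 5-20 low-complexity segment reported for the first SEG run.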


low complexity regions: XNU
# Score cutoff = 21, Search from offsets 1 to 4
# both members of each repeat flagged
# lambda = 0.347, K = 0.200, H = 0.664
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTllllmatlagalassskeenriiPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
    1 -    7 MARPLCT
    8 -   30   lll lmatlagala ssskeenrii 
   31 -  141 PGGIYDADLN DEWVQRALHF AISEYNKATE DEYYRRPLQV LRAREQTFGG VNYFFDVEVG 
             RTICTKSQPN LDTCAFHEQP ELQKKQLCSF EIYEVPWEDR MSLVNSRCQE A

low complexity regions: DUST
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

coiled coil prediction for gi|30366|emb|CAA38478.1|
sequence: 140 amino acids, 0 residue(s) in coiled coil state

    .    |     .    |     .    |     .    |     .    |     .   60
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |     .    |     .    |     .    |     .  120
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    | 
EIYEVPWEDR MSLVNSRCQE 
~~~~~~~~~~ ~~~~~~~~~~ 
---------- ---------- 
~~~~~~~~~~ ~~~~~~~~~~ 
~~~~~~~~~~ ~~~~~~~~~~ 
~~~~~~~~~~ ~~~~~~~~~~ 
~~~~~~~~~~ ~~~~~~~~~~ 



~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

prediction of transmembrane regions with toppred2

     ***********************************
     *TOPPREDM with eukaryotic function*
     ***********************************

tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___

 (1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA


(p)rokaryotic or (e)ukaryotic: e


Charge-pair energy: 0

Length of full window (odd number!): 21

Length of core window (odd number!): 11

Number of residues to add to each end of helix: 1

Critical length: 60

Upper cutoff for candidates: 1

Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested


Candidate membrane-spanning segments:

 Helix Begin   End   Score Certainty
     1     4    24   1.888 Certain

----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
     Segment       1
 Loop length     3   117
 K+R profile  2.00      
                       +      
CYT-EXT prof     -      
                   -0.40      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 3.00
 (NEG-POS)/(NEG+POS): -1.0000
                 NEG: 0.0000
                 POS: 1.0000
-> Orientation: N-in

CYT-EXT difference:   0.40
-> Orientation: N-out

----------------------------------------------------------------------

"tem14" 141 
 4 24 #t 1.8875



     ************************************
     *TOPPREDM with prokaryotic function*
     ************************************

tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___

 (1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA


(p)rokaryotic or (e)ukaryotic: p


Charge-pair energy: 0

Length of full window (odd number!): 21

Length of core window (odd number!): 11

Number of residues to add to each end of helix: 1

Critical length: 60

Upper cutoff for candidates: 1

Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested


Candidate membrane-spanning segments:

 Helix Begin   End   Score Certainty
     1     4    24   1.888 Certain

----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
     Segment       1
 Loop length     3   117
 K+R profile  2.00      
                       +      
CYT-EXT prof     -      
                   -0.40      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 3.00
 (NEG-POS)/(NEG+POS): -1.0000
                 NEG: 0.0000
                 POS: 1.0000
-> Orientation: N-in

CYT-EXT difference:   0.40
-> Orientation: N-out

----------------------------------------------------------------------

"tem14" 141 
 4 24 #t 1.8875



~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

NOW EXECUTING:   /bio_software/1D/stat/saps/saps-stroh/SAPS.SSPA/saps /people/maria/tem14.___saps___
SAPS.  Version of April 11, 1996.
Date run: Tue Oct 31 14:35:46 2000

File: /people/maria/tem14.___saps___
ID   gi|30366|emb|CAA38478.1|
DE   cystain S [Homo sapiens]

number of residues:  141;   molecular weight:  16.2 kdal
 
         1  MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE 
        61  DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF 
       121  EIYEVPWEDR MSLVNSRCQE A

--------------------------------------------------------------------------------
COMPOSITIONAL ANALYSIS (extremes relative to: swp23s)

A  : 12( 8.5%); C  :  5( 3.5%); D  :  7( 5.0%); E  : 14( 9.9%); F  :  6( 4.3%)
G  :  6( 4.3%); H  :  2( 1.4%); I  :  6( 4.3%); K  :  5( 3.5%); L  : 15(10.6%)
M  :  3( 2.1%); N  :  6( 4.3%); P  :  6( 4.3%); Q  :  8( 5.7%); R  : 10( 7.1%)
S  :  8( 5.7%); T  :  7( 5.0%); V  :  7( 5.0%); W  :  2( 1.4%); Y  :  6( 4.3%)

KR      :   15 ( 10.6%);   ED      :   21 ( 14.9%);   AGP     :   24 ( 17.0%);
KRED    :   36 ( 25.5%);   KR-ED   :   -6 ( -4.3%);   FIKMNY  :   32 ( 22.7%);
LVIFM   :   37 ( 26.2%);   ST      :   15 ( 10.6%).
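
Note: the counts in the composition table can be recomputed directly from the sequence. The following is a minimal sketch, not the SAPS implementation; the helper `group` is hypothetical and only sums the counts of the listed residue types.

```python
from collections import Counter

SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

counts = Counter(SEQ)
n = len(SEQ)

# Per-residue counts and percentages, in the same alphabetical order as above.
for aa in sorted(counts):
    print(f"{aa}: {counts[aa]:2d} ({100 * counts[aa] / n:4.1f}%)")

def group(letters):
    """Total count of all residue types in `letters`."""
    return sum(counts[a] for a in letters)

for name in ("KR", "ED", "AGP", "LVIFM", "ST"):
    print(f"{name:6s}: {group(name):3d} ({100 * group(name) / n:4.1f}%)")
print("KR-ED :", group("KR") - group("ED"))
```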

--------------------------------------------------------------------------------
CHARGE DISTRIBUTIONAL ANALYSIS
 
         1  00+0000000 0000000000 000+--0+00 00000-0-00 --000+0000 000-00+00- 
        61  --00++0000 0+0+-00000 00000-0-00 +0000+0000 0-00000-00 -00++00000 
       121  -00-000--+ 000000+00- 0
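
Note: the charge string above maps K and R to "+", D and E to "-", and every other residue to "0". A minimal sketch of that mapping follows; the function name is illustrative.

```python
SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

def charge_string(seq):
    """Encode each residue as '+' (K, R), '-' (D, E) or '0' (all others)."""
    return "".join(
        "+" if aa in "KR" else "-" if aa in "DE" else "0" for aa in seq
    )

charges = charge_string(SEQ)

# Print in blocks of 10, six blocks per line, like the listing above.
for start in range(0, len(charges), 60):
    blocks = [charges[i:i + 10] for i in range(start, min(start + 60, len(charges)), 10)]
    print(f"{start + 1:10d}  " + " ".join(blocks))
```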

A. CHARGE CLUSTERS.


Positive charge clusters (cmin =  9/30 or 13/45 or 15/60):  none


Negative charge clusters (cmin = 12/30 or 16/45 or 19/60):  none


Mixed charge clusters (cmin = 17/30 or 23/45 or 28/60):  none


B. HIGH SCORING (UN)CHARGED SEGMENTS.

There are no high scoring positive charge segments.
There are no high scoring negative charge segments.
There are no high scoring mixed charge segments.
There are no high scoring uncharged segments.


C. CHARGE RUNS AND PATTERNS.

pattern  (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0     4 |   5 |   7 |  28 |   8 |   9 |  12 |   9 |  11 |  14 |   5 |   7 | 
lmin1     6 |   6 |   9 |  34 |  10 |  11 |  14 |  12 |  13 |  16 |   7 |   8 | 
lmin2     7 |   7 |  10 |  37 |  11 |  13 |  16 |  13 |  15 |  18 |   8 |  10 | 
 (Significance level: 0.010000; Minimal displayed length:  6)
There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

  +  runs >=   3:   0
  -  runs >=   3:   1, at   60;
  *  runs >=   5:   0
  0  runs >=  19:   1, at    4;

--------------------------------------------------------------------------------
DISTRIBUTION OF OTHER AMINO ACID TYPES

1. HIGH SCORING SEGMENTS.
There are no high scoring hydrophobic segments.

____________________________________
High scoring transmembrane segments:

   5.00 (LVIF)   2.00 (AGM)   0.00 (BZX)  -1.00 (YCW)  -2.00 (ST)
  -6.00 (P)  -8.00 (H) -10.00 (NQ) -16.00 (KR) -17.00 (ED)

 Expected score/letter:  -4.397
 M_0.01=  45.75; M_0.05=  36.60;     M_0.30=  25.71

 1) From    5 to   20:  length= 16, score=42.00  * 
       5  LCTLLLLMAT LAGALA
    L:  7(43.8%);  A:  4(25.0%);  T:  2(12.5%);
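
Note: the segment score of 42.00 can be reproduced by summing the per-residue scores from the table above over residues 5-20. The sketch below also finds the best-scoring contiguous segment with a standard maximum-subarray scan. This is an assumption about how such a segment may be located, not a description of the SAPS algorithm, and it does not reproduce the M_0.01 / M_0.05 significance thresholds.

```python
SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

# Residue scores copied from the "High scoring transmembrane segments" table.
GROUP_SCORES = {
    "LVIF": 5, "AGM": 2, "BZX": 0, "YCW": -1, "ST": -2,
    "P": -6, "H": -8, "NQ": -10, "KR": -16, "ED": -17,
}
SCORE = {aa: s for letters, s in GROUP_SCORES.items() for aa in letters}

def segment_score(seq, start, end):
    """Sum of residue scores over the 1-based inclusive range start..end."""
    return sum(SCORE[aa] for aa in seq[start - 1:end])

def best_segment(seq):
    """Return (score, start, end) of the maximal-scoring contiguous segment."""
    best = (float("-inf"), 0, 0)
    cur, start = 0, 1
    for i, aa in enumerate(seq, 1):
        if cur <= 0:          # a non-positive prefix never helps; restart here
            cur, start = 0, i
        cur += SCORE[aa]
        if cur > best[0]:
            best = (cur, start, i)
    return best

print(segment_score(SEQ, 5, 20))
print(best_segment(SEQ))
```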


2. SPACINGS OF C.


H2N-5-C-87-C-9-C-13-C-19-C-3-COOH


2*. SPACINGS OF C and H. (additional deluxe function for ALEX)


H2N-5-C-42-H-44-C-9-C-2-H-10-C-19-C-3-COOH
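
Note: the spacing strings above list the number of residues between consecutive occurrences of the chosen residue types, starting at the N-terminus and ending at the C-terminus. A minimal sketch that rebuilds both strings is given below; the function name is hypothetical.

```python
SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

def spacings(seq, letters):
    """Build an 'H2N-<gap>-<residue>-...-<gap>-COOH' spacing string."""
    parts = ["H2N"]
    last = 0  # 1-based position of the previous hit (0 = N-terminus)
    for pos, aa in enumerate(seq, start=1):
        if aa in letters:
            parts.append(str(pos - last - 1))  # residues between the hits
            parts.append(aa)
            last = pos
    parts.append(str(len(seq) - last))         # residues after the last hit
    parts.append("COOH")
    return "-".join(parts)

print(spacings(SEQ, "C"))
print(spacings(SEQ, "CH"))
```

The cysteines therefore sit at positions 6, 94, 104, 118 and 138 of the 141-residue precursor.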

--------------------------------------------------------------------------------
REPETITIVE STRUCTURES.

A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet.
Repeat core block length:  4

B. SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet.
   (i= LVIF; += KR; -= ED; s= AG; o= ST; n= NQ; a= YW; p= P; h= H; m= M; c= C)
Repeat core block length:  8

--------------------------------------------------------------------------------

MULTIPLETS.

A. AMINO ACID ALPHABET.

1. Total number of amino acid multiplets:  10  (Expected range:   0-- 17)

2. Histogram of spacings between consecutive amino acid multiplets:
   (1-5) 5   (6-10) 2   (11-20) 1   (>=21) 3

3. Clusters of amino acid multiplets (cmin = 12/30 or 16/45 or 19/60):  none


B. CHARGE ALPHABET.

1. Total number of charge multiplets:   6  (Expected range:   0-- 11)
   2 +plets (f+: 10.6%), 4 -plets (f-: 14.9%)
   Total number of charge altplets: 3 (Critical number: 12)

2. Histogram of spacings between consecutive charge multiplets:
   (1-5) 1   (6-10) 0   (11-20) 4   (>=21) 2

--------------------------------------------------------------------------------
PERIODICITY ANALYSIS.

A. AMINO ACID ALPHABET (core:  4; !-core: 5)

Location	Period	Element		Copies	Core	Errors
   8-  11	 1	L         	 4	 4  	 0


B. CHARGE ALPHABET ({+= KR; -= ED; 0}; core:  5; !-core: 6)
   and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core:  6; !-core: 8)

Location	Period	Element		Copies	Core	Errors

There are no periodicities of the prescribed length.

--------------------------------------------------------------------------------
SPACING ANALYSIS.

There are no unusual spacings.


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Pfam (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/Pfam
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  gi|30366|emb|CAA38478.1|  cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model        Description                                Score    E-value  N 
--------     -----------                                -----    ------- ---
cystatin     Cystatin domain                            140.6    9.4e-40   1
SecA_protein SecA protein, amino terminal region         -2.4         83   1

Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
SecA_protein   1/1      41    52 ..   199   210 ..    -2.4       83
cystatin       1/1      32   137 ..     1   111 []   140.6  9.4e-40

Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
                   *->eElVQRpfnFAI<-*
                      +E VQR ++FAI   
  gi|30366|e    41    DEWVQRALHFAI    52   

cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
                   *->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
                      Gg+ +ad   nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
  gi|30366|e    32    GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ 76   

                   vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
                     +G+  nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
  gi|30366|e    77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY 123  

                   kkpwegelsvltktC<-*
                   ++pwe ++ +l ++    
  gi|30366|e   124 EVPWE-DRMSLVNSR    137  

//
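
Note: a minimal sketch for pulling the significant hits out of the "Parsed for domains" table above. The two data lines are copied from this report; the E-value cutoff of 0.01 and all names in the code are assumptions for illustration, not HMMER settings.

```python
# Data lines copied from the "Parsed for domains" table of the Pfam run.
DOMAIN_LINES = """\
SecA_protein   1/1      41    52 ..   199   210 ..    -2.4       83
cystatin       1/1      32   137 ..     1   111 []   140.6  9.4e-40
"""

E_CUTOFF = 0.01  # assumed threshold, chosen only for this example

def parse_domains(text):
    """Parse hmmpfam 2.x domain lines into dictionaries."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # skip headers, rulers and blank lines
        hits.append({
            "model": fields[0],
            "seq_from": int(fields[2]),
            "seq_to": int(fields[3]),
            "score": float(fields[8]),
            "evalue": float(fields[9]),
        })
    return hits

significant = [h for h in parse_domains(DOMAIN_LINES) if h["evalue"] < E_CUTOFF]
for hit in significant:
    print(hit["model"], hit["seq_from"], hit["seq_to"], hit["evalue"])
```

With this cutoff only the cystatin domain (residues 32-137) is retained; the SecA_protein match with E = 83 is discarded as noise.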

Start with PfamFrag (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/PfamFrag
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  gi|30366|emb|CAA38478.1|  cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model        Description                                Score    E-value  N 
--------     -----------                                -----    ------- ---
cystatin     Cystatin domain                            140.6    9.4e-40   1
SecA_protein SecA protein, amino terminal region         -2.4         83   1

Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
SecA_protein   1/1      41    52 ..   199   210 ..    -2.4       83
cystatin       1/1      32   137 ..     1   111 []   140.6  9.4e-40

Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
                   *->eElVQRpfnFAI<-*
                      +E VQR ++FAI   
  gi|30366|e    41    DEWVQRALHFAI    52   

cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
                   *->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
                      Gg+ +ad   nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
  gi|30366|e    32    GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ 76   

                   vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
                     +G+  nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
  gi|30366|e    77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY 123  

                   kkpwegelsvltktC<-*
                   ++pwe ++ +l ++    
  gi|30366|e   124 EVPWE-DRMSLVNSR    137  

//

Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  gi|30366|emb|CAA38478.1|  cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Prosite
---------------------------------------------------------
|          ppsearch (c) 1994 EMBL Data Library          |
|       based on MacPattern (c) 1990-1994 R. Fuchs      |
---------------------------------------------------------

PROSITE pattern search started: Tue Oct 31 14:37:32 2000

Sequence file: tem14

----------------------------------------
Sequence gi|30366|emb|CAA38478.1| (141 residues):

Matching pattern PS00005 PKC_PHOSPHO_SITE:
   22: SSK
Total matches: 1

Matching pattern PS00006 CK2_PHOSPHO_SITE:
   22: SSKE
   23: SKEE
   59: TEDE
Total matches: 3

Matching pattern PS00008 MYRISTYL:
   17: GALASS
   33: GIYDAD
Total matches: 2

Matching pattern PS00287 CYSTATIN:
   75: EQTFGGVNYFFDVE
Total matches: 1

Total no of hits in this sequence: 7

========================================

1314 pattern(s) searched in 1 sequence(s), 141 residues.
Total no of hits in all sequences: 7.
Search time: 00:00 min
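
Note: three of the PROSITE hits above can be reproduced with regular expressions. The sketch assumes the usual pattern definitions for PS00005 ([ST]-x-[RK]), PS00006 ([ST]-x(2)-[DE]) and PS00008 (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}); PS00287 is left out because its exact definition is not given in this report. A lookahead is used so that overlapping matches such as SSKE and SKEE are both found.

```python
import re

SEQ = (
    "MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV"
    "LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE"
    "A"
)

# PROSITE patterns translated to Python regular expressions (assumed definitions).
PATTERNS = {
    "PS00005 PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "PS00006 CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "PS00008 MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

def find_all(regex, seq):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({regex}))", seq)]

for name, regex in PATTERNS.items():
    print("Matching pattern", name)
    for pos, text in find_all(regex, SEQ):
        print(f"{pos:5d}: {text}")
```

These short patterns match frequently by chance, so the phosphorylation and myristoylation hits are weak evidence on their own; the two myristoylation matches in particular are not at the N-terminus of the chain.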

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Profile Search

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with motif search against own library
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
argv[1]=P 
argv[2]=-m  /data/patterns/own/motif.fa
argv[4]=-seq  tem14

     ***** bioMotif : Version V41a DB, 1999 Nov 11 *****
          SeqTyp=2 : PROTEIN  search; 


>APC D-Box is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 141 units


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with HMM-search search against own library
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm.lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  gi|30366|emb|CAA38478.1|  cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm-f.lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  gi|30366|emb|CAA38478.1|  cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

L. Aravind's signalling DB
IMPALA version 1.1 [20-December-1999]


Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, 
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), 
"IMPALA: Matching a Protein Sequence Against a Collection of 
PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.

Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
         (210 letters)

Searching..................................done
Results from profile search


                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

PAP  Papain/bleomycin hydrolase like domain                        20  4.2
DNASE1  DNASE-1/Sphingomyelinase like domain                       19  4.5
ANK Ankyrin repeat                                                 19  5.9
CATH  Cathepsin like protease domain                               19  7.2
CYCL cyclophilin like peptidyl prolyl isomerases                   18  9.8

>PAP  Papain/bleomycin hydrolase like domain 
          Length = 376

 Score = 19.6 bits (40), Expect = 4.2
 Identities = 9/15 (60%), Positives = 10/15 (66%)

Query: 2  ARPLCTLLLLMATLA 16
          A P C L LL+A LA
Sbjct: 5  AHPSCLLALLVAGLA 19


>DNASE1  DNASE-1/Sphingomyelinase like domain 
          Length = 388

 Score = 19.3 bits (39), Expect = 4.5
 Identities = 3/44 (6%), Positives = 12/44 (26%), Gaps = 1/44 (2%)

Query: 63  YYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAF 106
           + +   Q++ + +          + +V      +         F
Sbjct: 197 FLQDRFQLVNSAKIRLSARTLKTN-QVAIAETLQCCETGRQLCF 239


>ANK Ankyrin repeat 
          Length = 323

 Score = 19.1 bits (39), Expect = 5.9
 Identities = 5/22 (22%), Positives = 10/22 (44%)

Query: 36  DADLNDEWVQRALHFAISEYNK 57
             DL D+  + A   A+   ++
Sbjct: 288 RTDLKDDADRTAADIAVQLGHR 309


>CATH  Cathepsin like protease domain 
          Length = 371

 Score = 18.7 bits (38), Expect = 7.2
 Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)

Query: 77  TFGGVNY--------FFDVEVGRTICT 95
             GG +Y              G T+C 
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325


 Score = 18.7 bits (38), Expect = 7.2
 Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)

Query: 147 TFGGVNY--------FFDVEVGRTICT 165
             GG +Y              G T+C 
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325


>CYCL cyclophilin like peptidyl prolyl isomerases 
          Length = 165

 Score = 18.2 bits (37), Expect = 9.8
 Identities = 4/11 (36%), Positives = 5/11 (45%)

Query: 84 FFDVEVGRTIC 94
          FFD+ V     
Sbjct: 7  FFDIAVDGEPL 17


 Score = 18.2 bits (37), Expect = 9.8
 Identities = 4/11 (36%), Positives = 5/11 (45%)

Query: 154 FFDVEVGRTIC 164
           FFD+ V     
Sbjct: 7   FFDIAVDGEPL 17


Underlying Matrix: BLOSUM62
Number of sequences tested against query: 105
Number of sequences better than 10.0: 5 
Number of calls to ALIGN: 7 
Length of query: 210 
Total length of test sequences: 20182  
Effective length of test sequences: 16738.0
Effective search space size: 2978524.2
Initial X dropoff for ALIGN: 25.0 bits

Y. Wolf's SCOP PSSM
IMPALA version 1.1 [20-December-1999]


Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, 
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), 
"IMPALA: Matching a Protein Sequence Against a Collection of 
PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.

Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
         (210 letters)

Searching.................................................done
Results from profile search


                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gi|118183 [37..144] Cystatin-like                                 146  4e-37
gi|1172449 [23..197] Purine and uridine phosphorylases             24  2.2
gi|999649 [1..162] Methane monooxygenase hydrolase, gamma su...    23  3.8
gi|2463096 [30..336] Trypsin-like serine proteases                 23  4.6
gi|1707833 [43..332] Isoprenyl diphosphate synthases               23  5.5
gi|729418 [1..212] DNA-glycosylase                                 23  6.8
gi|2144317 [224..416] Lactate & malate dehydrogenases, C-ter...    22  9.2

>gi|118183 [37..144] Cystatin-like 
          Length = 108

 Score =  146 bits (365), Expect = 4e-37
 Identities = 65/108 (60%), Positives = 81/108 (74%)

Query: 32  GGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGR 91
           GG  DA + +E V+RAL FA+ EYNKA+ D Y+ R LQV+RAR+Q   GVNYF DVE+GR
Sbjct: 1   GGPMDASVEEEGVRRALDFAVGEYNKASNDMYHSRALQVVRARKQIVAGVNYFLDVELGR 60

Query: 92  TICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQ 139
           T CTK+QPNLD C FH+QP L++K  CSF+IY VPW+  M+L  S CQ
Sbjct: 61  TTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTMTLSKSTCQ 108


 Score =  103 bits (254), Expect = 4e-24
 Identities = 43/68 (63%), Positives = 52/68 (76%)

Query: 142 RAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRM 201
           RAR+Q   GVNYF DVE+GRT CTK+QPNLD C FH+QP L++K  CSF+IY VPW+  M
Sbjct: 41  RARKQIVAGVNYFLDVELGRTTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTM 100

Query: 202 SLVNSRCQ 209
           +L  S CQ
Sbjct: 101 TLSKSTCQ 108


>gi|1172449 [23..197] Purine and uridine phosphorylases 
          Length = 175

 Score = 24.0 bits (52), Expect = 2.2
 Identities = 3/66 (4%), Positives = 14/66 (20%), Gaps = 10/66 (15%)

Query: 28  RIIPGGIYDAD----LNDEWVQRALH------FAISEYNKATEDEYYRRPLQVLRAREQT 77
               G +         + +    A         ++      + D +     ++ + +   
Sbjct: 84  GYEKGQLPANPAAFLSDKKLADLAQEIAEKQGQSVKRGLICSGDSFINSEDKIAQIKADF 143

Query: 78  FGGVNY 83
                 
Sbjct: 144 PNVTGV 149


>gi|999649 [1..162] Methane monooxygenase hydrolase, gamma subunit 
          Length = 162

 Score = 23.2 bits (49), Expect = 3.8
 Identities = 10/28 (35%), Positives = 14/28 (49%), Gaps = 2/28 (7%)

Query: 33 GIYDADLNDEWVQRALHFAISEYNKATE 60
          GI+  D  D WV +  H  ++   KA E
Sbjct: 2  GIHSNDTRDAWVNKIAH--VNTLEKAAE 27


>gi|2463096 [30..336] Trypsin-like serine proteases 
          Length = 307

 Score = 22.8 bits (48), Expect = 4.6
 Identities = 4/36 (11%), Positives = 6/36 (16%), Gaps = 2/36 (5%)

Query: 6  CTLLLLMATLAGALASSSKEENRIIPGGIYDADLND 41
          C +         A        +    G     D   
Sbjct: 1  CGISDFEPEFPIAPKVEIGTNHIFK-GK-TIVDDKS 34


>gi|1707833 [43..332] Isoprenyl diphosphate synthases 
          Length = 290

 Score = 22.5 bits (48), Expect = 5.5
 Identities = 12/71 (16%), Positives = 31/71 (42%), Gaps = 8/71 (11%)

Query: 6   CTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAIS---EYNKATEDE 62
            T+L++       L+ ++++E +I+   + + +   E ++RA         +Y      +
Sbjct: 201 KTILVIK-----TLSEATEDEKKILVSTLGNKEAKKEDLERASEIIRKHSLQYAYDLAKK 255

Query: 63  YYRRPLQVLRA 73
           Y    ++ LR 
Sbjct: 256 YSDLAIENLRE 266


>gi|729418 [1..212] DNA-glycosylase 
          Length = 212

 Score = 22.5 bits (48), Expect = 6.8
 Identities = 5/18 (27%), Positives = 6/18 (32%)

Query: 87  VEVGRTICTKSQPNLDTC 104
           +  GR  C    P    C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199


 Score = 22.5 bits (48), Expect = 6.8
 Identities = 5/18 (27%), Positives = 6/18 (32%)

Query: 157 VEVGRTICTKSQPNLDTC 174
           +  GR  C    P    C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199


>gi|2144317 [224..416] Lactate & malate dehydrogenases, C-terminal domain 
          Length = 193

 Score = 22.0 bits (47), Expect = 9.2
 Identities = 13/68 (19%), Positives = 17/68 (24%), Gaps = 12/68 (17%)

Query: 13  ATL----AGALASSS------KEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDE 62
           ATL    AG                +I     Y   L D         A          +
Sbjct: 85  ATLSMAHAGYKCVVQFVSLLLGNIEQIHG--TYYVPLKDANNFPIAPGADQLLPLVDGAD 142

Query: 63  YYRRPLQV 70
           Y+  PL +
Sbjct: 143 YFAIPLTI 150


Underlying Matrix: BLOSUM62
Number of sequences tested against query: 1187
Number of sequences better than 10.0: 7 
Number of calls to ALIGN: 9 
Length of query: 210 
Total length of test sequences: 256703  
Effective length of test sequences: 210706.0
Effective search space size: 36199790.6
Initial X dropoff for ALIGN: 25.0 bits
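
Note: the Expect values in the SCOP PSSM search are consistent with the usual relation E = (effective search space) * 2^(-bit score). The sketch below applies that relation to the two cystatin-like alignments. The function name is hypothetical, and because the listed bit scores are rounded, low-scoring hits agree with the reported values only approximately.

```python
def expect_from_bits(bit_score, search_space):
    """Expected number of chance hits with at least this bit score."""
    return search_space * 2.0 ** (-bit_score)

# Effective search space reported for the SCOP PSSM run.
SCOP_SPACE = 36199790.6

for bits in (146, 103):
    print(f"{bits} bits -> E = {expect_from_bits(bits, SCOP_SPACE):.1e}")
```

For 146 bits this gives an Expect value of about 4e-37, matching the reported cystatin-like hit.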