analysis of sequence from CAD21200.1.fa
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS
PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL
TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI
NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA
GGGGGSHGGN GAWRGGNGNG KGG

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

secondary structure prediction with PREDATOR

> CAD21200.1
              .         .         .         .         .
1    MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV   50
     _____EEEEEE________EEEE______HHHHHHHHH_HHHHHHHHHHH

              .         .         .         .         .
51   IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ  100
     HHHHEEEEE________________EEE____HHHHHHHHHHHHH_____

              .         .         .         .         .
101  YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV  150
     __EEEE______HHHHHH__HHHHHHHHHHH_______HHHHH_EEE___

              .         .         .         .         .
151  GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE  200
     ________________________HHHHHHHHH____EEEE_EEEEEE__

              .         .         .         .         .
201  QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN  250
     _EEEEE______HHHHHHHHHH____EEEE____EEEEE___________

              .   
251  GAWRGGNGNGKGG                                       263
     _____________
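
The per-residue states above can be tallied into helix, strand and coil fractions. The following is a minimal sketch, assuming the state string is copied by hand from the PREDATOR block (H = helix, E = strand, _ = coil); the names `PREDATOR_STATES` and `state_fractions` are illustrative and not part of any tool.

```python
from collections import Counter

# State string copied from the PREDATOR block above, one chunk per 50-residue row.
PREDATOR_STATES = (
    "_____EEEEEE________EEEE______HHHHHHHHH_HHHHHHHHHHH"
    "HHHHEEEEE________________EEE____HHHHHHHHHHHHH_____"
    "__EEEE______HHHHHH__HHHHHHHHHHH_______HHHHH_EEE___"
    "________________________HHHHHHHHH____EEEE_EEEEEE__"
    "_EEEEE______HHHHHHHHHH____EEEE____EEEEE___________"
    "_____________"
)


def state_fractions(states: str) -> dict:
    """Return the fraction of residues assigned to each of H, E and _."""
    counts = Counter(states)
    total = len(states)
    return {state: counts.get(state, 0) / total for state in "HE_"}


if __name__ == "__main__":
    for state, fraction in state_fractions(PREDATOR_STATES).items():
        print(f"{state}: {100 * fraction:.1f} %")
```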


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~


method         :         1
alpha-contents :       0.0 %
beta-contents  :      49.4 %
coil-contents  :      50.6 %
class          :      beta


method         :         2
alpha-contents :       0.0 %
beta-contents  :      49.8 %
coil-contents  :      50.2 %
class          :      beta


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

GPI: learning from metazoa
 -7.49   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  -8.36  -2.06 -12.00 -12.00   0.00 -12.00   0.00  -53.90
 -8.98   0.00  -0.01   0.00   0.00   0.00   0.00   0.00   0.00  -6.63  -2.06 -12.00 -12.00   0.00 -12.00   0.00  -53.69
ID: CAD21200.1	AC: xxx Len:  263 1:I   245 Sc:  -53.69 Pv: 2.721824e-01 NO_GPI_SITE
GPI: learning from protozoa
 -6.63   0.00   0.00   0.00   0.00   0.00 -16.00   0.00   0.00  -6.27  -7.83 -12.00 -12.00   0.00 -12.00   0.00  -72.73
-11.12   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  -6.70  -7.83 -12.00 -12.00   0.00 -12.00   0.00  -61.65
ID: CAD21200.1	AC: xxx Len:  263 1:I   244 Sc:  -61.65 Pv: 2.414226e-01 NO_GPI_SITE

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

# SignalP euk predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1   0.826  57 Y  0.717  57 Y  0.975  41 Y  0.620 Y
# SignalP gram- predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1   0.474 114 N  0.357 128 N  0.956 112 Y  0.430 N
# SignalP gram+ predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1   0.471  57 Y  0.485  57 Y  0.998  41 Y  0.721 Y
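
The SignalP rows above share one whitespace-separated layout (name, then Cmax/Ymax/Smax each with position and Y/N call, then Smean with its call). A minimal parsing sketch follows, assuming exactly that twelve-field layout; `SignalPRow`, `parse_signalp_row` and `all_measures_positive` are hypothetical helper names, not part of SignalP.

```python
from typing import NamedTuple


class SignalPRow(NamedTuple):
    name: str
    cmax: float
    cmax_pos: int
    cmax_call: str
    ymax: float
    ymax_pos: int
    ymax_call: str
    smax: float
    smax_pos: int
    smax_call: str
    smean: float
    smean_call: str


def parse_signalp_row(line: str) -> SignalPRow:
    """Split one data row of the table into typed fields."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return SignalPRow(
        f[0],
        float(f[1]), int(f[2]), f[3],
        float(f[4]), int(f[5]), f[6],
        float(f[7]), int(f[8]), f[9],
        float(f[10]), f[11],
    )


def all_measures_positive(row: SignalPRow) -> bool:
    """True only if every one of the four Y/N calls is 'Y'."""
    calls = (row.cmax_call, row.ymax_call, row.smax_call, row.smean_call)
    return all(call == "Y" for call in calls)


if __name__ == "__main__":
    euk = parse_signalp_row("CAD21200.1   0.826  57 Y  0.717  57 Y  0.975  41 Y  0.620 Y")
    print(euk.ymax_pos, all_measures_positive(euk))
```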

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

low complexity regions: SEG 12 2.2 2.5
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].

                                  1-20   MLTTTPYLTIRRPSPTTAEF
                  tlttcppltlpl   21-32   
                                 33-77   RAALFGVLCLRFIAVLSVIIGIYAAFFSPT
                                         GLLPPPIFPSGRISF
              ldfdlnnfllhilhll   78-93   
                                 94-150  YISRPGQYLASLAISLPPYAVLALSALTSY
                                         IALFARIHTTESLLVLRGLGIQMSSSV
                 gggnffrlgggtf  151-163  
                                164-193  MKRTRFIPTEKIQDILINEAFKGFEVRYYL
                  vivvegeqdvvv  194-205  
                                206-237  CFPRLLPRRKIVERVWRGARGCLYEKDGPV
                                         LS
    agagggggshggngawrggngngkgg  238-263  

low complexity regions: SEG 25 3.0 3.3
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].

                                  1-1    M
ltttpyltirrpspttaeftlttcppltlp    2-128  
lraalfgvlclrfiavlsviigiyaaffsp
tgllpppifpsgrisfldfdlnnfllhilh
llyisrpgqylaslaislppyavlalsalt
                       syialfa
                                129-131  RIH
ttesllvlrglgiqmsssvgggnffrlggg  132-169  
                      tfmkrtrf
                                170-237  IPTEKIQDILINEAFKGFEVRYYLVIVVEG
                                         EQDVVVCFPRLLPRRKIVERVWRGARGCLY
                                         EKDGPVLS
    agagggggshggngawrggngngkgg  238-263  

low complexity regions: SEG 45 3.4 3.75
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].

mltttpyltirrpspttaeftlttcppltl    1-163  
plraalfgvlclrfiavlsviigiyaaffs
ptgllpppifpsgrisfldfdlnnfllhil
hllyisrpgqylaslaislppyavlalsal
tsyialfarihttesllvlrglgiqmsssv
                 gggnffrlgggtf
                                164-237  MKRTRFIPTEKIQDILINEAFKGFEVRYYL
                                         VIVVEGEQDVVVCFPRLLPRRKIVERVWRG
                                         ARGCLYEKDGPVLS
    agagggggshggngawrggngngkgg  238-263  


low complexity regions: XNU
# Score cutoff = 21, Search from offsets 1 to 4
# both members of each repeat flagged
# lambda = 0.347, K = 0.200, H = 0.664
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS
PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL
TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI
NEAFKGFEVRyylvivvegeqdvvvCFPRLLPRRKIVERVWRGARGCLYEKDGPVLsaga
gggggshggngaWRGGNGNGKGG
    1 -  190 MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS 
             PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL 
             TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI 
             NEAFKGFEVR 
  191 -  205   yylvivvege qdvvv
  206 -  236 CFPRL LPRRKIVERV WRGARGCLYE KDGPVL
  237 -  252   saga gggggshggn ga
  253 -  263 WRGGNGNG KGG

low complexity regions: DUST
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS
PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL
TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI
NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA
GGGGGSHGGNGAWRGGNGNGKGG

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

coiled coil prediction for CAD21200.1
sequence: 263 amino acids, 0 residue(s) in coiled coil state

    .    |     .    |     .    |     .    |     .    |     .   60
MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |     .    |     .    |     .    |     .  120
PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |     .    |     .    |     .    |     .  180
TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |     .    |     .    |     .    |     .  240
NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK  -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |    
GGGGGSHGGN GAWRGGNGNG KGG
~~~~~~~~~~ ~~~~~~~~~~ ~~~
---------- ---------- ---
~~~~~~~~~~ ~~~~~~~~~~ ~~~
~~~~~~~~~~ ~~~~~~~~~~ ~~~
~~~~~~~~~~ ~~~~~~~~~~ ~~~
~~~~~~~~~~ ~~~~~~~~~~ ~~~


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

prediction of transmembrane regions with toppred2

     ***********************************
     *TOPPREDM with eukaryotic function*
     ***********************************

CAD21200.1.fa.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: CAD21200.1.fa.___inter___

 (1 sequences)
MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV
IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ
YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV
GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE
QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN
GAWRGGNGNGKGG


(p)rokaryotic or (e)ukaryotic: e


Charge-pair energy: 0

Length of full window (odd number!): 21

Length of core window (odd number!): 11

Number of residues to add to each end of helix: 1

Critical length: 60

Upper cutoff for candidates: 1

Lower cutoff for candidates: 0.6
Total of 4 structures are to be tested


Candidate membrane-spanning segments:

 Helix Begin   End   Score Certainty
     1    21    41   0.826 Putative
     2    44    64   2.024 Certain
     3   108   128   1.597 Certain
     4   233   253   0.659 Putative
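
The Certain/Putative labels in this table are consistent with the two cutoffs listed above (upper 1, lower 0.6). The sketch below reproduces the labels under that reading; treating a score exactly equal to a cutoff as passing it is an assumption, and `classify_segment` is a hypothetical name rather than a TopPred function.

```python
UPPER_CUTOFF = 1.0  # "Upper cutoff for candidates" from the run parameters
LOWER_CUTOFF = 0.6  # "Lower cutoff for candidates" from the run parameters

# (helix number, begin, end, score) copied from the candidate table.
CANDIDATES = [
    (1, 21, 41, 0.826),
    (2, 44, 64, 2.024),
    (3, 108, 128, 1.597),
    (4, 233, 253, 0.659),
]


def classify_segment(score: float) -> str:
    """Label a candidate by comparing its score with the two cutoffs."""
    if score >= UPPER_CUTOFF:  # boundary handling is an assumption
        return "Certain"
    if score >= LOWER_CUTOFF:
        return "Putative"
    return "Rejected"


if __name__ == "__main__":
    for helix, begin, end, score in CANDIDATES:
        print(helix, begin, end, score, classify_segment(score))
```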

----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
     Segment       1     2     3     4
 Loop length    20     2    43   104    10
 K+R profile  3.00        2.00        2.00      
                    1.00           +      
CYT-EXT prof     -           -           -      
                       -        0.27      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 6.00
Tm probability: 0.08
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 0.00
 (NEG-POS)/(NEG+POS): -0.3333
                 NEG: 1.0000
                 POS: 2.0000
-> Orientation: undecided

CYT-EXT difference:  -0.27
-> Orientation: N-in

----------------------------------------------------------------------
Structure 2

Transmembrane segments included in this structure:
     Segment       1     2     3
 Loop length    20     2    43   135
 K+R profile  3.00        2.00      
                    1.00           +      
CYT-EXT prof     -           -      
                       -        0.68      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 4.00
Tm probability: 0.57
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 0.00
 (NEG-POS)/(NEG+POS): -0.3333
                 NEG: 1.0000
                 POS: 2.0000
-> Orientation: undecided

CYT-EXT difference:  -0.68
-> Orientation: N-in

----------------------------------------------------------------------
Structure 3

Transmembrane segments included in this structure:
     Segment       2     3
 Loop length    43    43   135
 K+R profile  5.00           +      
                    2.00      
CYT-EXT prof     -        0.68      
                       -      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 3.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 1.00
 (NEG-POS)/(NEG+POS): -0.6000
                 NEG: 1.0000
                 POS: 4.0000
-> Orientation: N-in

CYT-EXT difference:   0.68
-> Orientation: N-out

----------------------------------------------------------------------
Structure 4

Transmembrane segments included in this structure:
     Segment       2     3     4
 Loop length    43    43   104    10
 K+R profile  5.00           +      
                    2.00        2.00      
CYT-EXT prof     -        0.27      
                       -           -      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 1.00
Tm probability: 0.15
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 1.00
 (NEG-POS)/(NEG+POS): -0.6000
                 NEG: 1.0000
                 POS: 4.0000
-> Orientation: N-in

CYT-EXT difference:   0.27
-> Orientation: N-out

----------------------------------------------------------------------

"CAD21200" 263 
 21 41 #f 0.826042
 44 64 #t 2.02396
 108 128 #t 1.59687
 233 253 #f 0.659375
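
The four structures tested above correspond to keeping both certain segments and including or omitting each putative one, and the "Loop length" rows follow from the segment boundaries. This is a minimal sketch of that bookkeeping, assuming 1-based inclusive coordinates; the function names are illustrative, and the enumeration order need not match the order in which the program numbers its structures.

```python
from itertools import product

SEQ_LEN = 263
# (begin, end, is_certain) copied from the candidate segment table.
CANDIDATES = [(21, 41, False), (44, 64, True), (108, 128, True), (233, 253, False)]


def enumerate_structures(candidates):
    """Keep every certain segment; include or drop each putative one."""
    putative = [i for i, c in enumerate(candidates) if not c[2]]
    structures = []
    for keep in product([True, False], repeat=len(putative)):
        dropped = {i for i, k in zip(putative, keep) if not k}
        structures.append(
            [(b, e) for i, (b, e, _) in enumerate(candidates) if i not in dropped]
        )
    return structures


def loop_lengths(segments, seq_len):
    """Residues before, between and after the given membrane segments."""
    loops = []
    previous_end = 0
    for begin, end in segments:
        loops.append(begin - previous_end - 1)
        previous_end = end
    loops.append(seq_len - previous_end)
    return loops


if __name__ == "__main__":
    for segments in enumerate_structures(CANDIDATES):
        print(segments, loop_lengths(segments, SEQ_LEN))
```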


     ************************************
     *TOPPREDM with prokaryotic function*
     ************************************

CAD21200.1.fa.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: CAD21200.1.fa.___inter___

 (1 sequences)
MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV
IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ
YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV
GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE
QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN
GAWRGGNGNGKGG


(p)rokaryotic or (e)ukaryotic: p


Charge-pair energy: 0

Length of full window (odd number!): 21

Length of core window (odd number!): 11

Number of residues to add to each end of helix: 1

Critical length: 60

Upper cutoff for candidates: 1

Lower cutoff for candidates: 0.6
Total of 4 structures are to be tested


Candidate membrane-spanning segments:

 Helix Begin   End   Score Certainty
     1    21    41   0.826 Putative
     2    44    64   2.024 Certain
     3   108   128   1.597 Certain
     4   233   253   0.659 Putative

----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
     Segment       1     2     3     4
 Loop length    20     2    43   104    10
 K+R profile  2.00        2.00        2.00      
                    1.00           +      
CYT-EXT prof     -           -           -      
                       -        0.27      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 5.00
Tm probability: 0.08
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 0.00
 (NEG-POS)/(NEG+POS): -0.3333
                 NEG: 1.0000
                 POS: 2.0000
-> Orientation: undecided

CYT-EXT difference:  -0.27
-> Orientation: N-in

----------------------------------------------------------------------
Structure 2

Transmembrane segments included in this structure:
     Segment       1     2     3
 Loop length    20     2    43   135
 K+R profile  2.00        2.00      
                    1.00           +      
CYT-EXT prof     -           -      
                       -        0.68      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 3.00
Tm probability: 0.57
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 0.00
 (NEG-POS)/(NEG+POS): -0.3333
                 NEG: 1.0000
                 POS: 2.0000
-> Orientation: undecided

CYT-EXT difference:  -0.68
-> Orientation: N-in

----------------------------------------------------------------------
Structure 3

Transmembrane segments included in this structure:
     Segment       2     3
 Loop length    43    43   135
 K+R profile  4.00           +      
                    2.00      
CYT-EXT prof     -        0.68      
                       -      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 1.00
 (NEG-POS)/(NEG+POS): -0.6000
                 NEG: 1.0000
                 POS: 4.0000
-> Orientation: N-in

CYT-EXT difference:   0.68
-> Orientation: N-out

----------------------------------------------------------------------
Structure 4

Transmembrane segments included in this structure:
     Segment       2     3     4
 Loop length    43    43   104    10
 K+R profile  4.00           +      
                    2.00        2.00      
CYT-EXT prof     -        0.27      
                       -           -      
For CYT-EXT profile neg. values indicate cytoplasmic preference.


K+R difference: 0.00
Tm probability: 0.15
-> Orientation: undecided

Charge-difference over N-terminal Tm (+-15 residues): 1.00
 (NEG-POS)/(NEG+POS): -0.6000
                 NEG: 1.0000
                 POS: 4.0000
-> Orientation: N-in

CYT-EXT difference:   0.27
-> Orientation: N-out

----------------------------------------------------------------------

"CAD21200" 263 
 21 41 #f 0.826042
 44 64 #t 2.02396
 108 128 #t 1.59687
 233 253 #f 0.659375


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

SAPS.  Version of April 11, 1996.
Date run: Thu Feb 28 14:55:26 2002

File: /people/b_eisen/CAD21200.1.fa.___saps___
ID   CAD21200.1
DE   conserved hypothetical protein [Neurospora crassa].

number of residues:  263;   molecular weight:  28.4 kdal
 
         1  MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS 
        61  PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL 
       121  TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI 
       181  NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA 
       241  GGGGGSHGGN GAWRGGNGNG KGG

--------------------------------------------------------------------------------
COMPOSITIONAL ANALYSIS (extremes relative to: swp23s)

A  : 18( 6.8%); C  :  4( 1.5%); D- :  5( 1.9%); E  :  9( 3.4%); F  : 17( 6.5%)
G+ : 33(12.5%); H  :  4( 1.5%); I  : 19( 7.2%); K  :  6( 2.3%); L+ : 37(14.1%)
M  :  3( 1.1%); N  :  7( 2.7%); P  : 18( 6.8%); Q- :  4( 1.5%); R  : 19( 7.2%)
S  : 16( 6.1%); T  : 17( 6.5%); V  : 16( 6.1%); W  :  2( 0.8%); Y  :  9( 3.4%)

KR      :   25 (  9.5%);   ED    - :   14 (  5.3%);   AGP     :   69 ( 26.2%);
KRED    :   39 ( 14.8%);   KR-ED   :   11 (  4.2%);   FIKMNY  :   61 ( 23.2%);
LVIFM   :   92 ( 35.0%);   ST      :   33 ( 12.5%).
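
The residue and group counts above can be recomputed directly from the sequence. The following is a minimal sketch; the sequence is copied from this report, the group definitions mirror the SAPS labels, and `group_count` is a hypothetical helper. It does not reproduce the SAPS significance marks (+/-), which depend on the swp23s reference set.

```python
from collections import Counter

SEQ = (
    "MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS"
    "PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL"
    "TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI"
    "NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA"
    "GGGGGSHGGNGAWRGGNGNGKGG"
)

GROUPS = ["KR", "ED", "AGP", "LVIFM", "ST"]


def group_count(seq: str, letters: str) -> int:
    """Number of residues in seq that belong to the given letter group."""
    return sum(seq.count(letter) for letter in letters)


if __name__ == "__main__":
    counts = Counter(SEQ)
    n = len(SEQ)
    for aa in sorted(counts):
        print(f"{aa}: {counts[aa]:3d} ({100 * counts[aa] / n:4.1f}%)")
    for group in GROUPS:
        c = group_count(SEQ, group)
        print(f"{group:6s}: {c:3d} ({100 * c / n:4.1f}%)")
    print("KR-ED :", group_count(SEQ, "KR") - group_count(SEQ, "ED"))
```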

--------------------------------------------------------------------------------
CHARGE DISTRIBUTIONAL ANALYSIS
 
         1  0000000000 ++000000-0 0000000000 00+0000000 00+0000000 0000000000 
        61  0000000000 000+0000-0 -000000000 000000+000 0000000000 0000000000 
       121  00000000+0 000-00000+ 0000000000 000000+000 0000++0+00 00-+00-000 
       181  0-00+00-0+ 0000000-0- 0-000000+0 00+++00-+0 0+00+0000- +-00000000 
       241  0000000000 000+000000 +00

A. CHARGE CLUSTERS.


Positive charge clusters (cmin =  9/30 or 12/45 or 14/60):  none


Negative charge clusters (cmin =  6/30 or  8/45 or 10/60):  none


Mixed charge clusters (cmin = 12/30 or 16/45 or 19/60):

 1) From  165 to  232:   see sequence above
                         see sequence above
    quartile: 3; size: 68, +count: 14, -count: 10, 0count: 44; t-value:  4.75 *
    V:  9 (13.2%);  E:  7 (10.3%);  R:  9 (13.2%);  LVIFM: 24 (35.3%);


B. HIGH SCORING (UN)CHARGED SEGMENTS.

There are no high scoring positive charge segments.
There are no high scoring negative charge segments.
There are no high scoring mixed charge segments.
There are no high scoring uncharged segments.


C. CHARGE RUNS AND PATTERNS.

pattern  (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0     4 |   3 |   5 |  51 |   9 |   7 |  11 |  11 |   9 |  13 |   6 |   7 | 
lmin1     6 |   5 |   7 |  63 |  11 |   9 |  13 |  13 |  11 |  16 |   7 |   9 | 
lmin2     7 |   6 |   8 |  70 |  12 |  10 |  14 |  15 |  13 |  18 |   8 |  10 | 
 (Significance level: 0.010000; Minimal displayed length:  6)
There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

  +  runs >=   3:   1, at  213;
  -  runs >=   3:   0
  *  runs >=   3:   2, at  213;  230;
  0  runs >=  34:   0

--------------------------------------------------------------------------------
DISTRIBUTION OF OTHER AMINO ACID TYPES

1. HIGH SCORING SEGMENTS.
There are no high scoring hydrophobic segments.

____________________________________
High scoring transmembrane segments:

   5.00 (LVIF)   2.00 (AGM)   0.00 (BZX)  -1.00 (YCW)  -2.00 (ST)
  -6.00 (P)  -8.00 (H) -10.00 (NQ) -16.00 (KR) -17.00 (ED)

 Expected score/letter:  -1.582
 M_0.01=  107.5; M_0.05=  84.16;     M_0.30=  56.37

 1) From   34 to   65:  length= 32, score=71.00 
      34  AALFGVLCLR FIAVLSVIIG IYAAFFSPTG LL
    L:  6(18.8%);  A:  5(15.6%);  I:  4(12.5%);  F:  4(12.5%);


2. SPACINGS OF C.


H2N-24-C-15-C-164-C-20-C-36-COOH


2*. SPACINGS OF C and H. (additional deluxe function for ALEX)


H2N-24-C-15-C-46-H-2-H-39-H-74-C-20-C-19-H-16-COOH
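
The spacing strings above list the number of residues between consecutive occurrences of the chosen residue types, from the N-terminus to the C-terminus. A minimal sketch of that calculation, with the sequence copied from this report and `spacing_string` as a hypothetical helper name:

```python
SEQ = (
    "MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS"
    "PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL"
    "TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI"
    "NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA"
    "GGGGGSHGGNGAWRGGNGNGKGG"
)


def spacing_string(seq: str, residues: str) -> str:
    """Build an 'H2N-<gap>-<residue>-...-<gap>-COOH' string for the given residues."""
    parts = ["H2N"]
    gap = 0
    for aa in seq:
        if aa in residues:
            parts.extend([str(gap), aa])
            gap = 0
        else:
            gap += 1
    parts.extend([str(gap), "COOH"])
    return "-".join(parts)


if __name__ == "__main__":
    print(spacing_string(SEQ, "C"))
    print(spacing_string(SEQ, "CH"))
```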

--------------------------------------------------------------------------------
REPETITIVE STRUCTURES.

A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet.
Repeat core block length:  4

B. SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet.
   (i= LVIF; += KR; -= ED; s= AG; o= ST; n= NQ; a= YW; p= P; h= H; m= M; c= C)
Repeat core block length:  8

--------------------------------------------------------------------------------

MULTIPLETS.

A. AMINO ACID ALPHABET.

1. Total number of amino acid multiplets:  30  (Expected range:   5-- 31)

2. Histogram of spacings between consecutive amino acid multiplets:
   (1-5) 18   (6-10) 7   (11-20) 3   (>=21) 3

3. Clusters of amino acid multiplets (cmin = 17/30 or 22/45 or 26/60):  none

4. Long amino acid multiplets (>= 5; Letter/Length/Position):
    G/5/241


B. CHARGE ALPHABET.

1. Total number of charge multiplets:   3  (Expected range:   0--  8)
   3 +plets (f+: 9.5%), 0 -plets (f-: 5.3%)
   Total number of charge altplets: 3 (Critical number: 9)

2. Histogram of spacings between consecutive charge multiplets:
   (1-5) 0   (6-10) 0   (11-20) 1   (>=21) 3

--------------------------------------------------------------------------------
PERIODICITY ANALYSIS.

A. AMINO ACID ALPHABET (core:  4; !-core: 5)

Location	Period	Element		Copies	Core	Errors
  28-  43	 4	L...      	 4	 4  	 0
  78-  93	 4	L...      	 4	 4  	 0
 239- 252	 2	G.        	 6	 4  	 1
 241- 245	 1	G         	 5	 5 !	 0
 241- 268	 7	GG.G...   	 4	 4  	/0/0/./1/./././
 256- 263	 2	G.        	 4	 4  	 0


B. CHARGE ALPHABET ({+= KR; -= ED; 0}; core:  5; !-core: 5)
   and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core:  6; !-core: 9)

Location	Period	Element		Copies	Core	Errors
  30-  71	 7	i.0.00.   	 6	 6  	/0/./2/./2/2/./
  36-  56	 3	i..       	 7	 7  	 0
  65-  99	 5	i0.0.     	 7	 7  	/0/2/./2/./
  65- 124	10	i0.0..0.0.	 6	 6  	/0/1/./1/././1/./1/./


--------------------------------------------------------------------------------
SPACING ANALYSIS.

Location (Quartile) Spacing     Rank       P-value   Interpretation

  62- 121  (2.)     T(  59)T     2 of  18   0.0039   large  2. maximal spacing
 172- 264  (4.)     T(  92)T     1 of  18   0.0089   large  1. maximal spacing


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Pfam (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/Pfam
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
Sdh_cyt  Succinate dehydrogenase cytochrome b subunit   -67.7         75   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
Sdh_cyt    1/1       4   124 ..     1   125 []   -67.7       75

Alignments of top-scoring domains:
Sdh_cyt: domain 1 of 1, from 4 to 124: score -67.7, E = 75
                   *->kklnRPiSPHLTIYkpQlts...ilSIlHRISGvaLalgvllftllL
                      ++      P+LTI +p++t  + +l  +  ++  + a+++ ++ l +
  CAD21200.1     4    TT------PYLTIRRPSPTTaefTLTTCPPLTLPLRAALFGVLCLRF 44   

                   klltlslesfafyslsvwslnkfskwli....ivikvfilyalfYHlfnG
                   +++++   + + + +s   l  ++  ++++++i+++ f l  ++ H++  
  CAD21200.1    45 IAVLSVIIGIYAAFFSPTGLLPPP--IFpsgrISFLDFDLNNFLLHIL-- 90   

                   IRHLiWDlGygleiegvyksga......yivlvlsvvLall<-*
                     HL++      + +g+y  ++  + ++y+vl+ls++ + +   
  CAD21200.1    91 --HLLY-----ISRPGQYLASLaislppYAVLALSALTSYI    124  

//

Start with PfamFrag (from /data/patterns/pfam)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/PfamFrag
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model        Description                                Score    E-value  N 
--------     -----------                                -----    ------- ---
Ribosomal_S6 Ribosomal protein S6                         2.1         28   1

Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
Ribosomal_S6   1/1     191   211 ..    58    78 ..     2.1       28

Alignments of top-scoring domains:
Ribosomal_S6: domain 1 of 1, from 191 to 211: score 2.1, E = 28
                   *->iYvqinfegepqlVdeleRtl<-*
                      +Y++i +ege+++V+ + R+l   
  CAD21200.1   191    YYLVIVVEGEQDVVVCFPRLL    211  

//

Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Prosite
---------------------------------------------------------
|          ppsearch (c) 1994 EMBL Data Library          |
|       based on MacPattern (c) 1990-1994 R. Fuchs      |
---------------------------------------------------------

PROSITE pattern search started: Thu Feb 28 14:57:22 2002

Sequence file: CAD21200.1.fa

----------------------------------------
Sequence CAD21200.1 (263 residues):

Matching pattern PS00004 CAMP_PHOSPHO_SITE:
   11: RRPS
Total matches: 1

Matching pattern PS00005 PKC_PHOSPHO_SITE:
    9: TIR
   72: SGR
  172: TEK
Total matches: 3

Matching pattern PS00006 CK2_PHOSPHO_SITE:
   16: TTAE
   76: SFLD
Total matches: 2

Matching pattern PS00008 MYRISTYL:
   53: GIYAAF
   99: GQYLAS
  143: GIQMSS
  223: GARGCL
  239: GAGGGG
  241: GGGGGS
  242: GGGGSH
  244: GGSHGG
  245: GSHGGN
  248: GGNGAW
  251: GAWRGG
  255: GGNGNG
  256: GNGNGK
  258: GNGKGG
Total matches: 14

Total no of hits in this sequence: 20

========================================

1314 pattern(s) searched in 1 sequence(s), 263 residues.
Total no of hits in all sequences: 20.
Search time: 00:00 min
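
The three phosphorylation-site hits above can be reproduced with ordinary regular expressions. The sketch below is not ppsearch; it assumes hand translations of the PROSITE patterns PS00004 ([RK](2)-x-[ST]), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]) and uses a lookahead so that overlapping matches are also reported. The names `PATTERNS` and `scan` are illustrative.

```python
import re

SEQ = (
    "MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS"
    "PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL"
    "TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI"
    "NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA"
    "GGGGGSHGGNGAWRGGNGNGKGG"
)

# Hand-translated regular expressions (an assumption, not generated by PROSITE tools).
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan(seq: str, regex: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    lookahead = re.compile(f"(?=({regex}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq)]


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan(SEQ, regex))
```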

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with Profile Search

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with motif search against own library
     ***** bioMotif : Version V41a DB, 1999 Nov 11 *****
          SeqTyp=2 : PROTEIN  search; 


>APC D-Box is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>ER-GOLGI-traffic signal is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M minimal SH3 binding  is the MOTIF name

>CAD21200.1 conserved hypothetical protein [Neurospora crassa]. ;LENGTH=263; DIRECT_SEQUENCE
n 1 solutions 
m %_PXXP 68-71
f

>STATISTICS Total   : 1 solutions in 1 sequences, 263 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M deubiquitinating enzyme SH3 domain binding motif (Kato, 2000) is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M minimal class I consensus-SH3 binding motif  is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M minimal class II consensus-SH3 binding motif  is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M exact 14-3-3 binding consensus (Muslin 1996 Cell 84 889) is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M 14-3-3 binding motif in RAF and others (Muslin 1996 Cell 84 889) is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M WW domain binding motif in formins (Bedford 1997) is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>INTRA-SIGNAL-M PY motif for WW domain is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>TM-CYTOPLASMIC-M di-hydrophobic endocytosis motifs for internalized transmembrane proteins is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>TM-CYTOPLASMIC-M tyrosine-based endocytosis motif for internalized transmembrane proteins is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>TM-EXTRACELL-M Endocytosis signal for internalized transmembrane proteins is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>EXTRACELL-M minimal furin protease cleavage site motif  is the MOTIF name

>CAD21200.1 conserved hypothetical protein [Neurospora crassa]. ;LENGTH=263; DIRECT_SEQUENCE
n 2 solutions 
m %_RXXR 219-222
f
m %_RXXR 222-225
f

>STATISTICS Total   : 2 solutions in 1 sequences, 263 units;  out of 1 sequences, 263 units
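
The PXXP and RXXR hits reported above can be reproduced with a simple pattern scan. The following is a minimal Python sketch, not bioMotif's own implementation: the helper name `find_motif` and the regular expressions are assumptions made for illustration, with X read as "any residue". A zero-width lookahead is used so that overlapping hits, such as 219-222 and 222-225, are both reported.

```python
import re

# Query sequence CAD21200.1 (263 residues), copied from the header of this report.
SEQ = "".join("""
MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS
PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL
TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI
NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA
GGGGGSHGGN GAWRGGNGNG KGG
""".split())


def find_motif(seq, pattern):
    """Return 1-based inclusive (start, end) positions of every hit.

    `pattern` must be a fixed-width regular expression such as "P..P";
    this helper is hypothetical and only mirrors the coordinates in the log.
    """
    width = len(pattern)  # valid because each pattern character matches one residue
    # Wrapping the pattern in a lookahead keeps overlapping matches.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.start() + width) for m in regex.finditer(seq)]


print("PXXP:", find_motif(SEQ, "P..P"))
print("RXXR:", find_motif(SEQ, "R..R"))
```

A plain `re.finditer("R..R", SEQ)` without the lookahead would consume residue 222 in the first hit and miss the second one, which is why the overlapping form is used here.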

>EXTRACELL-M extended furin protease cleavage site motif  is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>EXTRACELL-M  zinc binding motif in MMPs is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>EXTRACELL-M g alpha binding go loco is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS PDX-1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS QKI-5 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS HCDA experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SV40 LrgT experimentally determined  is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS H2B experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS v-Rel experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Amida experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Amida experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS RanBP3 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Pho4p experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS DNAhelicaseQ1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS LEF-1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS TCF-1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR p53-NLS1 NLS experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hum-Ku70 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS GAL4 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS act/inh betaA experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS TR2 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS THOV NP experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS polyomaVP1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS HIV-1 Tat experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS HIV-1 Rev experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Rex experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SRY experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SRY experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SOX9 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SOX9 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS NS5A experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS DNAse EBV experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS DNAse EBV experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS adenovE1a experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS ystDNApolalpha experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hVDR experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS CPV capsid experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hGlu.cort. experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS cFOS experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS cJUN experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hDNApolalpha experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS  hDNAtopoII experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS  hDNAtopoII experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hBLM experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hARNT experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS influenzaNP experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS influenzaNP experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS p54 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS hProTalpha experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Tst1/Oct6 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS protHsc9 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS protHsci experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS protHsc3 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Ta alpha experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Pax-QNR experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Hunt.Dis.pro experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS MyoD experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS MyoD experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS opaque2 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS CTP experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS HCV experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS HCV experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS p110RB1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS VirD2-Nterm experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS VirD2-Cterm experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Nucleoplasmin experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Nucleolin experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS ICP-8 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Nab2 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS M9 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS lscMyc experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS humKprotein experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS FluA experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Mat-alpha experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS polyoma Lrg-T experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS polyoma Lrg-T experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SV40 VP1 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS SV40 VP2 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS polyoma VP2 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS c-myb experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS N-myc experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS p53 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS c-erb-A experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS yeast SKI3 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS L29 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS L29 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS Max experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS L3 experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>NUCLEAR NLS dyskerin experimentally determined NLS is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>PDZ domain binding motif science 278_2075_pawson is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units

>WW domain binding motif science 278_2075_pawson is the MOTIF name

>STATISTICS Total   : 0 solutions in 0 sequences, 0 units;  out of 1 sequences, 263 units


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

Start with HMM search against own library
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm.lib
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm-f.lib
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

L. Aravind's signalling DB + PSSMs from other authors
IMPALA version 1.1 [20-December-1999]


Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, 
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), 
"IMPALA: Matching a Protein Sequence Against a Collection of 
"PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.

Query= CAD21200.1 conserved hypothetical protein [Neurospora crassa].
         (263 letters)

Searching..................................done
Results from profile search


                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

S1  S1 RNA binding domain                                          23  0.48
CYCLIN Cyclin/TFIIB domain                                         22  0.79
DHHC Novel zinc finger domain with DHHC signature                  22  1.3
AAA AAA+ ATPase Module                                             21  2.7
INSL Insulinase like Metallo protease domain                       20  3.9
UBA Ubiquitin pathway associated domain                            20  4.1
LRR Leucine rich repeats                                           19  6.8
VWA Von Willebrand factor A domain                                 19  7.6

>S1  S1 RNA binding domain 
          Length = 305

 Score = 23.0 bits (49), Expect = 0.48
 Identities = 8/56 (14%), Positives = 8/56 (14%), Gaps = 1/56 (1%)

Query: 168 RFIPTEKIQDILINEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRG 223
                                                                   
Sbjct: 140 GFIPRSHLMHKDNMDALVGQVLKAHILEANQDNNKLVLTQRRIQQAE-SMGKIAAG 194


>CYCLIN Cyclin/TFIIB domain 
          Length = 317

 Score = 22.3 bits (47), Expect = 0.79
 Identities = 11/75 (14%), Positives = 11/75 (14%), Gaps = 23/75 (30%)

Query: 75  ISFLDFDLN-----NFLLHILHLLYISRPG-----------QYLASLAI-----SLPPYA 113
                                                                       
Sbjct: 145 IQQLNFHLIVHNPYRPFEGFLIDLKTRYPILENPEILRKTADDFLNRIALTDAYLLYTPS 204

Query: 114 VLALSALTSYIALFA 128
                          
Sbjct: 205 QIALTAI--LSSASR 217


>DHHC Novel zinc finger domain with DHHC signature 
          Length = 217

 Score = 21.6 bits (45), Expect = 1.3
 Identities = 12/56 (21%), Positives = 12/56 (21%), Gaps = 4/56 (7%)

Query: 42  LRFIAVLSVIIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISR 97
                                                                   
Sbjct: 51  LQIVAWLLYLFFAVIGFGILVPLLPHHWVPAGYACMGAI----FAGHLVVHLTAVS 102


>AAA AAA+ ATPase Module 
          Length = 298

 Score = 20.6 bits (42), Expect = 2.7
 Identities = 17/96 (17%), Positives = 17/96 (17%), Gaps = 10/96 (10%)

Query: 124 IALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEA 183
                                                                       
Sbjct: 142 IIFMDEIDSIGSRLEGGSGGDSEVQRTMLELLNQLDGFEATKNIKVIMATNRIDILDSAL 201

Query: 184 FKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVER 219
                                               
Sbjct: 202 LRPG--RIDRKIEFP--------PPNEEARLDILKI 227


>INSL Insulinase like Metallo protease domain 
          Length = 433

 Score = 19.9 bits (41), Expect = 3.9
 Identities = 7/47 (14%), Positives = 7/47 (14%), Gaps = 5/47 (10%)

Query: 160 GGTFMKRTRFIPTEKIQDILINEAFKGFEVRYY----LVIVVEGEQD 202
                                                          
Sbjct: 167 KVSPYRFPIIGFEETIRKFTR-EKLLKFYKSFYQPRNMAVVIVGKVN 212


>UBA Ubiquitin pathway associated domain 
          Length = 255

 Score = 20.0 bits (41), Expect = 4.1
 Identities = 7/33 (21%), Positives = 7/33 (21%)

Query: 230 EKDGPVLSAGAGGGGGSHGGNGAWRGGNGNGKG 262
                                            
Sbjct: 186 MQDVMEGADDMVEGEDIEVTGEAAAAGLGQGEG 218


 Score = 19.6 bits (40), Expect = 5.5
 Identities = 9/29 (31%), Positives = 9/29 (31%)

Query: 235 VLSAGAGGGGGSHGGNGAWRGGNGNGKGG 263
                                        
Sbjct: 95  LFAQAAQGGNASSGALGTTGGATDAAQGG 123


>LRR Leucine rich repeats 
          Length = 339

 Score = 19.1 bits (39), Expect = 6.8
 Identities = 6/24 (25%), Positives = 6/24 (25%)

Query: 62 TGLLPPPIFPSGRISFLDFDLNNF 85
                                  
Sbjct: 22 TAGIPTDIFRMKDLTIIDLSRNQL 45


>VWA Von Willebrand factor A domain 
          Length = 255

 Score = 19.1 bits (39), Expect = 7.6
 Identities = 7/59 (11%), Positives = 7/59 (11%), Gaps = 10/59 (16%)

Query: 147 SSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKG-FEVRYYLVIVVEGEQDVV 204
                                                                      
Sbjct: 63  SEAMLEKDL---------RPNRHAMIIQYAIDFVHEFFDQNPISQMGIIIMRNGLAQLV 112


Underlying Matrix: BLOSUM62
Number of sequences tested against query: 105
Number of sequences better than 10.0: 8 
Number of calls to ALIGN: 9 
Length of query: 263 
Total length of test sequences: 20182  
Effective length of test sequences: 16941.0
Effective search space size: 3944168.7
Initial X dropoff for ALIGN: 25.0 bits
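
As a rough cross-check of the table above, the reported E-values can be compared against the standard BLAST-style relation between bit score and search space, E = N * 2^(-S'), where N is the effective search space size and S' the bit score. The sketch below is an assumption about how the numbers relate, not a description of IMPALA's internal statistics. The function name `expect_from_bits` is hypothetical, and the printed bit scores are rounded, so the results should only land close to the reported 0.48, 0.79 and 1.3.

```python
def expect_from_bits(bit_score, search_space):
    """BLAST-style expectation: E = search_space * 2**(-bit_score)."""
    return search_space * 2.0 ** (-bit_score)


# "Effective search space size" as printed in the statistics block above.
SEARCH_SPACE = 3944168.7

# Bit scores taken from the per-hit "Score = ... bits" lines of this search.
for name, bits in [("S1", 23.0), ("CYCLIN", 22.3), ("DHHC", 21.6)]:
    print(f"{name}: E ~ {expect_from_bits(bits, SEARCH_SPACE):.2f}")
```

None of these hits falls below the usual significance range for a profile search, which agrees with the absence of hits in the HMM searches.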

Y. Wolf's SCOP PSSM
IMPALA version 1.1 [20-December-1999]


Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, 
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), 
"IMPALA: Matching a Protein Sequence Against a Collection of 
"PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.

Query= CAD21200.1 conserved hypothetical protein [Neurospora crassa].
         (263 letters)

Searching.................................................done
Results from profile search


                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gi|729418 [1..212] DNA-glycosylase                                 27  0.47
gi|2209100 [9..467] PLP-dependent transferases                     24  2.3
gi|999515 [1..176] NAD(P)-binding Rossmann-fold domains            24  2.3
gi|137178 [233..498] Ligand-binding domain of nuclear recept...    24  2.8
gi|115682 [1..213] CoA-dependent acetyltransferases                24  3.0
gi|266977 [1..97] Ferredoxin-like                                  23  4.9
gi|544221 [55..339] beta/alpha (TIM)-barrel                        23  5.2
gi|280504 [161..330] Glyceraldehyde-3-phosphate dehydrogenas...    22  8.7

>gi|729418 [1..212] DNA-glycosylase 
          Length = 212

 Score = 26.8 bits (59), Expect = 0.47
 Identities = 12/86 (13%), Positives = 12/86 (13%), Gaps = 13/86 (15%)

Query: 134 ESLLVLRGLGIQMSSSV----GGGNFFRLGGGTF--MKRTRFIPT----EKIQDILINEA 183
                                                                       
Sbjct: 110 DELVKLPGVGRKTANVVVSVAFGVPAIAVDTHVERVSKRLGICRWKDSVLEVEKTLMRKV 169

Query: 184 FKGFEVRYYLVIVVEGEQDVVVCFPR 209
                                     
Sbjct: 170 PKEDWSVTHHRLIFFGRY---HCKAQ 192


>gi|2209100 [9..467] PLP-dependent transferases 
          Length = 459

 Score = 24.4 bits (52), Expect = 2.3
 Identities = 14/101 (13%), Positives = 14/101 (13%), Gaps = 2/101 (1%)

Query: 85  FLLHILHLLYISRPGQYLASLAISLPPYAVLALSALTSYIALFARIHTTESLLV--LRGL 142
                                                                       
Sbjct: 281 EVYHECRTLCVVQEGFPTYGGLEGGAMERLAVGLHDGMRQEWLAYRIAQIEYLVAGLEKI 340

Query: 143 GIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEA 183
                                                    
Sbjct: 341 GVLCQQPGGHAAFVDAGKLLPHIPADQFPAQALSCELYKVA 381


>gi|999515 [1..176] NAD(P)-binding Rossmann-fold domains 
          Length = 176

 Score = 24.3 bits (52), Expect = 2.3
 Identities = 30/110 (27%), Positives = 30/110 (27%), Gaps = 27/110 (24%)

Query: 174 KIQDILINEAFKGFEV---------------RYYLVIVVEGEQDVVVCFPRLLPRRKIVE 218
                                                                       
Sbjct: 37  KVDDFLANEA-KGTKVLGAHSLEEMVSKLKKPRRIILLVKAGQAVDNFIEKLVPLLDIGD 95

Query: 219 RVWRGA--------RGCLYEKDGPVLSAGAGGGGGSHG---GNGAWRGGN 257
                                                             
Sbjct: 96  IIIDGGNSEYRDTMRRCRDLKDKGILFVGSGVSGGEDGARYGPSLMPGGN 145


>gi|137178 [233..498] Ligand-binding domain of nuclear receptor 
          Length = 266

 Score = 23.9 bits (51), Expect = 2.8
 Identities = 9/15 (60%), Positives = 9/15 (60%)

Query: 239 GAGGGGGSHGGNGAW 253
                          
Sbjct: 106 GAGGGGGGLGHDGSF 120


>gi|115682 [1..213] CoA-dependent acetyltransferases 
          Length = 213

 Score = 24.1 bits (52), Expect = 3.0
 Identities = 10/58 (17%), Positives = 10/58 (17%), Gaps = 9/58 (15%)

Query: 59  FSPTGLLPPPIFPSGRI---SFLDFDLN-----NFLLHILHL-LYISRPGQYLASLAI 107
                                                                     
Sbjct: 128 LFPQGNLPENHLNISSLPWVSFDGFNLNITGNDDYFAPVFTMAKFQQEGDRVLLPVSV 185


>gi|266977 [1..97] Ferredoxin-like 
          Length = 97

 Score = 23.3 bits (50), Expect = 4.9
 Identities = 11/32 (34%), Positives = 11/32 (34%), Gaps = 3/32 (9%)

Query: 191 YYLVIVVEGEQDVVVCFPRLLPRRKIVERVWR 222
                                           
Sbjct: 59  YFLWYQVEMPEDRVNDLAREL---RIRDNVRR 87


>gi|544221 [55..339] beta/alpha (TIM)-barrel 
          Length = 285

 Score = 23.0 bits (49), Expect = 5.2
 Identities = 8/56 (14%), Positives = 8/56 (14%), Gaps = 3/56 (5%)

Query: 134 ESLLVLRGLGIQMSSSVGGGNF-FRLGGGTFMKRTRFIPTEKIQDILINEAFKGFE 188
                                                                   
Sbjct: 71  KYLKPLQDKGIKVILSILGNHDRSGIANLSTARAKAFA--QELKNTCDLYNLDGVF 124


>gi|280504 [161..330] Glyceraldehyde-3-phosphate dehydrogenase-like, C-terminal domain 
          Length = 170

 Score = 22.5 bits (48), Expect = 8.7
 Identities = 5/29 (17%), Positives = 5/29 (17%)

Query: 130 IHTTESLLVLRGLGIQMSSSVGGGNFFRL 158
                                        
Sbjct: 137 IGCQYSSIVDALSTKVLPNPEGQGTLVKV 165


Underlying Matrix: BLOSUM62
Number of sequences tested against query: 1187
Number of sequences better than 10.0: 8 
Number of calls to ALIGN: 8 
Length of query: 263 
Total length of test sequences: 256703  
Effective length of test sequences: 214185.0
Effective search space size: 48653831.6
Initial X dropoff for ALIGN: 25.0 bits

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

calculation of internal repeats with prospero
***** PROSPERO v1.3  Thu Feb 28 14:57:56 2002 *****

Copyright 2000, Richard Mott, Wellcome Trust Centre for Human Genetics, University of Oxford
For help see http://www.well.ox.ac.uk/ariadne  For usage use -help
using gap penalty 11+1k
using matrix BLOSUM62
printing all alignments with eval < 0.100000
using sequence1 CAD21200.1
using self-comparison

> 1 CAD21200.1 len 263 from 241 to 256  vs  CAD21200.1 len 263 from 248 to 263   score 49  eval 8.022621e-03 identity 56.25% K 3.597031e-02 L 2.579737e-01 H 1.317791e+00 alpha 9.395090e-02

  241 GGGGGSHGGNGAWRGG   256  CAD21200.1
      || |   ||||  :||
  248 GGNGAWRGGNGNGKGG   263  CAD21200.1

> 2 CAD21200.1 len 263 from 238 to 260  vs  CAD21200.1 len 263 from 240 to 262   score 45  eval 2.235172e-02 identity 43.48% K 3.597031e-02 L 2.579737e-01 H 1.317791e+00 alpha 9.395090e-02

  238 AGAGGGGGSHGGNGAWRGGNGNG   260  CAD21200.1
      || |||     |      ||| |
  240 AGGGGGSHGGNGAWRGGNGNGKG   262  CAD21200.1
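
The identity percentages in the two PROSPERO records can be checked by counting matching columns in each ungapped alignment. This is a minimal sketch. The function name `percent_identity` is an assumption for illustration, and it presumes identity is defined as identical columns divided by alignment length, which is consistent with the values printed above.

```python
def percent_identity(a, b):
    """Percentage of identical columns in an ungapped pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Aligned segments copied from repeat records 1 and 2.
print(round(percent_identity("GGGGGSHGGNGAWRGG", "GGNGAWRGGNGNGKGG"), 2))
print(round(percent_identity("AGAGGGGGSHGGNGAWRGGNGNG", "AGGGGGSHGGNGAWRGGNGNGKG"), 2))
```

Both segments lie in the glycine-rich C-terminal tail (residues 238-263), so the repeat signal probably reflects low compositional complexity rather than a duplicated domain.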


~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~

TIGRFAM
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/tigrfam/tigrfam.hmm
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/tigrfam/tigrfam.hmm-f
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model     Description                                   Score    E-value  N 
--------  -----------                                   -----    ------- ---
TIGR00900 2A0121: H+ Antiporter protein                   3.9        5.7   1
TIGR01098 3A0109s03R: phosphonates-binding periplasmi    -0.6         24   1

Parsed for domains:
Model     Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------  ------- ----- -----    ----- -----      -----  -------
TIGR00900   1/1      30    62 ..    83   117 ..     3.9      5.7
TIGR01098   1/1     119   128 ..     1    10 [.    -0.6       24

Alignments of top-scoring domains:
TIGR00900: domain 1 of 1, from 30 to 62: score 3.9, E = 5.7
                   *->lpfvallgGVdvleiwmvyvvafIlaiaqaFFtPa<-*
                      lp+ a+l+G  vl ++ ++v+ +I +i +aFF+P+   
  CAD21200.1    30    LPLRAALFG--VLCLRFIAVLSVIIGIYAAFFSPT    62   

TIGR01098: domain 1 of 1, from 119 to 128: score -0.6, E = 24
                   *->AllSavaLfa<-*
                      Al+S +aLfa   
  CAD21200.1   119    ALTSYIALFA    128  

//
SMART
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/iprscan/data/smart.HMMs
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
	[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
	[no hits above thresholds]

Alignments of top-scoring domains:
	[no hits above thresholds]
//
COG
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/cogs/cogs.hmm
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
COG3247                                                 -90.6         29   1
COG0472                                                -191.7         98   1
COG1172                                                -271.4         71   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
COG3247    1/1       5   177 ..     1   198 []   -90.6       29
COG1172    1/1      25   215 ..     1   403 []  -271.4       71
COG0472    1/1       1   239 [.     1   432 []  -191.7       98

Alignments of top-scoring domains:
COG3247: domain 1 of 1, from 5 to 177: score -90.6, E = 29
                   *->mCmlaimeasplkadLerLkkHLwkavl.lsgVlaLilGvLvLafPa
                         l i  +sp  a+   L      ++l+l++ l   +GvL+L f  
  CAD21200.1     5    TPYLTIRRPSPTTAEFT-LTTC-PPLTLpLRAAL---FGVLCLRF-- 44   

                   vSldVlalvfGaylLv...sGvalvvaaFslrsdaqfrvLSlfLvgvasl
                     + Vl+ + G y+   +++G        s r +     L  fL  +  l
  CAD21200.1    45 --IAVLSVIIGIYAAFfspTGLLPPPIFPSGRISFLDFDLNNFLLHILHL 92   

                   l........lGllafrapelavlaLalfIAglflvaGVirlvSairdRks
                   l  +++++ l+ la+  p  avlaL +     +l+a + ++ S   +   
  CAD21200.1    93 LyisrpgqyLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVL--- 139  

                   lkgeWwsilvGvisiviaGilliAsPfvSvllvalvVGiylVfigvllva
                      +++ i    +s                   + v G      g   ++
  CAD21200.1   140 ---RGLGI---QMS-------------------SSVGGGNFFRLGGGTFM 164  

                   lAlllrKastlka<-*
                      ++  + ++     
  CAD21200.1   165 KRTRFIPTEKIQD    177  

COG1172: domain 1 of 1, from 25 to 215: score -271.4, E = 71
                   *->MmptslstaasastkkkklfkrnlreygllvaLliliaifsi.....
                        p +l++      ++  +    lr ++ l +++ ++a+f  +++  
  CAD21200.1    25    CPPLTLPL------RAALFGVLCLRFIAVLSVIIGIYAAFFSptgll 65   

                   ...lsPgsfn.nFLslnNllnIlrQtsvigilAvGMTfVIltgGIDLSVG
                   +++++P s+   FL+++ l n l  +  ++ +  +               
  CAD21200.1    66 pppIFP-SGRiSFLDFD-LNNFLLHILHLLYISRP--------------- 98   

                   SvlALagvvtAillqsgdnikvFgellgvplllaillgLllGaliGlinG
                         g  +A+l  s                l   ++L+l+al   i+ 
  CAD21200.1    99 ------GQYLASLAIS----------------LPPYAVLALSALTSYIA- 125  

                   llvaklKvppFIaTLgtmtifRGialliTdgvgGsPisgeftgipdsFaw
                    l a+      I T   ++ +RG+ ++++       ++g+       F++
  CAD21200.1   126 -LFAR------IHTTESLLVLRGLGIQMSS-----SVGGG------NFFR 157  

                   lgqgfirGlaligfVlwfvrsrqllqiafkvlkaliAvivlgaiflLngy
                   lg g +                                            
  CAD21200.1   158 LGGGTF-------------------------------------------- 163  

                   lGiPvpviialivliifwfllnKTrFGRniYAiGGNeeAArlSGInVkrv
                                       +++TrF                  I+++++
  CAD21200.1   164 --------------------MKRTRF------------------IPTEKI 175  

                   k.iavFalsGllaAlAGiilasRlgSAqPnAGvgyELDAIAAvViGGtSL
                    +i++                       + A  g+E              
  CAD21200.1   176 QdILI-----------------------NEAFKGFE-------------- 188  

                   aGGvGsiiGtviGaLIigvlnnGLnLLGVssywQqvvkGlvIlaAValDs
                                        ++L++  V +  Q+vv        V + +
  CAD21200.1   189 -------------------VRYYLVI--VVEGEQDVV--------VCFPR 209  

                   lklvalekklrrkkka<-*
                   l+           +++   
  CAD21200.1   210 LL----------PRRK    215  

COG0472: domain 1 of 1, from 1 to 239: score -191.7, E = 98
                   *->MLlmLa.llpalsslnlfsYltafrallalliafllsllltpifipf
                          +++ +p         Ylt +r      +++ ++l  +p +++ 
  CAD21200.1     1    ----MLtTTP---------YLTIRRPS---PTTAEFTLTTCPPLTLP 31   

                   lrklaikigqdirkdgpksHkshKagTPtmGGlaIllsflivlslllwag
                   lr  ++         + ++++           +a+l +  i+ ++ +++ 
  CAD21200.1    32 LRAALF--------GVLCLRF-----------IAVLSV--IIGIYAAFFS 60   

                   lnsganpyevevwlvLlvllgfgliGflDDlfklsrKnnkGLsakiKlll
                    +           l    +++ g i flD  f+l             +ll
  CAD21200.1    61 PT----------GLLPPPIFPSGRISFLD--FDLN-------N----FLL 87   

                   qfiaAvlllilllkfdgslltqlyiPFfkspsfdlgtllylvlavfalVg
                   + i+ +l ++  +++ +s                l  +   + av+al++
  CAD21200.1    88 H-ILHLLYISRPGQYLAS----------------LA-ISLPPYAVLALSA 119  

                   ssNAvNltDGLDGLAaGlsviaalalaliaylsgnvnfAqYLlipyipda
                                L + ++++a +     ++                   
  CAD21200.1   120 -------------LTSYIALFARIHTTESLLVL----------------R 140  

                   gelailclalaGAcLGFLwfNfyPGkAkvFMGDtGSlaLGavlgalavll
                   g    ++  + G+       Nf +                  lg+   + 
  CAD21200.1   141 GLGIQMSSSVGGG-------NFFR------------------LGGGTFMK 165  

                   klklq.ei...lllimggvfvietlsvilqvlsrklrkdptigkrifkma
                   +   +++i+++    +++  +++  +v     +++        +++  + 
  CAD21200.1   166 R---TrFIpteKIQDILINEAFKGFEVRYYLVIVVEG-----EQDVVVCF 207  

                   plHhHfelkgwglkftlrqflifiilcaigiLislslrllreakvvvrfW
                   p        +++                              +k+v r+W
  CAD21200.1   208 P-------RLLP-----------------------------RRKIVERVW 221  

                   iislilaliglatlllaavgvllavifaflrfviwlklrl<-*
                    ++  ++                            ++  +   
  CAD21200.1   222 RGARGCLYEKDGP----------------------VLSAG    239  

//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/cogs/cogs.hmm-f
Sequence file:            CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  CAD21200.1  conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
COG0842                                                   3.9        4.9   1
COG1254                                                   3.5         20   1
COG1407                                                   1.1         28   1
COG0762                                                   0.2         58   1
COG2386                                                  -0.2         72   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
COG0762    1/1      33    58 ..   164   199 .]     0.2       58
COG2386    1/1     113   135 ..   216   238 .]    -0.2       72
COG0842    1/1     100   138 ..   328   365 .]     3.9      4.9
COG1254    1/1     180   191 ..    97   107 .]     3.5       20
COG1407    1/1     183   220 ..   218   255 .]     1.1       28

Alignments of top-scoring domains:
COG0762: domain 1 of 1, from 33 to 58: score 0.2, E = 58
                   *->RAllvavliLqFldvlvlevlrvfalqiLpgllsil<-*
                      RA+l+ vl+L+F+ v          l+  +g+++++   
  CAD21200.1    33    RAALFGVLCLRFIAV----------LSVIIGIYAAF    58   

COG2386: domain 1 of 1, from 113 to 135: score -0.2, E = 72
                   *->AllalavtLspfAiaAalriSvs<-*
                      A lal++ +s +A +A ++  +s   
  CAD21200.1   113    AVLALSALTSYIALFARIHTTES    135  

COG0842: domain 1 of 1, from 100 to 138: score 3.9, E = 4.9
                   *->lg.aglsdvwfsllvLallgllllllgllllrrrekkar<-*
                      ++ a+l++ ++  +vLal +l+++++  +++  +e+++    
  CAD21200.1   100    QYlASLAISLPPYAVLALSALTSYIALFARIHTTESLLV    138  

COG1254: domain 1 of 1, from 180 to 191: score 3.5, E = 20
                   *->i.egFkdFeiry<-*
                      i+e Fk+Fe+ry   
  CAD21200.1   180    InEAFKGFEVRY    191  

COG1407: domain 1 of 1, from 183 to 220: score 1.1, E = 28
                   *->iLresdykefeviaitGEsigLlkfGtldDLlkiakkl<-*
                      +++++++++  vi+++GE   ++ f++l   +ki ++    
  CAD21200.1   183    AFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERV    220  

//