analysis of sequence from CAD21200.1.fa
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV
IIGIYAAFFS PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ
YLASLAISLP PYAVLALSAL TSYIALFARI HTTESLLVLR GLGIQMSSSV
GGGNFFRLGG GTFMKRTRFI PTEKIQDILI NEAFKGFEVR YYLVIVVEGE
QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA GGGGGSHGGN
GAWRGGNGNG KGG
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
sec.str. with predator

> CAD21200.1
             .         .         .         .         .
  1 MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV 50
    _____EEEEEE________EEEE______HHHHHHHHH_HHHHHHHHHHH
             .         .         .         .         .
 51 IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ 100
    HHHHEEEEE________________EEE____HHHHHHHHHHHHH_____
             .         .         .         .         .
101 YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV 150
    __EEEE______HHHHHH__HHHHHHHHHHH_______HHHHH_EEE___
             .         .         .         .         .
151 GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE 200
    ________________________HHHHHHHHH____EEEE_EEEEEE__
             .         .         .         .         .
201 QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN 250
    _EEEEE______HHHHHHHHHH____EEEE____EEEEE___________
             .
251 GAWRGGNGNGKGG 263
    _____________
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
method         : 1
alpha-contents : 0.0 %
beta-contents  : 49.4 %
coil-contents  : 50.6 %
class          : beta

method         : 2
alpha-contents : 0.0 %
beta-contents  : 49.8 %
coil-contents  : 50.2 %
class          : beta
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
GPI: learning from metazoa
-7.49 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -8.36 -2.06 -12.00 -12.00 0.00 -12.00 0.00 -53.90
-8.98 0.00 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 -6.63 -2.06 -12.00 -12.00 0.00 -12.00 0.00 -53.69
ID: CAD21200.1 AC: xxx Len: 263 1:I 245 Sc: -53.69 Pv: 2.721824e-01 NO_GPI_SITE

GPI: learning from protozoa
-6.63 0.00 0.00 0.00 0.00 0.00 -16.00 0.00 0.00 -6.27 -7.83 -12.00 -12.00 0.00 -12.00 0.00 -72.73
-11.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -6.70 -7.83 -12.00 -12.00 0.00 -12.00 0.00 -61.65
ID: CAD21200.1 AC: xxx Len: 263 1:I 244 Sc: -61.65 Pv: 2.414226e-01 NO_GPI_SITE
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
# SignalP euk predictions
# name        Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1    0.826  57 Y  0.717  57 Y  0.975  41 Y  0.620 Y
# SignalP gram- predictions
# name        Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1    0.474 114 N  0.357 128 N  0.956 112 Y  0.430 N
# SignalP gram+ predictions
# name        Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
CAD21200.1    0.471  57 Y  0.485  57 Y  0.998  41 Y  0.721 Y
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
low complexity regions: SEG 12 2.2 2.5
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
                               1-20    MLTTTPYLTIRRPSPTTAEF
                  tlttcppltlpl 21-32
                               33-77   RAALFGVLCLRFIAVLSVIIGIYAAFFSPT
                                       GLLPPPIFPSGRISF
              ldfdlnnfllhilhll 78-93
                               94-150  YISRPGQYLASLAISLPPYAVLALSALTSY
                                       IALFARIHTTESLLVLRGLGIQMSSSV
                 gggnffrlgggtf 151-163
                               164-193 MKRTRFIPTEKIQDILINEAFKGFEVRYYL
                  vivvegeqdvvv 194-205
                               206-237 CFPRLLPRRKIVERVWRGARGCLYEKDGPV
                                       LS
    agagggggshggngawrggngngkgg 238-263

low complexity regions: SEG 25 3.0 3.3
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
                               1-1     M
ltttpyltirrpspttaeftlttcppltlp 2-128
lraalfgvlclrfiavlsviigiyaaffsp
tgllpppifpsgrisfldfdlnnfllhilh
llyisrpgqylaslaislppyavlalsalt
                       syialfa
                               129-131 RIH
ttesllvlrglgiqmsssvgggnffrlggg 132-169
                      tfmkrtrf
                               170-237 IPTEKIQDILINEAFKGFEVRYYLVIVVEG
                                       EQDVVVCFPRLLPRRKIVERVWRGARGCLY
                                       EKDGPVLS
    agagggggshggngawrggngngkgg 238-263

low complexity regions: SEG 45 3.4 3.75
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
mltttpyltirrpspttaeftlttcppltl 1-163
plraalfgvlclrfiavlsviigiyaaffs
ptgllpppifpsgrisfldfdlnnfllhil
hllyisrpgqylaslaislppyavlalsal
tsyialfarihttesllvlrglgiqmsssv
                 gggnffrlgggtf
                               164-237 MKRTRFIPTEKIQDILINEAFKGFEVRYYL
                                       VIVVEGEQDVVVCFPRLLPRRKIVERVWRG
                                       ARGCLYEKDGPVLS
    agagggggshggngawrggngngkgg 238-263

low complexity regions: XNU
# Score cutoff = 21, Search from offsets 1 to 4
# both members of each repeat flagged
# lambda = 0.347, K = 0.200, H = 0.664
>CAD21200.1 conserved hypothetical protein [Neurospora crassa].
MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI NEAFKGFEVRyylvivvegeqdvvvCFPRLLPRRKIVERVWRGARGCLYEKDGPVLsaga gggggshggngaWRGGNGNGKGG 1 - 190 MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI NEAFKGFEVR 191 - 205 yylvivvege qdvvv 206 - 236 CFPRL LPRRKIVERV WRGARGCLYE KDGPVL 237 - 252 saga gggggshggn ga 253 - 263 WRGGNGNG KGG low complexity regions: DUST >CAD21200.1 conserved hypothetical protein [Neurospora crassa]. MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA GGGGGSHGGNGAWRGGNGNGKGG ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ coiled coil prediction for CAD21200.1 sequence: 263 amino acids, 0 residue(s) in coiled coil state . | . | . | . | . | . 60 MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
120 PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 180 TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 240 NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . 
| GGGGGSHGGN GAWRGGNGNG KGG ~~~~~~~~~~ ~~~~~~~~~~ ~~~ ---------- ---------- --- ~~~~~~~~~~ ~~~~~~~~~~ ~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ prediction of transmembrane regions with toppred2 *********************************** *TOPPREDM with eukaryotic function* *********************************** CAD21200.1.fa.___inter___ is a single sequence Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok Using sequence file: CAD21200.1.fa.___inter___ (1 sequences) MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN GAWRGGNGNGKGG (p)rokaryotic or (e)ukaryotic: e Charge-pair energy: 0 Length of full window (odd number!): 21 Length of core window (odd number!): 11 Number of residues to add to each end of helix: 1 Critical length: 60 Upper cutoff for candidates: 1 Lower cutoff for candidates: 0.6 Total of 4 structures are to be tested Candidate membrane-spanning segments: Helix Begin End Score Certainity 1 21 41 0.826 Putative 2 44 64 2.024 Certain 3 108 128 1.597 Certain 4 233 253 0.659 Putative ---------------------------------------------------------------------- Structure 1 Transmembrane segments included in this structure: Segment 1 2 3 4 Loop length 20 2 43 104 10 K+R profile 3.00 2.00 2.00 1.00 + CYT-EXT prof - - - - 0.27 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 6.00 Tm probability: 0.08 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): -0.3333 NEG: 1.0000 POS: 2.0000 -> Orientation: undecided CYT-EXT difference: -0.27 -> Orientation: N-in ---------------------------------------------------------------------- Structure 2 Transmembrane segments included in this structure: Segment 1 2 3 Loop length 20 2 43 135 K+R profile 3.00 2.00 1.00 + CYT-EXT prof - - - 0.68 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 4.00 Tm probability: 0.57 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): -0.3333 NEG: 1.0000 POS: 2.0000 -> Orientation: undecided CYT-EXT difference: -0.68 -> Orientation: N-in ---------------------------------------------------------------------- Structure 3 Transmembrane segments included in this structure: Segment 2 3 Loop length 43 43 135 K+R profile 5.00 + 2.00 CYT-EXT prof - 0.68 - For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 3.00 Tm probability: 1.00 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 1.00 (NEG-POS)/(NEG+POS): -0.6000 NEG: 1.0000 POS: 4.0000 -> Orientation: N-in CYT-EXT difference: 0.68 -> Orientation: N-out ---------------------------------------------------------------------- Structure 4 Transmembrane segments included in this structure: Segment 2 3 4 Loop length 43 43 104 10 K+R profile 5.00 + 2.00 2.00 CYT-EXT prof - 0.27 - - For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 1.00 Tm probability: 0.15 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 1.00 (NEG-POS)/(NEG+POS): -0.6000 NEG: 1.0000 POS: 4.0000 -> Orientation: N-in CYT-EXT difference: 0.27 -> Orientation: N-out ---------------------------------------------------------------------- "CAD21200" 263 21 41 #f 0.826042 44 64 #t 2.02396 108 128 #t 1.59687 233 253 #f 0.659375 ************************************ *TOPPREDM with prokaryotic function* ************************************ CAD21200.1.fa.___inter___ is a single sequence Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok Using sequence file: CAD21200.1.fa.___inter___ (1 sequences) MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSV IIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQ YLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVLRGLGIQMSSSV GGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKGFEVRYYLVIVVEGE QDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGAGGGGGSHGGN GAWRGGNGNGKGG (p)rokaryotic or (e)ukaryotic: p Charge-pair energy: 0 Length of full window (odd number!): 21 Length of core window (odd number!): 11 Number of residues to add to each end of helix: 1 Critical length: 60 Upper cutoff for candidates: 1 Lower cutoff for candidates: 0.6 Total of 4 structures are to be tested Candidate membrane-spanning segments: Helix Begin End Score Certainity 1 21 41 0.826 Putative 2 44 64 2.024 Certain 3 108 128 1.597 Certain 4 233 253 0.659 Putative ---------------------------------------------------------------------- Structure 1 Transmembrane segments included in this structure: Segment 1 2 3 4 Loop length 20 2 43 104 10 K+R profile 2.00 2.00 2.00 1.00 + CYT-EXT prof - - - - 0.27 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 5.00 Tm probability: 0.08 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): -0.3333 NEG: 1.0000 POS: 2.0000 -> Orientation: undecided CYT-EXT difference: -0.27 -> Orientation: N-in ---------------------------------------------------------------------- Structure 2 Transmembrane segments included in this structure: Segment 1 2 3 Loop length 20 2 43 135 K+R profile 2.00 2.00 1.00 + CYT-EXT prof - - - 0.68 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 3.00 Tm probability: 0.57 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): -0.3333 NEG: 1.0000 POS: 2.0000 -> Orientation: undecided CYT-EXT difference: -0.68 -> Orientation: N-in ---------------------------------------------------------------------- Structure 3 Transmembrane segments included in this structure: Segment 2 3 Loop length 43 43 135 K+R profile 4.00 + 2.00 CYT-EXT prof - 0.68 - For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 2.00 Tm probability: 1.00 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 1.00 (NEG-POS)/(NEG+POS): -0.6000 NEG: 1.0000 POS: 4.0000 -> Orientation: N-in CYT-EXT difference: 0.68 -> Orientation: N-out ---------------------------------------------------------------------- Structure 4 Transmembrane segments included in this structure: Segment 2 3 4 Loop length 43 43 104 10 K+R profile 4.00 + 2.00 2.00 CYT-EXT prof - 0.27 - - For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00
Tm probability: 0.15
-> Orientation: undecided
Charge-difference over N-terminal Tm (+-15 residues): 1.00
(NEG-POS)/(NEG+POS): -0.6000 NEG: 1.0000 POS: 4.0000
-> Orientation: N-in
CYT-EXT difference: 0.27
-> Orientation: N-out
----------------------------------------------------------------------
"CAD21200" 263
  21  41 #f 0.826042
  44  64 #t 2.02396
 108 128 #t 1.59687
 233 253 #f 0.659375
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
SAPS. Version of April 11, 1996.
Date run: Thu Feb 28 14:55:26 2002

File: /people/b_eisen/CAD21200.1.fa.___saps___
ID CAD21200.1
DE conserved hypothetical protein [Neurospora crassa].

number of residues: 263; molecular weight: 28.4 kdal

   1 MLTTTPYLTI RRPSPTTAEF TLTTCPPLTL PLRAALFGVL CLRFIAVLSV IIGIYAAFFS
  61 PTGLLPPPIF PSGRISFLDF DLNNFLLHIL HLLYISRPGQ YLASLAISLP PYAVLALSAL
 121 TSYIALFARI HTTESLLVLR GLGIQMSSSV GGGNFFRLGG GTFMKRTRFI PTEKIQDILI
 181 NEAFKGFEVR YYLVIVVEGE QDVVVCFPRL LPRRKIVERV WRGARGCLYE KDGPVLSAGA
 241 GGGGGSHGGN GAWRGGNGNG KGG

--------------------------------------------------------------------------------
COMPOSITIONAL ANALYSIS (extremes relative to: swp23s)

A  : 18( 6.8%); C  :  4( 1.5%); D- :  5( 1.9%); E  :  9( 3.4%); F  : 17( 6.5%)
G+ : 33(12.5%); H  :  4( 1.5%); I  : 19( 7.2%); K  :  6( 2.3%); L+ : 37(14.1%)
M  :  3( 1.1%); N  :  7( 2.7%); P  : 18( 6.8%); Q- :  4( 1.5%); R  : 19( 7.2%)
S  : 16( 6.1%); T  : 17( 6.5%); V  : 16( 6.1%); W  :  2( 0.8%); Y  :  9( 3.4%)

KR     : 25 (  9.5%);  ED -   : 14 (  5.3%);  AGP    : 69 ( 26.2%);
KRED   : 39 ( 14.8%);  KR-ED  : 11 (  4.2%);  FIKMNY : 61 ( 23.2%);
LVIFM  : 92 ( 35.0%);  ST     : 33 ( 12.5%).
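
Note on the compositional table: the per-residue counts and percentages can be re-derived directly from the sequence. The following is a minimal sketch, not SAPS itself; the names SEQ, composition and group_count are hypothetical helpers introduced only for this illustration, and the rounding to one decimal place is an assumption chosen to match the figures printed in the table.

```python
from collections import Counter

# Sequence of CAD21200.1 as printed in this report (263 residues).
SEQ = (
    "MLTTTPYLTIRRPSPTTAEFTLTTCPPLTLPLRAALFGVLCLRFIAVLSVIIGIYAAFFS"
    "PTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISRPGQYLASLAISLPPYAVLALSAL"
    "TSYIALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILI"
    "NEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRGARGCLYEKDGPVLSAGA"
    "GGGGGSHGGNGAWRGGNGNGKGG"
)


def composition(seq):
    """Return {residue: (count, percent)} with percent rounded to 0.1."""
    counts = Counter(seq)
    total = len(seq)
    return {
        aa: (counts[aa], round(100.0 * counts[aa] / total, 1))
        for aa in sorted(counts)
    }


def group_count(seq, letters):
    """Count residues belonging to a group of letters, e.g. 'KR'."""
    return sum(seq.count(letter) for letter in letters)


if __name__ == "__main__":
    for aa, (count, percent) in composition(SEQ).items():
        print(f"{aa} : {count:3d} ({percent:4.1f}%)")
    print("KR :", group_count(SEQ, "KR"))
```

Group rows such as KR or LVIFM are sums over the listed letters, which is what group_count computes; the "+" and "-" flags in the table (extremes relative to swp23s) are not reproduced by this sketch.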
-------------------------------------------------------------------------------- CHARGE DISTRIBUTIONAL ANALYSIS 1 0000000000 ++000000-0 0000000000 00+0000000 00+0000000 0000000000 61 0000000000 000+0000-0 -000000000 000000+000 0000000000 0000000000 121 00000000+0 000-00000+ 0000000000 000000+000 0000++0+00 00-+00-000 181 0-00+00-0+ 0000000-0- 0-000000+0 00+++00-+0 0+00+0000- +-00000000 241 0000000000 000+000000 +00 A. CHARGE CLUSTERS. Positive charge clusters (cmin = 9/30 or 12/45 or 14/60): none Negative charge clusters (cmin = 6/30 or 8/45 or 10/60): none Mixed charge clusters (cmin = 12/30 or 16/45 or 19/60): 1) From 165 to 232: see sequence above see sequence above quartile: 3; size: 68, +count: 14, -count: 10, 0count: 44; t-value: 4.75 * V: 9 (13.2%); E: 7 (10.3%); R: 9 (13.2%); LVIFM: 24 (35.3%); B. HIGH SCORING (UN)CHARGED SEGMENTS. There are no high scoring positive charge segments. There are no high scoring negative charge segments. There are no high scoring mixed charge segments. There are no high scoring uncharged segments. C. CHARGE RUNS AND PATTERNS. pattern (+)| (-)| (*)| (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)| lmin0 4 | 3 | 5 | 51 | 9 | 7 | 11 | 11 | 9 | 13 | 6 | 7 | lmin1 6 | 5 | 7 | 63 | 11 | 9 | 13 | 13 | 11 | 16 | 7 | 9 | lmin2 7 | 6 | 8 | 70 | 12 | 10 | 14 | 15 | 13 | 18 | 8 | 10 | (Significance level: 0.010000; Minimal displayed length: 6) There are no charge runs or patterns exceeding the given minimal lengths. Run count statistics: + runs >= 3: 1, at 213; - runs >= 3: 0 * runs >= 3: 2, at 213; 230; 0 runs >= 34: 0 -------------------------------------------------------------------------------- DISTRIBUTION OF OTHER AMINO ACID TYPES 1. HIGH SCORING SEGMENTS. There are no high scoring hydrophobic segments. 
____________________________________ High scoring transmembrane segments: 5.00 (LVIF) 2.00 (AGM) 0.00 (BZX) -1.00 (YCW) -2.00 (ST) -6.00 (P) -8.00 (H) -10.00 (NQ) -16.00 (KR) -17.00 (ED) Expected score/letter: -1.582 M_0.01= 107.5; M_0.05= 84.16; M_0.30= 56.37 1) From 34 to 65: length= 32, score=71.00 34 AALFGVLCLR FIAVLSVIIG IYAAFFSPTG LL L: 6(18.8%); A: 5(15.6%); I: 4(12.5%); F: 4(12.5%); 2. SPACINGS OF C. H2N-24-C-15-C-164-C-20-C-36-COOH 2*. SPACINGS OF C and H. (additional deluxe function for ALEX) H2N-24-C-15-C-46-H-2-H-39-H-74-C-20-C-19-H-16-COOH -------------------------------------------------------------------------------- REPETITIVE STRUCTURES. A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet. Repeat core block length: 4 B. SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet. (i= LVIF; += KR; -= ED; s= AG; o= ST; n= NQ; a= YW; p= P; h= H; m= M; c= C) Repeat core block length: 8 -------------------------------------------------------------------------------- MULTIPLETS. A. AMINO ACID ALPHABET. 1. Total number of amino acid multiplets: 30 (Expected range: 5-- 31) 2. Histogram of spacings between consecutive amino acid multiplets: (1-5) 18 (6-10) 7 (11-20) 3 (>=21) 3 3. Clusters of amino acid multiplets (cmin = 17/30 or 22/45 or 26/60): none 4. Long amino acid multiplets (>= 5; Letter/Length/Position): G/5/241 B. CHARGE ALPHABET. 1. Total number of charge multiplets: 3 (Expected range: 0-- 8) 3 +plets (f+: 9.5%), 0 -plets (f-: 5.3%) Total number of charge altplets: 3 (Critical number: 9) 2. Histogram of spacings between consecutive charge multiplets: (1-5) 0 (6-10) 0 (11-20) 1 (>=21) 3 -------------------------------------------------------------------------------- PERIODICITY ANALYSIS. A. AMINO ACID ALPHABET (core: 4; !-core: 5) Location Period Element Copies Core Errors 28- 43 4 L... 4 4 0 78- 93 4 L... 4 4 0 239- 252 2 G. 6 4 1 241- 245 1 G 5 5 ! 0 241- 268 7 GG.G... 4 4 /0/0/./1/./././ 256- 263 2 G. 4 4 0 B. 
CHARGE ALPHABET ({+= KR; -= ED; 0}; core: 5; !-core: 5) and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core: 6; !-core: 9) Location Period Element Copies Core Errors 30- 71 7 i.0.00. 6 6 /0/./2/./2/2/./ 36- 56 3 i.. 7 7 0 65- 99 5 i0.0. 7 7 /0/2/./2/./ 65- 124 10 i0.0..0.0. 6 6 /0/1/./1/././1/./1/./ -------------------------------------------------------------------------------- SPACING ANALYSIS. Location (Quartile) Spacing Rank P-value Interpretation 62- 121 (2.) T( 59)T 2 of 18 0.0039 large 2. maximal spacing 172- 264 (4.) T( 92)T 1 of 18 0.0089 large 1. maximal spacing ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ Start with Pfam (from /data/patterns/pfam) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/pfam/Pfam Sequence file: CAD21200.1.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa]. Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- Sdh_cyt Succinate dehydrogenase cytochrome b subunit -67.7 75 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- Sdh_cyt 1/1 4 124 .. 
1 125 [] -67.7 75 Alignments of top-scoring domains: Sdh_cyt: domain 1 of 1, from 4 to 124: score -67.7, E = 75 *->kklnRPiSPHLTIYkpQlts...ilSIlHRISGvaLalgvllftllL ++ P+LTI +p++t + +l + ++ + a+++ ++ l + CAD21200.1 4 TT------PYLTIRRPSPTTaefTLTTCPPLTLPLRAALFGVLCLRF 44 klltlslesfafyslsvwslnkfskwli....ivikvfilyalfYHlfnG +++++ + + + +s l ++ ++++++i+++ f l ++ H++ CAD21200.1 45 IAVLSVIIGIYAAFFSPTGLLPPP--IFpsgrISFLDFDLNNFLLHIL-- 90 IRHLiWDlGygleiegvyksga......yivlvlsvvLall<-* HL++ + +g+y ++ + ++y+vl+ls++ + + CAD21200.1 91 --HLLY-----ISRPGQYLASLaislppYAVLALSALTSYI 124 // Start with PfamFrag (from /data/patterns/pfam) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/pfam/PfamFrag Sequence file: CAD21200.1.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa]. Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- Ribosomal_S6 Ribosomal protein S6 2.1 28 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- Ribosomal_S6 1/1 191 211 .. 58 78 .. 2.1 28 Alignments of top-scoring domains: Ribosomal_S6: domain 1 of 1, from 191 to 211: score 2.1, E = 28 *->iYvqinfegepqlVdeleRtl<-* +Y++i +ege+++V+ + R+l CAD21200.1 191 YYLVIVVEGEQDVVVCFPRLL 211 // Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description                                    Score    E-value  N
-------- -----------                                    -----    ------- ---
[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
[no hits above thresholds]

Alignments of top-scoring domains:
[no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Prosite
---------------------------------------------------------
| ppsearch (c) 1994 EMBL Data Library                   |
| based on MacPattern (c) 1990-1994 R. Fuchs            |
---------------------------------------------------------

PROSITE pattern search started: Thu Feb 28 14:57:22 2002

Sequence file: CAD21200.1.fa
----------------------------------------
Sequence CAD21200.1 (263 residues):

Matching pattern PS00004 CAMP_PHOSPHO_SITE:
   11: RRPS
Total matches: 1

Matching pattern PS00005 PKC_PHOSPHO_SITE:
    9: TIR
   72: SGR
  172: TEK
Total matches: 3

Matching pattern PS00006 CK2_PHOSPHO_SITE:
   16: TTAE
   76: SFLD
Total matches: 2

Matching pattern PS00008 MYRISTYL:
   53: GIYAAF
   99: GQYLAS
  143: GIQMSS
  223: GARGCL
  239: GAGGGG
  241: GGGGGS
  242: GGGGSH
  244: GGSHGG
  245: GSHGGN
  248: GGNGAW
  251: GAWRGG
  255: GGNGNG
  256: GNGNGK
  258: GNGKGG
Total matches: 14

Total no of hits in this sequence: 20
========================================

1314 pattern(s) searched in 1 sequence(s), 263 residues.
Total no of hits in all sequences: 20.
Search time: 00:00 min ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ Start with Profile Search ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ Start with motif search against own library ***** bioMotif : Version V41a DB, 1999 Nov 11 ***** SeqTyp=2 : PROTEIN search; >APC D-Box is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >ER-GOLGI-traffic signal is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M minimal SH3 binding is the MOTIF name >CAD21200.1 conserved hypothetical protein [Neurospora crassa]. ;LENGTH=263; DIRECT_SEQUENCE n 1 solutions m %_PXXP 68-71 f >STATISTICS Total : 1 solutions in 1 sequences, 263 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M deubiquitinating enzyme SH3 domain binding motif (Kato, 2000) is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M minimal class I consensus-SH3 binding motif is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M minimal class II consensus-SH3 binding motif is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M exact 14-3-3 binding consensus (Muslin 1996 Cell 84 889) is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M 14-3-3 binding motif in RAF and others (Muslin 1996 Cell 84 889) is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M WW domain binding motif in formins (Bedford 1997) is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >INTRA-SIGNAL-M PY motif for WW domain is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 
units; out of 1 sequences, 263 units >TM-CYTOPLASMIC-M di-hydrophobic endocytosis motifs for internalized transmembrane proteins is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >TM-CYTOPLASMIC-M tyrosine-based endocytosis motif for internalized transmembrane proteins is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >TM-EXTRACELL-M Endocytosis signal for internalized transmembrane proteins is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >EXTRACELL-M minimal furin protease cleavage site motif is the MOTIF name >CAD21200.1 conserved hypothetical protein [Neurospora crassa]. ;LENGTH=263; DIRECT_SEQUENCE n 2 solutions m %_RXXR 219-222 f m %_RXXR 222-225 f >STATISTICS Total : 2 solutions in 1 sequences, 263 units; out of 1 sequences, 263 units >EXTRACELL-M extended furin protease cleavage site motif is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >EXTRACELL-M zinc binding motif in MMPs is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >EXTRACELL-M g alpha binding go loco is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS PDX-1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS QKI-5 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS HCDA experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SV40 LrgT 
experimentally determined is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS H2B experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS v-Rel experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Amida experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Amida experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS RanBP3 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Pho4p experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS DNAhelicaseQ1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS LEF-1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS TCF-1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR p53-NLS1 NLS experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hum-Ku70 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS GAL4 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR 
NLS act/inh betaA experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS BDV-P experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS TR2 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS THOV NP experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS polyomaVP1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS HIV-1 Tat experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS HIV-1 Rev experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Rex experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SRY experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SRY experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SOX9 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 
263 units >NUCLEAR NLS SOX9 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS NS5A experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS DNAse EBV experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS DNAse EBV experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS adenovE1a experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS ystDNApolalpha experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hVDR experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS CPV capsid experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hGlu.cort.experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS cFOS experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS cJUN experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hDNApolalpha experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hDNAtopoII experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 
sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hDNAtopoII experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hBLM experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hARNT experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS influenzaNP experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS influenzaNP experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS p54 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS hProTalpha experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Tst1/Oct6 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS protHsc9 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS protHsci experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS protHsc3 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Ta alpha experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Pax-QNR experimentally determined NLS is the MOTIF 
name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Hunt.Dis.pro experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS MyoD experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS MyoD experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS opaque2 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS CTP experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS HCV experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS HCV experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS p110RB1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS VirD2-Nterm experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS VirD2-Cterm experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Nucloplasmin experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Nucleolin experimentally 
determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS ICP-8 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Nab2 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS M9 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS lscMyc experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS humKprotein experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS FluA experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Mat-alpha experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS polyoma Lrg-T experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS polyoma Lrg-T experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SV40 VP1 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS SV40 VP2 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS polyoma VP2 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 
units >NUCLEAR NLS c-myb experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS N-myc experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS p53 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS c-erb-A experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS yeast SKI3 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS L29 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS L29 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS Max experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS L3 experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >NUCLEAR NLS dyskerin experimentally determined NLS is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >PDZ domain binding motif science 278_2075_pawson is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units >WW domain binding motif science 278_2075_pawson is the MOTIF name >STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 263 units ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~ ~~~ Start with HMM-search search against own library hmmpfam - search 
a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/own/own-hmm.lib
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t    score  E-value
-------- ------- ----- -----    ----- -----    -----  -------
[no hits above thresholds]

Alignments of top-scoring domains:
[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/own/own-hmm-f.lib
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t    score  E-value
-------- ------- ----- -----    ----- -----    -----  -------
[no hits above thresholds]

Alignments of top-scoring domains:
[no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
L. Aravind's signalling DB+ PSSM from other authors
IMPALA version 1.1 [20-December-1999]
Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin, L.
Aravind, Stephen F. Altschul (1999), "IMPALA: Matching a Protein Sequence Against a Collection of "PSI-BLAST-Constructed Position-Specific Score Matrices", Bioinformatics 15:1000-1011. Query= CAD21200.1 conserved hypothetical protein [Neurospora crassa]. (263 letters) Searching..................................done Results from profile search Score E Sequences producing significant alignments: (bits) Value S1 S1 RNA binding domain 23 0.48 CYCLIN Cyclin/TFIIB domain 22 0.79 DHHC Novel zinc finger domain with DHHC signature 22 1.3 AAA AAA+ ATPase Module 21 2.7 INSL Insulinase like Metallo protease domain 20 3.9 UBA Ubiquitin pathway associated domain 20 4.1 LRR Leucine rich repeats 19 6.8 VWA Von Willebrand factor A domain 19 7.6 >S1 S1 RNA binding domain Length = 305 Score = 23.0 bits (49), Expect = 0.48 Identities = 8/56 (14%), Positives = 8/56 (14%), Gaps = 1/56 (1%) Query: 168 RFIPTEKIQDILINEAFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERVWRG 223 Sbjct: 140 GFIPRSHLMHKDNMDALVGQVLKAHILEANQDNNKLVLTQRRIQQAE-SMGKIAAG 194 >CYCLIN Cyclin/TFIIB domain Length = 317 Score = 22.3 bits (47), Expect = 0.79 Identities = 11/75 (14%), Positives = 11/75 (14%), Gaps = 23/75 (30%) Query: 75 ISFLDFDLN-----NFLLHILHLLYISRPG-----------QYLASLAI-----SLPPYA 113 Sbjct: 145 IQQLNFHLIVHNPYRPFEGFLIDLKTRYPILENPEILRKTADDFLNRIALTDAYLLYTPS 204 Query: 114 VLALSALTSYIALFA 128 Sbjct: 205 QIALTAI--LSSASR 217 >DHHC Novel zinc finger domain with DHHC signature Length = 217 Score = 21.6 bits (45), Expect = 1.3 Identities = 12/56 (21%), Positives = 12/56 (21%), Gaps = 4/56 (7%) Query: 42 LRFIAVLSVIIGIYAAFFSPTGLLPPPIFPSGRISFLDFDLNNFLLHILHLLYISR 97 Sbjct: 51 LQIVAWLLYLFFAVIGFGILVPLLPHHWVPAGYACMGAI----FAGHLVVHLTAVS 102 >AAA AAA+ ATPase Module Length = 298 Score = 20.6 bits (42), Expect = 2.7 Identities = 17/96 (17%), Positives = 17/96 (17%), Gaps = 10/96 (10%) Query: 124 IALFARIHTTESLLVLRGLGIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEA 183 Sbjct: 142 IIFMDEIDSIGSRLEGGSGGDSEVQRTMLELLNQLDGFEATKNIKVIMATNRIDILDSAL 201 Query: 
184 FKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVER 219 Sbjct: 202 LRPG--RIDRKIEFP--------PPNEEARLDILKI 227 >INSL Insulinase like Metallo protease domain Length = 433 Score = 19.9 bits (41), Expect = 3.9 Identities = 7/47 (14%), Positives = 7/47 (14%), Gaps = 5/47 (10%) Query: 160 GGTFMKRTRFIPTEKIQDILINEAFKGFEVRYY----LVIVVEGEQD 202 Sbjct: 167 KVSPYRFPIIGFEETIRKFTR-EKLLKFYKSFYQPRNMAVVIVGKVN 212 >UBA Ubiquitin pathway associated domain Length = 255 Score = 20.0 bits (41), Expect = 4.1 Identities = 7/33 (21%), Positives = 7/33 (21%) Query: 230 EKDGPVLSAGAGGGGGSHGGNGAWRGGNGNGKG 262 Sbjct: 186 MQDVMEGADDMVEGEDIEVTGEAAAAGLGQGEG 218 Score = 19.6 bits (40), Expect = 5.5 Identities = 9/29 (31%), Positives = 9/29 (31%) Query: 235 VLSAGAGGGGGSHGGNGAWRGGNGNGKGG 263 Sbjct: 95 LFAQAAQGGNASSGALGTTGGATDAAQGG 123 >LRR Leucine rich repeats Length = 339 Score = 19.1 bits (39), Expect = 6.8 Identities = 6/24 (25%), Positives = 6/24 (25%) Query: 62 TGLLPPPIFPSGRISFLDFDLNNF 85 Sbjct: 22 TAGIPTDIFRMKDLTIIDLSRNQL 45 >VWA Von Willebrand factor A domain Length = 255 Score = 19.1 bits (39), Expect = 7.6 Identities = 7/59 (11%), Positives = 7/59 (11%), Gaps = 10/59 (16%) Query: 147 SSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEAFKG-FEVRYYLVIVVEGEQDVV 204 Sbjct: 63 SEAMLEKDL---------RPNRHAMIIQYAIDFVHEFFDQNPISQMGIIIMRNGLAQLV 112 Underlying Matrix: BLOSUM62 Number of sequences tested against query: 105 Number of sequences better than 10.0: 8 Number of calls to ALIGN: 9 Length of query: 263 Total length of test sequences: 20182 Effective length of test sequences: 16941.0 Effective search space size: 3944168.7 Initial X dropoff for ALIGN: 25.0 bits Y. Wolf's SCOP PSSM IMPALA version 1.1 [20-December-1999] Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), "IMPALA: Matching a Protein Sequence Against a Collection of "PSI-BLAST-Constructed Position-Specific Score Matrices", Bioinformatics 15:1000-1011. 
Query= CAD21200.1 conserved hypothetical protein [Neurospora crassa]. (263 letters) Searching.................................................done Results from profile search Score E Sequences producing significant alignments: (bits) Value gi|729418 [1..212] DNA-glycosylase 27 0.47 gi|2209100 [9..467] PLP-dependent transferases 24 2.3 gi|999515 [1..176] NAD(P)-binding Rossmann-fold domains 24 2.3 gi|137178 [233..498] Ligand-binding domain of nuclear recept... 24 2.8 gi|115682 [1..213] CoA-dependent acetyltransferases 24 3.0 gi|266977 [1..97] Ferredoxin-like 23 4.9 gi|544221 [55..339] beta/alpha (TIM)-barrel 23 5.2 gi|280504 [161..330] Glyceraldehyde-3-phosphate dehydrogenas... 22 8.7 >gi|729418 [1..212] DNA-glycosylase Length = 212 Score = 26.8 bits (59), Expect = 0.47 Identities = 12/86 (13%), Positives = 12/86 (13%), Gaps = 13/86 (15%) Query: 134 ESLLVLRGLGIQMSSSV----GGGNFFRLGGGTF--MKRTRFIPT----EKIQDILINEA 183 Sbjct: 110 DELVKLPGVGRKTANVVVSVAFGVPAIAVDTHVERVSKRLGICRWKDSVLEVEKTLMRKV 169 Query: 184 FKGFEVRYYLVIVVEGEQDVVVCFPR 209 Sbjct: 170 PKEDWSVTHHRLIFFGRY---HCKAQ 192 >gi|2209100 [9..467] PLP-dependent transferases Length = 459 Score = 24.4 bits (52), Expect = 2.3 Identities = 14/101 (13%), Positives = 14/101 (13%), Gaps = 2/101 (1%) Query: 85 FLLHILHLLYISRPGQYLASLAISLPPYAVLALSALTSYIALFARIHTTESLLV--LRGL 142 Sbjct: 281 EVYHECRTLCVVQEGFPTYGGLEGGAMERLAVGLHDGMRQEWLAYRIAQIEYLVAGLEKI 340 Query: 143 GIQMSSSVGGGNFFRLGGGTFMKRTRFIPTEKIQDILINEA 183 Sbjct: 341 GVLCQQPGGHAAFVDAGKLLPHIPADQFPAQALSCELYKVA 381 >gi|999515 [1..176] NAD(P)-binding Rossmann-fold domains Length = 176 Score = 24.3 bits (52), Expect = 2.3 Identities = 30/110 (27%), Positives = 30/110 (27%), Gaps = 27/110 (24%) Query: 174 KIQDILINEAFKGFEV---------------RYYLVIVVEGEQDVVVCFPRLLPRRKIVE 218 Sbjct: 37 KVDDFLANEA-KGTKVLGAHSLEEMVSKLKKPRRIILLVKAGQAVDNFIEKLVPLLDIGD 95 Query: 219 RVWRGA--------RGCLYEKDGPVLSAGAGGGGGSHG---GNGAWRGGN 257 Sbjct: 96 IIIDGGNSEYRDTMRRCRDLKDKGILFVGSGVSGGEDGARYGPSLMPGGN 145 >gi|137178 
[233..498] Ligand-binding domain of nuclear receptor Length = 266 Score = 23.9 bits (51), Expect = 2.8 Identities = 9/15 (60%), Positives = 9/15 (60%) Query: 239 GAGGGGGSHGGNGAW 253 Sbjct: 106 GAGGGGGGLGHDGSF 120 >gi|115682 [1..213] CoA-dependent acetyltransferases Length = 213 Score = 24.1 bits (52), Expect = 3.0 Identities = 10/58 (17%), Positives = 10/58 (17%), Gaps = 9/58 (15%) Query: 59 FSPTGLLPPPIFPSGRI---SFLDFDLN-----NFLLHILHL-LYISRPGQYLASLAI 107 Sbjct: 128 LFPQGNLPENHLNISSLPWVSFDGFNLNITGNDDYFAPVFTMAKFQQEGDRVLLPVSV 185 >gi|266977 [1..97] Ferredoxin-like Length = 97 Score = 23.3 bits (50), Expect = 4.9 Identities = 11/32 (34%), Positives = 11/32 (34%), Gaps = 3/32 (9%) Query: 191 YYLVIVVEGEQDVVVCFPRLLPRRKIVERVWR 222 Sbjct: 59 YFLWYQVEMPEDRVNDLAREL---RIRDNVRR 87 >gi|544221 [55..339] beta/alpha (TIM)-barrel Length = 285 Score = 23.0 bits (49), Expect = 5.2 Identities = 8/56 (14%), Positives = 8/56 (14%), Gaps = 3/56 (5%) Query: 134 ESLLVLRGLGIQMSSSVGGGNF-FRLGGGTFMKRTRFIPTEKIQDILINEAFKGFE 188 Sbjct: 71 KYLKPLQDKGIKVILSILGNHDRSGIANLSTARAKAFA--QELKNTCDLYNLDGVF 124 >gi|280504 [161..330] Glyceraldehyde-3-phosphate dehydrogenase-like, C-terminal domain Length = 170 Score = 22.5 bits (48), Expect = 8.7 Identities = 5/29 (17%), Positives = 5/29 (17%) Query: 130 IHTTESLLVLRGLGIQMSSSVGGGNFFRL 158 Sbjct: 137 IGCQYSSIVDALSTKVLPNPEGQGTLVKV 165 Underlying Matrix: BLOSUM62 Number of sequences tested against query: 1187 Number of sequences better than 10.0: 8 Number of calls to ALIGN: 8 Length of query: 263 Total length of test sequences: 256703 Effective length of test sequences: 214185.0 Effective search space size: 48653831.6 Initial X dropoff for ALIGN: 25.0 bits ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ calculation of internal repeats with prospero ***** PROSPERO v1.3 Thu Feb 28 14:57:56 2002 ***** Copyright 2000, Richard Mott, Wellcome Trust Centre for Human Genetics, University of Oxford For help see http://www.well.ox.ac.uk/ariadne 
For usage use -help
using gap penalty 11+1k
using matrix BLOSUM62
printing all alignments with eval < 0.100000
using sequence1 CAD21200.1
using self-comparison

> 1 CAD21200.1 len 263 from 241 to 256 vs CAD21200.1 len 263 from 248 to 263 score 49 eval 8.022621e-03 identity 56.25% K 3.597031e-02 L 2.579737e-01 H 1.317791e+00 alpha 9.395090e-02
241 GGGGGSHGGNGAWRGG 256 CAD21200.1
    || |   ||||  :||
248 GGNGAWRGGNGNGKGG 263 CAD21200.1

> 2 CAD21200.1 len 263 from 238 to 260 vs CAD21200.1 len 263 from 240 to 262 score 45 eval 2.235172e-02 identity 43.48% K 3.597031e-02 L 2.579737e-01 H 1.317791e+00 alpha 9.395090e-02
238 AGAGGGGGSHGGNGAWRGGNGNG 260 CAD21200.1
    || |||     |      ||| |
240 AGGGGGSHGGNGAWRGGNGNGKG 262 CAD21200.1

~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
TIGRFAM
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/tigrfam/tigrfam.hmm
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
[no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t    score  E-value
-------- ------- ----- -----    ----- -----    -----  -------
[no hits above thresholds]

Alignments of top-scoring domains:
[no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/tigrfam/tigrfam.hmm-f
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].

Scores for sequence family classification (score includes all domains):
Model     Description                                  Score    E-value  N
--------  -----------                                  -----    ------- ---
TIGR00900 2A0121: H+ Antiporter protein                  3.9        5.7   1
TIGR01098 3A0109s03R: phosphonates-binding periplasmi   -0.6         24   1

Parsed for domains:
Model     Domain  seq-f seq-t    hmm-f hmm-t    score  E-value
--------  ------- ----- -----    ----- -----    -----  -------
TIGR00900   1/1      30    62 ..    83   117 ..   3.9      5.7
TIGR01098   1/1     119   128 ..     1    10 [.  -0.6       24

Alignments of top-scoring domains:
TIGR00900: domain 1 of 1, from 30 to 62: score 3.9, E = 5.7
*->lpfvallgGVdvleiwmvyvvafIlaiaqaFFtPa<-*
lp+ a+l+G vl ++ ++v+ +I +i +aFF+P+
CAD21200.1 30 LPLRAALFG--VLCLRFIAVLSVIIGIYAAFFSPT 62

TIGR01098: domain 1 of 1, from 119 to 128: score -0.6, E = 24
*->AllSavaLfa<-*
Al+S +aLfa
CAD21200.1 119 ALTSYIALFA 128
//
SMART
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /data/patterns/iprscan/data/smart.HMMs
Sequence file: CAD21200.1.fa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa].
Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- [no hits above thresholds] Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- [no hits above thresholds] Alignments of top-scoring domains: [no hits above thresholds] // COG hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/cogs/cogs.hmm Sequence file: CAD21200.1.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa]. Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- COG3247 -90.6 29 1 COG0472 -191.7 98 1 COG1172 -271.4 71 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- COG3247 1/1 5 177 .. 1 198 [] -90.6 29 COG1172 1/1 25 215 .. 1 403 [] -271.4 71 COG0472 1/1 1 239 [. 
1 432 [] -191.7 98 Alignments of top-scoring domains: COG3247: domain 1 of 1, from 5 to 177: score -90.6, E = 29 *->mCmlaimeasplkadLerLkkHLwkavl.lsgVlaLilGvLvLafPa l i +sp a+ L ++l+l++ l +GvL+L f CAD21200.1 5 TPYLTIRRPSPTTAEFT-LTTC-PPLTLpLRAAL---FGVLCLRF-- 44 vSldVlalvfGaylLv...sGvalvvaaFslrsdaqfrvLSlfLvgvasl + Vl+ + G y+ +++G s r + L fL + l CAD21200.1 45 --IAVLSVIIGIYAAFfspTGLLPPPIFPSGRISFLDFDLNNFLLHILHL 92 l........lGllafrapelavlaLalfIAglflvaGVirlvSairdRks l +++++ l+ la+ p avlaL + +l+a + ++ S + CAD21200.1 93 LyisrpgqyLASLAISLPPYAVLALSALTSYIALFARIHTTESLLVL--- 139 lkgeWwsilvGvisiviaGilliAsPfvSvllvalvVGiylVfigvllva +++ i +s + v G g ++ CAD21200.1 140 ---RGLGI---QMS-------------------SSVGGGNFFRLGGGTFM 164 lAlllrKastlka<-* ++ + ++ CAD21200.1 165 KRTRFIPTEKIQD 177 COG1172: domain 1 of 1, from 25 to 215: score -271.4, E = 71 *->MmptslstaasastkkkklfkrnlreygllvaLliliaifsi..... p +l++ ++ + lr ++ l +++ ++a+f +++ CAD21200.1 25 CPPLTLPL------RAALFGVLCLRFIAVLSVIIGIYAAFFSptgll 65 ...lsPgsfn.nFLslnNllnIlrQtsvigilAvGMTfVIltgGIDLSVG +++++P s+ FL+++ l n l + ++ + + CAD21200.1 66 pppIFP-SGRiSFLDFD-LNNFLLHILHLLYISRP--------------- 98 SvlALagvvtAillqsgdnikvFgellgvplllaillgLllGaliGlinG g +A+l s l ++L+l+al i+ CAD21200.1 99 ------GQYLASLAIS----------------LPPYAVLALSALTSYIA- 125 llvaklKvppFIaTLgtmtifRGialliTdgvgGsPisgeftgipdsFaw l a+ I T ++ +RG+ ++++ ++g+ F++ CAD21200.1 126 -LFAR------IHTTESLLVLRGLGIQMSS-----SVGGG------NFFR 157 lgqgfirGlaligfVlwfvrsrqllqiafkvlkaliAvivlgaiflLngy lg g + CAD21200.1 158 LGGGTF-------------------------------------------- 163 lGiPvpviialivliifwfllnKTrFGRniYAiGGNeeAArlSGInVkrv +++TrF I+++++ CAD21200.1 164 --------------------MKRTRF------------------IPTEKI 175 k.iavFalsGllaAlAGiilasRlgSAqPnAGvgyELDAIAAvViGGtSL +i++ + A g+E CAD21200.1 176 QdILI-----------------------NEAFKGFE-------------- 188 aGGvGsiiGtviGaLIigvlnnGLnLLGVssywQqvvkGlvIlaAValDs ++L++ V + Q+vv V + + CAD21200.1 189 -------------------VRYYLVI--VVEGEQDVV--------VCFPR 209 lklvalekklrrkkka<-* l+ +++ 
CAD21200.1 210 LL----------PRRK 215 COG0472: domain 1 of 1, from 1 to 239: score -191.7, E = 98 *->MLlmLa.llpalsslnlfsYltafrallalliafllsllltpifipf +++ +p Ylt +r +++ ++l +p +++ CAD21200.1 1 ----MLtTTP---------YLTIRRPS---PTTAEFTLTTCPPLTLP 31 lrklaikigqdirkdgpksHkshKagTPtmGGlaIllsflivlslllwag lr ++ + ++++ +a+l + i+ ++ +++ CAD21200.1 32 LRAALF--------GVLCLRF-----------IAVLSV--IIGIYAAFFS 60 lnsganpyevevwlvLlvllgfgliGflDDlfklsrKnnkGLsakiKlll + l +++ g i flD f+l +ll CAD21200.1 61 PT----------GLLPPPIFPSGRISFLD--FDLN-------N----FLL 87 qfiaAvlllilllkfdgslltqlyiPFfkspsfdlgtllylvlavfalVg + i+ +l ++ +++ +s l + + av+al++ CAD21200.1 88 H-ILHLLYISRPGQYLAS----------------LA-ISLPPYAVLALSA 119 ssNAvNltDGLDGLAaGlsviaalalaliaylsgnvnfAqYLlipyipda L + ++++a + ++ CAD21200.1 120 -------------LTSYIALFARIHTTESLLVL----------------R 140 gelailclalaGAcLGFLwfNfyPGkAkvFMGDtGSlaLGavlgalavll g ++ + G+ Nf + lg+ + CAD21200.1 141 GLGIQMSSSVGGG-------NFFR------------------LGGGTFMK 165 klklq.ei...lllimggvfvietlsvilqvlsrklrkdptigkrifkma + +++i+++ +++ +++ +v +++ +++ + CAD21200.1 166 R---TrFIpteKIQDILINEAFKGFEVRYYLVIVVEG-----EQDVVVCF 207 plHhHfelkgwglkftlrqflifiilcaigiLislslrllreakvvvrfW p +++ +k+v r+W CAD21200.1 208 P-------RLLP-----------------------------RRKIVERVW 221 iislilaliglatlllaavgvllavifaflrfviwlklrl<-* ++ ++ ++ + CAD21200.1 222 RGARGCLYEKDGP----------------------VLSAG 239 // hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/cogs/cogs.hmm-f Sequence file: CAD21200.1.fa - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CAD21200.1 conserved hypothetical protein [Neurospora crassa]. 
Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
COG0842                   3.9        4.9   1
COG1254                   3.5         20   1
COG1407                   1.1         28   1
COG0762                   0.2         58   1
COG2386                  -0.2         72   1

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t    score  E-value
-------- ------- ----- -----    ----- -----    -----  -------
COG0762    1/1      33    58 ..   164   199 .]   0.2       58
COG2386    1/1     113   135 ..   216   238 .]  -0.2       72
COG0842    1/1     100   138 ..   328   365 .]   3.9      4.9
COG1254    1/1     180   191 ..    97   107 .]   3.5       20
COG1407    1/1     183   220 ..   218   255 .]   1.1       28

Alignments of top-scoring domains:
COG0762: domain 1 of 1, from 33 to 58: score 0.2, E = 58
*->RAllvavliLqFldvlvlevlrvfalqiLpgllsil<-*
RA+l+ vl+L+F+ v l+ +g+++++
CAD21200.1 33 RAALFGVLCLRFIAV----------LSVIIGIYAAF 58

COG2386: domain 1 of 1, from 113 to 135: score -0.2, E = 72
*->AllalavtLspfAiaAalriSvs<-*
A lal++ +s +A +A ++ +s
CAD21200.1 113 AVLALSALTSYIALFARIHTTES 135

COG0842: domain 1 of 1, from 100 to 138: score 3.9, E = 4.9
*->lg.aglsdvwfsllvLallgllllllgllllrrrekkar<-*
++ a+l++ ++ +vLal +l+++++ +++ +e+++
CAD21200.1 100 QYlASLAISLPPYAVLALSALTSYIALFARIHTTESLLV 138

COG1254: domain 1 of 1, from 180 to 191: score 3.5, E = 20
*->i.egFkdFeiry<-*
i+e Fk+Fe+ry
CAD21200.1 180 InEAFKGFEVRY 191

COG1407: domain 1 of 1, from 183 to 220: score 1.1, E = 28
*->iLresdykefeviaitGEsigLlkfGtldDLlkiakkl<-*
+++++++++ vi+++GE ++ f++l +ki ++
CAD21200.1 183 AFKGFEVRYYLVIVVEGEQDVVVCFPRLLPRRKIVERV 220
//