analysis of sequence from tem14
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQV
LRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
A
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
sec.str. with predator
> gi|30366|emb|CAA38478.1|
              .         .         .         .         .
   1 MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF   50
     ___HHHHHHHHHHHHHHHHHHH__________________HHHHHHHHHH
              .         .         .         .         .
  51 AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN  100
     HHHHHH_________HHHHHHHHHHHH_____EEEEEEEEEEEE______
              .         .         .         .
 101 LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQE  140
     __________HHHHHHHHH_EEEEEE___HHHHHH_____
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
method : 1
alpha-contents : 75.8 %
beta-contents : 1.5 %
coil-contents : 22.7 %
class : alpha

method : 2
alpha-contents : 61.7 %
beta-contents : 0.0 %
coil-contents : 38.3 %
class : alpha
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
GPI: learning from metazoa
-14.70 -5.84 -3.19 -5.14 -4.00 0.00 -32.00 0.00 -0.08 -6.61 -1.84 -12.00 -12.00 0.00 0.00 0.00 -97.40
-7.02 -5.79 -5.85 -2.15 0.00 0.00 -24.00 0.00 -0.06 -6.61 -1.84 -12.00 -12.00 0.00 -12.00 0.00 -89.32
ID: gi|30366|emb|CAA38478.1|  AC: xxx  Len: 140  1:I 129  Sc: -89.32  Pv: 9.270793e-01  NO_GPI_SITE

GPI: learning from protozoa
-14.45 -7.00 -3.85 -1.49 -4.00 0.00 -32.00 0.00 0.00 -5.73 -7.18 -12.00 -12.00 0.00 0.00 0.00 -99.70
-22.52 -2.64 -1.42 -2.38 -4.00 0.00 0.00 0.00 -1.28 -8.14 -7.18 -12.00 -12.00 -12.00 -12.00 0.00 -97.56
ID: gi|30366|emb|CAA38478.1|  AC: xxx  Len: 140  1:I 119  Sc: -97.56  Pv: 8.274222e-01  NO_GPI_SITE
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
# SignalP euk predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.842  21 Y  0.840  21 Y  0.979  11 Y  0.949 Y
# SignalP gram- predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.540  19 Y  0.650  19 Y  0.993   9 Y  0.956 Y
# SignalP gram+ predictions
# name       Cmax  pos ?  Ymax  pos ?  Smax  pos ?  Smean ?
gi|30366|em  0.258  59 N  0.348  19 Y  0.997   9 Y  0.952 Y
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
low complexity regions: SEG 12 2.2 2.5
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                                1-4     MARP
lctllllmatlagala                5-20
                               21-141   SSSKEENRIIPGGIYDADLNDEWVQRALHF
                                        AISEYNKATEDEYYRRPLQVLRAREQTFGG
                                        VNYFFDVEVGRTICTKSQPNLDTCAFHEQP
                                        ELQKKQLCSFEIYEVPWEDRMSLVNSRCQE
                                        A

low complexity regions: SEG 25 3.0 3.3
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                               1-141    MARPLCTLLLLMATLAGALASSSKEENRII
                                        PGGIYDADLNDEWVQRALHFAISEYNKATE
                                        DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
                                        RTICTKSQPNLDTCAFHEQPELQKKQLCSF
                                        EIYEVPWEDRMSLVNSRCQEA

low complexity regions: SEG 45 3.4 3.75
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

                               1-141    MARPLCTLLLLMATLAGALASSSKEENRII
                                        PGGIYDADLNDEWVQRALHFAISEYNKATE
                                        DEYYRRPLQVLRAREQTFGGVNYFFDVEVG
                                        RTICTKSQPNLDTCAFHEQPELQKKQLCSF
                                        EIYEVPWEDRMSLVNSRCQEA

low complexity regions: XNU
# Score cutoff = 21, Search from offsets 1 to 4
# both members of each repeat flagged
# lambda = 0.347, K = 0.200, H = 0.664
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTllllmatlagalassskeenriiPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
    1 -    7  MARPLCT
    8 -   30  lll lmatlagala ssskeenrii
   31 -  141  PGGIYDADLN DEWVQRALHF AISEYNKATE DEYYRRPLQV LRAREQTFGG
               VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF EIYEVPWEDR
               MSLVNSRCQE A

low complexity regions: DUST
>gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATE
DEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSF
EIYEVPWEDRMSLVNSRCQEA
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
coiled coil prediction for
gi|30366|emb|CAA38478.1|
sequence: 140 amino acids, 0 residue(s) in coiled coil state

    .    |     .    |     .    |     .    |     .    |     .   60
MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |     .    |     .    |     .    |     .  120
DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border
---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif.
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local

    .    |     .    |
EIYEVPWEDR MSLVNSRCQE
~~~~~~~~~~ ~~~~~~~~~~
---------- ----------
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~~~~~~ ~~~~~~~~~~
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
prediction of transmembrane regions with toppred2
***********************************
*TOPPREDM with eukaryotic function*
***********************************
tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___ (1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA
(p)rokaryotic or (e)ukaryotic: e
Charge-pair energy: 0
Length of full window (odd number!): 21
Length of core window (odd number!): 11
Number of residues to add to each end of helix: 1
Critical length: 60
Upper cutoff for candidates: 1
Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested

Candidate membrane-spanning segments:

 Helix  Begin  End   Score  Certainity
     1      4   24   1.888  Certain
----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
Segment        1
Loop length      3    117
K+R profile   2.00      +
CYT-EXT prof     -  -0.40
For CYT-EXT profile neg. values indicate cytoplasmic preference.
K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 3.00
(NEG-POS)/(NEG+POS): -1.0000
NEG: 0.0000 POS: 1.0000
-> Orientation: N-in

CYT-EXT difference: 0.40
-> Orientation: N-out
----------------------------------------------------------------------
"tem14" 141 4 24 #t 1.8875
************************************
*TOPPREDM with prokaryotic function*
************************************
tem14.___inter___ is a single sequence
Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale
Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok
Using sequence file: tem14.___inter___ (1 sequences)
MARPLCTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHF
AISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPN
LDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQEA
(p)rokaryotic or (e)ukaryotic: p
Charge-pair energy: 0
Length of full window (odd number!): 21
Length of core window (odd number!): 11
Number of residues to add to each end of helix: 1
Critical length: 60
Upper cutoff for candidates: 1
Lower cutoff for candidates: 0.6
Total of 1 structures are to be tested

Candidate membrane-spanning segments:

 Helix  Begin  End   Score  Certainity
     1      4   24   1.888  Certain
----------------------------------------------------------------------
Structure 1

Transmembrane segments included in this structure:
Segment        1
Loop length      3    117
K+R profile   2.00      +
CYT-EXT prof     -  -0.40
For CYT-EXT profile neg. values indicate cytoplasmic preference.
K+R difference: 2.00
Tm probability: 1.00
-> Orientation: N-in

Charge-difference over N-terminal Tm (+-15 residues): 3.00
(NEG-POS)/(NEG+POS): -1.0000
NEG: 0.0000 POS: 1.0000
-> Orientation: N-in

CYT-EXT difference: 0.40
-> Orientation: N-out
----------------------------------------------------------------------
"tem14" 141 4 24 #t 1.8875
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
NOW EXECUTING: /bio_software/1D/stat/saps/saps-stroh/SAPS.SSPA/saps /people/maria/tem14.___saps___

SAPS.  Version of April 11, 1996.
Date run: Tue Oct 31 14:35:46 2000

File: /people/maria/tem14.___saps___
ID   gi|30366|emb|CAA38478.1|
DE   cystain S [Homo sapiens]

number of residues:  141;   molecular weight:  16.2 kdal

         1  MARPLCTLLL LMATLAGALA SSSKEENRII PGGIYDADLN DEWVQRALHF AISEYNKATE
        61  DEYYRRPLQV LRAREQTFGG VNYFFDVEVG RTICTKSQPN LDTCAFHEQP ELQKKQLCSF
       121  EIYEVPWEDR MSLVNSRCQE A

--------------------------------------------------------------------------------

COMPOSITIONAL ANALYSIS (extremes relative to: swp23s)

A  :  12( 8.5%);  C  :   5( 3.5%);  D  :   7( 5.0%);  E  :  14( 9.9%);  F  :   6( 4.3%)
G  :   6( 4.3%);  H  :   2( 1.4%);  I  :   6( 4.3%);  K  :   5( 3.5%);  L  :  15(10.6%)
M  :   3( 2.1%);  N  :   6( 4.3%);  P  :   6( 4.3%);  Q  :   8( 5.7%);  R  :  10( 7.1%)
S  :   8( 5.7%);  T  :   7( 5.0%);  V  :   7( 5.0%);  W  :   2( 1.4%);  Y  :   6( 4.3%)

KR      :   15 ( 10.6%);   ED      :   21 ( 14.9%);   AGP     :   24 ( 17.0%);
KRED    :   36 ( 25.5%);   KR-ED   :   -6 ( -4.3%);   FIKMNY  :   32 ( 22.7%);
LVIFM   :   37 ( 26.2%);   ST      :   15 ( 10.6%).

--------------------------------------------------------------------------------

CHARGE DISTRIBUTIONAL ANALYSIS

         1  00+0000000 0000000000 000+--0+00 00000-0-00 --000+0000 000-00+00-
        61  --00++0000 0+0+-00000 00000-0-00 +0000+0000 0-00000-00 -00++00000
       121  -00-000--+ 000000+00- 0


A. CHARGE CLUSTERS.

Positive charge clusters (cmin =  9/30 or 13/45 or 15/60):  none
Negative charge clusters (cmin = 12/30 or 16/45 or 19/60):  none
Mixed charge clusters    (cmin = 17/30 or 23/45 or 28/60):  none


B. HIGH SCORING (UN)CHARGED SEGMENTS.
There are no high scoring positive charge segments.
There are no high scoring negative charge segments.
There are no high scoring mixed charge segments.
There are no high scoring uncharged segments.


C. CHARGE RUNS AND PATTERNS.

pattern     (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0         4 |   5 |   7 |  28 |   8 |   9 |  12 |   9 |  11 |  14 |   5 |   7 |
lmin1         6 |   6 |   9 |  34 |  10 |  11 |  14 |  12 |  13 |  16 |   7 |   8 |
lmin2         7 |   7 |  10 |  37 |  11 |  13 |  16 |  13 |  15 |  18 |   8 |  10 |

(Significance level: 0.010000; Minimal displayed length: 6)

There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

  +  runs >=   3:   0
  -  runs >=   3:   1, at   60;
  *  runs >=   5:   0
  0  runs >=  19:   1, at    4;

--------------------------------------------------------------------------------

DISTRIBUTION OF OTHER AMINO ACID TYPES

1. HIGH SCORING SEGMENTS.

There are no high scoring hydrophobic segments.
____________________________________

High scoring transmembrane segments:

    5.00 (LVIF)   2.00 (AGM)    0.00 (BZX)   -1.00 (YCW)   -2.00 (ST)
   -6.00 (P)     -8.00 (H)    -10.00 (NQ)   -16.00 (KR)   -17.00 (ED)

Expected score/letter:  -4.397
M_0.01=  45.75; M_0.05=  36.60; M_0.30=  25.71

 1)  From    5 to   20: length=  16, score=42.00  *
       5  LCTLLLLMAT LAGALA
          L:   7(43.8%);  A:   4(25.0%);  T:   2(12.5%);

2. SPACINGS OF C.

H2N-5-C-87-C-9-C-13-C-19-C-3-COOH

2*. SPACINGS OF C and H.  (additional deluxe function for ALEX)

H2N-5-C-42-H-44-C-9-C-2-H-10-C-19-C-3-COOH

--------------------------------------------------------------------------------

REPETITIVE STRUCTURES.

A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet.

Repeat core block length: 4

B. SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet.
   (i= LVIF;  += KR;  -= ED;  s= AG;  o= ST;  n= NQ;  a= YW;  p= P;  h= H;  m= M;  c= C)

Repeat core block length: 8

--------------------------------------------------------------------------------

MULTIPLETS.

A. AMINO ACID ALPHABET.

1. Total number of amino acid multiplets: 10  (Expected range:   0-- 17)

2. Histogram of spacings between consecutive amino acid multiplets:
   (1-5) 5   (6-10) 2   (11-20) 1   (>=21) 3

3. Clusters of amino acid multiplets (cmin = 12/30 or 16/45 or 19/60):  none

B. CHARGE ALPHABET.

1. Total number of charge multiplets: 6  (Expected range:   0 -- 11)
   2 +plets (f+: 10.6%), 4 -plets (f-: 14.9%)
   Total number of charge altplets: 3  (Critical number: 12)

2. Histogram of spacings between consecutive charge multiplets:
   (1-5) 1   (6-10) 0   (11-20) 4   (>=21) 2

--------------------------------------------------------------------------------

PERIODICITY ANALYSIS.

A. AMINO ACID ALPHABET (core:  4; !-core: 5)

Location        Period    Element                Copies Core   Errors
   8-  11          1      L                           4    4    0

B. CHARGE ALPHABET ({+= KR; -= ED; 0}; core: 5; !-core: 6)
   and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core: 6; !-core: 8)

Location        Period    Element                Copies Core   Errors

There are no periodicities of the prescribed length.

--------------------------------------------------------------------------------

SPACING ANALYSIS.

There are no unusual spacings.
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Pfam (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/Pfam
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:  gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model         Description                            Score    E-value  N
--------      -----------                            -----    ------- ---
cystatin      Cystatin domain                        140.6    9.4e-40   1
SecA_protein  SecA protein, amino terminal region     -2.4         83   1

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------      ------- ----- -----    ----- -----      -----  -------
SecA_protein    1/1      41    52 ..   199   210 ..    -2.4       83
cystatin        1/1      32   137 ..     1   111 []   140.6  9.4e-40

Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
                   *->eElVQRpfnFAI<-*
                      +E VQR ++FAI
  gi|30366|e    41    DEWVQRALHFAI    52

cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
                   *->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
                      Gg+ +ad   nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
  gi|30366|e    32    GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ    76

                   vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
                     +G+  nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
  gi|30366|e    77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY   123

                   kkpwegelsvltktC<-*
                   ++pwe ++ +l ++
  gi|30366|e   124 EVPWE-DRMSLVNSR    137

//
Start with PfamFrag (from /data/patterns/pfam)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/pfam/PfamFrag
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:  gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model         Description                            Score    E-value  N
--------      -----------                            -----    ------- ---
cystatin      Cystatin domain                        140.6    9.4e-40   1
SecA_protein  SecA protein, amino terminal region     -2.4         83   1

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------      ------- ----- -----    ----- -----      -----  -------
SecA_protein    1/1      41    52 ..   199   210 ..    -2.4       83
cystatin        1/1      32   137 ..     1   111 []   140.6  9.4e-40

Alignments of top-scoring domains:
SecA_protein: domain 1 of 1, from 41 to 52: score -2.4, E = 83
                   *->eElVQRpfnFAI<-*
                      +E VQR ++FAI
  gi|30366|e    41    DEWVQRALHFAI    52

cystatin: domain 1 of 1, from 32 to 137: score 140.6, E = 9.4e-40
                   *->GglspaddNendpevqeaadfAvaeyNeks.dgykfelvevvraksQ
                      Gg+ +ad   nd++vq+a++fA++eyN+ ++d+y+ ++++v+ra+ Q
  gi|30366|e    32    GGIYDADL--NDEWVQRALHFAISEYNKATeDEYYRRPLQVLRAREQ    76

                   vVaGtltnYyievevgettCskeskkdledCplldqpeeawegfCkfqvf
                     +G+  nY+++vevg+t C+k s+++l++C +++qpe++++ +C+f+++
  gi|30366|e    77 TFGGV--NYFFDVEVGRTICTK-SQPNLDTCAFHEQPELQKKQLCSFEIY   123

                   kkpwegelsvltktC<-*
                   ++pwe ++ +l ++
  gi|30366|e   124 EVPWE-DRMSLVNSR    137

//
Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm)
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:  gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model         Description                            Score    E-value  N
--------      -----------                            -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------      ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]

Alignments of top-scoring domains:
        [no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Prosite
 ---------------------------------------------------------
 | ppsearch  (c) 1994 EMBL Data Library                  |
 | based on MacPattern (c) 1990-1994 R. Fuchs            |
 ---------------------------------------------------------
PROSITE pattern search started: Tue Oct 31 14:37:32 2000

Sequence file: tem14

----------------------------------------

Sequence gi|30366|emb|CAA38478.1| (141 residues):

Matching pattern PS00005 PKC_PHOSPHO_SITE:
     22: SSK
Total matches: 1

Matching pattern PS00006 CK2_PHOSPHO_SITE:
     22: SSKE
     23: SKEE
     59: TEDE
Total matches: 3

Matching pattern PS00008 MYRISTYL:
     17: GALASS
     33: GIYDAD
Total matches: 2

Matching pattern PS00287 CYSTATIN:
     75: EQTFGGVNYFFDVE
Total matches: 1

Total no of hits in this sequence: 7

========================================

1314 pattern(s) searched in 1 sequence(s), 141 residues.
Total no of hits in all sequences: 7.
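The PKC, CK2 and N-myristoylation hits above are matches to short PROSITE consensus patterns. The minimal sketch below turns such patterns into Python regular expressions and scans the query with overlapping matches. It assumes the standard PROSITE definitions PS00005 `[ST]-x-[RK]`, PS00006 `[ST]-x(2)-[DE]` and PS00008 `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`; check these against the PROSITE release actually used. `prosite_to_regex` and `find_matches` are hypothetical helper names, and the code makes no claim to reproduce how ppsearch works internally.

```python
import re

# The 141-residue query, written in the 10-residue blocks used by SAPS.
SEQ = "".join([
    "MARPLCTLLL", "LMATLAGALA", "SSSKEENRII", "PGGIYDADLN", "DEWVQRALHF",
    "AISEYNKATE", "DEYYRRPLQV", "LRAREQTFGG", "VNYFFDVEVG", "RTICTKSQPN",
    "LDTCAFHEQP", "ELQKKQLCSF", "EIYEVPWEDR", "MSLVNSRCQE", "A",
])

# Assumed PROSITE consensus patterns (verify against your PROSITE release).
PATTERNS = {
    "PS00005 PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "PS00006 CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "PS00008 MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex.

    Handles x, [..], {..} and repeat counts such as x(2) or x(2,4);
    the '<' / '>' terminal anchors are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [..] classes and single letters are already valid regex syntax
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_matches(pattern: str, seq: str) -> list:
    """Return (1-based start, matched residues), including overlapping hits."""
    # A zero-width lookahead lets every start position be tested.
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for start, site in find_matches(pattern, SEQ):
            print(f"{name}: {start}: {site}")
```

The lookahead `(?=(...))` is what lets overlapping sites such as SSKE at 22 and SKEE at 23 both be reported, as in the ppsearch listing above. A plain `re.finditer` would skip the second one.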
Search time: 00:00 min
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Profile Search
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with motif search against own library
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
argv[1]=P   argv[2]=-m /data/patterns/own/motif.fa   argv[4]=-seq tem14
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
SeqTyp=2 : PROTEIN search;
>APC D-Box is the MOTIF name
>STATISTICS
Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 141 units
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with HMM-search search against own library
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm.lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:  gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model         Description                            Score    E-value  N
--------      -----------                            -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------      ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]

Alignments of top-scoring domains:
        [no hits above thresholds]
//
hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /data/patterns/own/own-hmm-f.lib
Sequence file:            tem14
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:  gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model         Description                            Score    E-value  N
--------      -----------                            -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------      ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]

Alignments of top-scoring domains:
        [no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
L. Aravind's signalling DB
IMPALA version 1.1 [20-December-1999]

Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin,
L. Aravind, Stephen F. Altschul (1999), "IMPALA: Matching a Protein Sequence Against a
Collection of PSI-BLAST-Constructed Position-Specific Score Matrices", Bioinformatics 15:1000-1011.
Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
         (210 letters)

Searching..................................done

Results from profile search
                                                                 Score     E
Sequences producing significant alignments:                      (bits)  Value

PAP Papain/bleomycin hydrolase like domain                          20    4.2
DNASE1 DNASE-1/Sphingomyelinase like domain                         19    4.5
ANK Ankyrin repeat                                                  19    5.9
CATH Cathepsin like protease domain                                 19    7.2
CYCL cyclophilin like peptidyl prolyl isomerases                    18    9.8

>PAP Papain/bleomycin hydrolase like domain
          Length = 376

 Score = 19.6 bits (40), Expect = 4.2
 Identities = 9/15 (60%), Positives = 10/15 (66%)

Query: 2   ARPLCTLLLLMATLA 16
           A P C L LL+A LA
Sbjct: 5   AHPSCLLALLVAGLA 19

>DNASE1 DNASE-1/Sphingomyelinase like domain
          Length = 388

 Score = 19.3 bits (39), Expect = 4.5
 Identities = 3/44 (6%), Positives = 12/44 (26%), Gaps = 1/44 (2%)

Query: 63  YYRRPLQVLRAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAF 106
           + +   Q++ + +          + +V      +         F
Sbjct: 197 FLQDRFQLVNSAKIRLSARTLKTN-QVAIAETLQCCETGRQLCF 239

>ANK Ankyrin repeat
          Length = 323

 Score = 19.1 bits (39), Expect = 5.9
 Identities = 5/22 (22%), Positives = 10/22 (44%)

Query: 36  DADLNDEWVQRALHFAISEYNK 57
             DL D+  + A   A+   ++
Sbjct: 288 RTDLKDDADRTAADIAVQLGHR 309

>CATH Cathepsin like protease domain
          Length = 371

 Score = 18.7 bits (38), Expect = 7.2
 Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)

Query: 77  TFGGVNY--------FFDVEVGRTICT 95
             GG +Y              G T+C
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325

 Score = 18.7 bits (38), Expect = 7.2
 Identities = 6/27 (22%), Positives = 8/27 (29%), Gaps = 8/27 (29%)

Query: 147 TFGGVNY--------FFDVEVGRTICT 165
             GG +Y              G T+C
Sbjct: 299 VLGGKSYALTPNQYVLKVTVQGETLCL 325

>CYCL cyclophilin like peptidyl prolyl isomerases
          Length = 165

 Score = 18.2 bits (37), Expect = 9.8
 Identities = 4/11 (36%), Positives = 5/11 (45%)

Query: 84  FFDVEVGRTIC 94
           FFD+ V
Sbjct: 7   FFDIAVDGEPL 17

 Score = 18.2 bits (37), Expect = 9.8
 Identities = 4/11 (36%), Positives = 5/11 (45%)

Query: 154 FFDVEVGRTIC 164
           FFD+ V
Sbjct: 7   FFDIAVDGEPL 17

  Underlying Matrix: BLOSUM62
  Number of sequences tested against query: 105
  Number of sequences better than 10.0: 5
  Number of calls to ALIGN: 7
  Length of query: 210
  Total length of test sequences: 20182
  Effective length of test sequences: 16738.0
  Effective search space size: 2978524.2
  Initial X dropoff for ALIGN: 25.0 bits

Y. Wolf's SCOP PSSM
IMPALA version 1.1 [20-December-1999]

Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin,
L. Aravind, Stephen F. Altschul (1999), "IMPALA: Matching a Protein Sequence Against a
Collection of PSI-BLAST-Constructed Position-Specific Score Matrices", Bioinformatics 15:1000-1011.

Query= gi|30366|emb|CAA38478.1| cystain S [Homo sapiens]
         (210 letters)

Searching.................................................done

Results from profile search
                                                                 Score     E
Sequences producing significant alignments:                      (bits)  Value

gi|118183 [37..144] Cystatin-like                                  146   4e-37
gi|1172449 [23..197] Purine and uridine phosphorylases              24     2.2
gi|999649 [1..162] Methane monooxygenase hydrolase, gamma su...     23     3.8
gi|2463096 [30..336] Trypsin-like serine proteases                  23     4.6
gi|1707833 [43..332] Isoprenyl diphosphate synthases                23     5.5
gi|729418 [1..212] DNA-glycosylase                                  23     6.8
gi|2144317 [224..416] Lactate & malate dehydrogenases, C-ter...     22     9.2

>gi|118183 [37..144] Cystatin-like
          Length = 108

 Score = 146 bits (365), Expect = 4e-37
 Identities = 65/108 (60%), Positives = 81/108 (74%)

Query: 32  GGIYDADLNDEWVQRALHFAISEYNKATEDEYYRRPLQVLRAREQTFGGVNYFFDVEVGR 91
           GG  DA + +E V+RAL FA+ EYNKA+ D Y+ R LQV+RAR+Q   GVNYF DVE+GR
Sbjct: 1   GGPMDASVEEEGVRRALDFAVGEYNKASNDMYHSRALQVVRARKQIVAGVNYFLDVELGR 60

Query: 92  TICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRMSLVNSRCQ 139
           T CTK+QPNLD C FH+QP L++K  CSF+IY VPW+  M+L  S CQ
Sbjct: 61  TTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTMTLSKSTCQ 108

 Score = 103 bits (254), Expect = 4e-24
 Identities = 43/68 (63%), Positives = 52/68 (76%)

Query: 142 RAREQTFGGVNYFFDVEVGRTICTKSQPNLDTCAFHEQPELQKKQLCSFEIYEVPWEDRM 201
           RAR+Q   GVNYF DVE+GRT CTK+QPNLD C FH+QP L++K  CSF+IY VPW+  M
Sbjct: 41  RARKQIVAGVNYFLDVELGRTTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTM 100

Query: 202 SLVNSRCQ 209
           +L  S CQ
Sbjct: 101 TLSKSTCQ 108

>gi|1172449 [23..197] Purine and uridine phosphorylases
          Length = 175

 Score = 24.0 bits (52), Expect = 2.2
 Identities = 3/66 (4%), Positives = 14/66 (20%), Gaps = 10/66 (15%)

Query: 28  RIIPGGIYDAD----LNDEWVQRALH------FAISEYNKATEDEYYRRPLQVLRAREQT 77
               G +         + +    A         ++      + D +     ++ + +
Sbjct: 84  GYEKGQLPANPAAFLSDKKLADLAQEIAEKQGQSVKRGLICSGDSFINSEDKIAQIKADF 143

Query: 78  FGGVNY 83

Sbjct: 144 PNVTGV 149

>gi|999649 [1..162] Methane monooxygenase hydrolase, gamma subunit
          Length = 162

 Score = 23.2 bits (49), Expect = 3.8
 Identities = 10/28 (35%), Positives = 14/28 (49%), Gaps = 2/28 (7%)

Query: 33  GIYDADLNDEWVQRALHFAISEYNKATE 60
           GI+  D  D WV +  H  ++   KA E
Sbjct: 2   GIHSNDTRDAWVNKIAH--VNTLEKAAE 27

>gi|2463096 [30..336] Trypsin-like serine proteases
          Length = 307

 Score = 22.8 bits (48), Expect = 4.6
 Identities = 4/36 (11%), Positives = 6/36 (16%), Gaps = 2/36 (5%)

Query: 6   CTLLLLMATLAGALASSSKEENRIIPGGIYDADLND 41
           C +         A        +    G     D
Sbjct: 1   CGISDFEPEFPIAPKVEIGTNHIFK-GK-TIVDDKS 34

>gi|1707833 [43..332] Isoprenyl diphosphate synthases
          Length = 290

 Score = 22.5 bits (48), Expect = 5.5
 Identities = 12/71 (16%), Positives = 31/71 (42%), Gaps = 8/71 (11%)

Query: 6   CTLLLLMATLAGALASSSKEENRIIPGGIYDADLNDEWVQRALHFAIS---EYNKATEDE 62
            T+L++       L+ ++++E +I+   + + +   E ++RA         +Y      +
Sbjct: 201 KTILVIK-----TLSEATEDEKKILVSTLGNKEAKKEDLERASEIIRKHSLQYAYDLAKK 255

Query: 63  YYRRPLQVLRA 73
           Y    ++ LR
Sbjct: 256 YSDLAIENLRE 266

>gi|729418 [1..212] DNA-glycosylase
          Length = 212

 Score = 22.5 bits (48), Expect = 6.8
 Identities = 5/18 (27%), Positives = 6/18 (32%)

Query: 87  VEVGRTICTKSQPNLDTC 104
           +  GR  C    P    C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199

 Score = 22.5 bits (48), Expect = 6.8
 Identities = 5/18 (27%), Positives = 6/18 (32%)

Query: 157 VEVGRTICTKSQPNLDTC 174
           +  GR  C    P    C
Sbjct: 182 IFFGRYHCKAQSPRCAEC 199

>gi|2144317 [224..416] Lactate & malate dehydrogenases, C-terminal domain
          Length = 193

 Score = 22.0 bits (47), Expect = 9.2
 Identities = 13/68 (19%), Positives = 17/68 (24%), Gaps = 12/68 (17%)

Query: 13  ATL----AGALASSS------KEENRIIPGGIYDADLNDEWVQRALHFAISEYNKATEDE 62
           ATL    AG                +I     Y   L D         A          +
Sbjct: 85  ATLSMAHAGYKCVVQFVSLLLGNIEQIHG--TYYVPLKDANNFPIAPGADQLLPLVDGAD 142

Query: 63  YYRRPLQV 70
           Y+  PL +
Sbjct: 143 YFAIPLTI 150

  Underlying Matrix: BLOSUM62
  Number of sequences tested against query: 1187
  Number of sequences better than 10.0: 7
  Number of calls to ALIGN: 9
  Length of query: 210
  Total length of test sequences: 256703
  Effective length of test sequences: 210706.0
  Effective search space size: 36199790.6
  Initial X dropoff for ALIGN: 25.0 bits
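Several SAPS figures earlier in this report follow from simple counting over the 141-residue sequence. These are the residue counts and percentages, the charge string, the spacings H2N-5-C-87-...-COOH, and the transmembrane-type segment 5-20 with score 42.00. The Python sketch below recomputes them as a minimal illustration, not as SAPS's own code. The function names are hypothetical. The weights are the ones SAPS prints under "High scoring transmembrane segments". The segment search is a plain maximum-sum (Kadane) scan, which simplifies SAPS's significance-tested segment scoring.

```python
from collections import Counter

# The 141-residue query in the 10-residue blocks printed by SAPS.
SEQ = "".join([
    "MARPLCTLLL", "LMATLAGALA", "SSSKEENRII", "PGGIYDADLN", "DEWVQRALHF",
    "AISEYNKATE", "DEYYRRPLQV", "LRAREQTFGG", "VNYFFDVEVG", "RTICTKSQPN",
    "LDTCAFHEQP", "ELQKKQLCSF", "EIYEVPWEDR", "MSLVNSRCQE", "A",
])

# Per-residue weights as listed under "High scoring transmembrane segments".
TM_WEIGHTS = {
    aa: weight
    for group, weight in [("LVIF", 5), ("AGM", 2), ("BZX", 0), ("YCW", -1),
                          ("ST", -2), ("P", -6), ("H", -8), ("NQ", -10),
                          ("KR", -16), ("ED", -17)]
    for aa in group
}


def composition(seq: str) -> dict:
    """Map each residue to (count, percentage rounded to one decimal)."""
    counts = Counter(seq)
    return {aa: (n, round(100 * n / len(seq), 1)) for aa, n in counts.items()}


def charge_string(seq: str) -> str:
    """'+' for K/R, '-' for D/E, '0' for every other residue."""
    return "".join("+" if aa in "KR" else "-" if aa in "DE" else "0" for aa in seq)


def spacing(seq: str, marks: str = "C") -> str:
    """Gap lengths between marked residues, written H2N-...-COOH."""
    parts, last = ["H2N"], 0
    for pos, aa in enumerate(seq, start=1):
        if aa in marks:
            parts += [str(pos - last - 1), aa]
            last = pos
    parts += [str(len(seq) - last), "COOH"]
    return "-".join(parts)


def best_segment(seq: str, weights: dict = TM_WEIGHTS) -> tuple:
    """Maximum-sum segment by a Kadane scan: (start, end, score), 1-based."""
    best = (0, 0, float("-inf"))
    run_start, run_sum = 1, 0
    for pos, aa in enumerate(seq, start=1):
        if run_sum <= 0:  # a non-positive prefix can only lower the total
            run_start, run_sum = pos, 0
        run_sum += weights[aa]
        if run_sum > best[2]:
            best = (run_start, pos, run_sum)
    return best


if __name__ == "__main__":
    print(composition(SEQ)["L"])   # (15, 10.6)
    print(spacing(SEQ))            # H2N-5-C-87-C-9-C-13-C-19-C-3-COOH
    print(spacing(SEQ, "CH"))      # H2N-5-C-42-H-44-C-9-C-2-H-10-C-19-C-3-COOH
    print(best_segment(SEQ))       # (5, 20, 42)
```

The sketch reports only the raw segment score. SAPS additionally compares that score with the M_0.01, M_0.05 and M_0.30 thresholds it prints before flagging a segment.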