analysis of sequence from tem38 ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRICVCDNGKVLC DDVICDETKNCPGAEVPEGECCPVCPDGSESPTDQETTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGL PGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGGISVPGPMGPSGPRGLPGPPGAPGPQGFQGPPGEPGE PGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDA GPAGPKGEPGSPGENGAPGQMGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVG AKGEAGPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGANGAPGIAGAPGFPGARGPSGP QGPGGPPGPKGNSGEPGAPGSKGDTGAKGEPGPVGVQGPPGPAGEEGKRGARGEPGPTGLPGPPGERGGP GSRGFPGADGVAGPKGPAGERGSPGPAGPKGSPGEAGRPGEAGLPGAKGLTGSPGSPGPDGKTGPPGPAG QDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQGPPGPAGPAGE RGEQGPAGSPGFQGLPGPAGPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGAN GAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGSPGKDGVRGLTGPIG PPGPAGAPGDKGESGPSGPAGPTGARGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP PGPAGPAGPPGPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKEGGKGPR GETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGAPGTPGPQGIAGQRGVVGLPGQRGERGFPGLPGPSG EPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPAAEGSPGRDGSPGAKGDRGETGPAGPPGAPGA PGAPGPVGPAGKSGDRGETGPAGPAGPVGPVGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPP GPPGSPGEQGPSGASGPAGPRGPPGSAGAPGKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGPPGPPG PPSAGFDFSFLPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEGSRKNPARTCR DLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCVYPTQPSVAQKNWYISKNPKDKRHVWFGESM TDGFQFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNLKKALLLKGSNEIEIRA EGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPIIDVAPLDVGAPDQEFGFDVGPVCFL ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ sec.str. with predator > tem38_gi|1418928|emb|CAA98968.1| . . . . . 1 MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDR 50 ___HHHHHHHHHHHHHHHHH______________________________ . . . . . 
51 DVWKPEPCRICVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVCPDGSE 100 ________EEEEE_____EEE_EEE_________________________ . . . . . 101 SPTDQETTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPP 150 __________________________________________________ . . . . . 151 GPPGLGGNFAPQLSYGYDEKSTGGISVPGPMGPSGPRGLPGPPGAPGPQG 200 __________________________________________________ . . . . . 201 FQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 250 __________________________________________________ . . . . . 251 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQ 300 __________________________________________________ . . . . . 301 MGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVG 350 __________________________________________________ . . . . . 351 AKGEAGPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGAN 400 __________________________________________________ . . . . . 401 GAPGIAGAPGFPGARGPSGPQGPGGPPGPKGNSGEPGAPGSKGDTGAKGE 450 __________________________________________________ . . . . . 451 PGPVGVQGPPGPAGEEGKRGARGEPGPTGLPGPPGERGGPGSRGFPGADG 500 __________________________________________________ . . . . . 501 VAGPKGPAGERGSPGPAGPKGSPGEAGRPGEAGLPGAKGLTGSPGSPGPD 550 __________________________________________________ . . . . . 551 GKTGPPGPAGQDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGV 600 __________________________________________________ . . . . . 601 PGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGFQGLPGPAG 650 __________________________________________________ . . . . . 651 PPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGAN 700 __________________________________________________ . . . . . 701 GAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGP 750 __________________________________________________ . . . . . 751 KGADGSPGKDGVRGLTGPIGPPGPAGAPGDKGESGPSGPAGPTGARGAPG 800 __________________________________________________ . . . . . 801 DRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPP 850 __________________________________________________ . . . . . 
851 GPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGP 900 __________________________________________________ . . . . . 901 AGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGAPGTPG 950 __________________________________________________ . . . . . 951 PQGIAGQRGVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPM 1000 ________EEEE______________________________________ . . . . . 1001 GPPGLAGPPGESGREGAPAAEGSPGRDGSPGAKGDRGETGPAGPPGAPGA 1050 __________________________________________________ . . . . . 1051 PGAPGPVGPAGKSGDRGETGPAGPAGPVGPVGARGPAGPQGPRGDKGETG 1100 __________________________________________________ . . . . . 1101 EQGDRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGAP 1150 __________________________________________________ . . . . . 1151 GKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGPPGPPGPPSAGFDFSF 1200 __________________________________________________ . . . . . 1201 LPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEG 1250 ______________EEE_____EEE____HHHHHHHHHHHHHHHH_____ . . . . . 1251 SRKNPARTCRDLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCV 1300 _______HHHHHHH_________EEE_________EEEEEEE_____EEE . . . . . 1301 YPTQPSVAQKNWYISKNPKDKRHVWFGESMTDGFQFEYGGQGSDPADVAI 1350 E_______EEEEEE_______EEEEE________EEE_________HHHH . . . . . 1351 QLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNLKKALLLKGSNEIEIRA 1400 HHHHHHHHHHHHHHEEEEEE_____________HHHHHHH____EEEEEE . . . . . 1401 EGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPIIDVAPLDVGAP 1450 _____EEEEEEE____________EEEEEE_______EEEEEE_______ . 
1451 DQEFGFDVGPVCFL 1464 ______________
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
method : 1
alpha-contents : 0.0 %
beta-contents : 0.0 %
coil-contents : 100.0 %
class : irregular

method : 2
alpha-contents : 0.0 %
beta-contents : 0.0 %
coil-contents : 100.0 %
class : irregular
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
GPI: learning from metazoa
-16.14 -1.94 -1.37 -2.18 0.00 0.00 0.00 0.00 -0.48 -2.13 -1.80 -12.00 -12.00 0.00 0.00 0.00 -50.05
-18.84 -0.22 -0.33 0.00 0.00 0.00 0.00 0.00 -0.85 -2.10 -1.80 -12.00 -12.00 0.00 0.00 0.00 -48.15
ID: tem38_gi|1418928|emb|CAA98968.1| AC: xxx Len: 1400 1:I 1373 Sc: -48.15 Pv: 2.046476e-01 NO_GPI_SITE

GPI: learning from protozoa
-26.23 -2.20 -1.13 -0.72 -4.00 0.00 0.00 0.00 -0.08 -2.00 -7.07 -12.00 -12.00 0.00 0.00 0.00 -67.42
-24.64 -1.30 -1.78 -0.22 -4.00 0.00 0.00 0.00 -0.04 -2.20 -7.07 -12.00 -12.00 0.00 0.00 0.00 -65.26
ID: tem38_gi|1418928|emb|CAA98968.1| AC: xxx Len: 1400 1:I 1371 Sc: -65.26 Pv: 2.831094e-01 NO_GPI_SITE
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
# SignalP euk predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
tem38_gi|14 0.931 23 Y 0.884 23 Y 0.990 10 Y 0.921 Y
# SignalP gram- predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
tem38_gi|14 0.574 589 Y 0.485 23 Y 0.995 9 Y 0.789 Y
# SignalP gram+ predictions
# name Cmax pos ? Ymax pos ? Smax pos ? Smean ?
tem38_gi|14 0.683 382 Y 0.334 1366 N 0.998 10 Y 0.083 N ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ low complexity regions: SEG 12 2.2 2.5 >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] 1-6 MFSFVD lrlllllaatallt 7-20 21-21 H gqeegqvegq 22-31 32-111 DEDIPPITCVQNGLRYHDRDVWKPEPCRIC VCDNGKVLCDDVICDETKNCPGAEVPEGEC CPVCPDGSESPTDQETTGVE gpkgdtgprgprgpagppgrdgipgqpglp 112-157 gppgppgppgppglgg 158-177 NFAPQLSYGYDEKSTGGISV pgpmgpsgprglpgppgapgpqgfqgppge 178-230 pgepgasgpmgprgppgppgkng 231-232 DD geagkpgrpgergppgpqgarglpg 233-257 258-271 TAGLPGMKGHRGFS gldgakgdagpagpkgepgspgeng 272-296 297-301 APGQM gprglpgergrpgapgpagargndgatgaa 302-353 gppgptgpagppgfpgavgakg 354-364 EAGPQGPRGSE gpqgvrgepgppgpagaagpagnpgadgqp 365-437 gakgangapgiagapgfpgargpsgpqgpg gppgpkgnsgepg 438-447 APGSKGDTGA kgepgpvgvqgppgpageegkrgargepgp 448-497 tglpgppgerggpgsrgfpg 498-511 ADGVAGPKGPAGER gspgpagpkgspgeag 512-527 528-537 RPGEAGLPGA kgltgspgspgpdgktgppgpagqdgrpgp 538-578 pgppgargqag 579-582 VMGF pgpkgaagepgkage 583-597 598-598 R gvpgppgavgpag 599-611 612-613 KD geagaqgppgpagpagergeqgpag 614-638 639-639 S pgfqglpgpagppgeagkpgeqg 640-662 663-685 VPGDLGAPGPSGARGERGFPGER gvqgppgpagprgangapgndgakgdagap 686-725 gapgsqgapg 726-766 LQGMPGERGAAGLPGPKGDRGDAGPKGADG SPGKDGVRGLT gpigppgpagapg 767-779 780-781 DK gesgpsgpagptgargapgdrgepgppgpa 782-861 gfagppgadgqpgakgepgdagakgdagpp gpagpagppgpignvgapga 862-865 KGAR gsagppgatgfpgaagrvgppgpsgnagpp 866-956 gppgpagkeggkgprgetgpagrpgevgpp gppgpagekgspgadgpagapgtpgpqgia g 957-991 QRGVVGLPGQRGERGFPGLPGPSGEPGKQG PSGAS gergppgpmgppglagppg 992-1010 1011-1039 ESGREGAPAAEGSPGRDGSPGAKGDRGET gpagppgapgapgapgpvgpag 1040-1061 1062-1069 KSGDRGET gpagpagpvgpvgargpagpqgprg 1070-1094 1095-1117 DKGETGEQGDRGIKGHRGFSGLQ gppgppgspgeqgpsgasgpagprgppgsa 1118-1151 gapg 1152-1152 K dglnglpgpigppgprgrtgdagpvgppgp 1153-1192 pgppgppgpp 1193-1216 SAGFDFSFLPQPPQEKAHDGGRYY raddanvvrdrd 1217-1228 1229-1464 LEVDTTLKSLSQQIENIRSPEGSRKNPART 
CRDLKMCHSDWKSGEYWIDPNQGCNLDAIK VFCNMETGETCVYPTQPSVAQKNWYISKNP KDKRHVWFGESMTDGFQFEYGGQGSDPADV AIQLTFLRLMSTEASQNITYHCKNSVAYMD QQTGNLKKALLLKGSNEIEIRAEGNSRFTY SVTVDGCTSHTGAWGKTVIEYKTTKSSRLP IIDVAPLDVGAPDQEFGFDVGPVCFL low complexity regions: SEG 25 3.0 3.3 >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] 1-6 MFSFVD lrlllllaatallt 7-20 21-81 HGQEEGQVEGQDEDIPPITCVQNGLRYHDR DVWKPEPCRICVCDNGKVLCDDVICDETKN C pgaevpegeccpvcpdgsesptdqettgve 82-166 gpkgdtgprgprgpagppgrdgipgqpglp gppgppgppgppglggnfapqlsyg 167-172 YDEKST ggisvpgpmgpsgprglpgppgapgpqgfq 173-1195 gppgepgepgasgpmgprgppgppgkngdd geagkpgrpgergppgpqgarglpgtaglp gmkghrgfsgldgakgdagpagpkgepgsp gengapgqmgprglpgergrpgapgpagar gndgatgaagppgptgpagppgfpgavgak geagpqgprgsegpqgvrgepgppgpagaa gpagnpgadgqpgakgangapgiagapgfp gargpsgpqgpggppgpkgnsgepgapgsk gdtgakgepgpvgvqgppgpageegkrgar gepgptglpgppgerggpgsrgfpgadgva gpkgpagergspgpagpkgspgeagrpgea glpgakgltgspgspgpdgktgppgpagqd grpgppgppgargqagvmgfpgpkgaagep gkagergvpgppgavgpagkdgeagaqgpp gpagpagergeqgpagspgfqglpgpagpp geagkpgeqgvpgdlgapgpsgargergfp gergvqgppgpagprgangapgndgakgda gapgapgsqgapglqgmpgergaaglpgpk gdrgdagpkgadgspgkdgvrgltgpigpp gpagapgdkgesgpsgpagptgargapgdr gepgppgpagfagppgadgqpgakgepgda gakgdagppgpagpagppgpignvgapgak gargsagppgatgfpgaagrvgppgpsgna gppgppgpagkeggkgprgetgpagrpgev gppgppgpagekgspgadgpagapgtpgpq giagqrgvvglpgqrgergfpglpgpsgep gkqgpsgasgergppgpmgppglagppges gregapaaegspgrdgspgakgdrgetgpa gppgapgapgapgpvgpagksgdrgetgpa gpagpvgpvgargpagpqgprgdkgetgeq gdrgikghrgfsglqgppgppgspgeqgps gasgpagprgppgsagapgkdglnglpgpi gppgprgrtgdagpvgppgppgppgppgpp sag 1196-1464 FDFSFLPQPPQEKAHDGGRYYRADDANVVR DRDLEVDTTLKSLSQQIENIRSPEGSRKNP ARTCRDLKMCHSDWKSGEYWIDPNQGCNLD AIKVFCNMETGETCVYPTQPSVAQKNWYIS KNPKDKRHVWFGESMTDGFQFEYGGQGSDP ADVAIQLTFLRLMSTEASQNITYHCKNSVA YMDQQTGNLKKALLLKGSNEIEIRAEGNSR FTYSVTVDGCTSHTGAWGKTVIEYKTTKSS RLPIIDVAPLDVGAPDQEFGFDVGPVCFL low complexity regions: SEG 45 3.4 3.75 >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo 
sapiens] 1-95 MFSFVDLRLLLLLAATALLTHGQEEGQVEG QDEDIPPITCVQNGLRYHDRDVWKPEPCRI CVCDNGKVLCDDVICDETKNCPGAEVPEGE CCPVC pdgsesptdqettgvegpkgdtgprgprgp 96-1195 agppgrdgipgqpglpgppgppgppgppgl ggnfapqlsygydekstggisvpgpmgpsg prglpgppgapgpqgfqgppgepgepgasg pmgprgppgppgkngddgeagkpgrpgerg ppgpqgarglpgtaglpgmkghrgfsgldg akgdagpagpkgepgspgengapgqmgprg lpgergrpgapgpagargndgatgaagppg ptgpagppgfpgavgakgeagpqgprgseg pqgvrgepgppgpagaagpagnpgadgqpg akgangapgiagapgfpgargpsgpqgpgg ppgpkgnsgepgapgskgdtgakgepgpvg vqgppgpageegkrgargepgptglpgppg erggpgsrgfpgadgvagpkgpagergspg pagpkgspgeagrpgeaglpgakgltgspg spgpdgktgppgpagqdgrpgppgppgarg qagvmgfpgpkgaagepgkagergvpgppg avgpagkdgeagaqgppgpagpagergeqg pagspgfqglpgpagppgeagkpgeqgvpg dlgapgpsgargergfpgergvqgppgpag prgangapgndgakgdagapgapgsqgapg lqgmpgergaaglpgpkgdrgdagpkgadg spgkdgvrgltgpigppgpagapgdkgesg psgpagptgargapgdrgepgppgpagfag ppgadgqpgakgepgdagakgdagppgpag pagppgpignvgapgakgargsagppgatg fpgaagrvgppgpsgnagppgppgpagkeg gkgprgetgpagrpgevgppgppgpagekg spgadgpagapgtpgpqgiagqrgvvglpg qrgergfpglpgpsgepgkqgpsgasgerg ppgpmgppglagppgesgregapaaegspg rdgspgakgdrgetgpagppgapgapgapg pvgpagksgdrgetgpagpagpvgpvgarg pagpqgprgdkgetgeqgdrgikghrgfsg lqgppgppgspgeqgpsgasgpagprgppg sagapgkdglnglpgpigppgprgrtgdag pvgppgppgppgppgppsag 1196-1464 FDFSFLPQPPQEKAHDGGRYYRADDANVVR DRDLEVDTTLKSLSQQIENIRSPEGSRKNP ARTCRDLKMCHSDWKSGEYWIDPNQGCNLD AIKVFCNMETGETCVYPTQPSVAQKNWYIS KNPKDKRHVWFGESMTDGFQFEYGGQGSDP ADVAIQLTFLRLMSTEASQNITYHCKNSVA YMDQQTGNLKKALLLKGSNEIEIRAEGNSR FTYSVTVDGCTSHTGAWGKTVIEYKTTKSS RLPIIDVAPLDVGAPDQEFGFDVGPVCFL low complexity regions: XNU # Score cutoff = 21, Search from offsets 1 to 4 # both members of each repeat flagged # lambda = 0.347, K = 0.200, H = 0.664 >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] MFSFVDLRlllllaatallTHgqeegqvegqdeDIPPITCVQNGLRYHDRDVWKPEPCRI CVCDNGKVLCDDVICDETKNCPGAEVPEGECcpvcpdgsesptdqettgvegpkgdtgpr gprgpagppgrdgipgqpglpgppgppgppgppgLGGNFAPQLSYGYDEKSTGGISVPgp 
mgpsgprglpgppgapgpqgfqgppgepgepgasgpmgprgppgppgkngddgeagkpgr pgergppgpqgarglpgtaglpgmkghrgfsgldgakgdagpagpkgepgspgengapgq mgprglpgergrpgapgpagargndgatgaagppgptgpagppgfpgavgakgeagpqgp rgsegpqgvrgepgppgpagaagpagnpgadgqpgakgangapgiagapgfpgargpsgp qgpggppgpkgnsgepgapgskgdtgakgepgpvgvqgppgpageegkrgargepgptgl pgppgerggpgsrgfpgadgvagpkgpagergspgpagpkgspgeagrpgeaglpgakgl tgspgspgpdgktgppgpagqdgrpgppgppgargqagvmgfpgpkgaagepgkagergv pgppgavgpagkdgeagaqgppgpagpagergeqgpagspgfqglpgpagppgeagkpge qgvpgdlgapgpsgargergfpgergvqgppgpagprgangapgndgakgdagapgapgs qgapglqgmpgergaaglpgpkgdrgdagpkgadgspgkdgvrgltgpigppgpagapgd kgesgpsgpagptgargapgdrgepgppgpagfagppgadgqpgakgepgdagakgdagp pgpagpagppgpignvgapgakgargsagppgatgfpgaagrvgppgpsgnagppgppgp agkeggkgprgetgpagrpgevgppgppgpagekgspgadgpagapgtpgpqgiagqrgv vglpgqrgergfpglpgpsgepgkqgpsgasgergppgpmgppglagppgesgregapaa egspgrdgspgakgdrgetgpagppgapgapgapgpvgpagksgdrgetgpagpagpvgp vgargpagpqgprgdkgetgeqgdrgikghrgfsglqgppgppgspgeqgpsgasgpagp rgppgsagapgkdglnglpgpigppgprgrtgdagpvgppgppgppgppgppsagfdfsf LPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEGSRKNPARTCR DLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCVYPTQPSVAQKNWYISKNPKD KRHVWFGESMTDGFQFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQ TGNLKKALLLKGSNEIEIRAEGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPII DVAPLDVGAPDQEFGFDVGPVCFL 1 - 8 MFSFVDLR 9 - 19 ll lllaatall 20 - 21 T H 22 - 33 gqeegqveg qde 34 - 91 DIPPITC VQNGLRYHDR DVWKPEPCRI CVCDNGKVLC DDVICDETKN CPGAEVPEGE C 92 - 154 cpvcpdgse sptdqettgv egpkgdtgpr gprgpagppg rdgipgqpgl pgppgppgpp g ppg 155 - 178 LGGNFA PQLSYGYDEK STGGISVP 179 - 1200 gp mgpsgprglp gppgapgpqg fqgppgepge pgasgpmgpr gppgppgkng ddgeagkp gr pgergppgpq garglpgtag lpgmkghrgf sgldgakgda gpagpkgepg spgengap gq mgprglpger grpgapgpag argndgatga agppgptgpa gppgfpgavg akgeagpq gp rgsegpqgvr gepgppgpag aagpagnpga dgqpgakgan gapgiagapg fpgargps gp qgpggppgpk gnsgepgapg skgdtgakge pgpvgvqgpp gpageegkrg argepgpt gl pgppgerggp gsrgfpgadg vagpkgpage rgspgpagpk gspgeagrpg eaglpgak gl tgspgspgpd 
gktgppgpag qdgrpgppgp pgargqagvm gfpgpkgaag epgkager gv pgppgavgpa gkdgeagaqg ppgpagpage rgeqgpagsp gfqglpgpag ppgeagkp ge qgvpgdlgap gpsgargerg fpgergvqgp pgpagprgan gapgndgakg dagapgap gs qgapglqgmp gergaaglpg pkgdrgdagp kgadgspgkd gvrgltgpig ppgpagap gd kgesgpsgpa gptgargapg drgepgppgp agfagppgad gqpgakgepg dagakgda gp pgpagpagpp gpignvgapg akgargsagp pgatgfpgaa grvgppgpsg nagppgpp gp agkeggkgpr getgpagrpg evgppgppgp agekgspgad gpagapgtpg pqgiagqr gv vglpgqrger gfpglpgpsg epgkqgpsga sgergppgpm gppglagppg esgregap aa egspgrdgsp gakgdrgetg pagppgapga pgapgpvgpa gksgdrgetg pagpagpv gp vgargpagpq gprgdkgetg eqgdrgikgh rgfsglqgpp gppgspgeqg psgasgpa gp rgppgsagap gkdglnglpg pigppgprgr tgdagpvgpp gppgppgppg ppsagfdf sf 1201 - 1464 LPQPPQEKAH DGGRYYRADD ANVVRDRDLE VDTTLKSLSQ QIENIRSPEG SRKNPARTCR DLKMCHSDWK SGEYWIDPNQ GCNLDAIKVF CNMETGETCV YPTQPSVAQK NWYISKNPKD KRHVWFGESM TDGFQFEYGG QGSDPADVAI QLTFLRLMST EASQNITYHC KNSVAYMDQQ TGNLKKALLL KGSNEIEIRA EGNSRFTYSV TVDGCTSHTG AWGKTVIEYK TTKSSRLPII DVAPLDVGAP DQEFGFDVGP VCFL low complexity regions: DUST >tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRI CVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVCPDGSESPTDQETTGVEGPKGDTGPR GPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGGISVPGP MGPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGR PGERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQ MGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVGAKGEAGPQGP RGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGANGAPGIAGAPGFPGARGPSGP QGPGGPPGPKGNSGEPGAPGSKGDTGAKGEPGPVGVQGPPGPAGEEGKRGARGEPGPTGL PGPPGERGGPGSRGFPGADGVAGPKGPAGERGSPGPAGPKGSPGEAGRPGEAGLPGAKGL TGSPGSPGPDGKTGPPGPAGQDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGV PGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGFQGLPGPAGPPGEAGKPGE QGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGS QGAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGSPGKDGVRGLTGPIGPPGPAGAPGD KGESGPSGPAGPTGARGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP 
PGPAGPAGPPGPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGP AGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGAPGTPGPQGIAGQRGV VGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPAA EGSPGRDGSPGAKGDRGETGPAGPPGAPGAPGAPGPVGPAGKSGDRGETGPAGPAGPVGP VGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGP RGPPGSAGAPGKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGPPGPPGPPSAGFDFSF LPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEGSRKNPARTCR DLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCVYPTQPSVAQKNWYISKNPKD KRHVWFGESMTDGFQFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQ TGNLKKALLLKGSNEIEIRAEGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPII DVAPLDVGAPDQEFGFDVGPVCFL ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ coiled coil prediction for tem38_gi|1418928|emb|CAA98968.1| sequence: 1400 amino acids, 0 residue(s) in coiled coil state . | . | . | . | . | . 60 MFSFVDLRLL LLLAATALLT HGQEEGQVEG QDEDIPPITC VQNGLRYHDR DVWKPEPCRI ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 120 CVCDNGKVLC DDVICDETKN CPGAEVPEGE CCPVCPDGSE SPTDQETTGV EGPKGDTGPR ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. 
~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 180 GPRGPAGPPG RDGIPGQPGL PGPPGPPGPP GPPGLGGNFA PQLSYGYDEK STGGISVPGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 240 MGPSGPRGLP GPPGAPGPQG FQGPPGEPGE PGASGPMGPR GPPGPPGKNG DDGEAGKPGR ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 300 PGERGPPGPQ GARGLPGTAG LPGMKGHRGF SGLDGAKGDA GPAGPKGEPG SPGENGAPGQ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
360 MGPRGLPGER GRPGAPGPAG ARGNDGATGA AGPPGPTGPA GPPGFPGAVG AKGEAGPQGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 420 RGSEGPQGVR GEPGPPGPAG AAGPAGNPGA DGQPGAKGAN GAPGIAGAPG FPGARGPSGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 480 QGPGGPPGPK GNSGEPGAPG SKGDTGAKGE PGPVGVQGPP GPAGEEGKRG ARGEPGPTGL ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
540 PGPPGERGGP GSRGFPGADG VAGPKGPAGE RGSPGPAGPK GSPGEAGRPG EAGLPGAKGL ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 600 TGSPGSPGPD GKTGPPGPAG QDGRPGPPGP PGARGQAGVM GFPGPKGAAG EPGKAGERGV ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 660 PGPPGAVGPA GKDGEAGAQG PPGPAGPAGE RGEQGPAGSP GFQGLPGPAG PPGEAGKPGE ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
720 QGVPGDLGAP GPSGARGERG FPGERGVQGP PGPAGPRGAN GAPGNDGAKG DAGAPGAPGS ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 780 QGAPGLQGMP GERGAAGLPG PKGDRGDAGP KGADGSPGKD GVRGLTGPIG PPGPAGAPGD ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 840 KGESGPSGPA GPTGARGAPG DRGEPGPPGP AGFAGPPGAD GQPGAKGEPG DAGAKGDAGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
900 PGPAGPAGPP GPIGNVGAPG AKGARGSAGP PGATGFPGAA GRVGPPGPSG NAGPPGPPGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 960 AGKEGGKGPR GETGPAGRPG EVGPPGPPGP AGEKGSPGAD GPAGAPGTPG PQGIAGQRGV ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 1020 VGLPGQRGER GFPGLPGPSG EPGKQGPSGA SGERGPPGPM GPPGLAGPPG ESGREGAPAA ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
1080 EGSPGRDGSP GAKGDRGETG PAGPPGAPGA PGAPGPVGPA GKSGDRGETG PAGPAGPVGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 1140 VGARGPAGPQ GPRGDKGETG EQGDRGIKGH RGFSGLQGPP GPPGSPGEQG PSGASGPAGP ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 1200 RGPPGSAGAP GKDGLNGLPG PIGPPGPRGR TGDAGPVGPP GPPGPPGPPG PPSAGFDFSF ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 
1260 LPQPPQEKAH DGGRYYRADD ANVVRDRDLE VDTTLKSLSQ QIENIRSPEG SRKNPARTCR ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~4 4467777777 7777777~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 1320 DLKMCHSDWK SGEYWIDPNQ GCNLDAIKVF CNMETGETCV YPTQPSVAQK NWYISKNPKD ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . | . | . | . | . 1380 KRHVWFGESM TDGFQFEYGG QGSDPADVAI QLTFLRLMST EASQNITYHC KNSVAYMDQQ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 -w border ---------- ---------- ---------- ---------- ---------- ---------- * 21 M'95 -w register ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 M'95 +w polar ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 21 MTK -w class ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 28 M'95 -w signif. ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ * 14 M'95 -w local . | . 
| TGNLKKALLL KGSNEIEIRA ~~~~~~~~~~ ~~~~~~~~~~ ---------- ---------- ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ prediction of transmembrane regions with toppred2 *********************************** *TOPPREDM with eukaryotic function* *********************************** tem38.___inter___ is a single sequence Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok Using sequence file: tem38.___inter___ (1 sequences) MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDR DVWKPEPCRICVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVCPDGSE SPTDQETTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPP GPPGLGGNFAPQLSYGYDEKSTGGISVPGPMGPSGPRGLPGPPGAPGPQG FQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQ MGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVG AKGEAGPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGAN GAPGIAGAPGFPGARGPSGPQGPGGPPGPKGNSGEPGAPGSKGDTGAKGE PGPVGVQGPPGPAGEEGKRGARGEPGPTGLPGPPGERGGPGSRGFPGADG VAGPKGPAGERGSPGPAGPKGSPGEAGRPGEAGLPGAKGLTGSPGSPGPD GKTGPPGPAGQDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGV PGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGFQGLPGPAG PPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGAN GAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGP KGADGSPGKDGVRGLTGPIGPPGPAGAPGDKGESGPSGPAGPTGARGAPG DRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPP GPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGP AGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGAPGTPG PQGIAGQRGVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPM GPPGLAGPPGESGREGAPAAEGSPGRDGSPGAKGDRGETGPAGPPGAPGA PGAPGPVGPAGKSGDRGETGPAGPAGPVGPVGARGPAGPQGPRGDKGETG EQGDRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGAP GKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGPPGPPGPPSAGFDFSF LPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEG SRKNPARTCRDLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCV 
YPTQPSVAQKNWYISKNPKDKRHVWFGESMTDGFQFEYGGQGSDPADVAI QLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNLKKALLLKGSNEIEIRA EGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPIIDVAPLDVGAP DQEFGFDVGPVCFL (p)rokaryotic or (e)ukaryotic: e Charge-pair energy: 0 Length of full window (odd number!): 21 Length of core window (odd number!): 11 Number of residues to add to each end of helix: 1 Critical length: 60 Upper cutoff for candidates: 1 Lower cutoff for candidates: 0.6 Total of 8 structures are to be tested Candidate membrane-spanning segments: Helix Begin End Score Certainity 1 2 22 0.700 Putative 2 331 351 0.844 Putative 3 1041 1061 0.758 Putative ---------------------------------------------------------------------- Structure 1 Transmembrane segments included in this structure: Segment 1 2 3 Loop length 1 308 689 403 K+R profile 1.00 + + + CYT-EXT prof - 0.61 0.33 0.81 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 1.00 Tm probability: 0.06 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 5399089840598723119226988666434663743532463987515388551976903392781192631681024.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: -0.54 -> Orientation: N-in ---------------------------------------------------------------------- Structure 2 Transmembrane segments included in this structure: Segment 1 3 Loop length 1 1018 403 K+R profile 1.00 + + CYT-EXT prof - 0.81 0.56 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 1.00 Tm probability: 0.10 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 5399089840598723119226988666434663743532463987515388551976903392781192631681024.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: 0.26 -> Orientation: N-out ---------------------------------------------------------------------- Structure 3 Transmembrane segments included in this structure: Segment 1 2 Loop length 1 308 1113 K+R profile 1.00 + + CYT-EXT prof - 0.67 0.33 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 1.00 Tm probability: 0.15 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 339144483842856283565095402520707072.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: 0.34 -> Orientation: N-out ---------------------------------------------------------------------- Structure 4 Transmembrane segments included in this structure: Segment 1 Loop length 1 1442 K+R profile 1.00 + CYT-EXT prof - 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 1.00 Tm probability: 0.25 -> Orientation: N-in Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 0.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: -0.61 -> Orientation: N-in ---------------------------------------------------------------------- Structure 5 Transmembrane segments included in this structure: Segment 2 3 Loop length 330 689 403 K+R profile + + + CYT-EXT prof 0.31 0.81 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00 Tm probability: 0.24 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.1818 NEG: 39.0000 POS: 27.0000 -> Orientation: undecided CYT-EXT difference: 0.51 -> Orientation: N-out ---------------------------------------------------------------------- Structure 6 Transmembrane segments included in this structure: Segment 3 Loop length 1040 403 K+R profile + + CYT-EXT prof 0.55 0.81 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.40 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.0492 NEG: 96.0000 POS: 87.0000 -> Orientation: undecided CYT-EXT difference: -0.26 -> Orientation: N-in ---------------------------------------------------------------------- Structure 7 Transmembrane segments included in this structure: Segment 2 Loop length 330 1113 K+R profile + + CYT-EXT prof 0.31 0.67 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.61 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.1818 NEG: 39.0000 POS: 27.0000 -> Orientation: undecided CYT-EXT difference: -0.37 -> Orientation: N-in ---------------------------------------------------------------------- Structure 8 Transmembrane segments included in this structure: Segment Loop length 1464 K+R profile + CYT-EXT prof 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00 Tm probability: 1.00 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): -4.00 (NEG-POS)/(NEG+POS): 0.0444 NEG: 141.0000 POS: 129.0000 -> Orientation: N-out CYT-EXT difference: 0.61 -> Orientation: N-out ---------------------------------------------------------------------- "tem38" 1464 2 22 #f 0.7 331 351 #f 0.84375 1041 1061 #f 0.758333 ************************************ *TOPPREDM with prokaryotic function* ************************************ tem38.___inter___ is a single sequence Using hydrophobicity file: /bio_software/2D/toppredm/lib/Engelman-Steitz.scale Using cyt/ext file: /bio_software/2D/toppredm/lib/Cyt-Ext.prok Using sequence file: tem38.___inter___ (1 sequences) MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDR DVWKPEPCRICVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVCPDGSE SPTDQETTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPP GPPGLGGNFAPQLSYGYDEKSTGGISVPGPMGPSGPRGLPGPPGAPGPQG FQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQ MGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVG AKGEAGPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGAN GAPGIAGAPGFPGARGPSGPQGPGGPPGPKGNSGEPGAPGSKGDTGAKGE PGPVGVQGPPGPAGEEGKRGARGEPGPTGLPGPPGERGGPGSRGFPGADG VAGPKGPAGERGSPGPAGPKGSPGEAGRPGEAGLPGAKGLTGSPGSPGPD GKTGPPGPAGQDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGV PGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGFQGLPGPAG PPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGAN GAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGP KGADGSPGKDGVRGLTGPIGPPGPAGAPGDKGESGPSGPAGPTGARGAPG DRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPP GPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGP AGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGAPGTPG PQGIAGQRGVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPM GPPGLAGPPGESGREGAPAAEGSPGRDGSPGAKGDRGETGPAGPPGAPGA PGAPGPVGPAGKSGDRGETGPAGPAGPVGPVGARGPAGPQGPRGDKGETG EQGDRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGAP GKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGPPGPPGPPSAGFDFSF 
LPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEG SRKNPARTCRDLKMCHSDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCV YPTQPSVAQKNWYISKNPKDKRHVWFGESMTDGFQFEYGGQGSDPADVAI QLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNLKKALLLKGSNEIEIRA EGNSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPIIDVAPLDVGAP DQEFGFDVGPVCFL (p)rokaryotic or (e)ukaryotic: p Charge-pair energy: 0 Length of full window (odd number!): 21 Length of core window (odd number!): 11 Number of residues to add to each end of helix: 1 Critical length: 60 Upper cutoff for candidates: 1 Lower cutoff for candidates: 0.6 Total of 8 structures are to be tested Candidate membrane-spanning segments: Helix Begin End Score Certainity 1 2 22 0.700 Putative 2 331 351 0.844 Putative 3 1041 1061 0.758 Putative ---------------------------------------------------------------------- Structure 1 Transmembrane segments included in this structure: Segment 1 2 3 Loop length 1 308 689 403 K+R profile 0.00 + + + CYT-EXT prof - 0.61 0.33 0.81 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.06 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 5399089840598723119226988666434663743532463987515388551976903392781192631681024.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: -0.54 -> Orientation: N-in ---------------------------------------------------------------------- Structure 2 Transmembrane segments included in this structure: Segment 2 3 Loop length 330 689 403 K+R profile + + + CYT-EXT prof 0.31 0.81 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00 Tm probability: 0.24 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.1818 NEG: 39.0000 POS: 27.0000 -> Orientation: undecided CYT-EXT difference: 0.51 -> Orientation: N-out ---------------------------------------------------------------------- Structure 3 Transmembrane segments included in this structure: Segment 1 3 Loop length 1 1018 403 K+R profile 0.00 + + CYT-EXT prof - 0.81 0.56 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.10 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 5399089840598723119226988666434663743532463987515388551976903392781192631681024.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: 0.26 -> Orientation: N-out ---------------------------------------------------------------------- Structure 4 Transmembrane segments included in this structure: Segment 3 Loop length 1040 403 K+R profile + + CYT-EXT prof 0.55 0.81 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.40 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.0492 NEG: 96.0000 POS: 87.0000 -> Orientation: undecided CYT-EXT difference: -0.26 -> Orientation: N-in ---------------------------------------------------------------------- Structure 5 Transmembrane segments included in this structure: Segment 1 2 Loop length 1 308 1113 K+R profile 0.00 + + CYT-EXT prof - 0.67 0.33 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00 Tm probability: 0.15 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 339144483842856283565095402520707072.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: 0.34 -> Orientation: N-out ---------------------------------------------------------------------- Structure 6 Transmembrane segments included in this structure: Segment 2 Loop length 330 1113 K+R profile + + CYT-EXT prof 0.31 0.67 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.61 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 0.00 (NEG-POS)/(NEG+POS): 0.1818 NEG: 39.0000 POS: 27.0000 -> Orientation: undecided CYT-EXT difference: -0.37 -> Orientation: N-in ---------------------------------------------------------------------- Structure 7 Transmembrane segments included in this structure: Segment 1 Loop length 1 1442 K+R profile 0.00 + CYT-EXT prof - 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. K+R difference: 0.00 Tm probability: 0.25 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): 6.00 (NEG-POS)/(NEG+POS): 0.0000 NEG: 0.0000 POS: 0.0000 -> Orientation: N-in CYT-EXT difference: -0.61 -> Orientation: N-in ---------------------------------------------------------------------- Structure 8 Transmembrane segments included in this structure: Segment Loop length 1464 K+R profile + CYT-EXT prof 0.61 For CYT-EXT profile neg. values indicate cytoplasmic preference. 
K+R difference: 0.00 Tm probability: 1.00 -> Orientation: undecided Charge-difference over N-terminal Tm (+-15 residues): -4.00 (NEG-POS)/(NEG+POS): 0.0444 NEG: 141.0000 POS: 129.0000 -> Orientation: N-out CYT-EXT difference: 0.61 -> Orientation: N-out ---------------------------------------------------------------------- "tem38" 1464 2 22 #f 0.7 331 351 #f 0.84375 1041 1061 #f 0.758333 ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ NOW EXECUTING: /bio_software/1D/stat/saps/saps-stroh/SAPS.SSPA/saps /people/maria/tem38.___saps___ SAPS. Version of April 11, 1996. Date run: Tue Oct 31 18:34:55 2000 File: /people/maria/tem38.___saps___ ID tem38_gi|1418928|emb|CAA98968.1| DE prepro-alpha1(I) collagen [Homo sapiens] number of residues: 1464; molecular weight: 138.9 kdal 1 MFSFVDLRLL LLLAATALLT HGQEEGQVEG QDEDIPPITC VQNGLRYHDR DVWKPEPCRI 61 CVCDNGKVLC DDVICDETKN CPGAEVPEGE CCPVCPDGSE SPTDQETTGV EGPKGDTGPR 121 GPRGPAGPPG RDGIPGQPGL PGPPGPPGPP GPPGLGGNFA PQLSYGYDEK STGGISVPGP 181 MGPSGPRGLP GPPGAPGPQG FQGPPGEPGE PGASGPMGPR GPPGPPGKNG DDGEAGKPGR 241 PGERGPPGPQ GARGLPGTAG LPGMKGHRGF SGLDGAKGDA GPAGPKGEPG SPGENGAPGQ 301 MGPRGLPGER GRPGAPGPAG ARGNDGATGA AGPPGPTGPA GPPGFPGAVG AKGEAGPQGP 361 RGSEGPQGVR GEPGPPGPAG AAGPAGNPGA DGQPGAKGAN GAPGIAGAPG FPGARGPSGP 421 QGPGGPPGPK GNSGEPGAPG SKGDTGAKGE PGPVGVQGPP GPAGEEGKRG ARGEPGPTGL 481 PGPPGERGGP GSRGFPGADG VAGPKGPAGE RGSPGPAGPK GSPGEAGRPG EAGLPGAKGL 541 TGSPGSPGPD GKTGPPGPAG QDGRPGPPGP PGARGQAGVM GFPGPKGAAG EPGKAGERGV 601 PGPPGAVGPA GKDGEAGAQG PPGPAGPAGE RGEQGPAGSP GFQGLPGPAG PPGEAGKPGE 661 QGVPGDLGAP GPSGARGERG FPGERGVQGP PGPAGPRGAN GAPGNDGAKG DAGAPGAPGS 721 QGAPGLQGMP GERGAAGLPG PKGDRGDAGP KGADGSPGKD GVRGLTGPIG PPGPAGAPGD 781 KGESGPSGPA GPTGARGAPG DRGEPGPPGP AGFAGPPGAD GQPGAKGEPG DAGAKGDAGP 841 PGPAGPAGPP GPIGNVGAPG AKGARGSAGP PGATGFPGAA GRVGPPGPSG NAGPPGPPGP 901 AGKEGGKGPR GETGPAGRPG EVGPPGPPGP AGEKGSPGAD GPAGAPGTPG PQGIAGQRGV 961 VGLPGQRGER GFPGLPGPSG EPGKQGPSGA SGERGPPGPM GPPGLAGPPG ESGREGAPAA 1021 
EGSPGRDGSP GAKGDRGETG PAGPPGAPGA PGAPGPVGPA GKSGDRGETG PAGPAGPVGP 1081 VGARGPAGPQ GPRGDKGETG EQGDRGIKGH RGFSGLQGPP GPPGSPGEQG PSGASGPAGP 1141 RGPPGSAGAP GKDGLNGLPG PIGPPGPRGR TGDAGPVGPP GPPGPPGPPG PPSAGFDFSF 1201 LPQPPQEKAH DGGRYYRADD ANVVRDRDLE VDTTLKSLSQ QIENIRSPEG SRKNPARTCR 1261 DLKMCHSDWK SGEYWIDPNQ GCNLDAIKVF CNMETGETCV YPTQPSVAQK NWYISKNPKD 1321 KRHVWFGESM TDGFQFEYGG QGSDPADVAI QLTFLRLMST EASQNITYHC KNSVAYMDQQ 1381 TGNLKKALLL KGSNEIEIRA EGNSRFTYSV TVDGCTSHTG AWGKTVIEYK TTKSSRLPII 1441 DVAPLDVGAP DQEFGFDVGP VCFL -------------------------------------------------------------------------------- COMPOSITIONAL ANALYSIS (extremes relative to: swp23s) A :141( 9.6%); C : 18( 1.2%); D : 66( 4.5%); E : 75( 5.1%); F : 27( 1.8%) G++:390(26.6%); H : 9( 0.6%); I- : 24( 1.6%); K : 58( 4.0%); L--: 48( 3.3%) M : 13( 0.9%); N : 28( 1.9%); P++:278(19.0%); Q : 48( 3.3%); R : 71( 4.8%) S : 61( 4.2%); T- : 43( 2.9%); V- : 47( 3.2%); W : 6( 0.4%); Y- : 13( 0.9%) KR : 129 ( 8.8%); ED : 141 ( 9.6%); AGP ++: 809 ( 55.3%); KRED : 270 ( 18.4%); KR-ED : -12 ( -0.8%); FIKMNY- : 163 ( 11.1%); LVIFM --: 159 ( 10.9%); ST - : 104 ( 7.1%). 
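The counts and percentages in the compositional analysis above are consistent with simple per-residue counting over the 1464-residue sequence (for example, G: 390/1464 = 26.6%). The following Python sketch shows that arithmetic under stated assumptions: the function names `composition`, `group_count`, and `net_charge` are hypothetical, this is not SAPS's own implementation, and it does not reproduce the `+`/`-` flags that SAPS assigns relative to its swp23s reference set.

```python
from collections import Counter


def composition(seq):
    """Return {residue: (count, percent of sequence length)}."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    return {aa: (c, 100.0 * c / n) for aa, c in counts.items()}


def group_count(seq, letters):
    """Count residues belonging to a group such as 'KR', 'ED' or 'AGP'."""
    wanted = set(letters.upper())
    return sum(1 for aa in seq.upper() if aa in wanted)


def net_charge(seq):
    """KR minus ED, matching the sign convention of the 'KR-ED' entry."""
    return group_count(seq, "KR") - group_count(seq, "ED")


if __name__ == "__main__":
    # First ten residues of the sequence, used only as a small illustration.
    fragment = "MFSFVDLRLL"
    for aa, (count, pct) in sorted(composition(fragment).items()):
        print(f"{aa}: {count} ({pct:.1f}%)")
    print("KR-ED:", net_charge(fragment))
```

Applied to the full sequence, the same counting should give the group totals listed above (KR, ED, AGP, and so on); the sketch makes no attempt to judge which values are statistically extreme.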
-------------------------------------------------------------------------------- CHARGE DISTRIBUTIONAL ANALYSIS 1 00000-0+00 0000000000 000--000-0 0---000000 00000+00-+ -00+0-00+0 61 000-00+000 --000--0+0 0000-00-0- 000000-00- 000-0-0000 -00+0-000+ 121 00+0000000 +-00000000 0000000000 0000000000 0000000--+ 0000000000 181 000000+000 0000000000 000000-00- 000000000+ 0000000+00 --0-00+00+ 241 00-+000000 00+0000000 0000+00+00 000-00+0-0 00000+0-00 000-000000 301 000+0000-+ 0+00000000 0+00-00000 0000000000 0000000000 0+0-000000 361 +00-00000+ 0-00000000 0000000000 -00000+000 0000000000 0000+00000 421 000000000+ 0000-00000 0+0-000+0- 0000000000 0000--0++0 0+0-000000 481 00000-+000 00+00000-0 0000+0000- +00000000+ 0000-00+00 -000000+00 541 000000000- 0+00000000 0-0+000000 000+000000 00000+0000 -00+00-+00 601 0000000000 0+-0-00000 000000000- +0-0000000 0000000000 000-00+00- 661 00000-0000 00000+0-+0 000-+00000 000000+000 00000-00+0 -000000000 721 0000000000 0-+0000000 0+0-+0-000 +00-0000+- 00+0000000 000000000- 781 +0-0000000 00000+0000 -+0-000000 000000000- 00000+0-00 -000+0-000 841 0000000000 0000000000 0+00+00000 0000000000 0+00000000 0000000000 901 00+-00+00+ 0-00000+00 -000000000 00-+00000- 0000000000 0000000+00 961 000000+0-+ 0000000000 -00+000000 00-+000000 0000000000 -00+-00000 1021 -0000+-000 00+0-+0-00 0000000000 0000000000 0+00-+0-00 0000000000 1081 000+000000 00+0-+0-00 -00-+00+00 +000000000 0000000-00 0000000000 1141 +000000000 0+-0000000 0000000+0+ 00-0000000 0000000000 000000-000 1201 000000-+00 -00+00+0-- 0000+-+-0- 0-000+0000 00-00+00-0 0++000+00+ 1261 -0+0000-0+ 00-000-000 0000-00+00 000-00-000 000000000+ 00000+00+- 1321 ++00000-00 0-0000-000 000-00-000 00000+0000 -000000000 +000000-00 1381 0000++0000 +000-0-0+0 -000+00000 00-0000000 000+000-0+ 00+00+0000 1441 -0000-0000 -0-000-000 0000 A. CHARGE CLUSTERS. 
Positive charge clusters (cmin = 9/30 or 12/45 or 15/60): none
Negative charge clusters (cmin = 10/30 or 13/45 or 16/60): none
Mixed charge clusters (cmin = 15/30 or 20/45 or 24/60): none

B. HIGH SCORING (UN)CHARGED SEGMENTS.
There are no high scoring positive charge segments.
There are no high scoring negative charge segments.
There are no high scoring mixed charge segments.
There are no high scoring uncharged segments.

C. CHARGE RUNS AND PATTERNS.
pattern (+)| (-)| (*)| (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0     5 |  5 |  7 | 50 |  10 |  10 |  13 |  12 |  12 |  16 |   6 |   7 |
lmin1     6 |  6 |  9 | 60 |  12 |  12 |  16 |  15 |  15 |  20 |   7 |   9 |
lmin2     7 |  8 | 10 | 67 |  13 |  14 |  18 |  17 |  17 |  22 |   8 |  10 |
(Significance level: 0.010000; Minimal displayed length: 6)
There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:
+ runs >= 3: 0
- runs >= 3: 1, at 32;
* runs >= 5: 0
0 runs >= 33: 1, at 133;

--------------------------------------------------------------------------------
DISTRIBUTION OF OTHER AMINO ACID TYPES

1. HIGH SCORING SEGMENTS.
There are no high scoring hydrophobic segments.
There are no high scoring transmembrane segments.

2. SPACINGS OF C.
H2N-39-C-17-C-2-C-1-C-6-C-4-C-5-C-9-C-C-2-C-1163-C-5-C-16-C-8-C-7-C-70-C-44-C-46-C-2-COOH

2*. SPACINGS OF C and H. (additional deluxe function for ALEX)
H2N-20-H-18-C-7-H-9-C-2-C-1-C-6-C-4-C-5-C-9-C-C-2-C-171-H-842-H-99-H-48-C-5-C-H-15-C-8-C-7-C-23-H-45-H-C-44-C-2-H-43-C-2-COOH

--------------------------------------------------------------------------------
REPETITIVE STRUCTURES.

A. SEPARATED, TANDEM, AND PERIODIC REPEATS: amino acid alphabet.
Repeat core block length: 5 Aligned matching blocks: [ 112- 116] GPKGD [ 740- 744] GPKGD ______________________________ [ 114- 118] KGDTG [ 442- 446] KGDTG ______________________________ [ ]--------[ ]--------[ 118- 122]-( -5)-[ 118- 125]-( -8)- [ ]--------[ ]--------[ 121- 125]-( -5)-[ 121- 128]-------- [ 206- 207]-( 4)-[ 212- 224]-( -7)-[ 218- 222]-( -5)-[ 218- 225]-( -11)- [1127-1128]-( 4)-[1133-1145]-( -7)-[1139-1143]--------[ ]-------- [ 118- 130] [ ] [ 215- 227] [ ] [ 212- 224] GASGPMGPRGPPG [1133-1145] GASGPAGPRGPPG [ 118- 122] GPRGP [ 121- 125] GPRGP [ 218- 222] GPRGP [1139-1143] GPRGP [ 118- 125] GPRGPRGP [ 121- 128] GPRGPAGP [ 218- 225] GPRGPPGP [ 118- 130] GPRGPRGPAGPPG [ 215- 227] GPMGPRGPPGPPG ______________________________ [ 123- 130] RGPAGPPG [ 220- 227] RGPPGPPG [ 415- 424] RGPSGPQGPG [ 865- 872] RGSAGP__PG with superset: [ 123- 128] RGPAGP [ 220- 225] RGPPGP [ 244- 249] RGPPGP [ 415- 420] RGPSGP [ 745- 750] RGDAGP [ 865- 870] RGSAGP [ 994- 999] RGPPGP [1084-1089] RGPAGP ______________________________ [ 126- 130] AGPPG [ 331- 336] AGPPGP [ 340- 344] AGPPG [ 649- 654] AGPPGE [ 814- 819] AGPPGA [ 838- 843] AGPPGP [ 847- 852] AGPPGP [ 868- 873] AGPPGA [ 892- 897] AGPPGP [1006-1011] AGPPGE [1042-1047] AGPPGA ______________________________ [ 129- 136] PGRDGIPG [1024-1031] PGRDGSPG ______________________________ [ 139- 149] GLPGPPGPPGP [ 188- 198] GLPGPPGAPGP with superset: [ 139- 143] GLPGP [ 188- 192] GLPGP [ 479- 483] GLPGP [ 644- 648] GLPGP [ 737- 741] GLPGP [ 974- 978] GLPGP [1157-1161] GLPGP and: [ 139- 145] GLPGPPG [ 188- 194] GLPGPPG [ 479- 485] GLPGPPG ______________________________ [ 145- 155] GPPGPPGPPGL [ 995-1005] GPPGPMGPPGL ______________________________ [ 179- 186] GPMGPSGP [ 215- 222] GPMGPRGP with superset: [ 179- 183] GPMGP [ 215- 219] GPMGP [ 998-1002] GPMGP ______________________________ [ 185- 194] GPRGLPGPPG [ 218- 227] GPRGPPGPPG [ 476- 485] GPTGLPGPPG with superset: [ 185- 191] GPRGLPG [ 218- 224] GPRGPPG [ 251- 257] GARGLPG [ 
302- 308] GPRGLPG [ 476- 482] GPTGLPG [1139-1145] GPRGPPG ______________________________ [ 187- 191] RGLPG [ 253- 257] RGLPG [ 304- 308] RGLPG ______________________________ [ 191- 197] GPPGAPG [1043-1049] GPPGAPG ______________________________ [ 196- 200] PGPQG [ 247- 251] PGPQG [ 949- 953] PGPQG ______________________________ [ 203- 207] GPPGE [ 482- 486] GPPGE [ 650- 654] GPPGE [1007-1011] GPPGE ______________________________ [ 208- 227] PGEPGASGPMGPRGPPGPPG [ 409- 428] PGFPGARGPSGPQGPGGPPG with superset: [ 190- 195] PGPPGA [ 208- 213] PGEPGA [ 343- 348] PGFPGA [ 409- 414] PGFPGA [ 568- 573] PGPPGA [ 601- 606] PGPPGA [1045-1050] PGAPGA and: [ 190- 198] PGPPGAPGP [ 208- 216] PGEPGASGP [ 409- 417] PGFPGARGP [ 601- 609] PGPPGAVGP ______________________________ [ 191- 192]-( 4)-[ 197- 198]-( 4)-[ 203- 204]-( 4)-[ 209- 213] [ 416- 417]-( 4)-[ 422- 423]-( 4)-[ 428- 429]-( 4)-[ 434- 438] [ 209- 213] GEPGA [ 434- 438] GEPGA ______________________________ [ 205- 213]-( 3)-[ 217- 224] [ 289- 297]-( 3)-[ 301- 308] [ 205- 213] PGEPGEPGA [ 289- 297] PGSPGENGA [ 217- 224] MGPRGPPG [ 301- 308] MGPRGLPG ______________________________ [ 232- 236] DGEAG [ 613- 617] DGEAG ______________________________ [ 233- 234]-( 4)-[ 239- 249] [ 911- 912]-( 4)-[ 917- 927] [ 239- 249] GRPGERGPPGP [ 917- 927] GRPGEVGPPGP with superset: [ 239- 243] GRPGE [ 527- 531] GRPGE [ 917- 921] GRPGE ______________________________ [ 241- 248] PGERGPPG [ 307- 314] PGERGRPG [ 484- 491] PGERGGPG with superset: [ 241- 245] PGERG [ 307- 311] PGERG [ 484- 488] PGERG [ 682- 686] PGERG [ 730- 734] PGERG ______________________________ [ ]--------[ 244- 248]-( -5)-[ 244- 252] [ 982- 999]-( -6)-[ 994- 998]--------[ ] [1126-1143]-( -3)-[1141-1145]-( -5)-[1141-1149] [ 982- 999] PGKQGPSGASGERGPPGP [1126-1143] PGEQGPSGASGPAGPRGP with superset: [ 671- 675] GPSGA [ 986- 990] GPSGA [1130-1134] GPSGA [ 244- 248] RGPPG [ 994- 998] RGPPG [1141-1145] RGPPG [ 244- 252] RGPPGPQGA [1141-1149] RGPPGSAGA 
______________________________ [ 259- 266]-( -8)-[ 259- 269] [ 532- 539]--------[ ] [ 736- 743]-( -8)-[ 736- 746] [ 259- 266] AGLPGMKG [ 532- 539] AGLPGAKG [ 736- 743] AGLPGPKG [ 259- 269] AGLPGMKGHRG [ 736- 746] AGLPGPKGDRG ______________________________ [ 265- 273] KGHRGFSGL [1108-1116] KGHRGFSGL ______________________________ [ 281- 290]-( -8)-[ 283- 287] [ ]--------[ 502- 506] [ 515- 524]-( -8)-[ 517- 521] [ ]--------[ 748- 752] [ 281- 290] GPAGPKGEPG [ 515- 524] GPAGPKGSPG [ 283- 287] AGPKG [ 502- 506] AGPKG [ 517- 521] AGPKG [ 748- 752] AGPKG ______________________________ [ 286- 290] KGEPG [ 448- 452] KGEPG [ 826- 830] KGEPG ______________________________ [ 289- 294] PGSPGE [ 544- 548] PGSPG [1123-1128] PGSPGE ______________________________ [ 284- 285]-( 4)-[ 290- 294] [ 515- 516]-( 4)-[ 521- 525] [1118-1119]-( 4)-[1124-1128] [ 290- 294] GSPGE [ 521- 525] GSPGE [1124-1128] GSPGE ______________________________ [ 314- 323] GAPGPAGARG [ 668- 677] GAPGPSGARG with superset: [ 194- 198] GAPGP [ 314- 318] GAPGP [ 668- 672] GAPGP [1052-1056] GAPGP ______________________________ [ 320- 335] GARGNDGATGAAGPPG [ 701- 716] GAPGNDGAKGDAGAPG with superset: [ 274- 281] DGAKGDAG [ 325- 332] DGATGAAG [ 706- 713] DGAKGDAG ______________________________ [ 337- 341]-( -5)-[ 337- 351]-( -15)-[ 337- 344]-( 10)-[ 355- 362] [ 913- 917]--------[ ]--------[ 913- 920]--------[ ] [1039-1043]-( -5)-[1039-1053]-( -15)-[1039-1046]--------[ ] [1069-1073]--------[ ]--------[ ]--------[1087-1094] [ 337- 341] TGPAG [ 913- 917] TGPAG [1039-1043] TGPAG [1069-1073] TGPAG [ 337- 351] TGPAGPPGFPGAVGA [1039-1053] TGPAGPPGAPGAPGA [ 337- 344] TGPAGPPG [ 913- 920] TGPAGRPG [1039-1046] TGPAGPPG [ 355- 362] AGPQGPRG [1087-1094] AGPQGPRG with superset: [ 356- 360] GPQGP [ 419- 423] GPQGP [1088-1092] GPQGP ______________________________ [ 343- 350] PGFPGAVG [ 601- 608] PGPPGAVG ______________________________ [ ]--------[ 344- 348] [ ]--------[ 410- 414] [ 446- 452]-( 41)-[ 494- 498] [ 824- 830]-( 44)-[ 
875- 879] [ 446- 452] GAKGEPG [ 824- 830] GAKGEPG with superset: [ 350- 354] GAKGE [ 446- 450] GAKGE [ 824- 828] GAKGE [ 344- 348] GFPGA [ 410- 414] GFPGA [ 494- 498] GFPGA [ 875- 879] GFPGA ______________________________ [ 370- 384] RGEPGPPGPAGAAGP [ 802- 816] RGEPGPPGPAGFAGP with superset: [ 370- 377] RGEPGPPG [ 469- 476] RGARGEPG [ 676- 683] RGERGFPG [ 802- 809] RGEPGPPG [ 967- 974] RGERGFPG ______________________________ [ 377- 378]-( 3)-[ 382- 389] [ 839- 840]-( 3)-[ 844- 851] [ 382- 389] AGPAGNPG [ 844- 851] AGPAGPPG with superset: [ 280- 284] AGPAG [ 382- 386] AGPAG [ 625- 629] AGPAG [ 844- 848] AGPAG [1072-1076] AGPAG ______________________________ [ 388- 392]-( -11)-[ 382- 398]--------[ ]--------[ 433- 437] [ 496- 500]--------[ ]--------[ 496- 503]--------[ ] [ 817- 821]-( -11)-[ 811- 827]--------[ ]--------[ ] [ 937- 941]--------[ ]--------[ 937- 944]-( 34)-[ 979- 983] [ 388- 392] PGADG [ 496- 500] PGADG [ 817- 821] PGADG [ 937- 941] PGADG with superset: [ 388- 396] PGADGQPGA [ 817- 825] PGADGQPGA [ 937- 945] PGADGPAGA [ 382- 398] AGPAGNPGADGQPGAKG [ 811- 827] AGFAGPPGADGQPGAKG with superset: [ 388- 396] PGADGQPGA [ 817- 825] PGADGQPGA [ 937- 945] PGADGPAGA [ 496- 503] PGADGVAG [ 937- 944] PGADGPAG [ 433- 437] SGEPG [ 979- 983] SGEPG ______________________________ [ 398- 408] GANGAPGIAGA [ 698- 708] GANGAPGNDGA with superset: [ 295- 299] NGAPG [ 400- 404] NGAPG [ 700- 704] NGAPG ______________________________ [ 413- 423] GARGPSGPQGP [1082-1092] GARGPAGPQGP ______________________________ [ 416- 423] GPSGPQGP [ 785- 792] GPSGPAGP with superset: [ 182- 186] GPSGP [ 416- 420] GPSGP [ 785- 789] GPSGP ______________________________ [ 409- 416]-( 10)-[ 427- 431]-( -5)-[ 427- 437] [ 568- 575]-( 7)-[ 583- 587]-( -5)-[ 583- 593] [ ]--------[ 739- 743]--------[ ] [ 409- 416] PGFPGARG [ 568- 575] PGPPGARG [ 427- 431] PGPKG [ 583- 587] PGPKG [ 739- 743] PGPKG [ 427- 437] PGPKGNSGEPG [ 583- 593] PGPKGAAGEPG ______________________________ [ 449- 453] GEPGP [ 473- 477] 
GEPGP [ 803- 807] GEPGP ______________________________ [ 446- 447]-( 3)-[ 451- 455] [1049-1050]-( 3)-[1054-1058] [ 451- 455] PGPVG [1054-1058] PGPVG ______________________________ [ 473- 474]-( 4)-[ 479- 486] [ 968- 969]-( 4)-[ 974- 981] [ 479- 486] GLPGPPGE [ 974- 981] GLPGPSGE ______________________________ [ 493- 512] RGFPGADGVAGPKGPAGERG [ 679- 698] RGFPGERGVQGPPGPAGPRG with superset: [ 493- 497] RGFPG [ 679- 683] RGFPG [ 970- 974] RGFPG ______________________________ [ 503- 518]-( -11)-[ 508- 516] [ ]--------[ 595- 603] [ 623- 638]-( -11)-[ 628- 636] [ 503- 518] GPKGPAGERGSPGPAG [ 623- 638] GPAGPAGERGEQGPAG [ 508- 516] AGERGSPGP [ 595- 603] AGERGVPGP [ 628- 636] AGERGEQGP ______________________________ [ 512- 516] GSPGP [ 545- 549] GSPGP ______________________________ [ 514- 524] PGPAGPKGSPG [ 928- 938] PGPAGEKGSPG ______________________________ [ 523- 536] PGEAGRPGEAGLPG [ 646- 659] PGPAGPPGEAGKPG with superset: [ 526- 531] AGRPGE [ 649- 654] AGPPGE [ 655- 660] AGKPGE [ 916- 921] AGRPGE [1006-1011] AGPPGE ______________________________ [ 569- 576] GPPGARGQ [ 815- 822] GPPGADGQ with superset: [ 569- 573] GPPGA [ 602- 606] GPPGA [ 815- 819] GPPGA [ 869- 873] GPPGA [1043-1047] GPPGA ______________________________ [ 584- 588] GPKGA [ 749- 753] GPKGA ______________________________ [ 590- 594] GEPGK [ 980- 984] GEPGK ______________________________ [ 601- 612]-( -9)-[ 604- 612] [ ]--------[ 895- 903] [1051-1062]-( -9)-[1054-1062] [ 601- 612] PGPPGAVGPAGK [1051-1062] PGAPGPVGPAGK [ 604- 612] PGAVGPAGK [ 895- 903] PGPPGPAGK [1054-1062] PGPVGPAGK ______________________________ [ 614- 615]-( 3)-[ 619- 627] [ 683- 684]-( 3)-[ 688- 696] [ 619- 627] QGPPGPAGP [ 688- 696] QGPPGPAGP with superset: [ 202- 206] QGPPG [ 457- 461] QGPPG [ 619- 623] QGPPG [ 688- 692] QGPPG [1117-1121] QGPPG and: [ 457- 463] QGPPGPA [ 619- 625] QGPPGPA [ 688- 694] QGPPGPA ______________________________ [ 619- 630] QGPPGPAGPAGE [ 643- 654] QGLPGPAGPPGE with superset: [ 421- 426] QGPGGP [ 457- 462] 
QGPPGP [ 619- 624] QGPPGP [ 643- 648] QGLPGP [ 688- 693] QGPPGP [1117-1122] QGPPGP and: [ 457- 464] QGPPGPAG [ 619- 626] QGPPGPAG [ 643- 650] QGLPGPAG [ 688- 695] QGPPGPAG ______________________________ [ 623- 627]-( 4)-[ 632- 636] [1118-1122]-( 4)-[1127-1131] [ 623- 627] GPAGP [1118-1122] GPPGP [ 632- 636] GEQGP [1127-1131] GEQGP ______________________________ [ 644- 648]-( -5)-[ 644- 653] [ 737- 741]--------[ ] [ 974- 978]--------[ ] [1157-1161]-( -5)-[1157-1166] [ 644- 648] GLPGP [ 737- 741] GLPGP [ 974- 978] GLPGP [1157-1161] GLPGP [ 644- 653] GLPGPAGPPG [1157-1166] GLPGPIGPPG ______________________________ [ 650- 653]-( 4)-[ 658- 662] [1118-1121]-( 4)-[1126-1130] [ 650- 653] GPPG [1118-1121] GPPG [ 658- 662] PGEQG [1126-1130] PGEQG ______________________________ [ 670- 674]--------[ ] [ 886- 890]-( 44)-[ 935- 939] [ 976- 980]-( 47)-[1028-1032] [ 670- 674] PGPSG [ 886- 890] PGPSG [ 976- 980] PGPSG [ 935- 939] GSPGA [1028-1032] GSPGA ______________________________ [ 682- 686] PGERG [ 730- 734] PGERG ______________________________ [ 703- 716] PGNDGAKGDAGAPG [ 829- 842] PGDAGAKGDAGPPG with superset: [ 275- 279] GAKGD [ 707- 711] GAKGD [ 833- 837] GAKGD [1031-1035] GAKGD and: [ 275- 281] GAKGDAG [ 707- 713] GAKGDAG [ 833- 839] GAKGDAG ______________________________ [ 710- 714] GDAGA [ 830- 834] GDAGA ______________________________ [ 707- 708]-( 3)-[ 712- 722] [ 938- 939]-( 3)-[ 943- 953] [ 712- 722] AGAPGAPGSQG [ 943- 953] AGAPGTPGPQG with superset: [ 406- 410] AGAPG [ 712- 716] AGAPG [ 775- 779] AGAPG [ 943- 947] AGAPG [1147-1151] AGAPG and: [ 406- 413] AGAPGFPG [ 712- 719] AGAPGAPG [ 943- 950] AGAPGTPG ______________________________ [ 739- 750] PGPKGDRGDAGP [1030-1041] PGAKGDRGETGP ______________________________ [ 754- 758] DGSPG [1027-1031] DGSPG ______________________________ [ 757- 774] PGKDGVRGLTGPIGPPGP [1150-1167] PGKDGLNGLPGPIGPPGP ______________________________ [ 773- 779] GPAGAPG [ 941- 947] GPAGAPG ______________________________ [ 775- 779] AGAPG [ 943- 
947] AGAPG [1147-1151] AGAPG ______________________________ [ 767- 771]-( 4)-[ 776- 791] [ 788- 792]-( 4)-[ 797- 812] [ 767- 771] GPIGP [ 788- 792] GPAGP [ 776- 791] GAPGDKGESGPSGPAG [ 797- 812] GAPGDRGEPGPPGPAG ______________________________ [ 770- 774]-( 4)-[ 779- 783] [1085-1089]-( 4)-[1094-1098] [ 770- 774] GPPGP [1085-1089] GPAGP [ 779- 783] GDKGE [1094-1098] GDKGE ______________________________ [ 800- 819] GDRGEPGPPGPAGFAGPPGA [1064-1083] GDRGETGPAGPAGPVGPVGA ______________________________ [ 836- 852] GDAGPPGPAGPAGPPGP [1172-1188] GDAGPVGPPGPPGPPGP ______________________________ [ 850- 854] PGPIG [1159-1163] PGPIG ______________________________ [ 859- 870] PGAKGARGSAGP [1030-1041] PGAKGDRGETGP ______________________________ [ 884- 885]-( 3)-[ 889- 902] [1130-1131]-( 3)-[1135-1148] [ 889- 902] SGNAGPPGPPGPAG [1135-1148] SGPAGPRGPPGSAG with superset: [ 214- 219] SGPMGP [ 418- 423] SGPQGP [ 787- 792] SGPAGP [ 889- 894] SGNAGP [1135-1140] SGPAGP and: [ 214- 224] SGPMGPRGPPG [ 418- 428] SGPQGPGGPPG [ 889- 899] SGNAGPPGPPG [1135-1145] SGPAGPRGPPG ______________________________ [ 922- 930] VGPPGPPGP [1177-1185] VGPPGPPGP with superset: [ 883- 888] VGPPGP [ 922- 927] VGPPGP [1177-1182] VGPPGP ______________________________ [1055-1059] GPVGP [1076-1080] GPVGP [1175-1179] GPVGP ______________________________ [1114-1125] SGLQGPPGPPGS [1135-1146] SGPAGPRGPPGS ______________________________ Simple tandem repeat: [ 523- 528] PGEAGR [ 529- 534] PGEAGL [ 535- 540] PGAKGL Highly repetitive regions: From 118 to 1192 with major motif GERGPPGPA. From 124 to 1141 with major motif GPAGPP. From 138 to 1192 with major motif PGPPGPP. From 141 to 1192 with major motif PGPPGPA. From 141 to 1191 with major motif PGPPGP. From 142 to 1192 with major motif GPPGPP. From 187 to 1072 with major motif RGEPGPP. From 280 to 1180 with major motif AGPPGPP. From 280 to 1182 with major motif AGPPGPRGP. From 316 to 933 with major motif PGPAGP. B. 
SEPARATED AND TANDEM REPEATS: 11-letter reduced alphabet. (i= LVIF; += KR; -= ED; s= AG; o= ST; n= NQ; a= YW; p= P; h= H; m= M; c= C) Repeat core block length: 9 Aligned matching blocks: [ 747- 761] -ssp+ss-sops+-s [1015-1028] -ssp_ss-sops+-s ______________________________ [ 776- 800] ssps-+s-ospospssposs+ssps [1031-1055] ss+s-+s-ospssppsspsspssps with superset: [ 112- 122] sp+s-osp+sp [ 275- 285] ss+s-sspssp [ 509- 519] s-+sopspssp [ 779- 789] s-+s-osposp [ 800- 810] s-+s-psppsp [1034-1044] s-+s-ospssp [1064-1074] s-+s-ospssp and: [ 779- 791] s-+s-osposp_ss [ 800- 812] s-+s-psppspss [1034-1047] s-+s-ospssppss [1064-1076] s-+s-ospsspss and: [ 779- 798] s-+s-ospos_pssposs+ss [ 800- 819] s-+s-pspps_pssissppss [1034-1053] s-+s-ospssppssps_spss -------------------------------------------------------------------------------- MULTIPLETS. A. AMINO ACID ALPHABET. 1. Total number of amino acid multiplets: 83 (Expected range: 118--189) low 1 ........LL LLLAA..LL. ...EE..... .....PP... .......... .......... 61 .......... DD........ .......... CC........ ......TT.. .......... 121 .......PP. .......... ..PP.PP.PP .PP..GG... .......... ..GG...... 181 .......... .PP....... ...PP..... .......... .PP.PP.... DD........ 241 .....PP... .......... .......... .......... .......... .......... 301 .......... .......... .........A A.PP...... .PP....... .......... 361 .......... ....PP.... AA........ .......... .......... .......... 421 ...GGPP... .......... .......... ........PP ....EE.... .......... 481 ..PP...GG. .......... .......... .......... .......... .......... 541 .......... ....PP.... ......PP.P P......... .......AA. .......... 601 ..PP...... .......... PP........ .......... .......... PP........ 661 .......... .......... .........P P......... .......... .......... 721 .......... ....AA.... .......... .......... .......... PP........ 781 .......... .......... ......PP.. .....PP... .......... .........P 841 P.......PP .......... .........P P.......AA ....PP.... ...PP.PP.. 
901 ....GG.... .......... ...PP.PP.. .......... .......... .........V 961 V......... .......... .......... .....PP... .PP....PP. ........AA 1021 .......... .......... ...PP..... .......... .......... .......... 1081 .......... .......... .......... ........PP .PP....... .......... 1141 ..PP...... .......... ...PP..... ........PP .PP.PP.PP. PP........ 1201 ...PP..... .GG.YY..DD ..VV...... ..TT.....Q Q......... .......... 1261 .......... .......... .......... .......... .......... .......... 1321 .......... ........GG .......... .......... .......... ........QQ 1381 ....KK.LLL .......... .......... .......... .......... TT.SS...II 1441 .......... .......... .... 2. Histogram of spacings between consecutive amino acid multiplets: (1-5) 32 (6-10) 11 (11-20) 21 (>=21) 20 3. Clusters of amino acid multiplets (cmin = 10/30 or 13/45 or 16/60): none 4. Significant specific amino acid altplet counts: Letters Observed (Critical number) AG 113 (93) at 83 (l= 2) 126 (l= 2) 194 (l= 2) 212 (l= 2) 235 (l= 2) 251 (l= 2) 259 (l= 2) 275 (l= 2) 280 (l= 2) 283 (l= 2) 296 (l= 2) 314 (l= 2) 319 (l= 3) 326 (l= 2) 329 (l= 2) 331 (l= 2) 340 (l= 2) 347 (l= 2) 350 (l= 2) 355 (l= 2) 379 (l= 3) 382 (l= 2) 385 (l= 2) 389 (l= 2) 395 (l= 2) 398 (l= 2) 401 (l= 2) 406 (l= 3) 413 (l= 2) 437 (l= 2) 446 (l= 2) 463 (l= 2) 470 (l= 2) 497 (l= 2) 502 (l= 2) 508 (l= 2) 517 (l= 2) 526 (l= 2) 532 (l= 2) 536 (l= 2) 559 (l= 2) 572 (l= 2) 577 (l= 2) 587 (l= 2) 589 (l= 2) 595 (l= 2) 605 (l= 2) 610 (l= 2) 616 (l= 3) 625 (l= 2) 628 (l= 2) 637 (l= 2) 649 (l= 2) 655 (l= 2) 668 (l= 2) 674 (l= 2) 694 (l= 2) 698 (l= 2) 701 (l= 2) 707 (l= 2) 712 (l= 3) 716 (l= 2) 722 (l= 2) 734 (l= 2) 736 (l= 2) 748 (l= 2) 752 (l= 2) 775 (l= 3) 790 (l= 2) 794 (l= 2) 797 (l= 2) 811 (l= 2) 814 (l= 2) 818 (l= 2) 824 (l= 2) 832 (l= 3) 838 (l= 2) 844 (l= 2) 847 (l= 2) 857 (l= 2) 860 (l= 2) 863 (l= 2) 868 (l= 2) 872 (l= 2) 878 (l= 2) 880 (l= 2) 892 (l= 2) 901 (l= 2) 916 (l= 2) 931 (l= 2) 938 (l= 2) 943 (l= 3) 955 (l= 2) 989 (l= 2) 1006 (l= 2) 
1016 (l= 2) 1031 (l= 2) 1042 (l= 2) 1046 (l= 2) 1049 (l= 2) 1052 (l= 2) 1060 (l= 2) 1072 (l= 2) 1075 (l= 2) 1082 (l= 2) 1087 (l= 2) 1133 (l= 2) 1138 (l= 2) 1147 (l= 3) 1174 (l= 2) 1194 (l= 2) 1420 (l= 2) 1448 (l= 2) GP 203 (156) at 82 (l= 2) 112 (l= 2) 118 (l= 2) 121 (l= 2) 124 (l= 2) 127 (l= 2) 129 (l= 2) 135 (l= 2) 138 (l= 2) 141 (l= 3) 144 (l= 3) 147 (l= 3) 150 (l= 3) 153 (l= 2) 178 (l= 3) 182 (l= 2) 185 (l= 2) 190 (l= 3) 193 (l= 2) 196 (l= 3) 203 (l= 2) 205 (l= 2) 208 (l= 2) 211 (l= 2) 215 (l= 2) 218 (l= 2) 221 (l= 2) 223 (l= 3) 226 (l= 2) 238 (l= 2) 241 (l= 2) 245 (l= 2) 247 (l= 3) 256 (l= 2) 262 (l= 2) 281 (l= 2) 284 (l= 2) 289 (l= 2) 292 (l= 2) 298 (l= 2) 302 (l= 2) 307 (l= 2) 313 (l= 2) 316 (l= 3) 332 (l= 2) 334 (l= 3) 338 (l= 2) 341 (l= 2) 343 (l= 2) 346 (l= 2) 356 (l= 2) 359 (l= 2) 365 (l= 2) 373 (l= 3) 376 (l= 3) 383 (l= 2) 388 (l= 2) 394 (l= 2) 403 (l= 2) 409 (l= 2) 412 (l= 2) 416 (l= 2) 419 (l= 2) 422 (l= 3) 425 (l= 2) 427 (l= 3) 436 (l= 2) 439 (l= 2) 451 (l= 3) 458 (l= 2) 460 (l= 3) 475 (l= 3) 481 (l= 3) 484 (l= 2) 489 (l= 3) 496 (l= 2) 503 (l= 2) 506 (l= 2) 514 (l= 3) 518 (l= 2) 523 (l= 2) 529 (l= 2) 535 (l= 2) 544 (l= 2) 547 (l= 3) 554 (l= 2) 556 (l= 3) 565 (l= 3) 568 (l= 3) 571 (l= 2) 583 (l= 3) 592 (l= 2) 601 (l= 3) 604 (l= 2) 608 (l= 2) 620 (l= 2) 622 (l= 3) 626 (l= 2) 635 (l= 2) 640 (l= 2) 646 (l= 3) 650 (l= 2) 652 (l= 2) 658 (l= 2) 664 (l= 2) 670 (l= 3) 682 (l= 2) 689 (l= 2) 691 (l= 3) 695 (l= 2) 703 (l= 2) 715 (l= 2) 718 (l= 2) 724 (l= 2) 730 (l= 2) 739 (l= 3) 749 (l= 2) 757 (l= 2) 767 (l= 2) 770 (l= 2) 772 (l= 3) 778 (l= 2) 785 (l= 2) 788 (l= 2) 791 (l= 2) 799 (l= 2) 805 (l= 3) 808 (l= 3) 815 (l= 2) 817 (l= 2) 823 (l= 2) 829 (l= 2) 839 (l= 2) 841 (l= 3) 845 (l= 2) 848 (l= 2) 850 (l= 3) 859 (l= 2) 869 (l= 2) 871 (l= 2) 877 (l= 2) 884 (l= 2) 886 (l= 3) 893 (l= 2) 895 (l= 3) 898 (l= 3) 908 (l= 2) 914 (l= 2) 919 (l= 2) 923 (l= 2) 925 (l= 3) 928 (l= 3) 937 (l= 2) 941 (l= 2) 946 (l= 2) 949 (l= 3) 964 (l= 2) 973 (l= 2) 976 (l= 3) 982 (l= 2) 986 (l=
2) 995 (l= 2) 997 (l= 3) 1001 (l= 2) 1003 (l= 2) 1007 (l= 2) 1009 (l= 2) 1024 (l= 2) 1030 (l= 2) 1040 (l= 2) 1043 (l= 2) 1045 (l= 2) 1048 (l= 2) 1051 (l= 2) 1054 (l= 3) 1058 (l= 2) 1070 (l= 2) 1073 (l= 2) 1076 (l= 2) 1079 (l= 2) 1085 (l= 2) 1088 (l= 2) 1091 (l= 2) 1118 (l= 2) 1120 (l= 3) 1123 (l= 2) 1126 (l= 2) 1130 (l= 2) 1136 (l= 2) 1139 (l= 2) 1142 (l= 2) 1144 (l= 2) 1150 (l= 2) 1159 (l= 3) 1163 (l= 2) 1165 (l= 3) 1175 (l= 2) 1178 (l= 2) 1180 (l= 3) 1183 (l= 3) 1186 (l= 3) 1189 (l= 3) 1459 (l= 2) 5. Long amino acid multiplets (>= 5; Letter/Length/Position): L/5/9 B. CHARGE ALPHABET. 1. Total number of charge multiplets: 12 (Expected range: 8-- 37) 4 +plets (f+: 8.8%), 8 -plets (f-: 9.6%) Total number of charge altplets: 32 (Critical number: 42) 2. Histogram of spacings between consecutive charge multiplets: (1-5) 2 (6-10) 1 (11-20) 0 (>=21) 10 -------------------------------------------------------------------------------- PERIODICITY ANALYSIS. A. AMINO ACID ALPHABET (core: 4; !-core: 5) Location Period Element Copies Core Errors 9- 13 1 L 5 5 ! 0 109- 159 3 G.. 17 17 ! 0 173-1192 3 G.. 338 280 ! 2 B. CHARGE ALPHABET ({+= KR; -= ED; 0}; core: 5; !-core: 6) and HYDROPHOBICITY ALPHABET ({*= KRED; i= LVIF; 0}; core: 6; !-core: 8) Location Period Element Copies Core Errors 228- 245 3 *00 6 6 /0/2/0/ -------------------------------------------------------------------------------- SPACING ANALYSIS. Location (Quartile) Spacing Rank P-value Interpretation 47- 165 (1.) Y( 118)Y 2 of 14 0.9994 small 2. maximal spacing 53-1269 (2.) W(1216)W 1 of 7 0.0002 large 1. maximal spacing 95-1259 (2.) C(1164)C 1 of 19 0.0000 large 1. maximal spacing 167-1215 (2.) Y(1048)Y 1 of 14 0.0000 large 1. maximal spacing 170- 228 (1.) K( 58)K 2 of 59 0.9982 small 2. maximal spacing 267-1110 (2.) H( 843)H 1 of 10 0.0043 large maximal spacing 286- 352 (1.) K( 66)K 1 of 59 0.9976 small 1. maximal spacing 310- 312 (1.) R( 2)R 72 of 72 0.0006 large minimal spacing 1168-1170 (4.) 
R( 2)R 70 of 72 0.0006 matching minimum 1205-1248 (4.) P( 43)P 2 of 279 0.0003 large 2. maximal spacing 1213-1250 (4.) G( 37)G 2 of 391 0.0000 large 2. maximal spacing 1225-1227 (4.) R( 2)R 71 of 72 0.0006 matching minimum 1299-1370 (4.) C( 71)C 2 of 19 1.0000 small 2. maximal spacing 1325-1422 (4.) W( 97)W 2 of 7 0.9996 small 2. maximal spacing 1342-1382 (4.) G( 40)G 1 of 391 0.0013 large 1. maximal spacing 1345-1438 (4.) P( 93)P 1 of 279 0.0000 large 1. maximal spacing ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ Start with Pfam (from /data/patterns/pfam) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/pfam/Pfam Sequence file: tem38 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- Collagen Collagen triple helix repeat (20 copies) 970.8 3.3e-288 18 COLFI Fibrillar collagen C-terminal domain 565.2 2e-220 1 vwc von Willebrand factor type C domain 89.7 5.8e-23 1 fibrinogen_C Fibrinogen beta and gamma chains, C-term -0.3 50 1 DUF41 Domain of unknown function DUF41 -71.4 30 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- vwc 1/1 40 95 .. 1 84 [] 89.7 5.8e-23 Collagen 1/18 107 165 .. 1 60 [] 26.8 0.00013 Collagen 2/18 177 235 .. 1 60 [] 51.4 2e-11 Collagen 3/18 236 295 .. 1 60 [] 77.7 2.4e-19 Collagen 4/18 296 355 .. 1 60 [] 66.9 4.3e-16 Collagen 5/18 356 415 .. 1 60 [] 63.6 4.2e-15 Collagen 6/18 416 475 .. 1 60 [] 63.1 5.9e-15 Collagen 7/18 476 535 .. 
1 60 [] 65.9 8.5e-16 Collagen 8/18 536 595 .. 1 60 [] 66.6 5.3e-16 Collagen 9/18 596 655 .. 1 60 [] 64.1 3e-15 Collagen 10/18 656 715 .. 1 60 [] 62.6 8.4e-15 Collagen 11/18 716 775 .. 1 60 [] 72.2 1.1e-17 Collagen 12/18 779 838 .. 1 60 [] 70.3 3.9e-17 Collagen 13/18 839 898 .. 1 60 [] 62.4 9.4e-15 Collagen 14/18 899 958 .. 1 60 [] 61.2 2.3e-14 Collagen 15/18 959 1018 .. 1 60 [] 64.6 2.1e-15 Collagen 16/18 1020 1078 .. 1 60 [] 55.4 1.2e-12 Collagen 17/18 1079 1138 .. 1 60 [] 75.9 8.5e-19 Collagen 18/18 1139 1198 .. 1 60 [] 35.6 1.1e-06 fibrinogen_C 1/1 1271 1295 .. 18 43 .. -0.3 50 DUF41 1/1 4 1308 .. 1 247 [] -71.4 30 COLFI 1/1 1245 1463 .. 1 226 [] 565.2 2e-220 Alignments of top-scoring domains: vwc: domain 1 of 1, from 40 to 95: score 89.7, E = 5.8e-23 *->CvqnGvvYengetWkpdsqPnGvdkCtyiCtCddiedavrlggkvlC CvqnG +Y+++++Wkp++ C+ iC+Cd+ gkvlC tem38_gi|1 40 CVQNGLRYHDRDVWKPEP-------CR-ICVCDN--------GKVLC 70 dkitCppelLpsldCpnprrvdalvippGECCpewvC<-* d+++C+++ +Cp + + p+GECCp vC tem38_gi|1 71 DDVICDET----KNCPGA------EVPEGECCP--VC 95 Collagen: domain 1 of 18, from 107 to 165: score 26.8, E = 0.00013 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G Gp+G++Gp+Gp+Gp+Gp+G G pG pG pGpPGppGppGp tem38_gi|1 107 -TTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGP 152 pGppGapGapGpp<-* pG G+ + tem38_gi|1 153 PGLGGNFAPQLSY 165 Collagen: domain 2 of 18, from 177 to 235: score 51.4, E = 2e-11 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp pGp+Gp Gp+G pGppG+pGp+G++GppG pGepG+ Gp Gp Gp tem38_gi|1 177 -VPGPMGPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGP 222 pGppGapGapGpp<-* pGppG+ G+ G++ tem38_gi|1 223 PGPPGKNGDDGEA 235 Collagen: domain 3 of 18, from 236 to 295: score 77.7, E = 2.4e-19 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG+pG++GppGp G++G pG aG pG++G++G++G +G++G +Gp tem38_gi|1 236 GKPGRPGERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGP 282 pGppGapGapGpp<-* +Gp+G+pG+pG++ tem38_gi|1 283 AGPKGEPGSPGEN 295 Collagen: domain 4 of 18, from 296 to 355: score 66.9, E = 4.3e-16 
*->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG++Gp+G+pG++G+pG+pGpaGa+G+ G+ G++GpPGp Gp+Gp tem38_gi|1 296 GAPGQMGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGP 342 pGppGapGapGpp<-* pG pGa Ga+G++ tem38_gi|1 343 PGFPGAVGAKGEA 355 Collagen: domain 5 of 18, from 356 to 415: score 63.6, E = 4.2e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp Gp+G+ Gp+G +G+pGppGpaGa+Gp+G+pG++G+PG++G++G+ tem38_gi|1 356 GPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGANGA 402 pGppGapGapGpp<-* pG +GapG pG++ tem38_gi|1 403 PGIAGAPGFPGAR 415 Collagen: domain 6 of 18, from 416 to 475: score 63.1, E = 5.9e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp Gp+Gp GppGp+G++G+pG++G++G+ G++GepGp G +GppGp tem38_gi|1 416 GPSGPQGPGGPPGPKGNSGEPGAPGSKGDTGAKGEPGPVGVQGPPGP 462 pGppGapGapGpp<-* +G++G+ Ga G+p tem38_gi|1 463 AGEEGKRGARGEP 475 Collagen: domain 7 of 18, from 476 to 535: score 65.9, E = 8.5e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp G+pGppG++G pG +G+pG++G +Gp+Gp+Ge+G+PGp+Gp G+ tem38_gi|1 476 GPTGLPGPPGERGGPGSRGFPGADGVAGPKGPAGERGSPGPAGPKGS 522 pGppGapGapGpp<-* pG++G+pG++G p tem38_gi|1 523 PGEAGRPGEAGLP 535 Collagen: domain 8 of 18, from 536 to 595: score 66.6, E = 5.3e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G++G+ G+pG pGp+G+ GppGpaG G pGppG+pG+ G++G++G+ tem38_gi|1 536 GAKGLTGSPGSPGPDGKTGPPGPAGQDGRPGPPGPPGARGQAGVMGF 582 pGppGapGapGpp<-* pGp+Ga+G+pG++ tem38_gi|1 583 PGPKGAAGEPGKA 595 Collagen: domain 9 of 18, from 596 to 655: score 64.1, E = 3e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G++G pGppG+ Gp+G+ G++G++G+pGp+Gp+Ge+G++Gp+G pG+ tem38_gi|1 596 GERGVPGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 642 pGppGapGapGpp<-* +G pG++G+pG++ tem38_gi|1 643 QGLPGPAGPPGEA 655 Collagen: domain 10 of 18, from 656 to 715: score 62.6, E = 8.4e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG++G pG+ G+pGp+G+ G++G+pG++G +G+pGp Gp+G++G+ tem38_gi|1 656 GKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGANGA 702 pGppGapGapGpp<-* pG++Ga+G++G+p tem38_gi|1 703 
PGNDGAKGDAGAP 715 Collagen: domain 11 of 18, from 716 to 775: score 72.2, E = 1.1e-17 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG++G+pG++G+pG++G++G +G++G++G++G++G++G+pG++G tem38_gi|1 716 GAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGSPGKDGV 762 pGppGapGapGpp<-* +G +G++G+pGp+ tem38_gi|1 763 RGLTGPIGPPGPA 775 Collagen: domain 12 of 18, from 779 to 838: score 70.3, E = 3.9e-17 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G +G+ Gp Gp+Gp+G++G+pG++G+pGppGp+G++GpPG++G+pG+ tem38_gi|1 779 GDKGESGPSGPAGPTGARGAPGDRGEPGPPGPAGFAGPPGADGQPGA 825 pGppGapGapGpp<-* +G+pG +Ga+G + tem38_gi|1 826 KGEPGDAGAKGDA 838 Collagen: domain 13 of 18, from 839 to 898: score 62.4, E = 9.4e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp GppGp+Gp+GppGp G+ G+pG++Ga+G++GppG+ G+PG++G+ Gp tem38_gi|1 839 GPPGPAGPAGPPGPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGP 885 pGppGapGapGpp<-* pGp G++G+pGpp tem38_gi|1 886 PGPSGNAGPPGPP 898 Collagen: domain 14 of 18, from 899 to 958: score 61.2, E = 2.3e-14 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp+G+ G++Gp+G++Gp+G pG+ G+pGppGp+Ge+G+PG++Gp+G+ tem38_gi|1 899 GPAGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGA 945 pGppGapGapGpp<-* pG pG+ G +G++ tem38_gi|1 946 PGTPGPQGIAGQR 958 Collagen: domain 15 of 18, from 959 to 1018: score 64.6, E = 2.1e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G G+pG +G++G pG pGp G++G++Gp G++Ge+GpPGp GppG tem38_gi|1 959 GVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPMGPPGL 1005 pGppGapGapGpp<-* +GppG++G +G+p tem38_gi|1 1006 AGPPGESGREGAP 1018 Collagen: domain 16 of 18, from 1020 to 1078: score 55.4, E = 1.2e-12 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp + G+pG+ G pG++G++G++GpaG+pG pG+pG+pGp Gp+G+ G tem38_gi|1 1020 -AEGSPGRDGSPGAKGDRGETGPAGPPGAPGAPGAPGPVGPAGKSGD 1065 pGppGapGapGpp<-* +G++G++G++Gp+ tem38_gi|1 1066 RGETGPAGPAGPV 1078 Collagen: domain 17 of 18, from 1079 to 1138: score 75.9, E = 8.5e-19 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp G +Gp+Gp+Gp+G++G++G++G +G +G++G++G +GppGppG+ tem38_gi|1 1079 
GPVGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPPGPPGS 1125 pGppGapGapGpp<-* pG++G++Ga Gp+ tem38_gi|1 1126 PGEQGPSGASGPA 1138 Collagen: domain 18 of 18, from 1139 to 1198: score 35.6, E = 1.1e-06 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp+GppG++G+pG +G G pGp G+pGp+G G++Gp GppGppGp tem38_gi|1 1139 GPRGPPGSAGAPGKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGP 1185 pGppGapGapGpp<-* pGppG+p a tem38_gi|1 1186 PGPPGPPSAGFDF 1198 fibrinogen_C: domain 1 of 1, from 1271 to 1295: score -0.3, E = 50 *->SPPGlYtIqPd.gakeqpllVYCDmet<-* S G Y I P++g + +++V+C met tem38_gi|1 1271 S--GEYWIDPNqGCNLDAIKVFCNMET 1295 DUF41: domain 1 of 1, from 4 to 1308: score -71.4, E = 30 *->lteeQLlstFsNvkhliGslevqnTnfkslsFLanLesIecg..... +++ l+ + l T + + + + ++e+++++ + tem38_gi|1 4 FVD---LRLL---------LLLAATALLTHG--QEEGQVEGQdedip 36 .................................................. + + +++ + ++++ ++++ + ++++ ++ +++++ ++ + tem38_gi|1 37 pitcvqnglryhdrdvwkpepcricvcdngkvlcddvicdetkncpgaev 86 .................................................. ++++ + ++++++++++++++ +++++++++++++++ +++++++ ++ tem38_gi|1 87 pegeccpvcpdgsesptdqettgvegpkgdtgprgprgpagppgrdgipg 136 .................................................. +++ ++++++++++++++ +++ ++ + + +++++++ + +++ +++++ tem38_gi|1 137 qpglpgppgppgppgppglggnfapqlsygydekstggisvpgpmgpsgp 186 .................................................. ++ +++++ +++++ +++++++++++ +++ +++++++++++++++++ + tem38_gi|1 187 rglpgppgapgpqgfqgppgepgepgasgpmgprgppgppgkngddgeag 236 .................................................. +++++++++++++++ ++ +++ + ++ +++++ ++ ++ +++ ++ +++ tem38_gi|1 237 kpgrpgergppgpqgarglpgtaglpgmkghrgfsgldgakgdagpagpk 286 .................................................. ++++++++++ +++ ++++ ++++++++ +++ + +++++ ++ +++++ tem38_gi|1 287 gepgspgengapgqmgprglpgergrpgapgpagargndgatgaagppgp 336 .................................................. 
+++ ++++ ++ + +++ +++++++++++++ +++++++++ + ++ + tem38_gi|1 337 tgpagppgfpgavgakgeagpqgprgsegpqgvrgepgppgpagaagpag 386 .................................................. +++ +++++ ++ ++ ++ + ++ ++ ++++++++++++++++++++++ tem38_gi|1 387 npgadgqpgakgangapgiagapgfpgargpsgpqgpggppgpkgnsgep 436 .................................................. + ++++++++ ++++++ + ++++++ +++++++ ++++++++ ++++++ tem38_gi|1 437 gapgskgdtgakgepgpvgvqgppgpageegkrgargepgptglpgppge 486 .................................................. ++++++++ ++ ++ +++++ ++++++++ ++++++++ +++++ + ++ tem38_gi|1 487 rggpgsrgfpgadgvagpkgpagergspgpagpkgspgeagrpgeaglpg 536 .................................................. ++ ++++++++++++++++++ +++++++++++++ +++ + + ++++ tem38_gi|1 537 akgltgspgspgpdgktgppgpagqdgrpgppgppgargqagvmgfpgpk 586 .................................................. + +++++ ++++ +++++ ++ +++++ + ++++++ ++ ++++++++ tem38_gi|1 587 gaagepgkagergvpgppgavgpagkdgeagaqgppgpagpagergeqgp 636 .................................................. ++++ ++ +++ +++++ +++++++ +++ + +++++ +++++ +++++ tem38_gi|1 637 agspgfqglpgpagppgeagkpgeqgvpgdlgapgpsgargergfpgerg 686 .................................................. ++++++ ++++ ++ +++++ +++ + ++ +++++ ++ ++ +++++ tem38_gi|1 687 vqgppgpagprgangapgndgakgdagapgapgsqgapglqgmpgergaa 736 .................................................. + +++++++++ ++++ ++++++++ ++ +++ +++++ + +++++++++ tem38_gi|1 737 glpgpkgdrgdagpkgadgspgkdgvrgltgpigppgpagapgdkgesgp 786 .................................................. +++ ++++ ++ ++++++++++++ + ++++ +++++ ++++++ + ++ tem38_gi|1 787 sgpagptgargapgdrgepgppgpagfagppgadgqpgakgepgdagakg 836 .................................................. + +++++ ++ +++++ ++ + ++ ++ +++ ++++ ++ ++ ++ +++ tem38_gi|1 837 dagppgpagpagppgpignvgapgakgargsagppgatgfpgaagrvgpp 886 .................................................. 
+++++ ++++++++ ++++++++++++++ +++++ ++++++++ +++++ tem38_gi|1 887 gpsgnagppgppgpagkeggkgprgetgpagrpgevgppgppgpagekgs 936 .................................................. ++ +++ + ++++++++ ++++ + ++++++++ ++ +++++++++++ tem38_gi|1 937 pgadgpagapgtpgpqgiagqrgvvglpgqrgergfpglpgpsgepgkqg 986 .................................................. +++ +++++++++ ++++ ++++++++++ + +++++++++++ ++++ tem38_gi|1 987 psgasgergppgpmgppglagppgesgregapaaegspgrdgspgakgdr 1036 .................................................. +++++ ++++ ++ ++ +++ ++ +++++++++++ ++ ++ ++ + +++ tem38_gi|1 1037 getgpagppgapgapgapgpvgpagksgdrgetgpagpagpvgpvgargp 1086 .................................................. +++++++++++++++++++ +++++ ++ +++++++++++++++++ ++ tem38_gi|1 1087 agpqgprgdkgetgeqgdrgikghrgfsglqgppgppgspgeqgpsgasg 1136 .................................................. + ++++++++ + +++++ ++ +++ +++++++++++ ++ +++++++++ tem38_gi|1 1137 pagprgppgsagapgkdglnglpgpigppgprgrtgdagpvgppgppgpp 1186 ..................................irk.rnkdrvrkildn +++++++ + + + +++++++ +++++ + ++ r +d + + tem38_gi|1 1187 gppgppsagfdfsflpqppqekahdggryyraddANVvRDRDLEVDTT-- 1234 ihdnpfswidnqnmlelgllnlTnmtrlgLpilsnldlnkLnlpnlknis lk++s tem38_gi|1 1235 ---------------------------------------------LKSLS 1239 npnstgekiivnfenlhpdFClTteEllnfflnsnvsienleakyCepks ++ +en +++ E+ +++a C + tem38_gi|1 1240 QQ----------IEN------IRSPEGS----------RKNPARTCRDL- 1262 rifflikktdngivyklCnfkslsssvnLdngCtiIfGdLvIgpgdEeyV k+C++ s G ++I+p+ tem38_gi|1 1263 ---------------KMCHSDWKS-------------GEYWIDPNQG--- 1281 skLknveviFGsLiIqNTnLtnidFLenLkyIasLedsvs<-* +L+ +v + n ++ ++ + sv+ tem38_gi|1 1282 CNLDAIKV-------F-CNMETGE-----TCVYPTQPSVA 1308 COLFI: domain 1 of 1, from 1245 to 1463: score 565.2, E = 2e-220 *->lksPeGksrknPARtCkDLfLchpefksGeYWiDPNqGCikDAikVf ++sPeG srknPARtC+DL++ch+++ksGeYWiDPNqGC++DAikVf tem38_gi|1 1245 IRSPEG-SRKNPARTCRDLKMCHSDWKSGEYWIDPNQGCNLDAIKVF 1290 CnkrfetGvgeTCisptpksvpkRiksWykgks.kdkKhvWFgetmegGf Cn +etG eTC++pt+ sv++ 
k+Wy +k++kdk+hvWFge+m++Gf tem38_gi|1 1291 CN--METG--ETCVYPTQPSVAQ--KNWYISKNpKDKRHVWFGESMTDGF 1334 kfsYiddelnpeisnvQlTFLRLLSteAsQNiTYhCKNSvAYmDeatGNl +f+Y++++++p+++++QlTFLRL+SteAsQNiTYhCKNSvAYmD++tGNl tem38_gi|1 1335 QFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNL 1384 kkAlilmgSnDvElsadgnskFtYtvlGeDGCssrtgewgKTViEyeTkK kkAl+l+gSn++E++a+gns+FtY+v+ +DGC+s+tg+wgKTViEy+T+K tem38_gi|1 1385 KKALLLKGSNEIEIRAEGNSRFTYSVT-VDGCTSHTGAWGKTVIEYKTTK 1433 ttRLPIvDiApsDiGgedQeFGveiGPVCF<-* +RLPI+D+Ap+D+G +dQeFG+++GPVCF tem38_gi|1 1434 SSRLPIIDVAPLDVGAPDQEFGFDVGPVCF 1463 // Start with PfamFrag (from /data/patterns/pfam) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/pfam/PfamFrag Sequence file: tem38 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- Collagen Collagen triple helix repeat (20 copies) 946.7 5.9e-281 18 COLFI Fibrillar collagen C-terminal domain 565.2 2e-220 1 fibrinogen_C Fibrinogen beta and gamma chains, C-term -0.3 50 1 CBIA Cobyrinic acid a,c-diamide synthase -0.7 93 1 LBP_BPI_CETP LBP / BPI / CETP family -0.7 57 1 Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- LBP_BPI_CETP 1/1 7 29 .. 1 23 [. -0.7 57 Collagen 1/18 109 158 .. 1 50 [. 27.3 5e-06 CBIA 1/1 174 189 .. 1 16 [. -0.7 93 Collagen 2/18 177 235 .. 1 60 [] 50.4 2.3e-12 Collagen 3/18 236 295 .. 1 60 [] 75.7 2.5e-19 Collagen 4/18 296 355 .. 1 60 [] 64.9 2.4e-16 Collagen 5/18 356 415 .. 1 60 [] 61.6 1.9e-15 Collagen 6/18 416 475 .. 
1 60 [] 61.1 2.6e-15 Collagen 7/18 476 535 .. 1 60 [] 63.9 4.4e-16 Collagen 8/18 536 595 .. 1 60 [] 64.6 2.9e-16 Collagen 9/18 596 655 .. 1 60 [] 62.1 1.4e-15 Collagen 10/18 656 715 .. 1 60 [] 60.6 3.6e-15 Collagen 11/18 716 775 .. 1 60 [] 70.2 8.4e-18 Collagen 12/18 779 838 .. 1 60 [] 68.4 2.7e-17 Collagen 13/18 839 898 .. 1 60 [] 60.5 4e-15 Collagen 14/18 899 958 .. 1 60 [] 59.2 8.8e-15 Collagen 15/18 959 1018 .. 1 60 [] 62.7 9.9e-16 Collagen 16/18 1020 1078 .. 1 60 [] 54.4 1.8e-13 Collagen 17/18 1079 1138 .. 1 60 [] 73.9 8.1e-19 Collagen 18/18 1139 1192 .. 1 54 [. 40.6 1.2e-09 fibrinogen_C 1/1 1271 1295 .. 18 43 .. -0.3 50 COLFI 1/1 1245 1463 .. 1 226 [] 565.2 2e-220 Alignments of top-scoring domains: LBP_BPI_CETP: domain 1 of 1, from 7 to 29: score -0.7, E = 57 *->alllllvlislavalrtnPgivv<-* ++llll+++ ++++++ +g v+ tem38_gi|1 7 LRLLLLLAATALLTHGQEEGQVE 29 Collagen: domain 1 of 18, from 109 to 158: score 27.3, E = 5e-06 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G Gp G Gp+Gp+Gp+GppG +G pG pG pG+pGpPGppGppG tem38_gi|1 109 GVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGL 155 pGp<-* G+ tem38_gi|1 156 GGN 158 CBIA: domain 1 of 1, from 174 to 189: score -0.7, E = 93 *->almiaGtsSgaGKttl<-* ++ ++G++ +G+++l tem38_gi|1 174 GISVPGPMGPSGPRGL 189 Collagen: domain 2 of 18, from 177 to 235: score 50.4, E = 2.3e-12 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp pGp+Gp Gp+G pGppG+pGp+G++GppG pGepG+ Gp Gp Gp tem38_gi|1 177 -VPGPMGPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGP 222 pGppGapGapGpp<-* pGppG+ G+ G++ tem38_gi|1 223 PGPPGKNGDDGEA 235 Collagen: domain 3 of 18, from 236 to 295: score 75.7, E = 2.5e-19 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG+pG++GppGp G++G pG aG pG++G++G++G +G++G +Gp tem38_gi|1 236 GKPGRPGERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGP 282 pGppGapGapGpp<-* +Gp+G+pG+pG++ tem38_gi|1 283 AGPKGEPGSPGEN 295 Collagen: domain 4 of 18, from 296 to 355: score 64.9, E = 2.4e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp 
G+pG++Gp+G+pG++G+pG+pGpaGa+G+ G+ G++GpPGp Gp+Gp tem38_gi|1 296 GAPGQMGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGP 342 pGppGapGapGpp<-* pG pGa Ga+G++ tem38_gi|1 343 PGFPGAVGAKGEA 355 Collagen: domain 5 of 18, from 356 to 415: score 61.6, E = 1.9e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp Gp+G+ Gp+G +G+pGppGpaGa+Gp+G+pG++G+PG++G++G+ tem38_gi|1 356 GPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGANGA 402 pGppGapGapGpp<-* pG +GapG pG++ tem38_gi|1 403 PGIAGAPGFPGAR 415 Collagen: domain 6 of 18, from 416 to 475: score 61.1, E = 2.6e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp Gp+Gp GppGp+G++G+pG++G++G+ G++GepGp G +GppGp tem38_gi|1 416 GPSGPQGPGGPPGPKGNSGEPGAPGSKGDTGAKGEPGPVGVQGPPGP 462 pGppGapGapGpp<-* +G++G+ Ga G+p tem38_gi|1 463 AGEEGKRGARGEP 475 Collagen: domain 7 of 18, from 476 to 535: score 63.9, E = 4.4e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp G+pGppG++G pG +G+pG++G +Gp+Gp+Ge+G+PGp+Gp G+ tem38_gi|1 476 GPTGLPGPPGERGGPGSRGFPGADGVAGPKGPAGERGSPGPAGPKGS 522 pGppGapGapGpp<-* pG++G+pG++G p tem38_gi|1 523 PGEAGRPGEAGLP 535 Collagen: domain 8 of 18, from 536 to 595: score 64.6, E = 2.9e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G++G+ G+pG pGp+G+ GppGpaG G pGppG+pG+ G++G++G+ tem38_gi|1 536 GAKGLTGSPGSPGPDGKTGPPGPAGQDGRPGPPGPPGARGQAGVMGF 582 pGppGapGapGpp<-* pGp+Ga+G+pG++ tem38_gi|1 583 PGPKGAAGEPGKA 595 Collagen: domain 9 of 18, from 596 to 655: score 62.1, E = 1.4e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G++G pGppG+ Gp+G+ G++G++G+pGp+Gp+Ge+G++Gp+G pG+ tem38_gi|1 596 GERGVPGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 642 pGppGapGapGpp<-* +G pG++G+pG++ tem38_gi|1 643 QGLPGPAGPPGEA 655 Collagen: domain 10 of 18, from 656 to 715: score 60.6, E = 3.6e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG++G pG+ G+pGp+G+ G++G+pG++G +G+pGp Gp+G++G+ tem38_gi|1 656 GKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGANGA 702 pGppGapGapGpp<-* pG++Ga+G++G+p tem38_gi|1 703 PGNDGAKGDAGAP 715 Collagen: domain 11 of 18, from 716 to 
775: score 70.2, E = 8.4e-18 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G+pG++G+pG++G+pG++G++G +G++G++G++G++G++G+pG++G tem38_gi|1 716 GAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGSPGKDGV 762 pGppGapGapGpp<-* +G +G++G+pGp+ tem38_gi|1 763 RGLTGPIGPPGPA 775 Collagen: domain 12 of 18, from 779 to 838: score 68.4, E = 2.7e-17 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G +G+ Gp Gp+Gp+G++G+pG++G+pGppGp+G++GpPG++G+pG+ tem38_gi|1 779 GDKGESGPSGPAGPTGARGAPGDRGEPGPPGPAGFAGPPGADGQPGA 825 pGppGapGapGpp<-* +G+pG +Ga+G + tem38_gi|1 826 KGEPGDAGAKGDA 838 Collagen: domain 13 of 18, from 839 to 898: score 60.5, E = 4e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp GppGp+Gp+GppGp G+ G+pG++Ga+G++GppG+ G+PG++G+ Gp tem38_gi|1 839 GPPGPAGPAGPPGPIGNVGAPGAKGARGSAGPPGATGFPGAAGRVGP 885 pGppGapGapGpp<-* pGp G++G+pGpp tem38_gi|1 886 PGPSGNAGPPGPP 898 Collagen: domain 14 of 18, from 899 to 958: score 59.2, E = 8.8e-15 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp+G+ G++Gp+G++Gp+G pG+ G+pGppGp+Ge+G+PG++Gp+G+ tem38_gi|1 899 GPAGKEGGKGPRGETGPAGRPGEVGPPGPPGPAGEKGSPGADGPAGA 945 pGppGapGapGpp<-* pG pG+ G +G++ tem38_gi|1 946 PGTPGPQGIAGQR 958 Collagen: domain 15 of 18, from 959 to 1018: score 62.7, E = 9.9e-16 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp G G+pG +G++G pG pGp G++G++Gp G++Ge+GpPGp GppG tem38_gi|1 959 GVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPMGPPGL 1005 pGppGapGapGpp<-* +GppG++G +G+p tem38_gi|1 1006 AGPPGESGREGAP 1018 Collagen: domain 16 of 18, from 1020 to 1078: score 54.4, E = 1.8e-13 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp + G+pG+ G pG++G++G++GpaG+pG pG+pG+pGp Gp+G+ G tem38_gi|1 1020 -AEGSPGRDGSPGAKGDRGETGPAGPPGAPGAPGAPGPVGPAGKSGD 1065 pGppGapGapGpp<-* +G++G++G++Gp+ tem38_gi|1 1066 RGETGPAGPAGPV 1078 Collagen: domain 17 of 18, from 1079 to 1138: score 73.9, E = 8.1e-19 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp G +Gp+Gp+Gp+G++G++G++G +G +G++G++G +GppGppG+ tem38_gi|1 1079 GPVGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPPGPPGS 1125 
pGppGapGapGpp<-* pG++G++Ga Gp+ tem38_gi|1 1126 PGEQGPSGASGPA 1138 Collagen: domain 18 of 18, from 1139 to 1192: score 40.6, E = 1.2e-09 *->GppGppGppGppGppGppGppGpaGapGppGppGepGpPGppGppGp Gp+GppG++G+pG +G G pGp G+pGp+G G++Gp GppGppGp tem38_gi|1 1139 GPRGPPGSAGAPGKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPGP 1185 pGppGap<-* pGppG+p tem38_gi|1 1186 PGPPGPP 1192 fibrinogen_C: domain 1 of 1, from 1271 to 1295: score -0.3, E = 50 *->SPPGlYtIqPd.gakeqpllVYCDmet<-* S G Y I P++g + +++V+C met tem38_gi|1 1271 S--GEYWIDPNqGCNLDAIKVFCNMET 1295 COLFI: domain 1 of 1, from 1245 to 1463: score 565.2, E = 2e-220 *->lksPeGksrknPARtCkDLfLchpefksGeYWiDPNqGCikDAikVf ++sPeG srknPARtC+DL++ch+++ksGeYWiDPNqGC++DAikVf tem38_gi|1 1245 IRSPEG-SRKNPARTCRDLKMCHSDWKSGEYWIDPNQGCNLDAIKVF 1290 CnkrfetGvgeTCisptpksvpkRiksWykgks.kdkKhvWFgetmegGf Cn +etG eTC++pt+ sv++ k+Wy +k++kdk+hvWFge+m++Gf tem38_gi|1 1291 CN--METG--ETCVYPTQPSVAQ--KNWYISKNpKDKRHVWFGESMTDGF 1334 kfsYiddelnpeisnvQlTFLRLLSteAsQNiTYhCKNSvAYmDeatGNl +f+Y++++++p+++++QlTFLRL+SteAsQNiTYhCKNSvAYmD++tGNl tem38_gi|1 1335 QFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQTGNL 1384 kkAlilmgSnDvElsadgnskFtYtvlGeDGCssrtgewgKTViEyeTkK kkAl+l+gSn++E++a+gns+FtY+v+ +DGC+s+tg+wgKTViEy+T+K tem38_gi|1 1385 KKALLLKGSNEIEIRAEGNSRFTYSVT-VDGCTSHTGAWGKTVIEYKTTK 1433 ttRLPIvDiApsDiGgedQeFGveiGPVCF<-* +RLPI+D+Ap+D+G +dQeFG+++GPVCF tem38_gi|1 1434 SSRLPIIDVAPLDVGAPDQEFGFDVGPVCF 1463 // Start with Repeat Library (from /data/patterns/repeats-Miguel-Andrade/hmm) hmmpfam - search a single seq against HMM database HMMER 2.1.1 (Dec 1998) Copyright (C) 1992-1998 Washington University School of Medicine HMMER is freely distributed under the GNU General Public License (GPL). 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - HMM file: /data/patterns/repeats-Miguel-Andrade/hmm/repeats.hmm-lib Sequence file: tem38 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] Scores for sequence family classification (score includes all domains): Model Description Score E-value N -------- ----------- ----- ------- --- [no hits above thresholds] Parsed for domains: Model Domain seq-f seq-t hmm-f hmm-t score E-value -------- ------- ----- ----- ----- ----- ----- ------- [no hits above thresholds] Alignments of top-scoring domains: [no hits above thresholds] // ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ Start with Prosite --------------------------------------------------------- | ppsearch (c) 1994 EMBL Data Library | | based on MacPattern (c) 1990-1994 R. Fuchs | --------------------------------------------------------- PROSITE pattern search started: Tue Oct 31 18:41:24 2000 Sequence file: tem38 ---------------------------------------- Sequence tem38_gi|1418928|emb|CAA98968.1| (1464 residues): Matching pattern PS00001 ASN_GLYCOSYLATION: 1365: NITY Total matches: 1 Matching pattern PS00005 PKC_PHOSPHO_SITE: 1012: SGR 1234: TLK 1251: SRK 1258: TCR 1431: TTK 1434: SSR Total matches: 6 Matching pattern PS00006 CK2_PHOSPHO_SITE: 3: SFVD 101: SPTD 103: TDQE 108: TGVE 271: SGLD 291: SPGE 441: SKGD 522: SPGE 1012: SGRE 1125: SPGE 1258: TCRD 1329: SMTD 1425: TVIE Total matches: 13 Matching pattern PS00007 TYR_PHOSPHO_SITE: 1208: KAHDGGRY Total matches: 1 Matching pattern PS00008 MYRISTYL: 22: GQEEGQ 26: GQVEGQ 154: GLGGNF 254: GLPGTA 272: GLDGAK 320: GARGND 323: GNDGAT 326: GATGAA 347: GAVGAK 386: GNPGAD 392: GQPGAK 395: GAKGAN 437: GAPGSK 488: GGPGSR 533: GLPGAK 701: GAPGND 704: GNDGAK 716: GAPGSQ 821: GQPGAK 857: GAPGAK 860: GAKGAR 863: GARGSA 935: GSPGAD 1016: GAPAAE 1028: GSPGAK 1339: GGQGSD 1342: GSDPAD 
Total matches: 27

Matching pattern PS00009 AMIDATION:
  466: EGKR
Total matches: 1

Matching pattern PS00016 RGD:
  745: RGD
 1093: RGD
Total matches: 2

Matching pattern PS01208 VWFC:
   58: CRICVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVC
Total matches: 1

Total no of hits in this sequence: 52

========================================

1314 pattern(s) searched in 1 sequence(s), 1464 residues.
Total no of hits in all sequences: 52.
Search time: 00:00 min
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with Profile Search
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with motif search against own library

***** bioMotif : Version V41a DB, 1999 Nov 11 *****
argv[1]=P
argv[2]=-m /data/patterns/own/motif.fa
argv[4]=-seq tem38
***** bioMotif : Version V41a DB, 1999 Nov 11 *****
SeqTyp=2 : PROTEIN search;
>APC D-Box is the MOTIF name
>STATISTICS Total : 0 solutions in 0 sequences, 0 units; out of 1 sequences, 1464 units
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
Start with HMM-search search against own library

hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:       /data/patterns/own/own-hmm.lib
Sequence file:  tem38
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  tem38_gi|1418928|emb|CAA98968.1|  prepro-alpha1(I) collagen [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]

Alignments of top-scoring domains:
        [no hits above thresholds]
//

hmmpfam - search a single seq against HMM database
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:       /data/patterns/own/own-hmm-f.lib
Sequence file:  tem38
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query:  tem38_gi|1418928|emb|CAA98968.1|  prepro-alpha1(I) collagen [Homo sapiens]

Scores for sequence family classification (score includes all domains):
Model    Description    Score    E-value  N
-------- -----------    -----    ------- ---
        [no hits above thresholds]

Parsed for domains:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
        [no hits above thresholds]

Alignments of top-scoring domains:
        [no hits above thresholds]
//
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~
L. Aravind's signalling DB

IMPALA version 1.1 [20-December-1999]

Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting,
Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999),
"IMPALA: Matching a Protein Sequence Against a Collection of
"PSI-BLAST-Constructed Position-Specific Score Matrices",
Bioinformatics 15:1000-1011.
Query= tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] (1464 letters) Searching..................................done Results from profile search Score E Sequences producing significant alignments: (bits) Value 14-3-3 14-3-3 protein alpha Helical domain 27 0.13 FYVE Zinc Finger domain involved in PtdIns(3)P binding 27 0.18 UBA Ubiquitin pathway associated domain 27 0.23 MATH The Meprin associated TRAF homology domain 26 0.50 RASGAP RAS-type GTPase GTP hydrolysis activating protein 25 0.61 MIZFIN MIZ type Cysteine zinc DNA binding domain 25 0.82 RASGEF RAS-type GTPase GDP exchange factor 24 1.2 SET Su(var)3-9, Enhancer of Zeste, trithorax domain (A chrom... 23 2.1 BRIGHT BRIGHT domain (Alpha helical DNA binding domain) 23 2.4 DHHC Novel zinc finger domain with DHHC signature 22 4.3 PHD PHD zinc finger(A cysteine rich DNA binding domain) 22 4.9 INSL Insulinase like Metallo protease domain 21 8.9 >14-3-3 14-3-3 protein alpha Helical domain Length = 270 Score = 27.3 bits (60), Expect = 0.13 Identities = 5/27 (18%), Positives = 5/27 (18%) Query: 820 DGQPGAKGEPGDAGAKGDAGPPGPAGP 846 G P G A P Sbjct: 240 SAAAAGGNTEGAQENAPSNAPEGEAEP 266 >FYVE Zinc Finger domain involved in PtdIns(3)P binding Length = 99 Score = 27.0 bits (59), Expect = 0.18 Identities = 14/41 (34%), Positives = 19/41 (46%), Gaps = 10/41 (24%) Query: 59 RICVCDN-GKVLCDDVICDETKNCPGAEVPE---GECCPVC 95 R+ D GK++C D+ NC E PE +CC C Sbjct: 2 RLFSADEHGKLMCWDM------NCKRVETPEWKTSDCCQKC 36 >UBA Ubiquitin pathway associated domain Length = 255 Score = 26.6 bits (58), Expect = 0.23 Identities = 25/82 (30%), Positives = 30/82 (36%), Gaps = 5/82 (6%) Query: 813 FAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGAKGARGSAGPPG 872 G P QP EP A P A A P ++ A A+G S+G G Sbjct: 57 LMGIPENLRQP----EPQQQTAAAAEQPSTAATTAEQPAED-DLFAQAAQGGNASSGALG 111 Query: 873 ATGFPGAAGRVGPPGPSGNAGP 894 TG A + GPPG G Sbjct: 112 TTGGATDAAQGGPPGSIGLTVE 133 Score = 22.7 bits (48), Expect = 3.5 Identities = 22/85 (25%), Positives = 
31/85 (35%), Gaps = 8/85 (9%) Query: 972 FPGLPGPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPAAEGSPGRDGSPG 1031 G+P +P Q + A+ E+ P A E E A+ + G + S G Sbjct: 57 LMGIPENLRQPEPQQQTAAAAEQ--------PSTAATTAEQPAEDDLFAQAAQGGNASSG 108 Query: 1032 AKGDRGETGPAGPPGAPGAPGAPGP 1056 A G G A G PG+ G Sbjct: 109 ALGTTGGATDAAQGGPPGSIGLTVE 133 Score = 22.3 bits (47), Expect = 4.2 Identities = 22/74 (29%), Positives = 26/74 (34%), Gaps = 3/74 (4%) Query: 1116 LQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGAPGKDGL--NGLPGPIGPPGPRGRTGD 1173 L G P P Q + A+ P +A P +D L G G G TG Sbjct: 57 LMGIPENLRQPEPQQQTAAAAEQ-PSTAATTAEQPAEDDLFAQAAQGGNASSGALGTTGG 115 Query: 1174 AGPVGPPGPPGPPG 1187 A GPPG G Sbjct: 116 ATDAAQGGPPGSIG 129 >MATH The Meprin associated TRAF homology domain Length = 209 Score = 25.6 bits (56), Expect = 0.50 Identities = 7/18 (38%), Positives = 8/18 (43%) Query: 925 PGPPGPAGEKGSPGADGP 942 P PP PA P A+ Sbjct: 5 PSPPPPAEMSSGPVAESW 22 Score = 21.8 bits (46), Expect = 6.8 Identities = 7/16 (43%), Positives = 9/16 (55%) Query: 805 PGPPGPAGFAGPPGAD 820 P PP PA + P A+ Sbjct: 5 PSPPPPAEMSSGPVAE 20 Score = 21.4 bits (45), Expect = 9.7 Identities = 5/14 (35%), Positives = 6/14 (42%) Query: 177 VPGPMGPSGPRGLP 190 VP P P+ P Sbjct: 4 VPSPPPPAEMSSGP 17 >RASGAP RAS-type GTPase GTP hydrolysis activating protein Length = 292 Score = 25.1 bits (54), Expect = 0.61 Identities = 16/61 (26%), Positives = 29/61 (47%), Gaps = 11/61 (18%) Query: 1220 DANVVRDRDLEVDTTLKSLSQQIENI-----RSPEGSRKNPARTCRDLKMCHSDWKSGEY 1274 D + ++DR VDT L +L +E + +S + K + DL+ C +GE+ Sbjct: 137 DPSKIKDRS-AVDTNLHNLQDYVERVFEAITKSADRCPKVLCQIFHDLREC-----AGEH 190 Query: 1275 W 1275 + Sbjct: 191 F 191 >MIZFIN MIZ type Cysteine zinc DNA binding domain Length = 172 Score = 24.6 bits (53), Expect = 0.82 Identities = 18/90 (20%), Positives = 30/90 (33%), Gaps = 17/90 (18%) Query: 58 CRICVCDNGKVLCDDVICD--------ETKNCPGAEV-PEGECCPVCP--DGSESPTDQE 106 C +C + K + +I D + + + +G CP+ P + + T Q Sbjct: 50 CPVC---DKKAAYESLILDGLFMEILNDCSDVDEIKFQEDGSWCPMRPKKEAMKV-TSQP 105 
Query: 107 TTGVEGPKGDTGP--RGPRGPAGPPGRDGI 134 T VE + P A D I Sbjct: 106 CTKVESSSVFSKPCSVTVASDASKKKIDVI 135 >RASGEF RAS-type GTPase GDP exchange factor Length = 196 Score = 24.4 bits (53), Expect = 1.2 Identities = 20/105 (19%), Positives = 31/105 (29%), Gaps = 19/105 (18%) Query: 1344 DPADVAIQLTFLRLMSTEASQNITY-HCKNSVAYMDQQTGNLKKALLLKGSNEIEIRAEG 1402 D VA Q+T L+ E I + + M + + L L NE G Sbjct: 5 DSLSVAQQMT---LIEKEILGEIDWKDLLDLK--MKHEGPQVISWLQLLVRNE---TLSG 56 Query: 1403 NSRFTYSVTVDGCTSHTGAWGKTVIEYKTTKSSRLPI----IDVA 1443 + T W + I + + + I VA Sbjct: 57 IDLAISR------FNLTVDWIISEILLTKSSKMKRNVIQRFIHVA 95 >SET Su(var)3-9, Enhancer of Zeste, trithorax domain (A chromatin associated domain) Length = 219 Score = 23.4 bits (50), Expect = 2.1 Identities = 9/60 (15%), Positives = 16/60 (26%), Gaps = 9/60 (15%) Query: 30 GQDEDIP-PITCVQNGLRYHDRD-----VWKPEPCRICVCDNGKVLCDDVICDETKNCPG 83 G D IP P+ V+ L++ + +C + G Sbjct: 17 GIDSAIPYPVRRVEQLLQFSFLPELQFQNAAVKQRIQRLCYREEKRLA---VSSLAKWLG 73 >BRIGHT BRIGHT domain (Alpha helical DNA binding domain) Length = 172 Score = 23.4 bits (50), Expect = 2.4 Identities = 7/29 (24%), Positives = 7/29 (24%) Query: 413 GARGPSGPQGPGGPPGPKGNSGEPGAPGS 441 G R G P P PG Sbjct: 132 GRRSSYGQYEAMHNQMPMTPISRPSLPGG 160 Score = 22.3 bits (47), Expect = 4.4 Identities = 6/28 (21%), Positives = 7/28 (24%) Query: 881 GRVGPPGPSGNAGPPGPPGPAGKEGGKG 908 GR G P P + G Sbjct: 132 GRRSSYGQYEAMHNQMPMTPISRPSLPG 159 >DHHC Novel zinc finger domain with DHHC signature Length = 217 Score = 22.4 bits (47), Expect = 4.3 Identities = 9/34 (26%), Positives = 11/34 (31%), Gaps = 2/34 (5%) Query: 52 VWKPEPCRIC-VCDNGKVLCDDVICDETKNCPGA 84 V + C C+ V D C NC G Sbjct: 141 VDVSARSKHCSACNK-CVCGFDHHCKWLNNCVGE 173 >PHD PHD zinc finger(A cysteine rich DNA binding domain) Length = 54 Score = 22.3 bits (47), Expect = 4.9 Identities = 12/53 (22%), Positives = 16/53 (29%), Gaps = 17/53 (32%) Query: 58 CRICVCDNGK-----VLCDDVICDET--KNCPG-------AEVPEGE-CCPVC 95 C +C V CD C+ + C + P GE C C Sbjct: 3 
CSVCQRLQSPPKNRIVFCDG--CNTPFHQLCHEPYISDELLDSPNGEWFCDDC 53 >INSL Insulinase like Metallo protease domain Length = 433 Score = 21.4 bits (45), Expect = 8.9 Identities = 5/47 (10%), Positives = 13/47 (27%), Gaps = 1/47 (2%) Query: 1214 RYYRADDANVVRDRDLEVDTTLKSLSQQIENIR-SPEGSRKNPARTC 1259 +Y+ + VV + + + + P + P Sbjct: 196 SFYQPRNMAVVIVGKVNPKEVEEEVMKTFGKEEGRPVPKVQIPTEPE 242 Underlying Matrix: BLOSUM62 Number of sequences tested against query: 105 Number of sequences better than 10.0: 12 Number of calls to ALIGN: 17 Length of query: 1464 Total length of test sequences: 20182 Effective length of test sequences: 16637.0 Effective search space size: 23806017.2 Initial X dropoff for ALIGN: 25.0 bits Y. Wolf's SCOP PSSM IMPALA version 1.1 [20-December-1999] Reference: Alejandro A. Schaffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin, L. Aravind, Stephen F. Altschul (1999), "IMPALA: Matching a Protein Sequence Against a Collection of "PSI-BLAST-Constructed Position-Specific Score Matrices", Bioinformatics 15:1000-1011. Query= tem38_gi|1418928|emb|CAA98968.1| prepro-alpha1(I) collagen [Homo sapiens] (1464 letters) Searching.................................................done Results from profile search Score E Sequences producing significant alignments: (bits) Value gi|230410 [1..153] beta-Trefoil 32 0.069 gi|155099 [19..420] S-adenosyl-L-methionine-dependent methyl... 
28 1.1 gi|1170529 [121..268] beta-Trefoil 28 1.4 gi|544107 [14..282] Protein kinases (PK), catalytic core 27 1.5 gi|1825699 [8..257] Ribonuclease H-like motif 26 4.1 gi|223347 [1..236] Prealbumin-like 26 5.4 gi|442904 [1..106] Ferredoxin-like 25 9.0 >gi|230410 [1..153] beta-Trefoil Length = 153 Score = 31.9 bits (72), Expect = 0.069 Identities = 11/38 (28%), Positives = 18/38 (46%), Gaps = 6/38 (15%) Query: 1306 SVAQKNWYISKNPKDKRHVWFG-----ESMTDGFQFEY 1338 S NWYIS + + V+ G + +TD F ++ Sbjct: 114 SAQFPNWYISTSQAENMPVFLGGTKGGQDITD-FTMQF 150 >gi|155099 [19..420] S-adenosyl-L-methionine-dependent methyltransferases Length = 402 Score = 27.9 bits (61), Expect = 1.1 Identities = 3/64 (4%), Positives = 12/64 (18%), Gaps = 2/64 (3%) Query: 15 ATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEP--CRICVCDNGKVLCDD 72 A + G + D + + + + + + + Sbjct: 34 LRAFREAHGTGYRFVGVEIDPHALDLPPWAEGVVADFLLWEPGEAFDLILGNPPYGIVGE 93 Query: 73 VICD 76 Sbjct: 94 ASKY 97 >gi|1170529 [121..268] beta-Trefoil Length = 148 Score = 27.6 bits (61), Expect = 1.4 Identities = 13/62 (20%), Positives = 20/62 (31%), Gaps = 9/62 (14%) Query: 1277 DPNQGCNLDAIK--VFCNMETGETCVYPTQPSVAQKNWYISKNPKDKRHVWFG-ESMTDG 1333 P + + T SVA N +I+ + + G S+TD Sbjct: 90 IPKTTTGGETNSLSSWETRGTK-----NYFISVAHPNLFIATKHDNWVCLAKGLPSITD- 143 Query: 1334 FQ 1335 FQ Sbjct: 144 FQ 145 >gi|544107 [14..282] Protein kinases (PK), catalytic core Length = 269 Score = 27.4 bits (59), Expect = 1.5 Identities = 5/61 (8%), Positives = 9/61 (14%), Gaps = 6/61 (9%) Query: 1259 CRDLKMCHS------DWKSGEYWIDPNQGCNLDAIKVFCNMETGETCVYPTQPSVAQKNW 1312 L+ D +D G + + Sbjct: 101 SSALEYLEKHGILHRDIHPNNILLDSMNGPAYLSDFSIAWSKQHPGEEVQELIPQIGTGH 160 Query: 1313 Y 1313 Y Sbjct: 161 Y 161 >gi|1825699 [8..257] Ribonuclease H-like motif Length = 250 Score = 26.3 bits (57), Expect = 4.1 Identities = 6/47 (12%), Positives = 10/47 (20%), Gaps = 4/47 (8%) Query: 1376 YMDQQTGNLKKALLLKGSNEIEIRAEGNSRFTYSVTVDGCTSHTGAW 1422 Q +KK + G S+ + T Sbjct: 52 
LFLQFLRVIKK--AYETLPPNAHVDVGLCTQRNSIVLWN--KRTLKE 94 >gi|223347 [1..236] Prealbumin-like Length = 236 Score = 25.7 bits (56), Expect = 5.4 Identities = 6/38 (15%), Positives = 8/38 (20%) Query: 357 PQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQP 394 PQ + GP G G+ Sbjct: 40 PQSISETTGPNFSHLGFGAHDHDLLLNFNNGGLPIGER 77 >gi|442904 [1..106] Ferredoxin-like Length = 106 Score = 24.9 bits (53), Expect = 9.0 Identities = 10/67 (14%), Positives = 16/67 (22%), Gaps = 11/67 (16%) Query: 39 TCVQNGLRYHDRDVWKPEP----CRICV--CDNG-KVLCDDVICDETKNCPGAEVPEGEC 91 C + + C +C C D+V D + E Sbjct: 19 VCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLN----AEL 74 Query: 92 CPVCPDG 98 V P+ Sbjct: 75 AEVWPNI 81 Underlying Matrix: BLOSUM62 Number of sequences tested against query: 1187 Number of sequences better than 10.0: 7 Number of calls to ALIGN: 7 Length of query: 1464 Total length of test sequences: 256703 Effective length of test sequences: 210706.0 Effective search space size: 300338576.9 Initial X dropoff for ALIGN: 25.0 bits
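
Note on the statistics footers: the E-values in the two IMPALA reports can be roughly cross-checked against the bit scores and the "Effective search space size" printed in each footer, using the standard relation E = (search space) * 2^(-bit score). The sketch below is only an approximate sanity check, not IMPALA's own calculation. The function name expected_hits is hypothetical, and the numbers are copied from the reports above. Small differences from the reported E-values are to be expected, presumably because the printed bit scores are rounded.

```python
def expected_hits(bit_score: float, search_space: float) -> float:
    """Approximate E-value from a bit score: E = search_space * 2**(-bit_score)."""
    return search_space * 2.0 ** (-bit_score)


# Effective search space sizes copied from the two statistics footers.
SIGNALLING_SPACE = 23806017.2   # L. Aravind's signalling DB
SCOP_SPACE = 300338576.9        # Y. Wolf's SCOP PSSM

# (label, search space, bit score, E-value as reported by IMPALA)
REPORTED = [
    ("14-3-3", SIGNALLING_SPACE, 27.3, 0.13),
    ("FYVE", SIGNALLING_SPACE, 27.0, 0.18),
    ("gi|230410 beta-Trefoil", SCOP_SPACE, 31.9, 0.069),
    ("gi|442904 Ferredoxin-like", SCOP_SPACE, 24.9, 9.0),
]

if __name__ == "__main__":
    for label, space, bits, reported in REPORTED:
        estimate = expected_hits(bits, space)
        print(f"{label:28s} bits={bits:5.1f} "
              f"estimated E={estimate:8.3g} reported E={reported:g}")
```

None of the hits in either report has an E-value below 0.01, so these matches are best read as background noise against the collagen repeat region and the cysteine-rich N-terminal segment rather than as evidence for the listed domains.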