How to use the DAS-TMfilter prediction tool


Read this page when you are using the DAS-TMfilter prediction server for the first time or when the server has been updated since your last visit. (See the time-stamp on the main page.) Read about the theory behind the server here.


Input format

The DAS-TMfilter prediction tool takes FASTA-format protein queries as input. Each entry starts with one header line, which begins with the ">" character and continues with an arbitrary comment until the end of the line. This text identifies your query. The header is followed by the protein sequence itself, written in the single-letter, upper-case code for the residues. You can break the sequence into as many lines as you like. The sequence must not contain any characters other than the residue codes (no numbering, no spaces, etc.). The end of an entry is marked by the header of the next entry or by the last line of the input.

This is a sample query with five entries (the output of this query is used later in this documentation to demonstrate various features of the server):

> My first protein ID: prot0001
MSSNAQVKTPLPPAPAPKKESNFLIDFLMGGVSAAVAKTAASPIERVKLLIQNQDEMLKQ
GTLDRKYAGILDCFKRTATQEGVISFWRGNTANVIRYFPTQALNFAFKDKIKAMFGFKKE
EGYAKWFAGNLASGGAAGALSLLFVYSLDYARTRLAADSKSSKKGGARQFNGLIDVYKKT
LKSDGVAGLYRGFLPSVVGIVVYRGLYFGMYDSLKPLLLTGSLEGSFLASFLLGWVVTTG
ASTCSYPLDTVRRRMMMTSGQAVKYDGAFDCLRKIVAAEGVGSLFKGCGANILRGVAGAG
VISMYDQLQMILFGKKFK
> Here goes the second one ID: prot0002
MSTSKSENYLSELRKIIWPIEQYENKKFLPLAFMMFCILLNYSTLRSIKDGFVVTDIGTE
SISFLKTYIVLPSAVIAMIIYVKLCDILKQENVFYVITSFFLGYFALFAFVLYPYPDLVH
PDHKTIESLSLAYPNFKWFIKIVGKWSFASFYTIAELWGTMMLSLLFWQFANQITKIAEA
KRFYSMFGLLANLALPVTSVVIGYFLHEKTQIVAEHLKFVPLFVIMITSSFLIILTYRWM
NKNVLTDPRLYDPALVKEKKTKAKLSFIESLKMIFTSKYVGYIALLIIAYGVSVNLVEGV
WKSKVKELYPTKEAYTIYMGQFQFYQGWVAIAFMLIGSNILRKVSWLTAAMITPLMMFIT
GAAFFSFIFFDSVIAMNLTGILASSPLTLAVMIGMIQNVLSKGVKYSLFDATKNMAYIPL
DKDLRVKGQAAVEVIGGRLGKSGGAIIQSTFFILFPVFGFIEATPYFASIFFIIVILWIF
AVKGLNKEYQVLVNKNEK
> Non-TM protein with a fasle positive peak ID: glob0001
MKDNTVPLKLIALLANGEFHSGEQLGETLGMSRAAINKHIQTLRDWGVDVFTVPGKGYSL
PEPIQLLNAKQILGQLDGGSVAVLPVIDSTNQYLLDRIGELKSGDACIAEYQQAGSPFGA
NLYLSMFWRLEQPAAAIGLSLVIGIVMAEVLRKLGADKVRVKWPNDLYLQDRKLAGILVE
LTGAAQIVIGAGINMAMWITLQEAGINLDRNTLAAMLIRELRAALELFEQEGLAPYLSRW
EKLDNFINRPVKLIIGDKEIFGISRGIDKQGALLLEQDGIIKPWMGGEISLR
> One more globular ID: glob0002
SVGTSCIPGMAIPHNPLDSCRWYVSTRTCGVGPRLATQEMKARCCRQLEAIPAYCRCEAV
RILMDGVVTSSGQHEGRLLQDLPGCPRQVQRAFAPKLVTEVECNLATIHGGPFCLSLLGA
GE
> The last one, globular ID: glob0003
MLDKIVIANRGEIALRILRACKELGIKTVAVHSSADRDLKHVLLADETVCIGPAPSVKSY
LNIPAIISAAEITGAVAIHPGYGFLSENANFAEQVERSGFIFIGPKAETIRLMGDKVSAI
AAMKKAGVPCVPGSDGDDMDKNRAIAKRIGYPVIIKRVVRGDAELAQSISMTRAYMEKYL
ENPRHVEIQVLADGQGNAIYLAERDCSMQRRHQKVVEEAPAPGITPELRRYIGERCAKAC
VDIGYRGAGTFEFLFENGEFYFIEMNTRIQVEHPVTEMITGVDLIKEQLRIAAGQPLSIK
QEEVHVRGHAVECRINAEDPNTFLPSPGKITRFHAPGGFGVRWESHIYAGYTVPPYYDSM
IGKLICYGENRDVAIARMKNALQELIIDGIKTNVDLQIRIMNDENFQHGGTNIHYLEKKL
GLQE
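The format rules above can be checked locally before a query is submitted. The following is only a minimal sketch, not part of the server: the function name `parse_fasta` is hypothetical, and the accepted alphabet is assumed to be the 20 standard one-letter amino-acid codes, which this page does not state explicitly. Blank lines are tolerated here as a convenience.

```python
# Assumption: the 20 standard one-letter amino-acid codes. The page only says
# "residue codes", so the server may accept a different set.
RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs; raise ValueError on bad input."""
    entries = []
    header, chunks = None, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines (a convenience, not a documented rule)
        if line.startswith(">"):
            # A new header closes the previous entry, if there is one.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError(f"line {lineno}: sequence before any '>' header")
            bad = set(line) - RESIDUES
            if bad:
                # Catches numbering, spaces, lower case and other stray characters.
                raise ValueError(f"line {lineno}: unexpected characters {sorted(bad)}")
            chunks.append(line)
    # The last entry is closed by the end of the input.
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries
```

Calling `parse_fasta` on the sample query above should return five pairs, one per entry, with each sequence joined into a single string.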

Output switches

The operation of the server is controlled by three sets of alternative switches. The first one controls the format: you can have "short" format output or a more detailed "long" one. The other two affect the evaluation of the query.

In the output, the list of predicted non-TM-protein codes is echoed first, followed by the list of TM-protein codes (if any). The "=== Result of the prediction ===" line marks the start of the list of results. For each entry the header line appears first, with the user-defined comment in it. It is followed by the number of detected TM-helix segments and the Q-score of the entry. Then the peaks of the DAS-curve above the empirical cutoff limit are listed, one peak per line. Each peak line starts with "@", followed by the position of the peak in the sequence, the value of the DAS-curve at this point, the core segment (i.e. the portion of the curve above the cutoff) and the "E-value" of the peak. (The E-value is the probability that the peak is a false positive hit.) The lines may contain warnings too. The entries in the query are separated by "end of list" lines.

The short output for the sample query above looks like this:

Calculating prediction for the following proteins:

> My first protein ID%3A prot0001
> Here goes the second one ID%3A prot0002
> Non-TM protein with a fasle positive peak ID%3A glob0001
> One more globular ID%3A glob0002
> The last one%2C globular ID%3A glob0003

... Done.

*** List of predicted non-TM-protein codes ***

> Non-TM protein with a fasle positive peak ID%3A glob0001
> One more globular ID%3A glob0002
> The last one%2C globular ID%3A glob0003

*** List of predicted TM-protein codes ***

> My first protein ID%3A prot0001
> Here goes the second one ID%3A prot0002

=== Result of the prediction ===

> My first protein ID%3A prot0001
# TMH:  3 Q: trusted
@  142   3.014 core:  138 ..  146 1.097e-02
@  200   3.148 core:  194 ..  205 6.850e-03
@  232   3.317 core:  225 ..  238 3.773e-03

<-------- end of list -------->


> Here goes the second one ID%3A prot0002
# TMH: 12 Q: trusted
@   37   3.932 core:   31 ..   41 6.741e-04
@   76   4.348 core:   67 ..   83 1.551e-04
@  104   5.351 core:   95 ..  114 4.504e-06
@  163   2.886 core:  158 ..  166 2.705e-02
@  195   3.842 core:  188 ..  203 9.274e-04
@  227   5.103 core:  219 ..  236 1.082e-05
@  286   4.880 core:  279 ..  294 2.374e-05
@  334   3.146 core:  330 ..  337 1.080e-02
@  364   4.052 core:  349 ..  394 4.405e-04 Twin peaks - two TMH with a short linker
@  389   3.344 core:  349 ..  394 5.376e-03
@  454   4.140 core:  449 ..  482 3.234e-04 Twin peaks - two TMH with a short linker
@  474   6.781 core:  449 ..  482 2.893e-08

<-------- end of list -------->


> Non-TM protein with a fasle positive peak ID%3A glob0001
# TMH:  1 Q:  0.53 !!! Warning! Non-TM protein!
@  143   3.014 core:  139 ..  147 1.009e-02

<-------- end of list -------->


> One more globular ID%3A glob0002
# TMH:  0 Q: trusted !!! Warning! Non-TM protein!

<-------- end of list -------->


> The last one%2C globular ID%3A glob0003
# TMH:  0 Q: trusted !!! Warning! Non-TM protein!

<-------- end of list -------->


Note: non-alphanumeric characters in the comment sections of the headers are echoed in URL-encoded (percent-encoded) form; for example, ":" is converted to "%3A" and "," to "%2C".
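The short output is regular enough to be read by a script. The sketch below is based only on the sample shown above, not on a formal specification of the output; the function name `parse_results` and the dictionary keys are hypothetical. It decodes the percent-encoded headers with the standard `urllib.parse.unquote` and stores "Q: trusted" as `None`.

```python
import re
from urllib.parse import unquote

# Patterns inferred from the sample output; the real format may vary.
SUMMARY_RE = re.compile(r"^#\s*TMH:\s*(\d+)\s+Q:\s*(\S+)\s*(.*)$")
PEAK_RE = re.compile(
    r"^@\s*(\d+)\s+([\d.]+)\s+core:\s*(\d+)\s*\.\.\s*(\d+)\s+(\S+)\s*(.*)$"
)
RESULT_MARKER = "=== Result of the prediction ==="


def parse_results(text):
    """Return one dict per entry listed after the result marker."""
    # Only the part after the marker holds per-entry results; if the marker
    # is missing, the whole text is scanned.
    body = text.split(RESULT_MARKER, 1)[-1]
    entries, current = [], None
    for raw in body.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            current = {"header": unquote(line[1:].strip()), "tmh": None,
                       "q": None, "warning": "", "peaks": []}
            entries.append(current)
            continue
        if current is None:
            continue
        summary = SUMMARY_RE.match(line)
        if summary:
            current["tmh"] = int(summary.group(1))
            q = summary.group(2)
            current["q"] = None if q == "trusted" else float(q)
            current["warning"] = summary.group(3)
            continue
        peak = PEAK_RE.match(line)
        if peak:
            current["peaks"].append({
                "position": int(peak.group(1)),
                "value": float(peak.group(2)),
                "core": (int(peak.group(3)), int(peak.group(4))),
                "e_value": float(peak.group(5)),
                "note": peak.group(6),
            })
    return entries
```

Lines that match none of the patterns, such as the "end of list" separators, are simply skipped.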

The long output format contains all the information of the short format plus an ASCII-graphic representation of the DAS-curve. Each line contains one residue number, the value of the DAS-curve at that position and a marker string whose length is proportional to the value of the curve. The empirical cutoff value (2.5) is marked by the "|" character. Values higher than 5.0 are marked as "....5+".
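As an illustration of how the marker string relates to the curve value, the sketch below reproduces the markers seen in the sample output. It is an inference from that sample, not the server's code: it assumes one character per 0.1 units of the curve, rounding to the nearest character, and a fixed ruler that ends in "5+". The names `RULER` and `das_marker` are hypothetical.

```python
# Ruler inferred from the sample: ':' at each half unit, a digit at each
# whole unit, '|' at the 2.5 cutoff and '+' for anything beyond 5.0.
RULER = "....:....1....:....2....|....3....:....4....:....5+"


def das_marker(value):
    """Return the marker string for one DAS-curve value (assumed rounding)."""
    length = int(value * 10 + 0.5)             # nearest tenth, one char each
    length = max(0, min(length, len(RULER)))   # clamp to the ruler
    return RULER[:length]


for residue, value in [(30, 2.381), (31, 2.789), (104, 5.390)]:
    print(f"{residue:5d} {value:7.3f} {das_marker(value)}")
```

Under these assumptions a value just above the cutoff, such as 2.789, yields a marker that ends shortly after the "|" character.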

Here are the first lines of the second entry of the sample query in long format:


.... snip ....

> Here goes the second one ID%3A prot0002
# TMH: 12 Q: trusted
@   37   3.932 core:   31 ..   41 6.741e-04
@   76   4.348 core:   67 ..   83 1.551e-04
@  104   5.351 core:   95 ..  114 4.504e-06
@  163   2.886 core:  158 ..  166 2.705e-02
@  195   3.842 core:  188 ..  203 9.274e-04
@  227   5.103 core:  219 ..  236 1.082e-05
@  286   4.880 core:  279 ..  294 2.374e-05
@  334   3.146 core:  330 ..  337 1.080e-02
@  364   4.052 core:  349 ..  394 4.405e-04 Twin peaks - two TMH with a short linker
@  389   3.344 core:  349 ..  394 5.376e-03
@  454   4.140 core:  449 ..  482 3.234e-04 Twin peaks - two TMH with a short linker
@  474   6.781 core:  449 ..  482 2.893e-08
    1   0.039 
    2   0.071 .
    3   0.100 .
    4   0.132 .
    5   0.173 ..
    6   0.227 ..
    7   0.279 ...
    8   0.366 ....
    9   0.464 ....:
   10   0.542 ....:
   11   0.591 ....:.
   12   0.636 ....:.
   13   0.697 ....:..
   14   0.685 ....:..
   15   0.690 ....:..
   16   0.723 ....:..
   17   0.727 ....:..
   18   0.686 ....:..
   19   0.636 ....:.
   20   0.604 ....:.
   21   0.525 ....:
   22   0.480 ....:
   23   0.483 ....:
   24   0.505 ....:
   25   0.586 ....:.
   26   0.756 ....:...
   27   1.088 ....:....1.
   28   1.614 ....:....1....:.
   29   2.033 ....:....1....:....2
   30   2.381 ....:....1....:....2....
   31   2.789 ....:....1....:....2....|...
   32   3.091 ....:....1....:....2....|....3.
   33   3.434 ....:....1....:....2....|....3....
   34   3.658 ....:....1....:....2....|....3....:..
   35   3.819 ....:....1....:....2....|....3....:...
   36   3.953 ....:....1....:....2....|....3....:....4
   37   3.955 ....:....1....:....2....|....3....:....4
   38   3.888 ....:....1....:....2....|....3....:....
   39   3.669 ....:....1....:....2....|....3....:..
   40   3.291 ....:....1....:....2....|....3...
   41   2.754 ....:....1....:....2....|...
   42   2.420 ....:....1....:....2....
   43   2.128 ....:....1....:....2.
   44   1.856 ....:....1....:....
   45   1.679 ....:....1....:..
   46   1.420 ....:....1....
   47   1.304 ....:....1...
   48   1.222 ....:....1..
   49   1.074 ....:....1.
   50   1.055 ....:....1.
   51   1.107 ....:....1.
   52   1.213 ....:....1..
   53   1.314 ....:....1...
   54   1.344 ....:....1...
   55   1.300 ....:....1...
   56   1.262 ....:....1...
   57   1.304 ....:....1...
   58   1.291 ....:....1...
   59   1.350 ....:....1....
   60   1.394 ....:....1....
   61   1.536 ....:....1....:
   62   1.738 ....:....1....:..
   63   1.877 ....:....1....:....
   64   2.083 ....:....1....:....2.
   65   2.234 ....:....1....:....2..
   66   2.320 ....:....1....:....2...
   67   2.596 ....:....1....:....2....|.
   68   2.975 ....:....1....:....2....|....3
   69   3.350 ....:....1....:....2....|....3....
   70   3.644 ....:....1....:....2....|....3....:.
   71   3.795 ....:....1....:....2....|....3....:...
   72   3.851 ....:....1....:....2....|....3....:....
   73   3.997 ....:....1....:....2....|....3....:....4
   74   4.116 ....:....1....:....2....|....3....:....4.
   75   4.265 ....:....1....:....2....|....3....:....4...
   76   4.377 ....:....1....:....2....|....3....:....4....
   77   4.296 ....:....1....:....2....|....3....:....4...
   78   4.206 ....:....1....:....2....|....3....:....4..
   79   4.085 ....:....1....:....2....|....3....:....4.
   80   3.789 ....:....1....:....2....|....3....:...
   81   3.393 ....:....1....:....2....|....3....
   82   3.004 ....:....1....:....2....|....3
   83   2.580 ....:....1....:....2....|.
   84   2.344 ....:....1....:....2...
   85   2.068 ....:....1....:....2.
   86   1.776 ....:....1....:...
   87   1.605 ....:....1....:.
   88   1.383 ....:....1....
   89   1.133 ....:....1.
   90   1.093 ....:....1.
   91   1.185 ....:....1..
   92   1.434 ....:....1....
   93   1.873 ....:....1....:....
   94   2.263 ....:....1....:....2...
   95   2.642 ....:....1....:....2....|.
   96   3.065 ....:....1....:....2....|....3.
   97   3.402 ....:....1....:....2....|....3....
   98   3.684 ....:....1....:....2....|....3....:..
   99   4.029 ....:....1....:....2....|....3....:....4
  100   4.481 ....:....1....:....2....|....3....:....4....:
  101   4.875 ....:....1....:....2....|....3....:....4....:....
  102   5.152 ....:....1....:....2....|....3....:....4....:....5+
  103   5.266 ....:....1....:....2....|....3....:....4....:....5+
  104   5.390 ....:....1....:....2....|....3....:....4....:....5+
  105   5.378 ....:....1....:....2....|....3....:....4....:....5+
  106   5.244 ....:....1....:....2....|....3....:....4....:....5+
  107   5.200 ....:....1....:....2....|....3....:....4....:....5+
  108   5.025 ....:....1....:....2....|....3....:....4....:....5
  109   4.743 ....:....1....:....2....|....3....:....4....:..
  110   4.476 ....:....1....:....2....|....3....:....4....:
  111   4.146 ....:....1....:....2....|....3....:....4.
  112   3.696 ....:....1....:....2....|....3....:..
  113   3.123 ....:....1....:....2....|....3.
  114   2.598 ....:....1....:....2....|.
  115   2.190 ....:....1....:....2..
  116   1.822 ....:....1....:...
  117   1.516 ....:....1....:
  118   1.331 ....:....1...
  119   1.121 ....:....1.
  120   0.861 ....:....
  121   0.696 ....:..
  122   0.601 ....:.
  123   0.565 ....:.
  124   0.579 ....:.
  125   0.643 ....:.
  126   0.716 ....:..

.... snip ....
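The core segments reported on the "@" lines can be recovered from the curve itself, since a core segment is the portion of the curve above the cutoff. The sketch below assumes the long-format layout shown above (residue number, then value, one residue per line, in order) and the cutoff of 2.5; the function names are hypothetical.

```python
CUTOFF = 2.5  # empirical cutoff stated in the documentation


def parse_curve(text):
    """Collect (residue, value) pairs from long-format curve lines."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        # Curve lines start with an integer residue number; header ('>'),
        # summary ('#') and peak ('@') lines do not and are skipped.
        if len(fields) >= 2 and fields[0].isdigit():
            try:
                points.append((int(fields[0]), float(fields[1])))
            except ValueError:
                continue
    return points


def core_segments(points, cutoff=CUTOFF):
    """Return (start, end) residue ranges where the curve exceeds the cutoff."""
    segments, start, last = [], None, None
    for residue, value in points:
        if value > cutoff:
            if start is None:
                start = residue
            last = residue
        elif start is not None:
            segments.append((start, last))
            start = None
    if start is not None:  # the curve ends while still above the cutoff
        segments.append((start, last))
    return segments
```

Applied to residues 1 to 126 of the listing above, this should give the ranges 31..41, 67..83 and 95..114, which agree with the core segments of the first three peaks.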



The evaluation of the query is controlled by the second and third switch sets. A quality score, computed for each entry of the query against a library of known TM-proteins, is used to judge whether the entry is a TM-protein or not. In the "trusted" mode of operation the server does not compute this quality score for clear cases but only for questionable ones. In the current implementation this applies only to cases where exactly one TM-helix segment is detected in the query. The server then decides the type of the entry in question based on the value of the score.

In the "unconditional" mode of operation the server is forced to calculate the quality score even for the trivial cases. In the output, "Q: trusted" appears when the score was not computed; when the score is evaluated, its value, a real number between 0 and 1 (the higher the better), replaces the "trusted" string. Note: this decision-making back-end of the algorithm is under constant development. We keep tuning this part of the program as more data become available.

The third switch set selects the library of reference TM-sequences. The documented TM sequences differ in predictive power, both in the accuracy of the result and in their sensitivity to the query type. The sequences are therefore ranked according to their performance, and the user can pick the size of the TM-library: a small library for speed or a larger one for precision.

Here is the output of the sample query with "unconditional" evaluation:

> My first protein ID%3A prot0001
# TMH:  3 Q:  0.93
@  142   2.796 core:  139 ..  145 2.367e-02
@  200   2.906 core:  196 ..  203 1.606e-02
@  232   3.085 core:  226 ..  237 8.544e-03

<-------- end of list -------->


> Here goes the second one ID%3A prot0002
# TMH: 12 Q:  0.99
@   37   3.805 core:   31 ..   41 1.056e-03
@   76   4.211 core:   68 ..   82 2.515e-04
@  104   5.186 core:   95 ..  114 8.047e-06
@  163   2.787 core:  159 ..  166 3.829e-02
@  195   3.719 core:  189 ..  203 1.430e-03
@  227   4.946 core:  219 ..  236 1.883e-05
@  286   4.730 core:  279 ..  294 4.031e-05
@  334   3.043 core:  330 ..  337 1.551e-02
@  364   3.922 core:  349 ..  394 6.973e-04 Twin peaks - two TMH with a short linker
@  389   3.237 core:  349 ..  394 7.836e-03
@  454   4.013 core:  449 ..  482 5.070e-04 Twin peaks - two TMH with a short linker
@  474   6.581 core:  449 ..  482 5.854e-08

<-------- end of list -------->


> Non-TM protein with a fasle positive peak ID%3A glob0001
# TMH:  1 Q:  0.61 !!! Warning! Non-TM protein!
@  143   3.050 core:  139 ..  147 8.898e-03

<-------- end of list -------->


> One more globular ID%3A glob0002
# TMH:  0 Q:  0.00 !!! Warning! Non-TM protein!

<-------- end of list -------->


> The last one%2C globular ID%3A glob0003
# TMH:  0 Q:  0.00 !!! Warning! Non-TM protein!

<-------- end of list -------->

Limitations

The DAS-TMfilter server can process approximately 1000 sequences per hour. (Several users may be using the server at the same time, so do not expect its full capacity to be available to you.)
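The throughput figure gives a rough lower bound on the waiting time for a large query. The following is a back-of-the-envelope sketch only: the function name and the `share` parameter (the fraction of the server's capacity assumed to be available to you) are hypothetical, and the real processing time will depend on sequence length and server load.

```python
SEQUENCES_PER_HOUR = 1000  # approximate figure quoted in the documentation


def estimated_minutes(n_sequences, share=1.0):
    """Estimate processing time in minutes for a query of n_sequences.

    share is the assumed fraction of server capacity available (0 < share <= 1).
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return n_sequences / (SEQUENCES_PER_HOUR * share) * 60


print(f"{estimated_minutes(250):.0f} min with the full server")
print(f"{estimated_minutes(250, share=0.5):.0f} min with half of it")
```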


Warning messages

Here is the list of warnings and error messages, what they mean and how to interpret them.


Practical suggestions

Follow these guidelines for the efficient use of the DAS-TMfilter server:

Miklos Cserzo, miklos@bip.bham.ac.uk
