MYRbase
EvOluation of proteome-wide predictions of glycine myristoylation
Myristoylation is a common lipid modification of proteins in Eukaryotes and their Viruses as well as some Bacteria, and it is essential for the function of several important proteins (such as G proteins, SRC and related kinases, ADP ribosylation factors, HIV gag and HIV nef). The saturated 14-carbon fatty acid (myristate) is attached, most often co-translationally, by the enzyme NMT (MyristoylCoA:Protein N-Myristoyltransferase) to N-terminal glycines or to glycines that become N-terminal after proteolytic cleavage.
Based on sequence variability of known substrate proteins, physical property profiles and structural models of NMT-substrate interactions (J Mol Biol. 2002 Apr 5;317[4]:523-40), we developed a powerful prediction tool for glycine myristoylation (J Mol Biol. 2002 Apr 5;317[4]:541-57) that is available as a web server (http://mendel.imp.univie.ac.at/myristate/) and whose sensitivity allows large-scale database runs.
To facilitate the selection of targets for experimental verification of our predictions, we evaluate the evolutionary conservation of the predicted myristoylation motif within close homologues (EvOluation). If a sequence is predicted to be myristoylated and the same applies to its homologues (preferably in a series of different organisms), this not only adds another dimension of credibility to our prediction but also suggests that the lipid anchor might play an essential role in that protein's function.
Such an analysis has been applied in a large-scale approach to the proteins included in the SwissProt and Genbank databases. The corresponding predicted entries and their homologues were annotated and summarized in tabular form accessible from MYRbase.
Select database and taxon to enter MYRbase:
Database | Taxon | # evaluated entries | # predicted entries | # clusters of predicted entries | |
SwissProt | Eukaryota | 61577 | 491 | 196 | Explore Clusters |
SwissProt | Viruses | 8468 | 258 | 54 | Explore Clusters |
Genbank | Eukaryota | 600916 | 4409 | 1985 | Explore Clusters |
Genbank | Viruses | 152015 | 5681 | 188 | Explore Clusters |
Eukaryotic Subgroups |
| Mammalia | 264332 | 2168 | 968 | Explore Clusters |
| Viridiplantae | 119436 | 1009 | 429 | Explore Clusters |
| Insecta | 61716 | 294 | 197 | Explore Clusters |
| Fungi | 31743 | 164 | 81 | Explore Clusters |
| Nematoda | 28731 | 227 | 145 | Explore Clusters |
(The sites use Javascript! Please make sure that your browser
supports Javascript and has it enabled.
Javascript-free webpages are available for SwissProt - Eukaryota
and SwissProt - Viruses)
Supplementary material to:
MYRbase: Analysis of genome-wide glycine myristoylation enlarges
functional spectrum of eukaryotic myristoylated proteins
Mass spectra of in vitro myristoylation assay of N-terminal peptides
(PDF) by Masaki Gouda
and Nobuhiro Hayashi.
Differences between theoretical and experimental masses for peptides ending in serine are most likely due to
carboxy-terminal sodium salt formation. Protein identifiers can be found in the upper left corner of each
page.
Aim of this project
Our systematic theoretical analysis of myristoylation on a proteomic scale has unveiled extensive lists of NEW potential targets for this important lipid modification. As a computational group, we would like to encourage experimentalists to discuss, verify or reject our predictions. This feedback is important to us because it allows us to refine the predictor. Therefore, we invite anyone to see whether a protein of her/his interest with unclear myristoylation status is within our lists, or to check their sequences for myristoylation signals with our predictor.
Test whether your protein is predicted to be myristoylated with the MYR predictor (other predictors from our group exist for GPI-anchors and the peroxisomal targeting signal 1).
Just as viral proteins can be myristoylated, some bacteria also take advantage of the host NMT and let it modify some of their proteins. So far, only very few examples are known. These comprise proteins of bacteria that use a type III secretion system (TTSS) to inject their own proteins into the host cell. We have combined an algorithm that filters for proteins following type III secretion with our myristoylation predictor and applied it to the genomes of pathogenic bacteria that are known to have this translocation system. As the motif requirement for type III secretion is generally not fully understood, we might miss some true positives and also include false positives. Therefore, it is important to make sure that the protein is actually translocated into the host cell before considering the possibility of a myristoyl lipid modification. The 28 bacterial proteins predicted to be translocated and myristoylated by the eukaryotic host group into a total of 20 clusters and can be accessed here:
MYRbase - TTSS Bacteria |
Additionally, we have analyzed the domain distribution among a set of eukaryotic entries in MYRbase with less than 90% sequence identity (to avoid a disproportionate contribution of highly redundant entries). We list the domains that were found with an HMMer search against the PFAM domain library using an E-value of 0.01, ordered by the number of unique entries hitting each domain. To emphasize novel functions as indicated by differing domain repertoires, we split the analysis into two sets: the first contains experimentally verified MYRbase entries together with their closer homologues, while the second should comprise "functionally" new predictions. For the new predictions we require a leading methionine (to avoid potential fragments) and only list domains occurring in at least two entries (to exclude potential false positives).
Domain List: Exp.Ver.+Hom. |
Domain List: New Predictions |
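The counting behind the two domain lists can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline actually used for MYRbase: the hit tuples, the function names and the example data are invented here, and treating the E-value of 0.01 as an inclusive upper bound is an assumption.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 0.01  # E-value stated in the text; inclusive comparison is an assumption


def has_leading_methionine(sequence):
    """Filter applied to new predictions to avoid potential fragments."""
    return sequence.startswith("M")


def rank_domains(hits, min_entries=1):
    """Count unique entries per domain and order domains by that count.

    hits: iterable of (entry_id, domain_name, e_value) tuples
          (a hypothetical format, not real HMMer output).
    min_entries: 2 reproduces the 'at least two entries' rule for new predictions.
    """
    entries_per_domain = defaultdict(set)
    for entry_id, domain, e_value in hits:
        if e_value <= E_VALUE_CUTOFF:
            # A set ensures that an entry hitting a domain twice counts once.
            entries_per_domain[domain].add(entry_id)
    counts = {
        domain: len(entries)
        for domain, entries in entries_per_domain.items()
        if len(entries) >= min_entries
    }
    # Most frequent domain first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented example data for illustration only.
example_hits = [
    ("entry1", "Pkinase", 1e-30),
    ("entry2", "Pkinase", 2e-12),
    ("entry2", "SH3_1", 5e-5),
    ("entry1", "Pkinase", 3e-8),  # repeated hit in the same entry
    ("entry3", "Arf", 0.5),       # above the cutoff, ignored
]

all_domains = rank_domains(example_hits)
new_prediction_domains = rank_domains(example_hits, min_entries=2)
```

With the invented data, only the domain hit by two different entries survives the stricter `min_entries=2` setting.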
How to pick interesting targets for experimental verification from MYRbase? Please read the following:
Description and User Guide for MYRbase
The procedure: |
Whole databases have been analyzed with the MYR predictor and the predicted entries were clustered into homologous groups with the program cd-hit using a 40% identity threshold. The predicted entries and their clusters have undergone several parsing and annotation steps to provide a multiplicity of information. |
The tables: |
The presented tables include one representative entry for each cluster, ordered by cluster size. Navigate between tables by using the previous, next or Back to Start links. To return to the database and taxon selection, click Back to Start and from there Back to Index. |
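The table layout described above (one representative per cluster, largest cluster first) can be sketched in a few lines. The input format is hypothetical: it assumes the clustering result has already been parsed into a mapping from cluster identifier to member list, with the representative listed first. It does not reproduce the cd-hit output format.

```python
def representatives_by_cluster_size(clusters):
    """Return (representative, cluster_size) pairs, largest cluster first.

    clusters: dict mapping a cluster id to a list of entry ids, where the
              first list element is taken as the representative (assumption).
    """
    # sorted() is stable, so clusters of equal size keep their input order.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [(members[0], len(members)) for members in ranked]


# Invented example clusters for illustration only.
example_clusters = {
    "c1": ["A", "B"],
    "c2": ["C", "D", "E"],
    "c3": ["F"],
}

table_rows = representatives_by_cluster_size(example_clusters)
```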
The entries: |
Example (see below) |
Homologous Cluster Size | Protein Information | Pos. | Myristoylation Motif | Score | Prob.FPP | Prediction |
~31 org. 72 (92 tot.) +SW MYR-Ann. | SWISSPROT - 21263684 Guanine nucleotide-binding protein alpha-4 subunit [Caenorhabditis elegans] Ann.: MYRISTATE (BY SIMILARITY). CD BLAST | 2 | GCFHSTGSEAKKRSKLI | | 0.000 | RELIABLE |
First column: |
The key information here is the size of the homologous cluster (e.g. 72). The bigger the number, the more homologous sequences are also predicted to be myristoylated, possibly pointing to an increased functional importance of the lipid anchor. Click on it and a popup will appear with a table of all cluster members in a format similar to the example. |
|
Second column: |
This column contains the protein information. If the protein is part of SwissProt, PIR or PDB, this is indicated at the beginning, as it also means that more detailed annotations are already available. If you click on the GeneInfo identifier (e.g. 21263684), a new window opens showing the full entry in Genbank. After a short general description of the protein (e.g. Guanine nucleotide-binding protein alpha-4 subunit), the respective organism is given in brackets (e.g. [Caenorhabditis elegans]). If the protein already has a SwissProt annotation for myristate, details of the annotation are given (e.g. Ann.: MYRISTATE (BY SIMILARITY)). |
|
Third column: |
The position of the glycine that is predicted to be myristoylated within the sequence is indicated (e.g. 2). |
|
Fourth column: |
The full myristoylation motif is shown (e.g. GCFHSTGSEAKKRSKLI). Cysteines that could potentially become palmitoylated are highlighted with a yellow background and positively charged residues with a blue background. Both factors can strengthen membrane attachment and putatively influence membrane subcompartment localization. |
|
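As a rough illustration of the highlighting described for the fourth column, the sketch below marks cysteines and positively charged residues in a motif string. It is not the code behind the MYRbase tables. Treating only lysine (K) and arginine (R) as positive charges is an assumption (histidine is left out), and the bracket notation merely stands in for the yellow and blue backgrounds.

```python
POSITIVE_RESIDUES = frozenset("KR")  # assumption: lysine and arginine only


def annotate_motif(motif):
    """Return 1-based positions (within the motif) of cysteines and positive residues."""
    cysteines = [i for i, aa in enumerate(motif, start=1) if aa == "C"]
    positives = [i for i, aa in enumerate(motif, start=1) if aa in POSITIVE_RESIDUES]
    return cysteines, positives


def render_motif(motif):
    """Plain-text stand-in for the colours: [C] for cysteines, (K)/(R) for positive charges."""
    parts = []
    for aa in motif:
        if aa == "C":
            parts.append("[" + aa + "]")
        elif aa in POSITIVE_RESIDUES:
            parts.append("(" + aa + ")")
        else:
            parts.append(aa)
    return "".join(parts)


# Motif taken from the example entry above.
cysteines, positives = annotate_motif("GCFHSTGSEAKKRSKLI")
rendered = render_motif("GCFHSTGSEAKKRSKLI")
```

Positions here are counted within the displayed motif, starting at the glycine, not within the full protein sequence.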
Fifth column: |
Moving the mouse over the score assigned by the MYR predictor (try it in the example entry above) displays all score components (profile and physical property terms), which are described in more detail in Maurer-Stroh et al. (J Mol Biol. 2002 Apr 5;317[4]:541-57). |
|
Sixth column: |
The MYR predictor estimates the probability of a false positive prediction (P value * 100 = probability in %). In the example, the P value is displayed as 0.000, so this probability rounds to 0.0%. |
|
Seventh column: |
Finally, we divide predictions into two
quality assignments for a simplified categorization of the suitability of sequences to be myristoylated by NMT.
Myristoylation sites predicted as 'RELIABLE' comply with the sequence motif as implemented in the
present version of the predictor and will most likely be processed by NMT when provided as substrate. This
prediction does not necessarily imply a biological context for the query protein that allows in vivo access to
the NMT. |
Please be aware that even a minimal rate of false positive predictions (below 0.5% for entries predicted with the attribute 'RELIABLE') results in a significant number of false positive predictions when scanning large datasets (such as the over 1 million eukaryotic sequences in Genbank). Our evOluation procedure aims to emphasize the evolutionary importance of the predicted lipid modifications and should rank the more interesting targets first in the tables. However, if a sequence occurs disproportionately often in highly similar copies, the number of false positives also increases disproportionately and the cluster size of predicted homologues becomes less indicative. Consequently, clusters that contain only weakly predicted entries and are small compared to the total occurrence of related sequences in the database might represent false positive predictions.
On the other hand, several interesting targets might be underrepresented in current databases or might have appeared late in evolution, which restricts them to a limited subset of taxa. Therefore, smaller clusters at the end of the tables might also hold some "pearls" waiting to be harvested. Good luck!
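The caveat about false positives in large datasets can be made concrete with a small calculation. The sketch below assumes that the quoted rate can be read as a per-sequence upper bound, which is a simplification introduced here and not a statement from the predictor's documentation. The function names are illustrative.

```python
def p_value_to_percent(p_value):
    """Convert a P value into a probability in percent (P value * 100)."""
    return p_value * 100.0


def false_positive_upper_bound(rate, n_sequences):
    """Upper bound on expected false positives if 'rate' applies per scanned sequence.

    This per-sequence reading of the rate is an assumption made for illustration.
    """
    return rate * n_sequences


# 0.5% is the bound quoted for 'RELIABLE' predictions; 1 million is the
# order of magnitude the text gives for eukaryotic sequences in Genbank.
bound = false_positive_upper_bound(0.005, 1_000_000)
percent = p_value_to_percent(0.005)
```

Under this reading, a rate just below 0.5% over one million sequences still allows for up to roughly 5000 false positives, which is why cluster size and conservation across organisms are used to prioritize targets.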
For questions and comments please contact:
Sebastian Maurer-Stroh
Contributors (in alphabetical order):
Birgit Eisenhaber
Frank Eisenhaber
Masaki Gouda
Nobuhiro Hayashi
Fernanda Sirota Leite
Georg Neuberger
Maria Novatchkova
Alexander Schleiffer
Michael Wildpaner