PrePS - Prenylation Prediction Suite
Protein CaaX Farnesylation, CaaX Geranylgeranylation and Rab Geranylgeranylation

PRENbase
Database of Prenylated Proteins


What is protein prenylation?

Prenylation is the posttranslational modification of proteins with isoprenyl anchors. These lipid moieties typically mediate protein-membrane interactions, but also protein-protein interactions, of prominent cellular proteins. Three eukaryotic enzymes are known to catalyze the lipid transfer. The first two, farnesyltransferase (FT) and geranylgeranyltransferase 1 (GGT1), recognize the so-called CaaX box at the C-terminus of substrate proteins and attach farnesyl (a 15-carbon polyisoprene) or geranylgeranyl (a 20-carbon polyisoprene), respectively, to a required and spatially fixed cysteine in that motif. The third enzyme, geranylgeranyltransferase 2 (GGT2 or RabGGT), recognizes the complex of a Rab GTPase substrate protein with a specific Rab escort protein (REP) and attaches one or two geranylgeranyl anchors to cysteines in a more flexible, but also C-terminal, motif.
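As a rough illustration of where the CaaX cysteine sits, the following Python sketch only checks whether the fourth residue from the C-terminus is a cysteine. The function name and the toy peptides are hypothetical, and this naive test is not the PrePS scoring method, which evaluates refined motifs with profile and physical property terms.

```python
def has_caax_cysteine(sequence: str) -> bool:
    """Return True if the fourth residue from the C-terminus is a cysteine.

    This is only the minimal positional requirement of a CaaX box;
    it says nothing about whether the motif is a good enzyme substrate.
    """
    seq = sequence.strip().upper()
    return len(seq) >= 4 and seq[-4] == "C"


# Toy peptides (hypothetical, for illustration only).
print(has_caax_cysteine("GSGSCVLS"))  # cysteine at position -4
print(has_caax_cysteine("GSGSAVLS"))  # no cysteine at position -4
```

A positive result here is a necessary condition at best; the actual predictors can still reject such a sequence.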

Animated Scheme of Protein Prenylation

Literature reviews:

Casey PJ, Seabra MC.
Protein prenyltransferases.
J Biol Chem. 1996 Mar 8;271(10):5289-92.

Sinensky M.
Recent advances in the study of prenylated proteins.
Biochim Biophys Acta. 2000 Apr 12;1484(2-3):93-106.

Roskoski R Jr.
Protein prenylation: a pivotal posttranslational process.
Biochem Biophys Res Commun. 2003 Mar 28;303(1):1-7.

Maurer-Stroh S, Washietl S, Eisenhaber F.
Protein prenyltransferases.
Genome Biol. 2003;4(4):212. Epub 2003 Apr 01.


What is PrePS?

PrePS stands for Prenylation Prediction Suite and combines three predictors, for protein CaaX farnesylation, CaaX geranylgeranylation and Rab geranylgeranylation, in one web interface. The predictors aim to model the substrate-enzyme interactions based on a refinement of the recognition motifs for each of the prenyltransferases. Motif information has been extracted from sets of known substrates (learning sets), and specific scoring functions have been created that use both sequence and physical property profiles, including interpositional correlations, to account for partially overlapping substrate specificities. PrePS selectively assigns the modifying enzyme to predicted substrate proteins and sensitively filters out false positive predictions. It is based on the methodology that has already been applied successfully to the prediction of GPI anchors, myristoylation and PTS1 peroxisomal targeting.

 

How do I submit a sequence?

Simply copy and paste your sequence into the input form, either in single-letter amino acid code or in FASTA format.
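If you prepare sequences programmatically before pasting them, a small helper can bring raw or FASTA text into a plain single-letter string. The Python sketch below is an assumption-laden example: the function name is hypothetical, and it accepts only the 20 standard amino acid letters because this page does not state whether the server accepts ambiguity codes such as X.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def normalize_protein_input(text: str) -> str:
    """Strip FASTA header lines and whitespace; return an uppercase sequence."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # FASTA header lines start with ">" and carry no residues.
    residues = "".join(line for line in lines if not line.startswith(">"))
    residues = "".join(residues.split()).upper()
    unknown = set(residues) - VALID_RESIDUES
    if not residues or unknown:
        raise ValueError(f"not a plain protein sequence: {sorted(unknown)}")
    return residues


# Toy FASTA record (hypothetical sequence).
print(normalize_protein_input(">toy\nmkls\nCVLS\n"))
```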



What are the provided options?

You can select for which of the three enzymes (FT, GGT1, GGT2) prenylation should be predicted.

The option for evOluation over NCBI's non-redundant database (NR) allows you to evaluate the evolutionary conservation of the motif (evOluation). When selected, this option initiates a BLAST search for homologues in NR, which takes between 0.5 and 2 minutes depending on our server load. The BLAST hits are then automatically submitted to PrePS, and the prenylation prediction results are annotated in the C-terminal alignment of the homologues.

The option for evOluation over PRENbase initiates a BLAST search against PRENbase. The BLAST hits of your sequence in PRENbase are arranged into their respective clusters, and the clusters are ranked according to the best E-value obtained with any cluster member. The output format of the clusters is very similar to the standard PRENbase output (see here for an explanation of the format). The "Explore" view leads to the individual hit sequences, with the E-value added to the default motif and prediction information.
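The cluster ranking described for the PRENbase option can be pictured with a short Python sketch: hits are grouped by cluster, and clusters are ordered by the smallest E-value of any member. The function name, the tuple layout and the toy hits are assumptions for illustration and do not reflect the server's actual code or output.

```python
from collections import defaultdict


def rank_clusters_by_best_evalue(hits):
    """Group BLAST hits by cluster and rank clusters by their best E-value.

    hits: iterable of (cluster_id, sequence_id, evalue) tuples.
    Returns a list of (cluster_id, best_evalue, sequence_ids), best first.
    """
    clusters = defaultdict(list)
    for cluster_id, sequence_id, evalue in hits:
        clusters[cluster_id].append((evalue, sequence_id))

    ranked = []
    for cluster_id, members in clusters.items():
        members.sort()  # smallest (best) E-value first within the cluster
        ranked.append((members[0][0], cluster_id, [s for _, s in members]))
    ranked.sort()  # smallest best E-value first across clusters
    return [(cid, best, seqs) for best, cid, seqs in ranked]


# Toy hits (hypothetical identifiers and E-values).
toy_hits = [("A", "a1", 1e-5), ("B", "b1", 1e-30), ("A", "a2", 1e-50)]
for cluster_id, best, sequence_ids in rank_clusters_by_best_evalue(toy_hits):
    print(cluster_id, best, sequence_ids)
```

In this toy input, cluster A ranks first because one of its members has the lowest E-value, even though its other member is a weak hit.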



How do I interpret PrePS results?

The two CaaX prenylation predictors (FT and GGT1) provide a score and an estimate of the probability of a false positive prediction, while the GGT2 predictor gives the scores and E-values of the HMMer searches. Details of the profile and physical property terms of the scoring function, or of the HMMer alignments, respectively, are provided (follow link). Penalties on specific positions or regions can also be used to rationalize whether and why certain query sequences or artificial constructs (e.g. intended for membrane targeting) might be less suitable prenylation targets.

Please keep in mind that the substrate specificities of the three prenyltransferases overlap. As a simple rule, a protein that is a predicted GGT2 substrate is often geranylgeranylated by GGT2 in vivo, even if it could alternatively be modified by FT or GGT1 in vitro (as would be indicated by a prediction as FT or GGT1 substrate). For substrates predicted for both FT and GGT1, our method was found to correlate best with experimentally verified enzyme affinities when the FT score is compared with the GGT1 score divided by 3 (because GGT1 seems to have a 3-fold lower activity than FT). Proteins ending with a -CxxL motif were previously thought to be classical GGT1 substrates, but they can also be modified to a certain extent by FT in vitro because of its broad substrate specificity. In vivo, however, these proteins ending with a leucine are likely to be geranylgeranylated by GGT1.
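The rules of thumb above can be written down as a small decision function. This Python sketch is only one possible reading: the function name is hypothetical, it assumes positive scores where larger means a better substrate (this page does not specify the score scale), it treats the "often" of the GGT2 rule as "always", and it ignores the special case of -CxxL motifs.

```python
def likely_in_vivo_enzyme(ft_score=None, ggt1_score=None, ggt2_predicted=False):
    """Suggest the most likely modifying enzyme from PrePS-style predictions.

    ft_score / ggt1_score: score of a positive prediction, or None if the
    sequence is not predicted as a substrate of that enzyme.
    ggt2_predicted: True if the sequence is a predicted GGT2 substrate.
    """
    if ggt2_predicted:
        # Rule of thumb: predicted GGT2 substrates are often modified by GGT2.
        return "GGT2"
    if ft_score is not None and ggt1_score is not None:
        # Compare the FT score with one third of the GGT1 score.
        return "FT" if ft_score >= ggt1_score / 3 else "GGT1"
    if ft_score is not None:
        return "FT"
    if ggt1_score is not None:
        return "GGT1"
    return None


# Toy scores (hypothetical values).
print(likely_in_vivo_enzyme(ft_score=1.0, ggt1_score=2.0))
print(likely_in_vivo_enzyme(ft_score=1.0, ggt1_score=6.0))
```

Ties are resolved in favour of FT here, which is an arbitrary choice of the sketch.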

PrePS gives additional quality assignments for better comparison of the individual predictions. These are, with decreasing likelihood of being a prenylation target: +++, ++, +, -, -- and ---. The different PLUS attributes describe the quality of proteins as PREDICTED prenylation substrates. A MINUS attribute means that the query sequence is NOT PREDICTED to be prenylated; the number of minus signs indicates how far the query is from the prediction limit.

In case the evOluation option was used, the evolutionary motif conservation can be evaluated from the alignment of the C-termini of homologous sequences in combination with the respective PrePS predictions. For example, a prenylation motif that is conserved in a series of homologues in different organisms might be biologically more important than an isolated prediction.

IMPORTANT: Please note that PrePS can only tell you whether a protein would be processed by the prenylating enzymes when provided as a substrate. A positive prediction does not necessarily imply that the cellular context of the query protein allows in vivo access to the respective enzymes.



How do I cite PrePS?

The manuscript describing PrePS is available from the Genome Biology website (http://genomebiology.com/2005/6/6/R55) and can be cited as:

Maurer-Stroh S, Eisenhaber F
Refinement and prediction of protein prenylation motifs
Genome Biology 2005, 6:R55 doi:10.1186/gb-2005-6-6-r55



What is PRENbase?

PRENbase is an annotated database of known and predicted prenylated proteins (predicted using the PrePS - Prenylation Prediction Suite). Homologous proteins are merged into clusters. A search interface allows sophisticated queries for the experimental status of the modification (known/predicted...), exclusive or shared types of modifying enzymes (FT, GGT1, GGT2) as well as for evolutionary conservation by constraining the taxonomic distribution within these clusters or for single sequences.

Go to PRENbase

 


What is HumanPRENbase?

HumanPRENbase is a derivative of PRENbase with a focus on human prenylated proteins. In PRENbase, paralogous proteins are clustered together in their respective larger family when they are highly similar to each other. This happens, for example, with the different Ras proteins H-Ras, K-Ras, N-Ras, etc. Paralogues can nevertheless have different functions, or often similar but more specialized roles. Especially in light of the importance of prenylation for several members of the Ras, Rab and Rho GTPase families, we sought a way to investigate the prenylation status of individual human proteins rather than that of the larger families they are part of. Therefore, we employed a scheme of reciprocal BLASTs to identify the true orthologues of a set of prenylated human proteins (see accompanying publication for details). In total, 238 individual human proteins and their orthologues form the clusters in HumanPRENbase.

Go to HumanPRENbase

 

Do I have to understand the search interface to use PRENbase?

No. The default settings give access to the collection of both known and predicted eukaryotic and viral prenylated proteins, which can then be browsed. Below the interface we have listed a series of standard queries that you might find useful or that are of particular biological importance.



Tutorial for customized searches.

The search interface is designed to allow sophisticated queries for the experimental status of the modification (known/predicted...), exclusive or shared types of modifying enzymes (FT, GGT1, GGT2) as well as for evolutionary conservation by constraining the taxonomic distribution within these clusters or for single sequences.

Options of the search interface:

List only CLUSTERS with the following features:

  Modification Status (each: exclude / include)
    KNOWN (experimentally verified cluster members)
    LIKELY (experimentally verified more distant homologues)
    NEW (no clear experimentally verified homologues)
    OUT-OF-CONTEXT

  Minimum cluster size

  Modifying Enzyme (each: exclude / allow / require)
    FT, GGT1, GGT2

  Eukaryotic selection, overruled by kingdom selection (each: exclude / allow / require)
    Mammalia, Aves, Amphibia, Ray-finned Fishes, Insecta, Nematoda, Fungi, Plants, Other

  Kingdoms of Life (each: exclude / allow / require)
    Viruses, Eukaryota (fine-tuning via the eukaryotic selection), Synthetic

List only SEQUENCES with the following features:

  Modifying Enzyme (each: exclude / allow / require)
    FT, GGT1, GGT2

  Eukaryotic selection (each: exclude / include)
    Mammalia, Aves, Amphibia, Ray-finned Fishes, Insecta, Nematoda, Fungi, Plants, Other

  NON-Eukaryotic selection (each: exclude / include)
    Viruses, Synthetic

  Overrule above TAX selection and only list HUMAN sequences

Rank results by:

  size of full cluster
  size of cluster after selection
  evOluation of full cluster
  evOluation of cluster after selection
  phylogenic complexity of full cluster
  phylogenic complexity of cluster after selection

The interface is essentially divided into three sections (green, blue and yellow background). In the green section, the selection relates to features of clusters (protein families) and in the blue section to individual sequences. The green cluster section includes additional options regarding the modification status of the cluster as well as the minimal cluster size. The yellow section contains the selection of the ranking scheme and the submit and reset buttons.

For example, to select only NEW predictions, set the buttons of the modification status as follows: set "KNOWN", "LIKELY" and "OUT-OF-CONTEXT" to "exclude", and set "NEW" to "include".

Since there are cross-specificities among the three prenylating enzymes, the interface lets you choose the modifying enzyme or combinations thereof. For example, possibly clinically important FTI targets (see accompanying publication) are characterised by being substrates of FT only, but not of GGT1 or GGT2. To list only sequences predicted to be FTI targets, the "Modifying Enzyme" options in the blue sequence section (!) should be selected like this: require FT, exclude GGT1, exclude GGT2.

Please note that there is a difference between the selections of the "Modifying Enzyme" in the green and in the blue section, since the former selects for properties of a cluster and the latter for properties of individual sequences. For example, setting FT to "require" in the green section while GGT1 and GGT2 are set to "allow" will list all clusters with at least one FT substrate. These clusters can also include sequences that are not FT substrates at all (e.g. pure GGT1 substrates) but are listed because they are part of a cluster with FT substrate(s). The same selection in the blue section, on the other hand, will only list FT substrates. Some of these may nevertheless be GGT1 or GGT2 substrates in addition to being FT substrates.
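The difference between cluster-level and sequence-level enzyme selection can be made concrete with a toy model. The Python sketch below is an assumed reading of the require/allow/exclude semantics described here, not the PRENbase implementation; all function names, cluster names and sequence names are hypothetical.

```python
def matches(enzymes, rules):
    """enzymes: set of enzyme names; rules: enzyme -> 'require' | 'allow' | 'exclude'."""
    for enzyme, rule in rules.items():
        if rule == "require" and enzyme not in enzymes:
            return False
        if rule == "exclude" and enzyme in enzymes:
            return False
    return True


def select_clusters(clusters, rules):
    """Cluster (green) section: test the enzymes found in ANY cluster member."""
    return {
        name: members
        for name, members in clusters.items()
        if matches(set().union(*members.values()), rules)
    }


def select_sequences(clusters, rules):
    """Sequence (blue) section: test every sequence on its own."""
    return {
        name: {seq: enz for seq, enz in members.items() if matches(enz, rules)}
        for name, members in clusters.items()
    }


# Toy cluster: sequence name -> set of predicted modifying enzymes.
toy = {"toy_family": {"seqA": {"FT"}, "seqB": {"GGT1"}, "seqC": {"FT", "GGT1"}}}
ft_required = {"FT": "require", "GGT1": "allow", "GGT2": "allow"}
fti_targets = {"FT": "require", "GGT1": "exclude", "GGT2": "exclude"}

print(select_clusters(toy, ft_required))   # whole cluster, including seqB
print(select_sequences(toy, ft_required))  # seqA and seqC only
print(select_sequences(toy, fti_targets))  # seqA only
```

The same FT requirement keeps the pure GGT1 substrate seqB when applied to the cluster but drops it when applied to sequences, which mirrors the behaviour described in the text.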

Further constraints can be given to the taxonomic range of clusters and sequences. For example, if we want to list clusters with conserved prenylation between mammals and insects, we require Mammalia and Insecta in the green section while all other taxa are set to be allowed. The resulting listed clusters will contain at least one mammalian as well as insect sequence, but can also contain sequences from all other allowed taxa. Similar to above, selections in the blue or green section have different effects.

The different ranking schemes can be chosen in the yellow section. The rationale of the rankings is to sort the clusters by evolutionary conservation, which indicates the importance of the motif. The schemes substantially influence the distribution of clusters and can be used to select targets with emphasis either on the specificity of the motif for the protein family (evOluation) or on taxonomic diversity (phylogenic complexity). We observe that even the simple ranking by cluster size brings the KNOWN or likely prenylated proteins to the front of the list. However, some false positives can also appear highly ranked. Using the evOluation score for ranking keeps the KNOWN clusters in front and moves the OUT-OF-CONTEXT clusters to the back. The phylogenic complexity approach performs somewhat worse in downranking the OUT-OF-CONTEXT clusters but, in return, keeps larger NEW clusters closer to the top of the list.

The choice between ranking features with the attributes "of full cluster" and "of cluster after selection" has been introduced either to adjust the ranking calculation to the selected subset or to keep the evolutionary information of the principal cluster (protein family) as the sorting criterion. For example, to list only the prenylated proteins of a taxonomic group of special interest to you (e.g. your preferred model organism), all other taxa can be set to "exclude" in the blue section. In this case it is recommended to use a ranking feature "of full cluster", so that the output is ranked according to the evolutionary information of the related protein family in general (including cluster members that are not listed). Do not forget to also set the minimum cluster size to "1" for such searches, since most of the clusters will then contain only one or a few sequences (the cluster representatives of your selected taxonomic groups).

Tips:
The numbers of listed clusters and sequences resulting from your search can be found at the bottom of the results page.
Once you have found the settings for your customized query of choice, take note of the associated query code on top of the results in order to quickly access the same query again.



How do I read PRENbase results?

The header of the results summarizes the selected query parameters. On top of the table, the query code specific to the selection is listed (copy and save it to recover the same query at a later time). Then follows the ranked list of clusters (one cluster per line). The higher up a cluster is in the list, the higher is the estimated importance of the conserved motif for the protein family. As in the query interface, information regarding clusters is in the green section and information regarding sequences is in the blue section.


Example results:

CLUSTER parameter selection

  REQUIRE one of the features for at least one member of listed clusters:
    Status-KNOWN, Status-LIKELY, Status-NEW
  EXCLUDE clusters with the following features for at least one member:
    Status-OUT-OF-CONTEXT
  Other features for cluster members are ALLOWED.
  Minimum cluster-size: 3

SEQUENCE parameter selection

  INCLUDE entries with the following features:
    Viruses, Mammalia, Aves, Amphibia, Fishes, Insecta, Nematoda, Fungi, Plants, Other-Eukaryota
  EXCLUDE entries with the following features:
    Synthetic-constructs

Ranked by evOluation of full cluster

Querycode: cs2220ce111ct11111111111111cm3se111st00202222222221r2 (save for easy recovery of the identical query parameters)

Clusters (General Information and Stats):

  Status  Rank  ClustID   Name                         Pren  Total  PC       EvO
  +       1     46229098  Rab - small GTPases          1314  2725   104.167  633.613
  +       2     54639180  Rho/Rac - small GTPases      425   814    63.000   221.898
  +       3     3074      Ras/Ral/Rap - small GTPases  374   957    59.250   146.161

Selected Sequences (Stats, Enzyme and Taxonomic Kingdom; the "Explore" link appears below Number):

  Rank  Number  PC       EvO      FT   GGT1  GGT2  Viruses  Synthetic  Eukaryota
  1     1304    103.333  624.006  167  45    1279  1        0          1303
  2     425     63.000   221.898  414  369   0     0        0          425
  3     372     59.250   144.602  368  271   1     10       0          362

Selected Sequences (Taxonomic Detail Eukaryota):

  Rank  Mammalia  Birds  Amphibia  Fishes  Insecta  Nematoda  Fungi  Plants  Other
  1     344       44     66        100     109      57        160    261     162
  2     89        15     20        40      21       20        125    51      44
  3     97        12     26        41      35       20        90     0       41


The line for each cluster starts with the annotation of the modification status (Status). "+" indicates that at least one member of the cluster/protein family has been verified experimentally to be prenylated ("KNOWN"). "*" is the symbol for "LIKELY" prenylation, based on the experimental validation of prenylation for a homologue (BLAST E-value < 1e-10) that is not part of the same cluster. Clicking on the symbol opens a new window with information on the links of cluster members to experimentally verified prenylated proteins, including the literature references for the corresponding experiments. "?" and "#" signify "NEW" predictions with or without manual annotation, respectively. "NEW" means that no clear homologue has been shown experimentally to be prenylated so far. "-" marks OUT-OF-CONTEXT predictions: although the motif could be an in vitro substrate for prenylation when provided to the enzymes, the in vivo context of the individual example gives evidence that the motif will not be modified in the cell (e.g. the cysteine is involved in a disulfide bridge).
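When results are post-processed with a script, the status symbols described above can be translated with a simple lookup table. The Python sketch below is purely illustrative; the names STATUS_SYMBOLS and describe_status are hypothetical and not part of PRENbase.

```python
# Status symbols as described in the text above.
STATUS_SYMBOLS = {
    "+": "KNOWN",
    "*": "LIKELY",
    "?": "NEW (with manual annotation)",
    "#": "NEW (without manual annotation)",
    "-": "OUT-OF-CONTEXT",
}


def describe_status(symbol: str) -> str:
    """Translate a status symbol from the results table into its meaning."""
    try:
        return STATUS_SYMBOLS[symbol.strip()]
    except KeyError:
        raise ValueError(f"unknown status symbol: {symbol!r}") from None


print(describe_status("+"))
print(describe_status("#"))
```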

The next column gives the rank of the cluster (the better the rank, the more conserved/important the lipid anchor is in the protein family/cluster). The rank of a cluster depends on the selection/filters of the query as well as on the ranking method. Next, a cluster identifier is listed (ClustID), followed by the name of the cluster (Name), the number of predicted prenylated proteins in the cluster (Pren), the total number of proteins in the cluster (Total; including proteins not predicted to be prenylated), as well as the phylogenic complexity (PC) and the evOluation (EvO) scores for the whole cluster.

The sequence section starts with the number of sequences selected from the cluster (Number). The "Explore" link below will open a window with details of the selected sequences within the cluster (including their C-terminal motif and the respective predictions). Then, the phylogenic complexity (PC) and the evOluation (EvO) scores for the selected sequences follow. Next, the number of sequences per enzyme is listed (FT, GGT1, GGT2; note that there can be overlapping substrate specificities and the sum of the three values can be higher than the total number of selected sequences). Then, the number of sequences per taxonomic kingdom is followed by a more detailed distribution of the selected sequences among eukaryotic subgroups.
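The note on overlapping substrate specificities can be checked with the numbers of the Rab cluster from the example results above. The short Python sketch below only repeats that arithmetic; the dictionary layout is an assumption for illustration.

```python
# Values of the Rab cluster (rank 1) taken from the example results.
rab = {
    "Number": 1304,
    "FT": 167, "GGT1": 45, "GGT2": 1279,
    "Viruses": 1, "Synthetic": 0, "Eukaryota": 1303,
}

enzyme_total = rab["FT"] + rab["GGT1"] + rab["GGT2"]
kingdom_total = rab["Viruses"] + rab["Synthetic"] + rab["Eukaryota"]

# Enzyme counts overlap, so their sum exceeds the number of selected sequences.
print(enzyme_total, enzyme_total - rab["Number"])
# Kingdom counts do not overlap, so they add up to the number of sequences.
print(kingdom_total == rab["Number"])
```

The enzyme counts add up to 1491, which is 187 more than the 1304 selected sequences, whereas the kingdom counts add up to exactly 1304.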



What are "isoinpas" in HumanPRENbase?

HumanPRENbase is centered around individual human proteins. In the procedure to collect their true orthologues through iterative reciprocal BLASTs, we also find human isoforms and in-paralogues (the duplication has occurred after the last speciation event) of these proteins. These are merged into "isoinpa" clusters that are represented by only one sequence, to avoid redundant appearances. At the end of the procedure, we are left with 238 representative human proteins that, together with their orthologues, form the clusters in HumanPRENbase.



Can I save or recover a previous query?

Yes. Each set of query parameters receives a unique code that can be found on top of the results table (e.g. cs2220ce111ct11111111111111cm3se111st00202222222221r2). Copy and save it for later use. To recover the query, simply paste the code into the dedicated field below the search interface. Selections in the form above will be ignored if a valid query code is entered.
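If you keep many query codes, it can help to split them into their tagged fields for side-by-side comparison. The Python sketch below does this purely lexically (runs of lowercase letters followed by runs of digits). The meaning of the individual tags and digits is not documented on this page, so the sketch does not interpret them; the function name is hypothetical.

```python
import re


def split_query_code(code: str):
    """Split a query code into (tag, digits) pairs without interpreting them."""
    fields = re.findall(r"([a-z]+)(\d+)", code)
    # Make sure nothing was skipped: the pieces must rebuild the input.
    if "".join(tag + digits for tag, digits in fields) != code:
        raise ValueError("unexpected characters in query code")
    return fields


example = "cs2220ce111ct11111111111111cm3se111st00202222222221r2"
for tag, digits in split_query_code(example):
    print(tag, digits)
```

Two codes can then be compared field by field to see which part of a query differs.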




Can I find a specific cluster when I know its ClustID?

Yes. You can do so with your browser's search function (CTRL+F often serves as a shortcut) over the output list of clusters (which is always on one single page). If you do not find your ClustID, please check that you have typed it correctly and that the parameters for the output do not preclude the listing of this cluster. To rule out the latter, you can consult the complete list of clusters in PRENbase (here) or HumanPRENbase (here), respectively.



How do I make a keyword search?

You can search the cluster descriptions with your browser's search function (CTRL+F often serves as a shortcut) over the output list of clusters (which is always on one single page). You can find the complete list of clusters here (PRENbase) or here (HumanPRENbase), respectively.



Can I BLAST my own sequence against PRENbase?

Yes, you can do so using the PrePS - Prenylation Prediction Suite. Paste your sequence into the query form. Next, select the "evOluation" option and then select "PRENbase". After you press submit, your sequence will be checked for prenylation motifs and also BLASTed against PRENbase. The BLAST hits of your sequence in PRENbase are arranged into their respective clusters, and the clusters are ranked according to the best E-value obtained with any cluster member. The output format of the clusters is very similar to the standard PRENbase output. The "Explore" view leads to the individual hit sequences, with the E-value added to the default motif and prediction information.




How can I report an update/correction to a cluster modification status?

We are very grateful for any help in keeping PRENbase correct and up to date. Since the aim of the project is to provide experimentalists with ranked lists of predicted prenylation targets, we are always happy to be notified about the experimental verification or dismissal of any such predictions. If you know about a new or overlooked experimental verification for members of clusters that are not annotated with "+", or about evidence that a predicted prenylation motif is not modified in vivo, please email us the cluster identifier (ClustID) and the reference to the literature where the experiments are documented. Many thanks.



How do I cite PRENbase?

The manuscript describing PRENbase is freely accessible from PLoS Computational Biology.
