IMP GPI Lipid Anchor Project IMP-Bioinformatics

The GPI-modification motif and its recognition in proprotein sequences

Contact:
Birgit Eisenhaber (MDC/IMP)
Peer Bork (MDC/EMBL)
Frank Eisenhaber (IMP)

Glycosylphosphatidylinositol (GPI) lipid anchoring is a common posttranslational modification known mainly from extracellular eukaryotic proteins. The GPI moiety is attached to the carboxyl terminus (omega-site) of the polypeptide after proteolytic cleavage of a C-terminal propeptide. For the first time, a new prediction technique that locates potential GPI-modification sites in precursor sequences has been applied to large-scale protein sequence database searches.

The composite prediction function (parametrised separately for metazoan and protozoan proteins) consists of terms that evaluate both amino acid type preferences at sequence positions near a putative omega-site and the concordance with general physical properties encoded in multi-residue correlations within the motif sequence. The latter terms are especially successful in rejecting inappropriate sequences.

The algorithm has been validated with a self-consistency test and two jackknife tests on the learning set of fully annotated sequences from the SWISS-PROT database, as well as with a newly created database, "big-PI" (more than 300 GPI-motif mutations extracted from the original literature). The accuracy of predicting the effect of mutations in the GPI sequence motif was above 83%. Lists of potential precursor proteins that lack a GPI-modification site annotation in SWISS-PROT and SPTrEMBL are presented on this web page. The algorithm has been implemented in the prototype software "big-PI predictor", which may find application as a genome annotation and target selection tool.
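To make the idea of a composite score more concrete, the following is a minimal Python sketch of scanning the C-terminal region of a proprotein for candidate omega-sites and adding a residue-preference term to a physical-property term. It is not the big-PI predictor and uses none of its parameters: the preferred residues at omega and omega+2, the fixed spacer length, the use of the Kyte-Doolittle hydropathy scale, the weight and the scan window are all illustrative assumptions, and the function names (`residue_term`, `physical_term`, `best_omega_site`) are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Kyte-Doolittle hydropathy scale (published values); used here only as an
# example of a per-residue physical property.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

# Assumed toy preferences (NOT big-PI parameters): small residues are
# rewarded at the omega position and at omega+2.
OMEGA_OK = set("GASNDC")
OMEGA_PLUS2_OK = set("GAS")


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value of a segment; unknown letters count as 0."""
    if not segment:
        return 0.0
    return sum(KD.get(aa, 0.0) for aa in segment) / len(segment)


def residue_term(seq: str, omega: int) -> float:
    """Residue-type preference term for a 0-based candidate omega index."""
    score = 1.0 if seq[omega] in OMEGA_OK else -1.0
    score += 1.0 if seq[omega + 2] in OMEGA_PLUS2_OK else -1.0
    return score


def physical_term(seq: str, omega: int, spacer: int = 8) -> float:
    """Toy multi-residue term: rewards a polar spacer followed by a hydrophobic tail."""
    spacer_seg = seq[omega + 1 : omega + 1 + spacer]
    tail = seq[omega + 1 + spacer :]
    return mean_hydropathy(tail) - mean_hydropathy(spacer_seg)


def best_omega_site(
    seq: str,
    window: int = 40,
    min_tail: int = 12,
    spacer: int = 8,
    weight: float = 0.5,
) -> Optional[Tuple[int, float]]:
    """Return (1-based position, score) of the highest-scoring omega candidate.

    Only positions within the last `window` residues that still leave a
    spacer plus a tail of at least `min_tail` residues are scanned; None is
    returned if the sequence is too short to contain any candidate.
    """
    start = max(0, len(seq) - window)
    stop = min(len(seq) - spacer - min_tail, len(seq) - 2)  # keep omega+2 in range
    best: Optional[Tuple[int, float]] = None
    for omega in range(start, stop):
        score = residue_term(seq, omega) + weight * physical_term(seq, omega, spacer)
        if best is None or score > best[1]:
            best = (omega + 1, score)
    return best
```

For example, `best_omega_site("SEA" + "E" * 6 + "L" * 12)` has a single candidate and returns position 1 with a score of about 5.32, while a sequence too short to hold a spacer and tail returns `None`. In the same spirit, separate metazoan and protozoan parametrisations would correspond to separate preference sets and weights passed to the scoring terms.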

  1. Sequence analysis of GPI-anchored proteins on the proprotein level
    An analysis of the physical properties of amino acid residues at given sequence positions in the vicinity of the GPI-modification site allowed the construction of a model of the active site of the putative transamidase complex (Eisenhaber et al., Prot. Eng., 11, 1155-1161, 1998).

    Data sheets: Learning set and prediction results

    1. The learning set: a list of 169 SWISS-PROT/SWISS-NEW entries with an annotated GPI-modification site
    2. The self-consistency test of the learning set
      Protozoa
      Metazoa
    3. Jackknife test I over the whole learning set
      Protozoa
      Metazoa
    4. Jackknife test II over the largest subset of non-homologous sequences only
      Protozoa
      Metazoa
    5. The mutation set: a list of 293 metazoan mutation experiments on the C-terminal end of GPI-anchored proteins
    6. Predictions for the mutation set
    7. Validation of GPI-modification site prediction with mutation data: summary table

  2. GPI-modification site prediction in non-annotated sequences
    1. SWISS-PROT rel. 37/SWISS-NEW (12th April 1999): entries with keyword GPI-ANCHOR but without site annotation
      Protozoa
      Metazoa
    2. SWISS-PROT rel. 37/SWISS-NEW (12th April 1999): entries without keyword GPI-ANCHOR
      Protozoa
      Metazoa
      other
    3. SPTREMBL rel. 9 (January 1999)
      Protozoa
      Metazoa
      other
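The data sheets separate a self-consistency test (parameters fitted on the whole learning set and evaluated on the same sequences) from jackknife tests. The Python sketch below illustrates that difference, taking "jackknife" in the usual leave-one-out sense. The model is deliberately trivial and hypothetical (`train` merely records which omega residues occurred in anchored examples), it is not the big-PI procedure, and `TOY_RECORDS` is invented for illustration rather than taken from the learning set.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# One record per example: (residue at the omega-site, observed as GPI-anchored?).
Record = Tuple[str, bool]
Model = FrozenSet[str]

# Invented toy data, not from the SWISS-PROT learning set.
TOY_RECORDS: List[Record] = [
    ("S", True), ("G", True), ("S", True), ("W", False), ("N", True),
]


def train(records: Sequence[Record]) -> Model:
    """Toy model: the set of omega residues seen in anchored training examples."""
    return frozenset(res for res, anchored in records if anchored)


def predict(model: Model, record: Record) -> bool:
    """Predict 'anchored' iff the omega residue occurred in a positive example."""
    return record[0] in model


def self_consistency_accuracy(
    records: Sequence[Record],
    train_fn: Callable[[Sequence[Record]], Model] = train,
    predict_fn: Callable[[Model, Record], bool] = predict,
) -> float:
    """Fit on all records, then evaluate on the very same records."""
    model = train_fn(records)
    hits = sum(predict_fn(model, r) == r[1] for r in records)
    return hits / len(records)


def jackknife_accuracy(
    records: Sequence[Record],
    train_fn: Callable[[Sequence[Record]], Model] = train,
    predict_fn: Callable[[Model, Record], bool] = predict,
) -> float:
    """Leave-one-out: refit without each record, then predict that record."""
    hits = 0
    for i, held_out in enumerate(records):
        model = train_fn(list(records[:i]) + list(records[i + 1 :]))
        hits += predict_fn(model, held_out) == held_out[1]
    return hits / len(records)


if __name__ == "__main__":
    print("self-consistency:", self_consistency_accuracy(TOY_RECORDS))
    print("jackknife:", jackknife_accuracy(TOY_RECORDS))
```

On the toy records the self-consistency accuracy is 1.0, while the jackknife accuracy drops to 0.6, because residues seen in only one anchored example (G and N) are unknown to the model once that example is held out. Restricting `records` to a non-homologous subset before calling `jackknife_accuracy` would loosely mirror the idea behind jackknife test II.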


Last modified: 12th June 2002