IMP GPI Lipid Anchor Project IMP-Bioinformatics

The learning set

Contact:
Birgit Eisenhaber (IMP/Austria)
Peer Bork (MDC/EMBL)
Frank Eisenhaber (IMP/Austria)

SWISS-NEW (27th of Jan. 1999) part of the learning set (9 entries in total)

G13B_DICDI
GP63_LEIMA
LY6E_MOUSE
TIP1_YEAST
TREA_RABIT
UPAR_RAT
VSG7_TRYBR
CD59_PIG
THY1_MACMU

SWISS-PROT (rel. 36) part of the learning set (160 entries in total)

5NTD_BOVIN
5NTD_DISOM
5NTD_HUMAN
5NTD_RAT
ACES_TORCA
ACES_TORMA
AMPM_HELVI
AMPM_MANSE
AXO1_HUMAN
BM86_BOOMI
BST1_HUMAN
BST1_MOUSE
BST1_RAT
CADD_CHICK
CADD_HUMAN
CD24_HUMAN
CD24_MOUSE
CD24_RAT
CD48_HUMAN
CD48_MOUSE
CD48_RAT
CD52_HUMAN
CD52_MACFA
CD59_AOTTR
CD59_CALSQ
CD59_CERAE
CD59_HSVSA
CD59_HUMAN
CD59_PAPSP
CD59_RAT
CD59_SAISC
CEPU_CHICK
CGM6_HUMAN
CNTR_CHICK
CNTR_HUMAN
CNTR_RAT
CONN_DROME
CSA_DICDI
CWP1_YEAST
CWP2_YEAST
DAF1_MOUSE
DAF_HUMAN
DAF_PONPY
FOL1_HUMAN
FOL1_MOUSE
FOL2_HUMAN
FOL2_MOUSE
G13A_DICDI
GAS1_YEAST
GLYP_HUMAN
GLYP_RAT
GP63_LEICH
GP63_LEIDO
GP85_TRYCR
GPCK_MOUSE
HYA1_CAVPO
HYA1_HUMAN
HYA1_MACFA
HYR1_CANAL
LACH_DROME
LACH_SCHAM
LAMP_HUMAN
LAMP_RAT
LY6A_MOUSE
LY6C_MOUSE
LY6F_MOUSE
LY6G_MOUSE
MDP1_HUMAN
MDP1_MOUSE
MDP1_PIG
MDP1_RABIT
MDP1_RAT
MDP1_SHEEP
MKC7_YEAST
MSA1_SARMU
NAR3_HUMAN
NART_MOUSE
NCA_HUMAN
NRT1_RAT
NRT2_RAT
NRTR_CHICK
NRTR_HUMAN
NRTR_MOUSE
NTRI_RAT
OPCM_BOVIN
OPCM_HUMAN
OPCM_RAT
PAG1_TRYBB
PARA_TRYBB
PARB_TRYBB
PARC_TRYBB
PONA_DICDI
PPB1_HUMAN
PPB2_HUMAN
PPB3_HUMAN
PPBE_MOUSE
PPBI_BOVIN
PPBI_HUMAN
PPBI_RAT
PPBJ_RAT
PRIO_ATEGE
PRIO_ATEPA
PRIO_CALJA
PRIO_CEBAP
PRIO_CERAE
PRIO_CERAT
PRIO_CERMO
PRIO_CERPA
PRIO_CERTO
PRIO_COLGU
PRIO_CRIGR
PRIO_CRIMI
PRIO_GORGO
PRIO_HUMAN
PRIO_MACFA
PRIO_MANSP
PRIO_MESAU
PRIO_MOUSE
PRIO_PANTR
PRIO_PONPY
PRIO_PREFR
PRIO_RAT
PSA_DICDI
SP63_STRPU
THY1_CHICK
THY1_HUMAN
THY1_MOUSE
THY1_RAT
TIR1_YEAST
TREA_HUMAN
UPAR_BOVIN
UPAR_HUMAN
UPAR_MOUSE
VSA1_TRYBB
VSA8_TRYBB
VSAC_TRYBB
VSE2_TRYBR
VSG2_TRYEQ
VSG4_TRYBR
VSI1_TRYBB
VSI2_TRYBB
VSI3_TRYBB
VSI4_TRYBB
VSI5_TRYBB
VSI6_TRYBB
VSIB_TRYBB
VSM0_TRYBB
VSM1_TRYBB
VSM2_TRYBB
VSM4_TRYBB
VSM5_TRYBB
VSM5_TRYBR
VSM6_TRYBB
VSWA_TRYBR
VSWB_TRYBR
VSY1_TRYCO
YAP3_YEAST
YJ9O_YEAST
YJ9P_YEAST
CAH4_HUMAN

Comments on the selection of entries for the learning set

Origin of information on GPI-modification sites

 

In SWISS-PROT, proteins annotated with a known GPI modification carry the keyword "GPI-anchor" and the feature-table token "FT LIPID".
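The selection rule above can be sketched as a small filter over flat-file text. This is only an illustration, not the procedure actually used for the learning set: the sample entries (EXA1_HUMAN, EXA2_MOUSE, EXA3_RAT) are invented, the line layout is an assumed approximation of the flat-file format, and the function names are hypothetical.

```python
# Hypothetical sample in an assumed flat-file layout; these are NOT real entries.
SAMPLE = """\
ID   EXA1_HUMAN     STANDARD;      PRT;   120 AA.
KW   Glycoprotein; GPI-anchor; Signal.
FT   LIPID        95     95       GPI-ANCHOR.
//
ID   EXA2_MOUSE     STANDARD;      PRT;   110 AA.
KW   Glycoprotein; GPI-anchor.
FT   LIPID       ?80    ?80       GPI-ANCHOR (POTENTIAL).
//
ID   EXA3_RAT       STANDARD;      PRT;   200 AA.
KW   Hydrolase.
FT   SIGNAL        1     20
//
"""


def parse_entries(flat_text):
    """Yield each entry as a list of lines; entries end with a '//' line."""
    entry = []
    for line in flat_text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:
        yield entry


def classify(entry_lines):
    """Return (entry name, has GPI-anchor keyword, list of FT LIPID lines)."""
    name, has_keyword, lipid_lines = None, False, []
    for line in entry_lines:
        if line.startswith("ID "):
            name = line.split()[1]
        elif line.startswith("KW ") and "GPI-ANCHOR" in line.upper():
            has_keyword = True
        elif line.startswith("FT ") and line.split()[1:2] == ["LIPID"]:
            lipid_lines.append(line)
    return name, has_keyword, lipid_lines


def select_gpi_entries(flat_text):
    """Split annotated GPI entries into certain and uncertain ('?') sites."""
    certain, uncertain = [], []
    for entry in parse_entries(flat_text):
        name, has_keyword, lipid_lines = classify(entry)
        if not (has_keyword and lipid_lines):
            continue  # lacks the keyword or the FT LIPID line
        if any("?" in line for line in lipid_lines):
            uncertain.append(name)
        else:
            certain.append(name)
    return certain, uncertain


certain, uncertain = select_gpi_entries(SAMPLE)
print(certain, uncertain)
```

Entries whose "FT LIPID" line contains a "?" are kept apart so that they can be reviewed or excluded separately.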

 

Annotated entries from SWISS-NEW (27th of Jan. 1999)          9       9
Annotated entries from SWISS-PROT (rel. 36)                 179
  • Sequence identical with a SWISS-NEW entry                 7
  • Entries with "?" in "FT LIPID"                            4
    (CD52_CANFA, CD52_MOUSE, CD52_RAT, VSY3_TRYCO)
  • Too many variants of splicing (DAF_CAVPO)                 1
  • Total remaining                                                 167
Non-annotated entries but full information in literature              1
  (CAH4_HUMAN, Okuyama et al., Arch. Biochem. Biophys.,
  1995, 10, 315-322)
Total number of entries                                             177
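The bookkeeping behind these counts can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative.

```python
# Counts copied from the overview above.
swissnew_annotated = 9
swissprot_annotated = 179

# SWISS-PROT entries excluded from the learning set, by reason.
excluded = {
    "sequence identical with a SWISS-NEW entry": 7,
    'entries with "?" in "FT LIPID"': 4,
    "too many variants of splicing": 1,
}

literature_only = 1  # CAH4_HUMAN, not annotated but documented in the literature

swissprot_remaining = swissprot_annotated - sum(excluded.values())
total_entries = swissnew_annotated + swissprot_remaining + literature_only

print(swissprot_remaining)  # 167
print(total_entries)        # 177
```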

 

 

This set of 177 entries contains information on 126 metazoan sequences, 40 protozoan sequences, one viral sequence and 10 fungal sequences. The omega-sites for ACES_TORMA and ACES_TORCA have been edited in accordance with new literature data (Bucht and Hjalmarsson, BBA, 1996, 1292, 223-232).

 

 

Comments on protozoan entries

 

Four protozoan sequences attracted attention because of their deviating propeptide length and other unusual sequence properties. In all four cases, the omega-site annotated in the database had not been verified experimentally. Two entries (GP46_LEIAM, GP63_LEIGU) were deleted because of their extreme propeptide length (>31 residues) and the absence of a reasonable alternative omega-site. In two entries with an extremely short propeptide (<16 residues), the omega-site was edited in accordance with homology considerations and/or an expert sequence property analysis. The final learning set for protozoa consists of 38 sequences.

 

Entry        Sequence length   Annotated site in SWISS-PROT   New site
MSA1_SARMU   280               264                            256
PAG1_TRYBB   405               396                            391

 

 

 

Comments on metazoan entries

 

Out of the original set of 126 entries, 6 were deleted because of extreme propeptide length and the absence of an alternative omega-site. For another 21 sequences, the site was edited in accordance with homology considerations and/or an expert sequence property analysis. In all 27 cases, the omega-site had not been validated with an adequate experimental method. Thus, the final metazoan learning set contains 120 sequences.
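As a cross-check, the per-group numbers can be combined to confirm that the curated set matches the two entry lists at the top of this page (9 + 160 entries). The sketch assumes that no viral or fungal entries were removed, which the text does not state explicitly but which is consistent with the totals.

```python
# Composition of the 177 collected entries, as stated in the text.
initial = {"metazoa": 126, "protozoa": 40, "viruses": 1, "fungi": 10}

# Deletions described in the comments; groups not listed are assumed unchanged.
deleted = {"metazoa": 6, "protozoa": 2}

final = {group: n - deleted.get(group, 0) for group, n in initial.items()}
final_total = sum(final.values())

# Sizes of the SWISS-NEW and SWISS-PROT parts of the learning set.
listed_total = 9 + 160

print(final)
print(final_total, listed_total)
```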

 

List of deleted entries with extreme propeptide length (6 entries in total)

 

Entry        Motif length
GDNR_CHICK   >31
GDNR_HUMAN   >31
GDNR_MOUSE   >31
GDNR_RAT     >31
GP42_RAT     <17
VCA1_MOUSE   <17

 

 

List of entries with extreme propeptide length for which an alternative site was assigned (10 entries in total)

 

Entry        Sequence length   Annotated site in SWISS-PROT   New site
LY6A_MOUSE   134               119                            112
LY6C_MOUSE   131               116                            109
LY6E_MOUSE   136               121                            108
LY6F_MOUSE   134               119                            112
LY6G_MOUSE   111               96                             89
HYA1_CAVPO   529               492                            499
BST1_MOUSE   311               279                            286
BST1_RAT     319               287                            294
CNTR_HUMAN   372               336                            342
CNTR_RAT     372               336                            342
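The effect of these reassignments can be illustrated by recomputing the propeptide length before and after the edit. The sketch below assumes that the propeptide length equals the sequence length minus the omega-site position, and it takes the limits (<17 and >31 residues) from the list of deleted metazoan entries; both are assumptions made for illustration, not definitions given on this page.

```python
# Assumed limits, taken from the "<17" and ">31" marks in the deleted-entry list.
SHORT_LIMIT = 17
LONG_LIMIT = 31

# entry: (sequence length, annotated site in SWISS-PROT, new site)
ENTRIES = {
    "LY6A_MOUSE": (134, 119, 112),
    "LY6C_MOUSE": (131, 116, 109),
    "LY6E_MOUSE": (136, 121, 108),
    "LY6F_MOUSE": (134, 119, 112),
    "LY6G_MOUSE": (111, 96, 89),
    "HYA1_CAVPO": (529, 492, 499),
    "BST1_MOUSE": (311, 279, 286),
    "BST1_RAT": (319, 287, 294),
    "CNTR_HUMAN": (372, 336, 342),
    "CNTR_RAT": (372, 336, 342),
}


def propeptide_length(sequence_length, omega_site):
    """Residues C-terminal to the omega-site (assumed definition)."""
    return sequence_length - omega_site


def is_extreme(length):
    """True if the propeptide is shorter or longer than the assumed limits."""
    return length < SHORT_LIMIT or length > LONG_LIMIT


for name, (seq_len, annotated, new) in ENTRIES.items():
    before = propeptide_length(seq_len, annotated)
    after = propeptide_length(seq_len, new)
    print(f"{name}: {before} -> {after} residues")
```

Under these assumptions, every annotated site gives a propeptide outside the 17 to 31 residue range, and every new site gives one inside it.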

 

List of entries with normal propeptide length for which an alternative site was assigned (11 entries in total)

 

Entry        Sequence length   Annotated site in SWISS-PROT   Strange amino acid at omega-site or homologous sequence   New site
NRTR_MOUSE   463               443                            NRTR_CHICK                                                444
PRIO_MOUSE   254               230                            PRIO_RAT                                                  231
TREA_HUMAN   583               559                            K                                                         556
TREA_RABIT   578               558                            Q                                                         555
BST1_HUMAN   318               300                            Q                                                         293
CNTR_CHICK   362               334                            CNTR_HUMAN                                                341
CD24_RAT     76                56                             CD24_HUMAN                                                55
CD24_MOUSE   76                53                             CD24_HUMAN                                                55
CD59_PIG     123               97                             K                                                         99
CONN_DROME   682               665                            Q                                                         658
NRTR_HUMAN   464               444                            P at -1, R at +1                                          440

 


Last modified: 12th June 2000