Brix Domain

This page is not part of the official web presentation of the Institute of Molecular Pathology (IMP). Its contents do not represent the opinions of IMP representatives.

Brix Domain Proteins

This page provides supplementary material for the manuscript

"The Brix domain protein family - a key to the ribosomal biogenesis pathway"
by Frank Eisenhaber, Christian Wechselberger, Guenther Kreil

Acknowledgements to:
M. Breitenbach, F.M. Jantsch, G. Lepperdinger
for sharing experimental data on Brix and yol077c with us prior to publication, for discussing the sequence analysis results and for carefully reading the manuscript.

You might want to know:

Sequences and alignments
FASTA file with the X. laevis Brix sequence (AF319877)
Members of the Brix domain protein superfamily
The search history of the Brix domain superfamily and its internal structure
FASTA file of Brix domain proteins
Clustal-formatted grand alignment of Brix proteins
The same alignment in colour (PostScript)

HMMs and domain locations
The HMM for global domain searches
The HMM for local/fragmented domain searches
The domain boundaries for Brix domain proteins (hmmsearch output)
Reference for HMMs:
Eddy SR.
Profile hidden Markov models.
Bioinformatics 1998;14(9):755-63.

Compositions and sequence lengths
Charged amino acids in the regions flanking the Brix domain
Length of Brix domain proteins

Supplementary information kindly supplied by Mensur Dlakic (University of Michigan Medical School), dated June 12, 2001:
another alignment of Brix proteins in color
this alignment in MSF format
the corresponding HMM for global domain searches
the corresponding HMM for fragmented domain searches
the search output from SWISS-ALL

Features of four Brix domain proteins whose unusually large sequence length within their family suggests possible gene prediction or genome assembly inaccuracies:

Proteins in the Peter Pan family and in group 1 of hypothetical proteins are typically at most 475 residues long (346-475 and 295-434 residues, respectively). However, four proteins (all hypothetical, i.e. conceptual translations of genomic sequences) are extraordinarily long, at ~700 residues.
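A minimal sketch of the length check described above, assuming the per-family length ranges quoted in the text (346-475 residues for the Peter Pan family, 295-434 for group 1). The FASTA reader, the function names and the toy records are hypothetical and only illustrate the idea; this is not the procedure used for the manuscript.

```python
# Flag proteins whose length exceeds the upper bound observed in their family.
# The family ranges come from the text; names and example data are illustrative.

FAMILY_RANGES = {
    "Peter Pan": (346, 475),
    "group 1": (295, 434),
}


def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}; identifier = first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def flag_long(lengths, family):
    """Return (sorted) identifiers whose length is above the family's maximum."""
    _, upper = FAMILY_RANGES[family]
    return sorted(name for name, n in lengths.items() if n > upper)


if __name__ == "__main__":
    # Hypothetical toy records, not real Brix sequences.
    fasta = ">protA\n" + "M" * 400 + "\n>protB\n" + "M" * 350 + "\n" + "M" * 350 + "\n"
    seqs = read_fasta(fasta)
    lengths = {name: len(seq) for name, seq in seqs.items()}
    print(lengths)                          # {'protA': 400, 'protB': 700}
    print(flag_long(lengths, "Peter Pan"))  # ['protB']
```

A fixed upper bound per family is the simplest reading of the text; with more family members one might instead flag lengths far outside the observed distribution.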

CAB77212 [unidentified organism, sequence with similarity to human emb|CAB99252.1] (Peter Pan family II)

From residue ~450 to the C-terminus, this protein contains a seven-transmembrane (7TM) region with sequence similarity to G-protein-coupled receptors (GPCRs). None of the other Brix domain proteins has TM regions.

 predicted TM regions
 start   end
  450    470
  492    512
  526    546
  567    587
  625    645
  667    687
  729    749
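The TM coordinates above can be checked with simple arithmetic. This sketch (helper names are hypothetical) assumes inclusive residue ranges and computes the segment lengths, the loop lengths between consecutive segments, and how each segment relates to the PFAM 7tm_1 match reported below (residues 465-693).

```python
# Predicted TM segments of CAB77212 (inclusive residue ranges, from the table above).
TM_SEGMENTS = [(450, 470), (492, 512), (526, 546), (567, 587),
               (625, 645), (667, 687), (729, 749)]
PFAM_7TM = (465, 693)  # 7tm_1 match from the PFAM output


def seg_len(seg):
    """Number of residues in an inclusive range."""
    start, end = seg
    return end - start + 1


def loop_lengths(segments):
    """Residues strictly between consecutive segments."""
    return [b[0] - a[1] - 1 for a, b in zip(segments, segments[1:])]


def classify(seg, domain):
    """'inside', 'partial' or 'outside' relative to an inclusive domain range."""
    (s, e), (ds, de) = seg, domain
    if s >= ds and e <= de:
        return "inside"
    if e < ds or s > de:
        return "outside"
    return "partial"


if __name__ == "__main__":
    print([seg_len(s) for s in TM_SEGMENTS])  # [21, 21, 21, 21, 21, 21, 21]
    print(loop_lengths(TM_SEGMENTS))          # [21, 13, 20, 37, 21, 41]
    print([classify(s, PFAM_7TM) for s in TM_SEGMENTS])
    # ['partial', 'inside', 'inside', 'inside', 'inside', 'inside', 'outside']
```

With these coordinates, the first segment only partly overlaps the 7tm_1 match and the seventh lies C-terminal to it.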

PFAM:
7tm_1: domain 1 of 1, from 465 to 693: score 116.8, E = 4.5e-36
                   *->GNlLVilvilrtkklr..tptnifilNLAvADLLflltlppwalyyl
                       N+L+++ ++  +k r+  p+ +f + LAv+DLL +ltlpp a+y +
  UI_emb|CAB   465    SNGLALYRFSI-RKQRpwHPAVVFSVQLAVSDLLCALTLPPLAAYLY 510

                   vggsedWpfGsalCklvtaldvvnmyaSillLtaISiDRYlAIvhPlryr
                        +W +G+a C+l  +l+++n+  S+ ++t+IS+ RYl IvhP+ +r
  UI_emb|CAB   511 PP--KHWRYGEAACRLERFLFTCNLLGSVIFITCISLNRYLGIVHPFFAR 558

                   rrrtsprrAkvvillvWvlalllslPpllfswvktveegngtln......
                   ++ + p++A++v+++ Wvla+ll++P l fs +k+++  +  +++ +  +
  UI_emb|CAB   559 SHLR-PKHAWAVSAAGWVLAALLAMPTLSFSHLKRPPQQG--AGncsvar 605

                   .vnvtvClidfpeestasvstwlrsyvllstlvgFllPllvilvcYtrIl
                   ++  + Cl + +++      + +r+y l++  +g  lPll+ l +Y+
  UI_emb|CAB   606 pEACIKCLGTADHGL-----AAYRAYSLVLAGLGCGLPLLLTLAAYGALG 650

                   rtlr...........kaaktllvvvvvFvlCWlPyfivllldt<-*
                   r++ ++++ +  ++ ++a +++  v + + + +Py+i+ +l++
  UI_emb|CAB   651 RAVLrspgmtvaeklRVAALVASGVALYASSYVPYHIMRVLNV    693
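The per-domain header lines in the PFAM output (e.g. `7tm_1: domain 1 of 1, from 465 to 693: score 116.8, E = 4.5e-36`) follow a fixed pattern, so they can be extracted with a regular expression. A minimal sketch, assuming exactly the line layout shown on this page (HMMER 2-style hmmpfam output); the function name is hypothetical, and other HMMER versions format their output differently.

```python
import re

# Matches lines such as "7tm_1: domain 1 of 1, from 465 to 693: score 116.8, E = 4.5e-36".
DOMAIN_LINE = re.compile(
    r"^(?P<model>\S+): domain (?P<idx>\d+) of (?P<total>\d+), "
    r"from (?P<start>\d+) to (?P<end>\d+): "
    r"score (?P<score>-?\d+(?:\.\d+)?), E = (?P<evalue>\S+)$"
)


def parse_domain_line(line):
    """Return a dict with typed fields, or None if the line does not match."""
    m = DOMAIN_LINE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "model": d["model"],
        "domain": (int(d["idx"]), int(d["total"])),
        "start": int(d["start"]),
        "end": int(d["end"]),
        "score": float(d["score"]),
        "evalue": float(d["evalue"]),
    }


if __name__ == "__main__":
    for text in ("7tm_1: domain 1 of 1, from 465 to 693: score 116.8, E = 4.5e-36",
                 "AAA: domain 1 of 1, from 99 to 287: score -25.3, E = 0.15"):
        print(parse_domain_line(text))
```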

T32923 [C. elegans] (Peter Pan family II)

From sequence position ~470 onward, this protein contains long low-complexity regions of unusual composition, consisting of almost monotonous FRGG repeats (possibly the translation of a non-coding region).
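The low-complexity annotation below was produced with SEG using a window length of 12 and complexity thresholds of 2.2 and 2.5 bits (SEG's default values). As a rough illustration of the idea only, this sketch computes the Shannon entropy of each 12-residue window and reports windows at or below the 2.2-bit trigger value. It is a simplified stand-in, not SEG: SEG additionally extends triggered windows using the second threshold and then optimizes segment boundaries, both omitted here. Function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a sequence window."""
    counts = Counter(window.upper())
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, trigger=2.2):
    """1-based start positions of windows whose entropy is <= trigger.

    Simplified illustration only: SEG's extension step (second threshold)
    and its boundary optimization are not implemented.
    """
    return [i + 1 for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= trigger]


if __name__ == "__main__":
    print(shannon_entropy("frggfrggfrgg"))            # 1.5
    print(round(shannon_entropy("ACDEFGHIKLMN"), 3))  # 3.585
    print(len(low_complexity_windows("frgg" * 10)))   # 29 (every window)
```

A window made of three FRGG repeats scores 1.5 bits, well below the trigger, whereas twelve distinct residues reach the maximum of log2(12) bits.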

Low-complexity regions: SEG (window length 12, trigger complexity K1 = 2.2, extension complexity K2 = 2.5)
>gi|7505687|pir||T32923 hypothetical protein K09H9.6 - Caenorhabditis elegans^Agi|2804465|gb|AAB97575.1| (AF043700) contains similarity to human RNA-binding protein FUS/TLS (SW:Q28009) [Caenorhabditis elegans]

                                  1-1    M
                    kmkkgkkqrk    2-11
                                 12-32   SGAHNRGNVDALSQKDALYHQ
                ekfvkklqkqkfle   33-46
                                 47-86   NKEIELARQPHCLVIHRGDVGKYVKGLESD
                                         LRNLVEPNTA
                  knlkilkrnnik   87-98
                                 99-384  DFIVNGAVLGVTNMMVLTSSDASLQLRMMR
                                         FSQGPTLSFKVKQYSLARHVVNCQKRPVAT
                                         DKLFKSSPLVVMNGFGDGTQKHLSLVQTFI
                                         QNMFPSINVDTIQLGNLKRCLIVSYDEETD
                                         EIQMRHFAIRVVASGLNKSVKKLMQAEKTM
                                         GKNIPNLSTYKDISDYFLNPGQFRIRTKLN
                                         FSNINHKIIIKKYSIYLNFPSFSSPGQLSD
                                         SEFEGDQQEVELPQDISEGRGCGVGQKSNV
                                         RLHEIGPRLTLELVKIEEGIDEGEVLYHKH
                                         NAKTPDELIKLRAHMD
              kkkqmkkrreqeseqr  385-400
                                401-477  VIRRLTIVKEQQDAEEAEVKAIRENAARKQ
                                         AAATGQVEEVENQKEKDREIAMNRERDLKR
                                         ANEEWGTSEASKRPRYE
dsrggfrggfrgrgedrggfrgrggdrggf  478-608
rgrdrdgggfrgrsvdrggfrggggdrggf
rgrssdrggdrggfrgrsgdrdggfrggfg
grggggfrggdrggfrgrgggggfrggrgg
                   drgggfrggrr
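The C-terminal low-complexity segment (residues 478-608 in the SEG output above) can be summarized by its composition. This sketch copies that segment verbatim from the output and counts residues and FRGG occurrences; the variable names are illustrative.

```python
from collections import Counter

# Residues 478-608 of T32923, copied from the SEG output above (lowercase = low complexity).
SEGMENT = (
    "dsrggfrggfrgrgedrggfrgrggdrggf"
    "rgrdrdgggfrgrsvdrggfrggggdrggf"
    "rgrssdrggdrggfrgrsgdrdggfrggfg"
    "grggggfrggdrggfrgrgggggfrggrgg"
    "drgggfrggrr"
)
START, END = 478, 608

if __name__ == "__main__":
    assert len(SEGMENT) == END - START + 1  # 131 residues
    comp = Counter(SEGMENT)
    for residue, n in comp.most_common():
        print(f"{residue.upper()}: {n:3d} ({100 * n / len(SEGMENT):.1f}%)")
    gr = comp["g"] + comp["r"]
    print(f"G+R fraction: {gr / len(SEGMENT):.2f}")   # 0.74
    print("FRGG occurrences:", SEGMENT.count("frgg"))  # 6
```

Glycine and arginine together make up about three quarters of this segment, consistent with the near-monotonous Gly/Arg-rich repeats described above.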

P45073 [C. elegans] (family III/group 1 of hypothetical proteins)

Within its N-terminal ~300 residues, this protein contains a region with some similarity to the AAA+ ATPase module (detected with IMPALA against the signalling domain library of L. Aravind, E = 3e-04, and with PFAM, E = 0.15; see the comments on the somewhat shorter T19409 below).

T19409 [C. elegans] (family III/group 1 of hypothetical proteins)

Within its N-terminal ~300 residues, this protein contains a region with some similarity to the AAA+ ATPase module (detected with IMPALA against the signalling domain library of L. Aravind, E = 3e-04, and with PFAM, E = 0.15).

IMPALA: AAA AAA+ ATPase Module
          Length = 298

 Score = 35.0 bits (79), Expect = 3e-04
 Identities = 32/193 (16%), Positives = 32/193 (16%), Gaps = 14/193 (7%)

Query: 95  RKPLVLSFHGYTGSGKNYVAEIIANNTFRLGLRSTFVQHIVATNDFPDKNKLEEYQVELR 154

Sbjct: 78  AQPKGVLLYGPPGTGKTLLARAVAHHTDC---------TFIRVSGSELVQKFIGEGARMV 128

Query: 155 NRILTTVQKCQRSIFIFDEADKLPEQLLGAIKPFLDYYSTISGVDFRRSIFILLSNKGGG 214

Sbjct: 129 RELFVMAREHAPSIIFMDEIDSIGSRLEGGSGGDSEVQRTMLELLNQLDGFEATKN---- 184

Query: 215 EIARITKEQYESGYPREQLRLEAFERELMNFSYNEKGGLQMSELISNHLIDHFVPFLPLQ 274

Sbjct: 185 -IKVIMATNRIDILDSALLRPGRIDRKIEFPPPNEEARLDILKIHSRKMNLTRGINLRKI 243

Query: 275 REHVRSCVGAYLR 287

Sbjct: 244 AELMPGASGAEVK 256
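The identity and gap counts in the IMPALA summary line can be recomputed from the aligned Query and Sbjct rows. The sketch below (function names are hypothetical) concatenates the rows as printed above and counts identical columns and gap characters. "Positives" are not recomputed, because they depend on the substitution matrix used for scoring, which this page does not state. Percentages are truncated to whole numbers, which reproduces the values shown here.

```python
# Aligned rows of the IMPALA hit (T19409 query vs. AAA+ ATPase module), copied from above.
QUERY_ROWS = [
    "RKPLVLSFHGYTGSGKNYVAEIIANNTFRLGLRSTFVQHIVATNDFPDKNKLEEYQVELR",
    "NRILTTVQKCQRSIFIFDEADKLPEQLLGAIKPFLDYYSTISGVDFRRSIFILLSNKGGG",
    "EIARITKEQYESGYPREQLRLEAFERELMNFSYNEKGGLQMSELISNHLIDHFVPFLPLQ",
    "REHVRSCVGAYLR",
]
SBJCT_ROWS = [
    "AQPKGVLLYGPPGTGKTLLARAVAHHTDC---------TFIRVSGSELVQKFIGEGARMV",
    "RELFVMAREHAPSIIFMDEIDSIGSRLEGGSGGDSEVQRTMLELLNQLDGFEATKN----",
    "-IKVIMATNRIDILDSALLRPGRIDRKIEFPPPNEEARLDILKIHSRKMNLTRGINLRKI",
    "AELMPGASGAEVK",
]


def alignment_stats(query, sbjct):
    """Return (identities, gaps, alignment length) for two gapped rows of equal length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = query.count("-") + sbjct.count("-")
    return identities, gaps, len(query)


if __name__ == "__main__":
    q, s = "".join(QUERY_ROWS), "".join(SBJCT_ROWS)
    ident, gaps, length = alignment_stats(q, s)
    print(f"Identities = {ident}/{length} ({100 * ident // length}%)")  # 32/193 (16%)
    print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")          # 14/193 (7%)
    # The ungapped query span 95-287 should equal the alignment length.
    assert 287 - 95 + 1 == length - q.count("-")
```

The subject span 78-256 (179 residues) plus its 14 gap characters gives the same alignment length of 193 columns.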

PFAM:
Model    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
AAA        1/1      99   287 ..     1   216 []   -25.3     0.15
AAA: domain 1 of 1, from 99 to 287: score -25.3, E = 0.15
                   *->gvLLyGPPGtGKTlLAkavAkelg......vpfisisg......sel
                       + ++G  G+GK  +A+++A+ + + + ++  +  i ++++ ++
  CE_pir|T19    99    VLSFHGYTGSGKNYVAEIIANNTFrlglrsTFVQHIVAtndfpdKNK 145

                   vskyvGesekrvralfelArkslkkaaPspiiFIDEiDalapkRgdegdv
                   +++y  e + r+  + ++  +s        i ++DE+D+l  +
  CE_pir|T19   146 LEEYQVELRNRILTTVQKCQRS--------IFIFDEADKLPEQ------- 180

                   servvnqLLtemDLerigfekhylr..vsdvvDlsgviviaaTNrpdlld
                          LL  +      f ++y++ ++   vD+++ i+i+ +N  +  +
  CE_pir|T19   181 -------LLGAIK----PFLDYYSTisG---VDFRRSIFILLSNKGGGEI 216

                   paLlrpGRfdrrievplPdeeerleIlkihlkkmplalc..qerselakd
                   + +       ++ e ++P e+ rle +++ l ++  +++++  +sel ++
  CE_pir|T19   217 ARITK-----EQYESGYPREQLRLEAFERELMNFSYNEKggLQMSELISN 261

                   vdldelakelArrtpgfsgadlaa..lcreAalralr<-*
                   + +d+           f+     +++++r    + lr
  CE_pir|T19   262 HLIDH-----------FVPFLPLQreHVRSCVGAYLR    287

Reference for IMPALA:
Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF
IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices.
Bioinformatics 1999 Dec;15(12):1000-11.

Last modified: Feb 2001
