Genome BioInformatics Research Lab

 
RESEARCH TOPICS
 
   
Simple Sequences in Proteins and DNA

 
Simple sequences are regions of low complexity made up of short sequence repeats (motifs of 1-6 elements). The repeats may be organized in tandem or form segments of imperfect repeats, also called cryptic simple sequences. Regions of low complexity are extremely abundant in both DNA and proteins. For example, about 71% of yeast proteins show significant overall simplicity as measured by the SIMPLE algorithm. This algorithm is implemented in the SIMPLE v. 3.0 program, which analyzes simplicity in any nucleic acid or protein sequence and can be accessed online.
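To make the idea of a tandem repeat concrete, the following is a minimal Python sketch that reports perfect tandem repeats with a motif length of 1-6 in a DNA or protein string. It is not the SIMPLE algorithm, which also scores imperfect (cryptic) repeats and assesses significance; the function name find_tandem_repeats and the min_copies threshold are assumptions made for illustration only.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats.

    Illustrative only: motif lengths 1..max_unit are scanned with a
    back-referencing regular expression. min_copies is an arbitrary
    threshold chosen for this sketch.
    """
    hits = []
    for k in range(1, max_unit + 1):
        # A motif of length k followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA"), since the shorter motif is reported separately.
            if unit in (unit + unit)[1:-1]:
                continue
            copies = len(match.group(0)) // k
            hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGCACACACATT"))  # dinucleotide repeat in DNA
    print(find_tandem_repeats("MQQQQQA"))       # homopeptide in a protein
```

The same function handles nucleotide and amino acid sequences because it treats the input as a plain string; start positions are 0-based.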

Many short repeats are believed to have originated by DNA slippage and misalignment during replication, recombination or repair. We have studied the codon composition of gene regions that encode homopeptides in order to determine whether amino acid repeats correlate with trinucleotide repeats in the gene. A high correlation would be consistent with slippage, while a mixture of codons could indicate selection acting on the homopeptide region. In mammals, two populations of glutamine repeats can be clearly differentiated. The first is encoded by pure trinucleotide tracts (CAG) and the second by very mixed tracts (CAA/CAG). The latter type tends to be more conserved between human and mouse than the pure tracts. The results suggest that while a subset of polyglutamine segments may have originated recently by slippage, and may therefore be neutral, others appear to have been preserved throughout evolution.
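The distinction between pure and mixed glutamine-encoding tracts can be illustrated with a short Python sketch. It assumes an in-frame coding sequence and summarizes each run of glutamine codons by its codon counts and a simple purity value (the fraction contributed by the most frequent codon). The function name glutamine_tracts, the min_len threshold and the purity measure are assumptions for illustration, not the method used in the publications listed here.

```python
from collections import Counter

GLN_CODONS = {"CAA", "CAG"}  # the two codons that encode glutamine


def glutamine_tracts(cds, min_len=5):
    """Summarize runs of glutamine codons in an in-frame coding sequence.

    Illustrative only: min_len is an arbitrary threshold, and purity is
    the share of the most frequent codon in the run (1.0 means a pure
    trinucleotide tract such as CAG repeated).
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    tracts = []
    start = None
    # The empty sentinel codon closes a run that reaches the end.
    for i, codon in enumerate(codons + [""]):
        if codon in GLN_CODONS:
            if start is None:
                start = i
        elif start is not None:
            run = codons[start:i]
            if len(run) >= min_len:
                counts = Counter(run)
                tracts.append({
                    "start_codon": start,
                    "length": len(run),
                    "counts": dict(counts),
                    "purity": max(counts.values()) / len(run),
                })
            start = None
    return tracts


if __name__ == "__main__":
    pure = "ATG" + "CAG" * 6 + "TAA"
    mixed = "ATG" + "CAGCAA" * 3 + "TAA"
    print(glutamine_tracts(pure))
    print(glutamine_tracts(mixed))
```

Under this measure a tract with purity near 1.0 is what slippage would be expected to produce, while a lower value reflects interrupted, mixed codon usage.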
 

 
Relevant publications
 

  • M.M. Albà, M.F. Santibáñez-Koref and J.M. Hancock.
    "The comparative genomics of polyglutamine repeats: extreme difference in the codon organization of repeat-encoding regions between mammals and Drosophila."
    Journal of Molecular Evolution, 52:249-259 (2001).

  • M.M. Albà, M.F. Santibáñez-Koref and J.M. Hancock.
    "Conservation of polyglutamine tract size between mouse and human depends on codon interruption."
    Molecular Biology and Evolution, 16:1641-1644 (1999).

  • M.M. Albà, M.F. Santibáñez-Koref and J.M. Hancock.
    "Amino acid reiterations in yeast are over-represented in particular classes of proteins and show evidence of a slippage-like mutational process."
    Journal of Molecular Evolution, 49:789-797 (1999).

 