Genome BioInformatics Research Lab

Characterization of the Eukaryotic Selenoproteome

In selenoproteins, incorporation of the amino acid selenocysteine is specified by the UGA codon, usually a stop signal. The alternative decoding of UGA is conferred by an mRNA structure, the SECIS element, located in the 3'-untranslated region of the selenoprotein mRNA (see figure). Because of the non-standard use of the UGA codon, current computational gene prediction methods are unable to identify selenoproteins in the sequence of eukaryotic genomes.
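The effect of reading UGA (TGA in the DNA sequence) as a stop can be illustrated with a minimal Python sketch. It is not part of geneid or of the method described here; the function names and the toy sequence are assumptions made purely for illustration. The sketch shows how a standard reading of the frame truncates the protein at the in-frame TGA, while a selenoprotein-aware scan reports that codon as a possible selenocysteine position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def standard_orf_length(cds):
    """Number of codons read before the first stop when TGA counts as a stop."""
    codons = in_frame_codons(cds)
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return index
    return len(codons)


def in_frame_tga_positions(cds):
    """Codon indices of in-frame TGA codons, excluding the final codon.

    Each reported position is only a candidate selenocysteine codon;
    it says nothing about whether a SECIS element is present.
    """
    codons = in_frame_codons(cds)
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TGA"]


# Hypothetical toy sequence: ATG GCT TGA GGT CCT TAA
toy_cds = "ATGGCTTGAGGTCCTTAA"
print(standard_orf_length(toy_cds))     # 2
print(in_frame_tga_positions(toy_cds))  # [2]
```

A conventional gene finder would stop after two codons here, which is why a single in-frame TGA is enough to hide the rest of the coding region.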

We are developing methods to predict selenoproteins in genomic sequences. The methods combine the prediction of SECIS elements with the prediction of genes in which the strong codon bias characteristic of protein-coding regions extends beyond a TGA codon that interrupts the open reading frame. Our approach depends on the program geneid, both on its ability to use external data to refine the predicted genes and on its capacity to predict genes with an in-frame TGA. However, predicting selenoprotein genes in genomic sequences is particularly difficult, and mispredicting even a single amino acid (the selenocysteine residue) may lead to the misannotation of a selenoprotein. As a consequence, eukaryotic selenoproteomes (the full set of selenoproteins in an organism) remain poorly characterized. Correctly identifying selenoproteins is important because they are thought to mediate the biological functions of selenium, which is implicated in processes as diverse as male infertility, the prevention of cancer and heart disease, the reduction of viral expression, ageing, and immune function (Hatfield, 2001).
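The idea that codon bias should extend beyond the interrupting TGA can be sketched as follows. This is a toy illustration under stated assumptions, not geneid's scoring model, which this text does not describe: it swaps in a simple per-codon log-odds score against a uniform background, and the function names, the pseudocount, the threshold and the training sequence are all hypothetical.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_frequencies(training_seqs, pseudocount=1.0):
    """Estimate codon frequencies from in-frame sequences with additive smoothing."""
    counts = Counter()
    for seq in training_seqs:
        counts.update(split_codons(seq))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_log_odds(seq, coding_freq, background=1.0 / 64):
    """Mean per-codon log-odds of the coding model against a uniform background."""
    codons = [c for c in split_codons(seq) if c in coding_freq]
    if not codons:
        return 0.0
    return sum(math.log(coding_freq[c] / background) for c in codons) / len(codons)


def bias_extends_past_tga(cds, tga_codon_index, coding_freq, threshold=0.0):
    """True if both flanks of the in-frame TGA score above the threshold."""
    codons = split_codons(cds)
    upstream = "".join(codons[:tga_codon_index])
    downstream = "".join(codons[tga_codon_index + 1:])
    return (coding_log_odds(upstream, coding_freq) > threshold
            and coding_log_odds(downstream, coding_freq) > threshold)


# Hypothetical training data: a real model would be trained on known genes.
freq = codon_frequencies(["GCTGCTGCTGGTGGT"])
print(bias_extends_past_tga("GCTGGTTGAGCTGGT", 2, freq))  # True
print(bias_extends_past_tga("GCTGGTTGAACAACA", 2, freq))  # False
```

In the first call the codons after the TGA match the trained usage, so the region still looks coding; in the second they do not, which is what would be expected if the TGA were a genuine stop.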

First, in collaboration with two experimental groups, one led by Montserrat Corominas and Florenci Serras at the Universitat de Barcelona and the other by Marla J. Berry at Harvard University, we applied the method to the Drosophila melanogaster genome and predicted three selenoprotein candidates. One of them belongs to a known family of selenoproteins (SPS2), and we tested the other two predictions experimentally, with positive results (Castellano et al. 2001). They belong to the SelH and SelK selenoprotein families.

Second, in collaboration with Vadim Gladyshev's group at the University of Nebraska, we have also used this method and more sophisticated SECIS prediction tools (SECISearch 2.0) to analyze mammalian genomes. After gene and SECIS prediction paired with extensive human-rodent comparisons, we believe the human selenoproteome consists of 25 selenoproteins.

SECIS and gene prediction. (A) General form 1 SECIS element divided into structural units. Form 2 has an extra short stem-loop in the apical loop. (B) PatScan SECIS pattern used to search for both form 1 and form 2 SECIS elements. The extra stem-loop in form 2 is not taken into account when searching. (C) The two possible geneid predictions for an ideal two-exon gene: as a normal gene, or as a selenoprotein gene with an in-frame TGA and a SECIS element. Exon-defining signals are shown. (D) False positive selenoprotein genes with either an in-frame TGA or a SECIS element, but not both. These partial predictions are not permitted in the gene prediction.
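The constraint in panels (C) and (D) can be written as a small filter. The sketch below is an assumption-laden illustration rather than geneid's implementation: the `Candidate` class, its fields and the forward-strand coordinate convention are hypothetical. It accepts a prediction as a selenoprotein gene only when an in-frame TGA and a SECIS element downstream of the coding region occur together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A hypothetical gene prediction on the forward strand."""
    name: str
    cds_end: int                 # coordinate of the last coding base
    has_inframe_tga: bool        # TGA interrupting the open reading frame
    secis_start: Optional[int]   # start of a predicted SECIS, or None


def accept_as_selenoprotein(candidate):
    """Accept only predictions with both an in-frame TGA and a 3' SECIS.

    A TGA without a SECIS, or a SECIS without a TGA, is a partial
    prediction and is rejected, as is a SECIS located before the end
    of the coding region.
    """
    if not candidate.has_inframe_tga:
        return False
    if candidate.secis_start is None:
        return False
    return candidate.secis_start > candidate.cds_end


candidates = [
    Candidate("tga_and_secis", 1000, True, 1500),
    Candidate("tga_only", 1000, True, None),
    Candidate("secis_only", 1000, False, 1500),
]
for c in candidates:
    print(c.name, accept_as_selenoprotein(c))
# tga_and_secis True
# tga_only False
# secis_only False
```

A production filter would also need strand handling and some limit on the distance between the coding region and the SECIS element; neither is specified in this text, so both are left out.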

Third, we have screened nonmammalian vertebrate genomes in collaboration with Vadim Gladyshev's group and Alain Krol's lab at the Institut de Biologie Moléculaire et Cellulaire in Strasbourg. Using a comparative gene prediction method applied to human and a pufferfish (Takifugu rubripes), we found a novel nonmammalian selenoprotein family, termed SelU.

Finally, we have contributed to the annotation of selenoprotein genes in vertebrate genomes (Tetraodon and chicken). In collaboration with Vadim Gladyshev's group, we are testing a potential new selenoprotein family in fishes.

As a result of these studies and previous work from many groups, 19-20 selenoprotein families have so far been identified in eukaryotic genomes, some of them containing several members. Different families show neither sequence similarity nor related functions. Although selenoproteins have been studied in only a few eukaryotic organisms, existing data suggest that selenoprotein genes and their Cys-containing homologs are distributed across the whole eukaryotic spectrum in what appears to be a highly species-specific fashion. In any case, if the results obtained here through the analysis of model organisms are representative of more divergent eukaryotic genomes, the clear conclusion is that we currently understand only a fraction of the selenium-dependent world.

Relevant publications

  • International Chicken Genome Sequencing Consortium
    Sequencing and comparative analysis of the chicken genome
    Nature, in press

  • International Tetraodon Genome Sequencing Consortium
    Analysis of the Tetraodon nigroviridis genome reveals the vertebrate protokaryotype and its duplication in fish
    Nature, in press

  • S. Castellano, S.V. Novoselov, G.V. Kryukov, A. Lescure, E. Blanco, A. Krol, V.N. Gladyshev and R. Guigó.
    Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution.
    EMBO reports, 5(1):71-77 (2004)

  • G.V. Kryukov, S. Castellano, S.V. Novoselov, A.V. Lobanov, O. Zehtab, R. Guigó and V.N. Gladyshev
    Characterization of mammalian selenoproteomes.
    Science, 300(5624):1439-1443 (2003)

  • S. Castellano, N. Morozova, M. Morey, M.J. Berry, F. Serras, M. Corominas and R. Guigó.
    In silico identification of novel selenoproteins in the Drosophila melanogaster genome.
    EMBO reports, 2(8):697-702 (2001)
