Genome Informatics Research Lab

Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution

S. Castellano, S. V. Novoselov, G. V. Kryukov, A. Lescure, E. Blanco, A. Krol, V. N. Gladyshev and R. Guigó*.

EMBO reports, 5(1):71-77 (2004)

*To whom correspondence should be addressed.


On this site we describe and provide all the programs and data used to predict selenoproteins in the fugu and human genomes.

Selenoproteins overview

Major points on selenoproteins are:

  1. They incorporate the amino acid selenocysteine (U or Sec), the 21st amino acid. It has its own tRNA, whose anticodon recognizes UGA (which we were taught was only a STOP codon!).

  2. So why don't all UGA codons code for Sec? Because the alternative decoding of UGA is conferred by an mRNA secondary structure termed SECIS. This structure, by means of one or more proteins, directs the ribosome to incorporate Sec.

  3. They are everywhere: Eukarya, Bacteria and Archaea. However, the SECIS element is located in the 3' UTR in eukaryotes and archaea, but within the coding region (just after the UGA) in bacteria. The SECIS elements of Eukarya, Bacteria and Archaea differ substantially.

  4. Try standard gene prediction and, at best, you will get truncated selenoprotein genes. Why not accept that UGA can code for Sec and then, to find such genes, compare two or more sets of them from different species? This is the work presented here.
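
The truncation described in point 4 can be illustrated with a minimal Python sketch. It is not the geneid implementation; the function name `translate` and the flag `tga_as_sec` are hypothetical, and the toy coding sequence is made up for illustration. It only shows how reading TGA as a stop cuts a selenoprotein short, while reading it as Sec (U) yields the full product.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, tga_as_sec=False):
    """Translate a coding sequence up to the first stop codon.

    If tga_as_sec is True (hypothetical flag), an in-frame TGA is read
    as selenocysteine (U) instead of terminating translation.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TGA" and tga_as_sec:
            protein.append("U")
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # TAA, TAG, or TGA read as a stop
        protein.append(residue)
    return "".join(protein)


# Toy sequence (invented for illustration): ATG GCT TGA GGT TGC TAA
toy_cds = "ATGGCTTGAGGTTGCTAA"
truncated = translate(toy_cds)                   # TGA read as stop
full = translate(toy_cds, tga_as_sec=True)       # TGA read as Sec
```

With the toy sequence, the standard reading stops after two residues, whereas the Sec-aware reading continues to the TAA stop. Allowing every TGA to be read through naturally produces many false candidates, which is why a comparison across species is needed as a filter.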

Comparative genomics

According to the Human Genome Project, comparative genomics is the analysis and comparison of genomes from different species. The purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genome. Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (called exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.

In this case, sequence conservation across a TGA codon, between two or more genes from different species, strongly suggests selenocysteine coding function.
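
As a rough illustration of this criterion, the Python sketch below measures sequence identity on each side of an aligned Sec (U) position in two pre-aligned protein translations. It is a simplified assumption of how such a check could look, not the procedure used in the paper: the function names `flank_identity` and `identity`, the window size, and the toy alignment are all hypothetical. The intuition is that if TGA were a genuine stop, the region downstream of it would not be under selection to stay conserved.

```python
def identity(seg_a, seg_b):
    """Fraction of identical residues over aligned, gap-free columns."""
    pairs = [(x, y) for x, y in zip(seg_a, seg_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def flank_identity(aln_a, aln_b, window=10):
    """Report identity upstream and downstream of each aligned U/U column.

    aln_a and aln_b are two rows of a pairwise protein alignment
    (same length, '-' for gaps) in which in-frame TGA was translated as U.
    Returns a list of (column, upstream_identity, downstream_identity).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for i, (x, y) in enumerate(zip(aln_a, aln_b)):
        if x == "U" and y == "U":
            start = max(0, i - window)
            upstream = identity(aln_a[start:i], aln_b[start:i])
            downstream = identity(aln_a[i + 1:i + 1 + window],
                                  aln_b[i + 1:i + 1 + window])
            hits.append((i, upstream, downstream))
    return hits


# Toy alignment (invented for illustration), with U at column 5 in both rows.
row_a = "MKLVAUGCHT"
row_b = "MKLIAUGCHS"
hits = flank_identity(row_a, row_b, window=4)
```

A candidate would then be kept only if both flanks, and in particular the downstream one, exceed some identity threshold; the appropriate threshold is not specified here and would have to be chosen empirically.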

Genome sequences


For a general introduction, please browse the geneid page. The modified geneid version able to predict selenoproteins can be found just below (source code in ANSI C, plus some parameter files):


The parameter file is an external flat file read by geneid at run time. Take a look at it! It carries the statistical information, for a given organism, used to predict genes, together with the gene model (which states the relationships among the exons predicted along a sequence). Please read the geneid handbook for details.

Human: Seleno3iso.default.1TGA.both.15.No_SECIS.0.75.param

Fugu (and Tetraodon): tetraodon.param.3.No_SECIS.1.8.param


The SECIS structure, located in the 3' UTR of both eukaryotic and archaeal mRNAs, is the secondary/tertiary RNA structure that directs recoding of the UGA codon. Eukaryotic and archaeal SECIS structures differ substantially.

SECISearch 2.0 identifies candidate SECIS elements in nucleotide sequence databases on the basis of their primary sequence, secondary structure and predicted free energy criteria. An online version of the program was used. The available canonical and non-canonical patterns were run on selected fugu candidates.

SelU: a novel selenoprotein family
