Genome Informatics Research Lab

Characterization of mammalian selenoproteomes

G. V. Kryukov, S. Castellano, S. V. Novoselov, A. V. Lobanov, O. Zehtab, R. Guigó and V. N. Gladyshev*.

Science, 300(5624):1439-1443 (2003)

*To whom correspondence should be addressed.


On this site we describe and provide all the programs and data used to predict selenoproteins in the human genome.

Selenoproteins overview

The main points about selenoproteins are:

  1. They incorporate the amino acid selenocysteine (Sec, one-letter code U), which is the 21st amino acid. Sec has its own tRNA, whose anticodon pairs with UGA (which we were all taught was only a STOP codon!).

  2. So why don't all UGA codons code for Sec? Because the alternative decoding of UGA is conferred by an mRNA secondary structure termed SECIS. This structure, by means of one or more proteins, directs the ribosome to incorporate Sec.

  3. They are everywhere: Eukarya, Bacteria and Archaea. However, the SECIS element lies in the 3' UTR in eukaryotes and archaea, whereas in bacteria it lies within the coding region, just after the UGA. The SECIS elements of the three domains differ substantially.

  4. Run standard gene prediction and, at best, you will get truncated selenoprotein genes. Why not accept that UGA codes for Sec as long as there is a potential SECIS nearby? Why not run SECIS prediction and pinpoint the real elements using a comparative approach? This is the work presented here.
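The idea in point 4 can be illustrated with a toy translation routine. This is a minimal sketch, not the geneid implementation: the function name `translate` and the parameters `secis_nearby` and `max_sec` are hypothetical and introduced here only for illustration. It assumes the SECIS search has already been done and is passed in as a simple flag.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, secis_nearby=False, max_sec=1):
    """Translate a coding sequence, optionally reading in-frame TGA as Sec (U).

    secis_nearby and max_sec are hypothetical knobs for this sketch only:
    when a candidate SECIS has been found, up to max_sec in-frame TGA
    codons are read as U instead of ending the protein.
    """
    protein = []
    sec_used = 0
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")  # X marks an ambiguous codon
        if codon == "TGA" and secis_nearby and sec_used < max_sec:
            aa = "U"
            sec_used += 1
        if aa == "*":
            break  # ordinary stop codon: the protein ends here
        protein.append(aa)
    return "".join(protein)


cds = "ATGGCTTGAGGTTAA"  # made-up example: ATG GCT TGA GGT TAA
print(translate(cds))                     # MA   (truncated at TGA)
print(translate(cds, secis_nearby=True))  # MAUG (TGA read as Sec)
```

Without the SECIS flag the toy sequence is truncated at the TGA, which is the behaviour point 4 attributes to standard gene prediction.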

Genome sequences


The SECIS structure, located in the 3' UTR of both eukaryotic and archaeal mRNAs, is the secondary/tertiary RNA structure that directs recoding of the UGA codon. Eukaryotic and archaeal SECIS structures differ substantially.

SECISearch 2.0 identifies candidate SECIS elements in nucleotide sequence databases on the basis of their primary sequence, secondary structure, and predicted free energy. The program has three modules:

  1. Search for SECIS (based on PatScan)

  2. Thermodynamic evaluation (based on RNAfold)

  3. SECIS visualization (RNAnice)

The program can be accessed through an online web server. Connect to it to check the SECIS patterns used in this work.
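As a rough illustration of the thermodynamic evaluation step, the sketch below folds a candidate sequence with RNAfold and keeps it only if the predicted free energy is low enough. It is not the SECISearch code: the helper names and the `threshold` value of -10.0 kcal/mol are assumptions made for this example, and the actual criteria are those of the web server. The `fold` helper assumes the `RNAfold` executable from the ViennaRNA package is on the PATH; `parse_rnafold` works on any text in RNAfold's output format.

```python
import re
import subprocess

# An RNAfold result line: dot-bracket structure, then the energy in parentheses.
RESULT_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(output):
    """Return (structure, energy in kcal/mol) from RNAfold's text output."""
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            return match.group(1), float(match.group(2))
    raise ValueError("no structure line found in RNAfold output")


def fold(sequence):
    """Fold one sequence; requires the RNAfold executable on the PATH."""
    result = subprocess.run(
        ["RNAfold", "--noPS"],  # --noPS: do not write a PostScript drawing
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold(result.stdout)


def is_stable(energy, threshold=-10.0):
    """Hypothetical filter: keep candidates at or below the threshold."""
    return energy <= threshold


# Illustrative text in RNAfold's format, not a computed result.
sample = "GGGAAAUCC\n(((...))) ( -1.20)\n"
structure, energy = parse_rnafold(sample)
print(structure, energy, is_stable(energy))
```

Keeping the parser separate from the subprocess call makes it possible to check the filtering logic on saved output without RNAfold installed.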


For a general introduction, please browse the geneid page. The modified geneid version able to predict selenoproteins can be found just below (source code in ANSI C and some parameter files):


The parameter file is an external flat file read by geneid at run time. Take a look at it! It carries the statistical information used to predict genes in a given organism, together with the gene model (which states how the exons predicted along a sequence may be assembled). Please read the geneid handbook for details.

Human with SECIS: Seleno3iso.default.1TGA.both.15.param

Human without SECIS: Seleno3iso.default.1TGA.both.15.No_SECIS.0.75.param

Fugu (and Tetraodon) without SECIS: tetraodon.param.3.No_SECIS.1.8.param

Novel selenoproteins

Protein sequence (U stands for Sec) and SECIS sequence, divided into structural units, for each novel human selenoprotein gene:

SelV: protein and SECIS

SelH: protein and SECIS

SelK: protein and SECIS

SelS: protein and SECIS

SelI: protein and SECIS

SelO: protein and SECIS

GPx6: protein and SECIS
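Since Sec is written as U in these protein sequences, locating the selenocysteine residues is a simple scan. The snippet below is a minimal sketch; the function name `sec_positions` is hypothetical and the input is a made-up toy sequence, not one of the proteins listed above.

```python
def sec_positions(protein):
    """Return the 1-based positions of selenocysteine (U) in a protein."""
    return [i for i, aa in enumerate(protein.upper(), start=1) if aa == "U"]


print(sec_positions("MAUGCUK"))  # [3, 6]
```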
