1. Presentation

Precise control of transcription initiation is one of the most important steps in the regulation of gene expression. The information that controls the initiation of RNA synthesis by RNA polymerase II is mostly contained in the gene promoter, a region usually 200 to 2000 bp long upstream of the transcription start site (TSS) of the gene. Transcription factors (TFs) interact with sequence-specific elements or motifs (the TF binding sites, TFBSs) in the promoter region. The promoter can be seen as a linear array of binding motifs that integrates information about the current status of the cell to alter the rate of transcription initiation. One promoter usually contains 10 to 50 TFBSs that bind 5 to 15 different TFs. TFBSs are typically 5-15 bp long. In addition, TFBSs associated with the same TF are known to tolerate one or more specific substitutions without losing functionality.
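To make the tolerance to substitutions concrete, the following minimal Python sketch scans a promoter for words that differ from a consensus motif in at most a given number of positions. The function name `find_sites`, the sequence and the consensus are hypothetical and chosen only for illustration. Real binding sites tolerate specific substitutions at specific positions, which a plain mismatch count does not capture.

```python
def find_sites(promoter, consensus, max_mismatches=1):
    """Return (position, word, mismatches) for every window of the promoter
    that differs from the consensus in at most max_mismatches positions.
    Positions are 0-based offsets into the promoter string."""
    promoter = promoter.upper()
    consensus = consensus.upper()
    width = len(consensus)
    hits = []
    for start in range(len(promoter) - width + 1):
        word = promoter[start:start + width]
        # Count the positions where the window and the consensus disagree.
        mismatches = sum(1 for a, b in zip(word, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, word, mismatches))
    return hits


# Hypothetical toy sequence and consensus, not taken from any real promoter.
exact_hits = find_sites("AAACGTAAATGT", "ACGT", max_mismatches=0)
relaxed_hits = find_sites("AAACGTAAATGT", "ACGT", max_mismatches=1)
```

Allowing one mismatch reports a second word (`ATGT`) in addition to the exact match. This shows how quickly relaxed matching increases the number of candidate sites.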

Experimental detection of TFBSs is extremely laborious and complex, so computational approaches have been widely used to address the problem. However, computational searches for TFBSs on a single promoter sequence are often of little practical use because of the high probability of predicting false positives. Recently, phylogenetic footprinting methods, which align several promoters of related genes, have proved useful for identifying the conserved sites within regulatory sequences. However, training these programs is difficult because experimental data are scarce, especially for orthologous genes.
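The idea behind phylogenetic footprinting can be sketched in a few lines of Python. The example assumes that two orthologous promoters have already been aligned by some external tool and are given as equal-length strings with `-` for gaps. It then reports the ungapped runs of identical columns. The function name `conserved_blocks`, the `min_length` parameter and the toy alignment are assumptions made for illustration, and published footprinting programs use more elaborate scoring.

```python
def conserved_blocks(aligned_a, aligned_b, min_length=5):
    """Return (start, end, word) for each run of identical, ungapped columns
    of at least min_length in a pairwise alignment.
    start and end are 0-based alignment columns, end exclusive."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    blocks = []
    run_start = None
    for i, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        identical = a == b and a != "-"
        if identical and run_start is None:
            run_start = i  # a new conserved run begins here
        elif not identical and run_start is not None:
            if i - run_start >= min_length:
                blocks.append((run_start, i, aligned_a[run_start:i]))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned_a) - run_start >= min_length:
        blocks.append((run_start, len(aligned_a), aligned_a[run_start:]))
    return blocks


# Hypothetical aligned promoter fragments from two species.
blocks = conserved_blocks("ACGTTATAAAGG-C", "TCGATATAAACGAC", min_length=5)
```

Only the run `TATAAA` is long enough to be reported here. Shorter identical stretches are discarded as likely chance matches.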

ABS is a public database of experimentally verified orthologous transcription factor binding sites (TFBSs). Annotations have been collected from the literature and are manually curated. For each gene, TFBSs conserved in orthologous sequences from at least two different species must be available. Promoter sequences as well as the original GenBank or RefSeq entries are additionally supplied in case of future identification conflicts. The final TSS annotation has been refined using the dbTSS database. Up to this release, the 500 bp upstream of the transcription start site (TSS) annotated in RefSeq have always been extracted to form the collection of promoter sequences from human, mouse, rat and chicken.
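The extraction of a fixed-length upstream region can be illustrated with a short Python sketch. It is not the procedure used to build ABS. It assumes that the genomic sequence is available as a plain string, that the TSS is given as a 0-based index on that string, and that the strand of the gene is known. The names `upstream_region` and `reverse_complement` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genomic, tss, strand="+", length=500):
    """Return up to `length` bp immediately upstream of the TSS, read 5' to 3'
    on the strand of the gene. `tss` is the 0-based index of the first
    transcribed base on the forward strand (an assumption of this sketch)."""
    if strand == "+":
        start = max(0, tss - length)  # clip at the start of the sequence
        return genomic[start:tss]
    if strand == "-":
        # On the reverse strand, upstream lies at higher forward coordinates.
        end = min(len(genomic), tss + 1 + length)
        return reverse_complement(genomic[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Hypothetical toy sequence; real promoters would use length=500.
toy = "AAAACCCCGGGGTTTT"
forward = upstream_region(toy, tss=8, strand="+", length=4)
reverse = upstream_region(toy, tss=7, strand="-", length=4)
```

The region is clipped when the TSS lies closer than `length` to the end of the available sequence, so the returned promoter can be shorter than requested.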

For each regulatory site, the position, the motif and the sequence in which the site is present are available in a very simple format. Cross-references to EntrezGene, PubMed and RefSeq are provided for each annotation. Apart from the experimental promoter annotations, predictions obtained with popular collections of weight matrices are provided for each promoter sequence. Global and local alignments and graphical dotplots are also available.
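As background on how weight matrix predictions are generally obtained, the Python sketch below converts a matrix of base counts into log-odds scores and scans a promoter with it. The counts, the uniform background, the pseudocount and the threshold are all hypothetical, and the sketch does not reproduce any particular matrix collection or the settings used for the predictions in ABS.

```python
import math


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts (a list of dicts) into log2-odds
    scores against a uniform background frequency."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return matrix


def scan(promoter, matrix, threshold):
    """Return (position, word, score) for each window scoring >= threshold.
    Windows containing characters other than A, C, G, T are skipped."""
    promoter = promoter.upper()
    width = len(matrix)
    hits = []
    for start in range(len(promoter) - width + 1):
        word = promoter[start:start + width]
        if any(base not in "ACGT" for base in word):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(word))
        if score >= threshold:
            hits.append((start, word, round(score, 2)))
    return hits


# Hypothetical 3-column count matrix built from 8 aligned sites.
toy_counts = [{"A": 8}, {"C": 8}, {"G": 8}]
toy_matrix = log_odds_matrix(toy_counts)
predictions = scan("TTACGTT", toy_matrix, threshold=3.0)
```

Lowering the threshold reports more candidate sites at the cost of more false positives, which is the trade-off described above for searches on a single promoter.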


Gene:
Region of DNA that controls a discrete hereditary characteristic

Transcription:
Copying of one strand of DNA into a complementary RNA sequence by the enzyme RNA polymerase

Gene Promoter:
Sequence of DNA upstream of the gene to which RNA polymerase binds to begin transcription

Transcription Factors:
Proteins required to initiate or regulate transcription in eukaryotes

Transcription Factor Binding Site:
Short fragment of DNA in the promoter recognized and bound by a certain transcription factor

Orthologous binding sites:
A binding site conserved and functional in the orthologous copy of a gene in another species

A word on a sequence representing a functional element

Pattern Discovery:
A search for related, previously unknown patterns in a set of sequences

