Evaluation of Gene Prediction Programs
One of the first useful products from the human genome will be a set
of predicted genes. Besides its intrinsic scientific interest, the
accuracy and completeness of this data set are of considerable
importance for human health and medicine. Though progress has been
made on computational gene identification during the past decade, the
accuracy of gene prediction tools is not sufficient to locate the
genes reliably in higher eukaryotic genomes. Thus, while the precise
sequence of the human genome is increasingly deciphered, gene number
estimates are becoming increasingly variable.
We have contributed to the understanding of the accuracy of gene
prediction programs. In 1996 we published an evaluation
of the accuracy of gene prediction programs (Burset and Guigó, 1996). Both
the methodology and the test sequences generated in this work have
since become a de facto standard for the evaluation of gene
prediction software. Recently we have published a revised version of
this evaluation (Guigó et al., 2000). This revised evaluation suggests
that though gene prediction will improve with every new protein that
is discovered and through improvements in the current set of tools, we
still have a long way to go before we can decipher the precise exonic
structure of every gene in the human genome using purely computational methods.
- R. Guigó and T. Wiehe.
"Gene Prediction Accuracy in Large DNA Sequences."
In M.Y. Galperin and E.V. Koonin, editors:
Frontiers in Computational Genomics, Chapter 1, pp. 1-33.
(Functional Genomics Series, Volume 3)
Caister Academic Press, United Kingdom, 2003. ISBN: 0-9542464-4-6.
- R. Guigó, P. Agarwal, J.F. Abril, M. Burset and J.W. Fickett.
"An Assessment of Gene Prediction Accuracy in Large DNA Sequences."
Genome Research, 10(10):1631-1642 (2000)
- R. Guigó.
"Computational gene identification: An open problem."
Computers and Chemistry, 21(4):215-222 (1997)
- M. Burset and R. Guigó.
"Evaluation of gene structure prediction programs."
Genomics, 34(3):353-357 (1996)