Genome BioInformatics Research Lab

Evaluation of Gene Prediction Programs

One of the first useful products from the human genome will be a set of predicted genes. Beyond the intrinsic scientific interest of this data set, its accuracy and completeness are of considerable importance for human health and medicine. Although progress has been made in computational gene identification during the past decade, the accuracy of gene prediction tools is not yet sufficient to locate genes reliably in higher eukaryotic genomes. Thus, while more and more of the precise sequence of the human genome is deciphered, estimates of the number of genes are becoming increasingly variable.

We have contributed to the understanding of the accuracy of gene prediction programs. In 1996 we published a comprehensive evaluation of the accuracy of gene prediction programs (Burset and Guigó, 1996). Both the methodology and the test sequences generated in this work have since been a de facto standard for the evaluation of gene prediction software. More recently, we published a revised version of this evaluation (Guigó et al., 2000). The revised evaluation suggests that, although gene prediction will improve with every newly discovered protein and with improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methods.

Relevant publications

  • R. Guigó and T. Wiehe.
    "Gene Prediction Accuracy in Large DNA Sequences."
    In M.Y. Galperin and E.V. Koonin, editors:
    Frontiers in Computational Genomics. Chapter 1, pp. 1-33.
      (Functional Genomics Series, Volume 3)
    Caister Academic Press, United Kingdom, 2003. ISBN: 0-9542464-4-6.

  • R. Guigó, P. Agarwal, J.F. Abril, M. Burset and J.W. Fickett.
    "An Assessment of Gene Prediction Accuracy in Large DNA Sequences."
    Genome Research, 10(10):1631-1642 (2000)

  • R. Guigó.
    "Computational gene identification: An open problem."
    Computers and Chemistry, 21(4):215-222 (1997)

  • M. Burset and R. Guigó.
    "Evaluation of gene structure prediction programs."
    Genomics, 34(3):353-367 (1996)
