Gene Prediction and Identification

Enrique Blanco and Charles Chapple

Centre de Regulació Genòmica, Barcelona (Spain)

Lisbon, 2004


With contributions from: Josep Francesc Abril, Sergi Castellano, Robert Castelo, Genis Parra and Roderic Guigo



1. Monday, April 5th

a. Lecture: Presentation of the course

b. Paper discussion: Analysis of the Human genome

Lander, E.S., et al. Initial sequencing and analysis of the human genome.
Nature 409: 860-921 (2001).

Group 1: General features of the human genome
- Broad genomic landscape (pg 875)
- Long-range variation in GC content (pg 876)
- CpG islands (pgs 877, 878)
- Repeat content of the human genome (pg 879)
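The GC-content and CpG-island topics above can be made concrete with a short sketch. This Python is not part of the course material. It assumes the commonly used Gardiner-Garden and Frommer style thresholds (200 bp window, GC fraction above 0.5, observed/expected CpG ratio above 0.6) and simply reports every window that passes both tests, without merging overlapping windows into islands. The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.count("CG") * len(seq) / (c * g)


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Yield 0-based starts of windows passing both CpG-island thresholds."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_content(w) > min_gc and cpg_obs_exp(w) > min_ratio:
            yield start
```

On a long genomic sequence, a larger `step` (for example 10) trades resolution for speed.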

Group 2: Gene annotation of the human genome
- Gene content of the human genome (pg 892)
- Protein-coding genes (pgs 896-901)

c. Practical exercise: Learning to use a genome browser



2. Tuesday, April 6th

a. Lecture: The genefinding problem

b. Practical exercise: Searching for patterns using consensus sequences and weight matrices
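Here is a minimal Python sketch of the two pattern models named in this exercise. It assumes a uniform 0.25 background and a pseudocount of 1, both of which are illustrative choices. The consensus takes the most frequent base in each column. The weight matrix stores the log2 odds of each base against the background, so a window's score is the sum of its per-position entries. The toy sites and the function names are invented for illustration and are not real binding sites.

```python
import math

BASES = "ACGT"


def consensus(sites):
    """Most frequent base per column (ties resolved in A, C, G, T order)."""
    return "".join(
        max(BASES, key=lambda b: [s[j] for s in sites].count(b))
        for j in range(len(sites[0]))
    )


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix: one {base: score} dict per column."""
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold=0.0):
    """(position, score) for each window scoring at least `threshold`."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy aligned sites, invented for illustration only.
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
print(consensus(sites))  # TATAAA
best = max(scan("GGGTATAAAGGG", pwm, threshold=-100), key=lambda h: h[1])
print(best[0])  # 3
```

Unlike an exact consensus match, the matrix score ranks near-matches, so the choice of `threshold` controls how permissive the search is.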

c. Paper discussion: Coding statistics

Guigo, R. DNA composition, codon usage and exon prediction.
In: Genetic Databases, pp. 53-80. Academic Press (1999).

Group 3:
- Introduction (pg 2)
- Measures dependent on a model of coding DNA (pgs 5-16)
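One simple measure that depends on a model of coding DNA is a codon-usage log-likelihood ratio. The sketch below illustrates the idea under stated assumptions and is not the chapter's exact formulation. Codon frequencies are estimated from hypothetical training CDSs with a pseudocount, and the non-coding model is assumed to be uniform (1/64 per codon). A positive score in a reading frame means that frame looks more like the training genes than like random sequence. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

CODONS = {"".join(p) for p in product("ACGT", repeat=3)}


def codon_frequencies(cds_list, pseudocount=1.0):
    """Per-codon frequencies from in-frame coding sequences, with pseudocounts."""
    counts = Counter({codon: pseudocount for codon in CODONS})
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODONS:  # ignore codons containing N etc.
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_log_likelihood(seq, coding_freq, frame=0):
    """Sum over codons in `frame` of ln(P_coding / P_uniform), P_uniform = 1/64."""
    seq = seq.upper()
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding_freq:
            score += math.log(coding_freq[codon] * 64)
    return score


# Hypothetical training set; real use would need many annotated genes.
freq = codon_frequencies(["ATGGCTGCTGCTGCTTAA"])
frame_scores = [codon_log_likelihood("GCTGCTGCT", freq, f) for f in range(3)]
```

Comparing the three frame scores, as in `frame_scores`, is the usual way such a statistic suggests both whether a region is coding and in which frame.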

d. Practical exercise: Finding coding regions in DNA sequences with Perl
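The exercise itself is done in Perl. As a language-neutral illustration, here is a Python sketch of one naive way to flag candidate coding regions: scan all six reading frames for open reading frames (ATG to the first in-frame stop codon) above a minimum length. The 100-codon default is an arbitrary assumption. Nested ATGs are ignored, so only the first start in each ORF is reported. Function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """(start, end_exclusive, frame) for ATG...stop ORFs on one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:  # length includes stop
                    found.append((start, i + 3, frame))
                start = None
    return found


def six_frame_orfs(seq, min_codons=100):
    """ORFs on both strands, in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    forward = [(s, e, "+", f) for s, e, f in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    # Interval [s, e) on the reverse complement maps to [n - e, n - s).
    reverse = [(n - e, n - s, "-", f) for s, e, f in orfs_in_strand(rc, min_codons)]
    return forward + reverse
```

Long ORFs are only a first filter: they say nothing about splicing, which is why the later sessions move to coding statistics and gene finders such as geneid.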



3. Wednesday, April 7th

a. Paper discussion: GENEID, a gene finding tool

Parra, G., Blanco, E. and Guigo, R. Geneid in Drosophila.
Genome Research 10:511-515 (2000).

Group 4: sections Introduction, Methods and Results

b. Practical exercise: Annotation of the human sequence HS307871

c. Practical exercise: Downloading geneid to the local computer



4. Thursday, April 8th

a. Paper discussion: Comparative genomics

Mouse Genome Sequencing Consortium. Initial sequencing and
comparative analysis of the mouse genome.
Nature 420: 520-562 (2002).

Group 5: Differences between the human and mouse genomes
- (G+C) content (pg 528)
- CpG islands (pgs 529,530)
- Mouse genes (pg 536)
- Initial and current human gene catalogue (pgs 537, 538)
- Mouse gene catalogue (pg 538)
- Pseudogenes (pg 538)
- Comparison of mouse and human gene sets (pg 539)
- De novo gene prediction (pg 539)
- Conservation of gene structure (pg 551)
- Conservation in known regulatory regions (pg 552)

b. Practical exercise: Comparative gene prediction using SGP

c. Paper discussion: New problems, new solutions (in gene finding)

Castellano, S. et al. In silico identification of novel selenoproteins
in the Drosophila melanogaster genome.
EMBO Reports 2:697-702 (2001).

Group 6: sections Introduction, Results and Discussion

d. Closing remarks



Enrique Blanco Garcia © 2004
eblanco@imim.es