Evaluations of GENCODE v2.2 annotations
This page contains links to evaluations of GENCODE annotations by various research groups. The input used for these evaluations is here.
The summary file is a compilation of results for all GENCODE transcripts.
When a transcript was not found in a particular evaluation dataset, a '.' (dot) is placed in the corresponding column.
Columns of the summary file are, from left to right:
- "region": ENCODE region name.
- "gene_id": GENCODE-HAVANA gene ID.
- "vega_category": VEGA locus category.
- "transcript_id": GENCODE-HAVANA transcript ID.
- "MIT_overall_score": Evaluation score ('overall score' per transcript, computed by Mike Lin in Manolis Kellis' group at MIT) based on evolutionary conservation across the human (hg17), dog (canFam1), mouse (mm5), and rat (rn3) genomes.
For a full explanation of how to read each evaluation, see Mike Lin's scoring rubric.
Detailed results:
The homepage of this study is available here.
- "Goldman_overall_score": 'Oddness' score (per transcript, computed by Fabio Pardi in Nick Goldman's group at EBI), quantifying the deviation from the norm of the estimated dN/dS selection pattern [1].
Detailed results:
- Individual exon scores: text / html (with links to plots that show the estimated selection pattern for the transcript containing the selected exon).
- Correspondence table between Fabio's exon names (based on genomic coordinates) and GENCODE exon names.
- "PDB_template:%coverage,": Comma-separated list of PDB hits found by BLAST with at least 90% sequence identity (data generated by Michael Tress in Alfonso Valencia's group in Madrid). Each item of the list is formatted as "PDB_template_ID:%coverage_of_the_template". If no PDB hit was found, this field is set to "none_found".
Detailed results:
- Tab-separated file
- Michael Tress has compiled many other BioSapiens studies here.
- "integrated_score": Integration of the three scores above. The integrated score lies between 0 and 1 (the closer to 0, the more dubious the transcript). It is the average of the three normalised scores. Please note that the integrated score is not penalised when one of the three scores is missing.
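The two composite columns can be illustrated with a minimal Python sketch. It rests on several assumptions not stated on this page: the function names are hypothetical, the template IDs in the usage note are made up, coverage values may or may not carry a trailing '%', and "no penalty for a missing score" is read here as "average only the scores that are present". How each score is normalised is not described here, so the sketch takes already-normalised values as input.

```python
from typing import List, Optional, Tuple

MISSING = "."  # placeholder used when a transcript is absent from a dataset


def parse_pdb_field(field: str) -> List[Tuple[str, float]]:
    """Split 'ID:coverage,ID:coverage' into (template_id, coverage) pairs.

    Returns an empty list for '.', 'none_found', or an empty field.
    """
    if field in (MISSING, "none_found", ""):
        return []
    hits = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # Split on the last colon so the ID part is kept whole.
        template_id, coverage = item.rsplit(":", 1)
        # Assumption: coverage is a number, optionally followed by '%'.
        hits.append((template_id, float(coverage.rstrip("%"))))
    return hits


def integrated_score(normalised: List[Optional[float]]) -> Optional[float]:
    """Average the normalised scores that are present (None = missing).

    Assumption: a missing score is skipped rather than counted as 0,
    which is one reading of 'no penalty when a score is missing'.
    """
    present = [s for s in normalised if s is not None]
    if not present:
        return None
    return sum(present) / len(present)
```

For example, `parse_pdb_field("1abc:95,2xyz:40.5")` yields two (ID, coverage) pairs, and `integrated_score([0.5, None, 1.0])` averages only the two available values.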
[1]: The oddness scores ('avg_score' in the 'individual exon scores' file) were initially calculated for each coding exon, based on Kolmogorov-Smirnov tests comparing the sitewise dN/dS estimates in the exon to the collection of estimates from all the GENCODE exons.
More specifically, the 'avg_score' is simply the mean of the (a) 'best_scenario' and (b) 'worst_scenario' scores, defined so that:
(a) measures how strange the exon would look in the optimistic scenario that all its sitewise dN/dS values coincide with the lower bounds of their confidence intervals;
(b) measures how strange the exon would look in the pessimistic scenario that all its sitewise dN/dS values coincide with the upper bounds of their confidence intervals.
Each transcript was then assigned the maximum oddness score among all exons that compose that transcript.
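The aggregation described in this note can be sketched in a few lines of Python. This covers only the final two steps (averaging the two scenario scores per exon, then taking the maximum over a transcript's exons); the Kolmogorov-Smirnov tests that produce the scenario scores are not reproduced. The function names and the input layout are hypothetical.

```python
from typing import List, Tuple


def exon_avg_score(best_scenario: float, worst_scenario: float) -> float:
    """Mean of the 'best_scenario' and 'worst_scenario' scores for one exon."""
    return (best_scenario + worst_scenario) / 2.0


def transcript_oddness(exon_scores: List[Tuple[float, float]]) -> float:
    """Maximum 'avg_score' over all coding exons of a transcript.

    exon_scores holds one (best_scenario, worst_scenario) pair per exon.
    """
    if not exon_scores:
        raise ValueError("transcript has no scored exons")
    return max(exon_avg_score(best, worst) for best, worst in exon_scores)
```

With two exons scored `(0.25, 0.75)` and `(0.5, 1.0)`, the per-exon averages are 0.5 and 0.75, so the transcript is assigned 0.75.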
Contact: Julien Lagarde (jlagardeatimim.es)