The Adh Region Poster

"The Challenge of Annotating a Complete Eukaryotic Genome"

Genome Informatics Group

J.F. Abril, G. Parra, M. Burset, S. Castellano, E. Blanco, R. Guigó






Introduction

     We are developing tools for schematic views of genomic sequences and annotation features. As genomic sequences accumulate, visualization tools are becoming essential to analyze and interpret sequence data. Our program, gff2ps, makes it easier to compare different genomic structures. It works with single or multiple sequences and produces high-quality PostScript plots. Features such as gene structures, exon predictions, EST positions, and so on, are displayed at their corresponding positions with user-defined attributes.
     The posters shown at the ISMB'99 meeting are an example of what can be done with this tool. Here you can find three examples of what can be generated from the same data-set by applying a slightly modified customization file and a few command-line options.
     Basically, the following plots show the genomic predictions submitted for tutorial #3, the "Drosophila melanogaster ADH region annotation experiment (GASP1)".


Acknowledgments

     We thank Martin Reese, Michael Ashburner and the people working in the Drosophila melanogaster teams for giving us the opportunity to make the tutorial poster.
     We also thank the organizers of ISMB'99 for letting us highlight our poster throughout the whole meeting.


The Input Files


     The 2.9Mb sequence contig is described in the Adh paper in Genetics. The original GFF data-set can be downloaded from the GASP1 data directories as All.gff, a single file that contains all the submissions.
     I have split this file into separate files, one for the submissions of each group participating in GASP1. This makes it easy to check for problems in the data-sets and to make any appropriate change to the files (such as shortening long group names so that the labels of neighbouring groups remain readable). It also brings another advantage: by swapping filenames on the command-line we directly swap the order of the sources on the plot while tuning the final plots. You can obtain the tarball containing the GFF files used to produce all the plots from our ftp server.
     The following table reports how many GFF elements and groups are defined for each submission: their total number of records and their distribution on each strand.
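     The split described above can be sketched with a few lines of awk. This is only an illustration, not the procedure actually used: it assumes tab-separated GFF records whose second column (the source) identifies the submission, and the function name and output file names are hypothetical (the files used for the plots follow an Author.Program.gff naming).

```shell
# Hypothetical helper: split a combined GFF file into one file per source.
# Assumption: tab-separated GFF, column 2 (source) names the submission.
# Output goes to <source>.gff in the current directory (illustrative names).
split_gff_by_source() {
    awk -F'\t' '
        /^#/ || NF < 8 { next }      # skip comment lines and short records
        {
            out = $2 ".gff"          # e.g. GeneID.gff
            print > out              # awk truncates once, then appends
        }
    ' "$1"
}
```

Calling `split_gff_by_source All.gff` would then leave one file per distinct source value, each ready to be renamed or reordered on the gff2ps command-line.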

                                GFF record # and strands       Group # and strands
FILE/AUTHOR  PROGRAM                  #      +      .      -         #      +      .      -
Adh          std1                   513    246      0    267        43     20      0     23
Adh          std3                  3756   1922      0   1834       221    117      0    104
Vector       Fourier                  1      0      1      0         1      0      1      0
Vector       GCcontent                1      0      1      0         1      0      1      0
GROUP A
Birney       GeneWise               252    148      0    104       184    111      0     73
Borodovsky   GeneMarkHMM           3219   1656      0   1563       265    140      0    125
Gaasterland  MAGPIE                 551    397      0    154        53     30      0     23
Gaasterland  MAGPIE_exon           1835    985      0    850       496    256      0    240
Guigo        GeneID                3342   1842      0   1500         1      0      0      1
Henikoff     BLOCKS                 346    193      0    153         0      0      0      0
Krogh        HMMGene               2560   1389      0   1171       214    122      0     92
Mural        GRAIL                  855    477      0    378        21      0      0     21
Reese        Genie                 2358   1317      0   1041         0      0      0      0
Reese        GenieEST              2547   1353      0   1194         0      0      0      0
Reese        GenieESTHOM           2694   1440      0   1254         0      0      0      0
Solovyev     FGenesCGG1            3334   1668      0   1666         2      1      0      1
Solovyev     FGenesCGG2             598    324      0    274       133     58      0     75
Solovyev     FGenesCGG3            8688   4341      0   4347       454    220      0    234
GROUP B
Gaasterland  MAGPIE_Promoter        421    220      0    201         0      0      0      0
Ohler        LME_IMC               3062   1508      0   1554         0      0      0      0
Ohler        LME_SSM               2552   1271      0   1281         0      0      0      0
Reese        GeniePROM              234    135      0     99         0      0      0      0
Werner       CoreInspectorTSS        23     12      0     11         0      0      0      0
GROUP C
Benson       TandemRepeatFinder     299      0    299      0         0      0      0      0
Gaasterland  MAGPIE_Calypso        1567      0   1567      0         0      0      0      0
TOTAL                             45608  22844   1868  20896      2089   1075      2   1012

GROUP A includes all gene-feature predictions,
GROUP B all predictions on promoters and TSS (transcription start sites),
GROUP C contains tandem-repeat predictions.
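
Counts like those in the table above can be obtained from each GFF file with a short awk script. The sketch below is an assumption about how such a summary might be computed, not the method actually used: it expects tab-separated GFF with the strand in column 7 and the group in column 9, and it counts each distinct group name once, on the strand of its first record. The function name is hypothetical.

```shell
# Hypothetical helper: summarise one GFF file as
#   records  +  .  -  groups  +  .  -
# Assumptions: tab-separated GFF, strand in column 7, group in column 9.
# Each distinct group name is counted once, on the strand of its first
# record; the counting rule behind the table is not documented here.
gff_strand_summary() {
    awk -F'\t' '
        /^#/ || NF < 8 { next }              # skip comments and short lines
        {
            rec[$7]++; total++
            if (NF >= 9 && $9 != "" && !($9 in seen)) {
                seen[$9] = 1
                grp[$7]++; gtotal++
            }
        }
        END {
            printf "%d %d %d %d %d %d %d %d\n",
                total, rec["+"], rec["."], rec["-"],
                gtotal, grp["+"], grp["."], grp["-"]
        }
    ' "$1"
}
```

Running `gff_strand_summary Krogh.HMMGene.gff` would print eight numbers in the same column order as a table row. Because the group-counting rule is assumed, the group columns may differ from the published figures.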


The original poster was obtained with version 0.92b of gff2ps; version 0.94 was used to make the current three examples of the Adh Poster.


The Big Poster


Josep, Moisés and the ISMB'99 Adh Poster.

The nearly 3Mb of the Drosophila Adh region were split into three B0-size pages (100x145cm), each page containing 1Mb in four blocks of 250Kb each. Each block has three areas: the upper area is for forward predictions, the lower area for reverse ones, and the middle area holds the vectors as color gradients and the predictions on tandem repeats (Group C predictions). The training sets, named std1 and std3, are the outermost tracks, followed by the gene-feature predictions (Group A) and the promoters and TSS (Group B).

The bash command-line was:
$BIN/gff2ps -VC ISMB_b0.rc -p -B 4 -P 3 -s b0 --                                     \
        Adh.std1.gff Adh.std3.gff                                                    \
        Birney.GeneWise.gff Borodovsky.GeneMarkHMM.gff Gaasterland.MAGPIE.gff        \
        Gaasterland.MAGPIE_exon.gff Guigo.GeneID.gff Henikoff.BLOCKS.gff             \
        Krogh.HMMGene.gff Mural.GRAIL.gff Reese.Genie.gff Reese.GenieEST.gff         \
        Reese.GenieESTHOM.gff Solovyev.FGenesCGG1.gff Solovyev.FGenesCGG2.gff        \
        Solovyev.FGenesCGG3.gff Gaasterland.MAGPIE_Promoter.gff Ohler.LME_IMC.gff    \
        Ohler.LME_SSM.gff Reese.GeniePROM.gff Werner.CoreInspectorTSS.gff            \
        Adh.Fourier.gff Benson.TandemRepeatFinder.gff Gaasterland.MAGPIE_Calypso.gff \
        Adh.GCcontent.gff > ISMB1999_b0.ps 2> ISMB1999_b0.rpt

Putting All Stuff Into A4 Pages


The figures for this section are the first five pages of ISMB1999_a4.ps, out of a total of thirty pages.

Not everyone has a large-format printer to get a hard copy of the B0-size pages, but A4 is a very common sheet format. The problem is one of scale: although gff2ps can fit the 3Mb into a single A4 page, the resulting plot is too dense to read. We have two options: fit one vertical block into a number of horizontal pages, as we did in this section, or split that block across many vertical and horizontal pages, as we have done in the following section. In this case we have forced gff2ps to display only 100Kb per page.
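
The page count follows from a ceiling division of the sequence length by the nucleotides shown per page. The helper below is a hypothetical illustration; the length passed to it is an assumed round figure rather than the exact contig length. Any length above 2.9Mb and up to 3Mb gives thirty pages at 100Kb per page.

```shell
# Hypothetical helper: pages needed to show a sequence at a fixed
# number of nucleotides per page (integer ceiling division).
pages_needed() {
    # $1 = sequence length in bp, $2 = nucleotides per page
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Assumed round length of 3Mb, at 100Kb per page.
pages_needed 3000000 100000
```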

The bash command-line was:
$BIN/gff2ps -VC ISMB_a4.rc -B 1 -N 100000 -s a4 --                                   \
        Adh.std1.gff Adh.std3.gff                                                    \
        Birney.GeneWise.gff Borodovsky.GeneMarkHMM.gff Gaasterland.MAGPIE.gff        \
        Gaasterland.MAGPIE_exon.gff Guigo.GeneID.gff Henikoff.BLOCKS.gff             \
        Krogh.HMMGene.gff Mural.GRAIL.gff Reese.Genie.gff Reese.GenieEST.gff         \
        Reese.GenieESTHOM.gff Solovyev.FGenesCGG1.gff Solovyev.FGenesCGG2.gff        \
        Solovyev.FGenesCGG3.gff Gaasterland.MAGPIE_Promoter.gff Ohler.LME_IMC.gff    \
        Ohler.LME_SSM.gff Reese.GeniePROM.gff Werner.CoreInspectorTSS.gff            \
        Adh.Fourier.gff Benson.TandemRepeatFinder.gff Gaasterland.MAGPIE_Calypso.gff \
        Adh.GCcontent.gff > ISMB1999_a4.ps 2> ISMB1999_a4.rpt

Making "Virtual" Posters


The figures for this section are the five vertical pages of ISMB1999_a4_VP.ps that make up the first horizontal page, out of a total of thirty by five pages.

If you do not have a large-format printer you can, at least, glue together the 150 pages that make up this version to obtain a "virtual" poster. In this case I have set the config file so that each source is drawn in the minimum number of tracks/lines needed to fit all its elements while avoiding overlapping groups.

The bash command-line was:
$BIN/gff2ps -VC ISMB_a4_VP.rc -B 0 -N 100000 -s a4 --                                \
        Adh.std1.gff Adh.std3.gff                                                    \
        Birney.GeneWise.gff Borodovsky.GeneMarkHMM.gff Gaasterland.MAGPIE.gff        \
        Gaasterland.MAGPIE_exon.gff Guigo.GeneID.gff Henikoff.BLOCKS.gff             \
        Krogh.HMMGene.gff Mural.GRAIL.gff Reese.Genie.gff Reese.GenieEST.gff         \
        Reese.GenieESTHOM.gff Solovyev.FGenesCGG1.gff Solovyev.FGenesCGG2.gff        \
        Solovyev.FGenesCGG3.gff Gaasterland.MAGPIE_Promoter.gff Ohler.LME_IMC.gff    \
        Ohler.LME_SSM.gff Reese.GeniePROM.gff Werner.CoreInspectorTSS.gff            \
        Adh.Fourier.gff Benson.TandemRepeatFinder.gff Gaasterland.MAGPIE_Calypso.gff \
        Adh.GCcontent.gff > ISMB1999_a4_VP.ps 2> ISMB1999_a4_VP.rpt
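
The three command-lines in this document share the same list of input files and differ only in the options and the custom file. One possible way to avoid repeating the list, sketched here as a suggestion rather than as the way the plots were actually produced, is to keep it in a variable and wrap the call in a small function. The names GFF_FILES and adh_plot are hypothetical; the options and file names are those shown above. The unquoted $GFF_FILES relies on the file names containing no whitespace.

```shell
# Input files, in the order in which the sources are passed to gff2ps.
GFF_FILES="Adh.std1.gff Adh.std3.gff
  Birney.GeneWise.gff Borodovsky.GeneMarkHMM.gff Gaasterland.MAGPIE.gff
  Gaasterland.MAGPIE_exon.gff Guigo.GeneID.gff Henikoff.BLOCKS.gff
  Krogh.HMMGene.gff Mural.GRAIL.gff Reese.Genie.gff Reese.GenieEST.gff
  Reese.GenieESTHOM.gff Solovyev.FGenesCGG1.gff Solovyev.FGenesCGG2.gff
  Solovyev.FGenesCGG3.gff Gaasterland.MAGPIE_Promoter.gff Ohler.LME_IMC.gff
  Ohler.LME_SSM.gff Reese.GeniePROM.gff Werner.CoreInspectorTSS.gff
  Adh.Fourier.gff Benson.TandemRepeatFinder.gff Gaasterland.MAGPIE_Calypso.gff
  Adh.GCcontent.gff"

# Hypothetical wrapper: $1 is the output basename, the remaining
# arguments are passed to gff2ps as options.
adh_plot() {
    name=$1
    shift
    # $GFF_FILES is deliberately unquoted so it splits into file names.
    "$BIN/gff2ps" "$@" -- $GFF_FILES > "$name.ps" 2> "$name.rpt"
}

# Equivalent to the three command-lines shown in this document:
#   adh_plot ISMB1999_b0    -VC ISMB_b0.rc -p -B 4 -P 3 -s b0
#   adh_plot ISMB1999_a4    -VC ISMB_a4.rc -B 1 -N 100000 -s a4
#   adh_plot ISMB1999_a4_VP -VC ISMB_a4_VP.rc -B 0 -N 100000 -s a4
```

Reordering the sources on the plot then only requires editing GFF_FILES once.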


jabril@imim.es