Genome BioInformatics Research Lab

 
Comparing Sources from GFF Data with gff2ps
 
   

SUMMARY

One of the strengths of gff2ps is comparing results from different sources, which makes it easy to see the differences between the annotation of a genomic sequence and the output of one or more gene prediction programs. Results from other programs such as blast can be included too, provided they are first converted to a GFF-compliant file suitable for gff2ps. Here we illustrate how we generated the PostScript plots available on the Human/Mouse Gene Prediction page. If you have problems visualizing the PostScript files, take a look at the section about using ghostview.


CONTENTS

GFF input files


As an example we use the Hsap_BTK files. To prevent both sets of sgp predictions (for Full Orthologous -FO- and WGS -3X- sequences) from collapsing onto the same track as the geneid ones, we replaced 'geneid_v1.0' in the source field with 'SGP.3X' and 'SGP.homol' (in 'Hsap_BTK_sgp3X' and 'Hsap_BTK_sgpFO' respectively).

  Hsap_BTK_annotation.gff   Annotation of real genes for BTK region converted to GFF from Dan Brown annotation.

  Hsap_BTK_sgp3X.gff   sgp predictions using as evidence tblastx on human sequence against the WGS database (currently 3X).

  Hsap_BTK_sgpFO.gff   sgp predictions using as evidence tblastx on human sequence against mouse orthologous sequence (Hsap_BTK over Mmus_BTK).

  Hsap_BTK_geneid.gff   geneid predictions on Hsap_BTK sequence.

  Hsap_BTK_genscan.gff   genscan predictions on Hsap_BTK sequence.

  Hsap_BTK_tblastx3X.gff   tblastx similarity regions from human sequence searches against the WGS database (currently 3X).

  Hsap_BTK_tblastxFO.gff   tblastx similarity regions from human sequence searches against mouse orthologous sequence (Hsap_BTK over Mmus_BTK).

  Hsap_BTK_repeatmasker.gff   repeats found in BTK human sequence converted to GFF (filtered from the 'Hsap_BTK.out' file obtained with RepeatMasker).
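
The source renaming described above for the sgp files can be done with any text tool. The following is a minimal sketch, assuming tab-separated GFF records with the source name in the second field. The function name `rename_gff_source` is our own illustration and is not part of gff2ps.

```shell
# Hypothetical helper (not part of gff2ps): rewrite the GFF source field.
# Assumes tab-separated records with the source name in column 2.
# usage: rename_gff_source OLD NEW < input.gff > output.gff
rename_gff_source() {
    awk -v old="$1" -v new="$2" '
        BEGIN { FS = OFS = "\t" }
        /^#/      { print; next }      # keep comment lines untouched
        $2 == old { $2 = new }         # relabel matching records only
                  { print }
    '
}
```

For instance, `rename_gff_source geneid_v1.0 SGP.3X < sgp3X.raw.gff > Hsap_BTK_sgp3X.gff`, where `sgp3X.raw.gff` is a placeholder name for the unmodified sgp output.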

gff2ps customization files


We ran gff2ps on the same GFF input files using two slightly different customization files. The following table shows the basic differences between them; all other variables are set to the same values:

  brown.a4.rc   Page size is set to A4, there are three blocks per page and 10Kbp will appear on each block.
  brown.a3.rc   Page size is set to A3, there are four blocks per page and 30Kbp will appear on each block.

The structure of a gff2ps customization file is outlined below.

#
# Optional Header
#    (gff2ps generates a standard header when it creates the Default Custom File)
#
                                    (Comments and empty lines are skipped)

# L #                                 (This is Block Separator for Layout)
variable_name1=value    # blah,blah,blah…
  (Note that there is a blank space at least, shown here as '    ' before #)

  ···
# blah, blah, blah…    (Extra comment-lines can be added where you need)
  ···
variable_namen=value

# F # blah, blah, blah…
                  (You can place extra comments also here, after second #)

GFF-feature_key1::feature_variable_name=value
  ···
GFF-feature_keyn::feature_variable_name=value

# G #
GFF-group_key1::group_variable_name=value
  ···
GFF-group_keyn::group_variable_name=value

# S #
GFF-source_key1::source_variable_name=value
  ···
GFF-source_keyn::source_variable_name=value

Customization files and available variables are explained in chapter four of the User's Manual (available from gff2ps home page).
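
When two customization files differ in only a few variables, as brown.a4.rc and brown.a3.rc do, it can help to list just the settings that changed. The sketch below relies only on the layout outlined above (lines starting with '#' are comments or block separators, and an inline comment starts at whitespace followed by '#'). The function names `rc_settings` and `rc_diff` are hypothetical helpers, not part of gff2ps, and the block separators are discarded, so settings are compared by name only.

```shell
# Hypothetical helpers (not part of gff2ps).
# Print only the active "name=value" lines of a customization file:
# drop comment lines, inline comments (whitespace followed by '#'),
# and empty lines.
rc_settings() {
    sed -e '/^[[:space:]]*#/d' \
        -e 's/[[:space:]][[:space:]]*#.*$//' \
        -e '/^[[:space:]]*$/d' "$1"
}

# List the settings that differ between two customization files.
# Returns the exit status of diff (0 when the settings are identical).
rc_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    rc_settings "$1" | sort > "$a"
    rc_settings "$2" | sort > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

Calling `rc_diff brown.a4.rc brown.a3.rc` would then print only the lines whose values differ between the two files. A value that itself contains whitespace followed by '#' would be truncated by this sketch.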

Running gff2ps


There are two basic ways to work on GFF records with gff2ps: you can merge the GFF records from all sources into a single GFF file, or you can split the records so that each source has its own file. The second approach has the advantage that you can order the sources in your plots simply by ordering the GFF source files on the command line. Keep in mind that sources appear on the PostScript figure in the same order as they are read from the input, which is why we prefer to work with a separate file for each source.
If you keep a fixed set of sequence, source and feature names, you can define a fixed set of customization variables (as we did in the previous section) and reuse the custom files for all plots that share the same layout but use different datasets. This also makes the process easy to automate at the command-line or script level.
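
If your records are already merged into a single GFF file, they can be split back into one file per source. The following is a minimal sketch, assuming tab-separated records whose source names are usable as file names. The function `split_gff_by_source` and the `<source>.gff` naming scheme are our own illustration, not part of gff2ps.

```shell
# Hypothetical helper (not part of gff2ps): split a merged GFF file into
# one file per source, written as OUTDIR/<source>.gff.
# Assumes tab-separated records and source names that are valid file names.
# usage: split_gff_by_source merged.gff outdir
split_gff_by_source() {
    mkdir -p "$2" || return 1
    awk -F '\t' -v dir="$2" '
        /^#/ || NF < 2 { next }          # skip comments and blank lines
        {
            out = dir "/" $2 ".gff"      # column 2 holds the source name
            print > out
        }
    ' "$1"
}
```

The resulting files can then be passed to gff2ps in whatever order the sources should appear on the plot.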

The following two commands use the same customization file on different input files (for the sgp and tblastx sources):

gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff  \
    Hsap_BTK_sgpFO.gff Hsap_BTK_geneid.gff         \
    Hsap_BTK_genscan.gff Hsap_BTK_tblastxFO.gff    \
    Hsap_BTK_repeatmasker.gff                      \
    > Hsap_BTK_FO_a3.ps 2> Hsap_BTK_FO_a3.log

gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff  \
    Hsap_BTK_sgp3X.gff Hsap_BTK_geneid.gff         \
    Hsap_BTK_genscan.gff Hsap_BTK_tblastx3X.gff    \
    Hsap_BTK_repeatmasker.gff                      \
    > Hsap_BTK_3X_a3.ps 2> Hsap_BTK_3X_a3.log

Note that we preserved the source ordering (annotation, sgp, geneid, genscan, tblastx and repeatmasker). In the bash shell, a backslash ('\') at the end of a line means that the command and its parameters continue on the next line, so the next line is appended to the current command line.
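
Because the two commands above differ only in the dataset tag (FO or 3X), they lend themselves to a small loop. The sketch below reuses the file names and gff2ps options shown above. The function names `btk_inputs` and `plot_btk_a3` are hypothetical, and the loop assumes gff2ps is on your PATH and all input files are in the current directory.

```shell
# Hypothetical helper: print the input files for one dataset tag (FO or 3X),
# one per line, in the order the sources should appear on the plot.
btk_inputs() {
    tag=$1
    printf '%s\n' \
        Hsap_BTK_annotation.gff \
        "Hsap_BTK_sgp${tag}.gff" \
        Hsap_BTK_geneid.gff \
        Hsap_BTK_genscan.gff \
        "Hsap_BTK_tblastx${tag}.gff" \
        Hsap_BTK_repeatmasker.gff
}

# Hypothetical wrapper: produce the A3 plot and log file for each dataset.
plot_btk_a3() {
    for tag in FO 3X; do
        # $(btk_inputs ...) is left unquoted on purpose so that each file
        # name becomes a separate argument (the names contain no spaces).
        gff2ps -VC brown.a3.rc -- $(btk_inputs "$tag") \
            > "Hsap_BTK_${tag}_a3.ps" 2> "Hsap_BTK_${tag}_a3.log" || return 1
    done
}
```

Running `plot_btk_a3` would then write the same four output files as the two commands above.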

The following two commands use different customization files on the same input files:

gff2ps -VC brown.a4.rc -- Hsap_BTK_annotation.gff  \
    Hsap_BTK_sgp3X.gff Hsap_BTK_tblastx3X.gff      \
    Hsap_BTK_sgpFO.gff Hsap_BTK_tblastxFO.gff      \
    Hsap_BTK_geneid.gff Hsap_BTK_genscan.gff       \
    Hsap_BTK_repeatmasker.gff                      \
    > Hsap_BTK_ALL_a4.ps 2> Hsap_BTK_ALL_a4.log

gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff  \
    Hsap_BTK_sgp3X.gff Hsap_BTK_tblastx3X.gff      \
    Hsap_BTK_sgpFO.gff Hsap_BTK_tblastxFO.gff      \
    Hsap_BTK_geneid.gff Hsap_BTK_genscan.gff       \
    Hsap_BTK_repeatmasker.gff                      \
    > Hsap_BTK_ALL_a3.ps 2> Hsap_BTK_ALL_a3.log

You can see here how easy it is to add new sources to the final plots and to change the plot layout by using a different customization file. The next table summarizes the outputs for each custom file and input dataset:

  Dataset            Custom File    PostScript File       GFF2PS Log File
  FO Dataset         brown.a3.rc    Hsap_BTK_FO_a3.ps     Hsap_BTK_FO_a3.log
  3X Dataset         brown.a3.rc    Hsap_BTK_3X_a3.ps     Hsap_BTK_3X_a3.log
  FO + 3X Dataset    brown.a4.rc    Hsap_BTK_ALL_a4.ps    Hsap_BTK_ALL_a4.log
  FO + 3X Dataset    brown.a3.rc    Hsap_BTK_ALL_a3.ps    Hsap_BTK_ALL_a3.log

 