geneid documentation: 5. Assembling predicted exons into gene structures

Assembling exons:
geneid can be used to chain exons predicted by sources other than geneid into coherent gene structures. The scores associated with the input exons are used to predict the gene structure that maximizes the sum of the scores of the assembled exons. Using geneid in this way is only worthwhile when the sum of these scores is itself a meaningful quantity.
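As an illustration of what "maximizing the sum of scores of assembled exons" means, the following minimal Python sketch chains non-overlapping exons with the textbook weighted-interval dynamic programme. It is not geneid's own code: it ignores strand, frame compatibility and the gene model, and the names best_chain and Exon are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# (start, end, score) with 1-based inclusive coordinates; a hypothetical layout.
Exon = Tuple[int, int, float]


def best_chain(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the maximum total score and the non-overlapping exons achieving it."""
    # Sort by end position so that every compatible predecessor comes earlier.
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)   # best[i]: best score using only the first i exons
    take = [False] * (n + 1) # take[i]: whether exon i is part of best[i]
    prev = [0] * (n + 1)     # prev[i]: how many exons end before exon i starts
    for i, (start, _end, score) in enumerate(exons, start=1):
        # Count of exons whose end lies strictly before this exon's start.
        prev[i] = bisect_left(ends, start)
        with_exon = best[prev[i]] + score
        if with_exon > best[i - 1]:
            best[i], take[i] = with_exon, True
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen exons.
    chain: List[Exon] = []
    i = n
    while i > 0:
        if take[i]:
            chain.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    chain.reverse()
    return best[n], chain
```

With the made-up input [(100, 200, 5.0), (150, 300, 8.0), (210, 305, 2.0), (310, 400, 4.0)], the sketch selects the exons at 150-300 and 310-400. The result is only as meaningful as the scores: if adding them together has no interpretation, neither does the chosen structure.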

The set of predicted exons must be supplied in an additional file in GFF format. The command-line option -O filename instructs geneid to read this file. The file must be sorted by the start position of the exons (column 4 in GFF format).
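If the exon file comes from a tool that does not guarantee this ordering, it can be sorted beforehand. The sketch below is one possible way to do it in Python, assuming whitespace-separated columns with no spaces inside the first four fields and records from a single sequence; the function name sort_gff_by_start is hypothetical and is not part of geneid.

```python
from typing import Iterable, List


def sort_gff_by_start(lines: Iterable[str]) -> List[str]:
    """Return GFF lines with comment lines first and records sorted by column 4."""
    comments: List[str] = []
    records: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        if line.startswith("#"):
            comments.append(line)  # keep comment/header lines at the top
        else:
            records.append(line)
    # Column 4 (index 3) holds the start position; list.sort is stable,
    # so records with equal starts keep their input order.
    records.sort(key=lambda rec: int(rec.split()[3]))
    return comments + records
```

The returned lines can be written to a new file and passed to geneid with -O.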

Working with promoters, polyA tails, CpG islands,...:
In fact, the input file may include predicted genic elements of any type, such as promoters or repeats, and not only exons. The gene model lists the rules according to which the predicted elements must be chained. Element types are identified by the feature name in the GFF file, so the same name must be used in the gene model to refer to the element. In essence, the rules specify which elements may be chained together and within what distances.
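To make the idea of such rules concrete, here is a small Python sketch of a rule table keyed by feature names, with a distance range for each allowed pair. The table layout, the feature names, the distance values and the function can_chain are all invented for illustration; they do not reproduce the syntax of the gene model in the geneid parameter file, and the distance is assumed here to be the number of bases lying between the two elements.

```python
from typing import Dict, Tuple

# Hypothetical rules: (upstream feature, downstream feature) -> (min, max) distance.
# The feature names must match those used in column 3 of the GFF file.
RULES: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Promoter", "First"): (50, 4000),
    ("First", "Internal"): (40, 11000),
    ("Internal", "Terminal"): (40, 11000),
}


def can_chain(up_feature: str, up_end: int,
              down_feature: str, down_start: int,
              rules: Dict[Tuple[str, str], Tuple[int, int]] = RULES) -> bool:
    """Check whether the downstream element may follow the upstream one."""
    limits = rules.get((up_feature, down_feature))
    if limits is None:
        return False  # this pair of element types is never chained
    # Assumed definition: bases strictly between the two elements.
    distance = down_start - up_end - 1
    return limits[0] <= distance <= limits[1]
```

For example, under these made-up rules a Promoter ending at position 1000 may be followed by a First exon starting at 1500, while a First exon followed by a Promoter is rejected because no rule lists that pair.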

geneid's implementation of this feature is not completely satisfactory because of the frame/remainder requirement. The remainder is computed automatically from the frame and length of the element. Assigning a frame to intergenic elements such as promoters or CpG islands is meaningless, however, so it is recommended to put a period ('.') in the frame column for these elements. Internally, geneid expands every such record into three (one per frame), which sidesteps the frame/remainder problem.
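The expansion described above can be pictured with the following Python sketch, which turns one input record into either one element or three. It is only an illustration: the Element type and expand_record function are hypothetical, and the remainder is computed here as the number of bases left over after the last complete codon, (length - frame) mod 3, which is an assumed convention that may differ from the one geneid uses internally.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    feature: str
    start: int
    end: int
    score: float
    strand: str
    frame: int
    remainder: int


def expand_record(feature: str, start: int, end: int, score: float,
                  strand: str, frame_field: str) -> List[Element]:
    """Return one element per frame: three if the frame column is '.', else one."""
    length = end - start + 1
    frames = (0, 1, 2) if frame_field == "." else (int(frame_field),)
    return [
        # Assumed convention: bases left over after the last complete codon.
        Element(feature, start, end, score, strand, f, (length - f) % 3)
        for f in frames
    ]
```

A promoter record with '.' in the frame column therefore yields three candidate elements, one of which fits whatever frame the neighbouring elements require.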

Another way around this geneid limitation is to supply elements whose length is a multiple of three, with frame 0, which also avoids the frame/remainder problem. Both workarounds are temporary and unsatisfactory, so this problem will be solved in future releases.

Enrique Blanco Garcia © 2003