In fact, the input file may include predicted genic elements of any type, such
as promoters or repeats, and not only exons. The gene model lists
the rules according to which the predicted elements must be chained.
Element types are identified through the feature name in the
GFF file. Therefore, this name must be used in the gene model
to refer to the element. The rules essentially specify which elements may
be chained together and within which distances.
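
As a rough illustration of what such a rule expresses, the following Python
sketch checks whether two elements may be chained. It is not geneid's own
gene model syntax or code: the ChainRule class, the may_chain function, and the
feature names and distances used are all hypothetical, chosen only to show the
idea of "which feature names, within which distances".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRule:
    """Hypothetical rule: which features may follow which, and how far apart."""
    upstream: frozenset    # feature names allowed on the upstream side
    downstream: frozenset  # feature names allowed on the downstream side
    min_dist: int          # minimum number of bases between the two elements
    max_dist: int          # maximum number of bases between the two elements


def may_chain(rule, up_feature, up_end, down_feature, down_start):
    """Return True if the downstream element may follow the upstream one.

    Coordinates are assumed to be 1-based and inclusive, as in GFF, so the
    distance is the number of bases strictly between the two elements.
    """
    distance = down_start - up_end - 1
    return (
        up_feature in rule.upstream
        and down_feature in rule.downstream
        and rule.min_dist <= distance <= rule.max_dist
    )


# Illustrative rule only: a promoter followed by a first exon, 50 to 4000 bases apart.
rule = ChainRule(frozenset({"Promoter"}), frozenset({"First"}), 50, 4000)
print(may_chain(rule, "Promoter", 1000, "First", 1201))
```

The feature names are matched as plain strings, which mirrors the point made
above: whatever name appears in the feature column of the GFF file is the name
the rule has to use.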
The geneid implementation of this approach is not completely satisfactory
because of the frame/remainder requirement. The remainder is computed automatically
from the frame and length of the element. Assigning a frame to intergenic
elements such as promoters or CpG islands is meaningless, however, so the frame
column for these elements should be given as a dot: '.'.
geneid will then internally expand every such record into three (one per frame),
and the frame/remainder problem is sidestepped.
Another way to work around this geneid limitation is to
provide elements whose length is a multiple of three, with frame 0.
Both solutions are temporary and not
satisfactory, so this problem will be addressed in future releases.
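
The two workarounds can be sketched in Python as follows. This is only an
illustration, not geneid's actual code: the Element class and the functions
parse_gff_line, remainder and expand_frames are hypothetical names. The sketch
also assumes that the remainder is the number of bases left over after the last
complete codon, that is (length - frame) mod 3; geneid's internal convention
may differ.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    feature: str
    start: int             # 1-based, inclusive (GFF convention)
    end: int               # 1-based, inclusive
    strand: str
    frame: Optional[int]   # None stands for '.' in the frame column

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_gff_line(line: str) -> Element:
    """Parse one tab-separated GFF record, reading only the first eight columns."""
    cols = line.rstrip("\n").split("\t")
    frame = None if cols[7] == "." else int(cols[7])
    return Element(cols[2], int(cols[3]), int(cols[4]), cols[6], frame)


def remainder(length: int, frame: int) -> int:
    """Bases left after the last complete codon (assumed convention)."""
    return (length - frame) % 3


def expand_frames(element: Element) -> List[Element]:
    """Turn a frameless record into three copies, one per frame."""
    if element.frame is not None:
        return [element]
    return [replace(element, frame=f) for f in (0, 1, 2)]


# A promoter with '.' in the frame column (made-up coordinates).
promoter = parse_gff_line("seq1\tmyprog\tPromoter\t100\t250\t1.0\t+\t.")
for copy in expand_frames(promoter):
    print(copy.feature, copy.frame, remainder(copy.length, copy.frame))
```

Under the same assumption, an element of length 150 with frame 0 has remainder
0, which is why a length that is a multiple of three combined with frame 0
avoids the issue as well.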