geneid can integrate ab initio predictions with external
evidence (annotations), such as annotated genes. The external
evidence is read from an additional file in gff format. geneid
is instructed to read this file with the command line option -R filename.
This input file must be sorted by start position (column 4 in gff).
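The sorting requirement above can be met with a short script. This is a minimal sketch, not part of geneid: the function name sort_gff_by_start is hypothetical, fields are assumed to be separated by whitespace (tabs or spaces), and comment or blank lines are simply dropped.

```python
def sort_gff_by_start(lines):
    """Return the gff record lines ordered by start position (column 4)."""
    # Keep only record lines; blank lines and '#' comments are discarded (assumption).
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    # Column 4 is index 3 once the line is split on whitespace.
    return sorted(records, key=lambda ln: int(ln.split()[3]))


example = [
    "AE002566 external First 30732 30775 -1.11 - 0 gene_2",
    "AE002566 external Terminal 21839 22922 18.37 - 1 gene_2",
    "AE002566 external Internal 23679 24029 7.99 - 1 gene_2",
]

for line in sort_gff_by_start(example):
    print(line)
```

Python's sorted is stable, so records sharing the same start position keep their original relative order.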
There is a difference between the options -O filename and
-R filename. With -O, only the elements read from the file
are assembled, whereas with -R gene predictions are built from
both the file records and geneid's own predictions.
If an element in the input file is assigned a score, it will
"compete" with geneid's original predictions (if any) for a place in the final gene
structure. For instance, this record will compete against the ab initio predictions:
AE002566 lab_XX Internal 66255 66323 3.27 + 1
If no score is given for an element (a dot "." in the score column), the element is
treated as mandatory (forced) in the final gene prediction (unless a conflicting
element with no score is also given in the input file). For instance, this record
will be in the final prediction:
AE002566 lab_XX Internal 66255 66323 . + 1
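The distinction between scored and forced records can be illustrated with a small parser. This is only a sketch of how the score column (column 6) might be read, not geneid's actual code: parse_evidence and the dictionary keys are hypothetical names, and whitespace-separated fields are assumed.

```python
def parse_evidence(record):
    """Split one gff evidence record and flag whether it is forced."""
    # Only the first 8 columns are needed; a group field, if present, is ignored here.
    seqname, source, feature, start, end, score, strand, frame = record.split()[:8]
    forced = score == "."  # no score given: the element is mandatory
    return {
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": None if forced else float(score),
        "forced": forced,
    }


scored = parse_evidence("AE002566 lab_XX Internal 66255 66323 3.27 + 1")
forced = parse_evidence("AE002566 lab_XX Internal 66255 66323 . + 1")
print(scored["forced"], scored["score"])
print(forced["forced"], forced["score"])
```

The first record carries a score and would compete with ab initio exons; the second has "." in the score column and would be treated as forced.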
The frame can either be set in column 8 of the gff format or
replaced by the wildcard "." when it is unknown. In the latter case,
geneid generates 3 equivalent elements (one per possible frame),
computing the corresponding remainder for each and preserving frame/remainder
consistency during assembly. For instance, the first record below is
internally expanded into the 3 exons that follow it, which are added to the set of
candidate exons for the final gene prediction:
AE002566 lab_XX Internal 66255 66323 3.27 + .
AE002566 lab_XX Internal 66255 66323 3.27 + 0
AE002566 lab_XX Internal 66255 66323 3.27 + 1
AE002566 lab_XX Internal 66255 66323 3.27 + 2
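The frame expansion described above can be sketched as follows. This is an illustration under stated assumptions, not geneid's implementation: expand_frames is a hypothetical name, fields are assumed to be whitespace-separated, and the remainder computation that geneid performs for each frame is not reproduced here.

```python
def expand_frames(record):
    """Expand a record whose frame (column 8) is '.' into one record per frame."""
    fields = record.split()
    if fields[7] != ".":
        # The frame is already known: nothing to expand.
        return [record]
    # Emit frames 0, 1 and 2, keeping every other column (including any group field).
    return [" ".join(fields[:7] + [str(frame)] + fields[8:]) for frame in range(3)]


for exon in expand_frames("AE002566 lab_XX Internal 66255 66323 3.27 + ."):
    print(exon)
```

A record with an explicit frame is returned unchanged, so the function can be applied to every line of an evidence file.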
By using the optional group field (column 9 in gff format), the user
can specify whether an annotated gene (annotation) introduced
in the input file must be preserved as a whole if it is incorporated into the final
results, or whether geneid predictions may be mixed into that annotation.
For instance, given annotation (1), the gene will be preserved
in the final output. But given annotation (2), which describes the same gene
without a group identifier, we can obtain predictions such as
(3):
(1)
AE002566 external Terminal 21839 22922 18.37 - 1 gene_2
AE002566 external Internal 23679 24029 7.99 - 1 gene_2
AE002566 external First 30732 30775 -1.11 - 0 gene_2
(2)
AE002566 external Terminal 21839 22922 18.37 - 1
AE002566 external Internal 23679 24029 7.99 - 1
AE002566 external First 30732 30775 -1.11 - 0
(3)
AE002566 external Terminal 21839 22922 18.37 - 1
AE002566 external Internal 23679 24029 7.99 - 1
AE002566 geneid_v1.2 Internal 28002 28007 1.14 - 1
AE002566 external First 30732 30775 -1.11 - 0
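To check which records in an evidence file carry a group identifier, and therefore which annotations would be kept intact, a script along these lines may help. It is a hedged sketch only: split_by_group is a hypothetical helper, whitespace-separated fields are assumed, and it says nothing about how geneid handles groups internally.

```python
from collections import defaultdict


def split_by_group(records):
    """Separate records with a group field (column 9) from those without one."""
    grouped = defaultdict(list)
    ungrouped = []
    for record in records:
        fields = record.split()
        if len(fields) > 8:
            # Column 9 present: the record belongs to a named annotation.
            grouped[fields[8]].append(record)
        else:
            # No group: ab initio exons may be mixed in around this record.
            ungrouped.append(record)
    return dict(grouped), ungrouped


records = [
    "AE002566 external Terminal 21839 22922 18.37 - 1 gene_2",
    "AE002566 external Internal 23679 24029 7.99 - 1 gene_2",
    "AE002566 external First 30732 30775 -1.11 - 0",
]
grouped, ungrouped = split_by_group(records)
print(sorted(grouped), len(ungrouped))
```

In this example two exons share the group gene_2, while the third has no group and is reported separately.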