geneid documentation: 7. Improving predictions by using sequence homology information
Frame definition in blast and geneid
It is important to understand that blast and geneid define the frame
differently, so a simple translation from one definition to the other
is needed. Although this conversion is implemented inside geneid, it
helps to keep it in mind when analyzing and interpreting the new
results.
In blast, there are 3 frames (1, 2 or 3), obtained by applying the
mathematical modulus operation to the starting position of the HSP:
blastFrame(HSP) = starting(HSP) % 3, where a result of 0 is coded as 3.
For example, given a sequence whose first position is numbered 1, an HSP that
starts at position 7 is in frame 1 (7 modulus 3 = 1), while an HSP that
starts at position 9 is in frame 3 (9 modulus 3 = 0, and 0 is coded
as 3).
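The rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code taken from geneid; the function name blast_frame is hypothetical.

```python
def blast_frame(start: int) -> int:
    """Return the blast frame (1, 2 or 3) of an HSP from its 1-based start.

    Hypothetical helper: it only reproduces the modulus rule described
    in the text, it is not part of geneid or blast.
    """
    if start < 1:
        raise ValueError("positions are 1-based, so start must be >= 1")
    remainder = start % 3
    # A remainder of 0 is coded as frame 3.
    return remainder if remainder != 0 else 3


print(blast_frame(7))  # 1
print(blast_frame(9))  # 3
```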
IMPORTANT: HSPs might appear on the negative (reverse) strand of the
input sequence. In that case, the HSP coordinates must first be translated
into coordinates relative to the reverse reading direction, and only then
is the frame computed.
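A minimal sketch of this translation, under stated assumptions: coordinates are 1-based and inclusive, the HSP is given in forward-strand coordinates, and the sequence length is known. The text does not spell out the mapping, so the usual mirror rule (position p becomes length - p + 1) is assumed here. The function names are hypothetical.

```python
from typing import Tuple


def to_reverse_coordinates(start: int, end: int, seq_length: int) -> Tuple[int, int]:
    """Mirror a 1-based inclusive interval onto the reverse reading direction.

    Assumption: position p on the forward strand becomes
    seq_length - p + 1 when read from the other end.
    """
    if not 1 <= start <= end <= seq_length:
        raise ValueError("expected 1 <= start <= end <= seq_length")
    return seq_length - end + 1, seq_length - start + 1


def reverse_blast_frame(start: int, end: int, seq_length: int) -> int:
    """Blast frame (1, 2 or 3) of a reverse-strand HSP."""
    rev_start, _ = to_reverse_coordinates(start, end, seq_length)
    remainder = rev_start % 3
    # A remainder of 0 is coded as frame 3, as on the forward strand.
    return remainder if remainder != 0 else 3


# HSP at forward positions 10..30 in a sequence of 100 nucleotides.
print(to_reverse_coordinates(10, 30, 100))  # (71, 91)
print(reverse_blast_frame(10, 30, 100))     # 2
```

Note that the mirrored start is derived from the forward end of the HSP, so both coordinates are needed to compute the frame on the reverse strand.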
In geneid, the frame is defined as the number of nucleotides (0, 1 or 2)
from the first nucleotide of the exon sequence to the first nucleotide
of the first COMPLETE codon translated from the genomic sequence
(see section Frame and Remainder (chapter 2)).
Clearly, a method is needed to convert geneid frames (exons) into
blast frames (HSPs/SRs) so that this information can be used properly. The
following formula makes the two frame definitions compatible.
Given an exon and an SR (both on the same strand):
blastFrame(SR) = (starting(exon) + geneidFrame(exon)) % 3
remembering that blastFrame = [1,2,3] (a result of 0 is coded as 3) and geneidFrame = [0,1,2].
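The conversion can be sketched as follows. This is an illustrative example rather than the code used inside geneid; the function name exon_to_blast_frame is hypothetical, and coding a result of 0 as frame 3 is carried over from the blast definition. Since starting(exon) + geneidFrame(exon) is the position of the first complete codon, the formula amounts to computing the blast frame of that position.

```python
def exon_to_blast_frame(exon_start: int, geneid_frame: int) -> int:
    """Blast frame (1, 2 or 3) compatible with an exon on the same strand.

    Hypothetical helper implementing
    blastFrame = (starting(exon) + geneidFrame(exon)) % 3,
    with a result of 0 coded as 3.
    """
    if geneid_frame not in (0, 1, 2):
        raise ValueError("geneid frame must be 0, 1 or 2")
    if exon_start < 1:
        raise ValueError("positions are 1-based, so exon_start must be >= 1")
    remainder = (exon_start + geneid_frame) % 3
    return remainder if remainder != 0 else 3


# Exon starting at position 5 with geneid frame 2: the first complete
# codon starts at position 7, so a compatible SR is in blast frame 1.
print(exon_to_blast_frame(5, 2))  # 1
print(exon_to_blast_frame(7, 2))  # 3
```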
Enrique Blanco Garcia © 2001