GetSitesWithProfile.c | geneid v 1.1 source documentation |
Description: |
Search by signal: prediction of start codons, acceptor splice sites and donor
splice sites by using a Position Weighted Array (a position weight matrix where
every position is a Markov chain instead of a simple nucleotide distribution
function). The score of a predicted signal must be higher than a fixed cutoff
that depends on the signal type. The position recorded for a predicted signal is
the A of ATG for starts, the X of XGT for donors and the X of AGX for acceptors.
|
Briefing: |
long GetSitesWithProfile(char* s, profile* p, site* st, long l1, long l2) |
Scan the input sequence, applying the PWA to every fragment that is a candidate
to contain a true signal (fragment length = profile.dimension). Applying the
PWA: for every position i, look up the probability of finding the (i-k..i)
oligonucleotide at this position if the candidate is a real signal, over the
probability of finding it if the candidate is a false signal. The Markov chain
is different at every position. The core is the set of consecutive positions
where the bias is complete (k fixed nucleotides with probability 0 or 1; e.g.
the characteristic dinucleotide GT forms the core for donors). If the order of
the Markov chain is 0 or 1, the Markov string is looked up directly, while a
loop is required for orders higher than 1 (trinucleotides and so on). Candidate
regions that score higher than the cutoff are inserted into the result list
(an array). Returns the number of predicted signals.
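
The scan described above can be illustrated with a minimal sketch. It is not
the geneid implementation: every demo_* name, the struct layouts, the max_sites
argument and the toy donor values are hypothetical and introduced only for this
example. It also assumes that each table entry already holds the logarithm of
the true/false probability ratio, so that a candidate score is a plain sum, and
it uses the index loop for every order instead of special-casing orders 0 and 1.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_DIM    8
#define DEMO_MAX_ORDER  2
#define DEMO_MAX_OLIGO  64          /* 4^(DEMO_MAX_ORDER + 1) table entries */
#define DEMO_IMPOSSIBLE (-1.0e9)    /* stands in for log(0) at core positions */

/* Hypothetical, simplified profile; NOT geneid's actual profile struct. */
typedef struct {
    int    dimension;               /* length of the signal window */
    int    order;                   /* Markov order k */
    double cutoff;                  /* a site is kept only if score > cutoff */
    double llr[DEMO_MAX_DIM][DEMO_MAX_OLIGO]; /* assumed log(true/false) ratios */
} demo_profile;

/* Hypothetical site record; NOT geneid's actual site struct. */
typedef struct {
    long   position;
    double score;
} demo_site;

static int demo_code(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;            /* N or any other symbol */
    }
}

/* Base-4 index of the oligonucleotide s[pos-k..pos], or -1 if it
   contains a symbol other than A, C, G, T. */
static long demo_oligo_index(const char *s, long pos, int k) {
    long idx = 0;
    for (long i = pos - k; i <= pos; i++) {
        int c = demo_code(s[i]);
        if (c < 0) return -1;
        idx = 4 * idx + c;
    }
    return idx;
}

/* Score every window of p->dimension nucleotides lying inside [l1, l2]
   and store those above the cutoff in st. Returns the number stored. */
long demo_scan(const char *s, const demo_profile *p, demo_site *st,
               long l1, long l2, long max_sites) {
    long n = 0;
    assert(p->order <= DEMO_MAX_ORDER && p->dimension <= DEMO_MAX_DIM);
    /* The first window position needs k nucleotides of left context. */
    long first = (l1 > p->order) ? l1 : p->order;
    for (long j = first; j + p->dimension - 1 <= l2 && n < max_sites; j++) {
        double score = 0.0;
        int ok = 1;
        for (int i = 0; i < p->dimension && ok; i++) {
            long idx = demo_oligo_index(s, j + i, p->order);
            if (idx < 0) ok = 0;
            else score += p->llr[i][idx];
        }
        if (ok && score > p->cutoff) {
            st[n].position = j;     /* window start: the X of XGT in the toy donor */
            st[n].score = score;
            n++;
        }
    }
    return n;
}

/* Toy order-1 donor profile over the window X-G-T; all values are made up. */
demo_profile demo_donor_profile(void) {
    demo_profile p;
    memset(&p, 0, sizeof p);
    p.dimension = 3;
    p.order = 1;
    p.cutoff = 0.0;
    for (int idx = 0; idx < 16; idx++) {
        int cur = idx % 4;          /* last nucleotide of the dinucleotide */
        p.llr[0][idx] = (idx == 4 * 1 + 2) ? 0.5 : 0.0;      /* "CG" mildly favoured at X */
        p.llr[1][idx] = (cur == 2) ? 1.0 : DEMO_IMPOSSIBLE;  /* core: G */
        p.llr[2][idx] = (cur == 3) ? 1.0 : DEMO_IMPOSSIBLE;  /* core: T */
    }
    return p;
}
```

With this toy profile, calling demo_scan on "ACGGTAAGTCC" with l1 = 0 and
l2 = 10 reports two donors, recorded at positions 2 and 6 (the nucleotide
before each GT). Any window whose core is not GT picks up DEMO_IMPOSSIBLE and
falls below the cutoff.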
|
Enrique Blanco Garcia © 2001