GetSitesWithProfile.c geneid v 1.1 source documentation


Description:
Search by signal: prediction of start codons, acceptor splice sites and donor splice sites using a Position Weighted Array (PWA), i.e. a position weight matrix in which every position is modelled by a Markov chain instead of a simple nucleotide distribution. A predicted signal is kept only if its score is higher than a fixed cutoff, which depends on the signal type. The position recorded for each predicted signal is: the A of ATG for start codons, the X of XGT for donors and the X of AGX for acceptors.
Briefing:
long  GetSitesWithProfile(char* s,
                          profile* p,
                          site* st, 
                          long l1, 
                          long l2) 
Scans the input sequence, applying the PWA to every fragment that is a candidate to contain a true signal (fragment length = profile.dimension). Applying the PWA means: for every position i of the fragment, look up the probability of finding the oligonucleotide (i-k..i) at that position given that the candidate is a real signal, over the probability of finding it given that the candidate is a false signal, k being the order of the Markov chain. The Markov chain is different at every position. The core is the set of consecutive positions where the bias is complete, i.e. fixed nucleotides with probability 0 or 1 (for example, the characteristic dinucleotide GT in the core of donors). If the order of the Markov chain is 0 or 1, the Markov string is looked up directly, whereas a loop is required for orders higher than 1 (trinucleotides and longer). Candidate regions scoring higher than the cutoff are inserted into the result list (an array). The function returns the number of predicted signals.
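The scan described above can be sketched as follows. This is a minimal illustration, not the geneid implementation: toy_profile, toy_site, toy_get_sites, the max_sites parameter and the TOY_DONOR table are hypothetical names introduced here, and the real profile and site structures are not shown on this page. The sketch assumes that scores are stored as log-likelihood ratios (so they are summed along the fragment), that l1 and l2 are the first and last sequence positions to scan, that fragments containing a non-ACGT character are skipped, and that -1000.0 stands in for the log of a zero probability in the core. For brevity it uses one loop to build the oligonucleotide index for every order, rather than a direct look-up for orders 0 and 1.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for geneid's profile and site types. */
typedef struct {
    int dimension;        /* length of the scored fragment                 */
    int order;            /* Markov order k                                */
    int offset;           /* recorded position, relative to fragment start */
    double cutoff;        /* minimum score; depends on the signal type     */
    const double *table;  /* dimension rows of 4^(k+1) log-ratio scores    */
} toy_profile;

typedef struct {
    long position;
    double score;
} toy_site;

/* Map a nucleotide to 0..3, or -1 for anything else (e.g. N). */
static int nt_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Column index of the (k+1)-mer s[end-k..end], or -1 if it is not pure ACGT. */
static long oligo_index(const char *s, long end, int k)
{
    long idx = 0;
    for (long j = end - k; j <= end; j++) {
        int c = nt_code(s[j]);
        if (c < 0)
            return -1;
        idx = 4 * idx + c;
    }
    return idx;
}

/* Score every fragment in s[l1..l2]; store those above the cutoff in st. */
long toy_get_sites(const char *s, const toy_profile *p,
                   toy_site *st, long max_sites, long l1, long l2)
{
    long ncols = 1, n = 0;

    assert(p->order >= 0 && p->dimension > 0);
    for (int i = 0; i <= p->order; i++)
        ncols *= 4;

    /* Assumption: start late enough that k nucleotides of left context exist. */
    long first = (l1 < p->order) ? p->order : l1;

    for (long pos = first; pos + p->dimension - 1 <= l2 && n < max_sites; pos++) {
        double score = 0.0;
        int valid = 1;

        for (int i = 0; i < p->dimension && valid; i++) {
            long idx = oligo_index(s, pos + i, p->order);
            if (idx < 0)
                valid = 0;              /* assumption: skip ambiguous fragments */
            else
                score += p->table[i * ncols + idx];
        }
        if (valid && score > p->cutoff) {
            st[n].position = pos + p->offset;
            st[n].score = score;
            n++;
        }
    }
    return n;
}

/* Toy order-0 donor profile over the fragment XGT; columns are A, C, G, T.
   -1000.0 stands in for the log of a zero probability in the GT core. */
static const double TOY_DONOR[3 * 4] = {
        0.5,     0.0,     1.0,    -0.5,
    -1000.0, -1000.0,     0.0, -1000.0,
    -1000.0, -1000.0, -1000.0,     0.0,
};
```

With TOY_DONOR, an offset of 0, a cutoff of 0.25 and the sequence CAGGTAAGTCGT scanned from position 0 to 11, this sketch reports two donors, at positions 2 and 6 (the X of GGT and of AGT). The CGT fragment at position 9 scores 0.0 and is rejected by the cutoff.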




Enrique Blanco Garcia © 2001