8. EM algorithm


0. Initialization:
  1. Randomly select one motif occurrence in every sequence.

  2. Align the selected occurrences to compute the initial content of the matrices M (motif) and B (background).

1. E step:
  1. In the first sequence, use the matrices M and B to score every candidate segment of 10 bp.

  2. For every segment, normalize the score against the scores of all candidates in the sequence to obtain a weight (the segment's score divided by the sum of all candidate scores).

  3. Repeat the same operation for the remaining sequences, recording both the candidate segments and their weights (normalized scores), always using the current estimate of the motif and background (matrices M and B).

2. M step:
  1. Update the matrices M and B with every candidate:

    • M: For each nucleotide in a candidate, add the candidate's weight to the entry of M for that nucleotide at its position within the motif.

    • B: For each nucleotide of the sequence that lies outside the candidate, add the candidate's weight to the entry of B for that nucleotide.

  2. Normalize the matrices.

3. Repeat steps 1 and 2 until convergence.
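
The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the outline does not state: the score of a segment is taken to be the product of M/B odds ratios over its positions, a pseudocount of 1 keeps every probability above zero, and convergence is declared when no entry of M changes by more than a small tolerance. All function and parameter names (init_matrices, e_step, m_step, em, pseudo, tol) are hypothetical.

```python
import random

ALPHABET = "ACGT"


def normalize(M, B):
    """Turn the count matrices into probabilities (each column sums to 1)."""
    M = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in M]
    total = sum(B.values())
    return M, {a: B[a] / total for a in ALPHABET}


def accumulate(M, B, seq, start, width, weight):
    """Add `weight` to M for nucleotides inside the segment, to B otherwise."""
    for i, a in enumerate(seq):
        if start <= i < start + width:
            M[i - start][a] += weight
        else:
            B[a] += weight


def empty_counts(width, pseudo):
    """Fresh count matrices, pre-filled with the (assumed) pseudocount."""
    return [{a: pseudo for a in ALPHABET} for _ in range(width)], \
           {a: pseudo for a in ALPHABET}


def init_matrices(seqs, width, rng, pseudo=1.0):
    """Step 0: one random occurrence per sequence defines the first M and B."""
    M, B = empty_counts(width, pseudo)
    for s in seqs:
        start = rng.randrange(len(s) - width + 1)
        accumulate(M, B, s, start, width, 1.0)
    return normalize(M, B)


def e_step(seqs, M, B, width):
    """Step 1: score every candidate, then normalize to weights per sequence."""
    weights = []
    for s in seqs:
        scores = []
        for start in range(len(s) - width + 1):
            score = 1.0
            for j in range(width):
                a = s[start + j]
                score *= M[j][a] / B[a]  # assumed scoring: odds ratio
            scores.append(score)
        total = sum(scores)
        weights.append([x / total for x in scores])
    return weights


def m_step(seqs, weights, width, pseudo=1.0):
    """Step 2: rebuild M and B from all weighted candidates, then normalize."""
    M, B = empty_counts(width, pseudo)
    for s, w in zip(seqs, weights):
        for start, wt in enumerate(w):
            accumulate(M, B, s, start, width, wt)
    return normalize(M, B)


def em(seqs, width=10, max_iter=100, tol=1e-6, seed=0):
    """Step 3: alternate E and M steps until M stops changing (max_iter >= 1)."""
    rng = random.Random(seed)
    M, B = init_matrices(seqs, width, rng)
    for _ in range(max_iter):
        weights = e_step(seqs, M, B, width)
        new_M, new_B = m_step(seqs, weights, width)
        delta = max(abs(new_M[j][a] - M[j][a])
                    for j in range(width) for a in ALPHABET)
        M, B = new_M, new_B
        if delta < tol:
            break
    return M, B, weights
```

Calling em(seqs, width=10) on a list of uppercase DNA strings returns the final M and B together with the weights from the last E step. In each sequence, the candidate with the largest weight is the most likely motif occurrence under this sketch. Since the starting occurrences are random, different seeds may converge to different motifs.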


Enrique Blanco Garcia © 2002 eblanco@imim.es