Pattern discovery algorithm

8. EM (expectation-maximization) algorithm

0. Initialization:

Randomly select one motif occurrence in every sequence.

Align the selected occurrences to compute the initial contents of the matrices M (motif) and B (background).
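A minimal sketch of the initialization, assuming M is stored as one probability distribution over A, C, G, T per motif position and B as a single distribution for the background. The names `init_matrices`, `BASES` and `IDX` are hypothetical, and the pseudocount (added so no probability starts at zero) is an assumption not stated in the text.

```python
import random

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def init_matrices(seqs, w, rng, pseudocount=1.0):
    """Pick one random w-bp occurrence per sequence and build M and B from them."""
    M = [[pseudocount] * 4 for _ in range(w)]  # M[k][b]: count of base b at motif position k
    B = [pseudocount] * 4                      # B[b]: count of base b outside the motif
    starts = []
    for seq in seqs:
        j = rng.randrange(len(seq) - w + 1)    # random start of this sequence's occurrence
        starts.append(j)
        for pos, base in enumerate(seq):
            if j <= pos < j + w:
                M[pos - j][IDX[base]] += 1     # aligned motif column
            else:
                B[IDX[base]] += 1              # background nucleotide
    # Turn counts into probabilities: each motif position and B sum to 1
    M = [[c / sum(col) for c in col] for col in M]
    total = sum(B)
    B = [c / total for c in B]
    return M, B, starts
```

Usage: `init_matrices(seqs, 10, random.Random(0))` builds 10-bp starting matrices; different seeds give different starting points, which matters because EM only finds a local optimum.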

1. E step:

In the first sequence, use the matrices M and B to score every candidate segment of 10 bp.
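One common way to score a candidate, assumed here because the text does not specify the formula, is the likelihood ratio of the segment under the motif model versus the background: the product over motif positions of M[k][base] / B[base]. The helpers `score_segment` and `score_candidates` are hypothetical names for this sketch.

```python
BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def score_segment(segment, M, B):
    """Likelihood ratio of a segment under the motif model M versus background B."""
    score = 1.0
    for k, base in enumerate(segment):
        b = IDX[base]
        score *= M[k][b] / B[b]   # position-specific motif probability over background probability
    return score

def score_candidates(seq, M, B):
    """Score every candidate segment of length len(M) in seq, one per start position."""
    w = len(M)
    return [score_segment(seq[j:j + w], M, B) for j in range(len(seq) - w + 1)]
```

A sequence of length L yields L - 10 + 1 candidates for a 10-bp motif; a score above 1 means the segment looks more like the motif than like background.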

For every segment, normalize its score by the sum of the scores of all candidates in the same sequence; the result is the segment's weight.
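Written as a formula, with s_{ij} the score of the candidate starting at position j of sequence i, L_i the length of that sequence, W = 10 the motif width and x_{i,p} the nucleotide at position p, the weight is given below. The likelihood-ratio form of s_{ij} is an assumed scoring choice, not one stated in the text.

```latex
w_{ij} = \frac{s_{ij}}{\sum_{j'=1}^{L_i - W + 1} s_{ij'}},
\qquad
s_{ij} = \prod_{k=1}^{W} \frac{M_{k,\,x_{i,\,j+k-1}}}{B_{x_{i,\,j+k-1}}}
```

For example, if a sequence had only three candidates with scores 2, 1 and 1, their weights would be 0.5, 0.25 and 0.25. The weights of each sequence always sum to 1.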

Repeat the same operation for the remaining sequences, recording every candidate segment together with its score (weight). All scores are computed from the current estimates of the motif and the background (matrices M and B).
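The whole E step can be sketched as below, again assuming likelihood-ratio scores. `e_step` is a hypothetical name; it returns, for each sequence, the list of weights indexed by candidate start position.

```python
BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def e_step(seqs, M, B):
    """Return, for each sequence, the normalized weight of every candidate start."""
    w = len(M)
    weights = []
    for seq in seqs:
        scores = []
        for j in range(len(seq) - w + 1):
            s = 1.0
            for k in range(w):
                b = IDX[seq[j + k]]
                s *= M[k][b] / B[b]   # likelihood ratio: motif versus background
            scores.append(s)
        total = sum(scores)
        weights.append([s / total for s in scores])  # weights of one sequence sum to 1
    return weights
```

For a 10-bp motif the plain product is numerically safe; longer motifs are usually scored in log space to avoid underflow.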

2. M step:

Re-estimate the matrices M and B from all candidates, each one contributing in proportion to its weight:

M: For each nucleotide of a candidate, add the candidate's weight to the entry of M for that nucleotide at its position within the motif.

B: For each nucleotide of the sequence that lies outside the candidate, add the candidate's weight to that nucleotide's entry in the background matrix B.

Normalize the matrices so that each column of M, and B itself, sums to 1.
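A minimal sketch of the M step as described: weighted counts for M and B, then normalization. `m_step` is a hypothetical name, `weights[i][j]` is assumed to be the weight of the candidate starting at position j of sequence i, and the optional pseudocount is an assumption (with the default of 0 every sequence must be longer than the motif so that B receives some counts).

```python
BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def m_step(seqs, weights, w, pseudocount=0.0):
    """Re-estimate M and B from the weighted candidates of the E step."""
    M = [[pseudocount] * 4 for _ in range(w)]
    B = [pseudocount] * 4
    for seq, ws in zip(seqs, weights):
        for j, wt in enumerate(ws):              # candidate starting at j, with weight wt
            for pos, base in enumerate(seq):
                if j <= pos < j + w:
                    M[pos - j][IDX[base]] += wt  # nucleotide at motif position pos - j
                else:
                    B[IDX[base]] += wt           # nucleotide outside this candidate
    # Normalize: every motif position and the background become distributions
    M = [[c / sum(col) for c in col] for col in M]
    total = sum(B)
    B = [c / total for c in B]
    return M, B
```

If all the weight of a sequence sits on one candidate, this reduces to the plain counting used at initialization.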

3. Repeat steps 1 and 2 until convergence.
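Putting steps 0 to 3 together gives the sketch below. Several choices are assumptions rather than part of the text: likelihood-ratio scoring, a pseudocount of 1 so that no probability becomes zero, and a convergence test that stops when no weight changes by more than `tol`. Initialization reuses the M step with a one-hot weight on the randomly chosen occurrence, which is equivalent to aligning those occurrences and counting. All function names are hypothetical.

```python
import random

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def _estimate(seqs, weights, w, pc):
    """M step: weighted counts (plus pseudocount pc) for M and B, then normalize."""
    M = [[pc] * 4 for _ in range(w)]
    B = [pc] * 4
    for seq, ws in zip(seqs, weights):
        for j, wt in enumerate(ws):
            for pos, base in enumerate(seq):
                if j <= pos < j + w:
                    M[pos - j][IDX[base]] += wt
                else:
                    B[IDX[base]] += wt
    M = [[c / sum(col) for c in col] for col in M]
    total = sum(B)
    return M, [c / total for c in B]

def _weights(seqs, M, B):
    """E step: normalized likelihood-ratio weights of every candidate in every sequence."""
    w = len(M)
    out = []
    for seq in seqs:
        scores = []
        for j in range(len(seq) - w + 1):
            s = 1.0
            for k in range(w):
                b = IDX[seq[j + k]]
                s *= M[k][b] / B[b]
            scores.append(s)
        total = sum(scores)
        out.append([s / total for s in scores])
    return out

def em_motif(seqs, w=10, max_iter=200, tol=1e-6, seed=0, pc=1.0):
    rng = random.Random(seed)
    # Step 0: one random occurrence per sequence, encoded as a one-hot weight vector
    weights = []
    for seq in seqs:
        n = len(seq) - w + 1
        ws = [0.0] * n
        ws[rng.randrange(n)] = 1.0
        weights.append(ws)
    M, B = _estimate(seqs, weights, w, pc)
    for _ in range(max_iter):
        new = _weights(seqs, M, B)           # Step 1: E step
        M, B = _estimate(seqs, new, w, pc)   # Step 2: M step
        change = max(abs(a - b) for ws, vs in zip(new, weights) for a, b in zip(ws, vs))
        weights = new
        if change < tol:                     # Step 3: stop once the weights settle
            break
    return M, B, weights
```

Because EM converges to a local optimum, a common practice is to run it from several random seeds and keep the best-scoring result.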