Practice 3. Definitions EMBL - UPF course 2001
   
 

Gibbs sampling:

  • Looking for the most conserved (and unknown) pattern of a given fixed length in a set of sequences.
  • The algorithm works iteratively, repeating some operations until some stopping condition is met (usually, pattern quality).
  • The results are represented as weight matrices or logos.
  • The score measures the significance of alignments sampled.

Disadvantage:
Due to the take of random decisions, this algorithm does not guarantee that the actual best conserved pattern will be found having the highest score, and converging to a local maximum, rather than to the global one. Therefore, more than one running over the same input is recommended.

Example (AlignAce output):

Motif 1
GCAAAGCGTGA	0	13	0
GCACTGCCTGA	0	218	1
GCCCAGCGTGA	0	486	1
GCATTGCGGGC	1	251	0
GCAGAGCCCGA	1	440	1
GAAAAGCGCGC	2	312	1
GCACTGCCCGA	2	446	0
CCATTGCCCGA	3	196	0
GCAAAGCCTGA	4	100	1
GAGTAGCCTGA	4	155	0
GCAGAGCCTGC	4	348	1
*** *******
MAP Score: 15.1348

Example (makelogo output):





Algorithm. Given n input sequences.

Starting step:
Select randomly one subsequence of fixed length from each input sequence to build the initial set of occurences (pattern or motif).

Iteration step:

  • Choose one input sequence
  • Extract the occurence of this sequence and build a weight matrix with the rest of occurences.
  • Use this matrix to score every segment of the fixed length in the selected sequence (sampling).
  • Choose "randomly" a new occurrence in the selected sequence according to the scores obtained by every segment candidate to be the new one.
  • Add the new occurences to the output set.
  • Repeat until no improvement (score) is reached.

L A S T H O M E N E X T

Enrique Blanco, Sergi Castellano and Genis Parra © 2001