Practice 3. Exercise EMBL - UPF course 2001
   
 

In this practice, we run a Gibbs sampler to discover unknown but conserved patterns in a set of input sequences. The resulting pattern, represented as a weight matrix, is then used to search the same sequences for further occurrences.
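
To make the idea concrete, here is a minimal sketch of a Gibbs site sampler in Python. It is not the program run by the RSA tools server: it assumes exactly one site per sequence, a uniform background, uppercase A/C/G/T input only, and the function names, the pseudocount and the toy sequences are all invented for illustration.

```python
import random

ALPHABET = "ACGT"

def build_counts(sequences, starts, width, skip=None, pseudo=0.5):
    """Column-wise base counts of the current sites, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(sequences, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1
    return counts

def gibbs_sampler(sequences, width, iterations=200, seed=0):
    """Return (site starts, count matrix) after repeated resampling of each site."""
    rng = random.Random(seed)
    # Start from one random site per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            # Build the matrix from all sites except the one being resampled.
            counts = build_counts(sequences, starts, width, skip=i)
            total = sum(counts[0].values())
            # Likelihood ratio (motif vs. uniform background) of every window.
            weights = []
            for pos in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= (counts[j][seq[pos + j]] / total) / 0.25
                weights.append(w)
            # Sample the new site in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_counts(sequences, starts, width)

if __name__ == "__main__":
    # Toy sequences (invented), each containing a TATAAA-like word.
    toy = ["TTGACCTATAAAGGCT", "GGCTATAAAGCATCGA",
           "ACGTTCTATAAAGTTC", "CCATGGATATAAAGCA"]
    starts, counts = gibbs_sampler(toy, width=6)
    for seq, s in zip(toy, starts):
        print(s, seq[s:s + 6])
```

Because the sampler is stochastic, different seeds may report different sites, which is why running it several times and comparing the results is instructive.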

ADVICE: It helps to open two or more browser windows, keeping this text in one of them and running the exercise in another.



Input sequences:
6 genes of Drosophila melanogaster.

WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).



Step 1: Run a Gibbs sampler on a set of sequences.

  • Read the promoter regions in fasta format.

  • Connect to the RSA tools server.

  • In the menu (left frame), click on Pattern discovery: gibbs (matrices).

  • Copy and paste the promoter regions into the Sequence box.

  • Click the button Go to submit the query.

  • Inspect the output: best motif, weight matrix and information content.

  • Press the button pattern matching (patser) to use this matrix to find new occurrences in the same set of sequences.

  • Press the button Go.

  • Press the button Feature map to enter a new menu about plotting the results.

  • Press the button Go to obtain a graphical output of the reported matches.
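
The quantities inspected in Step 1 (weight matrix, information content, matches) can be sketched in Python as follows. This is an illustration only and does not reproduce patser's exact scoring: the uniform background of 0.25, the pseudocount, the base-2 logarithm, the threshold and the example sites are all assumptions made here.

```python
import math

ALPHABET = "ACGT"

def count_matrix(sites):
    """Count each base at each column of equally long, aligned sites."""
    width = len(sites[0])
    return [{b: sum(site[j] == b for site in sites) for b in ALPHABET}
            for j in range(width)]

def frequency_matrix(counts, pseudo=1.0):
    """Turn counts into frequencies, spreading `pseudo` evenly over the 4 bases."""
    freqs = []
    for col in counts:
        total = sum(col.values()) + pseudo
        freqs.append({b: (col[b] + pseudo / 4) / total for b in ALPHABET})
    return freqs

def information_content(freqs, background=0.25):
    """Total information content in bits, summed over all columns."""
    return sum(f * math.log2(f / background)
               for col in freqs for f in col.values() if f > 0)

def scan(sequence, freqs, threshold, background=0.25):
    """Score every window with the log-odds matrix; keep those above threshold."""
    width = len(freqs)
    hits = []
    for pos in range(len(sequence) - width + 1):
        score = sum(math.log2(freqs[j][sequence[pos + j]] / background)
                    for j in range(width))
        if score >= threshold:
            hits.append((pos, sequence[pos:pos + width], round(score, 2)))
    return hits

if __name__ == "__main__":
    # Hypothetical aligned sites, as a sampler might report them.
    sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
    freqs = frequency_matrix(count_matrix(sites))
    print("information content (bits):", round(information_content(freqs), 2))
    print(scan("GGTATAAAGG", freqs, threshold=5.0))
```

Lowering the threshold reports more, weaker matches, which is one reason the feature map can show occurrences beyond the sites originally used to build the matrix.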

Step 2: Modify some parameters of the sampler.

  • Modify the size of the pattern to generate: increase or decrease the value in the box Matrix length and repeat the process.

Questions:

  1. Can you see the "generator" patterns that were used to build the matrix in the plot containing the newly found occurrences?
  2. When you change the size of the produced pattern, do you get subsets of the same core pattern, or are they completely different?
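
For question 2, one rough way to check whether a shorter pattern is part of a longer one is to slide its consensus along the longer consensus and count matching positions. The helper below is a hypothetical sketch: it assumes you have written down a consensus string for each matrix, it looks at one strand only, and the example strings are invented.

```python
def best_overlap(short, long):
    """Slide `short` along `long`; return (offset, matches) of the best ungapped fit."""
    best = (0, -1)
    for offset in range(len(long) - len(short) + 1):
        # Count positions where the two consensus strings agree at this offset.
        matches = sum(a == b for a, b in zip(short, long[offset:]))
        if matches > best[1]:
            best = (offset, matches)
    return best

if __name__ == "__main__":
    # Invented consensus strings for a 6-column and an 8-column matrix.
    offset, matches = best_overlap("TATAAA", "CTATAAAG")
    print(offset, matches)
```

A match count close to the length of the shorter consensus suggests the same core; a low count at every offset suggests the sampler converged on a different pattern.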

Results:

SIZE    output    plot
10      X         X
8       X         X
6       X         X

Extra work:

AlignACE is another program that implements the Gibbs sampling algorithm.

  • This is the output obtained from AlignACE for this input set of sequences. Take a look at this format and at this graphical output using sequence logos.

  • We have run AlignACE several times with the same input. Find the differences and similarities among them:
  • CompareACE is a program that compares AlignACE results (patterns) against a database of patterns to find the most similar one. Try running the server with some of the provided outputs.


Enrique Blanco, Sergi Castellano and Genis Parra © 2001