Practice 1. Exercise EMBL - UPF course 2001
   
 

In this practice we will use consensus sequences of several well-known binding sites to find potential binding sites in a set of putative promoter regions corresponding to coexpressed genes in a DNA-microarray experiment.

Consensus sequences are extracted from:
P. Bucher. Journal of Molecular Biology 212: 563-578 (1990)


ADVICE: It is very useful to open two or more browser windows, keeping this text in one of them and running the exercise in another.



Input sequences:
6 genes of Drosophila melanogaster.

WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).



Step 1: Exact matches of TATA-box consensus: STATAAAWR

  • Read the promoter regions in FASTA format.

  • Connect to RSA tools server.

  • In the menu (left frame), click on Pattern matching: dna-pattern (strings).

  • Type the consensus pattern into the Query pattern box.

  • Copy and paste the promoter regions into the Sequence box.

  • Choose "direct only" in the Search strands selector.

  • Click the Go button to submit the query.

  • -- NO EXACT MATCHES WILL BE FOUND (go to the next step) --
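
The exact search in Step 1 can also be reproduced offline. The following is a minimal sketch, not the RSA tools implementation: it assumes the consensus uses the standard IUPAC nucleotide codes (S = C or G, W = A or T, R = A or G), expands them into a regular expression, and scans the direct strand only. The function names and the toy sequence are hypothetical and are not taken from the exercise data.

```python
import re

# Standard IUPAC nucleotide codes expanded to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus such as STATAAAWR into a regex string."""
    return "".join(IUPAC[code] for code in consensus.upper())

def exact_matches(consensus, sequence):
    """Return (start, end, word) tuples, 1-based inclusive, direct strand only."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [
        (m.start() + 1, m.start() + len(consensus), m.group(1))
        for m in pattern.finditer(sequence.upper())
    ]

# Made-up toy sequence, not one of the six Drosophila promoters.
toy = "GGCTATAAAAGCC"
print(exact_matches("STATAAAWR", toy))
```

To apply it to the exercise data, pass each promoter sequence (without its FASTA header line) as the second argument.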

Step 2: Partial matches of TATA-box consensus: STATAAAWR

  • Repeat the process, this time increasing the number of allowed mismatches in the pattern (try 1, 2 and 3 in the Substitutions box).

  • Results will be displayed below the headline
    "PatID Strand Pattern SeqID Start End Matching_word Score"

  • Click the Feature map button to open a new menu for plotting the results.

  • Click the Go button to obtain a graphical output of the reported matches. Browse across the interactive map.
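
The partial matching in Step 2 can be sketched as a sliding window that counts how many positions disagree with the consensus. This is an illustration under stated assumptions, not the algorithm used by dna-pattern: a position counts as a mismatch when the base is not among those allowed by the IUPAC code, and the mismatch count reported here is not necessarily the same quantity as the Score column of the server output. All names and the toy sequence are hypothetical.

```python
# Bases allowed by each standard IUPAC nucleotide code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_mismatches(consensus, word):
    """Number of positions where the base is not allowed by the consensus code."""
    return sum(base not in IUPAC_SETS[code] for code, base in zip(consensus, word))

def approximate_matches(consensus, sequence, max_subst):
    """Return (start, end, word, mismatches) for every window within max_subst.

    Coordinates are 1-based and inclusive; only the direct strand is scanned
    and overlapping windows are all reported.
    """
    consensus = consensus.upper()
    seq = sequence.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        mismatches = count_mismatches(consensus, word)
        if mismatches <= max_subst:
            hits.append((i + 1, i + k, word, mismatches))
    return hits

# Made-up toy sequence: one position differs from the TATA-box consensus.
toy = "CTATATAAG"
for allowed in (0, 1):
    print(allowed, approximate_matches("STATAAAWR", toy, allowed))
```

Running the same function with 1, 2 and 3 allowed substitutions gives lists that can be compared against the server output for each promoter.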

Questions:

  1. Real TATA boxes are expected to appear 20 bp upstream of the Transcription Start Site (TSS). How many of the reported TATA boxes fall in this range? NOTE: TSS annotations can easily contain errors, in which case ranges and distances are no longer meaningful.
  2. How many occurrences do you get if you allow 9 substitutions (that is, every position may mismatch)? Do you get one occurrence at every position of the sequence? Why not? Think about the option "prevent overlapping matches". Try switching it off.
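
For question 2, the effect of overlap prevention can be illustrated with simple window arithmetic. The sketch below assumes that, when overlaps are prevented, the scan resumes at the first position after each accepted match; this is an assumption about the option's behaviour, not a description of the server's code. The function name and the sequence length of 30 are made up for the example.

```python
def window_starts(seq_len, k, prevent_overlap):
    """1-based start positions of accepted windows when every window matches.

    With k substitutions allowed on a pattern of length k, every window of
    length k is a match, so only the overlap rule limits the count.
    """
    starts = []
    i = 0
    while i + k <= seq_len:
        starts.append(i + 1)
        # Assumed rule: jump past the accepted match when overlaps are prevented.
        i += k if prevent_overlap else 1
    return starts

# Hypothetical 30 bp sequence and the 9-letter TATA-box consensus.
print(len(window_starts(30, 9, prevent_overlap=False)))
print(window_starts(30, 9, prevent_overlap=True))
```

Under this assumption a sequence of length L yields L - 9 + 1 overlapping windows but only about L / 9 non-overlapping ones.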

Results:

Mismatches   Output   Map
1            X        X
2            X        X
3            X        X

Mismatches   no overlap   overlap
9            X            X

Extra work:

Try to find new putative binding sites with these other consensus sequences:

  • GC-box consensus: GGGCGG
  • CAAT-box consensus: RRCCAATS
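
If you also want to consider the reverse strand for these consensus sequences, one option is to search the direct strand with the reverse complement of each pattern. The sketch below assumes the standard IUPAC complement table (R pairs with Y, K with M, B with V, D with H, while S, W and N are their own complements); the function name is hypothetical.

```python
# Complement of each standard IUPAC nucleotide code.
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

def reverse_complement(consensus):
    """Reverse the consensus and complement every IUPAC code."""
    return "".join(COMPLEMENT[code] for code in reversed(consensus.upper()))

for name, consensus in [("GC-box", "GGGCGG"), ("CAAT-box", "RRCCAATS")]:
    print(name, consensus, reverse_complement(consensus))
```

A match to the reverse-complemented pattern on the direct strand corresponds to a match to the original pattern on the opposite strand.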


Enrique Blanco, Sergi Castellano and Genis Parra © 2001