Practice 2. Exercise
   
 

In this practice, we will obtain a TATA-box weight matrix from the TRANSFAC database by reading the corresponding entry. The matrix was computed by P. Bucher (Journal of molecular biology 212: 563-578, 1990) from sequences containing real TATA boxes. We will then use it to scan the input set of promoter regions for putative TATA boxes.
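A weight matrix of this kind is typically built from counts of each base at each aligned position of the known sites. The Python sketch below converts such counts into natural-log-odds weights against a uniform background and scores a word by summing the weights of its bases. The pseudocount scheme, the function names and the toy counts are assumptions for illustration only; the counts are not the published V$TATA_01 values, and patser's own weighting may differ in its details.

```python
import math

# Hypothetical toy counts for a TATA-like site; NOT the published V$TATA_01
# values. One dict per motif position: how often each base was observed.
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 8, "C": 0, "G": 1, "T": 1},
]


def counts_to_log_odds(counts, background=None, pseudocount=1.0):
    """Turn per-position base counts into natural-log-odds weights.

    weight(i, b) = ln( ((n_ib + pseudocount * p_b) / (N_i + pseudocount)) / p_b )
    where N_i is the number of sites counted at position i and p_b is the
    background frequency of base b (uniform 0.25 unless given).
    This pseudocount convention is an assumption of this sketch.
    """
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    weights = []
    for column in counts:
        total = sum(column.values())
        weights.append({
            base: math.log(((column.get(base, 0) + pseudocount * p_b)
                            / (total + pseudocount)) / p_b)
            for base, p_b in background.items()
        })
    return weights


def score_site(site, weights):
    """Score one word by summing the weight of the base seen at each position."""
    return sum(col[base] for col, base in zip(weights, site.upper()))


if __name__ == "__main__":
    w = counts_to_log_odds(TOY_COUNTS)
    print("TATA", round(score_site("TATA", w), 2))
    print("GCGC", round(score_site("GCGC", w), 2))
```

Positive weights mark bases seen more often than the background would predict, negative weights mark under-represented bases, so a consensus-like word receives a high total score.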

ADVICE: It is useful to open two or more browser windows: keep this text in one of them and run the exercise in another.



Input sequences:
Promoter regions of 6 Drosophila melanogaster genes.

WWW tools:
Regulatory Sequence Analysis Tools (RSA) by Jacques van Helden from SCMBB - Service de Conformation des Macromolécules Biologiques et de Bioinformatique (Université Libre de Bruxelles).

TRANSFAC database, the transcription factor database.



Step 1: Access TRANSFAC to acquire the TATA box entry

  • Connect to the TRANSFAC database.

  • Click the Browse link.

  • Click the Matrix by Factor name link.

  • Find TATA in the list of transcription factors and click it.

  • Click one of these TATA-binding proteins (TBPs): T00794, T00796 or T0097, to open its information card.

  • Lines starting with MX list the matrix entries related to this factor. There are two matrices: select V$TATA_01 and look at its entry, which contains the weight matrix (keep this window open!).
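The weight matrix in a TRANSFAC entry is stored as plain text: a header line (P0; we assume some files spell it PO) naming the A, C, G and T columns, followed by numbered rows of counts and a consensus letter, closed by an XX or // line. As a minimal sketch, assuming that layout, the matrix block can be pulled out of a copied entry as follows; EXAMPLE_ENTRY, its counts and the function name are hypothetical.

```python
# Hypothetical example entry: the layout follows the TRANSFAC flat-file
# convention as we understand it, but the counts are invented.
EXAMPLE_ENTRY = """AC  HYPOTHETICAL
XX
ID  TOY_TATA
XX
P0      A      C      G      T
01      1      1      0      8      T
02      9      0      0      1      A
03      0      1      0      9      T
XX
//
"""


def parse_transfac_matrix(text):
    """Return one {base: count} dict per numbered row of a TRANSFAC matrix entry."""
    bases = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag in ("P0", "PO"):
            # Header row: the column labels, usually A C G T.
            bases = fields[1:5]
        elif bases is not None and tag.isdigit():
            # Matrix row: position number, four counts, optional consensus letter.
            rows.append({b: float(x) for b, x in zip(bases, fields[1:5])})
        elif rows and tag in ("XX", "//"):
            # The first separator after the rows closes the matrix block.
            break
    return rows


if __name__ == "__main__":
    for i, row in enumerate(parse_transfac_matrix(EXAMPLE_ENTRY), start=1):
        print(i, row)
```

Counts are read as floats because some matrices store non-integer values.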

Step 2: Use the matrix to find TATA boxes in a set of promoter regions.

  • Open the promoter regions in FASTA format.

  • Connect to the RSA tools server.

  • In the menu (left frame), click Pattern matching: patser (matrices).

  • Copy and paste the matrix into the Matrix box.

  • Select transfac in the Format (matrix) selector.

  • Copy and paste the promoter regions into the Sequence box.

  • Choose "single" in the Search strands selector.

  • Press the Go button to submit the query.

  • Press the Feature map button to open a form for plotting the results.

  • Click Go to obtain a graphical map of the reported matches.
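What patser does with the pasted matrix can be pictured as sliding the weight matrix along each sequence, summing one weight per position, and reporting every window whose score reaches a threshold. The Python sketch below does the same on a toy scale, including a small FASTA reader; TOY_WEIGHTS, the threshold value and all function names are hypothetical, and patser's real scoring and threshold options are richer than this.

```python
def read_fasta(text):
    """Parse FASTA text into {identifier: upper-case sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]  # first word of the header
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {k: "".join(v).upper() for k, v in records.items()}


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq, weights, threshold, both_strands=False):
    """Report (start, strand, score, site) for every window scoring >= threshold.

    `start` is the 0-based position of the window's left end on the forward strand.
    """
    seq = seq.upper()
    width = len(weights)
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    hits = []
    for strand, s in strands:
        for i in range(len(s) - width + 1):
            site = s[i:i + width]
            if set(site) - set("ACGT"):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(weights, site))
            if score >= threshold:
                start = i if strand == "+" else len(s) - width - i
                hits.append((start, strand, score, site))
    return hits


# Hypothetical toy weights: +1 for the base of "TATA", -1 for any other base.
TOY_WEIGHTS = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "TATA"]

if __name__ == "__main__":
    fasta = ">toy1 hypothetical\nGGTATAGG\n>toy2\nCCCCGGGG\n"
    for name, seq in read_fasta(fasta).items():
        print(name, scan(seq, TOY_WEIGHTS, threshold=3.0))
```

With both_strands=False the scan mirrors the "single" strand setting chosen above. Because TATA is its own reverse complement, a two-strand scan reports the toy site once on each strand.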

Questions:

  1. What conclusion can you draw from the plot? Some regions are predicted to contain more than 5 TATA boxes, whereas a real gene is expected to have zero or one. Perhaps increasing the threshold will reduce the number of predictions.
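To see how the threshold controls the number of predictions, one can count, for each sequence, the windows scoring at or above several cut-offs. The Python sketch below assumes the same summed-weight scoring; the +1/-1 toy weights, the two toy sequences and the function names are hypothetical.

```python
def window_scores(seq, weights):
    """Score every full-length window of seq that contains only A, C, G, T."""
    width = len(weights)
    seq = seq.upper()
    scores = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        if set(site) <= set("ACGT"):
            scores.append(sum(col[b] for col, b in zip(weights, site)))
    return scores


def hits_by_threshold(seqs, weights, thresholds):
    """For each threshold, count the windows per sequence scoring at or above it."""
    scores = {name: window_scores(s, weights) for name, s in seqs.items()}
    return {
        t: {name: sum(1 for x in sc if x >= t) for name, sc in scores.items()}
        for t in thresholds
    }


# Hypothetical toy weights: +1 for the base of "TATA", -1 for any other base.
TOY_WEIGHTS = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "TATA"]

if __name__ == "__main__":
    toy_seqs = {"at_rich": "TATATATA", "gc_rich": "GGGGCCCC"}
    for t, counts in hits_by_threshold(toy_seqs, TOY_WEIGHTS, [-5, 0, 5]).items():
        print(t, counts)
```

Raising the cut-off can only remove hits, never add them. In the AT-rich toy sequence, overlapping TATA windows are each counted as a separate hit, one simple way a single region can accumulate several predicted sites.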

Results:

TRANSFAC matrix    patser output    patser map
V$TATA_01          X                X

Extra work:

TRANSFAC provides its own program, MatInspector, to scan promoter regions using its collection of matrices.

  • Go to the TRANSFAC database.
  • Click MatInspector V2.2.
  • Choose the vertebrates collection of matrices.
  • Submit a separate query for each input sequence.
  • Choose the Individual selection option.
  • Select the matrix V$TATA_01 and submit.
  • Look at the results and compare them with the previous ones.
  • Try again with the insects collection of matrices. You will obtain a result like this (plotted with gff2ps).
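As we understand it, MatInspector reports a matrix similarity scaled between 0 and 1 rather than a raw log-odds score, which is worth keeping in mind when comparing its output with patser's. The Python sketch below shows only the general idea, a min-max normalization of a summed frequency score, so that the best possible word scores 1 and the worst scores 0. It is a simplification under our own assumptions: the real program also weights each position by how conserved it is and applies a separate core-similarity filter, neither of which is reproduced here; the toy counts and function names are hypothetical.

```python
def counts_to_freqs(counts):
    """Convert {base: count} columns into {base: relative frequency} columns."""
    freqs = []
    for column in counts:
        total = sum(column.values())
        freqs.append({b: n / total for b, n in column.items()})
    return freqs


def normalized_similarity(site, freqs):
    """Scale the summed frequency score of `site` into [0, 1].

    Simplified sketch: no per-position conservation weighting, unlike MatInspector.
    """
    site = site.upper()
    if len(site) != len(freqs):
        raise ValueError("site length must equal matrix width")
    current = sum(col[b] for col, b in zip(freqs, site))
    best = sum(max(col.values()) for col in freqs)
    worst = sum(min(col.values()) for col in freqs)
    if best == worst:
        return 1.0  # every word scores the same on an uninformative matrix
    return (current - worst) / (best - worst)


# Hypothetical toy counts for a two-position "TA" core.
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]

if __name__ == "__main__":
    f = counts_to_freqs(TOY_COUNTS)
    for word in ("TA", "AT", "GG"):
        print(word, round(normalized_similarity(word, f), 3))
```

A normalized score makes a single cut-off (for example "report matches with similarity of at least 0.85") comparable across matrices of different widths, which a raw summed score does not.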



Enrique Blanco García © 2002