Model dependent coding statistics. Lisboa 2001
   
  Search by homology: practice

Search by homology

In this practical we will compare a genomic DNA sequence against a protein database using similarity search tools, in order to find the regions of the DNA that are homologous to known proteins. We will then use these results to obtain the gene structure that best fits them. Finally, we will compare the annotation with the predictions.


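We will use the following web resources, databases and tools:

  • SRS at the EBI, to retrieve entries from the EMBL nucleotide database.
  • RepeatMasker, to mask repeats and low-complexity DNA.
  • BLASTX, to search the SWISS-PROT protein database.
  • GeneWise2, to align a protein to genomic DNA and predict the gene structure.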

Step 1. Extract a genomic DNA sequence from the database

  • Open SRS (at the EBI) and start a session.
  • Select the EMBL database and the standard query form.
  • Type the entry name HS3112010 and submit the query.
  • Click on HS3112010.
  • Click on Save.
  • Set the sequence format to FASTA and the view to FastSeqs, switch off "Save table as...", and then click on Save.
  • Here you can find the EMBL entry and the FASTA sequence.
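The FASTA file saved in Step 1 is plain text: a header line starting with `>` followed by one or more sequence lines. As a minimal sketch, assuming the saved file follows this standard layout and taking the first word of the header as the record ID (the function name `read_fasta` is illustrative), it can be read in Python without external libraries:

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence.

    The record ID is assumed to be the first whitespace-separated word
    after '>'; sequence lines are joined and upper-cased.
    """
    records = {}
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # close the last record
    return records


# Toy input (not the real HS3112010 sequence):
toy = ">seq1 example\nacgtNNNN\nGGCC\n>seq2\nTTAA\n"
print(read_fasta(toy))  # {'seq1': 'ACGTNNNNGGCC', 'seq2': 'TTAA'}
```

Sequence lines are upper-cased so that later steps can compare bases without worrying about case.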

Step 2. Detect and mask repeats and low complexity DNA sequences

  • Go to RepeatMasker (or its EMBL mirror).
  • Copy and paste the genomic DNA sequence.
  • Select return format: HTML.
  • Submit the sequence.
  • Note the number of bases masked.
  • Click here to look at the masked sequence.
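RepeatMasker replaces masked bases with `N` by default, so the number of masked bases can be checked by comparing the original and masked sequences position by position. A minimal sketch, assuming both sequences are plain strings of the same length and that masking uses `N` (RepeatMasker can also be asked to mask with lowercase or `X` instead); the function names are illustrative:

```python
def count_masked(original, masked, mask_char="N"):
    """Count bases masked by the repeat masker.

    Positions that were already `mask_char` in the original sequence
    (unknown bases) are not counted as newly masked.
    Returns (number_masked, percent_of_sequence).
    """
    original = "".join(original.split()).upper()
    masked = "".join(masked.split()).upper()
    if len(original) != len(masked):
        raise ValueError("original and masked sequences differ in length")
    newly_masked = sum(
        1 for o, m in zip(original, masked) if m == mask_char and o != mask_char
    )
    percent = 100.0 * newly_masked / len(original) if original else 0.0
    return newly_masked, percent


def masked_runs(masked, mask_char="N"):
    """Return masked regions as 1-based inclusive (start, end) intervals."""
    masked = "".join(masked.split()).upper()
    runs = []
    start = None
    for pos, base in enumerate(masked, start=1):
        if base == mask_char and start is None:
            start = pos  # a masked run begins
        elif base != mask_char and start is not None:
            runs.append((start, pos - 1))  # the run ended at the previous base
            start = None
    if start is not None:
        runs.append((start, len(masked)))  # the run extends to the end
    return runs


print(count_masked("ACGTACGTAC", "ACNNNNGTAC"))  # (4, 40.0)
print(masked_runs("ACNNNNGTAN"))                 # [(3, 6), (10, 10)]
```

Listing the masked runs as intervals makes it easy to check later whether a BLASTX hit or a predicted exon overlaps a masked region.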

Step 3. Run a similarity search tool to find regions homologous to proteins

  • Connect to BLASTX.
  • Copy and paste the masked genomic DNA sequence.
  • Select program: Blastx.
  • Select database: swissprot.
  • Submit (search) the query.
  • On the monitoring page, click on Format Results.
  • Look at the BLASTX results (the plot of regions with protein homology).
  • Click on the highest-scoring protein. Is it the same protein?
  • Click on the IL13_BOVIN protein.
  • Select Display: fasta and click on Display.
  • Here you can find the IL13_BOVIN protein and the BLAST output.

Step 4. Search for a gene structure that fits the current protein

  • Connect to GeneWise2.
  • Copy and paste the masked genomic DNA sequence (HS3112010).
  • Copy and paste the current protein (IL13_BOVIN).
  • Submit (search) the query...
  • ...or find the GeneWise2 results here.
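GeneWise2 aligns the protein to the genomic DNA while allowing for introns, so its prediction amounts to a set of exon coordinates on the genomic sequence. Given such coordinates, the predicted coding sequence can be spliced out, the intron boundaries checked for the canonical GT...AG dinucleotides, and the result translated for comparison with IL13_BOVIN. A minimal sketch, assuming 1-based inclusive exon coordinates on the forward strand, listed in order; the toy sequence, coordinates and function names are hypothetical and do not come from the GeneWise2 output:

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(seq):
    """Remove whitespace and upper-case a sequence."""
    return "".join(seq.split()).upper()


def splice(genomic, exons):
    """Join exon sequences; exons are 1-based inclusive (start, end) pairs in order."""
    genomic = clean(genomic)
    return "".join(genomic[start - 1:end] for start, end in exons)


def intron_boundaries(genomic, exons):
    """Return (donor, acceptor) dinucleotides for each intron between consecutive exons."""
    genomic = clean(genomic)
    boundaries = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start - 1]  # 1-based prev_end+1 .. next_start-1
        boundaries.append((intron[:2], intron[-2:]))
    return boundaries


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    cds = clean(cds)
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# Hypothetical toy gene: exon 1 = 1-6, intron = 7-16, exon 2 = 17-22.
genomic = "ATGAAA" "GTAAGTCCAG" "TTTTAA"
exons = [(1, 6), (17, 22)]
print(splice(genomic, exons))             # ATGAAATTTTAA
print(intron_boundaries(genomic, exons))  # [('GT', 'AG')]
print(translate(splice(genomic, exons)))  # MKF*
```

A predicted intron that does not start with GT and end with AG is worth a closer look when comparing the prediction with the annotation.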

Questions

  • 1. Compare the annotated gene (EMBL) with the predicted gene (GeneWise). What do you think of the prediction?
  • 2. Repeat Step 3: run BLASTX with the unmasked sequence and compare the results with the previous ones.
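For question 1, the annotated and predicted structures can be compared with accuracy measures commonly used for gene finders. At the nucleotide level, sensitivity is Sn = TP / (TP + FN) and specificity is Sp = TP / (TP + FP), where TP counts coding bases predicted as coding, FN annotated coding bases that were missed, and FP bases predicted as coding but not annotated. At the exon level, exon Sn and exon Sp are the fractions of annotated and of predicted exons whose both ends match exactly. A minimal sketch with hypothetical exon coordinates (1-based, inclusive) and illustrative function names:

```python
def coding_positions(exons):
    """Set of genomic positions covered by 1-based inclusive (start, end) exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def compare_structures(annotated, predicted):
    """Nucleotide- and exon-level accuracy of a predicted gene structure."""
    real = coding_positions(annotated)
    pred = coding_positions(predicted)
    tp = len(real & pred)  # coding bases predicted as coding
    fn = len(real - pred)  # annotated coding bases missed
    fp = len(pred - real)  # predicted coding bases not in the annotation
    exact = set(annotated) & set(predicted)  # exons with both ends correct
    return {
        "Sn": tp / (tp + fn) if real else 0.0,
        "Sp": tp / (tp + fp) if pred else 0.0,
        "exon_Sn": len(exact) / len(set(annotated)) if annotated else 0.0,
        "exon_Sp": len(exact) / len(set(predicted)) if predicted else 0.0,
    }


# Hypothetical coordinates, not taken from HS3112010:
annotated = [(1, 10), (21, 30)]
predicted = [(1, 10), (23, 32)]
print(compare_structures(annotated, predicted))
# {'Sn': 0.9, 'Sp': 0.9, 'exon_Sn': 0.5, 'exon_Sp': 0.5}
```

The example shows why both levels matter: a prediction can cover almost every coding base yet still get half of the exon boundaries wrong.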
