Concepts and applications in Biology and Bioinformatics

  This course aims to introduce the concept of ontology as a way to represent a particular domain of knowledge related to the fields of Biology and Bioinformatics.
We will then present DAG-Edit, a tool that can be used to build an ontology, and AmiGO, a Web-based interface to search and browse the Gene Ontology (GO).
Finally, we will introduce their use in the context of the development of Web services.

Ontology course presentation

  Useful links
  These exercises aim to illustrate the use of the Gene Ontology to help annotate proteins.

Exercise 1:
GO classification of a protein-coding gene

Aim: Annotate with GO terms a protein-coding gene whose protein sequence has been predicted but whose function is unknown.
We provide the sequence of a bacteriophage protein, in FASTA format.
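For readers who have not handled FASTA before, the format is a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader written for illustration only; the function name `parse_fasta` and the example record are hypothetical, and the sequence shown is not the bacteriophage protein used in this exercise.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record for illustration (not the course sequence).
example = ">demo_protein hypothetical example\nMKTAYIAKQR\nQISFVKSHFS\n"
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```

The Web forms used in the steps below accept the sequence pasted as-is, so this parsing is only needed if you want to process the file yourself.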

Idea: To find the function of a protein, you can search for functional motifs. The predicted domains can give you hints about the biological role that this protein may play. You can also search for homologous proteins. Given that two orthologous genes theoretically share the same function, you can infer GO terms from orthologous genes.

First step: Use functional domain prediction methods to form hypotheses about the function of the protein.
We will use the Pfam database.

1. Go to the Pfam query page, http://www.sanger.ac.uk/Software/Pfam/search.shtml.

2. Copy and paste the protein sequence

3. Submit your request by clicking the "Search Pfam" button

You can see that two very different types of domain are predicted:
  • A helix-turn-helix (HTH) DNA-binding domain
  • An enzymatic domain
At this stage, what hypothesis can you make about the function of the protein?

Second step: Use GOst to find similar bacterial proteins and check their GO annotations.
We will use the GOst search tool from the AmiGO browser.

1. Go to the GOst page, http://www.godatabase.org/cgi-bin/gost/gost.cgi.

2. Copy and paste the protein sequence

3. Submit your request by clicking the "Submit Query" button

You will get a set of protein homologs annotated with GO terms. Using this information, which function would you assign to the protein, and which process is it involved in?
Finally, which GO terms would you use to annotate this protein?
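One way to organize the homolog annotations before choosing terms is to count how many hits carry each GO term and keep the terms shared by most of them. The following is a minimal sketch under stated assumptions: the homolog names and their annotations are invented for illustration and are not GOst output, and the 50% threshold is an arbitrary choice, not a GO annotation rule.

```python
from collections import Counter


def rank_go_terms(hits, min_fraction=0.5):
    """Return (term, fraction of hits carrying it) pairs, most frequent first.

    `hits` maps a homolog name to the list of GO term IDs annotated to it.
    Terms carried by fewer than `min_fraction` of the hits are dropped.
    """
    # set() ensures a term is counted once per homolog.
    counts = Counter(term for terms in hits.values() for term in set(terms))
    total = len(hits)
    ranked = [(term, count / total) for term, count in counts.most_common()]
    return [(term, frac) for term, frac in ranked if frac >= min_fraction]


# Hypothetical homolog annotations (illustration only).
# GO:0003677 = DNA binding, GO:0006355 = regulation of DNA-templated
# transcription, GO:0016787 = hydrolase activity.
hits = {
    "homolog_A": ["GO:0003677", "GO:0006355"],
    "homolog_B": ["GO:0003677", "GO:0016787"],
    "homolog_C": ["GO:0003677", "GO:0006355"],
    "homolog_D": ["GO:0003677"],
}
ranked = rank_go_terms(hits)
for term, fraction in ranked:
    print(term, fraction)
```

A frequency count is only a starting point: terms inferred this way should still be checked against the domain predictions from the first step and against how similar each homolog really is.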

Exercise 2:

We provide a set of Human Ensembl gene identifiers for genes that are involved in a particular type of disease. We also provide a random set of genes that will serve as a control. Using a tool such as FatiGO, identify over-represented GO molecular function and biological process terms.

1. Go to the FatiGO Web page, http://fatigo.bioinfo.cipf.es.

2. Select "Compare" analysis.

3. Select the organism, i.e. "Homo sapiens".

4. Copy and paste the list of disease gene identifiers in the section "List of genes #1".

5. Copy and paste the list of control gene identifiers in the section "List of genes #2".

6. Select the GO ontology ("Molecular function").

7. Submit your request by clicking the "Run" button.

8. When the search is done, click the "Gene Ontology: molecular function" link, in the "Links with the results" section.

You will get a set of GO terms and their over-representation confidence score (p-value).

9. Repeat the same search to identify over-represented "Biological process" GO terms.

  • Which GO function categories are over-represented?
  • Which biological processes are over-represented?
  • Which diseases do you think these genes are involved in?
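To make the over-representation p-value more concrete, the comparison of two gene lists for a single GO term can be framed as a 2x2 table (annotated or not, list #1 or list #2) and scored with a one-sided Fisher exact test. The sketch below assumes that framing; it is not a description of how the FatiGO server computes its scores, the function name and the counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(hits_in_list1, size_list1, hits_in_list2, size_list2):
    """One-sided Fisher exact test for one GO term.

    Returns the probability of seeing at least `hits_in_list1` annotated
    genes in list #1 if annotated genes were spread at random across
    both lists (hypergeometric upper tail).
    """
    total = size_list1 + size_list2
    annotated = hits_in_list1 + hits_in_list2
    denominator = comb(total, size_list1)
    upper = min(annotated, size_list1)
    # Sum the hypergeometric probabilities for k = observed .. maximum possible.
    tail = sum(
        comb(annotated, k) * comb(total - annotated, size_list1 - k)
        for k in range(hits_in_list1, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 8 of 10 disease genes and 2 of 10 control genes
# carry the term being tested.
p_value = enrichment_p_value(8, 10, 2, 10)
print(f"p = {p_value:.4f}")
```

Because one such test is run for every GO term, small p-values are expected by chance among many terms, which is why the scores are best read together with a multiple-testing correction.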