A Course on Sequence Analysis: THE SEQUENCE DATABASES


UPF

IMIM

GRIB


The DNA and protein sequence databases are the lifeblood of Molecular Biology. By compiling sequence and functional information, biosequence databases allow for the integration of biological knowledge. Without these databases, research in modern Molecular Biology could not be carried out. Although their massive use is relatively recent, biosequence databases have a rather long history: the first compilation of protein sequences is thirty years old. We will thus start this lesson with a brief overview of the history of the sequence databases. This will lead us from the first printed atlas by Dayhoff and coworkers in 1965 to the efforts that established the computerized nucleic acid databases at Los Alamos and at the EMBL in the early 1980s, and from there to the electronic submission of sequences and the integration of heterogeneous databases. Next, we will take a look at the major sequence databases: how their information is organized, how it can be accessed, and how we can submit our own information. In addition to the primary sequence databases, a plethora of specialized databases exist, and we will also take a brief look at the information contained in them. We will explore a few recently developed tools to access information across different sequence databases and to "navigate" between them. Finally, we will discuss a number of problems associated with the development of the genome projects and an ever faster rate of sequence production.

AN HISTORICAL OVERVIEW
THE PRIMARY DATABASES: STRUCTURE, BROWSING AND SUBMITTING
- The Nucleic Acid Sequence Databases
  - GenBank / EMBL / DDBJ
  - GSDB
- The Protein Sequence Databases
  - PIR-International
  - SWISS-PROT
- The Protein Structure Database PDB
THE DERIVED DATABASES
- Issue on Databases, Nucleic Acids Research 26:1-389 (1998)
BROWSING AND QUERYING ACROSS DATABASES
- Entrez
- SRS
THE DATA IN THE GENOME PROJECTS
- Genome Sequencing Projects (Terry Gaasterland)
- Genome Sequence Data Sources (The Genome Channel)