Consensus sequence:
Given a collection of known, aligned binding sites, a consensus sequence is
built by taking the most frequent base at each position within the site. This
pattern can then be used to search other sequences for new sites.
Disadvantage:
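Reducing each position to a single base loses the information about how much
the known sites vary, so exact matching misses many true sites. In practice a
fixed number of mismatches is usually allowed to express some degree of
ambiguity.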
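A minimal sketch of such a mismatch-tolerant search, assuming a plain sliding-window comparison (Hamming distance) against the consensus TATAAT. The function name `find_sites`, the parameter `max_mismatches`, and the target sequence are hypothetical and chosen only for illustration.

```python
def find_sites(text, consensus, max_mismatches=1):
    """Return (position, window, mismatches) for every window of `text`
    that differs from `consensus` in at most `max_mismatches` positions."""
    k = len(consensus)
    hits = []
    for i in range(len(text) - k + 1):
        window = text[i:i + k]
        # Count positions where the window and the consensus disagree.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Hypothetical target sequence, made up for this example (not real data).
target = "GGCTATGATCCGTATAATAG"

exact_hits = find_sites(target, "TATAAT", max_mismatches=0)
relaxed_hits = find_sites(target, "TATAAT", max_mismatches=1)
print(exact_hits)    # [(12, 'TATAAT', 0)]
print(relaxed_hits)  # [(3, 'TATGAT', 1), (12, 'TATAAT', 0)]
```

Allowing one mismatch recovers the window TATGAT, which exact matching skips. The cost is that every position is treated as equally tolerant of substitutions, which the known sites do not necessarily support.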
| sequence 1 | TACGAT |
| sequence 2 | TATAAT |
| sequence 3 | TATAAT |
| sequence 4 | GATACT |
| sequence 5 | TATGAT |
| sequence 6 | TATGTT |
| consensus sequence | TATAAT |
| consensus (IUPAC code) | TATRNT |
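The consensus row can be reproduced with a short sketch that picks the most frequent base in each column of the six sites. At position 4, A and G each occur three times. Breaking that tie in favour of the alphabetically first base is an assumption made here; it happens to give the A shown in the table. The helper names `consensus` and `iupac_to_regex` are hypothetical, and the IUPAC dictionary covers only a subset of the codes (R = A or G, Y = C or T, N = any base).

```python
import re
from collections import Counter

# The six known binding sites from the table.
sites = ["TACGAT", "TATAAT", "TATAAT", "GATACT", "TATGAT", "TATGTT"]


def consensus(seqs):
    """Most frequent base per column; ties go to the alphabetically
    first base (an assumption, not part of the definition)."""
    result = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(counts.values())
        result.append(min(b for b, n in counts.items() if n == best))
    return "".join(result)


# Subset of the IUPAC nucleotide codes, enough for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into a compiled regular expression."""
    return re.compile("".join(IUPAC[c] for c in pattern))


cons = consensus(sites)
exact = [s for s in sites if s == cons]
degenerate = [s for s in sites if iupac_to_regex("TATRNT").fullmatch(s)]

print(cons)        # TATAAT
print(exact)       # ['TATAAT', 'TATAAT']
print(degenerate)  # ['TATAAT', 'TATAAT', 'TATGAT', 'TATGTT']
```

Only two of the six known sites match the plain consensus exactly, while the IUPAC pattern TATRNT matches four of them.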