Let D = {A, C, G, T} be the alphabet of nucleotide sequences. A motif (pattern, signal, ...) is an object denoting a set of sequences over this alphabet, either deterministically or probabilistically. Given a sequence S and a motif m, we say that m occurs in S if any of the sequences denoted by m occurs in S.
YTWWAAATAR (Consensus MEF2 sequence, Yu et al., 1992)
CTTAAAATAA
CTAAAAATAA
TTAAAAATAA
TTTAAAATAA
CTATAAATAA
TTATAAATAA
CTTAAAATAG
TTTAAAATAG
..........
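The relation between a consensus and the set of sequences it denotes can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the consensus is written in the standard IUPAC nucleotide codes (Y = C or T, W = A or T, R = A or G), and the function names `consensus_to_regex`, `expand` and `occurs` are hypothetical helpers, not part of any library.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into an equivalent regular expression."""
    parts = []
    for symbol in consensus:
        letters = IUPAC[symbol]
        parts.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "".join(parts)

def expand(consensus):
    """Return the full set of sequences denoted by the consensus."""
    choices = [IUPAC[symbol] for symbol in consensus]
    return {"".join(combo) for combo in product(*choices)}

def occurs(consensus, sequence):
    """True if any sequence denoted by the consensus occurs in `sequence`."""
    return re.search(consensus_to_regex(consensus), sequence) is not None
```

With these definitions, `expand("YTWWAAATAR")` yields the 16 sequences denoted by the MEF2 consensus (2 x 2 x 2 x 2 choices at the four ambiguous positions), and `occurs("YTWWAAATAR", S)` implements the definition of occurrence given above for a sequence S.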
Regular Expressions. The description is built on an extension of the original alphabet. Among the new symbols of this extended alphabet there are symbols denoting the alternative occurrence of several nucleotides at a given position, and symbols denoting that a given position may be absent.
C..?[STA]..C[STA][^P]C
(ferredoxin, iron-sulfur binding region signature, PROSITE database, Bairoch, 1991)
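A pattern written in this notation can be searched for directly with a regular expression engine. The sketch below uses Python's `re` module on the signature above; the helper name `find_signature` and the example protein strings in the usage note are made up for illustration and are not real ferredoxin sequences.

```python
import re

# The signature from the text, used as-is:
#   .     any residue
#   .?    an optional residue (the position may be absent)
#   [STA] one of S, T or A
#   [^P]  any character except P (the regex engine does not restrict
#         this to valid amino-acid letters)
SIGNATURE = re.compile(r"C..?[STA]..C[STA][^P]C")

def find_signature(protein):
    """Return (start, matched_text) for each non-overlapping occurrence."""
    return [(m.start(), m.group()) for m in SIGNATURE.finditer(protein)]
```

For instance, `find_signature("MKCASGGCSACLL")` reports one occurrence, `CASGGCSAC` at position 2 (counting from 0), whereas the same string with P in place of the second-to-last residue of the match (`"MKCASGGCSPCLL"`) gives no occurrence because of the `[^P]` position.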
Position Weight Matrices. The description includes a weight (score, probability, likelihood) for each symbol occurring at each position along the motif.
Follow the link for An Introduction to Position Weight Matrices.
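As a rough illustration of the idea, the sketch below builds a log-odds weight matrix from the eight MEF2 sites listed earlier and uses it to score windows of a sequence. Several choices here are assumptions made for the example rather than anything stated in the text: the pseudocount of 1, the uniform background frequency of 0.25, the use of base-2 log-odds as the weight, and the helper names `build_pwm`, `score` and `best_hit`.

```python
import math

ALPHABET = "ACGT"

# The aligned MEF2 sites listed earlier (all of the same length).
SITES = [
    "CTTAAAATAA", "CTAAAAATAA", "TTAAAAATAA", "TTTAAAATAA",
    "CTATAAATAA", "TTATAAATAA", "CTTAAAATAG", "TTTAAAATAG",
]

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """One dict per position, mapping each base to a log2-odds weight."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm

def score(pwm, window):
    """Sum of the position-specific weights of the bases in `window`."""
    return sum(weights[base] for weights, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

Unlike a regular expression, which either matches or does not, the matrix assigns every window a graded score, so a threshold has to be chosen to decide which windows count as occurrences of the motif.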