Let A={A,C,G,T} be the alphabet of nucleotide sequences. A motif (pattern, signal, ...) is an object denoting a set of sequences over this alphabet, either deterministically or probabilistically. Given a sequence S and a motif m, we say that m occurs in S if any of the sequences denoted by m occurs in S. We use the terms motif, pattern, signal, etc. interchangeably here, although they may carry different meanings elsewhere.
Exact words may encapsulate biological functions, often only in the appropriate context. For instance, the sequence ``TAA'' denotes, under the appropriate circumstances, a translation stop codon.
CTTAAAATAA
CTAAAAATAA 
TTAAAAATAA 
TTTAAAATAA 
CTATAAATAA 
TTATAAATAA 
CTTAAAATAG 
TTTAAAATAG 
..........
YTWWAAATAR (Consensus MEF2 sequence, Yu et al., 1992)
where Y=[CT], W=[AT] and R=[AG]. MEF2 regulates genes specific to cardiac and skeletal muscle, such as Troponin
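A consensus written with ambiguity codes can be matched against a sequence by expanding each code into a character class. The following is a minimal Python sketch, assuming only the three codes defined above (Y, W, R); the helper names `consensus_to_regex` and `find_motif` are hypothetical and introduced here for illustration.

```python
import re

# Ambiguity codes used in the MEF2 consensus above (a subset of the IUPAC codes).
AMBIGUITY = {"Y": "[CT]", "W": "[AT]", "R": "[AG]"}

def consensus_to_regex(consensus):
    """Expand each ambiguity code into a character class; keep A, C, G, T as-is."""
    return "".join(AMBIGUITY.get(base, base) for base in consensus)

def find_motif(consensus, sequence):
    """Return the 0-based start positions of all occurrences, overlapping ones included."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence)]

# The MEF2 sites listed above.
sites = ["CTTAAAATAA", "CTAAAAATAA", "TTAAAAATAA", "TTTAAAATAA",
         "CTATAAATAA", "TTATAAATAA", "CTTAAAATAG", "TTTAAAATAG"]

print(consensus_to_regex("YTWWAAATAR"))
print(all(find_motif("YTWWAAATAR", s) == [0] for s in sites))
```

Every listed site should be reported as an occurrence at position 0, which is what "the motif m occurs in S" means for a deterministic motif of this kind.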
C..?[STA]..C[STA][^P]C
(2Fe-2S ferredoxin, iron-sulfur binding region signature; PROSITE database, Bairoch, 1991)
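Because the signature above is already written in regular-expression syntax, it can be tried directly with a regex engine. The sketch below uses Python's `re` module; the two peptides are hypothetical strings invented only to exercise the pattern, not real ferredoxin sequences.

```python
import re

# Signature from the text, in regular-expression syntax.
FERREDOXIN = re.compile(r"C..?[STA]..C[STA][^P]C")

# Hypothetical peptides, made up for this illustration.
hit = "MKCGSAGCSGCKL"
miss = "MKCGSAGCSPCKL"  # proline at the position excluded by [^P]

m = FERREDOXIN.search(hit)
print(m.start(), m.group())      # start offset and matched substring
print(FERREDOXIN.search(miss))   # no occurrence is expected here
```

The optional `.?` is what lets the same pattern cover sites whose first spacer is one or two residues long.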
Other examples: DNA polymerase family B signature; EF-hand calcium-binding domain (a structural motif).
Follow the link for An Introduction to Position Weight Matrices
Examples of PWMs
 
 
From a set of aligned donor sites we derive the following probability matrix (values in percent):
 
     -5    -4      -3     -2    -1    +1    +2    +3    +4    +5    +6       +7    +8
A  26.0  27.7    35.1   59.6   8.7   0.0   0.0  50.7  72.1   7.0  15.8     26.6  19.7
C  25.5  29.4    34.8   13.3   2.7   0.0   0.0   2.8   7.6   4.7  17.2     21.7  29.4
G  23.8  25.3    18.5   13.2  80.9 100.0   0.0  43.9  12.2  83.1  18.8     32.7  24.5
T  24.7  17.5    11.6   13.9   7.7   0.0 100.0   2.5   8.1   5.2  48.3     18.9  26.4
                  C/A     A     G     G     T     A     A     G     T
which, assuming nucleotide equiprobability, transforms into the following log-likelihood matrix (natural logarithm of the ratio of each probability to 0.25):
      -5    -4      -3     -2    -1    +1    +2    +3    +4    +5    +6       +7    +8
A   0.04  0.10    0.34   0.87 -1.05  -inf  -inf  0.71  1.06 -1.27 -0.46     0.06 -0.24
C   0.02  0.16    0.33  -0.63 -2.22  -inf  -inf -2.17 -1.19 -1.68 -0.38    -0.14  0.16
G  -0.05  0.01   -0.30  -0.64  1.17  1.39  -inf  0.56 -0.72  1.20 -0.29     0.27 -0.02
T  -0.01 -0.36   -0.77  -0.59 -1.18  -inf  1.39 -2.29 -1.13 -1.58  0.66    -0.28  0.06
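The transformation from the probability matrix to the log-likelihood matrix, and the scoring of a candidate site, can be sketched as follows. This is a minimal Python illustration assuming natural logarithms and a uniform background of 0.25; the function names `log_odds` and `score` are hypothetical. Recomputed values may differ from the printed matrix in the last digit, since the printed percentages are themselves rounded.

```python
import math

POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 8]

# Donor-site frequencies (percent), copied from the probability matrix above.
FREQ = {
    "A": [26.0, 27.7, 35.1, 59.6, 8.7, 0.0, 0.0, 50.7, 72.1, 7.0, 15.8, 26.6, 19.7],
    "C": [25.5, 29.4, 34.8, 13.3, 2.7, 0.0, 0.0, 2.8, 7.6, 4.7, 17.2, 21.7, 29.4],
    "G": [23.8, 25.3, 18.5, 13.2, 80.9, 100.0, 0.0, 43.9, 12.2, 83.1, 18.8, 32.7, 24.5],
    "T": [24.7, 17.5, 11.6, 13.9, 7.7, 0.0, 100.0, 2.5, 8.1, 5.2, 48.3, 18.9, 26.4],
}
BACKGROUND = 0.25  # equiprobable nucleotides, as assumed in the text

def log_odds(percent):
    """ln(p / background); a zero frequency maps to -inf."""
    p = percent / 100.0
    return math.log(p / BACKGROUND) if p > 0 else float("-inf")

LOGODDS = {base: [log_odds(x) for x in row] for base, row in FREQ.items()}

def score(site):
    """Sum the per-position log-odds of a 13-nucleotide candidate site."""
    if len(site) != len(POSITIONS):
        raise ValueError("expected a site of length %d" % len(POSITIONS))
    return sum(LOGODDS[base][i] for i, base in enumerate(site))

print(round(LOGODDS["A"][3], 2))   # A at position -2
print(score("ACCAGGTAAGTAT"))      # a site with GT at +1, +2
print(score("AAAAGAAAAGTAA"))      # no GT at +1, +2: the score is -inf
```

Summing log-odds is equivalent to multiplying the per-position probability ratios, which is why a single impossible nucleotide drives the whole score to minus infinity.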
The positions showing higher bias in nucleotide composition are the most informative positions.
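Indeed, we can compute the information content D(i) at each position i by using Shannon's formula
D(i) = 2 + SUM_b P(b,i) log2 P(b,i),   b in {A,C,G,T}
where P(b,i) is the probability of nucleotide b at position i, so for a position with nucleotide equiprobability, P = 1/4, the information content is zero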
D(i) = 0 = 2 + 1/4 log2(1/4)
+ 1/4 log2(1/4) + 1/4 log2(1/4) + 1/4 log2(1/4)
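The information content of a single position, D = 2 + sum over the four nucleotides of p log2 p, can be checked numerically with a short Python sketch. The function name `information_content` is hypothetical; terms with p = 0 are taken to contribute zero, the usual convention for p log2 p.

```python
import math

def information_content(probs):
    """Return 2 + sum_b p_b * log2(p_b) for four nucleotide probabilities.

    Zero probabilities are skipped, since p * log2(p) tends to 0 as p -> 0.
    """
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)

print(information_content([0.25, 0.25, 0.25, 0.25]))      # equiprobable: 0.0
print(information_content([0.0, 0.0, 1.0, 0.0]))          # invariant G at +1: 2.0
print(information_content([0.087, 0.027, 0.809, 0.077]))  # position -1 (A, C, G, T)
```

The two extremes bound the measure: 0 bits for an uninformative position and 2 bits for a fully conserved one.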
The information content along a sequence alignment can be nicely visualized by means of so-called sequence logos.
 
 
(Figure taken from http://www.orst.edu/instruction/bb331/lecture10/lecture10.html)
It is well known, however, that stacking energy contributes to the stability of double-stranded DNA. This stacking energy depends on nearest-neighbour arrangements along the DNA molecule, and tables of stacking energies are constantly being updated. This suggests that the positions along the donor site sequence are not independent. That is, the presence of a given nucleotide at a given position may influence the probability of the nucleotides at nearby positions.
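We can test this hypothesis by estimating, in the set of known donor sites above, the conditional probability of each nucleotide at each position given the nucleotide at the preceding position. In the table below, each row corresponds to the nucleotide at the preceding position and each column to the nucleotide at the current position (values in percent); the unlabeled bottom row repeats the position-independent frequencies.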
               position -3               position -2               position -1                position 1                position 2                position 3                position 4                position 5               position 6
       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T
A   29.2  31.9  25.5  13.4    62.4   9.5  15.2  12.9     7.0   1.7  86.2   5.1     0.0   0.0 100.0   0.0     0.0   0.0   0.0   0.0     0.0   0.0   0.0   0.0    65.4   9.5  13.3  11.8     6.0   3.0  87.4   3.7    19.1  15.9  39.8  25.3
C   48.6  32.5   6.2  12.7    69.2  11.6   6.4  12.8    19.1   7.1  55.2  18.5     0.0   0.0 100.0   0.0     0.0   0.0   0.0   0.0     0.0   0.0   0.0   0.0    72.7   4.7   6.7  16.0    19.5  17.8  42.8  20.0    24.8  25.2  10.6  39.4
G   38.8  36.2  17.7   7.3    62.6  15.8  12.3   9.3    12.3   2.4  79.1   6.2     0.0   0.0 100.0   0.0     0.0   0.0   0.0 100.0     0.0   0.0   0.0   0.0    82.5   5.6   9.0   2.9     6.2   4.2  86.1   3.4    15.2  17.2  15.9  51.7
T   16.4  41.3  29.5  12.9    17.7  25.6  29.5  27.2     2.9   3.3  84.4   9.4     0.0   0.0 100.0   0.0     0.0   0.0   0.0   0.0    50.8   2.8  43.8   2.5    26.9   7.5  50.7  14.9     6.1   7.9  78.7   7.2    12.5  10.7  43.4  33.5
 
    35.1  34.8  18.5  11.6    59.6  13.3  13.2  13.9     8.7   2.7  80.9   7.7     0.0   0.0 100.0   0.0     0.0   0.0   0.0 100.0    50.7   2.8  43.9   2.5    72.1   7.6  12.2   8.1     7.0   4.7  83.1   5.2    15.8  17.2  18.8  48.3
We can use this conditional probability distribution to compute the probability of a given sequence in a donor site. The probability of sequence S=s1s2s3s4s5s6s7s8s9 in a donor site can be computed now as
P(S)=P(s1) P(s2/s1) P(s3/s2) P(s4/s3) P(s5/s4) P(s6/s5) P(s7/s6) P(s8/s7) P(s9/s8)
where P(si/sj) is the probability of nucleotide si at position k given that nucleotide sj is at position k-1.
For instance, the probability of finding sequence  S=CAGGTTGGA is 
P(S)= 0.35 * 0.69 * 0.86 * 1.00 * 1.00 * 0.02 * 0.51 * 0.86 * 0.15
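The chain-rule product can be sketched in Python, reading each factor from the conditional table above (rows: preceding nucleotide; columns: current nucleotide). Only the table entries needed for S=CAGGTTGGA are copied, so this is an excerpt rather than the full model; the names `INITIAL`, `TRANSITIONS` and `chain_probability` are hypothetical.

```python
# Position-independent frequencies at position -3 (bottom row of the table above).
INITIAL = {"A": 0.351, "C": 0.348, "G": 0.185, "T": 0.116}

# Excerpt of the conditional table: TRANSITIONS[k][(previous, current)] is
# P(current nucleotide | previous nucleotide) for positions -2 .. 6.
# Only the entries needed for the example sequence are copied here.
TRANSITIONS = [
    {("C", "A"): 0.692},  # position -2
    {("A", "G"): 0.862},  # position -1
    {("G", "G"): 1.000},  # position 1
    {("G", "T"): 1.000},  # position 2
    {("T", "T"): 0.025},  # position 3
    {("T", "G"): 0.507},  # position 4
    {("G", "G"): 0.861},  # position 5
    {("G", "A"): 0.152},  # position 6
]

def chain_probability(seq):
    """P(S) = P(s1) * product over k of P(s_k | s_{k-1}) for a 9-nucleotide site."""
    if len(seq) != len(TRANSITIONS) + 1:
        raise ValueError("expected a 9-nucleotide sequence")
    p = INITIAL[seq[0]]
    for table, prev, cur in zip(TRANSITIONS, seq, seq[1:]):
        p *= table[(prev, cur)]  # raises KeyError for pairs outside the excerpt
    return p

print(chain_probability("CAGGTTGGA"))
```

A full implementation would store all sixteen (previous, current) pairs at every position; the sparse dictionaries here only keep the example short.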
In practice, we usually compute a log-likelihood ratio, as above. Assuming, for instance, a background model with p(si/sj)=0.25 (that is, equiprobable nucleotides and no dependence between positions), we obtain the following log-likelihood matrix
               position -3               position -2               position -1                position 1                position 2                position 3                position 4                position 5               position 6
       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T       A     C     G     T
A   0.15  0.24  0.02 -0.62    0.91 -0.97 -0.50 -0.66   -1.28 -2.72  1.24 -1.58    -inf  -inf  1.39  -inf    -inf  -inf  -inf  -inf    -inf  -inf  -inf  -inf    0.96 -0.97 -0.63 -0.75   -1.43 -2.12  1.25 -1.92   -0.27 -0.46  0.46  0.01
C   0.66  0.26 -1.40 -0.67    1.02 -0.76 -1.37 -0.67   -0.27 -1.25  0.79 -0.30    -inf  -inf  1.39  -inf    -inf  -inf  -inf  -inf    -inf  -inf  -inf  -inf    1.07 -1.68 -1.32 -0.45   -0.25 -0.34  0.54 -0.22   -0.01  0.01 -0.86  0.46
G   0.44  0.37 -0.35 -1.24    0.92 -0.46 -0.71 -0.99   -0.71 -2.33  1.15 -1.40    -inf  -inf  1.39  -inf    -inf  -inf  -inf  1.39    -inf  -inf  -inf  -inf    1.19 -1.50 -1.02 -2.16   -1.39 -1.78  1.24 -1.99   -0.50 -0.37 -0.45  0.73
T  -0.42  0.50  0.16 -0.66   -0.35  0.02  0.17  0.08   -2.16 -2.03  1.22 -0.97    -inf  -inf  1.39  -inf    -inf  -inf  -inf  -inf    0.71 -2.17  0.56 -2.29    0.07 -1.21  0.71 -0.52   -1.41 -1.15  1.15 -1.24   -0.69 -0.85  0.55  0.29
 
    0.34  0.33 -0.30 -0.77    0.87 -0.63 -0.64 -0.59   -1.05 -2.22  1.17 -1.18    -inf  -inf  1.39  -inf    -inf  -inf  -inf  1.39    0.71 -2.17  0.56 -2.29    1.06 -1.19 -0.72 -1.13   -1.27 -1.68  1.20 -1.58   -0.46 -0.38 -0.29  0.66
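With the log-likelihood matrix in hand, scoring a candidate site reduces to a sum of table entries. The Python sketch below does this for S=CAGGTTGGA, taking the first entry from the unlabeled bottom row and the others from the row of the preceding nucleotide and the column of the current one. It assumes the entries are natural logarithms, which is consistent with the maximum value 1.39 = ln(4); the name `ENTRIES` is hypothetical.

```python
import math

# Log-likelihood entries for S = CAGGTTGGA, read from the matrix above.
ENTRIES = [
    ("C at -3",           0.33),
    ("A at -2, after C",  1.02),
    ("G at -1, after A",  1.24),
    ("G at 1, after G",   1.39),
    ("T at 2, after G",   1.39),
    ("T at 3, after T",  -2.29),
    ("G at 4, after T",   0.71),
    ("G at 5, after G",   1.24),
    ("A at 6, after G",  -0.50),
]

# The log-likelihood ratio of the whole site is the sum of the entries.
score = sum(value for _, value in ENTRIES)
print(round(score, 2))

# Exponentiating gives the likelihood ratio of the donor-site model
# to the uniform background for this sequence.
print(math.exp(score))
```

A positive total means the sequence is more probable under the donor-site model than under the background, despite the unfavourable T at position 3.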