1. Symmetry of Curvature of a DNA sequence
Given a sequence for which a curvature value has been computed at each trinucleotide step, a symmetrical pattern around a given nucleotide position n implies similar curvature values at equal distances from this position in either direction. That is, the curvature at position n-1, $\mathrm{Curv}_{n-1}$, should be similar to the curvature at position n+1, $\mathrm{Curv}_{n+1}$. The same should hold for every pair of positions at distance i from n, for i = 1, ..., m, where m is a user-chosen parameter. At each such distance we compute the absolute difference between the corresponding curvature values, $d_i = |\mathrm{Curv}_{n-i} - \mathrm{Curv}_{n+i}|$. The lower this value, the higher the symmetry at distance i from n.
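We define the symmetry of the curvature of the sequence centered at position n, over a window extending m positions on either side, as the inverse of the sum of these differences for i = 1, ..., m:

$$S_{sym}(n,m) = \frac{1}{\sum_{i=1}^{m} d_i} = \frac{1}{\sum_{i=1}^{m} \left|\mathrm{Curv}_{n-i} - \mathrm{Curv}_{n+i}\right|} \qquad (1)$$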
Taking the inverse makes the score increase with symmetry. The more symmetric the values on either side of position n, the closer the sum of differences approaches zero and the larger $S_{sym}$ becomes.
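The symmetry score of equation (1) translates almost directly into code. Below is a minimal sketch that assumes 0-based positions and a plain list of per-nucleotide curvature values. The function name `symmetry_score` is hypothetical. The text does not say how a perfectly symmetric window, where the sum is zero, is handled, so returning infinity is our own assumption.

```python
from typing import Sequence


def symmetry_score(curv: Sequence[float], n: int, m: int) -> float:
    """Ssym(n, m): inverse of the summed differences |Curv[n-i] - Curv[n+i]|, i = 1..m.

    Positions are 0-based; n must have at least m curvature values on each side.
    """
    if n - m < 0 or n + m >= len(curv):
        raise ValueError("position n needs m curvature values on each side")
    # d_i = |Curv_{n-i} - Curv_{n+i}| for i = 1, ..., m
    total = sum(abs(curv[n - i] - curv[n + i]) for i in range(1, m + 1))
    # Assumption: a perfectly symmetric window (zero sum) is scored as infinite symmetry.
    return float("inf") if total == 0 else 1.0 / total
```

For example, with `curv = [1.0, 2.0, 0.5, 2.5, 1.5]`, `n = 2` and `m = 2`, the differences are d1 = 0.5 and d2 = 0.5. The score is therefore 1 / 1.0 = 1.0.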
2. Symmetry of Curvature around local curvature minima
Based on the above definition of symmetry, the Symmetry of Curvature is calculated as follows.
Given a genomic sequence, the method first calculates the curvature values and then applies the symmetry constraints to the resulting curvature data. In more detail:
First, the curvature values of the given sequence are calculated. In the current version of our method, DNA curvature is calculated with BENDS (Goodsell and Dickerson, 1994), as extended to trinucleotide parameters by Munteanu et al. (1998). The output of this step is an array that assigns a curvature value to every nucleotide. Each value is calculated over a 30 bp window centered on that nucleotide, and the window slides 1 bp at a time. The predicted curvature may be calculated with either of two geometrical parameter sets, which provide roll, twist and tilt values obtained under two different structural conditions. The "nucleosome" parameter set was compiled from nucleosome crystal structures solved by X-ray diffraction. The "DNaseI hypersensitive" parameter set was compiled from regions defined as DNaseI hypersensitive sites.
Second, the curvature array is scanned for local curvature minima using a simple criterion: the curvature value at a nucleotide must be smaller than both the preceding and the following value. Such a narrow search window is sufficient because the curvature method averages over 30 nucleotides. Each value therefore already represents a region whose length is on the order of a putative dyad axis. For positions that fulfill this criterion, a local minimum score is calculated according to the formula:
where $\mathrm{Curv}_n$ is the curvature value at position n of the genomic sequence. As with $S_{sym}$, the $S_{min}$ formula uses an inverse. Here the inverse gives mild local minima higher scores, because the decrease in curvature over the dyad axis region is expected to be smooth and minor rather than sharp.
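The local-minimum criterion can be sketched as a single scan over the curvature array. This sketch covers only the detection step, because the exact $S_{min}$ formula is not reproduced in this text. It assumes 0-based positions, and the function name `local_minima` is hypothetical. Plateaus, where two neighbouring values are equal, are not reported because the comparison is strict.

```python
from typing import List, Sequence


def local_minima(curv: Sequence[float]) -> List[int]:
    """Return 0-based positions n with Curv[n] < Curv[n-1] and Curv[n] < Curv[n+1]."""
    return [
        n
        for n in range(1, len(curv) - 1)
        # strict inequality on both sides, as in the criterion above
        if curv[n] < curv[n - 1] and curv[n] < curv[n + 1]
    ]
```

For example, `local_minima([3.0, 1.0, 2.0, 2.0, 0.5, 4.0])` returns `[1, 4]`.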
Third, the symmetry score $S_{sym}$ is calculated at every local minimum site using equation (1). The length parameter m was set to 25, based on the combined size of the pseudodyad axis and its immediate flanking regions. The calculation is thus conducted over a window of 50 nucleotides (25 on each side of n), which corresponds to approximately 5 DNA double-helical pitches. The overall Symmetry of Curvature score, SymCurv, is the product of the two scores:
$$\mathrm{SymCurv}(n,m) = S_{min}(n)\, S_{sym}(n,m), \qquad m = 25$$
Note that the symmetry score could be computed at every nucleotide. However, the overall score is 0 wherever the local minimum criterion is not fulfilled (since $S_{min} = 0$ there), so it only needs to be computed where $S_{min}$ is nonzero. Since m is fixed at 25, it is omitted from the notation henceforth, and SymCurv(n) denotes the score at position n.
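Combining the two scores can be sketched as follows. Because the $S_{min}$ formula is not reproduced here, this sketch takes a hypothetical `smin` mapping as input, from local-minimum positions to precomputed $S_{min}$ values. Every position absent from the mapping scores 0. Two behaviours are our own assumptions, not stated in the text. Positions too close to either end to have m flanking values are skipped, and a zero sum of differences yields infinite symmetry. The function name `symcurv_scores` is hypothetical.

```python
from typing import Dict, List, Sequence

M = 25  # half-window m used by SymCurv


def symcurv_scores(curv: Sequence[float], smin: Dict[int, float], m: int = M) -> List[float]:
    """SymCurv(n) = Smin(n) * Ssym(n, m); 0 wherever Smin is zero or absent.

    `smin` (hypothetical input) maps 0-based local-minimum positions to Smin scores.
    """
    scores = [0.0] * len(curv)
    for n, s in smin.items():
        # Skip Smin = 0 and (assumption) positions lacking m values on either side.
        if s == 0 or n - m < 0 or n + m >= len(curv):
            continue
        total = sum(abs(curv[n - i] - curv[n + i]) for i in range(1, m + 1))
        ssym = float("inf") if total == 0 else 1.0 / total  # equation (1)
        scores[n] = s * ssym
    return scores
```

With the toy array `[1.0, 2.0, 0.5, 2.5, 1.5]`, `smin = {1: 5.0, 2: 3.0}` and `m = 2`, the call returns `[0.0, 0.0, 3.0, 0.0, 0.0]`. Position 1 lacks two flanking values on the left and is skipped.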
3. Nucleosome Positioning using SymCurv scores
In order to predict the distribution of nucleosome-occupied and nucleosome-free regions in a given sequence, we apply a greedy algorithm. It partitions the sequence into non-overlapping 146-nucleotide segments in strict order of the SymCurv score of each nucleotide. The position with the highest score is defined as the center of the first nucleosome. The highest-scoring position whose nucleosome does not overlap the first is placed next, and so on. In the process we assume a minimal end-to-start linker of 20 nucleotides between consecutive nucleosomes. We perform this procedure on the SymCurv values obtained with both the "nucleosome" and the "DNaseI" parameter sets, ending up with two alternative nucleosome profiles for each examined sequence. DNaseI hypersensitive sites are known to be regions of open chromatin and increased transcriptional activity. We therefore consider the positions obtained with the "DNaseI" parameter set to reflect a more active, dynamic chromatin state. Accordingly, we refer to the "nucleosome" predictions as "stationary" and to the "DNaseI" predictions as "dynamic".
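The greedy placement can be sketched as follows. Several details are our own assumptions, not specified in the text:

- a nucleosome centered at c spans 0-based positions [c - 73, c + 73);
- it must fit entirely within the sequence;
- only positions with a positive SymCurv score can serve as centers;
- ties are broken by position.

Under these assumptions, two nucleosomes are compatible when their centers are at least 146 + 20 = 166 nt apart. The function name `place_nucleosomes` is hypothetical.

```python
from typing import List, Sequence

NUC_LEN = 146     # nucleosome length in nucleotides
MIN_LINKER = 20   # minimal end-to-start linker between nucleosomes


def place_nucleosomes(scores: Sequence[float]) -> List[int]:
    """Greedy, strictly hierarchical placement; returns sorted 0-based centers."""
    half = NUC_LEN // 2
    # Candidate centers, highest SymCurv score first (stable sort keeps ties in position order).
    order = sorted(
        (c for c in range(len(scores)) if scores[c] > 0),
        key=lambda c: scores[c],
        reverse=True,
    )
    centers: List[int] = []
    for c in order:
        start = c - half
        if start < 0 or start + NUC_LEN > len(scores):
            continue  # assumption: nucleosome must lie entirely within the sequence
        # Equal-length nucleosomes: a linker >= MIN_LINKER means centers >= 166 nt apart.
        if all(abs(c - p) >= NUC_LEN + MIN_LINKER for p in centers):
            centers.append(c)
    return sorted(centers)
```

Running this once on the SymCurv scores from each parameter set would give the two alternative profiles described above, the "stationary" and the "dynamic" one.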
Goodsell, D.S., Dickerson, R.E. 1994. Bending and curvature calculations in B-DNA. Nucleic Acids Res, 22: 5497-5503.
Munteanu, M.G., Vlahovicek, K., Parthasarathy, S., Simon, I., Pongor, S. 1998. Rod models of DNA: sequence-dependent anisotropic elastic modeling of local bending phenomena. Trends Biochem Sci, 23: 341-347.
AUTHORS AND ACKNOWLEDGEMENTS
Copyright © 2008
SymCurv developed by
Christoforos Nikolaou and
The SymCurv Web Server created by
Sonja Daniela Althammer.