Genome BioInformatics Research Lab

 
SymCurv Tutorial
 
 

Prediction of Nucleosome Sequences

 
Use SymCurv to obtain predictions of nucleosome positions for a given sequence.

Input

 
SymCurv takes as input a DNA sequence in FASTA format with a special header. The header must contain the sequence name, the initial offset and the strand ("+" or "-"), separated by colons. The initial offset is the coordinate of the first nucleotide in the sequence; nucleosome positions are reported with reference to it. The strand defines whether the program reads the sequence as is (strand "+") or uses its reverse complement (strand "-").

Header example
>seq_name:1:+
The sequence follows on a separate line. Both upper-case and lower-case characters are acceptable, as well as combinations of the two.

Sequence example
>seq_name:1:+
AAAAATgcccgTATATAGCCCGCCGAGCCCGCCGAGCCCGCCGAGCCCGCCGCccaaaTTtttcgTATATAGcccgTATATAGGCCCGCCGAGCGCCCGCCGAGC

N characters in the sequence are converted to A, since A is expected to contribute neutrally to the curvature values. However, an excess of unknown bases should be avoided, as it is bound to affect the final outcome.
The sequence name is chosen by the user and may contain special characters and spaces. It can be, for example, a gene name or a chromosomal coordinate.
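
The input rules above can be sketched in a few lines of Python. This is only an illustration of the header layout and sequence handling described here, not the server's own code. The function name parse_symcurv_fasta is hypothetical, and two details are assumptions: the header is split from the right (so that a name containing ":" stays intact), and unknown bases are converted to A after the reverse complement is taken.

```python
def parse_symcurv_fasta(text):
    """Parse one FASTA record whose header looks like '>name:offset:strand'."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header starting with '>'")
    # Assumption: split from the right so a name containing ':' is preserved.
    name, offset, strand = lines[0][1:].rsplit(":", 2)
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Upper and lower case are both accepted, so normalise to upper case.
    seq = "".join(lines[1:]).upper()
    if strand == "-":
        # Reverse complement; N stays N at this stage.
        seq = seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]
    # Unknown bases are treated as A (order relative to the
    # reverse complement is an assumption of this sketch).
    seq = seq.replace("N", "A")
    return name, int(offset), strand, seq


name, offset, strand, seq = parse_symcurv_fasta(">seq_name:1:+\nAAAAATgcccgN\n")
print(name, offset, strand, seq)
```

A quick check of this kind before submission can catch a malformed header or count how many N characters a sequence contains.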


Example

 
>sacCer1_ensGene_YMR244W:1:+
ATGGTACTTTGCAAATTACTGACACCATATTTCTTACTGTCAATTTTGAG
TGTCGGCGTGTTCACGGCGACCGCCGCGCCATCGCCCAGTATTCAAATGA
CGGAAAATACAAATCAAGATCATCATGAGCATGCCAAGCGTGGAGGAACG
TGTGCGTTCCCTAACTACGATGGGATGGTCGCAGTACAAAAAGGTGGATC
TAATGGAGGATGGGCTATGAGCCCTGACCAAGAATGTTCCTACGGTTCAT
GGTGCCCTTACGCTTGCAAACCAGGTCAACTAATGGGGCAATGGGACCCT
TCGGCTACCACATACTCTTATCCTAAATGTCAAAATGGAGGTTTGTACTG
TGATTCTAACGGTAACTTGCAAAAGCCAAACAGTGATAAAGACTATTGTT
ATGATGGGAAGGGAACCGTAATAGCGAAAAACAACGCTAACAGCGGTGAC
GTTGCATTTTGCCAGACCGTGCTTCCGGGCAACGAAGCTATGCTGATCCC
AACCTTAGTCGGCTCTGGGTCAAAGCAAACGCTGGCTGTGCCTGGTACAG
ACTACTGGGCCTCCAGCGCGTCGCATTACTACGTAAATGCTCCCGGTGTA
AGCGTAGAGGATGCATGCCAGTGGGGTAGTAGTGCAAATCCACAGGGGAA
CTGGGCCCCATTTGTAGCTGGTTCCAACATGGACGACAACCAGAACACTT
TTGTAAAGATTGGATGGAACCCCGTCTACCTGGAATCGTCATGTCCGTTC
AAGAACGTTAAGCCTTCATTCGGTATTAGAATTACTTGTGATGACGAATC
ACAATGTGAAGGCCTACCATGCTCCATTGACCCAAGTTCTAATGGAGTCA
ACGAAGTGACAAGTTCTGGCGGTGGTTCTTCCGGGGCTGGTGGTGGAAAC
TTTTGTGTTGTCACCGCCAGAAACGGCGCCAAGGCCAACATCGAAGTTTT
TGATGTTGGTAGCGGCTCATCTTCTAAAGGCAAGAGAGAACTGAATCCGC
TAGACGTTATTACCACAACGGTCACCGAGACCAAGTACAAGACAGTCACC
GTCACTGCCAAAACTTAG

Once the sequence is pasted or uploaded to the server, press submit.


Output

 
The output comes in two forms: text files, and graphical output consisting of SymCurv and nucleosome position plots. For speed, graphical output is disabled for sequences longer than 10 kb.

Text output

The symmetry of curvature (SymCurv) is a property of the DNA sequence. Unpublished results suggest a strong tendency for nucleosome-forming sequences to have high SymCurv values. SymCurv can also capture structure-related sequence constraints in genomic regions where a predicted functional role is not supported by sequence conservation. Raw SymCurv values may thus serve as an informative structural descriptor of a DNA sequence.

The raw SymCurv values are provided as a tab-delimited file. The first field contains the name of the sequence and the second field the coordinate, calculated relative to the initial offset given in the header of the sequence (here 1). Fields 3 and 4 contain the SymCurv values calculated with two alternative sets of geometrical parameters: field 3 holds the values obtained with nucleosome-specific parameters (nuc), and field 4 those obtained with DNaseI-specific parameters. (For more information, please check the SymCurv Documentation.) Raw SymCurv values fall in the range 0 to 100.

sacCer1_ensGene_YMR244W	58	0	0
sacCer1_ensGene_YMR244W	59	0	0
sacCer1_ensGene_YMR244W	60	0.0465047685983504	0
sacCer1_ensGene_YMR244W	61	0	0
sacCer1_ensGene_YMR244W	62	0	0
sacCer1_ensGene_YMR244W	63	0	0.0295858067463842
sacCer1_ensGene_YMR244W	64	0	0
sacCer1_ensGene_YMR244W	65	0	0
sacCer1_ensGene_YMR244W	66	0	0
sacCer1_ensGene_YMR244W	67	0	0
sacCer1_ensGene_YMR244W	68	0	0.402540082453594
sacCer1_ensGene_YMR244W	69	0	0
sacCer1_ensGene_YMR244W	70	0	0
sacCer1_ensGene_YMR244W	71	0	0
sacCer1_ensGene_YMR244W	72	0.262897808372804	0
sacCer1_ensGene_YMR244W	73	0	0
sacCer1_ensGene_YMR244W	74	0	0
sacCer1_ensGene_YMR244W	75	0	0

The SymCurv raw values are used in deducing the predicted nucleosome positions on the sequence of interest.
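
As an illustration of the four-field layout described above, the raw values file could be read as follows. This is a minimal sketch: the function name read_symcurv_values is hypothetical, and the three example rows are copied from the excerpt shown earlier.

```python
import io


def read_symcurv_values(handle):
    """Yield (name, position, nuc_value, dnase_value) for each data line."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        name, pos, nuc, dnase = fields[:4]
        yield name, int(pos), float(nuc), float(dnase)


example = (
    "sacCer1_ensGene_YMR244W\t60\t0.0465047685983504\t0\n"
    "sacCer1_ensGene_YMR244W\t68\t0\t0.402540082453594\n"
    "sacCer1_ensGene_YMR244W\t72\t0.262897808372804\t0\n"
)
records = list(read_symcurv_values(io.StringIO(example)))

# Position with the highest value under the nuc parameters (field 3).
best_nuc = max(records, key=lambda r: r[2])
# Position with the highest value under the DNaseI parameters (field 4).
best_dnase = max(records, key=lambda r: r[3])
print(best_nuc[1], best_dnase[1])
```

For a downloaded file, the same function can be given an open file handle instead of the in-memory example.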

The user can obtain two files containing the predicted nucleosome positions, one for each of the two alternative geometrical parameter sets (see above). The files are in GFF (General Feature Format).

Nuc parameters
sacCer1_ensGene_YMR244W	SC_call_nuc	SC_nucleosome	88	234	3.78776165891352	.	.	.
sacCer1_ensGene_YMR244W	SC_call_nuc	SC_nucleosome	553	699	1.91633774580324	.	.	.
sacCer1_ensGene_YMR244W	SC_call_nuc	SC_nucleosome	743	889	1.9580759678685	.	.	.


DNase parameters
sacCer1_ensGene_YMR244W	SC_call_dnase	SC_dnase	66	212	1.65028971344348	.	.	.
sacCer1_ensGene_YMR244W	SC_call_dnase	SC_dnase	366	512	1.93077567379365	.	.	.
sacCer1_ensGene_YMR244W	SC_call_dnase	SC_dnase	749	895	10.6447513336063	.	.	.

 
Each line in the GFF (General Feature Format) file corresponds to one predicted nucleosome. The first column contains the sequence name. The start and end of the nucleosome are given in columns 4 and 5, with reference to the offset coordinate provided in the header of the submitted FASTA sequence. Column 6 contains a score in the range 0-100, which is the SymCurv score at the center of the predicted nucleosome.
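
The column layout just described can be read with a short Python sketch. The Nucleosome record type and the parse_gff_line function are hypothetical names used only for illustration, and the parser assumes that all fields are separated by tabs. The example line is the first prediction from the nuc parameters listing.

```python
from typing import NamedTuple


class Nucleosome(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: float


def parse_gff_line(line):
    """Read columns 1-6 of one tab-separated GFF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    return Nucleosome(fields[0], fields[1], fields[2],
                      int(fields[3]), int(fields[4]), float(fields[5]))


line = ("sacCer1_ensGene_YMR244W\tSC_call_nuc\tSC_nucleosome"
        "\t88\t234\t3.78776165891352\t.\t.\t.")
nuc = parse_gff_line(line)

# Start and end are inclusive, so this prediction spans 147 positions.
length = nuc.end - nuc.start + 1
# Midpoint of the prediction, where the reported score applies.
center = (nuc.start + nuc.end) // 2
print(nuc.seqname, nuc.start, nuc.end, length, center)
```

For this line the length is 147 and the midpoint is 161.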

Differences between the two profiles represent nucleosomes that are predicted to be prone to remodelling. DNaseI hypersensitive sites are known to be regions of open chromatin and increased transcriptional activity, so we take the positions obtained with the DNaseI parameters to reflect a more active, dynamic state of chromatin conformation. We therefore refer to the "nucleosome" predictions as "stationary" and to the "DNaseI" predictions as "dynamic". Nucleosomes present in the "nucleosome" set but absent from the "DNaseI" set are likely to be evicted during activation of the chromatin; conversely, nucleosomes present only in the "DNaseI" set are predicted to appear during chromatin activation (see the SymCurv Documentation for more information).
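
One possible way to compare the two prediction sets is sketched below. The text above does not say how "present" and "absent" are decided, so the overlap criterion is an assumption of this example: two predictions are treated as the same nucleosome if they share at least min_overlap positions (here 74, about half of a 147-position prediction). The function names are hypothetical; the coordinates are those of the example output.

```python
def overlap(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def split_by_state(stationary, dynamic, min_overlap=74):
    """Split predictions into shared, stationary-only and dynamic-only.

    min_overlap is a hypothetical threshold, not a SymCurv parameter.
    """
    def matched(x, others):
        return any(overlap(x, o) >= min_overlap for o in others)

    shared = [s for s in stationary if matched(s, dynamic)]
    only_stationary = [s for s in stationary if not matched(s, dynamic)]
    only_dynamic = [d for d in dynamic if not matched(d, stationary)]
    return shared, only_stationary, only_dynamic


# Start/end pairs from the nuc and DNase example predictions.
stationary = [(88, 234), (553, 699), (743, 889)]
dynamic = [(66, 212), (366, 512), (749, 895)]

shared, evicted, appearing = split_by_state(stationary, dynamic)
print("shared:", shared)
print("stationary only (candidate eviction):", evicted)
print("dynamic only (candidate appearance):", appearing)
```

Under this threshold, the prediction at 553-699 appears only in the stationary set and the one at 366-512 only in the dynamic set; a different threshold may change the classification.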

Graphical output

Raw SymCurv profiles are provided as bar plots in a single clickable JPG file, available for download.

Both sets of nucleosome predictions are also provided in a PDF plot, which the user can download by clicking on the corresponding link. The plot is created with gff2ps.



AUTHORS AND ACKNOWLEDGEMENTS

 
SymCurv was developed by Christoforos Nikolaou and Roderic Guigó.
The SymCurv Web Server was created by Sonja Daniela Althammer.

Copyright © 2008

 