Genome BioInformatics Research Lab

 
2.4. CONSTRUCTION OF BENCHMARKS
 
 

 
Description:
This web tool creates artificial data sets for testing pattern discovery programs. Artificial sequences are generated according to a given nucleotide distribution, with a set of motifs from the ABS database embedded in them. The motifs are planted in a subset of similar positions to simulate the existence of related motifs in a set of orthologous genes or of co-regulated genes from a microarray experiment. A graphical representation of the resulting data set is also displayed.

Parameters:
- Length: size of each artificial sequence (bp)
- Number of sequences: number of sequences in the data set
- Number of planted motifs: number of motifs planted in each sequence
- Species: the data set is built for detecting motifs from this species
- Nucleotide distribution: A, C, G, T content of the random regions between planted motifs
- Probability to plant a motif: probability of inserting a motif into a sequence
- TF name: the planted motifs are associated with these TFs
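
The way these parameters interact can be illustrated with a minimal Python sketch. It is not the code behind this web tool: the function names, the `jitter` parameter (how far a planted site may drift from a shared anchor position), the rule that overlapping sites are skipped, and the two example motif strings are all assumptions made for illustration. Real motifs would come from the ABS database.

```python
import random

def background(length, freqs, rng):
    """Draw a random sequence from the given A/C/G/T frequencies."""
    bases = "ACGT"
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

def make_benchmark(n_seqs, length, motifs, freqs, p_plant, jitter=10, seed=0):
    """Return (sequences, annotations).

    annotations[i] is a sorted list of (start, motif) pairs planted in
    sequence i. All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    # One anchor per motif, shared by all sequences, so that planted
    # sites fall at similar (not identical) positions.
    anchors = [rng.randrange(0, length - len(m) + 1) for m in motifs]
    sequences, annotations = [], []
    for _ in range(n_seqs):
        seq = list(background(length, freqs, rng))
        planted = []
        for anchor, m in zip(anchors, motifs):
            # Each motif is planted with probability p_plant.
            if rng.random() >= p_plant:
                continue
            start = anchor + rng.randint(-jitter, jitter)
            start = max(0, min(length - len(m), start))
            # Assumption: skip a site that would overlap an earlier one.
            if any(start < s + len(pm) and s < start + len(m)
                   for s, pm in planted):
                continue
            seq[start:start + len(m)] = m
            planted.append((start, m))
        sequences.append("".join(seq))
        annotations.append(sorted(planted))
    return sequences, annotations

if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # Placeholder motif strings, not taken from ABS.
    seqs, ann = make_benchmark(5, 100, ["TATAAA", "GGGCGG"], uniform, 0.8)
    for s, a in zip(seqs, ann):
        print(s, a)
```

The annotations returned alongside the sequences record where each motif was planted, so the predictions of a pattern discovery program can be compared against known positions.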

Example:
This is an artificial data set produced by this application.
Input form fields:
- Number of Sequences
- Number of Planted Motifs
- Length (nucleotides)
- Probability to plant a motif
- Species
- Background composition (A, C, G, T)
- TF name (multiple choice)




HINTS:
1. More than one TF can be selected (multiple choice is allowed).
2. Use Shift to select groups of consecutive TFs.
3. Use Ctrl to select groups of non-consecutive TFs.


Copyright © 2005

ABS is licensed under the GNU General Public License.

 