####AUTO TRAINING

    geneidTRAINer1_1.pl -species A.dorsata -gff A.dorsata.cDNAs.450nt.complete.NR100span.4training_test_50seqs_7scaffs.gff -fastas Ador_1.3_assembly_known_7scaffs_4testing.fa -sout stats.txt -branch no -reduced no

#####

Currently the perl script contains a hard-coded path to the directory where the binaries, awk scripts and C programs must be located:

    my $path = "$HOME/Research/geneid_training/automation/geneid-training/bin";

Remember to set this one correctly.

At the moment all the external perl modules required by the program are in the Geneid directory. You may change that as you see fit. Note also that the main perl wrapper "geneidTRAINer1_1.pl" is not executed from the "bin" directory.

The required "external" awk scripts are in the bin directory and consist of:

    frequency.awk
    Getkmatrix.awk
    gff2gp.awk
    information.awk
    logratio_kmatrix.awk
    logratio_zero_order.awk
    multiple_annot2one.awk
    preparedimatrixacceptor4parameter.awk
    preparedimatrixdonor4parameter.awk
    preparetrimatrixstart4parameter.awk
    submatrix.awk
    gff2ps
    gff2cds

The required "external" C programs are in the bin directory and consist of:

    Evaluation.tgz    ###TO COMPILE EVALUATION YOU NEED TO MAKE CLEAN FIRST! make clean && make
    geneid_v1.4.4.Jan_13_2011.tar.gz
    pictogram.tar.gz
    SSgff.tgz         (extracts CDS/intron sequences + splice sites)
    unsort-0.5.tgz    (random sorting)

These have to be compiled, and the binaries put in the "$path" directory above.

The "genetic.code" text file is also in the bin directory. Eventually it could be moved into the main code. There is currently a second genetic code file, genetic.code.thermophila, for species in which TGA is the only stop codon and TAA and TAG code for glutamine.

I have also included a couple of sample gff files and fasta sequence files if you want to do some testing (2491 sequences, 365 scaffolds and 50 sequences, 7 scaffolds). You may reduce the number of gene model sequences in the gff file to make testing faster.

    A.dorsata.cDNAs.450nt.complete.NR100span.4training_test.gff
    Ador_1.3_assembly_known_only_nolow.masked.4training.fa

    A.dorsata.cDNAs.450nt.complete.NR100span.4training_test_50seqs_7scaffs.gff
    Ador_1.3_assembly_known_7scaffs_4testing.fa

*****NOTE*****
IN ORDER TO TEST, SET $PATH TO THE DIRECTORY WHERE THE awk SCRIPTS AND C PROGRAMS ARE!
**************

EXAMPLE:

    my $path = "$HOME/geneidTrainer4Darek/bin";
    $ENV{'PATH'} = $path.":".$ENV{'PATH'};

You may also take a look at the parameter file resulting from the training using the sequences above:

    A.dorsata.geneid.optimized.param

And at an example of an output file with statistics on the training produced by geneidTRAINer1_1.pl:

    18_12_11_3_10_114_1_306_0_stats.txt
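
Before a test run it may help to confirm that the bin directory actually contains the awk scripts and the genetic.code file named above. The following is only a minimal sketch of such a check: the function name check_geneid_bin and the variable REQUIRED_FILES are hypothetical and not part of geneidTRAINer, and the compiled C binaries are left out because their file names are not given here.

```shell
#!/bin/sh
# Sketch only: check_geneid_bin is a hypothetical helper, not shipped with geneidTRAINer.
# It looks for the awk scripts and the genetic.code file named in this README.
# The compiled C binaries are NOT checked, since their names are not listed here.

REQUIRED_FILES="frequency.awk Getkmatrix.awk gff2gp.awk information.awk
logratio_kmatrix.awk logratio_zero_order.awk multiple_annot2one.awk
preparedimatrixacceptor4parameter.awk preparedimatrixdonor4parameter.awk
preparetrimatrixstart4parameter.awk submatrix.awk gff2ps gff2cds genetic.code"

# Print one "missing: NAME" line per absent file.
# Returns 0 if every file is present in the given directory, 1 otherwise.
check_geneid_bin() {
    bin_dir=$1
    missing=0
    for f in $REQUIRED_FILES; do
        if [ ! -e "$bin_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}
```

After loading the function into the shell, it can be called with the same directory that "$path" is set to in the perl script, for example: check_geneid_bin "$HOME/geneidTrainer4Darek/bin"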