geneid documentation: 4. Running geneid


FASTA format:
geneid input sequences must be in FASTA format. A FASTA sequence file has this structure:

>NAME OR IDENTIFIER OF SEQUENCE
ATCGTACAGCTAGCTAGCTACGTACG
TCGTACGATCGATCGTAGCATCACGA
GGTCAGCATCTAGCACATCACACAGC
CTATCAGCTAGCATCGATCGCATCGA
...

FASTA lines are usually 60 characters long, but this is not mandatory.
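
The structure above can be read and written with a few lines of code. The following is a minimal sketch, not part of geneid: the function names read_fasta and format_fasta are hypothetical, and the 60-character default only mirrors the convention mentioned above.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file.

    Works for files holding one or several records. Blank lines are skipped,
    and sequence lines may have any length.
    """
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name] + rows) + "\n"
```

Because read_fasta accepts any iterable of lines, an open file object can be passed directly, for example `with open("test.fa") as handle: records = list(read_fasta(handle))`.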



General considerations:
To run geneid, type geneid [options] sequence_file, where sequence_file is a FASTA-formatted DNA sequence. Predictions are sent to standard output.

In addition, the program requires an organism-specific parameter file. A parameter file can be used broadly within the phylogenetic group to which the organism belongs: for instance, the human file may also be used to predict genes in genomic sequences from other vertebrates. The directory params contains several files for different organisms.

The parameter file may be specified in three different ways:

  1. By default, geneid looks for a file called param.default, which must be in the same directory as the binary.
  2. Through the environment variable GENEID, which defines the path.
  3. Using the command-line option -P: geneid -P file.
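
The three mechanisms can be pictured as a simple lookup. The sketch below is an illustration only and does not reproduce geneid's own code. It rests on two assumptions that the text above does not state: that -P takes precedence over GENEID, which in turn takes precedence over param.default, and that GENEID holds the path of the parameter file itself. The function name resolve_param_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_param_file(
    cli_path: Optional[str] = None,
    binary_dir: str = ".",
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick a parameter file from the three sources described above.

    Assumed precedence (not confirmed by the documentation):
    the -P value, then the GENEID variable, then param.default.
    """
    env = os.environ if environ is None else environ

    # 3. Explicit file given on the command line with -P.
    if cli_path:
        return Path(cli_path)

    # 2. Path taken from the GENEID environment variable
    #    (assumed here to point at the file itself).
    env_path = env.get("GENEID")
    if env_path:
        return Path(env_path)

    # 1. Fallback: param.default next to the geneid binary.
    return Path(binary_dir) / "param.default"
```

Passing `environ` explicitly makes the lookup easy to exercise without touching the real environment.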


geneid command line options:
Selection of genomic elements to display
Flag Action
b Display best Start codons found on sequence
d Display best Donor sites found on sequence
a Display best Acceptor sites found on sequence
e Display best Stop codons found on sequence
f Display best First exons found on sequence
i Display best Internal exons found on sequence
t Display best Terminal exons found on sequence
s Display best Single genes found on sequence
x Display best exons (all) found on sequence
z Display Open Reading Frames found on sequence [see -Z]
D Display genomic sequence in predicted genes

Selection of output format
Flag Action
- geneid format
G gff format
X Extended format (gff or geneid)
M XML output (see -m)
m Show the DTD for XML output

Control prediction engine
Flag Action
- Predictions on both strands of sequence
W Predictions only on forward strand of sequence
C Predictions only on reverse strand of sequence
o Only signal and exon prediction (gene assembling disabled)
F Force prediction of a complete gene
O Only gene assembling (exons provided from file)
Z Switch ORF searching on

Re-annotation of sequences
Flag Action
R Include annotations (evidence) provided from file
S Use protein homology information (SR) provided from file

Statistical model
Flag Action
P Provide a new parameter file
E Increase/decrease exon weight value

Miscellanea
Flag Action
v Display information while geneid is running
B Show memory requirements
h Show list of available options
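
When geneid is driven from a script, the options in the tables above can be assembled into an argument list. The following is a minimal sketch under stated assumptions: the helper names build_geneid_command and format_command are hypothetical, only the flags -v, -G, -W, -C and -P from the tables are covered, and the file name params/human.param used in the usage note is illustrative.

```python
import shlex
from typing import List, Optional


def build_geneid_command(
    sequence_file: str,
    param_file: Optional[str] = None,
    gff: bool = False,
    strand: str = "both",
    verbose: bool = False,
    binary: str = "geneid",
) -> List[str]:
    """Build an argument list of the form: geneid [options] sequence_file."""
    argv = [binary]
    if verbose:
        argv.append("-v")  # display information while running
    if gff:
        argv.append("-G")  # gff output instead of geneid format
    if strand == "forward":
        argv.append("-W")  # forward strand only
    elif strand == "reverse":
        argv.append("-C")  # reverse strand only
    elif strand != "both":
        raise ValueError("strand must be 'both', 'forward' or 'reverse'")
    if param_file:
        argv.extend(["-P", param_file])  # explicit parameter file
    argv.append(sequence_file)  # the sequence file comes last
    return argv


def format_command(argv: List[str]) -> str:
    """Render the argument list as a shell-quoted string for logging."""
    return shlex.join(argv)
```

A typical call would be `build_geneid_command("test.fa", param_file="params/human.param", gff=True)`. The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)`, since predictions are written to standard output.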


geneid output formats:
The available output formats are geneid, gff, extended and XML; they are selected with the output format options listed above.


Samples:
The sequence used to generate the sample outputs is test.fa.


Multi-fasta files:
geneid can process files containing more than one FASTA sequence. Moreover, external information such as annotations and HSPs can be provided for each sequence. Requirement: records belonging to the same locus must be sorted by their first position.
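
The sorting requirement can be met with a small preprocessing step. The sketch below assumes that the external records are tab-separated, GFF-like lines with the locus name in the first column and the first position in the fourth column; that layout is an assumption, as is the function name sort_records_by_locus_and_start. Loci keep the order in which they first appear, and comment or blank lines are dropped.

```python
from typing import Dict, Iterable, List, Tuple


def sort_records_by_locus_and_start(lines: Iterable[str]) -> List[str]:
    """Group records by locus and sort each group by first position.

    Assumes tab-separated lines: column 1 is the locus (sequence name)
    and column 4 is the first position of the feature.
    """
    locus_rank: Dict[str, int] = {}
    keyed: List[Tuple[int, int, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        locus = fields[0]
        start = int(fields[3])
        # Remember the order in which each locus was first seen.
        rank = locus_rank.setdefault(locus, len(locus_rank))
        keyed.append((rank, start, line))
    # Python's sort is stable, so records with equal keys keep their input order.
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [line for _, _, line in keyed]
```

If the loci should instead follow the order of the sequences in the multi-FASTA file, the locus ranks could be seeded from the FASTA headers before sorting.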


Enrique Blanco Garcia © 2003