Genome BioInformatics Research Lab

 
- AStalavista web server -
Alternative Splicing transcriptional landscape visualization tool
 
 
FREQUENTLY ASKED QUESTIONS:
The AStaLaVISta FAQs

Definitions and abbreviations

  • What is AS?
  • AS stands for alternative splicing.

  • What does ASTALAVISTA mean?
  • It stands for alternative splicing and transcriptional landscape visualization tool.
    It can be written AStalavista, astalavista, AsTaLaVisTa, or whatever you like.

  • What are CDS and UTR?
  • CDS refers to the coding DNA sequence, i.e. the part of a mature transcript that is translated into protein, from the start codon to the stop codon.
    The UTRs (untranslated regions), on the other hand, are the parts that are transcribed but not translated: the 5'UTR lies before the start codon and the 3'UTR after the stop codon.

  • What is hg18?
  • This refers to a version of the human genome assembly, which in turn determines the version of the corresponding reference annotations (check the NCBI or UCSC website for more details). If you provide a custom annotation, you should check that its genomic coordinates are compatible with the assembly. Note that, because AStaLAviSta is an independent system, the AS landscape will not be affected as long as you do not mix different genome assembly versions in the same input. The external links to the UCSC browser may be affected, though.


The AS code and the underlying method

  • Why design a code for describing AS events when there is already a commonly used nomenclature?
  • First, the traditional nomenclature has no formal definition and is usually just illustrated with simple schemes. This can lead to uncertainty. For instance, the term exon skipping is commonly used, but so are cassette exon, retained exon, or just alternative exon. Do these terms refer to the same biological event? If not, are the differences clear to everybody? There is obviously a problem of semantics in AS.
    More importantly, the traditional nomenclature is limited by definition, because it is restricted to the 5 or 6 AS events usually mentioned in the literature. However, transcriptome diversity is too complex to be described by the common terms alone. Indeed, depending on the reference annotation that is analyzed, we estimate that between one fifth and one third of the different types of splicing variations are neglected by the traditional nomenclature.

  • How is the AS code defined?
  • Each AS event is assigned an AS code according to the relative position of the alternative splice sites that are involved in the splicing variation.
    A variation in the exon-intron structure is detected by comparing all overlapping transcripts in a pairwise fashion. For a given pair, a splice site is said to be alternative if it is not used in both transcripts. Within the same transcript comparison, with all splice sites sorted by their genomic position, an AS event is defined as a maximal succession of alternative splice sites.
    Then, each alternative splice site is assigned a number (1, 2, 3, ...) according to its relative position in the event, and a symbol that depends on its type. A "^" sign denotes a donor site (depicting the spliced-out intron downstream of the donor site) and a "-" symbol denotes an acceptor site. A splice site denoted "3-" therefore designates the 3rd splice site in an event, which is an acceptor. The AS code is built by writing down the splice sites of each transcript, separated by a comma. A 0 is used to denote a transcript that does not involve any of the alternative splice sites (e.g., in the case of an exon skipping event).

    An exon skipping event has the AS code 1-2^,0.

    The AS code for two competing donor sites is 1^,2^.

    Alternative acceptor sites are denoted by 1-,2-.

    The AS code of intron retention events is 1^2-,0.
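    The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AStalavista implementation: transcripts are given as sorted lists of (start, end) exon coordinates on the forward strand, only splice sites inside the region covered by both transcripts are considered, and the transcript that uses site 1 is written first (as in the four examples above). The function names splice_sites and as_codes are made up for this sketch.

```python
def splice_sites(exons):
    """Return the (position, symbol) splice sites of one transcript.

    exons: list of (start, end) tuples, sorted by position, forward strand.
    "^" marks a donor (end of an exon), "-" an acceptor (start of the next).
    """
    sites = []
    for (_, donor), (acceptor, _) in zip(exons, exons[1:]):
        sites.append((donor, "^"))
        sites.append((acceptor, "-"))
    return sites


def as_codes(exons_a, exons_b):
    """Return the AS codes of all events between two overlapping transcripts."""
    sites_a = set(splice_sites(exons_a))
    sites_b = set(splice_sites(exons_b))

    # Assumption of this sketch: ignore sites outside the shared region.
    lo = max(exons_a[0][0], exons_b[0][0])
    hi = min(exons_a[-1][1], exons_b[-1][1])
    ordered = sorted(s for s in sites_a | sites_b if lo <= s[0] <= hi)

    # An event is a maximal run of sites that are not used by both transcripts.
    events, current = [], []
    for site in ordered:
        if site in sites_a and site in sites_b:
            if current:
                events.append(current)
            current = []
        else:
            current.append(site)
    if current:
        events.append(current)

    codes = []
    for event in events:
        variant_a, variant_b = "", ""
        for number, site in enumerate(event, start=1):
            label = f"{number}{site[1]}"
            if site in sites_a:
                variant_a += label
            else:
                variant_b += label
        # Assumption: the transcript that uses site 1 is written first.
        if event[0] in sites_a:
            first, second = variant_a, variant_b
        else:
            first, second = variant_b, variant_a
        codes.append(f"{first or '0'},{second or '0'}")
    return codes


# Toy coordinates, invented for illustration.
print(as_codes([(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)]))  # ['1-2^,0']
print(as_codes([(100, 200), (500, 600)], [(100, 250), (500, 600)]))              # ['1^,2^']
print(as_codes([(100, 200), (300, 400)], [(100, 200), (350, 400)]))              # ['1-,2-']
print(as_codes([(100, 200), (300, 400)], [(100, 400)]))                          # ['1^2-,0']
```

    The four calls reproduce the exon skipping, alternative donor, alternative acceptor and intron retention codes listed above. Minus-strand transcripts would need their coordinates reversed before numbering, which the sketch does not handle.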


Input and Output

  • What are the ASTA and the GTF format?
  • A file format description is available here

  • Why do I have to select a species/genome version?
  • As described above, AStalavista extracts AS events dynamically from any given annotation, regardless of the species the annotation comes from. However, since the GTF file format intrinsically does not provide information about the species or the genome build, AStalavista needs this information in order to synchronize with the UCSC genome browser for proper visualization of the extracted events.
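    To illustrate why the genome build has to be selected by hand, here is a minimal Python sketch that reads GTF exon records. It assumes the standard nine tab-separated GTF columns and a transcript_id attribute; the function name exons_by_transcript and the example records are invented for illustration. None of the nine columns names a species or an assembly.

```python
import re
from collections import defaultdict

# The nine tab-separated columns of a GTF record.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def exons_by_transcript(gtf_lines):
    """Group exon coordinates by (seqname, strand, transcript_id)."""
    transcripts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GTF_COLUMNS):
            continue  # not a well-formed GTF record
        record = dict(zip(GTF_COLUMNS, fields))
        if record["feature"] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', record["attributes"])
        if match is None:
            continue
        key = (record["seqname"], record["strand"], match.group(1))
        transcripts[key].append((int(record["start"]), int(record["end"])))
    # Sort the exons of each transcript by genomic position.
    return {key: sorted(exons) for key, exons in transcripts.items()}


# Made-up records: nothing here says which species or build "chr1" belongs to.
example = [
    'chr1\tRefSeq\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tRefSeq\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tRefSeq\tCDS\t150\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tRefSeq\texon\t100\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]
print(exons_by_transcript(example))
```

    The same chromosome name and coordinates are valid in any assembly that has a "chr1", which is why the organism field cannot be inferred from the file itself.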

  • My species (or genome build version) is not in the list. Will the results be affected?
  • No. The choices in the "organism" field represent the most recent genome builds of popular species that can be browsed in the UCSC genome browser. If your transcript annotation is from a species/genome build that is not in the list, please select "-other-". In that case you will not be able to use the UCSC genome browser for visualization, but AS events will be extracted as usual. If you would like to have an organism added to the list, please contact the authors.

  • Why are there no annotations provided for all of the given species?
  • AStalavista currently provides popular annotations (e.g., RefSeq) for some of the popular genomes (currently: human, mouse, fruitfly and worm). If you would like to analyze the AS landscape of another known annotation dataset, you can provide a GTF file to AStalavista (to obtain this file, see the next question) or ask the authors at sylvain.foissac<at>crg. es

  • Where can I find an annotation file for a given species?
  • You can go to the UCSC table browser, select the desired clade/species/genome assembly, and specify GTF as the "output format". Another possibility is the EnsEmbl Biomart data mining tool: after selecting the "Dataset" (i.e., the organism) and the "Database" (which fixes the genome build), select "Attributes" - "Structures", and finally, under "Results", specify "GFF" in the "rows as" selection box.

  • What happens if I leave the "annotation" selection empty?
  • To extract AS events, AStalavista needs an input. If you cannot or do not want to select a predefined annotation, you have to provide an annotation file or a list of gene/protein identifiers for the selected organism.

  • What happens if I provide my own annotation AND select a predefined annotation from AStalavista at the same time?
  • AStalavista allows this so that you can investigate how a custom annotation enriches an existing one. If you select a predefined annotation and paste a custom annotation in the corresponding box, both annotation files will be concatenated and analyzed together. In this case, please ensure that the annotation you provide is from the same species/genome build as the predefined AStalavista annotation.

  • How can I provide my own gene/transcript/protein identifiers in order to investigate AS events?
  • This option is available for the species/genome builds with predefined annotations. To run the AS analysis exclusively on a set of genes/transcripts, select one of those species (i.e., human, mouse, fruitfly or worm), leave the "annotation" selection empty and paste the corresponding list of identifiers in the input box. AStalavista currently supports various naming systems (incl. RefSeq, EnsEmbl, Gene symbols, SwissProt, etc.). If you wish to suggest additional nomenclatures, contact the authors at sylvain.foissac<at>crg. es.

  • What happens if I provide gene/transcript/protein identifiers and I select a predefined annotation?
  • This will result in a request for a complete genome analysis using the corresponding predefined annotation, additionally taking into account the genes/transcripts designated by the identifiers. Consequently, generating the results may take comparatively long (i.e., several minutes), depending also on the current server load. However, it makes it possible to mix transcripts/genes from another annotation into one of the provided reference annotations.

  • Why is the AS event extraction on the reference annotations so fast?
  • In order to avoid long waiting times on a fixed input like the provided annotation files, AStalavista provides precomputed results here. Clearly, this is no longer possible when provided reference annotations are mixed with custom input, e.g., additional annotations or gene/transcript/protein identifier lists.

  • I have a transcript with neither a known identifier nor a genomic annotation, just the FASTA sequence. Can I still use AstaLaVisTa?
  • Yes, but if the transcript is new (no ID), you need to know the genomic coordinates of its exon-intron structure.
    To obtain them, the best method is to align the transcript to the genomic sequence. You may try dedicated spliced-alignment web services such as GeneSeqer, Spidey or BLAT.
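    As one possible route, the Python sketch below turns a single alignment in BLAT's PSL output format into GTF exon lines. It assumes an untranslated nucleotide alignment (one-character strand field, target block starts given 0-based on the forward strand). The function name psl_to_gtf_exons, the min_intron threshold used to merge blocks separated only by small alignment gaps, and the example record are all invented for this sketch; the other aligners use different output formats.

```python
def psl_to_gtf_exons(psl_line, source="blat", min_intron=20):
    """Convert one PSL alignment line into GTF exon lines.

    min_intron is a hypothetical threshold: target gaps shorter than this
    are treated as alignment indels and the flanking blocks are merged.
    """
    fields = psl_line.rstrip("\n").split("\t")
    strand, query_name, target_name = fields[8], fields[9], fields[13]
    block_sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    target_starts = [int(x) for x in fields[20].rstrip(",").split(",")]

    exons = []
    for size, start in zip(block_sizes, target_starts):
        begin, end = start + 1, start + size  # 0-based PSL to 1-based GTF
        if exons and begin - exons[-1][1] - 1 < min_intron:
            exons[-1][1] = end  # small gap: extend the previous exon
        else:
            exons.append([begin, end])

    attributes = f'gene_id "{query_name}"; transcript_id "{query_name}";'
    return [
        "\t".join([target_name, source, "exon", str(begin), str(end),
                   ".", strand, ".", attributes])
        for begin, end in exons
    ]


# Made-up PSL record: a 150 nt transcript aligned in two blocks on chr1.
psl = "\t".join([
    "150", "0", "0", "0", "0", "0", "1", "400", "+",
    "myTranscript", "150", "0", "150",
    "chr1", "1000000", "1000", "1550",
    "2", "100,50,", "0,100,", "1000,1500,",
])
for gtf_line in psl_to_gtf_exons(psl):
    print(gtf_line)
```

    The resulting lines can be pasted as a custom annotation, provided the genomic sequence used for the alignment belongs to the genome build selected in the "organism" field.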

  • My identifier is not found. Where can I check which kind of IDs are supported by Astalavista?
  • Here is the exhaustive list of the identifiers we handle for each species (gene/transcript/protein):
    fruitfly (dm2)
    human (hg17)
    human (hg18)
    mouse (mm8)
    worm (ce2)
    To look for yours, click on the link and use the find function of your web browser (often Ctrl-F).
    If you think that your identifier should be in the list or that an important source of annotation is missing, please let us know!

  • Why is the webserver producing a different output than the downloadable program?
  • We try our best to keep the webserver up to date, but the downloadable program versions are always more recent than the webserver. In case you encounter differences, the events shown by the webserver should always be a subset of the ones found by the executable in the download bundle. However, if you have the impression that something is fishy with your output, file a bug report or contact us.
