VisualTE: a graphical interface for transposable element analysis at the genomic scale
© Tempel and Talla; licensee BioMed Central. 2015
Received: 10 August 2014
Accepted: 18 February 2015
Published: 27 February 2015
Transposable elements are mobile, repeated DNA sequences known to have a high impact on genes, genome structure, and evolution. This has stimulated broad interest in detailed biological studies of transposable elements. We have therefore developed an easy-to-use tool for the comparative analysis of the structural organization and functional relationships of transposable elements, to help understand their functional role in genomes.
We named our new software VisualTE and describe it here. VisualTE is a stand-alone JAVA graphical interface that allows users to visualize and analyze all occurrences of transposable element families in annotated genomes. VisualTE reads and extracts transposable element and genomic information from annotation and repeat data. Results are displayed in several graphical panels that show the location and distribution of transposable elements on the chromosomes, their occurrence in the genome, their size distribution, and the features and Gene Ontology terms of neighboring genes. With these features, VisualTE provides a convenient tool for studying transposable element copies and their functional relationships with genes, at the whole-genome scale and in diverse organisms.
The VisualTE graphical interface makes possible comparative analyses of transposable elements in any annotated sequence, as well as of the structural organization and functional relationships between transposable elements and other genetic objects. This tool is freely available at: http://lcb.cnrs-mrs.fr/spip.php?article867.
Transposable elements (TEs) are repeated DNA sequences that can represent a large fraction of the genomic DNA in eukaryotic species [1]. The sequencing and annotation of complete prokaryotic and eukaryotic genomes have revealed the massive impact of TEs on genome structure, evolution, and gene regulation [2-4]. Currently, most bioinformatics tools related to transposable elements are either TE databases that collect and organize TE families (e.g. Repbase [5] and ISFinder [6] for eukaryotic and prokaryotic TEs, respectively) or detection methods that search for TE copies in sequences (e.g. RepeatMasker [7], Censor [8], and Repet [9]).
To our knowledge, the UCSC Genome Browser (genome.ucsc.edu/index.html) [10], the ENSEMBL site (www.ensembl.org/index.html) [11], and the DFAM database (www.dfam.org) [12] are the only Web browsers that allow the visualization and exploration of TE annotations. These browsers can display TEs at different resolutions, but they do not allow analyses and comparisons of individual TE families and superfamilies. Moreover, they do not display the similarity of TE copies to their consensus sequences, which is essential for dating different generations of TEs. We previously developed VisualRepbase (www.girinst.org/downloads/software/) [13], a JAVA interface that browses occurrences of TEs in annotated genomes based on their family name and their similarity to known consensus sequences, and allows the user to compare the age and the invasion origin of selected TEs. However, VisualRepbase is limited to a small number of genomes because its background database is updated infrequently.
Furthermore, VisualRepbase, ENSEMBL, and the UCSC Genome Browser do not show relationships between transposable elements and neighboring genes. Here, we describe a new stand-alone software, VisualTE, that dynamically displays and analyzes occurrences of TE families within annotated genomes based on TE similarity and size. VisualTE also shows the relationships of TEs with neighboring gene features, and supports inter- and intra-chromosomal comparisons.
Implementation and input data
VisualTE is written in the JAVA programming language and requires JAVA version 1.7 or later. The downloadable version can be installed and run on any operating system with a JAVA runtime, including Windows, MacOS, and Linux.
VisualTE input data fall into two categories: the annotation file (in GenBank and/or EMBL format) and the repeat file. For the latter, VisualTE recognizes the AB-BLAST [14], NCBI-BLAST [15], Censor [16], RepeatMasker [7], and Repet [9] output formats. In addition, a VisualTE format has been defined for both the annotation and repeat files (see Additional file 1). A TE neighboring gene is defined as the closest annotated gene located upstream or downstream of a selected TE. To analyze the Gene Ontology (GO) annotations of these neighboring genes, VisualTE needs the ‘gene2go’ file, which can be downloaded from the NCBI website (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/). VisualTE includes a TE superfamily information file extracted from the Repbase database (version 19.04) [5] and the ISFinder database (January 2014 version) [6]. Unlike VisualRepbase [13], VisualTE accepts any annotated sequence provided in a supported format. For GO studies, the 148 generic GO categories (‘GenericEBI’) were extracted from the EBI website (www.ebi.ac.uk/QuickGO/GMultiTerm#tab=choose-terms), and the first two levels of the GO hierarchical tree (‘TreeLevel1’ and ‘TreeLevel2’) from the Gene Ontology website (ftp://ftp.geneontology.org/pub/go/ontology/go-basic.obo). All information files from the Repbase, ISFinder, EBI, and GO websites will be updated regularly. For a case study, the complete Arabidopsis thaliana genome [17] in GenBank format and the reference Repbase families were downloaded from the NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Arabidopsis_thaliana/) and Repbase (www.girinst.org/repbase/) websites, respectively. TE copies were identified using RepeatMasker [7] with default parameters.
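As an illustration of the repeat-file input, the following minimal Python sketch reads the tabular body of a RepeatMasker .out file, one of the formats listed above. It is not VisualTE code (VisualTE is written in JAVA): the TECopy record and the parse_repeatmasker_out function are hypothetical names, the RepeatMasker class/family column stands in for the superfamily, and similarity to the consensus is approximated as 100 minus RepeatMasker's percent divergence, which is an assumption rather than VisualTE's documented rule.

```python
from dataclasses import dataclass


@dataclass
class TECopy:
    """One TE copy (hypothetical record, not VisualTE's data model)."""
    chrom: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # '+' or '-'
    family: str         # matching repeat, e.g. a Repbase family name
    superfamily: str    # RepeatMasker class/family column, used as a stand-in
    similarity: float   # assumption: 100 - RepeatMasker percent divergence

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def parse_repeatmasker_out(lines):
    """Parse the rows of a RepeatMasker .out table into TECopy records.

    Header lines and blank lines are skipped: a data row has at least
    15 whitespace-separated fields and starts with an integer SW score.
    """
    copies = []
    for line in lines:
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        copies.append(TECopy(
            chrom=fields[4],
            start=int(fields[5]),
            end=int(fields[6]),
            # RepeatMasker writes 'C' (complement) for minus-strand matches.
            strand="+" if fields[8] == "+" else "-",
            family=fields[9],
            superfamily=fields[10],
            similarity=100.0 - float(fields[1]),
        ))
    return copies
```

Only the columns needed for size, position, and similarity are kept; the repeat-internal coordinates (whose order differs between '+' and 'C' rows) are ignored.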
Results and discussion
Data selection area
This area is composed of a ‘Help’ button, a clickable genome tree, a textfield for entering a TE family name, a ‘List of Transposable Elements’ button, and the ‘RUN VisualTE’ button. Clicking on the ‘Help’ button opens a new interface window that explains all functions and buttons of the interface.
To use the VisualTE main interface, the user starts by selecting, within the ‘Data Selection’ area (Area 1 in Figure 1), one or several transposable element families (entered manually in the ‘Selected TEs’ textfield or chosen through the ‘List of Transposable Elements’ button) together with one or several genomic items. The ‘Selected TEs’ textfield accepts up to 20 TE family names (e.g. AtREP1, AtREP3, and AtREP5 in Figure 1); however, we recommend limiting the selection to three TE names for better visualization. The ‘List of Transposable Elements’ button opens a new window with the complete list of TE families found in the input file (classified by organism), from which the TE families of interest can be selected. The genome tree allows the selection of particular chromosomes, as shown in Figure 1 for ‘All’ chromosomes of A. thaliana. Chromosomes are added to (or removed from) memory with the ‘Add sequence(s)’ (or ‘Remove sequence(s)’) button. Finally, once at least one chromosome and at least one valid TE family name are selected, the user starts the analysis with the ‘RUN VisualTE’ button.
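The selection rules above (at most 20 TE names, at least one valid family and one chromosome, three names recommended) can be sketched as a small validation step. This Python sketch is hypothetical: the function name, the assumption that names are comma-separated, and the warning messages are illustrative and do not reproduce VisualTE's interface code.

```python
MAX_TE_NAMES = 20          # limit stated for the 'Selected TEs' textfield
RECOMMENDED_TE_NAMES = 3   # recommended for a readable display


def validate_selection(te_field, known_families, chromosomes):
    """Return (valid_names, warnings); raise ValueError if a run cannot start.

    Assumption: names in the textfield are separated by commas.
    Duplicate names are dropped while keeping the entry order.
    """
    names = list(dict.fromkeys(n.strip() for n in te_field.split(",") if n.strip()))
    if len(names) > MAX_TE_NAMES:
        raise ValueError(f"at most {MAX_TE_NAMES} TE names can be entered")
    valid = [n for n in names if n in known_families]
    if not valid:
        raise ValueError("at least one valid TE family name is required")
    if not chromosomes:
        raise ValueError("at least one chromosome must be selected")

    warnings = []
    if len(valid) > RECOMMENDED_TE_NAMES:
        warnings.append(f"more than {RECOMMENDED_TE_NAMES} TEs may clutter the display")
    unknown = [n for n in names if n not in known_families]
    if unknown:
        warnings.append("unknown TE names ignored: " + ", ".join(unknown))
    return valid, warnings
```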
Graphical option area
The ‘Option’ area (Area 2 in Figure 1) includes (i) an ‘Enlarge Graph/Reduce Graph’ button that hides (or shows) the launch area of the interface to give more room to the results, and (ii) four options that dynamically interact with the ‘Graphical Panel’ area.
The ‘Annotation/TE’ menu displays genomic and TE annotations on the chromosomes in the ‘Graphical Panel’ area. Genomic annotations include the ‘Genes’, ‘Exons’, ‘PseudoGenes’, ‘miscRNA’, and 5′- and 3′-UTR items, while TE annotations comprise the ‘Only selected TEs’, ‘Only selected TE Superfamily’, and ‘All TEs’ categories. The ‘All TEs’ item displays all TE copies within the selected chromosomes, while the ‘Only selected TEs’ and ‘Only selected TE Superfamily’ items do the same for the selected TE families and their superfamilies, respectively. Each submenu item independently displays (or removes) all annotations of its type at once. The ‘Annotation/TE’ option is useful for examining TE copies in their genetic environment.
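The three TE display categories of the ‘Annotation/TE’ menu amount to different filters over the TE copies. In this minimal sketch, te_track and the family-to-superfamily dictionary (standing in for VisualTE's Repbase/ISFinder superfamily information file) are hypothetical, and copies are plain dictionaries with a 'family' key.

```python
def te_track(copies, selected, family_to_superfamily, mode):
    """Return the TE copies shown on one track for a given menu item.

    copies: list of dicts with a 'family' key
    selected: set of selected TE family names
    family_to_superfamily: dict mapping family name -> superfamily name
    mode: 'Only selected TEs', 'Only selected TE Superfamily', or 'All TEs'
    """
    copies = list(copies)
    if mode == "All TEs":
        return copies
    if mode == "Only selected TEs":
        return [c for c in copies if c["family"] in selected]
    if mode == "Only selected TE Superfamily":
        # Superfamilies of the selected families, then every copy belonging to them.
        wanted = {family_to_superfamily.get(f) for f in selected} - {None}
        return [c for c in copies if family_to_superfamily.get(c["family"]) in wanted]
    raise ValueError(f"unknown display mode: {mode}")
```

For example, with AtREP1 selected and both AtREP1 and AtREP3 mapped to the same superfamily, the superfamily track also shows the AtREP3 copies.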
The ‘Display by Size’ slider updates all graphical panels so that only TEs whose size lies between the minimal and maximal values of the slider knobs are shown. By default, these values correspond to the sizes of the smallest and largest occurrences of the selected TEs, but they can be changed dynamically by the user.
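The size slider behaves like an inclusive range filter whose initial bounds come from the selected TE copies. A minimal sketch, assuming copies are dictionaries with 1-based inclusive 'start' and 'end' coordinates (hypothetical field names):

```python
def te_size(copy):
    """Length in bp of a copy with 1-based inclusive coordinates."""
    return copy["end"] - copy["start"] + 1


def default_size_bounds(copies):
    """Initial knob positions: sizes of the smallest and largest selected copies."""
    if not copies:
        raise ValueError("no TE copies selected")
    sizes = [te_size(c) for c in copies]
    return min(sizes), max(sizes)


def filter_by_size(copies, min_size, max_size):
    """Keep copies whose size lies within [min_size, max_size]."""
    return [c for c in copies if min_size <= te_size(c) <= max_size]
```

With the default bounds every selected copy is visible; moving either knob inward hides the copies outside the new range.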
The ‘Display by Similarity’ slider likewise shows only TEs whose similarity to their reference TE family lies between the minimal and maximal values of the slider knobs. These values are set to 50% and 100% by default, respectively, but can be changed dynamically by the user. Since TE copies that are less divergent from their consensus are considered to be the youngest, this slider can be used to estimate the evolutionary history of transposition in the selected genomes.
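Similarly, the similarity slider can be sketched as a range filter with 50% and 100% defaults. The added similarity_profile function is not a VisualTE feature: it is one common way, assumed here for illustration, to look for successive waves of transposition by counting copies in similarity bins (5% wide by default), with higher-similarity bins taken to contain younger copies.

```python
from collections import Counter


def filter_by_similarity(copies, min_sim=50.0, max_sim=100.0):
    """Keep copies whose % similarity to the consensus lies in [min_sim, max_sim]."""
    return [c for c in copies if min_sim <= c["similarity"] <= max_sim]


def similarity_profile(copies, bin_width=5.0):
    """Count copies per similarity bin, keyed by the bin's lower bound.

    A copy at exactly 100% is placed in the top bin rather than a bin of its own.
    """
    counts = Counter()
    for c in copies:
        lower = min(c["similarity"] // bin_width * bin_width, 100.0 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))
```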
The last item is a combo-list called ‘Save Results’, which offers three saving options: the first two save either the whole graph or the visible part of the selected panel, while the third writes the list of TE occurrences and their surrounding genes (as shown in the ‘All TE-Gene Features’ panel) to a text file.
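The third ‘Save Results’ option writes a plain-text table. The following sketch writes a tab-separated file with Python's csv module; the column names and the tab-separated layout are assumptions for illustration, not VisualTE's actual output format (see Additional file 2 for that).

```python
import csv
import io

# Hypothetical column set loosely following the 'All TE-Gene Features' description.
COLUMNS = ["te_name", "chrom", "te_start", "te_end", "te_strand",
           "gene", "gene_start", "gene_end", "gene_strand", "relation", "distance"]


def format_te_gene_table(rows):
    """Render TE-gene rows (dicts keyed by COLUMNS) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)   # missing keys are written as empty fields
    return buffer.getvalue()


def save_te_gene_table(rows, path):
    """Write the table to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_te_gene_table(rows))
```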
Graphical panel area
Because TEs are involved in genome rearrangements and in the expression of various genes [20,21], this area contains seven graphical panels (Area 3 in Figure 1) that show the structural organization of TEs and their functional relationships with their host genomes. The ‘Graphical Panel’ area dynamically displays the selected TEs, each in a specific color.
TE location on chromosome
This panel, first described in VisualRepbase [13], draws the selected TE copies as well as genomic annotation items on the chromosomes. Figure 1 shows the AtREP1, AtREP3, and AtREP5 occurrences as blue, green, and light blue rectangles, respectively. By default, two lines are displayed: the selected TE copies and a graduated ruler of the chromosome size. When ‘Only selected TE Superfamily’, ‘All TEs’, and/or genomic items (‘Genes’, ‘Exons’, ...) are selected, new lines corresponding to these annotations or TE copies are inserted between these two lines (as shown in Area 3 in Figure 1). Compared with the corresponding panel in VisualRepbase [13], this panel has an additional ‘Set Positions’ button and two associated textfields (Start and End Position) that dynamically modify the graphical view: clicking ‘Set Positions’ displays the chromosome region between the two entered values. In addition, clicking on a graphical element displays a menu with detailed information (the nature of the genetic object and its location), as shown for the AT3G04790 gene in Figure 1, Area 3. As in VisualRepbase, the ‘Zoom In’ (or ‘Zoom Out’) button doubles (or halves) the displayed width of the selected chromosomes. The last button, ‘Global View’, rescales the view so that the largest chromosome fits entirely within the width of the graphical panel. These three buttons also modify the display of the ‘Distribution on Chromosome’ panel.
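The ‘Zoom In’, ‘Zoom Out’, ‘Global View’, and ‘Set Positions’ controls can all be expressed as changes to a single base-pairs-per-pixel scale plus an offset. This minimal sketch assumes a linear mapping from chromosome coordinates to pixels; the ChromosomeView class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChromosomeView:
    """Linear bp-to-pixel mapping for one graphical panel (hypothetical)."""
    panel_width_px: int
    bp_per_px: float = 1.0
    offset_bp: int = 0      # chromosome position drawn at x = 0

    def to_px(self, position_bp):
        return int((position_bp - self.offset_bp) / self.bp_per_px)

    def zoom_in(self):
        self.bp_per_px /= 2     # drawn chromosome width doubles

    def zoom_out(self):
        self.bp_per_px *= 2     # drawn chromosome width halves

    def global_view(self, chromosome_lengths):
        # Largest chromosome spans exactly the panel width.
        self.offset_bp = 0
        self.bp_per_px = max(chromosome_lengths) / self.panel_width_px

    def set_positions(self, start_bp, end_bp):
        # Region [start_bp, end_bp] spans the panel width.
        if end_bp <= start_bp:
            raise ValueError("End Position must be greater than Start Position")
        self.offset_bp = start_bp
        self.bp_per_px = (end_bp - start_bp) / self.panel_width_px
```

Keeping a single scale shared by the location and distribution panels is one simple way to make the three buttons affect both views consistently.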
TE distribution on chromosome
TE distribution on genome
TE size distribution on genome
TE-gene gene ontology
All TE-gene features
This panel summarizes the results of the six previous panels in a table that can be downloaded for further analysis (see Additional file 2). Each TE copy is described by at least two lines, corresponding to the closest upstream and downstream genes; a third line is added if the TE copy is inserted within a gene. Each line contains the TE location and orientation on the chromosome, its superfamily name, and its similarity to the consensus, as well as the name, positions, and orientation of the gene, the distance between the gene and the TE, and the GO category of the gene. The last column (‘Ortholog’) gives the X value of the TE-gene pair as defined before. Moreover, when several genomes are selected, this panel allows the user to identify TE copies that are conserved (or inserted) close to the same orthologous genes.
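The upstream/downstream/inside lines described above can be derived with a simple nearest-neighbor search. This sketch is a hypothetical reconstruction, not VisualTE's algorithm: it assumes that ‘upstream’ and ‘downstream’ refer to lower and higher chromosome coordinates (ignoring gene orientation), that distance is the coordinate difference in base pairs between the TE and the gene, and that any overlap counts as insertion within a gene.

```python
def te_gene_rows(te, genes):
    """Return (relation, gene_name, distance_bp) rows for one TE copy.

    te:    dict with 1-based inclusive 'start' and 'end'
    genes: list of dicts with 'name', 'start', 'end' on the same chromosome
    Assumption: 'upstream'/'downstream' mean lower/higher coordinates.
    """
    rows = []
    before = [g for g in genes if g["end"] < te["start"]]
    after = [g for g in genes if g["start"] > te["end"]]
    overlapping = [g for g in genes
                   if g["start"] <= te["end"] and g["end"] >= te["start"]]
    if before:
        nearest = max(before, key=lambda g: g["end"])
        rows.append(("upstream", nearest["name"], te["start"] - nearest["end"]))
    if after:
        nearest = min(after, key=lambda g: g["start"])
        rows.append(("downstream", nearest["name"], nearest["start"] - te["end"]))
    # Extra line(s) when the TE lies within (here: overlaps) a gene.
    for gene in overlapping:
        rows.append(("inside", gene["name"], 0))
    return rows
```

A TE near a chromosome end may have a neighbor on one side only, so this sketch can return a single line in that case.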
VisualTE is a stand-alone JAVA interface that allows users to analyze and visualize the size, the intra-chromosomal and inter-chromosomal distribution, and the distance to neighboring genes of TE copies. In particular, the ‘TE-Gene Distance’ graph, which examines the relative location of TE copies and genes, may point to a role of TEs in gene regulation. VisualTE should help researchers identify strong insertion biases involving specific TEs or chromosomes, which may lead to the discovery of TE functions. Moreover, it allows users to easily compare these TEs with other genetic objects, including genes, exons, UTRs, pseudogenes, and miscRNAs. VisualTE can also display conserved pairs of TEs and orthologous neighboring genes, together with their GO terms, in the selected organisms, which could prove useful for examining functional relationships between TEs and neighboring genes. In summary, this graphical interface makes TE diversification studies possible in a single analysis, and might thus provide clues for understanding TE dynamics at the whole-genome scale.
Availability and requirements
Project name: VisualTE
Project home page: http://lcb.cnrs-mrs.fr/spip.php?article867
Operating system(s): Platform independent
Programming language: JAVA
Licence: Creative Commons v3
Any restrictions to use by non-academics: No
We would like to thank Aurélie Bourhis, Thibault Martin, and Florian Philippe for their initial contributions to the VisualTE development. We thank our sources of funding (Aix-Marseille Université and CNRS).
- Bigot Y, editor. Mobile Genetic Elements: Protocols and Genomic Applications. Methods Mol Biol (series editor Walker JM). Humana Press. 2012; 859:1–308.
- Bennetzen J, Wang H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol. 2014; 65:505–30.
- Kejnovsky E, Lexa M. Quadruplex-forming DNA sequences spread by retrotransposons may serve as genome regulators. Mob Genet Elem. 2014; 4:e28084.
- Siguier P, Gourbeyre E, Chandler M. Bacterial insertion sequences: their genomic impact and diversity. FEMS Microbiol Rev. 2014; 38:865–91.
- Jurka J, Kapitonov V, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005; 110:462–7.
- Siguier P, Varani A, Perochon J, Chandler M. Exploring bacterial insertion sequences with ISfinder: objectives, uses, and future developments. Methods Mol Biol. 2012; 859:91–103.
- Tempel S. Using and understanding RepeatMasker. Methods Mol Biol. 2012; 859:29–51.
- Jurka J, Klonowski P, Dagman V, Pelton P. CENSOR - a program for identification and elimination of repetitive elements from DNA sequences. Comput Chem. 1996; 20:119–22.
- Flutre T, Duprat E, Feuillet C, Quesneville H. Considering Transposable Element Diversification in De Novo Annotation Approaches. PLoS ONE. 2011; 6:e16526.
- Karolchik D, Barber G, Casper J, Clawson H, Cline MS, Diekhans M, et al. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 2014; 42:D764–70.
- Flicek P, Amode M, Barrell D, Beal K, Brent S, Carvalho-Silva D, et al. Ensembl 2012. Nucleic Acids Res. 2012; 40:D84–90.
- Wheeler T, Clements J, Eddy S, Hubley R, Jones T, Jurka J, et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 2013; 41:D70–82.
- Tempel S, Jurka M, Jurka J. VisualRepbase: an interface for the study of occurrences of transposable element families. BMC Bioinformatics. 2008; 9:345.
- Gish W. AB-BLAST. 1996-2009. http://blast.advbiocomp.com.
- Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008; 36:W5–9.
- Kohany O, Gentles A, Hankus L, Jurka J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006; 7:474.
- The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000; 408:796–815.
- Kapitonov V, Jurka J. Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica. 1999; 107:27–37.
- Kapitonov V, Jurka J. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci. 2001; 98:8923–4.
- Kazazian HH Jr. Mobile elements: drivers of genome evolution. Science. 2004; 303:1626–32.
- Wessler S. Transposable elements and the evolution of eukaryotic genomes. Proc Natl Acad Sci. 2006; 103:17600–1.
- Kawabe A, Hansson B, Hagenblad J, Forrest A, Charlesworth D. Centromere locations and associated chromosome rearrangements in Arabidopsis lyrata and A. thaliana. Genetics. 2006; 173:1613–9.
- Lyon M. The Lyon and the LINE hypothesis. Semin Cell Dev Biol. 2003; 14:313–8.
- Cordaux R, Batzer M. The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009; 10:691–703.
- Cultrone A, Domínguez Y, Drevet C, Scazzocchio C, Fernández-Martín R. The tightly regulated promoter of the xanA gene of Aspergillus nidulans is included in a helitron. Mol Microbiol. 2007; 63:1577–87.
- Kogan G, Usakin L, Ryazansky S, Gvozdev V. Expansion and evolution of the X-linked testis specific multigene families in the melanogaster species subgroup. PLoS ONE. 2012; 7:e37738.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.