BOV – a web-based BLAST output visualization tool
- Rajesh Gollapudi†1,
- Kashi Vishwanath Revanna†1,
- Chris Hemmerich1,
- Sarah Schaack2 and
- Qunfeng Dong1
© Gollapudi et al; licensee BioMed Central Ltd. 2008
Received: 28 June 2008
Accepted: 15 September 2008
Published: 15 September 2008
The BLAST program is one of the most widely used sequence similarity search tools for genomic research, even by those biologists lacking extensive bioinformatics training. As the availability of sequence data increases, more researchers are downloading the BLAST program for local installation and performing larger and more complex tasks, including batch queries. In order to manage and interpret the results of batch queries, a host of software packages have been developed to assist with data management and post-processing. Among these programs, there is almost a complete lack of visualization tools to provide graphic representation of complex BLAST pair-wise alignments. We have developed a web-based program, BLAST Output Visualization Tool (BOV), that allows users to interactively visualize the matching regions of query and database hit sequences, thereby allowing the user to quickly and easily dissect complex matching patterns.
Users can upload the standard BLAST output in pair-wise alignment format as input to the web server (including batch queries generated by installing and running the stand-alone BLAST program on a local server). The program extracts the alignment coordinates of matching regions between the query and the corresponding database hit sequence. The coordinates are used to plot each matching region as colored lines or trapezoids. Using the straightforward control panels throughout the web site, each plotted matching region can be easily explored in detail by, for example, highlighting the region of interest or examining the raw pair-wise sequence alignment. Tutorials are provided at the website to guide users step-by-step through the functional features of BOV.
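The coordinate extraction step described above can be illustrated with a minimal Python sketch. This is not the BOV implementation (which is built on BioPerl); the `Hsp` class, the `parse_hsps` function, and the sample text are hypothetical names and data introduced only for illustration, and the sketch assumes the classic pair-wise text layout in which each alignment row begins with `Query` or `Sbjct` followed by a start coordinate, the aligned residues, and an end coordinate.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Alignment rows look like "Query: 101 ACGT... 125" (the colon is optional).
ALIGN_LINE = re.compile(r"^(Query|Sbjct):?\s+(\d+)\s+\S+\s+(\d+)\s*$")


@dataclass
class Hsp:
    """Hypothetical container for one High-scoring Segment Pair."""
    query: Optional[str]
    hit: Optional[str]
    q_start: Optional[int] = None
    q_end: Optional[int] = None
    h_start: Optional[int] = None
    h_end: Optional[int] = None


def parse_hsps(text: str) -> List[Hsp]:
    """Collect query/hit coordinates for every HSP in pair-wise BLAST text."""
    hsps: List[Hsp] = []
    query = hit = None
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Query="):
            query = stripped[len("Query="):].strip()
        elif line.startswith(">"):
            hit = line[1:].strip()
        elif stripped.startswith("Score ="):
            # Each "Score =" header opens a new HSP for the current hit.
            current = Hsp(query=query, hit=hit)
            hsps.append(current)
        else:
            m = ALIGN_LINE.match(stripped)
            if m and current is not None:
                kind, start, end = m.group(1), int(m.group(2)), int(m.group(3))
                if kind == "Query":
                    if current.q_start is None:
                        current.q_start = start  # first row gives the start
                    current.q_end = end          # last row gives the end
                else:
                    if current.h_start is None:
                        current.h_start = start
                    current.h_end = end
    return hsps


# Made-up example input, not taken from the Daphnia dataset.
SAMPLE = """\
Query= contigA

>contigB
          Length = 2000

 Score = 50.1 bits (25), Expect = 1e-05
 Identities = 25/25 (100%)
 Strand = Plus / Minus

Query: 101 ACGTACGTACGTACGTACGTACGTA 125
           |||||||||||||||||||||||||
Sbjct: 900 ACGTACGTACGTACGTACGTACGTA 876
"""

for hsp in parse_hsps(SAMPLE):
    print(hsp.query, hsp.hit, hsp.q_start, hsp.q_end, hsp.h_start, hsp.h_end)
```

In this sketch a hit on the minus strand simply keeps its start coordinate larger than its end coordinate, which is enough information for a later drawing step to tell the two orientations apart.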
BOV provides a user-friendly web interface to visualize the standard BLAST output for investigating wide-ranging genomic problems, including single query and batch query datasets. In particular, this software is valuable to users interested in identifying regions of co-linearity, duplication, translocation, and inversion among sequences. A web server hosting BOV is accessible via http://bioportal.cgb.indiana.edu/cgi-bin/BOV/index.cgi and the software is freely available for local installations.
The basic local alignment search tool (BLAST), which allows for the comparison of similar sequences from the same species or across multiple species, has become one of the most popular bioinformatics programs used by biologists. This tool enables researchers to search their queries against sequence databases and produces an output of pair-wise alignments based on the query sequences and matching sequences from the database (referred to as hits). Although currently the majority of users utilize BLAST servers via the web which allow users to search against specialized databases (e.g., NCBI, PlantGDB), increasingly biologists are installing the BLAST program on their local computer in order to search against customized sequence collections (see published tutorials for running a locally installed BLAST program). As more sequence data have become accessible and the questions posed by genomicists have increased in complexity, additional programs have been developed to address some of the limitations of the basic BLAST capabilities, including those aimed at helping biologists post-process large BLAST outputs. Some examples include the MuSeqBox and BioParser programs, which can be used to flexibly select BLAST matching regions based on percent alignment coverage and identity, alignment scores, expectation value (E-value), and other attributes. In addition, the NuclearBLAST program and the PLAN web server allow users to store the plain BLAST text output in a relational database that enables advanced keyword searches for convenient data mining. Although the algorithms of such BLAST-utility programs are usually not sophisticated, their availability significantly improves the utility of the basic program and relieves the potential frustration of biologists who may otherwise be overwhelmed by having to manually analyze large sets of BLAST outputs.
In this same vein, we have developed a tool for the graphical output and visual analysis of matching regions between query and hit sequences identified by BLAST. This program is specifically designed to allow for the display of multiple regions of similarity between query and hit sequences which can be identified by BLAST, and has the additional benefit of handling single and batch query datasets. The typical BLAST output produces so-called High-scoring Segment Pairs (HSPs) that correspond to each matching region between the query and the database hit sequence. The query and hit coordinates of each HSP (i.e., where the matching region starts and ends on the query and hit sequences) are embedded in the BLAST pair-wise alignment. In a hypothetical example, when a genomic region that contains both gene X and gene Y from species A is compared to the orthologous genomic region from species B that contains gene X' and gene Y' (orthologs of gene X and Y, respectively), the BLAST output may contain two HSPs that represent the matching regions between the two orthologous gene pairs. The distribution of multiple HSPs can be highly complex if gene duplications, inversions, and/or rearrangements have occurred in the regions being compared. Even moderately complex multiple HSP distribution can be very difficult to interpret based on the raw BLAST pair-wise alignment output. Specifically, in order to find out which region of the query matches which region(s) of the hit sequence, the matching coordinates of each HSP must be mapped on both the query and hit sequences. Such a mapping process can be tedious if done by hand (e.g., manually extracting each set of matching coordinates and drawing them on pieces of paper). To automate HSP mapping, we have developed a web program that parses the HSP coordinates of an uploaded BLAST output to generate interactive maps which graphically display matching regions.
This tool can be used with typical single-query BLAST outputs obtained online or with batch query outputs generated by those users who have installed stand-alone BLAST programs locally on their computers.
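To make the idea of mapping HSP coordinates onto both sequences concrete, the following Python sketch converts one HSP into the four corner points of a quadrilateral joining a query track and a hit track. It is an illustration under stated assumptions, not BOV's drawing routine: the function names `scale` and `hsp_polygon`, the linear scaling, and the default canvas width and track positions are all hypothetical choices.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale(pos: int, seq_len: int, width: float) -> float:
    """Map a 1-based sequence position onto a horizontal track `width` units wide."""
    if seq_len <= 1:
        return 0.0
    return (pos - 1) / (seq_len - 1) * width


def hsp_polygon(q_start: int, q_end: int, h_start: int, h_end: int,
                q_len: int, h_len: int, width: float = 800.0,
                y_query: float = 50.0, y_hit: float = 250.0) -> List[Point]:
    """Return the corner points joining the query span to the hit span.

    The points run along the query track from start to end, then back
    along the hit track from end to start, so a same-orientation HSP
    yields a simple trapezoid and a reversed hit yields a crossed shape.
    """
    return [
        (scale(q_start, q_len, width), y_query),
        (scale(q_end, q_len, width), y_query),
        (scale(h_end, h_len, width), y_hit),
        (scale(h_start, h_len, width), y_hit),
    ]


# Toy example: query of length 201, hit of length 401, canvas width 400.
print(hsp_polygon(1, 101, 1, 201, q_len=201, h_len=401, width=400.0))
# -> [(0.0, 50.0), (200.0, 50.0), (200.0, 250.0), (0.0, 250.0)]
```

Because each sequence is scaled by its own length, two contigs of very different sizes occupy tracks of equal width, and the resulting point list can be handed to any polygon-drawing library.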
Results and discussion
Visualization tools are critical for interpreting data and making discoveries in the area of bioinformatics, and especially in the field of comparative genomics. Often, complex changes in genome structure and organization (e.g., gene duplications, inversions, and other rearrangements) are best identified, examined, and verified graphically rather than via the automated numerical or textual output of most computer programs. For whole-genome comparisons, sophisticated tools such as VISTA, GenAlyzer, SynBrowse, Sybil, Cinteny, and AutoGRAPH have been developed and are popular within the bioinformatics community. Many biologists, however, may find it inconvenient to either install the software (e.g., SynBrowse and its prerequisites can present a serious challenge to install and update, even for bioinformaticians), or to prepare required data input files (e.g., the specific GFF-format files required by SynBrowse may require computer programming). On the other hand, many biologists have already become adept at using the well-known BLAST program and have applied it to a diverse range of research questions. In fact, many comparative genomics projects are first initiated by a simple BLAST search. Despite the popularity of the BLAST program, there is no published visualization tool that converts the raw BLAST pair-wise alignment into a straightforward display plotting the positions of multiple HSPs. Such a visualization tool can allow biologists to easily and quickly investigate gene or genome structures in a comparative genomics context, without the potential hassles of having to invoke the heavyweight genome browsers mentioned above.
Although few visualization programs directly related to BLAST outputs are currently available, some do exist that attempt to satisfy the most common goals in comparative genomics research. For example, Durand et al. published the Visual BLAST program. Although its accompanying paper describes several useful features for analyzing BLAST outputs, the computer program was designed to run using the now obsolete Microsoft Windows 95/NT operating system and the web site hosting the program is no longer available. In addition, based on the description in their published paper, Visual BLAST does not provide the plotting function for HSPs available in BOV. Another program, BLAST2GENE, converts BLAST output into a graphical plot (although one very different from those produced by BOV). However, the BLAST2GENE program is only designed to compare one small gene sequence against a larger genomic region in order to identify all the similar gene copies in the latter. As a result, only an asymmetric diagram is plotted to indicate the HSP positions on the larger genomic region. In addition, its accompanying web server can only handle single-query BLAST outputs. In contrast, the BOV program treats both the query and hit sequence equivalently and both can be long sequences. In addition, the BOV server can process multi-query BLAST outputs and can produce a useful graphic for publication and presentation (easily available for download).
Originally, the development of BOV was prompted by our need to carefully analyze the BLAST comparisons among a set of Daphnia pulex genomic contigs (for an example of the application of BOV using this output, see fig. 1). After submitting our BLAST output to the server, a large-scale overview of the BLAST alignment regions between the assembled Daphnia genomic contig #1255 (total length 80,559 bp) and genomic contig #9748 (total length 129,917 bp) is produced (fig. 1c). Although the two contigs are quite long, the matching regions are limited to near the 3' portion of both contigs (fig. 1c). After zooming into the high-score matching region, some complicated matching patterns can be identified (fig. 1d, 1e). Five blocks of genomic regions show perfect co-linearity between these two contigs (colored trapezoids in fig. 1e, the query regions 59228–60123 in dark magenta, 60306–61235 in dark olive green, 62290–64492 in brown, and 64863–65497 in dark slate blue); five blocks of regions of contig #1255 show inversion and/or translocation in the corresponding regions of contig #9748 (colored lines in fig. 1e, query regions 57316–58420 in dark green, 59475–60704 in dark blue, 68330–69396 in dark cyan, 62874–63637 in dark orchid, and 69580–70221 in dark red). In addition, several regions of contig #9748 are duplicated in contig #1255 (e.g., the hit region 104119–105485 matches both the query region 59475–60704 and the query region 68998–70369, with the inversion of the query region 59475–60704). The plot of the matching regions (fig. 1e) makes it much more straightforward to fully understand the topology of complex genomic regions, including stretches of co-linearity, inversion, translocation, and duplication. In addition, using the tools embedded in BOV, it is possible to retrieve the sequence segments from the above matching regions for further study of their identities.
Out of the ten highlighted matching regions, four match to Daphnia EST sequences (e.g., the query region 62290–64492 in brown; data not shown) indicating that those HSP segments correspond to actively transcribed genic regions. Using BLAST to search for matches among these regions and the NCBI nr and nt databases, we find no hits to known genes or repetitive elements (data not shown) indicating that, although there may be functional significance to at least some of these regions, the identification of candidate homologous genes or functional domains will require further investigation.
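The distinction drawn above between co-linear blocks, inverted blocks, and duplicated hit regions can also be expressed programmatically once the HSP coordinates are available. The Python sketch below is one simple way to do so and is not part of BOV: the `Hsp` tuple, the helper names, and the coordinates are hypothetical, an HSP is treated as inverted when its query and hit coordinates run in opposite directions, and a duplication candidate is any pair of HSPs whose hit intervals overlap while their query intervals do not.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Hsp(NamedTuple):
    """Hypothetical HSP record holding only its four coordinates."""
    q_start: int
    q_end: int
    h_start: int
    h_end: int


def is_inverted(hsp: Hsp) -> bool:
    """True when query and hit coordinates run in opposite directions."""
    return (hsp.q_end - hsp.q_start) * (hsp.h_end - hsp.h_start) < 0


def span(a: int, b: int) -> Tuple[int, int]:
    """Return the interval as (low, high) regardless of orientation."""
    return (min(a, b), max(a, b))


def overlaps(x: Tuple[int, int], y: Tuple[int, int]) -> bool:
    """True when two closed intervals share at least one position."""
    return x[0] <= y[1] and y[0] <= x[1]


def duplicated_hit_regions(hsps: List[Hsp]) -> List[Tuple[Hsp, Hsp]]:
    """Pairs of HSPs that share a hit region but come from distinct query regions."""
    pairs = []
    for a, b in combinations(hsps, 2):
        same_hit = overlaps(span(a.h_start, a.h_end), span(b.h_start, b.h_end))
        same_query = overlaps(span(a.q_start, a.q_end), span(b.q_start, b.q_end))
        if same_hit and not same_query:
            pairs.append((a, b))
    return pairs


# Made-up coordinates for illustration only.
hsps = [
    Hsp(100, 200, 1000, 1100),  # co-linear
    Hsp(300, 400, 1100, 1000),  # inverted, reuses the same hit region
    Hsp(500, 600, 2000, 2100),  # co-linear, unrelated hit region
]
print([is_inverted(h) for h in hsps])  # -> [False, True, False]
print(duplicated_hit_regions(hsps))    # one pair: the first two HSPs
```

A sketch like this only flags candidates; telling a true duplication from a translocation or an assembly artifact still requires inspecting the plotted regions and the underlying alignments.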
We envision that BOV will be very useful for biologists interested in examining the evolution of gene structures (including intron/exon turnover across species), relationships among orthologous and paralogous genes, analysis of repetitive elements or tandem arrays, as well as the identification of regions of small or large scale synteny along chromosomes (including inversions, translocations, and gene duplications). The BOV tool provides a freely accessible web server for biologists with no software installation or maintenance required by the user. Users can simply upload their BLAST output and follow the intuitive web interface to visualize the mapping of the HSPs. Similar to other BLAST-utility programs, BOV is not sophisticated in its computer science algorithms and can be downloaded in its entirety for use and/or modification depending on specific analytical needs. For example, although the BOV program is designed to process BLAST outputs, the parsing component can be easily replaced to handle any other pair-wise alignment program using our drawing routines. We believe that the availability of BOV enriches a biologist's tool kit for effectively processing BLAST output and conducting comparative genomic research.
The BLAST program is widely used in genetic and genomic research. We have developed a web server, BOV, to provide a visualization tool for biologists to conveniently dissect the BLAST output for complex matching patterns. Our program allows for single- or batch-query manipulation, can be easily accessed and downloaded for use and modification, and provides a user-friendly web interface to interactively visualize the matching regions of query and database hit sequences.
Availability and requirements
The BOV program is freely accessible with a web browser at http://bioportal.cgb.indiana.edu/cgi-bin/BOV/index.cgi. The software is also available from the web site for local installation. We have made BOV portable across Linux and UNIX distributions, and compatible with BioPerl 1.4 (or higher) and MySQL 5.0 (or higher). We have tested the BOV installation on SunOS 5.11 (i386), Ubuntu Linux server v2.6.24, and Gentoo Linux x86_64 v2.6.23. BOV can be viewed with Firefox 1.5, Opera 9.27, Safari 3.0, Internet Explorer 7.0, or later versions of these browsers. The BOV website will be updated to contain the latest information on operating system and software compatibility.
Project name: BOV
Project home page: http://bioportal.cgb.indiana.edu/cgi-bin/BOV/index.cgi
Operating systems: Local installation requires Linux/UNIX.
License: The software is under the Apache license 2.0.
This work was supported in part by the Indiana METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. We would like to thank Phillip Steinbachs for maintaining the system at the production server.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402.
- Ye J, McGinnis S, Madden TL: BLAST: improvements for better sequence analysis. Nucleic Acids Res. 2006, 34 (Web Server issue): W6-9.
- Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD, Kurtz S, Lushbough C, Brendel V: Comparative plant genomics resources at PlantGDB. Plant Physiol. 2005, 139 (2): 610-618.
- Dong Q, Brendel V: Computational identification of related proteins: BLAST, PSI-BLAST, and other tools. The Proteomics Protocols Handbook. Edited by: Walker JM. 2005, Humana Press, 555-570.
- Xing L, Brendel V: Multi-query sequence BLAST output examination with MuSeqBox. Bioinformatics. 2001, 17 (8): 744-745.
- Catanho M, Mascarenhas D, Degrave W, de Miranda AB: BioParser: a tool for processing of sequence similarity analysis reports. Appl Bioinformatics. 2006, 5 (1): 49-53.
- Diener SE, Houfek TD, Kalat SE, Windham DE, Burke M, Opperman C, Dean RA: Alkahest NuclearBLAST: a user-friendly BLAST management and analysis system. BMC Bioinformatics. 2005, 6: 147.
- He J, Dai X, Zhao X: PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results. BMC Bioinformatics. 2007, 8: 53.
- Suyama M, Torrents D, Bork P: BLAST2GENE: a comprehensive conversion of BLAST output into independent genes and gene fragments. Bioinformatics. 2004, 20 (12): 1968-1970.
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12 (10): 1611-1618.
- Pan X, Stein L, Brendel V: SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics. 2005, 21 (17): 3461-3468.
- Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger JC: SynView: a GBrowse-compatible approach to visualizing comparative genome data. Bioinformatics. 2006, 22 (18): 2308-2309.
- Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I: VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004, 32 (Web Server issue): W273-279.
- Choudhuri JV, Schleiermacher C, Kurtz S, Giegerich R: GenAlyzer: interactive visualization of sequence similarities between entire genomes. Bioinformatics. 2004, 20 (12): 1964-1965.
- Crabtree J, Angiuoli SV, Wortman JR, White OR: Sybil: methods and software for multiple genome comparison and visualization. Methods Mol Biol. 2007, 408: 93-108.
- Sinha AU, Meller J: Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. BMC Bioinformatics. 2007, 8: 82.
- Derrien T, Andre C, Galibert F, Hitte C: AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps. Bioinformatics. 2007, 23 (4): 498-499.
- Durand P, Canard L, Mornon JP: Visual BLAST and visual FASTA: graphic workbenches for interactive analysis of full BLAST and FASTA outputs under MICROSOFT WINDOWS 95/NT. Comput Appl Biosci. 1997, 13 (4): 407-413.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.