The following section describes the main features of OpenGenomeBrowser. The reader may try them out at opengenomebrowser.bioinformatics.unibe.ch, where a freely accessible demo server with 70 bacterial genomes is hosted. On most pages, users may click on “Tools”, then “Get help with this page” to be redirected to a page that explains how the tool works and how to use it. Moreover, advanced configuration options are available on some pages. They can be accessed via a sidebar that opens when one clicks on the settings wheel (⚙) at the top right corner of the page.
Genomes table
Especially in large sequencing projects, it is vital that the data can be filtered and sorted according to metadata. This is the purpose of the genomes table view (Fig. 2), which serves as the entry point of OpenGenomeBrowser. By default, only the representative genomes are listed, and only the name of the organism, the genome identifier, the taxonomic name, and the sequencing technology are shown as columns. More than forty additional metadata columns can be added to the table dynamically. All columns can be used to filter and sort the data, which makes this view the ideal starting point for an analysis.
Detail views
The genome detail view (Fig. S1A) shows all available metadata of the respective genome and allows the user to download the associated files.
The gene detail view (Fig. S1B) is designed to facilitate easy interpretation of the putative functions of genes. It shows all annotations, their descriptions, the nucleotide and protein sequences, metadata from the GenBank file, and an interactive gene locus visualization rendered with DNA features viewer [20]. If the gene is annotated with a gene ontology term that represents a subcellular location, this location is highlighted on a SwissBioPics image [21].
Genomes in OpenGenomeBrowser can be labelled with tags, i.e., a short name (e.g., “halophile”) and a description (e.g., “extremophiles that thrive in high salt concentrations”). The tag detail view (Fig. S1C) shows the description of the tag and the genomes that are associated with it. Tags are particularly useful to quickly select groups of genomes in many tools of OpenGenomeBrowser. For example, to select all genomes with the tag “halophile”, the syntax “@tag:halophile” can be used.
Similarly, the TaxId detail view (Fig. S1D) shows all genomes that belong to the respective NCBI Taxonomy identifier (TaxId) [22], as well as the parent TaxId. Like tags, TaxIds can be used to select groups of genomes, in this case all genomes that belong to a certain taxon, like this: “@taxphylum:Firmicutes”, or simply “@tax:Firmicutes”.
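The selector syntax described above can be illustrated with a small parser. This is only a minimal sketch and not OpenGenomeBrowser's implementation: the `Genome` record, the `resolve_selector` function, and the genome identifiers are hypothetical, and the behaviour of “@tax:” (matching a name at any rank) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    """Hypothetical, simplified genome record (not OpenGenomeBrowser's data model)."""
    identifier: str
    tags: set = field(default_factory=set)
    # Maps a taxonomic rank (e.g. "phylum") to a name (e.g. "Firmicutes").
    lineage: dict = field(default_factory=dict)


def resolve_selector(selector, genomes):
    """Return identifiers of the genomes matched by a '@tag:' or '@tax...:' selector."""
    if not selector.startswith("@") or ":" not in selector:
        raise ValueError(f"not a selector: {selector!r}")
    kind, value = selector[1:].split(":", 1)
    if kind == "tag":
        return [g.identifier for g in genomes if value in g.tags]
    if kind == "tax":
        # Assumption: without a rank, the name may match at any rank.
        return [g.identifier for g in genomes if value in g.lineage.values()]
    if kind.startswith("tax"):
        rank = kind[len("tax"):]  # e.g. "taxphylum" -> "phylum"
        return [g.identifier for g in genomes if g.lineage.get(rank) == value]
    raise ValueError(f"unknown selector type: {kind!r}")


genomes = [
    Genome("G1", {"halophile"}, {"phylum": "Firmicutes"}),
    Genome("G2", set(), {"phylum": "Firmicutes"}),
    Genome("G3", {"halophile"}, {"phylum": "Actinobacteria"}),
]
print(resolve_selector("@tag:halophile", genomes))         # ['G1', 'G3']
print(resolve_selector("@taxphylum:Firmicutes", genomes))  # ['G1', 'G2']
```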
Gene comparison
The gene comparison view (Fig. 3) enables users to easily compute multiple sequence alignments and to compare gene loci side-by-side. Currently, the supported alignment algorithms are Clustal Omega [23], MAFFT [24], and MUSCLE [25]. Alignments are visualized using MSAViewer [26] (Fig. 3B). Furthermore, the genomic regions around the genes of interest can be analyzed using a customized implementation of DNA features viewer [20] (Fig. 3C). Figure 3 shows an alignment of all genes on the demo server that carry the annotation K01610 (phosphoenolpyruvate carboxykinase; from the pyruvate metabolism pathway). The gene loci comparison reveals that in all queried Lacticaseibacilli, the genes are located in syntenic regions, i.e., next to the same orthologous genes.
Annotation search
Although conceptually and technically straightforward, searching for annotations in a set of genomes can be tedious or even impossible for non-programmers. In OpenGenomeBrowser, annotation search is quick and easy, thanks to the PostgreSQL backend that allows fast processing of annotation information. In the annotation search view (Fig. 4), users can search for annotations in genomes, resulting in a coverage matrix (Fig. 4C) with one column per genome and one row per annotation. The number in each cell shows how many genes in that genome carry the respective annotation. Clicking on a cell shows the relevant genes (Fig. 4D), while clicking on an annotation enables users to compare the corresponding genes (gene comparison view).
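The structure of the coverage matrix can be sketched in a few lines of Python. This is an illustration only: OpenGenomeBrowser answers such queries through its PostgreSQL backend, whereas the `coverage_matrix` function, the flat tuple layout, and the genome and gene identifiers below are assumptions made for the example.

```python
from collections import Counter


def coverage_matrix(genes, annotations, genomes):
    """Count, for each annotation (row) and genome (column), the annotated genes.

    `genes` is an iterable of (genome, gene_id, set_of_annotations) tuples;
    this flat layout is an assumption made for the example.
    """
    wanted = set(annotations)
    counts = Counter()
    for genome, _gene_id, gene_annotations in genes:
        for annotation in wanted & set(gene_annotations):
            counts[(annotation, genome)] += 1
    # Missing (annotation, genome) pairs count as 0.
    return [[counts[(a, g)] for g in genomes] for a in annotations]


genes = [
    ("genome_A", "A_0001", {"K01610"}),
    ("genome_A", "A_0002", {"K01610", "K00001"}),
    ("genome_B", "B_0001", {"K00001"}),
]
matrix = coverage_matrix(genes, ["K01610", "K00001"], ["genome_A", "genome_B"])
print(matrix)  # [[2, 0], [1, 1]]
```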
Pathways
Pathway maps, particularly those from KEGG [27], are valuable tools to understand the metabolism of an organism. However, using them may be cumbersome. Commonly, biologists upload sequences to a service like BlastKOALA [28]. This service is designed to process one organism at a time, and calculations can take multiple hours. Because each genome must be submitted individually, processing multiple organisms is laborious. Furthermore, it is not trivial to visualize multiple genomes on a pathway map. In OpenGenomeBrowser, this process is straightforward (Fig. 5A-C), user-friendly, and fast, as the annotations are pre-calculated and loaded into the database beforehand. Pathway maps are interactive, which allows the user to explore this information in great detail (Fig. 5D-F). For example, to investigate the genes that are involved in a certain enzymatic step, one needs only to click on the enzyme box, then on an annotation of interest, and finally on “compare the genes” to be redirected to the gene comparison view.
While OpenGenomeBrowser does not include KEGG maps for licensing reasons, users with appropriate rights can generate them using a separate program [29]. The pathway maps do not necessarily have to be from KEGG. Pathway maps in a custom Scalable Vector Graphics (SVG) format may be added to a designated folder in the folder structure (not shown in Fig. 1).
Blast
OpenGenomeBrowser allows users to perform a local alignment of protein and nucleotide sequences using BLAST [4]. The results are visualized using the BlasterJS [30] library.
Trees
OpenGenomeBrowser computes three kinds of phylogenetic trees. The fastest type of tree is based on the NCBI TaxId that is recorded in the metadata. It is helpful for getting a quick taxonomic overview, but it depends entirely on the accuracy of the metadata.
The second type of tree is based on genome similarity. The assemblies of the selected genomes are compared to each other using GenDisCal-PaSiT6, a fast, hexanucleotide-frequency-based algorithm with accuracy similar to that of average nucleotide identity (ANI) based methods [31]. This algorithm yields a similarity matrix from which a dendrogram is calculated with the unweighted pair group method with arithmetic mean (UPGMA) algorithm [32]. We recommend this type of tree as a good compromise between speed and accuracy, particularly if many genomes are to be compared.
The third type of tree is based on the alignment of single-copy orthologous genes. This type of tree is calculated using the OrthoFinder [33] algorithm. Of the three tree types, it is the most time- and computation-intensive and requires pre-computed all-vs-all DIAMOND [34] searches.
Dot plot
A dot plot is a simple and established [35] method of comparing two genome assemblies. It allows the discovery of insertions, deletions, and duplications, especially in closely related genomes sequenced with long-read technologies. In OpenGenomeBrowser’s implementation of dot plot, the assemblies are aligned against each other using MUMmer [36] and visualized using the Dot library [37]. The resulting plot (Fig. 6) is interactive: the user can zoom in on regions of interest by drawing a rectangle with the mouse, and clicking on a gene opens a context menu with detailed information.
Gene trait matching
The gene trait matching view enables users to find annotations that correlate with a (binary) phenotypic trait. The input must consist of two non-intersecting sets of organisms that differ in a trait. OpenGenomeBrowser applies Fisher’s exact test to each orthologous gene and corrects for multiple testing (alpha = 10%) using the Benjamini-Hochberg method [38, 39]. The multiple testing parameters can be adjusted in the settings sidebar. The test can be used on orthogenes as well as on any other type of annotation, such as KEGG gene annotations. The candidate genes that may be causing the trait can easily be analyzed further, for example by using the gene comparison view.
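The statistical procedure can be sketched with the Python standard library alone. The functions below are a minimal illustration under stated assumptions, not OpenGenomeBrowser's code: the names `fisher_exact_two_sided` and `benjamini_hochberg`, the layout of the 2x2 table (rows: annotation present or absent; columns: organisms with or without the trait), and the toy counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    candidates = (probability(x) for x in range(low, high + 1))
    # Sum all tables that are at most as probable as the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in candidates if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values, alpha=0.10):
    """Return one boolean per p-value: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= k / m * alpha; reject ranks 1..k.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Toy input: 5 organisms with the trait, 5 without.
# Annotation X occurs in 5/5 vs 0/5 organisms, annotation Y in 3/5 vs 2/5.
p_values = [
    fisher_exact_two_sided(5, 0, 0, 5),
    fisher_exact_two_sided(3, 2, 2, 3),
]
print(benjamini_hochberg(p_values, alpha=0.10))  # [True, False]
```

In this toy example, only annotation X, which separates the two groups perfectly, remains significant after correction.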
Flower plot
The flower plot view provides users with a simple overview of the shared genomic content of multiple genomes. The genomes are displayed as petals of a flower. Each petal indicates the number of annotations that are unique to this genome and the number of genes that are shared by some but not all other genomes. The number of genes shared by all genomes is indicated in the center of the flower. The code is also available as a standalone Python package [40].
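The three quantities shown in a flower plot can be derived from per-genome annotation sets as sketched below. This is a hedged illustration rather than the code of the standalone package [40]: the `flower_counts` function and the example identifiers are hypothetical, and the sketch counts annotations throughout.

```python
from collections import Counter


def flower_counts(genome_to_annotations):
    """Return (core, petals) for a flower plot.

    `core` is the number of annotations present in all genomes. `petals` maps
    each genome to its number of unique annotations and to the number of
    annotations it shares with some, but not all, other genomes.
    """
    n_genomes = len(genome_to_annotations)
    # In how many genomes does each annotation occur?
    occurrence = Counter(
        a for annotations in genome_to_annotations.values() for a in set(annotations)
    )
    core = sum(1 for count in occurrence.values() if count == n_genomes)
    petals = {}
    for genome, annotations in genome_to_annotations.items():
        unique = sum(1 for a in set(annotations) if occurrence[a] == 1)
        shared = sum(1 for a in set(annotations) if 1 < occurrence[a] < n_genomes)
        petals[genome] = {"unique": unique, "shared": shared}
    return core, petals


core, petals = flower_counts({
    "genome_A": {"og1", "og2", "og3"},
    "genome_B": {"og1", "og2", "og4"},
    "genome_C": {"og1", "og5"},
})
print(core)                # 1
print(petals["genome_A"])  # {'unique': 1, 'shared': 1}
print(petals["genome_C"])  # {'unique': 1, 'shared': 0}
```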
Downloader
The downloader view facilitates the convenient download of multiple raw data files, for example all protein FASTA files for a set of organisms.
Admin panel
OpenGenomeBrowser has a powerful user authentication system and admin interface, inherited from the Django framework. Instances of OpenGenomeBrowser can be configured to require a login or to allow basic access to anonymous users. Users can be given specific permissions, for example to create other user accounts, to edit metadata of organisms, genomes, and tags, and even to upload new genomes through the browser.
Resource requirements
OpenGenomeBrowser is not resource intensive. An instance containing over 1400 bacterial genomes runs on a computer with 8 CPU cores (2.4 GHz) and 20 GB of RAM. The Docker container is about 3 GB in size and the PostgreSQL database takes 21 GB of storage (SSD recommended).