Figure 1 | BMC Genomics

Figure 1

From: Gene duplications in prokaryotes can be associated with environmental adaptation

General data flow in the analysis. For each organism the proteome was extracted and paralogs were identified with BLAST searches. The resulting paranomes were analysed by plotting the number of genes with paralogs against genome length (Figure 2) and by examining the distribution of COG categories (Figure 3). The paranomes were then tested for statistical overrepresentation of annotation terms using DAVID. The DAVID output was first analysed as a graph (Figure 4), in which the overlap between the complete annotation lists from DAVID defined pairwise similarities between species. Clusters in the graph were identified by visual inspection (Table 1). For a more stringent analysis only GO terms were used, and these were analysed by biclustering a matrix of individual GO term occurrences for each species. First, results based on three different background models for DAVID were compared (Figure 5). Clusters for the optimal background model were then analysed for GO terms frequently associated with specific species (Table 2). Interesting clusters in the biclustering were identified by comparing GO similarity with genomic distance (Figure 6), and selected clusters that combined high GO similarity with large evolutionary distance (Table 3) were discussed in relation to literature data.
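The two data structures named in the caption, pairwise similarities from overlapping annotation lists and a species-by-GO-term occurrence matrix, can be illustrated with a minimal Python sketch. The caption does not say how overlap was scored, so the Jaccard index used here is an assumption, and the function names, species labels and GO identifiers are hypothetical placeholders rather than data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap score between two annotation lists (assumed measure, not from the paper)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(annotations):
    """Map each unordered species pair to the overlap score of their term sets."""
    return {
        (s1, s2): jaccard(annotations[s1], annotations[s2])
        for s1, s2 in combinations(sorted(annotations), 2)
    }


def occurrence_matrix(annotations):
    """Build a binary species x term matrix (1 = term reported for that species)."""
    species = sorted(annotations)
    terms = sorted(set().union(*annotations.values())) if annotations else []
    matrix = [
        [1 if term in set(annotations[s]) else 0 for term in terms]
        for s in species
    ]
    return species, terms, matrix


# Hypothetical toy input: enriched terms per species (illustrative only).
toy = {
    "species_A": {"GO:0006810", "GO:0003677"},
    "species_B": {"GO:0006810", "GO:0016020"},
    "species_C": {"GO:0003677"},
}

similarities = pairwise_similarity(toy)
species, terms, matrix = occurrence_matrix(toy)
```

The similarity dictionary could serve as edge weights for a species graph, while the binary matrix is the kind of input a biclustering method would take; which biclustering algorithm the authors applied is not stated in the caption.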