Legend for Figure 9. A heatmap of the number of ORFs with an InterPro entry, for the top 100 entries from Figure 8a. In the main heatmap, the colour intensity of each cell shows the number of ORFs carrying a given InterPro entry (rows) in a given species (columns). Both rows and columns are ordered by hierarchical clustering so that similar rows or columns are grouped together. Columns were clustered by ORF counts, whereas rows were clustered by the PCA loadings of the entries (left-side heatmap and Figure 8a). The dendrogram from the hierarchical clustering is shown for the columns, and the phylum of each species is indicated by a column colour bar between the heatmap and the dendrogram. Under the heatmap, each species is identified by an abbreviation explained in Table 1. The left-side heatmap shows the loading of each entry as in Figure 8a. The InterPro entry identifier ("IPR id.", with the leading "IPR0" removed), the name ("IPR name") and the "Author assignment" are shown for each entry. The "Author assignment" assigns each entry to a general theme that summarises the individual categories based on the InterPro database. The other assignments are based directly on InterPro, whereas "Secondary metabolism" covers entries that are also known to participate in secondary metabolism ([61, 62] and InterPro). InterPro entries assigned to "Dubious" are entries that InterPro itself considers unreliable.
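The row and column ordering described in the legend can be illustrated with a minimal sketch. The legend does not state the linkage method or the distance metric, so average linkage with Euclidean distance is an assumption here. The function names (cluster_order, order_heatmap) and the toy counts and loadings are hypothetical and serve only as illustration; they are not the data or the code behind Figure 9.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list


def cluster_order(observations, method="average", metric="euclidean"):
    """Return the dendrogram leaf order for the rows of `observations`.

    The linkage method and metric are assumptions; the legend does not name them.
    """
    data = np.asarray(observations, dtype=float)
    if data.ndim == 1:
        # A single loading per entry: treat it as one-dimensional observations.
        data = data.reshape(-1, 1)
    tree = linkage(data, method=method, metric=metric)
    return leaves_list(tree).tolist()


def order_heatmap(counts, loadings):
    """Reorder an entries-by-species count matrix for display.

    Rows (entries) are ordered by clustering their PCA loadings,
    columns (species) by clustering their ORF counts.
    """
    counts = np.asarray(counts, dtype=float)
    row_order = cluster_order(loadings)   # entries, clustered by loadings
    col_order = cluster_order(counts.T)   # species, clustered by ORF counts
    return counts[np.ix_(row_order, col_order)], row_order, col_order


# Toy data (hypothetical): 3 entries (rows) x 4 species (columns).
counts = np.array([
    [10, 1, 11, 0],
    [8, 0, 9, 1],
    [0, 7, 1, 6],
])
loadings = np.array([0.9, -0.8, 0.85])  # one hypothetical PCA loading per entry

ordered, row_order, col_order = order_heatmap(counts, loadings)
print("row order:", row_order)
print("column order:", col_order)
print(ordered)
```

Clustering rows on the loadings rather than on the counts keeps the row order consistent with the left-side loading heatmap, while the columns still group species with similar ORF count profiles.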