Fig. 1 | BMC Genomics

Fig. 1

From: Trends in genome dynamics among major orders of insects revealed through variations in protein families


Hierarchical clustering of protein sequences from completely sequenced genomes. (a) ProtoLevel (PL) is a normalized measure of the time of the clustering procedure: all leaves have PL = 0 and the root cluster has PL = 100. Cuts at predetermined PL thresholds are shown (dashed lines). At any given cut, the clusters form a collection of disjoint families. A higher PL value corresponds to a smaller number of protein families. Empty circles mark proteins that belong to the family but are not annotated by an external expert system (e.g., Pfam). Root superfamilies (Root SFs) are the clusters at the top of the hierarchy, obtained by pruning the binary tree at PL99. The total numbers of ProtoBug families, Root SFs, and proteins that have no external annotations are shown. (b) Size distribution of the protein families from 18 complete arthropod proteomes. The histogram of protein families is ordered by family size. The blue bars show families of size 18 and its multiples (e.g., 36, 54). All families with >100 proteins each are combined.
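The cutting step described in the caption can be illustrated with a minimal sketch. It is not the ProtoBug implementation. The names `Node`, `cut`, `size_histogram`, and `is_genome_multiple` are hypothetical, and the PL values in the toy tree are invented for illustration. The sketch assumes a binary merge tree in which every cluster carries a PL value, with leaves at PL = 0 and the root at PL = 100.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A cluster in a binary merge tree (hypothetical structure).

    Leaves are single proteins with PL = 0; the root has PL = 100.
    """
    pl: float
    name: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def leaves(self) -> List[str]:
        # A leaf has no children and returns its own protein name.
        if self.left is None and self.right is None:
            return [self.name]
        return self.left.leaves() + self.right.leaves()


def cut(node: Node, threshold: float) -> List[List[str]]:
    """Return the disjoint families obtained by cutting the tree at `threshold`.

    A cluster is kept whole if it was formed at or below the threshold;
    otherwise it is split into its two children.
    """
    if node.pl <= threshold or node.left is None:
        return [node.leaves()]
    return cut(node.left, threshold) + cut(node.right, threshold)


def size_histogram(families: List[List[str]]) -> Counter:
    """Count how many families have each size."""
    return Counter(len(family) for family in families)


def is_genome_multiple(size: int, n_genomes: int = 18) -> bool:
    """True for family sizes that are positive multiples of the genome count."""
    return size > 0 and size % n_genomes == 0


# Toy tree with invented PL values (assumption, for illustration only).
p1, p2, p3, p4, p5 = (Node(0, name=f"p{i}") for i in range(1, 6))
a = Node(20, left=p1, right=p2)
b = Node(40, left=a, right=p3)
c = Node(30, left=p4, right=p5)
root = Node(100, left=b, right=c)

for threshold in (25, 35, 99, 100):
    families = cut(root, threshold)
    print(threshold, len(families), dict(size_histogram(families)))
```

In this toy tree, raising the threshold can only merge clusters, so the number of families never increases as PL grows, which matches the relationship stated in the caption. A helper such as `is_genome_multiple` mirrors the way panel b highlights family sizes of 18 and its multiples.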
