Gene family classification strategy. (A) The Jaccard index (J) as a measure of the overlap between a trusted reference gene family (right ellipse) and a predicted gene family (left ellipse). (B) Proteins of a known reference gene family (black) are hierarchically clustered together with all other proteins of one or more species (white). From the resulting hierarchical tree, the best-matching cluster (the sub-tree with the highest Jaccard index; dashed rectangle, here J = 0.83) is extracted to represent the gene family. All proteins in this cluster are predicted to be members of this gene family.
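The selection step in panel (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the hierarchical tree has already been built and is given as nested pairs of protein IDs. It uses the standard Jaccard index J = |A ∩ B| / |A ∪ B| between the reference family and the leaf set of each sub-tree, then returns the sub-tree with the highest J. The names `jaccard`, `subtrees` and `best_matching_cluster` are hypothetical, and so are the toy protein IDs.

```python
from typing import FrozenSet, List, Tuple, Union

# Assumption: a tree node is either a leaf (a protein ID string)
# or a pair of child nodes produced by hierarchical clustering.
Tree = Union[str, Tuple["Tree", "Tree"]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subtrees(node: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every sub-tree, in post-order (this node's set is last)."""
    if isinstance(node, str):
        return [frozenset([node])]
    left, right = node
    left_sets = subtrees(left)
    right_sets = subtrees(right)
    # The last entry of each child's list is that child's full leaf set.
    here = left_sets[-1] | right_sets[-1]
    return left_sets + right_sets + [here]


def best_matching_cluster(
    tree: Tree, reference: FrozenSet[str]
) -> Tuple[FrozenSet[str], float]:
    """Return the sub-tree leaf set with the highest Jaccard index to the reference."""
    best = max(subtrees(tree), key=lambda s: jaccard(s, reference))
    return best, jaccard(best, reference)


# Toy example (hypothetical IDs): R* are reference-family proteins, X* are other proteins.
reference = frozenset({"R1", "R2", "R3", "R4", "R5"})
tree = (((("R1", "R2"), ("R3", "X1")), ("R4", "R5")), ("X2", "X3"))
cluster, j = best_matching_cluster(tree, reference)
print(sorted(cluster), round(j, 2))
```

In this toy tree the best sub-tree contains all five reference proteins plus `X1`, giving J = 5/6 ≈ 0.83. `X1` would therefore be predicted as a new member of the family, while the root cluster (J = 5/8) is rejected because it also includes `X2` and `X3`.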