Table 5 Gene set quality measurements, including deviation of protein size from the group median and maximal bit score per species in pairwise comparisons within the arthropod orthology groups. The bit score reflects both gene model artefacts among alternative gene sets within a species and evolutionary divergence. Protein sizes may be more evolutionarily conserved, and so may detect artefacts both across and within species (see footnote a)

From: OGS2: genome re-annotation of the jewel wasp Nasonia vitripennis

| Gene set | Average homology bit score | Protein size deviation from median | Percent shorter than 2 standard deviations from median |
|---|---|---|---|
| Nasonia OGS2 | 727.6 | −7.7 | 3.2 |
| Nasonia NCBI | 722.3 | −7.8 | 2.7 |
| Nasonia OGS1.2 | 683.5 | −12.7 | 4 |
| Apis | 733.9 | −0.3 | 2.4 |
| Harpegnathos | 694.3 | −30 | 7.3 |
| Tribolium | 552 | −26.1 | 4.5 |
| Drosophila | 508.7 | 54.5 | 1.3 |

a. For each orthology group, the median protein size of all genes from all species in the group is determined. Then, for each species gene set, the maximal BLASTp bit score of a gene within that group is recorded as metric #1, and the difference between the protein size of that maximal match and the group median is recorded as metric #2. These metrics are averaged over all groups per species and reported as the average bit score, the average size deviation, and the percentage of size outliers (more than 2 standard deviations below the median size). These gene set quality measurements are provided by the Evigene scripts “eval_orthogroup_genesets.pl” and “orthomcl_tabulate.pl”. Partial gene models are a common artefact of draft gene sets, indicated by both a negative deviation from the group median size and a larger percentage of outliers. A similar calculation is part of the OrthoDB methodology [108].
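The calculation described in the footnote can be sketched in a few lines of Python. This is an illustration only, not the Evigene scripts themselves: the function name `geneset_quality` and the input layout (tuples of orthology group, species, bit score, protein size) are hypothetical. The footnote does not say which standard deviation defines an outlier, so the sketch assumes the population standard deviation of protein sizes within each group.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def geneset_quality(records):
    """Summarise gene set quality per species.

    records: iterable of (group_id, species, bit_score, protein_size).
    This layout is a hypothetical one chosen for the sketch.
    """
    # Collect all genes of all species by orthology group.
    groups = defaultdict(list)
    for group_id, species, bit_score, size in records:
        groups[group_id].append((species, bit_score, size))

    per_species = defaultdict(
        lambda: {"scores": [], "deviations": [], "outliers": 0}
    )
    for members in groups.values():
        sizes = [size for _, _, size in members]
        group_median = median(sizes)
        # Assumption: outliers are judged against the within-group
        # population standard deviation of protein sizes.
        group_sd = pstdev(sizes)

        # Keep the gene with the maximal bit score for each species.
        best = {}
        for species, bit_score, size in members:
            if species not in best or bit_score > best[species][0]:
                best[species] = (bit_score, size)

        for species, (bit_score, size) in best.items():
            stats = per_species[species]
            stats["scores"].append(bit_score)              # metric #1
            stats["deviations"].append(size - group_median)  # metric #2
            if group_sd > 0 and size < group_median - 2 * group_sd:
                stats["outliers"] += 1

    # Average over all groups in which the species has a gene.
    return {
        species: {
            "avg_bit_score": mean(s["scores"]),
            "avg_size_deviation": mean(s["deviations"]),
            "pct_short_outliers": 100.0 * s["outliers"] / len(s["scores"]),
        }
        for species, s in per_species.items()
    }
```

Under these assumptions, a gene set with many partial models would show a negative `avg_size_deviation` and a raised `pct_short_outliers`, which is the pattern the table reports for the draft gene sets.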