Table 1 Effect of gene predictions

From: Microbial comparative pan-genomics using binomial mixture models

| Data set | Observed | ORFans | Chao | Bin. mix. |
|---|---|---|---|---|
| Original NCBI | 12599 | 5438 | 26614 | 42640 |
| Reduced 10% | 11273 | 4470 | 22549 | 32528 |
| Reduced 50% | 9336 | 3272 | 17083 | 27456 |
| Easygene | 9211 | 3121 | 17041 | 29818 |

  1. The number of observed gene families in each data set, the number of ORFans (gene families found in only one genome), and the Chao and binomial mixture ("Bin. mix.") estimates of pan-genome size, for the original E. coli data and for the reduced data sets. "Reduced 10%" means that the shortest 10% of hypothetical proteins were removed from the original data set; "Reduced 50%" is defined correspondingly. "Easygene" is a new data set with genes predicted by the Easygene gene prediction tool.
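As a rough illustration of how a Chao-type estimate relates to the observed and ORFan counts, the sketch below uses the standard Chao lower bound: observed families plus the squared number of singletons divided by twice the number of doubletons. It is an assumption that the "Chao" column uses exactly this form. The table does not report the number of gene families found in exactly two genomes, so the `doubletons` argument and the counts in the usage line are hypothetical, and `chao_lower_bound` is a name chosen for this example only.

```python
def chao_lower_bound(observed: int, singletons: int, doubletons: int) -> float:
    """Chao lower-bound estimate of the total number of gene families.

    observed   -- gene families seen in at least one genome
    singletons -- families seen in exactly one genome (ORFans)
    doubletons -- families seen in exactly two genomes (not listed in the table)
    """
    if doubletons <= 0:
        # This form of the estimator is undefined without doubletons.
        raise ValueError("doubletons must be positive")
    return observed + singletons ** 2 / (2 * doubletons)


# Hypothetical counts for illustration only; they are not taken from Table 1.
estimate = chao_lower_bound(observed=100, singletons=20, doubletons=10)
print(estimate)
```

Because the singleton count enters squared, removing ORFans lowers such an estimate faster than it lowers the observed count, which matches the direction of the changes across the reduced data sets in the table.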