Table 1 Effect of gene predictions

From: Microbial comparative pan-genomics using binomial mixture models

Data set        Observed   ORFans   Chao    Bin. mix.
Original NCBI   12599      5438     26614   42640
Reduced 10%     11273      4470     22549   32528
Reduced 50%     9336       3272     17083   27456
Easygene        9211       3121     17041   29818
The number of observed gene families in each data set, the number of ORFans (gene families found in one genome only), and the Chao and binomial mixture estimates of pan-genome size, for the original E. coli data and for the reduced data sets. "Reduced 10%" means that the shortest 10% of hypothetical proteins were removed from the original data set; "Reduced 50%" is defined correspondingly. "Easygene" is a new data set with genes predicted by the Easygene gene prediction tool.
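
As a rough illustration of how a Chao-type estimate relates to the observed and ORFan counts, the sketch below assumes the classic Chao lower-bound form: observed families plus the squared number of families seen in exactly one genome, divided by twice the number seen in exactly two genomes. The table does not report the count of families found in exactly two genomes, so the function name `chao_lower_bound`, the parameter names, and the toy input values are all assumptions made for illustration, not figures from the study.

```python
def chao_lower_bound(observed: int, singletons: int, doubletons: int) -> float:
    """Chao-type lower bound on the total number of gene families.

    observed   -- gene families seen in at least one genome
    singletons -- families seen in exactly one genome (ORFans)
    doubletons -- families seen in exactly two genomes (not given in the table)
    """
    if doubletons <= 0:
        raise ValueError("doubletons must be positive for this form of the estimate")
    # Unseen families are estimated as singletons^2 / (2 * doubletons).
    return observed + singletons ** 2 / (2 * doubletons)


# Toy counts, chosen only to make the arithmetic easy to follow.
estimate = chao_lower_bound(observed=100, singletons=20, doubletons=10)
print(estimate)  # 100 + 400 / 20 = 120.0
```

Under this form, many ORFans relative to two-genome families pushes the estimate well above the observed count, which matches the direction of the gap between the Observed and Chao columns.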