Data set | Observed | ORFans | Chao | Binomial mixture |
---|---|---|---|---|
Original NCBI | 12599 | 5438 | 26614 | 42640 |
Reduced 10% | 11273 | 4470 | 22549 | 32528 |
Reduced 50% | 9336 | 3272 | 17083 | 27456 |
Easygene | 9211 | 3121 | 17041 | 29818 |
- Number of observed gene families, number of ORFans (gene families found in only one genome), Chao estimates, and binomial mixture estimates of pan-genome size for the original E. coli data and for the reduced data sets. "Reduced 10%" means that the shortest 10% of hypothetical proteins were removed from the original data set; "Reduced 50%" is defined correspondingly. "Easygene" is a new data set in which genes were predicted with the Easygene gene prediction tool.
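
To make the relationship between the Observed, ORFans, and Chao columns concrete, the following is a minimal sketch that assumes the Chao column uses the standard Chao lower bound, N = n + y1² / (2·y2), where n is the number of observed gene families, y1 the number of families found in exactly one genome (the ORFans), and y2 the number found in exactly two genomes. The table does not report y2, so the function names and the back-solved doubleton count below are illustrative assumptions, not values taken from the source.

```python
def chao_lower_bound(observed: int, singletons: int, doubletons: int) -> float:
    """Chao lower-bound estimate of the total number of gene families.

    observed   -- gene families seen in at least one genome
    singletons -- families seen in exactly one genome (ORFans)
    doubletons -- families seen in exactly two genomes (not given in the table)
    """
    if doubletons <= 0:
        raise ValueError("the Chao lower bound needs at least one doubleton")
    return observed + singletons ** 2 / (2 * doubletons)


def implied_doubletons(observed: int, singletons: int, chao: float) -> float:
    """Back-solve the doubleton count that the formula implies.

    Only meaningful under the assumption that `chao` was computed with
    the standard lower-bound formula above.
    """
    unseen = chao - observed  # estimated number of families not yet observed
    if unseen <= 0:
        raise ValueError("the Chao estimate must exceed the observed count")
    return singletons ** 2 / (2 * unseen)


# Rows of the table: (observed, ORFans, Chao estimate)
rows = {
    "Original NCBI": (12599, 5438, 26614),
    "Reduced 10%": (11273, 4470, 22549),
    "Reduced 50%": (9336, 3272, 17083),
    "Easygene": (9211, 3121, 17041),
}

for name, (observed, orfans, chao) in rows.items():
    y2 = implied_doubletons(observed, orfans, chao)
    print(f"{name}: implied doubletons ~ {y2:.1f}, ORFan share = {orfans / observed:.1%}")
```

Because the estimate grows with the square of the ORFan count, removing short hypothetical proteins, many of which are ORFans, lowers the Chao estimate considerably more than it lowers the observed count.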