Comparison of unclassified proteins in published databases. The reference sets are: (1) all bacteria, using inverse homology; (2) all bacteria; (3) motile bacteria; (4) Proteobacteria; (5) selected bacteria. For References 2 through 5, profile similarity was measured with the Pearson correlation coefficient on binary profiles. For Reference 1, the Pearson correlation coefficient was applied to inverse homology. A Pearson correlation coefficient of r > 0.8 was used as the confidence threshold, and an E-value of 10^-5 as the threshold for BLAST. Classified proteins are shown in green; unclassified proteins in yellow.
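As an illustration of the similarity measure named in the caption, the following minimal sketch computes the Pearson correlation coefficient between two binary presence/absence profiles and applies the r > 0.8 cutoff. The function names (`pearson`, `is_similar`), the example profiles, and the handling of constant profiles are assumptions made for illustration only; they are not taken from the original analysis.

```python
from math import sqrt

R_THRESHOLD = 0.8  # cutoff stated in the caption (r > 0.8)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a profile that is all 0s or all 1s has undefined
        # correlation; this sketch treats it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def is_similar(profile_a, profile_b, threshold=R_THRESHOLD):
    """True when two profiles correlate above the threshold."""
    return pearson(profile_a, profile_b) > threshold


# Hypothetical profiles: 1 = homolog present in a genome, 0 = absent.
protein_a = [1, 1, 1, 1, 0, 0, 0, 0]
protein_b = [1, 1, 1, 0, 0, 0, 0, 0]

r = pearson(protein_a, protein_b)
print(f"r = {r:.3f}, similar: {is_similar(protein_a, protein_b)}")
```

In this hypothetical example the two profiles differ in a single genome out of eight, which already lowers r below the 0.8 cutoff, so the pair would not be counted as similar.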