Comparison of different combinations of reference genome sets and E-value thresholds: a) E-3, b) E-5, c) E-10, d) E-15. Positive predictive value (PPV) was calculated using COG functional categories as described in the Methods section. "Random" indicates protein-protein interactions generated by scrambling protein identities for each reference genome set. In Additional file 1, PPV was calculated using EcoCyc.