Design, components, and generation of ComPIL databases. a ComPIL uses three databases that are generated from an input protein FASTA file. MassDB contains peptide sequences organized by distinct mass; ProtDB contains protein information; SeqDB contains distinct peptide sequences along with their parent proteins (mapped to ProtDB). b Public protein repositories and the number of proteins from each that were incorporated into ComPIL. Numbers shown above the columns are in millions. c 1) Protein data from the repositories shown in b were combined into a single FASTA file, and the protein records were imported into ProtDB. 2) Proteins were digested in silico into peptides using trypsin specificity. 3) Peptides were sorted by sequence or by mass so that peptides with identical sequences or identical masses, respectively, became adjacent. 4) Peptides with identical sequences or masses were grouped into JSON objects, which were imported into MongoDB as SeqDB or MassDB, respectively. For implementation details, see Additional file 1: Methods.
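
Steps 1 to 4 of panel c can be illustrated with a minimal Python sketch. It is not the ComPIL implementation (see Additional file 1: Methods for that). The function names, the document field names (`_id`, `defline`, `parents`, `peptides`), the "no cleavage before proline" rule, the absence of missed cleavages, and the rounding of masses to 3 decimal places are all assumptions made for illustration. The sketch only builds JSON-ready objects in memory; loading them into MongoDB is omitted.

```python
import json
from collections import defaultdict

# Standard monoisotopic residue masses in Da, plus one water per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide without a C-terminal K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass, rounded to 3 decimals (assumed grouping key)."""
    return round(sum(RESIDUE_MASS[aa] for aa in peptide) + WATER, 3)


def build_documents(proteins):
    """proteins: list of (defline, sequence) pairs, e.g. parsed from FASTA."""
    # Step 1: one ProtDB record per protein.
    prot_db = [
        {"_id": i, "defline": name, "sequence": seq}
        for i, (name, seq) in enumerate(proteins)
    ]
    # Step 2: digest every protein and remember each peptide's parents.
    parents_by_seq = defaultdict(set)
    for doc in prot_db:
        for pep in tryptic_digest(doc["sequence"]):
            parents_by_seq[pep].add(doc["_id"])
    # Steps 3-4 (SeqDB): one object per distinct sequence, sorted by sequence.
    seq_db = [
        {"_id": pep, "parents": sorted(ids)}
        for pep, ids in sorted(parents_by_seq.items())
    ]
    # Steps 3-4 (MassDB): one object per distinct mass, sorted by mass.
    peps_by_mass = defaultdict(list)
    for pep in parents_by_seq:
        peps_by_mass[peptide_mass(pep)].append(pep)
    mass_db = [
        {"_id": mass, "peptides": sorted(peps)}
        for mass, peps in sorted(peps_by_mass.items())
    ]
    return prot_db, seq_db, mass_db


if __name__ == "__main__":
    # Two toy proteins that share one peptide and contain an isobaric pair.
    toy = [("protA", "MKAGR"), ("protB", "GARMK")]
    for collection in build_documents(toy):
        print(json.dumps(collection))
```

In this toy input, both proteins yield the peptide `MK`, so its SeqDB object lists two parents, while `AGR` and `GAR` have the same composition and therefore fall into a single MassDB object.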

