Table 2 Size of proteomic, simulated proteogenomic, and real proteogenomic databases for human

From: Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification

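| Category | Database (target + decoy) | # Target (AA) | # Decoy (AA) |
| --- | --- | --- | --- |
| Proteomic | 1Th + 1Dh | 35,856,033 | 35,856,033 |
| Simulated proteogenomic | 1T1Dh + 2Dh | 71,712,066 | 71,712,066 |
| Simulated proteogenomic | 1T2Dh + 3Dh | 107,568,099 | 107,568,099 |
| Simulated proteogenomic | 1T5Dh + 6Dh | 215,136,198 | 215,136,198 |
| Real proteogenomic | 6FTTh + 6FTDh | 2,136,069,837 | 2,136,069,837 |
| Real proteogenomic | SGTh + SGDh | 123,364,545 | 123,364,545 |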

  1. Database sizes are measured as the total length, in amino acids (AA), of the contained peptides. 1Th: human reference protein database. nDh: decoy database whose size is n times that of 1Th. 6FTTh: proteogenomic database constructed by 6-frame translation of the human genome. 6FTDh: decoy database for 6FTTh. SGTh: proteogenomic database constructed from splicing information in human RNA sequencing data. SGDh: decoy database for SGTh.
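
The size measure in the footnote (total length in AA) and the arithmetic behind the simulated rows can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `total_residues` and `simulated_sizes` are hypothetical, the FASTA parsing assumes plain sequence lines with no stop-codon symbols, and reading "1TnDh" as a target of size (1 + n) times 1Th is an assumption inferred from the numbers in the table.

```python
import io

# Size of 1Th in amino acids, taken from the table.
SIZE_1TH = 35_856_033


def total_residues(fasta_handle):
    """Sum the sequence lengths (in residues) over all records in a FASTA stream."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and header lines; everything else is sequence.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def simulated_sizes(n):
    """Return (target, decoy) sizes in AA for the row '1TnDh + (n+1)Dh'.

    Assumption: the target is 1Th plus n blocks of 1Th size, and the
    decoy is (n + 1) times the size of 1Th, which matches the table.
    """
    target = (1 + n) * SIZE_1TH
    decoy = (n + 1) * SIZE_1TH
    return target, decoy


# Toy FASTA with two records: 5 + 2 residues.
example = io.StringIO(">p1\nMKT\nAA\n>p2\nGG\n")
print(total_residues(example))   # 7
print(simulated_sizes(1))        # (71712066, 71712066)
print(simulated_sizes(5))        # (215136198, 215136198)
```

Under this reading, the three simulated rows are 2, 3, and 6 times the size of the proteomic database, for target and decoy alike.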