The bioinformatics workflow of the proposed method. A: Peptide spectral matching against the 'synthetic metaproteome' yields peptide hits. The ORFs encoding the matched peptide sequences are used to select a subset of the metagenomic sequence pool (here, the MetaHIT sequence repository). The two-step selection (megablast, then blastx) reduces the computational load, which allows the metagenomic pool to be large. The selected subset is naively translated and used as the search space for a second round of peptide spectral matching, which generates set 2 of peptide hits. Many elements of the workflow can be adapted to other metaproteomics studies.
B: Alignment of naive translations of selected traces with a synthetic metagenome starting sequence assigned to COG074. Underlined: domains of the synthetic metagenome starting sequence; red: peptide-spectrum matches obtained with the synthetic metagenome peptide database; blue: variant peptides detected in the selected, naively translated MetaHIT sequence trace files. Leucine (L) and isoleucine (I) are isobaric and therefore cannot be distinguished.
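The naive translation step and the L/I ambiguity can be illustrated with a minimal Python sketch. It assumes that "naive translation" means a six-frame translation with the standard genetic code (NCBI table 1), and it replaces peptide spectral matching with a plain substring lookup of already-identified peptide sequences. It therefore only illustrates how a translated search space is built and queried, not the spectral matching or the megablast/blastx selection. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    # Ambiguous bases (e.g. N) are kept as N.
    return "".join(_COMPLEMENT.get(b, "N") for b in reversed(dna.upper()))


def translate(dna: str) -> str:
    # Codon-by-codon translation; codons containing ambiguous bases become 'X',
    # stop codons become '*'.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> list[str]:
    # Frames 0-2 on the forward strand, then frames 0-2 on the reverse complement.
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[frame:]) for strand in strands for frame in range(3)]


def collapse_il(peptide: str) -> str:
    # L and I are isobaric, so map I to L before comparing sequences.
    return peptide.replace("I", "L")


def peptides_in_translations(peptides, dna: str) -> set[str]:
    # Return the peptides that occur in any translated frame, ignoring L/I.
    frames = [collapse_il(f) for f in six_frame_translation(dna)]
    return {p for p in peptides if any(collapse_il(p) in f for f in frames)}
```

For example, `six_frame_translation("ATGGCTATTAAACTG")[0]` is `"MAIKL"`, and `peptides_in_translations(["ALK", "AIK", "WWW"], "ATGGCTATTAAACTG")` returns `{"ALK", "AIK"}`, because both peptides collapse to the same L/I-insensitive sequence.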