The proteolytic system of lactic acid bacteria revisited: a genomic comparison
BMC Genomics volume 11, Article number: 36 (2010)
Lactic acid bacteria (LAB) are a group of Gram-positive, lactic acid-producing Firmicutes. They have been used extensively in food fermentations, including the production of various dairy products. The proteolytic system of LAB converts proteins to peptides and then to amino acids, which is essential for bacterial growth and also contributes significantly to flavor compounds as end-products. Recent developments in high-throughput genome sequencing and comparative genome hybridization arrays provide us with opportunities to explore the diversity of the proteolytic system in various LAB strains.
We performed a genome-wide comparative genomics analysis of proteolytic system components, including cell-wall bound proteinase, peptide transporters and peptidases, in 22 sequenced LAB strains. The peptidase families PepP/PepQ/PepM, PepD and PepI/PepR/PepL are described as examples of our in silico approach to refine the distinction of subfamilies with different enzymatic activities. Comparison of protein 3D structures of proline peptidases PepI/PepR/PepL and esterase A allowed identification of a conserved core structure, which was then used to improve phylogenetic analysis and functional annotation within this protein superfamily.
The diversity of proteolytic system components in 39 Lactococcus lactis strains was explored using pangenome comparative genome hybridization analysis. Variations were observed in the proteinase PrtP and its maturation protein PrtM, in one of the Opp transport systems and in several peptidases between strains from different Lactococcus subspecies or of different origin.
The improved functional annotation of the proteolytic system components provides an excellent framework for future experimental validations of predicted enzymatic activities. The genome sequence data can be coupled to other "omics" data e.g. transcriptomics and metabolomics for prediction of proteolytic and flavor-forming potential of LAB strains. Such an integrated approach can be used to tune the strain selection process in food fermentations.
Lactic acid bacteria (LAB) have been used for centuries as starter or adjunct cultures in dairy fermentations. The breakdown of milk proteins (proteolysis) by LAB plays an important role in generating peptides and amino acids for bacterial growth and in the formation of metabolites that contribute to flavor formation of fermented products. The proteolytic system of LAB comprises three major components: (i) cell-wall bound proteinase that initiates the degradation of extracellular casein (milk protein) into oligopeptides, (ii) peptide transporters that take up the peptides into the cell, and (iii) various intracellular peptidases that degrade the peptides into shorter peptides and amino acids. In particular, as caseins are rich in proline, LAB have numerous proline peptidases for degrading proline-rich peptides [1–3]. Amino acids can be further converted into various flavor compounds, such as aldehydes, alcohols and esters.
Several reviews have described the proteolytic system of LAB with respect to their biochemical and genetic aspects [1, 5–8]. In the past ten years, however, many LAB genomes have been sequenced, which allows a thorough comparative analysis of their proteolytic systems at a genome scale. In a preliminary study, we described a comparative analysis of cell-wall-bound proteinase and various peptidases from 13 fully or incompletely sequenced LAB which were publicly available in May 2006. More recently, over ten additional LAB genomes have become publicly available. These include 8 LAB strains from the Joint Genome Institute and the LAB Genome Consortium, the model laboratory strain Lactococcus lactis subsp. cremoris MG1363, a Lactobacillus helveticus strain which is known for its proteolytic capacity as an adjunct culture in cheese, and the probiotic strain Lactobacillus rhamnosus GG. Furthermore, a recent comparative genome hybridization (CGH) analysis of 39 L. lactis strains provides opportunities to explore the diversity of the proteolytic system within the same species.
In this study, we systematically explored the diversity of the cell-wall bound proteinase, the peptidases and the peptide transporters in twenty-two completely sequenced LAB strains. The distinctions between subgroups in large peptidase families such as the PepP/PepQ/PepM family, the PepD family and the PepI/PepR/PepL family are described in detail as examples. The PepI/PepR/PepL family was compared with the EstA family of esterases, key enzymes for synthesizing various ester flavors [4, 15], since the members of these two families share sequence and structural homology. Furthermore, the results from comparative genomics analysis were used to explore the diversity of members of the proteolytic system in 39 Lactococcus lactis strains by pangenome CGH analysis.
Comparative genome analyses and orthologous groups identification
Complete genome sequences of LAB were obtained from the NCBI microbial genome database http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi. The genomes include: Lactobacillus acidophilus NCFM (abbreviation LAC, accession code CP000033), Lactobacillus johnsonii NCC 533 (LJO, AE017198), Lactobacillus gasseri ATCC 33323 (LGA, CP000413), Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 (LDB, CR954253), Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA365 (LBU, CP000412), Lactobacillus plantarum WCFS1 (LPL, AL935263), Lactobacillus brevis ATCC 367 (LBE, CP000416), Lactobacillus sakei 23K (LSK, CR936503), Lactobacillus salivarius UCC118 (LSL, CP000233), Oenococcus oeni PSU1 (OOE, CP000411), Pediococcus pentosaceus ATCC 25745 (PPE, CP000422), Leuconostoc mesenteroides ATCC 8293 (LME, CP000414), Lactobacillus casei ATCC 334 (LCA, CP000423), Lactococcus lactis subsp. lactis IL1403 (LLX, AE005176), Lactococcus lactis subsp. cremoris MG1363 (LLM, AM406671), Lactococcus lactis subsp. cremoris SK11 (LLA, CP000425), Streptococcus thermophilus CNRZ1066 (STH, CP000024), Streptococcus thermophilus LMG18311 (STU, CP000023), Streptococcus thermophilus LMD9 (STM, CP000419), Lactobacillus reuteri F275 (LRF, CP000705), Lactobacillus helveticus DPC 4571 (LHE, CP000517) and Lactobacillus rhamnosus GG (LRH, FM179322). Incomplete genome sequences of Lactococcus lactis subsp. lactis strains KF147 and KF282 were additionally used for analysis of L. lactis strain diversity by pangenome CGH analysis.
Protein sequences of experimentally verified proteolytic system members, i.e. cell-wall bound proteinase, various peptidases and peptide transporters, were derived from the non-redundant protein database Uniprot http://www.uniprot.org/. These sequences were used to perform a BLASTP search against all LAB genomes. The corresponding Hidden Markov Models (HMMs) of each protein family were obtained from the Pfam database and used to search for homologous genes with the HMMER 2.3.2 package http://hmmer.janelia.org/. The homologous sequences of each proteinase, peptidase and peptide transporter were collected on the basis of the BLAST and HMM search results, and redundancies were removed. Orthologous groups (subfamilies) were identified by an in-house developed method [4, 20]. Multiple sequence alignments (MSA) were generated for each homologous group using MUSCLE. Bootstrapped (n = 1000) neighbor-joining family trees were constructed with ClustalW. The trees were visualized in LOFT and orthologous groups were identified. The gene contexts were analyzed using the ERGO Bioinformatics Suite to improve ortholog prediction when necessary.
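The pooling of BLAST and HMM search results into one non-redundant candidate list can be sketched as follows. This is only a minimal illustration and not the in-house method cited above: the function name, the E-value threshold of 1e-5 and the toy gene identifiers are all assumptions made for the example.

```python
def pool_homologs(blast_hits, hmm_hits, max_evalue=1e-5):
    """Merge BLAST and HMM hits into one non-redundant candidate set.

    blast_hits, hmm_hits: iterables of (gene_id, e_value) tuples.
    max_evalue: hypothetical significance threshold (not taken from the paper).
    Returns a dict mapping gene_id -> set of methods that detected the gene.
    """
    pooled = {}
    for method, hits in (("blast", blast_hits), ("hmm", hmm_hits)):
        for gene_id, e_value in hits:
            if e_value <= max_evalue:
                # setdefault keeps a single entry per gene, removing redundancy
                pooled.setdefault(gene_id, set()).add(method)
    return pooled


# Toy identifiers and E-values, invented for illustration only.
blast = [("geneA", 1e-50), ("geneB", 1e-3), ("geneC", 1e-20)]
hmm = [("geneA", 1e-40), ("geneD", 1e-12), ("geneC", 0.5)]
candidates = pool_homologs(blast, hmm)
```

Recording which method found each candidate makes it easy to flag genes supported by only one search for closer inspection, for instance by gene context analysis.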
3D structure alignment
Peptidases PepI/PepR/PepL and esterase EstA belong to the same protein superfamily, but they possess different functionalities. In order to identify substrate specificity of each protein subfamily, a comparison of known protein 3D structures was carried out. As described above, protein sequences of experimentally characterized peptidases PepI, PepR, and PepL, together with EstA esterases, were used to search against all the sequenced LAB genomes and other prokaryote genomes in the NCBI database by BLASTP. Moreover, the HMM of the protein α/β hydrolase fold PF00561 from the Pfam database, to which PepI/PepR/PepL and EstA belong, was used to search against LAB genomes. Homologs of both PepI/PepR/PepL and EstA families were collected. Similarly, the protein sequences of experimentally verified PepI/PepR/PepL and EstA members were used for BLAST searches against the PDB database http://www.rcsb.org/pdb/. The protein sequences, as well as the 3D structures of the best BLAST hits, were collected. Other proteins with similar structures were retrieved by the Dali server http://ekhidna.biocenter.helsinki.fi/dali_server/ using the protein structures of the BLAST hits as input.
The retrieved 3D structures of the proteins used as templates in this study are: the tricorn interacting factor F1 with proline iminopeptidase (PIP) activity from Thermoplasma acidophilum (PDB ID: 1MTZ), proline iminopeptidases from Xanthomonas campestris pv. citri (PDB ID: 1AZW) and Serratia marcescens (PDB ID: 1WM1) as members of PepI/R/L subfamilies, and the esterase (PDB ID: 2UZ0) from Streptococcus pneumoniae which belongs to the EstA subfamily. These 3D structures were superimposed and visualized by the YASARA program (version 6.813, http://www.yasara.org/). Conserved superimposable regions (core regions) of the catalytic domain were identified based on the 3D-structure alignment, and these consisted of 4 discontinuous sequence segments that are connected by loops of variable structure.
The amino acid sequences of the four core region segments were aligned with MUSCLE or ClustalW as described. The alignments were manually curated where sequences were aligned ambiguously relative to the 3D-structure alignment. Sequences with more than 90% identity were removed. Finally, an MSA was constructed based on concatenated alignments of all the curated local alignments of the core regions [see Additional File 1]. A bootstrapped (n = 1000) neighbor-joining tree was constructed on the basis of the MSA, and orthologous groups, so-called subfamilies, were identified automatically by LOFT.
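The concatenation of the core-region alignments and the removal of sequences above 90% identity could look roughly like the sketch below. It rests on several assumptions that are not stated in the text: identity is computed over columns in which neither sequence has a gap, redundancy is removed greedily in input order, and the segment sequences are toy data invented for the example.

```python
def concatenate_segments(segments):
    """Join per-segment alignments (list of dicts: id -> aligned string)."""
    ids = list(segments[0])
    return {i: "".join(seg[i] for seg in segments) for i in ids}


def identity(a, b):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(alignment, max_identity=0.90):
    """Greedily keep sequences no more than max_identity identical to any kept one."""
    kept = {}
    for name, seq in alignment.items():
        if all(identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


# Four toy core segments for three hypothetical sequences.
core_segments = [
    {"s1": "MKT-A", "s2": "MKT-A", "s3": "MRSLA"},
    {"s1": "GHSWG", "s2": "GHSWG", "s3": "GQSMG"},
    {"s1": "DPLV", "s2": "DPLI", "s3": "D-LV"},
    {"s1": "HFVE", "s2": "HFVE", "s3": "HYLE"},
]
concatenated = concatenate_segments(core_segments)
non_redundant = remove_redundant(concatenated)
```

In this toy case s2 differs from s1 at a single position and is dropped, while the more divergent s3 is retained. A greedy filter depends on input order; other redundancy-removal strategies would be equally defensible.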
Pangenome CGH diversity analysis
Comparative genome hybridization (CGH) data of 39 L. lactis strains were acquired from pangenome arrays. The pangenome array was constructed on the basis of publicly available complete genome sequences of L. lactis subsp. lactis IL1403 and L. lactis subsp. cremoris SK11, and incomplete genome sequences of L. lactis strains KF147 and KF282, as described by Bayjanov et al. The CGH data used in this study can be found under the accession number GSE12638 in the NCBI GEO (NCBI Gene Expression Omnibus) database http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE12638.
The genes encoding predicted proteolytic system components of the three sequenced L. lactis strains were used to query the database containing pangenome CGH data. We obtained a statistical score of the hybridization signal for each gene from the reference strains against 39 L. lactis strains. A cut-off value of 5.5 was used to assign presence or absence of every gene from the proteolytic system in query strains, as described by Bayjanov et al. In most cases, a gene is regarded as present in a specific strain if it has a maximum score higher than 5.5.
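The presence/absence call described above can be illustrated with a short sketch. The data layout is an assumption: each gene maps to the query strains, and each strain carries a list of scores (for example one per reference-strain ortholog), of which the maximum is compared with the 5.5 cut-off. Strain names and scores are invented.

```python
CUTOFF = 5.5  # cut-off value stated in the text


def call_presence(scores, cutoff=CUTOFF):
    """Return gene -> strain -> True if the maximum score exceeds the cut-off.

    scores: dict gene -> dict strain -> list of hybridization scores
            (hypothetical layout, chosen for this example).
    """
    return {
        gene: {strain: max(vals) > cutoff for strain, vals in per_strain.items()}
        for gene, per_strain in scores.items()
    }


# Toy scores, invented for illustration only.
toy_scores = {
    "prtP": {"strainX": [7.1, 3.0], "strainY": [2.2, 4.9]},
    "pepN": {"strainX": [9.4], "strainY": [5.5]},
}
calls = call_presence(toy_scores)
```

Note that a score exactly equal to 5.5 is called absent here, because the text requires a score higher than the cut-off.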
The distribution of proteolytic system components in sequenced LAB genomes
An overview of the distribution of components of the proteolytic system identified in 22 completely sequenced LAB is given in Figure 1. A detailed list of genes with GI codes can be found in Additional File 2. The number of genes encoding putative members of each proteinase, peptide transporter and peptidase subfamily is shown.
The LAB genomes in the L. acidophilus group, including L. acidophilus, L. johnsonii, L. gasseri, L. bulgaricus, and L. helveticus strains, encode a relatively high number and variety of proteolytic system components. Some enzymes are only found in a few LAB strains, such as the cell-wall bound proteinase (PrtP). PrtP was only found on the chromosome of L. acidophilus, L. johnsonii, L. bulgaricus, L. casei, L. rhamnosus and S. thermophilus strain LMD9, as well as on the plasmid of L. lactis subsp. cremoris SK11. Members of both the PepE/PepG (endopeptidases) and PepI/PepR/PepL (proline peptidases) superfamilies are absent in lactococci and streptococci. On the other hand, many of the peptidases seem to be essential for bacterial growth or survival as they are encoded in all LAB genomes. For instance, aminopeptidases PepC, PepN, and PepM, and proline peptidases PepX and PepQ are present in all genomes, usually with one gene per genome. Some LAB genomes have two peptidase homologs, possibly with the same function (shown in brackets in Figure 1), e.g. two PepC homologs (GI codes: 42518641 and 42518638) in L. johnsonii. Other essential peptidases (found in all LAB genomes) such as endopeptidase PepO and dipeptidase PepV are encoded by multiple paralogous genes.
L. acidophilus, L. brevis, L. casei, L. rhamnosus and L. lactis strains possess all three known LAB peptide transport systems, i.e. the di/tripeptide Dpp and DtpT systems and the oligopeptide Opp system. In contrast, the L. reuteri strain has only one functional peptide transport system, the DtpT system. Several peptide transporters or peptidases fall into larger protein superfamilies. Examples are (i) the oligopeptide-binding protein OppA and di/tripeptide-binding proteins DppA/DppP in the same peptide-binding protein family, (ii) aminopeptidase PepC together with endopeptidases PepE and PepG belonging to MEROPS peptidase family C1-B, (iii) proline peptidases PepI, PepR and PepL belonging to MEROPS family S33, and (iv) aminopeptidase PepM together with proline peptidases PepP and PepQ belonging to MEROPS family M24 (Figure 1). Protein members in those large superfamilies share high sequence similarity, and cannot always be distinguished by simple BLAST sequence homology searches. Using a comparative genomics approach, the large protein families can be divided into subfamilies with putatively different substrate specificities. For example, the aminopeptidase PepC subfamily can be clearly distinguished from the endopeptidase PepE/PepG subfamily as they are separated into distinct groups in a superfamily tree. In other cases, such as the endopeptidase PepF family, several distinct subgroups can be distinguished but the difference in specificity between the subgroups is still unclear [see Additional File 3].
Three large peptidase families (PepP/PepQ/PepM, PepD and PepI/PepR/PepL) will be discussed in detail in the following sections.
Subfamilies of peptidase family PepP/PepQ/PepM
PepP, PepQ and PepM belong to the MEROPS peptidase family M24, which requires metal ions for catalytic activity. PepM is a methionyl aminopeptidase that cleaves the N-terminal methionine from proteins. PepP is a proline peptidase that cleaves off any N-terminal amino acid linked to proline in an oligopeptide. PepQ is also a proline peptidase, but it is specific for Xaa-Pro dipeptides, where Xaa represents any amino acid (Figure 1).
Our phylogenetic analysis shows that PepP, PepQ and PepM are separated into three distinct subgroups in accordance with the known different substrate specificities of each peptidase (Figure 2). On the basis of the family tree, PepP and PepQ seem to be more closely related to each other than to PepM, which is in agreement with the differences in their catalytic activities. Bacterial PepM is an aminopeptidase belonging to subfamily M24A, which usually requires cobalt ions for catalysis, while PepP and PepQ as proline peptidases belong to the subfamily M24B, which prefers manganese.
In the PepP subgroup, one gene is found in each LAB genome except in L. sakei and Pediococcus pentosaceus. The absence of the pepP genes in both genomes is very likely due to a gene loss event. The family tree also includes an experimentally verified pepP gene from L. lactis whose protein product has been purified and characterized. Moreover, LAB-derived pepP genes are always flanked on the chromosome by a gene encoding an elongation factor for protein translation. The conserved gene context of pepP among LAB genomes is consistent with the putative important physiological role of PepP in protein maturation, as suggested by Matos et al.
Genes from the PepQ cluster are distributed evenly across all LAB genomes, generally as one copy per genome. However, the L. delbrueckii bulgaricus strains have two pepQ paralogs. One paralog is clustered with the other orthologs of LAB, whereas the second paralog is located in a separate cluster (LBU_116514595 and LDB_104774485). This might be the result of an ancient duplication (Figure 2) or a horizontal gene transfer (HGT) event. Rantanen et al. suggested that the second paralogous pepQ of L. bulgaricus is a cryptic gene. Experimentally characterized pepQ genes from L. delbrueckii bulgaricus and L. helveticus (GI: 3282339) are added and highlighted in the tree, supporting the annotation of the subgroups.
In the aminopeptidase PepM subgroup, L. brevis has an extra paralogous gene, which clusters together with the L. plantarum pepM gene. Gene context analysis suggests that pepM genes in all Lactobacillus strains share the same neighbor genes, except the pepM gene from L. plantarum and both the paralogs from L. brevis. One of the L. brevis pepM genes (LBE_116334483) is located in the same operon as a transposase. Based on the protein family tree, we hypothesize that an extra pepM gene was acquired first in the ancestor of L. brevis and L. plantarum, after which one gene was lost from L. plantarum. The L. plantarum pepM gene (LPL_28377183) is flanked by a methionine metabolism related operon (cysK_cblB/cglB_cysE). Therefore, the pepM gene in L. plantarum may have a broader function, probably utilizing proteins and peptides as a methionine pool, in addition to the classic PepM function in N-terminal maturation of proteins.
One gene from Leuconostoc mesenteroides (LME_116618966) is located as an intermediate between the PepP/PepQ and PepM subfamilies. It shares higher sequence homology with a putative pepP gene from Clostridium botulinum (Figure 2) and has a phage-related gene in its neighborhood. This suggests that the pepP gene from Leuconostoc mesenteroides might have been acquired from clostridia.
Subfamilies of peptidase family PepD
The PepD dipeptidase family has a broad specificity toward various dipeptides. PepD has been purified and characterized from L. helveticus by Vesanto et al. The pepD genes are distributed heterogeneously in LAB genomes, varying from 0 to 6 paralogs. The pepD gene is absent in Leuconostoc mesenteroides and truncated in S. thermophilus strains, while multiple genes are mainly observed in Lactobacillus genomes (Figure 1). Recently, Smeianov et al. reported the expression levels of four pepD genes from L. helveticus CNRZ32 in a microarray analysis. Five major PepD subfamilies can be clearly distinguished based on the multiple sequence alignment (Figure 3). The subfamilies PepD1-4 are named after the four pepD genes from L. helveticus. Due to the lack of experimental evidence, it is still unclear whether the substrate specificities vary between those subfamilies. Microarray analysis of L. helveticus has shown that pepD1, pepD2 and pepD4 were up-regulated in MRS medium compared to growth in milk, while pepD3 was not differentially expressed between the two media. This suggests that the pepD1/pepD2/pepD4 subgroups and the pepD3 subgroup may also differ at the level of transcriptional regulation. Moreover, several genes are located as intermediates between the major PepD subgroups in the superfamily tree. Most of those genes have unclear origins and functions. The protein sequences of LCA_116493607 from L. casei, LRH_258507036 from L. rhamnosus, LJO_42518640 from L. johnsonii, and LBU_116514855 from L. bulgaricus have best BLASTP hits to several recently sequenced lactobacilli, such as L. hilgardii and L. buchneri, suggesting a possible duplication of the gene in a specific Lactobacillus group.
3D-structure comparison to distinguish PepI/PepR/PepL peptidases from EstA family esterases
The proline iminopeptidase PepI possesses aminopeptidase activity toward N-terminal proline peptides, preferably tri-peptides, while prolinase PepR has a broad specificity for dipeptides including Pro-Xaa dipeptides. The only characterized PepL is from L. delbrueckii subsp. lactis DSM7290 and it displays high specificity for di-/tri-peptides with N-terminal leucine residues. Interestingly, the PepI/PepR/PepL family and the esterase EstA family belong to the same α/β hydrolase superfamily, since the BLASTP analysis of PepI/PepR/PepL members against the non-redundant protein database also retrieves homologs from the EstA family. Multiple sequence alignment (MSA) of the whole protein sequences of the homologs from those two protein families is not reliable, as large insertions and deletions are present in these sequences, and several regions of the proteins share very low sequence similarity. Therefore, we first compared the 3D structures of four representative proteins by superposition, including proline iminopeptidases from Thermoplasma acidophilum (PDB ID: 1MTZ), Xanthomonas campestris pv. citri (PDB ID: 1AZW), and Serratia marcescens (PDB ID: 1WM1) as members from the PepI/R/L family, and an esterase A (PDB ID: 2UZ0) from Streptococcus pneumoniae as a member from the EstA subfamily (Figure 4). The superimposed 3D structures share a highly similar catalytic domain, which displays a typical canonical α/β hydrolase topology consisting of an eight-stranded β-sheet, and have a non-conserved cap domain. Four conserved structural regions in the catalytic domain, separated by variable loops, were identified based on the structure alignment. A detailed comparison of the residues of the catalytic site and substrate-binding region can be found in Additional File 4. In contrast, the cap domain shows a large structural variation, and the esterase EstA has a much smaller cap domain than the peptidases (Figure 4). The cap regions of peptidases cover and close the substrate-binding region, allowing only the N-terminal proline of a peptide to fit into the substrate-binding pocket.
An MSA of the concatenated sequences of the four conserved structural regions of the PepI/PepR/PepL and EstA superfamily members from various microorganisms was constructed and manually curated [see Additional File 1]. On the basis of the curated MSA, a much improved superfamily tree was constructed for the PepI/PepR/PepL and EstA families, including LAB and other bacteria, as well as the reference proteins with known 3D structures (Figure 5). In this 3D alignment tree, the homologs of the superfamily can be clearly separated into four subclusters (Figure 5). The first cluster PepIa contains the proline iminopeptidases from Proteobacteria and non-LAB Firmicutes, including the ones from the known structures 1AZW and 1WM1. The second cluster contains the esterase members from LAB, including the representative structure 2UZ0 from S. pneumoniae. The third cluster PepIb contains proline iminopeptidases from Proteobacteria and Actinobacteria, and PepI from Firmicutes (including the ones from LAB), as well as the known structure 1MTZ from Thermoplasma acidophilum. The last cluster PepR/L consists of putative PepL proteins from LAB and the subgroup of prolinase PepR. Experimentally verified proteins PepR from L. helveticus CNRZ32 [38, 39], PepI from L. delbrueckii subsp. bulgaricus CNRZ 397 [40, 41], PepL from L. delbrueckii subsp. lactis DSM7290 and EstA from L. lactis and L. casei BL23 also support this subdivision within the protein superfamily (Figure 5). Moreover, PepI from L. helveticus strain 53/7 has also been experimentally characterized.
Sequenced lactococcal, streptococcal, leuconostoc and L. salivarius strains lack the genes encoding proline peptidases PepI, PepR or PepL. This agrees with the observation from gene deletion experiments in strains harboring those peptidase genes that the physiological role of PepI, PepR or PepL is not essential for cell growth [39, 45, 46]. However, in L. helveticus, the growth rate in milk was slower for a PepI-deletion mutant as compared to the wild type. Similarly, the activity of cell extract of L. helveticus and L. rhamnosus toward several proline dipeptides was significantly reduced in a PepR-deletion mutant [39, 46]. Those observations suggest that PepI/PepR/PepL may contribute specifically to the proteolytic capacity on proline-containing peptides of Lactobacillus strains.
Diversity of the proteolytic system in L. lactis strains
The distribution of proteolytic system components in various L. lactis strains was studied by comparative genome hybridization (CGH) analysis. Pangenome arrays were made based on ORFs found in four sequenced L. lactis strains, and subsequently used to determine the presence or absence of orthologs in 39 L. lactis strains. Table 1 summarizes only the proteolytic system genes with variable absence/presence patterns in the 39 L. lactis strains. All other components described in Figure 1 but not shown in Table 1, such as PepC, PepN, PepM, PepA, PepD, PepV, PepT, PepP, PepQ, DtpT and most members of the Dpp system, are present in almost all strains. PepE/PepG and PepI/PepR/PepL family members are absent in all L. lactis strains. Those genes are excluded from the table, as well as all genes of strains P7304 and P7266 (see explanation in Table legend). Some plant-derived L. lactis strains such as KF24, NIZOB2244W, LMG9446 and KW10 have the largest set of proteolytic system genes.
Variations are found for proteinase PrtP and its maturation protein PrtM, for peptidases Pcp, PepO2, PepF2 and PepX2, and for genes from peptide transport systems Opp and Dpp (Table 1). Most of these genes are known to be present on plasmids: in strain SK11 the prtP, prtM and pcp genes are located on one large plasmid, while the pepO2, pepF2 and oppABCDF2 genes are co-localized on a different plasmid. The co-presence or co-absence of these genes in other L. lactis strains (Table 1) is largely consistent with their coupling in SK11, and suggests that variability is mainly due to the presence or absence of the plasmids. Cell-wall bound proteinase PrtP together with PrtM is mainly present in L. lactis subsp. cremoris, although several L. lactis subsp. lactis strains also harbor these genes (including dairy strains, e.g. UC317, ML8, and ATCC19435T).
PepX2 is a PepX homolog of L. lactis subsp. lactis IL1403. It is mainly found in L. lactis subsp. lactis strains of dairy origin. This putative pepX2 gene was originally annotated as a hypothetical protein named YmgC. It contains both a C-terminal domain of X-prolyl dipeptidyl aminopeptidase and a Peptidase_S15 catalytic domain, which are usually found in PepX, whereas the PepX N-terminal domain is missing in PepX2. No experimental evidence for the enzyme activity of PepX2 is known. The family tree of PepX shows that this putative pepX2 gene is not clustered in the same orthologous group as its paralogous gene from L. lactis subsp. lactis IL1403 [Additional File 5]. The only members of the PepX2 (YmgC) group in sequenced LAB genomes are from L. lactis subsp. lactis IL1403 and Pediococcus. Their best BLAST hits against the non-redundant protein database are from Listeria monocytogenes, suggesting an HGT event [see Additional File 5].
In this study, we performed a systematic genome-wide analysis of all the proteins involved in proteolysis, including cell-wall bound proteinase, peptide transporters, and peptidases, from twenty-two fully sequenced LAB genomes, including Lactobacillus, Lactococcus, Streptococcus, Pediococcus, Oenococcus, and Leuconostoc strains. The comparative genomics analysis was shown to distinguish various subgroups within a protein superfamily, allowing a highly improved annotation of genes and clarification of the confusion caused by inconsistent annotation.
This information on the distribution of the proteolytic system genes can be used to predict the proteolytic potential of various LAB strains. For instance, L. bulgaricus and L. helveticus have a very extensive set of proteolytic enzymes, which is consistent with previous knowledge that L. bulgaricus, rather than S. thermophilus, serves as the proteolytic organism in yoghurt. L. helveticus is a proteolytic cheese adjunct culture that has been used to degrade bitter peptides in cheese. Interestingly, L. bulgaricus encodes the Dpp system with preference for uptake of hydrophobic di/tripeptides, complementing S. thermophilus, which encodes the general di/tripeptide transporter DtpT in its genome, suggesting that more peptides can be utilized by both bacteria when grown together. LAB species of plant origin, such as L. plantarum, O. oeni, and Leuconostoc mesenteroides, encode fewer proteolytic enzymes in their genomes, which agrees with their ecological niche, which is fiber-rich but contains less protein.
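The notion of co-presence or co-absence of plasmid-linked genes across strains can be made concrete with a simple concordance measure. The sketch below is only an illustration under stated assumptions: the function name, the boolean presence profiles and the strain labels are hypothetical and do not reproduce Table 1.

```python
def concordance(profile_a, profile_b):
    """Fraction of shared strains in which two genes are both present or both absent.

    profile_a, profile_b: dict strain -> bool (True = gene called present).
    """
    strains = profile_a.keys() & profile_b.keys()
    if not strains:
        raise ValueError("no shared strains")
    agree = sum(profile_a[s] == profile_b[s] for s in strains)
    return agree / len(strains)


# Toy presence/absence profiles, invented for illustration only.
prtP = {"s1": True, "s2": False, "s3": True, "s4": False}
prtM = {"s1": True, "s2": False, "s3": True, "s4": False}
pepX2 = {"s1": False, "s2": False, "s3": True, "s4": True}

linked = concordance(prtP, prtM)
unlinked = concordance(prtP, pepX2)
```

A concordance close to 1 across many strains would be compatible with two genes residing on the same plasmid, although it cannot prove physical linkage on its own.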
Several examples have been provided for the division of large superfamilies into subfamilies. Clear separation of major subgroups can be observed from the family trees. By including the experimentally characterized genes, different substrate specificities can be assigned to various subfamilies. The PepP/PepQ/PepM and PepI/PepR/PepL superfamilies include subfamilies with distinct substrate specificities. The general dipeptidase superfamily PepD consists of several distinct orthologous groups of which the substrate specificities are still unknown. In most cases, the prediction of orthologous groups and the evolutionary events leading to the variation of substrate specificities are straightforward using the phylogenetic analysis. However, some orphan genes with unknown functions are present as intermediate groups between the subfamilies, and some of them may originate from HGT events.
Peptidases PepI/R/L and the esterase EstA, which is also involved in flavor formation by LAB, belong to the same α/β hydrolase superfamily. We performed a comparative analysis of 3D structures of representative proteins from each subfamily in order to identify the core regions of the enzymes and to improve the multiple sequence alignment of the superfamily. Orthologs could then be identified more clearly in the protein family tree constructed on the basis of the curated MSA of the core regions. The classic catalytic triad Ser-His-Asp of the α/β hydrolase family is conserved in most of the members of the PepI/R/L and EstA superfamily. However, in the PepR subfamily of LAB (Figure 5), the catalytic Asp residues are substituted by Glu residues. Aspartate and glutamate residues both carry a carboxylate group and differ only in the length of the side chain. The substitution of Asp by Glu has been observed in prokaryotic subtilases, as well as in an acetylcholinesterase of Torpedo californica and a lipase of Geotrichum candidum [50, 51]. Moreover, two additional peptidases from L. plantarum and L. casei (LPL_28379307 and LCA_116494294), which are not grouped into the PepR subfamily (Figure 5), also have glutamate catalytic residues instead of aspartate residues. This suggests that the substitution of Asp by Glu may have happened in the common ancestor of these two proteins and the PepR family. Since the glutamate residue at the catalytic triad is only found to be conserved in the PepR subfamily, it can be used as an extra indication for determining whether a peptidase with unclear function belongs to the PepR subfamily.
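Using the catalytic acid residue as an extra indication of PepR membership amounts to reading one column of the curated alignment. The following sketch assumes that the column holding the acid residue of the triad is already known from the structure-based alignment; the column index, sequence names and sequences are invented for illustration.

```python
def catalytic_acid(aligned_seq, acid_column):
    """Classify the residue at the catalytic-acid column of an aligned sequence."""
    residue = aligned_seq[acid_column].upper()
    if residue == "D":
        return "Asp"
    if residue == "E":
        return "Glu"  # hint toward the PepR subfamily, not proof of membership
    return "other"


# Toy aligned fragments; column 5 is the hypothetical acid position.
ACID_COLUMN = 5
toy_alignment = {
    "seq_asp": "GHSWGD-LH",
    "seq_glu": "GHSMGEALH",
    "seq_gap": "GQSLG-ALH",
}
classes = {
    name: catalytic_acid(seq, ACID_COLUMN)
    for name, seq in toy_alignment.items()
}
```

Because two peptidases outside the PepR subfamily also carry Glu at this position, such a check is best combined with the position of the protein in the family tree.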
One of the applications of our comparative analysis is to explore the diversity of proteolytic system genes in various strains of L. lactis by combining the results from comparative genomics analysis and the hybridization data from pangenome CGH analysis. Distinct patterns were found in the presence and absence of proteolytic enzymes in the two L. lactis subspecies, i.e. subsp. lactis and subsp. cremoris, confirming the proteolytic diversity between the subspecies, and now providing a genetic basis for this diversity. Several strains show corresponding distributions of some proteolytic genes in their genomes, presumably resulting from the presence or absence of plasmids encoding proteolytic system components.
We performed a genome-wide comparative study of the proteolytic system of LAB, and demonstrated that the functional annotation of proteolytic system genes can be improved by combining phylogeny, synteny and literature. The examples of the PepP/PepQ/PepM family, the PepD family and the PepI/PepR/PepL family showed that protein subfamilies with distinct substrate specificities can be identified. In the case of the PepI/PepR/PepL family, protein 3D-structure alignment allowed us to distinguish more clearly between the peptidase subfamilies and the esterase family EstA. Moreover, the complete distribution of proteolytic system components in various sequenced LAB strains was obtained.
The diversity of proteolytic system genes in 39 Lactococcus strains was explored using CGH analysis. Several components, including the proteinase, an oligopeptide transport system and several peptidases, were shown to be distributed unevenly among the Lactococcus strains. The presence or absence of these proteolytic system components is probably the result of the presence or absence of plasmids that encode them.
Knowledge of the variations in proteolytic system components may allow the prediction of proteolytic and flavor-forming potential of bacterial strains, and could direct future experimental tests into the phenotypes of various LAB. Ultimately, this knowledge could be used to improve the sensory characteristics of dairy and other fermented food products by supporting the strain selection process.
LAB: Lactic Acid Bacteria
HMM: Hidden Markov Models
MSA: Multiple Sequence Alignments
CGH: Comparative Genome Hybridization
HGT: Horizontal Gene Transfer
Christensen JE, Dudley EG, Pederson JA, Steele JL: Peptidases and amino acid catabolism in lactic acid bacteria. Antonie Van Leeuwenhoek. 1999, 76: 217-246. 10.1023/A:1002001919720.
Kim YK, Yaguchi M, Rose D: Isolation and Amino Acid Composition of Para-Kappa-Casein. J Dairy Sci. 1969, 52: 316-320.
Stewart AF, Bonsing J, Beattie CW, Shah F, Willis IM, Mackinlay AG: Complete nucleotide sequences of bovine alpha S2- and beta-casein cDNAs: comparisons with related sequences in other species. Mol Biol Evol. 1987, 4: 231-241.
Liu M, Nauta A, Francke C, Siezen RJ: Comparative genomics of enzymes in flavor-forming pathways from amino acids in lactic acid bacteria. Appl Environ Microbiol. 2008, 74: 4590-4600. 10.1128/AEM.00150-08.
Kunji ER, Mierau I, Hagting A, Poolman B, Konings WN: The proteolytic systems of lactic acid bacteria. Antonie Van Leeuwenhoek. 1996, 70: 187-221. 10.1007/BF00395933.
Savijoki K, Ingmer H, Varmanen P: Proteolytic systems of lactic acid bacteria. Appl Microbiol Biotechnol. 2006, 71: 394-406. 10.1007/s00253-006-0427-1.
Sousa MJ, Ardö Y, McSweeney PLH: Advances in the study of proteolysis during cheese ripening. International Dairy Journal. 2001, 11: 327-345. 10.1016/S0958-6946(01)00062-0.
Doeven MK, Kok J, Poolman B: Specificity and selectivity determinants of peptide transport in Lactococcus lactis and other microorganisms. Mol Microbiol. 2005, 57: 640-649. 10.1111/j.1365-2958.2005.04698.x.
Liu M, Siezen R: Comparative genomics of flavour-forming pathways in lactic acid bacteria. Australian Journal of Dairy Technology. 2006, 61: 61-68.
Makarova K, Slesarev A, Wolf Y, Sorokin A, Mirkin B, Koonin E, Pavlov A, Pavlova N, Karamychev V, Polouchine N: Comparative genomics of the lactic acid bacteria. Proc Natl Acad Sci USA. 2006, 103: 15611-15616. 10.1073/pnas.0607117103.
Wegmann U, O'Connell-Motherway M, Zomer A, Buist G, Shearman C, Canchaya C, Ventura M, Goesmann A, Gasson MJ, Kuipers OP: Complete genome sequence of the prototype lactic acid bacterium Lactococcus lactis subsp. cremoris MG1363. J Bacteriol. 2007, 189: 3256-3270. 10.1128/JB.01768-06.
Callanan M, Kaleta P, O'Callaghan J, O'Sullivan O, Jordan K, McAuliffe O, Sangrador-Vegas A, Slattery L, Fitzgerald GF, Beresford T: Genome sequence of Lactobacillus helveticus, an organism distinguished by selective gene loss and insertion sequence element expansion. J Bacteriol. 2008, 190: 727-735. 10.1128/JB.01295-07.
Kankainen M, Paulin L, Tynkkynen S, von Ossowski I, Reunanen J, Partanen P, Satokari R, Vesterlund S, Hendrickx AP, Lebeer S: Comparative genomic analysis of Lactobacillus rhamnosus GG reveals pili containing a human-mucus binding protein. Proc Natl Acad Sci USA. 2009, 106: 17193-17198. 10.1073/pnas.0908876106.
Bayjanov JR, Wels M, Starrenburg M, van Hylckama Vlieg JE, Siezen RJ, Molenaar D: PanCGH: a genotype-calling algorithm for pangenome CGH data. Bioinformatics. 2009, 25: 309-314. 10.1093/bioinformatics/btn632.
Smit G, Smit BA, Engels WJM: Flavour formation by lactic acid bacteria and biochemical flavour profiling of cheese products. FEMS Microbiology Reviews. 2005, 29: 591-610. 10.1016/j.femsre.2005.04.002.
Siezen RJ, Starrenburg MJ, Boekhorst J, Renckens B, Molenaar D, van Hylckama Vlieg JE: Genome-scale genotype-phenotype matching of two Lactococcus lactis isolates from plants identifies mechanisms of adaptation to the plant niche. Appl Environ Microbiol. 2008, 74: 424-436. 10.1128/AEM.01850-07.
UniProt Consortium: The universal protein resource (UniProt). Nucleic Acids Res. 2008, 36: D190-195. 10.1093/nar/gkn141.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-D141. 10.1093/nar/gkh121.
Francke C, Kerkhoven R, Wels M, Siezen RJ: A generic approach to identify Transcription Factor-specific operator motifs; Inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics. 2008, 9: 145. 10.1186/1471-2164-9-145.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
van der Heijden RT, Snel B, van Noort V, Huynen MA: Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinformatics. 2007, 8: 83. 10.1186/1471-2105-8-83.
Overbeek R, Larsen N, Walunas T, D'Souza M, Pusch G, Selkov E, Liolios K, Joukov V, Kaznadzey D, Anderson I: The ERGO (TM) genome analysis and discovery system. Nucleic Acids Res. 2003, 31: 164-171. 10.1093/nar/gkg148.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
Folkertsma S, van Noort P, Van Durme J, Joosten HJ, Bettler E, Fleuren W, Oliveira L, Horn F, de Vlieg J, Vriend G: A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain. J Mol Biol. 2004, 341: 321-335. 10.1016/j.jmb.2004.05.075.
Siezen RJ, Renckens B, van Swam I, Peters S, van Kranenburg R, Kleerebezem M, de Vos WM: Complete sequences of four plasmids of Lactococcus lactis subsp. cremoris SK11 reveal extensive adaptation to the dairy environment. Appl Environ Microbiol. 2005, 71: 8371-8382. 10.1128/AEM.71.12.8371-8382.2005.
Matos J, Nardi M, Kumura H, Monnet V: Genetic characterization of pepP, which encodes an aminopeptidase P whose deficiency does not affect Lactococcus lactis growth in milk, unlike deficiency of the X-prolyl dipeptidyl aminopeptidase. Appl Environ Microbiol. 1998, 64: 4591-4595.
Rantanen T, Palva A: Lactobacilli carry cryptic genes encoding peptidase-related proteins: characterization of a prolidase gene (pepQ) and a related cryptic gene (orfZ) from Lactobacillus delbrueckii subsp. bulgaricus. Microbiology. 1997, 143 (Pt 12): 3899-3905. 10.1099/00221287-143-12-3899.
Morel F, Frot-Coutaz J, Aubel D, Portalier R, Atlan D: Characterization of a prolidase from Lactobacillus delbrueckii subsp. bulgaricus CNRZ 397 with an unusual regulation of biosynthesis. Microbiology. 1999, 145 (Pt 2): 437-446. 10.1099/13500872-145-2-437.
Vesanto E, Peltoniemi K, Purtsi T, Steele JL, Palva A: Molecular characterization, over-expression and purification of a novel dipeptidase from Lactobacillus helveticus. Appl Microbiol Biotechnol. 1996, 45: 638-645. 10.1007/s002530050741.
Smeianov VV, Wechter P, Broadbent JR, Hughes JE, Rodriguez BT, Christensen TK, Ardo Y, Steele JL: Comparative high-density microarray analysis of gene expression during growth of Lactobacillus helveticus in milk versus rich culture medium. Appl Environ Microbiol. 2007, 73: 2661-2672. 10.1128/AEM.00005-07.
Klein JR, Dick A, Schick J, Matern HT, Henrich B, Plapp R: Molecular cloning and DNA sequence analysis of pepL, a leucyl aminopeptidase gene from Lactobacillus delbrueckii subsp. lactis DSM7290. Eur J Biochem. 1995, 228: 570-578. 10.1111/j.1432-1033.1995.0570m.x.
Goettig P, Groll M, Kim JS, Huber R, Brandstetter H: Structures of the tricorn-interacting aminopeptidase F1 with different ligands explain its catalytic mechanism. EMBO J. 2002, 21: 5343-5352. 10.1093/emboj/cdf552.
Medrano FJ, Alonso J, Garcia JL, Romero A, Bode W, Gomis-Ruth FX: Structure of proline iminopeptidase from Xanthomonas campestris pv. citri: a prototype for the prolyl oligopeptidase family. EMBO J. 1998, 17: 1-9. 10.1093/emboj/17.1.1.
Inoue T, Ito K, Tozaka T, Hatakeyama S, Tanaka N, Nakamura KT, Yoshimoto T: Novel inhibitor for prolyl aminopeptidase from Serratia marcescens and studies on the mechanism of substrate recognition of the enzyme using the inhibitor. Arch Biochem Biophys. 2003, 416: 147-154. 10.1016/S0003-9861(03)00293-5.
Kim MH, Kang BS, Kim S, Kim KJ, Lee CH, Oh BC, Park SC, Oh TK: The crystal structure of the estA protein, a virulence factor from Streptococcus pneumoniae. Proteins. 2008, 70: 578-583. 10.1002/prot.21680.
Dudley EG, Steele JL: Nucleotide sequence and distribution of the pepPN gene from Lactobacillus helveticus CNRZ32. FEMS Microbiol Lett. 1994, 119: 41-45. 10.1111/j.1574-6968.1994.tb06864.x.
Shao W, Yuksel GU, Dudley EG, Parkin KL, Steele JL: Biochemical and molecular characterization of PepR, a dipeptidase, from Lactobacillus helveticus CNRZ32. Appl Environ Microbiol. 1997, 63: 3438-3443.
Atlan D, Gilbert C, Blanc B, Portalier R: Cloning, sequencing and characterization of the pepIP gene encoding a proline iminopeptidase from Lactobacillus delbrueckii subsp. bulgaricus CNRZ 397. Microbiology. 1994, 140 (Pt 3): 527-535. 10.1099/00221287-140-3-527.
Morel F, Gilbert C, Geourjon C, Frot-Coutaz J, Portalier R, Atlan D: The prolyl aminopeptidase from Lactobacillus delbrueckii subsp. bulgaricus belongs to the alpha/beta hydrolase fold family. Biochim Biophys Acta. 1999, 1429: 501-505.
Fernandez L, Beerthuyzen MM, Brown J, Siezen RJ, Coolbear T, Holland R, Kuipers OP: Cloning, characterization, controlled overexpression, and inactivation of the major tributyrin esterase gene of Lactococcus lactis. Appl Environ Microbiol. 2000, 66: 1360-1368. 10.1128/AEM.66.4.1360-1368.2000.
Yebra MJ, Viana R, Monedero V, Deutscher J, Perez-Martinez G: An esterase gene from Lactobacillus casei cotranscribed with genes encoding a phosphoenolpyruvate:sugar phosphotransferase system and regulated by a LevR-like activator and sigma54 factor. J Mol Microbiol Biotechnol. 2004, 8: 117-128. 10.1159/000084567.
Varmanen P, Rantanen T, Palva A: An operon from Lactobacillus helveticus composed of a proline iminopeptidase gene (pepI) and two genes coding for putative members of the ABC transporter family of proteins. Microbiology. 1996, 142 (Pt 12): 3459-3468. 10.1099/13500872-142-12-3459.
Yuksel GU, Steele JL: DNA sequence analysis, expression, distribution, and physiological role of the Xaa-prolyldipeptidyl aminopeptidase gene from Lactobacillus helveticus CNRZ32. Appl Microbiol Biotechnol. 1996, 44: 766-773. 10.1007/BF00178616.
Varmanen P, Rantanen T, Palva A, Tynkkynen S: Cloning and characterization of a prolinase gene (pepR) from Lactobacillus rhamnosus. Appl Environ Microbiol. 1998, 64: 1831-1836.
Sieuwerts S, de Bok FA, Hugenholtz J, van Hylckama Vlieg JE: Unraveling microbial interactions in food fermentations: from classical to genomics approaches. Appl Environ Microbiol. 2008, 74: 4997-5007. 10.1128/AEM.00113-08.
Sridhar VR, Hughes JE, Welker DL, Broadbent JR, Steele JL: Identification of endopeptidase genes from the genomic sequence of Lactobacillus helveticus CNRZ32 and the role of these genes in hydrolysis of model bitter peptides. Appl Environ Microbiol. 2005, 71: 3025-3032. 10.1128/AEM.71.6.3025-3032.2005.
Siezen RJ, Renckens B, Boekhorst J: Evolution of prokaryotic subtilases: Genome-wide analysis reveals novel subfamilies with different catalytic residues. Proteins: Structure, Function and Bioinformatics. 2007, 67: 681-694. 10.1002/prot.21290.
Polgar L: The catalytic triad of serine peptidases. Cell Mol Life Sci. 2005, 62: 2161-2172. 10.1007/s00018-005-5160-x.
Dodson G, Wlodawer A: Catalytic triads and their relatives. Trends Biochem Sci. 1998, 23: 347-352. 10.1016/S0968-0004(98)01254-7.
Seo JM, Ji GE, Cho SH, Park MS, Lee HJ: Characterization of a Bifidobacterium longum BORI dipeptidase belonging to the U34 family. Appl Environ Microbiol. 2007, 73: 5598-5606. 10.1128/AEM.00642-07.
Rademaker JL, Herbet H, Starrenburg MJ, Naser SM, Gevers D, Kelly WJ, Hugenholtz J, Swings J, van Hylckama Vlieg JE: Diversity analysis of dairy and nondairy Lactococcus lactis isolates, using a novel multilocus sequence analysis scheme and (GTG)5-PCR fingerprinting. Appl Environ Microbiol. 2007, 73: 7128-7137. 10.1128/AEM.01017-07.
This work was supported by grant CSI4017 of the Casimir program of the Ministry of Economic Affairs, the Netherlands.
ML conceived and designed the study, performed the analyses, and drafted and revised the manuscript; JRB carried out the pangenome CGH analysis and the diversity analysis of Lactococcus lactis; BR carried out the protein 3D structure alignment; AN coordinated the study and helped to revise the manuscript; RJS conceived, designed and coordinated the study, helped to draft the manuscript and revised it. All authors read and approved the final manuscript.