Bioinformatics comparisons of RNA-binding proteins of pathogenic and non-pathogenic Escherichia coli strains reveal novel virulence factors

Ghosh, Pritha; Sowdhamini, Ramanathan

doi:10.1186/s12864-017-4045-3

Research article
Open access
Published: 24 August 2017


BMC Genomics volume 18, Article number: 658 (2017)


Abstract

Background

Pathogenic bacteria have evolved various strategies to counteract host defences. They are also exposed to environments that are undergoing constant changes. Hence, in order to survive, bacteria must adapt themselves to the changing environmental conditions by performing regulations at the transcriptional and/or post-transcriptional levels. Roles of RNA-binding proteins (RBPs) as virulence factors have been very well studied. Here, we have used a sequence search-based method to compare and contrast the proteomes of 16 pathogenic and three non-pathogenic E. coli strains as well as to obtain a global picture of the RBP landscape (RBPome) in E. coli.

Results

Our results show that there are no significant differences in the percentage of RBPs encoded by the pathogenic and the non-pathogenic E. coli strains. The differences in the types of Pfam domains as well as Pfam RNA-binding domains, encoded by these two classes of E. coli strains, are also insignificant. The complete and distinct RBPome of E. coli has been established by studying all E. coli strains known to date. We have also identified RBPs that are exclusive to pathogenic strains, and most of them can be exploited as drug targets since they appear to be non-homologous to human host proteins. Many of these pathogen-specific proteins were uncharacterised and their identities could be resolved on the basis of sequence homology searches with known proteins. Detailed structural modelling, molecular dynamics simulations and sequence comparisons have been pursued for selected examples to understand differences in stability and RNA-binding.

Conclusions

The approach used in this paper to cross-compare proteomes of pathogenic and non-pathogenic strains may also be extended to other bacterial or even eukaryotic proteomes to understand interesting differences in their RBPomes. The pathogen-specific RBPs reported in this study may also be taken up further for clinical trials and/or experimental validations.

Background

Escherichia coli is one of the most abundant facultative anaerobic gram-negative bacteria of the intestinal microflora and colonises the mucus layer of the colon. The core genomic structure is common among the commensal strains and the various pathogenic E. coli strains that cause intestinal and extra-intestinal diseases in humans [1]. In the pathogenic strains, novel genetic islands and small clusters of genes are present in addition to the core genomic framework and provide the bacteria with increased virulence [2,3,4]. The extracellular intestinal pathogen, enterohemorrhagic E. coli (EHEC), which causes diarrhea, hemorrhagic colitis and the haemolytic uremic syndrome, is the most devastating of the pathogenic E. coli strains [5, 6].

Pathogenic bacteria have evolved various strategies to counteract host defences. They are also exposed to environments that are undergoing constant changes. Hence, in order to survive, bacteria must adapt themselves to the changing environmental conditions by altering gene expression levels and in turn adjusting protein levels according to the need of the cell. Such regulations may occur at the transcriptional and/or post-transcriptional levels [7].

RNA-binding proteins (RBPs) are a versatile group of proteins that perform a diverse range of functions in the cell and are ‘master regulators’ of co-transcriptional and post-transcriptional gene expression processes such as RNA modification, export, localization, mRNA translation and turnover [8,9,10,11,12]; they also aid in the folding of RNA into conformations that are functionally active [13]. In bacteria, many different classes of RBPs interact with small RNAs (sRNA) to form ribonucleoprotein (RNP) complexes that participate in post-transcriptional gene regulation processes [14,15,16,17,18,19,20,21,22,23]. In eukaryotes, noncoding RNAs (ncRNAs) are known to be important regulators of gene expression [24,25,26]. Hence, bacterial RBPs that are capable of inhibiting this class of RNAs are also capable of disrupting the normal functioning of their host cells, thus acting as virulence factors. The roles of RBPs such as Hfq [27,28,29,30,31,32,33,34,35,36], Repressor of secondary metabolites A (RsmA) [36,37,38,39,40,41] and endoribonuclease YbeY [42] as virulence factors have also been very well studied.

Here, we describe the employment of mathematical profiles of RBP families to study the RBP repertoire, henceforth referred to as the ‘RBPome’, in E. coli strains. The proteomes of 19 E. coli strains (16 pathogenic and three non-pathogenic strains) have been studied to compare and contrast the RBPomes of pathogenic and non-pathogenic E. coli. More than 40 different kinds of proteins have been found to be present in two or more pathogenic strains, but absent from all the three non-pathogenic ones. Many of these proteins are previously uncharacterised and may be novel virulence factors and probable candidates for further experimental validations.

We have also extended our search method to probe all available complete E. coli proteomes (up to the date of the study) for RBPs, and thus obtain a bigger picture of the RBP landscape in all known E. coli strains. The search method can also be adapted in future for comparing the RBPomes of other species of bacteria. In addition, our work discusses case studies on a few interesting RBPs. The first is an attempt to provide a structural basis for the inactivity of the Ribonuclease PH (RNase PH) protein from E. coli strain K12. The second deals with the structural modelling and characterisation of RNA substrates of an ‘uncharacterised’ protein that is exclusively found in the pathogenic E. coli strains. The third involves the analysis of pathogen-specific Cas6 proteins and comparison with their non-pathogenic counterparts.

Methods

Dataset

Protein families were grouped on the basis of either structural homology (structure-centric families) or sequence homology (sequence-centric families). A dataset of 1285 RNA-protein and 14 DNA/RNA hybrid-protein complexes was collected from the Protein Data Bank (PDB) (May 2015) and split into protein and RNA chains. The RNA-interacting protein chains in this dataset were classified into 182 Structural Classification of Proteins (SCOP) families, 135 clustered families and 127 orphan families (a total of 437 structure-centric families), on the basis of structural homology with each other. Sequence-centric RNA-binding families were retrieved from Pfam, using an initial keyword search of ‘RNA’, followed by manual curation to generate a dataset of 746 families. The structure-centric classification scheme, the generation of structure-centric family Hidden Markov Models (HMMs) and the retrieval of sequence-centric family HMMs from the Pfam database (v 28) were adapted from our previous study [43].

Proteomes of 19 E. coli strains were retrieved from UniProt Proteomes (May 2016) [44] for the comparative study of pathogenic and non-pathogenic strains. The names and organism IDs of the E. coli strains, their corresponding UniProt proteome IDs and the total number of proteins in each proteome have been listed in Table 1.

Table 1 E. coli proteomes for comparative study. The 19 E. coli proteomes from UniProt (May 2016) used in the study for the comparison of RBPomes of pathogenic and non-pathogenic strains have been listed in this table. The pathogenic and the non-pathogenic E. coli strains have been represented in red and green fonts, respectively


All complete E. coli proteomes were retrieved from RefSeq (May 2016) [45] to study the overall RBP landscape in E. coli. The names of the E. coli strains, their corresponding assembly IDs and the total number of proteins in each proteome have been listed in Table 2.

Table 2 Complete E. coli proteomes. The 166 E. coli complete proteomes from RefSeq (May 2016) that have been used in the study have been listed in this table


Search method

The search method was described in our previous study [43] and is represented schematically in Fig. 1. A library of 1183 RBP family HMMs (437 structure-centric families and 746 sequence-centric families) was used as the starting point to survey the E. coli proteomes for the presence of putative RBPs. The genome-wide survey (GWS) for each E. coli proteome was performed with a sequence E-value cut-off of 10⁻³ and the hits were filtered with a domain i-Evalue cut-off of 0.5. The i-Evalue (independent E-value) is the E-value that the sequence/profile comparison would have received if this were the only domain envelope found in it, excluding any others. It is a stringent measure of how reliable a particular domain may be, and it uses the total number of targets in the target database. The Pfam (v 28) domain architectures (DAs) were also resolved at the same sequence E-value and domain i-Evalue cut-offs.
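The two-threshold filter described above can be sketched in a few lines of Python. This is only an illustration: the `DomainHit` record layout, the function names and the example hits are hypothetical, and treating both cut-offs as inclusive (less than or equal to) is an assumption, since the text does not state it.

```python
from typing import NamedTuple

SEQ_EVALUE_CUTOFF = 1e-3    # sequence E-value cut-off stated in the text
DOM_IEVALUE_CUTOFF = 0.5    # domain i-Evalue cut-off stated in the text


class DomainHit(NamedTuple):
    """One profile-to-sequence match (hypothetical record layout)."""
    protein_id: str
    family: str
    seq_evalue: float     # E-value of the whole-sequence match
    dom_ievalue: float    # independent E-value of this domain envelope


def filter_hits(hits):
    """Keep hits that pass both the sequence and the domain threshold."""
    return [
        h for h in hits
        if h.seq_evalue <= SEQ_EVALUE_CUTOFF and h.dom_ievalue <= DOM_IEVALUE_CUTOFF
    ]


def putative_rbps(hits):
    """Return the set of proteins with at least one surviving hit."""
    return {h.protein_id for h in filter_hits(hits)}


# Made-up hits, for illustration only.
example_hits = [
    DomainHit("protA", "DEAD", 1e-40, 1e-35),   # passes both cut-offs
    DomainHit("protB", "KH_1", 5e-2, 1e-4),     # fails the sequence cut-off
    DomainHit("protC", "S1", 1e-6, 0.9),        # fails the domain cut-off
]
print(sorted(putative_rbps(example_hits)))
```

Only `protA` survives in this toy input, because a hit has to satisfy both thresholds to be retained.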

Comparison of RNA-binding proteins across strains

The RBPs identified from 19 different strains of E. coli were compared by performing all-against-all protein sequence homology searches using the BLASTP module of the NCBI BLAST 2.2.30+ suite [46] with a sequence E-value cut-off of 10⁻⁵. The hits were clustered on the basis of 30% sequence identity and 70% query coverage cut-offs to identify similar proteins, i.e., proteins that had a sequence identity of greater than or equal to 30%, as well as a query coverage of greater than or equal to 70%, were considered to be homologous in terms of sequence and hence clustered. These parameters were standardised on the basis of previous work from our lab to identify true positive sequence homologues [47].
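As a rough illustration of how such cut-offs turn pairwise hits into clusters, the sketch below applies the 30% identity and 70% query coverage thresholds and then merges linked proteins with a union-find structure. The text does not name the clustering algorithm, so single-linkage merging is an assumption here, and the tuple layout, function name and example hits are hypothetical.

```python
from collections import defaultdict

MIN_IDENTITY = 30.0   # percent sequence identity, from the text
MIN_COVERAGE = 70.0   # percent query coverage, from the text


def cluster_hits(protein_ids, hits):
    """Single-linkage clustering of proteins joined by qualifying hits.

    `hits` is an iterable of (query, subject, identity, coverage) tuples;
    every query and subject is assumed to appear in `protein_ids`.
    """
    parent = {p: p for p in protein_ids}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, coverage in hits:
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            parent[find(query)] = find(subject)

    clusters = defaultdict(set)
    for p in protein_ids:
        clusters[find(p)].add(p)
    return sorted(sorted(members) for members in clusters.values())


# Made-up hits, for illustration only.
example_ids = ["a", "b", "c", "d"]
example_blast = [
    ("a", "b", 45.0, 90.0),   # qualifies
    ("b", "c", 31.0, 72.0),   # qualifies, so a, b and c end up together
    ("c", "d", 80.0, 40.0),   # high identity but insufficient coverage
]
print(cluster_hits(example_ids, example_blast))
```

Under this assumption, a protein with no qualifying hit forms a single-member cluster, which matches the multi-member versus single-member distinction used in the Results.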

Associations for proteins that were annotated as ‘hypothetical’ or ‘uncharacterised’, were obtained by sequence homology searches against the NCBI non-redundant (NR) protein database (February 2016) with a sequence E-value cut-off of 10⁻⁵. The BLASTP hits were also clustered on the basis of 100% sequence identity, 100% query coverage and equal length cut-offs to identify identical proteins.

Clusters that consist of proteins from two or more of the pathogenic strains, but not from any of the non-pathogenic ones, will henceforth be referred to as ‘pathogen-specific clusters’ and the proteins in such clusters as ‘pathogen-specific proteins’. Sequence homology searches were performed for these proteins against the reference human proteome (UP000005640) retrieved from Swiss-Prot (June 2016) [44] at a sequence E-value cut-off of 10⁻⁵. The hits were filtered on the basis of 30% sequence identity and 70% query coverage cut-offs.
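The definition of a pathogen-specific cluster translates directly into a set test. The following is a minimal sketch under stated assumptions: the function name, the protein identifiers and the protein-to-strain mapping are hypothetical, and only the rule itself (two or more pathogenic strains, no non-pathogenic strain) comes from the text.

```python
def pathogen_specific_clusters(clusters, strain_of, pathogenic_strains):
    """Return clusters drawn from >= 2 pathogenic strains and no other strain."""
    selected = []
    for members in clusters:
        strains = {strain_of[m] for m in members}
        # Every contributing strain must be pathogenic, and at least two
        # different pathogenic strains must contribute.
        if strains <= pathogenic_strains and len(strains) >= 2:
            selected.append(members)
    return selected


# Toy input: protein IDs p1..p5 are made up for illustration.
example_clusters = [["p1", "p2"], ["p3", "p4"], ["p5"]]
example_strain_of = {
    "p1": "O26:H11", "p2": "O103:H2",   # two pathogenic strains
    "p3": "O26:H11", "p4": "K12",       # mixed with a non-pathogenic strain
    "p5": "O103:H2",                    # only one pathogenic strain
}
example_pathogenic = {"O26:H11", "O103:H2"}
print(pathogen_specific_clusters(example_clusters, example_strain_of, example_pathogenic))
```

In the toy input only the first cluster qualifies: the second contains a K12 protein and the third is drawn from a single strain.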

Modelling and dynamics studies of RNase PH protein

The structures of the active and inactive monomers of the tRNA processing enzyme Ribonuclease PH (RNase PH) from strains O26:H11 (UniProt ID: C8TLI5) and K12 (UniProt ID: P0CG19), respectively, were modelled on the basis of the RNase PH protein from Pseudomonas aeruginosa (PDB code: 1R6M: A) (239 amino acids) using the molecular modelling program MODELLER v 9.15 [48]. The active and inactive RNase PH monomers are 238 and 228 amino acids in length, and are 69% and 70% identical to the template, respectively. Twenty models were generated for each of the active and inactive RNase PH monomers and validated using PROCHECK [49], VERIFY3D [50], ProSA [51] and HARMONY [52]. The best model for each of the active and inactive RNase PH monomers was selected on the basis of the Discrete Optimized Protein Energy (DOPE) score and other validation parameters obtained from the above-mentioned programs. The best models for the active and inactive RNase PH monomers were subjected to 100 iterations of the Powell energy minimisation method in the Tripos Force Field (in the absence of any electrostatics) using SYBYL7.2 (Tripos Inc.). These were then subjected to 100 nanosecond (ns) molecular dynamics (MD) simulations (three replicates each) in the AMBER99SB protein, nucleic AMBER94 force field [53] using the Groningen Machine for Chemical Simulations (GROMACS 4.5.5) program [54].

The biological assembly (hexamer) of RNase PH from Pseudomonas aeruginosa (PDB code: 1R6M) served as the template and was obtained using the online tool PISA (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html) [55]. The structures of the active and inactive hexamers of RNase PH from strains O26:H11 and K12, respectively, were modelled, and the 20 models generated for each of the active and inactive RNase PH hexamers were validated using the same set of tools as mentioned above. The best models were selected and subjected to energy minimisations, as described above. The electrostatic potential on the solvent accessible surfaces of the proteins was calculated using PDB2PQR [56] (in the AMBER force field) and the Adaptive Poisson-Boltzmann Solver (APBS) [57]. To save computational time, head-to-head dimers were randomly selected from both the active and the inactive hexamers of the protein for performing MD simulations. Various energy components of the dimer interface were measured using the in-house algorithm, PPCheck [58]. This algorithm identifies interface residues in protein-protein interactions on the basis of simple distance criteria, following which the strength of the interactions at the interface is quantified. 100 ns MD simulations (three replicates each) were performed with the same set of parameters as mentioned above for the monomeric proteins.

Modelling and dynamics studies of an ‘uncharacterised’ pathogen-specific protein

The structure of the PELOTA_1 domain (Pfam ID: PF15608) of an ‘uncharacterised’ pathogen-specific protein from strain O103:H2 (UniProt ID: C8TX32) (371 amino acids) was modelled on the basis of the L7Ae protein from Methanocaldococcus jannaschii (PDB code: 1XBI: A) (117 amino acids) and validated, as described earlier. The 64 amino acid long PELOTA_1 domain of the uncharacterised protein has 36% sequence identity with the corresponding 75 amino acid domain of the template. The best model was selected as described in the case study on RNase PH. This model was subjected to 100 iterations of the Powell energy minimisation method in the Tripos Force Field (in the absence of any electrostatics) using SYBYL7.2 (Tripos Inc.). Structural alignment of the modelled PELOTA_1 domain and the L7Ae K-turn binding domain from Archaeoglobus fulgidus (PDB code: 4BW0: B) was performed using Multiple Alignment with Translations and Twists (Matt) [59]. The same kink-turn RNA from H. marismortui, found in complex with the L7Ae K-turn binding domain from A. fulgidus, was docked onto the model using the molecular docking program HADDOCK [60], guided by the equivalents of the RNA-interacting residues (at a 5 Å cut-off distance from the protein) in the A. fulgidus L7Ae protein (highlighted in yellow in the upper panel of Fig. 7c). The model and the L7Ae protein from A. fulgidus, in complex with kink-turn RNA from H. marismortui, were subjected to 100 ns MD simulations (three replicates each) in the AMBER99SB protein, nucleic AMBER94 force field using the GROMACS 4.5.5 program.

Sequence analysis of pathogen-specific Cas6-like proteins

The sequences of all the proteins in Cluster 308 were aligned to the Cas6 protein sequence in E. coli strain K12 (UniProt ID: Q46897), using MUSCLE [61] and subjected to molecular phylogeny analysis using the Maximum Likelihood (ML) method and a bootstrap value of 1000 in MEGA7 (CC) [62, 63]. All reviewed CRISPR-associated Cas6 protein sequences were also retrieved from Swiss-Prot (March 2017) [44], followed by manual curation to retain 18 Cas6 proteins. Sequences of two uncharacterised proteins (UniProt IDs: C8U9I8 and C8TG04) from Cluster 308, known to be homologous to known CRISPR-associated Cas6 proteins (on the basis of sequence homology searches against the NR database, as described earlier) were aligned to those of the 18 reviewed Cas6 proteins using MUSCLE. The sequences were then subjected to molecular phylogeny analysis using the above-mentioned parameters. Secondary structure predictions for all the proteins were performed using PSIPRED [64].

The structures of Cas6 proteins from E. coli strain K12 (PDB codes: 4QYZ: K, 5H9E: K and 5H9F: K) were retrieved from the PDB. The RNA-binding and protein-interacting residues in the Cas6 protein structures were calculated on the basis of 5 Å and 8 Å distance cut-off criteria, respectively: the 5 Å cut-off was applied from the associated crRNAs (PDB codes: 4QYZ: L, 5H9E: L and 5H9F: L) and the 8 Å cut-off from the other protein chains (PDB codes: 4QYZ: A-J, 5H9E: A-J and 5H9F: A-J).
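A distance cut-off criterion of this kind can be illustrated with plain coordinates. The sketch below flags a residue when any of its atoms lies within the cut-off of any partner atom; whether the original analysis used all atoms, heavy atoms or a single representative atom per residue is not stated, so the any-atom rule is an assumption, and the function name and coordinates are hypothetical.

```python
import math


def residues_within_cutoff(protein_atoms, partner_atoms, cutoff):
    """Return protein residue IDs with any atom within `cutoff` Å of a partner atom.

    Each atom is a (residue_id, (x, y, z)) tuple.
    """
    close = set()
    for res_id, xyz in protein_atoms:
        if any(math.dist(xyz, other) <= cutoff for _, other in partner_atoms):
            close.add(res_id)
    return sorted(close)


# Made-up coordinates, for illustration only.
example_protein = [
    (10, (0.0, 0.0, 0.0)),    # 3.0 Å from the partner atom
    (11, (4.5, 0.0, 0.0)),    # about 5.4 Å from the partner atom
    (12, (12.0, 0.0, 0.0)),   # about 12.4 Å from the partner atom
]
example_partner = [("G1", (0.0, 3.0, 0.0))]

print(residues_within_cutoff(example_protein, example_partner, 5.0))
print(residues_within_cutoff(example_protein, example_partner, 8.0))
```

With these toy coordinates, residue 10 is picked up at 5 Å, residue 11 only at the looser 8 Å cut-off, and residue 12 at neither.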

Results

Genome-wide survey (GWS) of RNA-binding proteins in pathogenic and non-pathogenic E. coli strains

The GWS of RBPs was performed in 19 different E. coli strains (16 pathogenic and three non-pathogenic strains) and a total of 7902 proteins were identified (Additional file 1: Table S1). Figure 2a shows the number of RBPs found in each of the strains studied here. The pathogenic strains have a larger RBPome than the non-pathogenic ones, with strain O26:H11 encoding the greatest number (441). The pathogenic strains also have bigger proteomes (in terms of the number of proteins in the proteome) than their non-pathogenic counterparts, by virtue of maintaining plasmids. Hence, to normalise for proteome size, the number of RBPs in each of these strains was expressed as a function of the number of proteins in the respective proteome (Fig. 2b). We observed that the difference in the percentage of RBPs in the proteome between the pathogenic and the non-pathogenic strains is insignificant (Welch Two Sample t-test: t = 3.2384, df = 2.474, p-value = 0.06272).
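For readers unfamiliar with the test, the t statistic and the fractional degrees of freedom of a Welch two-sample t-test follow from the sample means, variances and sizes via the Welch-Satterthwaite approximation. The sketch below computes both with the Python standard library; the function name is hypothetical and the two input samples are made-up numbers, not the per-strain percentages from Fig. 2b. Obtaining the p-value additionally requires the Student t cumulative distribution function, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se_a = variance(sample_a) / na
    se_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df


# Made-up samples, for illustration only.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_stat, 4), round(dof, 4))
```

Because one group in the study contains only three strains, the approximate degrees of freedom are small and fractional, as in the values reported above.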

To compare the differential abundance of domains, if any, among the pathogens and the non-pathogens, the Pfam DAs of all the RBPs were resolved (to strengthen the results in this section, this study has been extended to all known E. coli proteomes and will be discussed in a later section). The number of different types of Pfam domains and that of Pfam RNA-binding domains (RBDs) found in each strain have been represented in Fig. 2c. We observed that the difference in the types of Pfam domains, as well as Pfam RBDs, encoded by the pathogenic and the non-pathogenic strains is insignificant (Welch Two Sample t-test for types of Pfam domains: t = −1.3876, df = 2.263, p-value = 0.2861; Welch Two Sample t-test for types of Pfam RBDs: t = −0.9625, df = 2.138, p-value = 0.4317). The number of different Pfam RBDs found across all the 19 E. coli strains studied here has been shown in Fig. 2d and listed in Table 3.

Table 3 Pfam RNA-binding domains. The Pfam RBDs and their corresponding occurrences in the GWS of 19 E. coli strains have been listed in this table. The Pfam domains listed are on the basis of Pfam database (v.28)


We found that E. coli encodes 185 different types of Pfam RBDs in its proteomes and the DEAD domain was found to be the most abundant, constituting approximately 4% of the total number of Pfam RBDs in E. coli. The DEAD box family of proteins are RNA helicases that are required for RNA metabolism and thus are important players in gene expression [65]. These proteins use ATP to unwind short RNA duplexes in an unusual fashion and also help in the remodelling of RNA-protein complexes.

Comparison of RNA-binding proteins across strains reveals novel pathogen-specific factors

The proteins were clustered on the basis of sequence homology searches in order to compare and contrast the RBPs across the E. coli strains studied here. The 7902 proteins identified from all the strains were grouped into 384 clusters, on the basis of sequence homology with other members of the cluster (Additional file 2: Table S2). Greater than 99% of the proteins could cluster with one or more RBPs and formed 336 multi-member clusters (MMCs), whereas the rest of the proteins failed to cluster with other RBPs and formed 48 single-member clusters (SMCs). The distribution of members among all the 384 clusters has been depicted in Fig. 3.

The largest of the MMCs consists of 1459 RBPs, which are ATP-binding subunits of transporters. The E. coli genome sequence had revealed that the largest family of paralogous proteins was composed of ATP-binding cassette (ABC) transporters [66]. The ATP-binding subunits of ABC transporters share common features with other nucleotide-binding proteins [67], such as the E. coli RecA [68] and the F1-ATPase from bovine heart [69]. GCN20, YEF3 and RLI1 are examples of soluble ABC proteins that interact with ribosomes and regulate translation and ribosome biogenesis [70,71,72].

The other large MMCs were those of small toxic polypeptides that are components of the bacterial toxin-antitoxin (TA) systems [73,74,75,76,77], RNA helicases that are involved in various aspects of RNA metabolism [78, 79] and pseudouridine synthases, the enzymes responsible for pseudouridylation, which is the most abundant post-transcriptional modification in RNAs [80]. Cold shock proteins bind mRNAs and regulate translation, the rate of mRNA degradation, etc. [81, 82]. These proteins are induced during the response of the bacterial cell to a drop in temperature.

The majority of the SMCs (38 out of 48 SMCs) are RBPs from pathogenic strains and lack homologues in any of the other strains considered here. These include proteins like putative helicases, serine proteases, and various endonucleases. Conversely, members of the small toxic Ibs protein family (IbsA, IbsB, IbsC, IbsD and IbsE, which form Clusters 362, 363, 364, 365 and 366, respectively) from strain K12 are noteworthy examples of SMCs that are found in non-pathogenic strains only. These Ibs proteins cause the cessation of growth when overexpressed [83].

Pathogen-specific proteins

In this study, the 226 pathogen-specific proteins that formed 43 pathogen-specific clusters are of special interest. Sixty-three of these proteins were previously uncharacterised and associations for all of these proteins were obtained on the basis of sequence homology searches against the NCBI-NR database. The functional annotation of each of these clusters was transferred on the basis of homology. The biological functions and the number of RBPs constituting these pathogen-specific clusters have been listed in Table 4.

Table 4 Pathogen-specific RNA-binding protein clusters. The size of RBP clusters with members from only the pathogenic E. coli strains in our GWS of 19 E. coli strains have been listed in this table


If these pathogen-specific proteins are exclusive to the pathogenic strains, then they may be exploited for drug design purposes. To test this hypothesis, we surveyed the human (host) proteome for the presence of sequence homologues of these proteins. It was found that, barring the protein kinases that were members of Cluster 98 (marked with an asterisk in Table 4), none of the pathogen-specific proteins were homologous to any human protein within the thresholds employed in the search strategy (please see Methods section for details). A few of the pathogen-specific protein clusters are described in the following section.

The DEAD/DEAH box helicases, which use ATP to unwind short duplex RNA [65], formed three different clusters. In two of the clusters, the DEAD domains (Pfam ID: PF00270) were associated with C-terminal Helicase_C (Pfam ID: PF00271) and DUF1998 (Pfam ID: PF09369) domains. On the other hand, in a bigger cluster, the DEAD/DEAH box helicases were composed of DNA_primase_S (Pfam ID: PF01896), ResIII (Pfam ID: PF04851) and Helicase_C domains. Four of the pathogen-specific clusters were those of Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) sequence-associated proteins, consisting of RBPs from 10 pathogenic strains each. Recent literature reports also support the role of CRISPR-associated proteins as virulence factors in pathogenic bacteria [84]. The KilA-N domains are found in a wide range of proteins and may share a common fold with the nucleic acid-binding modules of certain nucleases and the N-terminal domain of the tRNA endonuclease [85]. Fertility inhibition (FinO) protein and the anti-sense FinP RNA are members of the FinOP fertility inhibition complex, which regulates the expression of the genes in the transfer operon [86,87,88,89]. tRNA (fMet)-specific endonucleases are the toxic components of a TA system. This site-specific tRNA-(fMet) endonuclease acts as a virulence factor by cleaving both charged and uncharged tRNA-(fMet) and inhibiting translation. The Activating Signal Cointegrator-1 homology (ASCH) domain is also a putative RBD due to the presence of an RNA-binding cleft associated with a conserved sequence motif characteristic of the ASC-1 superfamily [90].

Identification of the distinct RNA-binding protein repertoire in E. coli

We identified identical RBPs across E. coli strains on the basis of sequence homology searches and other filtering criteria (as mentioned in the Methods section). Out of the 7902 RBPs identified in our GWS, 6236 had one or more identical partners from one or more strains and formed 1227 clusters, whereas 1666 proteins had no identical counterparts. Hence, our study identified 2893 RBPs (1227 + 1666) from 19 E. coli strains that were distinct from each other. Identification of such a distinct pool of RBPs will help to provide an insight into the possible range of functions performed by this class of proteins in E. coli, and hence to compare and contrast these with the possible functions performed by RBPs in other organisms.
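The bookkeeping behind the distinct RBP count can be sketched as a grouping step. The example below treats two proteins as identical when their sequences match exactly, which is a simplifying assumption standing in for the BLASTP criteria of 100% identity, 100% query coverage and equal length given in the Methods; the function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def distinct_proteins(sequences):
    """Group proteins by exact sequence and summarise the redundancy.

    `sequences` maps a protein ID to its amino acid sequence.
    """
    groups = defaultdict(list)
    for protein_id, seq in sequences.items():
        groups[seq].append(protein_id)
    shared = [ids for ids in groups.values() if len(ids) > 1]
    singletons = [ids[0] for ids in groups.values() if len(ids) == 1]
    return {
        "identical_clusters": len(shared),
        "proteins_in_clusters": sum(len(ids) for ids in shared),
        "unique_proteins": len(singletons),
        # Each cluster of identical proteins counts once.
        "distinct_total": len(shared) + len(singletons),
    }


# Made-up sequences, for illustration only.
example_sequences = {
    "a": "MKT", "b": "MKT",               # one cluster of two
    "c": "MKV",                           # no identical partner
    "d": "MAA", "e": "MAA", "f": "MAA",   # one cluster of three
}
print(distinct_proteins(example_sequences))
```

The distinct total is the number of identical-protein clusters plus the number of proteins without an identical partner, which is the same sum that yields 2893 from 1227 clusters and 1666 unmatched proteins.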

GWS of RNA-binding proteins in all known E. coli strains

We extended the above-mentioned study by performing a GWS of RBPs in 166 complete E. coli proteomes available in the RefSeq database (May 2016), and a total of 8464 proteins were identified (Additional file 3). It should be noted that, unlike the nomenclature system of UniProt, where the same protein occurring in different strains is denoted with different UniProt accession IDs, RefSeq assigns the same, or at times different, accession IDs to the same protein occurring in different strains. Thus, 8464 RBPs were identified on the basis of unique accession IDs. The 8464 RBPs were grouped into 401 clusters on the basis of sequence homology with other members of the cluster. We found that greater than 99% of the proteins could cluster with one or more RBPs and formed 339 MMCs, whereas the rest of the proteins failed to cluster with other RBPs and formed 62 SMCs.

The above-mentioned GWS statistics for RBP numbers have been plotted in Fig. 4a. The number of different Pfam RBDs found across all complete E. coli proteomes has been shown in Fig. 4b. Similar to the afore-mentioned results from the dataset of 19 E. coli proteomes, it was found that E. coli encodes 188 different types of Pfam RBDs in its proteomes and the DEAD domain was still observed to be the most abundant, constituting approximately 6% of the total number of Pfam RBDs in E. coli. The length distribution of RBPs from E. coli has been plotted in Fig. 4c, and RBPs of length 201–300 amino acids were found to be the most prevalent.

Identification of the complete distinct RBPome in 166 proteomes of E. coli

These 8464 RBPs (please see previous section) formed 1285 clusters of two or more identical proteins, accounting for 3532 RBPs, whereas the remaining 4932 RBPs were distinct from the others. Hence, 6217 RBPs, distinct from each other, were identified from all known E. coli strains, which is much greater than the number (2893) found from 19 E. coli proteomes.

It should be noted that the pathogenicity annotations are not very clear for a few of the 166 E. coli strains for which complete proteome information is available. Hence, we have performed the analysis of the pathogen-specific proteins using the smaller dataset of 19 proteomes, whereas all the 166 complete proteomes have been considered for the analysis of the complete E. coli RBPome.

Case studies

Three case studies on interesting RBPs were performed to answer some outstanding questions and have been described in the following sections. The first of the three examples deals with an RNase PH protein that does not cluster with those from any of the other 165 E. coli proteomes considered in this study. This protein, which forms an SMC, is interesting in the biological context because it differs from the other RNase PH proteins at the level of both sequence and biological activity. The second case study deals with a protein that is a part of a pathogen-specific cluster, in which none of the proteins are well-annotated. This protein was found to encode a bacterial homologue of a well-known archaeo-eukaryotic RBD, whose RNA-binding properties are not as well studied as those of its homologues. The final study involves a sequence-based approach to analyse the pathogen-specific CRISPR-associated Cas6 proteins, and compare them with similar proteins from the non-pathogenic strains.

Case study 1: RNase PH from strain K12 is inactive due to a possible loss of stability of the protein

RNase PH is a phosphorolytic exoribonuclease involved in the maturation of the 3′-end of transfer RNAs (tRNAs) containing the CCA motif [91,92,93]. The RNase PH protein from strain K12 was found to be distinct from all other known RNase PH proteins from E. coli and has a truncated C-terminus. In 1993, DNA sequencing studies revealed that a GC base pair (bp) was missing in this strain from a block of five GC bps found 43–47 bp upstream of the rph stop codon [94]. This one base pair deletion leads to a translation frame shift over the last 15 codons, resulting in a premature stop codon (five codons after the deletion). This premature stop codon, in turn, leads to the observed reduction in size of the RNase PH protein by 10 residues. It was also shown by Jensen [94] that this protein lacks RNase PH activity. Figure 5a shows a schematic representation of the DAs of the active (up) and inactive (down) RNase PH proteins, with the five residues that have undergone mutations and the ten residues that are missing from the inactive RNase PH protein depicted in orange and yellow, respectively. These are the residues of interest in our study. The same colour coding has been used in both Fig. 5a and b.
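The way a single base deletion in a run of identical bases shifts the reading frame and introduces a premature stop codon can be shown with a toy example. The sequence below is invented for illustration and is not the rph gene; in this toy case the stop codon appears immediately after the first altered codon, whereas in rph it appears five codons after the deletion. The helper names are hypothetical, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(dna, index):
    """Return `dna` with the base at 0-based position `index` removed."""
    return dna[:index] + dna[index + 1:]


# Toy open reading frame with a block of five G bases (positions 6 to 10).
WILD_TYPE = "ATGAAAGGGGGACTGACCGAATTTCTGTAA"
MUTANT = delete_base(WILD_TYPE, 6)   # drop one G from the block

print(translate(WILD_TYPE))   # MKGGLTEFL
print(translate(MUTANT))      # MKGD
```

Deleting any one of the five G bases gives the same mutant sequence, which mirrors the description of the deletion as a missing base pair within a block rather than at a specific position.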

To provide a structural basis for this possible loss of activity of the RNase PH protein from strain K12, we modelled the structures of the RNase PH protein monomer as well as the hexamer from strains O26:H11 and K12 (Fig. 5b and c). It is known in the literature that the hexamer (trimer of dimers) is the biological unit of the RNase PH protein and that the hexameric assembly is mandatory for the activity of the protein [95, 96].

The stability of both the monomer and the hexamer was found to be affected in strain K12, as compared to that in strain O26:H11. The energy values have been plotted in Fig. 6a. In both monomer and hexamer, there is a reduction in stability, suggesting that the absence of C-terminal residues affects the stability of the protein, perhaps more than a cumulative contribution to the stability of the protein. It should be noted that since the monomeric form of the inactive protein is less stable than that of its active counterpart, the hexameric assembly of the inactive RNase PH protein is only a putative one. Hence, the putative and/or unstable hexameric assembly of the RNase PH protein leads to the loss of activity of the protein.

Figure 5b shows that the residues marked in cyan (left) are within an interacting distance of 8 Å of the residues of interest (left). These residues marked in cyan are a subset of the RNase PH domain, which is marked in magenta (right). Hence, the loss of possible interactions (between the residues marked in cyan and the residues of interest), and the consequent loss of stability of the three-dimensional structure of the RNase PH domain, might explain the inactive nature of the protein from strain K12. Figure 5d shows differences in the electrostatic potential on the solvent-accessible surfaces of the active (left) and inactive (right) RNase PH proteins.
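
A distance-cutoff selection of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinates are invented, the function name is hypothetical, and the distance is measured between any pair of listed atoms, since the text does not specify which atoms were used for the 8 Å criterion.

```python
import math


def residues_within(coords, targets, cutoff=8.0):
    """Return residues (not in targets) with any atom within cutoff (in Å)
    of any atom of the target residues.

    coords maps a residue number to a list of (x, y, z) atom coordinates.
    """
    target_atoms = [atom for res in targets for atom in coords[res]]
    near = set()
    for res, atoms in coords.items():
        if res in targets:
            continue
        if any(math.dist(a, t) <= cutoff for a in atoms for t in target_atoms):
            near.add(res)
    return sorted(near)


# Invented toy coordinates (assumption), one atom per residue.
toy_coords = {
    10: [(0.0, 0.0, 0.0)],
    11: [(3.8, 0.0, 0.0)],
    50: [(0.0, 7.5, 0.0)],
    90: [(0.0, 0.0, 20.0)],
    228: [(0.0, 3.0, 0.0)],   # stands in for a "residue of interest"
}
neighbours = residues_within(toy_coords, targets={228})
print(neighbours)
```

With real structures, the same selection would be run on coordinates parsed from the model files, and the resulting residue set compared against the domain boundaries.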

To test this hypothesis for the possible loss of function of the RNase PH protein due to loss of stability of the monomer and/or the hexamer, we performed MD simulations to understand distortions, if any, of the monomer and a randomly selected head-to-head dimer (from the hexameric assembly) of both the active and the inactive proteins. The dimers have been marked in black boxes in Fig. 5c. Various energy components of the dimer interface, as calculated by PPCheck, have been plotted in Fig. 6b. The results show that the inactive RNase PH dimer interface is less stabilised than that of the active protein. The trajectories of the MD runs have been shown in additional movie files (Additional file 4, Additional file 5, Additional file 6 and Additional file 7, for the active monomer, inactive monomer, active dimer and inactive dimer, respectively). Analyses of Additional file 4 and Additional file 5 show a slight distortion in the short helix (pink) in the absence of the residues of interest (orange and yellow), which might lead to overall loss of stability of the monomer. Further analyses (Additional file 6 and Additional file 7) show the floppy nature of the terminal part of the helices that interact in the dimer. This is probably due to the loss of the residues of interest, which are seen to be structured and less floppy in the active RNase PH dimer (Additional file 6).

The number of hydrogen bonds (H-bonds) formed in the system over each picosecond of the MD simulations of the active monomer, inactive monomer, active dimer and inactive dimer has been represented in Fig. 8a, b, c and d, respectively. For each of the systems, the H-bond traces for three replicates (represented in different colours) have been depicted, and the replicates show similar H-bonding patterns. Comparison of panels a and b of this figure shows a greater number of H-bonds being formed in the active monomer than in the inactive monomer over the entire time period of the simulation. Similarly, comparison of panels c and d shows a greater number of H-bonds being formed in the active dimer than in the inactive dimer over the entire time period of the simulation. These losses of H-bonding interactions might lead to overall loss of stability of the dimer and subsequently that of the hexamer.
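
A comparison of H-bond traces across replicates can be summarised numerically as in the sketch below. The counts are invented toy numbers, not values from this study, and the function name is hypothetical; the sketch only illustrates averaging each replicate trace and then comparing two systems.

```python
from statistics import mean


def summarise(traces):
    """Return (per-replicate means, mean of those means).

    traces is a list of replicates; each replicate is a list of
    per-frame H-bond counts.
    """
    per_replicate = [mean(trace) for trace in traces]
    return per_replicate, mean(per_replicate)


# Invented toy H-bond counts (assumption): three replicates, four frames each.
active = [[210, 212, 208, 214], [209, 211, 213, 207], [211, 210, 212, 211]]
inactive = [[198, 201, 199, 202], [200, 197, 199, 200], [199, 200, 201, 200]]

active_means, active_overall = summarise(active)
inactive_means, inactive_overall = summarise(inactive)

print("active replicates:", active_means)
print("inactive replicates:", inactive_means)
print("active has more H-bonds on average:", active_overall > inactive_overall)
```

Reporting the per-replicate means alongside the overall mean makes it easy to check whether the replicates agree with each other before the two systems are compared.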

Case study 2: Uncharacterised pathogen-specific protein and its homologues show subtly different RNA-binding properties

In our study, we observed that Cluster 60 was composed of 10 proteins, each from a different pathogenic strain studied here. All the proteins in this cluster were annotated as either ‘putative’, ‘uncharacterised’, ‘hypothetical’ or ‘predicted’. To understand the RNA-binding properties of these orthologous pathogen-specific proteins, we resolved their Pfam DA. In particular, such an association to Pfam domains provides functional annotation to a hitherto uncharacterised protein from strain O103:H2, by assigning it the RBD PELOTA_1. Hence, the structure of the RNA-binding PELOTA_1 domain of this protein was modelled on the basis of the L7Ae protein from M. jannaschii (Fig. 7a).

Domains that are involved in core processes, such as RNA maturation (e.g. the tRNA endonucleases) and translation, and that have an archaeo-eukaryotic phyletic pattern include the PIWI, PELOTA and SUI1 domains [97]. In 2012, Anantharaman and co-workers showed associations of the conserved C-terminus of a phosphoribosyltransferase (PRTase) in the Tellurium resistance (Ter) operon to a PELOTA or Ribosomal_L7Ae domain (Pfam ID: PF01248) [98]. These domains are homologues of the eukaryotic release factor 1 (eRF1), which is involved in translation termination. Unlike the well-studied PELOTA domain, the species distribution of the PELOTA_1 domain is solely bacterial and not much is known in the literature regarding the specific function of this domain.

The structure of this modelled PELOTA_1 domain from the uncharacterised protein was aligned with that of the L7Ae kink-turn (K-turn) binding domain from an archaeon (A. fulgidus) (Fig. 7b). The model also retained the same basic structural unit as the eRF1 protein (data not shown). L7Ae is a member of a family of proteins that bind K-turns in many functional RNA species [99]. The K-turn RNA was docked onto the model, guided by the equivalents of the known RNA-interacting residues from the archaeal L7Ae K-turn binding domain. Both the complexes have been shown in Fig. 7c with the RNA-interacting residues highlighted in yellow. MD simulations of both these complexes were performed and the trajectories have been shown in additional movie files Additional file 8 (PELOTA_1 domain model-K-turn RNA complex) and Additional file 9 (L7Ae K-turn binding domain-K-turn RNA complex).

The number of H-bonds formed between the protein and the RNA over each picosecond of the MD simulations of the PELOTA_1 domain-RNA complex and the L7Ae K-turn binding domain-RNA complex has been represented in Fig. 8e and f, respectively. For each of the systems, the H-bond traces for three replicates (represented in different colours) have been depicted, and the replicates show similar H-bonding patterns. Comparison of panels e and f of this figure shows a greater number of H-bonds being formed in the L7Ae K-turn binding domain-RNA complex than in the PELOTA_1 domain-RNA complex over the entire time period of the simulation. These results suggest that the two proteins have differential affinity towards the same RNA molecule, and that they might perform subtly different functions by virtue of having differential RNA-binding properties.

Case study 3: Pathogen-specific Cas6-like proteins might be functional variants of the well-characterised non-pathogenic protein

In many bacteria, as well as archaea, CRISPR-associated Cas proteins and short CRISPR-derived RNA (crRNA) assemble into large RNP complexes and provide surveillance against invasion by genetic parasites [100,101,102]. The role of CRISPR-associated proteins as virulence factors in pathogenic bacteria has also been reported in recent literature [84]. We found that Cluster 308 consists of 10 pathogen-specific proteins, half of which were already annotated as Cas6 proteins, whereas the other half consisted of ‘uncharacterised’ or ‘hypothetical’ proteins. As mentioned in the Methods section, the latter proteins were annotated as Cas6 proteins on the basis of sequence homology to known proteins in the NR database.

Molecular phylogeny analysis of all the proteins from Cluster 308 and Cas6 from E. coli strain K12 has been depicted in Additional file 10a: Figure S1, which confirms that the pathogen-specific proteins are more similar to each other, in terms of sequence, than they are to the Cas6 protein from the non-pathogenic strain K12. Furthermore, a similar analysis of two previously uncharacterised proteins (UniProt IDs: C8U9I8 and C8TG04) (red) from this pathogen-specific Cas6 protein cluster (Cluster 308) with other known Cas6 proteins has been shown in Additional file 10b: Figure S1. From the phylogenetic tree, one can infer that the pathogen-specific Cas6 proteins are more similar in terms of sequence to the Cas6 from E. coli strain K12 (blue) than to those from other organisms.

Multiple sequence alignment (MSA) of all the proteins from Cluster 308 and Cas6 from strain K12 has been shown in Fig. 9. The RNA-binding residues in the E. coli strain K12 Cas6 protein (the union set of RNA-binding residues inferred from each of the three known PDB structures; see Methods section) have been highlighted in yellow on its sequence (CAS6_ECOLI) on the MSA. The corresponding residues in the other proteins on the MSA that are the same as those in CAS6_ECOLI have also been highlighted in yellow, whereas those that differ have been highlighted in red. From Fig. 9a, we can conclude that the majority of the RNA-binding residues in CAS6_ECOLI are not conserved in the pathogen-specific Cas6 proteins, and can be defined as ‘class-specific residues’. A similar colouring scheme has been followed in Fig. 9b, to analyse the conservation of protein-interacting residues in these proteins. From these analyses, we can speculate that, due to the presence of a large proportion of ‘class-specific residues’, the RNA-binding properties, as well as the protein-protein interactions, might be substantially different between the Cas6 proteins from non-pathogenic and pathogenic E. coli strains, which might lead to functional divergence. Secondary structures of each of these proteins, mapped on their sequences (α-helices highlighted in cyan and β-strands in green) in Fig. 9c, also hint at slight structural variation among these proteins.
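
The yellow/red labelling described above amounts to a per-position comparison between the reference sequence and each aligned sequence. The following is a minimal sketch of that comparison; the aligned sequences and positions are invented for illustration, and the function name is hypothetical.

```python
def classify_positions(ref_aln, other_aln, ref_positions):
    """Label chosen reference residues as 'same' or 'differs'.

    ref_aln and other_aln are rows of one alignment (equal length,
    '-' for gaps). ref_positions holds 1-based residue numbers counted
    on the ungapped reference sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = {}
    ref_index = 0
    for ref_char, other_char in zip(ref_aln, other_aln):
        if ref_char == "-":
            continue  # insertion relative to the reference
        ref_index += 1
        if ref_index in ref_positions:
            labels[ref_index] = "same" if other_char == ref_char else "differs"
    return labels


# Invented toy alignment (assumption), not taken from Fig. 9.
reference = "MK-RHSTG"
homologue = "MKARQS-G"
binding_positions = {3, 4, 6}   # hypothetical RNA-binding residues

labels = classify_positions(reference, homologue, binding_positions)
conserved_fraction = sum(v == "same" for v in labels.values()) / len(labels)
print(labels, round(conserved_fraction, 2))
```

A gap in the homologue at a binding position is counted as a difference here; that choice is an assumption of this sketch rather than something stated in the text.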

Discussion

We have employed a sequence search-based method to compare and contrast the proteomes of 16 pathogenic and three non-pathogenic E. coli strains as well as to obtain a global picture of the RBP landscape in E. coli. The results obtained from this study showed that there are no significant differences in the percentage of RBPs encoded by the pathogenic and the non-pathogenic strains. The DEAD domain, involved in RNA metabolism, was found to be the most abundant of all identified RBDs. The complete and distinct RBPome of E. coli was also identified by studying all E. coli strains known to date. In this study, we identified RBPs that were exclusive to pathogenic strains, and most of them can be exploited as drug targets by virtue of being non-homologous to their human host proteins. Many of these pathogen-specific proteins were uncharacterised and their identities could be resolved on the basis of sequence homology searches with known proteins.

Further, in this study, we performed three case studies on interesting RBPs. In the first of the three studies, we investigated a tRNA-processing RNase PH enzyme from strain K12 that differs from that in all other E. coli strains in having a truncated C-terminus and being functionally inactive. Structural modelling and molecular dynamics studies showed that the loss of stability of the monomeric and/or the hexameric (biological unit) forms of this protein from E. coli strain K12 might be the reason for its lack of functional activity. In the second study, a previously uncharacterised pathogen-specific protein was studied and was found to possess subtly different RNA-binding affinities towards the same RNA stretch as compared to its well-characterised homologues in archaea and eukaryotes. This might hint at different functions of these proteins. In the third case study, pathogen-specific CRISPR-associated Cas6 proteins were analysed and the results suggest that they may have diverged functionally from the known prototypical Cas6 proteins.

Conclusions

The approach used in our study to cross-compare proteomes of pathogenic and non-pathogenic strains may also be extended to other bacterial or even eukaryotic proteomes to understand interesting differences in their RBPomes. The pathogen-specific RBPs reported in this study may also be taken up further for clinical trials and/or experimental validations.

The effect of the absence of a functional RNase PH in E. coli strain K12 is not clear. The role of the PELOTA_1 domain-containing protein may also be reinforced by performing knockdown and rescue experiments. These might help to understand the functional overlap of this protein with its archaeal or eukaryotic homologues. Introduction of this pathogen-specific protein into non-pathogens might also provide probable answers regarding its virulence properties. The less conserved RNA-binding and protein-interacting residues in the pathogen-specific Cas6 proteins might point to functional divergence of these proteins from the known ones, but this warrants further investigation.

Abbreviations

ABC:: ATP-binding cassette transporters
APBS:: Adaptive Poisson-Boltzmann Solver
ASCH:: Activating Signal Cointegrator-1 homology
bp:: Base pair
Cas:: CRISPR-associated system
CRISPR:: Clustered Regularly Interspaced Short Palindromic Repeat
crRNA:: CRISPR RNA
DA:: Domain architecture
DOPE:: Discrete Optimized Protein Energy
EHEC:: Enterohemorrhagic E. coli
Fin:: Fertility inhibition
GROMACS:: Groningen Machine for Chemical Simulations
GWS:: Genome-wide survey
HMM:: Hidden Markov Model
i-Evalue:: Independent E-value
K-turn:: Kink-turn
Matt:: Multiple Alignment with Translations and Twists
MD:: Molecular dynamics
ML:: Maximum Likelihood
MMC:: Multi-member cluster
MSA:: Multiple sequence alignment
ncRNA:: Noncoding RNA
NR:: Non-redundant
PDB:: Protein Data Bank
Pfam:: Protein families database
RBD:: RNA-binding domain
RBP:: RNA-binding protein
RNase PH:: Ribonuclease PH
RNP:: Ribonucleoprotein
RsmA:: Repressor of secondary metabolites A
SCOP:: Structural Classification of Proteins
SMC:: Single-member cluster
sRNA:: Small RNA
TA:: Toxin-antitoxin
tRNA:: Transfer RNA

References

Kaper JB, Nataro JP, Mobley HLT. Pathogenic Escherichia coli. Nat Rev Microbiol. 2004;2:123–40.
Hacker J, Bender L, Ott M, Wingender J, Lund B, Marre R, et al. Deletions of chromosomal regions coding for fimbriae and hemolysins occur in vitro and in vivo in various extra intestinal Escherichia coli isolates. Microb Pathog. 1990;8:213–25.
Hacker J, Kaper JB. Pathogenicity Islands and the evolution of microbes. Annu Rev Microbiol. 2000;54:641–79.
Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H. Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol. 1997;23:1089–97.
Caprioli A, Morabito S, Brugère H, Oswald E. Enterohaemorrhagic Escherichia coli: emerging issues on virulence and modes of transmission. Vet Res. 2005;36:289–311.
Garmendia J, Frankel G, Crepin VF. Enteropathogenic and enterohemorrhagic Escherichia coli infections. Infect Immun. 2005;73:2573–85.
Perez-Rueda E, Martinez-Nuñez MA. The repertoire of DNA-binding transcription factors in prokaryotes: functional and evolutionary lessons. Sci Prog. 2012;95:315–29.
Cusack S. RNA – protein complexes. Curr Opin Struct Biol. 1999;6:66–73.
Draper DE. Themes in RNA-protein recognition. J Mol Biol. 1999;293:255–70.
Jones S, Daley DT, Luscombe NM, Berman HM, Thornton JM. Protein-RNA interactions: a structural analysis. Nucleic Acids Res. 2001;29:943–54.
Chen Y, Varani G. Protein families and RNA recognition. FEBS J. 2005;272:2088–97.
Hall KB. RNA – protein interactions. Curr Opin Struct Biol. 2002;12:283–8.
Schroeder R, Barta A, Semrad K. Strategies for RNA folding and assembly. Nat Rev Mol Cell Biol. 2004;5:908–19.
Windbichler N, von Pelchrzim F, Mayer O, Csaszar E, Schroeder R. Isolation of small RNA-binding proteins from E. coli : evidence for frequent interaction of RNAs with RNA polymerase. RNA Biol. 2008;5:30–40.
Aiba H. Mechanism of RNA silencing by Hfq-binding small RNAs. Curr Opin Microbiol. 2007;10:134–9.
De Lay N, Schu DJ, Gottesman S. Bacterial small RNA-based negative regulation: Hfq and its accomplices. J. Biol. Chem. 2013:7996–8003.
Gaballa A, Antelmann H, Aguilar C, Khakh SK, Song K-B, Smaldone GT, et al. The Bacillus subtilis iron-sparing response is mediated by a Fur-regulated small RNA and three small, basic proteins. Proc Natl Acad Sci U S A. 2008;105:11927–32.
Geissmann TA, Touati D. Hfq, a new chaperoning role: binding to messenger RNA determines access for small RNA regulator. EMBO J. 2004;23:396–405.
Holmqvist E, Vogel J. A small RNA serving both the Hfq and CsrA regulons. Genes Dev. 2013;27:1073–8.
Van Assche E, Van Puyvelde S, Vanderleyden J, Steenackers HP. RNA-binding proteins involved in post-transcriptional regulation in bacteria. Front. Microbiol. 2015;6.
Liu JM, Camilli A. A broadening world of bacterial small RNAs. Curr Opin Microbiol. 2010:18–23.
Oliva G, Sahr T, Buchrieser C. Small RNAs, 5′ UTR elements and RNA-binding proteins in intracellular bacteria: Impact on metabolism and virulence. FEMS Microbiol Rev. 2015:331–49.
Sauer E, Schmidt S, Weichenrieder O. Small RNA binding to the lateral surface of Hfq hexamers and structural rearrangements upon mRNA target recognition. Proc Natl Acad Sci. 2012;109:9396–401.
Prasanth KV, Spector DL. Eukaryotic regulatory RNAs: an answer to the “genome complexity” conundrum. Genes Dev. 2007;21:11–42.
Hannon GJ. RNA interference. Nature. 2002;418:244–51.
Mattick JS. The Functional Genomics of Noncoding RNA. Science. 2005;309:1527–8.
Sonnleitner E, Hagens S, Rosenau F, Wilhelm S, Habel A, Jäger KE, et al. Reduced virulence of a hfq mutant of Pseudomonas aeruginosa O1. Microb Pathog. 2003;35:217–28.
Sittka A, Pfeiffer V, Tedin K, Vogel J. The RNA chaperone Hfq is essential for the virulence of Salmonella typhimurium. Mol Microbiol. 2007;63:193–217.
Sharma AK, Payne SM. Induction of expression of hfq by DksA is essential for Shigella flexneri virulence. Mol Microbiol. 2006;62:469–79.
Ding Y, Davis BM, Waldor MK. Hfq is essential for Vibrio cholerae virulence and downregulates σE expression. Mol Microbiol. 2004;53:345–54.
Kendall MM, Gruber CC, Rasko DA, Hughes DT, Sperandio V. Hfq virulence regulation in enterohemorrhagic Escherichia coli O157:H7 strain 86-24. J Bacteriol. 2011;193:6843–51.
Chao Y, Vogel J. The role of Hfq in bacterial pathogens. Curr. Opin. Microbiol. 2010. p. 24–33.
Zeng Q, McNally RR, Sundin GW. Global small RNA chaperone Hfq and regulatory small RNAs are important virulence regulators in Erwinia amylovora. J Bacteriol. 2013;195:1706–17.
Christiansen JK, Larsen MH, Ingmer H, Sogaard-Andersen L, Kallipolitis BH. The RNA-binding protein Hfq of Listeria monocytogenes: role in stress tolerance and virulence. J Bacteriol. 2004;186:3355–62.
Geng J, Song Y, Yang L, Feng Y, Qiu Y, Li G, et al. Involvement of the post-transcriptional regulator Hfq in Yersinia pestis virulence. PLoS One. 2009;4.
Wilf NM, Reid AJ, Ramsay JP, Williamson NR, Croucher NJ, Gatto L, et al. RNA-seq reveals the RNA binding proteins, Hfq and RsmA, play various roles in virulence, antibiotic production and genomic flux in Serratia sp. ATCC 39006. BMC Genomics. 2013;14:822.
Pessi G, Williams F, Hindle Z, Heurlier K, Holden MTG, Cámara M, et al. The global posttranscriptional regulator RsmA modulates production of virulence determinants and N-acylhomoserine lactones in Pseudomonas aeruginosa. J Bacteriol. 2001;183:6676–83.
Liaw SJ, Lai HC, Ho SW, Luh KT, Wang WB. Role of RsmA in the regulation of swarming motility and virulence factor expression in Proteus mirabilis. J Med Microbiol. 2003;52:19–28.
Mulcahy H, O’Callaghan J, O’Grady EP, Maciá MD, Borrell N, Gómez C, et al. Pseudomonas aeruginosa RsmA plays an important role during murine infection by influencing colonization, virulence, persistence, and pulmonary inflammation. Infect Immun. 2008;76:632–8.
Mulcahy H, O’Callaghan J, O’Grady EP, Adams C, O’Gara F. The posttranscriptional regulator RsmA plays a role in the interaction between Pseudomonas aeruginosa and human airway epithelial cells by positively regulating the type III secretion system. Infect Immun. 2006;74:3012–5.
Chao N-X, Wei K, Chen Q, Meng Q-L, Tang D-J, He Y-Q, et al. The rsmA-like gene rsmAXcc of Xanthomonas campestris pv. campestris is involved in the control of various cellular processes, including pathogenesis. Mol Plant-Microbe Interact. 2008;21:411–23.
Vercruysse M, Köhrer C, Davies BW, Arnold MFF, Mekalanos JJ, RajBhandary UL, et al. The Highly Conserved Bacterial RNase YbeY Is Essential in Vibrio cholerae, Playing a Critical Role in Virulence, Stress Regulation, and RNA Processing. Klose KE, editor. PLoS Pathog. 2014;10:e1004175.
Ghosh P, Sowdhamini R. Genome-wide survey of putative RNA-binding proteins encoded in the human proteome. Mol BioSyst. 2016;12:532–40.
Bateman A, Martin MJ, O’Donovan C, Magrane M, Apweiler R, Alpi E, et al. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–12.
Tatusova T, Ciufo S, Federhen S, Fedorov B, McVeigh R, O’Neill K, et al. Update on RefSeq microbial genomes resources. Nucleic Acids Res. 2015;43:D599–605.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
Kaushik S, Mutt E, Chellappan A, Sankaran S, Srinivasan N, Sowdhamini R. Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage. Promponas VJ, editor. PLoS One. 2013;8:e56449.
Šali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993;234:779–815.
Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26:283–91.
Eisenberg D, Lüthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 1997;277:396–404.
Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35:W407–10.
Pugalenthi G, Shameer K, Srinivasan N, Sowdhamini R. HARMONY: a server for the assessment of protein structures. Nucleic Acids Res. 2006;34:231–4.
Wang J, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem. 2000;21:1049–74.
Berendsen HJC, van der Spoel D, van Drunen R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput Phys Commun. 1995;91:43–56.
Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–97.
Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res. 2004;32:W665–7.
Unni S, Huang Y, Hanson RM, Tobias M, Krishnan S, Li WW, et al. Web servers and services for electrostatics calculations with APBS and PDB2PQR. J Comput Chem. 2011;32:1488–91.
Sowdhamini R, Sukhwal A. PPCheck: a webserver for the quantitative analysis of protein–protein interfaces and prediction of residue hotspots. Bioinform Biol Insights. 2015;9:141.
Menke M, Berger B, Cowen L. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol. 2008;4:e10.
Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: a protein−protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003;125:1731–7.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
Kumar S, Stecher G, Peterson D, Tamura K. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics. 2012;28:2685–6.
Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292:195–202.
Linder P, Jankowsky E. From unwinding to clamping: the DEAD box RNA helicase family. Nat Rev Mol Cell Biol. 2011;12:505–16.
Blattner FR. The Complete Genome Sequence of Escherichia coli K-12. Science. 1997;277:1453–62.
Kim S-H, Hung L-W, Wang IX, Nikaido K, Liu P-Q, Ames GF-L. No Title. Nature. 1998;396:703–7.
Story RM, Steitz TA. Structure of the recA protein-ADP complex. Nature. 1992;355:374–6.
Abrahams JP, Leslie AGW, Lutter R, Walker JE. Structure at 2.8 Å resolution of F1-ATPase from bovine heart mitochondria. Nature. 1994;370:621–8.
Dong J, Lai R, Jennings JL, Link AJ, Hinnebusch AG. The novel ATP-binding cassette protein ARB1 is a shuttling factor that stimulates 40S and 60S ribosome biogenesis. Mol Cell Biol. 2005;25:9859–73.
Samra N, Atir-Lande A, Pnueli L, Arava Y. The elongation factor eEF3 (Yef3) interacts with mRNA in a translation independent manner. BMC Mol Biol. 2015;16:17.
Rodnina MV. Protein synthesis meets ABC ATPases: new roles for Rli1/ABCE1. EMBO Rep. 2010;11:143–4.
Van Melderen L, De Bast MS. Bacterial toxin-Antitoxin systems: More than selfish entities? PLoS Genet. 2009.
Van Melderen L. Toxin-antitoxin systems: Why so many, what for? Curr Opin Microbiol. 2010:781–5.
Goeders N, Van Melderen L. Toxin-antitoxin systems as multilevel interaction systems. Toxins (Basel). 2013:304–24.
Buts L, Lah J, Dao-Thi MH, Wyns L, Loris R. Toxin-antitoxin modules as bacterial metabolic stress managers. Trends Biochem Sci. 2005:672–9.
Gerdes K, Christensen SK, Løbner-Olesen A. Prokaryotic toxin-antitoxin stress response loci. Nat. Rev. Microbiol. 2005;3:371–82.
Jankowsky E, Fairman ME. RNA helicases--one fold for many functions. Curr Opin Struct Biol. 2007;17:316–24.
Jankowsky E. RNA helicases at work: Binding and rearranging. Trends Biochem Sci. 2011:19–29.
Hamma T, Ferré-D’Amaré AR. Pseudouridine Synthases. Chem Biol. 2006;13:1125–35.
Phadtare S, Alsina J, Inouye M. Cold-shock response and cold-shock proteins. Curr Opin Microbiol. 1999:175–80.
Yamanaka K. Cold shock response in Escherichia coli. J Mol Microbiol Biotechnol. 1999;1:193–202.
Fozo EM, Kawano M, Fontaine F, Kaya Y, Mendieta KS, Jones KL, et al. Repression of small toxic protein synthesis by the sib and OhsC small RNAs. Mol Microbiol. 2008;70:1076–93.
Louwen R, Staals RHJ, Endtz HP, van Baarlen P, van der Oost J. The role of CRISPR-Cas systems in virulence of pathogenic bacteria. Microbiol Mol Biol Rev. 2014;78:74–88.
Iyer LM, Koonin E V, Aravind L. No Title. Genome Biol. 2002;3:research0012.1.
Arthur DC, Ghetu AF, Gubbins MJ, Edwards RA, Frost LS, Glover JNM. FinO is an RNA chaperone that facilitates sense-antisense RNA interactions. EMBO J. 2003;22:6346–55.
Arthur DC, Edwards RA, Tsutakawa S, Tainer JA, Frost LS, Glover JNM. Mapping interactions between the RNA chaperone FinO and its RNA targets. Nucleic Acids Res. 2011;39:4450–63.
Ghetu AF, Gubbins MJ, Frost LS, Glover JN. Crystal structure of the bacterial conjugation repressor finO. Nat Struct Biol. 2000;7:565–9.
Mark Glover JN, Chaulk SG, Edwards RA, Arthur D, Lu J, Frost LS. The FinO family of bacterial RNA chaperones. Plasmid. 2015;78:79–87.
Iyer LM, Burroughs AM, Aravind L. The ASCH superfamily: novel domains with a fold related to the PUA domain and a potential role in RNA metabolism. Bioinformatics. 2006;22:257–63.
Deutscher MP, Marshall GT, Cudny H. RNase PH: an Escherichia coli phosphate-dependent nuclease distinct from polynucleotide phosphorylase. Proc Natl Acad Sci. 1988;85:4710–4.
Kelly KO, Deutscher MP. Characterization of Escherichia coli RNase PH. J Biol Chem. 1992;267:17153–8.
Wen T, Oussenko IA, Pellegrini O, Bechhofer DH, Condon C. Ribonuclease PH plays a major role in the exonucleolytic maturation of CCA-containing tRNA precursors in Bacillus subtilis. Nucleic Acids Res. 2005;33:3636–43.
Jensen KF. The Escherichia coli K-12 “wild types” W3110 and MG1655 have an rph frameshift mutation that leads to pyrimidine starvation due to low pyrE expression levels. J Bacteriol. 1993;175:3401–7.
Harlow LS, Kadziola A, Jensen KF, Larsen S. Crystal structure of the phosphorolytic exoribonuclease RNase PH from Bacillus subtilis and implications for its quaternary structure and tRNA binding. Protein Sci. 2004;13:668–77.
Choi JM, Park EY, Kim JH, Chang SK, Cho Y. Probing the functional importance of the Hexameric ring structure of RNase PH. J Biol Chem. 2004;279:755–64.
Anantharaman V. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 2002;30:1427–64.
Anantharaman V, Iyer LM, Aravind L. Ter-dependent stress response systems: novel pathways related to metal sensing, production of a nucleoside-like metabolite, and DNA-processing. Mol BioSyst. 2012;8:3142.
Huang L, Lilley DMJ. The molecular recognition of kink-turn structure by the L7Ae class of proteins. RNA. 2013;19:1703–10.
Barrangou R, Marraffini LA. CRISPR-cas systems: Prokaryotes upgrade to adaptive immunity. Mol Cell. 2014:234–44.
Jiang F, Doudna JA. The structural biology of CRISPR-Cas systems. Curr Opin Struct Biol. 2015:100–11.
van der Oost J, Westra ER, Jackson RN, Wiedenheft B. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol. 2014;12:479–92.


Acknowledgments

We thank NCBS (TIFR) for financial and infrastructural support.

Funding

We thank University Grants Commission (UGC) and the NCBS Bridge Postdoctoral Fellowship for funding P.G.

Availability of data and materials

All the data related to this work, including accession IDs of proteins, have been presented in Additional file 1: Table S1, Additional file 2: Table S2 and Additional file 3.

Declarations

All authors have gone through the manuscript, and the contents of this article have not been published elsewhere.

Author information

Authors and Affiliations

National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
Pritha Ghosh & Ramanathan Sowdhamini

Authors

Pritha Ghosh
Ramanathan Sowdhamini

Contributions

RS conceived the idea and designed the project. PG acquired data and performed all the analyses. PG wrote the first draft of the manuscript and RS improved on it. Both authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Ramanathan Sowdhamini.

Ethics declarations

Ethics approval and consent to participate

Not applicable, since this study has not directly used samples collected from humans, plants or animals, but has analysed publicly available, pre-existing protein sequence data.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1: Table S1.

RNA-binding proteins in 19 E. coli proteomes. All the RBPs obtained in the GWS of 19 E. coli strains have been listed in this table. The pathogenic and non-pathogenic E. coli strains have been highlighted in red and green, respectively. (DOC 115 kb)

Additional file 2: Table S2.

Clusters of RNA-binding proteins obtained from 19 E. coli proteomes. The clusters of RBPs with more than one member in the GWS of 19 E. coli strains have been listed in this table. The RBPs were clustered based on BLASTP searches at E-value, percentage identity and percentage query coverage cut-offs of 10⁻⁵, 30 and 70, respectively. (DOC 376 kb)
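The three cut-offs quoted for Table S2 can be illustrated with a minimal sketch. It assumes that each BLASTP hit has already been parsed into a tuple of query, subject, E-value, percentage identity and percentage query coverage, and that proteins are grouped by single linkage (any passing hit joins two proteins into one cluster). The single-linkage choice, the inclusive comparisons and the names `passes_cutoffs` and `cluster_hits` are assumptions made for illustration and are not taken from the article.

```python
from collections import defaultdict

# Cut-offs quoted in the Table S2 description.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 30.0   # percent
MIN_QUERY_COV = 70.0  # percent


def passes_cutoffs(evalue, identity, query_cov):
    """Return True if a BLASTP hit satisfies all three cut-offs."""
    return (evalue <= MAX_EVALUE
            and identity >= MIN_IDENTITY
            and query_cov >= MIN_QUERY_COV)


def cluster_hits(hits):
    """Group proteins connected by at least one passing hit.

    `hits` is an iterable of (query, subject, evalue, identity, query_cov).
    Single linkage via union-find is an assumption of this sketch.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue, identity, query_cov in hits:
        # Register both proteins even if the hit is rejected.
        root_q, root_s = find(query), find(subject)
        if passes_cutoffs(evalue, identity, query_cov):
            parent[root_q] = root_s

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Keeping only the groups with more than one member, for example `[c for c in cluster_hits(hits) if len(c) > 1]`, would correspond to the multi-member clusters that the table lists.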

Additional file 3:

RNA-binding proteins in all complete E. coli proteomes. All the RBPs obtained in the GWS of 166 E. coli strains have been listed here. The RefSeq IDs of the proteins are listed, with the total number of strains in which each protein is present given in brackets. (DOC 185 kb)

Additional file 4:

100 ns molecular dynamics simulations of the active RNase PH monomer in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been colour coded as in Fig. 5b. Hydrogen bonds at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown at the region of interest with black dotted lines. (MP4 50,376 kb)

Additional file 5:

100 ns molecular dynamics simulations of the inactive RNase PH monomer in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been colour coded as in Fig. 5b. Hydrogen bonds at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown at the region of interest with black dotted lines. (MP4 67,789 kb)

Additional file 6:

100 ns molecular dynamics simulations of the active RNase PH dimer in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been colour coded as in Fig. 5b and c. Hydrogen bonds at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown at the region of interest with black dotted lines. (MP4 67,624 kb)

Additional file 7:

100 ns molecular dynamics simulations of the inactive RNase PH dimer in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been colour coded as in Fig. 5b and c. Hydrogen bonds at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown at the region of interest with black dotted lines. (MP4 67,820 kb)

Additional file 8:

100 ns molecular dynamics simulations of the PELOTA_1 domain from the ‘uncharacterised’ protein in complex with kink-turn RNA, in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been represented in blue and the RNA in red. Hydrogen bonds between the protein and the RNA, at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown with black dotted lines. (MP4 54,830 kb)

Additional file 9:

100 ns molecular dynamics simulations of the L7Ae K-turn binding domain from Archaeoglobus fulgidus in complex with kink-turn RNA from H. marismortui (PDB code: 4BW0: B), in the AMBER99SB protein, nucleic AMBER94 force field. The protein has been represented in blue and the RNA in red. Hydrogen bonds between the protein and the RNA, at distance and angle cut-offs of 3 Å and 20°, respectively, have been shown with black dotted lines. (MP4 66,564 kb)
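The hydrogen-bond cut-offs of 3 Å and 20° quoted in the trajectory descriptions can be read as a simple geometric test. The sketch below assumes one common convention: the distance is measured between the donor and acceptor heavy atoms, and the angle is the deviation of the donor-hydrogen-acceptor angle from 180°. The descriptions do not state which atoms or which angle were used, so this convention and the function name `is_hydrogen_bond` are assumptions for illustration only.

```python
import math

MAX_DISTANCE = 3.0  # Å, donor-acceptor distance cut-off
MAX_ANGLE = 20.0    # degrees, allowed deviation of D-H...A from linearity


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """Apply the distance and angle cut-offs to one D-H...A triplet.

    Each argument is an (x, y, z) coordinate in Å.
    """
    if _norm(_sub(acceptor, donor)) > MAX_DISTANCE:
        return False
    hd = _sub(donor, hydrogen)      # vector from H to donor
    ha = _sub(acceptor, hydrogen)   # vector from H to acceptor
    cos_dha = sum(x * y for x, y in zip(hd, ha)) / (_norm(hd) * _norm(ha))
    # Clamp to guard against rounding slightly outside [-1, 1].
    dha = math.degrees(math.acos(max(-1.0, min(1.0, cos_dha))))
    return 180.0 - dha <= MAX_ANGLE
```

Applied to every donor-hydrogen-acceptor triplet across the protein-RNA interface in each frame, a test of this kind would yield the set of contacts that a viewer could draw as dotted lines.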

Additional file 10: Figure S1.

Molecular phylogeny analysis of Cas6 proteins. a. All the proteins from Cluster 308 and Cas6 from E. coli strain K12. b. Two previously uncharacterised proteins (UniProt IDs: C8U9I8 and C8TG04) from Cluster 308, with other known Cas6 proteins, including that from E. coli strain K12. In both the panels, the above-mentioned two previously uncharacterised proteins from the pathogen-specific Cas6 proteins cluster (Cluster 308) have been highlighted in red and the Cas6 protein from E. coli strain K12 in blue. (JPEG 4531 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Ghosh, P., Sowdhamini, R. Bioinformatics comparisons of RNA-binding proteins of pathogenic and non-pathogenic Escherichia coli strains reveal novel virulence factors. BMC Genomics 18, 658 (2017). https://doi.org/10.1186/s12864-017-4045-3


Received: 08 May 2017
Accepted: 09 August 2017
Published: 24 August 2017
DOI: https://doi.org/10.1186/s12864-017-4045-3
