The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology
- Seesandra V Rajagopala†1,
- Natsuko Yamamoto†2,
- Adrienne E Zweifel3,
- Tomoko Nakamichi2,
- Hsi-Kuang Huang1,
- Jorge David Mendez-Rios1,
- Jonathan Franca-Koh1,
- Meher Preethi Boorgula1,
- Kazutoshi Fujita4,
- Ken-ichirou Suzuki4,
- James C Hu3,
- Barry L Wanner5,
- Hirotada Mori2, 6 and
- Peter Uetz1, 7
© Rajagopala et al; licensee BioMed Central Ltd. 2010
Received: 6 May 2010
Accepted: 11 August 2010
Published: 11 August 2010
Systems biology and functional genomics require genome-wide datasets and resources. Complete sets of cloned open reading frames (ORFs) have been made for about a dozen bacterial species and allow researchers to express and study complete proteomes in a high-throughput fashion.
We have constructed an open reading frame (ORFeome) collection of 3974 or 94% of the known Escherichia coli K-12 ORFs in Gateway® entry vector pENTR/Zeo. The collection has been used for protein expression and protein interaction studies. For example, we have compared interactions among YgjD, YjeE and YeaZ proteins in E. coli, Streptococcus pneumoniae, and Staphylococcus aureus. We also compare this ORFeome with other Gateway-compatible bacterial ORFeomes and show its utility for comparative functional genomics.
The E. coli ORFeome provides a useful resource for functional genomics and other areas of protein research in a highly flexible format. Our comparison with other ORFeomes makes comparative analyses straightforward and facilitates direct comparisons of many proteins across many genomes.
High-throughput DNA sequencing has increased the number of genome sequences to over 1,000 bacterial species from which we can infer their proteomes and often major parts of their metabolism and regulatory pathways. A systems-level understanding of cells, however, will require the functional characterization of these proteins and how they work together. In recent years, a growing number of efforts have used high-throughput assays to catalog gene expression, protein interactions, localization and metabolic activities. For many of these studies, the first step is to identify and then clone all the open reading frames (the "ORFeome") encoded by the genome of the organism.
Here we describe the construction of a comprehensive Escherichia coli ORF collection in a Gateway® entry vector. The library represents 3974 ORFs or 94% of all protein-coding genes. The Gateway® system facilitates the transfer of ORFs into a large range of expression vectors that are suitable for downstream studies. Here we demonstrate the utility of the E. coli ORFeome by comparing it to 12 other available microbial ORFeomes and by testing a set of protein-protein interactions among 5 species.
The E. coli entry clone library lacks start and stop codons and is thus compatible with both N-terminal and C-terminal expression clone constructions. The clones from the entry vectors can be easily shuttled into different Gateway-compatible expression vectors of many types in a high-throughput fashion [5, 6].
Results and Discussion
E. coli as a model for comparative genomics and biology
E. coli K-12 has led basic life science research for more than half a century because it is easy to manipulate and safe as a non-pathogenic organism. We wondered to what extent it can also serve as a model for pathogenic bacteria and compared the E. coli ORFeome to all other bacterial ORFeomes that are available as Gateway-compatible clones. Figure 1B shows how many E. coli genes have orthologs in these species, including Vibrio cholerae, Yersinia pestis, Streptococcus pneumoniae and others. For example, over 80% of E. coli COGs are conserved in Pseudomonas aeruginosa (Figure 1B). COGs (clusters of orthologous groups) represent conserved protein families and provide a standard way to compare gene sets. We can safely assume that the general molecular function of these E. coli proteins is similar or identical to that of their homologues in other bacterial species. However, we cannot easily predict whether small changes in sequence will change the function or specificity of proteins. The availability of complete collections of easily moveable cloned ORFs facilitates functional studies in multiple species in parallel, even at the level of whole proteomes. As of today, Gateway® clone collections are available for at least 14 bacterial species including 2 strains of Staphylococcus aureus (Figure 1B,C, Additional file 1, Table S3). COGs should also facilitate comparative analysis, given that many of them are present in species for which ORFeomes are available. For example, 2162 COGs are present in at least four of the species for which ORFeomes are available (Figure 1C).
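A count such as "COGs present in at least four species" can be reproduced from a per-species list of COG identifiers. The following is a minimal sketch, not the authors' script; the function name, the input layout and the toy identifiers are all hypothetical.

```python
from collections import Counter

def cogs_in_at_least(cogs_by_species, min_species=4):
    """Return the COG ids that occur in at least `min_species` species.

    cogs_by_species: dict mapping a species name to an iterable of COG ids
    (hypothetical layout, assumed for illustration only).
    """
    counts = Counter(
        cog
        for cogs in cogs_by_species.values()
        for cog in set(cogs)          # count each COG once per species
    )
    return {cog for cog, n in counts.items() if n >= min_species}

# Toy data with made-up identifiers
toy = {
    "sp1": ["COG_a", "COG_b", "COG_c"],
    "sp2": ["COG_a", "COG_b", "COG_c"],
    "sp3": ["COG_a", "COG_b", "COG_c"],
    "sp4": ["COG_a", "COG_b"],
    "sp5": ["COG_a"],
}
shared = cogs_in_at_least(toy, min_species=4)
```

With real data, the length of the returned set would correspond to the number reported in the text for the chosen threshold.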
The E. coli ORFeome for protein expression
The E. coli ORFeome for functional genomics and protein interaction analysis
The availability of ORFeome collections will greatly facilitate comparative functional genomic studies. One example is the comparison of protein-protein interactions among multiple species in order to determine which interactions are conserved. Here we used the E. coli ORF collection as well as previously generated S. aureus and S. pneumoniae collections (http://pfgrc.jcvi.org/) to systematically test, by yeast two-hybrid, whether the recently described protein-protein interactions between the essential E. coli gene products YgjD, YjeE and YeaZ are conserved in these Gram-positive pathogens. These three proteins were selected as an interesting case study because they are highly conserved, essential, and of unknown function. The yjeE, yeaZ, and ygjD genes are highly conserved throughout eubacterial genomes while ygjD orthologs are also found throughout the archaea and eukaryotes. We found all the interactions that Handford et al. reported but there were significant differences between species (Figure B, C, D). For example, YjeE and YeaZ from E. coli, but not from S. aureus or S. pneumoniae, interacted. The functions of these genes remain poorly understood. In E. coli, yeaZ is able to proteolyse ygjD while yjeE, an ATPase, competes with ygjD for binding to yeaZ. The inability of yjeE to interact with yeaZ in S. aureus and S. pneumoniae may indicate differences in the regulation of the ygjD-yeaZ complex in these species. Our study of these interactions not only demonstrated differences between the species tested but also showed another advantage of such a comparative approach: the S. pneumoniae YjeE as well as the S. aureus YgjD protein autoactivated the reporter genes when fused to the Gal4 DNA binding domain. This property affects approximately 10% of bait proteins in yeast two-hybrid assays. However, while the S. aureus YgjD bait is autoactivating, YgjD of E. coli and S. pneumoniae are not (Figure 2B).
Hence, comparative assays may offer one strategy for circumventing limitations of the yeast two-hybrid method.
Additionally, by revealing which interactions are evolutionarily conserved, such comparative studies will greatly enhance our ability to interpret the conserved biological functions of the interacting proteins, and also the computational analysis of high-throughput protein-protein interaction datasets. For example, crystal structures are available for all three interacting proteins, but only one from E. coli, namely YeaZ (Figure 2D). In order to obtain more information for model building and functional interactions, we expanded our test set beyond E. coli and tested the interactions among YgjD, YeaZ, and YjeE in five different species, including H. pylori and R. prowazekii. In addition to the expected intraspecies interactions, interspecies interactions were observed (Figure 2C). The S. aureus YeaZ protein associated with the products of the E. coli and S. pneumoniae orthologs of ygjD while the S. pneumoniae YeaZ protein was able to interact with H. pylori YgjD. This last interaction was particularly unexpected as yeaZ is not conserved in H. pylori, and suggests the possibility that the functions of yeaZ may be performed by another protein in this species.
Given the availability of ORFeomes for more than a dozen species, such comparative analyses can now be carried out quite easily. More importantly, additional biochemical or genetic studies can be done in E. coli for which extensive resources, including deletion strains [11, 12] and comprehensive databases (http://www.PrFEcT.org > Resources), are available. For instance, our E. coli clones could be used to complement mutants in other species, which would demonstrate their functional equivalence.
In conjunction with other clone sets and the vast amount of genomics and proteomics data from E. coli, the Gateway-ORFeome will be another highly useful resource for the E. coli functional genomics community.
Cloning the E. coli ORFs from ASKA library to Gateway® entry vector
About 250 E. coli clones that were not present in the ASKA library or were not successfully ligated from the ASKA library into the Gateway® entry vector were cloned by Gateway® recombinational cloning. The PCR products were inserted into the Gateway® entry vector pDONR™/Zeo (Invitrogen) by BP-cloning. The products resulting from site-specific recombination were transformed into E. coli and plated onto solid LB medium containing Zeocin. Two isolated colonies were selected for each reaction and the clones were verified by colony-PCR with pDONR™/Zeo-specific primers. The clones that had an insert of the expected size were picked for plasmid isolation and the plasmid was used as a template for DNA sequencing to verify the insert sequence.
Colony PCR of bacterial clones
We selected four isolated colonies for each pENTR/Zeo clone to verify the cloned ORF size. Colonies were picked with a sterile pipette tip, transferred to the wells of a 96-well plate holding 150 μl low-salt LB liquid medium with 50 μg/ml Zeocin™, and incubated overnight at 37°C to generate glycerol stocks for long-term frozen storage. 1 μl of bacterial culture was used for colony PCR in 96-well plates containing 50-μl samples with Biomix™ (Bioline, Cat. No. BIO-25012), pDONR™/Zeo-specific forward primer (5'-GTAAAACGACGGCCAG-3') and reverse primer (5'-CAGGAAACAGCTATGAC-3') (0.3 μM each). The 30 PCR cycles (94°C for 30 s, 55°C for 30 s, and 72°C for 1 min/kb) were preceded by heating to 94°C for 5 min and followed by a 7-min incubation at 72°C. The sizes of the PCR products were determined by agarose gel electrophoresis and ethidium bromide staining.
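Because the extension step scales with amplicon length (1 min/kb), the programmed run time differs between plates of short and long ORFs. The sketch below simply adds up the hold times stated above; the function name is hypothetical, and thermocycler ramp times are ignored, so real runs will take longer.

```python
def colony_pcr_minutes(amplicon_bp, cycles=30):
    """Sum of programmed hold times (minutes) for the colony PCR described above.

    amplicon_bp: expected PCR product length in base pairs.
    Ramp times between temperatures are NOT included (simplifying assumption).
    """
    extension = amplicon_bp / 1000.0          # 72°C step: 1 min per kb
    per_cycle = 0.5 + 0.5 + extension         # 94°C 30 s + 55°C 30 s + extension
    return 5.0 + cycles * per_cycle + 7.0     # initial 94°C 5 min, final 72°C 7 min

# Example: a 1.5-kb product needs 5 + 30 * 2.5 + 7 minutes of hold time
total = colony_pcr_minutes(1500)
```

In practice the extension time for a whole 96-well plate would be set by its longest expected product.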
The DNA sequence of E. coli strain K-12 was obtained from the JCVI/CMR Genome Database (http://cmr.jcvi.org/cgi-bin/CMR/CmrHomePage.cgi) and primers were designed for the ORFs that were cloned by Gateway® recombination (Figure 3), using the primer design tool (http://tools.bio.anl.gov/bioJAVA/jsp/ExpressPrimerTool/). The primers were designed without endogenous start and stop codons. Each forward primer consisted of the attB1 segment (5'-aaaaagcaggctta-3') followed by a 20- to 30-nucleotide-long ORF-specific sequence without a start codon. The attB2 segment (5'-agaaagctgggtg-3') was added at the 5' end of each reverse primer, which was complementary to the end of the ORF, without the last nucleotide of the stop codon. The primers were obtained from Invitrogen in a 96-well format.
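The primer layout described above can be illustrated with a short sketch. This is not the ExpressPrimerTool algorithm: the function names are hypothetical, the ORF-specific part is fixed at 20 nucleotides instead of being tuned within the 20 to 30 range, and the sketch drops the complete start and stop codons, which is one simplified reading of the text.

```python
ATTB1_SEGMENT = "aaaaagcaggctta"   # partial attB1 tail, as given in the text
ATTB2_SEGMENT = "agaaagctgggtg"    # partial attB2 tail, as given in the text

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def design_primary_primers(orf, specific_len=20):
    """Build first-round primers for an ORF given 5'->3' with start and stop codons.

    Assumption: the whole start codon and the whole stop codon are excluded.
    Returns (forward_primer, reverse_primer), both written 5'->3'.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    core = orf[3:-3]                          # strip start and stop codons
    if len(core) < specific_len:
        raise ValueError("ORF too short for the requested primer length")
    forward = ATTB1_SEGMENT + core[:specific_len]
    reverse = ATTB2_SEGMENT + reverse_complement(core[-specific_len:])
    return forward, reverse

# Toy ORF: start codon, ten alanine codons, stop codon
fwd, rev = design_primary_primers("ATG" + "GCT" * 10 + "TAA")
```

A production tool would also adjust the ORF-specific length per primer, for example to reach a target melting temperature, which this sketch does not attempt.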
PCR amplification of the ORFs
For the clones constructed by Gateway® recombinational cloning, PCR was performed in 96-well plates containing 50-μl reaction volumes consisting of 1 U KOD DNA polymerase (Novagen), dNTP mix (0.4 mM each), primary forward and reverse primers (0.3 μM each), and E. coli K-12 strain W3110 genomic DNA (200 ng). The complete sequences of attB1 (5'-GGGGACAAGTTTGTACAAAAAAGCAGGCT-3') and attB2 (5'-GGGGACCACTTTGTACAAGAAAGCTGGGT-3') were added in a second-round PCR, in which the first-round PCR product was used as a template, to generate the full-length attB1 and attB2 sites flanking the ORFs. PCR cycling conditions followed the recommendations of the KOD DNA polymerase manufacturer (Novagen, Cat. No. 710853). These PCR products were used for the BP reaction.
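The second-round PCR works because the 3' end of each full-length attB primer matches the partial attB tail carried by the first-round product. The sketch below, with hypothetical function names, finds that shared stretch and writes out the 5' tail that results once the two are joined.

```python
ATTB1_FULL = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"   # from the text
ATTB2_FULL = "GGGGACCACTTTGTACAAGAAAGCTGGGT"   # from the text
ATTB1_SEGMENT = "aaaaagcaggctta"               # first-round forward primer tail
ATTB2_SEGMENT = "agaaagctgggtg"                # first-round reverse primer tail

def overlap_len(adapter, tail):
    """Length of the longest suffix of `adapter` that is a prefix of `tail`."""
    a, t = adapter.upper(), tail.upper()
    for k in range(min(len(a), len(t)), 0, -1):
        if a.endswith(t[:k]):
            return k
    return 0

def extended_tail(adapter, tail):
    """5' tail after second-round PCR: adapter plus the non-overlapping rest of the tail."""
    k = overlap_len(adapter, tail)
    if k == 0:
        raise ValueError("adapter and first-round tail do not overlap")
    return adapter.upper() + tail.upper()[k:]

attb1_overlap = overlap_len(ATTB1_FULL, ATTB1_SEGMENT)
attb2_overlap = overlap_len(ATTB2_FULL, ATTB2_SEGMENT)
full_attb1_tail = extended_tail(ATTB1_FULL, ATTB1_SEGMENT)
full_attb2_tail = extended_tail(ATTB2_FULL, ATTB2_SEGMENT)
```

For the sequences given in the text, both adapters share 12 nucleotides with the corresponding first-round tail.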
attB × attP recombination reactions (BP reactions)
The PCR-amplified ORFs with attB1 and attB2 sites were recombined into the vector pDONR™/Zeo (Invitrogen) by using the BP Clonase™ II Enzyme Mix (Invitrogen). In 96-well plates, samples containing 1 μl purified PCR product, 1 μl BP Clonase™ II Enzyme Mix, 75 ng pDONR™/Zeo plasmid and TE buffer, pH 8.0, up to 5 μl were incubated overnight at 25°C. After adding 1 μg proteinase K (Invitrogen) and incubating at 37°C for 30 minutes, the BP reactions were directly used for bacterial transformation.
attL × attR recombination reactions (LR reactions)
Entry vectors were set up in LR reactions to recombine the gene of interest into several destination vectors (expression vectors). The destination vectors used were pDEST22, pDEST32 (Invitrogen), pGADT7g, pGBGT7g and pDEST-GST vectors. Samples containing 5 μl prepared entry clone, 1 μl LR Clonase™ II Enzyme Mix (Invitrogen), destination vector (150 ng), and TE buffer, pH 8.0 to 10 μl were incubated at 25°C for two hours. After adding 1 μg proteinase K (Invitrogen) and incubating at 37°C for 30 min, the LR reactions were directly used for plasmid transformation into E. coli.
Validation of entry clones by DNA sequencing
To sequence-verify the inserted ORFs, we re-arrayed each clone that yielded a PCR product of the expected size. All these clones were grown in 1.2 ml LB-Zeocin liquid medium in 96 deep-well plates (2 ml, Qiagen). An aliquot of this culture (50 μl) was used to make a glycerol stock for longer storage. The plasmid DNA was isolated by using 96-well plasmid preparation plates (Millipore), and the plasmid preparations were sequenced with pDONR™/Zeo-specific forward and reverse primers to verify the insert from both the N-terminal and C-terminal ends of the ORFs. All the sequencing reads were analyzed using NCBI stand-alone BLAST against the E. coli K-12 genome database to confirm the identity of each ORF. The clone verification was classified into three categories based on the sequencing coverage of the insert, class A: insert is verified from both N and C-terminal ends; class B: insert is verified from either the N or the C-terminal end (Additional file 1, Table S1); class C: unverified (or sequencing failed).
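The three verification classes reduce to a simple rule on two yes/no results per clone. A minimal sketch follows; the function name, the clone identifiers and the idea of a boolean "read confirmed by BLAST" flag are assumptions made for illustration.

```python
from collections import Counter

def classify_clone(n_term_verified, c_term_verified):
    """Map the two end-read results to the verification classes described above."""
    if n_term_verified and c_term_verified:
        return "A"      # insert verified from both ends
    if n_term_verified or c_term_verified:
        return "B"      # insert verified from one end only
    return "C"          # unverified or sequencing failed

# Toy results: clone id -> (forward read confirmed, reverse read confirmed)
reads = {
    "clone_001": (True, True),
    "clone_002": (True, False),
    "clone_003": (False, True),
    "clone_004": (False, False),
}
classes = {cid: classify_clone(n, c) for cid, (n, c) in reads.items()}
summary = Counter(classes.values())
```

The `summary` counter gives the per-class totals that a table of verified clones would report.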
Validation of entry clones by recombinant protein production
In order to verify the functionality of the ORFeome, a random sample was used to show expression of proteins from the cloned ORFs. Ten different E. coli entry clones were shuttled into the pDEST-Exp vector designed to make a fusion protein with a GST-Tag. The resulting expression vectors were transformed into the BL21(DE3) protein expression strain of E. coli. After induction of protein expression with IPTG, the cells were lysed and the crude lysates were analyzed by western blot using anti-GST antibodies. All 10 of the GST-tagged proteins we tested were expressed (Figure 2A).
Yeast two-hybrid analysis
The yeast two-hybrid assay was conducted as described previously (Rajagopala et al. 2007b).
Homology of bacterial ORFeomes
The clusters of orthologous groups (COGs) for all 14 bacterial species for which cloned ORFeomes are available (Additional file 1, Table S3) were extracted from the STRING database. This table was used to obtain orthologous protein information between different bacterial species based on COG relationships. The COGs from Escherichia coli K-12 (strain W3110) were used as reference to obtain homologous proteins in 14 different bacterial species (including two strains of Staphylococcus aureus, Col and Mu50; only Col was used for the COG analysis). Similarly, by taking each of these 14 species as reference, the homologues for the rest of the species were extracted. Unique proteins from each species in the COGs were taken and the fraction of these out of all the predicted protein-coding genes of the respective genome was used to calculate the percentage of homologous proteins. A matrix of these values, covering every combination of reference and comparison species, was used to generate a heat map representing the percentage of homologous proteins in different species (Figure 1B).
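The pairwise percentages behind such a heat map can be sketched as follows. This is one possible reading of the procedure, not the authors' code: for each reference species, it counts the reference proteins assigned to a COG that also occurs in the comparison species and divides by the number of predicted protein-coding genes of the reference genome. The function name, input layout and toy identifiers are hypothetical.

```python
def homology_matrix(cog_members, gene_counts):
    """Percentage of homologous proteins for every (reference, other) species pair.

    cog_members: {species: {protein_id: set of COG ids}} (assumed layout)
    gene_counts: {species: number of predicted protein-coding genes}
    Returns {(reference, other): percentage}.
    """
    species = sorted(cog_members)
    # All COG ids seen in each species
    cogs_by_species = {
        s: set().union(*cog_members[s].values()) for s in species
    }
    matrix = {}
    for ref in species:
        for other in species:
            shared = cogs_by_species[ref] & cogs_by_species[other]
            # Count each reference protein once, even if it sits in several COGs
            n_homologous = sum(
                1 for cogs in cog_members[ref].values() if cogs & shared
            )
            matrix[(ref, other)] = 100.0 * n_homologous / gene_counts[ref]
    return matrix

# Toy input with made-up identifiers
toy_members = {
    "ref_sp": {"p1": {"COG1"}, "p2": {"COG2"}, "p3": {"COG3"}},
    "other_sp": {"q1": {"COG1"}, "q2": {"COG3", "COG9"}},
}
toy_gene_counts = {"ref_sp": 4, "other_sp": 5}
pct = homology_matrix(toy_members, toy_gene_counts)
```

The resulting dictionary can be reshaped into a square table and passed to any plotting library to draw the heat map; note that the matrix is not symmetric, because the denominator depends on the reference genome.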
We thank Keehwan Kwon for the pDEST-GST vector. This project was funded by NIH grant RO1GM79710 (PU), U24 GM077905 (JH), RC1 GM092047 (BLW), Grant-in-Aid for Scientific Research (A) and KAKENHI (Grant-in-Aid for Scientific Research) on Priority Areas "System Genomics" from the Ministry of Education, Culture, Sports, Science and Technology of Japan (HM), and by grant HEALTH-F3-2009-223101 from the Seventh Research Framework Programme of the European Union (PU).
- Hilson P: Cloned sequence repertoires for small- and large-scale biology. Trends in Plant Science. 2006, 11 (3): 133-141. 10.1016/j.tplants.2006.01.006.
- Hartley JL, Temple GF, Brasch MA: DNA cloning using in vitro site-specific recombination. Genome Res. 2000, 10 (11): 1788-1795. 10.1101/gr.143000.
- Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T: Escherichia coli K-12: a cooperatively developed annotation snapshot--2005. Nucleic Acids Res. 2006, 34 (1): 1-9. 10.1093/nar/gkj405.
- Kitagawa M, Ara T, Arifuzzaman M, Ioka-Nakamichi T, Inamoto E, Toyonaga H, Mori H: Complete set of ORF clones of Escherichia coli ASKA library (a complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA Res. 2005, 12 (5): 291-299. 10.1093/dnares/dsi012.
- Hallez R, Letesson JJ, Vandenhaute J, De Bolle X: Gateway-based destination vectors for functional analyses of bacterial ORFeomes: application to the Min system in Brucella abortus. Appl Environ Microbiol. 2007, 73 (4): 1375-1379. 10.1128/AEM.01873-06.
- Rajagopala SV, Hughes KT, Uetz P: Benchmarking yeast two-hybrid systems using the interactions of bacterial motility proteins. Proteomics. 2009, 5: 5296-5302. 10.1002/pmic.200900282.
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28 (1): 33-36. 10.1093/nar/28.1.33.
- Handford JI, Ize B, Buchanan G, Butland GP, Greenblatt J, Emili A, Palmer T: Conserved network of proteins essential for bacterial viability. J Bacteriol. 2009, 191 (15): 4732-4749. 10.1128/JB.00136-09.
- Titz B, Thomas S, Rajagopala SV, Chiba T, Ito T, Uetz P: Transcriptional activators in yeast. Nucleic Acids Res. 2006, 34 (3): 955-967. 10.1093/nar/gkj493.
- Rajagopala SV, Titz B, Goll J, Parrish JR, Wohlbold K, McKevitt MT, Palzkill T, Mori H, Finley RL, Uetz P: The protein network of bacterial motility. Mol Syst Biol. 2007, 3: 128. 10.1038/msb4100166.
- Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006, 2: 2006.0008. 10.1038/msb4100050.
- Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M: Update on the Keio collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol. 2009, 5: 335. 10.1038/msb.2009.92.
- von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P: STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 2007, 35 (Database issue): D358-362. 10.1093/nar/gkl825.
- Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H: Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol. 2006, 2: 2006.0007. 10.1038/msb4100049.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.