Rumen sampling and rumen content fractionation
A sample of whole rumen content was obtained from a fistulated Friesian dairy cow grazing ad libitum on a ryegrass-clover pasture diet, supplemented with pasture silage (~10% of the recommended daily intake per animal). The sampling was conducted in May 2009 at Lye Farm, DairyNZ (Waikato, New Zealand) under animal ethics permission number AE 11483 granted by the Ruakura Animal Ethics Committee. Between 1 and 1.5 kg of rumen contents was collected in the morning and immediately processed. A protocol for partitioning the rumen microbial fraction tightly adherent to plant biomass (plant-adherent fraction) from the liquid (planktonic) and associated (loosely attached) microbial fractions is described in detail in Additional file 4. Fractions and samples of digesta obtained from different phases of the process were snap-frozen in liquid nitrogen and kept on dry ice until long-term storage at -80°C.
Bacterial strains, display system and growth conditions
Escherichia coli strain TG1 (supE thi-1 Δ(lac-proAB) Δ(mcrB-hsdSM)5 (rK−) [F’ traD36 proAB lacIZΔM15]) was used as a host for the construction of phage display libraries, as well as for propagation of the wild-type helper phage, VCSM13 (Stratagene, USA). The E. coli strain K1976 (TG1 transformed with plasmid pJARA112, which expresses gIII under the control of the phage-inducible promoter ppsp) was used to obtain infectious stocks of the helper phage VCSM13d3, which contains a deletion of the complete gIII coding sequence.
Phagemid vector pDJ01, designed for selective secretome display, was used for construction of the metasecretome libraries. The display cassette of pDJ01 contains the promoter ppsp, followed by the ribosome-binding site, the start (ATG) codon, a multiple cloning site and the sequence encoding the C-domain of phage protein pIII. In contrast to other display vectors, pDJ01 does not have a signal sequence. This vector also contains a chloramphenicol resistance marker (CmR), a plasmid (ColE1) origin of replication, and the phage intergenic sequence containing the f1 origin of replication and packaging signal. When helper phage VCSM13d3 is used to assemble phagemid-containing virion particles (PPs), the empty pDJ01 vector produces only defective particles that are sensitive to the detergent sarcosyl [0.1% (w/v)]. Inserts that contain a signal sequence or other motifs that can mediate targeting of the N-terminus of the fusion to the E. coli membrane or the periplasm are required for assembly of the pIII C-domain into the virion and formation of detergent-resistant virions (Figure 1).
E. coli cells were incubated in 2 × Yeast Extract Tryptone broth (2 × YT) at 37°C with aeration (200 rpm). Solid medium for growth of E. coli transformants also contained 1.5% (w/v) bacteriological agar (Oxoid, USA) unless otherwise indicated. When required, antibiotics were added to media at the following concentrations: 25 μg ml−1 chloramphenicol (Cm) and 60 μg ml−1 ampicillin (Amp).
Metagenomic DNA extraction from rumen microbial community plant-adherent fraction
High molecular weight metagenomic DNA from the rumen microbial plant-adherent fraction was extracted according to Stein et al. with some modifications. In total, 2 g of microbial cell pellet from the plant-adherent fraction was split into five samples, each of which was separately embedded in 0.7 ml of 1% low-melting-temperature agarose and incubated in a syringe for 10 min on ice. Samples were extruded into 10 ml of lysis buffer [1% (w/v) sarcosyl, 0.2% (w/v) sodium deoxycholate, 10 mM Tris-HCl (pH 8.0), 50 mM NaCl, 100 mM ethylenediaminetetraacetic acid (EDTA), lysozyme (1 mg/ml)] and incubated for 2.5 h at 37°C, followed by 17 h incubation in 40 ml ESP buffer [0.5% (w/v) sarcosyl, 20 mM EDTA and 0.013 AU protease (Qiagen, Germany)] at 55°C to inactivate nucleases present in the sample. After addition of fresh ESP buffer (20 ml) to each sample and 1 h incubation at 55°C, three washes with TE buffer [10 mM Tris-HCl (pH 8.0), 1 mM EDTA] were performed and remaining proteases were inactivated for 15 min at 70°C. To digest agarose, samples were incubated overnight at 37°C with 15 U of AgarACE™ enzyme (Promega, USA). Residual insoluble oligosaccharides were removed by centrifugation and the supernatant, containing crude DNA released from the agarose, was subjected to phenol:chloroform:isoamyl alcohol extraction (25:24:1). After pooling of the five starting samples, metagenomic DNA was concentrated using a 100 kDa cut-off Vivaspin filter device (Sartorius Stedim Biotech, Germany).
Construction of rumen metagenome phage display libraries
Two shotgun metagenome phage display libraries were constructed: a small pilot library for preliminary assessment of methodology and a large library. Both libraries were constructed from mechanically sheared metagenomic DNA isolated from the rumen plant-adherent microbial fraction and cloned into the secretome-selective phagemid pDJ01 (Figure 1). Around 150 μg of high molecular weight metagenomic DNA in 55 mM Tris-HCl (pH 8.0), 15 mM MgCl2, 25% glycerol was sheared by nebulisation in disposable medical nebulisers by subjecting the sample to a pressure of 10 psi for 1 min, followed by size fractionation, de-salting and concentration in 100 kDa cut-off Vivaspin ultra-filtration spin columns (Sartorius Stedim Biotech, Germany). Prior to cloning, the ends of the metagenomic DNA fragments were repaired using an enzyme cocktail containing T4 DNA Polymerase (Roche, Switzerland), Klenow Enzyme (Roche, Switzerland), and OptiKinase™ (Affymetrix, USA). Next, DNA was purified by phenol:chloroform:isoamyl alcohol (25:24:1) extraction followed by ethanol precipitation and resuspension in 150 μl of 10 mM Tris-HCl (pH 8.0). Approximately 19 μg of the end-repaired metagenomic DNA inserts were ligated to 6.5 μg of the vector pDJ01, which was cut using SmaI restriction endonuclease (Roche, Switzerland) and dephosphorylated using rAPid Alkaline Phosphatase (Roche, Switzerland). Ligated DNA was extracted with phenol:chloroform, precipitated and dissolved in 75 μl sterile deionised water.
A total of 2 μg of ligated metagenomic DNA was electro-transformed into electrocompetent E. coli TG1 cells to obtain the pilot shotgun library, while the rest of the ligation mixture was used in 27 separate transformation reactions to generate a large shotgun library and to overcome the problem of promiscuous (fast-growing) clones. The resulting 27 transformant samples were also individually processed through the whole metasecretome selection procedure and pyrosequencing sample preparation. To estimate primary shotgun library size, aliquots from each transformation were plated on Cm-containing plates. The remaining portion of each transformation mixture was mixed with 9 ml of 2 × YT broth containing chloramphenicol (2 × YT Cm25) and incubated for 8 h at 37°C with aeration to amplify the libraries. Amplified library aliquots were frozen at -80°C in 7% DMSO, apart from 1 ml that was used immediately for the secretome selection.
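The library-size estimate from plated aliquots reduces to a scaling calculation, which can be sketched in Python as follows. This is a minimal illustration only: the function names, the colony count, the dilution factor and the volumes are hypothetical placeholders, not values or procedures reported in this study.

```python
def estimate_library_size(colonies, dilution_factor, plated_volume_ul, total_volume_ul):
    """Estimate the number of primary transformants in one transformation.

    colonies          -- colonies counted on the Cm-containing plate
    dilution_factor   -- fold dilution of the aliquot before plating (1 = undiluted)
    plated_volume_ul  -- volume of (diluted) aliquot spread on the plate, in microlitres
    total_volume_ul   -- total volume of the transformation mixture, in microlitres
    All argument values used below are hypothetical.
    """
    if colonies < 0:
        raise ValueError("colony count cannot be negative")
    if dilution_factor <= 0 or plated_volume_ul <= 0 or total_volume_ul <= 0:
        raise ValueError("dilution factor and volumes must be positive")
    # Scale the plate count back up to the whole transformation mixture.
    return colonies * dilution_factor * total_volume_ul / plated_volume_ul


def total_library_size(per_transformation_sizes):
    """Sum the estimates from separate transformations (e.g. 27 reactions)."""
    return sum(per_transformation_sizes)


if __name__ == "__main__":
    # Hypothetical example: 150 colonies from 100 ul of a 1,000-fold dilution
    # of a 1,000 ul transformation mixture.
    one = estimate_library_size(150, 1000, 100, 1000)
    print(f"Estimated transformants in this reaction: {one:.2e}")
```

Summing the per-transformation estimates with `total_library_size` gives the size of the whole primary library.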
Selection of secretome-encoding library clones
A previously described protocol, with modifications, was used for direct selection of the metasecretome phage display library. For a secretome protein-encoding library insert to be enriched, it had to fulfil two conditions: i) be translationally fused (i.e. in-frame) with phage protein pIII encoded by the vector; ii) encode a membrane-targeting signal, in order to target vector-encoded phage protein pIII (devoid of signal sequence) to the inner membrane of E. coli. When both of these conditions are met, the peptide fused to pIII allows display of the fusion protein on the surface of the virion and complementation of the assembly defect in the gIII-deletion helper phage VCSM13d3, resulting in detergent-resistant virions (phagemid particles). Selection for secretome-encoding inserts is therefore based on a treatment of the library, in the form of phagemid particles, that eliminates detergent-sensitive phagemid particles while preserving detergent-resistant ones [40, 41]. A 1 ml aliquot of the overnight culture containing amplified primary library clones was used to inoculate 100 ml of 2 × YT Cm25 medium. The exponentially growing culture (OD600 = 0.2) was infected with helper phage VCSM13d3 at a multiplicity of infection of 50 (50 phage : 1 bacterium) for 1 h at 37°C. Infected cells were harvested by centrifugation at 2,600 × g for 10 min at room temperature and the resulting pellet was mixed with 40 ml of soft agar [2 × YT broth containing 0.6% (w/v) molecular biology grade agarose]. Agarose-embedded cells were poured over 16 selective plates (2 × YT Cm25 plates containing molecular biology grade agarose instead of bacteriological agar) and incubated overnight at 37°C. Phagemid particles were extracted from each plate with 5 ml of 2 × YT, concentrated by PEG/NaCl precipitation and resuspended in 1 ml of 10 mM Tris-HCl (pH 7.6).
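The helper phage dose implied by a multiplicity of infection of 50 can be worked out as in the sketch below. It rests on assumptions that are not stated in the text: the conversion of roughly 8 × 10^8 cells per ml per OD600 unit is a common rule of thumb for E. coli rather than a measured value, the phage stock titre is hypothetical, and the whole 100 ml culture is assumed to be infected.

```python
def helper_phage_volume_ml(od600, culture_volume_ml, moi, phage_titre_per_ml,
                           cells_per_ml_per_od=8e8):
    """Volume of helper phage stock needed to reach a target MOI.

    cells_per_ml_per_od is an ASSUMED rule-of-thumb conversion for E. coli;
    calibrate it for the strain and spectrophotometer actually used.
    phage_titre_per_ml is the infectious titre of the stock (hypothetical here).
    """
    if min(od600, culture_volume_ml, moi, phage_titre_per_ml, cells_per_ml_per_od) <= 0:
        raise ValueError("all inputs must be positive")
    cells = od600 * cells_per_ml_per_od * culture_volume_ml  # total bacteria
    phage_needed = cells * moi                               # total infectious phage
    return phage_needed / phage_titre_per_ml


if __name__ == "__main__":
    # OD600 = 0.2 and MOI = 50 come from the protocol; the 1e12 per ml titre
    # and the 100 ml infected volume are illustrative assumptions.
    volume = helper_phage_volume_ml(0.2, 100, 50, 1e12)
    print(f"Helper phage stock required: {volume:.2f} ml")
```

Because the result scales linearly with the assumed cell density, an error in the OD600 conversion translates directly into the same relative error in the effective MOI.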
To eliminate structurally unstable virions (lacking pIII; derived from non-secretome library clones), extracted phagemid particles were incubated in 0.1% (w/v) sarcosyl for 10 min at room temperature. The ssDNA released from defective virions was removed by incubation with DNase I (200 U) in the presence of MgCl2 (5 mM) for 1 h at room temperature, followed by addition of EDTA (to a final concentration of 25 mM) and heating at 75°C for 10 min to inactivate the DNase. Sarcosyl-resistant recombinant virions were precipitated by PEG/NaCl and the ssDNA was extracted using the E.Z.N.A.® M13 DNA Kit (Omega Bio-Tek, USA) according to the manufacturer’s recommendations.
Construction of pilot metasecretome library and sequence analysis of randomly selected metasecretome library inserts
The ssDNA isolated after the secretome selection was transformed into E. coli and inserts from individual transformants were analysed by Sanger sequencing. In the pilot experiment, DNA from 90 randomly selected transformants was sequenced at the Massey Genome Service (Massey University, New Zealand). All inserts were sequenced using primer pspR03 (5′-TGCCTTTAGCGTCAGACTGTAGC-3′), complementary to the pIII-coding sequence of the vector, to identify the insert-pIII joint and determine the frame of the insert-containing ORF relative to pIII. The sequences obtained were analysed using the Vector NTI® Advance 11 software package (Life Technologies, USA). Types of secretion signals in putative ORFs (longer than 24 amino acid residues) in frame with phage gIII were predicted using a range of available algorithms (SignalP 4.1, TMHMM 2.0, LipoP 1.0, PRED-LIPO, SecretomeP 2.0, PilFind 1.0, PRED-TAT) with the default settings and cut-off values.
Next generation sequencing sample preparation
The secretome-selected ssDNA derived from the large-scale primary library through 27 separate ligations, library amplifications and selections was amplified in 27 separate PCR reactions (35 cycles starting from picogram amounts of ssDNA template) using hot-start PrimeSTAR® Max DNA Polymerase (Takara Bio, Japan). Primers PCRF2 (5′-GCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCA-3′) and PCRR2 (5′-GGCGACATTCAACGATTGAGGGAGGGAAGGT-3′) were designed to anneal to pDJ01, 361 bp upstream and 367 bp downstream of the library insert. Analysis of each of the 27 PCR reactions by agarose gel electrophoresis showed smears of different-sized products and, in addition, several discernible bands, suggesting more prominent amplification of some clones. The band patterns were different in all 27 PCR reactions, suggesting that there was no single highly prominent amplification product. Moreover, the Sanger sequencing reactions of the two eluted bands showed multiple traces in the chromatogram, representing a mixture of products rather than a single product. The analysis of the PCR reactions by agarose gel electrophoresis also demonstrated that the amplicon corresponding to the empty vector (728 nt) could not be detected as a separate band. Empty vector was the single most abundant clone in the metagenomic library prior to selection, and the lack of its amplification using post-selection DNA as a template confirmed that the secretome selection step eliminated most of the “background” non-secretome-encoding recombinant phagemids, including the empty vector.
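The relationship between amplicon size and insert size follows directly from the primer positions given above: the two flanks (361 bp and 367 bp) add up to the 728 nt empty-vector amplicon. A minimal Python sketch of that arithmetic is shown below; the function names are hypothetical, and it assumes the stated flank lengths cover everything between the amplicon ends and the insert.

```python
UPSTREAM_FLANK_BP = 361    # distance given for primer PCRF2
DOWNSTREAM_FLANK_BP = 367  # distance given for primer PCRR2
EMPTY_VECTOR_AMPLICON_BP = UPSTREAM_FLANK_BP + DOWNSTREAM_FLANK_BP  # 728


def expected_amplicon_length(insert_bp):
    """Amplicon length expected for a clone carrying an insert of insert_bp."""
    if insert_bp < 0:
        raise ValueError("insert length cannot be negative")
    return EMPTY_VECTOR_AMPLICON_BP + insert_bp


def insert_length_from_amplicon(amplicon_bp):
    """Insert length implied by an observed amplicon (e.g. a gel band size)."""
    if amplicon_bp < EMPTY_VECTOR_AMPLICON_BP:
        raise ValueError("amplicon is shorter than the empty-vector product")
    return amplicon_bp - EMPTY_VECTOR_AMPLICON_BP


if __name__ == "__main__":
    print(expected_amplicon_length(0))        # empty vector: 728
    print(insert_length_from_amplicon(1500))  # hypothetical 1.5 kb band: 772
```

Any band migrating above 728 nt therefore corresponds to an insert-bearing clone, which is why the absence of a distinct 728 nt band is informative.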
Amplicons generated in these 27 PCR reactions were pooled and fragmented by two shearing methods, restriction endonuclease AluI (Thermo Fisher Scientific, USA) treatment and mechanical shearing using nebulisers, under several conditions (see below), to obtain the fragment length range of 0.6 to 0.8 kb recommended for pyrosequencing. The sample was divided into portions and fragmented using five different conditions: 1 min AluI digestion; 3 h AluI digestion; 6 min nebulisation at 35 psi; 6 min nebulisation at 35 psi followed by 1 min AluI digestion; and 6 min nebulisation followed by 3 h AluI digestion. AluI digestions were performed with 5 U enzyme/μg DNA at 37°C, and the enzymatic reactions were stopped by heat-inactivating AluI at 65°C for 20 min. Mechanical shearing of samples containing 10% (v/v) glycerol was performed on ice, in a disposable nebuliser (Invitrogen, USA), by applying pressure at 35 psi for 6 min. Equal amounts (2.5 μg) of DNA, size-fractionated by all five methods, were mixed and a total of 12.5 μg DNA was submitted for pyrosequencing using the 454 GS FLX Titanium platform (Roche, Switzerland) at the Macrogen Inc. sequencing facility (Seoul, Korea; a half-plate in total). Sequencing template was prepared by the sequencing-service provider according to the Rapid Library Preparation Method Manual (Roche, Switzerland), except that the protocol commenced from the second step, fragment end repair.
In silico analysis of NGS metasecretome dataset
Metasecretome pyrosequencing reads were trimmed with SeqClean to remove sequences of the pDJ01 vector and the VCSM13d3 helper phage. Summary statistics for metasecretome reads are presented in Additional file 2. A metagenome sequence dataset, obtained by shotgun sequencing of the total metagenomic DNA from the plant-adherent rumen microbial communities of two New Zealand cows grazing a pasture-based diet similar to that of the cow used for the metasecretome library analysis, using the Roche 454 GS FLX platform (one plate per cow; two plates in total), was analysed to provide a reference point for comparison to the metasecretome dataset. Both sequencing datasets were processed and automatically annotated using the JGI IMG/M system. Functional categorisation and phylogenetic composition of the annotated metasecretome and metagenome sequence datasets can be accessed through the IMG/M system.
Protein-coding genes predicted via the IMG/M system for the metasecretome and metagenome datasets (222,960 and 671,876 ORFs, respectively), as well as 2,547,270 predicted ORFs from the bovine switchgrass-adherent metagenome dataset, were subjected to annotation and assignment to families of carbohydrate-active enzymes (CAZymes) using dbCAN database release 3.0, based on the CAZy database as of March 2013. dbCAN output was parsed using the following cut-off values: for alignment lengths > 80 amino acid residues, E-value < 1 × 10^−5; otherwise, E-value < 1 × 10^−3. To remove duplicates and to analyse distinct ORFs, all dbCAN hits were clustered at a 100% sequence identity threshold using the CD-HIT algorithm, and clustered hits to cellulosome-associated modules were further analysed. The family-level taxonomic assignment of ORFs containing cellulosome modules in the metasecretome was analysed based on the best BLASTP hit against the NCBI-NR database. For hits meeting a 40 bit-score threshold for cohesin and SLH module-containing ORFs, and a 35 bit-score threshold for dockerin module-containing ORFs, taxonomic family assignments of the host organism for the best BLAST hit were manually curated using recent bacterial classification proposals [56, 77–81].
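The length-dependent E-value cut-off applied to the dbCAN output can be expressed as a short filter. The Python sketch below is illustrative only: the `DomainHit` record and its field names are hypothetical and do not correspond to dbCAN's actual output columns, and the example hits are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One ORF-to-CAZy-family hit (hypothetical record layout)."""
    orf_id: str
    family: str
    evalue: float
    alignment_length: int  # aligned amino acid residues


LONG_ALIGNMENT_AA = 80   # alignments longer than this use the stricter cut-off
STRICT_EVALUE = 1e-5     # applied when alignment length > 80 residues
RELAXED_EVALUE = 1e-3    # applied to shorter alignments


def passes_cutoff(hit):
    """Apply the length-dependent E-value cut-off described in the text."""
    if hit.alignment_length > LONG_ALIGNMENT_AA:
        return hit.evalue < STRICT_EVALUE
    return hit.evalue < RELAXED_EVALUE


def filter_hits(hits):
    """Keep only the hits that satisfy the cut-off."""
    return [hit for hit in hits if passes_cutoff(hit)]


if __name__ == "__main__":
    # Invented example hits, for illustration only.
    hits = [
        DomainHit("orf_1", "GH5", 1e-20, 250),      # long, strong: kept
        DomainHit("orf_2", "GH5", 1e-4, 120),       # long, weak: dropped
        DomainHit("orf_3", "dockerin", 1e-4, 60),   # short, passes relaxed cut-off
        DomainHit("orf_4", "cohesin", 5e-3, 70),    # short, weak: dropped
    ]
    for hit in filter_hits(hits):
        print(hit.orf_id, hit.family)
```

The relaxed threshold for short alignments matters for small modules such as dockerins, which would otherwise rarely reach an E-value below 1 × 10^−5.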
Availability of supporting data
The pilot metasecretome phage display library sequences supporting the results of this article are available in the GenBank repository and their accession numbers are included within Additional file 1. The metasecretome and metagenome sequence datasets supporting the results of this article can be accessed through the ‘quick genome search’ box available on the IMG/M main page using the corresponding IMG genome ID (3300000332 for metasecretome and 3300000524 for metagenome dataset), or in the NCBI BioProject database (accession ID PRJNA244109).