Samples
PhiX control standard (V3, 10 nM) was purchased from Illumina (San Diego, CA). Serially diluted loading samples of PhiX were sequenced on an Illumina HiSeq 2500 or MiSeq sequencer for 50 single-read cycles. Genomic DNAs from strain variants of Bordetella bronchiseptica (strain RB50, ATCC BAA-588) and Bordetella pertussis (strain Tohama I) were extracted using Qiagen (Germantown, MD) DNeasy Kit. Genomic DNA from microbial mock community B (20 bacterial strains with staggered rRNA operon counts from 1000 to 1000,000 copies/μl) for 16S rRNA gene sequencing was from Bei Resources (Manassas, VA). HCoEpiC cells were isolated from human colonic tissues (ScienCell Research Laboratories, Carlsbad, CA). HCT116 is a human colon cancer cell line (ATCC, Manassas, VA). Total RNAs of the HCoEpiC and HCT116 were extracted using the Qiagen RNeasy Mini Kit.
The standard Illumina protocol for nanomolar library preparation (2–4 nM) for HiSeq or MiSeq sequencers is presented in Fig. 1a. An alternative, a sub-nanomolar protocol we devised, is presented in Fig. 1b. The efficacy of our sub-nanomolar protocol, as compared to Illumina protocols, was validated in five comparative studies using different types of libraries. In the first study, sequencing results were compared between control PhiX DNA library, denatured following the Illumina protocol (as the baseline), and various concentrations of PhiX libraries prepared with the sub-nanomolar protocol. In the second study, genomic DNA libraries (Bordetella bronchiseptica, strain RB50 and Bordetella pertussis strain Tohama I), prepared according to the Illumina protocol (2 nM) and the sub-nanomolar protocol (100 pM) were denatured and diluted to 10 pM and sequenced on a MiSeq sequencer. In the third study, mRNA libraries, prepared according to the Illumina protocol (2 nM) and the sub-nanomolar protocol (100 pM), respectively, were denatured and diluted to 8.3 pM before loading onto the neighboring lanes in a flow cell and sequenced on a HiSeq 2500 sequencer. In the fourth study, small RNA libraries were prepared and compared under the same conditions as in the third study. In the fifth study, 16S rRNA gene sequencing was carried out under the aforementioned conditions (except for the final loading amount of 4 pM with 10% PhiX) for comparison using a MiSeq sequencer.
Genomic DNA, mRNA, small RNA, and 16S metagenomic sample preparation and sequencing
The qualities of genomic DNA samples were assessed using a Nanodrop spectrophotometer (Thermo Scientific, Madison, WI) based on A260/280 and A260/A230. The qualities of RNA samples were assessed using Agilent (Santa Clara, CA) 2100 Bioanalyzer to ensure the RNA Integrity Number (RIN) of at least 8. Sample preparation protocols of DNA-sequencing, messenger RNA-sequencing, small RNA-sequencing, and 16S rRNA gene sequencing are briefly summarized below.
DNA-sequencing
Genomic DNA samples were processed following the protocol of Illumina TruSeq DNA PCR-Free Sample Preparation Kit. One microgram of gDNA sample was subjected to fragmentation, end repairing, size selection, adenylation tailing, and adapter ligation. Paired-end sequencing (150 × 2 cycles, V2 cartridge) of multiplexed cDNA samples was carried out on an Illumina MiSeq sequencer.
mRNA-sequencing
RNA samples were processed following the protocol for the Illumina TruSeq Stranded mRNA Sample Preparation Kit. Poly (A) tailed RNA was purified from 1 μg of total RNA, fragmented, and reverse-transcribed into cDNAs. Double strand cDNAs were adenylated at the 3′ ends and individually indexed, followed by limited-cycle (15) amplification. Paired-end sequencing (100 × 2 cycles) of multiplexed mRNA samples per lane was carried out on an Illumina HiSeq 2500 sequencer.
Small RNA-sequencing
RNA samples of HCoEpiC were processed following the protocol for the Illumina TruSeq Small RNA Sample Preparation Kit. In brief, RNA adapters specifically targeting the 3′ hydroxyl group or the 5′ monophosphate group were ligated to small RNA present in 1 μg total RNA. Adapter-modified small RNA was then subjected to reverse transcription to yield the first-strand cDNA. This was followed by synthesis of the second-strand cDNA and a limited-cycle (11) amplification, by which individual indexes were added. The small RNA libraries were loaded onto a Sage Science (Beverly, MA) Pippin Prep using 3% (to elute 145–160 bp for microRNA) or 2% agarose gel (to elute 160–320 bp for other small noncoding RNA). Single-end sequencing of multiplexed small RNA libraries was carried out on an Illumina HiSeq 2500 sequencer for 50- or 120-cycles, respectively, for microRNA and other small noncoding RNA.
16S rRNA gene sequencing
Microbial mock community B samples were processed following the protocol for 16S Metagenomic Sequencing Library Preparation (Illumina, Part # 15044223 Rev. B). In brief, custom primer pairs specifically targeting variable V3 and V4 regions of the 16S rRNA gene were used to create a single amplicon of approximately ~ 460 bp. During the amplification using 2-staged PCR (25 and 8 cycles, respectively), Illumina sequencing adapters and dual-index barcodes were added to the amplicons. Paired-end sequencing (300 × 2 cycles, V3 cartridge) of multiplexed cDNA samples was carried out on an Illumina MiSeq sequencer.
Data analyses
Fastq files obtained from MiSeq and HiSeq 2500 sequencers were generated using MiSeq Reporter v.2.6.2.3 and CASAVA v.1.8.2 (or later, bcl2fastq v.2.2.0), respectively. Only those reads passing the chastity filter were reported. Data analyses performed at our core facility are briefly summarized below, and data analyses of mRNA- and small RNA-sequencing were described previously [3].
DNA-sequencing
BWA v.0.7.9 was used to align reads in fastq files to the Bordetella bronchiseptica reference genome (NCBI accession: BX470250.1; gi|33,591,071) and Bordetella pertussis reference genome (NCBI accession: NC_002929.2; gi|33,591,275). GATK v.1.6.23 was used as a variant caller for SNPs and short indels. The above tasks were performed using wrappers in MiSeq Reporter. The program, deepTools v.2.5.0.1, was used to compare the genome-wide similarity of bam files by running the program in “bins” mode using default settings [4]. In short, “multiBamSummary” computes the read coverages for genomic regions for the paired (i.e. Illumina standard vs. sub-nanomolar protocols) bam files; followed by “plotCorrelation” to calculate and visualize pairwise correlation values between the read coverages.
mRNA-sequencing
Tophat v.2.1.0 was used to align reads in fastq files to the UCSC human hg38 reference genome. Cufflinks v.2.2.1 was used to assemble the transcriptome based on the hg38 reference annotation. Cuffdiff v.2.2.1 was used to calculate expression and test the statistical significance of observed differential expressions using default settings [5]. The quantification of relative abundance of each transcript was reported as Reads Per Kilobase per Million (RPKM) [6].
Small RNA-sequencing
First, Cutadapt 1.9.1 was used to trim Illumina TruSeq adapter sequences from fastq files at 3′ ends. Subsequently, for microRNA, miRDeep2 v.2.0.0.7 was used to map the reads to the miRBase (version 21) microRNA database and quantify microRNA expression levels. For other small noncoding RNA, data were analyzed using Qiagen (Germantown, MD) CLC Genomics Workbench 8.5.1 for mapping trimmed reads to the Ensemble GRCh38.82 noncoding RNA database and quantifying the levels of small RNA expression.
16S rRNA gene sequencing
Illumina BaseSpace 16S Metagenomics app was used to generate a classification of reads at taxonomic levels from kingdom to species. The classification step uses a proprietary algorithm that provides species-level classifications for paired-end reads, involving matching short subsequences of the reads (called words) to a set of 16S reference sequences (Illumina-curated version of the Greengenes database). The accumulated word matches for each read were used to assign reads to a particular taxonomic classification.