The Tohoku Medical Megabank Organization (Tohoku University Graduate School of Medicine, Miyagi, Japan) established the prospective cohort (design paper in preparation), and each Japanese participant donated 5 ml of peripheral blood after giving written informed consent. The whole project was approved by the ethics committee of the Tohoku University School of Medicine. Because they were available for further follow-up investigations, we selected the first 12 participants of the cohort (10 males and 2 females) who lived in the vicinity of the Tohoku Medical Megabank Organization.
To extract the genomic DNA, the whole peripheral blood samples were processed with an Autopure LS system (Qiagen, Germany) for automated nucleotide purification following the manufacturer’s instructions. We omitted the RNase treatment, measured the concentration of the double-stranded DNA with PicoGreen (Life Technologies, Carlsbad, CA), and adjusted the concentration of the DNA to 200 ng/μL in Elution Buffer (Qiagen, Germany).
PCR-free whole-genome sequencing with the HiSeq 2500 sequencer
The genomic DNA (2 μg in 100 μL) was sonicated using a Covaris LE220 (Covaris Inc., Woburn, MA) to an average target size of 550 bp. The sheared DNA was used for library construction with the TruSeq DNA PCR-free sample preparation kit (Illumina, San Diego, CA) on a Bravo liquid-handling instrument (Agilent Technologies, Santa Clara, CA). The libraries were analyzed using a DNA Fragment Analyzer (Advanced Analytical Technologies, Ames, IA) and quantified by real-time PCR using the KAPA Library Quant Kit (KAPA Biosystems, MA).
Ten microliters of 2 nM libraries were denatured with an equal volume of 0.1 N sodium hydroxide; 1.5–2.0 pM of the denatured library was then used for on-board cluster generation on a HiSeq 2500 system (Illumina) with a TruSeq Rapid PE Cluster Kit (Illumina). Then the sequencing-by-synthesis reaction was performed in rapid-run mode with a 162-bp paired-end protocol. We ran one sample per flow cell so that no index read was needed. For each reaction, one and a half TruSeq Rapid SBS (sequencing-by-synthesis) kits (200 cycle) were used.
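The dilution arithmetic behind the loading step can be sketched as follows. This is a minimal illustration, not the authors' worksheet: the function name and the final loading volume are hypothetical, and only the 2 nM stock, the equal-volume denaturation, and the 1.5–2.0 pM target come from the text.

```python
def denatured_library_volume_ul(target_pm, final_volume_ul, stock_nm=2.0):
    """Volume (uL) of denatured library needed to reach target_pm in final_volume_ul.

    final_volume_ul is a hypothetical loading volume; the text does not state it.
    """
    # Mixing the stock with an equal volume of 0.1 N NaOH halves its concentration.
    denatured_pm = stock_nm / 2.0 * 1000.0  # nM -> pM
    # C1 * V1 = C2 * V2, solved for V1.
    return target_pm * final_volume_ul / denatured_pm


# Example: volumes for both ends of the stated 1.5-2.0 pM range,
# assuming a (hypothetical) 1000 uL final volume.
low = denatured_library_volume_ul(1.5, 1000.0)
high = denatured_library_volume_ul(2.0, 1000.0)
```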
Preparation of libraries for TargetSeq exome capture
Genomic DNA was prepared for exome capture according to the protocol included with the Ion Xpress Plus Fragment Library Kit for the AB Library Builder (Life Technologies, Carlsbad, CA) using the AB Library Builder System (Applied Biosystems, Foster City, CA). The DNA solvent was exchanged with pure water by ethanol precipitation. Enzymatic shearing was performed using 1–2 μg genomic DNA per sample. Sheared DNA was purified using the Agencourt AMPure XP Reagent (Agencourt, Boston, MA) with a target size peak of 350 bp, followed by adaptor ligation (A1 and P1). The adaptor-ligated genomic DNA fragments were then eluted in 45 μL of low TE buffer and amplified by PCR using 200 μL of Platinum PCR Supermix High Fidelity (Life Technologies), 5 μL of 50 μM library amplification primer mix, and 45 μL ligated DNA. The thermal cycling protocol was initial denaturation at 95°C for 5 min, followed by 7–8 cycles at 95°C for 15 s, 58°C for 15 s, and 72°C for 60 s. The amplified library was purified and eluted in 50 μL of low TE buffer using AMPure XP reagent.
Pre-capture library DNA was hybridized to exome capture probes using an Ion TargetSeq Exome Enrichment Kit (Life Technologies) according to the manufacturer’s specifications. A total of 500 ng of library DNA, 5 μL of 1 mg/mL Human Cot-1 DNA, and 5 μL of Ion TargetSeq Blocker P1 and A per sample were mixed and dried using a CC-105 centrifugal concentrator (TOMY, Tokyo, Japan) at the high temperature setting for 40 min. The dried DNA was dissolved in 7.5 μL of TargetSeq Hybridization Solution A, mixed with 3 μL of Enhancer B and 4.5 μL of Exome Probe Pool, and hybridized at 47°C for 72 hours in a Veriti Thermal Cycler (Applied Biosystems, Foster City, CA). The probe-hybridized library DNA was enriched by binding to streptavidin-coated M-270 beads (Dynal, Oslo, Norway), with rotation at 650 rpm at 47°C for 45 min in a Thermomixer comfort (Eppendorf, Hamburg, Germany). The streptavidin conjugate was washed with TargetSeq Hybridization and Wash Kit solutions (Life Technologies) strictly according to the manufacturer’s protocol and eluted in 30 μL of DNase-free water. The probe-hybridized DNA was further amplified by PCR using 200 μL of Platinum PCR Supermix High Fidelity, 20 μL of library amplification primer mix, and 30 μL of the probe-hybridized DNA. The thermal cycling protocol was initial denaturation at 95°C for 5 min followed by 12 cycles at 95°C for 15 s, 58°C for 15 s, and 72°C for 60 s. The amplified exome DNA library was subjected to size selection using E-Gel SizeSelect Agarose Gel (Applied Biosystems), purified twice with a 1.5-fold volume of AMPure XP reagent, and eluted in 25 μL of low TE buffer. The quantity and quality of the captured libraries were assessed using a StepOne Plus qPCR instrument (Life Technologies) with an Ion Library Quantitation Kit (Life Technologies) and a Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA) with an Agilent High Sensitivity DNA Kit (Agilent Technologies) according to the manufacturers’ instructions.
Template preparation and sequencing with an Ion Proton sequencer
Emulsion PCR was performed using a OneTouch 2 instrument (Life Technologies) with an Ion PI Template OT2 200 Kit v2 following the manufacturer’s instructions. Template-positive Ion Sphere Particles (ISPs) for the Ion Proton I chip were enriched using the Ion OneTouch ES enrichment system (Life Technologies). Ion Proton I chip version 2 was prepared and loaded according to the manufacturer’s recommendations. A total base output of 9 GB was set as the criterion for a successful experiment. If a sample did not reach this criterion in one experiment, we sequenced the same library again and merged the results before aligning the reads to the GRCh37/hg19 reference sequence.
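The rerun-and-merge rule can be expressed as a short check. The sketch below is illustrative only: it assumes the criterion means 9 × 10^9 sequenced bases, and the function name and the per-run totals in the example are hypothetical.

```python
MIN_TOTAL_BASES = 9_000_000_000  # assumed reading of the output criterion in the text


def runs_to_merge(run_outputs, threshold=MIN_TOTAL_BASES):
    """Accumulate per-run base totals for one library until the threshold is met.

    Returns (runs used, merged total, whether the criterion was reached).
    """
    merged = []
    total = 0
    for bases in run_outputs:
        merged.append(bases)
        total += bases
        if total >= threshold:
            break  # no further rerun needed for this library
    return merged, total, total >= threshold


# Hypothetical example: the first run falls short, so a second run is merged in.
used, total, ok = runs_to_merge([6_500_000_000, 4_000_000_000])
```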
SNP array scanning with the iScan system
We used a Human Omni 2.5-8 v1.1 DNA Analysis Kit (Illumina) to analyze 160 ng of genomic DNA following the manufacturer’s instructions. In brief, the genomic DNA was subjected to isothermal amplification followed by fragmentation with nuclease. The DNA was precipitated with 2-propanol, then hybridized with oligonucleotide probes immobilized on Human Omni 2.5-8 BeadChips (eight samples per BeadChip slide). After washing, the probes underwent single-base extension using the captured genomic DNA as templates and incorporating 2,4-dinitrophenyl- or biotin-labeled nucleotides to identify the genotypes. Then, immunohistochemical staining was performed to amplify the incorporated signal. Two Robotic Universal modules (Freedom EVO, TECAN, Maennedorf, Switzerland) and the Illumina Infinium LIMS system (Illumina) were used in the series of experiments. An iScan scanner system with AutoLoader 2.X controlled by iScan Control Software ver. 3.3.28 (Illumina) was used for data acquisition. SNP calling was performed using the Genotyping Module in the GenomeStudio software (ver. 2011.1; Illumina). The default cluster file was HumanOmni2-5 M-8b1-1_B.egt (Illumina), and a GenCall threshold of 0.15 was used for SNP calling. The SNP call rate was calculated for each sample, and samples with overall call rates over 99% and LogRdev values below 0.2 were used for further analysis.
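The sample-level filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `"--"` no-call code are hypothetical, while the two thresholds (call rate over 99%, LogRdev below 0.2) come from the text.

```python
def call_rate(genotypes, missing="--"):
    """Fraction of SNPs with a genotype call; `missing` is an assumed no-call code."""
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def passes_sample_qc(rate, log_r_dev, min_call_rate=0.99, max_log_r_dev=0.2):
    """Keep a sample only if its call rate is over 99% and LogRdev is below 0.2."""
    return rate > min_call_rate and log_r_dev < max_log_r_dev


# Hypothetical example: four SNPs, one of them a no-call.
example_rate = call_rate(["AA", "AB", "--", "BB"])
```

Both comparisons are strict, matching the wording "over 99%" and "below 0.2".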
Variant calling pipeline for Illumina HiSeq sequencing
Fastq files were generated by base calling with CASAVA 1.8.2. Reads in the generated fastq files were mapped to the human reference genome (GRCh37/hg19) with decoy sequences (hs37d5) using the BWA-MEM alignment algorithm in BWA version 0.7.5a-r405 [38, 39] with the default options, and stored as BAM files. The following post-processing was applied to the BAM files: reads were realigned with RealignerTargetCreator and IndelRealigner in the Genome Analysis Toolkit 2.5-2 (GATK), and their base quality scores were recalibrated with BaseRecalibrator and PrintReads in GATK [40, 41]. No VCF file of known indel sites was given as input to RealignerTargetCreator or IndelRealigner. SNP sites from NCBI’s SNP database (dbSNP, version 137) were supplied to BaseRecalibrator in a VCF file as known SNP sites. Variant calling was conducted on the post-processed BAM files using UnifiedGenotyper in GATK with the default options [40, 41], and the results were stored in VCF files.
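The order of the steps above can be laid out as a list of command lines. The sketch below only builds the commands and does not run them. It is an assumption-laden illustration rather than the authors' scripts: all file names and the function name are hypothetical, the flags follow the GATK 2.x command-line style as commonly documented and should be checked against the documentation for the exact version, and the SAM-to-BAM conversion, sorting, and indexing between alignment and realignment are omitted.

```python
def hiseq_pipeline_commands(sample, ref="hs37d5.fa", dbsnp="dbsnp_137.vcf",
                            gatk_jar="GenomeAnalysisTK.jar"):
    """Return the pipeline steps as argument lists (hypothetical file names)."""
    gatk = ["java", "-jar", gatk_jar, "-R", ref]
    bam = f"{sample}.bam"
    realigned = f"{sample}.realigned.bam"
    return [
        # Alignment with default options; bwa mem writes SAM to standard output.
        ["bwa", "mem", ref, f"{sample}_R1.fastq", f"{sample}_R2.fastq"],
        # Local realignment; no known-indel VCF is passed, as stated in the text.
        gatk + ["-T", "RealignerTargetCreator", "-I", bam,
                "-o", f"{sample}.intervals"],
        gatk + ["-T", "IndelRealigner", "-I", bam,
                "-targetIntervals", f"{sample}.intervals", "-o", realigned],
        # Base quality recalibration with dbSNP 137 as the known SNP sites.
        gatk + ["-T", "BaseRecalibrator", "-I", realigned,
                "-knownSites", dbsnp, "-o", f"{sample}.recal.grp"],
        gatk + ["-T", "PrintReads", "-I", realigned,
                "-BQSR", f"{sample}.recal.grp", "-o", f"{sample}.recal.bam"],
        # Variant calling with default options.
        gatk + ["-T", "UnifiedGenotyper", "-I", f"{sample}.recal.bam",
                "-o", f"{sample}.vcf"],
    ]


commands = hiseq_pipeline_commands("sample01")
```

Keeping the steps as argument lists makes the order explicit and lets each one be passed to `subprocess.run` once the intermediate files exist.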
Variant calling pipeline for Ion Proton sequencing
The variant calling pipeline of Life Technologies was used to analyze the Ion Proton sequencing runs. Reads were aligned to the GRCh37/hg19 reference sequence using tmap software version 3.6.39. Variant discovery and genotype calling of multi-allelic substitutions and indels were performed on each individual sample using the Torrent Variant Caller (TVC) version 3.6.39. Parameters for variant discovery were based on the “TVC 3.6 Parameters for TargetSeq Exome on Proton PI”, with the following thresholds changed from the default values suggested by Life Technologies so as to use as many reads as possible: snp_beta_bias = 150, snp_strand_bias = 0.9999, maximum common signal shift = 0.5, snp_min_variant_score = 5, and minimum_mapping_quality_score = 0.
Analysis tools and selection of SNPs
To compare the genotypes from the three platforms, the output VCF files from the two NGS platforms and the Omni 2.5-8 output files formatted with PLINK were processed to generate subsets that contained the common target SNP sites. The common target SNP sites were obtained by intersecting the autosomal regions of two target manifests, Ion-TargetSeq-Exome-50Mb-hg19.bed from Life Technologies and the BED file of Omni 2.5-8 from the Human Omni 2.5-8 v1.1 DNA Analysis Kit from Illumina, using the BEDTools software suite.
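The intersection step can be illustrated in plain Python. This is a simplified stand-in for what an interval-intersection tool such as BEDTools does, not a reproduction of the command the authors ran; the function names, the `chr`-prefixed chromosome naming, and the example intervals are assumptions.

```python
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumed chromosome naming


def intersect_intervals(a, b):
    """Intersect two lists of (chrom, start, end) half-open BED intervals."""
    by_chrom = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    out = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # keep only non-empty overlaps
                out.append((chrom, lo, hi))
    return sorted(out)


def common_targets(exome_regions, array_sites):
    """Autosomal positions covered by both the exome manifest and the array."""
    exome = [r for r in exome_regions if r[0] in AUTOSOMES]
    array = [r for r in array_sites if r[0] in AUTOSOMES]
    return intersect_intervals(exome, array)


# Hypothetical example: one array SNP falls inside an autosomal exome target.
exome = [("chr1", 100, 200), ("chrX", 50, 80)]
array = [("chr1", 150, 151), ("chr1", 250, 251), ("chrX", 60, 61)]
shared = common_targets(exome, array)
```

The nested loop is quadratic per chromosome, which is acceptable for a sketch; a sorted sweep or an interval tree would be the usual choice at manifest scale.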
SNPs outside of the common target SNP sites were filtered out leaving 83,237 sites as the common targets. From these loci, six probes for detection of copy number variations were removed. We also found 2,626 overlapping probes in the Omni 2.5-8 array and integrated the SNP calls using these corresponding probes. In total, we analyzed 79,143 SNPs.
Genotyping data on each platform were obtained from the VCF files and PLINK/PED files using a set of in-house scripts; then, the genotype concordance and accuracy were calculated.
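A pairwise concordance calculation of this kind might look like the sketch below. The in-house scripts are not described in the text, so this is an assumed definition (the fraction of sites called on both platforms whose unordered allele pairs agree); the function names, the genotype string format, and the example calls are hypothetical.

```python
def normalize(genotype):
    """Treat a genotype as an unordered allele pair, so 'A/G' equals 'G/A'."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def concordance(calls_a, calls_b):
    """Return (concordance, number of shared sites) for two site -> genotype maps."""
    shared = [site for site in calls_a if site in calls_b]
    if not shared:
        return 0.0, 0
    matches = sum(normalize(calls_a[s]) == normalize(calls_b[s]) for s in shared)
    return matches / len(shared), len(shared)


# Hypothetical example: two shared sites, one of which agrees.
ngs = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T"}
chip = {"rs1": "G/A", "rs2": "C/T", "rs4": "A/A"}
rate, n_shared = concordance(ngs, chip)
```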
Differences in NGS read depth between SNP calls that were discordant and concordant with the Omni 2.5-8 calls were examined using the Mann–Whitney U test with SAMtools, in-house scripts, and the R statistical environment. Logistic regression analyses of discordant versus concordant calls between each NGS platform and Omni 2.5-8, with three NGS quality control (QC) parameters (read depth, GC content, and homopolymer length), were performed in the R statistical environment. A further logistic regression analysis was restricted to SNPs with read depths within ±1 SD of the average.
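Two of the computations named above, the ±1 SD depth filter and the Mann–Whitney U statistic, can be sketched with the Python standard library. The analysis itself was done in R, so this is only an illustration: the function names and example depths are hypothetical, the population standard deviation is an assumed choice, and the sketch stops at the U statistic without computing a p-value.

```python
from statistics import mean, pstdev


def within_one_sd(depths):
    """Keep read depths within +/- 1 SD of the mean (population SD assumed)."""
    m, sd = mean(depths), pstdev(depths)
    return [d for d in depths if m - sd <= d <= m + sd]


def mann_whitney_u(x, y):
    """U statistic for sample x against sample y, using midranks for ties."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return rank_sum_x - len(x) * (len(x) + 1) / 2.0


# Hypothetical depths at discordant and concordant SNP calls.
discordant = [1, 2, 3]
concordant = [4, 5, 6]
u = mann_whitney_u(discordant, concordant)
```

A U value of 0 or of `len(x) * len(y)` indicates complete separation of the two groups; values near half of `len(x) * len(y)` indicate similar depth distributions.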
The SNP calls at each position covered by Omni 2.5-8 are attached as Additional file 3 (for HiSeq 2500), Additional file 4 (for Proton), and Additional file 5 (for Omni 2.5-8). Genomic DNAs used in this study will be distributed through Tohoku Medical Megabank Organization upon request.