Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population
- Ikuko N Motoike†1,
- Mitsuyo Matsumoto†1, 4,
- Inaho Danjoh†1,
- Fumiki Katsuoka1, 5,
- Kaname Kojima1,
- Naoki Nariai1,
- Yukuto Sato1,
- Yumi Yamaguchi-Kabata1,
- Shin Ito1,
- Hisaaki Kudo2,
- Ichiko Nishijima2,
- Satoshi Nishikawa1,
- Xiaoqing Pan1,
- Rumiko Saito1,
- Sakae Saito1,
- Tomo Saito1,
- Matsuyuki Shirota1, 3, 6,
- Kaoru Tsuda1,
- Junji Yokozawa1,
- Kazuhiko Igarashi1, 4,
- Naoko Minegishi2,
- Osamu Tanabe1,
- Nobuo Fuse1,
- Masao Nagasaki1,
- Kengo Kinoshita1, 3, 7,
- Jun Yasuda1 and
- Masayuki Yamamoto1, 5Email author
© Motoike et al.; licensee BioMed Central Ltd. 2014
Received: 23 December 2013
Accepted: 1 August 2014
Published: 10 August 2014
Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variations calls simultaneously.
Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.
Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.
Whole-genome sequencing (WGS) of human genomic DNA with next-generation sequencers (NGSs) has opened a new avenue for personalized healthcare and personalized medicine based on the detection of genetic variations related to physical traits [1, 2]. The application of human WGS to large-population genetics requires rapid, low cost, and accurate validation technologies.
The resequencing market is currently dominated by Illumina HiSeq sequencing platforms (hereafter referred to as HiSeq) that have been applied in large population studies [3–5]. Bridging PCR amplification of fragmented genomic DNA in a flow cell and sequencing-by-synthesis chemical reactions truly realize massive parallel sequencing from both ends of a DNA fragment . The output from a HiSeq instrument can reach up to 600 GB per run, with more than 80% of the reads with an average quality score higher than 30 (99.9% accurate). In particular, the newly released protocol for HiSeq (PCR-free library construction with rapid-run mode: 162 bp paired ends) omits the initial PCR amplification step during library construction and completes human WGS with high depth (up to 33×: 100 GBs) in two days in one flow cell. This improved protocol is expected to accelerate the discovery of disease-susceptible variants by the WGS analysis of human populations on a large scale.
Importantly, the accuracy of variant calls made with NGS data is critical for future genetic investigations that aim to detect disease-susceptible single nucleotide variations (SNVs) . Even with the improvements in the chemistry used and in the equipment, systemic biases have been reported for both the genome coverage and the accuracy of variant calls of most NGSs . Currently, the validation of SNV calls that are newly discovered using NGSs depends on conventional methods based on amplification of the target region with PCR, Sanger sequencing, hybridization of sequence-specific oligonucleotide probes, and mass spectroscopic assays [4, 8]. More than three million SNVs have been reported in a human genome compared with the reference GRCh37/hg19 sequence (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/) . In the analyses of large populations, comprehensive validation of newly observed SNVs is sometimes prohibitively expensive and tedious using these traditional low-throughput methods.
It is of interest to discover whether the overall accuracy of variant calls can be improved using a hybrid approach such as using NGSs with different working principles to analyze the same genome, as indicated previously [4, 8]. The rationale of this notion is that platform-specific biases or errors in the data from one NGS platform can be corrected by using the data from another NGS platform, if the two methods are based on different working principles. We surmised, therefore, that an appropriate combination of different NGSs may reduce the overall cost of sequencing.
A semiconductor-based non-optical NGS has recently become available . These sequencers are attractive candidates as alternatives to HiSeq. The semiconductor sequencers directly detect changes of pH that are caused by the release of hydrogen ions when nucleotides are incorporated into the growing DNA strand during the DNA polymerase reaction in cells within a chip, which is manufactured using the same technology that is used to construct integrated circuits . This method features rapid reaction time and low price in consumables per base [10, 11]. The first semiconductor-based NGS, Ion Torrent Personal Genome Machine, has been used widely in many different applications [11–16]. The larger Ion Proton semiconductor NGS (hereafter referred to as Ion Proton) has now been launched, and the total output of the Ion Proton I chip is reported to be nearly 10 GB, which is suitable for targeted resequencing of, for example, the human exome. Because the sequencing reaction in semiconductor-based NGSs does not use terminator nucleotides, the accuracy of the generated reads are known to decrease for homopolymer repeat sequences [10, 12, 17–20]. Nonetheless, many known disease-causing mutations have been detected by the semiconductor sequencers [11, 12, 14, 15, 21], implying the potential of the platform.
It has been reported that a PCR-free protocol for HiSeq is not free from coverage bias, especially for high and low GC regions . Therefore, the addition of exome data (generated using low-cost NGSs) to the WGS data processed by HiSeq may strike a balance between cost and benefit, making it an attractive strategy for sequencing functionally important regions in human populations. Here, we compare the variant call results for the genomic DNA of 12 Japanese individuals generated by WGS on HiSeq 2500 in rapid-run mode and targeted exome sequences obtained using on Proton with the I chip. We used the Omni 2.5-8 SNP arrays as references for the variant calls and compared the SNP data among the three platforms. We found that Ion Proton exhibited high concordance in variant calls with the other two platforms, indicating that Ion Proton is a promising tool for validation of multiple SNPs in the WGS of a large population.
Outline of sequence outputs
Basic statistics of the whole genome sequences generated by the HiSeq sequencer
Total bases (GB)
Read 2 %Q30: 100–150 cycles
Aligned bases (GB)
Mapping ratio (%)
Basic statistics of whole exome sequences generated by the Ion Proton sequencer
Total bases (GB)
Average read length (bp)
Aligned bases (GB)
Aligned bases (%)
Comparison of SNP calls among three platforms
To assess the potential of the Ion Proton exome sequencing to validate the SNV calls for the WGS generated by HiSeq 2500 of the 12 Japanese population samples, we evaluated the SNP calls using the Illumina Omni 2.5-8 SNP chip (hereafter referred to as Omni 2.5-8) as a reference to characterize the differences between the two NGS platforms. The reported Japanese genomic sequence generated by HiSeq 2000 seemed to be substantially different from the GRCh37/hg19 reference genome . We focused on the autosomal target sequences of the Ion Proton exome kit (TargetSeq Exome Enrichment Kit, total target regions: 50 MB) to compare the NGS results from the two platforms. We extracted the variant call results, and tested the variants at the loci covered by Omni 2.5-8. A total of 79,143 SNPs on the autosomes in the 12 Japanese genomes were tested (Methods).
Concordant SNP calls made from the HiSeq, Ion Proton, and Omni 2.5-8 SNP array data
Concordance between the Omni 2.5-8 and HiSeq calls was very high and much less variable among the samples (98.8–99.0% concordance) than the concordance between the Omni 2.5-8 and Ion Proton calls (81.8–96.0% concordance). The concordance among the calls by the three platforms exhibited relatively high variance compared with the concordance between the Omni 2.5-8 and HiSeq calls. The %CV is as low as 0.48% for the concordance between the calls made by Omni 2.5-8 and HiSeq. The variance in the numbers of concordant SNPs among the 12 individual genomes seems to be caused by fluctuations in the number of SNPs (14,033–16,220) called by Ion Proton (Table 3).
Reproducibility of SNP calls by individual NGSs
The level plot analysis also indicated possible systemic errors inherent in each platform. For example, 53 SNPs were not called by HiSeq for any of the 12 samples, whereas all of them were called as homozygous for the alternate alleles by Omni 2.5-8 (Figure 2A, upper left corner). Visual inspection of the BAM files using the Integrated Genome Viewer [23, 24] suggested that some of the SNP loci may be caused by the large structural variations that are commonly found in the genomes of Japanese populations, implying that population-specific reference genomic sequences will improve the quality of mapping of NGS reads to reference genomes.
Average numbers of NGS SNP calls showing concordant or discordant with Omni 2.5-8
Proton concordant & HiSeq discordant
HiSeq concordant & Proton discordant
HiSeq = Proton
Effects of sequence depth and GC contents on SNP calls by the NGSs
Logistic regression analysis among QC factors of NGS and calling discordance with Omni 2.5-8
A + B
A + C
A + B + C
In addition, we calculated the transition/transversion ratio for Omni 2.5-8 and the two NGSs and found that the transition/transversion ratio was smaller for variant calls between the two NGSs but not between either of the two NGSs and Omni 2.5-8 or are all discordant SNPs, indicating that the NGSs might miss more transitions than transversions (Additional file 1: Table S2). The transition/transversion ratio values that we obtained are compatible to the ratios reported in a previous study .
In this paper, variant SNP calls in the exonic sequences from the genomic DNA of 12 Japanese individuals were compared between two NGS platforms, targeted exome sequencing by Ion Proton and PCR-free WGS by HiSeq 2500, and Omni 2.5-8. The Omni 2.5-8 chip SNP calls were used as the reference. Approximately 80,000 SNP loci were analyzed in this study from which Omni 2.5-8 called 17,000 variants, and more than 90% of these variant calls were supported by both HiSeq and Ion Proton. Ion Proton exhibited a maximum of 96% concordance of SNP calls with the SNP calls by the other two platforms, indicating that Ion Proton is a promising tool for the valiadation of multiple SNPs. Nonetheless, improvements in the sensitivity of Ion Proton for alternative alleles are still necessary for the efficient detection of novel SNVs in a human genome using Ion Proton.
Of the three factors that are known to affect SNV calling by NGSs, we found that all three factors (read depth, GC content, and homopolymer length) affect more or less the accuracy of Ion Proton SNP calls, whereas read depth was the major factor that affected HiSeq SNP calls. GC content has been shown to be a confounding factor for exon capturing [27–29]; therefore, the accuracy of Ion Proton SNP calls may have been affected by differences in the GC content near the SNP loci. Non-optical semiconductor-based sequencers like Ion Proton have been used widely for multiple amplicon sequencing [11–13, 30–35]; however, it has been reported that these sequencers exhibit relatively high error rates, especially for homopolymer sequences [17–19, 36]. While homopolymer length did affect the accuracy of the Ion Proton SNP calls more than it affected the accuracy of the HiSeq SNP calls, its affect on the Ion Proton SNP calls was small (the logistic regression coefficient for homopolymer length in Ion Proton was 0.0123, Table 5).
An interesting question is whether Ion Proton can cover errors found in HiSeq data. We found that HiSeq failed to call from 22 to 53 SNP calls in the 12 individual samples, which were called commonly by Ion Proton and Omni 2.5-8 (Figure 1), indicating that Ion Proton may cover the HiSeq missed calls. The Ion Personal Genome Machine (PGM) that preceded Ion Proton was used for a similar purpose in the characterization of the Tasmania devil genome sequence .
We found that there was a significant difference between the number of SNPs called among the 12 samples by HiSeq and Ion Proton, and the variance was larger for Ion Proton calls than it was for HiSeq calls. The sequencing processes in Ion Proton after the chip loading appeared to exhibit reproducibility similar to that of HiSeq during sequencing-by-synthesis. Therefore, the major reason for the variance in the number of Ion Proton SNP calls must be in the different library construction procedures that are used in the two NGSs. In this study, the library preparation procedures that we used for Ion Proton were manual, whereas liquid-handling robots were used for the construction of libraries for HiSeq. We surmise that the Ion Proton output will improve if automated liquid-handling procedures can be established. Indeed, we found that the reproducibility of library construction for Ion Proton was improved using an automated library construction method (SS, INM, and JYa: unpublished observation).We found that most of the SNPs called by HiSeq were supported by the Ion Proton SNP calls; indicating that Ion Proton could be used to validate novel SNPs in a population. However, if the SNPs are relatively rare in a population, they may not be validated by Ion Proton. In fact, the pixels near the origin on the horizontal axis are a bit brighter than those farther from the origin, as can be seen in Figure 2B and 2C. The data plotted in Figure 2B and 2C demonstrated how the addition of Ion Proton SNP call data to HiSeq SNP calls can provide strong support for the identification of alternate alleles in a population of interest. Our results suggest that approximately 97% of the expected SNVs in exonic regions of a human genome will be verified by Ion Proton exome sequencing. This finding will contribute to designing custom arrays for large-scale population-specific genetic studies of populations of unique origin, like the Japanese.
Validation of SNVs detected in WGS is critical for studying disease-related variations in human population genetics. Combining different types of NGSs in analyses of the genome sequences from the same individual may be an efficient means of validateing multiple SNV calls simultaneously. Here, we attempted to show the versatility and usefulness of the combination of Ion Proton exome sequencing with HiSeq 2500 WGS by analyzing 12 independent Japanese genomes sequences and comparing the corresponding SNP calls, with the Omni 2.5-8 SNP calls as the reference. We found that Ion Proton exhibited a maximum of 97% concordance in variant calls with the other two platforms, indicating that Ion Proton is a promising tool for the validation of multiple SNPs in the exons of genomic sequences.
The Tohoku Medical Megabank Organization (Tohoku University Graduate School of Medicine, Miyagi, Japan) constituted the prospective cohort (design paper is in preparation), and 5 ml of peripheral blood was donated by the Japanese participants, all with written informed consent. The whole project was approved by the ethical committee of the Tohoku University School of Medicine. Because of their availability for further follow-up investigations, we selected the first 12 participants of the cohort (10 males and 2 females) who lived in the neighbor of Tohoku Medical Megabank Organization.
To extract the genomic DNA, the whole peripheral blood samples were processed with an Autopure LS system (Qiagen, Germany) for automated nucleotide purification following the manufacturer’s instructions. We omitted the RNase treatment, measured the concentration of the double-stranded DNA with PicoGreen (Life Technologies, Carlsbad, CA), and adjusted the concentration of the DNA to 200 ng/μL in Elution Buffer (Qiagen, Germany).
PCR-free whole-genome sequencing with the HiSeq 2500 sequencer
The genomic DNA (2 μg in 100 μL) was sonicated using a Covaris LE220 (Covaris Inc., Woburn, MA) to an average target size of 550 bp. The sheared DNA was used for library construction with the TruSeq DNA PCR-free sample preparation kit (Illumina, San Diego, CA) on a Bravo liquid-handling instrument (Agilent Technologies, Santa Clara, CA). The libraries were analyzed using a DNA Fragment Analyzer (Advanced Analytical Technologies, Ames, IA) and quantified by real-time PCR using the KAPA Library Quant Kit (KAPA Biosystems, MA).
Ten microliters of 2 nM libraries were denatured with an equal volume of 0.1 N sodium hydroxide; 1.5–2.0 pM of the denatured library was then used for on-board cluster generation on a HiSeq 2500 system (Illumina) with a TruSeq Rapid PE Cluster Kit (Illumina). Then the sequencing-by-synthesis reaction was performed in rapid-run mode with a 162-bp paired-end protocol. We ran one sample per flow cell so that no index read was needed. For each reaction, one and a half TruSeq Rapid SBS (sequencing-by-synthesis) kits (200 cycle) were used.
Preparation of libraries for TargetSeq exome capture
Genomic DNA was prepared for exome capture according to the protocol included with the Ion Xpress Plus Fragment Library Kit for the AB Library Builder (Life Technologies, Carlsbad, CA) using the AB Library Builder System (Applied Biosystems, Foster City, CA). The DNA solvent was exchanged with pure water by ethanol precipitation. Enzymatic shearing was performed using 1–2 μg genomic DNA per sample. Sheared DNA was purified using the Agencourt AMPure XP Reagent (Agencourt, Boston, MA) with a target size peak of 350 bp, followed by adaptor ligation (A1 and P1). The adaptor-ligated genomic DNA fragments were then eluted in 45 μL of low TE buffer and amplified by PCR using 200 μL of Platinum PCR Supermix High Fidelity (Life Technologies), 5 μL of 50 μM library amplification primer mix, and 45 μL ligated DNA. The thermal cycling protocol was initial denaturation at 95°C for 5 min, followed by 7–8 cycles at 95°C for 15 s, 58°C for 15 s, and 72°C for 60 s. The amplified library was purified and eluted in 50 μL of low TE buffer using AMPure XP reagent.
Pre-capture library DNA was hybridized to exome capture probes using an Ion TargetSeq Exome Enrichment Kit (Life Technologies) according to the manufacturer’s specifications. A total of 500 ng of library DNA, 5 μL of 1 mg/mL Human Cot-1 DNA, and 5 μL of Ion TargetSeq Blocker P1 and A per sample were mixed and dried using a CC-105 centrifugal concentrator (TOMY, Tokyo, Japan) at the high temperature setting for 40 min. The dried DNA was dissolved in 7.5 μL of TargetSeq Hybridization Solution A, added to 3 μL of Enhancer B and 4.5 μL of Exome Probe Pool, and hybridized at 47°C for 72 hours in a Veriti Thermal Cycler (Applied Biosystems, Foster City, CA). The probe-hybridized library DNA was enriched by binding to streptavidin-coated M-270 beads (Dynal, Oslo, Norway), rotated at 650 rpm at 47°C for 45 min in a Thermomixer comfort (Eppendorf, Hamburg, Germany). The streptavidin conjugate was washed with TargetSeq Hybridization and Wash Kit solutions (Life Technologies) strictly according to the manufacture’s protocol and eluted in 30 μL of DNase-free water. The probe-hybridized DNA was further amplified by PCR using 200 μL of Platinum PCR Supermix High Fidelity, 20 μL of library amplification primer mix, and 30 μL of the probe-hybridized DNA. The thermal cycling protocol was initial denaturation at 95°C for 5 min followed by 12 cycles at 95°C for 15 s, 58°C for 15 s, and 72°C for 60 s. The amplified exome DNA library was subjected to size selection using E-Gel SizeSelect Agarose Gel (Applied Biosystems), purified twice with a 1.5-fold volume of AMPure XP reagent, and eluted in 25 μL of low TE buffer. The quantity and quality of the captured libraries were assessed using a StepOne Plus qPCR instrument (Life Technologies) with an Ion Library Quantitation Kit (Life Technologies) and Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA) with Agilent High Sensitivity DNA Kit (Agilent Technologies) according to the manufacturers’ instructions.
Template preparation and sequencing with an Ion Proton sequencer
Emulsion PCR was performed using a OneTouch 2 instrument (Life Technologies) with an Ion PI Template OT2 200 Kit v2 following the manufacturer’s instructions. The enrichment of template-positive Ion Sphere Particles (ISP) in the Ion Proton I chip was achieved using the Ion OneTouch ES enrichment system (Life Technologies). Ion Proton I chip version 2 was prepared and loaded according to the manufacturer’s recommendations. The total base output as a criterion for a successful experiment was set as 9 GB. If a sample did not reach this criterion for the total base output in one experiment, we performed the sequencing again with the same library and merged the results before aligning the reads to the reference GRCh37/hg19 sequence.
SNP array scanning with the iScan system
We used a Human Omni 2.5-8 v1.1 DNA Analysis Kit (Illumina) to analyze 160 ng of genomic DNA following the manufacturer’s instructions. In brief, the genomic DNA was subjected to isothermal amplification followed by fragmentation with nuclease. The DNA was precipitated with 2-propanol, then hybridized with oligonucleotide probes immobilized on Human Omni 2.5-8 BeadChips (eight samples per BeadChip slide). After washing, the probes underwent single-base extension using the captured genomic DNA as templates and incorporating 2, 4-dinitrophenyl- or biotin-labeled nucleotides to identify the genotypes. Then, immunohistochemical staining was performed to amplify the incorporated signal. Two Robotic Universal modules (Freedom evo, TECAN, Maennedorf, Switzerland) and the Illumina Infinium LIMS system (Illumina) were used in a series of experiments. An iScan scanner system with AutoLoader 2.X controlled by iScan Control Software ver. 3.3.28 (Illumina) was used for data acquisition. The SNP calling was performed using the Genotyping Module in the GenomeStudio software (ver.2011.1: Illumina). The default set cluster file was HumanOmni2-5 M-8b1-1_B.egt (Illumina), and a Gen Call Threshold of 0.15 was used for SNP calling. The SNP call rate was calculated and samples with overall call rates over 99% and LogRdev values below 0.2 were used for further analysis.
Variant calling pipeline for Illumina HiSeq sequencing
Fastq files were generated by base calling with CASVA 1.8.2. Reads in the generated fastq files were mapped to the human reference genome (GRCh37/hg19) with decoy sequences (hs37d5) using the BWA-MEM alignment algorithm in BWA version 0.7.5a-r405 [38, 39] with the default options, and stored as BAM files. The following post-processing was applied to the BAM files: reads in the BAM files were realigned with Realigner Target Creator and Indel Realigner in the Genome Analysis Tookit 2.5-2 (GATK), and their base quality scores were recalibrated with Base Recalibrator and Print Reads in GATK [40, 41]. For the Realigner Target Creator and Indel Realigner, no VCF file of known indel sites was given as input. SNP sites in NCBI’s SNP database (dbSNP, version 137) in a VCF file were input to Base Recalibrator as known SNP sites. Variant calling was conducted with the post-processed BAM files using Unified Genotyper in GATK with the default options [40, 41], and the results were stored in VCF files.
Variant calling pipeline for Ion Proton sequencing
The variant calling pipeline of Life Technologies was used to analyze the Ion Proton sequencing runs. Reads were aligned to the GRCh37/hg19 reference sequence using tmap software version 3.6.39. Variant discovery and genotype calling of multi-allelic substitutions and indels was performed on each individual sample using the Torrent Variant Caller (TVC) version 3.6.39. Parameters for variant discovery were set based on the “TVC 3.6 Parameters for TargetSeq Exome on Proton PI” with thresholds (snp_beta_bias = 150, snp_strand_bias = 0.9999, maximum common signal shift = 0.5, snp_min_variant_score = 5, and minimum_mapping _quality_score = 0) changed from the default values suggested by Life Technologies to use as many reads as possible.
Analysis tools and selection of SNPs
To compare the genotypes from the three platforms, the output VCF files from the two NGS outputs and the Omni 2.5-8 output files formatted with PLINK  were processed to generate subsets that contained the common target SNP sites. The common target SNP sites was obtained by intersecting autosomal target manifests, Ion-TargetSeq-Exome-50 Mb-hg19.bed by Life Technologies, and the bed file of Omni 2.5-8 from the Human Omni 2.5-8 v1.1 DNA Analysis Kit by Illumina, using the BEDTools software suite .
SNPs outside of the common target SNP sites were filtered out leaving 83,237 sites as the common targets. From these loci, six probes for detection of copy number variations were removed. We also found 2,626 overlapping probes in the Omni 2.5-8 array and integrated the SNP calls using these corresponding probes. In total, we analyzed 79,143 SNPs.
Genotyping data on each platform were obtained from the VCF files and PLINK/PED files using a set of in-house scripts; then, the genotype concordance and accuracy were calculated.
Differences in the NGS read depth between discordant and concordant SNP calls with the Omni 2.5-8 calls were examined using the Mann–Whitney U test with SAMtools , in-house scripts, and the R statistical environment . Logistic regression analyses for the discordant and concordant calls between the NGSs and Omni 2.5-8 with three NGS quality control (QC) parameters (read depth, GC content, and homopolymer length) were performed in the R statistical environment. Logistic regression analysis of the SNPs with read depths in the range of ± 1 SD from the average was performed.
The SNP calls at each position covered by Omni 2.5-8 are attached as Additional file 3 (for HiSeq 2500), Additional file 4 (for Proton), and Additional file 5 (for Omni 2.5-8). Genomic DNAs used in this study will be distributed through Tohoku Medical Megabank Organization upon request.
This work was supported (in part) by the MEXT Tohoku Medical Megabank Project. JYa was supported in part by grants-in-aid for scientific research (JSPS). We thank Drs. Keiko Nakayama and Ryo Funayama for the initial set-up of the facilities. We thank Dr. Gen Tamiya for critical comments on the manuscript. We also thank Kaori Abe, Ai Hachiya, Nozomi Hatanaka, Sachiko Hirano, Masae Kimura, Natsumi Konno, Kiriko Nozoe, Yukie Oguma, Ayako Okumoto, Noriko Takahashi, and Sayaka Kumagai for their technical assistance.
- Gonzaga-Jauregui C, Lupski JR, Gibbs RA: Human genome sequencing in health and disease. Annu Rev Med. 2012, 63: 35-61. 10.1146/annurev-med-051010-162644.PubMed CentralPubMedView ArticleGoogle Scholar
- Berg JS, Khoury MJ, Evans JP: Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genet Med. 2011, 13 (6): 499-504. 10.1097/GIM.0b013e318220aaba.PubMedView ArticleGoogle Scholar
- Altshuler D, Durbin DR, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Collins FS, De La Vega FM, Donnelly P, Egholm M, Flicek P, Gabriel SB, Gibbs RA, Knoppers BM, Lander ES, Lehrach H, Mardis ER, McVean GA, Nickerson DA, Peltonen L, Schafer AJ, Sherry ST, Wang J, Wilson R, Gibbs RA, Deiros D, Metzker M, Muzny D, Reid J, Wheeler D, et al: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. 10.1038/nature09534.PubMedView ArticleGoogle Scholar
- Altshuler DM, Durbin DR, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Gibbs RA, Dinh H, Kovar C, Lee S, et al: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491 (7422): 56-65. 10.1038/nature11632.View ArticleGoogle Scholar
- Wong LP, Ong RT, Poh WT, Liu X, Chen P, Li R, Lam KK, Pillai NE, Sim KS, Xu H, Sim NL, Teo SM, Foo JN, Tan LW, Lim Y, Koo SH, Gan LS, Cheng CY, Wee S, Yap EP, Ng PC, Lim WY, Soong R, Wenk MR, Aung T, Wong TY, Khor CC, Little P, Chia KS, Teo YY: Deep whole-genome sequencing of 100 southeast Asian Malays. Am J Hum Genet. 2013, 92 (1): 52-66. 10.1016/j.ajhg.2012.12.005.PubMed CentralPubMedView ArticleGoogle Scholar
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517.PubMed CentralPubMedView ArticleGoogle Scholar
- Goldstein DB, Allen A, Keebler J, Margulies EH, Petrou S, Petrovski S, Sunyaev S: Sequencing studies in human genetics: design and interpretation. Nat Rev Genet. 2013, 14 (7): 460-470. 10.1038/nrg3455.PubMed CentralPubMedView ArticleGoogle Scholar
- Ratan A, Miller W, Guillory J, Stinson J, Seshagiri S, Schuster SC: Comparison of sequencing platforms for single nucleotide variant calls in a human sample. PLoS One. 2013, 8 (2): e55089-10.1371/journal.pone.0055089.PubMed CentralPubMedView ArticleGoogle Scholar
- Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, et al: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011, 475 (7356): 348-352. 10.1038/nature10242.PubMedView ArticleGoogle Scholar
- Boland JF, Chung CC, Roberson D, Mitchell J, Zhang X, Im KM, He J, Chanock SJ, Yeager M, Dean M: The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Hum Genet. 2013, 132 (10): 1153-1163. 10.1007/s00439-013-1321-4.PubMed CentralPubMedView ArticleGoogle Scholar
- Li X, Buckton AJ, Wilkinson SL, John S, Walsh R, Novotny T, Valaskova I, Gupta M, Game L, Barton PJ, Cook SA, Ware JS: Towards clinical molecular diagnosis of inherited cardiac conditions: a comparison of bench-top genome DNA sequencers. PLoS One. 2013, 8 (7): e67744-10.1371/journal.pone.0067744.PubMed CentralPubMedView ArticleGoogle Scholar
- Elliott AM, Radecki J, Moghis B, Li X, Kammesheidt A: Rapid detection of the ACMG/ACOG-recommended 23 CFTR disease-causing mutations using ion torrent semiconductor sequencing. J Biomol Tech. 2012, 23 (1): 24-30. 10.7171/jbt.12-2301-003.PubMed CentralPubMedView ArticleGoogle Scholar
- Chan M, Ji SM, Yeo ZX, Gan L, Yap E, Yap YS, Ng R, Tan PH, Ho GH, Ang P, Lee AS: Development of a next-generation sequencing method for BRCA mutation screening: a comparison between a high-throughput and a benchtop platform. J Mol Diagn. 2012, 14 (6): 602-612. 10.1016/j.jmoldx.2012.06.003.PubMedView ArticleGoogle Scholar
- Kanagal-Shamanna R, Portier BP, Singh RR, Routbort MJ, Aldape KD, Handal BA, Rahimi H, Reddy NG, Barkoh BA, Mishra BM, Paladugu AV, Manekia JH, Kalhor N, Chowdhuri SR, Staerkel GA, Medeiros LJ, Luthra R, Patel KP: Next-generation sequencing-based multi-gene mutation profiling of solid tumors using fine needle aspiration samples: promises and challenges for routine clinical diagnostics. Mod Pathol. 2014, 27: 314-327.PubMedView ArticleGoogle Scholar
- Abou Tayoun AN, Tunkey CD, Pugh TJ, Ross T, Shah M, Lee CC, Harkins TT, Wells WA, Tafe LJ, Amos CI, Tsongalis GJ: A Comprehensive Assay for CFTR Mutational Analysis Using Next-Generation Sequencing. Clin Chem. 2013, 59 (10): 1481-1488. 10.1373/clinchem.2013.206466.PubMedView ArticleGoogle Scholar
- Junemann S, Prior K, Szczepanowski R, Harks I, Ehmke B, Goesmann A, Stoye J, Harmsen D: Bacterial community shift in treated periodontitis patients revealed by ion torrent 16S rRNA gene amplicon sequencing. PLoS One. 2012, 7 (8): e41606-10.1371/journal.pone.0041606.PubMed CentralPubMedView ArticleGoogle Scholar
- Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, Nusbaum C, Jaffe DB: Characterizing and measuring bias in sequence data. Genome Biol. 2013, 14 (5): R51-10.1186/gb-2013-14-5-r51.PubMed CentralPubMedView ArticleGoogle Scholar
- Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW: Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput Biol. 2013, 9 (4): e1003031-10.1371/journal.pcbi.1003031.PubMed CentralPubMedView ArticleGoogle Scholar
- Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ: Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012, 30 (5): 434-439. 10.1038/nbt.2198.PubMedView ArticleGoogle Scholar
- Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012, 13: 341-PubMed CentralPubMedView ArticleGoogle Scholar
- Feliubadalo L, Lopez-Doriga A, Castellsague E, Del Valle J, Menendez M, Tornero E, Montes E, Cuesta R, Gomez C, Campos O, Pineda M, Gonzalez S, Moreno V, Brunet J, Blanco I, Serra E, Capella G, Lazaro C: Next-generation sequencing meets genetic diagnostics: development of a comprehensive workflow for the analysis of BRCA1 and BRCA2 genes. Eur J Hum Genet. 2013, 21 (8): 864-870. 10.1038/ejhg.2012.270.PubMed CentralPubMedView ArticleGoogle Scholar
- Fujimoto A, Nakagawa H, Hosono N, Nakano K, Abe T, Boroevich KA, Nagasaki M, Yamaguchi R, Shibuya T, Kubo M, Miyano S, Nakamura Y, Tsunoda T: Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet. 2010, 42 (11): 931-936. 10.1038/ng.691.PubMedView ArticleGoogle Scholar
- Thorvaldsdottir H, Robinson JT, Mesirov JP: Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013, 14 (2): 178-192. 10.1093/bib/bbs017.PubMed CentralPubMedView ArticleGoogle Scholar
- Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotechnol. 2011, 29 (1): 24-26. 10.1038/nbt.1754.PubMed CentralPubMedView ArticleGoogle Scholar
- Minoche AE, Dohm JC, Himmelbauer H: Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 2011, 12 (11): R112-10.1186/gb-2011-12-11-r112.PubMed CentralPubMedView ArticleGoogle Scholar
- Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, Milosavljevic A, Gibbs RA, Yu F: An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 2012, 13: 8-10.1186/1471-2105-13-8.PubMed CentralPubMedView ArticleGoogle Scholar
- Asan , Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H, Zhang X: Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 2011, 12 (9): R95-10.1186/gb-2011-12-9-r95.PubMed CentralPubMedView ArticleGoogle Scholar
- Clark MJ, Chen R, Lam HY, Karczewski KJ, Chen R, Euskirchen G, Butte AJ, Snyder M: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011, 29 (10): 908-914. 10.1038/nbt.1975.PubMed CentralPubMedView ArticleGoogle Scholar
- Parla JS, Iossifov I, Grabill I, Spector MS, Kramer M, McCombie WR: A comparative analysis of exome capture. Genome Biol. 2011, 12 (9): R97-10.1186/gb-2011-12-9-r97.PubMed CentralPubMedView ArticleGoogle Scholar
- Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto JM, Bertalan M, Borruel N, Casellas F, Fernandez L, Gautier L, Hansen T, Hattori M, Hayashi T, Kleerebezem M, Kurokawa K, Leclerc M, Levenez F, Manichanh C, Nielsen HB, Nielsen T, Pons N, Poulain J, Qin J, Sicheritz-Ponten T, Tims S, et al: Enterotypes of the human gut microbiome. Nature. 2011, 473 (7346): 174-180. 10.1038/nature09944.PubMed CentralPubMedView ArticleGoogle Scholar
- Daum LT, Rodriguez JD, Worthy SA, Ismail NA, Omar SV, Dreyer AW, Fourie PB, Hoosen AA, Chambers JP, Fischer GW: Next-generation ion torrent sequencing of drug resistance mutations in Mycobacterium tuberculosis strains. J Clin Microbiol. 2012, 50 (12): 3831-3837. 10.1128/JCM.01893-12.PubMed CentralPubMedView ArticleGoogle Scholar
- Geurts-Giele WR, der Velden AW D-v, Bartalits NM, Verhoog LC, Hanselaar WE, Dinjens WN: Molecular diagnostics of a single multifocal non-small cell lung cancer case using targeted next generation sequencing. Virchows Arch. 2013, 462 (2): 249-254. 10.1007/s00428-012-1346-4.PubMedView ArticleGoogle Scholar
- Hadd AG, Houghton J, Choudhary A, Sah S, Chen L, Marko AC, Sanford T, Buddavarapu K, Krosting J, Garmire L, Wylie D, Shinde R, Beaudenon S, Alexander EK, Mambo E, Adai AT, Latham GJ: Targeted, high-depth, next-generation sequencing of cancer genes in formalin-fixed, paraffin-embedded and fine-needle aspiration tumor specimens. J Mol Diagn. 2013, 15 (2): 234-247. 10.1016/j.jmoldx.2012.11.006.PubMedView ArticleGoogle Scholar
- Perkins TT, Tay CY, Thirriot F, Marshall B: Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes. PLoS One. 2013, 8 (6): e67539-10.1371/journal.pone.0067539.PubMed CentralPubMedView ArticleGoogle Scholar
- Singh RR, Patel KP, Routbort MJ, Reddy NG, Barkoh BA, Handal B, Kanagal-Shamanna R, Greaves WO, Medeiros LJ, Aldape KD, Luthra R: Clinical Validation of a Next-Generation Sequencing Screen for Mutational Hotspots in 46 Cancer-Related Genes. J Mol Diagn. 2013, 15 (5): 607-622. 10.1016/j.jmoldx.2013.05.003.PubMedView ArticleGoogle Scholar
- Yeo ZX, Chan M, Yap YS, Ang P, Rozen S, Lee AS: Improving indel detection specificity of the Ion Torrent PGM benchtop sequencer. PLoS One. 2012, 7 (9): e45798-10.1371/journal.pone.0045798.PubMed CentralPubMedView ArticleGoogle Scholar
- Miller W, Hayes VM, Ratan A, Petersen DC, Wittekindt NE, Miller J, Walenz B, Knight J, Qi J, Zhao F, Wang Q, Bedoya-Reina OC, Katiyar N, Tomsho LP, Kasson LM, Hardie RA, Woodbridge P, Tindall EA, Bertelsen MF, Dixon D, Pyecroft S, Helgen KM, Lesk AM, Pringle TH, Patterson N, Zhang Y, Kreiss A, Woods GM, Jones ME, Schuster SC: Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). Proc Natl Acad Sci U S A. 2011, 108 (30): 12348-12353. 10.1073/pnas.1102838108.PubMed CentralPubMedView ArticleGoogle Scholar
- Li H: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:13033997. 2013Google Scholar
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.PubMed CentralPubMedView ArticleGoogle Scholar
- DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011, 43 (5): 491-498. 10.1038/ng.806.PubMed CentralPubMedView ArticleGoogle Scholar
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20 (9): 1297-1303. 10.1101/gr.107524.110.PubMed CentralPubMedView ArticleGoogle Scholar
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81 (3): 559-575. 10.1086/519795.PubMed CentralPubMedView ArticleGoogle Scholar
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26 (6): 841-842. 10.1093/bioinformatics/btq033.PubMed CentralPubMedView ArticleGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.PubMed CentralPubMedView ArticleGoogle Scholar
- Ihaka R, Gentleman R: R: a language for data analysis and graphics. J Comp Graph Stat. 1996, 5: 299-314.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.