- Research article
- Open Access
Copy number variations (CNVs) identified in Korean individuals
- Tae-Wook Kang†1,
- Yeo-Jin Jeon†1,
- Eunsu Jang†2,
- Hee-Jin Kim1,
- Jeong-Hwan Kim1,
- Jong-Lyul Park1,
- Siwoo Lee2,
- Yong Sung Kim1,
- Jong Yeol Kim2 and
- Seon-Young Kim1
© Kang et al; licensee BioMed Central Ltd. 2008
Received: 24 June 2008
Accepted: 18 October 2008
Published: 18 October 2008
Copy number variations (CNVs) are deletions, insertions, duplications, and more complex variations ranging from 1 kb to sub-microscopic sizes. Recent advances in array technologies have enabled researchers to identify a number of CNVs from normal individuals. However, the identification of new CNVs has not yet reached saturation, and more CNVs from diverse populations remain to be discovered.
We identified 65 copy number variation regions (CNVRs) in 116 normal Korean individuals by analyzing Affymetrix 250 K Nsp whole-genome SNP data. Ten of these CNVRs were novel and not present in the Database of Genomic Variants (DGV). To increase the specificity of CNV detection, three algorithms, CNAG, dChip and GEMCA, were applied to the data set, and only those regions recognized by at least two algorithms were identified as CNVs. Most CNVRs identified in the Korean population were rare (<1%), occurring just once among the 116 individuals. When CNVs from the Korean population were compared with CNVs from the three HapMap ethnic groups (African, European, and Asian), our Korean population showed the highest degree of overlap with the Asian population, as expected. However, the overlap was less than 40%, implying that more CNVs remain to be discovered from the Asian population as well as from other populations. Genes in the novel CNVRs from the Korean population were enriched for genes involved in regulation and development processes.
CNVs are recently recognized structural variations among individuals, and more CNVs need to be identified from diverse populations. Until now, CNVs from Asian populations have been studied less than those from European or American populations. In this regard, our study of CNVs from the Korean population will contribute to the full cataloguing of structural variation among diverse human populations.
Understanding variations in the human genome is the key to unraveling the phenotypic diversity among individuals and understanding various human diseases. Genomic variations exist at various levels, from differences in single nucleotides to microscopic chromosome-level variation. Copy number variations (CNVs), a new type of genomic variation that has recently received considerable attention, are deletions, insertions, duplications, and more complex variations ranging from 1 kb to submicroscopic sizes [1–4]. Recent advances in array technologies, such as BAC arrays, oligonucleotide array CGH, and whole-genome SNP arrays, have finally enabled researchers to identify this new type of variation, which had gone unnoticed for a long time.
Since Sebat et al. and Iafrate et al. first reported large-scale CNVs among normal human individuals in 2004, many researchers have identified novel CNVs using diverse technical and computational approaches [8–17]. These reported CNVs are collected and maintained in a curated database, the Database of Genomic Variants (http://projects.tcag.ca/variation/), which contains more than 15,000 CNVs obtained from 48 publications as of April 2008. However, the discovery of new CNVs has not yet reached saturation, and many challenges remain for the standardization of CNV discovery [18, 19]. The global map of CNVs from the 270 normal individuals in the HapMap collection is an important advance in the field, yet genomes from more individuals from diverse populations should be studied to achieve a full cataloguing of human CNVs.
Whole-genome SNP arrays such as Affymetrix 500 K or Illumina 300 K arrays, which are widely used for whole-genome association studies, are also useful for CNV discovery, since the intensity of the probes can be exploited to detect CNV gains and losses [20–23]. A few recent studies successfully utilized whole-genome SNP data from control populations in North American and European countries for the detection of novel CNVs [19, 22, 24, 25]. Here, we report the identification of 10 novel CNVs from 116 normal Korean individuals by analyzing Affymetrix 250 K Nsp SNP array data. Our work will be valuable in expanding our knowledge of CNVs across diverse populations and ethnicities.
Results and discussion
CNVRs from the Korean population
Commonly used algorithms for CNV detection from SNP arrays can produce widely different results from the same data because they differ both in the way reference samples are prepared and in their calling criteria [19, 26]. A stringent criterion, selecting only regions identified by at least two different algorithms, is currently recommended to increase confidence in the identified CNVs. In this work, we applied three algorithms, CNAG, dChip and GEMCA, to our data set of 116 normal Korean individuals genotyped using Affymetrix 250 K Nsp arrays. We identified a total of 65 CNVRs, among which 10 CNVRs (15.4%) were novel and not present in the Database of Genomic Variants. Many novel CNVs were likely missed by our approach, but we chose to be conservative in our selection of CNVs to reduce false positives. The proportion of novel CNVs in the Korean population would exceed 15.4% if we take into account a recent study showing that most CNV loci are actually smaller than currently recorded in the Database of Genomic Variants.
Size and occurrence of CNVs in the Korean population
Table: Distribution of CNV sizes identified in the Korean population. Size bins: 10 K–100 K; 100 K–200 K; 200 K–300 K; 300 K–400 K; 400 K–500 K; 500 K–1 M; 1 M–10 M.
Figure: Occurrence of CNVs among the Korean population.
Comparison by ethnicity
Table: Overlap between CNVs from the Korean population and CNVs from the 270 HapMap individuals. Columns: HAP* CNV count; HAP CNVR count; HAP CNV size (bp); KOR# unique count; KOR-HAP overlap count; KOR-HAP overlap size (bp); KOR-HAP overlap count percent; KOR-HAP overlap size percent. Panel A: Affymetrix 500 K.
Novel CNVRs from the Korean population
Among the 10 novel CNVRs identified from the Korean population, 3 CNVRs contained a total of 5 genes (Additional file 5). The total length of the novel CNVRs was 1,788,129 bp, or 0.06% of the human genome. The total length of the 55 known CNVRs was 14,280,140 bp (0.48% of the human genome). Twenty-four of these known CNVRs contained 52 genes.
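The genome fractions quoted above can be checked with simple arithmetic. The sketch below assumes a round genome length of 3,000,000,000 bp; that denominator is not stated in the text and is chosen here only because it reproduces the reported percentages.

```python
# Assumed genome length in bp (hypothetical round figure, not given in the text).
ASSUMED_GENOME_BP = 3_000_000_000


def percent_of_genome(length_bp, genome_bp=ASSUMED_GENOME_BP):
    """Return the share of the genome covered by length_bp, in percent."""
    return 100.0 * length_bp / genome_bp


# Total CNVR lengths as reported in the text.
novel_pct = percent_of_genome(1_788_129)
known_pct = percent_of_genome(14_280_140)

print(f"novel CNVRs: {novel_pct:.2f}% of the genome")
print(f"known CNVRs: {known_pct:.2f}% of the genome")
```

With a slightly larger denominator, such as the full length of a specific assembly, the second figure would round differently, so the percentages should be read as approximate.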
Table: Functional annotation of novel CNVs from the Korean population. Enriched GO terms: regulation of embryonic development; positive regulation of epithelial cell proliferation; regulation of epithelial cell proliferation; epithelial cell proliferation; regulation of cell migration; regulation of cell adhesion; regulation of cell motility; regulation of locomotion; positive regulation of cell proliferation; regulation of developmental process.
Table: Functional annotation of known CNVs from the Korean population. Enriched GO terms: sensory perception of smell; sensory perception of chemical stimulus; cell surface receptor linked signal transduction; neurological system process; G-protein coupled receptor protein signaling pathway; antibiotic biosynthetic process; antibiotic metabolic process; entrainment of circadian clock; transmembrane receptor protein tyrosine phosphatase signaling pathway; drug metabolic process.
The fact that 15% (10/65) of CNVs in the Korean population were novel implies that current CNV discovery has not yet plateaued, and that the genomes of more individuals should be examined to fully understand CNVs in the general population. Until recently, CNV studies have mainly focused on populations in North America and Europe [19, 25]. More individuals from other continents, such as Asia, Africa, and South America, need to be studied to enrich our understanding of the diversity of CNVs in the human population. We stress that the Korean population had less than a 40% overlap in CNVRs with the 90 Asian HapMap individuals, which suggests that more individuals should be studied to fully represent the pattern of CNVs among East Asian populations. In this regard, our work on 116 Korean individuals will be a useful resource for better understanding the diverse variation in the human genome.
Recent studies have shown that CNVs are as important as single nucleotide polymorphisms (SNPs) or microscopic variations. Many studies have reported the identification of novel CNVs, but more CNVs from diverse populations should be identified until we have a full catalogue of the structural variations among human populations. Until now, the CNVs of Asian populations have not been as thoroughly studied as those of European or American populations, and in this regard our study of CNVs from the Korean population will contribute to the full cataloguing of structural variations among diverse human populations.
Blood specimens were obtained from normal, healthy subjects who visited the Korean Institute of Oriental Medicine (KIOM) and collaborative hospitals. The internal review board at KIOM approved the study protocols, and informed consent was obtained from all enrolled study subjects. Genomic DNA was extracted from blood samples using the QIAamp DNA Blood Maxi Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. DNA concentration and purity were determined using the NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Rockland, DE).
Affymetrix GeneChip Nsp 250 K Mapping Array data
The 250 K Nsp mapping assay was performed according to the manufacturer's protocol. Briefly, DNA (250 ng) was digested with NspI (NEB, MA) and then ligated with an NspI linker supplied by Affymetrix. The ligated DNA was diluted four-fold and PCR-amplified using a PCR primer complementary to the linker DNA. The PCR products were purified using a DNA Amplification Clean-Up Kit (Clontech, CA), and 90 μg of the PCR products were fragmented by DNase I treatment. The fragmented DNA was labelled using 0.86 mM GeneChip DNA labelling reagents (Affymetrix) and 1.5 U/μl terminal deoxynucleotidyl transferase (TdT) for 4 hr at 37°C, while the remaining 4.5 μl was examined on a 4% TBE agarose gel to confirm that the average DNA fragment size was < 180 bp. Hybridization and subsequent steps were performed according to the manufacturer's instructions. Only hybridization experiments with a genotyping call rate above 93%, as determined by the dynamic model algorithm, were used in the subsequent analysis, to reduce false positive predictions arising from low-quality genotyping data.
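The call-rate filter described above amounts to a simple threshold on per-array quality. The following is a minimal sketch, assuming call rates are already available as fractions per sample; the sample names and values are invented for illustration and do not come from the study.

```python
# Threshold taken from the text: arrays must have a call rate above 93%.
CALL_RATE_THRESHOLD = 0.93


def passing_samples(call_rates, threshold=CALL_RATE_THRESHOLD):
    """Return the sample IDs whose call rate is strictly above the threshold."""
    return [sample for sample, rate in call_rates.items() if rate > threshold]


# Hypothetical call rates (illustrative values only).
example_call_rates = {
    "sample_A": 0.981,
    "sample_B": 0.925,
    "sample_C": 0.930,
    "sample_D": 0.957,
}

kept = passing_samples(example_call_rates)
print(kept)
```

Treating the threshold as strict (a rate of exactly 93% is excluded) is one reading of "over 93%"; an inclusive comparison would be an equally simple change.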
Copy number analysis using CNAG, dChip and GEMCA
Three algorithms, CNAG (version 2.0), GEMCA (available at http://www2.genome.rcast.u-tokyo.ac.jp/CNV/gemca_details.html) and dChip, were used to infer copy numbers from 250 K Nsp SNP array data.
For CNAG, a reference data set of 48 normal individuals (obtained from the Affymetrix website) was used in the non-paired reference analysis with default parameters, and CNVs were inferred from segments of more than two consecutive SNPs. For GEMCA, a reference data set of 10 normal individuals was used in the non-paired reference analysis with default parameters, and the boundaries of CNVs were determined using 90% density borders. For dChip, data were normalized at the probe intensity level with the invariant set normalization method, and a signal value was calculated for each SNP using the average model method (PM/MM difference). From the raw copy numbers, the inferred copy number was estimated using a hidden Markov model (HMM) with 10% of samples trimmed, and CNVs were inferred from segments of more than two consecutive SNPs. Finally, for each individual, a CNV was defined as a region identified by at least two algorithms (overlap rate >= 50%, length >= 1000 bp). This strategy is likely to increase confidence in the detected CNVs, although many novel CNVs may be missed. Considering the current lack of standards in CNV discovery methods, we think that a stringent approach like ours is appropriate. NCBI genome build 36 (hg18) was used to map each CNV to its genomic position.
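The consensus step for one individual can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the text does not define "overlap rate", so it is taken here as the fraction of a call covered by a call from another algorithm, and supported calls are merged into their union. The function names and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlap_fraction(a: Call, b: Call) -> float:
    """Fraction of call `a` covered by call `b` (0.0 on different chromosomes)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    length = a[2] - a[1]
    return max(shared, 0) / length if length > 0 else 0.0


def consensus_calls(calls_by_algorithm: Dict[str, List[Call]],
                    min_overlap: float = 0.5,
                    min_length: int = 1000) -> List[Call]:
    """Keep calls supported by at least one other algorithm, then merge them."""
    kept = []
    for algo, calls in calls_by_algorithm.items():
        for call in calls:
            if call[2] - call[1] < min_length:
                continue  # shorter than the 1000 bp minimum
            supported = any(
                overlap_fraction(call, other) >= min_overlap
                for other_algo, other_calls in calls_by_algorithm.items()
                if other_algo != algo
                for other in other_calls
            )
            if supported:
                kept.append(call)
    # Assumption: overlapping supported calls are reported as one merged region.
    merged: List[Call] = []
    for chrom, start, end in sorted(set(kept)):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy calls for a single individual (coordinates are invented).
toy_calls = {
    "CNAG": [("chr1", 10_000, 50_000), ("chr2", 5_000, 9_000)],
    "dChip": [("chr1", 20_000, 60_000)],
    "GEMCA": [("chr3", 1_000, 1_500)],
}
consensus = consensus_calls(toy_calls)
print(consensus)
```

In the toy input, only the chr1 calls from CNAG and dChip support each other (75% coverage in each direction), so a single merged region is returned; the chr2 call has no second algorithm behind it and the chr3 call is below the length cutoff.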
Comparison of Korean CNVs with those of 270 HapMap individuals
CEL files for the 270 HapMap individuals were downloaded from the Affymetrix web site. For copy number analysis of the 270 HapMap samples, the same reference set of 48 samples was used in the CNAT analysis. CNV data for each of the 269 HapMap individuals investigated using the whole genome TilePath (WGTP) array were downloaded from the CNV Project web site at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/humgen/cnv/).
Determination of novel CNVRs and functional annotation analysis
CNVs identified in our Korean population were compared with 11,966 CNVs in the Database of Genomic Variants (downloaded as of Feb. 2008). The GOstat web service was used for gene ontology (GO) term analysis to study the enrichment of GO terms in the known and novel CNVs. This analysis was performed with the default option for biological processes, and the GO term candidates were ordered by p-value.
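The comparison against a catalogue of known variants can be illustrated with a small interval check. This is a sketch only: it assumes that any positional overlap with a catalogue entry makes a CNVR "known", which the text does not spell out, and the coordinates below are invented rather than taken from the Database of Genomic Variants.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_known_and_novel(cnvrs: Iterable[Region],
                          catalogue: Iterable[Region]) -> Tuple[List[Region], List[Region]]:
    """Partition CNVRs into those overlapping the catalogue and those that do not."""
    catalogue = list(catalogue)
    known: List[Region] = []
    novel: List[Region] = []
    for region in cnvrs:
        if any(overlaps(region, entry) for entry in catalogue):
            known.append(region)
        else:
            novel.append(region)
    return known, novel


# Hypothetical catalogue entries and CNVRs (illustrative coordinates only).
toy_catalogue = [("chr1", 100_000, 250_000), ("chr7", 5_000_000, 5_040_000)]
toy_cnvrs = [
    ("chr1", 240_000, 300_000),      # overlaps the first catalogue entry
    ("chr1", 400_000, 450_000),      # no overlap
    ("chr7", 5_040_000, 5_100_000),  # adjacent to, but not overlapping, an entry
]

known, novel = split_known_and_novel(toy_cnvrs, toy_catalogue)
print(len(known), "known;", len(novel), "novel")
```

A stricter rule, for example requiring a minimum overlap fraction before a region counts as known, would label more regions as novel, so the overlap criterion is a real design choice.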
Quantitative PCR (Q-PCR) for CNV validation
Two selected novel CNVs were validated by Q-PCR. Q-PCR was done in 20 μl with the following components: 7.0 μl of molecular biology grade water (Hyclone, US), 10 μl of 2 × SYBR Green Premix EX Taq solution, 0.5 μl each of forward and reverse primers (10 pmol/μl each) and 2 μl template DNA (1 ng/ml). Primer sequences were 5'-AGCCAGCTATCAGGTGAGGA-3' (SYNPR-forward), 5'-ACTTGTCTAAGCCCCTGCAA-3' (SYNPR-reverse), 5'-GAGTGGGCTTTGTGGTGAAT-3' (KRR1-forward) and 5'-TGTGCTGGGCATATTAGTGG-3' (KRR1-reverse). Q-PCR was conducted using a CFX96 (Bio-Rad Laboratories, US) with the following cycling conditions: initial denaturation at 95°C for 3 min, followed by 45 cycles of 95°C for 10 s, 60°C for 20 s and 72°C for 20 s. The relative quantification in each sample was determined.
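The text does not say how relative quantification was computed. One common choice is the comparative Ct (2^-ΔΔCt) method, sketched below purely as an assumption: it presumes a reference locus with two copies, a control sample with two copies at the target, and roughly 100% amplification efficiency. The function name and Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control,
                         control_copies=2):
    """Estimate target copy number in a sample with the comparative Ct method.

    Assumes ~100% PCR efficiency, so one cycle corresponds to a two-fold change.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # ΔCt in the test sample
    delta_control = ct_target_control - ct_ref_control   # ΔCt in the control sample
    delta_delta_ct = delta_sample - delta_control        # ΔΔCt
    return control_copies * 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle later in the test
# sample than in the control, relative to the reference locus.
estimate = relative_copy_number(
    ct_target_sample=26.0, ct_ref_sample=24.0,
    ct_target_control=25.0, ct_ref_control=24.0,
)
print(estimate)
```

Under these assumptions an estimate near 1 would be consistent with a single-copy loss and an estimate near 3 with a single-copy gain; real assays usually replicate each reaction and correct for measured primer efficiency.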
This work was supported by grant NBC1900712 (to YSK) from the Ministry of Science and Technology of Korea and by the KRIBB Research Initiative program.
- Sharp AJ, Cheng Z, Eichler EE: Structural variation of the human genome. Annu Rev Genomics Hum Genet. 2006, 7: 407-442. 10.1146/annurev.genom.7.080505.115618.
- Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nat Rev Genet. 2006, 7 (2): 85-97. 10.1038/nrg1767.
- Feuk L, Marshall CR, Wintle RF, Scherer SW: Structural variants: changing the landscape of chromosomes and design of disease studies. Hum Mol Genet. 2006, 15 (Spec No 1): R57-66. 10.1093/hmg/ddl057.
- Freeman JL, Perry GH, Feuk L, Redon R, McCarroll SA, Altshuler DM, Aburatani H, Jones KW, Tyler-Smith C, Hurles ME: Copy number variation: new insights in genome diversity. Genome Res. 2006, 16 (8): 949-961. 10.1101/gr.3677206.
- Carter NP: Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet. 2007, 39 (7 Suppl): S16-21. 10.1038/ng2028.
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M: Large-scale copy number polymorphism in the human genome. Science. 2004, 305 (5683): 525-528. 10.1126/science.1098918.
- Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet. 2004, 36 (9): 949-951. 10.1038/ng1416.
- Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R: Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005, 77 (1): 78-88. 10.1086/431652.
- Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D: Fine-scale structural variation of the human genome. Nat Genet. 2005, 37 (7): 727-732. 10.1038/ng1562.
- Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK: A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006, 38 (1): 75-81. 10.1038/ng1697.
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W: Global variation in copy number in the human genome. Nature. 2006, 444 (7118): 444-454. 10.1038/nature05329.
- Kriek M, White SJ, Szuhai K, Knijnenburg J, van Ommen GJ, den Dunnen JT, Breuning MH: Copy number variation in regions flanked (or unflanked) by duplicons among patients with developmental delay and/or congenital malformations; detection of reciprocal and partial Williams-Beuren duplications. Eur J Hum Genet. 2006, 14 (2): 180-189. 10.1038/sj.ejhg.5201540.
- Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis P, Feuk L: Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res. 2006, 16 (12): 1566-1574. 10.1101/gr.5630906.
- Khaja R, Zhang J, MacDonald JR, He Y, Joseph-George AM, Wei J, Rafiq MA, Qian C, Shago M, Pantano L: Genome assembly comparison identifies structural variants in the human genome. Nat Genet. 2006, 38 (12): 1413-1418. 10.1038/ng1921.
- Locke DP, Sharp AJ, McCarroll SA, McGrath SD, Newman TL, Cheng Z, Schwartz S, Albertson DG, Pinkel D, Altshuler DM: Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet. 2006, 79 (2): 275-290. 10.1086/505653.
- McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ: Common deletion polymorphisms in the human genome. Nat Genet. 2006, 38 (1): 86-92. 10.1038/ng1696.
- Qiao Y, Liu X, Harvard C, Nolin SL, Brown WT, Koochek M, Holden JJ, Lewis ME, Rajcan-Separovic E: Large-scale copy number variants (CNVs): distribution in normal subjects and FISH/real-time qPCR analysis. BMC Genomics. 2007, 8: 167. 10.1186/1471-2164-8-167.
- Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME, Feuk L: Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007, 39 (7 Suppl): S7-15. 10.1038/ng2093.
- Pinto D, Marshall C, Feuk L, Scherer SW: Copy-number variation in control population cohorts. Hum Mol Genet. 2007, 16 (Spec No 2): R168-173. 10.1093/hmg/ddm241.
- Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME: Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. 2006, 16 (12): 1575-1584. 10.1101/gr.5629106.
- Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 2005, 65 (14): 6071-6079. 10.1158/0008-5472.CAN-05-0465.
- Simon-Sanchez J, Scholz S, Fung HC, Matarin M, Hernandez D, Gibbs JR, Britton A, de Vrieze FW, Peckham E, Gwinn-Hardy K: Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum Mol Genet. 2007, 16 (1): 1-14. 10.1093/hmg/ddl436.
- Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw CA, Belmont J: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 2006, 16 (9): 1136-1148. 10.1101/gr.5402306.
- Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE: A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet. 2007, 80 (1): 91-104. 10.1086/510560.
- Zogopoulos G, Ha KC, Naqib F, Moore S, Kim H, Montpetit A, Robidoux F, Laflamme P, Cotterchio M, Greenwood C: Germ-line DNA copy number variation frequencies in a large North American population. Hum Genet. 2007, 122 (3–4): 345-353. 10.1007/s00439-007-0404-5.
- Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M: Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics. 2007, 8: 368. 10.1186/1471-2105-8-368.
- Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics. 2004, 20 (8): 1233-1240. 10.1093/bioinformatics/bth069.
- Perry GH, Ben-Dor A, Tsalenko A, Sampas N, Rodriguez-Revenga L, Tran CW, Scheffer A, Steinfeld I, Tsang P, Yamada NA: The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet. 2008, 82 (3): 685-695. 10.1016/j.ajhg.2007.12.010.
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20 (9): 1464-1465. 10.1093/bioinformatics/bth088.
- Nguyen DQ, Webber C, Ponting CP: Bias of selection on human copy-number variants. PLoS Genet. 2006, 2 (2): e20. 10.1371/journal.pgen.0020020.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.