Detection of copy number variations in rice using array-based comparative genomic hybridization
© Yu et al; licensee BioMed Central Ltd. 2011
Received: 14 April 2011
Accepted: 20 July 2011
Published: 20 July 2011
Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice.
To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (from O. sativa ssp. japonica) and 'Guang-lu-ai 4' (from O. sativa ssp. indica). The CNVs identified vary in size from 1.1 kb to 180.7 kb, and encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are 37.4 kb on chromosome 4 and 180.7 kb on chromosome 8, respectively. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes; this was also the case for genes involved in disease and defense.
We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics.
Copy number variations (CNVs), or copy number polymorphisms (CNPs), are a form of structural variation (SV): alterations in DNA that leave the cell with an abnormal number of copies of one or more DNA segments. A CNV is a DNA segment ranging from 1 kb to 3 Mb that has been deleted, inserted, or duplicated on certain chromosomes [1, 2]. In particular, segmental duplications (SDs) have been shown to be major catalysts and hotspots for CNV formation [3–5]. A CNV was described as early as 1936, with the duplication of the Bar gene in Drosophila melanogaster. Recently, many studies have discovered CNVs in humans [7–9], chimpanzee, dog, cattle, rat, mice, Drosophila, yeast, E. coli, and maize [18, 19]. CNVs can be detected using cytogenetic techniques such as fluorescent in situ hybridization, array-based comparative genomic hybridization, and SNP genotyping arrays. Recent advances in DNA sequencing technologies have further enabled the identification of CNVs by next-generation sequencing [20–22].
CNVs can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression [23, 24]. Thus, CNVs are considered likely major sources of genetic variation, and may influence phenotypic variation and gene expression. Some human CNVs have been linked with susceptibility or resistance to disease. A higher CCL3L1 copy number, for example, can reduce the risk of HIV/AIDS infection, and a lower FCGR3 copy number appears to contribute to increased susceptibility to glomerulonephritis. CNVs also have an impact on fitness and gene expression. CNVs detected among 15 female isolines of Drosophila have been subjected to purifying selection. In addition, a dramatic change in fruit size, caused by a CNV with an insertion of 6-8 kb that affected gene regulation, was described during tomato breeding. It was recently demonstrated that most CNVs in humans are in linkage disequilibrium (LD) with single nucleotide polymorphisms (SNPs), and that LD decays at similar rates for the two. CNVs were confirmed to capture about 18% of the variation in gene expression, with little overlap with the variation captured by SNPs. Thus, CNVs can be developed as a type of molecular marker for molecular identification.
Rice (Oryza sativa L.) comprises two subspecies, indica and japonica. It is one of the most important food crops in the world, and a model plant for genomic studies of monocots. Rice genomes exhibit relatively high levels of SNPs and indels. Sequence comparisons between the Nipponbare (japonica) and 9311 (indica) genomes have shown high levels of polymorphism, ranging from one SNP/300 bp to one indel/kb [30, 31]. These can potentially be exploited as molecular markers between these divergent subspecies. However, there are few studies of structural variation within the rice genome. A recent study of many subclones within chromosome 4 of the BAC libraries of Nipponbare and Guang-lu-ai 4 (indica) has documented that many genes vary in copy number. With the completion of rice genome sequencing projects and advances in microarray technologies, comprehensive oligonucleotide microarrays are now being used to discover genetic polymorphisms. Array-based comparative genomic hybridization (aCGH) has the advantages of high resolution and high-throughput genome-wide screening of genomic imbalances, and has been used in rice to detect single-feature polymorphisms and structural variations created by mutagenesis.
We used high-density oligonucleotide aCGH (containing 718,256 oligonucleotide probes) to investigate the number of CNVs between the Nipponbare and Guang-lu-ai 4 genomes. We found high levels of CNVs, some representing large inserted/deleted regions. In addition, several DNA segments, often including genic sequences, were identified as present in the Nipponbare genome but absent from the Guang-lu-ai 4 genome. Ours is the first comprehensive map of CNVs in the rice genome, providing an important resource for understanding the nature of variation among different rice varieties.
Primers used in PCR validation of a CNV located on chromosome 12 in Nipponbare, Guang-lu-ai 4, and some other varieties of indica and japonica.
A difference in the hybridization signal intensity of a gene across aCGH would indicate a gain or loss in gene copy number during rice evolution. Using a stringent selection criterion (a Guang-lu-ai 4 to Nipponbare signal ratio of 1 : 2), we identified 500 protein-coding genes that were contracted in Guang-lu-ai 4, and only 19 genes that were duplicated (signal ratio > 2.0) (Additional file 4, Table S3). The dominance of gene contraction over duplication was obvious when the aCGH selection ratio was relaxed (data not shown). Contracted genes thus greatly outnumbered duplicated ones. The majority of contracted genes encode hypothetical proteins, indicating duplication of preexisting genes to augment gene function. Among the 19 duplicated genes, three encode different enzymes: transposase, reverse transcriptase and terpenoid cyclase. One gene, ent-kaurene synthase like-2, is involved in gibberellin synthesis. Another, Xa1, is a known bacterial blight resistance gene. Duplication also occurred in genes relating to metabolism, such as those encoding the GTP-binding signal recognition particle SRP54 and the 2-oxoglutarate dehydrogenase E2 subunit. In addition, two genes are involved in transcription: those encoding the RNA polymerase III RPC4 family protein and the C2H2-type zinc finger domain-containing protein. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes. This was also the case for genes involved in disease and defense, most of which encode proteins with conserved nucleotide-binding sites (NBS) and leucine-rich repeats (LRRs). In addition, cytochrome P450 and concanavalin A-like lectin/glucanase play crucial roles in defending plants from disease.
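The ratio-based selection described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the gene names and ratio values are hypothetical, and we assume that a Guang-lu-ai 4 to Nipponbare ratio at or below 0.5 counts as contracted, since the text gives the cut-off only as 1 : 2.

```python
from collections import Counter

# Cut-offs taken from the text: 1 : 2 for contraction, > 2.0 for duplication.
# Treating exactly 0.5 as "contracted" is an assumption of this sketch.
LOSS_CUTOFF = 0.5
GAIN_CUTOFF = 2.0


def classify_gene(ratio: float) -> str:
    """Classify a gene by its Guang-lu-ai 4 / Nipponbare signal ratio."""
    if ratio <= 0:
        raise ValueError("signal ratio must be positive")
    if ratio <= LOSS_CUTOFF:
        return "contracted"
    if ratio > GAIN_CUTOFF:
        return "duplicated"
    return "unchanged"


# Hypothetical per-gene ratios, for illustration only.
example_ratios = {
    "gene_A": 0.31,
    "gene_B": 0.50,
    "gene_C": 1.05,
    "gene_D": 2.00,
    "gene_E": 2.70,
}

counts = Counter(classify_gene(r) for r in example_ratios.values())
print(dict(counts))
```

Relaxing the selection ratio, as mentioned in the text, would correspond to raising LOSS_CUTOFF and lowering GAIN_CUTOFF in this sketch.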
Using aCGH, we have generated the first map of CNVs in the rice genome. After very stringent filtering, 641 CNV events were identified between the two rice subspecies cultivars Nipponbare and Guang-lu-ai 4. This is likely to represent a very conservative estimate of the true number of CNV events in the rice genome. Focusing only on the unique sequences in our microarray has potentially led to an underestimation of the number of CNV events. This is due to the selective omission or reduction of probe density in some CNV-enriched regions that contain segmental duplications and diverse repetitive sequences. In addition, our stringent CNV calling criteria limited the detection of putative true CNVs. Differing probe densities, algorithms and statistical criteria used in the literature complicate comparisons of rates of CNVs among different organisms [2, 9–11, 13, 39]. Our data suggest that smaller CNVs (< 10 kb) are much more frequent than larger ones; this is supported by other studies [8, 19]. However, using next-generation sequencing techniques would offer advantages over aCGH, as DNA variations and recombination breakpoints would be directly detected [21, 40–44].
CNV number differs between species. In mammals, the mean number of CNVs per individual has been found to range from 14 in macaques to 70 in humans. In maize, around 400 CNVs have been detected between two cultivars (Mo 17 and B 73) [18, 19]. We observed many more CNVs between indica and japonica, mainly because we used subspecific samples. Indica and japonica diverged from their O. rufipogon ancestor between 200,000 and 400,000 years ago [37, 46, 47], and have richly diversified during the processes of domestication and selection. Both phenotypic and molecular studies have confirmed a relatively high level of differentiation between these two subspecies, suggesting great variation. This is also indicated by the lower numbers of deleted gene regions (ranging from 2 to 359) between 14 mutants and their wild type, IR 64 of indica. More recently, tiling oligonucleotide microarrays with 42 million probes showed that an average of 1,098 CNVs, comprising 0.78% of the human genome, were validated between two individuals. This was also found in a previous study, indicating that increased density and improved probe design will help us to better understand the roles of CNVs in organisms.
Although the presence and phenotypic effects of CNVs in plants have been little investigated at the genomic level, the nature of CNVs detected in maize suggests that they may have considerable impact on plant phenotypes, including disease responses and heterosis. We detected at least 519 genes in our high confidence CNV regions (Additional file 3, Table S2). However, it is likely that more genes are affected. We found that genes in many CNVs were involved in resistance, and that most of these encode proteins with conserved nucleotide-binding sites (NBS) and leucine-rich repeats (LRRs). NBS-LRR genes in plants tend to cluster at the same loci within genomes [50, 51]. Similarly, both resistance genes and quantitative trait loci (QTL) are clustered in the rice genome [52, 53]. In addition to its functional and agronomic importance, the NBS-LRR gene family has a structural role within the genome.
Previous research provided strong evidence that natural selection may shape CNVs, both in their patterns of polymorphism and in their distribution within the genome [9, 15]. Long-term purifying selection has changed quantitative traits, and it is possible that genomic variation in rice supplies source material for the generation of novel alleles. This implies that the characterization of rice CNVs is far from complete, although the present study provides a comprehensive view of the polymorphic phase of CNVs.
We have demonstrated that CNVs can be detected in rice using array-based comparative genomic hybridization. These CNVs are likely to be linked with subspecific characteristics and to provide an important resource for understanding variation among different rice varieties.
The rice varieties for our aCGH survey, Nipponbare (japonica) and Guang-lu-ai 4 (indica), were provided by the China National Rice Research Institute, Hangzhou, Zhejiang Province. The 10 indica varieties for CNV validation were: Minbeiwanxian, Dianbaidashanwang, Sankecun, Aizizhan, Haohuangla, Chiliyubai, Nanjing 11, Zhechang 9, Liantangao and Zhuguang 23. The 10 japonica varieties were: Kendao 8, Guihuahuang, Xiushui 48, Baimaodao, Xingguo, Mingshuixiangdao, Maendalaqili, Weiguo, Zhongdan 2 and Shuiyuansanbaili.
Genomic DNA was extracted and purified from fresh young leaves using a Promega kit (Wizard® Genomic DNA Purification Kit). Total DNA was quantified using a spectrophotometer and electrophoresed on an agarose gel for integrity checking. Following the NimbleGen quality control requirements, the genomic DNA was undegraded and had 1.8 ≤ A260/A280 ≤ 2.0 and 1.9 ≤ A260/A230 ≤ 2.0.
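The purity thresholds above can be expressed as a simple check. The following Python sketch is illustrative only; the function name and the absorbance readings are hypothetical, and only the two ratio ranges are taken from the text.

```python
def passes_dna_qc(a260: float, a280: float, a230: float) -> bool:
    """Return True if absorbance ratios fall within the ranges given in the text.

    Required: 1.8 <= A260/A280 <= 2.0 and 1.9 <= A260/A230 <= 2.0.
    """
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return 1.8 <= ratio_280 <= 2.0 and 1.9 <= ratio_230 <= 2.0


# Hypothetical spectrophotometer readings, for illustration only.
print(passes_dna_qc(a260=1.00, a280=0.52, a230=0.51))  # both ratios in range
print(passes_dna_qc(a260=1.00, a280=0.60, a230=0.51))  # A260/A280 too low
```

Integrity on the agarose gel is a separate, visual check and is not covered by this sketch.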
Custom NimbleGen 3 × 720 K microarrays (http://www.nimblegen.com) contain 718,256 oligonucleotide probes designed and fabricated on a single slide, resulting in a median probe spacing of 500 bp. These arrays use synthetic probes 45 to 75 nucleotides in length with similar melting temperatures, and do not require sample amplification or reduced representation. Probes were designed from the NCBI rice genome build of October 2006, following Roche NimbleGen's CGH probe design criteria. Uniqueness information was generated using the SSAHA program (http://www.sanger.ac.uk/Software/analysis/SSAHA/). Standard genomic DNA labeling (Cy3 for samples and Cy5 for references), hybridizations, array scanning, data normalization, and segmentation were performed at CapitalBio Corporation as described previously [39, 55]. High confidence calls were made according to the criteria used by Graubert et al. (2007). NimbleGen has an information package that describes the technology and provides measures of reproducibility, accuracy, sensitivity, and specificity. In brief, data were normalized using the qspline method implemented in Bioconductor for R. CNVs were identified by the circular binary segmentation algorithm. Candidate CNVs were identified as segments of more than 5 probes with absolute log2 ratios greater than 1.0. We conducted further analysis and visualization using SignalMap software (NimbleGen). Raw aCGH data for this study have been deposited in the NCBI GEO database under accession GSE30542 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30542).
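As an illustration of the candidate-calling step, the Python sketch below filters already segmented data by probe count and log2 ratio. It does not implement circular binary segmentation or the Graubert et al. criteria. The Segment fields and example values are hypothetical, and we assume that the criterion means segments supported by more than 5 probes, and that a positive log2 ratio denotes a gain in the sample relative to the reference.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One segment produced by an upstream segmentation step (hypothetical layout)."""
    chrom: str
    start: int        # bp
    end: int          # bp
    n_probes: int     # number of probes supporting the segment
    mean_log2: float  # mean log2(sample / reference) signal ratio


def call_candidates(
    segments: List[Segment],
    min_probes: int = 5,
    log2_cutoff: float = 1.0,
) -> List[Tuple[Segment, str]]:
    """Keep segments with more than min_probes probes and |log2 ratio| > log2_cutoff."""
    calls = []
    for seg in segments:
        if seg.n_probes > min_probes and abs(seg.mean_log2) > log2_cutoff:
            # Sign convention (gain if positive) is an assumption of this sketch.
            calls.append((seg, "gain" if seg.mean_log2 > 0 else "loss"))
    return calls


# Hypothetical segments, for illustration only.
example = [
    Segment("chr4", 100_000, 137_400, 70, 1.35),   # passes: gain
    Segment("chr8", 500_000, 680_700, 350, -1.80), # passes: loss
    Segment("chr1", 10_000, 12_000, 4, -2.10),     # too few probes
    Segment("chr2", 40_000, 60_000, 40, 0.60),     # ratio below cut-off
]

for seg, kind in call_candidates(example):
    print(seg.chrom, seg.start, seg.end, kind)
```

A log2 ratio cut-off of 1.0 corresponds to a twofold difference in signal between sample and reference.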
For validation, sequences flanking the first and last probe set locations of CNV regions were used to design primers. In addition, to reduce the possibility of interference from overlaps between probes and primer sequences, we designed two independent pairs of primers to confirm a subset of the validated CNVs. PCR followed the protocol recommended by the manufacturer of TaKaRa LA Taq, with conditions optimized for each reaction. Products were run on a 1.5% agarose gel, stained with ethidium bromide, and visualized on a UV transilluminator.
CNV: Copy number variation
SNP: Single nucleotide polymorphism
aCGH: Array-based comparative genomic hybridization
QTL: Quantitative trait loci.
This work was supported by the Agricultural Wild Resources Protection Project of MOA, China, and the Basic Research Budget of China National Rice Research Institute (No. 2009RG001-3). We thank Y Ren for technical support and excellent discussions, and the associate editor and two anonymous reviewers for their valuable suggestions.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.