Accurate measurement of gene copy number for human alpha-defensin DEFA1A3
- Fayeza F Khan†1,
- Danielle Carpenter†1,
- Laura Mitchell1,
- Omniah Mansouri1,
- Holly A Black1,
- Jess Tyson1 and
- John AL Armour1
© Khan et al.; licensee BioMed Central Ltd. 2013
Received: 7 June 2013
Accepted: 19 September 2013
Published: 20 October 2013
Multi-allelic copy number variants include examples of extensive variation between individuals in the copy number of important genes, most notably genes involved in immune function. The definition of this variation, and analysis of its impact on function, has been hampered by the technical difficulty of large-scale but accurate typing of genomic copy number. The copy-variable alpha-defensin locus DEFA1A3 on human chromosome 8 commonly varies between 4 and 10 copies per diploid genome, and presents considerable challenges for accurate high-throughput typing.
In this study, we developed two paralogue ratio tests and three allelic ratio measurements that, in combination, provide an accurate and scalable method for measurement of DEFA1A3 gene number. We combined information from different measurements in a maximum-likelihood framework which suggests that most samples can be assigned to an integer copy number with high confidence, and applied it to typing 589 unrelated European DNA samples. Typing the members of three-generation pedigrees provided further reassurance that correct integer copy numbers had been assigned. Our results have allowed us to discover that the SNP rs4300027 is strongly associated with DEFA1A3 gene copy number in European samples.
We have developed an accurate and robust method for measurement of DEFA1A3 copy number. Interrogation of rs4300027 and associated SNPs in Genome-Wide Association Study SNP data provides no evidence that alpha-defensin copy number is a strong risk factor for phenotypes such as Crohn’s disease, type I diabetes, HIV progression and multiple sclerosis.
The majority of human copy number variants (CNVs) are simple di-allelic polymorphisms, generally involving variable deletion of non-coding sequences. However, a small but interesting subgroup of CNVs displays multi-allelic polymorphism for the copy number of a gene or cluster of genes. Examples include polymorphism for the copy number of CCL3L1 and CCL4L1[1–3], of FCGR3A and FCGR3B[4, 5], and of a cluster of human beta-defensin genes on chromosome 8 [6–8]. In all these cases, associations of gene copy number with important medical phenotypes have been reported – of CCL3L1/CCL4L1 with HIV infection [2, 9–11], of FCGR3B with systemic autoimmune disorders [4, 5], and of beta-defensins with Crohn’s disease and psoriasis [12, 13]. In the case of Crohn's disease, the associations proposed with the beta-defensin CNV have attracted controversy, particularly related to the confidence with which CNV states can be called [14, 15].
Establishing robust evidence for these associations is made considerably more difficult by the technical challenge of determining accurate measures of copy number. Although the problem is most severe when copy numbers are high, as in the case of the beta-defensins (2-12 copies), accuracy of copy number measurement is still an important issue in the interpretation of association data even when gene copy numbers are relatively low, as in the case of CCL3L1/CCL4L1 (0-4 copies in Europe) [17–22]. Typing copy number by real-time PCR may be subject to errors that compromise the accuracy of association studies. These errors may arise from differences in the physicochemical state of DNA samples that alter the relative behaviour of test and reference loci [18, 23]. At high copy numbers the level of relative precision required to distinguish integer copy number states with accuracy may simply be beyond the capabilities of real-time PCR, however carefully it is performed [14, 19]. For example, measurement error of only about 10% in analysis of a sample with a true copy number of 6 would result in an incorrect integer call. The quality-control difficulty created by performing case–control association studies of multi-allelic CNVs is compounded by the observation that no simple SNP tags that can act as surrogates for the determination of gene copy number have been identified to date.
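The arithmetic behind the 10% example can be illustrated with a short sketch. It assumes, purely for illustration, that an integer call is made by rounding the measured value to the nearest integer, so that a call goes wrong once the absolute error exceeds half a copy; the function names are hypothetical and do not correspond to any published software.

```python
def max_relative_error(true_copy_number: int) -> float:
    """Largest relative error that still rounds to the correct integer.

    Assumption: calls are made by rounding to the nearest integer, so the
    tolerated absolute error is 0.5 copies whatever the copy number.
    """
    return 0.5 / true_copy_number


def call_integer(measured: float) -> int:
    """Round a positive measured copy number to the nearest integer."""
    return int(measured + 0.5)


# The tolerated relative error shrinks as copy number rises.
for n in (2, 4, 6, 8, 10):
    print(n, f"{max_relative_error(n):.1%}")

# A 10% overestimate of a true copy number of 6 gives a wrong integer call.
print(call_integer(6 * 1.10))
```

Under this rounding assumption the tolerated relative error at a true copy number of 6 is about 8.3%, which is why an error of about 10% is enough to shift the call.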
Alternative approaches have been explored for the determination of copy number at multi-allelic loci that are simultaneously convenient, economic and accurate. For some but not all such loci, MLPA appears to provide an appropriate level of accuracy to call most integers correctly. Approaches involving Paralogue Ratio Tests (PRTs), which determine the representation of a test locus relative to a co-amplified reference locus, have also been successful in determining accurate copy number measures for even some of the more challenging loci [24–26]. Side-by-side comparisons [14, 27] appear to suggest greater accuracy of PRT compared with real-time PCR for robust and reproducible determination of copy number at multi-allelic CNVs. In addition to PRTs, measurement of paralogous ratios for allelic variants (microsatellites or indels) between variable repeats within a sample has also been valuable in supplementing information on gene copy number. PRT measurements in combination with allelic variant ratios have previously been used successfully in multiplex measurement systems for CCL3L1/CCL4L1, FCGR3A/B and beta-defensins [14, 29].
The cluster of human alpha-defensin genes on chromosome 8 includes the genes DEFA1 and DEFA3, which are copy-variable [7, 30, 31]. The genes DEFA1 and DEFA3 differ only by a single base substitution in the coding sequence, corresponding to a single amino acid difference between the peptides encoded. These genes appear to be interchangeable occupants of a 19 kb copy-variable repeat unit, with both DEFA1 and DEFA3 gene number showing variation. For this reason, Aldred et al. suggested the composite designation DEFA1A3 for the copy-variable locus. The DEFA1 and DEFA3 genes lead (after proteolytic processing) to the expression of three distinct antimicrobial peptides, generally designated as HNP-1, -2, and -3. High levels of these peptides are found in the granules of neutrophils [32, 33], and a small-scale study has suggested that the expression level of the peptides is correlated with gene copy number.
Serious technical challenges are posed by the accurate measurement of the multi-allelic copy number variation displayed by DEFA1A3, because most individuals have 6 or more repeats. A full characterisation of the variation should also include a separate determination of gene copy numbers for DEFA1 and DEFA3. Furthermore, the existence of one repeat per haplotype differing substantially in sequence from others (the “partial repeat”) makes application of many standard methods problematic. These factors may underlie the failure to score this CNV in the WTCCC CNV study, which adopted very thorough and carefully controlled approaches to CNV typing.
In this study, we apply and combine a range of measurement methods to determine the copy number of DEFA1A3, and to define the relative contribution of the DEFA1 and DEFA3 gene variants. This work has allowed us to derive a consistent characterisation of copy number variation among 589 European samples. Our data allow us to identify a single SNP that effectively tags low, medium, and high-copy number states, which can therefore act as a convenient surrogate for approximate DEFA1A3 gene copy number in high-throughput studies.
We applied these PRT measurements to evaluate DEFA1A3 copy number in 600 unrelated DNA samples from Europeans (120 unrelated HapMap CEU phase 1 and 2 samples, and 480 samples from ECACC HRC plates 1-5), calibrating the PRT ratios against samples of known copy number (see “Methods”). Of these 600 samples, 11 (1.83%) failed to produce adequate data (at least two measurements, including at least one PRT measurement), so that we obtained useful results for 589 unrelated European samples. Starting from “gold standard” DNA samples for which total DEFA1A3 copy number had been inferred from restriction fragment lengths, we developed a secondary set of reference samples, drawn from publicly available sources; these were validated both by multiple measurements against the original reference samples and by segregation within pedigrees (see below). These new reference samples are specified in the Methods, and listed separately in Additional file 1: Table S2.
Table: Examples of integer copy number inference from PRT and ratio data. Examples of ML analysis, showing copy numbers in the range 2-10. For each of three example samples the table lists relative likelihood values for N = 2 to N = 10 together with the minimum ratio (MR). Minimum ratios for the three examples: 511.47, 21.99 and 10.50. [Per-copy-number likelihood values not recoverable.]
There was substantial variation in the confidence with which integer copy numbers were assigned, with MR ranging from just above unity (i.e., the assigned copy number was only marginally favoured over an alternative) to several million-fold. The median MR value was 20.1, and the interquartile range was 3.78-133.1; most samples, therefore, were assigned an integer copy number that was supported by a factor of at least 3 over alternatives. Low values of MR, corresponding to greater uncertainty in assignment to a particular integer, correlated as expected with (a) missing or uninformative data and (b) high copy number (see Additional file 1, Section 1d).
The analysis assumes that the same underlying copy number applies to all the sequence elements measured. To investigate whether any samples had evidence to the contrary, we highlighted samples as anomalous if they included one or more measures associated with a very low probability (P < 5 × 10⁻⁴) for the maximum-likelihood copy number. We found no evidence suggesting that any of the seven cases found in this way resulted from the existence of non-standard repeat units. Further discussion of this point can be found in Additional file 1, Section 1d.
Table: Distribution of diploid copy numbers from 589 European samples typed in this work, and comparison with previous studies. Columns: DEFA1A3 copy number; This study (N = 589); Predicted frequency (HWE); Aldred (N = 111); Linzmeier (N = 27); Nuytten (N = 344). [Table values not recoverable.]
The definition of a SNP tagging DEFA1A3 copy number allows us to perform indirect association tests by interrogating existing GWAS SNP data. If a clinical phenotype is strongly associated with DEFA1A3 copy number, this should be indirectly reflected in an association with genotype at rs4300027, or the associated neighbouring SNPs rs4512398 (in near-complete LD with rs4300027 in European populations) and rs7825750 (r² = 0.46 with rs4300027). Indeed, because of the strong but imperfect correlation with SNPs, a genuine underlying association with DEFA1A3 copy number may be manifest in GWAS data as a P value (for example, in the range 10⁻⁴ to 10⁻⁷) too high to merit attention in a genome-wide context. Complete GWAS data, listing P values for all SNPs typed, were available from the WTCCC and the CHAVI GWAS study of HIV control, and we obtained the assistance of relevant investigators in examining data from GWAS studies of atopic dermatitis, coeliac disease, Crohn’s disease [39, 40], type I diabetes, lung function in cystic fibrosis, multiple sclerosis, psoriasis [44, 45] and ulcerative colitis [40, 46]. These were interrogated for P values with rs4300027 or rs4512398 where genotyped, and rs7825750 in other studies. The results are collated in Additional file 1: Table S3, and reveal no strong indication of association with the DEFA1A3 CNV as reflected indirectly in SNP data. It is noteworthy that for each of Crohn’s disease, psoriasis and type I diabetes there are two independent studies listed in Additional file 1: Table S3 that fail to show a significant association. Although the simplest explanation of these outcomes is that these phenotypes are not influenced by DEFA1A3 copy number, even well-powered GWAS have limited power to positively exclude an association, especially at low effect sizes.
Only coeliac disease (P = 0.013) demonstrated a P value below 0.05 (with rs4512398), but given that 18 different studies were examined, even that cannot be viewed as significant once a correction has been made for multiple testing (Additional file 1: Table S3). The relationship between CNV status and flanking SNPs might be different in different populations, and we therefore examined separately the largest single (UK) cohort in the study of Dubois et al., consisting of 2586 cases of coeliac disease and 7532 controls; in this alternative analysis, the association with rs4512398 was not significant (P = 0.29).
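The effect of the multiple-testing correction on the coeliac disease result can be checked with simple arithmetic. The text does not name the correction used, so the sketch below assumes a Bonferroni correction over the 18 studies examined; both the choice of method and the function name are illustrative assumptions.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1 (assumed correction method)."""
    return min(1.0, p_value * n_tests)


# Nominal P value for coeliac disease and the number of studies examined,
# both taken from the text.
adjusted = bonferroni(0.013, 18)
print(round(adjusted, 3))
print(adjusted < 0.05)
```

Under this assumption the adjusted value is well above 0.05, in line with the conclusion that the nominal signal is not significant.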
We can therefore use these observations to suggest that a strong influence of DEFA1A3 copy number on predisposition to any of these phenotypes in European populations is unlikely, despite the published evidence suggesting the influence of DEFA1A3 copy number in Crohn’s disease and of increased alpha-defensin production on HIV progression. It also provides a simple (SNP-based) method for further investigation of other phenotypes in which DEFA1A3 copy number may be implicated, such as the published association with sepsis, in a way that would not be complicated by the difficulties of direct copy number measurement. Nevertheless, although SNP genotyping can be used as an aid to prioritisation, because the association between rs4300027 and the CNV is imperfect, direct typing of the CNV remains the only definitive way to investigate potential associations.
In the absence of high-throughput methods that confer absolute assurance of gene copy number, detailed assessment of the accuracy of a new typing methodology is essential before it can be used in large sample sets. Having defined the copy number of some reference standard samples using definitive methods such as PFGE, these can be then used to calibrate and test further experiments. In addition, the evaluation of accuracy requires careful analysis of the internal consistency of data derived from the integration of different measurement assays. In principle, to achieve the best typing quality, large-scale association studies should ideally use pulsed-field gel analysis, but in practice few studies have the DNA resources, equipment and personnel to undertake the kind of exemplary work done at the complement C4 locus [50, 51]. In particular, wider replication of association findings generally depends on a reliable but high-throughput method to type DNA samples of the kind found in most population sampling studies.
Most DEFA1A3 repeat alleles appear to harbour between 1 and 5 copies of a 19 kb copy-variable repeat, which allows different copy number alleles to be clearly distinguished after pulsed-field gel electrophoresis. We were therefore able to use samples that had been definitively typed by this method as the starting-point for calibrating our methods; subsequent analysis of segregation in three-generation pedigrees defined further reference samples that displayed unambiguous copy numbers on repeated testing using PRT and ratio methods (Figure 3). Larger-scale typing then produced data that were internally consistent between PRT and ratio measurements and conformed well to the predictions of Hardy-Weinberg equilibrium using haplotype frequencies determined in three-generation families. Reassurance of the correct calibration of our typing methods is particularly important given the apparent differences with the population copy-number distributions discovered by other approaches [7, 47, 52].
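The Hardy-Weinberg prediction mentioned above amounts to combining haplotype copy-number frequencies in pairs: under random mating, the expected frequency of a diploid copy number is the sum, over all haplotype pairs adding up to that number, of the product of their frequencies. The sketch below shows the calculation; the haplotype frequencies are hypothetical placeholders, not the values determined from the three-generation families.

```python
from collections import defaultdict


def diploid_distribution(haplotype_freqs: dict[int, float]) -> dict[int, float]:
    """Expected diploid copy-number frequencies under Hardy-Weinberg equilibrium.

    haplotype_freqs maps copies per haplotype to haplotype frequency.
    """
    diploid = defaultdict(float)
    for copies_a, freq_a in haplotype_freqs.items():
        for copies_b, freq_b in haplotype_freqs.items():
            diploid[copies_a + copies_b] += freq_a * freq_b
    return dict(sorted(diploid.items()))


# Hypothetical haplotype frequencies, for illustration only.
hap = {2: 0.10, 3: 0.45, 4: 0.30, 5: 0.15}
dist = diploid_distribution(hap)
for copy_number, freq in dist.items():
    print(copy_number, round(freq, 4))
```

Observed diploid counts can then be compared with these expected frequencies, for example with a goodness-of-fit test.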
The copy-number frequencies found in this study are similar to those determined by Aldred et al. who used a combination of MAPH and variant ratios, and although there are some differences (such as a higher frequency of copy numbers above 10 in the present work) the overall distribution is not significantly different (P = 0.073). By contrast, the differences between our data and the distribution given by Linzmeier and Ganz based on real-time PCR measurements are highly significant (Table 3), especially in the representation of copy numbers above 8 (P = 1.95 × 10⁻¹⁰). Although it is possible that different population origins may influence the outcome, even the relatively small sample analysed by Linzmeier and Ganz seems incompatible with the values determined here, and may reflect limitations of real-time PCR typing for this locus. The study of Nuytten et al. used real-time PCR calibrated against concatemeric constructs, but reports a copy number distribution that is also very significantly different from the one reported here (P = 1.1 × 10⁻¹⁰), with a much lower frequency of samples with copy numbers above 8. Nuytten et al. do not use reference genomic DNA standards, and despite their careful and ingenious method to calibrate real-time PCR measurements, it is possible that in this case their cloned constructs do not produce the same calibration as would be obtained from genomic DNA samples of the same copy number. The real-time PCR results from Danish samples given by Jespersgaard and colleagues also have significantly more samples of low copy number (6 or fewer) among controls than we find in Europeans (P = 5.6 × 10⁻³), but not among their samples from Crohn’s disease patients (P = 0.074).
Our preliminary analysis (data not shown) demonstrates a strong correlation with integer copy numbers published recently for HapMap Chinese and Japanese samples by Cheng et al., although without further information on measurement variation or consistency for their real-time PCR assay it is not possible to judge the extent or causes of differences between our results.
In principle, read-depth analysis provides an alternative method to establish definitive diploid gene copy number for a sample, and the study of Sudmant et al. first used genome-wide analyses of read depth to define copy number variation profiles for individual DNA samples. Although the available data suggest that their analysis of the DEFA1A3 CNV is broadly comparable with ours (median copy number of 7.58 in Table S7 of Sudmant et al., median value 7 in this study), no individual copy number values are given by Sudmant et al., and their sample of 159 individuals comes from diverse global populations. There were eight samples typed in our study which have also been sequenced as part of the Complete Genomics CNV Genome Baseline Set. Our copy numbers for these samples have a strong correlation (r² = 0.93) with the recorded sequence coverage (for further details see Additional file 1, Section 2). Microarray data for 108 HapMap samples from Campbell et al. (their Supplementary Table S7) correlate reasonably well with our results (r² = 0.49), even though the DEFA1A3 CNV does not form discrete genotype classes in their analyses, and the absolute copy numbers are calibrated by comparison of microarray signals against single-copy regions rather than specifically against known DEFA1A3 copy numbers. Presumably for this reason, Campbell et al. report copy number ranges for DEFA1A3 higher than measured in this study (mean 9.5 and median 9.4, compared with 7.5 and 8 respectively in this study). These analyses are described in Additional file 1, Section 2, and illustrated by a scatterplot in Additional file 1: Figure S4. Although the DEFA1A3 CNV was not called individually in the 42 million-element array-CGH study of Conrad et al., their publicly available data can be compared with our own results for 17 samples, in which a good correlation (r² = 0.74) is found (see Additional file 1, Section 2, and Additional file 1: Figure S5).
The CNV at DEFA1A3 does not seem to have been defined and analysed in other recent studies on genome-wide identification of CNVs through read-depth analysis [58, 59].
By comparison with flanking SNP genotypes in HapMap samples we were able to define a strong association between DEFA1A3 copy number and rs4300027. To a first approximation this single SNP partitions our samples into classes with low (up to 6 copies), medium (6 to 8 copies) and high (8 copies or more) copy number, although initial further work suggests that this is not a simple cladistic split into high- and low-copy lineages (data not shown). In addition to its practical power in exploring possible associations of DEFA1A3 copy number with disease phenotypes, the strength and consistency of this association provides additional reassurance that our copy number typing is not subject to wide variation in accuracy. It is important to note that the samples analysed here are of European origin, and so rs4300027 can be used with confidence as a surrogate for DEFA1A3 copy number only in European cohorts. Most published GWAS data sets do indeed analyse European subjects, but our initial exploration of the HapMap samples suggests that the strong association of rs4300027 with copy number is not reproduced in Asian or African populations.
We have developed a PCR-based methodology for copy number measurement of the human alpha-defensin DEFA1A3 gene cluster. Our data show good internal evidence of accuracy and consistency, and we have discovered that DEFA1A3 copy number is strongly associated with SNP rs4300027 in European samples. This has in turn led to the application to GWAS investigations of rs4300027 genotype as a good proxy for approximate copy number range in Europeans.
DNA samples and standards
180 CEPH samples from the International HapMap phase I and II (http://ccr.coriell.org) and 480 random UK samples from the European Collection of Cell Cultures (ECACC) Human Random Control (HRC) panels 1 to 5 (http://www.hpacultures.org.uk) were used to develop the copy number measurement assays. The CEPH (CEU) samples used consist of 56 family trios, 5 duos and 2 singletons. For the data presented in the Results, only the 120 unrelated HapMap CEU samples were considered, so that we attempted to type 600 unrelated European samples, of which 589 produced satisfactory results. A further 110 individual CEPH samples were used to infer segregation of the CEPH trios from HapMap samples, and another 99 individual CEPH samples from 3-generation pedigrees not included in the HapMap project were also used for segregation. The 23 CEPH families for which further samples were available, and which therefore allowed segregation analysis, were: 12, 66, 104, 884, 1331, 1332, 1333, 1334, 1340, 1341, 1344, 1345, 1346, 1350, 1362, 1375, 1408, 1416, 1420, 1421, 1424, 1454, 13292. All DNA provided was extracted from lymphoblastoid cell lines.
In initial development our typing methods were calibrated using the reference samples of known DEFA1A3 copy number defined by Aldred et al. after pulsed-field gel electrophoresis and Southern blotting. These samples were used to define a second set of reference samples, this time from publicly available sources. After initial calibration against the original pulsed-field gel-typed reference samples, the copy numbers of these new reference samples were confirmed by consistency of numerous repeated measurements using different methods, and by analysis of segregation within three-generation families (see Results). The data reported in this paper were obtained by calibration against these new reference samples. The new reference samples were four samples available from the ECACC HRC-1 collection, C0007 (7 copies), C0075 (6 copies), C0150 (8 copies) and C0877 (9 copies), with three offspring from CEPH pedigrees (DNA available from Coriell), NA07062 (=1340-3, 5 copies), NA11998 (=1420-4, 6 copies) and NA07008 (=1340-5, 7 copies).
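One way to picture calibration against these reference samples is as a straight-line fit of known copy number against the raw PRT ratio, which is then used to convert the ratio from a test sample into an unrounded copy number. The sketch below assumes an ordinary least-squares fit, which may differ from the calibration procedure actually used; the reference copy numbers are those listed above, whereas the raw ratios and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Known copy numbers of the reference samples (from the text).
reference_copy_number = {
    "C0007": 7, "C0075": 6, "C0150": 8, "C0877": 9,
    "NA07062": 5, "NA11998": 6, "NA07008": 7,
}
# Hypothetical raw PRT ratios (test peak area / reference peak area).
raw_ratio = {
    "C0007": 3.45, "C0075": 3.05, "C0150": 4.02, "C0877": 4.48,
    "NA07062": 2.52, "NA11998": 2.96, "NA07008": 3.54,
}

names = sorted(reference_copy_number)
slope, intercept = fit_line(
    [raw_ratio[s] for s in names],
    [reference_copy_number[s] for s in names],
)


def calibrated_copy_number(ratio: float) -> float:
    """Convert a raw PRT ratio into an unrounded copy number estimate."""
    return slope * ratio + intercept


print(round(calibrated_copy_number(3.0), 2))
```

Running the reference samples in the same experiment as the test samples keeps the calibration specific to that batch of reactions.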
PCR and PRT methods
All PCR used 10 ng of input DNA, and a standard buffer at a final concentration of 50 mM Tris–HCl (pH 8.8), 12.5 mM ammonium sulphate, 7.5 mM 2-mercaptoethanol, 125 μg/ml BSA, 1.4 mM MgCl2, and 200 μM each dNTP. PCR products were mixed with 10 μl formamide containing ROX-500 markers (Life Technologies) before denaturation (96°C, 3 minutes) and capillary electrophoresis. Although other combinations are possible, our work combined 1 μl each of FAM- and NED-labelled MLT1A0 PRT products with 1 μl of indel5, followed by electroinjection at 1 kV for 30 seconds into an ABI 3130xl Genetic Analyzer. Similarly, 4 μl of MspI-digested DEFA4 PRT PCR product and 4 μl HaeIII-digested DefHae3 PCR product were added to 10 μl formamide/ROX mixture, with injection at 2 kV for 45 seconds. GeneMapper software (Applied Biosystems) was used to extract the peak areas of the separated PCR products.
MLT1A0 PRT was performed using two independent PRT assays, one with a FAM labelled forward primer and the other with a NED labelled forward primer, that are then averaged into a single unrounded copy number value. Each PCR was performed with 1 μM each of primers (FAM/NED)-CCCAGAGAGCTCCTTC and GTGACTTATAAACAACAAAAA, using 24 cycles of 95°C for 30 seconds, 48°C for 30 seconds and 72°C for 30 seconds, followed by a 10-minute hold at 72°C. The primers amplified from an MLT1A0 dispersed repeat present in full repeats (only, see Figure 1) at DEFA1A3 and a similar repeat at the reference locus on chromosome 1. The MLT1A0 PRT gives products of 170 bp for the reference locus on chromosome 1 and 167 bp for the full repeat region of DEFA1A3.
DEFA4 PRT used 1 μM primers TGCTCCTGCTCTCCCTCCT and (HEX)- TTGGAATCAAGTCTTTGGAGAAA, amplifying for 26 cycles of 95°C for 30 seconds, 56.5°C for 30 seconds and 70°C for 30 seconds, followed by a 70°C hold for 10 minutes. This PCR exploits sequence similarities between the closely related genes DEFA1A3 and DEFA4, such that the primers were specifically designed to match sequences in both genes, giving products of 404 bp for the reference locus and 406 bp for DEFA1A3. These products cannot be completely separated by electrophoresis, and therefore an overnight restriction digestion at 37°C by MspI was performed which gives labelled products of 275 bp for the DEFA4 reference locus and 317 bp for DEFA1A3. Although we have observed a single instance of a haplotype carrying a deletion of DEFA4, no further examples of this variant have been observed (see Additional file 1, section 1(d)).
The ratio between the DEFA1 and DEFA3 gene variants was measured using an assay (“DefHae3”) exploiting the HaeIII restriction site difference between them. PCR used 1 μM of primers TGTCCCAGGCCCAAGGAAAA and FAM- TCCCTGTAGCTCTCAAAGCA, using 25 cycles of 95°C for 1 minute, 58°C for 1 minute and 70°C for 1 minute, followed by a 70°C hold for 10 minutes. The underlined base in the forward primer is a deliberate mismatch with the genomic sequence to create an artificial site for HaeIII. Because a completely undigested product arising from incomplete activity of the restriction enzyme cannot otherwise be distinguished from the (DEFA3) variant lacking an internal HaeIII site, it was necessary to introduce this artificial site into all products to act as a check of complete digestion by HaeIII. DEFA1 (HaeIII+) products yield a labelled product of 144 bp, and DEFA3 (HaeIII-) products 161 bp. PCR product (5 μl) was digested with 1.5U HaeIII in a total volume of 15 μl at 37°C for 12-16 hours. The full-length PCR product, indicating incomplete digestion, would be 170 bp. The DefHae3 ratio recorded is the ratio of 144 bp to 161 bp products, i.e., the DEFA1:DEFA3 ratio.
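As an illustration of how the DefHae3 measurement might be used, the sketch below computes the DEFA1:DEFA3 ratio from the 144 bp and 161 bp peak areas, rejects a sample if too much undigested 170 bp product remains, and converts the ratio into integer DEFA1 and DEFA3 copy numbers for a given total. The 2% cut-off for undigested product, the rounding step and the function names are assumptions made for this example, not parameters from the study.

```python
def defhae3_ratio(area_144, area_161, area_170=0.0, max_undigested_fraction=0.02):
    """DEFA1:DEFA3 ratio from peak areas, with a digestion completeness check.

    The 170 bp peak is full-length product; the tolerated fraction of it
    is a hypothetical threshold.
    """
    total = area_144 + area_161 + area_170
    if area_170 / total > max_undigested_fraction:
        raise ValueError("incomplete HaeIII digestion")
    if area_161 == 0:
        return float("inf")  # no DEFA3 signal detected
    return area_144 / area_161


def split_copies(total_copy_number, area_144, area_161):
    """Nearest-integer DEFA1 and DEFA3 copy numbers for a given total."""
    defa1_fraction = area_144 / (area_144 + area_161)
    defa1 = round(total_copy_number * defa1_fraction)
    return defa1, total_copy_number - defa1


# Hypothetical peak areas for a sample with 7 copies in total.
print(defhae3_ratio(5000, 2000))
print(split_copies(7, 5000, 2000))
```

With these example peak areas the ratio is 2.5, which for a total of 7 copies corresponds to 5 copies of DEFA1 and 2 of DEFA3.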
A deletion variant present in many repeats formed the basis of the “indel5” ratio assay. Indel5 PCR used 1 μM of primers HEX-CTGTCCAGGAAGAGGGAGAG and CAGCTGGAGGGTCTCTGTTC, and 23 cycles of 95°C for 30 seconds, 57°C for 30 seconds and 70°C for 30 seconds, followed by a 70°C hold for 10 minutes to generate amplicons of 124/129 bp. The indel5 ratio recorded is the ratio of deleted (124 bp) to undeleted (129 bp) products.
A 7 bp duplication variant present in many repeat units provided the basis of a third (“7bpdup”) ratio measurement. This assay used primers (HEX)- AGCAAAAATCAAACAACCTGA and GCTATGCCTCCAATCTGACC; after an initial denaturation of 95°C for 1 minute, products were amplified for 24 cycles of 95°C for 30 seconds, 54°C for 30 seconds and 70°C for 30 seconds, followed by a final hold at 70°C for 40 minutes. The 7bpdup ratio recorded is the ratio of unduplicated (275 bp) to duplicated (282 bp) products.
Genotyping of SNP rs4300027
Genotyping of rs4300027 was performed by PCR-RFLP. A single PCR reaction was performed with 1 μM each of primers AGATACCATGCTTGGAGGAA and GGGTCTTGAATTCAAATGTCAG. PCR cycle conditions were 36 cycles of 95°C for 30 s, 58.6°C for 30 s, 70°C for 30 s to generate an amplicon of 1043 bp in length. In the *C allele this is cleaved by HinfI to produce 6 fragments of 439 bp, 174 bp, 154 bp, 116 bp and 105 bp, as well as a small fragment of 55 bp. The second cleavage fails to occur in the presence of the *T allele and so a product of 613 bp is observed, as well as the other small (154 bp, 116 bp, 105 bp and 55 bp) fragments. The distinction between the longer allelic digestion fragments of 439 bp and 613 bp is clearly visible on a 2% (w/v) agarose gel.
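The fragment sizes above can be checked for consistency, and the genotype call expressed as a simple rule on the two diagnostic fragments. In the sketch below the fragment sizes are taken from the text, while the sizing tolerance and the function name are hypothetical choices for illustration.

```python
AMPLICON_BP = 1043
C_ALLELE_FRAGMENTS = (439, 174, 154, 116, 105, 55)  # all HinfI sites cut
T_ALLELE_FRAGMENTS = (613, 154, 116, 105, 55)       # one site lost

# Both digestion patterns must add up to the full amplicon length.
assert sum(C_ALLELE_FRAGMENTS) == AMPLICON_BP
assert sum(T_ALLELE_FRAGMENTS) == AMPLICON_BP


def call_genotype(observed_bp, tolerance_bp=15):
    """Call rs4300027 from band sizes using the 439 bp and 613 bp fragments.

    tolerance_bp is a hypothetical allowance for gel sizing error.
    Returns None when neither diagnostic band is seen.
    """
    def seen(size):
        return any(abs(band - size) <= tolerance_bp for band in observed_bp)

    has_c, has_t = seen(439), seen(613)
    if has_c and has_t:
        return "C/T"
    if has_c:
        return "C/C"
    if has_t:
        return "T/T"
    return None


print(call_genotype([610, 440, 175, 155]))
```

The 174 bp fragment is also specific to the *C allele, but it lies close to the shared 154 bp band, which is presumably why the call relies on the longer fragments.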
PRT ratios were used to estimate gene copy number values, calibrating against reference samples of known copy number as described. These PRT copy number values were combined with ratio values for the same sample (from the indel5, DEFA1:DEFA3 and 7bpdup ratio measurements) to evaluate the most likely individual integer gene copy number. Briefly, for each PRT or ratio measurement, Gaussian models of measurement error (based on empirical observations) were used to estimate the probability of producing the actual measurement, given a particular value for the true gene copy number between 2 and 16. Once these probabilities had been determined for each measurement at each copy number, they were combined by multiplication to identify the integer copy number that maximises the joint probability of all the data, the “maximum likelihood copy number (MLCN)”. Further details can be found in Additional file 1.
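The combination of measurements described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the implementation in Additional file 1: the error models (a standard deviation proportional to the expected value, with coefficients of variation of 0.07 and 0.10), the treatment of a ratio measurement as the best-fitting integer split a:(N − a), and all function names are hypothetical. Log-likelihoods are summed instead of multiplying probabilities, which is equivalent but avoids numerical underflow, and the minimum ratio is returned as its base-10 logarithm.

```python
import math

COPY_NUMBERS = range(2, 17)  # candidate true copy numbers, 2 to 16


def log_gaussian(x, mean, sd):
    """Log of the Gaussian probability density at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def prt_loglik(measured, n, cv=0.07):
    # Assumed error model: sd proportional to the true copy number.
    return log_gaussian(measured, n, cv * n)


def ratio_loglik(measured, n, cv=0.10):
    # Assumed model: the ratio reflects some integer split a:(n - a);
    # take the split that best explains the measurement.
    return max(
        log_gaussian(measured, a / (n - a), cv * a / (n - a))
        for a in range(1, n)
    )


def mlcn(prt_values, ratio_values):
    """Return (maximum-likelihood copy number, log10 of the minimum ratio)."""
    loglik = {
        n: sum(prt_loglik(x, n) for x in prt_values)
        + sum(ratio_loglik(r, n) for r in ratio_values)
        for n in COPY_NUMBERS
    }
    ranked = sorted(loglik, key=loglik.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    log10_mr = (loglik[best] - loglik[runner_up]) / math.log(10)
    return best, log10_mr


# Hypothetical sample: two PRT estimates near 7 and one ratio of 0.75 (3:4).
best, log10_mr = mlcn([6.9, 7.2], [0.75])
print(best, round(log10_mr, 2))
```

In this hypothetical example the ratio of 0.75 is exactly what a 3:4 split of seven copies would give, so it reinforces the PRT values and increases the support for 7 over neighbouring integers.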
For inclusion in the analysis, a sample needed to have at least two non-zero data points, of which at least one was a PRT. Out of the 600 DNA samples initially tested, 589 (98.2%) met these criteria.
Availability of supporting data
The data sets supporting the results of this article are included within the article (and its additional files).
We thank Anne Bowcock, Garry Cutting, Richard H. Duerr, Michel Georges, Hakon Hakonarson, David van Heel, Young-Ae Lee, Stephen Sawcer, Richard Trembath, TJ Urban, and members of the WTCCC and CHAVI studies for access to results from GWAS, and we are grateful to Xueqing Yu in Guangzhou, Jianjun Liu in Singapore and other colleagues for helpful discussions. This work was supported by a grant from the BBSRC (BB/1006370/1) to JALA. FFK was supported by a scholarship from the Government of Pakistan and University of Karachi (B/Estt(T)2007); OM is supported by the Division of Higher Education, Kingdom of Saudi Arabia (S4674), and HAB by a BBSRC Doctoral Training Award (BB/F016999/1).
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.