Within the last few years there have been significant developments in our understanding of copy number variation (CNV) and its contribution to human genetic variation. It is now apparent that CNVs are both a frequent form of variation throughout the human genome and implicated in disease susceptibility [1, 2].
In general, three forms of CNV can be defined: insertion, deletion and multi-allelic. However, accurate and standardised measurement of individual copy numbers is technically challenging. Measurement of multi-allelic CNVs is the most problematic, particularly for those with higher copy numbers; it is essential that copy number genotyping methods are robust for all copy number classes and do not result in the systematic rejection of samples with high copy number. Repeated exclusion of particular genotypes will lead to misrepresentation of copy number variation at a particular locus in a given population, because the apparent mean copy number is artificially reduced. To define the full variation at a particular copy number variable locus it is critical that these technical challenges are overcome.
Furthermore, and perhaps most importantly, accurate CNV measurement is essential in the context of case control association studies. Critically, inaccurate copy number measurements can lead to differential bias between cases and controls and result in false positive findings. In SNP based studies, differential bias can lead to differences in allele frequency estimates between batches of samples. In the context of CNVs, bias can be seen as a systematic shift in the raw measurement around integer values for cases and controls, leading to differential misclassification of samples by either over- or under-estimation of the integer copy number. Differences in DNA source and quality can cause small changes in the measured copy number distribution of case and control samples. Furthermore, the power to detect disease associations in case control studies is reduced by inaccurate genotyping of copy number. Thus accurate determination of copy number is limited by the precision of the methodology used, which needs to be robust and replicable to give maximum power and reduce spurious disease associations.
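The effect of such a shift can be illustrated with a minimal sketch. The raw values, the size of the shift (0.15) and the function names below are hypothetical and chosen only for illustration; they are not data from any study. The same raw measurements are called to the nearest integer with and without a small systematic offset, showing how an identical underlying distribution can yield different integer copy number calls in two groups.

```python
import math


def call_copy_number(raw):
    """Round a raw (non-integer) measurement to the nearest integer copy number."""
    return math.floor(raw + 0.5)


def call_batch(raw_values, shift=0.0):
    """Call integer copy numbers after adding a systematic measurement shift."""
    return [call_copy_number(value + shift) for value in raw_values]


# Hypothetical raw measurements; the same underlying values are used for both groups.
raw_values = [1.40, 1.95, 2.10, 2.45, 2.90, 3.40]

control_calls = call_batch(raw_values)           # no systematic shift
case_calls = call_batch(raw_values, shift=0.15)  # assumed small upward shift

control_mean = sum(control_calls) / len(control_calls)
case_mean = sum(case_calls) / len(case_calls)

print("controls:", control_calls, "mean", round(control_mean, 2))
print("cases:   ", case_calls, "mean", round(case_mean, 2))
```

In this toy example, half of the samples change integer class under the shift, so an apparent difference in mean copy number arises between groups that were constructed to be identical.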
New whole-genome genotyping platforms now incorporate probes to interrogate multiple CNVs, and have begun to yield associations between such variants and disease phenotypes. However, even these methods are sensitive to DNA quality and are limited in their accuracy when genotyping multiallelic loci. The recent report by the Wellcome Trust Case Control Consortium discusses in detail the potential artefactual influences on copy number determination and demonstrates the potential for reporting spurious associations with CNVs, especially in multiallelic regions. To complement whole genome approaches, particularly for complex diseases where effect sizes are likely to be small, locus specific methods are also required for investigating multiallelic CNV loci. Such targeted methods can provide sufficiently accurate data to limit the potential for false positive or negative results arising from differential bias. Locus specific methods are generally PCR-based, and perhaps the most widely used is Real-Time quantitative PCR, which depends upon the quantification of a copy variable test locus in comparison with an unrelated reference locus. However, differential amplification efficiencies between test and reference loci can generate inaccuracies, particularly for high copy number measurements, where the relative difference in ratio between test and reference products is smaller for neighbouring copy number classes. An alternative locus specific method with improved reliability for CNV determination is the paralogue ratio test (PRT). This method uses a single pair of primers to amplify specifically two products simultaneously, one from a copy-constant reference locus and the other from the copy variable test locus of interest. The copy number of the test locus is then estimated from the ratio of test to reference products.
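The ratio-based estimate, and the reason high copy numbers are harder to resolve, can be sketched as follows. This is an illustrative simplification, not the calibration procedure used in any particular assay: it assumes the reference locus is present at two copies per diploid genome, that test and reference products amplify with equal efficiency, and that the signal values are hypothetical peak areas. All function and constant names are introduced here for illustration only.

```python
import math

# Assumption: the copy-constant reference locus is present at 2 copies per diploid genome.
REFERENCE_COPIES = 2


def estimate_copy_number(test_signal, reference_signal, reference_copies=REFERENCE_COPIES):
    """Return (raw, integer) copy number estimated from the test/reference product ratio."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    raw = reference_copies * test_signal / reference_signal
    return raw, math.floor(raw + 0.5)  # nearest integer, halves rounded up


def neighbour_fold_change(copy_number):
    """Fold change in the expected ratio between copy number n and n + 1."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    return (copy_number + 1) / copy_number


# Hypothetical signals: a test product 1.5 times as abundant as the reference.
raw, called = estimate_copy_number(test_signal=150.0, reference_signal=100.0)
print("raw estimate:", raw, "called copy number:", called)

# The expected ratio doubles between 1 and 2 copies, but changes far less at higher copy number.
for n in (1, 2, 5, 8):
    print(n, "->", n + 1, "fold change:", round(neighbour_fold_change(n), 3))
```

Because the fold change between neighbouring classes approaches 1 as copy number rises, a fixed amount of measurement error is more likely to move a high copy number sample into the wrong integer class.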
Recently, the accuracy of the PRT method was directly compared with a Real-Time quantitative PCR method of copy number measurement for β-defensins, and PRT was shown to be the more accurate and robust approach. The PRT method has previously been used to analyse both β-defensin and CCL3L1/CCL4L1 copy numbers successfully [10, 11].
The copy variable genes CCL3L1 and CCL4L1, located on chromosome 17q12, lie in a 90 kb repeat unit neighbouring the paralogous, but copy invariant, genes CCL3 and CCL4. CCL3L1 and CCL4L1 show 96% sequence similarity at both the nucleotide and protein level with their respective paralogues [13, 14]. All four genes (CCL3, CCL3L1, CCL4 and CCL4L1) encode chemokines (chemotactic cytokines), which play an important role in the immune response by attracting lymphocytes and macrophages to sites of infection and inflammation. Furthermore, they are all natural ligands for CCR5, the co-receptor used by HIV-1 for cell entry [15, 16], with CCL3L1 being the most potent. As such, there have been a number of association studies focused on CCL3L1 copy number variation and HIV-1 susceptibility [7, 18, 19]. However, the reported associations are under dispute and the accuracy of CCL3L1 measurement is a major factor in the debate [20–23].
Here we report a modification of a previously described PRT method to measure CCL3L1/CCL4L1 copy number, making it more efficient, cost effective and convenient. As CCL3L1 and CCL4L1 function as attractants of inflammatory mediators, we have performed three case control studies with autoimmune phenotypes (Crohn's disease, rheumatoid arthritis and psoriasis). We subsequently discuss in detail the accuracy of the PRT methodology used and the precision of CCL3L1/CCL4L1 copy number measurement, and highlight the implications of differential bias for studies of copy number variation.