In this study, we first documented the distribution of regions with variable copy number among twelve populations, comparing each with a Sub-Saharan African population (YRI) by aCGH and using two independent CNV algorithms. We used stringent criteria to define a CNV: log2 ratios above 0.25 (in both the direct and the dye-swap labelling) for 3 consecutive probes in at least one of the comparison experiments, and concordance between the two CNV algorithms. We focussed on CNVs of at least 30 kb in order to identify relatively large polymorphisms that could present a specific pattern of variation among human populations.
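The calling rule above can be sketched as a run-length filter over probes ordered along each chromosome. This is a minimal illustration under stated assumptions, not the pipeline used in the study. The `Probe` record and `call_candidate_cnvs` function are hypothetical. Dye-swap log2 ratios are assumed to be already sign-corrected so that both labellings point in the same direction. "Above 0.25" is read as an absolute threshold, so that gains and losses are both captured. CNV length is approximated by the distance between the first and last probe in a run. The concordance check between the two CNV algorithms is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Probe:
    """Hypothetical aCGH probe record (not a real array-vendor format)."""
    chrom: str
    pos: int          # probe position in bp
    direct: float     # log2 ratio, direct labelling
    dye_swap: float   # log2 ratio, dye-swap labelling, assumed already sign-corrected

Call = Tuple[str, int, int, str, int]  # (chrom, start, end, "gain"/"loss", n_probes)

def probe_state(p: Probe, threshold: float) -> int:
    """+1 (gain) or -1 (loss) only if both labellings exceed the threshold in the same direction."""
    if p.direct > threshold and p.dye_swap > threshold:
        return 1
    if p.direct < -threshold and p.dye_swap < -threshold:
        return -1
    return 0

def call_candidate_cnvs(probes: List[Probe], threshold: float = 0.25,
                        min_probes: int = 3, min_length: int = 30_000) -> List[Call]:
    """Report runs of >= min_probes consecutive concordant probes spanning >= min_length bp."""
    calls: List[Call] = []
    run: List[Probe] = []
    run_state = 0

    def close_run() -> None:
        # Keep the run only if it has enough probes and spans enough sequence.
        if len(run) >= min_probes and run[-1].pos - run[0].pos >= min_length:
            kind = "gain" if run_state > 0 else "loss"
            calls.append((run[0].chrom, run[0].pos, run[-1].pos, kind, len(run)))

    for p in sorted(probes, key=lambda q: (q.chrom, q.pos)):
        state = probe_state(p, threshold)
        # A run ends when the direction changes, a probe fails, or the chromosome changes.
        if run and (state != run_state or p.chrom != run[0].chrom):
            close_run()
            run = []
        if state != 0:
            if not run:
                run_state = state
            run.append(p)
    if run:
        close_run()
    return calls
```

For example, four consecutive probes at 10-kb spacing with concordant negative log2 ratios on one chromosome yield a single 30-kb "loss" call. A three-probe run spanning only 20 kb, or a probe whose dye-swap ratio does not pass the threshold, does not produce a call.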
Previous studies of samples in the HapMap collection and the HGDP panel highlighted a gradual decrease in genetic diversity as a function of distance from Sub-Saharan Africa [24, 36–40]. This result reflects the influence of geography on human genetics and is consistent with the serial-founder model of human expansion out of Sub-Saharan Africa. Sub-Saharan African populations have the highest degree of heterozygosity in most genomic regions, while populations from America and Oceania present the lowest intra-population variance. By using a pooled approach, we sought to dilute intra-population differences and to enhance inter-population variability. Hence, by comparing pools of individuals from different populations with a pool from an African population, we expected to find more CNVs as geographic distance from Africa increases. Indeed, we found 54 loci that show structural variation in the form of CNV in worldwide populations (Additional file 2: Figure S1). They occurred most notably in populations from America and Oceania, followed by Eastern Asian and Western Asian populations (Additional file 2: Figure S2).
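The expectation of more CNVs with increasing distance from Africa follows from how repeated bottlenecks erode diversity. As an idealised illustration (an assumption for exposition, not the model fitted in this study), suppose each colonisation step k is founded by N_k diploid individuals and behaves as one generation of random sampling. Expected heterozygosity then shrinks by a factor 1 - 1/(2N_k) per step, so after K steps:

```latex
H_K = H_0 \prod_{k=1}^{K} \left(1 - \frac{1}{2N_k}\right),
\qquad \text{and with constant } N_k = N: \qquad
H_K = H_0 \left(1 - \frac{1}{2N}\right)^{K}.
```

For instance, with N = 10 founders at each of K = 5 steps, H_5 = H_0 × 0.95^5 ≈ 0.774 H_0, a loss of roughly 23% of the original heterozygosity. Real expansions include population growth and migration between demes, which slow this decline.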
Interestingly, we found an enrichment of segmental duplications (SD) among the detected loci. In addition, 58 known genes totally or partially overlap some of the CNV regions (Additional file 3: Table S3). Most of these genes are involved in sensory perception, the immune system and distinct metabolic pathways, and some are associated with disease. These results are consistent with previous reports [1, 3]. They support the idea that population-specific CNV profiles could explain adaptations to environmental pressures and differences in disease prevalence among populations.
Psoriasis and some other autoimmune diseases have a higher prevalence in some populations from developed countries. Given the significant association of LCE3C_LCE3B-del with susceptibility to these diseases, we expected to find differences in the frequency of the deletion among populations with different geographic origins and demographic histories. On re-analysing the aCGH data for the LCE3C_LCE3B region, we observed differences in signal intensity in a region smaller than that covered by the 6 probes spanning the 32-kb deletion. This could suggest population differences in the CNV breakpoints. The Database of Genomic Variants currently lists more than 25 distinct large structural variants that span LCE3C, LCE3B, or both. Moreover, the specific breakpoint coordinates used in our study correspond to a deletion first identified in a European population. However, the same breakpoints have also been used successfully in some Asian groups. Furthermore, some of the probes in the array might not be absolutely specific. They may hybridise to similar sequences, probably other LCE genes, masking the signal from similar regions. Indeed, ascertaining CNV by aCGH is known to be complicated by low power and non-trivial rates of false positives. Moreover, genome-wide scanning techniques for CNV detection, such as the Agilent H244k aCGH array, have a limited capacity to characterize specific breakpoints. In a population survey of the frequency of the deletion by PCR, we could amplify the deleted and non-deleted alleles in all populations and samples. This confirms the limited power of the aCGH platform to characterize specific breakpoints. Other unmeasured CNVs may affect this region in some populations. Nevertheless, our analysis indicates that the specific deletion studied here may be the predominant one in most populations.
Most populations had a high percentage of the deleted allele, mostly in the heterozygous state. The exceptions were a few isolated cases and the Karitiana population. The Karitiana sample includes a high number of related pairs, which could reduce the representativeness of allele and genotype frequencies in this population. The aCGH results indicate that most populations tend to have a higher frequency of the deleted allele than Sub-Saharan Africans. The high frequency of the deletion could reflect selection in favour of the deletion in human populations. This is despite its recent description as a susceptibility factor for psoriasis and other autoimmune diseases [16, 29, 32–34]. Thus, the 32-kb LCE3C_LCE3B-del could offer protection against an unknown factor. Its role as a susceptibility factor for autoimmune inflammatory diseases may be a “new” consequence of this earlier adaptation. In other words, the LCE cluster could have been subjected to natural selection at different times during human evolution. A partial sweep of the deletion could have occurred if, for example, individuals carrying the deleted allele had greater resistance to specific pathogens. In such a scenario, however, the fitness advantage would have to outweigh two costs: the loss of the LCE3C and LCE3B genes, and the potential regulatory changes in the LCE cluster caused by disruption of the surrounding genomic region. A similar scenario has been proposed for rearrangements in the alpha-globin gene family. There, recurrent deletions of HBA1 and HBA2 associated with alpha-thalassemia have reached high frequencies in Mediterranean and Pacific populations. Moreover, the autoimmune diseases related to LCE3C_LCE3B-del are also thought to be associated with other genetic variants [55–57]. An important environmental component is also involved in these disorders [58–61].
Thus, LCE3C_LCE3B-del is only one of several genetic factors involved in susceptibility to these diseases. It acts together with environmental factors such as infections, drugs, stress, smoking and climate.
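The allele and genotype frequencies discussed above are derived from genotype counts at a biallelic locus. The following minimal sketch computes four quantities from those counts: the deleted-allele frequency, the observed heterozygosity, the heterozygosity expected under Hardy-Weinberg equilibrium, and the inbreeding coefficient F = 1 - H_obs/H_exp. The function name `deletion_frequencies` is hypothetical. F summarises departures from Hardy-Weinberg proportions, and inbreeding produces F > 0. Including relatives mainly reduces the number of independent chromosomes sampled, so small or related samples give noisier frequency estimates. The counts in the usage line are illustrative, not data from this study.

```python
from typing import Tuple

def deletion_frequencies(n_ins_ins: int, n_ins_del: int,
                         n_del_del: int) -> Tuple[float, float, float, float]:
    """Return (deleted-allele frequency, observed het., HWE-expected het., F) from genotype counts."""
    n = n_ins_ins + n_ins_del + n_del_del
    if n == 0:
        raise ValueError("no genotyped individuals")
    q = (n_ins_del + 2 * n_del_del) / (2 * n)   # each del/del individual carries two deleted copies
    h_obs = n_ins_del / n                       # observed fraction of heterozygotes
    h_exp = 2 * q * (1 - q)                     # Hardy-Weinberg expectation 2pq
    # F is undefined (NaN) for a monomorphic sample, where no heterozygotes are expected.
    f = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return q, h_obs, h_exp, f

# Illustrative counts only: 30 ins/ins, 50 ins/del, 20 del/del individuals.
q, h_obs, h_exp, f = deletion_frequencies(30, 50, 20)
print(f"q={q:.3f} Hobs={h_obs:.3f} Hexp={h_exp:.3f} F={f:.3f}")
```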
Populations in the HGDP have different sample sizes, and pairs of first- and/or second-degree relatives have been detected in some of them. Both factors could bias the estimation of the true allele and genotype frequencies in several of the studied populations. Furthermore, these populations have different demographic histories, and all these factors may also affect the power to detect selection. We should also consider the possibility that genetic differences among human populations are caused by neutral demographic processes, such as “allele surfing”. Strong bottlenecks during the exit “out of Africa” produced strong genetic drift. A spatial expansion followed, which could spread an allele geographically and increase its frequency in newly colonised areas. Allele surfing results from this combination. This neutral process has recently received special attention because it can produce allele-frequency patterns that appear to reflect a selective process. Thus, definitive evidence of the influence of natural selection on genetic differences between populations is not currently available. For example, two reports described an increase in the frequency of a derived allele outside Africa, together with high LD, for two genes involved in the control of brain size (MCPH1 and ASPM) [63, 64]. It was proposed that the derived haplotypes might be under local positive selection in non-African populations. However, it was recently demonstrated that neutral allele surfing could generate similar geographic distributions of allele frequencies during the range expansion out of Africa.
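A deliberately simplified serial-founder simulation can illustrate how drift at successive founder events raises the frequency of a neutral allele. This sketch is an assumption-laden stand-in for allele surfing, not a model of the out-of-Africa expansion. Each new deme is founded by `n_founders` diploid individuals sampled binomially from its parent deme. The deme is then assumed to grow large enough that no further drift occurs before the next step. There is no migration or selection. The function names are hypothetical.

```python
import random
from typing import List, Tuple

def serial_founder_trajectory(p0: float, n_founders: int, n_steps: int,
                              rng: random.Random) -> List[float]:
    """Neutral allele frequency in each deme along a chain of founder events."""
    copies = 2 * n_founders            # diploid founders carry 2N gene copies
    p = p0
    freqs = [p]
    for _ in range(n_steps):
        # Binomial sampling of gene copies from the parent deme (pure drift, no selection).
        k = sum(1 for _ in range(copies) if rng.random() < p)
        p = k / copies
        freqs.append(p)
    return freqs

def surfing_summary(p0: float, n_founders: int, n_steps: int, n_reps: int,
                    high: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Mean final frequency and fraction of replicates ending at or above `high`."""
    rng = random.Random(seed)
    finals = [serial_founder_trajectory(p0, n_founders, n_steps, rng)[-1]
              for _ in range(n_reps)]
    mean_final = sum(finals) / n_reps
    frac_high = sum(1 for x in finals if x >= high) / n_reps
    return mean_final, frac_high

# A rare neutral allele (10%) passed through 20 bottlenecks of 10 founders each.
mean_final, frac_high = surfing_summary(p0=0.1, n_founders=10, n_steps=20, n_reps=2000)
print(f"mean final frequency {mean_final:.3f}; replicates at >= 50%: {frac_high:.1%}")
```

Because drift is unbiased, the mean final frequency stays close to the starting value. Yet a minority of replicates reach high frequencies at the end of the chain without any selection. This is the kind of pattern that can mimic a local selective sweep.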
The deleted allele is clearly established in most world populations, which probably has some functional consequences. Expression of the LCE3C and LCE3B genes is induced upon epidermal activation as a consequence of inflammation or skin disease. However, the high frequency of the deletion worldwide suggests some redundancy in the function of the LCE genes in this cluster. Other genes might fulfil the function of LCE3C and LCE3B, albeit imperfectly. This imperfect compensation could contribute to the abnormal differentiation and epidermal hyperproliferation characteristic of psoriatic lesions. Thus, in the absence of other susceptibility components, the deletion would be insufficient to produce the abnormal phenotype. When several susceptibility components concur, however, LCE3C_LCE3B-del could lead to disease development.
A previous study identified the 32-kb deletion associated with psoriasis in several populations of European ancestry. It found 14 SNPs associated with LCE3C_LCE3B-del, with allele G at rs4112788 being the only one in strong LD. A priori, the association between these SNPs and LCE3C_LCE3B-del might vary among populations. However, strong LD between a given SNP and a CNV suggests a single origin of both variants. For this reason, we did not expect to find strong association between other SNPs and LCE3C_LCE3B-del in other populations.
Strong LD might also be a common feature of biallelic CNVs. This is particularly useful in association studies of complex disorders, in which the redundancy of information implied by LD can be used to optimize genotyping. Nevertheless, this might not hold in association studies across all populations. LD patterns vary among populations of different geographic origin, and a much higher proportion of r² variance can be attributed to differences between continental regions, with similar characteristics found for CNV. Specifically, LD has been reported to increase with geographic distance from East Sub-Saharan Africa. The highest values occur in the Americas, followed by Oceania, East Asia, Eurasia and Africa. As for CNV, this pattern matches the prediction of a model of sequential founder effects during spatial expansion from Africa, since such founder effects would be expected to increase LD at each step of the expansion [24, 67].
Our results for LCE3C_LCE3B-del are consistent with previous studies showing that the extent of LD in non-Africans is higher than in Africans, reflecting the origin and spread of modern humans from Africa. We found r² values > 0.8 in all non-African populations except two Chinese groups (CHB and the population grouped as SEA) and the Makrani population. These populations are not defined as “isolated” in the HGDP. The exceptions may reflect their particular demographic and genetic histories or, alternatively, a bias due to the small number of individuals from these populations in the study. However, the LD pattern between rs4112788G and LCE3C_LCE3B-del across populations differs from the general LD trend reported in other studies. That trend is a successive increase in LD in Middle East-North Africa, Central South Asia, Europe, East Asia, Oceania and America relative to Sub-Saharan Africa. In contrast, we essentially detect low r² values in African populations and similarly high values in the rest of the world.
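The r² statistic reported above measures LD between a biallelic SNP and a biallelic CNV. As a minimal sketch, r² can be computed from haplotype counts, with rs4112788G and the LCE3C_LCE3B deletion each coded as 1. This assumes phased haplotypes are available. With unphased genotypes, haplotype frequencies would first have to be estimated, for example by an EM algorithm, which is not shown. The function name and data layout are hypothetical, and the haplotypes in the usage line are illustrative, not data from this study.

```python
from typing import Iterable, Tuple

def ld_r2(haplotypes: Iterable[Tuple[int, int]]) -> float:
    """r^2 between two biallelic markers from phased haplotypes coded (snp, cnv) in {0, 1}.

    Here 1 could denote rs4112788 allele G for the SNP and the deletion for the CNV.
    """
    haps = list(haplotypes)
    n = len(haps)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haps) / n          # frequency of SNP allele 1
    p_b = sum(b for _, b in haps) / n          # frequency of CNV allele 1 (deletion)
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either marker is monomorphic")
    return d * d / denom                       # r^2 = D^2 / (pA qA pB qB)

# Illustrative haplotypes: 3 G+del, 1 G+non-deleted, 6 other+non-deleted.
haps = [(1, 1)] * 3 + [(1, 0)] + [(0, 0)] * 6
r2 = ld_r2(haps)
print(f"r^2 = {r2:.3f}, strong LD (r^2 > 0.8): {r2 > 0.8}")
```

r² reaches 1 only when the two markers have identical allele frequencies and are perfectly associated. This is why a SNP in strong LD with the CNV can serve as a proxy when genotyping the deletion.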