Inter-population variability of DEFA3 gene absence: correlation with haplotype structure and population variability
© Ballana et al. 2007
Received: 21 September 2006
Accepted: 10 January 2007
Published: 10 January 2007
Copy number variants (CNVs) account for a significant proportion of normal phenotypic variation and may have an important role in human pathological variation. The α-defensin cluster on human chromosome 8p23.1 is one of the better-characterized CNVs, in which high copy number variability affecting the DEFA1 and DEFA3 genes has been reported. Moreover, the DEFA3 gene has been found to be absent in a significant proportion of control population subjects. CNVs involving immune genes, such as α-defensins, may contribute to the differences in innate immunity observed between individuals and influence predisposition and susceptibility to disease.
We have tested the DEFA3 absence in 697 samples from different human populations. The proportion of subjects lacking DEFA3 has been found to vary from 10% to 37%, depending on the population tested, suggesting differences in innate immune function between populations. Absence of DEFA3 was correlated with the region's haplotype block structure. African samples showed a higher intra-populational variability together with the highest proportion of subjects without DEFA3 (37%). Association analysis of DEFA3 absence with 136 SNPs from a 100-kb region identified a conserved haplotype in the Caucasian population, extending for the whole region.
Complexity and variability are essential genomic features of the α-defensin cluster at the 8p23.1 region. The identification of population differences in subjects lacking the DEFA3 gene may be suggestive of population-specific selective pressures with potential impact on human health.
Defensin genes encode a family of small cationic peptides that act as antimicrobial mediators of the innate immune system. Defensins are arginine-rich peptides and invariably contain disulfide-linked cysteine residues, whose positions are conserved. The two main defensin subfamilies, α- and β-defensins, differ in the length of the peptide segments between cysteine residues and in the arrangement of the disulfide bonds that link them. β-defensins have been found in most vertebrate species, whereas α-defensins are specific to mammals. Based on their adjacent chromosomal location, similar precursor peptides and gene structures, it has been postulated that all vertebrate defensins arose from a common gene precursor. While the efficacy of individual defensins against specific infectious agents varies, they have shown antimicrobial activity against gram-negative and gram-positive bacteria, fungi and enveloped viruses [1, 5]. At high concentrations, some defensins are also cytotoxic to mammalian cells, as cells exposed to high amounts of defensins in inflamed tissues generate pro-inflammatory signals that can contribute to tissue injury. In humans, most of the genes encoding α- and β-defensins are located in clusters on chromosome 8p23.1 [6, 7]. Within the region, two different defensin clusters can be distinguished: a telomeric cluster mostly containing α-defensin genes (DEFB1, DEFA6, DEFA4, DEFA1, DEFT1, DEFA3 and DEFA5) and at least two centromeric clusters of β-defensin genes (DEFB109p, DEFB108, DEFB4, DEFB103, DEFB104, DEFB106, DEFB105 and DEFB107).
Chromosome band 8p23.1 is known to be a frequent site of chromosomal rearrangements mediated by low copy repeats (LCRs) or segmental duplications (SDs). It has been described that as many as one in four individuals from the general population carry a 4.7 Megabase (Mb) inversion of the region [8–10]. In addition, copy number variability involving both α-defensin (DEFA1 and DEFA3) and β-defensin (DEFB4, DEFB103 and DEFB104) genes in chromosome 8p23.1 has been detected and well characterized [11–14]. The number of DEFA1 and DEFA3 gene copies has been reported to range from 4 to 11 in a sample of 111 subjects, the DEFA3 allele being completely absent in 10% of them. Gene nomenclature for DEFA1, DEFT1 and DEFA3 has been replaced by DEFA1A3, following the recommendations of Aldred et al., since these genes have been considered as being part of a copy number variant (CNV) region. In another study, Linzmeier and colleagues determined copy numbers of the DEFA1 and DEFA3 alleles in 27 subjects and found between 5 and 14 copies per diploid genome, with DEFA3 being absent in 26% of them.
Despite DEFA1 and DEFA3 being considered as members of the same CNV (DEFA1A3), they encode different peptides, HNP-1 and HNP-3, respectively. The mature HNP-1 and HNP-3 peptides differ only in their N-terminal amino acid, due to a single nucleotide difference, C3400A, between the DEFA1 and the DEFA3 genes. This C3400A is a paralogous sequence variant (PSV) that allows discrimination between the two gene copies. The HNP-2 peptide is identical to the last 29 amino acids of both the HNP-1 and the HNP-3 peptides. HNP-2 is presumably produced from proHNP-1 and/or proHNP-3 by post-translational proteolytic cleavage. It is likely that one or both genes, or another member of the DEFA1A3 CNV cluster, encode the HNP-2 peptide. The three peptides are constitutively produced by neutrophil cell precursors and packaged in granules before mature neutrophils are released into the blood. During phagocytosis, the defensin-containing granules fuse to phagocytic vacuoles where defensins act as antimicrobial agents.
Recent work has shown that CNVs are a major source of genetic variation. Individual variability in resistance to infectious diseases has been extensively reported. However, the causes of this diversity in immune function are poorly understood. CNVs involving immune genes could contribute to the differences in innate immunity between individuals and influence predisposition and susceptibility to diseases, as has been shown for human immunodeficiency virus and AIDS. Thus, it is important to analyze the impact of defensin gene CNVs on human health, both in healthy volunteers and in patients with disease [1, 19]. In this report we have studied the presence of DEFA3 in samples from different human populations. For this purpose, we used the International Haplotype Map (HapMap) Project collection and a cohort of Spanish healthy individuals.
We have analyzed 786 samples from four populations with ancestry in Europe, Africa or Asia (the HapMap collection), including Spanish healthy individuals. The source used for this study was the HapMap collection of 269 samples utilized by the International HapMap Consortium for the study of human genomic variation, initially through the investigation of SNPs and their associated haplotypes, and 180 additional HapMap samples. This collection comprises four populations: 30 parent-offspring trios (90 individuals) of the Yoruba from Ibadan, Nigeria (YRI), 30 parent-offspring trios (90 individuals) of European descent from Utah, USA (CEU), 45 unrelated Japanese from Tokyo, Japan (JPT) and 44 unrelated Han Chinese from Beijing, China (CHB). In addition, 30 Yoruban trios, 45 unrelated Japanese and 45 unrelated Chinese from the HapMap collection, but not genotyped in the HapMap project, were analyzed. The Spanish samples were 336 unrelated blood donor controls, all of Caucasian origin. Genomic DNA from EBV-transformed lymphoblastoid cell-lines was used. As Chinese and Japanese allele frequencies are found to be very similar, the analysis was performed combining both datasets, resulting in four different groups of samples tested: two Caucasian groups (CEU and Spanish general population subjects), Yoruba and Chinese/Japanese.
Table. Absence of DEFA3 in Caucasian, Yoruba, Chinese/Japanese reference HapMap samples and in Spanish control samples. Groups compared by paired chi-square p-value: Yoruba (n = 120), Caucasian (n = 60), Japanese/Chinese (n = 181) and Spanish (n = 336).
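The pairwise chi-square comparisons between population groups can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure actually run for the study; the function name `chi_square_2x2` and the example counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(absent_a, total_a, absent_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) comparing
    the proportion of subjects lacking a gene in two groups."""
    table = [
        [absent_a, total_a - absent_a],
        [absent_b, total_b - absent_b],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts chosen only to exercise the function;
# they are NOT the counts observed in this study.
chi2, p = chi_square_2x2(absent_a=20, total_a=40, absent_b=10, total_b=40)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Each pair of groups would be passed through such a test in turn, giving one p-value per pairwise comparison.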
HapMap samples have been tested for the presence of CNVs by two different techniques: Affymetrix SNP array and BAC array. The DEFA1A3 region was identified as a CNV in 23 subjects (3 Caucasian, 7 Yoruban, and 13 Chinese/Japanese), but DEFA3 was absent in only four of the cases in which a gain or loss was detected. Copy number variation in the DEFA1A3 region is reported to be much more common than the variation identified by Redon et al. However, the small size of the DEFA1A3 CNV makes it undetectable with BAC arrays. Moreover, the presence of segmental duplications results in poor SNP coverage of the region by the Affymetrix SNP array, which does not allow accurate detection of the CNV. Thus, the study of this CNV for association purposes has to be performed by quantitative methods or by the analysis of paralogous sequence variants.
A region of 100 kb, spanning from 6,810,001 bp to 6,910,000 bp, which contains the DEFA1A3 cluster and the single copy gene DEFA5, was chosen for the linkage disequilibrium analysis (based on human genome assembly hg17) (Figure 2). The HapMap data for the DEFA1A3 region included around 150 SNPs for each population (151 Caucasian, 169 Yoruba, 158 Japanese and 154 Chinese). However, only 136 of the SNPs had genotype data in all four populations. Interestingly, almost all genotyped SNPs are located outside the DEFA1A3 cluster (Figure 2). The absence of genotyped SNPs in the DEFA1A3 cluster is in agreement with the presence of segmental duplications that include the DEFA1A3 genes. Thus, the non-homogeneous distribution of SNPs within the region could be at least partially explained by the presence of highly homologous repeated sequences. Genotyping errors enhanced by the presence of DEFA1/DEFA3 tandem gene arrays could have led investigators to discard SNPs located within this region.
Of the 136 SNPs analyzed in all four populations, 55 were monomorphic in at least one of them (28 out of the 55 SNPs were monomorphic in all populations). Monomorphic SNPs can be used to measure genetic variability, by analyzing their distribution in the different populations. The Chinese and Japanese groups had the highest proportion of monomorphic SNPs (34%), which was very similar to that observed for Caucasian samples (31%), whereas the Yoruba samples had the smallest number of monomorphic SNPs (24%). This indicates that genetic variability is higher within Yoruba samples, while Chinese/Japanese and Caucasian populations show similar proportions of genetic variability. This higher variability for Yoruba samples is similar to that detected in the HapMap analysis for the whole genome. Interestingly, the proportion of monomorphic SNPs in this region is about 10% higher for each population group than the average reported for the HapMap data.
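As an illustration of how a proportion of monomorphic SNPs can be derived from genotype calls, a minimal sketch follows. The data layout (a dictionary from SNP identifier to a list of two-letter genotype strings, with "NN" for a missing call) and the function name `monomorphic_fraction` are assumptions made for this example and do not reproduce the HapMap file format.

```python
def monomorphic_fraction(genotypes_by_snp):
    """Return the fraction of SNPs at which a single allele is observed.

    genotypes_by_snp maps a SNP id to a list of genotype strings such as
    "AG"; missing calls are written "NN" and are ignored.
    """
    if not genotypes_by_snp:
        raise ValueError("no SNPs supplied")
    monomorphic = 0
    for calls in genotypes_by_snp.values():
        # Collect every allele seen across all non-missing genotypes.
        alleles = {allele for g in calls if g != "NN" for allele in g}
        if len(alleles) <= 1:
            monomorphic += 1
    return monomorphic / len(genotypes_by_snp)


# Toy genotypes for one hypothetical population (not HapMap data).
toy_population = {
    "snp1": ["AA", "AA", "AA"],  # monomorphic
    "snp2": ["AG", "GG", "NN"],  # polymorphic
    "snp3": ["CC", "NN", "CC"],  # monomorphic
    "snp4": ["CT", "CC", "TT"],  # polymorphic
}
print(monomorphic_fraction(toy_population))
```

Running the same function on each population's genotypes gives per-population proportions that can be compared directly.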
Several studies have recently reported a previously unknown high prevalence of copy number variation in humans. A recent study of CNVs in the HapMap samples has defined over 1400 CNV regions. On average, each individual varies at over 100 CNVs, representing about 20 Mb of genomic DNA difference. It has been suggested that CNVs account for a significant proportion of human normal phenotypic variation. It is thought that CNVs may also have an important role in the pathological variation in the human population [16, 23]. Analyses of the functional attributes of currently known CNVs reveal a remarkable enrichment for genes that are relevant to molecular-environmental interactions and genes that influence response to specific environmental stimuli, such as genes involved in immune response and inflammation.
CNVs involving α- and β-defensin genes (DEFA1A3 and DEFB4/DEFB103A) in the 8p23.1 region have been extensively characterized [12–14]. From a pathologic point of view, it is likely that α- and/or β-defensin CNVs affect the function and effectiveness of innate immunity. Such effects could be influenced by the frequent absence of the DEFA3 allele. In the present work, we have tested the absence of the DEFA3 allele in different human populations, finding significant differences between them, which could be indicative of differences in innate immune function between populations. This is not surprising since the different human population groups have been exposed to different environments regarding infectious agents and other factors. One obvious way by which CNVs result in human phenotypic diversity is by altering the transcriptional levels of the genes which vary in copy number. In addition, it has been postulated that retention of duplicate genes, rather than mutation to pseudogenes or neofunctionalization, is due to the generation of increased amounts of a beneficial product. This could be the case of DEFA1A3, in which variation in DEFA1 and DEFA3 copy number, and DEFA3 absence, could underlie variable resistance to infection among individuals. Different selective pressures acting in each geographic region could likely explain population differences in DEFA3 absence.
Taudien and colleagues significantly improved the assembly of the defensin 8p23.1 locus by manual clone-by-clone alignment, providing in silico evidence of the experimentally verified variability in defensin copy number and better representing the locus diversity. The exceptional genomic complexity and heterogeneity of the human 8p23.1 locus and the prominent role of defensins in the innate immunity framework raise the question of whether individual patterns of haplotypes, together with the variability in defensin gene copy number, affect the functionality of the defensin system. To address this issue, Taudien et al. provided a molecular approach for the determination of individual defensin gene repertoires limited to 8p23.1 β-defensin clusters and using data from a 500 bp fragment in 4 individuals. In our case, we have characterized in detail the haplotype diversity and LD structure of a 100-kb region around the α-defensin locus in 269 HapMap samples. The SNP distribution of the region is characteristic of the presence of segmental duplications, which result in a low density of SNPs selected for genotyping. As previously reported for other genomic regions, the Yoruba samples present a higher variability than both the Chinese/Japanese and Caucasian samples. Additionally, in the Yoruban samples, the haploblock structures were smaller and the extent of LD between SNPs was lower, in accordance with the out-of-Africa theory for the origins of humans. The observation that the proportion of subjects lacking the DEFA3 gene is greater in Yoruba samples, together with the fact that DEFA3 is thought to be human specific, may be an indication of the higher amount of original genetic variation among the first humans living in Africa, which afterwards migrated to other continents.
The initial migration occurred as multiple, branching events and involved many founder effects in which certain haplotypes, SNPs and alleles appear to have increased in frequency in emigrant populations owing to genetic drift and different selection pressures. In this sense, we observed a diminished frequency of subjects without DEFA3 in Caucasian and Asian samples.
When association with DEFA3 absence was tested, SNPs and haplotypes in the Caucasian population were the only ones to be significant. The association observed in the Caucasian samples could be the result of a strong founder effect. Founder effects and, particularly, the decrease in genetic diversity resulting from continental migrations, are associated with an increased haplotype length. This is observed when comparing the haplotype block patterns of the different populations analyzed, in which the Caucasian sample set has the longest haplotype blocks. Alternatively, Aldred and colleagues demonstrated that DEFA3 has arisen at the 5' end repeat position and has transferred to other positions within the array through unequal recombination between alleles, suggesting that recombination has been active in shaping diversity in the DEFA1A3 locus. However, our results indicate that, at least in the Caucasian samples, there has been little recombination between chromosomes with and without DEFA3, as we are able to find a haplotype associated with DEFA3 absence extending for nearly 100 kb. Moreover, as for DEFA3 absence, other haplotypes are likely to be associated with other patterns of CNV polymorphisms. However, other situations cannot be ruled out without analyzing large pedigrees to determine unambiguously each chromosome structure at the DEFA1A3 CNV.
The impact on human health of this qualitative variation in the presence of the DEFA3 gene product deserves to be explored in epidemiologic studies. Different studies have described differences in the function and specificity of the DEFA1 and DEFA3 gene products, HNP1 and HNP3 [1, 19]. In general, HNP3 is thought to be less active than HNP1 against both gram-positive and gram-negative bacteria, but it is expressed at about twice the level of HNP1. On the other hand, DEFA3, but not DEFA1, has been found upregulated in patients with systemic lupus erythematosus, idiopathic thrombocytopenic purpura or rheumatoid arthritis, suggesting that DEFA3 upregulation might be a general feature of autoimmune diseases [27, 28]. Therefore, the observed differences in DEFA3 absence may partially explain the different population incidences of infectious and/or autoimmune diseases in which DEFA3 plays an important role. Future studies are needed to establish whether patterns of DEFA3 absence correlate with certain population microbial exposures or different prevalence of autoimmune disorders. This could also be important in determining the exact nature of DEFA3 function and its specificity of action, if any, against certain antigens. Last, but not least, further studies focused on the determination of the total copy number of DEFA1A3 units will be crucial to build the complete picture of the impact of DEFA1A3 CNVs on human health.
Complexity and variability are essential genomic features of the α-defensin cluster at the 8p23.1 region. The present work provides insight into the existing variability in human populations in this specific region. The identification of population differences in the proportion of subjects lacking the DEFA3 gene may be suggestive of population-specific selective pressures, which should be studied in further inter-population epidemiological studies.
The analysis was performed on 450 HapMap samples and 336 Spanish controls. Unless otherwise noted, all samples were obtained from the Coriell Institute for Medical Research. A detailed description of the HapMap population samples can be found elsewhere. Written informed consent for the Spanish controls was obtained with the approval of the Institutional Review Board and Ethics Committee.
A PCR amplification assay followed by restriction enzyme digestion (PCR-RFLP) has been used to discriminate DEFA1 (GenBank accession number L12690) and DEFA3 (GenBank accession number L12691) genes differing by a single nucleotide. A fragment of 304 bp around C3400A SNP was PCR amplified with fluorescently labelled primers (Forward 5'-TGAGAGCAAAGGAGAATGAG-3', Reverse 5'-GCAGAATGCCCAGAGTCTTC-3') and digested with HaeIII enzyme. In order to accomplish complete digestion, we used saturating conditions (2.5 U/25 μl reaction) of the enzyme to digest a short DNA fragment containing only one cutting site. In addition, in all the runs, a DEFA3 negative sample was included, as a positive control of the assay. About 2 μl of digestion product was added to 10 μl HiDi formamide containing ROX500 marker (Applied Biosystems) and run on an ABI 3100 capillary system (Applied Biosystems). Peaks were analysed using Genemapper software (Applied Biosystems).
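The logic of reading a restriction digest can be illustrated with a short in silico sketch. HaeIII recognizes GGCC and cuts between GG and CC, so an amplicon that carries the site is split into two fragments while one that has lost it stays intact. The two 12-bp sequences below are hypothetical toy sequences, not the 304 bp amplicon, and which paralogue (DEFA1 or DEFA3) carries the site is deliberately left unspecified here.

```python
def digest_haeiii(sequence):
    """Return fragment lengths after cutting at every GG^CC site.

    HaeIII leaves blunt ends, cutting between the second G and the
    first C of its GGCC recognition site.
    """
    site = "GGCC"
    seq = sequence.upper()
    cut_positions = []
    start = 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        cut_positions.append(idx + 2)  # cut falls after "GG"
        start = idx + 1
    bounds = [0] + cut_positions + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical toy amplicons differing by one nucleotide (C vs A).
with_site = "ATATGGCCATAT"
without_site = "ATATGGACATAT"

print(digest_haeiii(with_site))     # [6, 6]: cut into two fragments
print(digest_haeiii(without_site))  # [12]: left undigested
```

In a capillary run, the fragment lengths returned here correspond to the peak sizes that distinguish the cut and uncut amplicons.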
The UCSC Genome Browser served as the main source of genomic sequence, using the human genome assembly hg17. The region analysed was a 150 kb contig from 6,760,001 bp to 6,910,000 bp of chromosome 8p23.1 (based on human genome assembly hg17). The sequence was repeat-masked and aligned against itself using PipMaker. The size, orientation and structure of segmental duplications can be interpreted by using the PIP and Dot-Plot output generated by PipMaker. Multiple sequence alignments and phylogenetic tree construction were carried out by using the ClustalW program.
A between-groups chi-square test was performed to compare the proportion of DEFA3 absence in different human populations. Genotyping data from the HapMap public database was used to test the hypothesis of association between genetic polymorphisms and DEFA3 absence using logistic regression models. Odds ratios (OR) and 95% confidence intervals (95% CI) were calculated for each genotype compared with the homozygous for the major allele (the allele with greater frequency among individuals lacking the DEFA3 allele). Analyses were initially done under a codominant inheritance model (three genotypes separated). Then, simplified models were fitted: a dominant model (heterozygous grouped with the homozygous for the minor allele), a recessive model (heterozygous grouped with the homozygous for the major allele), an overdominant model (homozygous grouped) and a log-additive model (a score was assigned counting the number of minor alleles: the homozygote for the major allele was given score 0, the heterozygote score 1, and the homozygote for the minor allele score 2). The model with the lowest Akaike information criterion (minus twice the log likelihood of the model plus twice the number of parameters in the model) was the recessive one, and it was selected for an easy summary of the results. P values were derived from likelihood ratio tests, and a significance level of 5% (two sided) was used for the analyses. All these analyses were performed using the SNPassoc R package.
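The genotype groupings used by the inheritance models, and the comparison of models by Akaike information criterion, can be sketched as below. This is an illustrative Python sketch rather than the SNPassoc implementation: the function names `encode_genotype` and `aic` are hypothetical, the AIC uses the standard definition (minus twice the log likelihood plus twice the number of parameters), and the log-likelihood values in the example are invented placeholders, not results from this study.

```python
def encode_genotype(n_minor, model):
    """Map a minor-allele count (0, 1 or 2) to the covariate(s) used
    in a logistic regression under the given inheritance model."""
    if n_minor not in (0, 1, 2):
        raise ValueError("minor-allele count must be 0, 1 or 2")
    if model == "codominant":
        # Three genotypes kept separate: two indicator variables.
        return (int(n_minor == 1), int(n_minor == 2))
    if model == "dominant":
        # Heterozygous grouped with homozygous for the minor allele.
        return int(n_minor >= 1)
    if model == "recessive":
        # Heterozygous grouped with homozygous for the major allele.
        return int(n_minor == 2)
    if model == "overdominant":
        # Both homozygous genotypes grouped against the heterozygous.
        return int(n_minor == 1)
    if model == "log-additive":
        # Score equal to the number of minor alleles.
        return n_minor
    raise ValueError(f"unknown model: {model}")


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


# Placeholder fits: (log likelihood, parameters incl. intercept).
# These numbers are invented for illustration only.
fits = {
    "codominant": (-50.0, 3),
    "dominant": (-53.0, 2),
    "recessive": (-50.5, 2),
    "overdominant": (-54.0, 2),
    "log-additive": (-51.5, 2),
}
best = min(fits, key=lambda name: aic(*fits[name]))
print(best, aic(*fits[best]))
```

Because the codominant model spends one more parameter than the simplified models, a simplified model can attain the lowest AIC even when its log likelihood is slightly lower.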
Haploblocks were constructed using the Haploview program. Haplotypes were reconstructed using the expectation maximization (EM) algorithm implemented in the haplo.stats R package. The OR and 95% CI were estimated using a generalized linear-regression framework that incorporates haplotype phase uncertainty by inferring a probability matrix of haplotype likelihoods, also implemented in the haplo.stats library.
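The idea behind EM haplotype reconstruction can be shown with a toy two-SNP case. The sketch below is not the haplo.stats implementation, which handles many loci; it is a minimal illustration assuming unphased genotypes coded as minor-allele counts at two biallelic SNPs, random mating, and no missing data. The function names and the example genotypes are hypothetical.

```python
from itertools import product

# Haplotypes as (minor allele at SNP1, minor allele at SNP2) indicators.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """All ordered haplotype pairs whose allele counts sum to the genotype."""
    return [
        (h1, h2)
        for h1, h2 in product(HAPLOTYPES, repeat=2)
        if (h1[0] + h2[0], h1[1] + h2[1]) == genotype
    ]


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-SNP genotypes."""
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        # E-step: share each individual among compatible phasings in
        # proportion to the current haplotype-pair probabilities.
        for g in genotypes:
            pairs = compatible_pairs(g)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy genotypes (not study data); (1, 1) is the only phase-ambiguous one.
toy = [(0, 0)] * 4 + [(1, 1)] * 2 + [(2, 2)] * 2
for hap, f in em_haplotype_frequencies(toy).items():
    print(hap, round(f, 4))
```

Only double heterozygotes are ambiguous in the two-SNP case, so the algorithm resolves them toward the phasing supported by the unambiguous individuals; the per-individual posterior weights computed in the E-step are the kind of phase-uncertainty information that a downstream regression can use.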
We want to thank Raquel Rabionet for helpful comments in the preparation of the manuscript. This work was financially supported by Fundació La Marató de TV3 (993610), Instituto de Salud Carlos III, FIS-ISCIII (G03/203, PI052347 and CIBER-CB06/02/0058) and Departament d'Universitats i Societat de la Informació, Generalitat de Catalunya (2005SGR00008). The Spanish National Genotyping Center (CeGen) is funded by Genoma España. EB is a recipient of a FI fellowship from Departament d'Universitats i Societat de la Informació, Generalitat de Catalunya (2003FI00066). NB is a recipient of a BEFI fellowship from Instituto de Salud Carlos III FIS-ISCIII.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.