Insights into the regulation of human CNV-miRNAs from the view of their target genes
© Wu et al.; licensee BioMed Central Ltd. 2012
Received: 22 April 2012
Accepted: 7 December 2012
Published: 18 December 2012
microRNAs (miRNAs) represent a class of small (typically 22 nucleotides in length) non-coding RNAs that can degrade their target mRNAs or block their translation. Recent research showed that copy number alterations of miRNAs and their target genes are highly prevalent in cancers; however, the evolutionary and biological functions of naturally existing copy number variable miRNAs (CNV-miRNAs) among individuals have not been studied extensively throughout the genome.
In this study, we comprehensively analyzed the properties of genes regulated by CNV-miRNAs, and found that CNV-miRNAs tend to target a higher average number of genes and prefer to synergistically regulate the same genes; further, the targets of CNV-miRNAs tend to have higher variability of expression within and between populations. Finally, we found the targets of CNV-miRNAs are more likely to be differentially expressed among tissues and developmental stages, and participate in a wide range of cellular responses.
Our analyses of CNV-miRNAs provide new insights into the impact of copy number variations on miRNA-mediated post-transcriptional networks. The deeper interpretation of patterns of gene expression variation and the functional characterization of CNV-miRNAs will help to broaden the current understanding of the molecular basis of human phenotypic diversity.
miRNAs are a class of small non-coding RNAs that act by binding in a sequence-specific manner to the 3′UTR of target genes. Each miRNA can potentially regulate many transcripts, and at least one-third of human genes are estimated to be miRNA targets. miRNAs participate in posttranscriptional gene regulation by repressing the expression of their target genes through inhibition of translation or cleavage of mRNAs [2–6]. miRNAs also contribute to genetic buffering of gene expression variation and play an important role in maintaining the identity of mature tissues through a feed-forward loop regulatory architecture [7, 8], such as the relationship between miR-9a and E(spl) in Drosophila [9, 10] and the regulation of E2F1 by miR-17 in humans.
A primary goal in medical and evolutionary genomics is to understand the genetic mechanisms of natural variation in gene expression [12–16]. The structure of the human genome is highly variable, and copy number variations (CNVs) are alterations of genomic segments of more than 1,000 nucleotides that are present at significant frequencies within a population [17–19]. Many studies have shown that CNVs can expand dosage variation of the associated genes, leading to the under-representation of dosage-sensitive protein-coding functional units such as transcription factors and members of protein complexes [20, 21]. CNVs can be discovered by techniques such as fluorescent in situ hybridization, comparative genomic hybridization, array comparative genomic hybridization, and next-generation sequencing [22–24]. In humans, more than 30,000 genomic regions with segmental duplications have been recognized by systematic comparative genomic hybridizations on the DNA of healthy human subjects (see http://projects.tcag.ca/variation); however, the CNVs of other animals have been far less studied. For example, only about 2,000 CNVs have been identified in Pan troglodytes and about 4,000 CNVs in inbred Mus musculus [26, 27].
Recent studies revealed a high frequency of copy number abnormality in miRNA processing genes, such as Dicer1 and Argonaute2, in breast and ovarian cancers [28, 29]. Although copy number alterations of miRNAs and their regulatory genes have frequently been investigated in oncogenesis [28–30], the evolutionary and functional impact of CNV-miRNAs on the human genome has not been studied extensively. Based on human genomic structural variations, Marcinkowska et al. recently found that about 30% of miRNAs are located in human CNV-regions, indicating that non-coding RNAs also have potential functional variants.
In this study, we comprehensively analyzed the properties of genes regulated by CNV-miRNAs and explored the potential involvement of CNV-miRNAs in the expression variability of their targets within and between populations. Our analysis revealed significant functional differences between the targets of CNV-miRNAs and the targets of non-CNV-miRNAs. The involvement of CNV-miRNAs in a wide range of cellular responses provided us with valuable information on the impact of CNVs on the post-transcriptional network.
We first compiled the genes regulated by CNV-miRNAs using the targets from TargetScan5.1, which predicts miRNA targets based on sequence complementarity, sequence context information and binding energy. Because of its high confidence, TargetScan5.1 has been widely used in a variety of “omics” studies (see Methods) [32–34]. Among the miRNA-target associations that were obtained, the representative miRNA with the lowest total context score was presented for each family, but all other miRNAs from the same family were considered to target the same gene at the same target sites. To study the non-redundant miRNA binding sites directly, we replaced the miRNAs by their miRNA-family ID. Finally, 63,428 regulatory relationships were constructed, comprising 541 miRNA-families and 9,174 targets (see Additional file 1).
According to the study by Marcinkowska et al., a total of 209 miRNAs are located in human CNV-regions. These miRNAs belong to 172 families (see Additional file 2); the remaining 369 miRNA-families had no members in the CNV-regions. In the following analysis, these two types are referred to as CNV-miRNAs and non-CNV-miRNAs, respectively.
We investigated target genes of the non-CNV-miRNAs and CNV-miRNAs and classified them into three groups (see Additional file 3). The first group contains a total of 1,134 target genes that are regulated exclusively by CNV-miRNAs: 823 of the genes are regulated by one CNV-miRNA, 211 by two CNV-miRNAs, 67 by three CNV-miRNAs, 22 by four CNV-miRNAs, and 11 by ≥ 5 CNV-miRNAs. The second group contains a total of 5,710 target genes that are regulated by non-CNV-miRNAs and at least one CNV-miRNA. The third group consists of 2,330 target genes that are regulated exclusively by non-CNV-miRNAs: 1,408 of the genes are regulated by one non-CNV-miRNA, 515 by two non-CNV-miRNAs, 207 by three non-CNV-miRNAs, 95 by four non-CNV-miRNAs and 105 by ≥ 5 non-CNV-miRNAs.
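The three-way grouping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Perl and R scripts: the function name `classify_targets`, the input layout (pairs of miRNA-family ID and target gene), and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def classify_targets(relationships, cnv_families):
    """Group target genes by the kind of miRNA families that regulate them.

    relationships: iterable of (mirna_family, target_gene) pairs.
    cnv_families:  set of family IDs with at least one member in a CNV-region.
    Returns three dicts mapping each gene to its number of regulating families.
    """
    regulators = defaultdict(set)
    for family, gene in relationships:
        regulators[gene].add(family)

    groups = {"cnv_only": {}, "both": {}, "non_cnv_only": {}}
    for gene, families in regulators.items():
        n_cnv = len(families & cnv_families)
        n_non_cnv = len(families) - n_cnv
        if n_non_cnv == 0:
            groups["cnv_only"][gene] = n_cnv
        elif n_cnv == 0:
            groups["non_cnv_only"][gene] = n_non_cnv
        else:
            groups["both"][gene] = len(families)
    return groups


# Toy input with made-up family and gene names (illustration only).
toy_pairs = [("famA", "g1"), ("famB", "g1"), ("famA", "g2"),
             ("famC", "g2"), ("famC", "g3")]
toy_groups = classify_targets(toy_pairs, cnv_families={"famA", "famB"})
```

Counting the values stored in each group (one, two, three regulators and so on) would give per-group breakdowns of the kind reported in the paragraph above.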
[Table: Simulation analysis to explore the target-recognition preference of CNV-miRNAs and non-CNV-miRNAs. Columns: the number of regulatory miRNAs; mean of 1,000 simulations; std. dev. of 1,000 simulations. Rows: genes regulated exclusively by CNV-miRNAs; genes regulated exclusively by non-CNV-miRNAs.]
Intuitively, CNVs of miRNA genes can dramatically change their dosage, and this would then affect the expression levels of the target genes in the corresponding individuals [5, 15]. Recently, a series of genome-wide gene expression profiles have been measured in four HapMap ethnic populations: CEU (U.S. residents with Northern and Western European ancestry), YRI (Yoruba people of Ibadan, Nigeria), CHB (Chinese Han in Beijing) and JPT (Japanese from Tokyo). We calculated the coefficient of variation (CV) for each protein-coding gene across individuals in the four populations to quantify the within-population expression variability of each gene (see Methods). Briefly, the CV is the ratio of the standard deviation of a gene's expression to its mean intensity, and it is considered to be an unbiased and comprehensive metric of regulation diversity at the expression level among individuals (see Additional file 4).
A previous study demonstrated that the within-population expression variability of genes can influence their propensity to be differentially expressed between populations. Here, some CNV-miRNAs may be present in different populations; thus, the genes targeted by these CNV-miRNAs are likely to be differentially expressed among individuals within a population and also between different populations.
Using the method described above, we identified genes that were differentially expressed in at least one of the four ethnic populations (see Additional file 6). As shown in Figure 4B, a similar number of genes were differentially expressed among six population pairs selected from the four ethnic populations. We then investigated whether genes targeted by CNV-miRNAs were over-represented in these differentially expressed genes. As shown in Figure 4C, the proportion of differentially expressed genes was 15.7% for targets regulated exclusively by non-CNV-miRNAs and 17.4% for targets regulated by both CNV-miRNAs and non-CNV-miRNAs (p=0.060, Chi-square, two-tail test), and it increased further to 21.7% for targets regulated exclusively by CNV-miRNAs (p=0.001, Chi-square, two-tail test).
It is of interest whether the orthologs of human CNV-miRNAs are also located in CNV-regions of other animals. We compiled the available CNVs of Pan troglodytes and Mus musculus [26, 27], and then intersected the locations of their miRNAs with the coordinates of the CNVs. The results showed that only 21 and eight miRNA-families have members located in CNV-regions in Pan troglodytes and inbred Mus musculus, respectively (see Additional file 8). Hence, of the three species, the human genome contained the highest proportion of CNV-miRNAs, making it the best available model to study the mechanisms and functions of CNV-miRNAs.
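A comparison of two proportions like the ones above can be sketched with a Pearson chi-square test on a 2×2 table. The sketch below assumes no continuity correction, which the text does not specify, and the counts in the example are hypothetical, since the underlying gene counts are not given here. The function name `chi_square_2x2` is likewise illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction).

    Table layout:
                      differential   not differential
        group 1            a                b
        group 2            c                d
    Returns (chi-square statistic, two-tailed p-value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2_stat, p_val = chi_square_2x2(30, 70, 20, 80)
```

With a continuity correction, or with a different test implementation, the p-values would differ slightly from those of this sketch.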
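Intersecting miRNA locations with CNV coordinates amounts to an interval-overlap check per chromosome. The sketch below is illustrative only: it assumes closed intervals and counts any partial overlap as "located in a CNV-region", a criterion the text does not state. The function name and the toy coordinates are hypothetical.

```python
def families_in_cnv(mirnas, cnvs):
    """Return the miRNA families with at least one member overlapping a CNV.

    mirnas: iterable of (family, chrom, start, end) tuples.
    cnvs:   iterable of (chrom, start, end) tuples.
    Intervals are treated as closed; any overlap counts (an assumption).
    """
    cnvs_by_chrom = {}
    for chrom, start, end in cnvs:
        cnvs_by_chrom.setdefault(chrom, []).append((start, end))

    hits = set()
    for family, chrom, start, end in mirnas:
        for cnv_start, cnv_end in cnvs_by_chrom.get(chrom, []):
            if start <= cnv_end and cnv_start <= end:
                hits.add(family)
                break
    return hits


# Toy coordinates with made-up family names (illustration only).
toy_cnvs = [("chr1", 100, 200), ("chr2", 50, 80)]
toy_mirnas = [("famA", "chr1", 150, 170), ("famB", "chr1", 300, 320),
              ("famC", "chr2", 75, 95), ("famA", "chr3", 10, 30)]
toy_hits = families_in_cnv(toy_mirnas, toy_cnvs)
```

For genome-scale inputs, sorting the CNVs and using a sweep or an interval tree would avoid the nested loop; the linear scan is kept here for clarity.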
Animal genomes are dynamic and plastic, which gives them the ability to adapt to changing environmental conditions. Mobile and evolving elements such as telomeres, transposons, and copy number variants have been studied in investigations into the potential effect of the environment on genomes. For example, Haasl and Payseur designed a mathematical model to study microsatellite variations, such as the expected distribution of repeat sizes and the expected squared difference in repeat size among samples; their simulations revealed that microsatellites, especially triplet repeats, provided facilitators of adaptation for the beneficial evolution of genomes. miRNAs are relatively newly discovered genomic elements, but their post-transcriptional regulation was present early in metazoan evolution. The number of miRNAs in a genome correlates with the morphological complexity of the animal, indicating that they play roles in evolutionary changes of body structure. It is now widely accepted that an increase in the complexity of gene regulatory mechanisms, at both the genomic and transcriptomic levels, drives the appearance of more complex organisms. CNVs and miRNAs are two distinct mechanisms that increase the complexity of gene expression, and their co-evolution has recently been recognized and studied. Marcinkowska et al. compared the fraction of miRNA loci and the fraction of the genome covered by CNVs, and reported that the CNV purification effect was insignificant. Felekkis et al. demonstrated that the number of distinct miRNA types and the average number of miRNA binding sites were significantly higher in genes in CNV regions than in genes in non-CNV regions. In this study, we propose that miRNA-target recognition may play an important role in the escape from purification of CNV-miRNAs that target the same genes.
Further analysis revealed that “targeting by CNV-miRNAs” seems to be favored and that the target genes participate in a wide range of cellular responses to environmental factors. For target genes regulated by one miRNA, CNV-miRNAs tend to target a higher average number of genes than non-CNV-miRNAs. From an evolutionary viewpoint, if the CNV-miRNAs were deleterious and only remained in the genome because they were difficult to remove, then we might expect them to target, on average, fewer genes than non-CNV-miRNAs; furthermore, if the CNV-miRNAs were neutral and their retention were attributable to random genetic drift, the CNV-miRNAs and non-CNV-miRNAs should target a similar average number of genes. Therefore, some CNV-miRNAs seem to be beneficial to the organism, and “targeting by CNV-miRNAs” may exert positive selective pressure on their target genes.
Our analyses revealed pervasive impacts of CNVs on the miRNA-mediated post-transcriptional regulatory network. Previous studies demonstrated that miRNAs preferentially regulate the hubs of protein interaction and metabolic networks. We here propose that the CNVs of miRNAs may perturb the dosage balance of signal transduction pathways, metabolic flux or protein complexes [53, 54], leading eventually to individuals of the same population or of different populations having different susceptibility to diseases. Although it is difficult to identify these CNV-miRNAs without a comprehensive investigation of health risks among human populations, recent experimental studies have discovered CNV-caused dysregulation of miRNAs that confirmed their roles in disease occurrence. In one study, next-generation sequencing technology was used to explore CNV as a potential mechanism of miRNA mis-expression; the affected miRNA loci were consistently found to be either lost or gained, and their candidate mRNA targets were coordinately dysregulated. The authors demonstrated that the structural variation of the miRNA loci clearly characterized the pre-invasive stage of breast cancer. In another study, genetic networks were inferred from miRNA expression in normal and cancer tissues, and cancer networks built from disjointed sub-networks were found to accompany miRNA copy number alterations, such as the amplification of the hsa-miR-17/92 family, the deletion of the hsa-miR-143/145 cluster, and the physical alteration of hsa-miR-204/30 at the DNA copy number level. The results of these studies demonstrate the feasibility of using the dysregulation of CNV-miRNAs as biological markers for disease screening, indicating that CNV-miRNAs and their targets should be given more attention in studies of human health.
To the best of our knowledge, this is the first genome-wide integrative analysis of human CNVs, miRNAs, their targets and expression variations. Our results will pave the way for future studies on the functional characterization of CNV-miRNAs. This study clarifies the roles of CNV-miRNAs and is valuable for studying the impact of CNVs on human health.
The miRNAs and their predicted targets were taken from TargetScan (http://www.targetscan.org version 5.1) [32, 33]. Targets with a total context score of −0.3 or lower were ignored, where the score quantitatively measures the overall target efficacy. A total of 9,174 targets with at least one conserved 7-mer or 8-mer were selected as reliable miRNA targets (see Additional file 1).
The microarray-based gene expression profiles were derived from lymphoblastic cell lines of 270 HapMap individuals (http://www.sanger.ac.uk/humgen/genevar, GSE6536), including 90 samples of YRI (Yoruba people of Ibadan, Nigeria), 90 samples of CEU (U.S. residents with northern and western European ancestry), 45 samples of CHB (Chinese Han in Beijing) and 45 samples of JPT (Japanese from Tokyo) [60, 61]. The annotation table was retrieved from http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GPL2507. The RefSeq identifiers were transformed to Ensembl Gene IDs through BioMart. Finally, the expression profiles of 16,686 human genes (including 8,636 miRNA targets) across four HapMap populations were compiled.
The following formulas were adopted to calculate the coefficient of variation (CV) of gene i in each ethnic population.
The mean intensity $M_i$ is calculated by $M_i = \frac{1}{n}\sum_{j=1}^{n} S_{ij}$.
The standard deviation $\sigma_i$ is calculated from the deviations of $S_{ij}$ from $M_i$ over the n samples.
The coefficient of variation $CV_i$ is calculated by $CV_i = \sigma_i / M_i$.
Here j = 1 to n, n represents the number of samples in a population, and $S_{ij}$ represents the expression signal of gene i in sample j. A greater CV implies higher expression variability of a gene across individuals within the corresponding population (see Additional file 4).
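The CV calculation can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' script: the function name is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator, as computed by `statistics.stdev`) is an assumption, because the normalization is not stated in the text.

```python
import statistics


def coefficient_of_variation(signals):
    """CV of one gene across the samples of one population.

    signals: expression signals S_ij of gene i for samples j = 1..n.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean_intensity = statistics.mean(signals)
    std_dev = statistics.stdev(signals)
    return std_dev / mean_intensity


# Three made-up expression signals for a single gene.
toy_cv = coefficient_of_variation([8.0, 10.0, 12.0])
```

Switching to `statistics.pstdev` would give the population form of the standard deviation; for the sample sizes used here (45 or 90 individuals) the two differ only slightly.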
Minor allele frequency (MAF) refers to the frequency at which the less common allele occurs in a given population. SNPs with a minor allele frequency of 5% or greater were targeted by the HapMap project and have been widely employed in genome-wide association studies (GWAS) for complex traits [62, 63].
MAF is calculated by $\mathrm{MAF} = \frac{\min(2N_{AA} + N_{Aa},\ 2N_{aa} + N_{Aa})}{2(N_{AA} + N_{Aa} + N_{aa})}$, where $N_{AA}$ represents the count of individuals who are homozygous for allele 1, $N_{Aa}$ represents the count of individuals who are heterozygous, and $N_{aa}$ represents the count of individuals who are homozygous for allele 2.
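Given the three genotype counts defined above, the MAF follows from the standard allele-counting definition. The sketch below assumes a biallelic SNP in diploid individuals; the function and parameter names are illustrative.

```python
def minor_allele_frequency(n_hom1, n_het, n_hom2):
    """Minor allele frequency from genotype counts of a biallelic SNP.

    n_hom1: individuals homozygous for allele 1.
    n_het:  heterozygous individuals.
    n_hom2: individuals homozygous for allele 2.
    """
    total_alleles = 2 * (n_hom1 + n_het + n_hom2)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    allele1_count = 2 * n_hom1 + n_het
    allele2_count = 2 * n_hom2 + n_het
    return min(allele1_count, allele2_count) / total_alleles


# Made-up genotype counts for 90 individuals.
toy_maf = minor_allele_frequency(50, 30, 10)
```

By construction the result never exceeds 0.5, and a SNP passes the 5% threshold mentioned above when the returned value is at least 0.05.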
The differential expression ratios (DER) of human genes were obtained from the study by Chen et al. (FitSNPs, http://fitsnps.stanford.edu/download.php). Briefly, the authors downloaded 476 human GEO datasets from the NCBI Gene Expression Omnibus and categorized each GEO dataset into 24 types of comparisons, such as disease state, cell type, metabolism and so on. A total of 4,877 subset-versus-subset comparisons were performed to identify differentially expressed genes with a cutoff of q value ≤ 0.05 using the SAM package. For each human gene, the number of GEO datasets in which it was differentially expressed was divided by the number of GEO datasets in which it was measured.
The gene symbols and EntrezGene IDs were transformed to their Ensembl gene IDs using the BioMart program. The Ensembl genes with available DERs were then intersected with the genes that were used for TargetScan5.1 prediction. Finally, the DER values of 9,784 genes that are not regulated by miRNAs and 8,979 target genes of miRNAs were obtained.
The Gene Ontology (GO) has developed three structured controlled vocabularies to describe gene products in terms of their associated biological processes, cellular components and molecular functions. The human gene association file was downloaded from http://www.geneontology.org/gene-associations/. For each GO term, the proportion of annotated genes was compared between the genes regulated exclusively by CNV-miRNAs and the genes regulated exclusively by non-CNV-miRNAs. The p-value was estimated by Fisher’s exact two-tailed test, and a cutoff of p ≤ 0.05 was used to identify the over-represented or under-represented GO terms among the genes that are regulated exclusively by CNV-miRNAs.
The project was started and completed at the Dalian Institute of Chemical Physics. Computations were performed on a Linux cluster with 50 nodes (Intel 5130, 2.0 GHz CPU, 4G memory, Laboratory of Molecular Modeling and Design, Dalian Institute of Chemical Physics, Chinese Academy of Sciences). Perl (http://perl.org) and R (http://www.r-project.org/) scripts were used for analysis, and can be obtained on request.
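The ratio described in the last sentence can be sketched as follows. The record layout (gene, dataset, q value) is hypothetical, and the sketch assumes that a dataset counts as "differentially expressed" for a gene when at least one of its comparisons passes the q ≤ 0.05 cutoff; the FitSNPs pipeline itself may handle multiple comparisons per dataset differently.

```python
from collections import defaultdict


def differential_expression_ratios(records, q_cutoff=0.05):
    """Per-gene DER: datasets with differential expression / datasets measured.

    records: iterable of (gene, dataset_id, q_value), one per comparison.
    Assumption: one passing comparison marks the whole dataset as differential.
    """
    measured = defaultdict(set)
    differential = defaultdict(set)
    for gene, dataset_id, q_value in records:
        measured[gene].add(dataset_id)
        if q_value <= q_cutoff:
            differential[gene].add(dataset_id)
    return {gene: len(differential[gene]) / len(datasets)
            for gene, datasets in measured.items()}


# Made-up records with placeholder gene and dataset identifiers.
toy_records = [("g1", "GDS1", 0.01), ("g1", "GDS1", 0.50), ("g1", "GDS2", 0.20),
               ("g2", "GDS1", 0.04), ("g2", "GDS3", 0.05)]
toy_der = differential_expression_ratios(toy_records)
```

A gene measured in many datasets but rarely differential gets a DER near 0, while a gene that changes in most of the datasets where it is measured gets a DER near 1.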
CNV: Copy number variation
CNV-miRNA: miRNA that is located in copy number variation regions
non-CNV-miRNA: miRNA that is not located in copy number variation regions
CEU: U.S. residents with northern and western European ancestry
YRI: Yoruba people of Ibadan, Nigeria
CHB: Chinese Han in Beijing
JPT: Japanese from Tokyo
CV: The coefficient of variation ratio
MAF: Minor allele frequency
This work was supported by funding from the “Hundred Talents Program” of the Chinese Academy of Sciences and the State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.