A genome-wide survey for SNPs altering microRNA seed sites identifies functional candidates in GWAS
© Richardson et al; licensee BioMed Central Ltd. 2011
Received: 10 May 2011
Accepted: 13 October 2011
Published: 13 October 2011
Gene variants within regulatory regions are thought to be major contributors of the variation of complex traits/diseases. Genome wide association studies (GWAS), have identified scores of genetic variants that appear to contribute to human disease risk. However, most of these variants do not appear to be functional. Thus, the significance of the association may be brought up by still unknown mechanisms or by linkage disequilibrium (LD) with functional polymorphisms. In the present study, focused on functional variants related with the binding of microRNAs (miR), we utilized SNP data, including newly released 1000 Genomes Project data to perform a genome-wide scan of SNPs that abrogate or create miR recognition element (MRE) seed sites (MRESS).
We identified 2723 SNPs disrupting, and 22295 SNPs creating MRESSs. We estimated the percent of SNPs falling within both validated (5%) and predicted conserved MRESSs (3%). We determined 87 of these MRESS SNPs were listed in GWAS association studies, or in strong LD with a GWAS SNP, and may represent the functional variants of identified GWAS SNPs. Furthermore, 39 of these have evidence of co-expression of target mRNA and the predicted miR. We also gathered previously published eQTL data supporting a functional role for four of these SNPs shown to associate with disease phenotypes. Comparison of FST statistics (a measure of population subdivision) for predicted MRESS SNPs against non MRESS SNPs revealed a significantly higher (P = 0.0004) degree of subdivision among MRESS SNPs, suggesting a role for these SNPs in environmentally driven selection.
We have demonstrated the potential of publicly available resources to identify high priority candidate SNPs for functional studies and for disease risk prediction.
microRNAs (miRs) are small 20-24 nucleotide (nt) noncoding RNAs that mediate translational repression by binding to miR recognition elements (MREs) found in the 3'UTR of their mRNA targets . The most critical region for binding and repression of mRNA by a miR are positions 2-7 of the MRE, referred to as the seed site. Although there are examples of miRs targeting mRNAs without perfect Watson-Crick complementarity to the MRE seed site (MRESS), a collection of evidence supports the MRESS as the most important feature for prediction and function. In some cases single 7mer seed sites are sufficient for a miR to repress translation, and ex-vivo experiments have shown single point mutations in the MRESS may reduce effectiveness or abolish miR mediated repression . Further highlighting the importance of this sequence, it has been demonstrated that a higher degree of negative selection occurs within predicted conserved MRESSs compared to conserved non MRESS control sites . Given the importance of the MRESS, it has been proposed that single nucleotide polymorphisms (SNPs) mapping within the MRESS, or which create novel MRESS (CNM), may have functional consequences resulting in phenotypic variation . Moreover, SNPs that create or abrogate MRESSs may modulate gene transcript and protein levels relevant to a phenotype of interest, generally, or under the influence of particular environmental conditions.
It has long been thought that disease causing variants act through alteration of exon sequence resulting in altered protein function. However, SNPs may also act to modulate gene expression, and this has been demonstrated for many promoter SNPs in which the risk allele alters the affinity of a transcription factor to its binding motif [5, 6]. Furthermore, several published examples show functional variants in MREs that modulate risk for a variety of disease states, such as breast cancer, Tourette's syndrome, and hypertension among others . Two studies have demonstrated a gene by environment interaction where a MRESS SNP modulates individual response to drug and dietary intakes [7–9]. A survey of the frequency of predicted and validated MRESS SNPs, identified an appreciable number of SNPs falling within MREs across the human genome . However, the number of risk alleles identified with plausible mechanisms for modulation of gene expression is outweighed by SNPs falling in gene desert regions . It could be that these SNPs fall within distant but bona fide enhancer or suppressor elements resulting in the modulation of gene expression, as was demonstrated for the variant within the 8q24 gene desert and its effects on TP53 expression in prostate . Alternatively, it may be these SNPs are in LD with variants not yet identified or available on GWAS chips. For example, sequencing of the HLA-C 3'UTR revealed a SNP modulating an MRE for the binding of miR-148. Furthermore, this SNP was shown to be in LD with rs9264942 which is found 35 kb upstream of HLA-C and associates with control of HIV . These data demonstrated that rs9264942 is a marker for a functional SNP that was not contained in commercial SNP arrays. Further underscoring this point, recent chromatin studies have identified novel non-coding gene regulatory regions, some of which contain top scoring hits for disease associating SNPs .
Currently over 1000 human miR sequences are reported in the miRbase catalog . Estimates suggest that over 30% of human protein-coding genes are regulated by miRs, and that each miR may potentially regulate hundreds of target transcripts [16, 17]. Given this large number of potential miR targets in the human genome, identifying allele-specific miR-mRNA interactions may help elucidate functional roles for a portion of the many SNPs identified in genome wide association studies (GWAS) that lack obvious functionality.
With such information in mind, one aim of the 1000 Genomes Project is to catalog over 95% of human variation in order to inform association studies of all potential causal SNPs . Furthermore, initial studies in the 1000 Genomes pilot indicated that a substantial number of variants are in LD with known disease markers and that these variants are not well covered on commercial arrays. Importantly, the data currently available in the 1000 Genomes Project provides unprecedented access to millions of SNPs, some of which may elucidate functional mechanisms for the many risk alleles identified in GWAS.
Here we have performed a genome-wide survey for SNPs falling within both experimentally validated and computationally predicted conserved MRESSs, by utilizing these data (dbSNP build132) . In addition to this analysis, we have surveyed these data for predicted CNM SNPs. Furthermore, we have examined all SNPs identified in GWAS for functional variants in relation to predicted MRESSs and CNM SNPs using the data from the 1000 Genomes Project. Combing with several other publically available data sources, we identified numerous MRESS SNPs as possible modulators of disease relevant phenotypes. Our work demonstrates the utility of the data generated from the 1000 Genomes Project and provides insight into the frequency and relevance of MRE SNPs in human disease and may provide some clues regarding environmentally driven human selection.
Results and Discussion
Approximately 5% of validated MREs contain SNPs in their seed site
SNPs found within validated MRESSs.
Pos in MRE
No population frequency data are available for 29% of the MRE SNPs (9 of 31), a value that will change as whole genome sequence data from more individuals surface and with completion of more encompassing GWAS studies. Nine SNPs have minor allele frequencies (MAF) < = 2%, and may be considered rare in the general population and therefore unlikely as common factors in complex disease. The 13 remaining SNPs have allele frequencies above 2% in at least one population listed in dbSNP (Table 1). Of note, 7 of the 31 (22%) SNPS identified have shown association with disease traits, emphasizing the potential importance of MRESS SNPs as modulators of disease risk.
Genome wide survey of predicted MRESS and CNM SNPs
Although many variants have been associated with the modulation of phenotypes relevant to disease in GWAS studies, the challenge of determining which of them may be casual remains . To identify potential causal variants mediated by MRESS creation or disruption we first performed a genome-wide survey for SNPs falling within computationally predicted conserved MRESSs. To do this, we utilized the http://microRNA.org portal to access a collection of predicted miR-mRNA interactions. These predictions were derived using an algorithm that incorporates an array of the most recent miR prediction guidelines, such as seed-site pairing, site context, free-energy, and target conservation across multiple vertebrates . We focused our analysis on MRESSs conserved across mammals as these are more likely to be of functional significance (see methods) . By comparing the gDNA coordinates of each predicted MRESS against the gDNA coordinates of dbSNP132 SNPs, we identified 2723 MRESS SNPs interrupting 5797 conserved predicted interactions. To further prioritize these hits, we classified them by the type of seed match the MRESS SNP was predicted to interrupt; 8mer (2245), 7mer-8m (3251), 7-mer-A1 (180) or 6mer (121). Although there is overlap in the degree of efficiency of repression by these different seed type classes (likely dependent of site sequence context), there remains a hierarchy with 8mer sites being most efficient . Interestingly 38% (2245) of MRESS SNPs fall within predicted 8mer MRESSs. It has been estimated that ~50% of predicted MREs are potentially functional and it is likely that a portion of the SNPs identified here fall within bona fide MREs . Overall, we estimate that 3% of high confidence predicted conserved MRESSs contain SNPs.
In addition to SNPs that may interrupt MRESSs, SNP alleles may also create MRESSs. To identify potential CNM SNPs we performed a genome-wide computational survey for predicted MRESSs that are created when the mRNA sequence contains the non-reference allele of hg19. Using the Ensembl variation API tools we retrieved the flanking 22 nt sequence from both upstream and downstream of the non-reference allele of every 3'UTR dbSNP132 SNP. Each sequence containing the non-reference allele was analyzed for potential miR-target sites using the miRanda software . This analysis provided us with 22295 CNM SNP creating 49047 miR-mRNA predictions which where also categorized by the seed-type they created; 8mer (10333), 7mer-8m (36188), and 7-mer-A1 (526) or 6mer (2000). It should be noted that there are many more predictions for CNM than MRESS SNPs. This is due to the fact that no conservation constraint was imposed on the CNM SNP predictions (See methods). Considering that many CNM SNPs presumably arise to create new regulatory sites, filtering our hits on conservation status would be counter-intuitive. Of note, we found that approximately 28% (6946/25018) of predicted MRESS and CNM SNPs identified here were first identified by the 1000 Genomes project.
Some SNPs identified in GWAS are in LD with predicted conserved MRESS and CNM SNPs
GWAS have been a powerful approach to identify genetic variants that contribute to disease risk. However, a functional role for many of the SNPs identified has not been elucidated. It is likely some of these SNPs are in strong LD with unknown functional ones, some of which could be among the predicted conserved MRESS and CNM SNPs. To investigate this possibility, we searched the resulting MRESS and CNM SNP data for variants in LD with SNPs showing association, of GWAS significance, with disease traits and related phenotypes. To do so, we retrieved a dataset of 4817 reported associations, collected from GWAS studies, between 3943 unique SNPs and disease traits shown to have P-values meeting a threshold of < 1.0 × 10-5 . These data were processed through SNAP http://www.broadinstitute.org/mpg/snap/ldsearch.php which yielded a list of SNPs (including those from the pilot 1000 Genomes Project data) in LD with those reported in the GWAS. We limited our search to an r2 of > 0.8 for the CEU population. The results of this query were searched against both the MRESS SNP and the CNM SNP predictions. This query identified 35 instances of an MRESS SNP (some MRESS SNPs are in LD with more than one reported GWAS SNP associating with multiple phenotypes), in LD (r2 > 0.8) with a least one reported GWAS SNP or an original GWAS SNP, associating with disease phenotypes (Additional file 1). In total there were 14 MRESS SNPs in 11 genes associating with 16 traits. We also identified 124 instances of a GWAS SNP that associates with disease traits and is in LD (r2 > 0.8) with CNM SNPs (Additional file 2). There were 73 CNM SNPs in 73 genes associating with 52 traits.
In total we identified 87 SNPs (14 MRESS and 73 CNM SNPs) in very strong LD with SNPs reported as associating with disease related phenotypes. These 87 SNPs represent 2.22% of the total unique SNPs reported in the GWAS data. Using the SNAP pairwise LD tool we determined that 6 of the 3943 GWAS SNPs are in LD with each other, giving us 3940 SNPs or regions associating with disease traits. To determine the possibility of this number occurring by chance we first filtered dbSNP 132 for SNPs having a minor allele frequency (MAF) > = 1%, which was the lowest MAF reported in the GWAS data. We next selected randomly 3940 SNPs and, as we did with the GWAS data, ran them through SNAP to determine the SNPs in LD. From this list we determined the number of SNPs found in our MRESS and CNM SNP data, and this analysis was repeated 1000 times. From these 1000 simulations the mean number of SNPs found in the MRESS and CNM SNP data was 24.99 and the standard deviation 6.19. The probability of finding 87 SNPs by chance based on this distribution was calculated to be 1.08 × 10-23. These SNPs may be considered the likely putative functional variants which represent proxy SNPs identified by GWAS.
Co-expression data identify functional candidates
MRESS SNPs in LD with GWAS variants and showing co-expression of miR and mRNA.
Primary tooth dev
Systemic lupus erythematosus
Chronic kidney disease and serum creatinine concentration
Type 2 diabetes
eQTL data support several MRE target predictions when a MRESS or CNM SNP is present
Variation in gene transcript levels is thought to be an important modulator of disease risk in humans and SNPs that may mediate this variation are thought to be of great functional significance . To investigate the contribution of SNPs associating with disease traits to transcript level variation, a number of expression Quantitative Trait Loci (eQTL) studies have been performed [27–29]. These results have demonstrated a number of SNPs associating significantly with expression differences across collected tissue samples. Importantly, these studies have noted differences in the amount of transcript variation across tissue samples, suggesting SNPs may modulate regulatory mechanisms, in some cases, in a tissue specific manner . Interestingly, a recent study has estimated that > 80% of miRs act to lower mRNA levels demonstrating mRNA destabilization is the primary mode of action of miRs on target mRNAs .
SYS1 transcript levels were shown to be significantly different among rs2245717 genotypes from LCL in one of the two twin study groups (Figure 2). Although, the second twin group failed to achieve significance, the direction of the effect was in agreement with the first group. Furthermore, the lower transcript levels associate with the allele predicted to create the miR-150 binding site. Neither of the two twin adipose tissue sample groups for GFOD2 transcript levels showed a significant difference among rs12449157 genotypes. Interestingly, both p-values are of nominal significance and the direction of the effect supports the predicted miR interaction and subsequent effect of the CNM SNP (Figure 2). IKZF3 transcript levels measured in LCL cells showed significant differences among the rs907091 genotypes in both twin groups. Lower IKZF3 levels were observed in carriers of the G allele which is predicted to create a miR-326 MRESS. However, a second probe found on the Illumina whole genome expression array, used in the study, shows conflicting data where there is no difference between transcript levels in either group. The rs3810291 SNP in the ZC3H4 3'UTR shows no significant difference in transcript levels between alleles in adipose samples used in the MuTHER study. However, literature mining for "eQTL, " and the corresponding gene and phenotype identified an additional study showing eQTL data that supports an allelic difference, in the correct direction, and in adipose tissue for rs3810291 . Taken together, this information suggests these four SNPs may have functional significance.
MRESS and CNM SNPs in Positive Selection
MRESS and CNM SNPs showing highest levels of population sub-division.
In the work presented here, we utilized the latest release of dbSNP, including the 1000 Genomes Project data, to perform a genome-wide scan of human variation within validated and predicted miR binding sites, our hypothesis being that genetic variants at miR binding sites are functional, and important contributors to phenotypic variation and disease susceptibility. We have taken careful measures to assign SNPs as creating or altering miR-mRNA interactions. We identified 5797 instances of a SNP falling within a conserved predicted MRESS based on stringent filtering of conservation and interaction scores predicted by Betel et al . Interestingly, 38% of these predicted disruptions were identified in 8mer target predictions. 8mer target sites have been shown to have the highest efficacy of target repression and therefore are considered higher priority predictions than those with lesser complimentarity . Overall, we estimate that 3% of predicted conserved MRESSs contain SNPs. Our analysis also identified 49407 instances of a SNP creating an MRESS. Given that no conservation restraint was utilized for identification of these CNM SNPs, we must be cautious for it is likely many of them are potential false positives. We also determined that 87 of the MRESS and CNM SNPs identified are in LD with SNPs identified in GWAS. We demonstrate that 2.2% of GWAS SNPs are, or are in LD with, MRESS or CNM SNPs. However, this may be a conservative estimate, given that 1) we limited our SNP selection based on conservation and other strict cutoffs, 2) the catalog of GWAS SNPs investigated is not all encompassing 3) that these GWAS studies do not consider gene by environment interactions and 4) LD estimates only cover SNPs up to the 1000 Genomes Project pilot study 1 data.
Recently, it has been demonstrated that SNPs previously identified in GWAS are in LD with SNPs found in enhancer motifs regulating gene expression . Furthermore, other studies have linked SNPs falling in gene regulatory motifs, and not found on commercial SNP arrays, to be in LD with top scoring GWAS hits [12, 13]. In a similar fashion, we suggest that many of the SNPs found in this study to be in LD with GWAS SNPs may have functional significance. To further explore this possibility we utilized several publicly available data sets and tools and showed 39 of these 87 variants found to have evidence of co-expression of target mRNA and the predicted miR. We found that four SNPs from this list have supporting eQTL data demonstrating variation in transcripts between alleles.
Our analyses have identified four SNPs predicted to modulate allele-specific miR-mRNA interactions which are supported by co-expression and eQTL data. The rs907091 SNP falls in the IZKF3 transcript and is in LD (r2 > 0.90) with eight SNPs associating with increased risk for a variety of autoimmune diseases. IZKF3 is a transcription factor important for B-cell activation, and mice lacking this gene develop a lupus like syndrome, suggesting a role for IZKF3 in autoimmunity . The rs907091 minor T allele is predicted to create a CNM for mir-326. There is evidence for expression of miR-326 and IZKF3 in human B-lymphocytes. Interestingly, miR-326 is important for T-cell differentiation and has been implicated in the pathogenesis of autoimmune multiple sclerosis . A study investigating transcript levels between the T and C alleles of rs907091 in a lymphoblastoid cell line (LCL) demonstrate significantly lower levels of IZKF3 in subjects carrying the T allele . These data suggest that carriers of the T allele may have reduced levels of IZKF3, in part through miR-326.
In addition the minor allele of rs3810291 is predicted to create an MRE for mir-502-3p within the ZC3H4 transcript and associates with BMI. ZC3H4 is a poorly characterized zinc finger protein. There is eQTL evidence supporting this prediction where minor allele carriers have reduced ZC3H4 expression compared to non-carriers, in adipose tissue . Both mir-502-3p and ZC3H4 are expressed in adipose . The rs2245717 SNP, predicted to create an MRE for miR-155 in the SYS1 transcript, is in perfect LD with rs1008953 a SNP associating with psoriasis . The MRE-creating allele of SYS1 is also associated with lower SYS1 transcript levels in LCL cells . While a role for SYS1 in immune function could not be found in the literature, it is known that miR-155 is involved in the immune response . The rs12449157 SNP is found in the poorly characterized glucose-fructose oxidoreductase domain containing 2 (GFOD2) transcript showing association with HDL-C . Our analysis predicts that the minor allele of rs12449157 creates a CNM for mir-125a-3p and that it is associated with reduced GFOD2 levels. Interestingly, both RNAs are expressed in adipose tissue . Further, we identify rs12449157 as an FST outlier suggesting this SNP may be undergoing population specific selection.
In addition to these four SNPs, we identified 39 others with data indicating co-expression with the predicted target mRNA and these should be considered as candidates for functional studies. Of these 39 candidates, a SNP within the HOXB2 loci has shown eQTL peaks identified from lymphoblastoid cell lines [28, 46]. While our analysis has generated many MRESS and CNM SNP predictions for which no miR expression data are available, it is likely that as more miR expression and eQTL data become accessible, particularly for different cell types and specific conditions, many of these SNPs could be seen as functionally relevant. Recent data indicate some miRs may act intracellularly, carried by HDL particles to recipient cells . Therefore, it may be that co-expression is not essential for all predicted miR-mRNA interactions.
As new variants arise in a population and are exposed to different environmental conditions, those variants may be subject to forces of selection. Moreover, if these SNPs alter gene expression they may modulate the individual's response to the environment and potentially the risk for particular disease state. Based on this, we hypothesized that allele-specific miR-mRNA interactions would show a greater level of selection than SNPs not classified as MRESS SNPs. We show that, as a group, predicted MRESS and CNM SNPs have a significantly higher mean FST than do those SNPs which do not create or disrupt a predicted MRESS. We identify those MRESS and CNM SNPs showing the highest degree of population subdivision and suggest these SNPs and the interactions they are predicted to modulate, as candidates for functional studies.
We show that the frequency of MRESS SNPs in validated MREs (5.5 SNPs/kb) is less than in surrounding regions and this supports prior work showing a higher degree of negative selection on MRESSs [3, 10]. Although the level of variation within this region is lower, we do show that the occurrence of variation across validated MRESSs is not rare (~5%). Supporting the notion that miR SNPs are high priority candidates for functional consequence we show that 22% of SNPs falling within validated MRESSs have reported associations related to a disease phenotype or risk. Of note, our results differ somewhat from the MRESS SNPs reported in Saunders, et al . This is most likely due to the fact that we utilized a more current database of validated MRE targets, and also that we required functional evidence of MRESS for inclusion.
There are several web based MRE SNP prediction databases available to query a SNP for creation or disruption of a MRESS, however these tools incorporate a relatively limited amount of functional annotation (GWAS, co-expression and eQTL data) for identification of the most promising MRESS SNPs [48–50]. SNPinfo, is a web tool which offers the calculation of LD between query SNPs and GWAS SNPs in addition to functional prediction of these SNPs for abrogation or creation of potential MRESS . Approximately 70% of the SNPs found in Tables S1 and S2 are also identified at SNPinfo web portal as being a SNP in LD with a GWAS SNP, and SNPinfo also includes prediction of that SNP as a MRESS SNP. Importantly, our work differs from what may be found at SNPinfo and others, in that we present a more comprehensive summary of potential MRESSs SNPs, being the first to investigate 1000 Genomes data for MRESS SNPs. Furthermore, we use this MRESS SNP information in combination with a variety of publically available web tools and data sets (including co-expression and eQTL data), not currently incorporated in other resources, to determine which of these SNPs are most likely functional. Our data demonstrates the utility of using multiple publically available datasets and resources to identify functional candidates.
In Summary, we have surveyed the most current human SNP data and identified variants that provide functional hypotheses for observed GWAS associations. Our work also suggests that a considerable number of SNPs create or abrogate MREs in the human genome. Our results further suggest MRE SNPs that modulate gene expression are likely to be under selective pressure. With relevance to human disease we show that publicly available resources can be used to identify high priority candidate SNPs for functional studies.
Retrieval and use of dbSNP information
We retrieved all dbSNP build 132 SNP (as of 11-31-10) information by downloading the vcf file available through the 1000 genomes home page . To ensure we only surveyed variation in the form of SNPs and not indels and/or copy number variants we removed all SNPs not reported as bi-allelic. A subset of data containing all 3'UTR SNPs (n = 210042) was extracted using Perl. This dataset was used for all subsequent analyses. To determine the percentage of SNPs that were submitted to dbSNP by the 1000 Genomes Project we used UCSC Genome Browser to identify all dbSNP build 132 SNPs where the submitter status handle was equal to only the 1000GENOMES tag. This data set was then searched against our MRESS and CNM data sets to identify those SNPs contributed by the 1000 Genomes Project. To retrieve allele frequency data on SNPs reported in MRESSs and CNMs, we utilized the Perl API variation tools accessing the latest human genome variation data, build 61. All data analysis was performed on the NUGO information network .
Identifying MRE SNPs in validated targets
miRecords hosts a collection of validated miR-mRNA interactions built from an exhaustive literature search and the database of records was download in a tab-delimited format . We next annotated each hit for target site functionality, by checking the literature source for evidence of a loss of function experiment, which provided us with 606 validated MRE targets. To identify SNPs falling within these 606 validated targets sites, a Perl script was written to search each SNP gDNA coordinate against the gDNA coordinates of each target transcript MRESS, retrieved from Ensembl.
SNP Density determination
To calculate SNP density, a Perl script was written to perform a sliding window search (W = 6 bases as this corresponds in size to bases 2-7 of the MRESS) of the 606 validated MREs for SNPs, starting 18 bases upstream of the first base of the MRESS and ending 18 bases downstream of the last base of the MRESS. We report the number of SNPs for each position of the window across this sequence. Values are reported as SNPs/kb.
Identification of conserved MRESS and CNM SNPs
To identify SNPs falling in conserved MRESSs, we downloaded the "good mirsrv_score conserved miRNA" datafile from the http://microrna.org website. This file contains all predicted mRNA target motifs for targeting microRNAs which belong to conserved microRNA families. Conservation signal is used to predict functional MREs, however it has been determined that a conservation signal above background for MREs of the most recent mammalian specific miRNA families (non-conserved) was unlikely due to the relatively short time between the emergence of these miRs and the occurrence of new MREs within 3 'UTRs . Therefore, to eliminate false positives that would arise from this form of analysis we utilize predictions for only conserved miR families - which are contained in the "good mirsrv_score conserved miRNA" datafile. To add an additional measure of conservation, we implemented a conservation score cutoff for predicted miR targets using a Phastcon score of > = 0.57, which authors of the work estimate corresponds to conservation across the mammalian genomes used in their study [21, 52, 53]. The Phastcon scores are provided in the predictions data file from http://microRNA.org. Additionally, predictions generated by the http://microRNA.org tool provide a ranking score (mirSVR score) which is calibrated to correlate linearly with the extent of down regulation of a miR on its target. Importantly, these scores may be interpreted as an empirical probability of down regulation. From these data we selected a mirsvr score cutoff < = -0.6, representing the top 12% of all predictions. A Perl script was then used to compare the gDNA coordinates of each predicted MRESS (n = 197287) against the gDNA coordinates of each 3'UTR SNP in dbSNP132.
To identify CNM SNPs, we utilized the Ensembl variation Perl API tools (Build 61) to retrieve the 22 bases flanking the 5' and 3' regions of every 3'UTR SNP in dbSNP 132. We generated the reverse complement for those mRNAs transcribed from the negative strand. These data were then run locally through the miRanda target prediction algorithm. To limit identification of potential false positives we implemented an arbitrary paring score cutoff of > = 150 and an energy cutoff of < = -20. We identified all predicted MRESS created by CNM SNPs by filtering hits on the position of their target prediction on the mRNA, where each SNP is located at position 23 of 45.
Retrieval of GWAS results and LD calculations
We first downloaded a catalog of human variation associating with disease phenotypes (1-25-11) . The list was then submitted to the SNAP tool http://www.broadinstitute.org/mpg/snap/ldsearch.php using r2 of > 0.8 for the CEU population. The resulting list was then searched against each of the CNM and MRESS SNP lists to identify 87 MRESS and CNM SNPs. To determine the probability of observing this number by chance, 3940 SNPs were randomly selected from the dbSNP build 132 data set filtered for SNPs with a MAF > = 1% and run through the SNAP tool to identify all SNPS in LD, using r2 of > 0.8 for the CEU population. This procedure was repeated 1000 times. The probability of observing 87 MRESS or CNM SNPs randomly from the genome was determined based on the normal distribution generated from the 1000 simulations.
We utilized the mimiRNA web tool to identify miR-mRNA predictions with co-expression evidence . The mimiRNA tool provides expression data for 564 mRNAs and 636 miRs, normalized across samples, from four large scale miR expression studies, and one mRNA expression study. We queried each miR-mRNA pair for co-expression using the tools provided on the webpage. Because not all of the miRs implicated in our work are in the mimiRNA dataset, we also searched the literature via PubMed using the search terms of the miR-name and the term "expression." To determine if the matching gene was expressed in the same tissue type, we queried the GEOprofile database.
To identify association of transcript levels with MRESS and CNM SNPs we searched eQTL data from the MuTHER study using the Genevar web tool [29, 32]. eQTL data was generated from Fat cell biopsy (n = 160), LCL cells (n = 166) and skin punch biopsy (n = 160) taken from healthy adult female twins (both mono and di-zygotic). Twin pairs where separated in two to unrelated groups, thereby performing 2 independent eQTL analysis, as described in Nica, et al.
Genevar provides Spearman's rank correlation coefficient estimates for the strength of relationships between alleles and gene expression intensities for each study group. Furthermore, to test the significance of the relationship, Genevar generates a t-statistic for correlation analysis. Adjusted non-parametric permutation P-values are also provided .
Genotype characteristics of 11 HapMap Phase 3 populations were split into 4 groups of similar ancestry; Asian, African, European and American. FST values where calculated for each HapMap Phase 3 SNP between these 4 groups and reported in a downloadable file . Using a Perl script we extracted all MRESS and CNM SNPs with FST values from this dataset. We used the statistical analysis software (SAS) boxcoxar macro to transform the FST data to fit a normal distribution. We then performed an unpaired Student's t-test using the transformed values for these two groups to determine if they were significantly different. To identify MRESS and CNM FST outliers we selected those SNPs with FST > 2 SDs from the mean.
Authors have no competing interests to declare.
Funding and Acknowledgements
This work was supported by National Heart, Lung, and Blood Institute grants HL-54776; National Institute of Diabetes and Digestive and Kidney Diseases, Grant Number DK075030; and by contracts 53-K06-5-10 and 58-1950-9-001 from the US Department of Agriculture Research.
- Bartel DP: MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136: 215-233. 10.1016/j.cell.2009.01.002.PubMed CentralPubMedView ArticleGoogle Scholar
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol. 2005, 3: e85-10.1371/journal.pbio.0030085.PubMed CentralPubMedView ArticleGoogle Scholar
- Chen K, Rajewsky N: Natural selection on human microRNA binding sites inferred from SNP data. Nat Genet. 2006, 38: 1452-1456. 10.1038/ng1910.PubMedView ArticleGoogle Scholar
- Sethupathy P, Collins FS: MicroRNA target site polymorphisms and human disease. Trends in genetics: TIG. 2008, 24: 489-497. 10.1016/j.tig.2008.07.004.PubMedView ArticleGoogle Scholar
- Botma GJ, Verhoeven AJ, Jansen H: Hepatic lipase promoter activity is reduced by the C-480T and G-216A substitutions present in the common LIPC gene variant, and is increased by Upstream Stimulatory Factor. Atherosclerosis. 2001, 154: 625-632. 10.1016/S0021-9150(00)00478-0.PubMedView ArticleGoogle Scholar
- Talmud PJ, Palmen J, Putt W, Lins L, Humphries SE: Determination of the functionality of common APOA5 polymorphisms. J Biol Chem. 2005, 280: 28215-28220. 10.1074/jbc.M502144200.PubMedView ArticleGoogle Scholar
- Mishra PJ, Bertino JR: MicroRNA polymorphisms: the future of pharmacogenomics, molecular epidemiology and individualized medicine. Pharmacogenomics. 2009, 10: 399-416. 10.2217/14622418.104.22.1689.PubMed CentralPubMedView ArticleGoogle Scholar
- Sethupathy P, Collins FS: MicroRNA target site polymorphisms and human disease. Trends Genet. 2008, 24: 489-497. 10.1016/j.tig.2008.07.004.PubMedView ArticleGoogle Scholar
- Richardson K, Louie-Gao Q, Arnett D, Parnell L, Lai C-Q, Davalos A, Fox C, Demissie S, Cupples L, Fernandez-Hernando C, Ordovas J: The PLIN4 Variant rs8887 Modulates Obesity Related Phenotypes in Humans through Creation of a Novel miR-522 Seed Site. PLoS ONE. 2011, 6: e17944-10.1371/journal.pone.0017944.PubMed CentralPubMedView ArticleGoogle Scholar
- Saunders MA, Liang H, Li WH: Human polymorphism at microRNAs and microRNA target sites. Proceedings of the National Academy of Sciences of the United States of America. 2007, 104: 3300-3305. 10.1073/pnas.0611347104.PubMed CentralPubMedView ArticleGoogle Scholar
- Visel A, Rubin EM, Pennacchio LA: Genomic views of distant-acting enhancers. Nature. 2009, 461: 199-205. 10.1038/nature08451.PubMed CentralPubMedView ArticleGoogle Scholar
- Wasserman NF, Aneas I, Nobrega MA: An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res. 2010, 20: 1191-1197. 10.1101/gr.105361.110.PubMed CentralPubMedView ArticleGoogle Scholar
- Kulkarni S, Savan R, Qi Y, Gao X, Yuki Y, Bass SE, Martin MP, Hunt P, Deeks SG, Telenti A, et al: Differential microRNA regulation of HLA-C expression and its association with HIV control. Nature. 2011Google Scholar
- Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011Google Scholar
- Kozomara A, Griffiths-Jones S: miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011, 39: D152-157. 10.1093/nar/gkq1027.PubMed CentralPubMedView ArticleGoogle Scholar
- Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009, 19: 92-105.PubMed CentralPubMedView ArticleGoogle Scholar
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.PubMedView ArticleGoogle Scholar
- Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Gibbs RA, Hurles ME, McVean GA: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073. 10.1038/nature09534.PubMedView ArticleGoogle Scholar
- Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T: miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 2009, 37: D105-110. 10.1093/nar/gkn851.PubMed CentralPubMedView ArticleGoogle Scholar
- On beyond GWAS. Nat Genet. 2010, 42: 551-
- Betel D, Koppal A, Agius P, Sander C, Leslie C: Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010, 11: R90-10.1186/gb-2010-11-8-r90.PubMed CentralPubMedView ArticleGoogle Scholar
- Rajewsky N: microRNA target predictions in animals. Nat Genet. 2006, 38 (Suppl): S8-13.PubMedView ArticleGoogle Scholar
- Stark A, Brennecke J, Russell RB, Cohen SM: Identification of Drosophila MicroRNA targets. PLoS Biol. 2003, 1: E60-10.1371/journal.pbio.0000060.PubMed CentralPubMedView ArticleGoogle Scholar
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences of the United States of America. 2009, 106: 9362-9367. 10.1073/pnas.0903103106.PubMed CentralPubMedView ArticleGoogle Scholar
- Ritchie W, Flamant S, Rasko JE: mimiRNA: a microRNA expression profiler and classification resource designed to identify functional correlations between microRNAs and their targets. Bioinformatics. 2010, 26: 223-227. 10.1093/bioinformatics/btp649.PubMedView ArticleGoogle Scholar
- Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.PubMed CentralPubMedView ArticleGoogle Scholar
- Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG: Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 2007, 39: 226-231. 10.1038/ng1955.PubMed CentralPubMedView ArticleGoogle Scholar
- Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M, et al: A genome-wide association study of global gene expression. Nat Genet. 2007, 39: 1202-1207. 10.1038/ng2109.PubMedView ArticleGoogle Scholar
- Nica AC, Parts L, Glass D, Nisbet J, Barrett A, Sekowska M, Travers M, Potter S, Grundberg E, Small K, et al: The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet. 2011, 7: e1002003-10.1371/journal.pgen.1002003.PubMed CentralPubMedView ArticleGoogle Scholar
- Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, et al: Common regulatory variation impacts gene expression in a cell type-dependent manner. Science. 2009, 325: 1246-1250. 10.1126/science.1174148.PubMed CentralPubMedView ArticleGoogle Scholar
- Guo H, Ingolia NT, Weissman JS, Bartel DP: Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010, 466: 835-840. 10.1038/nature09267.PubMed CentralPubMedView ArticleGoogle Scholar
- Yang TP, Beazley C, Montgomery SB, Dimas AS, Gutierrez-Arcelus M, Stranger BE, Deloukas P, Dermitzakis ET: Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics. 2010, 26: 2474-2476. 10.1093/bioinformatics/btq452.PubMed CentralPubMedView ArticleGoogle Scholar
- Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Allen HL, Lindgren CM, Luan J, Magi R, et al: Association analyses of 249, 796 individuals reveal 18 new loci associated with body mass index. Nature genetics. 2010, 42: 937-948. 10.1038/ng.686.PubMed CentralPubMedView ArticleGoogle Scholar
- Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, et al: Natural selection on protein-coding genes in the human genome. Nature. 2005, 437: 1153-1157. 10.1038/nature04240.PubMedView ArticleGoogle Scholar
- Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, et al: A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005, 3: e170-10.1371/journal.pbio.0030170.PubMed CentralPubMedView ArticleGoogle Scholar
- Akey JM, Zhang G, Zhang K, Jin L, Shriver MD: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814. 10.1101/gr.631202.PubMed CentralPubMedView ArticleGoogle Scholar
- Myles S, Davison D, Barrett J, Stoneking M, Timpson N: Worldwide population differentiation at disease-associated SNPs. BMC Med Genomics. 2008, 1: 22-10.1186/1755-8794-1-22.PubMed CentralPubMedView ArticleGoogle Scholar
- Kullo IJ, Ding K: Patterns of population differentiation of candidate genes for cardiovascular disease. BMC Genet. 2007, 8: 48-PubMed CentralPubMedView ArticleGoogle Scholar
- Cheng F, Chen W, Richards E, Deng L, Zeng C: SNP@Evolution: a hierarchical database of positive selection on the human genome. BMC evolutionary biology. 2009, 9: 221-10.1186/1471-2148-9-221.PubMed CentralPubMedView ArticleGoogle Scholar
- Sun J, Matthias G, Mihatsch MJ, Georgopoulos K, Matthias P: Lack of the transcriptional coactivator OBF-1 prevents the development of systemic lupus erythematosus-like phenotypes in Aiolos mutant mice. J Immunol. 2003, 170: 1699-1706.PubMedView ArticleGoogle Scholar
- Du C, Liu C, Kang J, Zhao G, Ye Z, Huang S, Li Z, Wu Z, Pei G: MicroRNA miR-326 regulates TH-17 differentiation and is associated with the pathogenesis of multiple sclerosis. Nat Immunol. 2009, 10: 1252-1259. 10.1038/ni.1798.PubMedView ArticleGoogle Scholar
- Estep M, Armistead D, Hossain N, Elarainy H, Goodman Z, Baranova A, Chandhoke V, Younossi ZM: Differential expression of miRNAs in the visceral adipose tissue of patients with non-alcoholic fatty liver disease. Alimentary pharmacology & therapeutics. 2010, 32: 487-497. 10.1111/j.1365-2036.2010.04366.x.View ArticleGoogle Scholar
- Stuart PE, Nair RP, Ellinghaus E, Ding J, Tejasvi T, Gudjonsson JE, Li Y, Weidinger S, Eberlein B, Gieger C, et al: Genome-wide association analysis identifies three psoriasis susceptibility loci. Nat Genet. 2010, 42: 1000-1004. 10.1038/ng.693.PubMed CentralPubMedView ArticleGoogle Scholar
- Tsitsiou E, Lindsay MA: microRNAs and the immune response. Curr Opin Pharmacol. 2009, 9: 514-520. 10.1016/j.coph.2009.05.003.PubMed CentralPubMedView ArticleGoogle Scholar
- Waterworth DM, Ricketts SL, Song K, Chen L, Zhao JH, Ripatti S, Aulchenko YS, Zhang W, Yuan X, Lim N, et al: Genetic variants influencing circulating lipid levels and risk of coronary artery disease. Arteriosclerosis, thrombosis, and vascular biology. 2010, 30: 2264-2276. 10.1161/ATVBAHA.109.201020.PubMed CentralPubMedView ArticleGoogle Scholar
- Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME: Evaluation of genetic variation contributing to differences in gene expression between populations. Am J Hum Genet. 2008, 82: 631-640. 10.1016/j.ajhg.2007.12.015.PubMed CentralPubMedView ArticleGoogle Scholar
- Vickers KC, Palmisano BT, Shoucri BM, Shamburek RD, Remaley AT: MicroRNAs are transported in plasma and delivered to recipient cells by high-density lipoproteins. Nat Cell Biol. 2011, 13: 423-433. 10.1038/ncb2210.PubMed CentralPubMedView ArticleGoogle Scholar
- Bao L, Zhou M, Wu L, Lu L, Goldowitz D, Williams RW, Cui Y: PolymiRTS Database: linking polymorphisms in microRNA target sites with complex traits. Nucleic Acids Res. 2007, 35: D51-54. 10.1093/nar/gkl797.PubMed CentralPubMedView ArticleGoogle Scholar
- Barenboim M, Zoltick BJ, Guo Y, Weinberger DR: MicroSNiPer: a web tool for prediction of SNP effects on putative microRNA targets. Hum Mutat. 2010, 31: 1223-1232. 10.1002/humu.21349.PubMed CentralPubMedView ArticleGoogle Scholar
- Xu Z, Taylor JA: SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 2009, 37: W600-605. 10.1093/nar/gkp290.PubMed CentralPubMedView ArticleGoogle Scholar
- Harttig U, Travis AJ, Rocca-Serra P, Renkema M, van Ommen B, Boeing H: Owner controlled data exchange in nutrigenomic collaborations: the NuGO information network. Genes Nutr. 2009, 4: 113-122. 10.1007/s12263-009-0123-8.PubMed CentralPubMedView ArticleGoogle Scholar
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050. 10.1101/gr.3715005.PubMed CentralPubMedView ArticleGoogle Scholar
- Betel D, Wilson M, Gabow A, Marks DS, Sander C: The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008, 36: D149-153.PubMed CentralPubMedView ArticleGoogle Scholar