miRdSNP: a database of disease-associated SNPs and microRNA target sites on 3'UTRs of human genes
© Bruno et al; licensee BioMed Central Ltd. 2012
Received: 22 September 2011
Accepted: 25 January 2012
Published: 25 January 2012
Single nucleotide polymorphisms (SNPs) can contribute to disease susceptibility and onset through their effects on gene expression at the posttranscriptional level. Recent findings indicate that SNPs can create, destroy, or modify the efficiency of miRNA binding to the 3'UTR of a gene, resulting in gene dysregulation. With the rapidly growing number of published disease-associated SNPs (dSNPs), there is a strong need for resources that specifically record dSNPs on 3'UTRs and their nucleotide distance from miRNA target sites. We present here miRdSNP, a database that integrates three areas: dSNPs, miRNA target sites, and diseases.
miRdSNP provides a unique database of dSNPs on the 3'UTRs of human genes, manually curated from PubMed. The current release includes 786 dSNP-disease associations for 630 unique dSNPs and 204 disease types. miRdSNP annotates genes with experimentally confirmed targeting by miRNAs and indexes miRNA target sites predicted by TargetScan and PicTar, as well as potential miRNA target sites newly generated by dSNPs. A robust web interface and search tools are provided for studying the proximity of miRNA binding sites to dSNPs in relation to human diseases. Searches can be dynamically filtered by gene name, miRBase ID, target prediction algorithm, disease, and any nucleotide distance between dSNPs and miRNA target sites. Results can be viewed at the sequence level, showing the annotated locations of miRNA target sites and dSNPs on the entire 3'UTR sequence. Integration of dSNPs with the UCSC Genome Browser is also supported.
miRdSNP provides a comprehensive data source of dSNPs and robust tools for exploring their distance from miRNA target sites on the 3'UTRs of human genes. miRdSNP enables researchers to further explore the molecular mechanism of gene dysregulation for dSNPs at the posttranscriptional level. miRdSNP is freely available on the web at http://mirdsnp.ccr.buffalo.edu.
Single nucleotide polymorphisms (SNPs) underlie disease susceptibility through their effects on protein function and gene expression. Most identified mutations are non-synonymous SNPs that result in amino acid changes in proteins. It is also well known that non-coding disease-associated SNPs (dSNPs) within regulatory regions of the genome can result in gene dysregulation at either the transcriptional or the posttranscriptional level. One potential source of the latter is SNPs that create, destroy, or modify the efficiency of miRNA binding to the 3'UTR of a gene. Supporting this idea, SNPs within the miRNA target sites of genes have been implicated in hippocampal sclerosis, Parkinson disease, Tourette's syndrome, asthma, cardiovascular disease, neurodegenerative disease, periodontal diseases, tumor susceptibility, and various types of cancer [9–12]. SNPs outside miRNA binding sites can also affect miRNA function. One recent finding demonstrates that a polymorphism outside the miR-24 binding site in the 3'UTR of human dihydrofolate reductase (DHFR) affects DHFR expression by interfering with miR-24 function, resulting in DHFR over-expression and methotrexate resistance. There is also a report suggesting that SNPs within a certain region on both sides of miRNA target sites have the greatest influence on miRNA binding to the target sites, and that SNPs on the rest of the 3'UTR sequence affect miRNA function as well.
A few databases have been built to help researchers explore the impact of SNPs on the binding of miRNAs to their targets. PolymiRTS catalogs polymorphisms in putative miRNA target sites and their involvement in quantitative trait locus effects, while the Patrocles database compiles DNA sequence polymorphisms in the 3'UTRs of genes in seven vertebrate species that perturb miRNA-mediated gene regulation. Reports of dSNPs on 3'UTRs have been growing rapidly during the past few years. A few dSNPs [1–5, 7–12] have been proven to alter gene expression by modifying specific miRNA target sites; however, the molecular mechanism by which the majority of dSNPs on 3'UTRs cause disease remains largely unknown. There is a strong need for a database that specifically records dSNPs, together with tools for capturing their proximity to miRNA target sites on 3'UTRs, so that researchers can further explore the molecular mechanism of gene dysregulation for dSNPs at the posttranscriptional level.
To provide a comprehensive data source of dSNPs affecting posttranscriptional regulation of disease-related genes, along with tools for exploring the nucleotide distance between miRNA target sites and dSNPs, we present here miRdSNP, a database of dSNPs on the 3'UTRs of human genes manually curated from available publications in PubMed. A robust web interface and advanced search tools show the nucleotide distance between dSNPs and miRNA target sites predicted by two of the most popular algorithms, TargetScan and PicTar. We also incorporated all SNPs on the 3'UTRs of individual genes into the database, so that the relationship of SNPs with both dSNPs and miRNA binding sites can be analyzed using the web interface. In addition, we include predicted miRNA target sites generated by dSNPs based on our analysis, and we annotate genes with experimentally confirmed targeting by miRNAs from four separate curated databases.
Construction and content
We obtained 3'UTRs of human RefSeq genes (hg18, March 2006 assembly) from the UCSC Genome Browser. A total of 19,834 genes (including introns) were parsed and loaded into miRdSNP, and the chromosomal coordinates for each gene were indexed along with the exon lengths. If a gene had multiple transcripts, we selected the one with the longest sequence. We obtained the SNP dataset from the UCSC Genome Browser (NCBI dbSNP Build 130). Of 18,833,531 SNPs, we indexed the genomic coordinates for a subset of 175,351 located on the 3'UTRs of 16,810 genes. SNPs aligning to more than one locus or mapped to intron regions were excluded. We then annotated dSNPs using a data pipeline developed in-house, which searches for PubMed articles linked to SNPs. The pipeline queries ELink from the Entrez Programming Utilities to find all PubMed IDs linked to SNPs via the "snp pubmed" link. We queried all 3'UTR SNPs and found 2,785 PubMed publications linked to 16,447 unique SNPs. We then manually reviewed this literature and identified 630 dSNPs for 204 human diseases from 754 publications. The pipeline for harvesting PubMed-SNP associations from Entrez is automated, and its results are displayed in a web interface that allows multiple users to review articles manually in parallel. This enables us to update the curated dSNP dataset frequently as the literature evolves.
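The ELink query step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house pipeline itself: the function names are hypothetical, the link name is written in its E-utilities form `snp_pubmed`, and the sample XML is a hand-written fragment with made-up IDs that mimics the general layout of an ELink response. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities ELink endpoint.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(snp_ids):
    """Build an ELink URL requesting PubMed records linked to dbSNP IDs.

    snp_ids are numeric dbSNP identifiers (assumed here to be rs numbers
    without the "rs" prefix).
    """
    params = [("dbfrom", "snp"), ("db", "pubmed"), ("linkname", "snp_pubmed")]
    # One id= parameter per SNP so that links are reported per input ID.
    params += [("id", str(snp_id)) for snp_id in snp_ids]
    return EUTILS_ELINK + "?" + urlencode(params)


def parse_elink_xml(xml_text):
    """Map each source SNP ID to the list of PubMed IDs linked to it."""
    links = {}
    root = ET.fromstring(xml_text)
    for linkset in root.iter("LinkSet"):
        source_id = linkset.findtext("IdList/Id")
        pmids = [link.text for link in linkset.findall("LinkSetDb/Link/Id")]
        links[source_id] = pmids
    return links


# Hypothetical response fragment, for illustration only.
SAMPLE_XML = """
<eLinkResult>
  <LinkSet>
    <DbFrom>snp</DbFrom>
    <IdList><Id>12345</Id></IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>snp_pubmed</LinkName>
      <Link><Id>111</Id></Link>
      <Link><Id>222</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>
"""

if __name__ == "__main__":
    print(build_elink_url([12345, 67890]))
    print(parse_elink_xml(SAMPLE_XML))
```

In a real run, the URL would be fetched in batches and the resulting PubMed IDs queued for manual review, as the text describes.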
We captured linkage disequilibrium (LD) information for each dSNP using the latest data provided by the HapMap project (version 2009-04_rel27). We downloaded the raw LD files for each population and searched for pairs of genetic variants that included dSNPs. We then indexed all variants in strong LD with each dSNP using an r² ≥ 0.80 threshold. Regional LD plots were also generated for each dSNP using a modified version of the R script provided by SNAP and data from the CEU (CEPH Utah Residents with Northern and Western European Ancestry) population.
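The LD filtering step can be illustrated with a short sketch. It assumes, for illustration only, that each line of a raw LD file is whitespace-delimited with the columns position 1, position 2, population, rs ID 1, rs ID 2, D', r², LOD, and a final bin field; the function name and the example records are hypothetical.

```python
def strong_ld_partners(ld_lines, dsnp_ids, r2_min=0.80):
    """Collect, for each dSNP, the variants paired with it at r2 >= r2_min.

    ld_lines: iterable of whitespace-delimited LD records (assumed layout:
              pos1 pos2 population rs1 rs2 Dprime r2 LOD fbin).
    dsnp_ids: collection of rs IDs treated as disease-associated SNPs.
    """
    partners = {rs_id: set() for rs_id in dsnp_ids}
    for line in ld_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated records
        rs1, rs2, r2 = fields[3], fields[4], float(fields[6])
        if r2 < r2_min:
            continue
        # A pair is recorded from the point of view of whichever member
        # (possibly both) is a dSNP.
        if rs1 in partners:
            partners[rs1].add(rs2)
        if rs2 in partners:
            partners[rs2].add(rs1)
    return partners


if __name__ == "__main__":
    example = [
        "100 200 CEU rsA rsB 1.0 0.95 10.0 1",
        "100 300 CEU rsA rsC 0.5 0.20 1.0 1",
        "200 400 CEU rsD rsA 0.9 0.80 5.0 1",
    ]
    print(strong_ld_partners(example, {"rsA"}))
```

Streaming the file line by line keeps memory use proportional to the number of dSNPs rather than to the size of the LD file.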
We obtained miRNA target site datasets from two miRNA target prediction algorithms, TargetScan 5.1 and PicTar, which were found to have the highest precision and sensitivity out of eight commonly used algorithms. The data for each prediction algorithm were downloaded from the respective source, and the genomic coordinates for each target site were indexed and mapped to RefSeq genes. In addition, each miRNA target site was cross-referenced with miRBase, and targets referencing dead or non-existent miRNAs were excluded. The genomic coordinates from PicTar were converted from the hg17 assembly to the hg18 assembly using the LiftOver utility from UCSC. The exon index for each miRNA target site was computed and used to calculate the nucleotide distance between SNPs and predicted miRNA target sites. This distance is calculated from the start or end location of the miRNA target "seed" region (~7 nt), depending on whether the SNP is upstream or downstream of the miRNA target site. SNPs that fall inside the miRNA target "seed" region have a distance of 0. To address the low specificity of the miRNA target prediction algorithms, we incorporate data from four curated databases (TarBase, miRTarBase, miRecords, and miR2Disease) that collect experimentally confirmed miRNA-target interactions. Genes with experimentally confirmed targeting by miRNAs are annotated in miRdSNP and displayed with a green check icon in the user interface.
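The distance rule described above can be sketched as follows. This is a minimal illustration, not the miRdSNP implementation: the function name is hypothetical, and the SNP and seed positions are assumed to be already expressed as 1-based, inclusive coordinates on the spliced 3'UTR (that is, after the exon index has been applied).

```python
def snp_to_seed_distance(snp_pos, seed_start, seed_end):
    """Nucleotide distance between a SNP and a miRNA target seed region.

    All positions are assumed to be 1-based, inclusive coordinates on the
    spliced 3'UTR. A SNP inside the seed region has distance 0; otherwise
    the distance is measured to the nearer seed boundary.
    """
    if seed_start > seed_end:
        raise ValueError("seed_start must not exceed seed_end")
    if snp_pos < seed_start:
        # SNP lies upstream: measure to the start of the seed.
        return seed_start - snp_pos
    if snp_pos > seed_end:
        # SNP lies downstream: measure to the end of the seed.
        return snp_pos - seed_end
    return 0


if __name__ == "__main__":
    # A 7-nt seed occupying positions 100-106 (hypothetical coordinates).
    for pos in (95, 100, 103, 106, 110):
        print(pos, snp_to_seed_distance(pos, 100, 106))
```

Working in spliced coordinates means intron length does not inflate the distance when a SNP and a target site sit on different exons of the same 3'UTR.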
To predict new miRNA target sites created by dSNPs, we first generated candidate sequences by taking the 6-nt flanks upstream and downstream of each dSNP location and replacing the dSNP with the observed allele. Using these candidate sequences, we searched for perfect-match 7-mer seed regions from the miRBase mature sequence data. We then extracted the 25-nt flank upstream of the matching seed region and used the miRNA target prediction program miRanda with its default cutoff to further eliminate false positives. We identified 180 newly created miRNA target sites from 138 dSNPs.
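The candidate-generation and seed-matching steps can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: sequences are uppercase RNA (A, C, G, U), the seed is taken as miRNA positions 2-8, a site counts as "new" only if it matches with the observed allele but not with the reference base, and all function names are hypothetical. The miRanda filtering step is not reproduced.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an uppercase RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def candidate_window(utr, snp_index, allele, flank=6):
    """Return the SNP plus up to `flank` nt on each side, with the SNP
    position (0-based snp_index) replaced by `allele`."""
    start = max(0, snp_index - flank)
    end = min(len(utr), snp_index + flank + 1)
    return utr[start:snp_index] + allele + utr[snp_index + 1:end]


def seed_site(mature_mirna):
    """Target-site 7-mer pairing perfectly with miRNA positions 2-8
    (assumed seed definition)."""
    return reverse_complement(mature_mirna[1:8])


def new_seed_matches(utr, snp_index, allele, mirnas):
    """Names of miRNAs whose seed site appears only with the given allele."""
    ref_window = candidate_window(utr, snp_index, utr[snp_index])
    alt_window = candidate_window(utr, snp_index, allele)
    hits = []
    for name, sequence in mirnas.items():
        site = seed_site(sequence)
        if site in alt_window and site not in ref_window:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Toy 3'UTR; the base at index 10 is the hypothetical dSNP (G -> C).
    utr = "GGGGGGCUACGUCGGGGGG"
    mirnas = {"let-7a": "UGAGGUAGUAGGUUGUAUAGUU"}
    print(new_seed_matches(utr, 10, "C", mirnas))
```

With 6-nt flanks the window is 13 nt long, so every 7-mer inside it necessarily overlaps the SNP position; any seed match found in the window therefore depends on the allele.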
Loading and indexing new data in miRdSNP is an automated process allowing for streamlined updates to the database as new RefSeq, miRNA target prediction, and SNP data become available. All data in miRdSNP is available for download in raw text format (CSV, BED) with access to previous versions.
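As a hedged illustration of the BED download format mentioned above, the sketch below converts a single-nucleotide position into a minimal four-column BED line. BED uses 0-based, half-open intervals; the function name, the choice of columns, and the example values are assumptions and do not describe the actual miRdSNP export.

```python
def snp_to_bed_line(chrom, pos_1based, name):
    """Format a single-nucleotide feature as a 4-column BED line.

    BED intervals are 0-based and half-open, so a SNP at 1-based position
    p spans [p - 1, p).
    """
    if pos_1based < 1:
        raise ValueError("position must be 1-based and positive")
    start = pos_1based - 1
    return f"{chrom}\t{start}\t{pos_1based}\t{name}"


if __name__ == "__main__":
    # Hypothetical SNP identifier and coordinate.
    print(snp_to_bed_line("chr12", 100, "rs0000"))
```

A file of such lines could be loaded as a custom track in a genome browser that accepts BED input.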
Utility and discussion
miRdSNP is an ongoing effort to create a comprehensive data source for exploring the effect of SNPs on miRNA binding in relation to human diseases. We are working on importing data from other miRNA target prediction algorithms, such as DIANA-microT v3.0 and ElMMo. Because the accuracy of the manually curated dSNP database is integral to miRdSNP, we aim to broaden the range of data captured during manual curation. Data such as study design, sample size, and p-values would further improve the ability to assess each disease-SNP association. We aim to update the dSNP curation database yearly and whenever new versions of the miRNA target prediction algorithms become available.
Availability and requirements
miRdSNP is freely available on the web at http://mirdsnp.ccr.buffalo.edu.
This work was supported, in part, by United States Public Health Service (National Institutes of Health) grant 1R01EY020545 (ZH) and by an unrestricted grant from Research to Prevent Blindness (ZH).
- Dickson DW, Baker M, Rademakers R: Common variant in GRN is a genetic risk factor for hippocampal sclerosis in the elderly. Neurodegener Dis. 2010, 7 (1-3): 170-174. 10.1159/000289231.
- Wang G, van der Walt JM, Mayhew G, Li YJ, Züchner S, Scott WK, Martin ER, Vance JM: Variation in the miRNA-433 binding site of FGF20 confers risk for Parkinson disease by overexpression of alpha-synuclein. Am J Hum Genet. 2008, 82 (2): 283-289. 10.1016/j.ajhg.2007.09.021.
- Abelson JF, Kwan KY, O'Roak BJ, Baek DY, Stillman AA, Morgan TM, Mathews CA, Pauls DL, Rasin MR, Gunel M, Davis NR, Ercan-Sencicek AG, Guez DH, Spertus JA, Leckman JF, Dure LS, Kurlan R, Singer HS, Gilbert DL, Farhi A, Louvi A, Lifton RP, Sestan N, State MW: Sequence variants in SLITRK1 are associated with Tourette's syndrome. Science. 2005, 310 (5746): 317-320. 10.1126/science.1116502.
- Tan Z, Randall G, Fan J, Camoretti-Mercado B, Brockman-Schneider R, Pan L, Solway J, Gern JE, Lemanske RF, Nicolae D, Ober C: Allele-specific targeting of microRNAs to HLA-G and risk of asthma. Am J Hum Genet. 2007, 81 (4): 829-834. 10.1086/521200.
- Martin MM, Buckenberger JA, Jiang J, Malana GE, Nuovo GJ, Chotani M, Feldman DS, Schmittgen TD, Elton TS: The human angiotensin II type 1 receptor +1166 A/C polymorphism attenuates microrna-155 binding. J Biol Chem. 2007, 282 (33): 24262-24269. 10.1074/jbc.M701050200.
- Rademakers R, Eriksen JL, Baker M, Robinson T, Ahmed Z, Lincoln SJ, Finch N, Rutherford NJ, Crook RJ, Josephs KA, Boeve BF, Knopman DS, Petersen RC, Parisi JE, Caselli RJ, Wszolek ZK, Uitti RJ, Feldman H, Hutton ML, Mackenzie IR, Graff-Radford NR, Dickson DW: Common variation in the miR-659 binding-site of GRN is a major risk factor for TDP43-positive frontotemporal dementia. Hum Mol Genet. 2008, 17 (23): 3631-3642. 10.1093/hmg/ddn257.
- Schaefer AS, Richter GM, Nothnagel M, Laine ML, Rühling A, Schäfer C, Cordes N, Noack B, Folwaczny M, Glas J, Dörfer C, Dommisch H, Groessner-Schreiber B, Jepsen S, Loos BG, Schreiber S: A 3' UTR transition within DEFB1 is associated with chronic and aggressive periodontitis. Genes Immun. 2010, 11: 45-54. 10.1038/gene.2009.75.
- Nicoloso MS, Sun H, Spizzo R, Kim H, Wickramasinghe P, Shimizu M, Wojcik SE, Ferdin J, Kunej T, Xiao L, Manoukian S, Secreto G, Ravagnani F, Wang X, Radice P, Croce CM, Davuluri RV, Calin GA: Single-nucleotide polymorphisms inside microRNA target sites influence tumor susceptibility. Cancer Res. 2010, 70 (7): 2789-2798. 10.1158/0008-5472.CAN-09-3541.
- Brendle A, Lei H, Brandt A, Johansson R, Enquist K, Henriksson R, Hemminki K, Lenner P, Försti A: Polymorphisms in predicted microRNA-binding sites in integrin genes and breast cancer: ITGB4 as prognostic marker. Carcinogenesis. 2008, 29 (7): 1394-1399. 10.1093/carcin/bgn126.
- Chin LJ, Ratner E, Leng S, Zhai R, Nallur S, Babar I, Muller RU, Straka E, Su L, Burki EA, Crowell RE, Patel R, Kulkarni T, Homer R, Zelterman D, Kidd KK, Zhu Y, Christiani DC, Belinsky SA, Slack FJ, Weidhaas JB: A SNP in a let-7 microRNA complementary site in the KRAS 3' untranslated region increases non-small cell lung cancer risk. Cancer Res. 2008, 68 (20): 8535-8540. 10.1158/0008-5472.CAN-08-2129.
- He H, Jazdzewski K, Li W, Liyanarachchi S, Nagy R, Volinia S, Calin GA, Liu CG, Franssila K, Suster S, Kloos RT, Croce CM, de la Chapelle A: The role of microRNA genes in papillary thyroid carcinoma. Proc Natl Acad Sci USA. 2005, 102 (52): 19075-19080. 10.1073/pnas.0509603102.
- Saetrom P, Biesinger J, Li SM, Smith D, Thomas LF, Majzoub K, Rivas GE, Alluin J, Rossi JJ, Krontiris TG, Weitzel J, Daly MB, Benson AB, Kirkwood JM, O'Dwyer PJ, Sutphen R, Stewart JA, Johnson D, Larson GP: A risk variant in an miR-125b binding site in BMPR1B is associated with breast cancer pathogenesis. Cancer Res. 2009, 69 (18): 7459-7465. 10.1158/0008-5472.CAN-09-1201.
- Mishra PJ, Humeniuk R, Mishra PJ, Longo-Sorbello GSA, Banerjee D, Bertino JR: A miR-24 microRNA binding-site polymorphism in dihydrofolate reductase gene leads to methotrexate resistance. Proc Natl Acad Sci USA. 2007, 104 (33): 13513-13518. 10.1073/pnas.0706217104.
- Hu Z, Bruno AE: The Influence of 3'UTRs on microRNA Function Inferred from Human SNP Data. Comparative and Functional Genomics. 2011, 2011, [http://dx.doi.org/10.1155/2011/910769]
- Bao L, Zhou M, Wu L, Lu L, Goldowitz D, Williams RW, Cui Y: PolymiRTS Database: linking polymorphisms in microRNA target sites with complex traits. Nucleic Acids Res. 2007, 35 (Database): D51-D54. 10.1093/nar/gkl797.
- Hiard S, Charlier C, Coppieters W, Georges M, Baurain D: Patrocles: a database of polymorphic miRNA-mediated gene regulation in vertebrates. Nucleic Acids Res. 2010, 38 (Database): D640-D651. 10.1093/nar/gkp926.
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.
- Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet. 2005, 37 (5): 495-500. 10.1038/ng1536.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12 (6): 996-1006.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001, 29: 308-311. 10.1093/nar/29.1.308.
- The International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449 (7164): 851-861. 10.1038/nature06258.
- Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O'Donnell CJ, de Bakker PIW: SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24 (24): 2938-2939. 10.1093/bioinformatics/btn564.
- Alexiou P, Maragkakis M, Papadopoulos GL, Reczko M, Hatzigeorgiou AG: Lost in translation: an assessment and perspective for computational microRNA target identification. Bioinformatics. 2009, 25 (23): 3049-3055. 10.1093/bioinformatics/btp565.
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006, 34 (Database): D140-D144.
- Sethupathy P, Corda B, Hatzigeorgiou AG: TarBase: A comprehensive database of experimentally supported animal microRNA targets. RNA. 2006, 12 (2): 192-197.
- Hsu SD, Lin FM, Wu WY, Liang C, Huang WC, Chan WL, Tsai WT, Chen GZ, Lee CJ, Chiu CM, Chien CH, Wu MC, Huang CY, Tsou AP, Huang HD: miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res. 2011, 39 (Database): D163-D169. 10.1093/nar/gkq1107.
- Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T: miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 2009, 37 (Database): D105-D110. 10.1093/nar/gkn851.
- Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y: miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009, 37 (Database): D98-104. 10.1093/nar/gkn714.
- Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS: MicroRNA targets in Drosophila. Genome Biol. 2003, 5: R1. 10.1186/gb-2003-5-1-r1.
- Maragkakis M, Alexiou P, Papadopoulos GL, Reczko M, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Simossis VA, Sethupathy P, Vergoulis T, Koziris N, Sellis T, Tsanakas P, Hatzigeorgiou AG: Accurate microRNA target prediction correlates with protein repression levels. BMC Bioinformatics. 2009, 10: 295. 10.1186/1471-2105-10-295.
- Gaidatzis D, van Nimwegen E, Hausser J, Zavolan M: Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics. 2007, 8: 69. 10.1186/1471-2105-8-69.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.