In our current study, we demonstrated transcriptome-wide DASE analysis as a novel approach to identify breast cancer susceptibility loci. The HumanOmni1-Quad BeadChip we utilized has state of the art coverage of common SNPs on the human transcriptome. Our study identified 60 candidate DASE loci by both SNP- and gene-based methods (Table 1 and Figure 2). Pathway analysis reveals one major DASE gene network which is likely associated with breast tumorigenesis (Figure 3). Using PCR and Sanger sequencing, we successfully validated the DASE predictions in this breast cancer-related network, which includes cancer causative genes ZNF331 and USP6, and breast cancer risk associated gene DMBT1 (Table 2). By analyzing the 5′UTR region of DMBT1, we successfully identified rs2981745 as the causal variant for the DASE in DMBT1 (Figure 4). Therefore, we presented an example supporting our original expectation that DASE analysis may lead to the discovery of functional DASE-causing variants. DASE in ZNF331 has possibly resulted from genomic imprinting as indicated from studies in human extraembryonic tissues , but we are not able to verify such speculation for the lack of genetic material from parents. Nevertheless, we reported for the first time about such phenomenon of ZNF331 in primary cultures of adult human tissue. We did not investigate further with the 26 non-coding DASE candidate loci in current studies largely because very little information could be obtained to help interpret their roles. However, there have been a few studies denoting the importance of non-coding RNAs in cellular processes during recent years [27, 28]. Considering the high percentage (nearly 50%) of such loci in our candidate list, it is an important next step for us to validate non-coding candidate DASE loci and study their likely roles in breast tumorigenesis.
As the Illumina Human Omni1-Quad Array is originally designed for targeting genomic DNA sequences instead of cDNA sequences, intronic SNPs and many SNPs at the exon/intron boundary are left out for DASE profiling. Although the number of “usable” SNPs covers the majority of transcripts, the coverage could be further increased with newly developed platforms with additional SNP markers. Furthermore, the size of the probes on the Illumina SNP array is 50mer, and it is likely difficult for them to pick up small size transcripts, such as miRNAs. Therefore, it would be logical to consider designing customized BeadChips covering pre-selected probes to improve both cost-effectiveness and specificity, and to reduce the data processing load for global DASE profiling as well. In addition, we observed fluorescent interference between X and Y channels during the data analysis due to the initial probe design , and it was able to be alleviated by canonical data normalization. Despite these limitations, Infinium-based BeadChip is still a practical platform for whole transcriptome DASE analysis as indicated by the results of successful validation using Sanger sequencing (Table 2).
As mentioned in the Introduction, currently known high and moderate penetrance risk alleles help only to explain a fraction of familial breast cancer incidents and the existence of more susceptibility genes are likely to be very rare. Thus, it is plausible that the remaining cases could be complemented by the synergistic effects of multiple low penetrance alleles, each conferring an elevated risk of <1.5 fold . The completion of the Human Genome Project and fast development of SNP array technologies have made it practical to perform genome-wide association studies (GWAS) to identify genetic factors that account for breast tumorigenesis. Since the first wave of GWAS to search for such alleles, 22 loci have been reported to significantly associate with breast cancer risk by 16 studies [31–34]. Among those candidate risk loci, nearly half (2q35, 3p24.1, 5p12, 5q11.2, 6q25.1, 8q24.21, 10q26.13, 11p15.5, 16q21.1-21.2) were reproducible in multiple independent studies, which denotes the reliability of GWAS prediction. Despite these successes, the mechanisms of how these GWA variants affect breast cancer pathogenesis are often unknown, because in most cases it is not clear which gene(s) are associated with GWA signals [35, 36]. Therefore, understanding the function of these breast cancer associated variants and the mechanisms of how they contribute to breast cancer is a logical next step to validate GWAS findings. Until now, most of them are merely SNP tags for yet unknown breast cancer causal variants . A couple of exceptions so far were rs2981582-A  and rs1219648-G  on FGF2R locus. The discovery of these two common low risk variants eventually pinpointed rs2981576 and rs2981578 as causal variants by genetic mapping and CHIP assay . Evaluation of the functional impact of putative causal variants identified by GWAS could be very challenging as many GWA signals (e.g., 8q24) map some distance from the nearest coding regions, and are likely to mediate disease predisposition through remote regulatory effects on transcription . In addition, the causal alleles involved in breast cancer susceptibility are likely to have moderate molecular and cellular effects, and the measurable effects of an allele in a given functional assay may not exhibit a causal role in breast cancer pathogenesis by itself. Therefore, novel strategies are required to functionally characterize the multiple GWA signals in a genome-wide manner for convincing genetic evidence. Based on previous reports [24, 40], our study has pinpointed a likely causal variant for DASE in DMBT1 (Figure 4). In addition, genetic mapping in mice has indicated that DMBT1 is a candidate modifier of mammary tumors and breast cancer risk , and our results further support that allelic loss of expression in DMBT1 could contribute to breast cancer development, which is consistent with the reports from previous studies [42, 43]. These findings support the idea that global DASE profiling mapping could be a powerful approach to validate GWAS findings when the DASE loci map is overlapped with existing data from GWAS .
In our study, we have compared top DASE loci identified in our genome-wide ASE studies with the current cancer genome database provided by Cancer Gene Census (Sanger Institute) and have identified two DASE loci (USP6 and ZNF331) listed as cancerous genes . Somatic mutations in USP6 (p.V678A and p.R475Q) and in ZNF331 (p.G193E) have been reported in 1% of primary breast cancer and 2% of ovarian cancer, respectively . These findings suggest that global DASE profiling could be a powerful tool in combination with currently available cancer genome databases to identify novel breast cancer “driver” genes. Furthermore, on-going cancer genome sequencing projects (e.g. Cancer Genome Atlas by NCI) have identified thousands of so-called variants of unknown/uncertain significance (VUS), including variations typically characterized by a single base change, or a change to several intronic bases. This large amount of VUS data produced by the next generation sequencing-based cancer genome projects has been termed “overkill” from a clinical perspective . Currently, the consequence of most VUS in oncogenesis has yet to be established. As DASE is a sensitive functional index for pathogenic mutations, it can be applied for validating these VUS at a genome-wide scale as well.