Tumor and reproductive traits are linked by RNA metabolism genes in the mouse ovary: a transcriptome-phenotype association analysis
© Urzúa et al. 2010
Published: 22 December 2010
The link between reproductive life history and the incidence of ovarian tumors is well known. Periods of reduced ovulation may confer protection against ovarian cancer. Using phenotypic data available for the mouse, a possible association between the ovarian transcriptome, reproductive records and spontaneous ovarian tumor rates was investigated in four inbred mouse strains. NIA15k DNA microarrays were used to obtain expression profiles of adult BALB/c, C57BL/6, FVB and SWR ovaries.
Linear regression analysis with multiple-test control (adjusted p ≤ 0.05) identified ovarian tumor frequency (OTF) and number of litters (NL) as the most strongly correlated of the five tested phenotypes. Moreover, nearly one hundred genes were shared between these two traits; they were divided into 76 OTF(–) NL(+) and 20 OTF(+) NL(–) genes, where the plus/minus signs indicate the direction of correlation. The enriched functional categories were RNA binding/mRNA processing in the OTF(–) NL(+) subset and protein folding in the OTF(+) NL(–) subset. In contrast, no association was detected between OTF and litter size (LS), the latter a measure of the number of ovulations in a single estrous cycle.
Literature text mining pointed to post-transcriptional control of ovarian processes, including oocyte maturation, folliculogenesis and angiogenesis, as a possible causal link between the observed tumor and reproductive phenotypes. We speculate that repetitive cycling, rather than repetitive ovulation, represents the actual link between ovarian tumorigenesis and reproductive records.
Epidemiological evidence indicates that multiparity and breastfeeding, as well as endocrine-disrupting agents used in oral contraception, hormone replacement therapy and infertility treatment, modulate the risk of ovarian cancer. Repetitive lifetime ovulations are thought to induce a persistent wound-repair process in the ovarian surface epithelium, leading to pre-neoplastic alterations. In addition, oral contraceptives and pregnancy reduce levels of circulating gonadotropins, whereas fertility drugs induce follicle-stimulating hormone (FSH) production. Gonadotropins also increase with reproductive ageing and have been implicated in ovarian cancer etiology, since this malignancy occurs predominantly in menopausal women.
The laboratory mouse has been increasingly used to model several aspects of ovarian cancer. Indeed, the reproductive biology of the mouse resembles human reproduction in many respects. Analogous to menstrual cycles in women, female mice undergo estrous cycles that last 4-5 days and consist of four successive phases. Proestrus and estrus together constitute the follicular phase, while metestrus and diestrus together represent the luteal phase. As in humans, the length of estrous cycles increases and the monthly cycle frequency decreases in ageing mice. The number of ovulations reaches roughly 500 during the reproductive life of women, whereas mice can reach this number before middle age because multiple ovulations occur in a single cycle, as judged by their litter size. Cysts, invaginations and cell layering are also common observations in mouse and human ovaries.
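A back-of-the-envelope calculation illustrates why mice can reach the human lifetime ovulation count early. In the sketch below, the 500-ovulation figure and the 4-5 day cycle length come from the text; the 8 ova per cycle is a hypothetical value chosen for illustration, and cycling is assumed to be uninterrupted (no pregnancies).

```python
import math

HUMAN_LIFETIME_OVULATIONS = 500  # figure quoted in the text for women


def days_to_reach(target_ovulations: int, ova_per_cycle: float, cycle_days: float) -> float:
    """Days of uninterrupted estrous cycling needed to reach a target ovulation count."""
    cycles_needed = math.ceil(target_ovulations / ova_per_cycle)
    return cycles_needed * cycle_days


# Hypothetical parameters: 8 ova per cycle (assumption), 4.5-day cycle (midpoint of 4-5 days).
days = days_to_reach(HUMAN_LIFETIME_OVULATIONS, ova_per_cycle=8, cycle_days=4.5)
print(f"{days:.1f} days of cycling")
```

Under these assumptions the count is reached in well under a year of cycling; larger ovulation numbers per cycle shorten this further.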
Inbred mouse strains display measurable traits that are described as continuous phenotypes in the Mouse Phenome and Mouse Tumor Biology databases. The natural variability observed among mouse strains offers the opportunity to study disease susceptibility in a genetically defined background. In simple terms, phenotypic variability could arise from the interplay of gene transcripts and/or proteins expressed at different relative abundances across individuals in a tissue or cell type implicated in the phenotype. Thus, if a correlation exists between a continuous phenotype and gene expression, a measure of each gene's contribution to the observed phenotype can be inferred. DNA microarrays measure transcript levels for hundreds or thousands of genes simultaneously, so this contribution can be addressed genome-wide. A number of statistical approaches have recently been formulated to correlate DNA microarray data with phenotypic covariates.
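As a minimal sketch of this idea (not the authors' pipeline), each gene's expression can be regressed on a continuous phenotype across strains, reporting the least-squares slope and Pearson R. The phenotype values and the expression matrix below are made-up placeholders, not data from this study.

```python
import numpy as np


def gene_phenotype_regression(expr: np.ndarray, phenotype: np.ndarray):
    """Per-gene least-squares fit expr[g, :] ~ a + b * phenotype.

    expr: genes x samples matrix of log2 ratios.
    phenotype: one continuous value per sample (e.g. per strain).
    Returns (slope b, Pearson R) arrays with one entry per gene.
    """
    x = phenotype - phenotype.mean()
    y = expr - expr.mean(axis=1, keepdims=True)
    sxx = np.sum(x ** 2)
    sxy = y @ x                       # per-gene cross-products with the phenotype
    syy = np.sum(y ** 2, axis=1)
    slope = sxy / sxx
    r = sxy / np.sqrt(sxx * syy)
    return slope, r


# Hypothetical phenotype for four strains and three hypothetical genes.
phenotype = np.array([0.0, 5.0, 10.0, 40.0])
expr = np.array([
    [1.0, 1.5, 2.0, 5.0],     # rises linearly with the phenotype
    [2.0, 1.5, 1.0, -2.0],    # falls linearly with the phenotype
    [0.3, -0.2, 0.1, 0.0],    # no clear trend
])
slope, r = gene_phenotype_regression(expr, phenotype)
print(np.round(r, 3))
```

The sign of R gives the direction of correlation used later to group genes; in practice each R still needs a significance test with multiple-testing control.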
In an attempt to gain new information linking reproductive parameters with ovarian tumorigenesis, we describe here a correlation analysis between spontaneous ovarian tumors, reproductive phenotypes and gene expression profiles obtained with NIA15k DNA microarrays from ovaries of four inbred mouse strains. Using a linear regression approach with control of multiple testing, “ovarian tumor frequency” (OTF) and “number of litters” (NL) were the most strongly correlated of the five phenotypes analyzed. About one hundred genes were shared between OTF and NL. The enriched biological functions in this overlapping subset were “RNA binding/mRNA processing” and “protein folding”. Information on the significant genes was mined from the literature, and the relationship between ovarian function and ovarian tumorigenesis at the molecular level is discussed.
Table 1. Ovarian tumor and reproductive phenotypes in selected mouse strains
Number of litters
Productive matings (%)
Table 2. Summary of correlation results between ovarian gene expression and phenotypes
Direction and strength of correlation (R value range)c
Gene expression shift (δ log2 value range)d
Ovarian tumor frequency (OTF)
3.26 - 0.19
2.96 - 0.21
Number of litters (NL)
2.94 - 0.32
3.00 - 0.24
Litter size (LS)
2.50 - 0.32
2.18 - 0.41
Relative fecundity (RF)
1.50 - 0.29
0.85 - 0.51
Complex phenotypes are the outcome of many genes interacting with each other and with endogenous or exogenous factors. Mouse strains displaying phenotypic variability allow interrogation of its molecular basis in a particular tissue or condition. In this report, the ovarian expression of roughly 400 genes (corresponding to 590 transcripts in Table 2) was significantly correlated with 4 of 5 mouse tumor and reproductive phenotypes assessed with a linear regression model. The predominant gene ontology (GO) terms were “regulation of transcription”, “RNA binding” and “RNA metabolism”, accounting for 105 of all correlated genes. A smaller but significant group was “ubiquitin cycle”, with 14 genes. Links to reproductive processes are described for Aplp2, Chuk, Dnaja1, Htt, Pten, Rps6kb1, Sf1, Spin1 and Tnc in the GO directory. Rps6kb1 is involved in the proliferation of granulosa cells in response to FSH. Rps6kb1 and Chuk, in addition to Nfkb1, Map3k10, Flna, Kras, Rap1a and Hspb1, belong to the MAP kinase signaling pathway, which has been implicated in mammalian oocyte maturation and fertilization. The correlation of Kras (K-Ras 2), commonly mutated in various human tumors, with litter size (LS) is consistent with its involvement in granulosa cell differentiation and ovulation.
In a study of Foxo3-null mice, a mutant displaying early ovarian hyperplasia due to synchronous primordial follicle activation, 6 genes (Spin1, Slc45a3, Rspo2, Star, Trim71 and Gm196) present in our 400-gene list were postulated as fertility factors. The Star (steroidogenic acute regulatory) protein was positively correlated with OTF (see Figure 2). Star transports cholesterol into the mitochondria, a key process in steroid hormone synthesis in all major steroidogenic tissues. Additional genes related to steroid metabolism and present in the 400-gene list included Hmgcr (3-hydroxy-3-methylglutaryl-CoA reductase), which catalyzes the major regulatory step in cholesterol synthesis; Idi1 (isopentenyl-diphosphate delta isomerase), involved in the conversion of mevalonate into activated isoprene units; and Lss (lanosterol synthase), which catalyzes the cyclization of squalene-2,3-epoxide to lanosterol. A recent study has implicated metabolic products of lanosterol in primordial folliculogenesis through regulation of oocyte meiosis and apoptosis. Another indirectly steroid-related gene was Mbtps2, a membrane-embedded zinc metalloprotease that activates signaling proteins involved in transcription induced by steroids.
A large portion of the 590-transcript list (24.6%, i.e. 145 clones) was associated with both spontaneous ovarian tumor frequency (OTF) and number of litters (NL), as shown in Table 2. A link between OTF and NL agrees with the increased risk of ovarian tumorigenesis associated with successive menstrual cycles in women. In contrast, conditions that interrupt cycles block ovulation and thus reduce risk. Accordingly, a mouse strain with high NL spends a longer period without cycling than a low-NL strain; successive pregnancies and lactation may be responsible for this effect. We detected a set of 76 mouse genes that were positively correlated with NL, i.e. elevated expression levels were observed in strains with high NL. Since these genes were concomitantly negatively correlated with ovarian tumor frequency (OTF), over-expression of this 76-gene set could be considered “protective”. By analogy, the 20 genes that were negatively correlated with NL, i.e. down-regulated in high-NL mice, may act as “susceptibility” genes, since they showed a parallel positive correlation with OTF. Importantly, high litter sizes (LS) involve multiple simultaneous ovulations, but no association was detected between OTF and LS in our data. It may be hypothesized that the damage caused to the epithelial surface during a multiple-ovulation event can be repaired during the subsequent pregnancy and lactation, a period without cycling or ovulation.
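The grouping into “protective” (OTF–/NL+) and “susceptibility” (OTF+/NL–) genes amounts to intersecting the two significant lists and reading the signs of the two correlations. A minimal sketch, assuming gene-level R values are already available; the gene names and R values below are hypothetical placeholders, and ASCII hyphens stand in for the minus signs.

```python
def classify_by_sign(r_otf: dict, r_nl: dict) -> dict:
    """Group genes significant for both OTF and NL by the signs of their correlations.

    r_otf, r_nl: gene -> correlation coefficient, containing only genes that
    passed the significance threshold for that phenotype.
    """
    groups = {"OTF-/NL+": [], "OTF+/NL-": [], "other": []}
    for gene in sorted(r_otf.keys() & r_nl.keys()):  # genes significant for both traits
        a, b = r_otf[gene], r_nl[gene]
        if a < 0 < b:
            groups["OTF-/NL+"].append(gene)   # candidate "protective" pattern
        elif b < 0 < a:
            groups["OTF+/NL-"].append(gene)   # candidate "susceptibility" pattern
        else:
            groups["other"].append(gene)      # same sign (or zero) for both traits
    return groups


# Hypothetical correlation values for placeholder genes.
r_otf = {"geneA": -0.97, "geneB": 0.95, "geneC": 0.96, "geneD": -0.99}
r_nl = {"geneA": 0.98, "geneB": -0.96, "geneC": 0.97, "geneE": 0.99}
groups = classify_by_sign(r_otf, r_nl)
print(groups)
```

Genes significant for only one trait (geneD, geneE here) drop out at the intersection step, mirroring how the OTF and NL lists were overlapped.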
Additional file 4 shows that 11 of the 76 OTF–/NL+ genes are implicated in either normal or pathological ovarian processes. RNA binding was the predominant GO term in the 76-gene list, with 22 genes. RNA-binding proteins have been implicated in mammalian germ cell development. The genes Cpsf6, Ddx17, Fubp1, Hnrpa2b1, Rbm25, Rbm39, Sfrs2 and Sfrs6 share GO terms (RNA binding, mRNA processing) and a relationship with normal ovarian function or ovarian-related disease. Some genes not related to RNA metabolism but linked to ovarian function were also correlated in an OTF–/NL+ fashion. These include Ece1, expressed in steroidogenic and follicular endothelial cells of the ovary in parallel with corpus luteum maturation, and the genes Arnt and Nfat5, which regulate the activity of vascular endothelial growth factor (VEGF), an important modulator of angiogenesis in normal and pathological conditions including ovarian malignancies. Arnt (alias HIF-1beta) binds hypoxia-inducible factor-1 alpha (HIF-1alpha) to form a heterodimer that recognizes the VEGF promoter. Arnt can alternatively bind the aryl hydrocarbon receptor (AHR), forming an AHR/ARNT complex that controls FSH and LH concentrations in response to AHR ligands. Furthermore, down-regulation of Nfat5 (nuclear factor of activated T-cells 5) parallels a decrease in the VEGF receptor VEGFR1 and an increase in VEGFR2 in hemangioma endothelial cells.
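GO enrichment of the kind reported here (for instance, 22 RNA-binding genes among the 76 OTF–/NL+ genes) is commonly judged with a one-sided hypergeometric test, the test family named for WebGestalt in the methods. The sketch below computes that tail probability exactly with integer arithmetic; the background sizes (10,000 annotated genes, 500 of them carrying the term) are hypothetical placeholders, not the array's actual annotation counts.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K carry the term."""
    upper = min(n, K)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


# 22 of 76 listed genes carry the term (from the text);
# N = 10,000 and K = 500 are hypothetical background counts.
p = hypergeom_enrichment_p(k=22, n=76, K=500, N=10_000)
expected = 76 * 500 / 10_000
print(f"expected {expected:.1f} genes by chance, observed 22, p = {p:.2e}")
```

Tools such as WebGestalt additionally correct these p-values across all tested terms.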
Additional regulators of VEGF function in the OTF–/NL+ list were the genes Sfrs6 and Rbm39. The splicing factor Sfrs6 (alias SRp55) up-regulates the anti-angiogenic VEGF isoform, an alternative splicing product involving exon 8 of the VEGF pre-mRNA. The OTF–/NL+ correlation pattern of Sfrs6 therefore suggests an anti-angiogenic condition associated with multiparity and low ovarian tumor incidence. VEGF itself was neither differentially expressed nor correlated with any phenotype in this study, but CDC-like kinase 1 (Clk1), which mediates the effect of Sfrs6 on anti-angiogenic VEGF levels, was also part of the OTF–/NL+ list (see Additional file 3 “OTF&NL_correlated_transcripts.xls”). In addition, Rbm39 (alias CAPERalpha) can alter the VEGF-121/VEGF-189 ratio in breast cancer cells. Rbm39 was originally described as an ERα/β transcriptional coactivator. An analogous role has been reported for the DEAD-box RNA helicases Ddx17 (alias p72) and Ddx5 (alias p68). Ddx17 showed an OTF–/NL+ pattern, while Ddx5, Ddx26b, Ddx3x and Ddx42 were present in the overall 400-gene list.
Among the OTF+/NL– correlated genes, the predominant GO term was “protein folding”, including Dnajb1, Hsp90aa1 and P4hb (see Additional file 4 and Figure 2). Dnajb1 is a member of the Hsp40 co-chaperone family, whose Drosophila homolog participates in oogenesis. The related Hsp40 genes Dnaja1, Dnajb6 and Dnajc7 were present in the overall 400-gene list. Hsp90aa1 expression is up-regulated in ovarian endometriosis, while P4hb is involved in the post-translational modification of procollagen. Other transcripts with an OTF+/NL– correlation were Spin1 (Spindlin 1), a protein that associates with CPEB, an RNA-binding protein implicated in polyadenylation during meiotic progression in oocytes, and the transcription factor Cited1, reported as an FSH-regulated gene in human granulosa cells.
All five phenotypes studied are complex, and many causal effects are certainly involved. It is quite possible that a large fraction of the correlated transcripts are simply bystanders rather than causes of the measured phenotype. Of the 56 genes listed in Additional file 4, seventeen are classified under the GO term “regulation of transcription”. These may be considered “master genes”, i.e. genes encoding protein products that interact with DNA regulatory sequences or transcriptional multiprotein complexes, thus modulating the transcriptional activity of downstream genes. Among the OTF–/NL+ correlated genes with roles in regulation of transcription, we identified Fubp1, Rbm39 and Arnt, which are directly linked to ovarian biology or disease (see Additional file 4). Fubp1 encodes a single-stranded DNA-binding protein that binds the “far upstream element” (FUSE) of c-myc, thereby stimulating its transcription. Interestingly, promoter regions of the OTF–/NL+ genes Ccnl1, Clk4, Coq10a, Ddx17, Ict1, Zc3h11a and Zc3h7a were found to contain binding sites for Arnt (data not shown). In addition, the genes Hnrpdl, Sltm, Tardbp, Ccnl1, Ccnl2, Dmtf1, Mll3, Mycbp2, Nfat5, Suv420h2 and 1810007M14 have diverse roles in cancer and developmental processes (see Additional file 4 for references). Of special note are the c-myc binding protein Mycbp2; Sltm, which has been described as a modulator of estrogen-induced transcription; and the cyclins Ccnl1 and Ccnl2, which are transcriptional regulators of pre-mRNA splicing.
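How the Arnt binding sites were identified is not described. Purely as a hypothetical illustration, a naive scan for the hypoxia response element core 5'-RCGTG-3' (the site bound by HIF-1alpha/Arnt) on both strands could look like the sketch below. Real promoter analyses would use position weight matrices and defined promoter windows, and would also consider the AHR/ARNT element; the test sequences are made up.

```python
import re

# Core hypoxia response element bound by HIF-1alpha/ARNT: 5'-RCGTG-3' (R = A or G).
FORWARD = re.compile(r"(?=([AG]CGTG))")
# The same site on the opposite strand, read 5'->3' on the given strand: CACGY (Y = C or T).
REVERSE = re.compile(r"(?=(CACG[CT]))")


def scan_hre(seq: str):
    """Return sorted (start, strand) pairs for HRE core matches; overlapping hits allowed."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


print(scan_hre("ttACGTGtt"))  # made-up sequence with one forward-strand site
```

The zero-width lookahead lets overlapping sites be reported, which matters for short palindrome-like cores such as CACGTG.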
Phenotypic information obtained from independent animal studies needs to be integrated in order to compare results reliably across different mouse colonies and laboratory set-ups. The sources of phenotypic data used in the present study are metadatabases in which uniformity criteria and manual curation have been imposed on the assembled records. The Mouse Tumor Biology Database (MTB) contains both spontaneous and induced tumor information for over 50 inbred strains, extracted primarily from the literature, from tumor pathology images submitted by investigators, and from routine animal health screenings of mouse colonies at the Jackson Laboratory. These records are then curated with the help of natural language processing tools to cope with the increasing amount of phenotype information in the literature. Similarly, the Mouse Phenome Database (MPD) has developed standards for the deposition of mouse phenotypic data, including strain purity, study design, animal age and statistical power. Contributors are asked to provide complete measurement descriptions and experimental protocols, as well as the housing, diet and health status of the animals. MPD curates the data and computes summary statistics for each measurement in all strains.
The results presented here were obtained with linear regression analysis. However, interacting gene networks linked to phenotypes do not necessarily follow linear relationships with respect to transcript levels. A few recent reports have attempted to identify non-monotonic or non-linear phenotype-transcriptome associations. Lin et al. (2008) proposed the coefficient of multiple determination (R²) of a natural cubic spline regression model. In related work, three correlation measures (Pearson, Spearman and Hoeffding's D) were compared for the analysis of co-expressed genes, and Hoeffding's D dependence measure was found to be best suited to identifying non-linear and non-monotonic associations. Analytical approaches of this kind are needed to uncover causal phenotype-transcriptome connections that do not follow obvious linear behavior.
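To make the contrast concrete, the sketch below compares Pearson, Spearman and Hoeffding's D on a synthetic U-shaped relation. It implements the no-ties form of Hoeffding's statistic, scaled by 30 as in common statistical software, so that perfect dependence gives 1. It is an illustration under those assumptions, not the code used in the cited comparison, and it refuses tied data rather than applying a tie correction.

```python
import numpy as np


def ranks(v: np.ndarray) -> np.ndarray:
    """1-based ranks; assumes no ties."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r


def hoeffding_d(x: np.ndarray, y: np.ndarray) -> float:
    """Hoeffding's D (scaled by 30, range -0.5..1), no-ties formula; requires n >= 5."""
    n = len(x)
    if n < 5 or len(np.unique(x)) < n or len(np.unique(y)) < n:
        raise ValueError("need n >= 5 and no ties in this simplified version")
    R, S = ranks(x), ranks(y)
    # Q_i = 1 + number of points lying below and to the left of point i
    Q = np.array([1 + np.sum((R < R[i]) & (S < S[i])) for i in range(n)])
    D1 = np.sum((Q - 1) * (Q - 2))
    D2 = np.sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
    D3 = np.sum((R - 2) * (S - 2) * (Q - 1))
    num = (n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * num / den


x = np.arange(1, 21, dtype=float)
y = (x - 10.3) ** 2                       # U-shaped, non-monotonic relation without ties
pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]
d = hoeffding_d(x, y)
print(f"Pearson {pearson:.3f}, Spearman {spearman:.3f}, Hoeffding D {d:.3f}")
```

For this deterministic but non-monotonic relation both correlation coefficients stay close to zero, while D is clearly positive.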
Finally, since one strain (SWR) showed a much higher OTF than the other three, we searched for stronger gene links with this phenotype in the SWR strain. A t-test comparing SWR with the remaining 3 strains yielded 530 statistically significant clones (see Additional file 1, Figure 1). Of these, 373 clones were shared with the regression test and 280 with the ANOVA test. The overlap among the 3 tests was a list of 266 clones with a functional profile resembling the terms described in Additional file 1, Table 2 for the regression results. On the other hand, 143 clones were exclusive to the t-test. Collapsing this subset to unique genes yielded 92 gene identities, which were subjected to a GO analysis summarized in Additional file 1, Table 3. The combined 10-term list suggests the involvement of intracellular vesicle traffic, protein sorting and actin cytoskeleton dynamics in the high OTF of the SWR strain. Indeed, oocyte meiotic maturation involves events related to spindle assembly. In somatic cells, chromosome segregation errors during mitosis may contribute to cancer development and progression. The genes App, Aplp2 and Appbp2, all related to the amyloid beta precursor protein, were recurrent in Additional file 1, Table 3. The well-known involvement of amyloid beta protein in Alzheimer's disease pathogenesis may actually be due to a chromosomal instability process. Analogous mechanisms may partly explain the high OTF observed in SWR mice.
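The count of clones exclusive to the t-test follows from the reported overlaps by inclusion-exclusion: clones shared with at least one other test number 373 + 280 − 266, and the remainder of the 530 are exclusive. A minimal check of that arithmetic, using a hypothetical helper name:

```python
def exclusive_to_first(n_first: int, n_with_second: int, n_with_third: int, n_all_three: int) -> int:
    """Count items of the first list found in neither of the other two lists.

    n_with_second / n_with_third: overlaps of the first list with each other list;
    n_all_three: items common to all three lists.
    """
    # Inclusion-exclusion: items of the first list shared with at least one other list
    shared = n_with_second + n_with_third - n_all_three
    return n_first - shared


# Clone counts reported for the SWR t-test, regression and ANOVA comparisons.
print(exclusive_to_first(530, 373, 280, 266))  # 143
```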
This work describes statistically significant variation in ovarian gene expression among four commonly studied mouse strains. We found that over 60% of these differences are linked to the biological variability observed in spontaneous ovarian tumor rates and reproductive parameters across strains. If NL is equivalent to multiparity, the inverse relationship detected between genes correlated with OTF and NL points to a protective effect of successive pregnancies. Post-transcriptional control of ovarian angiogenesis, folliculogenesis and oocyte maturation appears to be a major contributor to this effect. Conversely, overexpression of protein folding genes might be considered a susceptibility factor. These findings, together with the lack of association between OTF and LS (a measure of multiple ovulation), support repetitive menstrual cycling rather than repetitive ovulation as an important contributor to ovarian tumorigenesis. Further experimental research, as well as the development of bioinformatic and statistical tools to uncover complex phenotype-transcriptome associations, is needed.
Mouse strains BALB/c, C57BL/6, FVB and SWR were maintained at the Laboratory Animal Sciences Program, SAIC-NCI Frederick (Frederick, MD), under protocols of the Institutional Animal Care and Use Committee (IACUC). Adult (8-week-old) females from colonies established by trio mating were euthanised in late metestrus by cervical dislocation after CO2 administration. Whole ovaries from 4-5 animals were dissected free of surrounding adipose tissue, pooled and immediately frozen in liquid nitrogen. Total RNA was extracted with Trizol (Invitrogen, CA) and directly labeled as Cyanine-3 or Cyanine-5 fluorescent cDNA by reverse transcription under conditions previously described.
NIA-15K mouse cDNA microarrays were used. This is a curated collection of 15,261 clones derived from expression libraries of pre- and peri-implantation embryos, E12.5 female gonad/mesonephros and newborn ovaries. Microarrays were spotted at the Laboratory of Molecular Technology, SAIC-NCI Frederick (Frederick, MD) with a BioRobotics Microgrid arrayer (Genomic Solutions, MI). Hybridization conditions and washes have been described elsewhere. Samples were co-hybridized against whole-newborn mouse total RNA as a common reference sample, using a replicated dye-swap design. A total of 24 microarray hybridizations were performed. TIFF images were captured with a GenePix 4000B fluorescent scanner (Molecular Devices, CA) and saved for further analysis.
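With a common reference and dye swaps, each array's log ratio has to be oriented as sample/reference before replicates are combined. The sketch below shows only that bookkeeping, with hypothetical intensities and a hypothetical function name; normalization in this study was done with the mAdb and GEPAS tools described in the next paragraph. (One balanced arrangement consistent with 24 hybridizations would be 4 strains × 2 dye orientations × 3 replicates, but that is our inference, not a stated detail.)

```python
import numpy as np


def sample_vs_reference_log2(cy5: np.ndarray, cy3: np.ndarray, sample_in_cy5: np.ndarray) -> np.ndarray:
    """Orient each array's log2 ratio as log2(sample / reference).

    cy5, cy3: arrays x spots background-corrected intensities.
    sample_in_cy5: boolean per array; True if the ovary sample was labeled with Cy5.
    """
    m = np.log2(cy5 / cy3)                                     # conventional M = log2(Cy5/Cy3)
    sign = np.where(sample_in_cy5, 1.0, -1.0)[:, np.newaxis]   # flip dye-swapped arrays
    return m * sign


# Hypothetical dye-swap pair for one strain, two spots.
cy5 = np.array([[400.0, 100.0],
                [200.0, 100.0]])
cy3 = np.array([[100.0, 100.0],
                [800.0, 100.0]])
oriented = sample_vs_reference_log2(cy5, cy3, sample_in_cy5=np.array([True, False]))
print(oriented.mean(axis=0))  # averaged sample/reference log2 ratio per spot
```

Averaging a forward and a swapped array also tends to cancel gene-specific dye bias, which is the usual motivation for the dye-swap design.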
Feature data were extracted from scanned microarray images as GPR files using GenePix 5.0 software and uploaded to the NCI Microarray Database (“mAdb”; http://nciarray.nci.nih.gov). Data files containing updated gene annotation were subjected to local (“print-tip”) loess normalization, scale adjustment, and filtering/imputation of missing values with the “DNMAD” and “PreProcessor” tools at the GEPAS server (http://www.gepas.org). Reproductive and spontaneous ovarian tumor records were extracted from the MPD (http://www.jax.org/phenome) and the MTB (http://tumor.informatics.jax.org/mtbwi/index.do) databases, respectively. Tumor data correspond to the “highest reported tumor frequency” among all literature records collected in MTB for each strain. Continuous tumor and reproductive data for the 4 mouse strains (see Table 1) were correlated with gene expression log2 ratios by linear regression analysis under multiple-test control (FDR indep) with the tool Pomelo II, using 200,000 permutations (http://pomelo2.bioinfo.cnio.es/). An ANOVA test among the 4 strains and a t-test between SWR and the 3 remaining strains were also conducted with Pomelo II using 200,000 permutations. Gene function was primarily assessed with Gene Ontology (GO) terms using hypergeometric tests conducted with WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt). Literature mining with HUGO-approved gene symbols, associated aliases and keywords was carried out in PubMed queried through GeneCards (http://www.genecards.org) and with SciMiner (http://jdrf.neurology.med.umich.edu/SciMiner/).
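Pomelo II's internals are not reproduced here. As a hedged illustration of the two ingredients named above, the sketch computes a permutation p-value for each gene's regression on a phenotype by shuffling phenotype values across samples, then applies the Benjamini-Hochberg step-up adjustment, which we assume is what “FDR indep” denotes. The data, permutation count and seed are placeholders.

```python
import numpy as np


def perm_pvalues(expr: np.ndarray, phenotype: np.ndarray, n_perm: int = 2000, seed: int = 0) -> np.ndarray:
    """Two-sided permutation p-values for per-gene |R|; same ranking as the regression slope."""
    rng = np.random.default_rng(seed)
    y = expr - expr.mean(axis=1, keepdims=True)

    def abs_r(pheno: np.ndarray) -> np.ndarray:
        x = pheno - pheno.mean()
        return np.abs(y @ x) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2, axis=1))

    observed = abs_r(phenotype)
    exceed = np.ones(expr.shape[0])          # count the observed labeling itself
    for _ in range(n_perm):
        exceed += abs_r(rng.permutation(phenotype)) >= observed
    return exceed / (n_perm + 1)


def benjamini_hochberg(p: np.ndarray) -> np.ndarray:
    """BH-adjusted p-values (FDR control under independence)."""
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Hypothetical data: one gene tracking the phenotype, one random gene.
phenotype = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
noise = np.random.default_rng(1).normal(size=8)
expr = np.vstack([2.0 * phenotype + 1.0, noise])
p = perm_pvalues(expr, phenotype)
q = benjamini_hochberg(p)
print(np.round(p, 4), np.round(q, 4))
```

Because the phenotype's sum of squares is unchanged by shuffling, ranking permutations by |R| is equivalent to ranking them by the absolute regression slope.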
Primer pair design, cDNA preparation, thermocycling conditions and equipment have been described previously. Quantification of mRNAs was based on threshold cycle ($C_T$) values, the $C_T$ being defined as the PCR cycle at which an increase in reporter fluorescence above the baseline signal can first be detected. Normalization used 18S rRNA as the reference transcript, assayed under conditions identical to those for the gene of interest in both the test and the reference RNA samples. For each RNA, $\Delta C_T = C_T^{\text{gene}} - C_T^{\text{18S}}$, and $\Delta\Delta C_T = \Delta C_T^{\text{sample}} - \Delta C_T^{\text{reference}}$. The fold change $2^{-\Delta\Delta C_T}$ was then transformed as follows: if $2^{-\Delta\Delta C_T} - 1 > 0$, the result is $2^{-\Delta\Delta C_T} - 1$; otherwise the result is $-1/2^{-\Delta\Delta C_T}$. This calculation converted the linear range for down-regulation from 1→0 to 0→(−∞), and for up-regulation from 1→(+∞) to 0→(+∞), in the log2 scale.
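The ΔΔCT calculation and the piecewise transform can be written out as a short function. This is a sketch that follows the rule exactly as stated in the text; the function and argument names are hypothetical.

```python
def ddct_transform(ct_gene_sample: float, ct_18s_sample: float,
                   ct_gene_ref: float, ct_18s_ref: float) -> float:
    """Apply the piecewise fold-change rule described in the text.

    dCT = CT(gene) - CT(18S) for each RNA; ddCT = dCT(sample) - dCT(reference).
    """
    ddct = (ct_gene_sample - ct_18s_sample) - (ct_gene_ref - ct_18s_ref)
    fold = 2.0 ** (-ddct)
    if fold - 1 > 0:
        return fold - 1          # up-regulation branch
    return -1.0 / fold           # down-regulation branch (also taken when fold == 1)


print(ddct_transform(20, 10, 21, 10))  # 2-fold up: ddCT = -1
print(ddct_transform(22, 10, 21, 10))  # 2-fold down: ddCT = +1
```

As written, the rule sends a fold change of exactly 1 to the second branch (result −1), so this boundary case is worth checking when reproducing the analysis.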
UU was a fellow at the ORFD Program, OIA, National Cancer Institute (NCI) at National Institutes of Health (NIH). This work was funded in part with Federal Funds from the NCI, NIH under contract N01-C0-12400. The bioinformatic section of this paper was supported by the grant I205/09-2 from Vicerrectoria de Investigacion y Desarrollo (VID), Universidad de Chile. The pilot Microarray facility at ICBM, Facultad de Medicina, Universidad de Chile has been supported by the grant Mecesup UCH0115. Travel funding to attend the 2009 X-Meeting was granted by VID, Universidad de Chile.
This article has been published as part of BMC Genomics Volume 11 Supplement 5, 2010: Proceedings of the 5th International Conference of the Brazilian Association for Bioinformatics and Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S5.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.