Identification of novel pathway partners of p68 and p72 RNA helicases through Oncomine meta-analysis

Background
The Oncomine™ database is an online collection of microarrays from various sources, usually cancer-related, and contains many "multi-arrays" (collections of analyzed microarrays in a single study). As there are often many hundreds of tumour samples/microarrays within a single multi-array, results from coexpressed genes can be analyzed and are fully searchable. This gives a potentially significant list of coexpressed genes, which is important for defining pathways in which the gene of interest is involved. However, to increase the likelihood of revealing truly significant coexpressed genes, we have analyzed their frequency of occurrence over multiple studies (meta-analysis), greatly increasing the significance of results compared to those of a single study.

Results
We have used the DEAD-box proteins p68 (Ddx5) and p72 (Ddx17) as models for this coexpression frequency analysis, as there are defined functions for these proteins in splicing and transcription (known functions which we could use as a basis for quality control). Furthermore, as these proteins are highly similar, interact together, and may be to some degree functionally redundant, we then analyzed the overlap between coexpressed genes of p68 and p72. This final analysis gave us a highly significant list of coexpressed genes, clustering mainly in splicing and transcription (recapitulating their published roles), but also revealing new pathways such as cytoskeleton remodelling and protein folding. We have further tested a predicted pathway partner, RNA helicase A (Dhx9), in a reciprocal meta-analysis that identified p68 and p72 as being coexpressed, and further show a direct interaction of Dhx9 with p68 and p72, attesting to the predictive nature of this technique.

Conclusion
In summary, we have extended the capabilities of Oncomine™ by analyzing the frequency of coexpressed genes over multiple studies, and furthermore assessing the overlap with a known pathway partner (in this case p68 with p72). We have shown that our predictions corroborate previously published studies on p68 and p72, and that novel predictions can be easily tested. These techniques are widely applicable and should increase the quality of data from future meta-analysis studies.

from many patient samples [2]. These "multi-arrays" usually utilise either normal or tumour biopsy samples (or compare both together), from various tissue sources.
One function of Oncomine™ is a search tool where the user's chosen gene is correlated in expression, within multi-arrays, with other genes in the array (both high and low expression, over all the samples in the multi-array).
For example, searching p72 (DDX17) gives several correlations in many multi-arrays. Focusing within the study Whitney_normal, there is a high correlation with expression of fibrillarin over the 147 blood samples tested (Figure 1A). In samples where p72 expression was diminished, so was that of fibrillarin, and conversely, when p72 expression was high, so was that of fibrillarin. This result is made more significant given that p72 and fibrillarin have previously been shown to interact together [3].

Figure 1. Oncomine studies utilised and methodology of analysis. (A) Screenshot example of Oncomine™ output of p72 (DDX17) coexpression with fibrillarin (FBL) in one multi-array study, covering 147 samples. p72 is X-axis and fibrillarin is Y-axis. (B) Procedure employed for meta-analysis of 19 different multi-arrays after searching for either p68 or p72, extracting the top 400 coexpressed genes from each multi-array, and comparing for frequency of repetition. (C) Chosen multi-arrays to be studied for both p68 and p72.
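The kind of within-study correlation described here can be illustrated with a short sketch. It assumes a Pearson correlation coefficient, which is an assumption made for illustration only (the text does not state which coefficient Oncomine™ reports), and the expression values, the placeholder `GENE_X`, and the function names `pearson` and `rank_coexpressed` are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, expression, top_n=400):
    """Rank every other gene by its correlation with `query`, best first."""
    scores = {
        gene: pearson(expression[query], values)
        for gene, values in expression.items()
        if gene != query
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy values (one number per sample), NOT data from any study.
expression = {
    "DDX17": [1.0, 2.0, 3.0, 4.0, 5.0],
    "FBL": [1.1, 2.3, 2.9, 4.2, 4.8],     # rises and falls with DDX17
    "GENE_X": [5.0, 3.0, 4.0, 1.0, 2.0],  # placeholder, anti-correlated
}
ranked = rank_coexpressed("DDX17", expression, top_n=2)
```

In this toy example the gene whose expression tracks the query across samples is ranked first; as noted in the text, such a ranking shows association only.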
Correlations like this can indicate whether proteins may be in the same pathway (e.g. both coregulated together, or one directly affecting the other), although they cannot show more than association. In an attempt to further increase the stringency of Oncomine™ to allude to these pathways, we chose to test the DEAD-box proteins p68 and p72 because they are highly similar proteins that interact together and have been shown to be involved in defined cellular functions including splicing and transcription, which can then be used as a quality control measure of this technique [4][5][6][7][8][9][10]. Also, as p68 and p72 are so similar, there is the possibility that they may to some extent be functionally redundant.
In total this means that we can perform a meta-analysis of p68 coexpressed genes independently of that of p72, then compare the results for overlap (Figure 1B). If the gene lists were to give a significant overlap then this would act to support the notion that the technique is highly selective. Our results reveal that not only does this technique corroborate previously published data on p68 and p72, it also generates testable predictions of novel pathway partners of p68 and p72.

Overlapping coexpressed genes of p68 and p72
Multi-arrays chosen for meta-analysis had many individual samples/microarrays, indicating that a good correlation coefficient given by Oncomine™ is already highly significant. Figure 1C indicates the chosen multi-array studies for p68 and p72. Note that there is almost a 50% overlap of studies chosen.
Meta-analysis results, with frequency of 3 or more, for p68 yielded a higher volume of hits than for p72 (see Additional file 1). Both of these lists were compared for common genes and the common list was further assessed for ontology and full gene names (Table 1). Remarkably, we observed a large number of overlapping genes, indicative of the stringency employed in this technique.
Even when the stringency was further augmented by increasing the p68 frequency cut-off to 4 or more multi-arrays (21% and above overlap within p68 multi-arrays), this lost almost 300 p68 hits, but only reduced the number of overlapping genes with p72 from 90 to 70 (Figure 2A). The highest frequency of overlap of p68 and p72 occurred in splicing, consistent with previous reports of their role in this process. Further validation of this technique was observed by the reciprocal gene hits of p68 and p72 (i.e. p72 was a positive for p68 and vice-versa), again consistent with their interaction within the same pathways.

Figure 2. Analysis of overlap of p68 and p72 coexpressed genes. (A) Venn diagram of overlap of frequency = 3 or more, genes from p68 and p72 analysis, and when p68 frequency is increased to 4 or more. (B) Ontology pie-chart of p68/p72 overlapping frequency = 3 or more, gene products.

Table 1. 90 genes were identified to be both coexpressed with p68 and p72, and are arranged by function. For clarity all coexpressed gene products with a 30% or greater coexpression frequency correlation for either p68 or p72 are in bold. ? - Unknown or unidentified gene product function.
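The relationship between a frequency cut-off and the percentages quoted in the text is simple arithmetic over the 19 multi-arrays, and can be sketched as follows. The function names and the toy gene counts are hypothetical; only the cut-offs of 3 and 4 and the total of 19 multi-arrays come from the text.

```python
def frequency_percent(n_studies_with_gene, n_studies_total=19):
    """Share of multi-arrays in which a gene was coexpressed, as a rounded percentage."""
    if not 0 <= n_studies_with_gene <= n_studies_total:
        raise ValueError("count must lie between 0 and the number of studies")
    return round(100 * n_studies_with_gene / n_studies_total)


def apply_cutoff(frequency, min_studies):
    """Keep only genes seen in at least `min_studies` multi-arrays."""
    return {gene: n for gene, n in frequency.items() if n >= min_studies}


# The two cut-offs discussed in the text, expressed as percentages of 19.
cutoff_percent = {k: frequency_percent(k) for k in (3, 4)}

# Hypothetical counts for placeholder genes, NOT results from the study.
toy_frequency = {"GENE_A": 3, "GENE_B": 4, "GENE_C": 9}
kept_at_3 = apply_cutoff(toy_frequency, 3)
kept_at_4 = apply_cutoff(toy_frequency, 4)
```

Under this arithmetic, raising the cut-off from 3 to 4 multi-arrays moves the threshold from 16% to 21%, and only genes sitting exactly at the lower threshold are lost.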
The next most abundant function of p68 and p72 appeared to be in transcription (Figure 2B), once more consistent with previous reports. This is especially interesting given that p68 and p72 were previously shown to act as coactivators for the nuclear receptor estrogen receptor α (ERα) transcription factor, and we have identified X-box binding protein 1 (XBP1), associated with the ERα pathway. We have also identified 2 other nuclear receptor pathway proteins, the thyroid hormone receptor associated protein 2 (THRAP2) and the retinoic acid receptor-related orphan receptor α (RORA) transcription factor.

RNA Helicase A(Dhx9) coexpresses and interacts with p68 and p72
A further interesting transcription-associated gene identified was RNA helicase A (DHX9), a member of a similar protein family to p68 and p72, all of which have been shown to interact with p300/CBP coactivators [6][11][12][13]. The frequencies for both p68 and p72 were observed to be high for RNA helicase A (almost 50% of multi-arrays for p68, and over 30% for p72).
For this reason a similar coexpression analysis was separately performed for DHX9. Surprisingly, not only were p68 and p72 reciprocally coregulated with DHX9, but over 50% of the p68:p72 overlapped positives were also coexpressed with DHX9 (47 out of 90; see Additional file 2). This was powerful evidence linking Dhx9, p68 and p72 to similar pathways.
As this overlap was so high, it was possible that p68 and p72 were functioning in the same complex as Dhx9. This was tested experimentally in HEK293 cells. With immunoprecipitation of either transiently transfected p68 or p72 we observed a clear interaction with endogenous Dhx9 (figure 3A). Further immunoprecipitations of endogenous p68 and p72 from lysate of mouse liver confirmed the interaction with Dhx9 (figure 3B). This was performed after incubation with RNaseA, indicating a protein:protein interaction (as p68/p72/Dhx9 can all bind RNA). In the liver extract p68 and p72 also strongly immunoprecipitated a protein of 100 kDa, recognised by the Dhx9 antibody (figure 3B). It currently remains unclear if this is a different isoform of Dhx9 or a cross-reacting protein.
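The reciprocal check and the shared fraction reported here amount to set membership and set intersection. The sketch below is illustrative only: the function names and the placeholder genes `A`, `B`, `C`, `X`, `Y` are hypothetical, and only the figure of 47 out of 90 is taken from the text.

```python
def is_reciprocal(hits, gene_a, gene_b):
    """True if each gene appears in the other's list of coexpressed genes."""
    return gene_b in hits.get(gene_a, set()) and gene_a in hits.get(gene_b, set())


def shared_fraction(reference, other):
    """Fraction of the `reference` genes that are also present in `other`."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & set(other)) / len(reference)


# Placeholder hit lists: query gene -> genes passing the frequency cut-off.
hits = {
    "A": {"B", "C", "X"},
    "B": {"A", "C", "X"},
    "C": {"A", "B", "X", "Y"},
}
a_and_b = hits["A"] & hits["B"]               # overlap of the first two queries
toy_fraction = shared_fraction(a_and_b, hits["C"])

# The proportion reported in the text: 47 of the 90 overlapping genes.
reported_percent = round(100 * 47 / 90, 1)
```

With the numbers given in the text, 47 of 90 corresponds to roughly 52%, in line with the statement that over 50% of the overlapped positives were shared.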
Altogether, these data both supported the hypothesis of p68/p72/Dhx9 existing within the same complex, and further acted as strong evidence of the predictive capabilities of the Oncomine™ analysis technique described here.
Figure 3. p68 and p72 interact directly with predicted pathway partner Dhx9.

Other coexpressed genes of p68 and p72
Interestingly, there were 4 overlapped hits in the ubiquitin pathway (and one proteasome), which may be related to the observation that p68 is highly ubiquitinated in colon cancers [14]. p68 was also recently shown to be SUMO modified, specifically by SUMO-2 through the PIAS1 ligase [15]. Here we show that PIAS1 is coexpressed with p68/p72, and SUMO-2 is coexpressed with p68.
p68/p72 have also recently been shown to interact in a complex with ILF3, hnRNPU, and hnRNPH1 for micro-RNA processing [16]. Here, these gene products are also shown to be highly coexpressed with p68 and p72, supporting their role in the same complex/pathway (furthermore DDX3X is identified here with p68 and is also part of this microRNA processing complex).
A new role for p68 and p72, suggested by our meta-analysis, might be in nuclear transport, given that a member of the nuclear pore complex (Nup133) as well as nuclear import (transportin1) and export (exportin1) genes were identified as coexpressed genes.
Furthermore, coexpressed genes presented here are not limited to nuclear processes, given that several cytoskeletal proteins are identified in the screening, implicating p68 and p72 in these processes (although probably indirectly, as p68 and p72 are predominantly nuclear, perhaps acting via transcription or splicing). This is also true for endoplasmic reticulum (ER) or Golgi proteins. Indeed, the RAB6A trafficking protein had the highest frequency overlap for p68 (almost 70% overlap), while being one of the lowest for p72 (16% overlap), possibly indicative of a functional difference between the two. The family member RAB14 was also identified for both.
A further significant group of genes identified was involved in signal transduction, and may provide a starting point for analysis of the regulation of p68 and p72 (although a meta-analysis like this can identify frequency of coexpression, it is impossible to say which protein may be regulating another, or indeed if both are targets of another protein).
Altogether, the results of the overlapping coexpressed genes not only reiterate previous studies with either p68 or p72 but also predict new potential pathways in which p68/p72 may act.

Selected non-overlapping coexpressed genes of p68 and p72
While p68 and p72 may be highly similar and involved in the same pathways, it remains likely that they are also involved in subtly different pathways. For this reason a similar ontology analysis was performed on genes that do not overlap between p68 and p72. However, given the extensive nature of the gene hits we selected all genes with frequency overlap above 30%, as well as some genes of interest from lower frequencies (Table 2).
For p68 the genes above 30% generally fell into the same categories as previously, while there was only 1 gene identified for p72, with no obvious molecular function. Of course, the selected genes below 30% were chosen based on interest and common ontological groupings, and may not be representative. However, we note that for p68 more RAB family members are identified (RAB1A, RAB11B) as well as more ER proteins, particularly protein folding chaperones (Tapasin, Calnexin, Calreticulin).
With regard to transcription, p68 coexpressed with ELK3 and HDAC2 transcriptional repressors, while p72 coexpressed with CTBP1 and HDAC7 repressors. This might be relevant given that p68 and p72 have been shown to act as transcriptional repressors, hypothesised to have different mechanisms of action as they act in a promoter-specific manner [7]. However, it has been shown that CTBP1 repressive function is antagonized by pinin [18], and here, both p68 and p72 also coexpress with pinin (PNN) [17]. p68 has also been shown to be involved in p53 coactivation [4], and here we identify a coexpressed p53 coactivator hnRNPK [19] for p68/p72 and the p53-induced protein 7 (LITAF) for p68. For other transcription roles for p68 there were more nuclear receptor pathway proteins including thyroid receptor interacting protein 8 (JMJD1C), THRAP1 (THRAP2 was identified above for both p68 and p72), estrogen receptor binding protein (ERBP), and the retinoic acid receptor alpha (RARA) transcription factor. p72 coexpressed with the ER-alpha repressor MTA1. We have also observed that the p68 coexpressed gene ZNF9 is in the same pathway as the p68/p72 coexpressed MBNL1, implicated in myotonic dystrophy [20].
For p72 we note that NonO (p54nrb) has been shown to interact with SFPQ/PSF [21] (SFPQ identified as coexpressed for both p68 and p72). Furthermore EDD (a ubiquitin E3 ligase), also identified here with p72, has been shown in a complex with SFPQ [22]. Remarkably p68 has also very recently been shown to interact in a complex with NonO and SFPQ/PSF [23], again confirming the validity of the technique described here.

Discussion
The technique described here has proven useful in increasing the stringency of Oncomine™ meta-analysis, and will prove to be widely applicable. Generally individual gene levels cannot be compared from one study to another, but the strength of our analysis is an inter-study comparison (meta-analysis) after an intra-study Oncomine™ analysis (coexpression gene search).
While we still retain the strongest 400 coexpressed genes from each multi-array, their ranking by strength of correlation is lost when analyzing for frequency over different studies. An example is EDEM1 (involved in protein folding in the ER), which is consistently one of the strongest correlated genes with p72, while having only a 32% frequency overlap. The same is true for p68 and the Sp3 transcription factor, which has a frequency overlap of 37% and is very highly coexpressed in these individual studies. Conversely, the technique described here is useful for comparison of coexpressed genes which may not always have a high coexpression coefficient, giving another advantage over analysis of single studies.
An interesting exception is RAB6A with p68 which has both the highest frequency overlap with p68 (68%) and is almost always within the first 100 genes coexpressed with p68 in individual multi-array studies. A further exception is RNA helicase A (DHX9) which again has a high frequency of overlap with p68 (47%) and usually is within the first 50 coexpressed genes with p68. We have also shown here for the first time an interaction by immunoprecipitation of p68 (and also p72), with Dhx9. Furthermore, the technique described here is most useful in clustering specific genes involved in pathways when meta-analysis hits from known interacting proteins can be overlapped. We observed with our example of p68 and p72 that the overlapping hits mainly clustered into the classes of ontology in which p68/p72 had already been reported, namely splicing and transcription, further acting as validation for this type of analysis. There also seems to be a more general role for p68 and p72 in nuclear receptor transcription pathways than first assumed (ERα pathway as above), for example JMJD1C, THRAP1, THRAP2, RARA, RORA, all coexpress with p68 and/or p72.
While it is clear that we have obtained a highly stringent list of potential pathway partners of p68 and p72, we cannot draw confident conclusions with regard to separable functions (i.e. non-overlapping genes of p68 and p72), as these genes generally clustered into the same pathways as for the overlapping list. This may be due to a high false-negative rate of this technique: we have used several levels of stringency, which will most likely exclude many true pathway partners of p68 and p72. However, this cost is offset by the high quality of the results obtained using our rigorous analysis.

Conclusion
It is apparent that we have increased the scope of the Oncomine™ database, by utilising frequency of coexpression (meta-analysis) over different multi-array studies to predict pathway partners of searched proteins. With regard to the p68 and p72 RNA helicases we have identified a non-exhaustive list of gene products that are likely to be present in various pathways in which p68 and/or p72 act, both corroborating previous studies and making novel predictions. For one of these, RNA helicase A (Dhx9), we have shown there is a direct interaction with p68 and p72. Future experimental studies using this list as a reference point will reveal the validity of this technique.

Table 2. All coexpressed but non-overlapping gene products for p68 and p72 over 30% frequency are shown (and are in bold). Selected coexpressed gene products below 30% are shown and were chosen based on interest and common ontology groups. ? - Genes with unknown function.

Oncomine analysis
The following procedure was undertaken for meta-analysis (figure 1B): (1) Oncomine™ expression correlations were searched for p68 (DDX5) or p72 (DDX17). (2) 19 different multi-arrays were chosen and the first 400 correlated genes within each multi-array were compared using Microsoft Excel (separately for p68 and p72). Importantly, repetitive genes were then removed within each study, leaving only 1 representative per multi-array study. When a coregulated gene appeared in 3 or more multi-array experiments it was accepted as significant (3 = 16% frequency of the 19). These genes were taken as more significant than analysis of a single Oncomine™ output. Furthermore, given that the user cannot choose which multi-array will be given by Oncomine™, there was no attempt to specify different tissue types or cancer types. This had the advantage of giving a more generalised result of which pathways the proteins may be involved in, which was preferred for an initial study such as that performed here. (3) These sorted lists of coregulated genes given for p68 and p72 were compared for overlapping genes, which added another level of stringency and greatly increased the significance of the results. The genes listed were then investigated for ontology, and full gene/gene-product names, using a combination of Pubmed searches [24], Fatigo [25], and Genecards [26].
For endogenous co-immunoprecipitation, liver was extracted from a 3-month-old male mouse and homogenised in buffer B (Brinkmann polytron). Lysis was allowed to proceed, rotating at 4°C for 30 min. The sample was then centrifuged to remove debris and further incubated with RNaseA, rotating at 4°C for 30 additional minutes, while preclearing the lysate with protein G sepharose. 2 mg of this lysate was used with 3 μg of either p68 or p72 antibodies (Bethyl Laboratories) per immunoprecipitation, which were performed as above.
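Steps (2) and (3) of the procedure above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation (the text states the comparison was done in Microsoft Excel); the function names, study names and placeholder gene symbols are hypothetical, and the toy lists are not data from any multi-array.

```python
from collections import Counter


def frequent_coexpressed(studies, top_n=400, min_studies=3):
    """Count in how many studies each gene appears among the top correlated genes.

    `studies` maps a study name to its list of gene symbols, ranked best first.
    """
    counts = Counter()
    for ranked_genes in studies.values():
        # Keep the first `top_n` genes, then collapse repeats so that each
        # gene contributes at most once per multi-array study.
        counts.update(set(ranked_genes[:top_n]))
    return {gene: n for gene, n in counts.items() if n >= min_studies}


def overlapping_genes(freq_a, freq_b):
    """Genes that pass the frequency cut-off in both independent analyses."""
    return sorted(set(freq_a) & set(freq_b))


# Placeholder ranked lists for two query genes (toy data only).
query_a_studies = {
    "study_1": ["GENE_1", "GENE_2", "GENE_1", "GENE_3"],  # GENE_1 repeated
    "study_2": ["GENE_2", "GENE_1", "GENE_4"],
    "study_3": ["GENE_1", "GENE_3", "GENE_2"],
    "study_4": ["GENE_4", "GENE_3"],
}
query_b_studies = {
    "study_1": ["GENE_1", "GENE_3", "GENE_5"],
    "study_2": ["GENE_5", "GENE_1"],
    "study_3": ["GENE_3", "GENE_1", "GENE_5"],
    "study_5": ["GENE_3", "GENE_2"],
}

freq_a = frequent_coexpressed(query_a_studies)
freq_b = frequent_coexpressed(query_b_studies)
shared = overlapping_genes(freq_a, freq_b)
```

Counting each gene once per study before applying the cut-off mirrors the removal of repetitive genes within each multi-array, so a gene listed several times in one study cannot reach the threshold on its own.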