Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis

Je-Keun Rhee1,2, Je-Gun Joung3, Jeong-Ho Chang4, Zhangjun Fei3,5 and Byoung-Tak Zhang1,2,6

            BMC Genomics 2009, 10(Suppl 3):S29

            DOI: 10.1186/1471-2164-10-S3-S29

            Published: 3 December 2009

            Abstract

            Background

            Gene regulation is a key mechanism in higher eukaryotic cellular processes. One of the major challenges in gene regulation studies is to identify regulators affecting the expression of their target genes in specific biological processes. Despite their importance, regulators involved in diverse biological processes still remain largely unrevealed. In the present study, we propose a kernel-based approach to efficiently identify core regulatory elements involved in specific biological processes using gene expression profiles.

            Results

            We developed a framework that can detect correlations between gene expression profiles and the upstream sequences on the basis of the kernel canonical correlation analysis (kernel CCA). Using a yeast cell cycle dataset, we demonstrated that upstream sequence patterns were closely related to gene expression profiles based on the canonical correlation scores obtained by measuring the correlation between them. Our results showed that the cell cycle-specific regulatory motifs could be found successfully based on the motif weights derived through kernel CCA. Furthermore, we identified co-regulatory motif pairs using the same framework.

            Conclusion

            Given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes. The method could be applied to the elucidation of the unknown regulatory mechanisms associated with complex gene regulatory processes.

            Background

            One of the major challenges in current biology is to elucidate the mechanisms governing gene expression. Gene expression programs depend mainly on transcription factors, which bind to upstream sequences by recognizing short DNA motifs called transcription factor binding sites (TFBSs) and thereby regulate the expression of their target genes [1]. Although many regulatory motifs have been identified, a large number of functional elements remain unknown [2].

            Many genome-wide approaches have been developed in an attempt to discover regulatory motifs from upstream sequences. The earliest computational approaches for identifying regulatory motifs were based on statistical analyses using only the upstream sequences of genes. Statistical methods such as maximum-likelihood estimation or Gibbs sampling are effective for directly searching for significant sequence motifs in multiple upstream sequences [3, 4]. Several computational approaches based on machine learning methods have also been implemented. A SOM (self-organizing map)-based clustering method can find regulatory sequence motifs by grouping relevant sequence patterns [5], and a graph-theoretic approach identifies regulatory motifs by searching for the maximum density subgraph [6].

            More advanced approaches have been developed that identify regulatory motifs by linking gene expression profiles and motif patterns. The main advantage of these approaches is that they can identify motifs correlated with specific biological processes. Most early attempts used a unidirectional search: either searching for shared patterns in the upstream sequences of a set of co-expressed genes found by clustering algorithms [7, 8], or determining whether genes with common regulatory elements are co-expressed [9, 10]. In addition, motifs can be linked to gene expression patterns using linear regression models or regression trees [11, 12]. Recently, several techniques for a bidirectional search that detects the relationship between regulatory motifs and gene expression profiles have emerged [13, 14]. They search for regulatory motifs more efficiently than unidirectional approaches because they simultaneously search for similar expression patterns and the regulatory motifs correlated with them.

            In this study, we propose a novel bidirectional approach using a kernel-based method, kernel CCA (kernel canonical correlation analysis), to analyze the relationship between regulatory sequences and gene expression profiles [15–17]. The expression and sequence features are mapped from the original input space to a higher-dimensional space using the kernel trick, and the relationship between the two projected objects is interpreted to identify highly correlated motifs (Figure 1). Our method has the advantage that it can detect core motifs relevant to a specific cellular process without additional clustering or an intensive motif sampling process in upstream sequences.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Fig1_HTML.jpg
            Figure 1

            Basic scheme of the kernel CCA. The sequence and expression data are mapped to Hilbert space by the function φ. By taking inner products, u_exp and u_seq are derived, which maximize the correlation between the upstream sequences and the expression profiles.

            We applied the kernel CCA to a paired set of upstream sequence motifs of genes and their expression profiles from the yeast (Saccharomyces cerevisiae) cell cycle, and explored significant relationships between motifs and expression profiles. We also searched for regulatory motifs correlated with specific expression patterns. Our method retrieved regulatory motifs that play an important role in cell cycle regulation, including several well-known cell cycle regulatory motifs: MCB, SCB and SFF'. Furthermore, we identified motif pairs associated with gene expression in order to construct a map of the combinatorial regulation of regulators.

            Results and discussion

            We applied a computational method, kernel CCA, to the identification of novel transcriptional regulatory elements. The main purpose of our experiments was to find regulatory motifs associated with gene regulation in specific biological processes. Using the kernel CCA, we first found highly correlated features between expression profiles and sequence motifs. The key motifs in gene regulation were then identified from the weight scheme of the kernel CCA (see Methods section). Furthermore, we demonstrate that our method can be applied to the identification of motif pairs using raw upstream sequences.

            Identification of the relationship between gene expression and known motifs

            We first explored the relationship between gene expression profiles and known motifs using a yeast gene expression dataset related to the cell cycle [18] and a set of known motifs (Table 1) extracted by AlignACE [9]. A total of 551 ORFs (open reading frames) in the expression dataset contained at least one known motif. For the parameter settings, the degree of the polynomial kernel was set to 3, the parameter σ of the Gaussian RBF kernel to 0.5, and the regularization parameter to 0.1. These values were chosen because, over multiple runs, they produced a high correlation.
            Table 1

            Known regulatory motifs in yeast (Saccharomyces cerevisiae)

            Motifs (42): RAP1, RPN4, GCN4, MCB, HAP234, MIG1, AFT1, STRE', CCA, CSRE, PHO4, STE12, HSE, ABF1, ATRepeat, GAL, Leu3, LYS14, MET31-32, OAF1, PAC, PDR, PHO, REB1, STRE, ECB, ndt80 (MSE), Yap1, SCB, Gcr1, zap1, MCM1', MCM1, SFF, SFF', BAS1, Ume6 (URS1), SWI5, ALPHA1', ALPHA1, ALPHA2', ALPHA2

              
            The results from the kernel CCA were visualized using the CC1 (first canonical correlation) scores (Figure 2). In Figure 2, each point corresponds to a gene, and the cloud of points along the diagonal illustrates the correlation between the expression profiles and the motifs. The diagonal shape of the point cloud and the high correlation coefficient (0.996) indicate that the kernel CCA was able to find a close relationship between the expression profiles and the sequence motifs. We then performed linear canonical correlation analysis using the same datasets. The correlation coefficient obtained from the linear CCA (0.612) was much lower. As shown in Additional file 1, the linear CCA could not identify a significant correlation between expression profiles and motifs. This further supports the conclusion that the kernel CCA significantly improves the detection of correlation between the two datasets.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Fig2_HTML.jpg
            Figure 2

            Relationship between gene expression profiles and regulatory sequence motifs. (a) The plot shows the correlation between gene expression profiles and the regulatory sequence motifs. Each dot represents one gene in the dataset; the x-axis shows the value of u_exp and the y-axis the value of u_seq. (b) A close-up view of the boxed area in (a).

            The motifs were ranked by the weight function of Equation 6 (see Methods section) using the model obtained by the kernel CCA, and the top-ranked motifs are shown in Table 2. The SWI5 motif, a binding site of the SWI5 protein, has the highest weight value. SWI5 is known to act in G1 phase and at the M/G1 boundary of the cell cycle [19, 20]. The SFF' motif is a binding site of the FKH1 transcription factor, which affects the expression of genes controlling the cell cycle during the G2-S phase change [21]. The MCB motif is one of the well-known motifs in the yeast cell cycle and is the binding site of the MBF protein complex. The MBF complex is composed of MBP1 and SWI6; MBP1 is the DNA-binding component, while SWI6 has regulatory roles. It is well known that the MBF protein complex regulates the transcription of many genes in late G1 phase [19, 22]. The ALPHA2 protein also plays a role in the cell cycle. It operates synergistically with the MCM1 protein to repress the expression of its target genes [23, 24]. The MCM1 protein is a key regulator involved in the transcription of several M/G1 genes during the cell cycle [10, 22, 25]. The high weight value of ALPHA2 is supported by the evidence that the ALPHA2 protein binds to the MCM1 protein and influences the regulation of other cell cycle-related genes [26, 27]. Because these results, obtained with the set of known motifs, are consistent with previous reports, they validate the analysis method employed.
            Table 2

            The list of top ranked motifs based on the weight scheme by the kernel CCA

            | Motif | Weight | Function | Reference |
            |---|---|---|---|
            | SWI5 | 0.89026 | Transcription activation in G1 phase | [19, 20] |
            | SFF' | 0.45399 | FKH1 binding site that regulates the cell cycle | [21] |
            | MCB | 0.29633 | MBF binding site that activates in late G1 phase | [19, 22] |
            | LYS14 | 0.21796 | Lysine biosynthesis pathway | |
            | ALPHA2 | 0.16532 | Encoding a homeobox-domain | [23, 24] |

            To further validate the top-ranked motifs extracted by the kernel CCA, we compared the weights obtained from the cell cycle-related ORF set with those obtained from a randomly selected set. We performed the same procedure using random ORFs that are not known to be related to the cell cycle. Figure 3 shows the highly weighted motifs obtained by our method in the cell cycle-related gene set and in the non-cell cycle set, with the relative positions of those motifs indicated within the weight distribution of all motifs. The weight values obtained from the random set were significantly lower than those obtained from the cell cycle-related ORF set, from which we infer that no significantly correlated motifs were extracted from the random datasets. In summary, our method could identify regulatory motifs whose high weights indicate a high correlation between the upstream sequences and the gene expression profiles.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Fig3_HTML.jpg
            Figure 3

            Weight distributions for MCB, SFF' and SWI5 motifs derived from cell cycle and non cell cycle-related datasets. The dotted line indicates the weight distribution from the non-cell cycle datasets and the solid line from cell cycle datasets.

            Identification of cell cycle-related motifs

            We then applied the linear kernel to the motif sequence data, which contained a total of 1,024 features (window size l = 5) extracted from the raw upstream sequences of genes, and a Gaussian RBF kernel with parameter σ = 0.3 to the expression data. The regularization parameter was set to 0.1. These parameters were also chosen empirically because they produced a high correlation. Figure 4 shows the CC1 scores, which represent the correlation between the expression profiles and the sequence patterns. With the linear kernel applied to the sequence dataset, the expression data are closely related to the motif data derived from the raw 5-mer sequences.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Fig4_HTML.jpg
            Figure 4

            Correlation between expression profiles and motifs derived by using the raw upstream sequence data. Panel (b) is an enlargement of the boxed area in (a).

            The 5-mer motif patterns with high weights are listed in Table 3. The 5-mer with the highest weight is 5'-GCGTG-3', which is similar to the MCB motif (5'-ACGCGT-3'). As described previously, MCB is an important motif involved in the cell cycle. The second-ranked sequence (5'-CGTGT-3') matches the first five bases of the ALPHA2 motif sequence. From the second component, we also found several significant sequences, including a sequence (5'-CGCGT-3') that is identical to the last five bases of the MCB motif (5'-ACGCGT-3'). This further confirms that the MCB motif affects gene expression in the cell cycle. Another interesting motif is 5'-CCACG-3', which is a sequence block shifted by one base from the known SCB motif (5'-CACGAAA-3'). The SCB motif is a binding site of the SBF protein, which is a complex of SWI4 (a DNA-binding component) and SWI6 (a regulatory component) [22], and SBF is a major regulator of the G1/S transition. For each component, the list of the 100 motif patterns with the highest weights is provided in Additional file 2.
            Table 3

            High-scored motifs in the first and the second components using 5-mer raw upstream sequences

            | Sequence | Motif Description | Weight | Component | Rank |
            |---|---|---|---|---|
            | GCGTG | MCB (ACGCGT) | 0.079567 | 1 | 1 |
            | CGTGT | MATalpha2 (CRTGTWWWW) | 0.075340 | 1 | 2 |
            | CATGT | MATalpha2 (CRTGTWWWW) | 0.046299 | 1 | 12 |
            | CCACG | SCB (CACGAAA) | 0.018992 | 2 | 4 |
            | CGCGT | MCB (ACGCGT) | 0.017870 | 2 | 5 |
            | GTGTT | MATalpha2 (CRTGTWWWW) | 0.016595 | 2 | 9 |

            Combinational effects of regulatory motifs

            We searched for motif pairs that have synergistic or co-regulatory combination effects in the yeast cell cycle. The regulatory mechanisms of eukaryotes are highly complex because most genes are synergistically regulated by different transcription factors. Identifying synergistic motif combinations can therefore contribute to a systematic understanding of the regulatory circuit.

            In the present study, we used the kernel CCA to calculate the weight value for each pair among the 42 known motifs. The heat map of the weight values of all motif pairs is provided in Additional file 3. Table 4 presents the ten motif pairs with the highest weight values that occur more than ten times in all the investigated upstream sequences. It also shows their ECRScores, which represent gene expression coherence. All these scores are relatively high compared to the previously identified synergistic motif pairs (ECRScores > 0.075) [9]. As shown in Table 4, the pair with the highest weight value is MCB-MCM1. According to a previous study, MCB and MCM1 were characterized as a significantly cooperative motif pair in the regulation of the cell cycle [28]. Other highly ranked pairs, such as ECB-ALPHA2 and MCM1-ALPHA2, are already known to be required for the transcriptional regulation of early cell cycle genes. MCM1 activates transcription of ECB (early cell cycle box)-dependent genes during M/G1 phase [29], and the MCM1 protein can interact with the ALPHA2 factor, which regulates the expression of mating-type-specific genes [26, 27]. This evidence supports the view that the two ALPHA2-related motif pairs act synergistically in the regulation of expression during the yeast cell cycle. The REB1 motif, a binding site of the REB1 protein, is frequently found among the motif pairs with the highest weights. The REB1 protein is an RNA polymerase I enhancer-binding protein and binds to genes transcribed by both RNA polymerase I and RNA polymerase II [30]. It is a general regulator rather than a condition-specific one, so it is reasonable that this protein appears frequently in our results. The REB1-SWI5, REB1-MCM1' and REB1-ALPHA1 motif pairs have already been identified as acting synergistically in yeast cell cycle regulation [31–33]. Most of our results are consistent with the previous reports. In addition, it is worth noting that several previously uncharacterized motif pairs were identified by our kernel CCA method.
            Table 4

            The top 10 ranked motif pairs and their ECRScores

            | Weight | Motif Pair | ECRScore | # of ORFs | Reference |
            |---|---|---|---|---|
            | 2.5368 | MCB-MCM1 | 0.390 | 15 | [28] |
            | 2.5018 | MCB-ECB | 0.439 | 12 | |
            | 2.0177 | PHO-MCM1' | 0.088 | 17 | |
            | 1.848 | ECB-ALPHA2 | 0.088 | 14 | |
            | 1.7535 | MCM1-ALPHA2 | 0.074 | 17 | [26, 27] |
            | 1.7263 | ATRepeat-MCM1 | 0.076 | 12 | |
            | 1.6995 | PHO-ECB | 0.127 | 11 | |
            | 1.6823 | REB1-SWI5 | 0.099 | 14 | [31] |
            | 1.6476 | REB1-MCM1' | 0.115 | 13 | [32, 33] |
            | 1.4256 | REB1-ALPHA1 | 0.067 | 15 | [33] |

            Conclusion

            We presented a novel method that can identify candidate condition-specific regulatory motifs by employing kernel-based methods. The application of the kernel CCA enables us to detect correlations between heterogeneous datasets consisting of upstream sequences and expression profiles. From a data-mining perspective, our work can be regarded as a new approach for detecting important features from regulatory sequences and gene expression profiles. We demonstrated that the major motifs in a specific biological process can be extracted from CC scores by modelling the close relationship between two datasets related to gene regulation.

            As genome-wide datasets of various types become available, it is important to analyze these datasets in an integrated manner [34]. Novel biological hypotheses can be generated by integrating diverse biological resources that were produced for specific research purposes. In this respect, the kernel CCA is a useful method that can extract biological factors with significant roles by integrating different types of biological data. Many studies for identifying motifs have been based on sequence conservation or sequence characteristics, regardless of the biological processes involved. Our method can therefore be regarded as a complementary approach in the analysis of gene regulation.

            Our method found important motifs related to the cell cycle by using raw upstream sequences as well as known motif sets. In the present study we used raw sequences with a window size of l = 5. If the window size is enlarged, the dimension of the sequence features increases exponentially, whereas the frequency of each motif decreases. Although the window size used in our experiments was shorter than the length of several known transcription factor binding sequences, it was long enough to obtain worthwhile results.

            In future research, we will apply the proposed method to diverse gene expression datasets, especially cancer-related datasets. The cancer-related regulatory program can be elucidated by analyzing regulatory motifs from a set of enriched genes in the cancer transcriptome [35]. Using the kernel CCA, a correlation analysis between regulatory sequences and the cancer transcriptome may directly capture regulatory motifs related to the abnormal gene regulatory program.

            Methods

            Investigation of the relationship between regulatory sequence motifs and expression profiles

            Kernel CCA (canonical correlation analysis) is a nonlinear version of CCA, in which the kernel trick is used to find nonlinearly correlated features from two datasets [15–17]. CCA is a classical multivariate statistical method for finding linearly correlated features from a pair of datasets [36]. Given a pair of multivariate variables x and y, CCA finds a pair of linear transformations such that the correlation coefficient between the extracted features is maximized. However, if there is a nonlinear relationship between the variates, CCA does not always extract useful features.
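The limitation described above can be illustrated with a minimal sketch in Python (NumPy), which is not the authors' Matlab implementation. The function name `first_canonical_correlation` and the small ridge term `reg` are assumptions introduced here for numerical stability. The sketch computes the first canonical correlation of linear CCA by whitening both views and taking the largest singular value of the whitened cross-covariance.

```python
import numpy as np


def first_canonical_correlation(X, Y, reg=1e-8):
    """First canonical correlation between the columns of X and Y (linear CCA).

    X: (n_samples, p) array, Y: (n_samples, q) array.
    reg is a small ridge term (an assumption of this sketch) that keeps the
    covariance matrices invertible.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with the Cholesky factor of its covariance matrix.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)        # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, M.T).T      # Lx^{-1} Sxy Ly^{-T}
    # The singular values of M are the canonical correlations.
    return float(np.linalg.svd(M, compute_uv=False)[0])


x = np.linspace(-1.0, 1.0, 101).reshape(-1, 1)
linear_case = first_canonical_correlation(x, 3.0 * x + 1.0)  # close to 1
nonlinear_case = first_canonical_correlation(x, x ** 2)      # close to 0
```

In this toy example the quadratic dependence between the two variables is exact, yet linear CCA reports almost no correlation, which is the situation a kernel-based variant is meant to address.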

            Kernel CCA overcomes this linearity limitation by first projecting the data into a higher-dimensional feature space. While CCA is limited to linear features, kernel CCA can capture nonlinear relationships. Kernel CCA has been used in several applications, including text retrieval and biological data analysis [15, 37].

            Figure 1 illustrates the basic scheme of the kernel CCA for our integrated analysis of DNA sequence motif and gene expression data. Using kernel CCA, we tried to find maximally correlated features between the gene expression and the sequence motifs. Here, each gene x in a gene set X is represented by two separate profiles describing its transcriptional behaviour and its upstream sequence, x_exp and x_seq. These are the expression profile, x_exp = (e_1, e_2, ..., e_N), and the sequence profile, x_seq = (m_1, m_2, ..., m_M), of each gene. Here e_i (1 ≤ i ≤ N) is the expression value of the gene in the i-th sample or experimental condition from the microarray dataset, and m_j (1 ≤ j ≤ M) denotes the occurrence frequency of the j-th sequence motif in the upstream region of the gene. To detect correlated features between the two datasets, x_exp and x_seq are first mapped to Hilbert space, H, by the function φ. That is, each x is projected onto two directions, f_exp and f_seq, in Hilbert space according to its representation:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ1_HTML.gif
            (1)
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ2_HTML.gif
            (2)
            where ⟨•,•⟩ denotes the dot product. Kernel CCA looks for maximally correlated features between x_exp and x_seq:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ3_HTML.gif
            (3)
            where λ_exp and λ_seq are regularization parameters, var(•) denotes the variance and cov(•,•) the covariance between two variables. The kernel CCA solution is obtained by solving a generalized eigenvalue problem:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ4_HTML.gif
            (4)
            where I denotes the identity matrix, K_exp is the kernel matrix for expression profiles, and K_seq is the kernel matrix for sequence motifs. Given α_exp and α_seq as the solution of the above generalized eigenvalue problem with the largest eigenvalue, the canonical correlation scores (CC scores) for x_seq and x_exp are estimated by u_seq = K_seq α_seq and u_exp = K_exp α_exp, respectively. The CC scores are based on the low-dimensional mapping of genes in terms of the two separate representations and can be used to show the salient correlation between the two. Once we obtain the α vectors, the weights of the motif and expression profiles, W_seq and W_exp, are obtained as follows:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ5_HTML.gif
            (5)
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ6_HTML.gif
            (6)

            A high weight value of the specific sequence motif means that the motif is strongly correlated with the expression patterns of genes whose upstream region includes the motif and whose CC scores are high. If a weight of a specific motif has a high absolute value, the motif is more likely to play a regulatory role in the specific biological process. The kernel CCA was implemented using Matlab.
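The procedure above can be sketched in Python (NumPy/SciPy); this is not the authors' Matlab code. The block form of the generalized eigenvalue problem used below, with (K + λI)² on the right-hand side and a single regularization value shared by both views, is one standard regularized kernel CCA formulation and is assumed here, since Equation 4 itself is not reproduced in the text. The function names and the kernel centering step are likewise assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import eigh


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_cca(K_exp, K_seq, reg=0.1):
    """Return (rho, u_exp, u_seq, alpha_exp, alpha_seq) for the top component.

    Assumed formulation (not confirmed against Equation 4):
        [0            K_exp K_seq] [a_exp]       [(K_exp + reg I)^2   0              ] [a_exp]
        [K_seq K_exp  0          ] [a_seq] = rho [0                   (K_seq + reg I)^2] [a_seq]
    """
    n = K_exp.shape[0]
    Ke = center_kernel(np.asarray(K_exp, dtype=float))
    Ks = center_kernel(np.asarray(K_seq, dtype=float))
    Z = np.zeros((n, n))
    A = np.block([[Z, Ke @ Ks], [Ks @ Ke, Z]])
    Re = Ke + reg * np.eye(n)
    Rs = Ks + reg * np.eye(n)
    B = np.block([[Re @ Re, Z], [Z, Rs @ Rs]])
    # eigh solves the symmetric-definite problem A v = rho B v,
    # returning eigenvalues in ascending order.
    eigvals, eigvecs = eigh(A, B)
    top = eigvecs[:, -1]
    alpha_exp, alpha_seq = top[:n], top[n:]
    # Canonical correlation scores, one value per gene in each view.
    u_exp = Ke @ alpha_exp
    u_seq = Ks @ alpha_seq
    return float(eigvals[-1]), u_exp, u_seq, alpha_exp, alpha_seq


# Toy usage with linear kernels on synthetic data sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
view_a = np.hstack([latent, rng.normal(size=(40, 2))])
view_b = np.hstack([latent, rng.normal(size=(40, 2))])
rho, u_exp, u_seq, alpha_exp, alpha_seq = kernel_cca(view_a @ view_a.T, view_b @ view_b.T)
```

Plotting `u_exp` against `u_seq` for each gene would give a scatter of the same kind as the CC1 plots in Figures 2 and 4.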

            Preparation of the gene expression datasets

            Expression profiles of all ORFs (open reading frames) during the yeast cell cycle, consisting of 18 time points from the alpha factor synchronization experiment [18], were used as the expression dataset. To map the expression profiles to a high-dimensional space, we converted them to a kernel matrix. We applied a Gaussian RBF kernel to the expression profile matrix:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ7_HTML.gif
            (7)

            where σ is the kernel parameter, d(•,•) is the Euclidean distance, and x and x' denote two different instances.
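A minimal Python (NumPy) sketch of such a kernel matrix follows. The exact scaling in Equation 7 is not reproduced in the text, so the common form exp(-d(x, x')² / (2σ²)) is assumed; the function name `gaussian_rbf_kernel` is hypothetical.

```python
import numpy as np


def gaussian_rbf_kernel(X, sigma=0.5):
    """Gaussian RBF kernel matrix for the rows of X.

    Assumed form: K[i, j] = exp(-d(x_i, x_j)**2 / (2 * sigma**2)),
    where d is the Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Round-off can produce tiny negative values; clip them to zero.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


# Toy expression matrix: 6 genes measured at 18 time points.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 18))
K_exp = gaussian_rbf_kernel(expression, sigma=0.5)
```

The values σ = 0.5 (known motifs) and σ = 0.3 (raw 5-mers) reported in the Results would be passed as `sigma`.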

            Preparation of the gene sequence datasets

            The sequence data were used in two ways. In the first case, we used the sequences of a total of 42 known motifs (Table 1) reported by Pilpel et al. [9]. We then scanned the upstream regions of ORFs for the presence of these motifs using the AlignACE program [3]. The sequence profile was represented by the occurrence of these motifs in the promoter of each gene in the genome.

            In the second case, we analyzed the relationship between the expression profiles and the raw upstream sequences. We extracted ~1 kb of upstream sequence for each gene. From these sequences, we calculated the frequency of all possible l-mers in each gene. For l = 5, there are 1,024 (= 4^5) different base combinations per gene. The sequence profile was encoded as the frequencies of these l-mers.
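The l-mer encoding can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions not stated in the text: only the given strand is counted, and windows containing ambiguous bases such as N are skipped.

```python
from itertools import product

import numpy as np


def kmer_index(l=5):
    """Map every l-mer over ACGT to a column index (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=l))}


def kmer_profile(sequence, l=5):
    """Count the occurrences of all 4**l possible l-mers in one sequence."""
    index = kmer_index(l)
    counts = np.zeros(len(index), dtype=int)
    seq = sequence.upper()
    # Slide a window of length l over the sequence, one base at a time.
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        if window in index:  # skips windows with non-ACGT characters
            counts[index[window]] += 1
    return counts


profile = kmer_profile("ACGCGTACGCGT", l=5)
```

Stacking one such profile per gene gives the genes-by-1,024 matrix to which the sequence kernel is applied.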

            We applied the kernel as http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_IEq1_HTML.gif to the sequence data. When d = 1, it is the linear kernel, and when d > 1, it is the polynomial kernel.
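As a small Python (NumPy) illustration, the sketch below assumes the kernel has the form K(x, x') = ⟨x, x'⟩^d without an offset term; the inline formula is not reproduced in the text, so this form and the function name `polynomial_kernel` are assumptions.

```python
import numpy as np


def polynomial_kernel(X, degree=1):
    """Kernel matrix K[i, j] = <x_i, x_j> ** degree for the rows of X.

    degree=1 gives the linear kernel; degree > 1 gives a polynomial kernel.
    The absence of an offset term is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    return (X @ X.T) ** degree


# Toy motif-count profiles for two genes.
counts = np.array([[1.0, 2.0], [3.0, 4.0]])
K_linear = polynomial_kernel(counts, degree=1)
K_cubic = polynomial_kernel(counts, degree=3)
```

In the Results, degree 3 corresponds to the setting used with the known motif set and degree 1 (the linear kernel) to the setting used with the raw 5-mer profiles.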

            Measurement of the effect of motif pairs

            To measure the effect of the motif pairs, we defined the ECRScore (Expression Coherence coRrelation Score), calculated from the Pearson correlation coefficients of the expression profiles for all possible pairs of genes whose upstream regions contain both motifs, m_i and m_j:
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2164-10-S3-S29/MediaObjects/12864_2009_Article_2587_Equ8_HTML.gif
            (8)

            where N(m_i, m_j) is the number of all pairs of genes whose upstream regions have the two motifs, and N_τ(m_i, m_j) is the number of those gene pairs whose correlation coefficient is larger than the threshold τ. The threshold was chosen based on the fifth percentile of the distribution for correlation coefficients of randomly sampled gene pairs.
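The score can be sketched in Python (NumPy) as below. Equation 8 is not reproduced in the text, so the ratio N_τ / N is inferred from the definitions above and should be read as an assumption; the function name `ecr_score` and the argument layout are hypothetical, and the threshold τ is passed in rather than derived from random gene pairs.

```python
import numpy as np


def ecr_score(expr, has_motif_i, has_motif_j, tau):
    """Fraction of gene pairs carrying both motifs whose Pearson r exceeds tau.

    expr: (n_genes, n_conditions) expression matrix.
    has_motif_i, has_motif_j: boolean arrays of length n_genes.
    Returns NaN when fewer than two genes carry both motifs.
    """
    expr = np.asarray(expr, dtype=float)
    both = np.flatnonzero(
        np.asarray(has_motif_i, dtype=bool) & np.asarray(has_motif_j, dtype=bool)
    )
    if both.size < 2:
        return float("nan")
    # Pairwise Pearson correlations between the selected genes (rows).
    corr = np.corrcoef(expr[both])
    upper = np.triu_indices(both.size, k=1)   # each unordered pair once
    pair_corr = corr[upper]
    return float(np.mean(pair_corr > tau))


expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0],
])
motif_i = np.array([True, True, True, False])
motif_j = np.array([True, True, True, True])
score = ecr_score(expr, motif_i, motif_j, tau=0.5)  # 1 of the 3 gene pairs exceeds tau
```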

            Note

            Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.

            Declarations

            Acknowledgements

            This work was supported in part by KEIT through the MARS project (IITA-2009-A1100-0901-1639), KRF Grant funded by the Korean Government (MOEHRD) (KRF-2008-314-D00377) and the BK21-IT program funded by Korean Government (MEST). JHC has been supported by Korean Ministry of Information and Communications under 2005 IT scholarship program. The ICT at Seoul National University provides research facilities for this study.

            This article has been published as part of BMC Genomics Volume 10 Supplement 3, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/10?issue=S3.

            Authors’ Affiliations

            (1)
            Graduate Program in Bioinformatics, Seoul National University
            (2)
            Center for Biointelligence Technology (CBIT), Seoul National University
            (3)
            Boyce Thompson Institute for Plant Research, Cornell University
            (4)
            Konan Technology Inc
            (5)
            USDA Robert W. Holley Center for Agriculture and Health
            (6)
            School of Computer Science and Engineering, Seoul National University

            References

            1. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804.
            2. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 2005, 434(7031):338–345.
            3. Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 2000, 296(5):1205–1214.
            4. Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 1994, 2:28–36.
            5. Mahony S, Hendrix D, Golden A, Smith TJ, Rokhsar DS: Transcription factor binding site identification using the self-organizing map. Bioinformatics 2005, 21(9):1807–1814.
            6. Fratkin E, Naughton BT, Brutlag DL, Batzoglou S: MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinformatics 2006, 22(14):e150–157.
            7. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285.
            8. Brazma A, Jonassen I, Vilo J, Ukkonen E: Predicting gene regulatory elements in silico on a genomic scale. Genome Res 1998, 8(11):1202–1215.
            9. Pilpel Y, Sudarsanam P, Church GM: Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001, 29(2):153–159.
            10. Park PJ, Butte AJ, Kohane IS: Comparing expression profiles of genes with similar promoter regions. Bioinformatics 2002, 18(12):1576–1584.
            11. Bussemaker HJ, Li H, Siggia ED: Regulatory element detection using correlation with expression. Nat Genet 2001, 27(2):167–171.
            12. Keles S, Laan M, Eisen MB: Identification of regulatory elements using a feature selection method. Bioinformatics 2002, 18(9):1167–1175.
            13. Segal E, Yelensky R, Koller D: Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 2003, 19(Suppl 1):i273–282.
            14. Jeffery IB, Madden SF, McGettigan PA, Perriere G, Culhane AC, Higgins DG: Integrating transcription factor binding site information with gene expression datasets. Bioinformatics 2007, 23(3):298–305.
            15. Hardoon DR, Szedmak S, Shawe-Taylor J: Canonical correlation analysis; An overview with application to learning methods. Technical Report CSD-TR-03–02 Royal Holloway University of London 2003.
            16. Bach FR, Jordan MI: Kernel independent component analysis. Technical Report UCB//CSD-10–1166 UC Berkeley 2001.
            17. Akaho S: A kernel method for canonical correlation analysis. International meeting of Psychometric Society (IMP2001) 2001.
            18. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.
            19. Dohrmann PR, Butler G, Tamai K, Dorland S, Greene JR, Thiele DJ, Stillman DJ: Parallel pathways of gene regulation: homologous regulators SWI5 and ACE2 differentially control transcription of HO and chitinase. Genes Dev 1992, 6(1):93–104.
            20. Dohrmann PR, Voth WP, Stillman DJ: Role of negative regulation in promoter specificity of the homologous transcriptional activators Ace2p and Swi5p. Mol Cell Biol 1996, 16(4):1746–1758.
            21. Morillon A, O'Sullivan J, Azad A, Proudfoot N, Mellor J: Regulation of elongating RNA polymerase II by forkhead transcription factors in yeast. Science 2003, 300(5618):492–495.
            22. Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS, et al.: Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 2001, 106(6):697–708.
            23. Vershon AK, Johnson AD: A short, disordered protein region mediates interactions between the homeodomain of the yeast alpha 2 protein and the MCM1 protein. Cell 1993, 72(1):105–112.
            24. Zhong H, McCord R, Vershon AK: Identification of target sites of the alpha2-Mcm1 repressor complex in the yeast genome. Genome Res 1999, 9(11):1040–1047.
            25. Lydall D, Ammerer G, Nasmyth K: A new role for MCM1 in yeast: cell cycle regulation of SWI5 transcription. Genes Dev 1991, 5(12B):2405–2419.
            26. Keleher CA, Passmore S, Johnson AD: Yeast repressor alpha 2 binds to its operator cooperatively with yeast protein Mcm1. Mol Cell Biol 1989, 9(11):5228–5230.
            27. Mead J, Zhong H, Acton TB, Vershon AK: The yeast alpha2 and Mcm1 proteins interact through a region similar to a motif found in homeodomain proteins of higher eukaryotes. Mol Cell Biol 1996, 16(5):2135–2143.
            28. Das D, Banerjee N, Zhang MQ: Interacting models of cooperative gene regulation. Proc Natl Acad Sci USA 2004, 101(46):16234–16239.
            29. MacKay VL, Mai B, Waters L, Breeden LL: Early cell cycle box-mediated transcription of CLN3 and SWI4 contributes to the proper timing of the G(1)-to-S transition in budding yeast. Mol Cell Biol 2001, 21(13):4140–4148.
            30. Morrow BE, Johnson SP, Warner JR: Proteins that bind to the yeast rDNA enhancer. J Biol Chem 1989, 264(15):9061–9068.
            31. Banerjee N, Zhang MQ: Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucleic Acids Res 2003, 31(23):7024–7031.
            32. Tsai HK, Lu HH, Li WH: Statistical methods for identifying yeast cell cycle transcription factors. Proc Natl Acad Sci USA 2005, 102(38):13532–13537.
            33. Hvidsten TR, Wilczynski B, Kryshtafovych A, Tiuryn J, Komorowski J, Fidelis K: Discovering regulatory binding-site modules using rule-based learning. Genome Res 2005, 15(6):856–866.
            34. Kasturi J, Acharya R: Clustering of diverse genomic data using information fusion. Bioinformatics 2005, 21(4):423–429.
            35. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Barrette TR, Ghosh D, Chinnaiyan AM: Mining for regulatory programs in the cancer transcriptome. Nat Genet 2005, 37(6):579–583.
            36. Hotelling H: Relations between two sets of variates. Biometrika 1936, 28:321–377.
            37. Yamanishi Y, Vert JP, Nakaya A, Kanehisa M: Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics 2003, 19(Suppl 1):i323–330.

            Copyright

            © Rhee et al. 2009

            This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
