Reconstructing differentially co-expressed gene modules and regulatory networks of soybean cells
© Zhu et al.; licensee BioMed Central Ltd. 2012
Received: 15 March 2012
Accepted: 22 August 2012
Published: 31 August 2012
Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell transcriptional machinery can respond optimally to internal or external stimuli. This provides a research opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks.
We developed and integrated a set of computational methods of differential gene expression analysis, gene clustering, gene network inference, gene function prediction, and DNA motif identification to automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods were able to identify 42 coherent gene modules within which average gene expression correlation coefficients are greater than 0.8 and reconstruct their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that the bioinformatic methods used can help elucidate the molecular responses of soybean cells upon various environmental stresses.
The bioinformatics methods and genome-wide data sources for gene expression, clustering, regulation, and function analysis were integrated seamlessly into one modular protocol to systematically analyze and infer modules and networks from only differential expression genes in soybean cells grown under stress conditions. Our approach appears to effectively reduce the complexity of the problem, and is sufficiently robust and accurate to generate a rather complete and detailed view of putative soybean gene transcription logic potentially underlying the responses to the various environmental challenges. The same automated method can also be applied to reconstruct differentially co-expressed gene modules and their regulatory networks from gene expression data of any other transcriptome.
Genes and proteins in a cell are often organized as a network of interacting modules (e.g. biological pathways) in order to carry out their biological functions. For instance, multiple proteins may form a stable protein complex to regulate gene expression or interact transiently to transduce biological signals. Similarly, a number of genes involved in the same biological process may show coordinated regulation in order to respond effectively to biotic and abiotic stresses. Identifying and characterizing the functional modules (e.g. co-regulated genes and their transcription logic) in a cell would be a natural and necessary approach to studying biological mechanisms underlying various cell activities. Genome-wide profiling of transcriptomes by high-throughput microarray and RNA-sequencing techniques can generate a dynamic, global view of gene expression reflecting gene regulation activities under various biological conditions, which provides necessary information for developing and testing computational modeling methods to decipher transcriptional responses underlying various cellular and molecular processes.
As in, the whole regulation machinery of a cell can be dissected into a number of transcription regulatory modules. A transcription module is generally comprised of several transcription factors (TFs) and a group of target genes collaboratively or alternatively regulated by the TFs in a combinatorial way. Upon internal or external cellular stimuli, transcription factors of a module may be activated to either up- or down-regulate the target genes in order to respond to the stimuli. The changes in expression levels of target genes and transcription factors captured by microarray techniques can be combined with other genomic data, such as the sequence information and functional annotations of all the genes in order to reversely infer co-regulated genes and their regulators. Accurate prediction of transcription regulatory modules can generate valuable testable hypotheses for designing biological experiments to identify genes and interactions important for biological phenotypes and to elucidate cellular mechanisms underlying various biological conditions and environmental stresses.
Several computational methods have been developed to construct gene regulatory networks or modules from gene expression data[2, 4–6]. The Bayesian probabilistic network method can integrate multiple sources of observed information such as gene expression, known transcription factors, and known DNA binding motifs with a probabilistic inference framework to infer co-regulated genes and their putative regulators – transcription factors. The method was successfully applied to the microarray gene expression data of a model species - Saccharomyces cerevisiae - measured in multiple biological conditions, identifying a number of highly confident gene regulatory modules. Thus, it is very desirable to develop a general tool that implements and adapts this method to construct gene regulatory modules from gene expression data of any species perturbed by any biological condition, such as plants which have very large, but less well studied genomes and transcriptomes.
Some existing gene regulatory network inference methods require prior biological information about relationships between transcription factors and target genes. Dana et al. used mutual information to evaluate relations between a target and its active regulators. This method often limits its search on a set of prior candidate transcription factors, whereas our method (MULTICOM-GNET) can consider all differentially expressed TFs shown in data or even all the known TFs of a species without the need of such prior knowledge. Yao et al. developed a maximum likelihood method to prune a rough gene regulatory network based on microarray data. The initial network was constructed from the potential TF-gene regulatory pairs obtained by mining the literature and databases. Thus, the method might not be effectively applied to species with limited gene regulatory knowledge, such as most plants. Joshi et al. developed the method for network inference by automatically selecting centroid-like clusters and their TFs. In this method, clustering of genes and assignment of TFs were two separated steps, where our method optimizes the two steps iteratively.
Although a few computational methods were designed to predict transcription regulators and their target genes in Arabidopsis thaliana[3, 8], the computational prediction of gene regulatory networks for plant species is still at an early stage, partially due to lack of bioinformatic tools or integration methods to combine gene expression data with other data sources to study co-expressed gene modules. Specifically, very little work has been done to construct gene regulatory networks for soybean, an important agricultural crop[9–11], despite the huge amount of gene expression data accumulated for this species during the last several years.
With the availability of the complete genome sequence of the soybean and numerous subsequent annotations of soybean genes and proteins (e.g. SoyDB, a function annotation database of all putative transcription factors and SoyKB, a comprehensive all-inclusive web resource for soybean), it is important and also possible to develop and integrate a set of bioinformatic methods to reliably construct gene regulatory modules by integrating the vast soybean gene expression data with functional genomics data. In this direction, we designed and developed a modular protocol to integrate a set of complementary bioinformatic methods for gene expression data preprocessing, differential expression analysis, gene expression clustering, co-regulated gene module and regulator construction, DNA binding motif identification, and gene function prediction to construct and validate gene regulatory modules. The approach combines both transcriptomic and genomic data to improve gene regulatory network construction. We applied the approach to the gene expression data of soybeans derived from various stress conditions. The analysis produced 32 gene regulatory modules with a high co-expression correlation and function coherence. Approximately half of these modules could be partially validated by the literature. The results demonstrate that our approach can be reliably applied to specific, large-scale expression data of a complicated eukaryotic transcriptome to elucidate the underlying biological mechanisms and processes. The testable gene regulatory networks not only explain the gene expression data and previously known biological evidence but also, perhaps more importantly, can be used to formulate hypotheses that can be tested to generate new biological understanding.
The input data required by our approach includes the soybean genome sequence and gene annotations, a list of the candidate transcription factors (TF) curated in SoyDB, and the gene expression profiles calculated from the microarray data of soybean cells from a number of stress-induced experiments.
Microarray gene expression data
The gene expression data used to construct the gene regulatory modules is the publicly available Affymetrix microarray data of soybean cells measured under a number of stress treatments. The RNAs used to generate the data were isolated from multiple soybean tissues, such as leaves and roots. The data include expression measurements of 61,169 gene probes on 99 microarrays. The data was grouped into eight sets corresponding to eight categories of stress treatments, which are 1) iron deficient, 2) Phytopthora sojae infected hypocotyl, 3) RNAi storage protein suppression, 4) RNAi oleosion suppression, 5) inoculated with the nitrogen fixing symbiont, Bradyrhizobium japonicum 6) inoculated with the fungal pathogen, Phakopsora pachyrhizi, 7) Syncytium infected, and 8) infected with the soybean cyst nematode parasite, Heterodera glycines. Each set may include several sub-sets of expression data measured with different levels of stress treatment. The total number of treatments for all eight stress categories is 35, each of which may have a few microarray replicates. The average expression value of a gene in multiple microarray replicates was used to represent the expression value of the gene under the treatment.
Candidate transcription factors
All the 5671 transcription factors (TF) curated in SoyDB, which had been automatically classified into 63 TF families by hidden Markov models, were initially selected as candidate gene regulators. The 5480 TFs that actually had expression profiles in the microarray data were used to construct gene regulatory networks.
Soybean genomics data
We retrieved protein sequences of 46,430 highly confident putative genes from the soybean genome database for gene and protein function prediction, and extracted gene sequences and their 500 upstream DNA sequences from start codon for analysis of DNA binding sites according to the genome locations of the gene probes downloaded from the Affymetrix website (http://www.affymetrix.com).
Differential gene expression analysis
The microarray data were normalized into gene expression profiles using the RMA algorithm in GeneSpring 10GX. The signal of a probe was normalized by median signal value, i.e., the median of the logarithm expression values of each probe from all samples was subtracted from the logarithm expression value in each sample. Genes with normalized expression values > 3 or < −3 were selected as differentially expressed genes. A total of 10,618 genes (more precisely gene probes), a union of differentially expressed genes in all treatment conditions, were used in gene regulatory network construction. This list included putative transcription factor genes.
Co-expressed gene clustering and regulatory network construction
As in, a regulatory module includes several TFs and a number of genes whose expression is presumably regulated by TFs collaboratively in a series of biological conditions (e.g. stress treatments). It is assumed that the TFs regulate the expression of the genes in a module through the change of their own expression level, which may also be transcriptionally regulated. This assumption is an incomplete simplification of the complex regulatory logic of TFs, because some TFs may not be regulated at the transcriptional level but at a post-translational level, e.g. phosphorylation of TFs. In this work, our method mainly considers the case that the expression profiles of TFs provide information about their activity levels, although other kinds of regulation (e.g., phosphorylation) can be added into our method if data are available. Based on the expression profile – a vector of expression levels under different biological conditions, the expression levels of a TF were clustered into either two or three categories (1: highly expressed, 0: normally expressed, -1: lowly expressed) using the K-means clustering algorithm, where the number of categories equals the number of types (>3, <−3, or between) of the expression situation. The set of transcription factors are assumed to regulate the expression of the genes in a module through a path in the binary decision tree composed of the TFs as internal nodes and condition sub-groups as leaf nodes (Figure1). A regulatory path from the root node to the leaf node can be interpreted as a series of binary queries on the expression level (up-regulated or not OR down-regulated or not) of internal nodes (i.e., TFs) under treatment conditions leading to the observed expression levels of the genes in the leaf node under the same treatment conditions. Therefore, the regulatory decision tree represents the combinatorial logic by which the TFs regulate the expression of the genes in the module under different treatment conditions.
In order to reduce the complexity of gene regulatory network construction, all the differentially expressed genes were clustered using the K-means algorithm, aiming to assign genes exhibiting similar expression patterns across all the treatment conditions into the same cluster. The number of clusters (k) was chosen based on how average correlation coefficients of expression values of genes in the same clusters changed with cluster numbers or the average size of clusters (i.e., number of genes). The knee in the plot of the correlation coefficient versus the average size, which represents the most drastic change of the balance between the two factors, was used to determine the number of clusters. In our experiment, the genes were initially clustered into 100 clusters (see Additional file1: Figures S1 and S2).
Starting from the initial gene clusters, the gene regulatory modules were constructed in an iterative two-step manner, including (1) constructing a binary tree consisting of several TFs that can best interpret the expression of the genes in a cluster, and (2) re-assigning genes into clusters whose regulatory tree can explain their expression patterns best, i.e., with maximum likelihood. The two steps were alternated until the likelihood of the gene expression data was maximized. Given a gene cluster, a regulatory decision tree was constructed by recursively selecting TFs to divide experimental conditions into two sub-groups such that the expression values of the genes in each group are more coherent. Specifically, the experimental conditions are separated into two sub-groups according to the expression level of a TF in the conditions, i.e., the conditions where a TF was highly expressed (resp. lowly expressed) were assigned to one sub-group and others to the other sub-group. Assuming the expression values in each sub-group of conditions obey the normal distribution, the probability of the expression value (x) of a gene i (gi) is calculated as , where μ is the mean expression value in the sub-group, σ the standard deviation of expression values in the sub-group, and x the expression value of the gene gi in a condition assigned to the sub-group. The product of the probabilities of the expression values of every gene under every condition is considered the probability that the TF regulates this group of conditions. The product was further transformed by the logarithm function into a likelihood score. The TF that divided the group of conditions with the highest likelihood score was selected. The sub-group of conditions can be further divided by another TF in the same fashion until the gene expression values grouped together are similar enough or the maximum level of partitions had been reached. The TFs were selected to divide conditions from the nodes of the regulatory tree, where each node has two branches corresponding to its two expression states: highly expressed (resp. lowly expressed) or not-highly-expressed (resp. not-lowly-regulated). A branch connects one TF node (parent node) in the upper level to another TF node (child node) in the lower level that was selected to divide a sub-group generated by the parent TF node. The first TF selected to divide the whole group of genes was the root of the tree. The TFs that do not have TF children nodes directly connect to the sub-groups of conditions divided by them. Generally a regulatory decision tree has one to five levels of TF nodes. A path from the root node to a leaf node forms a regulatory logic, i.e., a list of combinatorial states of TF nodes on the path, which regulates the expression of genes in a sub set of conditions represented by the leaf node.
After a gene regulatory tree was constructed for every gene cluster, a gene re-assignment procedure was used to assign each gene to a cluster whose regulatory tree best explained its expression values in all the treatment conditions as follows:
Assuming that a regulatory tree divides experimental conditions into a set of sub-groupsand the mean and standard deviation of the gene expression values in a sub-group Sk were μk and σk, respectively, according to the normal distribution, the probability (likelihood) of the expression values of a gene gi under all treatment conditions was calculated as:
where x ij was the expression value of gi under condition j. This calculation of likelihood was based on the simplified assumption that normalized expression values of a gene under different conditions were independent, which was a largely reasonable approximation if gene expression experiments were carried out independently. However, the approximation did not account for the co-variation between expression values. The log-likelihood of gene gi was. After the log-likelihood scores of gi were calculated with respect to all regulatory trees, gi was assigned to the regulatory tree yielding the highest likelihood score. The genes assigned to the same regulatory tree formed a cluster. In this way, all the genes were clustered into a new set of clusters. The log-likelihood of a gene cluster can be represented as:. For each group, a regulatory tree was constructed according to the same protocol described above. The regulatory tree construction step and the gene re-assignment step were iterated until the assignment of genes did not change. When the protocol stopped, the final clusters of genes and their regulatory trees formed a set of predicted gene regulatory modules.
We used MULTICOM[18, 19], a protein structure and function prediction software, to predict the functions of the differentially expressed genes in order to study the function coherence of genes in regulatory modules. MULTICOM tried to predict three categories of functions (i.e. biological processes (BP), molecular function (MF), and cellular component (CC)) in terms of the Gene Ontology (GO) definition for each differentially expressed gene. The predictions were presented as both GO terms and human readable descriptions.
Statistical consistency analysis of modules
We evaluated the coherence of each regulatory module from two aspects: 1) Pearson correlation coefficient of expression values of genes in the module; and 2) GO gene function enrichment. According to gene expression values, we calculated the Pearson correlation coefficient of every two genes of the module, and then we averaged all the pairwise correlation coefficients within the module as the correlation coefficient of the module.
Gene function enrichment analysis
To analyze the biological relevance of each module, we studied the functional consistency of genes assigned to the same module.The predicted GO function terms of the genes were compared according to the hypergeometric distribution[21, 22] in order to check if some biological process terms or molecular function terms were significantly more enriched than by chance. If one gene probe on the microarray chip corresponded to more than one gene (e.g. multiple isoforms of a gene), all the corresponding genes were considered for functional analysis.
DNA binding site analysis
As in, we used a DNA binding site analysis to further validate if the predicted TFs of a module likely regulated the genes in the module. The locus information corresponding to each probe was downloaded from http://www.affymetrix.com. We extracted the upstream 500 bp sequences of genes within every predicted gene module. We used MEME to analyze 500 bp upstream sequences of the genes in each module to identify significantly conserved sequence motifs consisting of 6 to 18 nucleotides, which were considered potential sites for TFs to bind. The sites with p-value < 0.001 were selected as putative motifs. The putative motifs were compared with known TF binding motifs in a transcription factor database JASPAR by TomTom. TomTom ranked the motifs in the target database according to their similarity with the putative motifs of the genes. The annotated transcription factors of the motifs in the JASPAR database that were significantly similar to the putative motifs of the genes in the module were examined against the predicted regulators of the module. If they shared similar function or belonged to the same TF family, the predicted regulators were considered more likely to regulate the genes in the module.
The overall analysis of all differentially expressed genes
A large-scale analysis of predicted gene regulatory modules
32 Gene regulatory modules with high correlation coefficients and other supports
Most enriched biological process*
positive regulation of release of sequestered calcium ion into cytosol
vegetative to reproductive phase transition of meristem
steroid biosynthetic process
regulation of cell shape
ATP-dependent chromatin remodeling
photosynthesis, light harvesting
regulation of flower development
skeletal system morphogenesis
carotenoid biosynthetic process
lactate metabolic process
establishment or maintenance of polarity of embryonic epithelium
flavonoid biosynthetic process
flavonoid biosynthetic process
acetyl-CoA biosynthetic process
auxin mediated signaling pathway
carbohydrate metabolic process
superoxide metabolic process
PSII associated light-harvesting complex II catabolic process
secondary cell wall biogenesis
DNA replication initiation
A detailed case study of some gene regulatory modules
Enriched gene ontology functions in gene regulatory module 1
Enriched GO function
Number of genes
P:regulation of transcription, DNA-dependent
P:cellular cell wall organization
P:ethylene mediated signaling pathway
P:regulation of cell shape
P:hydrogen peroxide catabolic process
P:peptidoglycan biosynthetic process
F:sequence-specific DNA binding
F:transcription factor activity
F:transcription regulator activity
F:protein dimerization activity
F:AT DNA binding
F:double-stranded DNA binding
Enriched functional terms of gene regulatory module 2
GO function name
P:flavonoid biosynthetic process
P:response to biotic stimulus
P:hydrogen peroxide catabolic process
F:electron carrier activity
F:dihydrokaempferol 4-reductase activity
F:flavanone 4-reductase activity
In order to assess the robustness of our module construction process against thresholds of selecting DEGs, we compared the two modules above with the corresponding modules constructed from DEGs with absolute normalized expression values > 4 and 5, respectively. Based on threshold 4, 49 of 95 genes (more precisely gene probes on the microarray chip) in the gene regulatory module (based on threshold 3 above) were retained, and 41 out of 49 genes were grouped into the same cluster. Three TFs predicted for the cluster based on threshold 3 were the same as those in module 1 based on threshold 3. If the threshold were increased to 5, only 8 of 95 genes would be kept, which was probably too small to form a cluster in this case. As for the gene regulatory module 2, based on threshold 5, 47 out of 56 genes were selected, and 34 of them were grouped into the same cluster. Furthermore, 29 out of 34 genes in the cluster overlapped with the cluster of module 2. And interestingly, all 9 TFs predicted for the cluster were exactly the same as those in module 2. This analysis demonstrates that our method can produce rather consistent results with a reasonable range of thresholds of selecting differentially expressed genes. The threshold may be decided based on both the stringency of statistics significance and how muchbiological information is needed.
DNA binding site validation
In this work, we developed and applied a series of computational methods to construct the gene regulation networks involved in soybean’s responses to a number of stress conditions. The soybean is a good choice for this demonstration since, although of major agronomic importance, this plant has not been as extensively studied as other model species, such as yeast, mouse, human and Arabidopsis. The networks consisted of a list of gene regulation modules that included both a set of genes expressed similarly under the various stress conditions and several putative TF regulators. The regulatory networks were reconstructed from gene expression data in conjunction with other data sources such as genomics data and protein function data. In the same computational framework, a large number of predicted gene regulatory modules were validated by gene expression coherence, function enrichment analysis, TF-gene binding potentials, and the literature[27–29, 36–46]. The results demonstrate that the approach can infer detailed and testable gene regulatory modules that link TFs, regulated genes, and biological conditions together, which can be used to design targeted biological experiments, such as gene knock out, chip-Seq DNA binding analysis, protein-protein interaction test, and RNA interference of TFs. Particularly, predictive hypotheses may be used to validate predicted TFs that have not yet been confirmed.
In addition to the capability of integrating multiple data sources, applicable to gene expression data of any species, and considering both spatial and temporal information in different tissues and multiple replicates, our approach is unique in focusing on differentially expressed genes in the process of gene regulatory network construction, which appears to reduce its complexity and increase its biological relevance. In the future, we plan to integrate more data sources such as protein-protein interaction, protein phosphorylation, proteomics, and miRNA data to improve the accuracy of gene regulatory network construction. We also aim to elucidate the relationships between gene regulatory modules through shared genes and TFs, and to construct metabolic and signal transduction pathways involving genes in the same regulatory modules.
In this work, we developed and applied a modular bioinformatics procedure to automatically construct gene regulatory networks for any species by integrating microarray gene expression data with other data sources. We benchmarked the method on the gene expression data of soybean. It effectively predicted a number of partially validated gene regulatory modules. The experiment demonstrates that the bioinformatics approach can be used to automatically predict gene regulatory networks from large-scale transcriptomic and genomic data for a species with large genome and transcriptome under specific biological conditions. The predicted networks can be used to generate biological hypotheses for experimental design and validation.
The work was partially supported by a NSF grant (IOS1025752 to GS, DX and JC) and a NIH grant (1R01GM093123 to JC) and a NSF grant (DBI1149224 to JC) and a grant from the US Department of Energy, Office of Science, Biological and Environmental Research Program (grant No. DE-SC0004898 to GS, DX, and JC).
- Li J, Liu ZJ, Pan YC, Liu Q, Fu X, Cooper NG, Li Y, Qiu M, Shi T: Regulatory module network of basic/helix-loop-helix transcription factors in mouse brain. Genome Biol. 2007, 8 (11): R244-PubMed CentralView ArticlePubMedGoogle Scholar
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003, 34 (2)): 166-176.View ArticlePubMedGoogle Scholar
- Yao CW, Hsu BD, Chen BS: Constructing gene regulatory networks for long term photosynthetic light acclimation in Arabidopsis thaliana. BMC Bioinforma. 2011, 12: 335-View ArticleGoogle Scholar
- Tanay A, Shamir R: Computational expansion of genetic networks. Bioinformatics. 2001, 17 (Suppl 1): S270-S278.View ArticlePubMedGoogle Scholar
- Pe'er D, Regev A, Tanay A: Minreg: inferring an active regulator set. Bioinformatics. 2002, 18 (Suppl 1): S258-S267.View ArticlePubMedGoogle Scholar
- Pe'er D, Regev A, Elidan G, Friedman N: Inferring subnetworks from perturbed expression profiles. Bioinformatics. 2001, 17 (Suppl 1): S215-S224.View ArticlePubMedGoogle Scholar
- Joshi A, De Smet R, Marchal K, Van de Peer Y, Michoel T: Module networks revisited: computational assessment and prioritization of model predictions. Bioinformatics. 2009, 25 (4): 490-496.View ArticlePubMedGoogle Scholar
- Needham CJ, Manfield IW, Bulpitt AJ, Gilmartin PM, Westhead DR: From gene expression to gene regulatory networks in Arabidopsis thaliana. BMC Syst Biol. 2009, 3: 85-PubMed CentralView ArticlePubMedGoogle Scholar
- Henkel J: Soy. Health claims for soy protein, questions about other components. FDA Consum. 2000, 34 (3): 13–15-18–20.PubMedGoogle Scholar
- Han BZ, Rombouts FM, Nout MJ: A Chinese fermented soybean food. Int J Food Microbiol. 2001, 65 (1–2): 1-10.View ArticlePubMedGoogle Scholar
- Carpenter J, Gianessi L: Agricultural biotechnology: updated benefit estimates. 2001, Washington DC: National Center for Food and Agricultural PolicyGoogle Scholar
- Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al: Genome sequence of the palaeopolyploid soybean. Nature. 2010, 463 (7278): 178-183.View ArticlePubMedGoogle Scholar
- Wang Z, Libault M, Joshi T, Valliyodan B, Nguyen HT, Xu D, Stacey G, Cheng J: SoyDB: a knowledge database of soybean transcription factors. BMC Plant Biol. 2010, 10: 14-PubMed CentralView ArticlePubMedGoogle Scholar
- Joshi T, Patil K, Fitzpatrick MR, Franklin LD, Yao Q, Cook JR, Wang Z, Libault M, Brechenmacher L, Valliyodan B, et al: Soybean Knowledge Base (SoyKB): a web resource for soybean translational genomics. BMC Genomics. 2012, 13 (Suppl 1): S15-PubMed CentralView ArticlePubMedGoogle Scholar
- Guttikonda SK, Trupti J, Bisht NC, Chen H, An YQ, Pandey S, Xu D, Yu O: Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases. BMC Plant Biol. 2010, 10: 243-PubMed CentralView ArticlePubMedGoogle Scholar
- Juhola M: On machine learning classification of otoneurological data. Stud Health Technol Inform. 2008, 136: 211-216.PubMedGoogle Scholar
- Satopaa V, Albrecht JR, Irwin D, Raghavan B: ICDCS Workshops: 2011. Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior. 2011, Minneapolis Minnesota: 31st IEEE International Conference on Distributed Computing Systems WorkshopsGoogle Scholar
- Wang Z, Eickholt J, Cheng J: MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics. 2010, 26 (7): 882-888.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Z, Zhang XC, Le MH, Xu D, Stacey G, Cheng J: A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny. PLoS One. 2011, 6 (3): e17906-PubMed CentralView ArticlePubMedGoogle Scholar
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261. Database issue)View ArticlePubMedGoogle Scholar
- Rice JA: Mathematical Statistics and Data Analysis. 2007, Belmont CA: Thomson Learning, Duxbury Press, 3Google Scholar
- Rivals I, Personnaz L, Taing L, Potier MC: Enrichment or depletion of a GO category within a class of genes: which test?. Bioinformatics. 2007, 23 (4): 401-407.View ArticlePubMedGoogle Scholar
- Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009, 37: W202-W208. Web Server issuePubMed CentralView ArticlePubMedGoogle Scholar
- Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004, 32: D91-D94. Database issuePubMed CentralView ArticlePubMedGoogle Scholar
- Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS: Quantifying similarity between motifs. Genome Biol. 2007, 8 (2): R24-PubMed CentralView ArticlePubMedGoogle Scholar
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39: D561-D568. Database issuePubMed CentralView ArticlePubMedGoogle Scholar
- Riechmann JL, Meyerowitz EM: The AP2/EREBP family of plant transcription factors. Biol Chem. 1998, 379 (6): 633-646.PubMedGoogle Scholar
- Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008, 36: D1009-D1014. Database issue)PubMed CentralView ArticlePubMedGoogle Scholar
- Hichri I, Barrieu F, Bogs J, Kappel C, Delrot S, Lauvergeat V: Recent advances in the transcriptional regulation of the flavonoid biosynthetic pathway. J Exp Bot. 2011, 62 (8): 2465-2483.View ArticlePubMedGoogle Scholar
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676.View ArticlePubMedGoogle Scholar
- Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012, 40: D109-D114. Database issuePubMed CentralView ArticlePubMedGoogle Scholar
- Gough J, Chothia C: SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 2002, 30 (1): 268-272.PubMed CentralView ArticlePubMedGoogle Scholar
- Brohl S, Lisowsky T, Riemen G, Michaelis G: A new nuclear suppressor system for a mitochondrial RNA polymerase mutant identifies an unusual zinc-finger protein and a polyglutamine domain protein in Saccharomyces cerevisiae. Yeast. 1994, 10 (6): 719-731.View ArticlePubMedGoogle Scholar
- Stein T, Kricke J, Becher D, Lisowsky T: Azf1p is a nuclear-localized zinc-finger protein that is preferentially expressed under non-fermentative growth conditions in Saccharomyces cerevisiae. Curr Genet. 1998, 34 (4): 287-296.View ArticlePubMedGoogle Scholar
- Slattery MG, Liko D, Heideman W: The function and properties of the Azf1 transcriptional regulator change with growth conditions in Saccharomyces cerevisiae. Eukaryot Cell. 2006, 5 (2): 313-320.PubMed CentralView ArticlePubMedGoogle Scholar
- Hong-Bo S, Li-Ye C, Ming-An S: Calcium as a versatile plant signal transducer under soil water stress. BioEssays: news and reviews in molecular, cellular and developmental biology. 2008, 30 (7): 634-641.View ArticleGoogle Scholar
- Weingartner M, Subert C, Sauer N: LATE, a C(2)H(2) zinc-finger protein that acts as floral repressor. The Plant journal: for cell and molecular biology. 2011, 68 (4): 681-692.View ArticleGoogle Scholar
- Urao T, Yamaguchi-Shinozaki K, Urao S, Shinozaki K: An Arabidopsis myb homolog is induced by dehydration stress and its gene product binds to the conserved MYB recognition sequence. Plant Cell. 1993, 5 (11): 1529-1539.PubMed CentralView ArticlePubMedGoogle Scholar
- Vetter M, Harris W: Neural Development: bHLH Genes. Encyclopedia of Life Sciences (ELS). 2003, Chichester: John Wiley & Sons, LtdGoogle Scholar
- Tran LS, Nakashima K, Sakuma Y, Simpson SD, Fujita Y, Maruyama K, Fujita M, Seki M, Shinozaki K, Yamaguchi-Shinozaki K: Isolation and functional analysis of Arabidopsis stress-inducible NAC transcription factors that bind to a drought-responsive cis-element in the early responsive to dehydration stress 1 promoter. Plant Cell. 2004, 16 (9): 2481-2498.PubMed CentralView ArticlePubMedGoogle Scholar
- Stockel J, Bennewitz S, Hein P, Oelmuller R: The evolutionarily conserved tetratrico peptide repeat protein pale yellow green7 is required for photosystem I accumulation in Arabidopsis and copurifies with the complex. Plant Physiol. 2006, 141 (3): 870-878.PubMed CentralView ArticlePubMedGoogle Scholar
- Jiao Y, Meyerowitz EM: Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control. Mol Syst Biol. 2010, 6: 419-PubMed CentralView ArticlePubMedGoogle Scholar
- Mentzen WI, Peng J, Ransom N, Nikolau BJ, Wurtele ES: Articulation of three core metabolic processes in Arabidopsis: fatty acid biosynthesis, leucine catabolism and starch metabolism. BMC Plant Biol. 2008, 8: 76-PubMed CentralView ArticlePubMedGoogle Scholar
- Paciorek T, Friml J: Auxin signaling. J Cell Sci. 2006, 119 (Pt 7): 1199-1202.View ArticlePubMedGoogle Scholar
- Saibo NJ, Lourenco T, Oliveira MM: Transcription factors and regulation of photosynthetic and related metabolism under environmental stresses. Ann Bot. 2009, 103 (4): 609-623.PubMed CentralView ArticlePubMedGoogle Scholar
- Bosch M, Mayer CD, Cookson A, Donnison IS: Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot. 2011, 62 (10): 3545-3561.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.