ChlamyNET: a Chlamydomonas gene co-expression network reveals global properties of the transcriptome and the early setup of key co-expression patterns in the green lineage
© Romero-Campero et al. 2016
Received: 31 July 2015
Accepted: 2 March 2016
Published: 12 March 2016
Chlamydomonas reinhardtii is the model organism that serves as a reference for studies in algal genomics and physiology. It is of special interest in the study of the evolution of regulatory pathways from algae to higher plants. Additionally, it has recently gained attention as a potential source for bio-fuel and bio-hydrogen production. The genome of Chlamydomonas is available, facilitating the analysis of its transcriptome by RNA-seq data. This has produced a massive amount of data that remains fragmented making necessary the application of integrative approaches based on molecular systems biology.
We constructed a gene co-expression network based on RNA-seq data and developed a web-based tool, ChlamyNET, for the exploration of the Chlamydomonas transcriptome. ChlamyNET exhibits a scale-free and small world topology. Applying clustering techniques, we identified nine gene clusters that capture the structure of the transcriptome under the analyzed conditions. One of the most central clusters was shown to be involved in carbon/nitrogen metabolism and signalling, whereas one of the most peripheral clusters was involved in DNA replication and cell cycle regulation. The transcription factors and regulators in the Chlamydomonas genome have been identified in ChlamyNET. The biological processes potentially regulated by them as well as their putative transcription factor binding sites were determined. The putative light regulated transcription factors and regulators in the Chlamydomonas genome were analyzed in order to provide a case study on the use of ChlamyNET. Finally, we used an independent data set to cross-validate the predictive power of ChlamyNET.
The topological properties of ChlamyNET suggest that the Chlamydomonas transcriptome possesses important characteristics related to error tolerance, vulnerability and information propagation. The central part of ChlamyNET constitutes the core of the transcriptome, where the most authoritative hub genes are located, interconnecting key biological processes such as light response with carbon and nitrogen metabolism. Our study reveals that key elements in the regulation of carbon and nitrogen metabolism, light response and cell cycle identified in higher plants were already established in Chlamydomonas. These conserved elements are not limited to transcription factors, regulators and their targets, but also include the cis-regulatory elements recognized by them.
The unicellular green alga Chlamydomonas reinhardtii (Chlamydomonas) is an important model organism for genomic and physiological studies in photosynthetic organisms. Because of its evolutionary position (it diverged from land plants over a billion years ago), Chlamydomonas is considered a living representative of the photosynthetic organisms that gave rise to the green lineage. Specifically, it has been used as a model organism to study the establishment, conservation and divergence of key biological processes in photosynthetic organisms such as the photoperiod response [2–4]. Recently, Chlamydomonas has attracted substantial interest for biotechnological applications in the context of bio-fuel and bio-hydrogen production [5–7]. The main advantage of using Chlamydomonas over higher plants is that it does not compete for agricultural land use. Additionally, Chlamydomonas possesses powerful genetic tools, metabolic versatility and a haploid genome. However, an important disadvantage is the lack of sufficient functional and regulatory characterization of the molecular mechanisms underpinning these processes of biotechnological interest.
In order to overcome this limitation, its genome was sequenced and is currently in an advanced curated state [1, 9]. The availability of its genome has facilitated the use of Next Generation Sequencing techniques, especially RNA-seq, in order to study its complete transcriptome. This has produced a massive amount of data from a variety of genotypes grown under relevant physiological conditions [10–16]. However, these studies remain fragmented without producing global insights into the organization and regulation of the Chlamydomonas transcriptome. The first steps towards the use of molecular systems biology methodologies to characterize the Chlamydomonas transcriptome have been taken [17–19]. Nevertheless, gene co-expression networks, one of the most widely used tools for the integration and study of massive amounts of transcriptomic data, have not yet been developed for Chlamydomonas, although they have been used successfully in many other photosynthetic organisms [20–22].
Gene co-expression networks integrate fragmented transcriptomic data obtained in different studies in order to characterize patterns of coordinated gene expression at the whole transcriptome level. In gene co-expression networks, nodes represent genes, and two nodes are connected by an edge if the corresponding genes are significantly co-expressed across appropriately chosen genotypes and physiological conditions. Fundamental network concepts such as node degree, neighbourhood and clustering coefficient have important applications to unravel the organization and functioning of the represented transcriptome [24, 25]. The degree of a node, that is, the number of nodes connected to it, represents the number of genes co-expressed with the corresponding gene. Therefore, genes represented by nodes with high degrees are expected to be relevant in the transcriptome since their expression is coordinated with many others. The neighbourhood of a node consists of the genes co-expressed with the corresponding gene. This set of genes can be used as candidate target genes when the given gene is a transcription factor or potential regulator. The transcription factor binding sites that are responsible for the coordinated expression of genes can be identified by analyzing the significance of specific motifs in the promoters of co-expressed genes. Additionally, Gene Ontology (GO) term enrichment over gene neighbourhoods can be applied to determine the potential biological processes that are carried out by the orchestrated expression of any given genes. In most gene co-expression networks the probability that a node is connected with k other nodes, P(k), follows a power law with a negative exponent, $P(k) \sim k^{-\lambda}$. This is the defining property of scale-free networks. In scale-free networks most nodes are connected with few nodes, whereas there exists a small number of highly connected nodes called hubs that dominate the network dynamics.
Genes that correspond to hub nodes play a key role in the correct functioning of biological processes and, therefore, their mutation can lead to severely affected phenotypes and even lethality. The clustering coefficient of a node measures the tendency of nodes to group together around the given node, and when applied to gene co-expression networks, this concept indicates the tendency of genes to form highly co-expressed gene clusters. Scale-free networks with a high average clustering coefficient are called small world networks. In this class of networks the existence of a clustering structure around hub nodes produces short paths that connect any pair of nodes. It has often been observed that biological co-expression networks are scale-free and small world networks [20, 25].
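The node degree and clustering coefficient described above can be illustrated on a toy network. The following is a minimal sketch, not the ChlamyNET implementation: the gene names and edges are hypothetical, and the clustering coefficient is taken to be the standard local definition (the fraction of neighbour pairs that are themselves connected).

```python
from itertools import combinations

# Hypothetical toy network: gene names and edges are illustrative only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: gene -> set of co-expressed genes."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree(adjacency, node):
    """Number of genes co-expressed with the given gene."""
    return len(adjacency[node])


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbours that are connected to each other."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * linked / (k * (k - 1))


adjacency = build_adjacency(edges)
for gene in sorted(adjacency):
    print(gene, degree(adjacency, gene), round(clustering_coefficient(adjacency, gene), 3))
```

In this toy example gene A has three neighbours, of which only one pair (B and C) is connected, so its clustering coefficient is 1/3.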
In this study we have developed ChlamyNET, a gene co-expression network and an associated web-based software tool that integrates the massive amount of RNA-seq data available for the Chlamydomonas transcriptome, see Additional file 1: Table S1. We have used this tool to study the organization and regulation of the algal transcriptome. ChlamyNET aims at becoming an enabling technology for researchers in the Chlamydomonas transcriptome and, in a wider perspective, in algal transcriptomics, as it is, to date, the first tool of its kind. Researchers can explore the neighbourhood of their genes of interest in ChlamyNET looking for potential targets or regulators. Additionally, our web tool can be used to determine GO terms related to biological processes, functions and components that are significantly present in the annotation of the neighbouring genes. Finally, we have used an independent experimental data set to cross-validate the predictive power of ChlamyNET.
Results and discussion
Network construction and topology
The high resolution provided by RNA-seq data and the diverse physiological conditions and genotypes analyzed allowed us to capture the co-expression relationships between genes in the Chlamydomonas transcriptome. In order to reduce the noise in our analysis, we only considered genes that showed significant changes in at least one comparison between a condition and its corresponding control. Data processing and selection of differentially expressed genes were performed as described in the methods section. Out of the 16624 predicted genes in the Chlamydomonas genome 13699 were differentially expressed in at least one of the conditions analyzed in this study. This represents 82.40 % of the algal genome, which shows that the analyzed conditions and phenotypes are diverse enough to capture the behaviour of most of the Chlamydomonas transcriptome.
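As an illustration of how co-expression relationships can be derived from expression profiles, the sketch below connects two genes when the Pearson correlation of their profiles exceeds a cut-off. This is an assumption made for illustration: the actual similarity measure and significance criterion are described in the methods section, which is not reproduced here, and the gene names, expression values and the 0.9 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """List gene pairs whose correlation reaches the (assumed) threshold."""
    genes = sorted(expression)
    found = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expression[g], expression[h])
            if r >= threshold:  # assumption: only positive correlation counts
                found.append((g, h, r))
    return found


# Hypothetical expression levels across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
network_edges = coexpression_edges(expression)
print(network_edges)
```

Here only geneA and geneB end up connected, because geneC is negatively correlated with both of them.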
ChlamyNET consists of 9171 genes or nodes exhibiting a total of 139019 co-expression relationships or edges. ChlamyNET is composed of a major connected component consisting of 8443 genes (92.1 % of the entire network) and a multitude of small components consisting of a few genes. This global connectivity property of ChlamyNET is similar to previously constructed and analyzed networks from other organisms such as Saccharomyces cerevisiae and Arabidopsis thaliana.
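The decomposition of a network into connected components can be sketched with a breadth-first search. The adjacency map below is a hypothetical toy example; only the two gene counts used in the final line (8443 and 9171) come from the text.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy network with one larger and two smaller components.
toy = {
    "g1": {"g2", "g3"},
    "g2": {"g1"},
    "g3": {"g1"},
    "g4": {"g5"},
    "g5": {"g4"},
    "g6": set(),
}
components = connected_components(toy)
giant_fraction = len(components[0]) / len(toy)
print(len(components), giant_fraction)

# Fraction of ChlamyNET genes in the major component, from the reported counts.
reported_percentage = round(100 * 8443 / 9171, 1)
print(reported_percentage)
```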
The scale-free property of ChlamyNET was corroborated by computing its degree distribution and checking that it follows a power law with a negative exponent. Specifically, linear regression over the logarithmic transform of the degree distribution was used (Fig. 1b). Another topological property that we analyzed in ChlamyNET was the clustering coefficient, a measurement of the density of edges or co-expression relationships around genes. The distribution of the clustering coefficient in ChlamyNET was computed (Fig. 1c). The average clustering coefficient of ChlamyNET is 0.66, which is significantly high when compared to random scale-free networks, see the methods section. This shows that ChlamyNET constitutes a non-random scale-free network with a high clustering coefficient. Networks of this type are called small-world networks since the minimal path length between genes is short when compared to random scale-free networks. These properties are common in gene co-expression networks [20, 32]. In the case of ChlamyNET the average minimal path length between genes is 7.5 (Fig. 1d). Therefore, on average any gene in ChlamyNET can be reached from another one through approximately seven gene co-expression relationships.
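The regression over the log-transformed degree distribution can be sketched as follows. This is a minimal illustration under stated assumptions: ordinary least squares on the points (log k, log P(k)) is assumed, and the degree list is synthetic, built so that P(k) is exactly proportional to k^-2. It does not reproduce the ChlamyNET data or the exact fitting procedure of the methods section.

```python
from collections import Counter
from math import log


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having degree k (k > 0)."""
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def fit_power_law(distribution):
    """Least-squares line through (log k, log P(k)).

    Returns (exponent, r_squared), where P(k) ~ k^(-exponent).
    """
    xs = [log(k) for k in distribution]
    ys = [log(p) for p in distribution.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic degrees: 144/k^2 nodes of degree k, so the exponent is exactly 2.
degrees = [1] * 144 + [2] * 36 + [3] * 16 + [4] * 9
exponent, r_squared = fit_power_law(degree_distribution(degrees))
print(exponent, r_squared)
```

A fitted exponent with a coefficient of determination close to 1 is the kind of evidence the regression in Fig. 1b provides for a scale-free topology.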
Biological Process GO terms significantly enriched in the 1000 most authoritative hub genes in ChlamyNET
Number of neighbours
protein phosphorylation GO:0006468 (p-value 2.6 × 10^-11)
Cre02.g108700 - Serine/Threonine Protein Kinase
g2226 - VH1-Interacting Kinase
Cre12.g537400 - ataurora
Cre17.g742400 - Protein tyrosine kinase
transmembrane transport GO:0055085 (p-value 3 × 10^-7)
Cre09.g396000 - Nitrate Transporter
Cre10.g453400 - Mechanosensitive Channel of Small Conductance-like
Cre01.g012700 - Gated Outwardly Rectifying K+ Channel
response to light stimulus GO:0009416 (p-value 2 × 10^-5)
Cre03.g182700 - Bbox Protein
Cre02.g118000 - Photolyase
Cre12.g510200 - bZIP Protein
g6302 - Constans-like
Cre06.g295200 - Cryptochrome
carbohydrate metabolic process GO:0005975 (p-value 8 × 10^-5)
Cre07.g336950 - Alpha-glucan phosphorylase
Cre08.g362450 - Alpha Amylase
g3160 - Isoamylase
Cre04.g215150 - Soluble Starch Synthase
nitrogen compound metabolic process GO:0006807 (p-value 5.2 × 10^-5)
Cre09.g410950 - Nitrate reductase
Cre09.g410750 - Nitrite Reductase
Cre03.g207250 - Glutamine synthetase
Network clustering analysis and functional annotation
Biological Process GO terms significantly enriched in the clusters of the gene co-expression network ChlamyNET and the Metabolic and Signalling Pathways contained in each cluster
Cluster 2 (Brown), 535 genes, Silhouette 0.44
GO term: DNA replication (GO:0006260): Cre01.g015250 - POLD1, Cre16.g651000 - RFA1
GO term: Chromosome organization (GO:0051276): Cre02.g086650 - SMC2, Cre12.g493400 - SMC4
GO term: Regulation of Cell Cycle (GO:0010564): Cre10.g466200 - CYCAB1, Cre03.g207900 - CYCA1
Pathway: Pyrimidine deoxyribonucleotides de novo biosynthesis pathway: Cre16.g667850 - DUT, Cre17.g715900 - THY, Cre03.g190800 - TMPK
Cluster 9 (Blue), 1058 genes, Silhouette 0.40
GO term: protein phosphorylation (GO:0006468): Cre17.g742400 - PTK17, Cre12.g537400 - CrAUR3
GO term: carbohydrate metabolic process (GO:0005975): Cre08.g384750 - AMY, Cre10.g444700 - SBE3
GO term: transmembrane transport (GO:0055085): Cre09.g396000 - NRT2.3, Cre13.g564650 - MRS5
Pathway: Starch Biosynthetic Pathway: Cre04.g215150 - SSS
Pathway: Sucrose Biosynthetic Pathway: Cre06.g283400 - SPP
Pathway: Nitrogen Assimilation Pathway: Cre09.g410750 - NII1
Cluster 1 (Orange), 824 genes, Silhouette 0.38
GO term: vesicle-mediated transport (GO:0016192): Cre17.g728150 - Yky6, Cre16.g676650 - AP1G1
GO term: GTPase activity (GO:0043087): Cre12.g532600 - CGL44, Cre07.g315350 - RABGAP
Genes: Cre09.g391500 - APG9
Pathway: TAG Biosynthetic Pathway: Cre02.g106400 - PDAT
Pathway: Phospholipid Biosynthetic Pathway: Cre01.g035500 - PI3K
Pathway: Coenzyme A Biosynthetic Pathway: Cre01.g048050 - COAB
Cluster 3 (Red), 1723 genes, Silhouette 0.28
GO term: protein phosphorylation (GO:0006468): Cre02.g145500 - PTK24, Cre12.g498650 - ALK3
GO term: ribosome biogenesis (GO:0042254): Cre12.g532550 - RPL13a, Cre09.g400650 - RPS6
GO term: macromolecule biosynthesis (GO:0009059): Cre03.g207250 - GLN4
Pathway: TAG Biosynthetic Pathway: g9572 - DGAT1
Pathway: Hydrogen production: Cre09.g396600 - HYDA2
Pathway: MAP kinase cascade: Cre10.g461150 - CrMAPKKK
Cluster 4 (Purple), 1174 genes, Silhouette 0.26
Genes: Cre03.g199900 - EIF4E, Cre02.g117900 - RH
GO term: RNA processing (GO:0006396): Cre16.g653050 - SpoU, Cre10.g421600 - ThrRS, g4679 - RNase P
GO term: lipid metabolism (GO:0006629): Cre09.g397250 - FAD5, Cre06.g295250 - PAP
Pathway: tRNA Charging Pathway: g2951 - TrpS
Pathway: Amino Acid Biosynthesis: Cre03.g161400 - WSN2
Pathway: Pentose Phosphate Non-oxidative: Cre12.g511900 - RPE1
Pathway: TAG Biosynthetic Pathway: Cre03.g205050 - DGAT2
Cluster 7 (Green), 909 genes, Silhouette 0.25
GO term: protein complex assembly (GO:0006461): g9912 - CSN5, Cre16.g663500 - CrRPN10
GO term: response to misfolded protein (GO:0051788): Cre06.g280850 - PSMB4, Cre12.g501200 - SKP1
Pathway: Aerobic Respiration Pathway: Cre15.g638500 - CYC1
Pathway: COP9 Signalling: g11578 - CSN6
Cluster 6 (Yellow), 1351 genes, Silhouette 0.24
GO term: chromatin organization (GO:0006325): g11636 - HDA, Cre13.g590750 - HTB37
GO term: posttranscriptional regulation (GO:0010608): g7250 - DCL
Pathway: Chromatin Remodelling: Cre13.g591200 - HTB38, Cre13.g562400 - ABI3
Cluster 5 (Dark Green), 567 genes, Silhouette 0.21
GO term: response to heat (GO:0009408): Cre14.g617400 - HSP22F, Cre08.g372100 - HSP70A
GO term: protein folding (GO:0006457): g9881 - FKBP, Cre01.g047700 - CYP40
Pathway: Stress Response: Cre02.g098800 - ERP29, g9861 - TOR
Cluster 8 (Turquoise), 1030 genes, Silhouette 0.10
Genes: Cre09.g412100 - PSAF, Cre10.g440450 - PSB28
GO term: hexose metabolic process (GO:0019318): Cre17.g725550 - GLD1, Cre02.g093450 - FBA2
Pathway: Calvin Cycle: Cre12.g554800 - PRK1
Pathway: TCA Cycle: Cre02.g143250 - IDH2
Cluster 2, brown - DNA replication, chromosome organization and regulation of cell cycle
The most cohesive gene cluster is also the smallest one. The brown cluster is located in the periphery of ChlamyNET. It presents the highest silhouette value (0.44) in the network and contains 535 genes (Fig. 3b). Our GO term enrichment analysis reveals that this cluster is involved in cell cycle processes. Specifically, it is enriched in genes required for DNA replication (GO:0006260) such as DNA polymerase POLD1 (Cre01.g015250), replication factor RFA1 (Cre16.g651000) and origin recognition complex ORC2 (Cre03.g199400); genes associated with chromosome organization (GO:0051276) like structural maintenance of chromosomes SMC4 (Cre12.g493400) and SMC2 (Cre02.g086650) and genes involved in the regulation of cell cycle process (GO:0010564) such as the cyclin A/B CYCAB1 (Cre10.g466200) and the A-type cyclin CYCA1 (Cre03.g207900).
The metabolic pathways located in this cluster, such as the pyrimidine deoxyribonucleotides de novo biosynthesis pathway, produce DNA and RNA precursors. For example, the formation of the DNA-specific end product dTTP starts with the hydrolysis of dUTP to dUMP by the dUTP pyrophosphatase DUT (Cre16.g667850), followed by the reductive methylation of dUMP catalyzed by thymidylate synthase THY (Cre17.g715900), which yields dTMP. Finally, the thymidylate kinase TMPK (Cre03.g190800) catalyzes the first phosphorylation of dTMP leading to dTTP. These three enzymes are members of this cluster (Additional file 3: Figure S2 and Table 2).
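The silhouette value used to rank the cohesiveness of each cluster can be illustrated with a small example. The sketch below assumes the standard definition, s = (b - a) / max(a, b), where a is the mean distance from a gene to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The gene names, the one-dimensional positions and the distance function are hypothetical; the distance actually used between genes in ChlamyNET is given in the methods section.

```python
def mean_distance(point, others, distance):
    """Average distance from one gene to a collection of other genes."""
    return sum(distance(point, q) for q in others) / len(others)


def silhouette(point, clusters, distance):
    """Silhouette of a single gene: (b - a) / max(a, b)."""
    own = next(name for name, members in clusters.items() if point in members)
    own_others = [q for q in clusters[own] if q != point]
    if not own_others:
        return 0.0  # convention for clusters with a single member
    a = mean_distance(point, own_others, distance)
    b = min(
        mean_distance(point, members, distance)
        for name, members in clusters.items()
        if name != own
    )
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest


def cluster_silhouette(name, clusters, distance):
    """Average silhouette over all genes in one cluster."""
    members = clusters[name]
    return sum(silhouette(p, clusters, distance) for p in members) / len(members)


# Hypothetical genes placed on a line; distance is the absolute difference.
positions = {"g1": 0.0, "g2": 1.0, "g3": 5.0, "g4": 6.0}
clusters = {"brown": ["g1", "g2"], "blue": ["g3", "g4"]}


def toy_distance(p, q):
    return abs(positions[p] - positions[q])


print(round(cluster_silhouette("brown", clusters, toy_distance), 3))
```

Values close to 1 indicate genes that sit much closer to their own cluster than to any other, which is the sense in which the brown cluster (0.44) is more cohesive than, for instance, the turquoise cluster (0.10).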
Cluster 9, blue - protein phosphorylation, carbohydrate metabolic process and transmembrane transport
The blue cluster, located in the center of ChlamyNET, is significantly enriched in hub genes (p-value < 2.2 × 10^-16, Fisher's exact test). It is the second most cohesive cluster with a silhouette value of 0.40 and 1058 genes (Fig. 3b). The most significantly over-represented GO terms in this cluster are protein phosphorylation (GO:0006468) with genes such as the protein tyrosine kinases PTK17 (Cre17.g742400) and ataurora CrAUR3 (Cre12.g537400), carbohydrate metabolic process (GO:0005975) including genes like the alpha amylase AMY (Cre08.g384750), and transmembrane transport (GO:0055085) including genes coding for magnesium and cobalt transport protein MRS5 (Cre13.g564650). An analysis of the metabolic context of this cluster reveals that core pathways in carbon and nitrogen metabolism are contained in it. Starch is the major reservoir of energy and carbon in photosynthetic organisms. The starch biosynthetic pathway constituted by the enzymes glucose-6-phosphate isomerase PGI (Cre03.g175400), phosphoglucomutase PGM (g2899), ADP glucose pyrophosphorylase APL (Cre16.g683450), starch synthase SSS (Cre04.g215150) and 1,4-α-starch branching enzyme SBE3 (Cre10.g444700) is entirely contained in this cluster. In Chlamydomonas, starch is degraded to hexoses during the dark period. The derived hexoses are then used in the sucrose synthesis pathway. Key enzymes in this pathway such as glyceraldehyde 3-phosphate dehydrogenase GAP1 (Cre12.g485150) and sucrose phosphate phosphatase SPP (Cre06.g283400) are members of this cluster. The oxidative branch of the pentose phosphate pathway produces NADPH in the reactions catalyzed by glucose-6-phosphate dehydrogenase GLD2 (Cre08.g378150) and 6-phosphogluconate dehydrogenase GND1 (Cre12.g526800), enzymes coded by genes included in this cluster. NADPH is an important source of the reducing power required by many enzymes in central metabolic pathways.
The anaplerotic pathway that fixes CO2 into oxaloacetate through the enzymes carbonic anhydrase CAH8 (Cre09.g405750) and phosphoenolpyruvate carboxylases PPC (g16646 and g11831) is also part of this cluster (Additional file 4: Figure S3). This pathway replenishes depleted Tricarboxylic Acid (TCA) cycle compounds that have been used for nitrogen assimilation or other tasks. Inorganic and organic nitrogen assimilation pathways are included in this cluster (Additional file 4: Figure S3), including the nitrate transporter NRT2.3 (Cre09.g396000), nitrite transporter NAR1.4 (Cre07.g335600), nitrate reductase NIT1 (Cre09.g410950) and nitrite reductase NII1 (Cre09.g410750) yielding ammonia as a final product. In fact, these reductases need a molybdenum cofactor, and the biosynthetic pathway for the molybdenum cofactor, constituted by the enzymes molybdopterin synthase adenylyltransferase CNX (g10007), cyclic pyranopterin monophosphate synthase CNX2 (Cre13.g602900), molybdopterin synthase sulfurylase MoaE (Cre07.g322250) and molybdopterin molybdotransferase MoeA (Cre10.g451400), is entirely included in this cluster (Additional file 4: Figure S3). Therefore, not only the enzymes, but also the pathways leading to the synthesis of the cofactors needed for nitrate assimilation are tightly co-expressed in ChlamyNET.
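The enrichment of a cluster in hub genes, assessed above with Fisher's exact test, can be sketched as a one-sided test for over-representation, which is equivalent to the upper tail of a hypergeometric distribution. The counts in the example are hypothetical and deliberately small; they are not the ChlamyNET counts, and whether the original analysis was one-sided or two-sided is an assumption here.

```python
from math import comb


def enrichment_p_value(population, successes, sample, observed):
    """One-sided Fisher's exact test for over-representation.

    population: total number of genes in the network
    successes:  number of genes carrying the property (e.g. hub genes)
    sample:     number of genes in the cluster
    observed:   number of cluster genes carrying the property
    Returns P(X >= observed) for a hypergeometric variable X.
    """
    upper = min(successes, sample)
    total = comb(population, sample)
    tail = sum(
        comb(successes, k) * comb(population - successes, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes, 5 hubs, a cluster of 4 genes with 3 hubs.
p_value = enrichment_p_value(population=20, successes=5, sample=4, observed=3)
print(p_value)
```

The same calculation applies to GO term enrichment, with "genes annotated with the GO term" taking the place of "hub genes".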
Cluster 1, orange - intracellular transport, regulation of GTPase activity, autophagy and proteolysis
The orange cluster consists of 824 genes and is located in the periphery of ChlamyNET (Fig. 4). This cluster presents a high silhouette value of 0.38 (Fig. 3b). The GO term enrichment analysis indicates that genes within this cluster are significantly involved in processes related to intracellular transport to the endoplasmic reticulum and Golgi apparatus such as vesicle-mediated transport (GO:0016192). For instance, we can find genes coding for the endosomal R-SNARE protein Yky6 (Cre17.g728150) and gamma1-Adaptin AP1G1 (Cre16.g676650). Genes in this cluster are also significantly related to the regulation of GTPase activity (GO:0043087) such as those coding for the rab GTPase activator protein CGL44 (Cre12.g532600) and Rab/TBC domain protein (Cre07.g315350). Autophagy (GO:0006914) and proteolysis (GO:0006508) are significant GO terms in this cluster with genes coding for the autophagy-related gene 9 ATG9 (Cre09.g391500) and ubiquitin-conjugating enzyme E2 UBC9 (Cre16.g693700). Therefore, the formation of this gene cluster suggests a connection between Rab GTPase activity and autophagy. Moreover, the positive regulation of autophagy by Rab GTPase activity has been shown in Arabidopsis.
The metabolic analysis of this cluster suggests that it is involved in the biosynthesis of triacylglycerol (TAG), the major lipid reserve in plants. Many unicellular microalgae accumulate large amounts of TAG under unfavorable conditions, such as the ones leading to autophagy. TAG is produced from diacylglycerol (DAG) and different acyl donors. On the one hand, DAG can be synthesized from a 1,2-diacyl-sn-glycerol 3-phosphate by the enzyme phosphatidate phosphatase PAH (Cre12.g506600), a member of this cluster. On the other hand, phospholipids (major constituents of cellular membranes) are one of the possible acyl donors for DAG to produce TAG. In this case, the enzyme phospholipid:DAG acyltransferase PDAT (Cre02.g106400) present in this cluster catalyzes this reaction (Additional file 5: Figure S4). The 3-phosphoinositide biosynthesis pathway is also included in this cluster. Phosphoinositides are involved in phospholipid biosynthesis as well as membrane trafficking, biological processes over-represented in this cluster. The key enzymes in this pathway are phosphatidylinositol-3-kinase PI3K (Cre01.g035500), phosphatidylinositol 4-kinase PIK1 (Cre05.g245550), phosphatidylinositol-4-phosphate 5-kinase PIP5K3 (g9964) and inositol 5-phosphatase SAC1 (Cre12.g537500), which are also located in this cluster (Additional file 5: Figure S4). Other important lipid metabolic reactions are the activation and deactivation of lipids achieved by the ligation or removal of acyl-CoA. These reactions are catalyzed by the enzymes long-chain-fatty-acid-CoA ligase LACS (Cre03.g182050) and acyl-CoA thioesterase ACOT (Cre01.g037350) respectively, both members of this cluster.
In these reactions the common acyl carrier Coenzyme A is required, and so, key enzymes in its biosynthesis such as ketopantoate hydroxymethyltransferase PAN2 (Cre12.g508550), phosphopantothenate-cysteine ligase COAB (Cre01.g048050) and phosphopantothenoylcysteine decarboxylase COAC (Cre10.g423450) are also co-expressed in this cluster (Additional file 5: Figure S4).
Cluster 3, red - protein phosphorylation, translation, ribosome biogenesis and macromolecule biosynthetic process
The red cluster extends from the periphery of ChlamyNET to its core (Fig. 4). In a sense, this cluster serves as an interface between the blue cluster (hub genes involved in protein phosphorylation, carbohydrate metabolic process and transmembrane transport) and the brown cluster (cell cycle processes). This cluster is the largest one, including 1723 genes and presenting a moderate silhouette value of 0.28 (Fig. 3b). According to the GO term enrichment analysis, genes in this cluster are significantly involved in diverse biological processes. The three most significant processes are protein phosphorylation (GO:0006468) including genes such as the mitogen activated protein kinase PTK24 (Cre02.g145500) and aurora-like kinase ALK3 (Cre12.g498650); translation (GO:0006412) and ribosome biogenesis (GO:0042254) with genes coding for ribosomal proteins L13 RPL13a (Cre12.g532550) and S6e RPS6 (Cre09.g400650). The next significant biological process is macromolecule biosynthetic process (GO:0009059) with genes such as the glutamine synthetase GLN4 (Cre03.g207250).
The analysis of the metabolic pathways included in this cluster identified the synthesis of triacylglycerol using exclusively galactolipids produced by glycolipid desaturation as acyl donors. The diacylglycerol O-acyltransferase DGAT1 (g9572) and monogalactosyldiacylglycerol synthase FAD6 (Cre13.g590500) are thus included in this cluster (Additional file 3: Figure S2). Although no other metabolic pathway is fully represented in cluster 3, isolated key enzymes for carbon fixation, hydrogen production and oxidation such as rubisco RBCS2 (Cre02.g120150) and iron hydrogenase HYDA2 (Cre09.g396600) are co-expressed within this cluster. In fact, our study suggests that this cluster is involved in signalling and transcription control rather than in metabolism. Several serine/threonine protein kinases are included in this cluster. The genes CrAUR1 (Cre16.g669800) and ALK3 (Cre12.g498650) exhibit a high sequence similarity with the α and β Aurora kinases in Arabidopsis, AUR1 (At4g32830) and AUR3 (At2g45490), respectively. It has been described that the diversification of plant α and β Aurora kinases predates the origin of land plants. Here we show that this diversification may be already present in Chlamydomonas. These kinases have been shown to play a key role in cell cycle related signal transduction pathways in Arabidopsis. Several other genes similar to cyclin-dependent protein kinases are located in this cluster such as CDKI1 (Cre12.g494500) and CrMAPKKK (Cre10.g461150). Cyclin-dependent protein kinases play crucial roles in the progression of the cell cycle in eukaryotes. CDKI1 (Cre12.g494500) exhibits a high sequence similarity with the Arabidopsis gene CAK4 (At1g66750), which is known to be involved in the activation of cell proliferation. CrMAPKKK (Cre10.g461150), in turn, is highly similar to the Arabidopsis gene MEKK1 (At4g08500).
Additionally, other genes in this cluster such as g16721, present a high similarity with the Arabidopsis Mitogen Activated Protein (MAP) kinase MAPKKK6 (At3g07980). The co-expression of these genes suggests that MAP kinase cascades are regulated not only at the posttranslational level but also at the transcriptional level in Chlamydomonas.
As will be described in detail in the next section, this cluster is also significantly enriched in transcription factors. Several GATA transcription factors such as g7394, Cre05.g242600 and Cre08.g378800; bZIP transcription factors like Cre10.g438850 and Cre12.g489000; and the single DOF and CO-like transcription factors in Chlamydomonas, CrDOF (Cre12.g521150) and CrCO (g6302), are members of this cluster.
A detailed description of the rest of clusters and their functional annotation is available for further exploration at the web page http://viridiplantae.ibvf.csic.es/ChlamyNet/. These results aim at providing researchers in the functional annotation of the Chlamydomonas transcriptome with a solid ground to design specific and targeted experimental studies to validate or refute the predictions produced in this clustering analysis.
Transcription factors and transcriptional regulators in ChlamyNET
In the previous section we performed a functional annotation of the different gene clusters identified according to GO term enrichment and metabolic pathways analysis. In this section, we further investigate the regulatory aspects of the Chlamydomonas transcriptome using ChlamyNET.
Herein we present in detail the results of our analysis over three groups of TFs and TRs of special interest. We discuss the conservation of their function and binding sites when compared to their putative orthologs in higher plants. The results for the remaining groups of TFs and TRs identified in our analysis are available at the web page http://viridiplantae.ibvf.csic.es/ChlamyNet/.
Core metabolic regulation, group a
This constitutes a large group of TFs and TRs including 38 members. They are identified in Fig. 6 using green triangles. The TFs and TRs in this group are included in cluster 9 (blue) at the center of the network, where the most authoritative hub genes and carbon/nitrogen core metabolic pathways are located. These TFs and TRs seem to be of key importance in the regulation of the Chlamydomonas transcriptome under the conditions of our study since they are co-expressed on average with 87.97 other genes. Some highly authoritative hub genes in ChlamyNET are members of this group such as the B-box TF CrBbox1 (Cre03.g182700), the bHLH TFs g4643 and Cre01.g011150, the SBP TF Cre16.g673250, the RWP-RK TF NIT2 (Cre03.g177700) and the MYB TF Cre03.g197100. These TFs present a normalized authoritative hub score higher than 0.8. GO term enrichment analysis over the genes directly linked to the TFs and TRs in this group suggests that they are mainly involved in core metabolism regulation and light response. Several GO terms related to metabolic processes are significantly enriched such as carbohydrate metabolic process, fatty acid biosynthetic process and nitrogen compound metabolic process. Representative genes in this group are the alpha-amylase AMA2 (Cre08.g362450), the long-chain acyl-CoA synthetase LACS2 (Cre13.g566650) and the nitrite reductase NII1 (Cre09.g410750), respectively.
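A normalized hub score of the kind mentioned above can be sketched with a power iteration. This is an illustration under explicit assumptions: the score is assumed to be Kleinberg's hub score (HITS), which for an undirected network coincides with the authority score, and it is normalized so that the top gene scores 1. The toy network and gene names are hypothetical, and the exact scoring procedure used for ChlamyNET is the one described in the methods section.

```python
def hub_scores(adjacency, iterations=100):
    """Kleinberg-style hub scores by power iteration, scaled to a maximum of 1."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of the neighbours.
        authority = {n: sum(scores[m] for m in adjacency[n]) for n in adjacency}
        # Hub update: sum of the authority scores of the neighbours.
        hub = {n: sum(authority[m] for m in adjacency[n]) for n in adjacency}
        top = max(hub.values())
        if top == 0:
            return scores  # network without edges: scores are left unchanged
        scores = {n: value / top for n, value in hub.items()}
    return scores


# Hypothetical toy network: a triangle (A, B, C) plus a pendant gene D.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
scores = hub_scores(toy_network)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

In this toy network gene A receives the maximal score, the two other triangle members score lower, and the pendant gene D scores lowest. A threshold such as 0.8 on the normalized score would then select the most authoritative hubs.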
Four bHLH transcription factors, Cre01.g011150, Cre14.g620850, g4643 and g4645, out of the 12 recognized members of this family in Chlamydomonas, are members of this group. Only bHLH Cre14.g620850 has similarity with genes present in higher plants. Specifically, its putative Arabidopsis ortholog is PAR1 (At1g69010), which has been shown to be involved in light response. The rest show similarity with other bHLH genes present only in chlorophyceae. A bHLH binding site was found to be significantly present in the genes co-expressed with the TFs and TRs of this group (Table 3). This suggests that the binding site of bHLH TFs is conserved across the green lineage. Several genes involved in carbohydrate and nitrogen metabolism contain this binding site in their promoters, for instance the glucose-6-phosphate dehydrogenase GLD2 (Cre08.g378150) and the ammonium transporter AMT4 (Cre13.g569850).
Three bZIP TFs out of the 19 identified in the Chlamydomonas genome, Cre10.g454850, Cre12.g510200 and Cre06.g310500, are members of this group. Genes CrHY5 (Cre12.g510200) and CrHYH (Cre06.g310500) present a high similarity with the Arabidopsis genes HY5 (At5g11260) and HYH (At3g17609) respectively. These TFs are known to bind to G-box sequences to regulate light response and metabolism in Arabidopsis [52, 53]. GO term and TFBS enrichment analysis suggest that this mechanism is already present in Chlamydomonas, since a sequence highly similar to the G-box has been found to be significantly present in the genes co-expressed with these two Chlamydomonas genes (Table 3).
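The first step of a binding-site analysis of this kind, locating a motif in the promoters of co-expressed genes, can be sketched as follows. The canonical G-box core CACGTG is used as the motif; the promoter sequences and gene names are hypothetical, and an exact-match scan is a simplification of the motif enrichment analysis referred to in Table 3, which reports sequences highly similar (not necessarily identical) to the G-box.

```python
import re

# Canonical G-box core. It is palindromic (its reverse complement is itself),
# so scanning one strand is sufficient.
G_BOX = "CACGTG"


def motif_positions(sequence, motif=G_BOX):
    """Start positions of all (possibly overlapping) motif occurrences."""
    pattern = "(?=" + re.escape(motif) + ")"
    return [match.start() for match in re.finditer(pattern, sequence.upper())]


def genes_with_motif(promoters, motif=G_BOX):
    """Names of the genes whose promoter contains the motif at least once."""
    return sorted(gene for gene, seq in promoters.items() if motif_positions(seq, motif))


# Hypothetical promoter fragments, for illustration only.
promoters = {
    "hypothetical_gene_1": "ttgaCACGTGgcat",
    "hypothetical_gene_2": "AAACCCGGGTTT",
    "hypothetical_gene_3": "cacgtgTTTCACGTG",
}
hits = genes_with_motif(promoters)
print(hits)
```

The number of promoters carrying the motif among the co-expressed genes, compared with the number in the whole genome, is the kind of count that an enrichment test then evaluates.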
Cre13.g572450 and g16739, which code for two ARR-B TFs, and CrBbox1 (Cre03.g182700), which codes for a B-box TF, are present in this group. These genes exhibit high similarities with the Arabidopsis genes RR14 (At2g01760), TOC1 (At5g61380) and COL1 (At5g15850), respectively. They share a CCT domain at the carboxyl end that directly binds to DNA and that was also found in the CrCO (g6302) gene. These genes are known to be involved in light response and circadian rhythms in Arabidopsis [55, 56]. These functions seem to have been established already in Chlamydomonas, constituting a link between circadian rhythms and metabolism.
Five MYB TFs are present in this group. Some of them, such as Cre14.g633789 and Cre03.g198800, are putative orthologs of the Arabidopsis genes At3g27785 and At5g61620, which have been associated with metabolic regulation. MYB TF binding sites have been found significantly enriched in the promoters of genes co-expressed with this group of TFs and TRs. Examples are the triacylglycerol lipase CrTLL1 (Cre03.g193500) and the starch phosphorylase CrPHS1 (Cre07.g336950), which present sequences highly similar to MYB binding sites in their promoters (Table 3).
Finally, several genes coding for TFs from the RWP-RK family are members of this group. One of these TFs, NIT2 (Cre03.g177700), has already been shown to be involved in nitrogen and carbohydrate metabolism regulation [58, 59], whereas the others remain to be studied. Promisingly, the RWP-RK TFs RWP14 (Cre01.g000050), RWP11 (Cre03.g149400) and RWP3 (Cre14.g612100) located in this group are putative orthologs of the Arabidopsis genes RKD5 (At4g35590) and RKD3 (At5g66990), which have been shown to be involved in nitrogen and light response [60, 61].
Not surprisingly, TFs in this group seem to constitute an intricate gene regulatory system with mutual regulations among them. For example, bHLH binding sites can be identified in the promoters of the B-box TF CrBbox1 (Cre03.g182700), the bZIP TF CrHY5 (Cre12.g510200), the bHLH TF Cre01.g011150 and the MYB TFs Cre03.g198800 and Cre14.g621050. In turn, G-boxes have been found in the promoters of the bHLH genes Cre01.g011150 and Cre14.g620850 and the bZIP gene Cre10.g454850. Additionally, these TFs seem to exert their regulation in a coordinated manner over the same set of genes, since both bHLH and MYB binding sites have been identified in the promoters of genes such as the nitrate transporter NRT2.3 (Cre09.g396000) and the nitrate reductase NIT1 (Cre09.g410950). Such complex interactions are also common in Arabidopsis.
Autophagy regulation, group b
The TFs and TRs in this group are located in cluster 1 (orange) and are identified with green squares in Fig. 6. A GO term analysis of the genes directly linked to them reveals a potential regulation over processes involved in vesicle-mediated transport, catabolic process, proteolysis and autophagy. In this group we can find the C3H zinc finger TF g8693, which presents a high sequence similarity with the INOSITOL-REQUIRING ENZYME-1b gene (At5g24360) from Arabidopsis. This gene is involved in the regulation of the degradation of the endoplasmic reticulum by autophagy. Directly linked to this gene we can find genes involved in autophagy, such as autophagy 9 ATG9 (Cre09.g391500), and in proteolysis, such as signal peptide peptidase-like 2 (g18126). The GATA transcription factor Cre10.g435450 is also a member of this group and its putative ortholog in Arabidopsis, BME3 (At3g54810), has been shown to be involved in response to salt stress. The MYB transcription factor Cre16.g695600 is also a member of this group; its putative Arabidopsis ortholog, At5g06110, is a heat shock protein involved in stress response. Two genes from the chromatin remodeling family SNF2, Cre06.g287950 and Cre06.g270850, are putative orthologs of ATRX (At1g08600) and CHR8 (At2g18760), involved in DNA damage response and recombination. The induction of autophagy as a response to diverse stresses has been shown in Chlamydomonas. Cre03.g173165, Cre03.g174150 and g5052 are transcriptional regulators from the BTB/POZ family that are putative orthologs of ARIA (At5g19330), involved in cellular macromolecule catabolic process.
In fact, the OBP binding site was found to be significantly present in the promoters of the genes directly linked to the TFs and TRs in this group (Table 3). This binding site has been shown to be present in promoters of genes induced by oxidative stress in Arabidopsis. This is in agreement with the reported autophagy induction by oxidative stress in Chlamydomonas. Genes related to autophagy such as ATG8 (Cre16.g689650) and ATG9 (Cre09.g391500) present the OBP binding site in their promoters. Genes involved in vesicle trafficking such as Component of the Exocyst Complex SEC8 (Cre01.g003050) and Subunit of the ESCRT-I complex VPS28 (Cre16.g678100) also present the OBP binding site in their promoters (Table 3).
Cell cycle regulation, group c
The TFs and TRs of this group are included in cluster 2 (brown), identified in the previous section as involved in DNA replication, chromosome organization and regulation of cell cycle. These TFs and TRs are highlighted using yellow triangles in Fig. 6. A GO term enrichment analysis over the genes directly linked to these TFs and TRs confirmed their potential regulation over these processes. In this group we can find a MYB3R TF, Cre12.g522400, whose putative orthologs, based on their sequence similarity, are At5g11510 and At4g32730 in Arabidopsis and NtmybA1 and NtmybA2 in Nicotiana tabacum. These genes are involved in the G2/M transition during the cell cycle [69–71]. The single member of the YABBY family in Chlamydomonas that presents two high mobility group boxes, Cre16.g672300, belongs to this group. Its putative ortholog gene At4g11080 in Arabidopsis interacts with mitotic and meiotic chromosomes. Another gene, ORC1 (g11180), belonging to the PHD TF family, is also a member of this group. Its putative Arabidopsis ortholog At4g14700 (Origin recognition complex) has been shown to be in the core cell cycle machinery involved in the G1/S transition [73, 74]. Several TRs potentially involved in chromatin remodeling are present in this group, such as Cre03.g197700, which codes for a SET domain-containing protein that exhibits a high sequence similarity with At1g05830, a trithorax protein in Arabidopsis. The rest of the TFs in this group, Cre09.g402350 and Cre12.g516050, are putative orthologs of Arabidopsis genes that have been shown to be co-transcribed with other core cell cycle regulators and TFs in Arabidopsis.
The E2F motif was found to be the only known motif significantly enriched in the promoters of the genes directly linked to the TFs and TRs in this group (Table 3). The potential orthologs of the genes that contain the E2F motif sequence in their promoters are involved in the G1/S transition, such as subunits of the origin of replication complex ORC1 (g11180) and ORC4 (Cre17.g726500), pre-initiation complex subunit CDC6 (Cre06.g292850), DNA replication initiation factor CDT1 (Cre03.g163300), minichromosome maintenance protein MCM2 (Cre07.g338000) and DNA polymerase alpha POLA1 (Cre04.g214350) (Table 3). The presence of the E2F motif in genes regulating the S phase has been shown previously in Arabidopsis and Nicotiana. The gene Cre07.g323000, putative ortholog of the Arabidopsis E2F transcription factor, is not included in this group of TFs and TRs. Nevertheless, it is located in its vicinity, suggesting that it may function as an interface between regulation of cell cycle and other processes, as is the case for its Arabidopsis ortholog. The two most significant de novo motifs found in our study present a high similarity with the octamer and hexamer motifs. The combination of these two motifs has been shown to confer S phase-specific transcriptional activation in plants. Genes containing these motifs include B-type cyclin CYCB1 (Cre08.g370400) and cell division cycle protein CDC45 (Cre06.g270250). This suggests a remarkable conservation of cell cycle regulation in the plant kingdom, not only limited to the TFs, TRs and their targets involved in this process but also extending to the cis-regulatory elements, TFBS, present in their promoters.
Light-regulated transcription factors and transcriptional regulators in ChlamyNET, a tutorial for ChlamyNET usage
In order to ensure the reproducibility of the results presented in this work and to facilitate further and independent studies over the Chlamydomonas transcriptome we have developed a web-based software tool also called ChlamyNET. This tool is based on WiGis, a platform for the visualization of large-scale, highly interactive graphs in a user's web browser. The software tool ChlamyNET is available from the web page http://viridiplantae.ibvf.csic.es/ChlamyNet/. In this section we discuss a case study concerning the Chlamydomonas potentially light-regulated TFs and TRs that can be used as a tutorial for the use of ChlamyNET.
Potentially light-regulated TFs and TRs in ChlamyNET. Their putative Arabidopsis orthologs and topological indexes are indicated as well
Gene | Putative Arabidopsis ortholog | Number of neighbours | Normalized hub score
Cre06.g295200 CPH1 / CrCRY1 | At4g08920 CRYPTOCHROME 1 | | 8.12 × 10⁻⁵
 | At4g36730 G-BOX BINDING FACTOR 1 | | 7.36 × 10⁻⁷
 | At5g11260 ELONGATED HYPOCOTYL 5 | |
 | At5g39660 CYCLING DOF FACTOR 2 | | 3.79 × 10⁻⁸
 | | | 2.29 × 10⁻⁴
 | At2g46790 PSEUDO-RESPONSE REGULATOR 9 | | 7.72 × 10⁻⁷
 | At1g01060 LATE ELONGATED HYPOCOTYL | | 5.86 × 10⁻⁷
 | | | 2.23 × 10⁻⁷
 | At4g38130 HISTONE DEACETYLASE 1 | | 5.12 × 10⁻¹⁸
 | At5g61380 TIMING OF CAB EXPRESSION 1 | | 1.49 × 10⁻⁵
 | At5g63860 UVB-RESISTANCE 8 | | 4.09 × 10⁻¹⁴
 | At3g61140 CONSTITUTIVE PHOTOMORPHOGENIC 11 | | 7.81 × 10⁻¹⁴
 | At4g14110 CONSTITUTIVE PHOTOMORPHOGENIC 9 | | 1.26 × 10⁻¹²
 | At1g64520 REGULATORY PARTICLE NON-ATPASE 12A | | 7.44 × 10⁻¹⁴
 | At3g05530 REGULATORY PARTICLE TRIPLE-A ATPASE 5A | | 7.49 × 10⁻¹⁴
 | At4g24820 REGULATORY PARTICLE NON-ATPASE 7 | | 1.52 × 10⁻¹³
 | At1g20200 REGULATORY PARTICLE NON-ATPASE 3A | | 2.80 × 10⁻¹⁵
 | At4g38630 REGULATORY PARTICLE NON-ATPASE 10 | | 3.48 × 10⁻¹⁴
 | At5g19990 REGULATORY PARTICLE TRIPLE-A ATPASE 6A | | 1.74 × 10⁻¹²
 | At4g29040 REGULATORY PARTICLE TRIPLE-A ATPASE 2A | | 3.44 × 10⁻¹⁶
According to this information, several light-regulated TFs are highly authoritative hub genes in ChlamyNET, such as CrGBF1 (Cre01.g043150) and CrHY5 (Cre12.g510200), which are co-expressed with more than 130 genes. These genes are involved in photomorphogenesis in Arabidopsis, yet their function in Chlamydomonas is unknown. Other light-regulated TFs and TRs that constitute hub genes co-expressed with more than 50 genes are CrCRY1 (Cre06.g295200), CrCO (g6302), CrLHY (Cre06.g275350) and the different subunits of the 26S proteasome CrRPN12A (Cre17.g708300), CrRPT5A (Cre10.g439150) and CrRPN7 (Cre13.g581450). CrCRY1, also known as CPH1, codes for a putative ortholog of CRY1 in Arabidopsis and is a well-known photoreceptor that responds to light stimulus. On the other hand, CrCO expression is affected by photoperiod and regulates carbon metabolism and cell cycle progression. Silencing and over-expression of these genes have been shown to massively disrupt Chlamydomonas cell growth and proliferation, supporting their function as hubs in the network. The potential role of CrLHY in circadian rhythms and the proteolytic function of CrRPN12A, CrRPT5A and CrRPN7 are yet to be tested experimentally. The potentially light-regulated genes CrHYH (Cre06.g310500), CrDOF (Cre12.g521150), CrLUX (g1542) and CrCOP11 (Cre05.g234300), whose putative Arabidopsis orthologs are involved in photomorphogenesis, photoperiod response, circadian rhythms and protein degradation, respectively, are co-expressed with around 30 other genes. Recently, CrDOF expression has been shown to be influenced by circadian rhythms and the photoperiod, whereas it directly regulates the expression of CrCO. The rest of the potentially light-regulated TFs and TRs identified in ChlamyNET are co-expressed with fewer than 20 other genes and are not considered hubs in the network.
Most of these genes exhibit a high clustering coefficient in ChlamyNET, suggesting a high level of coordination among their co-expressed genes.
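The notion of an authoritative hub gene used in this case study can be illustrated with a short Python sketch. It assumes that the hub score follows the HITS (hubs and authorities) scheme computed by power iteration on the undirected co-expression network, and that scores are normalized to sum to one; the exact normalization used for ChlamyNET is not stated here, and the gene names and edges are placeholders.

```python
def hub_scores(adjacency, iterations=100):
    """HITS hub scores by power iteration on an undirected network.

    adjacency maps each gene to the set of its co-expressed neighbours.
    Scores are rescaled to sum to one (an assumed normalization).
    """
    hubs = {gene: 1.0 for gene in adjacency}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of the neighbours.
        authorities = {g: sum(hubs[n] for n in adjacency[g]) for g in adjacency}
        # Hub update: sum of the authority scores of the neighbours.
        hubs = {g: sum(authorities[n] for n in adjacency[g]) for g in adjacency}
        total = sum(hubs.values())
        hubs = {g: score / total for g, score in hubs.items()}
    return hubs


# Toy network (hypothetical): one TF linked to three genes, two of which
# are also co-expressed with each other.
toy_network = {
    "TF": {"a", "b", "c"},
    "a": {"TF", "b"},
    "b": {"TF", "a"},
    "c": {"TF"},
}
scores = hub_scores(toy_network)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network the gene with the most neighbours, which is also linked to well-connected genes, receives the highest score, which is the behaviour expected of a hub.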
The biological processes potentially controlled by the light-regulated TFs and TRs in ChlamyNET can be deduced by applying GO term enrichment over their co-expressed genes. This can be performed by using the Analysis section located in the Search panel once the neighbouring genes have been selected. As described in the Methods section, we can combine the significant GO terms identified based on orthology with those determined based on conserved protein domains (Additional file 6: Table S2). According to this methodology, the potentially light-regulated TFs and TRs are co-expressed with genes involved in ion transport, for example the nitrate transporter NRT2.2 (Cre09.g410800), and in Mo-molybdopterin cofactor biosynthesis, such as MoeA (Cre10.g451400), which produces essential cofactors for the nitrate reductase, a key enzyme in nitrate metabolism. Additionally, carbohydrate metabolism appears as a significant GO term, including genes involved in starch and glucose degradation such as the starch phosphorylase CrPHS2 (Cre12.g552200), the alpha-amylase AMY (Cre08.g384750) and the glucose-6-phosphate dehydrogenase GLD2 (Cre08.g378150). Finally, protein phosphorylation is another relevant significant GO term, with genes potentially involved in cell cycle control such as the mitogen-activated protein kinase kinase kinases CrMKKK1 (Cre07.g347000) and CrMKKK2 (Cre02.g108650). Therefore, this analysis suggests a potential regulation of carbon/nitrogen metabolism and cell cycle through protein phosphorylation by these potentially light-regulated TFs and TRs that needs to be experimentally validated.
Finally, selecting the switch Promoter Sequence Enrichment in the Search panel, a transcription factor binding site (TFBS) enrichment analysis over the promoters of selected genes can be performed. In this case, several significant light-regulated TFBS in Arabidopsis such as SORLIP2, SORLIP3 and SORLREP5 were identified. For example, the CrGBF1, CrHYH and CrLUX genes present the sequence SORLIP3 in their promoters. This is in agreement with their high co-expression values (Fig. 8). The presence of these TFBS in the promoters of the genes studied here suggests a high conservation of light-regulated TFBS across the green lineage.
A more detailed presentation of this case study is available from the web page of ChlamyNET, http://viridiplantae.ibvf.csic.es/ChlamyNet/.
ChlamyNET aims at becoming an enabling technology for researchers on the Chlamydomonas transcriptome. For example, ChlamyNET can be used to predict changes in gene expression. When a specific gene is mutated or overexpressed, ChlamyNET predicts that the expression of genes located in the neighbourhood of the mutated or overexpressed gene will be affected, whereas genes in distant regions will not substantially change their expression profile. Another application of ChlamyNET is predicting targets of specific TFs. TFs and their targets tend to be strongly co-expressed since TFs directly regulate the expression of their target genes. Therefore, the targets of a TF should be contained in its neighbourhood, possibly directly linked to it. Additionally, when the sequences recognised by a TF are well characterized, the identification of these sequences in the promoters of genes co-expressed with it provides more convincing evidence for these genes being direct targets of the corresponding TF. In this way, the analysis of gene neighbourhoods and the significance of TFBS in gene promoters can be studied using ChlamyNET, which constitutes a powerful tool for gene expression analysis.
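The neighbourhood-based reasoning above can be sketched in Python as a breadth-first search around a gene of interest. This is only an illustration: the network, the gene names and the set of promoters carrying the motif are hypothetical, and the distance cut-off is an assumption rather than a ChlamyNET parameter.

```python
from collections import deque


def neighbourhood(adjacency, gene, max_distance=1):
    """Genes within max_distance edges of `gene`, with their distances."""
    distances = {gene: 0}
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    distances.pop(gene)  # the query gene is not part of its own neighbourhood
    return distances


# Hypothetical toy network and motif annotation.
toy_network = {
    "tf": {"g1", "g2"},
    "g1": {"tf", "g3"},
    "g2": {"tf"},
    "g3": {"g1", "g4"},
    "g4": {"g3"},
}
promoters_with_motif = {"g1", "g4"}

# Genes expected to respond when "tf" is mutated or overexpressed.
affected = neighbourhood(toy_network, "tf", max_distance=2)
# Candidate direct targets: direct neighbours whose promoter carries the motif.
candidates = sorted(
    g for g in neighbourhood(toy_network, "tf", max_distance=1)
    if g in promoters_with_motif
)
```

Here g4 carries the motif but lies outside the direct neighbourhood, so it is not proposed as a direct target.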
We also identified in ChlamyNET those genes that showed a 4-fold decrease in expression level in CrDOFin compared to the wild type CW15 in LD conditions (Fig. 9d). These genes were not located in the neighbourhood of CrDOF, suggesting that it acts as a direct activator and, possibly, as an indirect repressor in LD conditions. Instead, cluster 2 (brown) was significantly enriched with these highly inhibited genes with a p-value of 2.2 × 10⁻¹⁶. This provides evidence about CrDOF being involved in cell-cycle regulation, which indeed was experimentally validated.
This gene co-expression network and its associated web-based tool ChlamyNET constitute one of the first integrative approaches to the study of the Chlamydomonas transcriptome. They aim at providing researchers with an enabling technology that will allow them to study gene co-expression patterns, determine significant biological processes, molecular functions and cellular components for a set of genes of interest as well as to identify significant TFBS in the promoters of a given set of genes.
In this work, we have shown that ChlamyNET exhibits non-random topological properties, namely scale-free and small-world properties. This suggests that the Chlamydomonas transcriptome possesses relevant characteristics related to error tolerance, vulnerability and information propagation. On the one hand, the scale-free property implies robustness against random gene mutations, or error tolerance: since most genes are only co-expressed with a few other genes, a random mutation is likely to affect a non-important gene, altering the expression of a reduced number of other genes. On the other hand, the existence of key authoritative hub genes produces fragility or vulnerability to targeted attacks against this type of genes. The removal or mutation of an authoritative hub gene would affect a large number of other genes co-expressed with it, massively disrupting the functioning of the Chlamydomonas transcriptome. This can lead to lethality or defective growth. For example, this has been shown for the authoritative hub gene CrCO (g6302), whose over-expression and silencing are detrimental for cellular growth. Additionally, the small-world property facilitates a quick spreading of information throughout ChlamyNET.
The analysis of the location of hub genes and genes with high clustering coefficient shows that both of them group together in specific regions of ChlamyNET. This indicates the existence of gene clusters whose expressions are highly coordinated, possibly to perform related biological processes. Indeed, we identified nine gene clusters that present a high intra-cluster and a low inter-cluster co-expression. Among these clusters we highlight the results obtained in two of them, clusters 1 and 2.
The most central cluster (blue cluster), where most authoritative hub genes are located, is significantly enriched in genes involved in carbon/nitrogen metabolism, signalling through protein phosphorylation and light response. This cluster is also significantly enriched in TFs, revealing a high transcriptional control over carbon/nitrogen metabolism induced by light. Several bHLH TFs are contained in this cluster and only one of them, CrbHLH1 (Cre14.g620850), presents a potential Arabidopsis ortholog, PAR1 (At1g69010), which has been shown to be involved in light response. Indeed, the E-box sequence, a bHLH binding site, was found to be significantly present in the promoters of genes in this cluster. Two TFs from the bZIP family are also located in this cluster, CrHY5 (Cre12.g510200) and CrHYH (Cre06.g310500), potential orthologs of the Arabidopsis HY5 (At5g11260) and HYH (At3g17609) genes, respectively. Again, the G-box sequence, a TFBS recognized by HY5 and HYH in Arabidopsis, was found to be significantly present in the promoters of their co-expressed genes. Additionally, this cluster contains two ARR-B TFs (Cre13.g572450 and g16739) and the B-box TF CrBbox1 (Cre03.g182700), whose potential Arabidopsis orthologs RR14 (At2g01760), TOC1 (At5g61380) and COL1 (At5g15850) are involved in circadian rhythms. This suggests a key role played by light in the regulation of central metabolism in Chlamydomonas mediated by TFs from the bHLH and bZIP families. It also provides evidence for an input from circadian rhythms exerted by genes from the ARR-B and B-box families.
One of the most peripheral clusters in ChlamyNET, cluster 2 (brown), was shown to be involved in DNA replication and transitions between the cell cycle phases. This cluster contains potential orthologs of Arabidopsis genes involved in the G1/S transition such as origin of replication complex ORC1 (g11180) and ORC4 (Cre17.g726500), pre-initiation complex subunit CDC6 (Cre06.g292850), DNA replication initiation factor CDT1 (Cre03.g163300), minichromosome maintenance protein MCM2 (Cre07.g338000) and DNA polymerase alpha POLA1 (Cre04.g214350). The E2F sequence was found to be significantly present in the promoters of these genes. Additionally, the presence of a combination of the octamer and hexamer motifs was significantly enriched in the promoters of the genes from this cluster such as the B-type cyclin CYCB1 (Cre08.g370400). The E2F sequence and the combination of octamer and hexamer motifs have been shown to confer S phase-specific transcriptional activation in higher plants.
These results suggest that key elements in the regulation of cell cycle, light response and carbon/nitrogen metabolism are already established in Chlamydomonas and conserved in higher plants such as Arabidopsis. The conserved elements are not only limited to TFs, TRs and their targets but also include the cis-regulatory elements, TFBS, present in their promoters.
The web-based software tool ChlamyNET (http://viridiplantae.ibvf.csic.es/ChlamyNet/) was developed to ensure the reproducibility of the results presented in this work and to facilitate further and independent studies over the Chlamydomonas transcriptome. We used potentially light-regulated TFs and TRs to illustrate its functionalities. Our case study suggests that these genes regulate carbon/nitrogen metabolism and cell cycle. Additionally, light-regulated TFBS in Arabidopsis such as SORLIP2, SORLIP3 and SORLREP5 were identified in their gene promoters. Several other case studies are available from the web page of ChlamyNET, http://viridiplantae.ibvf.csic.es/ChlamyNet/.
Finally, RNA-seq data from algae overexpressing the transcription factor CrDOF involved in photoperiod response were used as an independent data set to experimentally cross-validate the predictive power of ChlamyNET.
Data acquisition and processing
In this study we used RNA-seq data from the Chlamydomonas transcriptome publicly available at the Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra/), a database resource at the National Center for Biotechnology Information (NCBI) that stores more than 500 TeraBases of next-generation sequencing data. We collected more than 287 GigaBytes of information produced by seven different studies [10–16] consisting of 50 samples representing eight different genotypes under diverse physiological conditions, see Additional file 1: Table S1. These data provide a general overview of the Chlamydomonas transcriptome in a physiologically relevant context. All these data were obtained using the same next-generation sequencing platform, Illumina Genome Analyzer, in order to facilitate the comparison between samples from different experiments.
The available Chlamydomonas sequenced genome (version 5.3) was downloaded from Phytozome (http://www.phytozome.net/), a web-based platform for green plant genomics, in order to be used as the reference genome in our study. Additionally, we obtained from the same web resource the corresponding Augustus u11.6 gene annotation, which was used as a reference transcriptome.
The processing of raw sequencing data when a reference genome is available can be divided into three different stages: (i) filtering out low quality reads and alignment of reads to the reference genome; (ii) assembly of transcripts; and (iii) estimation of gene expression. In our study, we followed a published methodology that makes use of the free software packages Tophat and Cufflinks. First, we performed the preprocessing of the raw data consisting of the fastq files from each sample. The read sequences of low quality were filtered out according to their Phred quality scores [91, 92] and the remaining ones were aligned to the reference genome with the software package Tophat, which in turn makes use of the fast and memory-efficient short read aligner Bowtie. Most of the analyzed samples were of good quality and produced a high alignment rate greater than 80 %. The alignments of read sequences to the reference genome produced in this step were stored in BAM (binary alignment map) files.
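The quality filter mentioned in this step can be illustrated with a minimal Python sketch. It assumes Phred+33 encoded FASTQ quality strings and a hypothetical mean-quality cut-off of 20; the actual cut-off and filtering tool used in the study are not stated here, and the reads are toy examples.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(symbol) - offset for symbol in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_filter(quality_string, min_mean_q=20):
    """Keep a read when its mean Phred score reaches the (hypothetical) cut-off."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) >= min_mean_q


# Toy reads: (name, sequence, quality string).
reads = [
    ("read_1", "ACGTACGT", "IIIIIIII"),  # 'I' encodes Q40
    ("read_2", "ACGTACGT", "!!!!####"),  # '!' encodes Q0, '#' encodes Q2
]
kept = [name for name, _sequence, quality in reads if passes_filter(quality)]
```

A Phred score of 20 corresponds to an expected base-calling error rate of 1 %, which is why cut-offs in this range are commonly chosen.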
In the second step, we used the alignments in the BAM files and the known transcripts from the Augustus u11.6 annotation for the assembly of the sample-specific transcriptomes using the software package Cufflinks. The whole transcriptome identified in all the samples was integrated and stored in a GTF (gene transfer format) file using Cuffmerge, a utility program within the Cufflinks package. We performed this refinement of the currently available annotated Chlamydomonas transcriptome in order to avoid incomplete or incorrect annotation that could reduce accuracy in our study.
Finally, the gene expression levels in the different conditions integrated in our study were estimated using Cuffdiff, a program included in the Cufflinks package. In order to avoid biases due to transcript length and the total number of reads generated in each experiment, we used as unit of measurement fragments per kilobase of transcript per million mapped fragments (FPKM) [90, 94]. Additionally, recent suggestions for normalization methods that reduce the bias due to the non-uniform distribution of mapped reads within transcripts were taken into account by setting the corresponding parameters in Cuffdiff. These normalizations remove the biases in the data while preserving the variation in gene expression that occurs because of biologically relevant changes in transcription, allowing the comparison of gene expression across multiple experiments. Subsequent analysis and visualization of the results were performed using the R package cummeRbund.
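As a worked illustration of the FPKM unit, the following Python sketch applies its definition to raw fragment counts. It deliberately ignores the isoform-level statistical model and the bias corrections that Cufflinks applies, and the counts and transcript lengths are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Toy sample (hypothetical): gene -> (fragments mapped, transcript length in bp).
sample = {"gene_a": (500, 2_000), "gene_b": (500, 4_000)}
total_mapped = 10_000_000

expression = {
    gene: fpkm(count, length, total_mapped)
    for gene, (count, length) in sample.items()
}
# gene_a: 500 / (2 kb * 10 M) = 25 FPKM; gene_b: 500 / (4 kb * 10 M) = 12.5 FPKM
```

Both genes receive the same number of fragments, but the longer transcript obtains half the FPKM value, which is the length bias that the unit is designed to remove.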
Selection of differentially expressed genes
The selection of differentially expressed genes was performed using the standard methodology applied to the analysis of RNA-seq data. The logarithm of the levels of expression measured in FPKM was computed and the delta method was used to estimate the variance of the log odds. Those genes that exhibited a p-value adjusted for multiple testing lower than 0.05 were considered to be differentially expressed.
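The multiple-testing step can be sketched in Python as follows. The adjustment procedure is not named in this section; the sketch assumes the Benjamini-Hochberg false discovery rate correction, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.001, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)  # approximately [0.004, 0.04, 0.04, 0.5]
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

With the 0.05 threshold used in the text, the first three toy genes would be retained as differentially expressed.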
Gene Co-expression criterion and network construction
We used the absolute value of the Pearson correlation coefficient between gene expression profiles across the different conditions to determine the level of co-expression between the selected genes. For each possible correlation value $cor$, we represented the co-expression relationships between genes that exceed this value in the Chlamydomonas transcriptome using an undirected weighted network $G_{cor} = (V_{cor}, E_{cor})$. The nodes or vertices in $V_{cor}$ correspond to the genes. An undirected edge $(g_1, g_2)$ in $E_{cor}$ with associated weight $w > cor$ indicates the existence of a significant co-expression relationship between genes $g_1$ and $g_2$, with an absolute value of the Pearson correlation coefficient between the corresponding expression profiles equal to $w$.
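The construction of the network for a given correlation value can be sketched in Python as follows. This is a minimal illustration with hypothetical expression profiles; whether the profiles were transformed before computing correlations is not stated here, so raw values are assumed.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_network(profiles, cor):
    """Weighted undirected edges (g1, g2) -> w for every pair with w > cor."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        w = abs(pearson(profiles[g1], profiles[g2]))
        if w > cor:
            edges[(g1, g2)] = w
    return edges


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [8.0, 6.0, 4.0, 2.0],  # perfectly anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 1.0],  # weakly correlated with the others
}
network = coexpression_network(profiles, cor=0.9)
```

Because the absolute value is used, the anti-correlated pair is linked just like the positively correlated one, whereas geneD remains isolated.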
In order to decide which correlation value is high enough to consider that two genes are significantly co-expressed we used a criterion that establishes a compromise between the generation of a scale-free network and a high network density. Most biological networks characterized so far are scale-free, which makes this property the most common metric for the rational selection of a gene correlation threshold. In order to facilitate the detection of clusters or modules of genes in the constructed network we also added the restriction of generating a network with a high density.
A range of correlation thresholds was considered. For each possible correlation cut-off value we determined how close the corresponding network was to fulfilling the scale-free condition by computing the $R^2$ of the linear regression for the logarithmic transform of the node degree distribution. Additionally, for each possible cut-off value we used the average node degree as a measurement of the network density.
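The threshold scan described above can be sketched in Python as follows. The sketch fits a straight line to the log-transformed degree distribution and reports the coefficient of determination together with the average degree for each cut-off; the edge weights and candidate cut-offs are hypothetical, and only genes with at least one edge are counted as nodes, which is an assumption.

```python
from collections import Counter
from math import log10


def node_degrees(edges):
    """Number of neighbours of every gene that appears in the edge list."""
    degree = Counter()
    for g1, g2 in edges:
        degree[g1] += 1
        degree[g2] += 1
    return degree


def scale_free_r2(edges):
    """R^2 of the linear fit of log10 P(k) against log10 k."""
    degree = node_degrees(edges)
    counts = Counter(degree.values())
    if len(counts) < 2:
        return 0.0  # a line cannot be fitted to fewer than two points
    xs = [log10(k) for k in counts]
    ys = [log10(c / len(degree)) for c in counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def average_degree(edges):
    degree = node_degrees(edges)
    return sum(degree.values()) / len(degree) if degree else 0.0


# Hypothetical absolute correlations between pairs of genes.
weights = {
    ("h", "a"): 0.97, ("h", "b"): 0.96, ("h", "c"): 0.93,
    ("h", "d"): 0.92, ("a", "b"): 0.91, ("c", "d"): 0.85,
}
summary = {}
for cutoff in (0.80, 0.90, 0.95):
    edges = [pair for pair, w in weights.items() if w > cutoff]
    summary[cutoff] = (scale_free_r2(edges), average_degree(edges))
```

Raising the cut-off removes edges and lowers the average degree, so the selected threshold is the one that keeps the fit close to a power law without making the network too sparse.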
Graphical representations of the network were performed using Cytoscape, a software package for network visualization and data integration. Specifically, the organic layout method was applied to visualize ChlamyNET. This algorithm consists of a variant of the force-directed layout. Nodes produce repulsive forces whereas edges induce attractive forces. Nodes are then placed such that the sum of these forces is minimised. The organic layout has the effect of exposing the clustering structure of a network. In particular, this layout tends to locate tightly connected nodes with many interactions, or hub nodes, together in central areas of the network.
Significance of topological properties
In order to determine the statistical significance of the scale-free property of ChlamyNET we generated 10⁴ random networks with the same number of nodes and edges as ChlamyNET following the Erdős and Rényi random graph model. None of these random networks exhibited a scale-free topology similar to ChlamyNET. This indicates that the scale-free topology of ChlamyNET is not random but rather it is the product of a self-organizing process. It has been suggested that scale-free networks emerge from a growth process by which newly added nodes preferentially attach to already existing nodes with a high number of neighbours. In the case of ChlamyNET the scale-free feature can be a consequence of two mechanisms in the evolution of gene co-expression networks: (i) gene co-expression networks are not static, instead new genes may appear; and (ii) new genes are preferentially co-expressed with genes that already exhibit a large number of co-expressed genes.
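The contrast between the two generative processes discussed above can be sketched in Python. The first function draws a random network with a fixed number of nodes and edges; the second grows a network by preferential attachment. Network sizes, the number of links per new node and the use of the maximum degree as a quick indicator of hubs are assumptions made for illustration; they are not the statistic or the parameters used for ChlamyNET.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n_nodes, n_edges, rng):
    """Random graph with fixed size: n_edges distinct pairs drawn uniformly."""
    all_pairs = list(combinations(range(n_nodes), 2))
    return rng.sample(all_pairs, n_edges)


def preferential_attachment_edges(n_nodes, m, rng):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    edges = list(combinations(range(m + 1), 2))  # fully connected seed
    # Every node appears in this list once per edge it takes part in,
    # so a uniform draw from it is a degree-proportional draw.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((target, new_node))
            endpoints.extend((target, new_node))
    return edges


def max_degree(edges):
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree.values())


rng = random.Random(0)
grown = preferential_attachment_edges(300, 2, rng)
uniform = erdos_renyi_edges(300, len(grown), rng)
hub_sizes = {"grown": max_degree(grown), "uniform": max_degree(uniform)}
```

Networks grown by preferential attachment typically contain a few nodes with far more neighbours than any node in a uniformly random network of the same size, which is the signature of hubs in a scale-free topology.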
We also studied the clustering coefficient in ChlamyNET, a measurement of the density of edges or co-expression relationships around genes. The clustering coefficient of a gene is calculated as the ratio of the actual number of co-expression relationships among all its neighbours and the maximal possible number of such co-expression relationships.
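The definition above can be written as a short Python sketch. The toy network is hypothetical, and the network-level value is taken here as the mean of the per-gene coefficients, which is one common convention and an assumption about how the global coefficient was computed.

```python
from itertools import combinations


def clustering_coefficient(adjacency, gene):
    """Ratio of existing to possible co-expression links among the neighbours."""
    neighbours = adjacency[gene]
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours exists
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


# Toy network (hypothetical): a triangle g1-g2-g3 plus a pendant gene g4.
toy_network = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2"},
    "g4": {"g1"},
}
local = {gene: clustering_coefficient(toy_network, gene) for gene in toy_network}
mean_coefficient = sum(local.values()) / len(local)
# g1 has three neighbours and one link among them (g2-g3): 1 / 3.
```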
In order to determine that the global clustering coefficient of ChlamyNET is significantly high we generated 10⁴ random scale-free networks with the same number of nodes and edges as ChlamyNET following the Barabási random scale-free graph model. None of these random networks exhibited a clustering coefficient higher than ChlamyNET.
In general, clustering techniques aim at identifying groups or clusters whose individuals exhibit high similarities, whereas individuals from different groups or clusters present low similarities. When clustering techniques are applied to co-expression networks, the similarity among genes is measured using the correlation among the corresponding gene profiles, or co-expression. Therefore, the goal of clustering techniques, when applied to gene co-expression data, consists of identifying disjoint groups or clusters of genes so that the co-expression between genes in the same cluster is high (intra-cluster similarity) whereas the co-expression between genes from different clusters is low (inter-cluster similarity). In this respect, the silhouette, a criterion that combines the minimization of inter-cluster similarity with the maximization of intra-cluster similarity, is one of the most popular measurements for the assessment of a clustering analysis. In our study we used this criterion to determine which clustering algorithm and number of clusters best describe the underlying structure in ChlamyNET. We compared the performance of the two most widely used clustering algorithms, hierarchical clustering and partition around medoids (PAM), for different numbers of clusters ranging from 4 to 20 using the R package clValid.
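The silhouette criterion can be illustrated with a small Python sketch. It is not the clValid implementation; it assumes at least two clusters and a dissimilarity such as one minus the absolute correlation, and the genes and dissimilarity values are hypothetical.

```python
def mean_silhouette(labels, dissimilarity):
    """Average silhouette width of a clustering with at least two clusters.

    labels maps each gene to its cluster; dissimilarity(g, h) returns a
    non-negative value, for instance 1 - |Pearson correlation|.
    """
    clusters = {}
    for gene, cluster in labels.items():
        clusters.setdefault(cluster, []).append(gene)

    widths = []
    for gene, own in labels.items():
        same = [g for g in clusters[own] if g != gene]
        if not same:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean dissimilarity to the other genes of the same cluster.
        a = sum(dissimilarity(gene, g) for g in same) / len(same)
        # b: smallest mean dissimilarity to the genes of any other cluster.
        b = min(
            sum(dissimilarity(gene, g) for g in members) / len(members)
            for cluster, members in clusters.items()
            if cluster != own
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical dissimilarities: g1/g2 and g3/g4 are tightly co-expressed.
close_pairs = {frozenset(("g1", "g2")): 0.1, frozenset(("g3", "g4")): 0.1}


def toy_dissimilarity(g, h):
    return close_pairs.get(frozenset((g, h)), 0.9)


good = mean_silhouette({"g1": "A", "g2": "A", "g3": "B", "g4": "B"}, toy_dissimilarity)
bad = mean_silhouette({"g1": "A", "g3": "A", "g2": "B", "g4": "B"}, toy_dissimilarity)
```

The partition that groups tightly co-expressed genes together obtains a silhouette close to one, whereas the partition that splits them obtains a negative value; comparing such values across algorithms and numbers of clusters is the selection principle described above.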
Gene ontology term enrichment analysis
The Chlamydomonas transcriptome is very sparsely annotated since experimental validation of the different computationally predicted functions is still missing for most genes. In order to overcome this lack of GO term annotation, we followed two different complementary approaches. In our first approach, we assigned to each Chlamydomonas gene the GO terms associated with its potential Arabidopsis ortholog based on sequence similarity. In our second approach, we used the annotation about protein domains and the tools available from the Pfam database to determine the protein family to which each Chlamydomonas gene belongs. The GO terms associated with the identified protein family were then assigned to the corresponding gene. Our methodology for the identification of the GO terms over-represented in each cluster is a combination of both approaches. We identified as over-represented GO terms those found by both approaches or by only one of them with a very high statistical significance (a p-value lower than 10⁻⁶). The R package topGO was used to perform GO term enrichment using Fisher's exact test. As gene background we selected the entire Chlamydomonas gene set as identified in the Phytozome 9.1 database.
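The enrichment test and the rule for combining both annotation approaches can be sketched in Python. The first function computes the one-sided Fisher's exact test for over-representation as a hypergeometric tail probability; it is not the topGO implementation. The per-approach significance level `alpha` is an assumption, since only the stricter 10⁻⁶ threshold is stated in the text, and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(cluster_hits, cluster_size, background_hits, background_size):
    """One-sided Fisher's exact test: probability of observing at least
    cluster_hits annotated genes in a random set of cluster_size genes."""
    upper = min(cluster_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return tail / comb(background_size, cluster_size)


def is_overrepresented(p_orthology, p_pfam, alpha=0.05, strict=1e-6):
    """GO term kept if both approaches agree, or if one is extremely significant.

    alpha is a hypothetical per-approach level; strict follows the text.
    """
    both = p_orthology < alpha and p_pfam < alpha
    one_very_strong = min(p_orthology, p_pfam) < strict
    return both or one_very_strong


# Toy example: 5 of 20 background genes carry the GO term, 3 of them fall
# in a cluster of 4 genes.
p = enrichment_p_value(cluster_hits=3, cluster_size=4,
                       background_hits=5, background_size=20)
# (C(5,3) * C(15,1) + C(5,4) * C(15,0)) / C(20,4) = 155 / 4845
```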
Transcription factor binding site enrichment analysis
Transcription factor binding site (TFBS) enrichment analysis was performed using HOMER and the known TFBS sequences in plants from the databases AGRIS, JASPAR and AthaMap. The findMotifs.pl script, with default parameters, was used to perform known and de novo motif over-representation analyses.
The background used for the over-representation analysis consists of all the gene promoters annotated in the current version of the Chlamydomonas genome. These data were downloaded using the BioMart functionality associated with Phytozome.
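To make the idea of motif over-representation against a promoter background concrete, the following Python sketch counts the promoters that contain at least one match to a consensus sequence, on either strand, in a target gene set and in the background, and then scores the difference with a hypergeometric upper tail. It is a simplified stand-in, not HOMER's algorithm, which works with position weight matrices; the function names and promoter sequences are hypothetical, and the target set is assumed to be a subset of the background.

```python
import re
from math import comb

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus sequence into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def reverse_complement(sequence):
    return sequence.upper().translate(COMPLEMENT)[::-1]


def has_site(pattern, promoter):
    """True if the motif occurs at least once on either strand."""
    sequence = promoter.upper()
    return bool(pattern.search(sequence)
                or pattern.search(reverse_complement(sequence)))


def motif_enrichment(consensus, target, background):
    """Return (k, K, p) for a motif in target promoters versus background.

    target and background map gene identifiers to promoter sequences;
    target is assumed to be a subset of background.
    """
    pattern = consensus_to_regex(consensus)
    k = sum(has_site(pattern, seq) for seq in target.values())
    K = sum(has_site(pattern, seq) for seq in background.values())
    n, N = len(target), len(background)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return k, K, tail / comb(N, n)


# Hypothetical promoters; the G-box core CACGTG is used as the example motif.
background = {
    "gene1": "TTCACGTGAA", "gene2": "GGCACGTGCC", "gene3": "AAAAAAAAAA",
    "gene4": "TTTTGGGGCC", "gene5": "ACACACACAC", "gene6": "GATTACAGAT",
}
target = {gene: background[gene] for gene in ("gene1", "gene2")}
k, K, p_value = motif_enrichment("CACGTG", target, background)
print(k, K, p_value)
```

Counting each promoter at most once keeps the test insensitive to promoters that carry many copies of the same motif.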
Algal material, growth conditions and RNA sequencing
Two independent biological replicates of the Chlamydomonas reinhardtii wild type CW15 and of the transgenic line CrDOFin were grown in flasks with the induction medium Sueoka NO3- under LD (16 h light/8 h dark) or SD (8 h light/16 h dark) conditions at a light intensity of 50 μE, at 22 °C during the light period and 18 °C during the dark period, in a model SG-1400 phytotron (Radiber SA, Spain). Algal cells were grown for 4 days and then harvested 4 hours after lights-on, with lights-on taken as Zeitgeber time zero (ZT0). RNA was isolated by the TRIzol (Invitrogen) method following the manufacturer's instructions. RNA quality was tested with an ND-1000 spectrophotometer (Nanodrop). Library preparation was carried out following the manufacturer's recommendations. RNA libraries were sequenced on an Illumina HiSeq 2000 sequencer, yielding approximately 40 million 50 bp reads per sample.
Availability of supporting data
The RNA-seq data set used to cross-validate the predictive power of ChlamyNET is available at the European Nucleotide Archive under the accession number PRJEB6682.
The processed RNA-seq data, R scripts used in the construction and analysis of our network as well as the network itself in gml format are available from the web page http://viridiplantae.ibvf.csic.es/ChlamyNet/.
This work was supported with funding from projects CSD2007-00057, BIO2011-28847-C02-00 and BIO2014-52425-P (Spanish Ministry of Economy and Competitiveness, MINECO) and Excellence project P08-AGR-03582 (Junta de Andalucía) partially supported by FEDER funding to Federico Valverde and Jose M. Romero. Eva Lucas-Reina was funded by a CSIC-JAE fellowship which is partly supported by structure funding from the EU (SEF). Funding from the JdC program (MINECO) and the Excellence project P08-TIC-04200 (Junta de Andalucía) to Francisco J. Romero-Campero is also acknowledged. The authors acknowledge the support of the high-performance computational resources from the Centro Informatico Cientifico de Andalucia (CICA).
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.