Transcriptomic and phylogenetic analysis of a bacterial cell cycle reveals strong associations between gene co-expression and evolution
© Fang et al.; licensee BioMed Central Ltd. 2013
Received: 26 December 2012
Accepted: 13 May 2013
Published: 5 July 2013
The genetic network involved in the bacterial cell cycle is poorly understood even though it underpins the remarkable ability of bacteria to proliferate. How such a network evolves is even less clear. The major aims of this work were to identify and examine the genes and pathways that are differentially expressed during the Caulobacter crescentus cell cycle, and to analyze the evolutionary features of the cell cycle network.
We used deep RNA sequencing to obtain high coverage RNA-Seq data of five C. crescentus cell cycle stages, each with three biological replicates. We found that 1,586 genes (over a third of the genome) display significant differential expression between stages. This gene list, which contains many genes previously unknown for their cell cycle regulation, includes almost half of the genes involved in primary metabolism, suggesting that these “house-keeping” genes are not constitutively transcribed during the cell cycle, as often assumed. Gene and module co-expression clustering reveal co-regulated pathways and suggest functionally coupled genes. In addition, an evolutionary analysis of the cell cycle network shows a high correlation between co-expression and co-evolution. Most co-expression modules have strong phylogenetic signals, with broadly conserved genes and clade-specific genes predominating different substructures of the cell cycle co-expression network. We also found that conserved genes tend to determine the expression profile of their module.
We describe the first phylogenetic and single-nucleotide-resolution transcriptomic analysis of a bacterial cell cycle network. In addition, the study suggests how evolution has shaped this network and provides direct biological network support that selective pressure is not on individual genes but rather on the relationship between genes, which highlights the importance of integrating phylogenetic analysis into biological network studies.
Keywords: Cell cycle phylogenomics, Caulobacter crescentus, Co-expression network, Functional modules, Selective pressure
Advances in next-generation sequencing methodologies have significantly reduced the time and cost constraints of determining genome-wide expression levels of various organisms, including bacteria. These technologies present major advantages over hybridization-based microarrays [1, 2]. Along with high throughput, they allow single-nucleotide resolution as well as quantification of absolute RNA abundance. These benefits combined with strand-specificity and greater dynamic range in gene expression measurement have provided great insight into the transcriptional landscape of various bacteria under different growth conditions. However, no deep RNA sequencing (RNA-Seq) studies have so far reported a transcriptome analysis of a bacterial cell cycle, which would provide an important step toward understanding the genetic pathways involved in bacterial multiplication.
Decades of single-gene studies in C. crescentus have uncovered regulatory components and molecular mechanisms that govern the cell cycle and the spatial and temporal biogenesis of different organelles and molecular machineries. Following the resolution of the C. crescentus genome [8, 9], a variety of “omics” and modeling studies have been undertaken to understand the C. crescentus cell cycle at a system level [10–17]. Important studies have led the way to understanding the transcriptional cascades generated by the oscillatory expression of cell cycle master regulators [10, 12, 18–20].
In this work, we took advantage of the benefits of RNA-Seq to provide absolute measures of gene expression during the C. crescentus cell cycle, using biological replicates for each cell cycle stage. We uncovered novel properties of gene expression and regulation, identified over 1,500 cell cycle-regulated genes, and organized them into a co-expression network. Furthermore, we extended phylogenomics to the study of co-expression networks by comparing network and gene evolutionary properties, and discovered strong correlations between co-expression and evolution.
Results and discussion
Single-nucleotide resolution whole-genome mapping of RNA-Seq
Genes cluster into three groups according to their expression level
A recent genome-wide transposon insertion study in C. crescentus identified 480 essential or high-fitness-cost genes. As shown in Figure 3B, we found that gene essentiality is correlated with gene expression: the essential genes generally had higher expression values (median = 52x) than non-essential genes (median = 15x), with the high-fitness-cost genes having intermediate values (median = 31x; ANOVA, p<1e-10). Only 4 essential and 6 high-fitness-cost genes fell into the group of poorly expressed genes (Additional file 5: Table S2). Their essentiality was determined based on colony formation on solid rich growth medium. The low expression levels of these genes under our experimental conditions (exponential-phase liquid cultures in minimal medium) suggest that their essentiality may be specific to growth conditions.
Gene essentiality as determined by laboratory mutagenesis depends on the experimental context, and only identifies genes whose inactivation results in rapid lethality or a high fitness cost under the tested conditions. On the other hand, gene persistence, which measures how widely conserved a gene is among extant species, informs about the importance of a gene in natural environments, with competition, under harsh conditions, and over 3 billion years of natural evolution. Therefore, we also compared gene expression levels with evolutionary gene persistence. To obtain a persistence index (PI) for each C. crescentus gene, we first determined the distribution of orthologs among 236 bacterial species selected to represent an unbiased phylogenetic tree (see Methods). The expression level of each gene was then plotted as a function of its PI (Figure 3C), with PI>150 and PI<50 used as borders to distinguish “persistent” genes that have been retained in most species during evolution (with over 150 orthologs among the 236 selected genomes) from the “less conserved” genes (with fewer than 50 orthologs). We found that poorly expressed genes, as a group, have been poorly conserved during evolution: among the 738 genes with low expression (<5x), 675 (92%) had PI<50, and only 6 had PI>150 (Additional file 5: Table S2). When considering all genes, a chi-square test showed that, as expected, persistent genes overall display higher expression than less conserved genes (p<1e-10). The positive correlation between expression and persistence in very broadly conserved genes (PI>200, Figure 3C) is in good agreement with the toolbox model of bacterial evolution. Interestingly, however, we observed a few highly expressed (>1000x) genes that were present almost equally among both persistent and poorly conserved genes (Figure 3C).
In fact, when we examined only highly expressed (>1000x) genes, there was no longer a correlation between PI values and expression levels (i.e., a t-test comparing expression levels between the PI<50 and PI>150 groups shows no difference). This indicates once again that highly expressed genes tend to behave distinctly from the rest of the genome; they are under different regulatory and evolutionary constraints than most genes.
Identification of 1,586 differentially expressed genes
To identify cell cycle-regulated (CCR) genes, we used the baySeq package. This program took the gene expression values from the biological replicates across the 5 cell cycle time points, and estimated posterior likelihoods of differential expression via an empirical Bayesian method. Through this analysis (see Methods), we identified 1,586 genes (Additional file 6: Table S3) that we will hereafter refer to as CCR genes. We note that a small fraction of our CCR genes are likely to be false positives due to the potential stresses (e.g., cold shock) associated with the cell cycle synchronization technique (see Additional file 7: SI and Additional file 8: Table S9). Most genes whose transcription is induced by the method are expected to display a peak expression in the first time point (i.e., the SW/G1 cell stage) with lower expression in subsequent time point samples. The presence of these method-induced genes does not, however, affect our conclusions because we obtain similar results when the whole set of SW/G1-specific genes is excluded from all the analyses (including those described below, see Additional file 7: SI).
In general, the level of peak gene expression does not appear to influence the baySeq likelihood values (Figure 4C). Among the 1,586 CCR genes that we identified, 84% (1,331) had expression changes of more than 2-fold (Figure 4D). The maximal fold change in cell cycle expression was over 229, and the mean was 8.2. In terms of peak expression, 96% (1,521) of the CCR genes had coverage > 5x (Figure 4D) and are therefore likely above background transcription.
GO term enrichment analysis that compares CCR with non-CCR genes revealed over-representations and under-representations of genes with particular GO terms (Additional file 11: Figure S3, Additional file 12: Table S5). For example, genes associated with flagellar motility, chemotaxis, division and DNA synthesis were enriched among CCR genes, consistent with their known cell cycle regulation. Genes encoding two-component signal transduction proteins (response regulators and histidine kinases) were also significantly enriched among CCR protein-encoding genes, while genes encoding sequence-specific DNA-binding proteins (e.g., transcriptional regulators) were overall under-represented. In addition, this analysis showed that some metabolic pathways (e.g., nitrogen and sulfur compound metabolic processes) were over-represented in terms of cell cycle regulation while others (e.g., respiration) were under-represented.
Five hundred forty-one CCR genes did not have a GO term and, based on PI distributions, these genes were in general less conserved across the phylogenetic tree than the 1,024 CCR genes with GO terms (Figure 5B). However, a subset of them (22) were subject to strong selective pressure with PI > 50 (Figure 5C, KS test, p<1e-3). These conserved genes are interesting candidates for future cell cycle studies.
Cell cycle co-expression network and modules
Since genes with correlated expression profiles can suggest correlations in biological function or regulatory mechanism, we used Weighted Gene Correlation Network Analysis (WGCNA) [31–33] to determine co-expression profiles among the 1,586 identified CCR genes. From this analysis, we were able to cluster the CCR genes into 76 modules. Each module contains genes with similar cell cycle expression profiles, and the overall expression profile of each module can be represented by the first eigenvector of the module. On average, the first eigenvector was able to explain over 85% of the total variance, with even the worst case (the maroon module) still explaining 78% of the total variance (Additional file 13: Figure S4, see Eigen_varExplained.txt file).
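The use of a first eigenvector to summarize a module can be illustrated with a minimal sketch. It is not the WGCNA implementation: it standardizes each gene profile, finds the leading eigenvector over time points by power iteration, and reports the fraction of variance it explains. The function names, the toy profiles and the choice of power iteration are assumptions made for illustration only, and the sketch assumes that no profile is constant.

```python
from math import sqrt

def standardize(profile):
    """Center a profile and scale it to unit sample variance."""
    n = len(profile)
    m = sum(profile) / n
    sd = sqrt(sum((x - m) ** 2 for x in profile) / (n - 1))
    return [(x - m) / sd for x in profile]

def module_eigengene(profiles, iterations=200):
    """Return (eigengene, variance_explained, contributions) for one module.

    profiles: one list of expression values per gene, all of equal length.
    """
    X = [standardize(p) for p in profiles]
    t = len(X[0])
    # Gram matrix over time points: S = X^T X
    S = [[sum(row[i] * row[j] for row in X) for j in range(t)] for i in range(t)]
    # Start from a gene profile; a constant vector would lie in the null
    # space of S because every row of X is centered.
    v = X[0][:]
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(t)) for i in range(t)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * S[i][j] * v[j] for i in range(t) for j in range(t))
    variance_explained = eigenvalue / sum(S[i][i] for i in range(t))
    # Projection of each scaled gene profile onto the eigengene
    contributions = [sum(a * b for a, b in zip(row, v)) for row in X]
    return v, variance_explained, contributions

# Toy module of three genes measured at five time points (made-up values)
profiles = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 2, 3, 4, 6]]
eigengene, var_explained, contributions = module_eigengene(profiles)
```

In this toy module the three profiles are nearly proportional, so the first eigenvector captures almost all of the variance, and the projections give each gene's contribution to the module profile.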
When we examined co-expression modules that cluster with the magenta module because of their similarity in cell cycle expression pattern, we identified genes from pathways tangential to cysteine synthesis. Using this strategy, we were able to identify the metabolic network involved in cysteine, methionine, serine, glycine, glutathione, and SAM synthesis (Figure 7). The entire network is created from 38 genes, expressed from at least 25 transcriptional units (the genes had to be separated by at least 2 kb or had to be transcribed in opposite directions to be considered as distinct transcription units). Of these 38 genes, 31 display differential cell cycle expression, and most are up-regulated at the ST and/or EPD cell time point (Figure 7). Thus, gene and module clustering can be used to infer functional coupling between genes and pathways.
Cell cycle transcriptome analysis from an evolutionary perspective
In terms of gene persistence, CCR genes and non-CCR genes showed no differences (Additional file 17: Figure S7). However, the contribution of each CCR gene in forming a co-expression module was not equal, with the persistent genes (PI≥150) being more prone to be major contributors compared to the rest of CCR genes (KS test, p<1e-5). In other words, CCR genes that are widely conserved across bacterial phyla tend to determine the expression profile of their module, suggesting that evolution plays a role in shaping gene co-expression networks.
Collectively, these phylogenetic analyses suggest that gene conservation and co-expression are highly correlated: broadly conserved genes presumably organized into functional modules in ancestral species and have co-evolved as groups into many extant organisms, whereas narrowly conserved genes tend to be co-expressed together as clade- or species-specific modules. Understanding the mechanism that drives co-expressed genes to co-evolve, or co-evolved genes to be co-expressed, will be of great interest, as it is beyond the influence of operon organization [26, 44].
In this work, we leveraged the CV analysis of biological replicates to refine our expression measurements and to correct systematic biases associated with GC-rich genomes. Using this strategy, we identified three global patterns of gene expression that appear to be under distinct regulatory constraints. By integrating two popular tools, WGCNA and baySeq, we generated a list of CCR genes and identified previously unknown relationships between these CCR genes. Through phylogenetic analysis of expression network modules, we found a correlation between stronger co-expression and broader conservation of genes. By investigating the evolutionary profiles of the modules and their MPD/MNTD coordinates, we found that most (64%) modules with strong tree-level clustering were dominated by widely conserved genes, and that 11 modules with strong tip-level clustering were dominated by clade-specific genes. In total, this accounted for 79% of the 76 modules, which argues that evolutionary profiles are highly related to gene co-expression and that evolution has shaped the cell cycle expression network. This further implies that selective pressure is not on single genes but rather on the relationships between genes (i.e., the biological network), emphasizing the value of including phylogenetic analysis in the study of gene co-expression networks.
Bacterial growth and RNA collection
C. crescentus NA1000 (also known as CB15N) was grown at 30°C in M2G until the exponentially growing culture reached an OD660 of about 0.3. Cell synchronization, which includes a centrifugation in a density gradient of silica (Ludox) at 4°C, was performed as previously described, using 1 L of culture. After synchronization, the purified swarmer cell population was resuspended in pre-warmed M2G medium. A total of 5 synchronies were done to obtain 3 time points such that, in total, we obtained three replicates (50 ml aliquots of cells) at 0, 30, 60, 90 and 120 min following synchronization. Total bacterial RNA was isolated using phenol-chloroform extraction, as described previously. The quality of the extracted RNA was assessed by agarose electrophoresis; rRNA bands appeared intact and no RNA smear was apparent. RNA samples were immediately frozen and stored at -80°C. RNA samples were later enriched for mRNA using the Invitrogen Ribominus Transcriptome Isolation Kit (Yeast and Bacteria) to remove ribosomal RNA per the manufacturer’s protocols except for the use of custom-made nucleic acid probes (Invitrogen) designed against C. crescentus ribosomal sequences. All RNA samples were tested for integrity on a BioRad Experion capillary electrophoresis system. Possible residual DNA was removed by addition of Ambion Turbo DNase.
Library preparation, sequencing and mapping
Fifteen sequencing libraries for Applied Biosystems SOLiD system sequencing were created using the Applied Biosystems Whole Transcriptome Library Preparation for SOLiD Sequencing, and individual samples were barcoded using Applied Biosystems Small RNA Expression Kit (SREK) barcodes (per the manufacturer’s protocols). Transcriptome library preparation was performed for labeling in a strand-specific manner. Samples were run on the Applied Biosystems SOLiD 3 Platform using Shotgun Sequencing (50 base pair reads) and standard sequencing protocols. Each experimental time point was run on an individual flow cell containing the 3 biological replicates with different barcodes. Raw color space data from SOLiD sequencing was mapped to the C. crescentus NA1000 chromosome (NC_011916) (plus and minus strands, separately) using SOCS software with a mismatch cutoff of 5 nucleotides, which discards about half of the reads. We assigned weights of 1, 0.95, 0.9, 0.85, 0.8 and 0.75 to reads with 0 to 5 mismatches, respectively, when summing them together.
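The mismatch weighting described above can be sketched as follows. The weights are those stated in the text; the read representation (start, length, number of mismatches) and the function name are hypothetical and are not taken from SOCS.

```python
# Weights for reads with 0..5 mismatches, as stated in the text.
MISMATCH_WEIGHTS = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75]

def weighted_coverage(reads, genome_length):
    """Sum mismatch-weighted read contributions at each position.

    reads: iterable of (start, length, mismatches) tuples with 0-based
           start positions (a hypothetical input format).
    """
    coverage = [0.0] * genome_length
    for start, length, mismatches in reads:
        if mismatches >= len(MISMATCH_WEIGHTS):
            continue  # beyond the mismatch cutoff of 5, read is discarded
        weight = MISMATCH_WEIGHTS[mismatches]
        for pos in range(start, min(start + length, genome_length)):
            coverage[pos] += weight
    return coverage

# Three toy reads on a 6-nt "genome"; the last one exceeds the cutoff.
reads = [(0, 3, 0), (1, 3, 2), (2, 3, 6)]
coverage = weighted_coverage(reads, 6)
```

Each strand would be processed separately in this scheme, since the reads were mapped to the plus and minus strands independently.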
From the bird’s eye view of raw RNA-Seq mapping (Figure 2A), we observed some spikes, indicating large concentrations of RNA-Seq reads at those locations. From Figure 2D, we also detected tails of highly expressed genes, which did not follow the major power-law distribution of the genome. Furthermore, a chi-square test confirmed that the number of reads mapped to highly expressed genes (>1000x) did not follow the same distribution as those mapped to the bulk of the genome (p<1e-8). Hence, samples with ≤1000x and >1000x were normalized by the sum of each replicate separately.
Quantification of gene expression
We employed a dynamic programming segmentation algorithm from the tilingArray package to divide the CV curve into segments, as shown in Figure 2C. We removed segments with CV >1 before quantifying gene expression. We then calculated the weighted mean coverage in the remaining segments that fell within annotated CDS or RNA coordinates as the gene expression value.
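A minimal sketch of this quantification step is given below. It assumes that the segmentation has already been performed and is supplied as (start, end, CV) tuples, that the weighting is by segment length, and that the CV is the sample standard deviation divided by the mean across replicates; none of these details is specified in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV of replicate values (sample standard deviation over mean)."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")

def gene_expression(coverage, segments, gene_start, gene_end, cv_cutoff=1.0):
    """Length-weighted mean coverage over retained segments inside a gene.

    coverage: per-position coverage (list of numbers)
    segments: (start, end, cv) tuples, half-open [start, end), assumed to
              come from a prior segmentation of the CV curve
    Returns None when no position of the gene survives the CV filter.
    """
    total, n = 0.0, 0
    for seg_start, seg_end, cv in segments:
        if cv > cv_cutoff:
            continue  # segment is too variable between replicates
        lo = max(seg_start, gene_start)
        hi = min(seg_end, gene_end)
        if lo < hi:
            total += sum(coverage[lo:hi])
            n += hi - lo
    return total / n if n else None

# Toy example: the middle segment (CV = 1.5) is excluded.
coverage = [10, 10, 50, 50, 12, 12, 12, 12]
segments = [(0, 2, 0.2), (2, 4, 1.5), (4, 8, 0.3)]
expr = gene_expression(coverage, segments, 1, 7)
```

Here the spike at positions 2 and 3 is ignored, so the expression value reflects only the positions where the replicates agree.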
Gene ontology analysis
GO (gene ontology) annotation was downloaded from EBI UniProt-GOA, which included 2,564 C. crescentus NA1000 genes. We mapped our CCR genes to this dataset and obtained GO terms for 1,024 protein-encoding CCR genes; the distribution of their biological process (level 3) GO terms (Figure 5A) was summarized and drawn with Blast2GO. GO term enrichment analysis was also carried out using Blast2GO, and significant GO terms (Fisher’s exact test p-value < 0.01) are reported in Additional file 18: Table S5. We also provide FDR-corrected p-values for the reader’s reference.
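To illustrate the enrichment test, the following sketch computes a one-sided Fisher's exact p-value for over-representation of a GO term from the hypergeometric distribution. It is not the Blast2GO implementation, the one-sided form is an assumption, and the counts in the example are invented.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation.

    N: annotated genes in the background
    K: background genes carrying the GO term
    n: genes in the test set (e.g., CCR genes with annotation)
    k: test-set genes carrying the GO term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Invented counts: 10 background genes, 4 with the term; 5 test genes, 3 with it.
p = fisher_enrichment_p(k=3, n=5, K=4, N=10)
```

An under-representation test would sum the lower tail instead, and multiple-testing correction (such as the FDR adjustment mentioned above) would be applied across all GO terms tested.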
Identification of cell cycle-regulated genes and construction of the WGCNA co-expression network
The baySeq package was used to identify CCR genes. Based on baySeq’s minimum requirement, we assumed two conditions for each gene, up- or down-regulated. We enumerated all possible combinations of up- and down-regulation across the 5 time points (each with three identical replicates), and included no expression as well as constant expression without changes, as the models to be evaluated by baySeq for each gene. baySeq considered the variance in the three biological replicates when estimating the likelihood, and assigned each gene to the model that best described its cell cycle expression profile. Genes that were assigned to models with differential expression were considered CCR genes. As in our normalization procedure, we ran the baySeq workflow for the highly expressed genes and for the bulk genome separately. To construct the gene co-expression modules, we first followed WGCNA’s data filter suggestion and removed one replicate from each of the SW, ST and EPD time points. We then constructed a signed network with β=36 and a minimum module size of 5 using the WGCNA default Topological Overlap Matrix (TOM). The eigenvector of each module’s expression matrix was used to represent the expression profile of the module, and scaled gene expression profiles were projected onto this eigenvector to calculate contributions from the member genes. Cytoscape was used to draw the network topology of the module.
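The enumeration of two-state models across the five time points can be sketched as below. This is one reading of the description, not the authors' script: it lists every non-constant up/down labeling, shows how the labelings collapse when the two labels are interchangeable, and expands a pattern to per-sample group labels. The model counts it produces are consequences of this reading, not figures reported in the paper, and the function names are hypothetical.

```python
from itertools import product

TIME_POINTS = 5
REPLICATES = 3

def de_patterns(n=TIME_POINTS):
    """All up/down labelings of n time points that are not constant."""
    return [p for p in product((0, 1), repeat=n) if len(set(p)) == 2]

def as_partition(pattern):
    """Canonical form that ignores which of the two groups is called 'up'."""
    return pattern if pattern[0] == 0 else tuple(1 - x for x in pattern)

def sample_groups(pattern, replicates=REPLICATES):
    """Expand a time-point pattern to one group label (1 or 2) per sample."""
    return [label + 1 for label in pattern for _ in range(replicates)]

labeled = de_patterns()                                   # labeled patterns
partitions = sorted({as_partition(p) for p in labeled})   # unlabeled groupings
example = sample_groups((0, 0, 1, 1, 1))
```

Under this reading there are 30 labeled patterns, or 15 groupings if the labels are interchangeable; the constant-expression and no-expression models would be added to either set.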
Phylogenetic signal and evolutionary profiles of co-expression modules
We used 1 or 0 to represent whether or not a CCR gene is conserved in a species. For each module, we summed the conservation values of all member genes in each of the 236 species to obtain a distribution profile across the selected species. This distribution profile was then treated as the trait data, and the K-statistic and the associated p-value were calculated according to the algorithm of Blomberg et al. MPD and MNTD values were calculated based on the same species-distribution profiles for each module, and the null model used in the calculation was generated by randomizing the species distribution of each module 9,999 times, while maintaining the phylogenetic relationships.
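The MPD and MNTD calculations with a label-shuffling null model can be sketched as follows. For simplicity the sketch reduces a module's profile to presence or absence per species, whereas the profiles described above are summed counts; it also takes the phylogenetic distances as a precomputed matrix. The function names, the toy distances and the number of randomizations are assumptions, and the K-statistic is not covered.

```python
import random
from statistics import mean, stdev

def mpd(dist, present):
    """Mean pairwise phylogenetic distance among the species in `present`."""
    return mean(dist[i][j] for a, i in enumerate(present) for j in present[a + 1:])

def mntd(dist, present):
    """Mean distance from each present species to its nearest present relative."""
    return mean(min(dist[i][j] for j in present if j != i) for i in present)

def standardized_effect(metric, dist, profile, runs=999, seed=0):
    """Observed metric and its standardized effect size against a null model
    that shuffles the profile across species while keeping the tree fixed."""
    rng = random.Random(seed)
    observed = metric(dist, [i for i, x in enumerate(profile) if x])
    shuffled = list(profile)
    null = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        null.append(metric(dist, [i for i, x in enumerate(shuffled) if x]))
    sd = stdev(null)
    ses = (observed - mean(null)) / sd if sd else 0.0
    return observed, ses

# Toy tree: species 0-2 form one clade, 3-5 another (distance 1 within, 4 between).
dist = [[0 if i == j else (1 if i // 3 == j // 3 else 4) for j in range(6)]
        for i in range(6)]
profile = [1, 1, 1, 0, 0, 0]  # module genes found only in the first clade
obs_mpd, ses_mpd = standardized_effect(mpd, dist, profile)
obs_mntd, ses_mntd = standardized_effect(mntd, dist, profile)
```

Negative standardized effect sizes indicate phylogenetic clustering: a strongly negative MPD value corresponds to clustering at the level of the whole tree, and a strongly negative MNTD value to clustering near the tips.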
Orthology and gene persistence
The large 16S rRNA phylogenetic tree from Greengenes, which covers over 800,000 bacterial species, was first cut into about 300 evenly speciated clades. We selected all fully sequenced bacterial genomes with > 1.5 M from EMBL, and mapped them into 236 Greengenes clades. From each clade, we randomly selected one species as representative (Additional file 20: Table S8). The persistence index (PI) of a C. crescentus gene was defined as the number of orthologs found in the 236 selected species. Orthology was assigned by bi-directional best hits with protein sequence similarity over 40% and protein length difference under 20% [26, 51]. In addition to obtaining the PI value for each C. crescentus gene, we used a set of less stringent criteria to identify all proteins (referred to as homologs) with over 40% similarity and less than 50% length difference. The results are documented in Additional file 15: S1 and Additional file 21: Table S10.
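The orthology filter and the persistence index can be illustrated with a short sketch. The thresholds are those given in the text; measuring the length difference relative to the longer protein, the "intermediate" label for genes between the two PI borders, and all function names are assumptions.

```python
N_SPECIES = 236  # number of representative species, as stated in the text

def passes_orthology_filter(similarity_pct, len_query, len_hit,
                            min_similarity=40.0, max_len_diff=0.20):
    """Check a bi-directional best hit against the similarity and length criteria.

    The length difference is taken relative to the longer protein (assumption).
    """
    len_diff = abs(len_query - len_hit) / max(len_query, len_hit)
    return similarity_pct > min_similarity and len_diff < max_len_diff

def persistence_index(ortholog_presence):
    """Number of selected species in which an ortholog was found."""
    return sum(1 for present in ortholog_presence if present)

def conservation_class(pi, high=150, low=50):
    """Classify a gene using the PI borders described in the Results."""
    if pi > high:
        return "persistent"
    if pi < low:
        return "less conserved"
    return "intermediate"  # label assumed; the text names only the two extremes

# Toy presence vector: orthologs found in 160 of the 236 species.
presence = [1] * 160 + [0] * (N_SPECIES - 160)
pi = persistence_index(presence)
label = conservation_class(pi)
```

Relaxing `max_len_diff` to 0.50 would correspond to the less stringent homolog criteria mentioned above.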
Availability of supporting data
The data set supporting the results of this article is available in the NCBI Gene Expression Omnibus (GEO) repository, with access number GSE46915 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46915).
CV: Coefficient of variation
MPD: Mean pairwise distance
MNTD: Mean nearest taxon distance.
We thank Brian Ondov and Anjana Varadarajan for advice on the RNA mapping method. We are also grateful to Life Technologies for assistance with SOLiD sequencing. We thank Professor Antoine Danchin and the Jacobs-Wagner lab members for valuable input and for critical reading of the manuscript. This work was in part funded by the National Institutes of Health (GM065835 to C. J.-W.). C. J.-W. is an investigator of the Howard Hughes Medical Institute.
1. Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011, 12 (2): 87-98. 10.1038/nrg2934.
2. Croucher NJ, Thomson NR: Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol. 2010, 13 (5): 619-624. 10.1016/j.mib.2010.09.009.
3. Evinger M, Agabian N: Envelope-associated nucleoid from Caulobacter crescentus stalked and swarmer cells. J Bacteriol. 1977, 132 (1): 294-301.
4. McAdams HH, Shapiro L: System-level design of bacterial cell cycle control. FEBS Lett. 2009, 583 (24): 3984-3991. 10.1016/j.febslet.2009.09.030.
5. Kirkpatrick CL, Viollier PH: Decoding Caulobacter development. FEMS Microbiol Rev. 2012, 36 (1): 193-205. 10.1111/j.1574-6976.2011.00309.x.
6. Curtis PD, Brun YV: Getting in the loop: regulation of development in Caulobacter crescentus. Microbiol Mol Biol Rev. 2010, 74 (1): 13-41. 10.1128/MMBR.00040-09.
7. Schofield WB, Lim HC, Jacobs-Wagner C: Cell cycle coordination and regulation of bacterial chromosome segregation dynamics by polarly localized proteins. EMBO J. 2010, 29 (18): 3068-3081. 10.1038/emboj.2010.207.
8. Marks ME, Castro-Rojas CM, Teiling C, Du L, Kapatral V, Walunas TL, Crosson S: The genetic basis of laboratory adaptation in Caulobacter crescentus. J Bacteriol. 2010, 192 (14): 3678-3688. 10.1128/JB.00255-10.
9. Nierman WC, Feldblyum TV, Laub MT, Paulsen IT, Nelson KE, Eisen JA, Heidelberg JF, Alley MR, Ohta N, Maddock JR, et al: Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci USA. 2001, 98 (7): 4136-4141. 10.1073/pnas.061029298.
10. Laub MT, McAdams HH, Feldblyum T, Fraser CM, Shapiro L: Global analysis of the genetic network controlling a bacterial cell cycle. Science. 2000, 290 (5499): 2144-2148. 10.1126/science.290.5499.2144.
11. McGrath PT, Lee H, Zhang L, Iniesta AA, Hottes AK, Tan MH, Hillson NJ, Hu P, Shapiro L, McAdams HH: High-throughput identification of transcription start sites, conserved promoter motifs and predicted regulons. Nat Biotechnol. 2007, 25 (5): 584-592. 10.1038/nbt1294.
12. Hottes AK, Shapiro L, McAdams HH: DnaA coordinates replication initiation and cell cycle transcription in Caulobacter crescentus. Mol Microbiol. 2005, 58 (5): 1340-1353. 10.1111/j.1365-2958.2005.04912.x.
13. Vohradsky J, Janda I, Grunenfelder B, Berndt P, Roder D, Langen H, Weiser J, Jenal U: Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server. Proteomics. 2003, 3 (10): 1874-1882. 10.1002/pmic.200300559.
14. Christen B, Abeliuk E, Collier JM, Kalogeraki VS, Passarelli B, Coller JA, Fero MJ, McAdams HH, Shapiro L: The essential genome of a bacterium. Mol Syst Biol. 2011, 7: 528.
15. Landt SG, Abeliuk E, McGrath PT, Lesley JA, McAdams HH, Shapiro L: Small non-coding RNAs in Caulobacter crescentus. Mol Microbiol. 2008, 68 (3): 600-614. 10.1111/j.1365-2958.2008.06172.x.
16. Li S, Brazhnik P, Sobral B, Tyson JJ: Temporal controls of the asymmetric cell division cycle in Caulobacter crescentus. PLoS Comput Biol. 2009, 5 (8): e1000463. 10.1371/journal.pcbi.1000463.
17. Kim H, Lee JK, Park T: Inference of large-scale gene regulatory networks using regression-based network approach. J Bioinform Comput Biol. 2009, 7 (4): 717-735. 10.1142/S0219720009004278.
18. Holtzendorff J, Hung D, Brende P, Reisenauer A, Viollier PH, McAdams HH, Shapiro L: Oscillating global regulators control the genetic circuit driving a bacterial cell cycle. Science. 2004, 304 (5673): 983-987. 10.1126/science.1095191.
19. Reisenauer A, Shapiro L: DNA methylation affects the cell cycle transcription of the CtrA global regulator in Caulobacter. EMBO J. 2002, 21 (18): 4969-4977. 10.1093/emboj/cdf490.
20. Collier J, McAdams HH, Shapiro L: A DNA methylation ratchet governs progression through a bacterial cell cycle. Proc Natl Acad Sci USA. 2007, 104 (43): 17111-17116. 10.1073/pnas.0708112104.
21. Eisen JA, Fraser CM: Phylogenomics: intersection of evolution and genomics. Science. 2003, 300 (5626): 1706-1707. 10.1126/science.1086292.
22. Clauset A, Shalizi CR, Newman MEJ: Power-Law Distributions in Empirical Data. SIAM Rev. 2009, 51 (4): 661-703. 10.1137/070710111.
23. Ueda HR, Hayashi S, Matsuyama S, Yomo T, Hashimoto S, Kay SA, Hogenesch JB, Iino M: Universality and flexibility in gene expression from bacteria to human. Proc Natl Acad Sci USA. 2004, 101 (11): 3765-3769. 10.1073/pnas.0306244101.
24. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol. 2009, 191 (10): 3203-3211. 10.1128/JB.00122-09.
25. Yoder-Himes DR, Chain PS, Zhu Y, Wurtzel O, Rubin EM, Tiedje JM, Sorek R: Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci USA. 2009, 106 (10): 3976-3981. 10.1073/pnas.0813403106.
26. Fang G, Rocha EP, Danchin A: Persistence drives gene clustering in bacterial genomes. BMC Genomics. 2008, 9: 4. 10.1186/1471-2164-9-4.
27. Fang G, Rocha E, Danchin A: How essential are nonessential genes? Mol Biol Evol. 2005, 22 (11): 2147-2156. 10.1093/molbev/msi211.
28. Maslov S, Krishna S, Pang TY, Sneppen K: Toolbox model of evolution of prokaryotic metabolic networks and their regulation. Proc Natl Acad Sci USA. 2009, 106 (24): 9743-9748. 10.1073/pnas.0903206106.
29. Hardcastle T, Kelly K: baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinforma. 2010, 11 (1): 422. 10.1186/1471-2105-11-422.
30. Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O'Donovan C, Martin MJ, Bely B, Browne P, Mun Chan W, Eberhardt R, et al: The UniProt-GO Annotation database in 2011. Nucleic Acids Res. 2012, 40 (Database issue): D565-570.
31. Oldham MC, Horvath S, Geschwind DH: Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA. 2006, 103 (47): 17973-17978. 10.1073/pnas.0605938103.
32. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH: Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature. 2011, 474 (7351): 380-384. 10.1038/nature10110.
33. Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 2008, 9: 559. 10.1186/1471-2105-9-559.
34. Sekowska A, Kung HF, Danchin A: Sulfur metabolism in Escherichia coli and related bacteria: facts and fiction. J Mol Microbiol Biotechnol. 2000, 2 (2): 145-177.
35. Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pellegrini-Toole A, Bonavides C, Gama-Castro S: The EcoCyc Database. Nucleic Acids Res. 2002, 30 (1): 56-58. 10.1093/nar/30.1.56.
36. Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302 (5643): 249-255. 10.1126/science.1087447.
37. Bergmann S, Ihmels J, Barkai N: Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004, 2 (1): E9. 10.1371/journal.pbio.0020009.
38. Mezey JG, Nuzhdin SV, Ye F, Jones CD: Coordinated evolution of co-expressed gene clusters in the Drosophila transcriptome. BMC Evol Biol. 2008, 8: 2. 10.1186/1471-2148-8-2.
39. Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999, 96 (6): 2896-2901. 10.1073/pnas.96.6.2896.
40. Hurst LD, Williams EJ, Pal C: Natural selection promotes the conservation of linkage of co-expressed genes. Trends Genet. 2002, 18 (12): 604-606. 10.1016/S0168-9525(02)02813-5.
41. Blomberg SP, Garland T, Ives AR: Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003, 57 (4): 717-745.
42. Kembel SW, Cowan PD, Helmus MR, Cornwell WK, Morlon H, Ackerly DD, Blomberg SP, Webb CO: Picante: R tools for integrating phylogenies and ecology. Bioinformatics. 2010, 26 (11): 1463-1464. 10.1093/bioinformatics/btq166.
43. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ: Phylogenies and community ecology. Annu Rev Ecol Syst. 2002, 33: 475-505. 10.1146/annurev.ecolsys.33.010802.150448.
44. Pal C, Hurst LD: Evidence against the selfish operon theory. Trends Genet. 2004, 20 (6): 232-234. 10.1016/j.tig.2004.04.001.
45. Bergman NH, Anderson EC, Swenson EE, Niemeyer MM, Miyoshi AD, Hanna PC: Transcriptional profiling of the Bacillus anthracis life cycle in vitro and an implied model for regulation of spore formation. J Bacteriol. 2006, 188 (17): 6092-6100. 10.1128/JB.00723-06.
46. Ondov BD, Varadarajan A, Passalacqua KD, Bergman NH: Efficient mapping of Applied Biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics. 2008, 24 (23): 2776-2777. 10.1093/bioinformatics/btn512.
47. Huber W, Toedling J, Steinmetz LM: Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics. 2006, 22 (16): 1963-1970. 10.1093/bioinformatics/btl289.
48. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
49. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, et al: Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007, 2 (10): 2366-2382. 10.1038/nprot.2007.324.
50. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006, 72 (7): 5069-5072. 10.1128/AEM.03006-05.
51. Fang G, Bhardwaj N, Robilotto R, Gerstein MB: Getting started in gene orthology and functional analysis. PLoS Comput Biol. 2010, 6 (3): e1000703. 10.1371/journal.pcbi.1000703.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.