Genome-wide gene expression and RNA half-life measurements allow predictions of regulation and metabolic behavior in Methanosarcina acetivorans

Background While a few studies on the variations in mRNA expression and half-lives measured under different growth conditions have been used to predict patterns of regulation in bacterial organisms, the extent to which this information can also play a role in defining metabolic phenotypes has yet to be examined systematically. Here we present the first comprehensive study for a model methanogen. Results We use expression and half-life data for the methanogen Methanosarcina acetivorans growing on fast- and slow-growth substrates to examine the regulation of its genes. Unlike Escherichia coli where only small shifts in half-lives were observed, we found that most mRNA have significantly longer half-lives for slow growth on acetate compared to fast growth on methanol or trimethylamine. Interestingly, half-life shifts are not uniform across functional classes of enzymes, suggesting the existence of a selective stabilization mechanism for mRNAs. Using the transcriptomics data we determined whether transcription or degradation rate controls the change in transcript abundance. Degradation was found to control abundance for about half of the metabolic genes underscoring its role in regulating metabolism. Genes involved in half of the metabolic reactions were found to be differentially expressed among the substrates suggesting the existence of drastically different metabolic phenotypes that extend beyond just the methanogenesis pathways. By integrating expression data with an updated metabolic model of the organism (iST807) significant differences in pathway flux and production of metabolites were predicted for the three growth substrates. Conclusions This study provides the first global picture of differential expression and half-lives for a class II methanogen, as well as provides the first evidence in a single organism that drastic genome-wide shifts in RNA half-lives can be modulated by growth substrate. We determined which genes in each metabolic pathway control the flux and classified them as regulated by transcription (e.g. transcription factor) or degradation (e.g. post-transcriptional modification). We found that more than half of genes in metabolism were controlled by degradation. Our results suggest that M. acetivorans employs extensive post-transcriptional regulation to optimize key metabolic steps, and more generally that degradation could play a much greater role in optimizing an organism’s metabolism than previously thought. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3219-8) contains supplementary material, which is available to authorized users.

A listing of all RNA-seq datasets used in the study. The first column designates the name of the sample file, followed by the growth condition, and the GEO database accession number for the sample.
edgeR -The library preparation was defined as the first experimental factor followed by the growth condition. Normalization factors were computed and a generalized linear model (GLM) was estimated. The dispersion trend over multiple genes was then calculated followed by the per gene (tag-wise) dispersion. Finally, the GLM model was fit, and the adjusted p-value (Benjamini-Hochberg method) was computed and the data stored.
PoissonSeq -Total mapped reads for each gene were scaled by a factor 0.1 so that the PoissonSeq method did not overflow. The differential expression calling routine was run with the "pair parameter" set to false and the data type taken to be "two-class" (substrate and library preparation kit). A total of 100,000 permutations were performed per comparison. Data was sorted by adjusted p-value (using the PoissonSeq default method: permutation plug-in) and stored to file.

Uncertainty in Differentially Expressed Genes
A nonparametric bootstraping approach was used to estimate the uncertainty in the number of differentially expressed genes. Briefly, the DESeq2 workflow was applied to subsets of the all the RNAseq data sets and the counts of DEG were enumerated. All combinations of sets of RNAseq data ranging from two to the maximum number of replicates were generated for each growth condition (MeOH, TMA and Acetate). The Cartesian product of these sets were generated, and the DESeq2 workflow was used to estimate the total number of differentially expressed genes between the three conditions. For a given total count of datasets (N M eOH +N T M A +N Acetate ), the average and standard deviation in the number of genes called as differentially expressed with confidence p≤0.01 were computed. Mathematically, ∀c ∈ {MeOH, TMA, Acetate} (S1) ∀i ∈ {2, ..., N c } (S2) where c is the condition, N c is the count of datasets measured in that condition, S c is the set of all combinations containing 2 to N c datasets and P is the product of all unique datasets. This accounted for about 26,000 sets of differential expression combinations. At each N ∈ 2, N c the coefficient of variation in number of DEG was computed (Fig. S2). This analysis demonstrates that the CV decreases as additional datasets are included.
The CV is a measure of the uncertainty in the set of DEG; we estimate an uncertainty of 24-30% when using all 16 datasest.

Half-Life Data
To assess reproducibility of the experimental procedure, correlations were computed across timepoints. The profiles were averaged across all three replicates before correlations were computed. As can be seen from the correlation matrix in Fig. S3a, MeOH, TMA, and acetate correlate highly within condition and cluster together.
Limited data for half-lives of genes in Methanosarcina currently exist. Only half-lives for 5 genes have been reported from the related organism Methanosarcina mazei zm-15 [67] where they measured the half-lives of the methyltransferase genes (mtaA1, mtaCB1) from methanol grown cultures and acetoclastic genes (pta, ack) from acetate grown cultures. As can be seen in Figure ??A, the half-lives we measured agree quite well and within one order of magnitude in the worst case. These differences might be due to uncertainty in the measurements, differences inherent to the different organisms, and the fact that Cao et al. measured half-lives at 30 • C-and showed that there are different temperature stabilities for transcripts-while we measured half-lives from cells growing at 37 • C. In general, they agree quite well and provide confidence in our measurements. The comparison also highlights an interesting fact, that the different transcripts have different stabilities after being grown in different conditions. This supports our hypothesis that half-lives are differentially stabilized/destabilized in different conditions and wy the control coefficients are important to compute, likely by post-transcriptional modification as Cao et al. concluded [67] or via small RNA regulation.
A scaled value for each of the half-lives was computed for each condition. The scaled half-life value is computed as HLs = HL/(DT × 60) where the HL is the unscaled half-life and DT is the doubling time in that condition in units of hr. Growth rates used for scaling of data for were taken as the average of experimentally determined values reported in literature. For MeOH, TMA and acetate, growth rates used were 7.5hr [95,96,43], 8.9hr [43] and 24.6hr [97,95,43], respectively. This scaled half-life represents the fraction of the cell cycle that a transcript will remain stable. As Fig. S4 demonstrates, regardless of the growth substrate, on average the RNA molecules persist for about the same fraction of the cell cycle. The scaled half-lives are statistically the same across all conditions (p>0.33, t-test) with an average value of 12.7%±3.5% of the cell cycle. Because growth rate is linearly proportional to ATP production rate [48], and it is generally assumed that growth rate and growth yield are co-optimized in prokaryotic organisms [98], we can hypothesize two scenarios that cause this constant fraction of the cell cycle. First, the cell is optimized to use as little ATP as possible while maintaining a level capable of allowing the translation of new proteins at the correct rate and thus is linearly proportional to the ATP production rate. Since RNA turnover is a trade-off between degradation rate and production rate, the latter of which should be directly proportional to the energy required to ligate nucleotides into new RNA molecules, their steady-state values should be proportional to ATP production. An alternative but related second scenario is also possible; namely, that RNA maintains a constant fraction of the total cellular weight regardless of the growth rate. In this scenario, it is not the ATP consumption requirement in RNA production, but instead the production and degradation kinetics that set the steady-state amount of RNA in a cell. Since cell mass is proportional to ATP production rate and growth rate, the steady-state RNA is indirectly related to ATP production rate through the maintenance of constant mass fraction. One might argue that because the ATP cost of RNA production is such a low fraction of total energy expenditure, the latter of these two explanations is more likely.

Differentially Expressed Genes
The coefficient of variation (CV) computed using our uncertainty estimation method (Supplemental Section Uncertainty in Differentially Expressed Genes) decreases as the number of RNAseq dataset replicates increase (Fig. S2). The falling CV is consistent with prior studies that showed for DESeq (the precursor to DESeq2) as well as other similar methods such as edgeR and PoissonSeq, the sensitivity rate (fraction of true positives) increases with number of replicates [99,100]. At 15 datasets, the CV is about 30% of the mean.
Linearly extrapolating the trends to 16 datasets results in a CV of 24% of the mean. Therefore, we estimate that the uncertainty in the number of differentially expressed genes is between 24 and 30% of the total number.
Among the differentially expressed genes, four putative regulatory proteins stood out: MA0866, MA1395, MA2212 and MA4346. We compared the expression profiles across the three substrates to DEG that had nearly the same pattern of conservation (±2 genes) and these genes were found to be highly correlated/anticorrelated to the regulators (Fig. S14). Analysis of these similarly conserved genes leads to interesting predictions that the regulators could be either directly regulating the group of genes, or is coregulated with them by another transcription factor.
The first highly conserved regulator MA0866 encodes a PhoU type protein that likely plays a role in phosphate uptake. As expected, its expression is highly correlated with a phosphate related genes including a phosphate transporter subunit (pstS, MA0889), nicotinate phosphoribosyltransferase (pncB, MA2533) as well as TCA cycle enzymes citrate synthase (MA0249) and malate dehydrogenase (mdh, MA0819). Additionally, it is anticorrelated to the gene responsible for the final step of lysine synthesis (lysA, MA0762). These results suggest the gene could play a role in maintaining phosphate and energy balance in the cell, if it were to regulate these enzymes.
MA2212 is a TrmB-like regulator. It is notable because it is correlated highly with acetotrophic genes ack (MA3606), pta (MA3607), and cam (MA2536) as well as subunits of the ATP synthase (MA2433/MA2435/MA2440) The final regulator with high correlation to genes with similar conservation MA4346 was specific to the family Methanosarcina but most genes that were similarly had nonspecific or no annotated function.

Estimating mRNA Levels
An estimate of the average copy number of each mRNA in an average cell, N i for each of the three growth substrates were computed using the following equation, subject to the constraint where a i is the fraction of total mRNA mass m RNA that the transcripts from a single gene accounts for, which is taken to be linearly proportional to the RPKM values from the RNAseq data; ρ cell is the density of an E.
coli cell taken from the CyberCell Database [101]; V cell is the volume of the cell computed from our previous characterization of cell dimensions [85]; x RNA is the mass fraction of total cell mass that is RNA, which is taken from the metabolic model (24% of total cell dry mass) [48]; N is the total number of genes considered in the analysis; and m i is the molar mass of the transcript of interest. Since the volume of cells grown in TMA were not measured, but the growth rates are similar to those for cells grown in MeOH, the volume were assumed to be the same. Total RNA for a single "average" cell was estimated to be 23.8 and 10.6 fg from MeOH/TMA and acetate grown cells, respectively. The counts of each RNA estimated using this analysis for the three substrates can be found in Supplementary File "EstimatedRNACounts.xlsx".
As a quality check, the numbers computed through this analysis were compared to those that were reported in Cao et al [67] measured for M. mazei growing in MeOH and acetate using RT-qPCR as shown in Fig. S13.
In Cao et al [67], the values were originally reported per 100,000 16s-rRNA transcripts. We rescaled these 6 numbers to be proportional to 14,000 rRNA transcripts, the average number of ribosomal protein Rpl18p count we measured previously [85] using a single-molecule pulldown [102]. As can be seen in the figure, all transcript counts except mtaA2, mtaB2, and cdhB are statistically indistinguishable, suggesting that estimates for mRNA numbers are good. The minor disagreement for the three transcripts could be due to the difference in the two species.

Control Coefficients
The confounding effects of changes in transcription and degradation rates on average mRNA level that occur with changes in growth rate can be deconvoluted. We attempt to estimate the effect of each by using a recently reported method that computes the extent that transcription and degradation have in setting the steady-state level of mRNA in a cell [7,8,9]. The analysis is based on the assumption that the cell is at steady-state, implying the transcription rate and the degradation rate are balanced, or where k trn in the transcription rate, γ is the mRNA's degradation rate constant as computed from RNAseq data, µ is cell growth rate and M is the average copy number of a transcript. The transcription rate-which is a proxy for change in growth rate (ribosomal count, etc.) and changes in promotion or repression of a gene-and degradation rate-changes due to active or passive degradation by RNAses or post-transcriptional control by sRNA-are computed per mRNA. Writing down the total differential and manipulating, the contribution of each process can be estimated as Rearranging and noticing that at steady state, M = k trn /γ yields which can then be written as a relation between relative changes in transcript count due to changes in each rate, where here we use | · | to denote that this value is held constant in this term. Two cases for this equation can be considered: 1) the degradation of mRNA due to dilution is negligible, γ >> µ, and 2) degradation due to dilution cannot be neglected. In the former, the contributions are due to transcription ρ T and degradation ρ D , In the latter, the contributions are due to transcription ρ T , degradation ρ D and growth (dilution) ρ G , or The majority of mRNA satisfy γ >> µ. Therefore, we proceed neglecting ρ G . Dressaire et al. applied Eqn. S11 to L. lactis growing in chemostats at different rates and found that only a few percent of genes are degradationally controlled [8]. Esquerré et al. applied the same analysis to E. coli growing in chemostats at several different rates [7]. Both studies ignored the dilution effects, citing the small average half-lives relative to the doubling times studied (1-8%) similar to our average 12.7%. In contrast to these studies, we found a significantly higher number of genes that appear to be degradationally controlled (16-28%). This percentage was even higher when considering only genes associated with metabolic reactions (48-60%). Control coefficients calculated between each of the three growth substrates can be found mapped onto the metabolic network in Figs. S19 and S20.

Modifications to metabolic model
COBRAPy [103] was used to handle the flux balance computations and all changes to the metabolic model.
The M. acetivorans model (iMB745) [48] required additional improvements in order to accurately predict the metabolic behavior when grown in the standard high-salt medium [63] used for the RNA seq experiments. All components of this medium that could be taken up by the metabolic model are listed below and were turned on with a default lower bound of -1000 mmol gDW ·h . In addition to refining the methanofuran biosynthesis pathway, the alternate aminoacylation pathway for cysteine and the pyrrolysine biosynthesis pathways-two evolutionarily significant pathways-were added to the  (Fig. S9). It is important to note that, to our knowledge, the actual sulfur source in the SepCysS reaction remains unknown. However, it has been shown that sodium sulfide provides the highest activity in vitro [104]. We decided to use hydrogen sulfide as the sulfur source because it is a sulfide produced within the organism. The parsimonius FBA solution of the model with this modification interestingly showed that it only uses the canonical charging pathway even if both pathways are turned on and will only use the alternate pathway if the canonical pathway is knocked out. The RPKM values from the RNAseq data, however, suggests that SepRS is expressed at least twice as much as CysRS on average 9 across MeOH, Acetate, and TMA (Table S10). In order to constrain the pathways to reflect this, flux variability analysis was first used to determine the allowable flux ranges for SepRS and CysRS. The maximum allowable flux through CysRS was then set to equal half that of SepRS. In iMB745, allowing any cysteine uptake led to an overproduction of ATP and resulted in unrealistically high growth rates. To reflect media component consumption accurately in addition to the SepRS/CysRS constraints, iST807 constrains the cysteine uptake at a maximum rate of 0.35 mmol gDW h which is the uptake rate that minimized the differences between simulated and experimental growth rate on methanol, acetate, and carbon monoxide. Dynamic flux balance analysis was also performed to verify that this cysteine is not growth-limiting over a period of about 38 hrs as consistent with experiment [39]. These constraints successfully forced flux through both cysteine aminoacylation pathways.
Pyrrolysine Biosynthesis Pathway MMA methyltransferase is responsible for activating MMA for a methyl transfer to a cognate corrinoid protetin. In 2002, the crystal structure [105] of this enzyme from M. barkeri revealed the presence of pyrrolysine (pyl) within the catalytic site. Later studies by Mahapatra, et al. [106] on M. acetivorans showed that this methanogen could not grow on any methylamine substrates (MMA, DMA, and TMA) without the gene for pyl-tRNA, demonstrating that pyl is required for growth on methylamine substrates. The pathways involving pyl synthesis in iMB745 were present but flawed. First, the pyl-tRNA charging pathway mistakenly used the alanyl-tRNA instead of the known pyl-tRNA. Second, the hypothesized pyl biosynthesis pathways were outdated and no fluxes ran through them even when successfully simulating the model on methylamine substrates. The model essentially allowed for growth on methylamine substrates without the synthesis of pyl anywhere. These two errors were fixed by replacing the alanyl-tRNA with pyl-tRNA in the pyl aminoacylation pathway and replacing the pyl biosynthesis pathway with the most recent and accepted pathway from Gaston, et al. [107] shown in Fig. S8.
In order to force the model to recognize that pyl is required for growth on methylamine substrates, the biomass reaction was modified to reflect amino acid use more accurately. The first modification altered the biomass reaction to draw in aminoacylated tRNAs instead of free amino acids, keeping the coefficients the same.
This change was based on the realization that it is amino acids charged onto their tRNAs that eventually become part of the cell biomass through protein synthesis rather than any free amino acid produced in the cell. The second modification adds a pyl-tRNA term to the biomass when simulating growth on methylamine substrates.
For non-methylamine growth, the model sets the coefficient for pyl-tRNA to zero. For methylamine growth, the model turns on the coefficient. This coefficient was estimated from the approximate number of CTA codons in the M. acetivorans genome. CTA is canonically a stop codon but regulatory mechanisms exist within M.
acetivorans to express this as the pyl amino acid. Since the fraction of CTA codons actually coding for pyl is unknown in this methanogen, it was taken to be 50% as an upper limit estimate. This gives a pyl-tRNA biomass coefficient of 0.081 mmol pyl gDW protein . on methanol in exponential growth and stationary phase [62]. The glycogen content was significantly higher than assumed in the iMB745 model. As such, we have added/updated the biomass coefficients for glycogen and these intermediates based on these new quantitative measurements. The high glycogen content (∼ 0.93651mmol/gDCW), consumes significant energy of the cell during growth; therefore, the ATP maintenance cost had to be lowered to match growth experiments. A final value of 44.1 mmolATP/gDCW. Comparing this value to the previous value of 65.0 mmolATP/gDCW indicates that nearly 33% of energy derived by the methanogen is used in storing glycogen. This could confer evolutionary advantage when nutrients are scarce.

Modelling Alternate Biomasses for Different Growth Substrates
Biomass for growth on acetate and TMA were fit taking MeOH to be associated with the published biomass coefficients. Additionally, acetate was fit using TMA as the starting biomass coefficients. Fit biomass coefficients can be seen in Figure S15. Flux comparisons for MeOH vs Acetate and MeOH vs TMA can be seen in where it is demonstrated that significant changes to fluxes in amino acid and cofactor biosynthetic pathways are predicted. Many coefficients can vary significantly, as indicated by the large standard deviations and it is unclear as to whether physiologically requirements differ. However, a handful of biomass components were statistically different (p<0.01, t-test, n=96) in the second condition compared with the first, possibly suggesting a different physiological requirement (a greater or smaller fraction of total biomass when growing in one media compared to another). When comparing methylotrophic to acetotrophic growth, our fitting procedure suggests that nickel, cobalt and AMP requirements decreases, while cellular zinc and potassium increase. The cobalamin cofactors in methyltransferases contain cobalt and the downregulation of these enzymes under acetotrophic growth is consistent with a decrease in requirement for cobalt [39]. The decreased requirement for nickel is counterintuitive as methyl-coenzyme M reductase and carbon monoxide dehydrogenase are both nickel containing and upregulated on acetate growth. Decrease in AMP requirement could be due to slower growth relating to the phosphate balance.
The average coenzyme M (CoM) increases by a factor of 10 going from MeOH to acetate, similar to results from a recent paper that showed CoM, and sulfide content in general, increases roughly by a factor of 2.8-3x for acetate grown cells [82]. Interestingly, glycogen galactan and polyglucuronate are predicted to be produced at higher levels, along with phosphorylated myo-inositol phospholipids, which are possibly used in producing extracellular matrices that are common in cell aggregates for acetate grown M. acetivorans [97].
The fitting procedure suggests a decreased biomass requirement for adenosylcobalamin and coenzyme B (CoB) when grown in acetate as compared to TMA. The former can be explained by the fact that methyltransferase which are composed of at least one corrinoid cofactor are severely downregulated when grown on acetate (and generally not of use to acetotrophic methanogenesis). Most cofactors were required in higher levels for growth on TMA than on MeOH, including coenzyme A, both the adenosinyl-and guanosinyl-coenzyme F390 analogs, succinyl-CoA and tetrahydrofolate. The reason for these increases is unclear.
The modeling indicated that several variants of coenzyme F420 was generally lower while grown on MeOH than on acetate or TMA, however the reason for this is unclear. Reports in literature on various M. barkeri and M. mazei strains are conflicting; one study indicating that F420 concentrations are significantly higher comparing MeOH to acetate grown cells [108], while other studies found a higher level for acetate grown cells [109,110,111]. Interestingly, one report found that different variants of the coenzyme F420 predominated in different Methanosarcina species. Further experiment and modeling is required to uncover the role of the various analogs and their regulation.   and TMA (red) growth. Percentiles were computed using the weighting method of Edgeworth [112]. The overall range (whiskers) of the distributions are generally the same across classes, however the quartiles and median can be significantly different, supporting the conclusions in the main text that mRNAs are selectively stabilized/destabilized depending on function.      [97,41,95,113,43,57,114,115,96,27,74]. B) Growth yields (gDW/mol substrate) [97,116,117]. Note that experimental TMA growth yield was computed from TMA growth rate and a fitted TMA uptake rate as there were no experimental uptake values available. C) CH 4 production rate (mmol/hr/gDW) [43,118,117,119]  This map is available in format in formats compatible with Cytoscape [64] and Escher [65] in the Supporting File "ModelAndMaps.zip". Figure. S14 Heatmaps of relative expression for 4 putative regulators that are differentially expressed between at least one pair of conditions. Each regulator is highly conserved among the Methanosarinales. MA1395 is highly conserved among most methanogens and encodes for a nickel response regulator. The regulator is indicated in the title of each heatmap along with the correlation of the regulator's expression to the other genes that have the same conservation pattern.  Figure. S17 Conservation of the genes that are differentially expressed between MeOH and TMA across the tree of methanogens. Each vertical bar indicates that a homolog for the differentially expressed gene exists in the indicated species (computed as the bidirectional best hits functionality in the ITEP software [92] with an E-value cut-off of 10 −5 for a database of ∼125000 proteins). Most differentially expressed genes are highly conserved among the Methanosarcinales; however a core set of genes are conserved across all methanogens.  Figure. S18 Conservation of the genes that are differentially expressed between TMA and Acetate across the tree of methanogens. Each vertical bar indicates that a homolog for the differentially expressed gene exists in the indicated species (computed as the bidirectional best hits functionality in the ITEP software [92] with an E-value cut-off of 10 −5 for a database of ∼125000 proteins). Most differentially expressed genes are highly conserved among the Methanosarcinales; however a core set of genes are conserved across all methanogens.