Organism and growth conditions
Lactococcus lactis subsp. lactis IL1403 was grown under anaerobic conditions in batch cultures at 30°C, pH 6.6 and 250 rpm, in a chemically defined medium as previously described[22].
Polysomal RNA preparation
Polysome profile determination was adapted from the protocol developed in yeast[6]. Cells were cultivated in exponential phase at a maximum growth rate of 0.88 h-1 to an optical density of 1 at 580 nm. Translation elongation was then arrested by adding 100 mg/ml chloramphenicol and cells were collected on ice. A total amount of 96 mg of dried cells was harvested at 4°C (3645 g, 8 minutes), resuspended at 4°C in lysis buffer (20 mM Tris HCl pH 8, 140 mM KCl, 40 mM MgCl2, 0.5 mM DTT, 100 μg/ml chloramphenicol, 1 mg/ml heparin, 20 mM EGTA, 1% Triton X-100) and washed twice. Then, the cells were disrupted at 4°C in tubes containing 0.1 g of glass beads using Fast Prep ® (4 cycles of 30 seconds at 6.5 m/sec with 1 minute intervals). After centrifugation, the supernatant was loaded onto a 24-ml linear sucrose gradient (10 to 50% (w/v)) in polysomal gradient buffer (same composition as lysis buffer except for heparin at a final concentration of 0.5 mg/ml). Polysomal complexes were resolved by centrifugation at 13,500 rpm for 16 h 30, at 4°C in an SW 28 rotor. mRNA-ribosome complexes were separated in 11 elution fractions eluted in cold buffer (55% sucrose (w/v), 500 mM Tris HCl pH 8, 10 mg/ml Bromophenol blue) at a speed of 2.5 ml/minute. Absorbance at 254 nm was measured continuously with a UV detector (UPC900 Amersham Pharmacia Biotech). In each fraction, a protein elimination step was performed by adding one volume of 8 M guanidium-HCl and two volumes of absolute ethanol[6]. RNAs were extracted with the Qiagen Rneasy Midi kit®. Peak assignment for ribosomal subunits, monosomal and polysomal complexes, was achieved through 16S and 23S rRNA measurements in each elution fraction by RNA analysis using Bioanalyzer (Agilent technologies®). Peak assignment identified 30S and 50S ribosomal subunits in elution fractions 2 and 3, respectively (Figure2). Consequently, elution fraction 1 corresponded to free RNAs, while eluted fractions above number 3 contained mRNAs associated with at least one complete ribosome. To reach the 5 μg of total RNA required for transcriptome analysis, the two fractions eluted first and the last four fractions were pooled, respectively (Figure2).
Three independent experiments from culture to polysome profile determination were carried out. An aliquot of cell-free extract was used in parallel to provide unfractionated total RNA (sample named A). Protein elimination and RNA extraction were performed as described above. This unfractionated RNA was used as an internal reference for normalization (see below) and to check the reproducibility of polysome separation. To avoid mRNA degradation, the fractionation experiment was performed at 4°C and inhibitors of ribonucleasic activity were added in buffers. A loss of mRNA was however observed but this loss was constant between genes and repetitions. The percentage of mRNA molecules recovered after the polysome separation compared to unfractionated RNA was per gene in average at 60 ± 16%.
Ribosome number extrapolation
For each polysomal profile, the number of ribosomes per transcript in the fractions lacking one single ribosome resolution (fractions F to H) was estimated by a logarithmic extrapolation from the monosome peak (fraction D) with the following Equation[46]:
(1)
with the a slope coefficient corresponding to the averaged value of the three repetitions and the b constant related to the elution time of the monosome fraction specific of each repetition.
Ribosome engaged in translation
The percentage of ribosomes engaged in translation was estimated by area integration of the 254 nm absorbance of the polysomal profile. The ratio of the area under the absorbance curve corresponding to translating ribosomes (elution fractions 4 to 11) over the area corresponding to total (translating and non-translating) ribosomes (elution fractions 2 to 11) was calculated for the three polysomal profiles.
Transcriptomic analysis and normalization
For signal normalization and modeling, the free statistical software R (http://www.r-project.org/) was used.
Gene expression was measured using nylon arrays containing PCR fragments (Eurogentec®) for 1948 genes of L. lactis IL1403. Nylon array spotting and analytical support were provided by the Biochips Platform (Genopole Toulouse, France). From each fraction from B to H and from the unfractionated sample A, a constant amount of 5 μg of total RNA was used to perform retrotranscription. RNA was quantified at 260 nm with Nanodrop (Thermo Scientific®). RNA quality was checked on Agilent 2100 Bioanalyzer (Agilent technologies®). Synthesis of radiolabelled cDNA, nylon array hybridization and washings were carried out as described previously[22]. Microarrays (from A to H) were exposed to a phospho-imager screen for eight days and scanned with a phospho-fluoroimager (Storm 860, Molecular Dynamics®). Validation of this transcriptomic protocol in L. lactis by qRT-PCR was previously provided[47]. Three series of eight microarrays from A to H were obtained from the three independent polysomal profile determinations. For each gene, spotted twice on the microarrays, the mean of the two intensities after background removal was considered. For each microarray (from A to H), a cutoff value was defined as the mean intensities of the “empty” spots plus one standard deviation as previously described[22]. 1619 genes presenting at least a mean intensity above the cutoff value for one of the eight microarrays were selected for further analysis.
For each microarray series, intra-series normalization to correct experimental variations after fraction collection was performed using a common reference. Each gene signal intensity of microarrays B to H was standardized by the mean intensity of reference microarray A (unfractionated RNA). We noted the normalized intensity I of the gene i ∈ {1, …, 1619} that belonged to microarray j ∈ {B, C, D, E, F, G, H} by microarray A, for series replicates k ∈ {1, 2, 3}.
(2)
with
In order to take into account the variability in total RNA amounts between fractions B to H (from 29.7 μg ± 24.3 to 436.3 μg ± 170.3) of the same polysomal profile determination, and thus to work with RNA concentrations in the fraction instead of abundances, we corrected intensity values by total RNA quantity and named these intensities N (Equation 3).
(3)
For each microarray from B to H, an inter-series normalization step was introduced to adjust the signals of the three triplicates. For each microarray, an average intensity set was calculated from the three repetitions. The intensities of each repetition were plotted versus the average intensity set. From each plot, we estimated a linear regression coefficient r and an intercept coefficient b (Equation 4)
(4)
Where denoted the mean of the three replications.
Those estimations were denoted respectively and
The intensities of each repetition were then centered with their own b coefficient and reduced by their own r coefficient. We noted the resulting normalized intensity as (Equation 5):
(5)
Translatome variable calculations
For the 1619 genes with signal intensities above the cutoff values, two translatome variables were calculated, their ribosome occupancy and ribosome density.
For each gene, we calculated the proportions of mRNA molecules in each fraction from B to H:
(6)
For each gene, ribosome occupancy is the fraction of its mRNA population engaged in translation. It is calculated by summing the proportions of its mRNAs in fractions D to H (Equation 7). In fractions B and C, mRNA molecules were free or associated with an incomplete ribosome, so these mRNAs were not considered as engaged in translation.
(7)
For each gene, the three ribosome occupancy values obtained for each series were averaged (Equation 8).
(8)
For each gene, the peak fraction corresponds to the highest mRNA proportion within fractions D to H containing mRNA engaged in translation. The peak fraction was determined by a bootstrap method on residuals. This procedure has already been used in transcriptome analysis and allowed increased robustness of the results[48]. This method does not require any assumptions on data distribution and corresponds to a resampling procedure with replacement. The residuals ε
i,k
from each average value, from all seven fractions, were calculated (Equation 9):
(9)
Then residuals were pooled together and reassigned back to these fractions at random to create a bootstrap data set. More precisely, for each gene i and for each value of k, a value of residual was sampled in the pool of residuals and was denoted by. This residual was added to the mean of the mRNA proportions of the gene i in order to create a bootstrap value of mRNA proportion (Equation 10).
(10)
Ten thousand bootstrap data sets were made. The peak fraction was determined in each bootstrap data set analogously to the initial data set. From the ten thousand bootstrap data sets, the relative frequency of the highest mRNA proportion was calculated with a confidence interval fixed at 95%. When the 95% bootstrap confidence interval was not confined to a single fraction, the definition of the peak fraction was widened from only one to two (or more) adjacent fractions and a search for the maximum was initiated again.
From a set of 1619 genes, a peak fraction was assigned to 1177 genes: all genes had a peak fraction contained in a single fraction. For each of the 1177 genes, we calculated the ribosome density that was the number of bound ribosomes in the peak fraction normalized with respect to the transcript length. The experimental length of all transcripts was not available in L. lactis and predictions were not considered to be confident[49]. Thus, we used the coding sequence length instead of transcript length in our calculation:
(11)
Ribosome density and ribosome occupancy values are available in Additional file4: Table S1.
Enrichment analysis
In a given gene subset, statistical testing of the enrichment of genes having a characteristic of interest was performed. In a general way, if it is assumed that n1 genes were sampled without replacement in a total group of n2 genes, m of which have the characteristic of interest, the number N of genes having the characteristic of interest in the subgroup of n1 genes follows the hypergeometric distribution: N~H(m, (n2-m), n1). The p-value of the enrichment test of genes having the characteristic of interest in the n1 gene subgroup is defined as follows:
(12)
where Nobs is the observed value of N.
The p-value was calculated using R software.
Covariance model
A linear analysis of covariance model was used to identify the major determinants of three variables of interest, namely ribosome occupancy, ribosome density and protein level. To do so, each model was established from various quantitative and qualitative parameters as described previously[3]. In a previous study[3], gene parameters such as chromosomal position, open reading frame length, CAI, gene functional category, protein hydrophobicity (Gravy score) and aromaticity have already been described, and mRNA and protein concentrations were provided. mRNA half-life measurements are from[22]. The upstream mRNA sequences (from −100 to +24 bp relative to the start codon) and downstream sequences (from −24 to +100 bp relative to the stop codon) were obtained from RSAtools and then processed with RNAfold software (http://mobyle.pasteur.fr/cgi-bin/portal.py) specifying a temperature of 30°C. For each sequence, we used the free energy of the predicted minimum free energy structure (the most negative ΔG, ΔGup in the upstream sequence and ΔGdown in the downstream sequence) as a measure of secondary structure formation. Quantitative parameters were log-transformed to obtain a normal distribution except for those parameters which can take negative values (position, Gravy score and folding energy) and all were centered and reduced. This normalization was applied in order to adjust their level and allow comparison of model coefficients. For example, Equation 13 described the models established to explain the two translatome variables.
(13)
where ribosome occupancy
(i)
and ribosome density
(i)
are the measured levels of the i th value for the variable of interest, α is the intercept, parameter j(i) the value of quantitative parameter j for the i th value, βj(i) and λk(i) the coefficients associated to the j th quantitative parameter and the k th qualitative parameter of the i th value, respectively, and ζ(i) the error term for the i th value. Parameter abbreviations used are: Chrom.location for chromosome location, [mRNA] for mRNA concentration, mRNAt
1/2
for mRNA half-life, arom for aromaticity, hydro for hydrophobicity, CAI for codon adaptation index, ΔG
up
for minimum free energy structure in the upstream sequence, ΔG
down
for minimum free energy structure in the downstream sequence, cat for functional category. The model to explain the variable protein concentration was similar to that described above except that both ribosome density and ribosome occupancy were added as explanatory parameters. For each model, we obtained an estimate of the variable of interest. Least squares procedure was used to estimate coefficients of selected parameters and quality of modeling adjustment was obtained by calculation of the determination coefficient. The Akaike Information Criterion was used to select the best models without any a priori subjective parameter selection[3]. Adjusted r-squared values of our models were lower than 0.60 indicating that additional major explanatory parameters need still to be identified.
Simple linear correlation
Simple correlations were estimated by calculating Pearson correlation coefficient and associated p-value using R free statistical software.
Motif research
The presence of sequence motifs was explored using MEME suite software, section MEME (http://meme.nbcr.net) and confirmed by RSAtools software (http://rsat.ulb.ac.be/rsat/), section oligoanalysis with a default parameter selection for 5, 6 or 8 nucleotide lengths. The nucleotide sequences in the vicinity of the start codon (from −100 to +24 and −30 to +24 relatives to ATG) were also obtained from RSAtools (retrieve sequence section, default parameters).