Terminology
There are many definitions of "gene expression". Some treat it as a synonym of "transcription"; others use it for the whole process from "gene to protein", including transcription, translation and, if applicable, any modifications of the transcript and the translational product.
For clarity's sake, we will talk in this article about "transcription data" and "transcription correlation", because microarrays measure the relative abundance of mRNA transcripts. We will avoid, where possible, the term "(gene) expression".
We talk about large and small DNA "loops". Large loops are stretches of DNA with the diameter of the nucleoid which are available for transcription. Small loops have a smaller diameter and lie inside the nucleoid. How these loops are organized physically (e.g. in terms of supercoiling) is a question we do not address, as it is beyond the scope of the present work.
Stochastic transcription and noise
As Samoilov et al. [1] point out, noise tends to be seen as something negative, which should be kept to a minimum and, if possible, eliminated. This view holds in most fields of human activity. In biology, when taking readings of signals, it is indeed important to minimize sources of noise such as inaccurate reading settings.
However, noise sometimes deserves attention. Gene transcription and translation, and the biochemical reactions that take place between gene products, are subject to stochastic fluctuations [2]. In transcriptomic analyses, signals below a certain threshold tend to be classified as noise and are often discarded. It is presumed, correctly, that such signals do not originate from an "active" or "deterministic" transcription process, and it is concluded from this that they are non-informative.
This conclusion, though, is wrong. The advent of single-cell transcription analysis has shown that the random activation of genes and the random creation and destruction of messenger RNA can lead to the production of proteins that are crucial to the cell's survival. An example is the stochastic activation of the competence gene in B. subtilis, part of the organism's stress response. In recent years researchers have started to examine this phenomenon and its repercussions on the cell more closely; we refer the interested reader to the works by Raser and O'Shea [2] and by Samoilov et al. [1] for two comprehensive reviews on the subject of noise, stochasticity and phenotype.
Studying transcription patterns to decode the nucleoid's organization
Despite varied and numerous approaches, little is known about the organization of the bacterial chromosome [3, 4], partly because the system is a dynamic one, making direct observations difficult.
The advent of a new technology offers the opportunity to look at an old problem from a new and different point of view. It might confirm, confute or add new hypotheses.
Indeed, since their arrival at the end of the 1980s [5, 6], microarrays have been used to explore the chromosomal organization at a small scale (DNA stretches tens to hundreds of bps long) or large scale (thousands of bps long) [7–9].
The basic idea is that genes that share transcription patterns must share some sort of spatial relationship, even if they are not close to each other on the chromosome. One particular approach consists in gathering as many data sets from the literature as possible, pooling them together and treating them as one large data set, an approach that has given positive and encouraging results [8, 10, 11].
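The pooling step can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the cited works: each data set is assumed to be a mapping from gene identifier to a list of measurements, only genes present in every data set are kept, and no normalization between data sets is applied (a real analysis would need one). The function name pool_datasets is hypothetical.

```python
def pool_datasets(datasets):
    """Concatenate per-gene profiles across data sets.

    datasets: list of dicts mapping gene id -> list of measurements
              (assumed layout, for illustration only).
    Returns one dict holding, for every gene found in all data sets,
    the measurements of all data sets joined into a single profile.
    """
    if not datasets:
        return {}
    # Keep only the genes that were measured in every data set.
    common = set(datasets[0])
    for ds in datasets[1:]:
        common &= set(ds)
    # Join the profiles in the order the data sets were given.
    return {gene: [v for ds in datasets for v in ds[gene]]
            for gene in sorted(common)}
```

The pooled profiles can then be treated as a single large data set, for instance as input to a pairwise correlation analysis.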
In a previous work we applied this technique to two phylogenetically widely different bacteria, E. coli and B. subtilis [12].
For both bacteria we analyzed the transcription patterns and found for all genes that "the co-expression of genes varies as a function of the distance between the genes along the chromosome" [12].
We found short-range correlations, thought to correspond to DNA turns on the nucleoid surface (14–16 genes), but also long-range correlations at well-defined distances. Surprisingly, these long-range correlations were found for all the genes, regardless of their localisation on the chromosome. In other words, picking any gene at random, its expression will be correlated with genes at well-defined distances.
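The relationship between correlation and distance along the chromosome can be illustrated with a minimal sketch. It is not the procedure used in [12]. It assumes that distance is counted in gene positions on a circular replicon, that each gene has a profile of measurements across conditions, and that Pearson correlation is the similarity measure. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Returns 0.0 when either profile is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_by_distance(profiles, max_distance):
    """Mean pairwise correlation for each gene distance d = 1..max_distance.

    profiles[i] is the profile of gene i; genes are assumed to be listed
    in their order along a circular replicon, so gene i is paired with
    gene (i + d) modulo the number of genes.
    """
    n = len(profiles)
    if max_distance > n // 2:
        raise ValueError("max_distance must not exceed half the gene count")
    result = {}
    for d in range(1, max_distance + 1):
        values = [pearson(profiles[i], profiles[(i + d) % n])
                  for i in range(n)]
        result[d] = sum(values) / n
    return result
```

Under these assumptions, distances at which the returned mean stands out from its neighbours would correspond to the short- and long-range correlations described here.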
This suggests an organization of the chromosome beyond that of operons.
Taking the solenoid model of the chromosome as the starting point, we suggested that the chromosome is organized into two different types of loops: large loops (with the nucleoid's diameter), corresponding to expressed stretches of DNA and accounting for the short-range correlations observed, and small loops (with a smaller diameter than the nucleoid), corresponding to non-expressed DNA.
NB: at the time we made no distinction between genes that are only transcribed and those that are also translated, using the term "expression" in its wider sense.
We had, however, no explanation for the regular, long-range correlations observed.
The fact that the observations were made with such different organisms suggested that they might reveal a general property of double-stranded, circular bacterial DNA.
The aim of this paper
The aim of this paper is to examine the transcription correlations when the only transcription taking place is stochastic. In other words: when no active but only stochastic transcription occurs, can we observe any patterns in the transcription correlations? Do we find short- and, more interestingly, long-range correlations? And if so, how do these compare to the "active transcription" situation? Could the results be used to refine the model of the nucleoid organization? What can be said about the relationship between shared transcription patterns and physiological relationship?
To this end, we examined two particular sets of transcription data of Sinorhizobium meliloti.
The data sets
In set A all three replicons – the chromosome, pSymA and pSymB – are actively transcribed.
In set B, only the chromosome and pSymB are actively transcribed. pSymA shows only stochastic transcriptional activity, a situation made possible by the fact that the plasmid does not contain any genes essential to the cell's viability under usual laboratory conditions (see below).
The analysis of the transcription data of pSymA in the two data sets should therefore allow us to answer the questions posed above.
A note on S. meliloti
S. meliloti is a nitrogen-fixing alpha-proteobacterium. It is distributed world-wide in many soil types, either in association with legumes or in a free-living form [13], and is used as a model species for the study of plant-bacteria symbiosis. Its genome contains 6206 ORFs distributed over three replicons: a chromosome of 3.65 Mb and two well-studied megaplasmids, pSymA and pSymB, of 1.35 Mb and 1.68 Mb, respectively.
The smallest replicon, pSymA, is specialized for nodulation and nitrogen fixation. It has been successfully cured without noticeable effects on bacterial viability under usual laboratory conditions [14], demonstrating that this replicon is not essential for cell viability (in the laboratory). Under certain culturing conditions, none of the proteins encoded by the plasmid are produced, as revealed by enzyme assays.
pSymB contains several genes that make it essential for cell viability, and several features suggest that it should be considered a chromosome rather than a plasmid [15].