In general, we find high genomic dissimilarity scores between plasmid sequences and representative host chromosome sequences. In addition, the GC contents of the plasmids show a bias towards low (and to a lesser extent, high) GC percentage scores. This lower GC content in plasmids has previously been noted, and has been explained in terms of a higher energy cost and limited availability of G and C over A and T/U [14]. Although available genome sequences are biased as they originate predominantly from medically and industrially relevant strains, it is unlikely that these plasmids form a particular class. In addition, our results are in accordance with those obtained by Wong and co-workers [10]. They showed, for a limited number of plasmids, that chromosomes within a species share a more similar dinucleotide composition, or genome signature, than plasmids do with the host chromosome(s).
Previously, Campbell and co-workers compared plasmids to a collection of large chromosomal fragments of the host and showed that the genome signatures of each plasmid and its natural host rank amongst the closest [15]. Their suggestion that similar genome signatures of plasmid and host chromosome are required for plasmid establishment is not supported by the present data [15]. We find that intragenomic compositional comparisons of plasmids with their host often show higher genomic dissimilarity values than comparisons between genomic fragments and their host chromosome. This difference in the interpretation of plasmid δ* values may result from the, in our opinion more robust, method we use to compare these values with those of the host chromosome. First, a distribution of δ* values is generated by comparing disjoint genomic fragments to the full genomic sequence, which provides information about the average and variance of the δ* values that a single species can display in different regions of its genome. Fragments with extreme δ* values (those in the right tail of the distribution, Fig. 1) may result from events such as horizontal transfer or from other genomic aberrations (e.g. rRNA gene clusters) [8, 11]. These extreme fragments thus deviate substantially from the average genome composition and are considered compositionally dissimilar from the average chromosome content. Consequently, although the δ* values of most plasmids may fall within the 'very close' category defined by Campbell and co-workers, we consider them dissimilar, since they behave like the extreme fragments in the distribution plot. In addition, by comparing each plasmid with its host genome fragmented into pieces of the same size as the plasmid, the sensitivity of δ* of small DNA fragments to small changes in word frequencies is circumvented.
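The comparison described above can be illustrated with a minimal Python sketch. It assumes that δ* is the commonly used dinucleotide relative abundance difference (the mean absolute difference of the 16 strand-symmetrized ratios ρ*_xy = f*_xy / (f*_x f*_y)) and that the chromosome is cut into non-overlapping fragments of plasmid length. The function names (`signature`, `delta_star`, `fragment_deltas`, `plasmid_vs_host`) are hypothetical and are not taken from the software used in this study.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def signature(seq):
    """Strand-symmetrized dinucleotide relative abundances rho*_xy.

    Frequencies are counted on the sequence and on its reverse complement;
    positions containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        for i, base in enumerate(strand):
            if base in "ACGT":
                mono[base] += 1
                nxt = strand[i + 1:i + 2]
                if nxt and nxt in "ACGT":
                    di[base + nxt] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or without unambiguous bases")
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # Assumption: rho* is set to 0 when a base is absent altogether.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(sig_a, sig_b):
    """Mean absolute difference over the 16 dinucleotide abundances."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16


def fragment_deltas(chromosome, size):
    """delta* of each disjoint fragment of `size` bp versus the whole chromosome."""
    host = signature(chromosome)
    return [
        delta_star(signature(chromosome[start:start + size]), host)
        for start in range(0, len(chromosome) - size + 1, size)
    ]


def plasmid_vs_host(plasmid, chromosome):
    """Return the plasmid delta* and the fraction of fragments with a lower delta*."""
    deltas = fragment_deltas(chromosome, len(plasmid))
    d = delta_star(signature(plasmid), signature(chromosome))
    below = sum(1 for value in deltas if value < d)
    return d, below / len(deltas)
```

Under these assumptions, a returned fraction close to 1 places the plasmid in the right tail of the host's fragment distribution, i.e. among the compositionally dissimilar fragments.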
The genome signature of DNA is thought to have evolved through selection exerted by the host's replication, recombination and repair machineries, resulting in comparable genome signatures between members of the same species, but different genome signatures between members of different species [6]. Plasmids seem to be less subject to these selective pressures, although they are thought to be confined to a limited number of hosts by the presence of partitioning genes and their dependence on the host replication machinery.
The observed genomic dissimilarity between the three different B. aphidicola genome sequences supports a role for replication, recombination and repair proteins in determining the genome signature. As the genome signature reflects evolutionary relatedness between species in much the same way as more classical parameters, such as 16S rRNA similarity [16], high intraspecific genomic dissimilarity scores indicate rapid genome evolution or long-term host co-speciation (as has been described earlier [17]). The loss of genes involved in replication, recombination and repair in Buchnera genomes [18] might be responsible for the divergence of their genome signatures. These intracellular endosymbionts might therefore be an excellent model for investigating the origin of the genome signature. Interestingly, we find a Buchnera plasmid (plasmid pBBp1, NC_004555) that shows a high genomic dissimilarity with the genome sequence of the strain from which it was isolated (i.e. B. aphidicola (Baizongia pistaciae)), and a lower genomic dissimilarity with both other Buchnera genome sequences. This supports a history of mobility for this plasmid, in which it was recently acquired from a different Buchnera strain, similar to previous observations by Van Ham and co-workers [19]. Notably, high genomic dissimilarity between members of the same genus (the Mollicutes) has been observed previously [20, 21], which also concerns bacteria with an intracellular life-style.
We suggest three possible explanations for the reduced sensitivity of plasmids to the selective pressures generating their host's genome signature. First, the observed high genome signature dissimilarity may actually prevent the integration of plasmids into the host chromosome. Thus, the non-integrating plasmids observed in nature may be a biased pool of compositionally dissimilar DNA, as compositionally similar plasmids could integrate into their host's chromosome more readily. Secondly, horizontally mobile plasmids may occasionally be exposed to the extracellular environment, where an atypical dinucleotide composition may favour resistance of the plasmid to degradation. Such a mechanism might be expected to drive the genome signatures of plasmids towards comparable values, but the large variety in GC content among plasmids suggests otherwise. However, we cannot exclude that different environments select for different genome signatures. Thirdly, horizontal transmission of plasmids may be far more important than currently thought. This last point is supported by the conclusion of a recent review by Sorensen and co-workers that the overall extent of horizontal gene transfer (HGT) of plasmids in the environments examined might have been underestimated [22]. In addition, plasmid transfer between genera, phyla and even different domains has been described [22]. Plasmid transfer between unrelated species may be rare but, if followed by a more rapid distribution among related species, would result in compositional discordance between many plasmids and their hosts. Our data, which show that a large proportion of the plasmids with high nucleotide discordance with their host's genome harbour genes encoding proteins involved in mobility or plasmid transfer, fit with this notion.
In addition, the plasmids showing relatively low nucleotide discordance with their host's genome are smaller than those showing high nucleotide discordance with their host's genome (Fig. 3). This could indicate that δ* of small DNA fragments is more sensitive to small changes in word frequencies than δ* of larger plasmids. However, 50% of the plasmids with a relatively low compositional discordance with their host's genome are larger than 5 kbp. Moreover, as mentioned above, the δ* value of each plasmid is compared with a distribution of δ* values of disjoint genomic fragments compared to the full genomic sequence, which provides information about the average and variance of the δ* values that occur in different regions of the host's genome. On the other hand, the copy number of small plasmids is in general higher than that of large plasmids. This would imply faster replication of these smaller plasmids, and hence faster amelioration rates.
We suggest that plasmids with high genomic dissimilarity scores were acquired relatively recently by the host, while the minority of plasmids with a genome signature similar to that of the host genome share a longer history with that host (i.e. a vertical association). These strictly vertically transmitted plasmids may therefore show a less atypical dinucleotide composition as a result of co-evolution with the host; in addition, selection due to extracellular conditions would be absent.