Gene order and gene content of legume plastomes
In contrast to the genome organization in A. thaliana, most taxa of the subfamily Papilionoideae, including the four species whose plastomes have been sequenced, present a 51-kb inversion within the LSC region. Another inversion, at the junction points of trnH-GUG/rpl14 and rps19/rps8, has been reported in only two genera, Phaseolus and Vigna [1, 19, 29], indicating that this chloroplast genome arrangement is characteristic of the Phaseolus-Vigna species complex. The chloroplast genome of M. truncatula lacks one IR, a feature shared with legume tribes such as Carmichaelieae, Cicereae, Galegeae, Hedysareae, Trifolieae, and Vicieae and with some genera of other groups. All these tribes are now placed in a single clade, the IRLC (inverted-repeat-loss clade). Thus, the four sequenced plastomes represent three types of plastome structure, suggesting that cpDNA organization is very diverse in legume plants.
Legume cpDNAs contain neither rpl22 [31, 32] nor infA, indicating that these genes were lost from this lineage. A distinctive feature of P. vulgaris cpDNA is the presence of two pseudogenes, rps16 and rpl33. The first is functional in L. japonicus and G. max but is lost in M. truncatula [23, 32]. The cpDNAs of other land plants, Selaginella uncinata, Psilotum nudum, Physcomitrella patens, E. virginiana, and Eucalyptus globulus, lost this gene independently [4, 34, 35]. rpl33 is a functional gene present in essentially all land plant chloroplasts except that of S. uncinata. These data suggest that P. vulgaris cpDNA is still undergoing genome reduction.
The accD gene encodes an acetyl-coenzyme A carboxylase subunit similar in structure to prokaryotic accD, and is the most variable gene present in legume chloroplasts. Its size varies widely: 1299 bp in G. max, 1422 bp in P. vulgaris, 1506 bp in L. japonicus, and 2142 bp in M. truncatula. Medicago has the largest accD of prokaryotic form, containing seven kinds of tandem repeats and one separate 43-bp direct repeat situated between two conserved regions. We performed a BLAST search with the accD gene against the EST bank of M. truncatula. One tentative consensus segment of 9334 bp (TC106672) was found to contain sequences identical to the chloroplast genes trnS-GCU, trnQ-UUG, psbI, psbK, accD, psaI, cemA, and petA, indicating that these genes are transcribed. Nevertheless, the large number of tandem repeats present in the M. truncatula accD gene calls its functionality into question.
Another landmark of the legume plastomes is the duplication of a portion of ycf2. The duplicated segment, named ψycf2, was first identified as a pseudogene in Vigna angularis. It is present in the same relative position in the legume plastomes analyzed here. In G. max, P. vulgaris and L. japonicus, ψycf2 is identical to its copy within ycf2, but in M. truncatula the two copies are highly divergent (60% identity). This result indicates that the last common ancestor of these plants already had this duplication and that gene conversion occurred in the plastomes that retain the IR.
Nature of tandem repeats
The sequence and distribution of repetitive elements are characteristic of each chloroplast genome, and they can be classified into two broad categories: large repeats and short dispersed repeats (SDRs). Both categories can be found in different proportions in chloroplast genomes. Oenothera and Triticum chloroplasts contain some dispersed repeats, but 20% of the Chlamydomonas reinhardtii plastome consists of repeated sequences, many of which are tandem repeats (TRs) [37–39]. In legume plastomes, clear differences reside in the number, location, and sequence of TRs. M. truncatula possesses a plastome with more numerous and larger TRs, whereas P. vulgaris has a plastome with fewer TRs.
Usually, TRs are classified as a subcategory of SDRs, but our analysis of the legume chloroplast genomes shows that TRs have a different origin from the rest of the SDRs. The repetitive unit of an SDR family is dispersed throughout the genome, and different members of an SDR family share high identity. In contrast, the repetitive unit of a TR is not dispersed, and the consensus sequence of each TR has low identity with the consensus sequences of other TRs, with the exception of some low-complexity repeats (e.g., ATATAT). In other words, each TR is specific to a site.
Multiple alignments among plastomes frequently show that the repetitive consensus unit of a TR can be found in other chloroplast genomes at similar positions without duplication, or that the region containing the corresponding sequences is completely deleted from a specific plastome. Moreover, some small insertions of 7 bp to 21 bp are duplications of one of the flanking sequences in a specific plastome, forming a small TR (only two tandem units). On the other hand, more complicated TRs formed by consecutive duplication, as shown in Figure 4, also exist at other sites of the plastome. Taken together, our observations lead us to conclude that TRs arose from in situ sequences and do not share the same origin as dispersed repeats.
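The small two-unit TRs described above can be illustrated with a minimal sketch. It scans a sequence for any unit of 7 to 21 bp that is immediately followed by an exact copy of itself. The function name and the exact-match criterion are assumptions made for illustration; this is not the repeat-detection procedure used in the study, and it ignores imperfect copies and longer arrays.

```python
def find_tandem_duplications(seq, min_unit=7, max_unit=21):
    """Return (start, unit_length, unit) for every exact two-copy tandem repeat.

    start is a 0-based index into seq. Hypothetical helper for illustration;
    low-complexity stretches will yield many overlapping hits.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A unit of length k at position i is duplicated in tandem when the
        # next k bases are identical to it.
        for i in range(len(seq) - 2 * k + 1):
            unit = seq[i:i + k]
            if seq[i + k:i + 2 * k] == unit:
                hits.append((i, k, unit))
    return hits


if __name__ == "__main__":
    # Toy sequence: an 8-bp flanking motif duplicated once.
    toy = "GGCT" + "ACGTTGCA" * 2 + "TTAGC"
    print(find_tandem_duplications(toy))
```

Comparing such hits against the aligned position in a second plastome, where the unit occurs only once, is one way to recognize an insertion as a duplication of its flanking sequence.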
We propose that homology-facilitated illegitimate recombination is the mechanism that creates TRs. The reasons are: 1) TRs arise from in situ sequences, from 7 bp to 143 bp long in the present study; 2) the initial 4–17 bases of some larger insertions are a repetition of their flanking sequence; 3) there are many copies of the plastome in a cell, in both circular and linear forms, which provide the opportunity for such recombination; and 4) homology-facilitated illegitimate recombination is corroborated by the gene transformation in the chloroplast of Acinetobacter sp. In addition, recombination mediated by short direct repeats has been reported in the wheat chloroplast.
Intracellular sequence exchange
Recently, Kami reported the sequence of a nuclear BAC clone, 71F18, containing chloroplast-derived DNA of P. vulgaris. Sequence comparison between the P. vulgaris plastome and the BAC clone showed that two separate regions of the plastome (trnG-rps14, 914 bp; trnI-ndhB, 7901 bp) are linked together in the nuclear genome, with the same similarity (99.01%) to their nuclear homologues. We noted that the nuclear homologues contain no insertions relative to the plastome sequence, but have 8 deleted segments ranging in size from 8 bp to 583 bp. We therefore postulate that the fragment originally transferred from the plastome likely spanned the whole region from trnI-GAU to rps14 (73 kb), and that deletions occurred subsequently, including the deletion of a 64-kb fragment from trnL to psbZ.
A BLAST search of the M. truncatula plastome sequence against the available nuclear genome sequences of this species found that 51% of the plastome is present in the nuclear genome with more than 99% identity. The chloroplast-derived segments identified in the M. truncatula nuclear genome can be as large as 25 kb. Because we could explore only the partial nuclear genome available to date in GenBank, the whole plastome might be found in the nuclear genome once the complete nuclear genome becomes available. If so, this would be similar to the case of the rice genome, but different from A. thaliana, in which the chloroplast-derived fragments found in the nuclear genome have a lower degree of identity (commonly 92–98%) and are smaller, generally less than 4 kb, indicating that cpDNA transfer occurred earlier in the A. thaliana genome. In the rice genome, cpDNAs are continuously transferred to the nuclear genome, which incessantly eliminates them, until an equilibrium is reached. On the other hand, we did not find significant similarity between the plastome of L. japonicus and its nuclear genome. There are several hypotheses to explain gene transfer from the chloroplast to the nuclear genome. The most common mechanism of transfer depends on chloroplast lysis, but it is still difficult to explain why the nuclear genome of A. thaliana did not integrate cpDNA in the same pattern as M. truncatula or O. sativa.
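A figure such as "51% of the plastome at more than 99% identity" can be derived from a list of alignment hits by filtering on identity and merging overlapping plastome intervals, so that no base is counted twice. The sketch below shows one way to do this. The hit tuple layout (start, end, percent identity, with 1-based inclusive plastome coordinates) and the function name are assumptions for illustration; they do not reproduce a particular BLAST output format or the exact procedure used here.

```python
def covered_fraction(hits, plastome_length, min_identity=99.0):
    """Fraction of the plastome covered by hits at or above min_identity.

    hits: iterable of (start, end, percent_identity) tuples in plastome
    coordinates (1-based, inclusive; start may exceed end for reverse-strand
    hits). Hypothetical layout chosen for this sketch.
    """
    intervals = sorted(
        (min(s, e), max(s, e)) for s, e, ident in hits if ident >= min_identity
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            # No overlap with the running interval: close it and start anew.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlap: extend the running interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / plastome_length


if __name__ == "__main__":
    toy_hits = [(1, 100, 99.5), (51, 150, 99.2), (300, 201, 99.9), (400, 500, 95.0)]
    print(covered_fraction(toy_hits, plastome_length=1000))
```

In the toy input, the last hit falls below the identity threshold and is ignored, and the first two hits overlap and are merged before their length is counted.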
Rate of evolutionary change in legume plastomes
There are only a few reports that describe the evolutionary rate of the chloroplast genome [44–46]. In the present study, we demonstrate that one plastome (P. vulgaris) globally evolved faster than another plastome (G. max), which has not been observed before.
With regard to the evolutionary rate of legume plants, Lavin reported that Phaseolus and closely related genera have the fastest substitution rates at the matK locus within Leguminosae. Delgado-Salinas recently suggested that this accelerated substitution rate in matK (within the intron of trnK) is related to the formation of the modern Trans-Mexico volcanic belt. We present further evidence here that the Phaseolus plastome diversified rapidly at the genome-wide level. Considering that all the genes in this genome were affected, we deduce that some factor likely acted on this plastome globally, leading to a higher rate of evolutionary change.
Evolutionary rate is affected mainly by the following factors: generation time, population size, specific mutation rate, and natural selection. The first three factors should influence all the genes of a genome as a whole, whereas the fourth can act on specific genes. Generation time is usually considered an important influence on evolutionary rate, and has been invoked to explain the discrepancy in evolutionary rates between rodents and other mammals, between the plastomes of Phalaenopsis aphrodite and grass crops, and between rice and maize. However, it cannot explain the phenomenon in the present study because both G. max and P. vulgaris are annual crop plants, sharing the same generation time. Population sizes of G. max and P. vulgaris cultivars seem to be similar because they are important domesticated plants with highly limited genetic diversity. A divergent mutation rate could be one of the causes of the variance in the substitution rate between Phaseolus and Glycine. The reasons are: 1) overall Ks in Phaseolus is much higher than in Glycine (see Additional File 1); 2) the sites of synonymous substitution are far from saturation in this plastome (<< 1); 3) these two crop plants have the same generation time and a similar reproductive mode (self-fertilization), which prevents genetic recombination with other plants; and 4) the chloroplast rarely imports genetic elements from other compartments of the cell. On the other hand, natural selection should be a factor in the relative rate of specific genes. The present research shows that almost all genes are under purifying selection (ω < 1). Therefore, we conclude that the different evolutionary rate between Phaseolus and Glycine is a consequence of the pressures of both mutation and natural selection.
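The criterion ω < 1 used above can be made concrete with a short sketch, assuming the conventional definition of ω as the ratio of the nonsynonymous to the synonymous substitution rate (Ka/Ks). The function name and the example values are hypothetical and are not taken from Additional File 1; a real analysis would also test whether ω differs significantly from 1 rather than compare point estimates.

```python
def classify_selection(ka, ks):
    """Return (omega, label) for one gene, with omega = Ka / Ks.

    Labels follow the usual reading of the point estimate only:
    omega < 1 purifying, omega == 1 neutral, omega > 1 positive.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute omega")
    omega = ka / ks
    if omega < 1:
        label = "purifying"
    elif omega > 1:
        label = "positive"
    else:
        label = "neutral"
    return omega, label


if __name__ == "__main__":
    # Illustrative values only.
    for gene, ka, ks in [("geneA", 0.02, 0.10), ("geneB", 0.30, 0.10)]:
        print(gene, classify_selection(ka, ks))
```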
The M. truncatula and L. japonicus plastomes evolved at a similar rate (K). However, the genes with significant differences showed a remarkably distinct rate: 10 M. truncatula genes evolved significantly faster than did their L. japonicus counterparts, but two genes, rpoC2 and ndhF, changed faster in L. japonicus. In this case, it seems that the particular reason that leads to faster evolution of some genes in one plastome must be natural selection.