Comparative Chloroplast genomes of Gynura species:Sequence Variation, Genome Rearrangement and Divergence studies

Some Gynura species were reported to be natural anti-diabetic plants. The chloroplast genomes of four Gynura species were sequenced for hybridizations to improve agronomic traits. There are only 4 genera of tribe Senecioneae have published chloroplast genome in Genbank up to now. The internal relationships of the genus Gynura and the relationship of the genus Gynura with other genera in tribe Senecioneae need further researches. Results The chloroplast genome of 4 Gynura species were sequenced, assembled and annotated. Comparing with other 12 Senecioneae species, the chloroplast genome features were detailedly analyzed. Subsequently, the differences of the microsatellites and repeats type in the tribe were found. By comparison, the IR expansion and contraction is conserved in the genera Gynura, Dendrosenecio and Ligularia. The region from 25,000 to 50,000 bp is relatively not conservative but the 7 ndh genes in this region are under purifying selection with small change in amino acids. The phylogenetic tree shows two major clades, same as the sequence divergence in region 25,000 to 50,000 bp. Based on the oldest Artemisia pollen fossil, the divergence time were estimated.


Background
Gynura is a genus of flowering plants in the tribe Senecioneae of family Asteraceae endemic to Asia, which contains 44 species in total [1]. Many species of the genus Gynura have been reported to have medicinal value to diabetes mellitus, such as G. procumbens, G. divaricata and G. medica. The aqueous extract from G. procumbens possessed a significant hypoglycemic effect in streptozotocin-induced diabetic rats [2] and it improved insulin sensitivity and suppressed hepatic gluconeogenesis in C57BL/KsJ-db/db mice [3].
Polysaccharide from G. divaricata couldalleviate the hyperglycemiabymodulating the activities of intestinal disaccharidases in streptozotocin-induced diabetic rats [4] and G. divaricata-lyophilized powder was effectively hypoglycemic by activating insulin signaling and improving antioxidant capacity in mice with type 2 diabetes [5]. Phenolic compounds isolated from G. medica inhibitedyeast α-glucosidase in vitro [6].
Some plants in genus Gynura have also been used as vegetable and tea in people's daily life of East and South Asia, thus the genus Gynura is a natural treasure trove to treat the increasingly diabetes problem. Although Gynura plants are seemingly useful and harmless, but some shortcomings need improvement such as the medicinal effect to diabetes, potential toxicity and oral tastes [7][8]. Large improvement relies on interspecific hybridizations to increase genetic diversity and introgression of valuable traits.
Phylogenetic relationship is useful information for the interspecific hybridizations, but the phylogenetic relationship of the species in genus Gynura is, as yet, unclear.
A whole chloroplast DNA ranges between 120 and 160 kb in size on the circular chromosome in most plants, composing of Large Single Copy (LSC), Small Single Copy (SSC), and two copies of an Inverted Repeat (IRa and IRb) [9][10]. Contrast to mitochondrial and nuclear genomes, chloroplast genomes are more conserved in terms of gene content, organization and structure [11]. The chloroplast genomes of angiosperms generally show slow substitution rates under adaptive evolution [12]. Considering its small size, conserved gene content and simple structure, the chloroplast genome are valid and cost-effective to research phylogenetic relationships and evolution of plants in different taxa. Recently, Forage species of Urochloa [13], marine crop Gracilaria firma [14], Epilithic sister genera Oresitrophe and Mukdenia [15], Family Adoxaceae and Caprifoliaceae of Dipsacales [16] were sequenced the related complete chloroplast genomes to elucidate the diversity, phylogeny and evolution.
In the present study, we sequenced, assembled and annotated the chloroplast genomes of four Gynura species. Combined with other published chloroplast genomes of tribe Senecioneae, the structure features, repeat motifs, adaptive selection, phylogenetic relationships and divergence time were analyzed.

Results And Discussion
Chloroplast genome features of 16 Senecioneae species In this study, we focused on the chloroplast genome features of tribe Senecioneae, which is the largest tribe of family Asteraceae. Although the tribe comprises about 500 genera and 3000 species [44], we only found that 4 genera of tribe Senecioneae had published chloroplast genome in Genebank. Five species of genus Dendrosenecio, one species of genus Jacobaea, five species of genus Ligularia, one species of genus Pericallis and four species of genus Gynura were used to find their similarities and differences. The whole sequence length ranges from 150,551 bp (Dendrosenecio brassiciformis) to 151,267 bp (Pericallis hybrida). With the typical quadripartite parts like most land plants, the chloroplast genome has one Large single copy (LSC), one Short single copy (SSC), two Inverted regions (IRa and IRb) (Fig 1). The LSC length ranges from 82,816 bp (Jacobaea vulgaris) to 83,458 bp (Dendrosenecio cheranganiensis), the SSC length ranges from 17,749 bp (D. brassiciformis) to 18,331 bp (P. hybrida) and the IRs length both range 24,688 bp (D. brassiciformis) to 24,845 bp (P. hybrida) ( Table 1). Changes in each region are not consistent with changes in whole chloroplast genome. J. vulgaris has the shortest chloroplast genome in length but its SSR region is longer than 4 Gynura species. In addition, there are 95 coding genes in the chloroplast genome of P. hybrida and 87 coding genes in the J. vulgaris. GCcontent varies between 37.2% and 37.5%. Only the rRNA number is conserved in chloroplast genome of tribe Senecioneae, which is same as family Adoxaceae and Caprifoliaceae [16], but different from genus Oresitrophe and Mukdenia [15].

Microsatellites and Repeats type
Number of micosatellites with mono-,di-and tri-nucleotide repeat motifs varies in the tribe. D. brassiciformis, J. vulgaris and L. hodgsonii do not have tri-nucleotide repeat motifs while four Gynura species have 4 to 5 tri-nucleotide repeat motifs. The number of mono-nucleotide repeat motif is 28 to 38, accounting for the largest proportion (Fig 2a).
The unit size of microsatellites is significantly different in four Urochloa species [13], which has tetra-nucleotide repeat motif and the tri-nucleotide is the largest in the proportion. The total number of repeats types is consistent with those in the four Gynura species, but the number of each repeat type is different. Palindromic repeats are the most abundant and complement repeats are secondary in 16 Senecioneae species (Fig 2b).
Compared with the Oresitrophe species [15], the Senecioneae species have 5 to 12 reverse repeats, but the Oresitrophe species do not have reverse repeats. In addition to this, forward and palindromic repeats number is similar in the Oresitrophe species.

Contraction and Expansion of Inverted Repeats
The chloroplast genome is highly conserved in land plants, but the IR expansion and 6 contraction leads to the different genome sizes of different plants [41]. 16  hodgsonii. In that region, D.cheranganiensis is close to P. hybrida , but different from the four Gynura species and L. hodgsonii (Fig 4B). That region has total 12 genes and 7 genes are coding the NAD(P)H dehydrogenase (NDH) complex subunit. The function of NAD(P)H dehydrogenase (NDH) complex is well-known in the photosystem I (PSI) cyclic electron flow (CEF) and chlororespiration [42][43], so the substitution of ndh genes was further studied. The ratio of non-synonymous(dN)/synonymous substitution (dS) rate were 7 calculated by PAML program. The ratio > 1 indicates positive selection, the ration < 1 indicates purifying selection and the ratio = 1 indicates neutral evolution. All the dN/dS ratio of 7 genes below 1 indicates that they are under purifying selection and little amino acid change happened (Table 2). That means the functions of 7 ndh genes are conservative during evolution, although they are not located in a conservative area.

Phylogenetic Relationships
The sequence alignment of 16 Senecioneae species was used to construct the ML (Fig 5) and BI (Fig S2) tree. In ML tree, two major clades were constructed with 100% bootstrap value. One clade includes the genera Gynura and Ligularia, and the other clade includes the genera Dendrosenecio, Pericallis and Jacobaea. In genus Gynura, G. bicolor was the first to differentiate, followed by G. divaricata, at last were the G. formosana and G.
pseudochina. The two clades divergence result is consistent with the sequence divergence in 25,000 to 50,000 bp region. The former systemic phylogenies of tribe Senecioneae based on ITS region (nuclear) and plastid fragment sequences show a significant different with the phylogenetic tree [44]. In the former phylogenetic tree, genus Ligularia belongs to Tussilagininae grade, which was in the earlier diverging lineage than other genus. The sequence between four Gynura species and five Ligularia species is relatively conserved and the Pi value of most sequence locations are below 0.1 (Fig S3), significantly lower than the 16 species alignment. From the perspective of whole chloroplast genomes, genus Ligularia is close to genus Gynura.

Divergence Time Estimation
For the divergence time estimation of the 16 Senecioneae species, Artemisia gmelinii and Chrysanthemum boreale (Tribe Anthemideae) were selected as outgroup due to oldest Artemisia fossil pollen [38][39]. The divergence time of 16 Senecioneae species were estimated by BEAST2.0 program (Fig 6). The divergence clades of these genera are same with ML tree. The two major clades were expected to differentiate at 37.4 mya (late Eocene). Both Gynura and Ligularia differentiated at 5.8 mya (late Miocene).
Dendrosenecio and Pericallis also differentiated at5.8 mya. The divergence time of tribe Senecioneae and Anthemideae was at 51.39 mya (early Eocene) and the result was consistent with previous study on the evolution and phylogenetic of family Asteroideae basing on the plastid fragment sequences [39]. The traditional view on divergence time of the genus Gynura is in the Old World after the Atlantic opening. In that time, the Senecioid species were transferred to South America and the divergence began [45]. The divergence time of Gynura species was about 0.3 mya and the result showed that the divergence time of genus Gynura was much earlier than the traditional views. The divergence time of genus Gynura could not start at hundreds or thousands years ago [45] and the divergence time estimated by BEAST program was in the same time period with other genera of land plants [13][14][15][16]. The Gynura bicolor, G. divaricata, G. formosana and G. pseudochina plants grow in greenhouse with normal sunlight and temperature. The DNA was extracted from their fresh leaves by CTAB method [17] and DNA degradation and contamination was monitored on 1% agarose gels. About 1.5μg DNA sample was fragmented by sonication to a size of 350bp, then DNA fragments were end polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing with further PCR amplification. After PCR products purification (AMPure XP system), libraries were analyzed for size distribution by Agilent 2100 Bioanalyzer and quantified by using real-time PCR.

Conclusion
The libraries constructed above were sequenced by Illumina HiSeq X Ten platform and 150bp paired-end reads (PE150) were generated with insert size around 350bp. Quality control (QC) is removing reads with ≥10% unidentified nucleotides (N), > 50% bases having phred quality < 5 and > 10 nt aligned to the adapter, allowing ≤10% mismatches.
The perl script NOVOPlasty 2.7.2 [18] was used to assemble the chloroplast genome sequence with 50 K-mer. The chloroplast genome sequence of Dendrosenecio cheranganiensis (Tribe Senecioneae) was selected as the reference genomes. The family Asteraceae plants used in the study were downloaded from Genebank, as follows:

Repeat structure analysis
The microsatellite regions is a tract of repetitive DNA in which certain DNA motifs (ranging in length from 1-6 or more base pairs) are repeated, typically 5-50 times [24][25]. The perl script Microsatellite identification tool (MISA, http://pgrc.ipkgatersleben.de/misa/misa.html) were used to find the microsatellite regions of chloroplast genome. Considering the features of plant chloroplasts, the numbers of each unit of continuous DNA motifs is set to 1-6, and the minus DNA motifs of each unit is 1-10, 2-6, 3-5, 4-5, 5-5, 6-5. Forward, reverse, complement and palindromic repeats type were detected by online tool REPuter [26]. The Hamming distance was setting as 1 and the minimum repeat size was 30bp.
Chloroplast genome analysis All the chloroplast genome sequences was aligned by MAFFT7.427 [27] on FFT-NS-2 module. Alignments of 7 selected genome sequences were visualized by mVISTA [28]. The DNA polymorphism (nucleotide diversity) was calculated by DnaSPv5 [29] based on alignment results.
The molecular evolutionary rates (ω) between orthologous genes were estimated by calculating the ratio of non-synonymous (dN)/synonymous substitution (dS) rate. The coding gene sequences of selected region were extracted by using the Artemis [30]. The gene sequences of each species were aligned by Clustal X [31] with default parameters, and the alignment results (dnd format) were converted to PML format by DAMBE [32] for next analysis. The dN/dS value was calculated by the codeml module (seqtype=1, model=0, Nsites=1,7,8) in PAML4.9i [33]. Significant differences were calculated by Likelihood-ratio test.

Phylogenetic analysis
The 16 chloroplast genome sequences of tribe Senecioneae (family Asteraceae ) were aligned by MAFFT and the results was used to analyze the phylogenetic relationships.

Divergence time estimation
The divergence time of 16 species was estimated by BEAST2 [36]. The oldest Artemisia fossil pollen has been recorded from the Eocene-Oligocene boundary [37][38]. The Asteraceae family plant Artemisia gmelinii and Chrysanthemum boreale were selected as outgroup and the node Artemisia-Chrysanthemum was constrained by using a lognormal distribution with an offset of 31 Ma and a mean and standard deviation of 0.5 [39]. The HKY nucleotide substitution model and priors tree Yule model was selected with strict clock. Each MCMC run had a chain length of 100,000,000 with sampling every 10,000 steps. Tracer [40] was used to read the ESS and Trace value of logged statistics to access the results. Then, the divergence time was accessed by Treeannotator program of BEAST2.

Availability of data and materials
All the data and materials are available from the corresponding authors upon request.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.