
The chloroplast genome of Chrozophora sabulosa Kar. & Kir. and its exploration in the evolutionary position uncertainty of genus Chrozophora


Chrozophora sabulosa Kar. & Kir. is a biennial herbaceous plant of the Euphorbiaceae family with medicinal properties. This research aimed to identify the genetic characteristics and phylogenetic position of the Chrozophora genus within the Euphorbiaceae family. The evolutionary position of the genus was previously unknown due to insufficient research. Therefore, to determine the evolutionary link between C. sabulosa and related species, we sequenced the C. sabulosa chloroplast (cp.) genome on the Illumina NGS platform. The genome was 156,488 bp in length. It had a quadripartite structure consisting of two inverted repeats (IRb and IRa) of 24,649 bp each, separated by an 87,696-bp LSC region and a 19,494-bp SSC region. The cp. genome contained 113 unique genes, including four rRNA genes, 30 tRNA genes, and 79 CDS genes. In the second copy of the inverted repeat, there were 18 duplicated genes. The C. sabulosa cp. genome lacks the petD, petB, rpl2, and rpl16 introns. The analysis of simple sequence repeats (SSRs) revealed 93 SSR loci of 22 types and 78 oligonucleotide repeats of four kinds. The phylogenetic investigation showed that the Chrozophora genus evolved paraphyletically from other members of the Euphorbiaceae family. To support the phylogenetic findings, we selected species from the Euphorbiaceae and Phyllanthaceae families to compare with C. sabulosa for Ks and Ka substitution rates, InDels, IR contraction and expansion, and SNPs. The results of these comparative studies align with the phylogenetic findings. We identified six highly polymorphic regions shared by both families, which could be used as molecular identifiers for the Chrozophora genus (rpl33-rps18, rps18-rpl20, rps15-ycf1, ndhG-ndhI, psaI-ycf4, petA-psbJ). The cp. genome sequence of C. sabulosa sheds light on the evolution of plastid sequences in Chrozophora species. This is the first time the cp. genome of a member of the Chrozophora genus has been sequenced, serving as a foundation for future sequencing of other species within the Chrozophoreae tribe and facilitating in-depth taxonomic research. The results of this research will also aid in identifying new Chrozophora species.



The Euphorbiaceae family has approximately 9,000 species spread across 340 genera and 52 tribes, primarily found in tropical regions [1,2,3,4,5,6,7]. The genus Chrozophora A. Juss. belongs to the Chrozophoreae tribe of the Acalyphoideae subfamily of the Euphorbiaceae family [8]. Chrozophora is found in the Mediterranean, tropical Africa, and West Asia, as well as in Pakistan’s tropical and temperate zones [2]. According to World Flora Online, Chrozophora has nine accepted species (C. gangetica Gand., C. brocchiana Schweinf., C. mujunkumi T. Nasimova, C. oblongifolia (Delile) A. Juss. ex Spreng., C. plicata (Vahl) A. Juss. ex Spreng., C. rottleri (Geiseler) Spreng., C. sabulosa Kar. & Kir., C. senegalensis (Lam.) Spreng., C. tinctoria (L.) A. Juss.). Chrozophora sabulosa Kar. & Kir., known as Nilakari, is an important medicinal plant from the Euphorbiaceae family [1].

The family Euphorbiaceae is challenging to understand due to its wide distribution range, many species, and poorly defined genera. However, it has been confirmed that Euphorbiaceae is monophyletic based on molecular and embryological features [8,9,10]. Webster [6] classified the Euphorbiaceae family into five subfamilies based on the number of ovules per ovary locule: Phyllanthoideae, Oldfieldioideae, Acalyphoideae, Crotonoideae, and Euphorbioideae. In the Angiosperm Phylogeny Group (APG II 2003) [11] classification, the family was divided into four groups: Euphorbiaceae s.s., Phyllanthaceae, Picrodendraceae, and Putranjivaceae, all in the clade Malpighiales. The subfamilies with uniovulate ovary locules (Euphorbioideae, Acalyphoideae, and Crotonoideae) are considered Euphorbiaceae s.s. [11]. The family was further divided into several subfamilies and tribes based on molecular data [12], and some genera were moved to independent families [5]. APG III (2009) [13] divides it into four subfamilies: Acalyphoideae, Cheilosoideae, Crotonoideae, and Euphorbioideae. Despite extensive research by botanists in taxonomy, anatomy, phytochemistry, economic botany, and molecular systematics, knowledge of this family still has significant gaps, even regarding morphology [5]. Detailed molecular, morphological, and anatomical studies involving many genera are required to propose a more robust classification for this family. Moreover, some genera within the family, such as Chrozophora, still have uncertain taxonomic positions. To understand the evolutionary relationship between C. sabulosa and related plants, it is necessary to sequence chloroplast genomes from the Chrozophora genus and the Chrozophoreae tribe.

Previous studies have used both molecular and morphological data to perform phylogenetic reconstructions. However, it has been suggested that molecular approaches are more reliable in phylogenetics [8,9,10]. Among molecular approaches, cp. genome sequences have gained significant interest in plant phylogenetics, phylogeography, and molecular evolution investigations in recent years [14]. The cp. genome has highly conserved gene content and gene order [15]. Moreover, it is smaller, has fewer nucleotide alterations, and has fewer genome sequence reorganizations than the nuclear and mitochondrial genomes. These characteristics make it an excellent tool for understanding genome evolution in complicated angiosperm families [14,15,16,17,18]. As a result, cp. genomes provide valuable data that can be easily combined with other molecular data to validate complicated evolutionary connections and perform comprehensive phylogenetic analyses [17]. More than 400 species of Euphorbiaceae have had their cp. genomes sequenced using high-throughput sequencing techniques [10]. However, no data on the chloroplast genomes of Chrozophora were available, which limits what can be said about its phylogenetic position in the family. Additionally, no members of the Chrozophoreae tribe have had their cp. genomes sequenced yet, which leaves the tribe’s exact phylogenetic position in doubt. The present study aimed to investigate the cp. genome of C. sabulosa to clarify the evolutionary position of the Chrozophora genus in the Euphorbiaceae family. By contributing molecular and phylogenetic data on the Chrozophora genus, this study may aid in species identification and determination of its evolutionary position. The C. sabulosa cp. genome is the first from the Chrozophora genus and the Chrozophoreae tribe to be sequenced. The results of this research may provide a foundation for phylogenetic investigations of the Chrozophoreae tribe.


C. sabulosa cp. genome assembly and its characteristics

The Illumina HiSeq2500 generated 10.1 GB of raw data for C. sabulosa through paired-end sequencing with 150 bp reads. The de novo assembled cp. genome had an average coverage depth of 271× and was 156,488 bp long, comprising two inverted repeats (IRb and IRa) of 24,649 bp each, an LSC region of 87,696 bp, and an SSC region of 19,494 bp (Fig. 1). The total GC content of C. sabulosa was 36.5%, with the highest GC content (43.3%) found in the IRs, followed by 34.1% in the LSC region and 30% in the SSC region. The cp. genome contained 113 distinct genes, including 79 CDS genes, four rRNA genes, and 30 tRNA genes, as shown in Table 1. The LSC segment had 84 genes, while the SSC segment had 13. Table S1 lists the 15 genes (out of 113) with introns, including three with two introns (clpP, ycf3, rps12), five tRNA, and five CDS genes with one intron. The rps12 gene was present in two copies and was trans-spliced. The psbL gene in C. sabulosa began with an ACG codon, resulting in threonine as the first amino acid. The cp. genome lacked the petB, petD, rpl16, and rpl2 introns. Table 2 provides a detailed description of the genes in C. sabulosa based on their function.
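The reported region sizes and GC percentages can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it sums the quadripartite regions and computes a length-weighted GC content from the per-region values given above. The function names are illustrative.

```python
# Region lengths (bp) and GC percentages as reported in the text.
LSC, SSC, IR = 87_696, 19_494, 24_649
GC_LSC, GC_SSC, GC_IR = 34.1, 30.0, 43.3


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def weighted_gc(regions: list[tuple[int, float]]) -> float:
    """Length-weighted mean GC percentage over (length, gc_percent) pairs."""
    total_len = sum(length for length, _ in regions)
    return sum(length * gc for length, gc in regions) / total_len


total = quadripartite_length(LSC, SSC, IR)
overall_gc = weighted_gc([(LSC, GC_LSC), (SSC, GC_SSC), (2 * IR, GC_IR)])

print(total)                 # total genome length in bp
print(round(overall_gc, 1))  # overall GC percentage
```

Both values agree with the reported genome length (156,488 bp) and total GC content (36.5%).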

Table 1 Detailed description of the Chrozophora sabulosa cp. genome
Fig. 1

The cp. genome map of C. sabulosa. Genes drawn outside the circle are transcribed clockwise, while genes drawn inside are transcribed anticlockwise. Color coding differentiates functional groups of genes. The shading of the inner circle indicates GC content (dark grey) and AT content (light grey). LSC denotes the large single copy, SSC denotes the small single copy, and IRb and IRa denote the inverted repeats

Table 2 Summary of Chrozophora sabulosa gene functions

RSCU and amino acid frequencies in C. sabulosa

All genomes exhibit codon usage bias, which affects translational dynamics, consistency, accuracy, and protein folding [19]. Relative synonymous codon usage (RSCU) is the observed frequency of a codon divided by the frequency expected if all synonymous codons for that amino acid were used equally. Recent studies have highlighted the significance of codon usage in the evolution of the cp. genome [20, 21]. In the coding sequences of C. sabulosa, there are 52,162 codons within 79,461 bp. The most abundant amino acid in the cp. genome of C. sabulosa is leucine (11%), followed by isoleucine (9%), and the least abundant is cysteine (1%) (for further details, see Table S2 and Fig. 2). We identified 31 codons with RSCU values greater than one, indicating that C. sabulosa uses them preferentially to encode their amino acids. The AGA codon, which codes for arginine, has the highest usage bias (2.05), while the CGC codon, which also codes for arginine, has the lowest (0.44). The codons AUG (methionine), CCC (proline), and UGG (tryptophan) showed no bias in the C. sabulosa cp. genome (RSCU = 1.00) (Table S3).
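The RSCU definition above can be illustrated with a short sketch. This is not the software used in the study; it is a minimal Python example that assumes the standard amino acid assignments (shared by the bacterial/plastid code) and a toy coding sequence invented for illustration. The names `CODON_TABLE`, `rscu`, and `toy_cds` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard codon-to-amino-acid assignments, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon: observed count / (family total / number of synonyms)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_TABLE
    )
    values = {}
    for aa in set(AMINO):
        family = [codon for codon, a in CODON_TABLE.items() if a == aa]
        family_total = sum(counts[codon] for codon in family)
        if family_total == 0:
            continue  # amino acid absent from this sequence
        expected = family_total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Toy sequence (not from C. sabulosa): Met, Arg (AGA), Arg (AGA), Arg (CGC), Trp.
toy_cds = "ATGAGAAGACGCTGG"
values = rscu(toy_cds)
print(values["AGA"], values["CGC"], values["ATG"], values["TGG"])
```

Because methionine and tryptophan are each encoded by a single codon, their RSCU is 1.00 by construction, so an RSCU of 1.00 for AUG and UGG does not by itself carry information about bias.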

Fig. 2

Amino acid frequencies (%) in C. sabulosa

RNA editing sites

PREP-cp detected 50 RNA editing sites in 21 genes of C. sabulosa. The ndhB gene had the highest number of RNA editing sites (9 sites), followed by the ndhD gene (7 sites) (Fig. 3). Thirteen editing sites (26%) involved a nucleotide change at the first codon position, while 37 editing sites (74%) involved a change at the second position. Most RNA editing sites (42%) were found in serine codons. In C. sabulosa, 90% of the edited serine codons were converted to leucine, while the remaining 10% were converted to phenylalanine. Proline codons accounted for the second-highest share of edits (18%), and threonine codons for the third-highest (14%). Proline, serine, threonine, and alanine showed multiple types of conversion, whereas leucine, arginine, and histidine showed only one. Conversions of hydrophobic amino acids [proline (9), alanine (2), and leucine (6)] occurred at 34% of all RNA editing sites. In contrast, conversions of hydrophilic amino acids accounted for 33 sites (66%), including histidine (3), threonine (7), serine (21), and arginine (2). Three non-polar to polar conversions, 14 non-polar to non-polar conversions, 30 polar to non-polar conversions, and three polar to polar conversions were also found. Table S4 details all the RNA editing sites.

Fig. 3

RNA editing sites of C. sabulosa cp. genome

Detecting simple sequence repeats (SSRs) in C. sabulosa

In this study, MISA identified 93 SSRs of 22 types, each at least 10 bp in size (Table S5). Among these, there were two types of mononucleotides (A/T), six types of dinucleotides (AG/CT, AT/AT, and AT/AT) (Table S6 & S7), two types of trinucleotides (AAT/ATT), 12 types of tetranucleotides (ACCT/AGGT, AAGG/CCTT, AAAT/ATTT, AATT/AATT, AATG/ATTC, AAAG/CTTT), and four types of pentanucleotides (AAAAG/CTTTT, AAAAT/ATTTT). Mononucleotides were the most common type of SSR detected (68%), pentanucleotides were the longest motif class, and hexanucleotides were not found in C. sabulosa (Fig. 4a). More SSRs were identified in intergenic spacer regions than in other locations (Fig. 4b). The LSC had the most SSRs, followed by the SSC, while the inverted repeats had the fewest (Fig. 4c).
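How a perfect-SSR scan of this kind works can be sketched with regular expressions. The example below is not MISA and does not reproduce its output; it is a minimal illustration in which the minimum repeat counts in `MIN_REPEATS` are assumed values (the thresholds used in the study are not stated in this section), and the test sequence is invented.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical thresholds).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound_of_shorter_unit(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str) -> list[tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat_count) for perfect SSRs; end is exclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_of_shorter_unit(motif):
                continue  # already reported under the shorter motif length
            repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


# Invented test sequence: a 12-bp poly-A run and an (AT)6 run.
toy_seq = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
ssrs = find_ssrs(toy_seq)
print(ssrs)
```

Counting motifs on both strands together (for example A/T or AG/CT) would require grouping each motif with its reverse complement, which this sketch leaves out.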

Fig. 4

SSR analysis of C. sabulosa. (a) Types of SSRs. (b) Distribution of SSRs in active cp. genome regions. (c) Location of SSRs

Oligonucleotide repeats analysis of C. sabulosa

We used the REPuter tool to identify 79 oligonucleotide repeats, comprising palindromic (P = 27), forward (F = 27), complement (C = 7), and reverse (R = 18) repeats, as illustrated in Fig. 5. These repeats ranged in size from 20 to 51 bp (Fig. 5b). The LSC contained 57 oligonucleotide repeats, the SSC had 11, and the IRs had four. Furthermore, three repeats were shared between the IR and LSC, two between LSC/SSC, and one between SSC/IR (Fig. 5c). The number of oligonucleotide repeats was highest in intergenic spacer regions (55), followed by CDS (9), transfer RNA (3), and intronic regions (2). We also detected shared repeats in the IGS/CDS (2), IGS/Intron (4), and IGS/trn (3) regions (Fig. 5d). Palindromic repeats were more frequent than other repeats (Fig. 5a). The locations, positions, and regions of the oligonucleotide repeat sequences are provided in Table S8.

Fig. 5

Oligonucleotide repeats analysis in C. sabulosa

Phylogenetic analyses

We constructed a maximum likelihood tree using CDS data from Euphorbiaceae and Phyllanthaceae cp. genomes (Fig. 6, Table S12) to investigate the evolutionary relationship of Chrozophora. The tree included C. sabulosa, 18 other genera from Euphorbiaceae, and 10 genera from Phyllanthaceae. The alignment of 31 species using MAFFT produced an 80,283 bp consensus sequence with 22,018 (22.5%) identical positions and 84.8% pair-wise identity. The tree comprised 28 nodes with bootstrap values ranging from 48 to 100. The best-fit model for this tree was GTR + F + R6. Its log-likelihood was −471155.9363, with an AIC score of 942465.8726, an AICc score of 942465.9954, and a BIC score of 943196.6990. The overall length of the tree was 0.9525, with internal branch lengths summing to 0.2132 (22% of tree length). Our analysis indicated that the Chrozophora genus evolved from other Euphorbiaceae members in a paraphyletic manner. The Chrozophora genus was closely related to the Bischofia genus (Bischofia polycarpa) of the Phyllanthaceae family. This finding suggests that the Chrozophora genus is more closely associated with Phyllanthaceae members than with other Euphorbiaceae members. The tree also supported a common ancestor of both families and showed that members of the Phyllanthaceae family share a molecular basis with members of the Euphorbiaceae family. Additionally, our analysis revealed Chrozophora’s unique position within the Euphorbiaceae family.
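The reported model-fit scores can be related to each other through the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters. The sketch below is an illustrative consistency check, not part of the original analysis. The value of k is not reported in the text; it is back-calculated here from the reported AIC and log-likelihood and should be read as an inference.

```python
# Values as reported in the text.
log_likelihood = -471155.9363
aic_reported = 942465.8726
tree_length = 0.9525
internal_length = 0.2132


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * lnl


# Back-calculate the number of free parameters implied by the reported scores.
k_implied = round((aic_reported + 2 * log_likelihood) / 2)

# Share of the total tree length contributed by internal branches, in percent.
internal_share = 100 * internal_length / tree_length

print(k_implied)
print(round(internal_share, 1))
```

Recomputing the AIC with the implied k reproduces the reported score, and the internal-branch share rounds to the reported 22%.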

Fig. 6

CDS-based ML tree of Euphorbiaceae and Phyllanthaceae species

Comparison of C. sabulosa with other Euphorbiaceae and Phyllanthaceae species

We selected eight plant species from two families, Euphorbiaceae and Phyllanthaceae, to compare their chloroplast genomes with that of C. sabulosa. The four selected Euphorbiaceae species are Ricinus communis, Manihot esculenta, Jatropha curcas, and Euphorbia helioscopia, while the four Phyllanthaceae species are Antidesma bunius, Breynia fruticosa, Glochidion chodoense, and Phyllanthus urinaria. A basic comparison is shown in Table 3. We compared the length of the chloroplast sequence and the quadripartite structure of each species. The length of the chloroplast sequence varied from 155,630 bp (B. fruticosa) to 163,856 bp (J. curcas), and each segment of the quadripartite structure was comparable across the analyzed plastomes. M. esculenta and E. helioscopia had the highest number of genes (132), while B. fruticosa and P. urinaria had the fewest (129). The overall GC content of these cp. genomes varied from 35.4 to 36.7%, and the gene content was comparable, except for a few missing or added genes. The infA gene was present in C. sabulosa and A. bunius but not in the other seven species. The rps16 gene was absent in J. curcas and E. helioscopia, while the petD intron was absent in C. sabulosa, R. communis, E. helioscopia, and G. chodoense. The introns of petB and rpl16 were absent in C. sabulosa, R. communis, and E. helioscopia. The rpl2 intron was absent only in the plastome of C. sabulosa. We used Geneious Prime 2021.1.1 and a MAFFT alignment of the cp. genomes from the nine species to compare the relative placements of genes across the species. The 192,904-bp consensus sequence had 103,626 (54%) identical sites and 80% pair-wise identity. This analysis shows that members of the two families are closely associated and that C. sabulosa is closely linked with members of both. These comparisons are consistent with the phylogenetic results and show that the chloroplast genome helps in understanding the evolutionary relationships within the Euphorbiaceae and Phyllanthaceae families.

Table 3 Comparison of the C. sabulosa cp. genome with four Euphorbiaceae and four Phyllanthaceae species

The expansion and contraction of IRs

The junctions between the four regions (LSC, IRb, SSC, IRa) and their flanking genes were examined in C. sabulosa and the selected cp. genomes (Fig. 7). The ycf1 gene at the JSA (SSC/IRa) junction was functional in all species. However, in R. communis, M. esculenta, J. curcas, E. helioscopia, A. bunius, and G. chodoense, a pseudo copy of ycf1 was detected at the JSB (IRb/SSC) border, while it was absent in C. sabulosa, B. fruticosa, and P. urinaria. The size and position of the ndhF gene varied at the JSB border. The rps19 and rpl2 genes were entirely within the IRs in C. sabulosa, E. helioscopia, A. bunius, and P. urinaria, but they were found in varied locations in the other five species. The trnH gene was detected at the JLA (IRa/LSC) boundary in all species except A. bunius, which had two copies of this gene in the IR regions. The same genes varied in location and size at each junction of the cp. genomes, indicating variation in gene content at the boundaries. A thorough analysis of IR contraction and expansion is shown in Fig. 7 (IRSCOPE analysis). These findings suggest that the nine cp. genomes differ slightly in size and gene placement.

Fig. 7

The IRSCOPE analysis of Euphorbiaceae and Phyllanthaceae species

The Ka, Ks substitutions, and Ka/Ks rate

Pair-wise alignments of C. sabulosa genes with those of the eight selected cp. genomes were performed to determine the Ka/Ks ratio (Fig. 8). Across all comparisons, the Ka/Ks ratio was typically less than one. Genes for which Ka/Ks ratios were unavailable (N/A) were set to zero (see Table S10). After excluding the genes with a Ka or Ks value of zero, the average Ka/Ks ratio was 0.20, indicating that the genes in the cp. genome of C. sabulosa were under strong purifying selection. The average Ka/Ks ratio for Euphorbiaceae species was 0.212, while it was 0.184 for the Phyllanthaceae species. The psbI and petN genes had a Ka/Ks ratio of zero in all species, making them the most stable genes in both families. Most of the genes exhibited a Ka/Ks ratio below one in all comparisons, and their values were consistent, except for the petD, ndhK, cemA, rpl23, and rpl20 genes, whose ratios varied between comparisons. For instance, in R. communis, E. helioscopia, and G. chodoense, the Ka/Ks ratio of petD was above one, but in the other five comparisons, it was less than one. Compared to R. communis, the Ka/Ks ratio of petD was 9.1, whereas it was only 0.03 when compared to J. curcas. Similarly, the Ka/Ks value of ndhK was 1.12 compared to R. communis. The Ka/Ks values of these five genes are displayed in Table 4. The comprehensive Ka and Ks values and their ratios are available in Supplementary Table 9.
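The interpretation of Ka/Ks values used above can be sketched as a small classifier. This is an illustrative example only: the tolerance band around one (`tol`) is a hypothetical choice, and, unlike the convention described in the text (setting unavailable ratios to zero), the sketch returns `None` when Ks is zero so that undefined ratios are not confused with a true ratio of zero.

```python
from typing import Optional


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_regime(ratio: Optional[float], tol: float = 0.05) -> str:
    """Classify a Ka/Ks ratio; `tol` is an assumed band around neutrality."""
    if ratio is None:
        return "undefined"
    if ratio > 1 + tol:
        return "positive"
    if ratio < 1 - tol:
        return "purifying"
    return "neutral"


# Ratios quoted in the text for C. sabulosa comparisons.
reported = {
    "petD vs R. communis": 9.1,
    "petD vs J. curcas": 0.03,
    "ndhK vs R. communis": 1.12,
    "average over genes": 0.20,
}
regimes = {name: selection_regime(value) for name, value in reported.items()}
print(regimes)
```

A ratio far above one from a single pairwise comparison, such as the 9.1 for petD, can also arise from a very small Ks, so such values are usually inspected alongside the underlying Ka and Ks.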

Table 4 Genes showing inconsistent Ka/Ks ratios across the eight selected species
Fig. 8

Ka/Ks values for Euphorbiaceae and Phyllanthaceae species

SNP and InDel analysis in C. sabulosa

We conducted pair-wise alignments between C. sabulosa and the selected species of the Euphorbiaceae and Phyllanthaceae families and identified single nucleotide polymorphisms (SNPs) and InDel mutation events in the IR, SSC, and LSC regions. The highest number of SNPs (18,625) was found in the comparison with E. helioscopia, while the lowest (17,028) was found in the comparison with M. esculenta. All species in Table 5 had a transition-to-transversion ratio greater than one, reflecting more transitions than transversions. The LSC and SSC regions showed a higher rate of substitutions than the IR regions. The inverted repeat regions also had the fewest InDels, while the SSC and LSC regions had the most. The pair-wise alignment with J. curcas yielded the largest number of InDels (24,369), followed by P. urinaria (23,217), while E. helioscopia yielded the fewest (21,134). Table 6 provides a detailed description of the InDels and related parameters. The similar numbers of InDels and SNPs observed across the Euphorbiaceae and Phyllanthaceae species indicate their close relationship.
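The transition/transversion counts summarized above can be illustrated with a minimal sketch that scans a pairwise alignment. It is not the tool used in the study; it assumes two pre-aligned sequences of equal length and skips gap or ambiguous positions. The example sequences are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites and any site with a gap or ambiguous base.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


# Invented aligned pair for illustration.
ts, tv = count_ts_tv("ACGTACGT-A", "GTGCATGA-A")
print(ts, tv, ts / tv)
```

With two possible transitions and four possible transversions per base, a Ts/Tv ratio above one indicates that transitions occur more often than expected if all substitution types were equally likely.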

Table 5 Ts, Tv substitutions, and Ts/Tv ratio (in IRs, SSC, and LSC) of selected Euphorbiaceae and Phyllanthaceae species
Table 6 The detailed analysis of InDels, Average InDel Length, InDel Diversity K(i), InDel Diversity per site Pi(i), and alignment length in LSC, IR, and SSC regions of C. sabulosa by making pairwise alignment with eight selected species of Euphorbiaceae and Phyllanthaceae families

The nucleotide diversity and highly polymorphic loci of Euphorbiaceae and Phyllanthaceae species, with particular reference to C. sabulosa

We conducted a separate analysis comparing C. sabulosa with representatives of the Euphorbiaceae and Phyllanthaceae families to examine nucleotide diversity and highly polymorphic loci. The average nucleotide diversity was 0.0958 in Euphorbiaceae species and 0.0981 in Phyllanthaceae species. The nucleotide diversity values varied across species and regions. In Euphorbiaceae species (Table S10), it ranged from 0.0076 (rps7) to 0.2549 (rpl33-rps18), whereas in Phyllanthaceae species (Table S11), it ranged from 0.0068 (rps7) to 0.2467 (rps15-ycf1). The average nucleotide diversity was lowest in coding regions (Euphorbiaceae species 0.0624, Phyllanthaceae species 0.0652), followed by IGS regions (Euphorbiaceae species 0.1934, Phyllanthaceae species 0.1953) and intronic regions (Euphorbiaceae species 0.6178, Phyllanthaceae species 0.6193). Additionally, we identified six highly polymorphic sites shared by both families (Table 7), which can be used as mutational markers to identify and classify Chrozophora species. Figure 9 shows the nucleotide diversity values for the 93 regions selected from both families. C. sabulosa showed similar nucleotide diversity values in comparisons with members of both families, indicating that it shares characteristics of both.
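Nucleotide diversity (π) is the average number of nucleotide differences per site between pairs of sequences. The sketch below shows one simple way to compute it from a multiple alignment; it is illustrative only, the software used in the study is not named in this section, and the choice to drop every column containing a gap or ambiguous base (complete deletion) is an assumption.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Mean pairwise differences per site over columns with only A, C, G, T."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # Complete deletion: keep only columns where every base is unambiguous.
    columns = [
        col for col in zip(*(seq.upper() for seq in aligned))
        if all(base in "ACGT" for base in col)
    ]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned)), 2))
    differences = sum(
        1 for i, j in pairs for col in columns if col[i] != col[j]
    )
    return differences / (len(pairs) * len(columns))


# Invented three-sequence alignment: 4 pairwise differences over 3 pairs x 4 sites.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACTA"])
print(round(pi, 4))
```

Applying such a function region by region (each CDS, intron, and intergenic spacer) gives per-region values of the kind plotted in Fig. 9.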

Table 7 Six mutual highly polymorphic regions of selected Euphorbiaceae and Phyllanthaceae species
Fig. 9

Nucleotide diversity (π) in 93 regions common in Euphorbiaceae and Phyllanthaceae family members. The π values of Euphorbiaceae members are displayed in sky blue, whereas the π values of Phyllanthaceae members are depicted in red


This study reports the chloroplast genome of C. sabulosa for the first time. To determine the phylogenetic position of the genus, we conducted a comparative analysis with members of the Euphorbiaceae and Phyllanthaceae families. Specifically, we compared C. sabulosa to four Euphorbiaceae species (Ricinus communis, Manihot esculenta, Jatropha curcas, and Euphorbia helioscopia) and four Phyllanthaceae species (Antidesma bunius, Breynia fruticosa, Glochidion chodoense, and Phyllanthus urinaria). Our investigation covered cp. genome structure, gene content and function, GC content, intron presence or absence, amino acid frequencies, relative synonymous codon usage, RNA editing sites, SSRs, and oligonucleotide repeats. The cp. genome of C. sabulosa exhibits a typical quadripartite architecture, and its structure and genomic features are comparable to those of other Euphorbiaceae and Phyllanthaceae species [9, 10, 22,23,24,25,26].

The C. sabulosa chloroplast genome shows several notable features. The ACG codon at the start of the psbL gene of C. sabulosa encodes threonine as the first amino acid, as also observed in the cp. genomes of other plants such as the Indigofera genus [27], Spinacia oleracea (NC_002202), Nicotiana tabacum (NC_001879), Ampelopsis glandulosa (KT831767), Lycium barbarum (MH032560), and Lycium chinense (MK040922). This underscores the conserved structure of the chloroplast genome, a feature observed in various other angiosperm lineages [17, 28,29,30,31]. Our findings also revealed that GC content is not uniform across the chloroplast genomic regions: it is higher in the IR regions than in the other regions, likely due to the high GC content of the four rRNA genes in the inverted repeats [31].

Furthermore, the cp. genomes and gene content were similar across the selected Euphorbiaceae and Phyllanthaceae species. However, the infA gene was only present in C. sabulosa and A. bunius and was missing in the other Euphorbiaceae and Phyllanthaceae cp. genomes. The pseudo copy of ycf1 was absent in C. sabulosa, B. fruticosa, and P. urinaria. These findings highlight the distinctive gene content and organization of C. sabulosa and their evolutionary implications [32,33,34,35,36,37]. We also found that the petD, petB, rpl16, and rpl2 genes of C. sabulosa and some other species lacked introns, a phenomenon documented in various other angiosperms [38, 39]. The genes in which intron loss was reported earlier in other angiosperms are rpoC2, atpF, rpl2, rps12, rps16, and clpP [38,39,40,41,42]. Introns play a critical role in the control of gene expression and can boost exogenous gene expression in plant genomes to achieve desirable agronomic features [37]. The lack of specific introns may therefore cause changes in gene expression [37].

Codon usage bias in the cp. genome of plants is an essential evolutionary characteristic that affects mRNA translation, gene identification, and molecular biological investigations [32]. Some genes in plastomes show a bias towards specific codons, likely due to selective pressure [43]. In the C. sabulosa cp. genome, leucine is the most common amino acid and cysteine the least common. Similar findings have been reported in other cp. genomes, such as Eruca sativa [43], Farsetia hamiltonii [17], and Nasturtium officinale [44]. The AGA codon, which encodes arginine, had the highest usage bias in the C. sabulosa cp. genome. These findings suggest that codon usage significantly impacts the reshaping and translation of the cp. genome [17, 19,20,21, 43,44,45,46,47]. Our results also support earlier studies on the adaptive evolution of the high A/T content in chloroplast genomes, which have likewise shown a preference for specific codons [17, 19, 22, 23].

RNA editing is a post-transcriptional modification that significantly affects the sequence and function of the encoded proteins [45]. Analyzing the RNA editing sites in C. sabulosa could provide insights into how RNA editing systems evolved in plants and which editing sites may have been maintained to carry out essential functions. Most RNA editing sites were found in the ndhB gene, which encodes an NADH dehydrogenase subunit. This shows that a single gene can yield a range of protein products through RNA editing [41]. Among the RNA editing sites examined, changes at the second codon position were more prevalent than changes at other positions. RNA editing, particularly at the second codon position, can alter the encoded amino acid and the primary, secondary, and tertiary structure of the protein, which may be essential for its function [41, 45]. Most of the RNA editing sites were found in serine codons, and the most frequent change was serine to leucine, possibly increasing the hydrophobicity of the associated peptide. Our findings also support the view that RNA editing can restore amino acid conservation, increase hydrophobicity, and affect protein structure [41]. These findings are consistent with the fundamental properties of chloroplast gene RNA editing in higher plants [17, 41, 45].

In the C. sabulosa cp. genome, mononucleotide repeats were the most common SSR type and pentanucleotides were the longest. Our results are consistent with similar studies on angiosperm species, indicating that polyadenine and polythymine repeats are more abundant in cp. genomes [17, 26, 29, 44, 51, 52]. We did not observe any hexanucleotide SSRs in the C. sabulosa cp. genome, a trait shared with the Brassica napa, Nasturtium officinale, Raphanus sativus, and Fritillaria cp. genomes [44, 48, 49]. Single-copy regions had a higher percentage of oligonucleotide repeats than the inverted repeats, consistent with the conserved nature of the inverted repeats [17]. The IGS had more repeats than other cp. genome regions, indicating higher susceptibility to mutations and recombination [17, 29]. Palindromic repeats were more frequent than other types of repeats, suggesting the existence of various identical or similar sequences, either contiguous or separated by a spacer region [17, 28, 29]. Our findings are consistent with several studies conducted on angiosperms [17, 28, 29, 37, 53,54,55].

High-throughput sequencing methods have made it easier to obtain cp. genomes with vast amounts of genetic information [17, 43]. For phylogenetic research, cp. genome sequences are an excellent resource [17, 43, 53, 56, 57]. The Euphorbiaceae family is one of the most diverse angiosperm families [4, 5, 58]. However, the subfamily classification within the Euphorbiaceae family is disputed. Previously, based on pollen morphology, the Euphorbiaceae was classified into five subfamilies: Phyllanthoideae, Oldfieldioideae, Acalyphoideae, Crotonoideae, and Euphorbioideae [6]. Later, the Angiosperm Phylogeny Group [13, 59] separated the Phyllanthaceae from the Euphorbiaceae, giving it separate family status. Recently, Euphorbiaceae has been divided into four subfamilies based on molecular data [59, 60]. However, this classification is also unclear due to a lack of available data on its species. Chloroplast genomes have been used to determine the phylogenetic relationships in the Euphorbiaceae family [18, 23,24,25,26]. The systematic position of Chrozophora was unclear until this study. The findings suggest that Chrozophora is closely related to the Phyllanthaceae family, which supports the historical record of the Euphorbiaceae family being the ancestor of the Phyllanthaceae family [6]. The Chrozophora genus appears to be distinct from the other Euphorbiaceae genera, indicating its paraphyletic origin. Our results also confirm that the Euphorbiaceae family’s immense diversity, morphological divergence, variable ecological range, and the scarcity of literature on numerous species make phylogenetic interpretations challenging [60]. Additionally, there is a critical need to sequence more chloroplast genomes of the Chrozophoreae tribe to clarify its position relative to these two families.

The cp. genome is known to be stable across different plant lineages. However, the expansion and contraction of IRs can alter the size of the cp. genome and its segments [17, 28, 53, 61, 62]. The expansion and contraction of IRs affect gene content, substitutions, and genome length, and can therefore inform a species' phylogenetic position [17, 34, 63]. Previous studies have found that changes in the boundaries of the cp. genome are caused by differences in the number and location of genes at the junctions of the inverted repeats [17, 34]. We studied the IR regions of C. sabulosa, four Euphorbiaceae species, and four Phyllanthaceae species. Our findings revealed both similarities and genetic variations among these plastomes. The IR regions were the most conserved in all plastomes, while most substitutions occurred in the LSC and SSC regions. These results are consistent with similar studies on other plastid genomes [17, 28, 62, 63, 64]. Our research also showed that gene migration between single-copy and inverted repeat regions causes mutation rate variation, making chloroplast genomes either conserved [17, 28, 31, 34] or widely polymorphic in gene content and structure [64, 65, 66, 67].

The Ka/Ks ratio indicates the selective forces that have acted upon genes during evolution. Selection can be neutral, purifying, or positive depending on the Ka/Ks ratio [17, 68, 69]. We compared the cp. genome of C. sabulosa with eight selected species to validate our phylogenetic findings. Our analysis showed that most C. sabulosa genes underwent purifying selection, which maintains their conserved function. Purifying selection on most chloroplast genes in the Euphorbiaceae and Phyllanthaceae species corroborated the phylogenetic results. However, a few genes (petD, ndhK, cemA, rpl23, and rpl20) showed atypical Ka/Ks values in representatives of both families. Our findings agree with previously published results [17, 43, 46, 70, 71]. InDels and SNPs were more frequent in the LSC region than in the inverted repeat regions. The IRs had the lowest number of mutations, indicating that they are more conserved than the single-copy regions [17, 28, 62, 63, 64]. The transition-to-transversion (Ts/Tv) ratio was greater than one in all selected species, indicating a higher rate of transitions than transversions. This suggests that the species are distant and have more SNPs [17, 22]. The high Ts/Tv ratio may be due to the GC-rich composition of the chloroplast genome [61]; similarly, nuclear genomes have been reported to have a high Ts/Tv ratio due to their GC-rich makeup [72]. The high number of InDels and SNPs (significant mutations) indicated Chrozophora's unique phylogenetic position and paraphyletic evolution.
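The reading of Ka/Ks values described in this paragraph can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name and the `tolerance` band around 1 are hypothetical choices made only for illustration, and real analyses usually also test whether a ratio differs significantly from 1.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Label a gene by its Ka/Ks ratio.

    `tolerance` is an illustrative (assumed) band around 1 within which
    the ratio is treated as neutral; it is not taken from the paper.
    """
    if ks == 0:
        # The ratio is undefined when there are no synonymous substitutions.
        return "undefined"
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"   # non-synonymous changes are selected against
    if ratio > 1 + tolerance:
        return "positive"    # non-synonymous changes are favoured
    return "neutral"


# Made-up Ka and Ks values, for illustration only.
for ka, ks in [(0.01, 0.10), (0.30, 0.10), (0.10, 0.10), (0.10, 0.0)]:
    print(ka, ks, classify_selection(ka, ks))
```

Under this convention, a gene with Ka/Ks well below 1 is read as being under purifying selection, which is the pattern reported above for most C. sabulosa genes.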

We analyzed nucleotide diversity in C. sabulosa and selected species from the Euphorbiaceae and Phyllanthaceae families. The results confirmed that IGS regions are more susceptible to genetic recombination and polymorphisms than protein-coding regions. These outcomes also supported the conserved status of the protein-coding genes reported earlier in other plastid genomes [17, 28, 31, 34]. Nucleotide diversity is low in Euphorbiaceae and Phyllanthaceae species, ranging from 0.007 to 0.24. This suggests that the plastome architecture is conserved in both families, consistent with previous studies [17, 72, 73, 74, 75, 76]. We identified six highly polymorphic regions shared by both families that could be used as molecular identifiers for the Chrozophora genus (rpl33-rps18, rps18-rpl20, rps15-ycf1, ndhG-ndhI, psaI-ycf4, petA-psbJ), all with π > 0.5. To further validate the results of this study, more species of the Chrozophora genus and the Chrozophoreae tribe must be sequenced.


This is the first cp. genome of C. sabulosa, which is also the first member of its genus and tribe to be reported. It has a typical quadripartite structure and a gene content quite similar to that of other chloroplast genomes. Our comparative analysis with other Euphorbiaceae and Phyllanthaceae species has highlighted the conserved structure of the chloroplast genome, the non-uniform distribution of GC content, and the unique gene content and organization of C. sabulosa. Our investigation of codon usage bias and RNA editing sites has provided insights into the evolutionary characteristics of the cp. genome and their potential impact on protein organization and function. The phylogenetic analysis in this study revealed Chrozophora's unique position in the Euphorbiaceae family, supporting the idea that this genus is paraphyletic. This chloroplast genome from C. sabulosa will be useful for the molecular characterization of related Chrozophoreae tribe species in the future. The phylogenetic data presented in this study will also aid in determining the position of the genus within the Euphorbiaceae family. The highly polymorphic loci identified in this study could be used as markers for future Chrozophora species identification. Furthermore, it is essential to sequence other Chrozophora species to fully understand the phylogenetic position and evolutionary dynamics of the genus. Overall, our study provides valuable information on the chloroplast genome of C. sabulosa and its evolutionary implications for the scientific community.

Materials and methods

Collection of plant material and its sequencing

Nida Javaid and Shagufta Jabeen conducted the formal identification of the plant material for this study. Fresh leaves of Chrozophora sabulosa plants were collected from the Lesser Cholistan desert in Pakistan (28.7719699, 71.3346211), and verification was carried out at the Cholistan Institute of Desert Studies (CIDS) of the Islamia University Bahawalpur. A herbarium specimen was submitted to the CIDS for identification. The voucher number CIDS/IUB-1601/59 was issued for Chrozophora sabulosa Kar. & Kir. Nilkari of Euphorbiaceae. For DNA extraction, the phenol-chloroform (organic) procedure [77] was used with a few modifications. These modifications included using 1 µL of 2-mercaptoethanol and precipitating the DNA with absolute ethanol, followed by a wash with 70% ethanol. The quantity and quality of the extracted DNA were then assessed using a Nanodrop and 1% agarose gel electrophoresis. Whole-genome shotgun sequencing was performed on a paired-end library (150 bp reads, 350 bp insert size) using the Illumina HiSeq 2500 at the Beijing Institute of Genomics (BIG), Beijing, China.

Chloroplast genome assembly and genes annotation

We used FastQC software [78] to check the quality of the raw reads, and NOVOPlasty [79] to assemble the cp. genome. The LSC, SSC, and IR regions were defined manually by inspecting the sequence scaffolding. We employed GeSeq [80] and CpGAVAS [81] to annotate the assembled cp. genome. The annotation was manually verified using MAFFT alignment (Multiple Alignment with Fast Fourier Transform) [82] in Geneious Prime 2023.2.1 software [83]. To confirm the tRNA genes, the tRNAscan-SE 1.23 program was used [84]. We determined the average sequencing coverage depth for the assembled C. sabulosa genome by mapping the short reads to the de novo assembled cp. genome using BWA [86] and Tablet [85]. We created the circular map of the cp. genome using OGDraw v1.2 [87]. The C. sabulosa cp. genome was submitted to GenBank and assigned the accession number MW541931. We also submitted the raw data obtained in this work to the Sequence Read Archive (SRA) under project number PRJNA660981.

RNA editing site, codon usage, and amino acid frequency

We used Geneious Prime 2023.2.1 to analyze the amino acid frequency, and MEGA-X to examine Relative Synonymous Codon Usage (RSCU) in the protein-coding sequences of C. sabulosa [88]. Additionally, we used Predictive RNA Editors for Plants Chloroplast (PREP-cp) to find RNA editing sites in 21 protein-coding genes [89].
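For readers unfamiliar with RSCU, the following Python sketch shows the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is only an illustration and not the MEGA-X implementation; the function name `rscu` and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families (one family per amino acid).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                           # amino acid absent from this CDS
        expected = total / len(family)         # equal-usage expectation
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy sequence (not from C. sabulosa): Met followed by four alanine codons.
example = rscu("ATGGCTGCTGCTGCC")
print(example["GCT"], example["GCC"], example["ATG"])
```

An RSCU above 1 means a codon is used more often than expected under equal synonymous usage, and a value below 1 means it is used less often.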

Detecting simple sequence repeats (SSRs) and oligonucleotide repeats

The Perl script MIcroSAtellite Identification Tool (MISA) [90] was used to identify simple sequence repeats (SSRs), with a minimum repeat count of ten for mono-, five for di-, four for tri-, and three for tetra-, penta-, and hexanucleotide motifs. Additionally, the REPuter program was used to detect reverse (R), complementary (C), palindromic (P), and forward (F) oligonucleotide repeats with an edit distance of two and a minimum repeat size of 10 bp. The maximum number of computed repeats was set to 100 [91].
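The SSR thresholds above can be sketched in a few lines of Python. This is an illustrative regular-expression scan, not MISA itself: it only finds perfect repeats, does not report compound SSRs, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text above.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each locus is reported once with its shortest motif.
            redundant = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if not redundant:
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence containing one poly-A run and one AT dinucleotide repeat.
toy = "GGC" + "A" * 12 + "CGG" + "AT" * 6 + "GCC"
print(find_ssrs(toy))
```

Positions are 0-based offsets into the input string. Applying the function to a full plastome sequence would give a rough SSR count by motif length, although the totals may differ from MISA output because of the simplifications noted above.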

Phylogenetic analysis

Plastomes of 18 species from the Euphorbiaceae family were selected from the National Center for Biotechnology Information (NCBI) for phylogenetic analysis (Table S12). These species represent 11 tribes and three subfamilies of the Euphorbiaceae: Acalyphoideae, Crotonoideae, and Euphorbioideae. Additionally, 10 species of the Phyllanthaceae family and two outgroups (Mangifera indica and Lannea coromandelica) from the Anacardiaceae family were chosen for analysis. In total, 31 species were included in the phylogenetic tree. The protein-coding sequences of each species were extracted and concatenated using Geneious Prime 2021.1.1. The sequences were then aligned using MAFFT in Geneious Prime 2023.2.1. The maximum likelihood tree was constructed online in Galaxy using IQ-TREE [92] with 1000 bootstrap replicates under the ultrafast bootstrap settings [93]. The best-fit model was selected according to the Akaike information criterion (AIC) [94]. The tree was visualized using iTOL [95].

Comparative analyses with C. sabulosa

The cp. genomes of four species from the Euphorbiaceae family (Ricinus communis, Manihot esculenta, Jatropha curcas, and Euphorbia helioscopia) and four species from the Phyllanthaceae family (Antidesma bunius, Breynia fruticosa, Glochidion chodoense, and Phyllanthus urinaria) were compared with that of C. sabulosa. These species were selected on the basis of the phylogenetic analysis. Geneious Prime 2023.2.1 was used to perform a basic comparison of the plastomes. IRscope was used to observe IR contraction and expansion at the LSC/IRB/SSC/IRA junctions among the selected species [96]. Pairwise comparisons of the 78 protein-coding genes shared by C. sabulosa and the eight selected species were performed to estimate synonymous (Ks) and non-synonymous (Ka) substitution rates. C. sabulosa was used as the reference for pairwise alignments with every gene of the selected species. First, MAFFT in Geneious Prime 2023.2.1 was used to perform the 624 pairwise alignments (78 genes × 8 species) [82, 83], and then DnaSP [97] was employed to estimate Ka and Ks substitutions. Geneious Prime 2023.2.1 was used to determine the number, coordinates, and types of substitutions (transitions and transversions). DnaSP [97] was used to find InDel mutations in each part of the pairwise-aligned cp. genomes. The alignment length, average InDel length, k(i) InDel diversity, and Pi(i) InDel diversity per site were also calculated.
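Counting transitions and transversions from a pairwise alignment, as described above, can be sketched as follows. This is a minimal illustration and not the Geneious Prime procedure; the function name `count_ts_tv` and the toy sequences are hypothetical, and positions with gaps or ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(ref, other):
    """Count transitions and transversions between two aligned sequences."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ts = tv = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip identical sites, gaps, and ambiguous bases
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1   # purine<->pyrimidine
    return ts, tv


# Toy alignment (not real data): reference first, compared species second.
ts, tv = count_ts_tv("ACGTACGTG-A", "GCGTATGAAGC")
print(ts, tv, ts / tv if tv else float("inf"))
```

A Ts/Tv ratio above one, as reported in this study, means that transitions outnumber transversions in the compared sequences.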

Nucleotide diversity to determine highly polymorphic loci

We compared nucleotide diversity (π) values among representatives of both families along with C. sabulosa. We extracted a total of 837 sequences covering 93 regions common to all nine species: 59 CDS genes, 27 IGS regions, and seven intronic regions. To create multiple alignments of these 93 regions of C. sabulosa with members of each family separately, we used MAFFT [82]. We only selected sequences longer than 200 base pairs [98]. To calculate π, we used DnaSP [97]. We identified six highly polymorphic loci, those with the greatest nucleotide diversity among the selected species [98].
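Nucleotide diversity (π) is the average number of nucleotide differences per site between all pairs of sequences in an alignment. The Python sketch below illustrates that definition; it is not the DnaSP implementation, the function name is hypothetical, and the choice to drop every column containing a gap or ambiguous base (complete deletion) is an assumption made for simplicity.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Return pi for a list of aligned sequences of equal length."""
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")

    # Keep only columns in which every sequence has an unambiguous base.
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns:
        raise ValueError("no comparable sites in the alignment")

    pairs = list(combinations(range(len(seqs)), 2))
    # Total pairwise differences summed over all retained columns.
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


# Toy alignment of one locus in three species (not real data).
print(nucleotide_diversity(["ACGT-", "ACGAA", "ACGTA"]))
```

Computing this value for each aligned region and ranking the regions is one way to shortlist candidate polymorphic loci, in the spirit of the procedure described above.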

Data availability

The datasets generated and/or analyzed during the current study are available in the NCBI repository under accession number MW541931. The raw data have been submitted to the Sequence Read Archive (SRA) under project number PRJNA660981. Further data are presented in the manuscript, and additional details are provided in the supplementary file.


  1. Malik S, Ahmad S, Sadiq A, Alam K, Wariss HM, Ahmad I, et al. A comparative ethno-botanical study of Cholistan (an arid area) and Pothwar (a semi-arid area) of Pakistan for traditional medicines. J Ethnobiol Ethnomed. 2015;11(1):1–20.

  2. Sher AA, Iqbal A, Muhammad N, Badshah SL, Emwas AH, Jaremko M. Prokinetic and Laxative effects of Chrozophora tinctoria Whole Plant Extract. Molecules. 2022;27(7):1–15.

  3. Li J, Gao X, Sang S, Liu C. Genome-wide identification, phylogeny, and expression analysis of the SBP-box gene family in Euphorbiaceae. BMC Genomics. 2019;20:1–15.

  4. Fayed AA, Soliman M, Faried A, Hassan M. Taxonomic evaluation of Euphorbiaceae Sensu Lato with special reference to Phyllanthaceae as a new family to the flora of Egypt. Biol Forum. 2020;11(1):47–64.

  5. Hruaia V, Rinmuana L, Lalbiaknunga J, Ralte L. A study of correlation between morphology and evolution of Euphorbiaceae S.L. using taxonomic congruence and total evidence. Sci Technol J. 2021;9(1):49–55.

  6. Webster GL. Classification of the Euphorbiaceae. ANN MISSOURI BOT GARD. 1994;81(1):3–32.

  7. Islam MS, Ara H, Ahmad KI, Uddin MM. A review on medicinal uses of different plants of Euphorbiaceae family. Univers J Pharm Res. 2019;4(1):47–51.

  8. Külkamp J, Riina R, Ram Y, Iganci RV. Systematics of Ditaxinae and related lineages within the subfamily Acalyphoideae (Euphorbiaceae) based on molecular phylogenetics. Biology (Basel). 2023;12(2):1–16.

  9. Asif MH, Mantri SS, Sharma A, Srivastava A, Trivedi I, Gupta P, Mohanty CS, Sawant SV, Tuli R. Complete sequence and organisation of the Jatropha curcas (Euphorbiaceae) chloroplast genome. Tree Genet Genomes. 2010;6(6):941–52.

  10. Guo LY, Zhang XF, Zhu ZX, Wang HF. Complete plastome sequence of Balakata baccata (Roxb.) Esser (Euphorbiaceae). Mitochondrial DNA Part B Resour. 2021;6(4):1387–8.

  11. APG II. An update of the Angiosperm Phylogeny Group Classification for the Orders and Families of Flowering Plants: APG II. Bot J Linn Soc. 2003;141:399–436.

  12. Wurdack KJ, Hoffmann P, Chase MW. Molecular phylogenetic analysis of uniovulate Euphorbiaceae (Euphorbiaceae Sensu Stricto) using plastid rbcL and trnL-F DNA sequences. Am J Bot. 2005;92(8):1397–420.

  13. APG III. An update of the Angiosperm Phylogeny Group Classification for the Orders and Families of Flowering Plants: APG III. Bot J Linn Soc. 2009;161:105–21.

  14. Androsiuk P, Jastrzębski JP, Paukszto L, Makowczenko K, Okorski A, Pszczółkowska A, et al. Evolutionary dynamics of the chloroplast genome sequences of six Colobanthus species. Sci Rep. 2020;10(1):1–14.

  15. Chen J, Guo Y, Hu X, Zhou K. Comparison of the chloroplast genome sequences of 13 oil-tea Camellia samples and identification of an undetermined oil-tea Camellia species from Hainan Province. Front Plant Sci. 2022;12:1–16.

  16. Ma YP, Zhao L, Zhang WJ, Zhang YH, Xing X, Duan XX, et al. Origins of cultivars of Chrysanthemum—evidence from the chloroplast genome and nuclear LFY gene. J Syst Evol. 2020;58(6):925–44.

  17. Javaid N, Ramzan M, Khan IA, Alahmadi TA, Datta R. The chloroplast genome of Farsetia Hamiltonii Royle, phylogenetic analysis, and comparative study with other members of Clade C of Brassicaceae. BMC Plant Biol. 2022;22(384):1–19.

  18. Mok YG, Hong S, Bae SJ, Cho SI, Kim JS. Targeted A-to-G base editing of chloroplast DNA in plants. Nat Plants. 2022;8(12):1378–84.

  19. Yu CH, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, Liu Y. Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell. 2015;59(5):744–54.

  20. Mittal P, Brindle J, Stephen J, Plotkin JB, Kudla G. Codon usage influences fitness through RNA toxicity. Proc Natl Acad Sci U S A. 2018;115(34):8639–44.

  21. Shen X, Guo S, Yin Y, Zhang J, Yin X, Liang C, et al. Complete chloroplast genome sequence and phylogenetic analysis of Aster tataricus. Molecules. 2018;23(10):1–14.

  22. Rehman U, Sultana N, Jamal AA, Muzaffar M, Poczai P. Comparative chloroplast genomics in Phyllanthaceae species. Diversity. 2021;13(9):1–18.

  23. Wang Z, Xu B, Li B, Zhou Q, Wang G, Jiang X, Wang C, Xu Z, et al. Comparative analysis of codon usage patterns in chloroplast genomes of six Euphorbiaceae species. PeerJ. 2020;2020(1):1–17.

  24. Khan A, Asaf S, Khan AL, Shehzad T, Al-rawahi A, Al-harrasi A. Comparative chloroplast genomics of endangered Euphorbia species: insights into hotspot divergence, repetitive sequence variation, and phylogeny. Plants. 2020;9(2).

  25. Li Z, Long H, Zhang L, Liu Z, Cao H, Shi M, Tan X. The complete chloroplast genome sequence of Tung tree (Vernicia fordii): Organization and phylogenetic relationships with other angiosperms. Sci Rep. 2017;7(1):1–11.

  26. Tangphatsornruang S, Uthaipaisanwong P, Sangsrakru D, Chanprasert J, Yoocha T, Jomchai N, Tragoonrung S. Characterization of the complete chloroplast genome of Hevea brasiliensis reveals genome rearrangement, RNA editing sites and phylogenetic relationships. Gene. 2011;475(2):104–12.

  27. Zhang N, Long J, Wu Y, Zhang Y, Wu Z. The complete chloroplast genome of Indigofera stachyodes (Fabaceae), a traditional Chinese medicinal plant. Mitochondrial DNA Part B. 2022;7(3):474–5.

  28. Yang C, Zhang N, Wu S, Jiang C, Xie L, Yang F, Yu Z. A comparative analysis of the chloroplast genomes of three Lonicera medicinal plants. Genes. 2023;14(3):6–9.

  29. Song W, Chen Z, He L, Feng Q, Zhang H, Du G, Shi C, Wang S. Comparative chloroplast genome analysis of Wax Gourd (Benincasa hispida) with three Benincaseae species, revealing evolutionary dynamic patterns and phylogenetic implications. Genes. 2022;13(3).

  30. Zhao J, Chen J, Xiong Y, He W, Xiong Y, Xu Y, et al. Organelle genomes of Indigofera amblyantha and Indigofera pseudotinctoria: comparative genome analysis, and intracellular gene transfer. Ind Crops Prod. 2023;198(1).

  31. Zhang Z, Tao M, Shan X, Pan Y, Sun C, Song L, Pei X, Jing Z, Dai Z. Characterization of the complete chloroplast genome of Brassica oleracea var. Italica and phylogenetic relationships in Brassicaceae. PLoS ONE. 2022;17(2):1–18.

  32. Li Y, Zhou J, Chen X, Cui Y, Xu Z, Li Y, Song J, Duan B, Yao H. Gene losses and partial deletion of small single-copy regions of the chloroplast genomes of two hemiparasitic taxillus species. Sci Rep. 2017;7(1):1–12.

  33. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016;17(1):1–29.

  34. Frailey DC, Chaluvadi SR, Vaughn JN, Coatney CG, Bennetzen JL. Gene loss and genome rearrangement in the plastids of five hemiparasites in the family Orobanchaceae. BMC Plant Biol. 2018;18(1):1–12.

  35. Millen RS, Olmstead RG, Adams KL, Palmer JD, Lao NT, Heggie L, et al. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell. 2001;13(3):645–58.

  36. Scobeyeva VA, Artyushin IV, Krinitsina AA, Nikitin PA, Antipin MI, Kuptsov SV, et al. Gene Loss, pseudogenization in plastomes of genus Allium (Amaryllidaceae), and putative selection for adaptation to environmental conditions. Front Genet. 2021;12(1).

  37. Liang C, Wang L, Lei J, Duan B, Mag W, Xiao S, et al. A comparative analysis of the chloroplast genomes of four Salvia medicinal plants. Engineering. 2019;5(5):907–15.

  38. Sloan DB, Triant DA, Forrester NJ, Bergner LM, Wu M, Taylor DR. A recurring syndrome of accelerated plastid genome evolution in the angiosperm tribe Sileneae (Caryophyllaceae). Mol Phylogenet Evol. 2014;72(1):82–9.

  39. Jansen RK, Wojciechowski MF, Sanniyasi E, Lee SB, Daniell H. Complete plastid genome sequence of the Chickpea (Cicer arietinum) and the phylogenetic distribution of rps12 and clpP intron losses among legumes (Leguminosae). Mol Phylogenet Evol. 2008;48(3):1204–17.

  40. Downie SR, Palmer JD. A chloroplast DNA phylogeny of the Caryophyllales based on structural and inverted repeat restriction site variation. Syst Bot. 1994;19(2):236–52.

  41. He P, Huang S, Xiao G, Zhang Y, Yu J. Abundant RNA editing sites of chloroplast protein-coding genes in Ginkgo biloba and an evolutionary pattern analysis. BMC Plant Biol. 2016;16(1):1–12.

  42. Jansen RK, Cai Z, Raubeson LA, Daniell H, DePamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A. 2007;104(49):19369–74.

  43. Zhu B, Qian F, Hou Y, Yang W, Cai M, Wu X. Complete chloroplast genome features and phylogenetic analysis of Eruca sativa (Brassicaceae). PLoS ONE. 2021;16(3):1–19.

  44. Yan C, Du J, Gao L, Li Y, Hou X. The complete chloroplast genome sequence of Watercress (Nasturtium officinale R. Br.): genome organization, adaptive evolution and phylogenetic relationships in Cardamineae. Gene. 2019;699(1):24–36.

  45. Tang D, Wei F, Kashif MH, Munsif F, Zhou R. Identification and analysis of RNA editing sites in chloroplast transcripts of Kenaf (Hibiscus cannabinus L). Biotech. 2019;9(10):1–8.

  46. Du X, Zenga T, Fenga Q, Hua L, Luoa X, Wenga Q, Hea J, Zhu B. The complete chloroplast genome sequence of yellow mustard (Sinapis alba L.) and its phylogenetic relationship to other Brassicaceae species. Gene. 2019;731:144340.

  47. Redwan RM, Saidin A, Kumar SV. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae. BMC Plant Biol. 2015;15(1):1–20.

  48. Hu ZY, Hua W, Huang SM, Wang HZ. Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications. Genet Resour Crop Evol. 2011;58(6):875–87.

  49. Bi Y, Zhang MF, Xue J, Dong R, Du YP, Zhang XH. Chloroplast genomic resources for phylogeny and DNA barcoding: a case study on Fritillaria. Sci Rep. 2018;8(1):1–12.

  50. Mustafina FU, Yi DK, Choi K, Shin CH, Tojibaev KS, Downie SR. A comparative analysis of complete plastid genomes from Prangos fedtschenkoi and Prangos Lipskyi (Apiaceae). Ecol Evol. 2019;9(1):364–77.

  51. Loeuille B, Thode V, Siniscalchi C, Andrade S, Rossi M, Pirani JR. Extremely low nucleotide diversity among thirty-six new chloroplast genome sequences from Aldama (Heliantheae, Asteraceae) and comparative chloroplast genomics analyses with closely related genera. PeerJ. 2021;9:1–36.

  52. Alzahrani D, Albokhari E, Yaradua S, Abba A. Complete chloroplast genome sequences of Dipterygium glaucum and Cleome chrysantha and other Cleomaceae species, comparative analysis and phylogenetic relationships. Saudi J Biol Sci. 2021;28(4):2476–90.

  53. Liang H, Zhang Y, Deng J, Gao G, Ding C, Zhang L, Yang R. The complete chloroplast genome sequences of 14 Curcuma species: insights into genome evolution and phylogenetic relationships within Zingiberales. Front Genet. 2020;11:1–17.

  54. Wang J, Qian J, Jiang Y, Chen X, Zheng B, Chen S, et al. Comparative analysis of chloroplast genome and new insights into phylogenetic relationships of Polygonatum and tribe polygonateae. Front Plant Sci. 2022;13:e882189.

  55. Cui G, Wang C, Wei X, Wang H, Wang X, Zhu X, et al. Complete chloroplast genome of Hordeum brevisubulatum: genome organization, synonymous codon usage, phylogenetic relationships, and comparative structure analysis. PLoS ONE. 2021;16(12):1–19.

  56. Miao S, Luo Y, Bautista MAC, Chen T. Complete plastid genome characterization and phylogenetic analysis of Pentasachme Caudatum Wallich, ex Wight (Gentianales: Apocynaceae) from Guangdong, China. Mitochondrial DNA Part B Resour. 2021;6(3):858–9.

  57. APG IV. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181:1–20.

  58. Mwine JT, Damme MMV, Marzouk P, Hussein SR, Kassem MES, Kawashty SA, El Negoumy SIM. Why do Euphorbiaceae tick as medicinal plants? A review of Euphorbiaceae family and its medicinal features. J Med Plants Res. 2011;5(5):652–62.

  59. Secco RDS, Cordeiro I, Senna-Vale LD, Sales MFD, Lima LRD, Medeiros D, et al. An overview of recent taxonomic studies on Euphorbiaceae s. l. in Brazil. Rodriguésia. 2012;63(1):227–42.

  60. Menezes APA, Resende-Moreira LC, Buzatti RSO, Nazareno AG, Carlsen M, Lobo FP, et al. Chloroplast genomes of Byrsonima species (Malpighiaceae): comparative analysis and screening of high divergence sequences. Sci Rep. 2018;8(1):e20189–4.

  61. Meng D, Xiaomei Z, Wenzhen K, Xu Z. Detecting useful genetic markers and reconstructing the phylogeny of an important medicinal resource plant, Artemisia selengensis, based on chloroplast genomics. PLoS ONE. 2019;14(2):1–19.

  62. Li QJ, Su N, Zhang L, Tong R, Zhang X, Wang J, et al. Chloroplast genomes elucidate diversity, phylogeny, and taxonomy of Pulsatilla (Ranunculaceae). Sci Rep. 2020;10(1):1–12.

  63. Cao J, Jiang D, Zhao Z, Yuan S, Zhang Y, Zhang T, et al. Development of chloroplast genomic resources in Chinese Yam (Dioscorea polystachya). Biomed Res Int. 2018;e6293847.

  64. Liu L, Wang Y, He P, Li P, Lee J, Soltis DE, Fu C. Chloroplast genome analyses and genomic resource development for epilithic sister genera Oresitrophe and Mukdenia (Saxifragaceae), using genome skimming data. BMC Genomics. 2018;19(1):1–17.

  65. Schwarz EN, Ruhlman TA, Sabir JSM, Hajrah NH, Alharbi NS, Al-Malki AL, et al. Plastid genome sequences of Legumes reveal parallel inversions and multiple losses of rps16 in Papilionoids. J Syst Evol. 2015;53(5):458–68.

  66. Guisinger MM, Kuehl JV, Boore JL, Jansen RK. Extreme reconfiguration of plastid genomes in the angiosperm family Geraniaceae: rearrangements, repeats, and codon usage. Mol Biol Evol. 2011;28(1):583–600.

  67. Dong WL, Wang RN, Zhang NY, Fan WB, Fang MF, Li ZH. Molecular evolution of chloroplast genomes of orchid species: insights into phylogenetic relationship and adaptive evolution. Int J Mol Sci. 2018;19(3):e19030716.

  68. Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17(1):32–43.

  69. Guo X, Liu J, Hao G, Zhang L, Mao K, Wang X, Zhang D, et al. Plastome phylogeny and early diversification of Brassicaceae. BMC Genomics. 2017;18(1):1–9.

  70. Zhao B, Liu L, Tan D, Wang J. Analysis of phylogenetic relationships of Brassicaceae species based on chs sequences. Biochem Syst Ecol. 2010;38(4):731–9.

  71. Choi KS, Kwak M, Lee B, Park SJ. Complete chloroplast genome of Tetragonia tetragonioides: molecular phylogenetic relationships and evolution in Caryophyllales. PLoS ONE. 2018;13(6):1–11.

  72. Alipour H, Bihamta MR. Genotyping-by-sequencing (GBS) revealed molecular genetic diversity of Iranian wheat landraces and cultivars. Front Plant Sci. 2017;8(1):1–14.

  73. Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: Reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci U S A. 2015;112(33):10177–84.

  74. Cai J, Ma PF, Li HT, Li DZ. Complete plastid genome sequencing of four Tilia species (Malvaceae): a comparative analysis and phylogenetic implications. PLoS ONE. 2015;10(11):1–13.

  75. Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol Biol. 2011;76(3–5):273–97.

  76. Xia Y, Chen F, Du Y, Liu C, Bu G, Xin Y, Liu B. A modified SDS-based DNA extraction method from raw soybean. Biosci Rep. 2019;39(2):1–10.

  77. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. [Online]. Available:

  78. Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017;45(4):1–9.

  79. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Greiner S. GeSeq - Versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6–11.

  80. Shi L, Chen H, Jiang M, Wang L, Wu X, Huang L, Liu C. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res. 2019;47(W1):W65–73.

  81. Katoh K, Kuma KI, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33(2):511–8.

  82. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9.

  83. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33(2):686–9.

  84. Milne I, Bayer M, Cardle L, Shaw P, Stephen G, Wright F, Marshall D. Tablet-next generation sequence assembly visualization. Bioinformatics. 2009;26(3):401–2.

  85. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95.

  86. Greiner S, Lehwark P, Bock R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019;47(W1):W59–64.

  87. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9.

  88. Mower JP. The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments. Nucleic Acids Res. 2009;37(2):253–9.

  89. Thiel T, Michalek W, Varshney RK, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003;106(3):411–22.

  90. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29(22):4633–42.

  91. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Haeseler AV, Lanfear R. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4.

  92. Hoang DT, Chernomor O, Haeseler AV, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 2018;35(2):518–22.

  93. Kalyaanamoorthy S, Minh BQ, Wong TKF, Haeseler AV, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9.

  94. Letunic I, Bork P. Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:256–9.

  95. Amiryousefi A, Hyvönen J, Poczai P. The chloroplast genome sequence of Bittersweet (Solanum dulcamara): plastid genome structure evolution in Solanaceae. PLoS ONE. 2018;13(4):e0196069.

  96. Rozas J, Ferrer-Mata A, Sanchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, Sanchez-Gracia A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.

  97. Javaid N, Ramzan M, Jabeen S, Shah MN, Danish S. Genomic exploration of Sesuvium sesuvioides: comparative study and phylogenetic analysis within the order Caryophyllales from Cholistan desert, Pakistan. BMC Plant Biol. 2023;23(658):1–19.



Acknowledgements

The authors acknowledge the contribution of the Beijing Institute of Genomics (BIG), China, in sequencing the chloroplast genome of C. sabulosa.


Funding

This research was supported by the Hainan University Research Initiation Project Fund (XJ2400005264) and the National Natural Science Foundation of China (NSFC, No. 32371959).

Author information

Authors and Affiliations



Contributions

Conceptualization: Nida Javaid, Musarrat Ramzan, Muhammad Anwar. Methodology: Nida Javaid, Musarrat Ramzan, Shagufta Jabeen. Writing (original draft preparation): Nida Javaid, Musarrat Ramzan, Shagufta Jabeen, Muhammad Anwar. Data collection and analysis: Nida Javaid and Musarrat Ramzan, who also undertook the formal identification of the plant material used in this study. Funding acquisition: Muhammad Anwar, Yanjun Du, Song Xiqiang. Supervision: Musarrat Ramzan, Muhammad Anwar, Song Xiqiang. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Musarrat Ramzan, Muhammad Anwar or Song Xiqiang.

Ethics declarations

Ethics approval and consent to participate

The studies reported in this manuscript do not involve any human participants, human data, or human tissue, so ethics approval and consent to participate are not applicable. Nida Javaid and Musarrat Ramzan undertook the formal identification of the plant material used in this study. We confirm that permission was obtained to collect the plant material used in this study. This study complies with relevant institutional, national, and international guidelines. The plant voucher number for Chrozophora sabulosa Kar. & Kir. (Euphorbiaceae; Nilkari) is CIDS/IUB-1601/59.

Consent for publication

Not Applicable (NA).

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Javaid, N., Ramzan, M., Jabeen, S. et al. The chloroplast genome of Chrozophora sabulosa Kar. & Kir. and its exploration in the evolutionary position uncertainty of genus Chrozophora. BMC Genomics 25, 597 (2024).
