
Complete chloroplast genomes of Rubus species (Rosaceae) and comparative analysis within the genus



Rubus is the largest genus of the family Rosaceae, and its species are valued as medicinal, edible, and ornamental plants. Here, we sequenced and assembled eight chloroplast (cp) genomes of Rubus from the Dabie Mountains in Central China. The cp genomes of fifty-one Rubus species were comparatively analyzed, including the eight newly sequenced genomes and forty-three previously reported in the GenBank database (NCBI).


The eight newly obtained cp genomes had the same quadripartite structure as the other cp genomes in Rubus. The length of the eight plastomes ranged from 155,546 bp to 156,321 bp, with similar GC content (37.0 to 37.3%). In total, 133–134 genes were annotated for the Rubus plastomes, comprising 88 or 89 protein-coding genes (PCGs), 37 transfer RNA genes (tRNAs), and eight ribosomal RNA genes (rRNAs). Among them, 16 (or 18) genes were duplicated in the IR region. Structural comparative analysis showed that the gene content and order were relatively conserved. Nucleotide variability analysis identified nine hotspot regions for genomic divergence and multiple simple sequence repeats (SSRs), which may be used as markers for genetic diversity and phylogenetic analysis. Phylogenetic relationships were highly supported within the family Rosaceae, as evidenced by sub-clade taxa cp genome sequences.


Thus, the whole plastome may be used as a super-marker in phylogenetic studies of this genus.



Rubus is the largest genus in the family Rosaceae, with approximately 700 species. Plants are widely distributed in the temperate and sub-tropical regions of the Northern Hemisphere, with only a few species found in the Southern Hemisphere. There are more than 208 species found in China, 139 of which are endemic [1]. There is a growing interest in the abundance of regional taxonomic treatments, new species, and new accounts of the genus Rubus in China [2,3,4,5,6,7,8,9,10,11,12].

Rubus has been used in traditional Chinese medicine since ancient times. The fruits are well-known in Asia, Europe, and North America and have a long history of usefulness in these regions, especially in some European countries. Some natural species, such as R. corchorifolius, are widely cultivated for their medicinal, edible, and ornamental value.

Rubus is well known for its diversity in leaf shape, inflorescence architecture, reproductive pattern, and other features. The plant is typically armed with bristles, prickles, or glandular hairs; the leaf exhibits simple, palmate, or pinnate shapes; flowers are pentamerous and mostly bisexual; achenes are drupelets or drupaceous and are aggregated on a peduncle as semispherical, conical, or cylindrical fruits [1]. Rubus is one of the most difficult taxa of flowering plants to classify because of its highly variable morphology and its complicated apomixis, polyploidy, and hybridization [13, 14]. Therefore, it is essential to construct a phylogenetic tree using molecular evidence to better understand the relationships between species and improve the development and utilization of wild germplasm resources.

The most widely accepted taxonomic system of the genus Rubus was built by Focke [15,16,17]. According to Focke, Rubus was separated into 12 subgenera, the largest being Rubus (132 species), Idaeobatus (117 species), and Malachobatus (115 species). According to Flora of China (FOC), subgenus Rubus is found mainly in Europe and North America, while Malachobatus and Idaeobatus are typically found throughout Asia, especially China [1].

Several molecular phylogenetic studies have tried to resolve the genetic relationships of Rubus [13, 18,19,20,21,22]. Wang et al. (2016) selected three chloroplast markers (rbcL, rpl20-rps12, and trnG-trnS) and three nuclear markers (nrITS, GBSSI-2, and PEPC) to resolve the phylogenetic relationships of 142 Chinese Rubus species; the resulting phylogeny showed a certain degree of incongruence between the chloroplast and nuclear markers.

There are few reports on the chloroplast genomes of Rubus to date [23,24,25,26,27,28,29,30]. Recently, a comparative analysis of the characteristics of Rubus cp genomes was reported, in which the newly sequenced genomes, together with those of about ten other species, were used to construct a phylogenetic tree, but the species studied were mainly distributed in Taiwan [31]. Here, we sequenced, assembled, and comparatively analyzed eight cp genomes of Rubus. Phylogenetic analysis was also performed together with 38 previously reported cp genomes downloaded from the organelle genome database at the National Center for Biotechnology Information [32]. Our results, including gene content, size, nucleotide variable sites, identified SSRs, and phylogenetic analysis, may improve our understanding of the cp genome structure of the genus Rubus and provide resources for genetic diversity and phylogenetic analyses in future studies.

Results and discussion

General features of Rubus chloroplast genomes

For the eight newly sequenced species, Illumina PE sequencing generated 3,408,285,600 (R. innominatus) to 9,832,178,700 (R. trianthus) clean reads, with mean coverage from 831× in R. innominatus to 1229× in R. trianthus. The newly assembled plastomes of the eight Rubus samples had a quadripartite structure, forming a circular molecule ranging from 155,546 bp (R. trianthus) to 156,321 bp (R. lambertianus) in length (Table 1). The eight cp genomes comprised a large single copy (LSC) region (85,028–85,883 bp) and a small single copy (SSC) region (18,710–18,874 bp), separated by two copies of inverted repeats (IRs) (25,761–25,994 bp) (Fig. 1, Table 1). The guanine-cytosine (GC) content of the eight cp genomes differed slightly, from 37.0% (R. trianthus and R. innominatus) to 37.3% (R. coreanus, R. hirsutus and R. parvifolius) (Table 1). The GC content of the coding sequence (CDS) ranged from 37.8 to 38.0%. As previously reported for Rubus in Taiwan, the GC content of the IR regions (42.8–42.9%) was higher than that of the LSC (34.8–35.2%) and SSC regions (30.9–31.4%) [31].
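The regional GC comparison above can be illustrated with a minimal Python sketch. The function names, the toy sequence, and the region coordinates are assumptions made for illustration only; they are not taken from the study, and real coordinates would come from the genome annotation.

```python
def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence.

    Symbols other than A, C, G and T (gaps, N, IUPAC codes) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def regional_gc(genome, regions):
    """GC percentage per named region; coordinates are 1-based and inclusive."""
    return {
        name: gc_content(genome[start - 1:end])
        for name, (start, end) in regions.items()
    }


# Toy example (hypothetical sequence and coordinates, not real plastome data).
toy_genome = "ATATATATGC" + "GCGCGCATAT" + "ATATATATAT"
toy_regions = {"LSC": (1, 10), "IR": (11, 20), "SSC": (21, 30)}
for region, pct in regional_gc(toy_genome, toy_regions).items():
    print(f"{region}: {pct:.1f}% GC")
```

In practice the sequence would be read from the GenBank or FASTA record of each plastome, and the IR, LSC and SSC coordinates from its annotation.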

Table 1 Accession numbers and features of the eight Rubus plastomes in the present study
Fig. 1 Gene maps of the complete cp genomes of eight Rubus species. Genes on the inside of the circle are transcribed clockwise, while those on the outside are transcribed counterclockwise. Genes are colored according to their functional groups. The darker gray in the inner circle corresponds to GC content, whereas the lighter gray corresponds to AT content

In total, 134 genes were annotated in the plastomes of Rubus tephrodes, R. coreanus, R. trianthus and R. hirsutus, including 89 CDS, 37 transfer RNA genes (tRNAs), and eight ribosomal RNA genes (rRNAs). Because the infA gene was not annotated, the four other Rubus species (R. lambertianus, R. parvifolius, R. innominatus and R. hunanensis) contained 133 genes, including 88 protein-coding genes, eight rRNAs, and 37 tRNAs. Overall, 16 or 18 genes had two copies in the IR regions: seven or five protein-coding genes, seven tRNAs and four rRNAs (Table 1 and Table 2). Seventeen genes contained one intron (rps16, rpoC1, petB, petD, rpl16, rpl22, rpl2, ndhA, ndhB, and eight tRNA genes) and two genes had two introns (clpP and ycf3) (Table S1). The trnK-UUU gene had the longest intron, of 2488–2518 bp, in the eight Rubus plastomes.

Table 2 Gene contents in the cp genomes of Rubus species

IR contraction and expansion

Chloroplast genome structures, including the gene content and order, were compared and analyzed among the eight Rubus species (Table 2 and Table S1). The results showed that the eight newly assembled chloroplast genomes were relatively conserved at the boundaries of the four regions (LSC, SSC and two IRs) (Fig. 2). The lengths of the IR regions of the 46 Rubus plastomes ranged from 25,758 to 25,993 bp, versus 26,238 bp in Fragaria (Fig. S1). The LSC-IRb border was located between the genes rps19 and rpl2 for ten chloroplast genomes, within the rps19 gene for Rubus niveus, and between the genes trnH and rpl2 for R. leucanthus. Three types of SSC-IRa borders were detected among the twelve plastomes. In R. corchorifolius and R. boninensis, the ycf1 gene was situated in the IRa region, 191 bp and 2 bp away from the SSC-IRa border, respectively. In R. tephrodes, R. niveus and R. coreanus, the ycf1 gene was found entirely in the SSC region, 0–192 bp away from the SSC-IRa border. For the other seven plastomes, the SSC-IRa border was located in the coding region of ycf1. The length of the ycf1 gene ranged from 4437 bp to 5750 bp, and the gene was typically found in the SSC region. The ndhF gene was located in the SSC region at the SSC-IRb border for all but two species (R. takesimensis and Fragaria chiloensis). The LSC-IRa border was between the rpl2 and trnH genes. The trnH gene was found in the LSC region, which has also been reported in dicots [33, 34].
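The border types described above (a gene lying entirely on one side of a junction at some distance, or spanning it) can be expressed as a small coordinate calculation. The sketch below is only an illustration under stated assumptions: the function name, the `BorderCall` record, and all coordinates are hypothetical, and the study itself obtained the boundary information manually in Geneious.

```python
from typing import NamedTuple


class BorderCall(NamedTuple):
    position: str          # "upstream", "downstream" or "spanning"
    gap_bp: int            # distance between the gene and the junction (0 if spanning)
    bases_upstream: int    # gene bases lying before the junction
    bases_downstream: int  # gene bases lying after the junction


def gene_vs_border(gene_start, gene_end, border):
    """Place a gene relative to a region junction.

    Coordinates are 1-based and inclusive. `border` is the last base of the
    upstream region, so the junction lies between `border` and `border + 1`.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    length = gene_end - gene_start + 1
    if gene_end <= border:
        return BorderCall("upstream", border - gene_end, length, 0)
    if gene_start > border:
        return BorderCall("downstream", gene_start - border - 1, 0, length)
    return BorderCall("spanning", 0, border - gene_start + 1, gene_end - border)


# Hypothetical coordinates for illustration only.
print(gene_vs_border(84800, 84990, border=85000))
print(gene_vs_border(84900, 85100, border=85000))
```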

Fig. 2 Comparison of the borders of the four regions (LSC, SSC and two IRs) among twelve cp genomes. Numbers above or near the colored genes indicate the distances between the gene and the border. The figure is not to scale

Comparative plastome sequence divergence and hotspot regions

Collinearity detection was carried out to analyze and compare the chloroplast genomes. Mauve alignment of the plastomes showed no gene rearrangements within the chloroplast genomes of the eight Rubus species (Fig. 3). The eight newly assembled chloroplast genomes were compared using mVISTA, with the annotated R. tephrodes as the reference cp sequence, to determine interspecific divergence (Fig. 4). The results show that the inverted repeat regions were more conserved than the single copy regions, consistent with other studies [25, 35, 36]. The most divergent regions were the intergenic spacers, including rps16-trnQ, trnL-trnT, and rpl32-trnL-ccsA.

Fig. 3 Mauve alignment of the eight Rubus cp genomes, revealing no interspecific rearrangements

Fig. 4 Visualized alignment of the Rubus cp genome sequences with annotated R. tephrodes as the reference, using mVISTA. The x-axis represents the base sequence of the alignment and the y-axis represents the pairwise percent identity within 50–100%. Grey arrows represent positions and directions of the genes

The nucleotide variability (Pi) was calculated, giving an average value of 0.008 and a range of 0 to 0.0313. The SSC and LSC regions were highly variable and the IR regions were relatively conserved (Fig. 5). Nine intergenic regions (trnK-rps16, rps16-psbK, psbI-trnS-trnG, trnG-atpA, petN-psbM-trnD, trnE-psbD, rps4-trnL, petA-psbF and rpl16-rps3), together with ndhF, rpl32-trnL-ccsA and ycf1, were found to be highly variable, with Pi values > 0.02; the nine intergenic regions were located in the LSC region, while the remaining regions were located in the SSC region. To date, only two of these highly variable regions, trnS-trnG and ndhF, have been used to reconstruct the phylogeny of the genus Rubus (other regions used in the literature: rpl16, trnL-trnF, rbcL, rpl20-rps12) [18, 19, 21, 22, 37,38,39]. The highly variable regions detected by comparing entire chloroplast genomes may be useful markers for further phylogenetic study.

Fig. 5 Sliding window analysis of the entire cp genome of eleven Rubus species (window length: 600 bp; step size: 200 bp). X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity of each window

Synonymous (Ks) and nonsynonymous (Ka) substitution rate analysis

The ratios of nonsynonymous (Ka) to synonymous (Ks) substitutions were calculated for the shared unique protein-coding genes (PCGs) in the eight Rubus cp genomes, with R. tephrodes as the reference (Fig. 6, Table S2). Among the 79 shared genes, the ratio could not be calculated for 31 genes, either because the sequences were identical or because they lacked nonsynonymous or synonymous nucleotide substitutions. Most of the Ka/Ks ratios were less than one, except for rpl22 in R. trianthus (1.1892) and rpl16 in R. innominatus (1.27177), R. parvifolius (1.27177), R. lambertianus (1.27177) and R. hunanensis (1.27177). These results are consistent with the general expectation that the Ka/Ks ratio of most genes is less than one [40]. They also suggest that the two exceptional genes (rpl16 and rpl22) are undergoing positive selection and that some of the mutations of these two genes in these species may be advantageous.

Fig. 6 Ka/Ks values of protein-coding genes of the seven comparative combinations. Ka, nonsynonymous; Ks, synonymous

SSR polymorphisms and long repeat structure

MISA was used to detect the total number of simple sequence repeats (SSRs) in a total of 46 Rubus plastomes (Fig. 7, Table S3). In total, 2243 SSRs were found in the 46 plastomes of Rubus, of which 3621 SSRs (83.55%) were located in the LSC region, 671 SSRs (15.48%) in the SSC region, and 42 SSRs (0.97%) in the IR region. The number of SSRs detected among the 46 species ranged from 38 (R. parvifolius) to 63 (R. trianthus) (Table S4). Mononucleotide repeat units were the most frequently identified SSRs. A/T repeats were the most abundant, while AT/TA and AAT/TAA repeats were the most common dinucleotide and trinucleotide types, respectively. The SSR results showed that A/T repeats were common in the cp genomes, consistent with the results of previous studies [35, 41,42,43]. The SSRs may be potential specific molecular markers for use in genetic diversity and phylogenetic studies of Rubus and its related species.

Fig. 7 Number and type of simple sequence repeats in the 46 Rubus cp genomes. A, Total number of SSRs detected in each species. B, Frequencies of identified SSRs in the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions. C, Numbers and types of SSRs detected in each species

In total, 2300 long repeat structures were identified in the 46 plastomes (Fig. 8, Table S5), including 895 (38.91%) forward repeats, 907 (39.43%) palindromic repeats, 457 (19.87%) reverse repeats and 41 (1.78%) complement repeats. Most of these repeats (1648, 71.65%) were distributed in the non-coding regions.
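The four repeat types differ only in how the second copy relates to the first: unchanged (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindromic). The following Python sketch illustrates that relationship for two equal-length segments with a Hamming-distance tolerance. It is a conceptual illustration, not REPuter's algorithm; the function names and toy sequences are assumptions, and the default of 3 mismatches simply mirrors the Hamming distance stated in the methods.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def repeat_types(first, second, max_mismatches=3):
    """Return the repeat types under which `second` is a copy of `first`.

    Each type applies a transformation to `first` and compares the result
    with `second`, allowing up to `max_mismatches` mismatches.
    """
    candidates = {
        "forward": first,
        "reverse": first[::-1],
        "complement": first.translate(COMPLEMENT),
        "palindromic": first[::-1].translate(COMPLEMENT),
    }
    return [
        name for name, seq in candidates.items()
        if hamming(seq, second) <= max_mismatches
    ]


# Toy segments; exact matching (0 mismatches) keeps the example unambiguous.
print(repeat_types("AACCG", "AACCG", max_mismatches=0))
print(repeat_types("AACCG", "CGGTT", max_mismatches=0))
```

With real data the tolerance would be applied to segments of at least 30 bp, where a few mismatches rarely make a pair qualify under more than one type.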

Fig. 8 Number and type of long repeat sequences in the 46 Rubus cp genomes. A, Numbers and types of longer repeats in each species. B, Frequency of each repeat type. C, Presence of longer repeats in coding regions and non-coding regions

Phylogenetic analyses

Fifty-one Rubus chloroplast genomes and seven other plastomes of Rosaceae were used to examine the usability of the chloroplast genome in phylogenetic analysis. The maximum likelihood (ML) tree constructed with RAxML and the Bayesian inference (BI) tree constructed with MrBayes were topologically congruent and highly supported (Fig. 9). The relationships among the tribes within Rosaceae were congruent with previously reported results [25]. The tree also strongly supported the monophyly of the genus Rubus. The Rubus clade showed that some species of Idaeobatus were likely the original taxa and that the polyploid group may have originated from those primitive species. Other sections of the tree may have evolved from these taxa via different evolutionary events. The main objective of our study was to test the discriminatory power of the chloroplast genome sequences in the genus Rubus. Additional studies with broader sampling strategies are needed to test the efficiency of the regions identified by our study in clarifying the phylogeny of the genus Rubus.

Fig. 9 The maximum likelihood (ML) and Bayesian inference (BI) phylogenetic tree based on whole chloroplast genome data. The support values (bootstrap value [BS]/posterior probability [PP]) are indicated at the branches. BS and PP of 100% are indicated by an asterisk

Materials and methods

Material sampling, DNA isolation, and sequencing

Here, eight species, Rubus tephrodes, R. coreanus, R. trianthus, R. lambertianus, R. hirsutus, R. parvifolius, R. hunanensis and R. innominatus, were chosen for sequencing based on their special taxonomic status and because they were absent from, or not well assembled in, the NCBI database. Young, disease-free leaves of wild seedlings were collected and quickly dried with silica gel (Table 3). Voucher specimens were collected for each species and deposited in the herbarium of Huanggang Normal University (formerly, Herbarium of Biology Department of Huanggang Teachers College, HGTC). The specimens were identified by Prof. Hongjin Dong (Huanggang Normal University) before being deposited. Total genomic DNA was extracted from the leaf tissues using the CTAB method [44] with minor modifications and stored at −80 °C. The total genomic DNA was sheared into fragments of about 300 bp to construct libraries according to the manufacturer’s instructions (Illumina, Hayward, CA, USA). Sequencing was performed on the Illumina HiSeq 2500 Sequencing System at BGI-Wuhan.

Table 3 Sampled species and their voucher specimens used in this study

Chloroplast genome assembly and annotation

Raw data with adapter sequences or low-quality sequences were filtered using the SOAPnuke software developed by BGI [45]. The high-quality PE reads were then used for subsequent analyses. The chloroplast genomes were assembled de novo with the GetOrganelle pipeline [46]. The output graph file (“gfa”) was checked in Bandage v. 0.8.1 [47], and the final sequence paths were selected when the minimum contig depth was above 100× and the minimum length was > 300 bp. To check the assembled cp sequences for errors, raw sequencing reads were mapped to the assembled plastomes using the Bowtie2 [48] plug-in in Geneious ver. 8.0.2 [48]. The assembled cp genome sequences of the eight Rubus samples were annotated using the Perl script PGA [49]. The annotation results were summarized and the final annotations were manually checked using Geneious ver. 8.0.2 [48]. The assembly and annotation were completed by mapping to the reported plastomes of other well-annotated Rubus species. The whole cp sequences with annotation information were submitted to GenBank under accession numbers MT478113-MT478115 and OK127882-OK127886. The physical maps of the annotated cp genomes were drawn using the online program OGDRAW [50].
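The contig selection criteria described above (depth above 100× and length above 300 bp) amount to a simple filter. The sketch below assumes the per-contig length and depth have already been extracted from the assembly graph; the `Contig` record, the function name, and the example values are hypothetical and do not reflect GetOrganelle's or Bandage's own data structures.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int   # contig length in bp
    depth: float  # mean read depth (x)


def select_contigs(contigs, min_depth=100.0, min_length=300):
    """Keep contigs whose depth and length both exceed the thresholds."""
    return [c for c in contigs if c.depth > min_depth and c.length > min_length]


# Hypothetical contigs for illustration only.
candidates = [
    Contig("edge_1", 85000, 950.0),  # passes both thresholds
    Contig("edge_2", 250, 1200.0),   # too short
    Contig("edge_3", 4000, 35.0),    # too shallow
]
print([c.name for c in select_contigs(candidates)])
```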

Comparative plastome sequence divergence analysis

Gene order comparison of the newly assembled Rubus plastomes was performed using the Mauve v.1.0.0 [51] plugin in Geneious v.8.0.2 [48]. We compared the complete plastomes of the eight Rubus species using mVISTA in Shuffle-LAGAN mode [52], with R. tephrodes as the reference. To compare the contraction or expansion of the inverted repeat (IR) regions, detailed information on the boundaries between the IR and single copy (SC) regions was obtained manually in Geneious [48]. The chloroplast genome sequence data sets used for the final analysis were aligned using the Windows version of MAFFT [53]. The output data matrix was visualized and manually edited using Geneious [48] or BioEdit [54]. The nucleotide diversity (Pi) of the plastome sequences was calculated across the whole cp genomes using DnaSP v. 6.10 [55]. We used a step size of 200 bp and a window length of 800 bp for the sliding window analysis.
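The sliding window calculation can be sketched in a few lines of Python. Here nucleotide diversity is taken as the mean number of pairwise differences per site within each window, and alignment columns containing gaps or ambiguous bases are skipped. This is a simplified illustration under those assumptions and is not guaranteed to reproduce DnaSP's exact treatment of gaps or its output; the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Mean pairwise differences per site for alignment columns [start, end).

    Columns containing anything other than A, C, G or T are skipped.
    """
    pairs = list(combinations(range(len(seqs)), 2))
    valid_sites = 0
    differences = 0
    for col in range(start, end):
        column = [s[col].upper() for s in seqs]
        if any(base not in "ACGT" for base in column):
            continue
        valid_sites += 1
        differences += sum(column[i] != column[j] for i, j in pairs)
    if valid_sites == 0 or not pairs:
        return 0.0
    return differences / (len(pairs) * valid_sites)


def sliding_window_pi(seqs, window, step):
    """Return (window midpoint, Pi) pairs along an alignment (0-based columns)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for start in range(0, length - window + 1, step):
        midpoint = start + window // 2
        results.append((midpoint, window_pi(seqs, start, start + window)))
    return results


# Toy alignment; a real analysis would pass e.g. window=800, step=200.
toy_alignment = ["AAAA", "AAAT", "AATT"]
for midpoint, pi in sliding_window_pi(toy_alignment, window=2, step=2):
    print(midpoint, round(pi, 4))
```

Windows whose Pi exceeds a chosen cutoff (0.02 in this study) would then be flagged as candidate hotspot regions.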

Gene selective pressure analysis of eight Rubus cp PCGs

To analyze variation in the evolutionary rates of chloroplast genes, the KaKs_Calculator 2.0 program was used to calculate the nonsynonymous substitution rates (Ka), synonymous substitution rates (Ks), and their ratios (Ka/Ks). Before calculation, the shared unique protein-coding gene sequences (PCGs) were aligned in MEGA [56] (version 10.1.6) using MUSCLE (codons) mode. The gene data matrix was then saved in Clustal (.aln) or Phylip (.phy) format.
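Once Ka and Ks have been estimated for a gene, interpreting the ratio is straightforward. The sketch below shows one way to handle it, including the case in which the ratio is not calculable because either rate is zero, as described for 31 genes in this study. The function names, labels, and numeric values are illustrative assumptions and are not output of KaKs_Calculator.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when the ratio is treated as not calculable.

    Following the description in the text, genes without nonsynonymous
    (Ka == 0) or synonymous (Ks == 0) substitutions yield no ratio.
    """
    if ka == 0 or ks == 0:
        return None
    return ka / ks


def classify_selection(ka, ks):
    """Label a gene by its Ka/Ks ratio."""
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "not calculable"
    if ratio > 1:
        return "candidate positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral"


# Hypothetical Ka and Ks values for illustration only.
for gene, ka, ks in [("geneA", 0.0127, 0.0100), ("geneB", 0.0010, 0.0100), ("geneC", 0.0, 0.0)]:
    print(gene, classify_selection(ka, ks))
```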

Simple sequence repeats and repeat structure analysis

The Perl script MISA [57] was used to identify microsatellites (mono-, di-, tri-, tetra-, penta-, hexanucleotide repeats), with the following parameters (unit size, min repeats): 10 for mononucleotide, 5 for dinucleotide, 4 for trinucleotide, and 3 for tetra-, penta-, and hexanucleotide. The online REPuter program was used to detect four types of long repeat sequences (forward, reverse, palindromic and complement) in Rubus plastomes with a hamming distance of 3 and a minimum repeat size of 30 bp [58].
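The minimum-repeat thresholds listed above can be illustrated with a short regular-expression scan. This is a simplified sketch in the spirit of MISA, not a reimplementation of it: it reports perfect SSRs only, does not handle compound SSRs, and may differ from MISA in edge cases. The function names and the test sequence are assumptions made for illustration.

```python
import re

# Minimum number of repeats per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_built_from_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat count) tuples; start is 1-based."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # The lookahead lets candidate matches overlap, so a real SSR is not
        # hidden behind a rejected match that starts a few bases earlier.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (unit, min_rep - 1))
        last_end = 0
        for match in pattern.finditer(seq):
            start, full, motif = match.start(), match.group(1), match.group(2)
            if start < last_end or is_built_from_shorter_unit(motif):
                continue
            last_end = start + len(full)
            hits.append((start + 1, motif, len(full) // unit))
    return sorted(hits)


# Toy sequence containing one mono-, one di- and one trinucleotide SSR.
toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GC" + "AAT" * 4 + "GC"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Per-region SSR counts such as those reported for the LSC, SSC and IR regions could then be obtained by comparing each start position with the region coordinates.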

Phylogenetic analyses

The newly assembled cp genomes of Rubus, together with those of related taxa downloaded from NCBI, were aligned with MAFFT for phylogenetic analysis [53]. In total, fifty-eight plastomes were used to construct the phylogenetic tree (Table S3). RAxML (Version 8 for Windows) was used to run the maximum likelihood (ML) analysis [59] with 1000 bootstrap replicates. The general time-reversible (GTR) model with gamma-distributed rate heterogeneity was used with otherwise default settings. The Bayesian inference (BI) tree was generated in MrBayes version 3.2 [60] as implemented on the Cyberinfrastructure for Phylogenetic Research (CIPRES) Science Gateway [61], using the default settings. The best model was determined for each sequence partition after comparisons among 24 models of nucleotide substitution using jModeltest v.2.1.10 [62]. FigTree v1.4 [63] was used to visualize and adjust the ML trees. The graph generated from FigTree was further revised with Adobe Illustrator (Adobe Systems, Mountain View, CA, USA).


The complete chloroplast sequences of Rubus tephrodes, R. coreanus, R. trianthus, R. lambertianus, R. hirsutus, R. parvifolius, R. hunanensis and R. innominatus, of the sections Lampobatus and Idaeobatus, were reported in this study. The comparative analysis of forty-six Rubus plastomes indicated that the structure was relatively conserved. However, the SSC-IR and LSC-IR borders were variable among the chloroplast genomes, and the IR region was less variable than the SC regions. We identified the locations of the SSR sites and highly variable regions, which may be used as markers in future studies of Rubus species. The ML and BI phylogenetic trees constructed from whole chloroplast sequences illustrated the phylogenetic relationships and were consistent with the results of previous studies. Our results indicate that the whole plastome may be used as a reliable marker in phylogenetic studies of this genus.

Availability of data and materials

The complete chloroplast genome sequences of the eight newly assembled Rubus species are available at GenBank: MT478113-MT478115 and OK127882-OK127886. Raw sequencing reads used in this study were deposited in the Sequence Read Archive (detailed information is listed in Table S2).


  1. 1.

    Lu LD, Boufford D. Rubus Linnaeus, Sp. P1. 1:492. 1753. Flora of China. 2003;9:195–285.

    Google Scholar 

  2. 2.

    Zhang SD. Notes on Rubus subsect. Lineati from China. Plant Sci J. 2012;30(3):301–4.

    CAS  Google Scholar 

  3. 3.

    Zhang YJ, Qiao D, Ma JX, Wang HC. Two newly recorded species of Rubus (Rosaceae) from China. Acta Botan Boreali-Occiden Sin. 2017;37(4):805–8.

    Google Scholar 

  4. 4.

    Wang HC, Ma JX, He ZR. Rubus pseudoswinhoei (Rosaceae), a distinct new species from Yunnan, Southwest China. Phytotaxa. 2016;261(2):147–56.

    Google Scholar 

  5. 5.

    Wang HC, Zhang RZ, Liang ZL, He ZR. A new species and two new synonyms in Chinese Rubus (Rosaceae). Ann Bot Fenn. 2017;54(1–3):105–9.

    Google Scholar 

  6. 6.

    Xiong XH, Ju WB, Zhang C, Li JC, Gao XF. Rubus huagaoxiensis and R. lingtianus (Rosaceae), two new species from Sichuan, China. Phytotaxa. 2019;400(1):14–22.

    Google Scholar 

  7. 7.

    Huang JY, Hu JM. Revision of Rubus (Rosaceae) in Taiwan. Taiwania. 2009;54(4):285–310.

    Google Scholar 

  8. 8.

    Sun Y, Boufford DE. Rubus naruhashii (Rosaceae), a new name for R. clivicola E. Walker. J Japanese Bot. 2012;87:135–6.

    Google Scholar 

  9. 9.

    Wang HC, He ZR, Sun H. Rediscoveries of two species of Rubus (Rosaceae) in China. Guihaia. 2012;32(3):315–7.

    Google Scholar 

  10. 10.

    Wang HC, Wang YH, Sun H. Nomenclatural changes in Rubus (Rosaceae) mostly from China. Phytotaxa. 2013;114:58–60.

    Google Scholar 

  11. 11.

    Wang HC, Sun H. Rubus tetsunii, a replacement name for the Chinese R. lobatus T. T. Yu & L. T. Lu (Rosaceae). Phytotaxa. 2013;79:63–4.

    Google Scholar 

  12. 12.

    Wang HC. Two replacement names in Asian Rubus (Rosaceae). Phytotaxa. 2014;172:293–6.

    Google Scholar 

  13. 13.

    Alice L, Campbell C. Phylogeny of Rubus (Rosaceae) based on nuclear ribosomal DNA internal transcribed spacer region sequences. Am J Bot. 1999;86(1):81–97.

    CAS  PubMed  Google Scholar 

  14. 14.

    Thompson MM. Survey of chromosome numbers in Rubus (Rosaceae: Rosoideae). Ann Mo Bot Gard. 1997;84(1):128–64.

    Google Scholar 

  15. 15.

    Focke WO. Species Ruborum. In: Monographiae Generis Rubi Prodromus part I. New York: NY: Stuttgart,E. Schweizerbart; 1910. p. 1–120.

    Google Scholar 

  16. 16.

    Focke WO. Species Ruborum. In: Monographiae Generis Rubi Prodromus part II. New York: NY: Stuttgart,E. Schweizerbart; 1911. p. 121–223.

    Google Scholar 

  17. 17.

    Focke WO. Species Ruborum. In: Monographiae Generis Rubi Prodromus part III. New York: NY: Stuttgart,E. Schweizerbart; 1914. p. 224–498.

    Google Scholar 

  18. 18.

    Yang JY, Pak JH. Phylogeny of korean Rubus (rosaceae) based on ITS (nrDNA) and trnL/F intergenic region (cpDNA). J Plant Biol. 2006;49(1):44–54.

    CAS  Google Scholar 

  19. 19.

    Alice LA, Dodson TM, Sutherland BL. Diversity and relationships of Bhutanese Rubus (Rosaceae). Acta Hortic. 2008;777(777):63–70.

    Google Scholar 

  20. 20.

    Sochor M, Vašut RJ, Sharbel TF, Trávnícek B. How just a few makes a lot: speciation via reticulation and apomixis on example of European brambles (Rubus subgen. Rubus, Rosaceae). Mol Phylogenet Evol. 2015;89:13–27.

    PubMed  Google Scholar 

  21. 21.

    Li Z, Wang Y, Chen Q, Luo Y, Zhang Y, Tang HR, et al. Phylogenetic utility of Chinese Rubus (Rosaceae) based on ndhF sequence. Acta Horticulturae Sinica. 2015;42(1):19–30.

    Google Scholar 

  22. 22.

    Morden CW, Gardner DE, Weniger DA. Phylogeny and biogeography of pacific Rubus subgenus Idaeobatus (Rosaceae) species: investigating the origin of the endemic Hawaiian raspberry R. macraei. Pac Sci. 2003;57(2):181–97.

    Google Scholar 

  23. 23.

    Yang JY, Pak JH, Kim SC. The complete chloroplast genome sequence of Korean raspberry Rubus crataegifolius (Rosaceae). Mitochondrial DNA Part B. 2017;2(2):793–4.

    PubMed  PubMed Central  Google Scholar 

  24. 24.

    Chen Q, Wang Y, Sun B, Chen T, Luo Y, Zhang Y, et al. The complete chloroplast genome sequence of Rubus coreanus, an excellent diseases-resistant resource. Mitochondrial DNA Part B. 2019;4(1):216–7.

    Google Scholar 

  25. 25.

    Guo W, Chen YZ, Deng L, Wu W. The complete chloroplast genome sequence of Rubus leucanthus Hance (Rosaceae). Mitochondrial DNA Part B. 2019;4(1):376–7.

    Google Scholar 

  26. 26.

    Chen Q, Wang Y, Wang XR, Tang HR. The complete chloroplast genome sequence of Rubus lambertianus var. glaber: sequence divergence and phylogenetic application. Mitochondrial DNA Part B. 2020;5(2):1261–2.

    Google Scholar 

  27. 27.

    Chen Y, Guo W, Wang LY, Wu W, Li YQ. The complete chloroplast genome sequence of Rubus cochinchinensis. Mitochondrial DNA Part B. 2020;5:1636–7.

    Google Scholar 

  28. 28.

    Wang QR, Yu SS, Gao CS, Ge YQ, Cheng RB. The complete chloroplast genome sequence and phylogenetic analysis of the medicinal plant Rubus chingii Hu. Mitochondrial DNA Part B. 2020;5(2):1307–8.

    Google Scholar 

  29. 29.

    Yang F, Zhang ZH, Tong L. The complete chloroplast genome sequence of Rubus amabilis Focke. Mitochondrial DNA Part B. 2020;5(2):1975–6.

    Google Scholar 

  30. 30.

    Yang JY, Pak JH, Kim SC. The complete plastome sequence of Rubus takesimensis endemic to Ulleung Island, Korea: insights into molecular evolution of anagenetically derived species in Rubus (Rosaceae). Gene. 2018;668:221–8.

    CAS  PubMed  Google Scholar 

  31. 31.

    Yang JY, Chiang YC, Hsu TW, Kim SH, Pak JH, Kim SC. Characterization and comparative analysis among plastome sequences of eight endemic Rubus (Rosaceae) species in Taiwan. Sci Rep. 2021;11(1):1152.

    CAS  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Genome [].

  33. 33.

    Asano T, Tsudzuki T, Takahashi S, Shimada H, Kadowaki K-I. Complete nucleotide sequence of the sugarcane ( Saccharum Officinarum ) chloroplast genome: a comparative analysis of four monocot chloroplast genomes. DNA Res. 2004;11(2):93–9.

    CAS  PubMed  Google Scholar 

  34. 34.

    Souza U, Nunes R, Targueta C, Diniz-Filho JA, Telles M. The complete chloroplast genome of Stryphnodendron adstringens (Leguminosae - Caesalpinioideae): comparative analysis with related Mimosoid species. Sci Rep. 2019;9:14206.

    PubMed  PubMed Central  Google Scholar 

  35. 35.

    Jeon JH, Kim SC. Comparative analysis of the complete chloroplast genome sequences of three closely related east-Asian wild roses (Rosa sect. Synstylae; Rosaceae). Genes. 2019;10(1):23.

    PubMed Central  Google Scholar 

  36. 36.

    Zhang SD, Jin JJ, Chen SY, Chase MW, Soltis DE, Li HT, et al. Diversification of Rosaceae since the late cretaceous based on plastid phylogenomics. New Phytol. 2017;214(3):1355–67.

    CAS  PubMed  Google Scholar 

  37. 37.

    Wang Y, Chen Q, Chen T, Tang HR, Liu L, Wang XR. Phylogenetic insights into Chinese Rubus (Rosaceae) from multiple chloroplast and nuclear DNAs. Front Plant Sci. 2016;7(6):968.

    PubMed  PubMed Central  Google Scholar 

  38. 38.

    Imanishi H, Tsuyuzaki H, Nakahara K. Genetic relationships among native and introduced Rubus species in Japan based on rbcL sequence. Acta Hortic. 2008;769(769):195–9.

    CAS  Google Scholar 

  39. 39.

    Michael K. Clarification of basal relationships in Rubus (Rosaceae) and the origin of Rubus chamaemorus. BowlingGreen: WesternKentuckyUniversity; 2020.

    Google Scholar 

  40. 40.

    Makalowski W, Boguski MS. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous. Proc Natl Acad Sci U S A. 1998;95(16):9407–12.

    CAS  PubMed  PubMed Central  Google Scholar 

  41. 41.

    Somaratne Y, Guan DL, Wang WQ, Zhao L, Xu SQ. Complete chloroplast genome sequence of Xanthium sibiricum provides useful DNA barcodes for future species identification and phylogeny. Plant Syst Evol. 2019;305(10):949–60.

  42. 42.

    Li XQ, Zuo YJ, Zhu XX, Liao S, Ma JS. Complete chloroplast genomes and comparative analysis of sequences evolution among seven Aristolochia (Aristolochiaceae) medicinal species. Int J Mol Sci. 2019;20(5):1045.

    CAS  PubMed Central  Google Scholar 

  43. 43.

    Gao BM, Yuan L, Tang TL, Hou J, Pan K, Wei N. The complete chloroplast genome sequence of Alpinia oxyphylla Miq. And comparison analysis within the Zingiberaceae family. PLoS One. 2019;14(6):e0218817.

    CAS  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bull. 1987;19(1):11–5.

    Google Scholar 

  45. 45.

    Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 2018;7(1):1–6.

    PubMed  PubMed Central  Google Scholar 

  46. 46.

    Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis C, Yi TS, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21(1):241.

    PubMed  PubMed Central  Google Scholar 

  47. 47.

    Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  48. 48.

    Matthew K, Richard M, Amy W, Steven SH, Matthew C, Shane S, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9.

  49.

    Qu XJ, Moore MJ, Li DZ, Yi TS. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019;15(1):1–12.

  50.

    Lohse M, Drechsel O, Kahlau S, Bock R. OrganellarGenomeDRAW--a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 2013;41:575–81.

  51.

    Darling ACE, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394–403.

  52.

    Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004;32(suppl_2):W273–9.

  53.

    Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.

  54.

    Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–8.

  55.

    Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19(18):2496–7.

  56.

    Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9.

  57.

    Thiel T, Michalek W, Varshney RK, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003;106(3):411–22.

  58.

    Kurtz S, Choudhuri J, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29(22):4633–42.

  59.

    Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.

  60.

    Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–42.

  61.

    Miller MA, Pfeiffer WT, Schwartz T. Creating the CIPRES Science Gateway for Inference of Large Phylogenetic Trees. In: Gateway Computing Environments Workshop (GCE). New Orleans; 2010. p. 1–8.

  62.

    Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012;9(8):772.

  63.

    Rambaut A. FigTree, a graphical viewer of phylogenetic trees. In: Institute of Evolutionary Biology University of Edinburgh; 2009.



Acknowledgements

We would like to thank the editors and reviewers for their comments and suggestions.


Funding

This research was funded by the National Science and Technology Fundamental Resources Investigation Program of China, grant number 2019FY101809.

Author information




Contributions

Conceptualization, J.Y. and J.D.; methodology, J.Y.; software, J.Y.; validation and formal analysis, J.Y. and J.D.; investigation and resources, J.F., Y.F., J.X. and J.D.; data curation, J.Y. and J.F.; writing (original draft preparation), J.Y. and J.D.; writing (review and editing), J.Y., Y.F. and J.D.; supervision, J.Y. and J.D.; project administration, J.Y. and J.D.; funding acquisition, J.D. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hongjin Dong.

Ethics declarations

Ethics approval and consent to participate

Not applicable. No specific permits were required for the collection of specimens for this study. This research was carried out in compliance with the relevant laws of China.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Genes with introns in the cp genomes of Rubus as well as the lengths of the exons and introns.

Additional file 2: Table S2. Synonymous (Ks) and nonsynonymous (Ka) analysis of the eight species based on shared unique CDS genes, with Rubus tephrodes as the reference.

Additional file 3: Table S3. Taxonomic and accession information for samples used in the study.

Additional file 4: Table S4. Statistics of simple sequence repeats in each species of Rubus. (XLSX 116 kb)

Additional file 5: Table S5. Statistics of longer repeats in each species of Rubus. (XLSX 94 kb)

Additional file 6: Figure S1. Comparison of the four regions (LSC, SSC and two IRs) among twelve cp genomes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yu, J., Fu, J., Fang, Y. et al. Complete chloroplast genomes of Rubus species (Rosaceae) and comparative analysis within the genus. BMC Genomics 23, 32 (2022).


Keywords

  • Rubus
  • Chloroplast genome
  • Compare analysis
  • Phylogeny