Insights into phylogenetic relationships in Pinus inferred from a comparative analysis of complete chloroplast genomes
BMC Genomics volume 24, Article number: 346 (2023)
Pinus is the largest genus of Pinaceae and the most primitive group among the modern genera of the family. Pines have become the focus of many molecular evolution studies because of their wide use and ecological significance. However, owing to the lack of complete chloroplast genome data, the evolutionary relationships and classification of pines remain controversial. With the development of next-generation sequencing technology, sequence data for pines are becoming abundant. Here, we systematically analyzed and summarized the chloroplast genomes of 33 published pine species.
Overall, the structure of pine chloroplast genomes was highly conserved and similar across species. Chloroplast genome length ranged from 114,082 to 121,530 bp, all genes had similar positions and arrangements, and GC content ranged from 38.45 to 39.00%. The inverted repeats showed an evolutionary trend of contraction, with IRa/IRb lengths ranging from 267 to 495 bp. A total of 3,205 microsatellite sequences and 5,436 long repeats were detected in the chloroplast genomes of the studied species. Additionally, two hypervariable regions were identified, providing potential molecular markers for future phylogenetic and population genetic studies. Phylogenetic analysis of the complete chloroplast genomes offered new perspectives on the traditional evolutionary theory and classification of the genus.
We compared and analyzed the chloroplast genomes of 33 pine species, verified the traditional evolutionary theory and classification, and reclassified some species whose placement has been controversial. This study will help in analyzing the evolution and genetic structure of Pinus and in developing chloroplast DNA markers for the genus.
Pinus (Pinaceae) is the largest conifer genus among extant gymnosperms, with more than 110 recognized species. The genus is naturally distributed mainly in the Northern Hemisphere, but it has been introduced and cultivated as a plantation species worldwide. As the most primitive group among the modern genera of Pinaceae, Pinus has a long evolutionary history: its fossil record can be traced back to 100 MYA [1,2,3], and the genus has great potential for studying the evolutionary classification and species differentiation of conifers [3,4,5,6]. Pines are the main component of northern temperate forests and arid forest land, are an important source of afforestation material and industrial raw materials, and have important ecological and economic value.
Pinus classification has long been a hot topic in phylogenetics. Little and Critchfield proposed a classification system that divides Pinus into 3 Subgenera, 5 Sections, 15 Subsections, and 94 species, establishing the basic classification framework of the genus. With subsequent scientific and technological advances, the Pinus classification system has undergone several revisions and improvements [1, 6, 9, 10]. Notably, Gernandt et al. divided Pinus into 2 Subgenera (Strobus and Pinus), 4 Sections (Trifoliae, Pinus, Parrya, and Quinquefoliae), and 11 Subsections (Pinus, Pinaster, Contortae, Australes, Ponderosae, Balfourianae, Cembroides, Nelsoniae, Krempfianae, Gerardianae, and Strobus) on the basis of chloroplast gene sequences, nuclear DNA, and morphological evidence from 101 species. This classification system has been widely recognized [5, 11, 12]. However, the placement of individual species at the Subsection level remains controversial. In particular, the classification of Pinus squamata has been debated since its discovery. Li Xiangwang, who discovered P. squamata, considered it close to P. bungeana. Price pointed out that P. squamata may belong to Subsection Gerardianae or may represent a separate Subsection. Li Xiangping et al. placed P. squamata in Subsection Balfourianae. On the basis of wood anatomical data, Wang Changming et al. supported the view that P. squamata is close to P. bungeana. In the traditional classification of Gernandt et al., P. squamata is likewise placed in Subsection Gerardianae, together with P. bungeana and P. gerardiana. Although these studies suggest that P. squamata most likely belongs to Subsection Gerardianae, they relied only on morphology and limited DNA data.
The chloroplast genome structure of land plants is stable and contains a large amount of genetic information that can be used for phylogenetic inference and species classification. Chloroplast sequences have been used extensively as molecular markers in plant phylogenetic research. However, owing to the lack of complete chloroplast genome sequence data, many studies were limited to a few fragments, and complete chloroplast genome sequences have not yet been widely applied to phylogeny [19,20,21,22,23,24,25,26]. Complete chloroplast genome sequences perform much better than a few fragments in resolving species phylogeny and classification [27,28,29]. With the development of next-generation sequencing technology, phylogenetic analysis has entered a new era, and complete chloroplast genome sequences have become easier to obtain for many species. Such large volumes of sequence data provide basic data for studying chloroplast genome structure and gene composition, and lay a foundation for plant phylogeny, classification, and species identification.
In this study, the complete chloroplast genomes of 33 published species of Pinus were characterized and used for comparative genomic and phylogenetic analyses. We aimed to: (1) explore differences in size and structure among the studied complete chloroplast genomes; (2) identify highly variable regions in these genomes; and (3) reconstruct the phylogenetic relationships of pines and use them to verify and supplement the traditional classification system.
Characteristics of Pinus chloroplast (cp.) genomes
The cp. genomes of the 33 published pine species had the typical chloroplast genome structure, consisting of a pair of inverted repeats (IRa/b) that separate two single-copy regions: the large single-copy (LSC) and small single-copy (SSC) regions (Fig. 1). Sequence similarity among the 33 chloroplast genomes exceeded 95%, and there was no significant difference in size, gene content, or genome structure among them. The quadripartite structure of the genomes was not obvious, mainly because of the reduction of the IR regions. Chloroplast genome length ranged from 114,082 to 121,530 bp, with the LSC region ranging from 62,747 to 66,364 bp, the SSC region from 49,112 to 54,288 bp, and the IR regions from 267 to 495 bp. P. taeda had the largest chloroplast genome and P. pinceana the smallest. The chloroplast genomes of Pinus are smaller than those of most other seed plants, which may be related to the reduction of the IR regions during evolution. Total GC content was 38.45–39.00%, with no significant difference among the 33 species (Table 1). As GC content is an important indicator of the genetic relationship between species, this further shows that the chloroplast genomes of the 33 pine species are highly similar.
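The two summary statistics used above, GC content and the lengths of the four regions, can be made concrete with a short sketch. The Python below is illustrative only: `gc_content` and `region_lengths` are hypothetical helper names, the boundary coordinates in the example are toy values rather than data from any Pinus genome, and ambiguous bases (e.g. N) are assumed to be excluded from the GC denominator, which may differ from how the published values were computed. The genome is assumed to be linearized in the order LSC, IRb, SSC, IRa.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def region_lengths(lsc_end: int, irb_end: int, ssc_end: int, total: int) -> dict:
    """Lengths of the four regions from 1-based inclusive end coordinates,
    assuming the circular genome is linearized as LSC, IRb, SSC, IRa."""
    if not 0 < lsc_end < irb_end < ssc_end < total:
        raise ValueError("boundaries must increase and lie inside the genome")
    return {
        "LSC": lsc_end,
        "IRb": irb_end - lsc_end,
        "SSC": ssc_end - irb_end,
        "IRa": total - ssc_end,
    }


# Toy example (not real Pinus coordinates): total = LSC + IRb + SSC + IRa
print(round(gc_content("ATGCGCNNAT"), 2))  # 4 G/C out of 8 unambiguous bases → 50.0
print(region_lengths(lsc_end=100, irb_end=110, ssc_end=200, total=210))
# → {'LSC': 100, 'IRb': 10, 'SSC': 90, 'IRa': 10}
```

With real data, the boundary coordinates would come from the genome annotation. Because each Pinus IR is at most 495 bp, the two IRs together account for less than 1% of a 114 to 122 kb genome.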
All chloroplast genomes contained 108 genes: 72 protein-coding genes (PCGs), 32 tRNA genes, and four rRNA genes. Only the trnI-GAU gene and part of the psbA gene were located in the IR regions. All genes had the same location and order across the chloroplast genomes (Table 2). Among the annotated genes, 14 contained introns: 8 PCGs (atpF, petB, petD, rpl2, rps12, rpl16, rpoC1, and ycf3) and 6 tRNA genes (trnV-UAC, trnL-UAA, trnK-UUU, trnI-GAU, trnG-UCC, and trnA-UGC). Of these, rps12 and ycf3 each contained two introns and the remaining 12 genes each contained one; matK was located within the intron of trnK-UUU; and trnH-GUG, trnI-CAU, trnS-GCU, trnT-GGU, psbA, and psaM each had two copies in the genome. In addition, as in angiosperms, rps12 was trans-spliced in Pinus.
The number and order of rRNA genes were the same as in “typical” seed plant plastids such as Nicotiana, arranged as 16S, 23S, 4.5S, and 5S rRNA. However, Pinus and angiosperms differed in the content of some other genes. Angiosperms lost trnP-GGG and three chl genes (chlB, chlL, and chlN) during evolution. Loss of rpl23 has been reported in the plastids of the angiosperms Spinacia [33, 34] and Trachelium. The gene rps16 has been lost independently many times in land plants [32, 35, 36]; likewise, Pinus chloroplasts lack rps16. In addition, the gene accD has been lost independently in many prokaryotic and eukaryotic lineages, but not in Pinus.
Highly variable regions in the Pinus chloroplast genomes
Comparative visualization of the complete chloroplast genomes of the 33 species clearly revealed sequence differences. Overall, all genomes were relatively conserved, and variation in most coding genes and all rRNA genes was relatively small. Regions with obvious divergence were concentrated mostly in non-coding regions, among which psbM-trnD, cemA-ycf4, trnV-trnH, trnT-psbM, trnT-rps4-trnS, psbD-trnT-rrn16, psaC-ccsA, rpl32-trnV, and rps7-trnL were the most pronounced; among coding regions, atpE, ycf1, and ycf2 were the most variable (Fig. 2). To further analyze the differences among the 33 Pinus chloroplast genomes, we identified highly variable regions by calculating nucleotide diversity (Pi). Screening the 16 regions with the highest Pi values (0.10616–0.16672) yielded two highly variable regions, psbM-trnD-trnY-trnE-clpP-rps12 and chlN-ycf1 (Fig. 3; Table S1). Chloroplast genome rearrangement analysis showed no obvious rearrangement events (Fig. S1).
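For reference, the nucleotide diversity used here is, in Nei's formulation, the average number of nucleotide differences per site between two sequences. Writing $n$ for the number of aligned sequences (here 33), $d_{ij}$ for the number of sites at which sequences $i$ and $j$ differ, and $L$ for the number of sites compared, and weighting every sequence equally, it can be written as below. How gaps and missing data enter $L$ depends on the software settings and is an assumption left open here.

```latex
\pi \;=\; \binom{n}{2}^{-1} \sum_{i<j} \pi_{ij},
\qquad
\pi_{ij} \;=\; \frac{d_{ij}}{L}
```

In a sliding-window analysis this quantity is computed separately within each window, so each window receives its own Pi value.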
The chloroplast genomes of Pinus have a contracted IR region
The single-copy/inverted-repeat boundary maps of the 33 species showed that, as in most land plants, the cpDNA could be divided into four parts: the LSC, the SSC, and the two IR regions separating them. Unlike in most land plants, however, the IR regions of Pinus were incomplete, having lost most of their repeated sequence during evolution. The IR regions had contracted markedly, to only 267–495 bp. Only the trnI gene and part of the psbA gene were retained in the IRa region, and only trnI in the IRb region. The LSC ranged from 62,747 to 66,364 bp and the SSC from 49,112 to 54,288 bp, so the size difference between the two regions was not obvious. Except for 6 species (P. contorta, P. crassicorticea, P. morrisonicola, P. parviflora, P. squamata, and P. wangii), the IRa/LSC junction in the chloroplast genomes of the other 27 species was located within psbA, which extended 86–87 bp into the IRa region (Fig. 4).
SSRs and long repeats analysis
A total of 3,205 simple sequence repeats (SSRs), ranging from 8 to 230 bp in length, were detected in the 33 species. Mononucleotide repeats were the most frequent (1,708) and consisted mainly of A or T, showing an obvious base preference. The remainder were dinucleotide (817), compound (548), tetranucleotide (92), pentanucleotide (22), hexanucleotide (17), and trinucleotide (1) repeats. Trinucleotide repeats were the rarest, occurring only once, in P. monophylla. Only 4 types of SSRs were detected in 10 species, all of which lacked trinucleotide, pentanucleotide, and hexanucleotide repeats. Among the 33 species, the largest number of SSRs (103) occurred in P. parviflora, P. sibirica, and P. squamata, and the smallest (90) in P. nelsonii (Fig. 5; Table S2).
A total of 5,436 long repeats were detected across the 33 species, including tandem (965), forward (3,531), palindromic (876), complement (21), and reverse (43) repeats; forward repeats were the most abundant. P. armandii and P. koraiensis contained the largest number of repeats (307 each) and P. gerardiana the fewest (82). Forward repeats varied most among species: the species with the most (P. pumila) had 219 more than the species with the fewest (P. tabuliformis). Tandem and palindromic repeats occurred in similar numbers; the species with the most tandem repeats was P. parviflora (51), and those with the most palindromic repeats were P. aristata (42) and P. nelsonii (42). Complement repeats were the rarest and appeared in only 4 species (P. monophylla, P. morrisonicola, P. nelsonii, and P. pinceana). No reverse repeats were detected in 21 species (Fig. 6; Table S3).
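As a consistency check on the counts reported in this section, the category totals add up to the stated numbers of SSRs and long repeats:

```latex
\begin{aligned}
\text{SSRs:}\quad & 1708 + 817 + 548 + 92 + 22 + 17 + 1 = 3205\\
\text{long repeats:}\quad & 965 + 3531 + 876 + 21 + 43 = 5436
\end{aligned}
```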
Revisiting the phylogenetic relationships with complete chloroplast genomes
The phylogeny of the 33 species was inferred from their complete chloroplast genomes using the maximum likelihood (ML) method. We annotated the phylogenetic results according to the traditional classification system that Gernandt et al. proposed on the basis of chloroplast gene sequences. The 33 species cover 2 Subgenera, 4 Sections, and 10 Subsections of this system. The phylogenetic tree divided the 33 species into 2 major clades and 4 subclades, consistent with the traditional classification system, which strongly supports its Subgenus and Section divisions. However, some issues remained at the Subsection level. Gernandt et al. classified P. squamata in Subsection Gerardianae, but our phylogenetic results did not support this: P. squamata clustered with the species of Subsection Strobus and was closest to P. sibirica of that Subsection. P. squamata could therefore be considered for inclusion in Subsection Strobus. In addition, P. crassicorticea, which is not treated in the traditional classification system, was placed in Subgenus Pinus, Section Pinus, Subsection Pinus according to its phylogenetic position (Fig. 7).
Reduction of the IR regions resulted in variable cpDNA sizes in Pinus
The chloroplast genomes of most land plants are closed-circular, double-stranded DNA molecules with a conserved, typical quadripartite structure comprising an LSC, an SSC, and two IR regions separating them. Although the chloroplast genomes of most gymnosperms, such as cycads, Ginkgo, and gnetophytes, retain the typical quadripartite structure of seed plants [38,39,40,41], this structure has changed in Pinaceae and cupressophytes. Previous studies proposed that the IR is highly reduced in Pinaceae but completely lost in cupressophytes, and that the two groups lost different IR copies: Pinaceae lost IRb, whereas cupressophytes lost IRa [42, 43]. Within Pinus, each IR region of P. thunbergii was also shown to be shortened to 495 bp. Our results are consistent with these conclusions: the quadripartite structure of the 33 pine species was not obvious, and each IR region was only 267–495 bp, showing a decreasing trend. However, IRa and IRb did not differ in size and thus did not reflect the loss of IRb (Table 1). In addition, the LSC and SSC regions did not differ significantly in size, raising the possibility that part of the IR region was translocated into the SSC region. Seed plant chloroplast genomes usually contain 101–118 different genes and range from 120 to 160 kb in size. The 33 pine species contained 108 different genes, with chloroplast genomes of 114,082 to 121,530 bp (Tables 1 and 2). Thus, IR reduction appears to have made the Pinus chloroplast genome smaller than those of most other seed plants, although its gene number remains within the usual seed-plant range. Although the chloroplast genomes of Pinaceae and cupressophytes lack typical IRs, they have still evolved specific IRs associated with chloroplast genome rearrangement.
The chloroplast genomes of some conifers show very low collinearity [43, 47]. Strauss et al. also speculated that rearrangements in Pinaceae cpDNA may have occurred after IR reduction. However, genome synteny analysis (Fig. S1) revealed no obvious gene rearrangement events in Pinus chloroplast genomes, which may be related to the strong conservation and high structural similarity of pine chloroplast genomes.
Significance of chloroplast markers in population genetics
The existence and nature of repeat sequences have been shown to be of great significance for evolutionary and population genetic studies [49, 50]. A total of 7 types of SSRs were detected in the 33 pine species, of which 1,708 were mononucleotide repeats consisting mainly of A or T, showing a base preference (Fig. 5; Table S2). This A/T preference is shared with many seed plants, whose chloroplast SSRs usually consist of polyA or polyT repeats [51,52,53,54]. Recently, genomic SSR markers have been widely used in Pinus [55,56,57]. Compared with genomic SSRs, however, chloroplast SSR markers are abundant, highly polymorphic, and rich in species-level variability. The SSRs identified in this study will contribute to future studies of Pinus genetic diversity and phylogeography. Pines are also rich in long repeats: 5,436 were detected in the 33 species, with forward repeats the most frequent (Fig. 6; Table S3). Together with the SSRs above, these repeats lay a foundation for developing population genetic markers.
We screened the 16 regions with the highest Pi values among the 33 pines; they fall within two regions, psbM-trnD-trnY-trnE-clpP-rps12 and chlN-ycf1 (Fig. 3; Table S1). These two highly variable regions provide potential molecular markers for population genetic studies. In gymnosperms, chloroplasts are generally paternally inherited [60, 61]. Therefore, the highly variable regions detected in this study can inform the development of Pinus-specific DNA barcodes and could serve as an effective means of identifying the paternal parents of pines.
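The step from 16 top-scoring Pi windows to two regions can be illustrated with a short sketch. The Python below assumes, as a hypothetical procedure, that the highest-Pi windows are ranked, sorted by position, and merged wherever they overlap or abut; the paper does not describe its exact procedure, and the window coordinates and Pi values in the example are invented. Assigning gene names such as psbM or ycf1 to the merged regions would additionally require the genome annotation, which is not modeled here.

```python
from typing import List, Tuple

Window = Tuple[int, int, float]  # (start, end, Pi), 1-based inclusive coordinates


def merge_top_windows(windows: List[Window], top_n: int = 16) -> List[Window]:
    """Keep the top_n windows by Pi, then merge overlapping or adjacent ones.

    Each merged region reports its outer coordinates and its highest Pi.
    """
    top = sorted(windows, key=lambda w: w[2], reverse=True)[:top_n]
    top.sort(key=lambda w: w[0])  # order the selected windows along the genome
    regions: List[Window] = []
    for start, end, pi in top:
        if regions and start <= regions[-1][1] + 1:  # overlaps or touches previous
            prev_start, prev_end, prev_pi = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), max(prev_pi, pi))
        else:
            regions.append((start, end, pi))
    return regions


# Invented example: four of six windows are kept and collapse into two regions
toy = [(1, 800, 0.01), (201, 1000, 0.12), (401, 1200, 0.15),
       (601, 1400, 0.02), (5001, 5800, 0.11), (5201, 6000, 0.13)]
print(merge_top_windows(toy, top_n=4))
# → [(201, 1200, 0.15), (5001, 6000, 0.13)]
```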
Phylogenetic reconstruction based on complete chloroplast genomes
The chloroplast genome is characterized by abundant genes, a conserved structure, a low evolutionary rate, and a high copy number, and it has long been a main subject of phylogenetic and molecular evolution research [45, 62]. Chloroplast phylogenetic studies initially relied on single-gene sequences [63, 64], but single genes contain little information, resulting in low support for many branches [30, 65, 66]. As data accumulated, the resolution and support of phylogenies reconstructed from concatenated multi-gene sequences improved significantly [67,68,69], and this approach has been widely used [18, 20]. Among such studies, Gernandt et al. conducted a phylogenetic analysis based on the chloroplast matK and rbcL sequences of 101 pine species and constructed the classification system of Pinus. However, with the accumulation of complete Pinus chloroplast genome data, it has become necessary to re-examine this traditional classification system. In this study, we reconstructed the phylogenetic relationships of the 33 pine species from their complete cp. genomes. Except for P. squamata, the classification of all species was consistent with the traditional results. Unlike previous studies [9, 13,14,15,16], this study supports placing P. squamata in Subsection Strobus (Fig. 7). Similarly, the placement of P. nelsonii, P. krempfii, and P. contorta was unclear in previous studies, and the present results support placing P. nelsonii in Section Parrya, P. krempfii in Section Quinquefoliae, and P. contorta in Section Trifoliae. This work contributes to a better understanding of chloroplast evolution in Pinus and will advance research on pine phylogeny and taxonomy.
We conducted comparative and phylogenetic analyses of the complete chloroplast genomes of 33 pine species. The structure of Pinus chloroplast genomes was conserved, sequence similarity was high, and the IR regions showed a trend of contraction. The two highly variable regions identified here provide reference information for developing Pinus chloroplast DNA barcodes. The phylogeny of the 33 pine species reconstructed from complete chloroplast genomes provided better resolution than traditional chloroplast DNA sequences. Based on these phylogenetic results, we verified the traditional classification system and revised the position of P. squamata. As chloroplast genome information for Pinus becomes increasingly abundant, systematic analyses and summaries of this kind will enhance our understanding of Pinus evolutionary history, phylogeny, and taxonomy.
Materials and methods
Data collection and processing
The chloroplast genome sequences of 33 published pine species were downloaded from NCBI: P. aristata, P. armandii, P. bungeana, P. contorta, P. crassicorticea, P. densiflora, P. elliottii, P. gerardiana, P. greggii, P. jaliscana, P. koraiensis, P. krempfii, P. lambertiana, P. massoniana, P. monophylla, P. morrisonicola, P. nelsonii, P. oocarpa, P. parviflora, P. pinceana, P. pinea, P. pumila, P. sibirica, P. squamata, P. strobus, P. sylvestris, P. tabuliformis, P. taeda, P. taiwanensis, P. teocote, P. thunbergii, P. wangii, and P. yunnanensis. The 33 complete chloroplast genome sequences were aligned using MAFFT v7.0, then checked and corrected manually before subsequent analyses.
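Part of the manual check after alignment can be automated. The following standard-library Python sketch reads an aligned FASTA file, confirms that all sequences have the same length, and reports the gap fraction of each sequence; the function names are hypothetical, the gap symbol is assumed to be "-", and the paper does not say what its manual check consisted of. MAFFT writes its alignment in FASTA format by default, so that output could be passed to `read_fasta` directly.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first
    whitespace-separated word of each '>' header line."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("empty FASTA header")
            name = words[0]
            if name in records:
                raise ValueError(f"duplicate identifier: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def check_alignment(aln: Dict[str, str]) -> Dict[str, float]:
    """Verify that all sequences share one length; return each sequence's gap fraction."""
    if not aln:
        raise ValueError("empty alignment")
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    length = lengths.pop()
    if length == 0:
        raise ValueError("sequences are empty")
    return {name: seq.count("-") / length for name, seq in aln.items()}


toy = """>seqA first toy sequence
ACGT-ACGT
>seqB
ACGTTACG-
"""
print(check_alignment(read_fasta(toy)))
```

Sequences with an unusually high gap fraction, or a length mismatch error, would be the natural places to start a manual inspection.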
Comparative genomic analysis
The mVISTA v.7 program was used for multiple sequence alignment analysis, and the sequences were processed with CPGAVAS2 (http://www.herbalgenomics.org/cpgavas). With the chloroplast genome of P. armandii as the reference, the whole chloroplast genomes of the 33 pine species were compared in Shuffle-LAGAN mode. Nucleotide diversity was used as the parameter for identifying highly variable regions of the cp. genome. We estimated nucleotide diversity with DnaSP v.6.1, setting the step size and window length to 200 and 800 bp, respectively, and visualized the data with GraphPad Prism v.9.0 (https://www.graphpad.com/scientific-software/prism). Chloroplast genome rearrangement analysis was performed with the Mauve v.2.3 plug-in in Geneious v.11.0 using default settings.
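The sliding-window nucleotide diversity scan described above can be sketched in Python as follows. This is not DnaSP's implementation: it assumes an equal-length alignment, treats only A/C/G/T as valid, and simply skips any column containing a gap or ambiguous base in any sequence, which is an assumption about site handling. `window_pi` and `sliding_window_pi` are hypothetical names; the defaults use the 800 bp window and 200 bp step reported here, and only full-length windows are reported, with 1-based start and end positions.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

VALID = set("ACGT")


def window_pi(columns: Sequence[Tuple[str, ...]]) -> Optional[float]:
    """Nei's pi for one window: mean pairwise differences per site.

    Columns containing anything other than A/C/G/T in any sequence are skipped.
    Returns None if no usable column remains.
    """
    usable = [col for col in columns if VALID.issuperset(col)]
    if not usable:
        return None
    n_seqs = len(usable[0])
    if n_seqs < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(range(n_seqs), 2))
    diffs = sum(col[i] != col[j] for col in usable for i, j in pairs)
    return diffs / (len(pairs) * len(usable))


def sliding_window_pi(alignment: List[str], window: int = 800,
                      step: int = 200) -> List[Tuple[int, int, Optional[float]]]:
    """Return (start, end, pi) for each full window, 1-based inclusive."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*(seq.upper() for seq in alignment)))
    return [(start + 1, start + window, window_pi(columns[start:start + window]))
            for start in range(0, length - window + 1, step)]


toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]  # three toy sequences
for start, end, pi in sliding_window_pi(toy, window=5, step=5):
    print(start, end, round(pi, 4))
# prints "1 5 0.1333" and then "6 10 0.1333"
```

With the real 33-sequence alignment, the windows with the highest Pi values are the candidates for highly variable regions.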
Detection of long repeat sequences and simple sequence repeats
The online REPuter program (https://bibiserv.cebitec.uni-bielefeld.de/reputer) was used to identify long repeats (tandem, forward, reverse, palindromic, and complement repeats). The minimum repeat size was set to 30 bp and the Hamming distance to 3, with all other settings left at their defaults. The SSRs of the chloroplast genomes of the 33 pine species were identified with the microsatellite identification tool MISA (https://webblast.ipk-gatersleben.de/misa). The minimum numbers of repeats for identifying mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides were 8, 4, 4, 3, 3, and 3, respectively, and two SSRs separated by no more than 100 bp were recorded as a compound SSR.
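The SSR thresholds above can be illustrated with a simplified re-implementation. The Python sketch below is not MISA itself: it finds perfect repeats with regular expressions using the minimum repeat numbers given here (8, 4, 4, 3, 3, 3), discards motifs that are themselves repeats of a shorter unit (e.g. ATAT), resolves overlapping hits by keeping the earliest (and, at equal starts, the longest) one, and groups SSRs separated by at most 100 bp into compound SSRs. These overlap and grouping rules are assumptions, so counts from this sketch need not match MISA's output exactly; all names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}  # motif length -> minimum copies
MAX_COMPOUND_GAP = 100  # bp allowed between two SSRs in one compound SSR
NAMES = {1: "mononucleotide", 2: "dinucleotide", 3: "trinucleotide",
         4: "tetranucleotide", 5: "pentanucleotide", 6: "hexanucleotide"}

Ssr = Tuple[int, int, str, int]  # (start, end, motif, copies); 0-based, end exclusive


def is_primitive(motif: str) -> bool:
    """False if the motif is a tandem copy of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str) -> List[Ssr]:
    seq = seq.upper()
    hits: List[Ssr] = []
    for k, min_copies in MIN_REPEATS.items():
        # Zero-width lookahead so that repeats starting at every position are tried.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                start, end = m.start(1), m.end(1)
                hits.append((start, end, motif, (end - start) // k))
    hits.sort(key=lambda h: (h[0], -h[1]))  # earliest first, longest first on ties
    kept: List[Ssr] = []
    for hit in hits:
        if not kept or hit[0] >= kept[-1][1]:  # drop hits overlapping a kept SSR
            kept.append(hit)
    return kept


def group_compounds(ssrs: List[Ssr], max_gap: int = MAX_COMPOUND_GAP) -> List[List[Ssr]]:
    """Group consecutive SSRs whose separation is at most max_gap bases."""
    groups: List[List[Ssr]] = []
    for ssr in ssrs:
        if groups and ssr[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(ssr)
        else:
            groups.append([ssr])
    return groups


def classify(groups: List[List[Ssr]]) -> Dict[str, int]:
    counts = Counter("compound" if len(g) > 1 else NAMES[len(g[0][2])] for g in groups)
    return dict(counts)


spacer = "ACGTTGCA" * 19  # 152 bp toy spacer that contains no SSR
seq = "C" * 9 + spacer + "AG" * 5 + spacer
ssrs = find_perfect_ssrs(seq)
print([(motif, copies) for _, _, motif, copies in ssrs])  # → [('C', 9), ('AG', 5)]
print(classify(group_compounds(ssrs)))  # → {'mononucleotide': 1, 'dinucleotide': 1}
```

Shortening the spacer between the two SSRs to 100 bp or less would turn them into a single compound SSR under the rule above.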
To determine the phylogenetic positions of the 33 pine species, we performed phylogenetic analysis on the complete chloroplast genome sequences with Taxus as the outgroup. The complete chloroplast genome sequences were downloaded from NCBI. Sequences were aligned with MAFFT v7.0, and ModelFinder selected TVM + F + R2 as the best-fit substitution model for the complete chloroplast genome sequences. The phylogeny was inferred by ML analysis in IQ-TREE v1.6 with 1000 bootstrap replicates. FigTree v1.4 (https://github.com/rambaut/Figtree) was used to edit the two phylogenetic trees.
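Bootstrap support for an ML tree comes from resampling alignment columns. The Python sketch below shows the standard nonparametric bootstrap (columns drawn with replacement, with the same draw applied to every sequence) and how support for one clade is tallied once a tree has been inferred from each pseudo-replicate. The paper does not state whether IQ-TREE's standard or ultrafast bootstrap was used; tree inference itself is outside the sketch, replicate trees are represented, as a simplifying assumption, by the set of clades (taxon sets) each contains, and all function and taxon names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_replicate(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudo-replicate: sample alignment columns with replacement.

    The same column indices are used for every sequence, so sites stay aligned.
    """
    length = len(next(iter(aln.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in aln.items()}


def bootstrap_replicates(aln: Dict[str, str], n_reps: int = 1000,
                         seed: int = 1) -> List[Dict[str, str]]:
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [bootstrap_replicate(aln, rng) for _ in range(n_reps)]


def clade_support(clade: Iterable[str],
                  replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees whose clade set contains `clade`."""
    target = frozenset(clade)
    hits = sum(target in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


toy = {"P_a": "ACGTAC", "P_b": "ACGTAA", "P_c": "TCGAAA"}  # hypothetical taxa
reps = bootstrap_replicates(toy, n_reps=3, seed=0)
print(reps[0])
# Pretend that trees inferred from four replicates recovered these clades:
trees = [{frozenset({"P_a", "P_b"})}, {frozenset({"P_a", "P_b"})},
         {frozenset({"P_b", "P_c"})}, {frozenset({"P_a", "P_b"})}]
print(clade_support(["P_a", "P_b"], trees))  # → 75.0
```

In a full analysis, each of the 1000 pseudo-replicates would be analyzed under the same model as the original data (here TVM + F + R2), and a branch's support is the percentage of replicate trees that contain it. Pure Python resampling of a 33-taxon, roughly 120 kb alignment is slow but workable as an illustration.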
All data supporting the findings of this study are available within the paper and its supplementary materials published online. All data used in the study were obtained from a public database (https://www.ncbi.nlm.nih.gov/). Accession numbers of the 33 species are as follows: P. aristata, NC_039809.1; P. armandii, NC_029847.1; P. bungeana, NC_028421.1; P. contorta, MH612863.1; P. crassicorticea, NC_041150.1; P. densiflora, NC_042394.1; P. elliottii, NC_042788.1; P. gerardiana, NC_011154.4; P. greggii, NC_035947.1; P. jaliscana, NC_035948.1; P. koraiensis, NC_004677.2; P. krempfii, NC_011155.4; P. lambertiana, NC_011156.4; P. massoniana, NC_021439.1; P. monophylla, NC_011158.4; P. morrisonicola, NC_039616.1; P. nelsonii, NC_011159.4; P. oocarpa, NC_035949.1; P. parviflora, NC_039615.1; P. pinceana, NC_039587.1; P. pinea, NC_039585.1; P. pumila, NC_041108.1; P. sibirica, NC_028552.2; P. squamata, NC_039614.1; P. strobus, NC_026302.1; P. sylvestris, NC_035069.1; P. tabuliformis, NC_028531.1; P. taeda, NC_021440.1; P. taiwanensis, NC_027415.1; P. teocote, NC_039586.1; P. thunbergii, NC_001631.1; P. wangii, NC_039613.1; P. yunnanensis, NC_043856.1.
Richardson DM. Ecology and biogeography of Pinus. Cambridge: Cambridge University Press; 1998.
Klymiuk AA, Stockey RA, Rothwell GW. The first organismal concept for an extinct species of Pinaceae. Int J Plant Sci. 2011;172:294–313. https://doi.org/10.1086/657649.
Willyard A, Syring J, Gernandt DS, Liston A, Cronn R. Fossil calibration of molecular divergence infers a moderate mutation rate and recent radiations for Pinus. Mol Biol Evol. 2007;24(1):90–101. https://doi.org/10.1093/molbev/msl131.
Liu L, Hao ZZ, Liu YY, Wei XX, Cun YZ, Wang XQ. Phylogeography of Pinus armandii and its relatives: heterogeneous contributions of geography and climate changes to the genetic differentiation and diversification of chinese white pines. PLoS ONE. 2014;9(1):e85920. https://doi.org/10.1371/journal.pone.0085920.
Zeb U, Dong WL, Zhang TT, Wang RN, Shahzad K, Ma XF, et al. Comparative plastid genomics of Pinus species: insights into sequence variations and phylogenetic relationships. J Syst Evol. 2020;58(2):118–32. https://doi.org/10.1111/jse.12492.
Jin WT, Gernandt DS, Wehenkel C, Xia XM, Wei XX, Wang XQ. Phylogenomic and ecological analyses reveal the spatiotemporal evolution of global pines. Proc Natl Acad Sci U S A. 2021;118(20):e2022302118. https://doi.org/10.1073/pnas.2022302118.
Zheng WJ, Fu LG. Flora of China. Volume 7. Beijing: Science Press; 1978.
Little EL, Critchfield WB. Subdivisions of the genus Pinus (pines). Washington: Miscellaneous Publication; 1969.
Gernandt DS, López GG, García SO, Liston A. Phylogeny and classification of Pinus. Taxon. 2005;54(1):29–42. https://doi.org/10.2307/25065300.
Gernandt DS, Liston A, Piñero D. Phylogenetics of Pinus subsections Cembroides and Nelsoniae inferred from cpDNA sequences. Syst Bot. 2003;28(4):657–73. https://doi.org/10.1043/02-63.1.
Saladin B, Leslie AB, Wüest RO, Litsios G, Conti E, Salamin N, et al. Fossils matter: improved estimates of divergence times in Pinus reveal older diversification. BMC Evol Biol. 2017;17(1):95. https://doi.org/10.1186/s12862-017-0941-z.
Singh SP, Gumber S, Singh RD, Pandey R. Differentiation of diploxylon and haploxylon pines in spatial distribution, and adaptational traits. Acta Ecol Sin. 2023;43(1):1–10. https://doi.org/10.1016/j.chnaes.2021.07.007.
Li XW. Pinus yunnanensis-new series-new species. Acta Bot Yunnanica. 1992;14(3):258–60.
Price RA, Liston A, Strauss SH. Phylogeny and systematics of Pinus. Cambridge: Cambridge University Press; 2000.
Li XP, Zhu ZD. Analysis of fatty acids in seed oil of Pinus bungeana and its taxonomic problems. J Nanjing Forest Univ (Nat Sci). 1993;36(01):27–34. https://doi.org/10.3969/j.jssn.1000-2006.1993.01.005.
Wang CM, Li XW, Mu QY, Xiao SQ. Study on wood structure and classification of Pinus bungeana. J Sichuan Agric Univ. 1998;16(1):165–9. https://doi.org/10.16036/j.issn.1000-2650.1998.01.026.
Mower JP, Vickrey TL. Structural diversity among plastid genomes of land plants. Adv Bot Res. 2018;85:263–92. https://doi.org/10.1016/bs.abr.2017.11.013.
Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc Natl Acad Sci U S A. 2010;107(10):4623–8. https://doi.org/10.1073/pnas.0907801107.
Cai Z, Penaflor C, Kuehl JV, Leebens-Mack J, Carlson JE, dePamphilis CW, et al. Complete plastid genome sequences of Drimys, Liriodendron, and Piper: implications for the phylogenetic relationships of magnoliids. BMC Evol Biol. 2006;6:77. https://doi.org/10.1186/1471-2148-6-77.
Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A. 2007;104(49):19369–74. https://doi.org/10.1073/pnas.0709121104.
Jansen RK, Kaittanis C, Saski C, Lee SB, Tomkins J, Alverson AJ, et al. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol Biol. 2006;6:32. https://doi.org/10.1186/1471-2148-6-32.
Moore MJ, Bell CD, Soltis PS, Soltis DE. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci U S A. 2007;104(49):19363–8. https://doi.org/10.1073/pnas.0708072104.
Parks M, Cronn R, Liston A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol. 2009;7:84. https://doi.org/10.1186/1741-7007-7-84.
Goremykin VV, Holland B, Hirsch-Ernst KI, Hellwig FH. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications. Mol Biol Evol. 2005;22(9):1813–22. https://doi.org/10.1093/molbev/msi173.
Leebens-Mack J, Raubeson LA, Cui L, Kuehl JV, Fourcade MH, Chumley TW, et al. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein zone. Mol Biol Evol. 2005;22(10):1948–63. https://doi.org/10.1093/molbev/msi191.
Lin CP, Huang JP, Wu CS, Hsu CY, Chaw SM. Comparative chloroplast genomics reveals the evolution of Pinaceae genera and subfamilies. Genome Biol Evol. 2010;2:504–17. https://doi.org/10.1093/gbe/evq036.
Zhang SD, Jin JJ, Chen SY, Chase MW, Soltis DE, Li HT, et al. Diversification of Rosaceae since the late cretaceous based on plastid phylogenomics. New Phytol. 2017;214(3):1355–67. https://doi.org/10.1111/nph.14461.
Li HT, Yi TS, Gao LM, Ma PF, Zhang T, Yang JB, et al. Origin of angiosperms and the puzzle of the jurassic gap. Nat Plants. 2019;5(5):461–70. https://doi.org/10.1038/s41477-019-0421-0.
Meng KK, Chen SF, Xu KW, Zhou RC, Li MW, Dhamala MK, et al. Phylogenomic analyses based on genome-skimming data reveal cyto-nuclear discordance in the evolutionary history of Cotoneaster (Rosaceae). Mol Phylogenet Evol. 2021;158:107083. https://doi.org/10.1016/j.ympev.2021.107083.
Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005;6(5):361–75. https://doi.org/10.1038/nrg1603.
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, et al. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 1986;5(9):2043–9. https://doi.org/10.1002/j.1460-2075.1986.tb04464.x.
Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A. 2007;104(49):19369–74. https://doi.org/10.1073/pnas.0709121104.
Thomas F, Massenet O, Dorne AM, Briat JF, Mache R. Expression of the rpl23, rpl2 and rps19 genes in spinach chloroplasts. Nucleic Acids Res. 1988;16(6):2461–72. https://doi.org/10.1093/nar/16.6.2461.
Schmitz-Linneweber C, Maier RM, Alcaraz JP, Cottet A, Herrmann RG, Mache R. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol Biol. 2001;45(3):307–15. https://doi.org/10.1023/a:1006478403810.
Doyle JJ, Doyle JL, Palmer JD. Multiple independent losses of two genes and one intron from legume chloroplast genome. Syst Bot. 1995;20(3):272–94. https://doi.org/10.2307/2419496.
Ohyama K. Chloroplast and mitochondrial genomes from a liverwort, Marchantia polymorpha–gene organization and molecular evolution. Biosci Biotechnol Biochem. 1996;60(1):16–24. https://doi.org/10.1271/bbb.60.16.
Lee HL, Jansen RK, Chumley TW, Kim KJ. Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions. Mol Biol Evol. 2007;24(5):1161–80. https://doi.org/10.1093/molbev/msm036.
McCoy SR, Kuehl JV, Boore JL, Raubeson LA. The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates. BMC Evol Biol. 2008;8:130. https://doi.org/10.1186/1471-2148-8-130.
Wu CS, Lai YT, Lin CP, Wang YN, Chaw SM. Evolution of reduced and compact chloroplast genomes (cpDNAs) in gnetophytes: selection toward a lower-cost strategy. Mol Phylogenet Evol. 2009;52(1):115–24. https://doi.org/10.1016/j.ympev.2008.12.026.
Lin CP, Wu CS, Huang YY, Chaw SM. The complete chloroplast genome of Ginkgo biloba reveals the mechanism of inverted repeat contraction. Genome Biol Evol. 2012;4(3):374–81. https://doi.org/10.1093/gbe/evs021.
Wu CS, Wang YN, Liu SM, Chaw SM. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 2007;24(6):1366–79. https://doi.org/10.1093/molbev/msm059.
Wu CS, Wang YN, Hsu CY, Lin CP, Chaw SM. Loss of different inverted repeat copies from the chloroplast genomes of Pinaceae and cupressophytes and influence of heterotachy on the evaluation of gymnosperm phylogeny. Genome Biol Evol. 2011;3:1284–95. https://doi.org/10.1093/gbe/evr095.
Wu CS, Chaw SM. Highly rearranged and size-variable chloroplast genomes in conifers II clade (cupressophytes): evolution towards shorter intergenic spacers. Plant Biotechnol J. 2014;12(3):344–53. https://doi.org/10.1111/pbi.12141.
Wakasugi T, Tsudzuki J, Ito S, Nakashima K, Tsudzuki T, Sugiura M. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Natl Acad Sci U S A. 1994;91(21):9794–8. https://doi.org/10.1073/pnas.91.21.9794.
Jansen RK, Ruhlman TA. Plastid genomes of seed plants. Springer Netherlands; 2012.
Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol Biol. 2011;76(3–5):273–97. https://doi.org/10.1007/s11103-011-9762-4.
Wu CS, Lin CP, Hsu CY, Wang RJ, Chaw SM. Comparative chloroplast genomes of Pinaceae: insights into the mechanism of diversified genomic organizations. Genome Biol Evol. 2011;3:309–19. https://doi.org/10.1093/gbe/evr026.
Strauss SH, Palmer JD, Howe GT, Doerksen AH. Chloroplast genomes of two conifers lack a large inverted repeat and are extensively rearranged. Proc Natl Acad Sci U S A. 1988;85(11):3898–902. https://doi.org/10.1073/pnas.85.11.3898.
Cavalier-Smith T. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr Biol. 2002;12(2):R62–4. https://doi.org/10.1016/s0960-9822(01)00675-3.
Nie X, Lv S, Zhang Y, Du X, Wang L, Biradar SS, et al. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS ONE. 2012;7(5):e36869. https://doi.org/10.1371/journal.pone.0036869.
Cheon KS, Kim KA, Kwak M, Lee B, Yoo KO. The complete chloroplast genome sequences of four Viola species (Violaceae) and comparative analyses with its congeneric species. PLoS ONE. 2019;14(3):e0214162. https://doi.org/10.1371/journal.pone.0214162.
Yang J, Hu G, Hu G. Comparative genomics and phylogenetic relationships of two endemic and endangered species (Handeliodendron bodinieri and Eurycorymbus cavaleriei) of two monotypic genera within Sapindales. BMC Genomics. 2022;23(1):27. https://doi.org/10.1186/s12864-021-08259-w.
Xu K, Lin C, Lee SY, Mao L, Meng K. Comparative analysis of complete Ilex (Aquifoliaceae) chloroplast genomes: insights into evolutionary dynamics and phylogenetic relationships. BMC Genomics. 2022;23(1):203. https://doi.org/10.1186/s12864-022-08397-9.
Wu L, Nie L, Xu Z, Li P, Wang Y, He C, et al. Comparative and phylogenetic analysis of the complete chloroplast genomes of three Paeonia Section Moutan Species (Paeoniaceae). Front Genet. 2020;11:980. https://doi.org/10.3389/fgene.2020.00980.
Yang B, Sun H, Qi J, Niu S, El-Kassaby YA, Li W. Improved genetic distance-based spatial deployment can effectively minimize inbreeding in seed orchard. For Ecosyst. 2020;7(1):117–27. https://doi.org/10.1186/s40663-020-0220-0.
Miao YB, Fang P, Yang ZH, Zhu XM, Gao Q, Liu Y, et al. Genetic structure analysis of Pinus sylvestris var. mongolica under different geographical environments. J Beijing For Univ. 2018;40(10):43–50. https://doi.org/10.13332/j.1000-1522.20170438.
Liu L, Zhang S, Lian C. De novo transcriptome sequencing analysis of cDNA library and large-scale unigene assembly in Japanese red pine (Pinus densiflora). Int J Mol Sci. 2015;16(12):29047–59. https://doi.org/10.3390/ijms161226139.
Ebert D, Peakall R. Chloroplast simple sequence repeats (cpSSRs): technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species. Mol Ecol Resour. 2009;9:673–90. https://doi.org/10.1111/j.1755-0998.2008.02319.x.
Nie X, Lv S, Zhang Y, Du X, Wang L, Biradar SS, et al. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS ONE. 2012;7(5):e36869. https://doi.org/10.1371/journal.pone.0036869.
Sutton BC, Flanagan DJ, El-Kassaby YA. A simple and rapid method for estimating representation of species in spruce seedlots using chloroplast DNA restriction fragment length polymorphism. Silvae Genet. 1991;40:119–23.
Sutton BC, Flanagan DJ, Gawley JR, Newton CH, Lester DT, El-Kassaby YA. Inheritance of chloroplast and mitochondrial DNA in Picea and composition of hybrids from introgression zones. Theor Appl Genet. 1991;82(2):242–8. https://doi.org/10.1007/BF00226220.
Korpelainen H. The evolutionary processes of mitochondrial and chloroplast genomes differ from those of nuclear genomes. Sci Nat. 2004;91(11):505–18. https://doi.org/10.1007/s00114-004-0571-3.
Savolainen V, Chase MW, Hoot SB, Morton CM, Soltis DE, Bayer C, et al. Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences. Syst Biol. 2000;49(2):306–62. https://doi.org/10.1093/sysbio/49.2.306.
Chase MW, Soltis DE, Olmstead RG, Morgan DL, Les DH, Mishler BD, et al. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann Mo Bot Gard. 1993;80(3):528–50. https://doi.org/10.2307/2399846.
Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003;425(6960):798–804. https://doi.org/10.1038/nature02053.
Jeffroy O, Brinkmann H, Delsuc F, Philippe H. Phylogenomics: the beginning of incongruence? Trends Genet. 2006;22(4):225–31. https://doi.org/10.1016/j.tig.2006.02.003.
Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis PS, Zanis M, Zimmer EA, et al. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature. 1999;402(6760):404–7. https://doi.org/10.1038/46536.
Soltis PS, Soltis DE, Chase MW. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature. 1999;402(6760):402–4. https://doi.org/10.1038/46528.
Fishbein M, Hibsch-Jetter C, Soltis DE, Hufford L. Phylogeny of Saxifragales (angiosperms, eudicots): analysis of a rapid, ancient radiation. Syst Biol. 2001;50(6):817–47. https://doi.org/10.1080/106351501753462821.
Syring J, Willyard A, Cronn R, Liston A. Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. Am J Bot. 2005;92(12):2086–100. https://doi.org/10.3732/ajb.92.12.2086.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80. https://doi.org/10.1093/molbev/mst010.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, et al. VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000;16(11):1046–7. https://doi.org/10.1093/bioinformatics/16.11.1046.
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–2. https://doi.org/10.1093/bioinformatics/btp187.
Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14:1394–403. https://doi.org/10.1101/gr.2289704.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9. https://doi.org/10.1093/bioinformatics/bts199.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29:4633–42. https://doi.org/10.1093/nar/29.22.4633.
Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33(16):2583–5. https://doi.org/10.1093/bioinformatics/btx198.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9. https://doi.org/10.1038/nmeth.4285.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74. https://doi.org/10.1093/molbev/msu300.
This work was supported by a grant from the National Key R&D Plan for the Fourteenth Five-Year Plan (2022YFD2200304).
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethics approval and consent to participate
Not applicable. No specific permits were required for the collection of specimens for this study.
Consent for publication
About this article
Cite this article
Xia, Q., Zhang, H., Lv, D. et al. Insights into phylogenetic relationships in Pinus inferred from a comparative analysis of complete chloroplast genomes. BMC Genomics 24, 346 (2023). https://doi.org/10.1186/s12864-023-09439-6