Comparative genomics and transcriptomics analysis reveals evolution patterns of selection in the Salix phylogeny

Background Willows are widely distributed in the northern hemisphere and have good adaptability to different living environment. The increasing of genome and transcriptome data provides a chance for comparative analysis to study the evolution patterns with the different origin and geographical distributions in the Salix phylogeny. Results Transcript sequences of 10 Salicaceae species were downloaded from public databases. All pairwise of orthologues were identified by comparative analysis in these species, from which we constructed a phylogenetic tree and estimated the rate of diverse. Divergence times were estimated in the 10 Salicaceae using comparative transcriptomic analysis. All of the fast-evolving positive selection sequences were identified, and some cold-, drought-, light-, universal-, and heat- resistance genes were discovered. Conclusions The divergence time of subgenus Vetrix and Salix was about 17.6–16.0 Mya during the period of Middle Miocene Climate Transition (21–14 Mya). Subgenus Vetrix diverged to migratory and resident groups when the climate changed to the cool and dry trend by 14 Mya. Cold- and light- stress genes were involved in positive selection among the resident Vetrix, and which would help them to adapt the cooling stage. Universal- stress genes exhibited positive selection among the migratory group and subgenus Salix. These data are useful for comprehending the adaptive evolution and speciation in the Salix lineage. Electronic supplementary material The online version of this article (10.1186/s12864-019-5627-z) contains supplementary material, which is available to authorized users.

A widely used classification system was proposed by Skvortsov, which divided the genus Salix into three subgenera Salix, Vetrix and Chamaetia [1]. The evidence of morphological taxonomy suggests that the subgenus Vetrix has passed two stages in its development [1]. When the climate became colder [23], the thermophilic groups either became extinct or moved south (Southern China and Southeast Asia), like Section Eriostachyae, Daltonianae and Denticulatae et al. Thus the hardy ones stayed and drastically expanded their ranges. At the same time, another younger and hardier formation of the subgenus Salix expanded across the northern hemisphere being represented by a number of boreal sections. However, no study explains how willows went through the long-distance migrations and how the resident and migratory groups adapted to the varied environments from high to low latitudes during the long evolutionary history.
Transcriptome sequencing technology can rapidly and economically obtain all RNA information of organisms at one time, which playing an important role in finding molecular markers and function genes for biology research [24,25]. As more and more species had been completed transcriptome sequencing, comparative transcriptomics has received more attention from researchers [26][27][28][29][30]. Comparative transcriptomics can explain the phylogenetic relationships based on multiple species, and answer the functional differences between orthologous genes after species divergence in different living environment.
In this study, transcript sequences of 9 Salix and one Populus [31] were downloaded from public databases ( Table 1). Among them, S. matsudana and S. babylonica belong to subgenus Salix, other 7 willow species belong to subgenus Vetrix. Section Psilostigmatae ( Fig. 1) named by Flora of China shares some species (like S. salwinensis) with migratory section Daltonianae, so S. fargesii of section Psilostigmatae is as the possible migratory species of Vetrix. Most species of Vetrix are mainly distributed in North China or further north areas except S. fargesii (Additional file 1: Table S1). Comparative genomics and transcriptomics were subsequently analyzed in 10 Salicaceae species. A number of positive selection genes were found to be related to environmental factors in the Salix phylogeny.
Orthologue identification and functional characterization between 10 Salicaceae species All of the pairwise orthologues were identified by comparative analysis between the 10 Salicaceae species ( Table 4). The results showed that S. purpurea had the maximum average number (6597) of orthologous genes, whereas S. babylonica had the minimum average number (707). The highest number of orthologous genes (9713) was found between S. purpurea and S. suchowensis, while the lowest number (681) was found between S. babylonica and S. fargesii. 238 single copy orthologues were found in all 10 Salicaceae species (Fig. 2). The orthologues were annotated with GO terms (Additional file 1: Table S3). Taking P. trichocarpa as an out-group species, the phylogenetic tree of Salix was constructed based on combined 238 orthologous transcripts using Maximum Likelihood (ML) method (Fig. 4).

Phylogenetic analysis and divergence time
The genetic distance of species was related to synonymous mutation rate calculated by orthologous genes, so the synonymous mutation rates of all pairs of orthologues were estimated in 10 Salicaceae species (Table 4). Between different branches ( Fig. 3), S. purpurea has the Ks peak (0.02) with S. suchowensis, S. sachalinensis and S. dasyclados, 0.03 Ks peak with S. viminalis, S. eriocephala and S. fargesii, 0.04 Ks peak with S. matsudana, 0.05 Ks peak with S. babylonica, and the maximum Ks peak 0.11 with out-group P. trichocarpa. Between different genera, most of Salix species has the Ks peak 0.11, whereas S. fargesii was found the minimum Ks peak 0.10 with P. trichocarpa (Table 4). It is suggested S. fargesii was a relatively ancient species compared to others. Using P. trichocarpa as an out-group species, the phylogenetic tree of Salix was derived with the pairwise Ks values of the orthologous transcripts as a distance metric based on the neighbour-joining (NJ) method ( Fig. 4). In the phylogenetic tree, the average Ks value is 0.11 between Genus Salix and Genus Salix (Calculated by Table 4), and which is nearly consistent with the value of 0.12 in previous studies [18]. Based on existing fossil evidence, the divergence time of genera Salix and Populus was about 48 million years old in middle Eocene sediments [16,17]. With this time as the separation of the two lineages and K = 0.11, the rate of substitution (r) was calculated to about 1.14 * 10 − 9 per site and year (T = K/2r), and which is very close to previous value of 1.28 * 10 − 9 [18].    Using the fossil calibrations (48 Mya) of genera Salix and Populus [16,17], the divergence times were estimated based on the 238 single copy genes and pairwise Ks distance metric of the orthologous transcripts ( Table 4). The divergence of subgenus Vetrix and Salix occurred at about 17.6-16.0 Mya in the Salix phylogeny, and S. fargesii diverged at about 10.9-10.6 Mya with other species of subgenus Vetrix (Fig. 4). There were still some inconsistencies on the divergence time of subgenus Vetrix and Salix based on nuclear and plastome genes in previous studies [13,15]. The time of 17.6-16.0 Mya between subgenus Vetrix and Salix supports the value of 16.9 Mya estimated by complete plastome genomes.

Evolutionary pattern of Salix spp. genes
Ka/Ks rate of orthologous genes could reflect the evolution pattern of species. Ka/Ks > 1 indicates that the gene has involved in positive selection during evolution.
In subgenus Salix and S. fargesii (Table 5), 257 and 36 positive selection genes were identified between S. matsudana and S. fargesii, S. matsudana and S. babylonica. Universal-stress genes (NP_001132550.1 and NP_001132238.1) were wildly found between them by producing universal stress protein. Universal stress protein could induce by many environmental stressors such as nutrient starvation, drought, extreme temperatures, high salinity, and the presence of uncouplers, antibiotics and metals [40][41][42].

Salix phylogeny derived by comparative genomics and transcriptomics
In previous studies, Salix phylogenies were usually derived by several nuclear and plastid markers [7][8][9][10][11][12][13][14][15]. Different markers always obtained different phylogenetic trees. Comparative genomics and transcriptomics could make use of more and more nuclear sequences. Phylograms were derived using two methods in this work. 238 single copy genes were strictly selected to construct the phylogenetic tree by maximum-likelihood method, which used most of the sites. Another method is based on the neighbor-joining method using the pairwise Ks values of the orthologous transcripts as a distance metric, which used most of the orthologous. The divergence times of subgenus Vetrix and Salix estimated by two methods were consistent with the value by complete plastome genomes [15]. It is improved that enough single copy nuclear sequences should obtain similar results with enough plastid sequences in the Salix phylogeny.

Paleoclimate change in the divergence of Salix phylogeny
The divergence time of genera Salix and Populus was about 48 Mya at the period of Paleogene (66-23Mya) [43][44][45]. During the Paleogene, the global climate went Table 4 Number and Ks peaks of orthologous genes in 10 Salicaceae species against the hot and humid conditions of the late Mesozoic era and began a cooling and drying trend [23]. As the Earth cooled, tropical plants were restricted to equatorial regions and became less numerous. Deciduous plants became more common which could survive through the seasonal climates, during which Salix and Populus diverged.
Miocene (23 -5Mya) is the main period in the divergence of Salix phylogeny (Fig. 4). During the period, there is evidence of a warm period from 21 Mya to 14 Mya named as the Middle Miocene Climate Transition (MMCT) [23], and the rare pleasant environment might cause the species diversity. The divergence time of subgenus Vetrix and Salix was about 17.6-16.0 Mya corresponding to the period of MMCT. Then global temperatures took a drop and some species were extinct by 14Mya [46][47][48], so the north subgenus Vetrix needed to migrate or adapt in order to survive. One  (Named orth1-orth238). The orthologues were annotated to different function with Gene Ontology (GO) (Additional file 1: Table S3). The sequences of 238 orthologues are shown in Additional file 3 group with S. fargesii diverged from subgenus Vetrix and migrated to south. The resident group of Vetrix had to adapt the cold and drought climate. By 8 Mya, the climate sharply cooled and formed the Quaternary Ice Age (2.6-0.1Mya) [49]. The climate change from MMCT to Quaternary Ice Age should play an important role in the divergence of Salix phylogeny.

Universal-stress genes and migration of S. fargesii
The divergence time of S. fargesii (section Psilostigmatae) was about 10.9-10.6 Mya after the MMCT , and in which period the climate changed to be cooling. Wind and animal pollination had been proved to play an important role in the spread of willows [50][51][52]. Section Psilostigmatae are mainly distributed along the Changjiang (Yangtze) [53], Laantsang and Nujiang river of China (Fig. 1). The river provided the feasibility for the migration of the willow catkins by animal or other pollination, which is consistent with the distribution of Section Psilostigmatae. In previous studies, it was shown that the evolutionary history of the salix has involved multiple reticulation events that may mainly be due to hybridization [13]. Migration of S. fargesii maybe provided the possibility for the hybridization of Salix.
Our result shows that universal-stress genes were identified to be involved in positive selection between S. fargesii and subgenus Salix. It is suggested that selective evolution of universal-stress gene should play an import role in migrating to south for S. fargesii. When S. fargesii moved to south, universal-stress gene can help to produce the abiotic resistances to adapt the complex environment of south areas.

Cold-, light-stress genes and the north resident group of sub genus Vetrix
After the MMCT (21-14 Mya), the global climate came back the cooling and drying trend [23]. In previous studies, it was shown that the evolutionary history of the Salix maybe affected by the profound climatic cooling during the Tertiary [13]. The resident group of section Vetrix had to adapt to the cooling stage especially the high-latitude. Cold-, light-stress genes were widely identified to be involved to positive evolution among S. purpurea, S. suchowensis, S. sachalinensis, S. eriocephala and S. dasyclados. It is suggested that the cool and dry climate had played an important role in the speciation of north group of Section Vetrix.

Conclusions
In this study, we completed the comparative analysis based on genomic and transcriptomic sequences of 9 Salix and one populus species. All pairwise of orthologues were identified in these species, from which we constructed a phylogenetic tree and estimated the rate of diverse. The divergence times were estimated by the comparative analysis, and which suggested the speciation of Salix was involved in the period from MMCT  to Quaternary Ice Age (2.6-0.1Mya). The warm climate of MMCT might cause the divergence of subgenus Vetrix and Salix. Then global temperatures came back to the cool and dry trend by 14 Mya, so willows needed to migrate or adapt in order to survive. The phylogenetic relationship and geography distribution suggest that section Psilostigmatae might migrate from north to south by the Changjiang, Laantsang and Nujiang river of China. Universal-stress genes were involved in positive evolution and could help them to adapt   Fig. 4 Phylogenetic tree of 9 Salix and one Populus species. a NJ tree. Phylogram derived using pairwise synonymous substitution rates of orthologous transcripts as a distance metric (not from multiple sequence alignments) and the neighbour-joining method. b ML tree. Phylogram derived using 238 single copy genes and the maximum-likelihood method. Original trees were shown in Additional file 4: Figure S2 to the south complex environment. Cold-and light-stress genes were identified to be involved in positive evolution among the resident Vetrix. It is suggested the resident Vetrix had to adapt to the cool and dry environment in order to survive. The study shows that the paleoclimate change and selective evolution had played an important role in the divergence of Salix phylogeny.

Data sources
In order to discover the evolutionary pattern of orthologues, the cDNAs and transcripts of 9 Salix and one Populus (out-group) were downloaded from the public databases (  Table S2).

Data filtering and de novo assembly
SRA datasets with FASTQ format were filtered to remove raw reads of low quality. Transcriptome assembly was achieved using the short-read assembly program Trinity [54]. The assembled sequences (> = 300 bp) were combined and clustered with CD-HIT (version 4.0) [55,56]. Sequences with similarity > 95% were divided into one class, and the longest sequence of each class was treated as a unigene during later processing.

Identification of SSRs in 10 Salicaceae species
Putative SSRs in Unigenes and cDNAs were identified by MISA software. The options of Di-to hexa-nucleotide SSRs were set to 6 (for di-), 5 (for tri-and tetra-), and 4 (for penta-and hexa-), and all SSRs were characterized in 10 Salicaceae species.

Identification of orthologues among 10 Salicaceae species
OrthoMCL software [57] was used to cluster the transcribed sequences. Based on the proteins of Salix purpurea as reference, one-to-one sequences of each group were then filtered to use in subsequent analyses. The annotations obtained from Nr were processed through the BLAST2GO program [58] to get the relevant GO terms.
Heatmap of orthologues was draw by R language.

Estimation of synonymous substitution and nonsynonymous substitution rates
In order to remove the unigenes without open reading frames, pair-wise orthologues were searched against plant protein sequences of GenBank with BLASTX tool. The method has been used in previous studies [59]. Clustalw software [60] was used to align the filtered pair-wise orthologues, and the output files were formatted to NUC format for subsequent analysis. The rates of synonymous substitutions (Ka) and non-synonymous substitutions (Ks) were estimated with PAML software [61].

Phylogenetic analysis
There were still some inconsistencies on phylogenetic relationship in previous studies. Phylograms were derived using two methods in this work. Single copy genes by orthoMCL were aligned by Muscle [62] and formated by Gblock [63], maximum-likelihood method was used to build the phylogenetic tree by MEGA6 [64] (bootstrap is 1000 and Kimura 2-parameter model). Another method is based on the neighbor-joining method of MEGA6 [64] using the pairwise Ks values of the orthologous transcripts as a distance metric (Table 4). Populus trichocarpa was used as an out-group to root trees.