Comparative transcriptome analysis of the newly discovered insect vector of the pine wood nematode in China, revealing putative genes related to host plant adaptation

Background In many insect species, the larvae/nymphs are unable to disperse far from the oviposition site selected by adults. The Sakhalin pine sawyer Monochamus saltuarius (Gebler) is the newly discovered insect vector of the pine wood nematode (Bursaphelenchus xylophilus) in China. Adult M. saltuarius prefers to oviposit on the host plant Pinus koraiensis, rather than P. tabuliformis. However, the genetic basis of adaptation of the larvae of M. saltuarius with weaken dispersal ability to host environments selected by the adult is not well understood. Results In this study, the free amino and fatty acid composition and content of the host plants of M. saltuarius larvae, i.e., P. koraiensis and P. tabuliformis were investigated. Compared with P. koraiensis, P. tabuliformis had a substantially higher content of various free amino acids, while the opposite trend was detected for fatty acid content. The transcriptional profiles of larval populations feeding on P. koraiensis and P. tabuliformis were compared using PacBio Sequel II sequencing combined with Illumina sequencing. The results showed that genes relating to digestion, fatty acid synthesis, detoxification, oxidation-reduction, and stress response, as well as nutrients and energy sensing ability, were differentially expressed, possibly reflecting adaptive changes of M. saltuarius in response to different host diets. Additionally, genes coding for cuticle structure were differentially expressed, indicating that cuticle may be a potential target for plant defense. Differential regulation of genes related to the antibacterial and immune response were also observed, suggesting that larvae of M. saltuarius may have evolved adaptations to cope with bacterial challenges in their host environments. Conclusions The present study provides comprehensive transcriptome resource of M. saltuarius relating to host plant adaptation. Results from this study help to illustrate the fundamental relationship between transcriptional plasticity and adaptation mechanisms of insect herbivores to host plants. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07498-1.


Background
For insect herbivores, adaptation to host plants is crucial to their ability to colonize a variety of environments [1]. Host plants produce a variety of allelochemicals including various defense compounds that protect them against herbivores; meanwhile, insect herbivores have developed different means to struggle with the chemical barriers that deter them from feeding [2]. Owing to the variety of plant defense compounds, a generalist herbivorous insect has to overcome a range of chemical challenges [3]. The capacity of herbivores to metabolize and detoxify plant chemicals is considered as one of their main evolutionary adaptations [4]. Although the importance of insect adaptation to plant chemicals is widely recognized, the underlying genetic mechanisms in response to their host plant defenses are still insufficient [3,4].
The pine wood nematode (PWN; Bursaphelenchus xylophilus) is a plant parasitic nematode and major cause of pine wilt disease in Asia and Europe [5]. The transfer of PWN between host trees is mediated by insect vectors, e.g., various species of Monochamus beetles [6,7]. In Asia, PWN infection mainly occurs during feeding and oviposition of the Japanese pine sawyer Monochamus alternatus Hope [5]. Besides M. alternatus, the Sakhalin pine sawyer M. saltuarius (Gebler) (Coleoptera: Cerambycidae) is another important insect vector of PWN in Japan [8] and Korea [9]. Recently, M. saltuarius has also been confirmed as an effective vector of PWN in Liaoning Province, China [10][11][12]. The Korean white pine Pinus koraiensis Siebold & Zucc, a tree species of economic importance [13], was found to be a natural host for the PWN in the Republic of Korea in 2006, and M. saltuarius transmitted PWN to P. koraiensis [14]. Similarly, M. saltuarius was found to transmit PWN to P. koraiensis in China [10,12]. In addition, Han et al. [15] investigated the feeding and oviposition preference of M. saltuarius among eight tree species, including P. koraiensis, and they found that the highest feeding amount and oviposition preference were related to P. koraiensis. Similarly, Pan et al. [16] reported that adults of M. saltuarius preferred P. koraiensis than P. tabuliformis Carr. and Larix kaempferi (Lamb.) Carr. based on feeding behavior. Volatiles produced by host plants, e.g., α-pinene, are known to attract Monochamus spp. [17,18]. Adults of M. saltuarius can be attracted by terpenes emitted from the host plant P. koraiensis for feeding and oviposition [19,20]. In addition, host volatiles also play an important role in the mating location of longhorned beetles [21]. Therefore, the distribution pattern in the adults of M. saltuarius can be affected by host volatiles.
In many organisms, including insect species, larvae/ nymphs are unable to disperse far from the oviposition site selected by the mother [22]. Consequently, oviposition host selection can strongly impact both the survival and the spatial distribution of a species [23], and the structure and composition of animal communities [24]. Female adults of M. saltuarius lay their eggs on the bark of pine trees. After hatching, the larvae feed on the inner cambium bark and outer sapwood. Because adults of M. saltuarius prefer P. koraiensis over P. tabuliformis for feeding and oviposition, coupled with the weakened dispersal ability at the instar stage, the larvae of M. saltuarius may be confronted with different chemical challenges posed by their different hosts. However, the molecular mechanisms underlying host plant adaptation of M. saltuarius larvae are largely unknown.
Detecting transcriptional changes related to host adaptation is a vital link to understand plant-insect interactions [3,25,26]. Previous studies have proved that transcriptional plasticity of insects was related to diet. For instance, research on host adaptation in cactophilic flies, e.g., Drosophila mojavensis, D. buzzatii, and D. mettleri, have identified a series of genes associated with carbohydrate metabolism, cellular energy production, xenobiotic metabolism, and stress response [2,25,27]. Research on the striped stem borer Chilo suppressalis, Zhong et al. [26] identified several genes involved in host plant adaptation processes, including digestion and detoxification. Larvae of the Asian long-horned beetle Anoplophora glabripennis modulate a subset of genes associated with digestion when fed on a nutrient-poor, compared to a nutritious diet [28]. In addition, Scully et al. [29] showed that feeding on two appropriate host plants (Acer spp. and Populus nigra) modified the expression levels of multicopy genes involved in digestion and detoxification in A. glabripennis. Recently, Hou & Wei [30] examined the transcriptional changes of the cicada Subpsaltria yangi, on a varied diet of different host plants. The authors suggested that gene expression changes, relating to digestion, detoxification, oxidoreductase metabolism, and stress response, may be a vital adaptation to diet and habitat.
With the rapid development of sequencing technology, research into the insect transcriptome is increasing [31,32]. However, de novo transcriptome assembly represents a challenge for non-model insect species, because it generally relies on the use of short cDNA sequences (such as Illumina technology). Recently, single-molecule real-time sequencing (SMRT-seq) technology has been applied to generate long sequence reads, allowing the production of full-length transcripts without assembly algorithms [33]. SMRT-seq has been reported to provide inaccurate information on genes, which could be calibrated based on Illumina reads from matched samples [34]. Therefore, the combination of SMRT-seq and Illumina RNA-seq can be used to obtain comprehensive genetic information, including for the detection of gene isoforms and functional variants [35,36].
In the present study, the free amino and fatty acid composition and content of the two host plants of M. saltuarius, categorized as either the "preferred" P. koraiensis or "non-preferred" P. tabuliformis, was investigated. The genome-wide transcriptional profiles of M. saltuarius larvae feeding on P. koraiensis and P. tabuliformis was compared by combining SMRT-seq and Illumina RNA-seq analysis. Our aim was to identify differentially expressed genes (DEGs) in M. saltuarius relating to host plant adaptation based on diet. The results provide new information for further research on the mechanisms underlying transcriptional plasticity and adaptation of insect herbivores to different host plants. Furthermore, understanding the molecular differences of M. saltuarius when feeding on different hosts may provide significant enlightenment for the arrangement of host resistance in the control of PWN transmission.
For Illumina sequencing, 36.91 Gb high quality sequences were obtained from the six mRNA samples of M. saltuarius. The guanine-cytosine (GC) content of data sequenced from the six libraries was~42%, and the percentage of reads with an average quality score > 30 was above 93% (Table 2). This result indicated that the accuracy and quality of the sequenced data were sufficient for further analysis. The Illumina sequencing reads were not assembled alone because more than 85% of them mapped to the 32,304 non-redundant transcripts ( Table 2).
In total, 13,144 transcripts were assigned Gene Ontology (GO) terms, which were classified into the three major GO categories (Additional file 5: Figure  S4). For the biological process classification, genes involved in 'cellular process', 'single-organism process', and 'metabolic process' were highly represented. For the cellular component, the major categories were 'cell', 'cell part', and 'organelle'. For the molecular function classification, 'binding' was the most enriched GO term, followed by 'catalytic activity'. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis shows that the matched 27,077 transcripts are assigned into 336 pathways. The most wellrepresented metabolic pathways are involved in 'global and overview maps', 'carbohydrate metabolism', 'lipid metabolism', and 'amino acid metabolism' (Additional file 6: Figure S5).

DEG analysis
We evaluated the differences in gene expression between the population feeding on P. koraiensis and P. tabuliformis. It resulted in 2166 DEGs identified in the larvae of M. saltuarius feeding on P. tabuliformis (Pt) compared with P. koraiensis (Pk), including 970 upregulated genes and 1196 downregulated genes (Additional file 9: Table  S3; Additional file 10: Figure S7).
In this study, transcriptional changes related to host plant adaptation in M. saltuarius was the main focus. We identified 21 DEGs associated with digestion in the comparative set 'Pt vs Pk', encoding three carbohydrases and 18 proteases. Most of these were upregulated in P. tabuliformis (Fig. 3a). In addition, we identified 12 DEGs related to protease inhibitor, including eight serine proteases and four trypsin inhibitors (Fig. 3a).
Solute carriers (SLC) are a group of membrane transport proteins, which mediate the transport of various At least one database 29,798 All database 9262 Fig. 2 Venn diagram of the number of lncRNAs predicted using coding-non-coding index (CNCI), coding potential calculator (CPC), coding potential assessment tool (CPAT) and protein family (Pfam) database substrates across cells, including ions, nucleotides, sugars, and amino acids. We identified 27 DEGs encoding solute carriers in the comparative set 'Pt vs Pk' (Fig. 3b), which may mediate the influx or efflux of substance and involve in the osmoregulation in the host adaptation of M. saltuarius. The serine/threonine protein kinase (STK) target of rapamycin, a central element of an evolutionarily conserved eukaryotic signaling pathway, is known to act as a central regulator of cell metabolism and to respond to growth factors and nutritional status. In the present study, we identified 25 DEGs encoding STKs and seven DEGs encoding serine/threonine phosphatases (STPs) (Fig. 3c). In addition, AMP-activated protein kinase (AMPK) serves as an important regulator of cellular metabolism and energy balance. One gene encoding AMPK was found upregulated in the comparative set 'Pt vs Pk' (Fig. 3c).
Fatty acids are a significant energy store for insects. Four DEGs encoding fatty acid synthase (FAS) were identified in the comparative set 'Pt vs Pk' (Fig. 3d). In addition, six genes encoding elongation of very long chain fatty acids protein (ELOVL) were differentially expressed in the population feeding on P. tabuliformis when compared with P. koraiensis (Fig. 3d). Besides FAS and ELOVL, fatty acyl-CoA reductase (FAR), which can convert fatty acids to alcohols, performs a crucial role in lipid synthesis and metabolism. Ten DEGs encoding FARs were identified in the comparative set 'Pt vs Pk', including eight upregulated in the population feeding on P. tabuliformis (Fig. 3d).
Insect herbivores should be able to deal with defense compounds and adverse environment when obtaining nutrients from their host plants. In the present study, detoxification-related DEGs were identified, including 11 cytochrome P450 monooxygenases (P450s), three UDPglycosyltransferases (UGTs), seven carboxylesterases (CEs), and 14 ATP-binding cassette (ABC) transporters (Fig. 4a). Among which, ten P450s, two UGTs, six CEs, and eight ABC transporters were upregulated in the population feeding on P. tabuliformis compared with P. Fig. 3 Heatmap of normalized FPKM of DEGs related to a digestion, b putative osmoregulation, c sensing availability of nutrients and energy, d fatty acid and lipid metabolism. The Z-score represents the deviation from the mean by standard deviation units. The firebrick color indicates upregulated expression, whereas the navy color indicates downregulated expression. FPKM: fragments per kilobase of transcript per million fragments mapped; Pk: the larvae feeding on Pinus koraiensis; Pt: the larvae feeding on P. tabuliformis koraiensis (Fig. 4a). We identified three aldehyde dehydrogenases (ALDHs), four aldose reductases, two senecionine N-oxygenases (SNOs), and two glucose dehydrogenases, most of which were upregulated in the population feeding on P. tabuliformis (Fig. 4a). We also found that DEGs encoding peroxidase, i.e., five catalases (CAT), one glutathione peroxidase (GPx)-like, and one peroxiredoxin (Prx)-6-like, were mainly upregulated in the population feeding on P. tabuliformis compared with P. koraiensis (Fig. 4a). These genes might involve in defense response against oxidative stress, e.g., reactive oxygen species (ROS) intake in the feeding behavior. In addition, we found that one peptide methionine sulfoxide reductase (MSRA) gene was upregulated in the population feeding on P. tabuliformis (Fig. 4a), which may help repair proteins inactivated by oxidation.
Heat shock family 20 and 70 proteins serve as chaperones for damaged proteins in wood-consuming insects. In the present study, ten genes encoding heat shock proteins (Hsp), including seven Hsp70 and three Hsp68, were upregulated in the population feeding on P. tabuliformis compared with P. koraiensis (Fig. 4b). Additionally, other DEGs involved in the stress response were also identified, including 11 genes encoding E3 ubiquitin ligase, one gene encoding ubiquitin conjugating enzyme E2G1, and one gene encoding ubiquitin conjugation factor E4B (Fig. 4b).
Plant-derived compounds may interfere with the production of chitin and cuticular protein, which compels insect herbivores to adjust the production of these structural constituents. In the present study, three genes encoding chitinase, and one gene encoding cuticular Fig. 4 Heatmap of normalized FPKM of DEGs related to a detoxification and oxidation-reduction, b stress response, c structural and general odorant binding proteins, d antibacterial and immune response. The Z-score represents the deviation from the mean by standard deviation units. The firebrick color indicates upregulated expression, whereas the navy color indicates downregulated expression. FPKM: fragments per kilobase of transcript per million fragments mapped; Pk: the larvae feeding on Pinus koraiensis; Pt: the larvae feeding on P. tabuliformis protein were differentially expressed in the population feeding on P. tabuliformis compared with P. koraiensis (Fig. 4c). Additionally, eight genes encoding odorantbinding protein/general odorant-binding protein (OBP/ GOBP) were differentially expressed in the comparative set 'Pt vs Pk' (Fig. 4c).
Insects always interact with a wide array of pathogens, e.g., pathogenic bacteria. Insects can prevent infection by synthesizing antibacterial proteins such as cecropin, insect defensin, large glycine-rich protein, small prolinerich protein, and lysozyme [37,38]. In addition, leucinerich repeat (LRR)-containing proteins play important roles in pathogen-associated molecular pattern recognition to fight infection by pathogens. In the present study, we found two genes encoding antimicrobial peptide 1like isoform X1, one gene encoding defensin-1-like, three genes encoding glycine-rich proteins, three genes encoding proline-rich proteins, and two genes encoding gamma-interferon-inducible lysosomal thiol reductase (GILT), were differentially expressed in the comparative set 'Pt vs Pk' (Fig. 4d). Additionally, we identified 11 DEGs encoding LRR containing proteins in the population feeding on P. tabuliformis compared with P. koraiensis (Fig. 4d).

Enrichment pathway analysis of DEGs
To analyze functions of the DEGs, all were mapped to terms in the KEGG database. KEGG pathways with a Pvalue < 0.05 are provided in Additional file 11: Table S4, including insect hormone biosynthesis (ko00981), tryptophan metabolism (ko00380), cysteine and methionine metabolism (ko00270), and glycine, serine, and threonine metabolism (ko00260) (Fig. 5). As mentioned above, the content of glycine and threonine in P. tabuliformis was higher than that in P. koraiensis (Fig. 1a). This suggests that the "glycine, serine and threonine metabolism" pathway may play a key role in the host plant adaptation of M. saltuarius.
Following this, GO term enrichment analysis was performed. The strongest changes in the top 20 GO categories are shown in Additional file 12: Figure S8, including "oxidoreductase activity (GO:0016491)" of molecular function, and "alpha-amino acid metabolic process (GO:1901605)" of biological process.

Validation of RNA-seq data by qRT-PCR
To validate the RNA-seq results, the relative expression levels of 12 selected genes were analyzed by real-time quantitative PCR (qRT-PCR). The genes and primers used for qRT-PCR are shown in Additional file 13: Table S5. Among the 12 genes, the majority showed a consistent expression pattern between RNA-seq and qRT-PCR (Fig. 6a). The correlation analysis results for these detected DEGs are as follows: y = 0.3864x + 0.2839, and R 2 = 0.759 (Fig. 6b), indicating that RNA-seq data were reliable.

Discussion
Herbivorous insects are hypothesized to express distinctive digestive enzymes to feed on host plants with different nutritional values [39,40]. Indeed, some metabolismrelated genes are differentially expressed to adapt to the different biochemical compositions of host plant diets [2,  [42] revealed dietspecific protease expression patterns in the cotton bollworm Helicoverpa armigera responding to nutritionally distinct host plants. They suggested that serine proteases play a crucial role in this polyphagous insect to adapt to a diet of many different host plants [42]. The butterfly larvae of Polygonia c-album were demonstrated to adapt similarly to host plant diet [43]. In order to examine genes related to host plant utilization, Mason et al. [28] compared gene expression changes of the larvae of Anoplophora glabripennis feeding on a preferred host (the sugar maple) to those consuming a nutrient-rich artificial diet. Recently, Scully et al. [29] examined how feeding on two susceptible (Acer spp. and Populus nigra) and a resistant host (Populus tomentosa) affected the gene expression of A. glabripennis.
In our present study, the two host plants differed in their free amino and fatty acid composition and content; thus, they were expected to have different nutritional values for M. saltuarius. In particular, the glycine and threonine content in the host plant P. koraiensis was higher than that in P. tabuliformis. KEGG pathway analysis of DEGs identified several pathways associated with amino acid metabolism, including glycine, serine and threonine metabolism. This result suggests that the different nutritional quality between the two hosts may require M. saltuarius to express different digestive enzymes.
The serine/threonine kinase TOR (target of rapamycin) serves as a central regulator of cell growth and cellular energy [44,45]. Both protein kinases and their cognate phosphatases participate in sensing external stimuli [46][47][48]. In addition, AMPK is also known as the "energy sensor" of a cell [49]. Interestingly, AMPK and TOR are functionally interconnected, opposing signaling pathways associated with sensing the status of nutrients and energy [50]. In the present study, the genes encoding AMPK, STKs, and STPs, may be relevant to host adaptation of M. saltuarius by playing an important role in sensing the availability of nutrients and energy for the regulation of cell growth.
Fatty acids are important energy stores in insects [51] with FAS being a key enzyme for its synthesis. Animals can gain fatty acids from their diet, then generate saturated and monounsaturated fatty acids through FAScatalyzed synthesis [52][53][54]. Fatty acid synthesis includes cycles of four reactions, in which each cycle extends an initial acetyl-CoA by two carbons. The cycles can be repeated up to seven times to generate palmitic acid (C16: 0) [55]; further growth requires elongases of long or very long chain fatty acids [56]. In addition, the FAR gene family, which can convert fatty acids to fatty alcohols, also performs a crucial role in lipid synthesis and metabolism. In insects, fatty alcohols can act as precursors in the production of pheromones and cuticular hydrocarbons [57,58]. For instance, Li et al. [59] reported that FARs are requisite for cuticle shedding, and are involved in cuticular hydrocarbon production in the destructive rice pest Nilaparvata lugens. In the present study, a substantially higher content of most fatty acids was found in the host plant P. koraiensis compared with P. tabuliformis. Moreover, four, six, and ten genes encoding FAS, ELOVL, and FAR, respectively, were differentially expressed, suggesting that they may be involved in fatty acid metabolism when M. saltuarius is confronted with different fatty acid content between the two host plants.
Besides extracting nutrients from host plants, insects need to cope with toxic chemical deterrents produced by their hosts. For instance, P. tabuliformis can produce defensive monoterpene, in which α-pinene is the most abundant [60]. Besides α-pinene, volatile organic compounds (VOCs) from P. tabuliformis also includes limonene, β-pinene, and α-caryophyllene [61]. Xu et al. [62] found that the VOCs from P. koraiensis included αpinene, β-pinene, and sabinene. Insects can successfully survive on their host plants by producing detoxifying enzymes, e.g., P450s, UGTs, and CEs [25,30,41]. In the present study, we found that 11 P450, seven CE, and three UGT, and most of them were upregulated in the larvae feeding on P. tabuliformis compared with P. koraiensis. Also involved in detoxification are ABC transporters, which act as membrane-bound proteins for the transport of various substrates across the lipid membrane, including drugs and insecticides [63,64]. In our study, 14 genes encoding ABC transporters were differentially expressed, including the up-regulation of eight, in larvae feeding on P. tabuliformis compared with P. koraiensis, indicating that these genes may involve in detoxification metabolism during the host plant adaptation of M. saltuarius.
In our study, we found that three, two, and six genes encoding aldehyde dehydrogenases, senecionine N-oxygenases, and aldo-keto reductases (AKRs), respectively, were differentially expressed, suggesting their involvement in plant chemical detoxification by M. saltuarius. The aldehyde dehydrogenases are oxidizing enzymes that are involved in detoxification of both exogenous and endogenous aldehydes [65]. In addition, senecionine N-oxygenase, a flavindependent monooxygenase, may help insects cope with pyrrolizidine alkaloids, as observed in the larvae of the European cinnabar moth Tyria jacobaeae [66]. The AKRs are a superfamily of dehydrogenases/reductases that catalyze the synthesis and detoxification of carbonyls [67]. Previously, AKRs have been hypothesized to play an important role in the degradation of woody tissue [28]. In this study, we therefore speculate that genes encoding AKRs may not only be involved in detoxification but may also assist M. saltuarius to digest woody tissue.
Insect herbivores are frequently challenged with ROS which are a by-product from the metabolism of molecular oxygen [68]. Some herbivorous insects have evolved a protective response to ROS by producing detoxifying enzymes, including glutathione peroxidase (GPx), catalase (CAT), superoxide dismutase (SOD), and ascorbate peroxidase [69]. For instance, in a serious pest on wheat crops (the Hessian fly Mayetiola destructor), when feeding on resistant wheat seedlings, the larvae upregulates expression of phospholipid glutathione peroxidases, catalases, and superoxide dismutases, to counteract ROS [68]. In our study, we found that five genes encoding CATs, one GPx-like, and one Prx6-like gene, were upregulated in the larvae feeding on P. tabuliformis compared with P. koraiensis, suggesting that they may involve in the defense against ROS intake during feeding.
As well as battling toxic chemicals from host plants, insects also must deal with a wide array of pathogens. Insects combat infection by mounting a powerful immune response [70]. Insect immune system contains two major aspects: the cellular and humoral responses [71]. The humoral response includes synthesis of antimicrobial proteins [72], which are grouped into five main types: cecropins, insect defensins, large glycine-rich proteins, small proline-rich proteins, and lysozymes [38]. The LRR is a highly conserved motif usually consisting of 20-30 residues rich in leucine, and a LRR domain, which is an important binding component for immunerelated proteins [73]. In the insect Manduca sexta, a soluble, extracellular leucine-rich repeat protein (leureptin) could bind to bacterial lipopolysaccharide (LPS) and involve in hemocyte responses to bacterial infection [74]. GILT is involved in the bacterial immune response in various organisms [75]. Recently, empirical studies demonstrated that pinewood nematode infection increased the microbial diversity in pines [76,77]. In the present study, genes encoding antimicrobial peptide 1-like isoform X1, defensin-1-like, glycine-rich proteins, prolinerich proteins, LRR domain containing protein, and GILT, were differentially expressed by M. saltuarius larvae, coupled with the difference in host preference of adults acting as vector of PWN, suggesting that larvae of M. saltuarius might confront with different bacterial challenges in their host environments.
Gene expression patterns identified in this study, might not just be altered by differential host plant diets, but also genetic variations in the sample populations [78], and local environments (i.e., temperature) [79]. In the present study, the two sample sites were approximately 40 km apart; thus, they may share similarities in their natural environment. Molecular identification of cytochrome c oxidase subunit I (COI) sequence distances of sampled individuals from the two sites were between 0.0 and 0.00897, indicating little genetic divergence between M. saltuarius populations. Previously, similar results have been reported from comparable sampling strategies [26,30,80]. Therefore, the different gene expression patterns detected in the present study may be mainly caused by host plant diet. Further research is needed to verify that the DEGs detected in our analysis are due to host adaptation mechanisms in M. saltuarius.

Conclusions
In this study, firstly, we investigated the free amino and fatty acid composition and content of the host plants of M. saltuarius larvae, i.e., P. koraiensis and P. tabuliformis. Compared with P. koraiensis, P. tabuliformis had a substantially higher content of various free amino acids, while the opposite trend was detected for fatty acid content. Then, we compared the transcriptional profiles of larval populations feeding on P. koraiensis and P. tabuliformis using PacBio Sequel II sequencing combined with Illumina sequencing. The results showed that genes relating to digestion, fatty acid synthesis, detoxification, oxidation-reduction, and stress response, as well as nutrients and energy sensing ability, were differentially expressed, possibly reflecting adaptive changes of M. saltuarius in response to different host diets. Additionally, genes coding for cuticle structure were differentially expressed, indicating that cuticle may be a target for plant defense. Differential regulation of genes associated with the antibacterial and immune response were also observed, suggesting that larvae of M. saltuarius may have evolved adaptations to cope with bacterial challenges in their host environments. The results from this study help to elucidate the underlying relationship between transcriptional plasticity and adaption mechanisms of herbivorous insects to host plants.

Larvae and host plant sample collection
Both male and female adults of M. saltuarius can spread exceeding 5 km throughout their whole life cycle [81]. To obtain representative M. saltuarius samples exclusively feeding on P. koraiensis, fourth-instar larvae of M. saltuarius were collected from Cangshi Forest Farm, located in Qingyuan Manchu Autonomous County, Liaoning Province (41°59′24.72′′N, 124°31′41.17′′E), in September 2019. To obtain representative M. saltuarius samples exclusively feeding on P. tabuliformis, fourthinstar larvae of M. saltuarius were collected from Cangshi Village, Fushun County, Liaoning Province (41°53′ 42.67′′N, 124°20′25.06′′E), on the same day in September, 2019. The two sample sites were approximately 40 km apart. Some of the larval samples were flash-frozen upon collection using liquid nitrogen, then transferred and stored at − 80°C for subsequent RNA extraction. The remaining samples were transferred alive to the laboratory and reared on their corresponding hosts until development into instars. In addition, three sample logs of each host plant were collected for amino and fatty acid composition and content analysis.

Host plant amino and fatty acid composition and content analysis
Three host plant samples from each Pinus spp. were used to investigate the composition and content of free amino acids according to the method described by Zeng et al. [82]. A 0.5 g sample powder was mixed with 50 mL hydrochloric acid (0.005 mol/L) for ultrasonic extraction for 35 min at 40°C using ultrasonic cleaner. The homogenate was centrifuged at 15000 rpm for 15 min, then filtered with a 0.22 μm membrane. Amino acid content analysis was carried out by an A300 amino acid analyzer (membraPure Bodenheim, Germany) with a column packed with ion-exchange resin. Amino acid concentration was calculated by calibrating with external standards (Sinopharm Chemical Reagent, Beijing, China).
Similarly, three host plant samples from each Pinus spp. were used to investigate the composition and content of fatty acids. The crude fat were extracted from the samples with petroleum ether (boiling point 40-60°C) using a Soxhlet extractor. Fatty acid methyl ester (FAME) was prepared according to the PORIM Test Method [83]. The GC-MS systems (Trace1310/ISQ, Thermo Fisher Scientific, USA) equipped with a flame ionization detector and a TG-5 MS capillary column (30 m × 0.25 mm × 0.25 μm) were used for fatty acid composition analysis. Chromatographic parameters were set as follows: the initial temperature was 80°C for 1 min then raised to 200°C at a rate of 10°C/min, which was then increased to 250°C at a rate of 5°C/min, before being set to 270°C at a rate of 2°C/min and held for 3 min. The flow rate of carrier helium gas was 1.2 mL/ min. FAMEs were identified by comparing their retention times with those of authentic standards (Sigma-Aldrich Chemie GmbH, Deisenhofen, Germany). Quantification of fatty acids was carried out based on the molecular weight of their corresponding FAMEs.

Molecular identification of larvae
DNA barcode of mitochondrial COI was employed to ensure that the collected larvae belonged to the same species of M. saltuarius. A total of six COI sequences (677 bp) was amplified using the primer set LCO149/ HCO2198 [84] from six representatives were obtained, i.e., three larvae feeding on each Pinus spp. The COI sequences of all individuals had little divergence, with the distances from 0.0 to 0.00897 falling within the range of genetic distance among M. saltuarius populations [85]. This, coupled with morphological characteristics, indicates that the larvae collected from different host plants all belong to M. saltuarius.

Total RNA extraction and library construction
Total RNA was extracted with TRIzol reagent (Life Technologies, USA) according to the manufacturer's instructions. RNA quality was assessed by a 1% agarose gel and the concentration was determined using a Nano-Drop 2000 spectrophotometer (Thermo Fisher Scientific, USA). RNA integrity was examined using the Agilent Technologies 2100 Bioanalyzer system (Santa Clara, CA) with a RNA integrity number cutoff greater than 7.
To obtain the complete information of all transcripts, SMRT-seq (PacBio) was applied in this study. The best RNA sample (2 populations × 3 replicated samples) was selected and then pooled together in equal quantity for SMRT-seq. Full-length cDNA was synthesized using the SMRTer PCR cDNA Synthesis Kit (Biomarker, Beijing). PacBio Sequel II sequencing reactions from one SMRT cell (1-6 kb) were performed.
For Illumina RNA-seq, six RNA samples (2 populations × 3 replicated samples) were used. Then, the six libraries for sequencing were generated using NEBNext® Ultra™ RNA Library Prep Kit (NEB, Beverly, MA, USA) according to the manufacturer's recommendations. Subsequently, the prepared libraries were sequenced on an Illumina NovaSeq 6000 platform (2 × 150 bp).
PacBio sequencing data processing SMRTlink 6.0 software was used to process the PacBio SMRT-seq raw reads. The CCS was obtained from the subreads.bam file. Sequencing adapters were trimmed, then clean CCS were classified into either full-or nonfull-length isoforms based on cDNA primers and poly-A tail signal. To improve consensus accuracy, Iterative Clustering for Error Correction (ICE) and Arrow algorithm (https://downloads.pacbcloud.com/public/ software/installers/smrtlink_5.0.1.9585.zip) were used to obtain high quality isoforms and full-length sequences. Additional nucleotide errors in consensus reads were corrected using the Illumina RNA-seq data with the software LoRDEC (Helsinki, Finland) [86]. BUSCO [87] was used to explore completeness according to conserved ortholog content.

Mapping and differential gene expression analysis
The pair-end Illumina reads were aligned to the reference transcriptome using Bowtie 2 (version 2.2.9) [94]. FPKM (fragments per kilobase of transcript per million fragments mapped) was used to estimate transcript expression levels in all samples [95]. Differential expression analysis between the larvae feeding on P. koraiensis and P. tabuliformis was implemented using the DEGseq R package (version 1.10.1) [96]. The threshold of DEGs was set at q-value < 0.05 and absolute fold change > 2. Heatmaps in this study were generated with a free online platform, OmicShare tools (http://www.omicshare.com/ tools/) based on normalized FPKM data.
GO enrichment analysis of the DEGs was implemented using the GOseq R package (version 1.10.0) based on the Wallenius non-central hyper-geometric distribution to adjust gene length bias in DEGs [97]. The statistical enrichment of DEGs in KEGG pathways was tested using KOBAS (KEGG Orthology-Based Annotation System) (version v2.0.12) [98].

Validation of DEGs with qRT-PCR
The real-time quantitative PCR (qRT-PCR) assay was conducted to validate the results of our transcriptome sequencing analysis. Reverse transcription was performed using the PrimeScript first strand cDNA synthesis kit (Takara, China) according to the manufacturer's protocol. The gene-specific primers were designed using Primer5 (PREMIER Biosoft International, USA). The ribosomal protein S3 gene was used for internal control. Quantitative reactions were performed on the Real-Time PCR Detection System (ABI 7500, Applied Biosystems, USA) with the SYBR Premix Ex Taq™ Kit (Takara, China). The qPCR setting was as follows: 95°C for 5 min, followed by 40 cycles of 95°C for 10 s and 60°C for 30 s, melt curves stages at 95°C for 15 s, 60°C for 1 min, and 95°C for 15 s. To check reproducibility, the qPCR reaction for each sample was performed in triplicate. Log 2 (fold change) qPCR was calculated by −ΔΔCt = − [(Ct Pt -Ct nom ) -(Ct Pk -Ct nom )] [99].