A chromosome-level genome assembly of the soybean pod borer: insights into larval transcriptional response to transgenic soybean expressing the pesticidal Cry1Ac protein

Background Genetically modified (GM) crop plants with transgenic expression of Bacillus thuringiensis (Bt) pesticidal proteins are used to manage feeding damage by pest insects. The durability of this technology is threatened by the selection for resistance in pest populations. The molecular mechanism(s) involved in insect physiological response or evolution of resistance to Bt is not fully understood. Results To investigate the response of a susceptible target insect to Bt, the soybean pod borer, Leguminivora glycinivorella (Lepidoptera: Tortricidae), was exposed to soybean, Glycine max, expressing Cry1Ac pesticidal protein or the non-transgenic parental cultivar. Assessment of larval changes in gene expression was facilitated by a third-generation sequenced and scaffolded chromosome-level assembly of the L. glycinivorella genome (657.4 Mb; 27 autosomes + Z chromosome), and subsequent structural annotation of 18,197 RefSeq gene models encoding 23,735 putative mRNA transcripts. Exposure of L. glycinivorella larvae to transgenic Cry1Ac G. max resulted in prediction of significant differential gene expression for 204 gene models (64 up- and 140 down-regulated) and differential splicing among isoforms for 10 genes compared to unexposed cohorts. Differentially expressed genes (DEGs) included putative peritrophic membrane constituents, orthologs of Bt receptor-encoding genes previously linked or associated with Bt resistance, and those involved in stress responses. Putative functional Gene Ontology (GO) annotations assigned to DEGs were significantly enriched for 36 categories at GO level 2, respectively. Most significantly enriched cellular component (CC), biological process (BP), and molecular function (MF) categories corresponded to vacuolar and microbody, transport and metabolic processes, and binding and reductase activities. The DEGs in enriched GO categories were biased for those that were down-regulated (≥ 0.783), with only MF categories GTPase and iron binding activities were bias for up-regulation genes. Conclusions This study provides insights into pathways and processes involved larval response to Bt intoxication, which may inform future unbiased investigations into mechanisms of resistance that show no evidence of alteration in midgut receptors. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10216-2.


Background
The development of resistance to insecticidal agents by insect pests is a threat to sustainable agricultural production practices [1,2] and human health [3].The number of insect species with resistance to one or more insecticides has increased rapidly since the post-World War II development and subsequent widespread use of broadcast synthetic chemical agents [4,5].These resistant phenotypes result from mutations that disrupt the insecticidal mode of action (MoA), including those that alter the kinetics of target receptor binding, increase rates of detoxification and excretion, or cause behavioral avoidance [6].Genetically modified (GM) cotton and maize that express one or more pore-forming 3-domain crystalline (Cry) pesticidal proteins or vegetative insecticidal protein 3A (Vip3A) derived from Bacillus thuringiensis (Bt) [7] are currently planted on most cultivated hectares in North and South America [8,9].Field populations of target insect pests have henceforth evolved practical resistance to crops that express Bt proteins [10].Bt resistance in field populations has occurred despite implementation of insect resistance management (IRM) plans which rely on high-dose refuge (HDR) strategies [11][12][13].Resistance arguably occurs when one or more assumptions of the HDR strategy are violated [14], and include instances when resistance alleles are non-recessive [15,16], insects feed on plant tissues with a dose of Bt that does not cause mortality among heterozygous genotypes carrying one resistance allele resulting in functional resistance [17][18][19], or resistant phenotypes lacked fitness costs that were proposed to delay the onset of widespread resistance [20,21].These factors raise concerns for the long-term sustainability and efficacy of Bt technology [22].
The proposed MoA for Cry1A proteins includes the sequential binding model where pesticidal proteins bind membrane-anchored midgut receptors that potentiate the formation of pore channels that in-turn cause cell swelling and death via osmotic imbalance [23,24].Additionally, a signal transduction model proposes intracellular signaling and induction of the protein kinase A (PKA) pathway causes cell death (apoptosis) [25,26].Among species of Lepidoptera, resistance in selected laboratory colonies and field populations is associated with structural or functional changes in midgut receptors [27], including ATP binding cassette (ABC) transporters [28,29], cadherin, tetraspanin, alkaline phosphatase and aminopeptidase N [30].Genes in the mitogen-activated protein kinase (MAPK) pathway also have been linked or associated with resistance [31][32][33][34], wherein this pathway may impact the expression of Bt receptors themselves [35].Additional evidence suggests that up-regulation of genes encoding constituents of insect peritrophic membranes may decrease susceptibility when exposed to Bt proteins [36,37] or Bt bacterial infections [38].Additionally, increased proteolytic degradation of Bt in the gut or repair mechanisms have been proposed [30].
The efficacy of the HDR strategy and other IRM tactics may arguably be enhanced through the greater understanding of the molecular mechanisms involved in resistance [1].Furthermore, cellular responses to Bt intoxication even in susceptible insects may provide insights into Bt MoA and points at which disruption might lead to resistance [39].Among such comparisons between susceptible or resistant insects exposed or unexposed to Bt proteins, hundreds or thousands of transcripts are often differentially expressed [39][40][41][42][43][44][45][46][47].In some instances, differentially expressed transcripts are enriched for those putatively functioning in general stress response pathways (e.g.cytochrome P450 monooxygenases, esterases, oxidases, and peroxidases) and having transporter function [45], involved in cell survival mechanisms [39] or cell repair and immune function [47].Additionally, orthologs and paralogs of genes putatively functioning as Bt receptors in the insect midgut (cadherin, aminopeptidases N and ABC transporters) are differentially expressed between Bt intoxicated susceptible insects compared to unintoxicated cohorts [39,43,45].Role of these genes in response to intoxication remains uncertain, but may be informed following further studies [39].
The soybean pod borer, Leguminivora glycinivorella (Matsumura) (Lepidoptera: Tortricidae), is a destructive agricultural pest insect of cultivated soybean, Glycine max (Fabales: Fabaceae) (L.) [48].This insect has a univoltine life cycle with mature 4th instars that enter the soil and form cocoons in hollowed chambers during mid-to late-September wherein they diapause.After approximately 8-9 months larvae pupate and emerge as adults in early summer [49].Female oviposition and larval L. glycinivorella feeding primarily occurs on a narrow range of leguminous plants including G. max, wild soybean, G soja, and the shrubby relative of pea, Sophora flavescens.Larval feeding primarily occurs on the pods of G. max, which causes substantial reductions in grain yield and quality [50,51].High infestations can lead to 10-30% yield loss and exceed 50% during severe outbreaks [52].The endemic geographic range of this species includes Japan and Korea, and across eastern China.The most severe outbreaks tend to be in the northeastern Heilongjiang, Jilin, and Liaoning Provinces, which coincides with areas where approximately two-thirds of soybeans are grown in China [53], making L. glycinivorella a primary pest of this crop.Although foliar insecticides are applied, effective contact and reduction in feeding damage is difficult to achieve due to protection of larva under G. max leaf canopies and within pods [54].Use of systemic insecticides can lead to more consistent reductions in L. glycinivorella feeding damage, but ecological concerns limit their use especially given evidence of impacts on pollinator health [55].
Cultivars of G. max selected for host plant traits can reduce levels of L. glycinivorella feeding damage [56], but do not always prevent significant economic loss to growers.Transgenic expression of insecticidal molecules by G. max show promise as an L. glycinivorella control tactic.For instance, damage is reduced on plants that express double stranded RNA (dsRNA) that elicits an RNA interference (RNAi)-based knockdown of a serine protease in L. glycinivorella larvae [57,58].Pesticidal Bt proteins expressed by transgenic G. max reduce damage by several lepidopteran pests [59][60][61].Commercial use of Bt cotton expressing the Bt Cry1Ac protein was approved by China in 1997 [62].Since 2016 Chinese government directives aim to commercialize GE cotton, maize and soybean for domestic use [63] and has culminated in recent changes that allow for their widespread use [64].Bt IRM strategies for cotton pests have been established in China [65].Investigations into non-target effects and ecological assessments of Bt crops [66] and target insect baseline Bt susceptibilities have also been conducted precommercialization for many crops in China [67].However, investigation into potential for Bt resistances among many target pests is lacking.
In this study, we used L. glycinivorella as a model to determine the effects of Bt intoxication on larval gene expression following exposures to a transgenic G. max cultivar that expresses the Bt Cry1Ac pesticidal protein [61], informing future research into Bt MoA and potential mechanisms of resistance.This was accomplished by predicting differentially expressed genes (DEGs) between susceptible L. glycinivorella larvae feeding for 2 days on Bt Cry1Ac compared to conventional non-Bt cultivars.An annotated chromosome-level genome assembly for L. glycinivorella was generated and provided resources to investigate transcriptional effects of sublethal Bt exposures.Changes in gene expression among target insects following exposure to Bt transgenic crops, such as those documented in this study, provide insights into cellular responses.Such insight may also inform future research regarding potential adaptive mechanisms that lead to field-evolved resistance.

Genomic library construction, sequencing, and read filtering
Output from the Oxford Nanopore PromethION P48 provided 6.4 million reads spanning 66.6 Gb of raw read data for library LglyNP generated from a single male larva collected in Jilin Province, China (Fig. 1) (Table S1; S1) of which 97.29% had a q ≥ 20.All raw genomic read data are available in the NCBI Sequence Read Archive (SRA; Table S1) under BioProject PRJNA759210.

K-mer based genome size estimates
Both GCE and GenomeScope2 output showed a frequency distribution with a large number with low coverages at kmer = 19 representing sequencing error within the short Illumina PE reads, along with a homozygous (1 N) peak at a kmer depth of 60 (Fig. 2).The greater frequency of different kmers in the peak at a depth of 36 represented the heterozygous portion for this diploid organism.The two additional peaks at 92 and 126 corresponding to duplicated heterozygous and homozygous fractions, respectively.Genome size estimates differed between GenomeScope2 (587.2Mb; Fig. 2A) and GCE (652 Mb; Fig. 2B) while percentage unique sequence remained consistent at 47.8% (52.8% repetitive).An estimated heterozygosity of 2.05% was obtained from both GenomeScope2 and GCE.

Contig assembly
A final set of 225 contigs were obtained from the initial NextDenovo assembly of Illumina and Oxford Nanopore reads after duplicates were purged and error corrected using PE short read data.Contigs in this 657.4Mb L. glycinivorella haploid assembly, ilLegGlyc1.0,ranged from 0.13 to 37.63 Mb in size with an N50 of 4.2 Mb (Table 1) and contained no gaps (Ns).Alignment of filtered short PE 150 bp reads to these 225 contigs resulted in 99.74% coverage at a mean depth of 85.48-fold among the 99.3% of reads that aligned.BUSCO assessment of genome completeness as gauged by representation of the  2) indicated 92.9% were complete [91.0%single copy (S) and 1.9% duplicated (D)] and 0.8% fragmented BUSCOs (F), while 6.3% were missing (M).

Larval transcriptome response to feeding on transgenic G. max expressing Cry1Ac
Preprocessing filters removed transcript alignments with low read coverage and representation across replicates, and resulted in "smoothed" mean variance plots at gene (Figure S3A) and transcript levels (Figure S3B).Subsequent read counts normalized by log2(CPM) reduced inter-specific variance for the remaining 13,092 genes (Figure S4) and 18,561 transcripts (Figure S4B).A majority of the 13,092 gene models (n = 10,077; 77.0%) had single assigned transcripts.Principal Component Analysis (PCA) of mean log2(CPM) values, following correction for outlier effects, showed intraspecific clustering of replicates within Cry1Ac and non-Bt treatments at the gene level (Fig. 4A), where PC1 and PC2 accounted for 41.24 and 25.62% of the variance, respectively.From these estimates of read abundances among the 13,092 filtered gene set, there was a range of -12.40 ≥ Log2(fold-change) ≤ 14.47 between larval L. glycinivorella Cry1Ac and non-Bt treatment groups (Table S5a).A total of 204 genes were predicted to show significant DGE (Fig. 4B; Table S5b), and 10 genes to have significant DAS (Fig. 4C; Table S5c).Among the significant DGE predictions, 64 and 140 were up-and down-regulated (Fig. 5), respectively.The gene LOC125237008, a putative GTP cyclohydrolase, with significant DGE was also predicted to have significant DAS between encoded transcripts XM_048143924.1 and XM_048143925.1.Additionally, 12 significant DTU events were predicted (Table S5d), which by nature of their definitions, encompassed transcript isoforms assigned to the 10 genes with significant DAS (Table S5e; significant DTU compared to other transcripts derived from respective loci.There were 104 unique Pfam domains predicted in the 204 (PfamScan E-values ≤ 1.0 × 10 -11 ; remaining data not shown), from which 203 associated curated GO terms were retrieved across CC, BP and MF categories from level 1 to 4 (Table S6).These encompassed 2, 22, and 12 CC, BP and MF categories at GO level 2, respectively (Table 3).The most significantly enriched terms for CC and BP categories were vacuolar part and microbody, and secondary metabolic and organic hydroxy compound metabolic processes, respectively.Most significant enrichment was predicted for fatty acid synthase activity and oxidoreductase activities among MF terms.Simultaneous assessment of significance, directional bias of DEGs (up-or down-regulated), and number of DEGs in each enriched GO category (Fig. 6) showed that BP GO:0008610 (lipid biosynthetic process), GO:0015849 (organic transport), GO:0032787 (monocarboxylic acid metabolism), and GO:0046394 (carboxylic acid biosynthesis) are most significantly enriched and represent groups of genes biased for reduced expression.The MF terms GO:0003924 (GTPase activity) and GO:0005506 (iron binding) contained genes with a positive (increased) expression bias, whereas DEGs in GO:0004312 (fatty acid synthase activity) and GO:0016903 (oxidoreductase activity) were mostly down-regulated in Cry1Ac exposed larvae.Six genes in enriched GO categories are putatively in cytochrome P450 superfamilies, all of which were significantly down-regulated.Genes with putative function in chitin maintenance were significantly upregulated [chitinase (LOC125235932, LOC125235952, and LOC125235918 Table S7a Additionally, some genes previously identified as involved in Bt resistance among species of Lepidoptera showed significant DGE (Table 4; Table S5b; Table S7ac).These included two ABC transporters in subfamily C and G members and 4 tetraspanin orthologs (Table 4), all of which were assigned to significantly enriched GO categories.Significant DGE was also predicted for an up-regulated membrane-bound aminopeptidase N (apn) and down-regulated alkaline phosphatase (alp) orthologs.The L. glycinivorella gene, LOC125230228, encoding two RefSeq annotated alp transcript isoforms, XM_048135319.1 and XM_048135320.1,was significantly down-regulated ion in Cry1Ac exposed larvae (Log2FC = -2.10;FDR = 0.0136; Table S5a).An amino acid sequence alignment with Heliothis virescens mALPs (HvmALP) predicted L. glycinivorella mALP protein XP_047991276.1 (isoform X1) encoded by XM_048135319.1 and XP_047991277.1 (isoform X2) encoded by XM_048135340.1,had 48.12 and 46.99 percent identity with HvmALP1 and HvmALP2, respectively (remaining results not shown).
Phylogenetic reconstruction of the aminopeptidase N gene family among orthologs from B. mori, L.  S5a).

Chromosome-level genome resources
Investigation of Bt intoxication in this study was facilitated by L. glycinivorella genomic resources comprised of a structurally annotated chromosome-level genome assembly.The 27 autosomes and a Z-chromosome in the scaffolded assembly, ilLegGlyc1.1, is expected based on karyotype data suggesting the ancestral chromosome number (N) among Lepidoptera in the Family Tortricidae, N = 30.This chromosome number is reduced to 28 in Subfamily Olethreutinae [68] to which L. glycinivorella belongs (Taxomony ID 10351111).Due to the ZW/ZZ female/male sex determination system of Lepidoptera, the homogametic male assembly ilLegGlyc1.1 lacks a W-chromosome, as does assemblies from other tortricids (Table 2).Although not investigated here, the Z-chromosome of tortricid moths is of interest due to predicted ancestral autosomal fusion [69], and involvement in speciation [70] and ecotype variation among Lepidoptera [71].Nearly 90% of the 23,735 NCBI RefSeq gene annotations for ilLegGlyc1.1 are supported by transcript evidence (Table 1), and provides a resource for current and future research.

Transcriptional responses to transgenic Bt pesticidal plant exposure
The development of phenotypes in insect populations that are resistant to Bt insecticidal proteins reduce the efficacy of control strategies and pose a threat to the sustainability of current agricultural production practices.Mechanism(s) of lepidopteran resistance to Bt crystalline (Cry) insecticidal proteins are reported to involve structural or functional changes to the proteins that mediate binding and pore formation in the gut, alter proteolytic cleavage of native Bt toxins, or increased immune, cellular regeneration, and toxin sequestration capacities [30,[72][73][74][75]. Additionally, intracellular protein kinase signaling pathways are involved in Bt MoA [25] and are altered in resistance insects [34].The role of mutations in genes tetraspanin [76] and kinesin [77] in Bt Cry1Ac toxin resistant Helicoverpa sp. and MoA of Bt intoxication remain enigmatic.Even though Bt Cry1A proteins may interact with different midgut receptor proteins compared to Cry1F [78], and the Bt vegetative insecticidal protein, Vip3, interacts with a different set of midgut receptors compared to Cry1 proteins [79,80], a metaanalysis provided evidence for weak cross resistance between Cry1 and Vip3A [81].These prior studies suggest that Bt MOA and points wherein disruption(s) lead to resistance are not yet fully understood, which could be assisted by further investigation of molecular and cellular impacts of Bt intoxication [37,39].Specifically, this would involve expanding knowledge of the molecular interactions and pathways that invoke cell death or recovery and repair following Bt intoxication, which ultimately determine organismal survival or mortality, and be informed by molecular responses of susceptible insects when exposed to Bt proteins.In our study, relatively few genes (n = 204) were predicted to be significantly differentially expressed between L. glycinivorella larvae feeding on Cry1Ac (cultivar GP03-8-23) and a non-Bt cultivar compared to analogous results from prior studies involving species of Lepidoptera.Specifically, prior analyses predicted thousands of DEGs between Bt and non-Bt exposed lepidopteran larvae [41][42][43]47].Also, despite 104 of these 204 predicted DEGs being significantly down-regulated in Cry1Ac exposed L. glycinivorella (68.6%), there was a bias for down-regulation among genes assigned to significantly enriched GO categories (Fig. 6; 89.4% (17 of 19), 78.3 (54 of 69), 78.8% (78 of 99) in enriched CC, MF, and BP categories, respectively).Only exceptions were genes in MF categories GO:0003924 (GTPase activity) and GO:0016247 (iron ion binding) that were weakly biased for up-regulation.Although not investigated further, differences with prior studies in Lepidoptera may be influenced by different larval exposure times, level of exposure (dose), or other unaccounted for factor(s).Due to use of on-plant treatments (exposures) in our study, the absolute dose remains unknown compared to prior reports where exposures were to known concentrations of purified Bt protein in artificial diet bioassays [41][42][43]47].Regardless, clustering of PCA of Log 2 CPM estimates suggests relative responses that are homogeneous within and heterogeneous between L. glycinivorella larval treatment groups (Fig. 4A), indicating on-plant assays invoke a reproducible molecular response.It remains possible that transcriptional responses to undescribed differences in host plant resistance (HPR) factors between Bt and conventional cultivars may overlap with responses to Bt Cry1Ac exposures or be induced in both our treatments.This confounding factor could relegate a portion of genes involved in general response to Bt and unrelated HRP traits as non-significant and may be partially explanatory of the relatively small number of DEGs predicted in this study.

Changes in pathways associated with cell stress responses
Our results provide insights into the effects of feeding on transgenic Bt crops on gene expression by a susceptible pest insect and suggest a set of putative genes and pathways may be involved in physiological responses.Although functional GO annotations are extrapolated from model organisms, they are yet to be validated in L. glycinivorella.Enriched CC, BP and MF categories offer a baseline for interpreting potential systemic effects of feeding by susceptible insects on Bt compared to non-Bt plants.For instance, the MF category oxidoreductase activity was the second most significantly enriched GO category among L. glycinivorella DEGs (Table 3), agreeing with prior evidence that pathways which remediate oxidative stress are affected by Bt intoxication [82].Genes encoding cytochrome P450 monooxygenases were down-regulated in Cry1Ac exposed L. glycinivorella, suggesting further cellular efforts to reduce the accumulation of reactive oxygen species (ROS).A number of BP categories for metabolic processes are modulated after Cry1Ac exposure.Metabolic changes were also predicted in response to oxidative stress in H. armigera [83], and more generally following exposures to insecticidal toxins [39,84,85].Cellular repair and survival mechanisms are also accompanied by increased metabolic rates [86,87] and are mechanisms by which Cry1Ac resistant H. armigera respond to Cry1Ac exposure [47].These lines of evidence may support a hypothesis that changes in metabolic processes predicted in this study could be a consequence of or in support of stress response and repair mechanisms induced by Cry1Ac exposure in susceptible L. glycinivorella larvae.
The two enriched CC categories in our study, vacuole part and microbody, interestingly were also significantly enriched among genes down-regulated in larvae of the coleopteran D. v. virgifera following exposures to transgenic Bt Cry3Bb1-and Tpp34/Gpp35Ab1 maize [39].This commonality could point to involvement of recycling damaged cell components and adaptation to stress in Bt exposure responses.Specifically, microbodies were described as a suite of single membrane-enclosed organelles [88] that encompass lysosomes and peroxisomes [89].Peroxisomes are involved in lipid biosynthesis and house oxidative reactions that form peroxides [90], and are self-replicating and provisioned by importation of enzymes.Lysosomes are endosomal vesicles produced by the trans-Golgi network that function in autolysis of intracellular components and houses machinery involved in autophagy, a conserved eukaryotic mechanism that degrades and recycles damaged cellular components to maintain homeostasis [91].For terminology's sake, the vacuole (yeast and plants) is synonymous with the lysosome (animalia) [92].Aspects of cell stress are mediated by functions of lysosomes [93] and peroxisomes [94], which may include autophagy as a response to Bt infection or toxin exposure [95,96].
Other studies implicate an apoptotic response to Bt toxins [39,82,97,98].Similar to a prior study in D. v. virgifera [39], a putative lifeguard-1-like protein (LG1) with a B-cell-lymphoma protein 2 (Bcl-2)-associated X (BAX) inhibitor (Bax1-I) domain was significantly up-regulated in L. glycinivorella response to Bt exposure.Bax1-I domain proteins are involved in the negative regulation of apoptotic signaling pathways through the remediation of cell stress, where lifeguard does so by reduction of stress in the Golgi and endoplasmic reticulum [99].In our study, LG1 is encoded by L. glycinivorella LOC125231679 and is significantly up-regulated following Cry1Ac exposure.When considered in conjunction with modulation of pathways that remediate damage by ROS, LG1-mediated suppression of apoptosis may indicate survival mechanisms are engaged following Bt-induced cellular damage [39], and similarly occur among Cry1Ac-exposed L. glycinivorella larvae.Granted these general conclusions are drawn upon comparisons among a relatively limited set of experiments, but meta-analyses of these and future studies may shed light on conserved underlying pathways involved in Bt intoxication and potentially inform steps whereby resistance might develop.

Differential-regulation of putative Bt binding protein coding genes
Proteins in the midgut of Lepidoptera function as receptors in a sequential binding model for pore-forming Cry1A pesticidal proteins [23,24,100].Interestingly, genes encoding orthologs of some of these receptors show significant differential expression between susceptible L. glycinivorella fed on Cry1Ac transgenic G. max compared to those fed on a non-Bt cultivar (Table 4).It should be cautioned that changes in transcript levels are not necessarily reflected in corresponding protein levels, and latter have not been validated.Regardless, two transcripts encoding putative glycosylphosphatidylinositol (GPI)-anchored membrane-bound Bt receptor proteins, alkaline phosphatase (mALP) and aminopeptidase (APN), are differentially regulated.Specifically, we predicted that a L. glycinivorella malp is significantly down-regulated in Cry1Ac exposed larvae (Table S5a), and that two RefSeq protein isoforms encoded by L. glycinivorella malp are orthologous to H. virescens HvmALP1 and HvmALP2 that show a strong correlation between reduced transcript and protein levels in Cry1Ac resistant strains of H. virescens [101].Additional evidence shows that mALP is a functional receptor for Cry1A and Cry2Ab2 proteins [102], but not Vip3A [103], where N-acetylgalactosamine modification is required for Cry1Ac binding [78].Furthermore, constitutively reduced mALP or malp levels are documented in Cry1Ac resistant laboratory strains of Helicoverpa armigera and Spodoptera frugiperda [101], Cry1Ab resistant Ostrinia nubilalis [43] and Cry1C resistant Chilo suppressalis [104], but not in Cry1Ab resistant Diatraea saccharalis [96].Ours is the first known report of induced malp down-regulation following Cry protein exposure among Lepidoptera, but expression of several alp gene family members decreased in Aedes aegypti following intoxication with insecticidal Bt subsp.israelensis [105].The role or consequence of these changes following Bt intoxication in L. glycinivorella and other insects remains unknown.
Our results also predicted the significant up-regulation of gene LOC125242528 in Cry1Ac cultivar exposed larvae (Table 4), which encodes a membrane-bound alanyl aminopeptidase (APN).Phylogenetic reconstructions among members of this gene family in lepidopteran species predict 8 [106] or 16 clades [107].Our corresponding interspecific phylogenetic reconstruction predicts 16 major clades corresponding to those in B. mori, wherein L. glycinivorella gene expansion is most abundant in an APN05 clade with three paralogs as compared to one in each B. mori [108] and O. nubilalis [109] (Figure S5).Protein products from LOC125242528 cluster adjacent to members of an APN07 clade that also includes products from a L. glycinivorella APN-encoding gene, LOC125242529.Our phylogenetic analysis also indicates that the up-regulated L. glycinivorella APN7-like gene LOC125242528 is not orthologous with APN1 and APN3, where one or both apn1 and apn3 are down-regulated in Cry1A resistant Helicoverpa armigera [110], Trichoplusia ni [111], O. nubilalis [112], O. furnacalis [44], and Cry1C resistant S. exigua [113].Although some APN orthologs are functional Cry1A receptors [114], confer resistance in knock-out strains of Plutella xylostella [115], and interact with oligomerized Bt protein pre-pore structures to facilitate membrane insertion [23,24], APNs have a diversity of physiological roles or are not expressed in gut tissues [107].Specifically, apn1, 2, 4, 5, 6, 9, and 12 are expressed in midgut of larval B. mori, and all of these but apn12 are up-regulated at 6 h postinfection with a strain of Bacillus bombyseptieus [107].Initial apn transcript up-regulation similarly occurred in analogous experiments, but significantly down-regulated by 24-h post B. bombyseptieus infection [116].Our study shows a putative L. glycinivorella APN07-like transcript being significantly up-regulated 48-h post exposure to Cry1Ac-expressing G. max.Since there is no evidence of this APN07-like protein or orthologs functioning as Bt receptors, observed changes could be indicative of proteolytic changes involved in other cellular processes [117,118].Although not investigated further, differences in expression among APN gene family members following Bt exposure may reside in differences in exposures times and method of protein exposure (bacteria, transgenic plant, or artificial diet incorporated).
We predicted that the expression of an ABC transporter from subfamily C (ABCC) was reduced in Cry1Ac G. max exposed L. glycinivorella larvae (Table 4).ABC transporters are implicated in Bt resistance following discovery of an insertional knockout of ABCC2 linked to Cry1Ac resistance in H. armigera [119] and subsequent associations shown in other species [30,120].Interestingly, down-regulation of abcc2 and abcc3 was associated with Cry1Ac resistance in Plutella xylostella [121], where these paralogs show functional redundancy as Bt receptors in H. armigera [122].Despite this evidence, the ABCC subfamily member down-regulated in L. glycinivorella Cry1Ac-exposed larvae was assigned as a member 4 ortholog (ABCC4), which has no evidence as being a functional Cry1Ac receptor among species of Lepidoptera.Regardless, ABC transporters function the eflux of various cellular substrates [123] including metabolites from xenobiotic breakdown in insects [28], and are involved in mammalian stress-response pathways [124].The down-regulation of ABCC4 in L. glycinivorella Cry1Ac-exposed larvae remains perplexing, but was not investigated further in this study.
Tetraspanin paralogs are significantly up-(n = 3) or down-regulated (n = 1) in Cry1Ac-exposed L. glycinivorella compared to unexposed cohorts (Table 4).Tetraspanins are an evolutionarily conserved gene family of proteins structurally composed of four transmembrane domains, and intra-and extracellular domains [125].Binding of different ligands to a unique extracellular domain of each paralog transduce signals responsible for cellular responses such as migration, adhesion, and intracellular trafficking [88].There are 36 paralogs in the D. melanogaster genome [126], and functional redundancies among paralogs have made it difficult to resolve independent roles in cellular processes [125,127].Mutational change at an amino acid position (L31S) in the first transmembrane domain of an H. armigera tetraspanin, HaTSPAN1, is linked to Cry1Ac resistance, and although HaTSPAN1 transcript levels are also reduced 2.7-fold there is no alteration in Cry1Ac binding [76].Transcripts encoding tetraspanin orthologs were also upregulated among susceptible D. v. virgifera larvae feeding on transgenic Cry3Bb1 and Gpp34/Tpp35Ab1 maize [39].Although members of the tetraspanin gene family are involved in insect immune response [128] and recent evidence of differential regulation of immunity genes in Cry1Ac exposed strains of H. armigera [47], there is no analogous evidence from our study to suggest tetraspanin effects on immunity (Table 3; Table S5b).Thus, the role of tetraspanins in larval L. glycinivorella response to transgenic Cry1Ac G. max remains unknown.
It should be noted that there is a functional difference between Bt binding proteins and Bt receptor, where the latter is defined as proteins that facilitate membrane insertion [129].Within this framework, ABC transporters and cadherin are purported to be major and "accessory' functional receptors, respectively.Thus, changes in these two receptors my incur changes in resistance.Despite this, there instances where resistance has evolved outside of modulation of known receptor interactions [30,76,77], suggesting that gains in understanding of the Bt MoA might facilitate deciphering those mechanisms involved.

Genomic library construction, sequencing, and read filtering
A hybrid short-and long-read approach was used to assemble L. glycinivorella contigs.For this, greater than 20 live 4th instar larvae were collected from a cultivated non-Bt G. max field at the Jilin Academy of Agricultural Sciences, Northeast Agricultural Research Center (JAAS-NARC), Gongzhuling, Jilin Province, P.R.China on 20 August 2020 (BioSample SAMN21160035; Isolate SPB_JAAS2020; Table S1).Samples were brought to the lab and frozen alive in a -80 ℃ freezer.A subset of samples then shipped on dry ice to Benagen Tech Solutions Ltd. (Wuhan, China) where genomic libraries were constructed and sequenced.In brief, total genomic DNA was extracted from a single male L. glycinivorella larva (larva #1; Table S1) using an in-house modified CTAB protocol, quality checked by 0.75% agarose gel electrophoresis, and quantified on a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA).Genomic DNA was selected for fragments > 15 kb using AMPure XP beads (Beckman-Coulter Life Sciences, Indianapolis, IN, USA), repaired using the NEBNext FFPE DNA Repair Mix (New England Biolabs, Ipswich, MA, USA), and used as input for the SQK-LSK109 Ligation Sequencing Kit according to manufacturer instructions (Oxford Nanopore, Oxford, UK).Following quantification and purification, the largeinsert library was loaded into R9.4Spot-On Flow Cells and run on a PromethION P48 Sequencer (Oxford Nanopore).GUPPY v 4.0.2(Oxford Nanopore) was used to convert image files to base calls and remove failed reads from the raw data, then trim reads of adapters and those with a Phred quality score (q) < 7.
Remaining DNA (< 15 kb) from L. glycinivorella larva #1 (Table S1) was sheared and fragments used to construct an indexed short insert library with the Nextera DNA Flex Library Prep Kit according to manufacturer instructions (Illumina, San Diego, CA, USA).Library quality was assessed on an Agilent 2100 Bioanalyzer Instrument (Agilent Technologies, Santa Clara, CA, USA), and quantitative PCR (qPCR) was used to estimate effective concentration as described previously [130].Then 150 bp paired end (PE) reads were generated from the short-insert library on a single S4 flow cell lane of a NovaSeq 6000 (Illumina).Fastq formatted short read data was trimmed of adapters and sequence with low quality (q < 5), and PCR duplicates removed using inhouse scripts at Benagen Tech Solutions Ltd.

K-mer based genome size estimates
A kmer depth-frequency distribution was generated for short Illumina reads using kmer_freq from which the mean heterozygosity, repetitive fraction, duplication level, and genome size were estimated using the Genome Character Estimator (GCE) v.1.0.0 (https:// github.com/ fanag islab/ GCE) [131] with default settings.This k-mer distribution was used as input to estimate genome size using the Lander-Waterman equation [132] and other parameters the R script GenomeScope2 [133].

Scaffolding
Chromatin Conformation Capture (Hi-C) libraries were prepared by Benagen Tech Solutions Ltd. (Wuhan, China) according to methods described previously [142].For this, chromatin was cross linked to DNA by formaldehyde treatment in vitro, then DNA digested with DpnII, and the 5' overhangs blunted by DNA pol I Kelnow fragment incorporation of biotinylated nucleotides.Following ligation of proximal blunted DNA ends and removal of histone crosslinks, genomic DNA was purified to remove protein and unincorporated biotinylated nucleotides.These purified ligation products were then sheared into 350 to 700 bp fragments, and short read libraries constructed using the NEB Next Ultra Library Prep kit (New England Bio-Labs, Ipswich, MA, USA), followed by purification using streptavidin beads and PCR enrichment.The library was quantified using Quant-iT ™ PicoGreen ™ dsDNA Assay Kit (Thermo-Fisher, Wilmington, DE, USA) on a Qubit ® 2.0 fluorometer (Thermo-Fisher) and a qPCR-based method [130].Library insert sizes were estimated following separation on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc.).PE 150 bp reads were then generated from library inserts on a single S2 flow cell lane of an Illumina NovaSeq 6000 sequencer (Illumina).
All post-processing of Hi-C read data was performed using HiCUP v.0.7.2 [143] to generate di-tags (PE reads located on different contigs) and used for downstream scaffolding.Specifically, HiCUP v.0.7.2 was used to trim raw reads of sequencing adapters, linker sequence, and low-quality nucleotides (q < 30) followed by truncation of reads to sequence upstream of the DpnII site.Trimmed reads were then aligned to the draft contig assembly using Bowtie 2 [144].Any di-tags resulting from self-circularization, representing dangling ends or internal fragments (map to the same DpnII site), too distant from a DpnII site in the primary contig assembly, or repetitive PCR duplicates were also removed.The valid unique aligned reads were then used to predict interand intra-contig interactions, and define putative order and orientation along chromosomes.Juicebox v1.11.08 (https:// github.com/ phase genom ics/ juice box_ scrip ts) was used to manually curate the contact-map wherein mis-joins were eliminated, inverted contigs re-oriented, and contigs joined into scaffolds.

Gene annotation
The final set of chromosome-assigned L. glycinivorella scaffolds were submitted to the National Center for Biotechnology Information (NCBI) for automated eukaryotic genome annotation pipeline [146].Evidence used to predict NCBI reference sequence (RefSeq) gene models were based on 135.4 million RNA-seq reads in the NCBI Sequence Read Archive (SRA) runs SRR5985984-SRR5985989 previously generated from 1st instar L. glycinivorella larvae [147,148].Additional evidence was generated in this study as life-stage specific reads (Libraries Lgly00 to Lgly06c; Table S1) from samples of eggs, larvae, and adults collected from fields of cultivated G. max at JAAS-NARC during July and early August, 2021 (n = 5; BioSamples SAMN24169476 to SAMN24169480).For the latter, samples were immediately flash frozen in liquid nitrogen in the field, then sent to Benagen Tech Solutions Ltd (Wuhan, China) for RNA extraction, construction of indexed RNA-seq libraries, and generation of 150 bp PE read data on a single S4 flow cell lane of a NovaSeq 6000 (Illumina).De novo repeat detection relied upon NCBI running of WindowMasker [149].
Further phylogenetic analyses performed for putative candidate Bt resistance genes, membrane-bound aminopeptidase N (apn) and alkaline phosphatase (alp), where orthology was uncertain based on RefSeq structural annotation.The Clustal W algorithm was used in the MEGA8.0alignment utility [150] to construct a multiple protein sequence alignment of 16 APN proteins from B. mori (n = 16; [107]), O. nubilalis (n = 9; [109,112]) and 27 L. glycinivorella RefSeq protein models with putative alanyl-aminopeptidase annotations encoded by 20 RefSeq gene models.A query of the Conserved Domain Database with the O. nubilalis APN1 (accession ACJ64827.1)predicted a 460 aa Peptidase M1 domain that was used to define the orthologous region in the multiple sequence alignment.Variation in the Peptidase M1 domain was subsequently used for subsequent phylogenetic analysis.The BIC score was maximized for this trimmed alignment in the LG model of protein sequence evolution [151] with an empirically-determined gamma distribution (LG + G), and MEGA8.0 [150] used to construct a phylogeny from among sites with ≥ 80% representation among aligned residues using Maximum Likelihood (ML) based on a consensus of 1,000 bootstrap pseudo-replicates.

Larval transcriptome response to feeding on transgenic G. max expressing Cry1Ac
The G. max cultivar, GP03-8-23, that express 3.75 μg/g of the Bt Cry1Ac pesticidal protein [61] and a conventional (non-Bt) cultivar were grown to R3 stage in a greenhouse at JAAS at 24 ± 1 °C, 60% relative humidity (RH), and 16:8 h light:dark (L:D).Leguminivora glycinivorella adults were collected from non-Bt G. max fields at JAAS-NARC in July 2021, placed into a cage, and eggs collected.Subsequent neonates were fed detached conventional (non-Bt) G. max leaves placed on moistened filter paper in 15-cm Petri plates and incubated in a growth chamber (26 ± 1 °C, 60-70% RH, and 16:8 L:D photoperiod).Second instars were transferred to Cry1Ac GP03-8-23 or non-Bt G. max plants (treatments), one larva per plant, across three replicates of 20 plants per treatment (60 plants per treatment; 120 total).Larvae were allowed to feed for 48 h.Recovered larvae were pooled from within each of the 3 replicates for Cry1Ac GP03-8-23 and non-Bt treatments (n = 7 larvae per treatment: n = 42 total larvae) for a total of 6 pools (Table S1; SAMN24169481 to SAMN24169486).Pools were flash frozen in liquid nitrogen and shipped to Benagen Tech Solutions Ltd. (Wuhan, China) where RNA was extracted and indexed RNA-seq libraries constructed for each Cry1Ac GP03-8-23 (Glyc05a_1Ac to Glyc05c_1Ac) and non-Bt treatment (Lgly06a_nBt to Lgly06a_nBt).Equimolar amounts of each library loaded on a single S4 lane of a NovaSeq 6000 (Illumina) on which 150 bp PE reads were generated.
Raw non-normalized fastq formatted read data were trimmed to remove nucleotide sequence with a q < 20 and Illumina adapters using Trimmomatic v0.39 [152].Only PE reads from each library that survived filtering were used for generating pseudoalignments to L. glycinivorella transcript models (NCBI accession GCF_023078275.1) with Kallisto v0.46.1 [153] (parameters: -fragment-length = 200; -sd 20; -bootstrapsamples = 100, -seed = 42).Estimated read counts for each transcript pseudoalignment was used to predict differential gene expression (DGE; aggregated estimate of transcript isoforms), differential alternative splicing (DAS), and differential transcript use (DTU) events between treatments of L. glycinivorella fed Cry1Ac cultivar GP03-8-23 (Lgly05a_1Ac to Lgly 05c_1Ac) vs. non-Bt cultivar (Lgly06a_nBt to Lgly06a_nBt; Table S1) using the 3D RNA-seq package v 1.0.0 (https:// github.com/ wyguo/ Three DRNAs eq) [154] run in R 4.4.2[155] via the integrated development environment, Rstudio 2022.07.2 + 576 [156].A transcript mapping file consisting of ≥ 1 RefSeq mRNA model (XP_) from each gene (LOC|GeneID) was provided as input [23,735 transcripts (XM_) and 2,055 non-coding RNAs (XR_) from 17,151 RefSeq gene models].Pre-processing involved conversion of read counts to length-scaled transcripts per million (lengthScaledTPM) [156], and filtered to remove genes and transcripts with a decreasing meanvariance trend; those in which ≥ 1 of 6 replicates with counts per million (CPM) ≥ 1. Read counts were normalized to log 2 CPM via trimmed mean of M-values (TMM) [157].Then counts were corrected for batch outlier effects using the RUVr package [158] run in R 4.4.2[155], and Principal Component Analysis (PCA) used to assess corrected mean log 2 CPM estimates among genes (aggregate of estimates for all transcripts for each gene) from each replicate along principal coordinates 1 (PC1) and 2 (PC2).Limma-voom [159] was then used to estimate the log 2 (fold-change) of transcript abundances based on CPM values between treatment groups [Cry1Ac (Lgly05a_1Ac to Lgly 05c_1Ac) vs. non-Bt (Lgly06a_nBt to Lgly06a_nBt)] and determined a false discovery rate (FDR) adjusted for multiple comparisons using the Benjamini-Hochberg (BH) method [108].Significance for DGE was defined at an FDR threshold ≤ 0.05 and log 2 (fold-change) ≥ 2.0.DAS and DTU events were predicted based on change in percentage spliced transcripts (∆PS; change in ratio of mean transcript abundances weighted by the mean abundance for all transcripts for the given gene) [160] as calculated in the 3D RNA-seq package [154].In brief, ∆PS was calculated for genes with > 1 transcript and DAS only events defined as instances where proportional changes in transcript abundances are predicted without corresponding significant changes in expression when all transcript isoform levels are considered.DTU (DGE + DAS) events are those when significant changes in overall gene expression and relative transcript abundances are predicted.Significance of DAS and DTU estimates were determined using an F-test with a BH adjusted FDR ≤ 0.05 and ∆PS ≥ 0.1.
Functional annotation was performed for significant differentially expressed RefSeq transcripts by prediction of protein domains by searching the Pfam-A database [161] with PfamScan (https:// www.ebi.ac.uk/ Tools/ pfa/ pfams can/) using each corresponding RefSeq proteins as the query, with results filtered for E-values ≤ 10 -10 .To avoid biased representation in instances when multiple transcripts were derived from same gene (LOC/GeneID), the longest RefSeq protein model was used to query Pfam-A.Pfam domains were then used to retrieve curated gene ontology (GO) terms and perform enrichment analyses at GO level 2 using the dcGOR 1.0.6 package [162] based Z-scores calculated on hypergeometric distributions of terms and a BH significance threshold (FDR) ≤ 1.0 -4 applied for cellular component (CC) and molecular function (MF), and ≤ 1.0 -6 biological process (BP) category terms.GOplot [163] was used to construct a bubble plot (GObubble) where -log(FDR) from GO enrichment was plotted against a Z-score that took into account expression bias among DEGs [Z-score = (count up-regulated -down-regulated genes)/sqrt(total number of genes)] with positive and negative values indicative of greater proportional of up-and down-regulated genes in a given GO category, respectively.Size of bubbles proportional to number of DEGs in each category.
N50 = 19.6 kb L50 = 1.1 M), of which 61.6 Gb among 5.7 million reads remained post-filtering (N50 = 10.8 kb L50 = 1.0 M).A high proportion of short reads failed to surpass the q > 7 cutoff (Fig S1A), but the number of filtered bases was less skewed (Fig S1B).The final set of filtered reads had a mean q = 12.35 (Fig S1C) with read lengths up to 246.3 kb.A total of 386.7 M short 150 bp PE reads were generated covering 58.0 Gb of nucleotide data (Table

Fig. 1 Fig. 2
Fig. 1 Approximate geographic distribution of Leguminivora glycinivorella across China, North Korea (NK), South Korea (SK), and Japan (JP; partial).Areas with greatest damage to cultivated soybean, Glycine max, indicated in darker green.Location of collected BioSample SAMN21160035 used in genome assembly is indicated by red pin

Fig. 3
Fig. 3 Leguminivora glycinivorella genome scaffolding and assignment to chromosomes.A chromosome conformation capture (Hi-C) contacts among 237 contigs defining 28 chromosomes in the L glycinivorella assembly ilLegGlyc1.1.B Whole genome alignment between ilLepGlyc1.1 and the Cydia Splendana assembly, ilCydSple1.2,defining putative orthologs and the Z chromosome (based on 5,625 alignments of 1711 ± 937 bp spanning at total of 0.8 Mb; Additional details in TableS3and FigureS2)

Fig. 4
Fig. 4 Effects of larval Leguminivora glycinivorella exposures to transgenic Glycine max cultivar GP03-8-23 expressing the Bacillus thuringiensis (Bt) Cry1Ac pesticidal protein compared to a non-Bt cultivar.A Principal component analysis (PCA) plot of mean log 2 CPM estimates among transcripts for genes along principal coordinates 1 (PC1) and 2 (PC2) showing two clusters corresponding to treatment groups.Variation between treatments precited B 204 genes with significant differential expression (DE) and C 10 genes showing significant differential alternative splicing (DAS).Similarly, at the transcript level D PCA demonstrate distinct clustering of read abundances among replicates of each treatment along PC1 and PC2.Intra-treatment comparisons predicted E DE of 39 transcripts and F 12 with predicted differential transcript usage (DTU)

Fig. 5 )Fig. 5
Fig. 5 Differential expression between larval Leguminivora glycinivorella exposed to transgenic Bacillus thuringiensis (Bt) Cry1Ac Glycine max compared to a non-Bt cultivar.Significant changes between treatments defined for differential gene expression (DGE), differential alternate splicing (DAS) and differential transcript usage (DTU) as determined by 3D RNA-seq package v 1.0.0 (Guo et al. 2021).Absolute number of genes in each category are shown along with up-( ) and down-regulated ( ) genes are given for DGE estimates, and individual genes for DAS, DTU and their intersections shown in the Venn-diagram.Transcript profile plots of length-scaled transcripts per million for each isoform between Cry1Ac and non-Bt exposure treatment conditions -c); and deacetylase activities (LOC125236753)].Terms for DEGs encoding constituents of the peritrophic membrane and cuticle [peritrophin-1-like (LOC125238640); and cuticle protein LCP-17-like (LOC125230076)], although their common MF term, chitin binding (GO:0008061)] were not among those that were significantly enriched.

Fig. 6
Fig. 6 Significantly enriched Gene Ontologies (GOs) in cellular component (CC), biological process (BP) and molecular function (MF) categories at GO level 2. Z-score indicative of genes in category being more up-(positive) or down-regulated (negative value).Top four most significantly enriched GOs or GOs with greatest expression bias in each category demarcated

Table 2
Comparison of the Leguminivora glycinivorella genome assembly to those from Tortricid moths (Lepidoptera: Tortricidae).

Table 3
Significantly enriched Gene Ontologies (GOs) assigned to genes differentially expressed between 2nd instar Leguminivora glycinivorella larvae fed on Cry1Ac compared to non-Bt Glycine max among cellular component (CC), biological process (BP) and molecular function (MF) categories at GO level 2. False discovery rate (FDR) thresholds of 1.0 -4 applied CC and MF categories, and ≤ 1.0 -6 to category terms

Table 4
Candidate proteins involved in Bacillus thuringiensis (Bt) insecticidal resistance and detoxification enzyme encoded by differentially-expressed genes (DEGs) between susceptible 2nd instar Leguminivora glycinivorella larvae fed on transgenic Cry1Ac Glycine max compared to unexposed cohorts A) significantly DEGs between treatments, and B) DEGs annotated with Gene Ontology (GO) terms that were significantly enriched.Putative functional annotations provided for GO cellular component (CC), biological process (BP) and molecular function (MF) categories at GO level 2, with Pfam domain predictions and RefSeq gene descriptions indicated.Direction and magnitude of estimated Log 2 (fold-change) [Log 2 (FC)] among replicated treatment groups is shown for each significant DEG (false discovery rate ≤ 0.05)