ddRAD sequencing: an emerging technology added to the biosecurity toolbox for tracing the origin of brown marmorated stink bug, Halyomorpha halys (Hemiptera: Pentatomidae)

Background Brown marmorated stink bug (BMSB), Halyomorpha halys (Hemiptera: Pentatomidae) is native to East Asia but has invaded many countries in the world. BMSB is a polyphagous insect pest and causes significant economic losses to agriculture worldwide. Knowledge on the genetic diversity among BMSB populations is scarce but is essential to understand the patterns of colonization and invasion history of local populations. Efforts have been made to assess the genetic diversity of BMSB using partial mitochondrial DNA sequences but genetic divergence on mitochondria is not high enough to precisely accurately identify and distinguish various BMSB populations. Therefore, in this study, we applied a ddRAD (double digest restriction-site associated DNA) sequencing approach to ascertain the genetic diversity of BMSB populations collected from 12 countries (2 native and 10 invaded) across four continents with the ultimate aim to trace the origin of BMSBs intercepted during border inspections and post-border surveillance. Result A total of 1775 high confidence single nucleotide polymorphisms (SNPs) were identified from ddRAD sequencing data collected from 389 adult BMSB individuals. Principal component analysis (PCA) of the identified SNPs indicated the existence of two main distinct genetic clusters representing individuals sampled from regions where BMSB is native to, China and Japan, respectively, and one broad cluster comprised individuals sampled from countries which have been invaded by BMSB. The population genetic structure analysis further discriminated the genetic diversity among the BMSB populations at a higher resolution and distinguished them into five potential genetic clusters. Conclusion The study revealed hidden genetic diversity among the studied BMSB populations across the continents. The BMSB populations from Japan were genetically distant from the other studied populations. Similarly, the BMSB populations from China were also genetically differentiated from the Japanese and other populations. Further genetic structure analysis revealed the presence of at least three genetic clusters of BMSB in the invaded countries, possibly originating via multiple invasions. Furthermore, this study has produced novel set of SNP markers to enhance the knowledge of genetic diversity among BMSB populations and demonstrates the potential to trace the origin of BMSB individuals for future invasion events. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07678-z.


(Continued from previous page)
Conclusion: The study revealed hidden genetic diversity among the studied BMSB populations across the continents. The BMSB populations from Japan were genetically distant from the other studied populations. Similarly, the BMSB populations from China were also genetically differentiated from the Japanese and other populations. Further genetic structure analysis revealed the presence of at least three genetic clusters of BMSB in the invaded countries, possibly originating via multiple invasions. Furthermore, this study has produced novel set of SNP markers to enhance the knowledge of genetic diversity among BMSB populations and demonstrates the potential to trace the origin of BMSB individuals for future invasion events.
In the past decades, BMSB has invaded and established in a range of countries irrespective of the environmental conditions [2,4,[10][11][12][13][14][15][16][17]19]. Adaptive evolutionary changes and/or ecological adaptation in a new region has made this pest a successful global invader and its recent invasion history can shed light on that. However, in-depth genetic information of BMSB at the population level is scarce. Such information on the genetic diversity of BMSB can enhance our understanding of their population structure and global invasion history. This could also assist in constructing a global genetic population structure of BMSB and develop a potential strategy to trace the country of origin for BMSB individuals intercepted at the border or in post-border scenarios in biosecurity settings.
BMSB is a serious pest for agriculture and horticulture and can be a social nuisance. As agricultural exports play a significant role in New Zealand's Gross Domestic Product, the establishment of the pest would be highly detrimental to the country. BMSB has increasingly been intercepted at the New Zealand border. Since is first intercepted in 2005 [20], the frequency of interceptions have been increasing due to the rise of international travelling and trade [21]. There have been 2009 recorded interceptions of BMSB since 2005 at the New Zealand border (up to November 2020) [20]. Therefore, it is important to study the genetic structure and composition of BMSB populations to assist in tracing its origin and predicting the potential invasive pathways.
To date, nearly all published studies for tracing the origin of BMSB utilized PCR based molecular methods and focused on small regions on mitochondrial DNA (mtDNA), such as the COI (Cytochrome c oxidase I) and/or COII (Cytochrome c oxidase II) genes [16,19,[22][23][24]. mtDNA is highly variable between species and can potentially provide sufficient resolution to identify genetic differences between species [25]. Since mtDNA is inherited maternally and lacks recombination, the resolution of mitochondria-derived genetic divergence is generally not sufficient to differentiate between individuals in a population [26]. Therefore, there is the need to study the genome-wide, high-resolution markers among BMSB populations from their native and invaded regions. The study will be able to discern genetically distinct populations thus allowing us to trace the geographical origin of BMSBs within an interception scenario. This calls for an innovative method to explore the genetic diversity within BMSB populations on a genome-wide scale.
The detection of different genetic markers is crucial for studying genetic diversity. Recently, a high-throughput sequencing-based method (HTS) has replaced traditional gel-based experiment to discover genetic markers [27]. RADseq (Restriction-site Associated DNA Sequencing), is often applied for genome-wide SNP (Single Nucleotide Polymorphism) identification in large genomes because of its relatively low cost and high-throughput [28]. The RADseq technique utilises one (or more) restriction enzyme(s) to digest the whole genome into short genomic fragments that are then subjected to high-throughput DNA sequencing [28]. Restriction site-associated DNA markers provide a well-established basis for population genetics, as they are sensitive to both SNPs and insertion or deletion events (indels) in genomes [29]. So far, RADseq has been widely used in population genetic studies for many taxa including plants [30], and animals [28]. Double digest Restriction-site Associated DNA (ddRAD) sequencing uses two restriction enzymes to allow greater control of the genomic regions sampled for sequencing and more reproducible recovery of sequenced regions [31]. Therefore, in this study, we applied ddRAD sequencing (ddRADseq) to explore the genetic diversity among BMSB specimens collected from 41 populations across 12 countries.

Results
EcoR I-Msp I restriction enzyme pair was suitable for ddRAD sequencing To select the most suitable restriction enzyme (RE) pairs for digesting BMSB genomes, in silico test using 15 combinations of REs against the BMSB genome scaffolds were conducted. The simulation revealed that more than 100 K fragments produced from most of the RE pairs selected except the pairs, MseI-MluCI, MspI-PstI and EcoRI-PstI (Table 1). Since the genome used for the test was consisted of scaffolds instead of a complete genome, the simulation results might not reflect the real situation. A pilot ddRADseq in vitro experiment was conducted with genomic DNA samples derived from two BMSB individuals (one male and one female). Of the 15 pairs of REs used for the in silico test, nine different pairs of REs were selected for ddRADseq. After the HiSeq run, approximately 2 Gb of raw RADseq sequencing data were generated for each individual. The EcoRI-MspI restriction enzyme pair recovered the highest number of genetic variances (i.e. high quality SNPs) after highly stringent SNP quality control (QC) filtering, thus was selected it as the most suitable pair of restriction enzymes for digesting the BMSB genome via ddRAD sequencing (Table 1, Additional file 1).

ddRAD sequencing statistics and SNPs estimation
In total, 399 ddRAD sequencing datasets were obtained from the BMSB individuals, which yielded a total of 3.6 billion raw paired end reads (2 × 150 bp) (min: 4 million, max: 40 million and median: 7.6 million paired end reads per sample). On average, 9 million raw paired end reads were generated for each individual. The 3′ end adaptors of raw reads were trimmed and low quality reads were discarded. Using quality-trimming of the sequence data, 387,629 SNPs were estimated from 399 BMSB individuals. A highly stringent QC criterion was applied for filtering the SNPs, and only those loci that were shared by all the individuals were retained. This resulted with 1775 high confidence biallelic SNPs from 389 individuals. Further analysis showed that the 1775 SNPs were distributed in 484 scaffolds and 1-20 SNPs were detected in each of those scaffolds with average 3.7 SNPs per scaffold (Additional file 2). The 1775 SNPs were used for the subsequent analysis of genomic diversity and population structure.

Genetic clusters were observed among the BMSB populations
At least three genetic clusters comprising China, Japan, and the invaded countries (Austria, Chile, Georgia, Hungary, Italy, Romania, Serbia, Slovenia, Turkey, and the USA) were revealed by Principal Component Analysis (PCA) using the SNP data generated from 389 BMSB individuals (Fig. 1). All BMSB individuals from Japan formed an isolated cluster, whereas BMSBs collected from the invaded countries were genetically closer to those of China. Analysis using 484 representative SNPs (one from each scaffold) produced similar result (Additional file 3).

Individuals from the same geographical region were genetically linked
To further emphasise the outcome of genetic clustering pattern via principal component analysis, minimum spanning networks (MSN) were constructed using the SNPs profile of each individual, and genetic variability was visualised among the population lineages (Fig. 2). The MSN showed that all the individuals from China were genetically linked together in the network, which also applies to the individuals from Japan (Fig. 2). There was a genetic divergence among the BMSB individuals from native regions of China and Japan, while those of invaded countries were more closely related in the network. One individual from Chile was found in the same clade of the Chinese samples, suggesting that this BMSB

Genetic distance between native populations of China and Japan was relatively higher
Population genetic divergence in the form of pairwise F ST revealed significant (p < 0.05) genetic differences (except for that between China and Serbia) among 12 geographical groups or countries, with F ST value ranging from 0.0006 between BMSB populations from Hungary and Serbia, to 0.2084 between BMSB populations from Japan and Romania (Table 2). We also observed that the genetic distance between native populations of China and Japan was moderately higher (F ST Table 2). A Neighbour-net tree constructed using the F ST pairwise values among the individuals from the 12 countries revealed the similar relationships among the BMSB populations from the 12 countries (Fig. 4). The tree depicted the overall relationships of the populations and showed that Chinese and Japanese populations were clustered together, but genetically different. The populations from the invaded countries were genetically linked, but the populations from Romina formed a long branch, indicating the genetic separation from those of the other countries studied (Fig. 4). It also demonstrated that the BMSB from the adjoining countries, i.e. Turkey/Georgia, Austria/Slovenia, Hungary/ Serbia, are more closely related with each other and are likely from the same origin (Fig. 4).

Five genetic clusters exist in the BMSB populations
Furthermore, insights into the BMSB genetic diversity were unravelled by population genetic structure analysis using fastSTRUCTURE. This analysis expanded the results of PCA ( Fig. 1) and provided more in-depth clustering for the BMSB populations from the invaded countries. This analysis predicted the presence of at least The colour represents the country where the individuals were collected from. X axis represents the variance explained by PC2 (10.3%), and Y axis represents the variance explained by PC1 (28.7%). The figure was created using R package ggplot [32] three genetic clusters within the BMSB-invaded countries (Fig. 5). The first cluster comprises of populations from the USA, Italy, Chile, Turkey, Georgia, and Hungary (Cluster 1), the second one is formed by Romania (Cluster 2), and the third one is formed by Austria, Serbia and Slovenia (Cluster 3). The BMSB populations from China (Cluster 4) and Japan (Cluster 5) were clearly separated from the invasive populations ( Fig. 5). Fst analysis of the five genetic clusters (Table 3) showed that the genetic distance was moderately higher (Fst > 0.05) among China, Japan, Cluster 2 and Cluster 1, and was lower (Fst < 0.05) between China and Cluster 3 ( Table 3). The AMOVA (Analysis of molecular variance) of genetic distance for the samples from the 12 countries allowed a partitioning of three levels ( Table 4). The proportion of variation attributable to within country differences was 90.35% while they were only 7.09 and 2.56% occurred among clusters and among countries within the clusters, respectively. The genetic differences among and within the cluster and countries were significant (p < 0·05). Therefore, the results indicate that the individuals from one country are more genetically different within them than that the difference they have with the other countries. geographical groups comprising Austria, Chile, China, Georgia, Hungary, Italy, Japan, Romania, Serbia, Slovenia, Turkey, and the USA. Each node represents an individual specimen and the edge indicates the genetic distance (dissimilarity: fast pairwise distances) between the individuals. The colour in each circle represents the countries where the samples were collected from. The figure was created using R package poppr [33] The heterozygosity analysis was showed that the Observed Heterozygosity (H o ) and the Expected Heterozygosity (H e ) for all the countries are not very high, around 0.2 (Additional file 7). The H o of Japan is smaller than the H e , suggesting that the populations in this country is under inbreeding (isolation). Conversely the H o is bigger than the H e in the other countries, indicating that an isolate-breaking effect is happening, and interbreeding is occurring among those populations.  Figure 3a showed the overall BMSB invasive pathways while the Fig. 3b is the enlarged map for the European countries. The figure was created using Tableau based on the results from the SNPs data

Discussion
To the best of our knowledge, this is the most comprehensive population genomic study so far to unravel the genetic diversity and population structure of BMSB. The study utilised ddRAD sequencing to enhance the knowledge of global BMSB genetic diversity and invasion history. We identified a suitable restriction enzyme pair for genomic digestion of BMSB genome for ddRAD sequencing study, which will be useful in future applications. The ddRAD data were analysed using a combination of approaches, including principal component analysis (PCA), phylogenetic analysis and population structure analysis to elucidate the population structure and genetic diversity among the BMSB populations. The present study unambiguously proved that the BMSB populations in the two native regions of China and Japan were genetically distinct. Many BMSB populations from the invaded countries were genetically closer to those of China. Conversely, the Japanese BMSB populations were isolated and showed genetically less similar to those from the invaded countries. Overall, this study has provided a remarkable resolution in unravelling the

Genetics isolation between the BMSB populations from China and Japan
The PCA and MSN analyses showed that the Japanese BMSB populations were isolated from the other populations studied here. The fastSTRUCTURE analysis showed that population structures of the BMSB samples from China and Japan were derived predominantly from different genetic components (Fig. 5). These results suggested that there is genetic isolation for the BMSB populations of the two countries. Xu et al. 2014 [22] also observed genetic divergence between the BMSB populations from China and Japan via haplotype analysis of mitochondrial COII and 12S/CR gene regions. One possible explanation for such phenomenon could be the geographical distance between the isolated populations. The width of Tsushima Strait is from 41.6 km to 222 km which is the distance between mainland China and Japan [35], and the average non-stop flight distance of BMSB is less than 1 km [36]. Therefore, active spread of BMSB individuals from China to Japan is unlikely. We cannot rule out the possibility of the BMSB hitchhiker capability, however this study suggest that there is limited migration among BMSB populations between the two countries. It has been shown that northern Japan or western Korea were potentially the origin of BMSB based on climate matching models [23], but Xu et al. 2014 disagreed with this conclusion and their results implied north China as the ancestral clades. They also demonstrated that China and Japan populations were predicted as one population during the glacier stage [22]. Therefore, future studies by sampling more BMSB samples from wider regions will be required to draw a reliable conclusion.   (Fig. 3), but Japan. The results are consistent with the findings of a previous study by Valentin et al. 2017 [37]. Moreover, there were large genetic differences (0.15 < F ST < 0.25) between the BMSB populations in Japan and the invaded countries of Chile, Georgia, Italy, Romania, Turkey, and the USA (Table 2). Therefore, we can conclude that the BMSB populations in the six invaded countries might not have originated from Japanese populations. The smallest F ST values of 0.0006 between the BMSB populations from Hungary and Serbia, the neighbouring countries, imply they were possibly from the same origin. The Fst analysis also indicated that the BMSB populations from Turkey and Georgia (Table 2) were more likely shared the same source and there were gene flow among them. It is also possible for the BMSB populations from Austria and Slovenia (Table 2). Moreover, the PCA (Fig. 1, Additional files 3 and 4) indicated that the established populations in Chile, Hungary, Georgia, Italy, Turkey and the USA belonged to one genetic cluster. The population structure analysis using fastSTRUCTURE supported this conclusion that those populations had similar genetic components (Fig.  5). The neutrality test (Fu's Fs) in our recent study [38] for these populations was negative (p < 0.05), indicating that these established populations have been undergoing population expansion stage [39].
One individual BMSB sample from the Italy cohort was genetically closely related to the Chinese population (Fig. 2), suggesting the possible recent arrival of that BMSB from China (Figs. 2 and 3). The rest of the BMSB samples from Italy were more closely related to the USA and other European populations, suggesting that multiple BMSB invasions have occurred in Italy in the past [16,40]. Such phenomenon was also observed in Chile [17]. Most of the BMSB specimens from Chile, except one individual, were genetically close to each other (Figs. 2 and 3), implying that there were at least two separate invasions of BMSB in Chile.

Predication of the BMSB genetic cluster
The population structure analysis of BMSB indicated that there were at least five genetic clusters identified among the BMSB populations studied here. Besides the genetic clusters of native BMSB populations from China and Japan, there were three genetic clusters in the BMSB-invaded countires. The analysis showed that the BMSB populations from the USA, Georgia, Turkey, Italy, Hungary, and Chile were more likely belong to one genetic cluster due to their similar genetic structure. Since first detected in the USA in 1996 [41], BMSB has invaded many neighbouring countries. In Europe, BMSB was first recorded in Switzerland in 2004 [14,21,42], then spread to many countries such as Germany, France, Italy, Georgia and Hungary [43][44][45]. Therefore, based on the timeline of reports, BMSB invasion might have started in the USA (1996) [2], then Europe (2004) and Chile (2017) [17], though a more transparent history of invasion remains unknown and needs further investigation.
Another interesting phenomenon is that the genetic structures between BMSB populations originating from Slovenia, Austria, and Serbia and the Chinese populations were more genetically related than those between the Chinese and the other European populations (Fig. 5). This suggests that BMSB found in these three European countries could have recently been invaded directly from China (Figs. 3 and 5) rather than from one of their neighbouring countries in Europe. In saying that, the population genetic structure analysis also showed that the BMSB populations from Slovenia, Austria, and Serbia contained the components of the Europeestablished BMSB populations (Fig. 5). This analysis indicated that there was inbreeding between the BMSB individuals from the adjoined established European BMSB populations and the Chinese populations (Figs. 3  and 5). This result indicated that these BMSB populations were still under early stage of invasion, which supports the findings of haplotype analysis using COI and COII of BMSB samples from the same populations in our previous work [38].
BMSB was first reported in Romania in 2015 [10] and have spread around the country [46]. The F ST values between the BMSB samples from Romania and the two  native countries, China, and Japan, were 0.1333 and 0.2084, respectively, which were larger than the F ST values from other invaded populations. The population structure analysis showed that Romanian BMSB samples had a unique genetic structure, which indicates that this group could have been invaded from a place of origin that was not sampled in this study. Therefore, this study further emphasised the importance of sampling. Overall, this study covered a wide range of areas including wide BMSB sampling from the native regions and the invaded countries, although we still missed samples from some countries, such as Korea, Greece, Canada and Switzerland. Therefore, further study of ddRADseq of BMSB samples from those countries will provide more insights into genetic diversity of the BMSB populations and facilitate in inferring a more in-depth invasion pathway.

Novel genetic analysis for the BMSB populations from the invade countries
This study also provided novel genetic information on the BMSB populations for some of the European countries, such as Austria, Serbia, Slovenia, and Georgia.
Since BMSB was detected in Austria [47] and Serbia [48] in 2015 and Slovenia [49] in 2017 and Turkey [50] in 2019, to the best of our knowledge, no genetic studies have been reported on those populations, and this study shows the first effort to characterise the BMSB populations for these countries. The results showed that the BMSB populations from Turkey were more likely the next generation of the BMSB from the USA whereas those populations from Austria, Serbia and Slovenia were the hybrids of Chinese and USA populations (Fig.  5). The results obtained here will be useful for future monitoring of the pest and assessing the potential pathways of invasion in those countries.
In comparison with the mitochondrial based (i.e. COI and COII) haplotypes analyses, here we show that ddRADseq can provide a higher resolution in identifying and characterising different BMSB genetic populations. As an alternative, analysis of genome-wide SNPs via whole genome sequencing can be applied for deeper understanding the BMSB genome and genetic diversity. However, it is still more expensive due to the large size of the BMSB genome (~1 Gbp) [51]. Therefore, cost effective approach such as the RAD-based approach with restriction enzymes (ddRAD) used in this study can provide a higher resolution assessment of genomic diversity and population structure among BMSB populations. Taken together, we recommend applying genome-wide SNP markers-based study using ddRAD to explore the genomic diversity in insects and other eukaryotic organisms in future applications. Most importantly, in biosecurity settings, our study will help to trace the geographic origin of the BMSB samples, irrespective of their life stages, and sex, that are intercepted at border. The advancement and findings resulted from our study will not only help in claiming the BMSB country freedom status to New Zealand but also will greatly assist in the decision making during an incursion and response events.

Conclusions
This study demonstrates that the restriction enzymes pair of EcoRI and MspI is suitable for generating restriction-associated DNA fragments for identifying SNPs within the BMSB genome through ddRAD sequencing. Via population genomic analyses using ddRADseq data, we detected remarkable genetic diversity among different BMSB populations studied in both the native and invaded countries. We believe that this study could also assist in studying the invasion history of BMSB populations and tracing the potential geographical origin of unknown BMSB intercepted at the border or in post-border scenarios in biosecurity settings using the population structure model. Thus, ddRAD sequencing going to be an invaluable emerging technology added into the biosecurity toolbox for tracing the geographic origin of insects at the border/post-border in the near future.

BMSB specimen collection and DNA extraction
BMSB samples were collected from 41 regions in 12 countries, namely Austria, Chile, China, Georgia, Hungary, Italy, Japan, Romania, Serbia, Slovenia, Turkey, and the USA (Table 5, Additional file 5). The fieldcollected samples were preserved in plastic vials containing 95% ethanol and stored at − 20°C until DNA extraction could be performed. Total genomic DNA was extracted from each individual using QIAGEN DNeasy® Blood & Tissue Kit with QIAGEN RNase A treatment (Qiagen, Valencia, CA, USA). Briefly, head and thorax or abdomen was ground using sterile plastic pestle in 1.5 mL tube with ATL buffer with Proteinase K and RNase A added, and then incubated at 56°C overnight. The DNA extraction followed the manufacturer's instructions and the DNA was eluted in 150 μl AE buffer. DNA quality was assessed using NanoDrop™ spectrophotometer (ThermoFisher Scientific, USA) and quantified using QuantiFluor™ system (ThermoFisher Scientific, USA). It was followed by an assessment for DNA shearing on a 1% agarose gel against 1 Kb Plus DNA ladder (Invitro-gen™, CA, USA) in TAE buffer stained with SYBR safe (Life Technologies, CA, USA) and visualised using a Gel Doc Software system (BioRad, Hercules, CA, USA).

Selection of restriction enzymes in silico for RAD sequencing
The selection of restriction enzyme (RE) is the crucial step for experimental ddRADseq analysis and discovery of the SNPs across the genome for genetic diversity study, thus initial in silico tests on the RE used for the library was conducted. To select the best RE, a simulation based on the BMSB reference genome (submitted to NCBI GenBank as Hhal_2.0 by the i5k Initiative on 15 December 2017) and 15 combinations of nine enzymes (AvaII, EcoRI, MSeI, MspI, NlaIII, PstI, SbfI, SphI, MluCI). A Python script (Additional file 6) was developed to calculate the number of the potential sheared segments that could be generated by in silico digestion of the reference genome using different combination of restriction enzymes.

Selection of restriction enzymes in vitro: RAD sequencing and bioinformatics analysis
Nine pairs of enzymes which generated higher numbers of segments in in silico analysis or had been used in other published studies [31] were selected. To further confirm the suitability of the RE pairs, two DNA samples were used for ddRADseq and 18 ddRADSeq libraries (nine pairs of enzymes for each sample) were prepared using Illumina® TruSeq DNA Nano following the manufacturer's instructions and ToBo lab ezRAD Protocol-v3.2 [52]. Before sequencing, all ddRADSeq libraries were assessed using a DNF-474 High Sensitivity NGS fragment kit (Advanced Analytical Technologies, Inc.) on a Fragment Analyzer™ Automated CE System (Agilent, California, USA) for quality control. The sequencing was conducted by Annoroad Gene Technology Co., Ltd. (Beijing, China) on a HiSeq X Ten sequencing platform (paired end, 2 × 150 bp). The 3'end adaptors of raw reads were trimmed using AdaptorRemoval v2 [53] and reads with phread quality score of less than 20 and length of less than 50 bp were discarded using an inhouse python script (Additional file 6). Then, qualitytrimmed sequence datasets, were mapped to the BMSB reference genome, Hhal_2.0, using Borows-Wheeler Aligner (BWA) version 0.7.17 [54] in default setting and files with the mapping information (i.e. in SAM format) were converted to BAM format, sorted and indexed using SAMtools version 1.9 [55] before identifying the SNPs. SNPs were called using GATK v4.1 [56].  Sequencing data pre-processing and quality control of ddRAD data To pre-process the raw sequencing data generated via ddRAD, a similar pipeline and data quality filtering approach was applied as described in the section of "Selection of restriction enzymes in vitro: RAD sequencing and bioinformatics analysis". To call SNPs from ddRADseq data, the raw fastq files were mapped to the scaffold of the reference genome, Hhal_2.0 (RefSeq assembly accession: GCF_000696795.2), to produce a mapping file (SAM format) using BWA v 0.7.17 [54]. The SAM mapping file was converted into binary format (BAM), sorted and indexed to increase calculation speed for further analysis using Samtools v1.9 [55]. SNPs were called from BAM files using GATK4 [56]. The quality control parameter used here was the same as that used for the selection of restriction enzymes above. In addition, the obtained SNPs were further filtered using SNP call rate of 1 and MAF (minor allele frequency) of 0.03 using Plink 1.9 [57]. As a result, 10 samples were discarded due to lower SNP call rate.

Population genomic analysis
Using the SNPs identified from the 389 BMSB individuals, we conducted principal component analysis (PCA), neighbour-net tree and population structure analysis using fastSTRUCTURE to elucidate the population structure and genetic diversity among the BMSB populations. PCA was conducted based on 1775 high quality SNPs obtained from the 389 individuals using Plink 1.9 [57] (parameters used: --allow-extra-chr --pca) and plotted using R package ggplot [32]. The k-mean was 2 as elbow point. To further explore the genetic relatedness among the BMSB individuals, a Minimum Spanning Network (MSN) based on the SNP results of each individual was also reconstructed using R package poppr [33].
To test for the presence of population structure, pairwise F ST values were generated from the ddRAD sequence data by combining the individual data from the same country as one geographic group. The analyses were implemented in Arlequin 3.5 [58] by converting the SNP profile (in Plink format) to Arlequin format using PGDSpider 2.1.1.5 [59]. The Fst and p values were obtained using 110 permutations in Arlequin 3.5. To visualize the genetic relationships, the obtained population average pairwise F ST values were further used to construct a neighbour-net method tree using SplitsTree 4 [34].
The AMOVA calculated the variance components and their statistical significance levels for variations among and within the 12 geographic countries and the five genetic clusters. The AMOVA was conducted by using the distance matrix under 1000 premutation in Arlequin 3.5. To further test the genetic variation, the heterozygosity analysis was preformed using GenAlEx 6.5 [60,61]. The genotype information of all sample was stored into a excel with GenAlEx 6.5 required format [60,61]. All calculations were conducted based on the default setting.
To provide additional insight into the genetic variation and population differences, population genetic structure analysis was conducted. The population genetic structure was inferred using the SNPs profile of individual samples. In this process, fastSTRUCTURE 1.0 [62] was used to conduct model-based clustering of all individuals. The fastSTRUCTURE 1.0 was run with the default convergence criterion of 10 − 6 , a simple prior, and ten replicate runs per dataset. The best K value (i.e. the number of populations or clusters that the samples are best divided into) was determined as 5 using a python script, chooseK.py within the fastSTRUCTURE software.
A roadmap of the BMSB invasive pathway was created using Tableau [63] by inputting a excel file with known BMSB background information. Then, the dotted lines and arrow were added in the map using Mac Preview 11.0.