Investigation of genetic relationships within three Miscanthus species using SNP markers identified with SLAF-seq

Background Miscanthus, which is a leading dedicated-energy grass in Europe and in parts of Asia, is expected to play a key role in the development of the future bioeconomy. However, due to its complex genetic background, it is difficult to investigate phylogenetic relationships in this genus. Here, we investigated 50 Miscanthus germplasms: 1 female parent (M. lutarioriparius), 30 candidate male parents (M. lutarioriparius, M. sinensis, and M. sacchariflorus), and 19 offspring. We used high-throughput Specific-Locus Amplified Fragment sequencing (SLAF-seq) to identify informative single nucleotide polymorphisms (SNPs) in all germplasms. Results We identified 257,889 SLAF tags, of which 87,162 were polymorphic. Each tag was 264–364 bp long. The obtained 724,773 population SNPs were used to investigate genetic relationships within three species of Miscanthus. We constructed a phylogenetic tree of the 50 germplasms using the obtained SNPs and grouped them into two clades: one clade comprised M. sinensis alone and the other included the offspring, M. lutarioriparius, and M. sacchariflorus. Genetic cluster analysis revealed that M. lutarioriparius germplasm C3 was the most likely male parent of the offspring. Conclusions As a high-throughput sequencing method, SLAF-seq can be used to identify informative SNPs in Miscanthus germplasms and to rapidly characterize genetic relationships within this genus. Our results will support the development of breeding programs focused on utilizing Miscanthus cultivars with elite biomass- or fiber-production potential for the developing bioeconomy.

Thus, it is difficult to mine functional genes in Miscanthus, which can seriously limit its utilization potential [5]. Molecular markers would be useful for further investigations of Miscanthus because such markers have been widely used in studies of genetics, molecular population genetics, species formation, evolutionary and phylogenetic relationships [10][11][12], and molecular taxonomy [6].
First-generation molecular markers include restriction fragment length polymorphisms (RFLPs) [7,8], random amplified polymorphic DNA (RAPD) [9,10], and amplified fragment length polymorphisms (AFLPs) [11], while second-generation molecular markers include simple sequence repeats (SSRs) [12] and inter-simple sequence repeats (ISSRs) [13]. However, these markers have several limitations: they are low throughput, inaccurate, time-consuming, labor-intensive, and costly [3]. These drawbacks have motivated the development of third-generation polymorphic molecular markers, namely SNPs. These markers are generally widely distributed throughout the whole genome [14]. SNP markers are amenable to large-scale automated monitoring and have been instrumental in various crop breeding applications, such as the construction of genetic maps, the DNA fingerprinting of germplasm resources, the detection of molecular biodiversity, and the analysis of linkage disequilibrium [15]. This continuous development of molecular marker technology has accelerated functional gene identification and characterization in other crops, which has led to the development of varieties with improved functional traits [5]. Thus, these techniques might be useful for molecular genetic research in Miscanthus.
Although genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq) have been used extensively in Miscanthus, there are still some difficulties and challenges associated with the application of these techniques in Miscanthus [16]. For example, one obstacle in the widespread use of GBS is the difficulty in carrying out the associated bioinformatics analysis, which is typically hampered by a large number of erroneous SNP interferences that are not easy to diagnose or correct [17].
To overcome these challenges, we aimed to develop and identify SNP markers for Miscanthus using SLAF-seq. Specifically, we sought to reduce genomic complexity using specific enzymatic digestion, to develop markers via the high-throughput sequencing of representative libraries, and to determine phylogenetic relationships by genotyping.
SLAF-seq uses bioinformatics methods to systematically analyze known genome sequences, genome sequences of related species, bacterial artificial chromosome (BAC) sequences, or Fosmid sequences [18][19][20][21][22]. SLAF-seq differs in several ways from GBS and RAD-seq. The major differences are: a) SLAF-seq identifies one tag approximately every 10 kb; b) SLAF tags are uniformly distributed, which means that important chromosome segments are not missed; c) SLAF-seq avoids repetitive sequences, which makes it a cost-effective technique. Moreover, SLAF-seq utilizes deep sequencing to ensure genotyping accuracy, a pre-designed representation scheme to optimize marker efficiency, and a double-barcode system for large populations [23].
Genome-wide SLAF markers and SNPs for the three Miscanthus species were generated using SLAF-seq in the current study. In addition, phylogenetic relationships among these species were estimated based on the generated SNPs. The genome-wide markers for Miscanthus identified in this study will support the utilization of its genetic resources in molecular marker-assisted Miscanthus breeding programs.

Evaluation of the restriction enzymes
The enzyme combination was selected according to the following principles: a) the proportion of restriction fragments located in repeat sequences was low; b) the fragments were evenly distributed in the control genome; c) the length of the restriction fragments was appropriate for the experimental system; d) the number of SLAF tags was consistent with the expected number of tags. The pair-end digestion efficiency of EcoRV + ScaI for the control genome (Nipponbare) was 90.87%, indicating that this enzyme combination was suitable.

Analysis of the SLAF-seq and SNP data
We obtained 57.8 Mb clean sequence reads based on SLAF library construction and high-throughput sequencing. The average sequencing depth per sample was 11.76x for the female parent (sample A12), 15.47x for the male parents (samples A1-A11, B1-B10, and C1-C9), and 7.85x for the progeny (samples D1-D19). The average GC content across all sequences was 41.39%. Across all sequences, the average proportion of bases with a quality score ≥ 30 (Q30) was 93.66%. In parallel, we obtained 7.17 Mb reads by sequencing the rice control genome, which indicated that SLAF library construction was accurate.
Using the obtained clean sequence reads, we developed 257,889 SLAF tags, of which 87,162 were polymorphic. We also generated a map showing the distribution of SLAF tags across the Miscanthus chromosomes based on the chromosome-level reference genome of M. lutarioriparius [8] (Fig. 1).
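The GC content and Q30 proportion reported above are simple per-base summaries. As a rough illustration only, the following sketch computes both from reads and their quality strings, assuming Phred+33 encoded qualities; the function names and the toy reads are hypothetical and are not part of the pipeline used in this study.

```python
def gc_content(seqs):
    """Percentage of G and C among all called bases (N is ignored)."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    called = sum(len(s) - s.upper().count("N") for s in seqs)
    return 100.0 * gc / called if called else 0.0


def q30_rate(quals, offset=33):
    """Percentage of bases whose Phred score is at least 30.

    Assumes Phred+33 encoding (offset=33), which is an assumption of
    this sketch, not a detail taken from the study.
    """
    total = sum(len(q) for q in quals)
    high = sum(1 for q in quals for ch in q if ord(ch) - offset >= 30)
    return 100.0 * high / total if total else 0.0


# Toy reads (hypothetical): 'I' encodes Q40 and '#' encodes Q2.
toy_reads = [("ACGTACGT", "IIIIIIII"), ("GGCCAATT", "IIII####")]
toy_gc = gc_content([seq for seq, _ in toy_reads])
toy_q30 = q30_rate([qual for _, qual in toy_reads])
```

For the toy reads, half of the bases are G or C and 12 of the 16 bases reach Q30.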
In total, 724,773 highly consistent population SNPs with integrity of > 0.8 and minor allele frequency (MAF) of > 0.05 were identified across all samples in this study. A map showing the distribution of these SNPs across the Miscanthus chromosomes is presented in Fig. 2.

Genetic diversity of three Miscanthus species
Genetic diversity analyses can provide information about the origin and composition of individual lineages. In this study, we analyzed the genetic diversity of three Miscanthus species based on SNPs. First, population structure was analyzed under the assumption that the number of clusters (K) ranged from 1 to 10. At a minimum value of ΔK, there were four clusters, suggesting that all of our samples may have originated from four primitive ancestors. Cluster graphs showing K values of 1-10 of  , Fig. 4).
The phylogenetic analysis indicated that the 50 investigated accessions fell into three distinct groups: 1) a group containing the M. sinensis accessions (A1-A11); 2) a group containing A12, C3, and all offspring; 3) a group containing the M. lutarioriparius and M. sacchariflorus accessions except B8 and B10 (Fig. 5).
The results of the phylogenetic analysis showed that the genetic variation within some populations was greater than that between populations. Furthermore, population structure analysis, PCA, and phylogenetic analysis indicated that C3 was the male parent of the offspring.

Optimization of molecular markers based on SLAF-seq and SNPs in Miscanthus
Currently, the supply of sufficient quantities of sustainably produced biomass with optimal quality characteristics is a major challenge in the development of the bio-based industry. Thus, genetic improvements that deliver high biomass yield with the required quality traits offer a way forward. Biomass quality may be substantially improved by the development of genetic markers associated with quality traits [31]. SNPs, which are more abundant in the genome than any other molecular markers, are particularly useful for the analyses of genetic diversity and population structure [33]. In this study, we used SLAF-seq to efficiently identify SNP markers. Compared with other methods, such as GBS and RAD-seq, SLAF-seq is more accurate, faster, and less expensive. Moreover, SLAF-seq also reduces genome complexity [34]. Here, we obtained 257,889 SLAF tags and 724,773 SNPs, which is greater than the number of SNPs previously obtained in the Miscanthus genome using RAD-seq [35]. In addition, SLAF and SNP markers were evenly distributed on each chromosome of the chromosome-level reference genome of M. lutarioriparius (Figs. 1, 2). These polymorphic molecular markers are highly discriminatory and can be used for genetic map construction and gene mapping in Miscanthus.

Phylogenetic analysis based on Miscanthus SNPs
The heterozygosity and polyploidy that have accumulated in Miscanthus genomes over their long evolutionary history make it more difficult to sequence complete genomes in this genus [5]. Fortunately, the genomes of M. sinensis, M. lutarioriparius, M. sacchariflorus, and M. floridulus have been sequenced through the joint efforts of many researchers [6][7][8]. With these reference genomes, the accuracy of Miscanthus phylogenetic analyses can be further improved through SNPs. In this study, the M. lutarioriparius genome was selected as the reference genome. The use of this reference genome enabled us not only to cluster the 50 samples, but also to draw a rooted phylogenetic tree (Figs. 5, 6). The phylogenetic tree in combination with the morphological characteristics indicated that the offspring had been produced by the intraspecific hybridization of M. lutarioriparius, which was consistent with previous studies [8,11,15]. The phylogenetic analyses revealed that M. lutarioriparius is adjacent to M. sacchariflorus and showed that the coefficient of intraspecific genetic variation in these two species was high.
It was long believed that the distribution of M. lutarioriparius was extremely narrow, limited to the middle and lower shallows of the Yangtze River in Southern China [42]. However, recent studies have revealed that M. lutarioriparius is extremely adaptable, with a diverse geographic distribution, and can even thrive in marginal areas such as saline-alkali soils [43] and arid conditions [44]. In addition, the recorded distribution of M. lutarioriparius now includes both shallows and wetlands [45]. M. sacchariflorus is considered highly adaptable [7]; thus, the diverse geographic distribution of M. lutarioriparius can be explained by the genetic similarity between M. lutarioriparius and M. sacchariflorus. Recent studies have shown that M. lutarioriparius and M. sinensis diverged very recently [8]. This argument is supported by the interspecific genetic distance between M. lutarioriparius and M. sinensis, which was even lower than that within M. lutarioriparius.

Conclusions
We obtained 724,773 SNPs using SLAF-seq technology. We successfully identified the paternal parent and obtained an intraspecific hybrid polyploid population of M. lutarioriparius. Despite the high similarity between the genomes, M. lutarioriparius is morphologically distinct from M. sacchariflorus. The polyploid M. lutarioriparius stands out as an excellent strain, which produces more biomass and is more adaptable than any other Miscanthus species, including the diploid M. sacchariflorus. Based on the SNPs obtained for this population in this study, high-density molecular marker linkage maps could be constructed. Such maps would be a valuable genetic resource for the development of a Miscanthus-based bioeconomy. Our results will also support future genetic improvements in biomass yield as well as quality traits in Miscanthus species.

DNA extraction
Fresh leaves from 50 Miscanthus individuals were collected and frozen in liquid nitrogen before being manually ground into a fine powder. Total genomic DNA was extracted following a modified cetyltrimethyl ammonium bromide (CTAB) method [46]. Because the leaves of Miscanthus are rich in phenols, the CTAB extraction solution was supplemented with 2% poly-N-vinylpyrrolidone (PVP) and 1% β-mercaptoethanol to purify the Miscanthus DNA. The concentration and quality of the extracted DNA were assessed using 0.8% agarose gel electrophoresis and an ND-1000 spectrophotometer (NanoDrop), respectively.

Enzyme digestion design
To identify the most appropriate enzymes for genomic digestion, we selected the M. lutarioriparius genome as the reference genome (https://www.nature.com/articles/s41467-020-18923-6). Based on the reference genome, suitable restriction enzyme combinations for the digestion were predicted using DNassist [47]. To assess the efficiency of the predicted enzymes, the genome of Japanese rice (Oryza sativa L. subsp. japonica) was selected as a control.
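The idea of predicting fragments from a reference sequence can be illustrated with a small in silico double digest. This is only a sketch and not the DNassist procedure used here: it assumes the standard blunt-end recognition sites of EcoRV (GAT^ATC) and ScaI (AGT^ACT), and it borrows the 264–364 bp tag length reported in this study as an assumed size-selection window. The function names and the toy sequence are hypothetical.

```python
import re

# Recognition sites and cut offsets. Both enzymes are blunt cutters that
# cleave in the middle of a palindromic 6-bp site, so scanning one strand
# is sufficient.
ENZYMES = {"EcoRV": ("GATATC", 3), "ScaI": ("AGTACT", 3)}


def cut_positions(seq, enzymes=ENZYMES):
    """Sorted cut coordinates produced by all enzymes on one sequence."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer("(?=%s)" % site, seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, enzymes=ENZYMES):
    """Fragments obtained after cutting at every recognition site."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def count_in_range(fragments, lo=264, hi=364):
    """Number of fragments inside the assumed size-selection window."""
    return sum(1 for frag in fragments if lo <= len(frag) <= hi)


# Toy sequence (hypothetical): one EcoRV site and one ScaI site.
toy_seq = "A" * 10 + "GATATC" + "C" * 300 + "AGTACT" + "G" * 5
toy_fragments = digest(toy_seq)
toy_lengths = [len(frag) for frag in toy_fragments]
toy_selected = count_in_range(toy_fragments)
```

Running the same functions over each chromosome of a reference genome and over a control genome would give the expected number of tags per enzyme combination, which is the kind of quantity compared when choosing enzymes.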

SLAF library construction and high-throughput sequencing
Using the identified restriction enzyme combination (EcoRV + ScaI), the total genomic DNA of each sample was digested. After adding an A-tail to the 3′ end of each digested fragment and ligating the fragment to the Dual-index sequencing adapter [48], each fragment was amplified through PCR. The amplicons were purified, mixed, and cut. The digested fragments were chosen as target segments. The libraries were selected and sequenced using an Illumina HiSeq™ 2500 platform (Biomarker Technologies Corporation), with a read length of 2 × 100 bp. The obtained sequence reads were mapped to the M. lutarioriparius reference genome for subsequent mutation analysis. To assess the accuracy of SLAF library construction, the Japanese rice genome was again used as a control.

Development of SLAF tags and SNP markers
The Dual-index tags were used to classify the raw sequencing data by sample. Sequence reads from the same locus were grouped using similarity clustering [49].
In general, only high-depth fragments were selected in each cluster group, whereas low-depth segments were removed. Here, we first calculated the SLAF tags for each sample independently, and then all single-sample SLAF tags were clustered to derive population-wide SLAF tags. The positions of clean reads on the reference genome were compared, the sequencing depth of each sample was counted, and variations were detected. Each sequence was aligned with the reference sequence using BWA [47]. We used GATK [48] and SAMtools [49] to identify SNPs. SNPs identified by both methods were considered reliable. Of these reliable SNPs, those with integrity > 0.8 and MAF > 0.05 were considered highly consistent and were used for subsequent analyses.
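The filtering logic described above (keep SNPs found by both callers, then require integrity > 0.8 and MAF > 0.05) can be sketched as follows. This is an illustrative simplification, not the pipeline used in this study: it assumes biallelic sites coded as diploid alternate-allele counts (0, 1, 2, or None for a missing call), which does not capture polyploid dosage, and it interprets integrity as the fraction of samples with a called genotype. All function names and toy data are hypothetical.

```python
def integrity_and_maf(genotypes):
    """Return (integrity, MAF) for one site.

    genotypes: per-sample alternate-allele counts (0, 1, 2) or None when
    the genotype is missing. Diploid coding is an assumption of this sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0.0
    integrity = len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    return integrity, min(alt_freq, 1 - alt_freq)


def filter_snps(calls_a, calls_b, min_integrity=0.8, min_maf=0.05):
    """Keep sites reported by both callers that pass both thresholds.

    calls_a, calls_b: dicts mapping (chrom, pos) to genotype lists.
    Genotypes are taken from calls_a; the second dict is only used to
    check that the site was also reported by the other caller.
    """
    kept = {}
    for site in calls_a.keys() & calls_b.keys():
        integrity, maf = integrity_and_maf(calls_a[site])
        if integrity > min_integrity and maf > min_maf:
            kept[site] = calls_a[site]
    return kept


# Toy call sets (hypothetical) for ten samples.
caller_one = {
    ("chr1", 100): [0, 1, 2, 0, 1, None, 0, 1, 2, 0],           # passes
    ("chr1", 200): [0, 0, None, None, None, 1, 0, 0, 0, 0],     # low integrity
    ("chr1", 300): [0] * 10,                                    # monomorphic
    ("chr2", 50): [0, 1, 1, 0, 2, 1, 0, 1, 2, 0],               # one caller only
}
caller_two = {site: gts for site, gts in caller_one.items() if site[0] == "chr1"}
toy_kept = filter_snps(caller_one, caller_two)
```

Only the first toy site survives: the second has too many missing genotypes, the third has no minor allele, and the fourth was reported by a single caller.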

Genetic relationships among samples
We used ADMIXTURE software [46] to determine the population structure of the 50 Miscanthus germplasms. We also performed principal component analysis (PCA) of the germplasms using cluster software [50]. PCA applies linear transformations to the variables to create orthogonal axes ordered by the proportion of variance explained [51]. A rooted phylogenetic tree based on our SNP data for these 50 germplasms was constructed using the neighbor-joining (NJ) method [52] in MEGA 5.0 software [53]. The tree was constructed using the Kimura 2-parameter model with 1000 bootstrap replicates [54,55].
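For readers unfamiliar with the Kimura 2-parameter model, the pairwise distance it defines is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The sketch below computes this distance for two aligned sequences; it is an illustration of the formula only, not a reproduction of the MEGA analysis, and the function name and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    math.log raises ValueError when the sequences are too divergent for
    the distance to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy comparison (hypothetical): one transition in ten aligned sites.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

A matrix of such pairwise distances is the input to neighbor-joining, and bootstrap support is obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.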