Selection of functional EPHB2 genotype in ENU mutated grass carp response to GCRV by the method of BSA Sequence Analysis

Background N-ethyl-N-nitrosourea (ENU) mutagenesis is a useful method for genetic development of plants, as well as for inducing functional mutants in animal models including mice and zebrash. GCRV (Grass carp Reovirus) is the hemorrhage disease of grass carp and causes noteworthy loss of ngerlings in the last few years. To overcome this problem, we have used ENU mutant grass carp might be a helpful to nd out some resistant functional gene for future hereditary rearing projects in grass carp. Results In the present study, ENU-mutated grass carp were used to identify genetic markers correlated with anti hemorrhagic disease caused by grass carp reovirus (GCRV). The BSA (Bulked segregant analysis) technology were performed on two homozygous gynogenetic ENU grass carp groups with susceptible or resistant in response of GCRV. Our results showed that 466,162 SNPs and 197644 InDel were found out in the mixed pool. A total of 170 genes were annotated in the associated region, including 49 genes with non-synonymous mutations at SNP sites and 25 genes with frame shift mutations at InDel sites. Among these 170 mutated genes, 5 randomly selected immune-related genes expression was higher in resistant group as compared to susceptible. Furthermore, we found one immune-related gene EPHB2 has conrmed two heterozygous SNP mutations which responded to GCRV disease. The SNPs within gene EPHB2 were found out in the intron region with position 5859 (5859 G>A ) and 5968 (5968 G>A ) were signicantly (p = 0.002, 0.003) associated with the resistance against GCRV. These SNP sites were conrmed to correlate with the GCRV-resistant phenotype of ENU grass carp. The SNPs associated with GCRV resistance are useful in selecting GCRV-resistant genes for genetic breeding in grass carp. Conclusion Our results proved the possibility of BSA-sequence analysis in detecting genes responsible for the exciting phenotypes and will help in carrying out the marker-assisted selection, especially for disease resistance in the response of GCRV. Our results proved the possibility of BSA-sequence analysis in detecting genes responsible for the exciting phenotypes and will help in carrying out the marker-assisted selection for GCRV disease resistance. EPHB2 expression was higher in kidney, liver and gills. Two SNPs found in the intron region of the EPHB2 gene were signicantly associated with GCRV resistance. Additionally (cid:0) EPHB2 is involved in immune response that may suppress virus replication and reduces acute inammatory responses to protect cells. All these data suggest that EPHB2 may be an important gene for GCRV resistance. The SNPs associated with GCRV resistance could be applied to marker-assisted selection for GCRV resistance in mutant grass carp.


Background
Genetic breeding of aquaculturally sh species primarily depends on the conclusion of natural mutants with high performance, and then betterquality strains can be formed by hybridization, genetic selection or marker-assisted breeding methods [1]. Thus far, more than 200 improved strains of aquaculturally sh species have been shaped in China, in which 139 classes were produced through genetic selection. Therefore, chemical mutagenesis is an effective way to make new mutants for upcoming genetic development in aquaculture species [2]. N-ethyl-Nnitrosourea (ENU) is a chemical mutagen that acts as an alkylating operator, exchanging its ethyl gather to nucleophilic nitrogen or oxygen locales on deoxyribonucleotides, directing to base inconsistency while DNA replication [3] [4]. ENU mutagenesis has been demonstrated to be effective for inciting point mutation in the genome of the grass carp [5].
Grass carp reovirus (GCRV) belongs to the family Reoviridae and it has genus Aquareovirus. In 1972, Hemorrhagic disease of grass carp was rst time revealed in a sh farm of Hubei Province [6]. GCRV was categorized as an aqua reovirus and in 1984, which contains 11 parts of double stranded RNA. The genus Aquareovirus is assigned A to G (AQRV-A to AQRV-G), and GCRV be appropriate to AQRV-C [7]. GCRV is the hemorrhage disease of grass carp and causes noteworthy loss of ngerlings in the last few years.
A genotyping-by-sequencing method, it permits the genome widespread association studies, bulked segregant analysis (BSA) and genomic selection, has previously been familiarized to breeding programs. In current times, because of reducing the prices of genome sequencing have loosened the opportunity for the whole genome sequencing and resequencing the higher pools of individuals. BSA was proposed to quickly distinguish markers connected to speci c characteristics of intrigued such as disease resistance [8]. Their approach includes a segregating F2 population produced from a starting cross between two phenotypically different parents, which is at that point scored for a phenotype of intrigued. Bulked DNA or RNA tests are built from organism that appears differentiating phenotypes. BSA has been primarily utilized in crop species either for the recognizable proof of huge impact QTL, such as disease resistance genes or for mapping subjective mutation [9][10][8] [11].
The objective of the current study was to select the functional genotype in ENU sh response to GCRV by the method of BSA sequence analysis. Thus, it will be helpful in carrying out the marker-assisted selection, especially for disease resistance in the response of GCRV.

SNPs and indels annotation results
After filteration, the Clean Reads data was generated up to 73.47 Gbp, average Q30 reached 93.92%, and the average GC% was up to 38.11% ( Table 1). The average genome coverage depth of each sample was 41.00X, and the genome coverage was about 99.11% (at least 1×) (Table  1), indicating the high quality of the sequencing data. The assembled genome size of grass carp is 900.50 Mb, GC content is 37.42%, the genome is Scaffold level [12]. Venn diagram showed that different number of SNP and inDel between sample S01(suspected) and R02 (resistant) like in Fig.1, 200,270 SNPs were speci ed in the S01 and 173,531 SNPs were speci cally found out in R02. The center overlapping portion of the Venn diagram represented the total number of identi ed SNPs 2,189,327 and inDel 759,064 results ( Fig. 1).
We detected a total number of 466,162, 197,644, 8,492, and 3,668 polymorphic sites in different regions like SNP annotation results, InDel annotation results, SNP and inDel annotation results in candidate region respectively (Table 2). Based on the annotation, the SNPs were randomly distributed throughout the genome (Table 2). InDel frameshift mutation type was found out in 50 genes, 7 genes gured out in codon_insertion and 6 genes found out in codon_deletion (Table 2). Synonymous and non-synonymous codon had a higher number of genes that were found out in the SNP annotated region (7251, 9298) ( Table 2).
Association Analysis (SNP, InDel) and ED method correlation results Before the association analysis, rst SNP was ltered, nally obtained 2,313,014 high-quality trusted SNP sites. Before using InDel for correlation analysis, same like SNP, rst InDel was ltered, and nally got 775,981 high-quality trusted InDel sites (Table 3). Through this, 1197 Scaffold/contig sequences with signi cantly enriched associated SNP sites were selected. Some results are shown in the table (Additional le 1). The same analysis method (ED method) as the SNP association analysis was used, and 85 Scaffold/contig sequences that were signi cantly enriched in association InDel sites were nally screened (Additional le 2). Take the intersection of the results obtained by these two association analysis methods (SNP & InDel along with ED), a total of 21 Scaffolds related to traits were obtained (Table 4).

Gene annotation and functional results
A total of 170 genes were annotated in the candidate region, among which 49 were non-synonymous mutant genes and 25 frameshift mutation genes were annotated in the mixed pool (Table 5, Additional le 3). Total 55 genes in the genome were annotated and classi ed into biological processes, cellular components, and molecular functions (Fig.2, Additional le 4, 5 & 6). 11 genes were found out with non-synonymous mutation and 2 genes were found out with frameshift mutation in GO enrichment analysis. The annotated KEGG databases showed that 29 SNP were found among top 20 pathways as shown in Fig.3. After multiple-testing corrections, we selected the pathways with Q values ≤ 0.05 as signi cantly enriched among the genes (Fig.4). Our data suggest that these genes may play an important in innate immune response to GCRV in ENU grass crap.
Gene expression results associated with grass carp reovirus (GCRV) resistance The relative expression levels of the ve genes signi cantly associated with grass carp reovirus (GCRV) resistance were detected through qPCR (Fig.5). We selected 5 genes with genetic variation to examine their mRNA expression levels in liver, kidney and gill tissue. All these ve genes involved in the in ammatory response, cell proliferation, anti-apoptosis, tumor suppressor and immune response to viral infection pathways. Results showed that mRNA expression level of SAMD9L, BNIP3L and EPHB2 was highly signi cant expressed in resistant group than susceptible group in response of GCRV (p<0.01), APPL2 expression level was higher in liver and kidney as compared to gill tissue in case of resistant group. Kidney-NLRP12 gene expression level was signi cantly highly expressed in resistant group as compared to infected group after GCRV infection (p<0.01) (Fig. 5).
Veri cation of SNPs associated with grass carp reovirus (GCRV) resistance  (Table 6). Both SNPs were located in the intron region of EPHB2 as shown in Fig.6. SNPs in EPHB2 were found out in 2 position which has been mentioned with chromosomal position 5859 (5859 G>A ), 5968 (5968 G>A ) determine G to A mutation in S01/R02.
EPHB2 gene with con rmed SNPs has been showed in Table 7 in case of resistant (S01) and susceptible (R02) group in response of GCRV. In the rst SNP, the allele frequencies of A and G in survived sh (resistant) were 24% and 76%, correspondingly. For dead sh (susceptible), the allele frequencies of A and G f were 66% and 34%, respectively. In the other SNP, the allele frequencies A and G in survived sh were 18% and 82%whereas in dead sh were 62% and 38%, respectively ( Table 7). The chi-squared test showed that allele frequencies were signi cantly different between dead and survived individuals (p = 0.002 and p = 0.003 for respective SNPs) ( Table 7), signifying that EPHB2 was signi cantly associated with GCRV disease resistance.

Discussion
Up to the present time, association analysis is still the key method for the mining of SNPs for disease resistance genes in marine animals. Previous work has been reported that SNP frequencies diverse among different populations in numerous species [13][14] [15]. However, according to our knowledge, there is limited information on immune related SNP in relation to the response of GCRV in grass carp. Therefore, the currently study focused on the selection of functional genes and its SNPs related to GCRV resistance of ENU grass carp, this could help to improve our understanding on GCRV resistance and might help to select disease-resistant strains of ENU grass carp culture.
Cultured grass carp are highly vulnerable to diseases which lead to yield reduction, antibiotics and drugs have been used to control the problem but the regular use of these alternatives showed side effects to animal and environment. Therefore, disease-resistant grass carp are highly desirable as they can greatly increase sh yields [5]. Due to low occurrence of natural mutation in grass carp, chemical mutagenesis can be useful for increasing genetic mutations [2]. In this study, homozygous gynogenetic ENU grass carp which has the advantages of strong disease resistance and rapid growth was used and its functional genotype was selected by using BSA sequence analysis after challenged to GCRV.
In this study, we used BSA technology to determine SNP mutation within the genic region associated with GCRV resistance in homozygous gynogenetic ENU grass carp. Our ndings veri ed the viability of BSA approach and the extensive information of SNP gene for disease resistance traits as a fast and affordable method for marker development. Zhang  . In our analysis, we selected 5 genes with genetic variation to examine their mRNA expression levels in liver, kidney and gill tissue. Results showed that mRNA expression level of SAMD9L and EPHB2 in all tissues was higher in resistant group than susceptible group in response of GCRV, while BNIP3L and APPL2 expression level was higher in liver and kidney compared to gill tissue of resistant group. Kidney-NLRP12 gene expression level was higher in resistant group as compared to susceptible group after GCRV infection. The results of expression analysis of ve genes in resistant and susceptible group pinpoint that these gene may be partially responsible for GCRV resistance.
In recent years, studies have found out that SNPs in intron and intergenic also play a great role in the changes of traits. It was reported that, studied in this study, we only found SNPs in EPHB2 gene as explained above, and this gene was highly expressed in resistance group than in susceptible group. The modi cation and higher expression of this gene indicate that this it related to the resistance of GCRV. This gene functioned as a tumor suppressor and also had a role in immune cell enhancement against GCRV disease in ENU grass carp. Our results prove that EPHB2 is involved in immune response that can repress virus replication and attenuates acute in ammatory responses to protect cells.

Conclusions
Our results proved the possibility of BSA-sequence analysis in detecting genes responsible for the exciting phenotypes and will help in carrying out the marker-assisted selection for GCRV disease resistance. EPHB2 expression was higher in kidney, liver and gills. Two SNPs found in the intron region of the EPHB2 gene were signi cantly associated with GCRV resistance. Additionally EPHB2 is involved in immune response that may suppress virus replication and reduces acute in ammatory responses to protect cells. All these data suggest that EPHB2 may be an important gene for GCRV resistance. The SNPs associated with GCRV resistance could be applied to marker-assisted selection for GCRV resistance in mutant grass carp.

Materials And Methods
Experimental Design: ENU mutant grass carp (meiotic gynogenetic offsprings induced by UV inactivated heterologous sperm from Megalobrama amblycephala) were obtained from the Bream Genetics and Breeding Center of Shanghai Ocean University, Shanghai, China. After on arrival in the laboratory, sh were maintained in the laboratory at 28 ± 0.5 °C for at least 7 days prior to experimental use and feed them well to make healthier. A total of 60 shes were used. The average weight of the sh was 4.4-6.0g. The trials were conducted in aerated glass aquariums (120 x 40 x 30 cm) each containing 100 L of water. After the acclimatization period, all the shes were intraperitoneally inoculated with 20 μl/g of GCRV-873 strain.
After 14 days of the challenge, all survived sh were collected as a safe sh from which 30 shes were chosen arbitrarily as a resistant group.
Fish sampling: In this way, mutant grass carp were divided into two group. 30 shes were selected as susceptible/ infected/morbid group (S01) and leftover 30 shes which were selected as resistant group (R02) to GCRV. About fourteen days of the injection later, liver tissue collected as sample and transferred it for BSA analysis.

Sequencing analysis:
Genomic DNA was taken out utilizing conventional phenol-chloroform extraction strategy in combination with RNase treatment and put away at −20°C until utilize. Two bulks were produced by pooling an equal quantity of DNA from a susceptible (S01) and a resistant group (R02). DNA from each bulk was utilized to build paired-end (PE) sequencing libraries, which were sequenced on an Illumina HiSeq (Illumina Casava 1.8 version). Whole experimental procedure is prepared according to the standard convention was given by Illumina, including sample testing, library construction, library-quality testing, and computer sequencing and Sequence read length was 150 bp (Biomarker technology, Beijing, China).
After evacuating connector and low-quality reads, the clean reads were advance rechecked for quality utilizing FASTQC. High quality pairedend reads were mapped to the grass carp reference genome sequence (PRJEB5920) [12] using the BWA program with default constraints [31].
Position of Clean Reads on the reference genome was found out by comparing the data such as the sequencing depth and genome coverage of each test, and then mutation location was achieved. The assessment results of the sequencing output information of each sample, the comparison results of the samples, the average coverage depth of each sample, and the genome coverage proportion comparison to each depth among S01 and R02 group can be seen in the following Table 1.

SNP and InDel detection:
SnpEff is a software for annotating mutations (SNP, Small InDel) and predicting the effects of mutations [32]. The detection of SNP is mainly implemented using GATK software toolkit [33]. According to the positioning results of Clean Reads in the reference genome, Picard (http://sourceforge.net/projects/picard/) was used to perform preprocessing such as Mark duplicates. GATK was used for Local realignment and base recalibration to ensure detection.
InDel represents single base insertion and deletion. The insertion loss of the sample was detected using GATK. Small InDel variation is generally less than SNP variation, which also re ects the difference between the sample and the reference genome, and InDel in the coding region will cause frameshift mutations, resulting in changes in gene function.
Euclidean distance calculation: The Euclidean Distance (ED) algorithm is a method that uses sequencing data to nd markers that have signi cant differences between pools and also evaluated the SNP, inDel association analysis [8]. Theoretically, two mixed pools constructed by the BSA project have differences in the target trait-related sites, and other sites tend to be consistent, so the ED value of non-target sites should tend to 0. The calculation formula of the ED method is shown below. The larger the ED value, the greater the difference between the mark and the two mixing tanks.
where each letter (A, C, G, T) corresponds to the frequency of its corresponding DNA nucleotide in the mutation and wild type pool or bulk respectively.
Functional annotation of genes containing SNPs: The genes having SNPs correlated with resistance/susceptibility to GCRV were annotated using NCBI non-redundant database by BLAST software to perform in-depth annotation of multiple databases such as (NR [non-redundant protein database, NCBI], Swiss-Prot Gene expression and SNP veri cation analysis: The qPCR was carried out on CFX96 Touch™Real-Time PCR System, using SYBR Premix Ex Taq kit (TaKaRa, Japan). All primers used in the study were designed by software Primer Premier 5.0 and are listed in Table 8. The comparative expression value of the designated gene vs. 18s rRNA (reference gene to normalize expression levels between sample) was calculated using the 2 −∆∆Ct method. Reactions of SYBR Green were performed in a 20 µL volume containing 10 µL of 2 × SYBR ® Green Realtime PCR Master Mix (Toyobo, Osaka, Japan), 1 µL of each forward and reverse primer (10 µM), 7 µL of water, and 1 µL of diluted cDNA (100 ng/µL). All experiments were performed in two groups. We identi ed ve functional genes as disease-causing mutations in response to GCRV in mutant grass carp.
For SNP veri cation, again we injected the virus (GCRV) in 100 ENU grass carp and divided the shes into 50 resistant and 50 susceptible groups to nd out their allelic frequency as we can see in the Table 8. Both SNPs were located in the intron region of EPHB2. Using these two SNPs, we investigated the genotype number in 50 dead (susceptible and 50 survived(resistant) sh after the GCRV challenge to detect whether there was a statistically association between genotype and resistance, using p-values of chi-square test. A subsequent p-value 0.05 0r less was measured statistically signi cantly.

Statistical Analysis:
The statistical results (expressed as mean ± standard deviation) were analyzed by one-way analysis of variance, followed by Dunnett's test for multiple comparisons using IBM SPSS Statistics 22 software. p < 0.01, p < 0.05 was considered to be statistically signi cant. All experiments were repeated at least three times.       Clustering map of GO in candidate region: The abscissa is the content of each category of GO, the left of the ordinate is the percentage of the number of genes, and the right is the number of genes. This gure shows the gene classi cation of each secondary function of GO in the context of all genes in the associated region.  genes expression analysis showed that mRNA expression level of SAMD9L, BNIP3L and EPHB2 was highly signi cant expressed in resistant group than infected group in response of GCRV (p <0.01), APPL2 expression level was higher in liver and kidney as compared to gill tissue in case of resistant group. Kidney-NLRP12 gene expression level was signi cantly highly expressed in resistant group as compared to infected group after GCRV infection (p <0.01). The statistical results (expressed as mean ± standard deviation) were analyzed by one-way analysis of variance, followed by Dunnett's test for multiple comparisons using IBM SPSS Statistics 22 software. p < 0.01, p < 0.05 was considered to be statistically signi cant.