Identification of genomic regions associated with multi-silique trait in Brassica napus

Background Although rapeseed (Brassica napus L.) mutant forming multiple siliques was morphologically described and considered to increase the silique number per plant, an important agronomic trait in this crop, the molecular mechanism underlying this beneficial trait remains unclear. Here, we combined bulked-segregant analysis (BSA) and whole genome re-sequencing (WGR) to map the genomic regions responsible for the multi-silique trait using two pools of DNA from the near-isogenic lines (NILs) zws-ms (multi-silique) and zws-217 (single-silique). We used the Euclidean Distance (ED) to identify genomic regions associated with this trait based on both SNPs and InDels. We also conducted transcriptome sequencing to identify differentially expressed genes (DEGs) between zws-ms and zws-217. Results Genetic analysis using the ED algorithm identified three SNP- and two InDel-associated regions for the multi-silique trait. Two highly overlapped parts of the SNP- and InDel-associated regions were identified as important intersecting regions, which are located on chromosomes A09 and C08, respectively, including 2044 genes in 10.20-MB length totally. Transcriptome sequencing revealed 129 DEGs between zws-ms and zws-217 in buds, including 39 DEGs located in the two abovementioned associated regions. We identified candidate genes involved in multi-silique formation in rapeseed based on the results of functional annotation. Conclusions This study identified the genomic regions and candidate genes related to the multi-silique trait in rapeseed. Electronic supplementary material The online version of this article (10.1186/s12864-019-5675-4) contains supplementary material, which is available to authorized users.

Brassica crops are important sources of edible oil, vegetables, industrial erucic acid, animal feed (seed meal), and ornamental plants. Like most Brassicaceae family members, Brassica species have monoclinous flowers, with one pistil and six stamens per normal flower.
Morphological variation in siliques (pods) has been reported in several Brassica species. In the 1940s, Pathak [12] reported an abnormal plant of Brassica campestris L. var. sarson prain, which had one to two extra pods telescoped into the outermost pods. The inner pods was basically normal but were contorted and smaller than those of the wild type. Multi-locular Brassica juncea [13], trilocular B. juncea [14,15] and Brassica rapa [16] were obtained from interspecific hybridization. Unlike previously reported fruit variations, zws-ms flower has three completely independent and apocarpous pistils and nine to ten stamens sharing the same receptacle in each floral organ, resulting in the formation of three independent siliques sharing the same carpopodium. Thus, zws-ms has an increased number of pod on each carpopodium rather than an increased number of loculi per pod. Similar phenotypes have also been observed previously: Guan and Li [17] obtained a line with double siliques through treatment of B. napus with 12 C heavy ion beam irradiation. Hu et al. [18] identified a multi-pistil plant in a B. campestris L. cytoplasmic male sterile line, different from our zws-ms plants which were obtained by crossing of B. napus and B. rapa [19]. Though the agronomic and physiological traits, and genetic mechanism of zws-ms plants have been analyzed, the molecular mechanism underlying multi-pistil formation, which increased the silique number per plant in rapeseed, remains elusive.
Bulked segregant analysis (BSA) [20] is a simple but effective strategy for identifying molecular markers linked to target genes by genotyping a pair of bulked DNA samples from two populations with distinct phenotypes. BSA was usually combined with traditional markers, such as simple sequence repeat (SSR) [21], sequence related amplified polymorphism (SRAP) [22] or random amplified polymorphic DNA (RAPD) [23] markers to map traits. Recent years, combination of BSA and next-generation sequencing (NGS) have been successfully used in the identification of key genes/loci related to regular agronomic traits in B. napus. For example, Wang et al. combined BSA with NGS to map quantitative trait loci (QTL) for the branch angle and revealed BnaYUCCA6 was candidate gene controlling branch angle [24]. Geng et al. combined specific length amplified fragment sequencing (SLAF-seq) with BSA and identified four candidate genes related to seed weight [25]. Yao et al. mapped the orange petal color gene (Bnpc1) in spring B. napus to a 151-kb region via BSA with whole genome re-sequencing (WGR) [26]. Hence, we believed that combining BSA and WGR would be a convenient way to map the genomic regions responsible for the multi-silique trait.
In this study, we performed association analysis based on SNP and InDel data using a Euclidean Distance (ED) algorithm to identify the gene underlying the multi-silique trait in rapeseed. Though the SNP-index method is also a classical method for association region identification, and has been employed in many studies [25,26]. However, it is suitable in the case when four DNA samples are sequenced: two parental plants and two extreme pools from segregated population. In this study, only two pools from NILs multi-silique pool and single-silique pool were sequenced, therefore, only ED algorithm was useful in this situation. We also performed transcriptome sequencing to identify differentially expressed genes (DEGs). Based on functional annotation of the genes with SNPs or InDels and DEGs in the associated regions, we identified candidate genes that might be closely related to the multi-silique trait.

Results
Morphological analysis of the multi-silique trait in zws-ms Under normal growth conditions, the NILs (Fig. 1) zws-ms (multi-silique) and zws-217 (single silique) showed no phenotypic differences (e.g., plant height, leaf shape, and number of branches) at the vegetative stage. However, zws-ms had shorter and rounder buds than zws-217 ( Fig. 2a). At the full-bloom stage, zws-ms showed distinct floral structures with zws-217. In contrast to normal Brassicaceae flowers, which typically contain a single pistil and six stamens (four tetradynamous stamens and two shorter stamens), zws-ms flowers had three completely independent apocarpous pistils sharing the same receptacle in each floral organ (Fig. 2b). The zws-ms flower had nine to ten stamens, six of which were more prominent than the others (Fig. 2b). In zws-ms flowers, all pistils and stamens shared the same receptacle, but the petals and calyxes appeared normal. Roughly 32~53% of flowers in a zws-ms plant had this mutant multi-pistil and stamen configuration, whereas the rest flowers in the same plant were normal. The abnormal zws-ms flowers developed into three independent siliques sharing the same carpopodium as the flowers (Fig. 2c). The triple siliques were normal in length, but occasionally one became degenerated and curved.

Association analysis
High-throughput whole genome re-sequencing on the Illumina HiSeq platform produced a total of 89.56 Gb of raw data. The Q30 ratio averaged 90.52%. After alignment to the reference genome, the average coverage was 91.78% and the average sequencing depth was 40×. When the clean reads were mapped to the B. napus reference genome, each of them was aligned in both forward and reverse directions; thus, the number of total reads was twice of the clean reads: zws-ms and zws-217 contained 328,507,566 and 263,588,948 reads, with 89.89 and 89.10% properly mapped to the reference genome, respectively.
After filtration, 1,135,690 SNPs and 289,832 small InDels of high quality (HQ) were used for association analysis. We used the ED algorithm due to only two pools from NILs zws-ms and zws-217 were sequenced. According to the ED algorithm [27], the association threshold was 0.26. Based on this information, three SNP-associated regions were identified, including one on chromosome A09 (Fig. 3a) and two on chromosome C08 (Fig. 3b). One smaller associated region on the chromosome C08 that close to another larger one (Fig. 3b) was also detected. These regions had a size of 10.45 MB and contained 2095 genes. Similarly, two InDel-associated regions were identified on chromosomes A09 (Fig. 3c) and C08 (Fig. 3d) totaling 10.20 MB and containing 2044 genes, with an association threshold value of 0.30. Combining the SNP-associated regions and InDel-associated regions, two merged intersecting regions (Table 1, Additional file 1: Figure S1) were obtained on chromosomes A09 and C08, with 2044 genes in this 10.20 MB region, including 126 genes with non-synonymous mutations and 23 with frame-shift mutations.

Transcriptome sequencing, data mapping, and annotation
We performed sequencing saturation and cluster analysis of the samples to ensure the validity of the data. In total, 61 Gb of raw data were generated, with an average Q30 value of 90.54%. For each sample, approximately Fig. 1 Scheme for constructing the multi-silique population (zws-ms) and the corresponding single-silique NIL population (zws-217) Fig. 2 Morphological differences between zws-217and zws-ms. a: at the budding stage, zws-ms has inflated, round buds; b: at the full-bloom stage, a normal flower has one pistil and six stamens, while zws-ms has three pistils and nine stamens per floral organ; c: a normal silique in zws-217 and the three-silique trait in zws-ms. These images were taken originally in 2017  Table S1). The average proportion of total reads mapped to the reference genome of samples T01 to T06 was 73.729% (Additional file 3:  Table S2), ranging from 77.81 to 76.03%.
KEGG pathway enrichment analysis revealed the four most significantly enriched pathways involving the DEGs: stilbenoid, diarylheptanoid and gingerol biosynthesis (ko00945), limonene and pinene degradation (ko00903), glycosyl phosphatidyl inositol (GPI)-anchor biosynthesis (ko00563), and isoquinoline alkaloid biosynthesis (ko00950) (Additional file 10: Table S9). These pathways were further classified into five major groups: cellular processes, environmental information processing, genetic information processing, metabolism and organismal   (Table 2) were located in the two associated regions mentioned above. Thirteen genes were located on chromosome A09, while 26 were located on chromosome C08. These 39 genes were clustered into two groups based on expression level (Fig. 6, Table 2): the first group contained 14 genes that were upregulated in multi-silique zws-ms plants, while the second group contained 25 genes that were downregulated in these plants.

Validation of transcriptome sequencing data by qPCR
We validated the transcriptome sequencing data by comparing the relative transcript levels of these DEGs in zws-ms and zws-217 by qPCR. qPCR analysis of plants in buds showed that all genes except BnaC08g39120D had similar trends in expression as those observed by transcriptome sequencing, especially BnaA09g45320D and BnaC08g40410D (Figs. 6 and 7). These results further confirm the reliability of the transcriptome sequencing data described above. Gene ontology (GO) terms of the genes from the intersecting regions. A total of 2041 genes were divided into three categories: biological processes, cellular components, and molecular functions Orthologs of these 12 candidate genes in Arabidopsis These 12 candidate genes mentioned above, including seven with non-synonymous mutations or frame-shift mutations, as well as the five DEGs, were then aligned with Arabidopsis and their orthologs (Table 3) in this well-studied model plant were identified based on The Arabidopsis Information Resource (TAIR, https://www. arabidopsis.org): (1) AT2G22370 (ortholog of BnaA09 g42700D), identified as MED18, encodes a subunit of the mediator complex that affects flowering time and floral organ formation through FLOWERING LOCUS C (FLC) and AGAMOUS (AG); (2) AT2G21750, identified as EGG CELL 1.3 (EC1.3), encodes a small cysteine-rich protein secreted by the egg cell during gamete interactions; (3) AT2G21540 is an SEC14-LIKE 3 (ATSFH3) protein; (4) AT1G19350 (BZR2 or BES1) encodes brassinosteroid signalling protein that accumulates in the nucleus as dephosphorylated form in response to brassinosteroid. (5) BnaA09g45320D and BnaC08g39130D share the same ortholog, AT1G14980, known as CPN10, encoding mitochondrial-localized chaperonin 10; (6) AT1G08260, known as TIL1 gene, encodes the catalytic subunit of DNA polymerase ε (abo4-1); (7) AT2G20570 (GPRI1 or GLK1) encodes one of a pair of partially redundant nuclear transcription factors that regulate chloroplast development in a cell-autonomous manner; (8) AT1G 11910 (APA1) encodes an aspartic proteinase that forms a heterodimer and is stable over a broad pH range; (9) AT1G11870 is identified as OVA7 (SRS), of which expression products Seryl-tRNA synthetase targeted to chloroplasts and mitochondria; (10) AT1G09970 (RLK7) belongs to a leucine-rich repeat class of receptor-likekinase (LRR-RLKs); (11) AT1G 16710 (HAC12) encodes an enzyme with histone acetyltransferase activity that can use both H3 and H4 histones as substrates.

Discussion
In the current study, we constructed NILs to establish the two extreme DNA pools (multi-silique pool and single-silique pool), instead of the frequently used F 2 [28][29][30], doubled haploid (DH) [25,31], or recombinant inbred line (RIL) [32,33] populations. BSA [20] combined with NGS [34], which has been successfully used in various crops [29,30,32,35,36]. Here, the WGR produced abundant, high-quality data. After filtration, 1,135,690 HQ SNPs and 289,832 InDels were identified and used to conduct association analysis. These markers were clearly superior to traditional markers in terms of number, efficiency, and quality: as we expected, these markers were much denser than SSR or other markers and the combination of BSA and WGR saved the labors and shortened the whole process. As we were unable to use the SNP-index method in this study, we used the ED algorithm instead, followed by the identification of three SNP-associated regions, including one on chromosome A09 and two on chromosome C08. The two regions on chromosome C08 were very close to each other (only 0.14 Mb apart), and the smaller region was only 0.07 Mb long, suggesting that it is likely a minor site. Moreover, we identified two InDel-associated regions. Taken together, the SNP-associated region data were highly consistent in terms of position with those of the InDel-associated regions, confirming the reliability of both data sets. Therefore, the two intersecting regions are considered to be highly associated with the multi-silique trait, representing the only difference between the zws-ms and zws-217 pools. Preliminary analysis indicated that these two regions contain 2044 genes, including 126 with non-synonymous mutations and 23 with frame-shift mutations.
Since these genes were identified by WGR, they had small differences in sequence, which consequently led to changes in the amino acid sequences, structures, and final functions of the proteins. Since zws-ms and zws-217 are NILs that only differ from each other in the multi−/single-silique trait, we reasoned that these genes are associated with this trait and were therefore selected as candidate genes. However, some genes might have differed between the two pools in terms expression levels rather than nucleotide sequence. Therefore, we performed transcriptome sequencing from buds in plants to uncover the DEGs between lines, which we then confirmed by qPCR. We identified 129 DEGs between zws-ms and zws-217, which are distributed on all chromosomes except chromosome A06. Among the 129 DEGs, 39 are located in the two important trait-associated regions identified in this study and are therefore considered highly related to the multi-silique trait. These DEGs were classified into two groups based on expression level, including one containing upregulated genes and the other containing downregulated genes.
A comprehensive understanding of the functions of the 2044 genes in the two intersecting regions is crucial for candidate gene selection. Unlike standard agronomic traits (such as plant height and flowering time), which are relatively well-studied and for which some candidate genes are available for reference, the multi-silique trait analyzed in this study has not been studied on genomic or molecular level in detail to date. Thus, GO annotation  and KEGG pathway analysis provided important clues for candidate gene identification. Among the 2044 genes within the two associated regions, 2041 were successfully annotated. KEGG pathway analysis provided a general view of the genes in these regions. Most enriched KEGG pathways were involved in the biosynthesis or degradation of some secondary metabolites. As we were unable to relate this information to the multi-silique trait using established knowledge, we conducted a GO analysis. Eight of these DEGs were annotated in the category "negative regulation of flower development (GO: 0009910)" and 221 genes were involved in "organ development (GO: 0048513)". This information is too general to reveal an association with flower/silique development. In addition, most genes within the two regions were the same between the two DNA pools. Thus, more attention should be paid to genes showing polymorphism between zws-ms and zws-217 plants, specifically genes harboring non-synonymous mutations or frame-shift mutations. Furthermore, 104 of the 126 genes with non-synonymous mutations and 19 of the 23 genes with frame-shift mutations were represented in the GO database. We examined each of their annotations and identified seven interesting genes, which were annotated to ovule, carpel or stamen development. The NILs zws-ms and zws-217 differ only in terms of flower and silique morphology and structure. Therefore, genes that differ in zws-ms versus zws-217 at the genomic level are thought to encode diverse proteins, suggesting they are excellent candidate genes for the multi-silique trait.
At the transcriptome level, we focused on DEGs, especially those located in the associated regions mentioned above. Thirty-nine out of 129 DEGs were found in the two intersecting regions, among which five genes were annotated to the ovule or stamen development. This annotation information strongly suggests that these DEGs are involved in the multi-silique trait, as zws-ms flowers Then, we aligned these selected genes to the Arabidopsis, screening for homologous genes sharing the highest sequence identity with them, but did not identify more potential candidate genes other than these 12 genes already mentioned (BnaA09g43100D, BnaC08g38270D, BnaC08g36330D, BnaA09g44210D, BnaA09g48960D, BnaA09g42920D and BnaA09g42700D with non-synonymous mutations or frame-shift mutations, as well as BnaA09g45320D, BnaC08g39130D, BnaC08g41780D, BnaC08g42450D and BnaC08g41720D with different expression levels). Based on analysis of these orthologs in Arabidopsis and their roles in plants, the annotated information of these candidate genes are confirmed: (1) BnaA09g42700D is found much shorter than its Arabidopsis ortholog AT2G22370 (MED18) in length, but in their overlapping region, they share 100% identity of nucleotide sequence. Mutation of MED18 in Arabidopsis significantly delayed its flowering time [37]. FLC encodes an MADS-box protein and is a repressor for flowering [38]; AG is a floral meristem identity gene, which specifies carpel and stamen identity in the flower [39,40]. Thus, BnaA09g42700D could probably act on flowering indirectly. (2) Disruption of AT1G11870 results in ovule abortion in Arabidopsis [41]. So the different expression of BnaC08g41780D between zws-ms and zws-217 may result some unknown process in fruit development. (3) The knockdown of HAC1 causes disturbances in flower morphology in the Arabidopsis, such as a modified petal shape or even absence of petals [42]. HAC12 has high identity with HAC1, but what morphology changes it can make are still unknown. (4) SEC14-LIKE 3 (ATSFH3) is predominantly expressed in the flowers and involved in the transfer of phosphatidylinositol or phosphatidylcholine phospholipids during flower development in Arabidopsis [43,44]. In chickpea (Cicer arietinum L.), SEC14-LIKE3 is also considered a potential candidate flowering time-regulating genes by mapping [45]. (5) EC1.3 triggers sperm cell activation during double fertilization and knockdown of it reduces seed number in siliques [46]. (6) The brassinosteroid signal is found involved in ovule initiation and development, by regulating the expression level of BZR1, of which dephosphorylation increases ovule and seed number in Arabidopsis [47]. In this study, BnaA09g44210D has non-synonymous mutation between zws-ms and zws-217, probably causes multi-silique trait by some undiscovered pathways. (7) ATGLK1 (GPRI1) can regulate the silique number in Arabidopsis: the glk1/glk2 double mutant realized only about 1/3 of the mean silique number per plant found in the wild type, due to changes in silique wall and leaf photosynthesis [48]. (8) The Arabidopsis mutant til1-4/til1-4 has altered floral phyllotaxis, reduced ovule number, abnormally developed ovules, and reduced fertility [49]. (9) APA1: In Cynara cardunculus, the aspartic proteinase cardosin B is found playing an important role in ovule function [50]. (10) RLK7: Stamen development depends on a teamwork of receptorlike kinases [51,52]. (11) CPN10 is involved in response to many physiological process, such as seed abortion in plants [53]. Although no report directly leads to multi-silique trait so far, these orthologs are all related to flower development, flower morphology change, fruit development and even the number of ovules/seeds. It is due to the scarcity of the multi-silique trait to some extent.
Moreover, this multi-silique trait is theoretically more complex, since it is controlled by multi-loci. Anyhow, these clues validated the 12 candidate genes.
In summary, the unique multi-silique rapeseed zwsms, examined in this study represents an enriched rapeseed germplasm resource, which is considered the basis of crop breeding. Thus far, 12 candidate genes in two genomic regions associated with the multi-silique trait have been identified, including seven genes with sequence changes and five genes with expression level differences between the multi-silique and single-silique NILs. This information lays the foundation for future research, such as candidate genes isolation, their functional verification and so on. Several unanswered questions remain. A previous study suggested that the multi-silique is likely controlled by three allelic genes [19], but in the current study, we only identified two associated regions, which harbor no more than two independent genes. This finding suggests that one more gene/factor remains unidentified. We also noted that the multi-silique trait is highly sensitive to the environment. For example, when zws-ms plants were planted in Xindu (altitude of 472 m, 30°47′10″N 104°12′12″E), the multi-silique trait was continuously stable for years; but when they were grown in Ma'erkang (altitude of 2540 m, 31°94′15″N 102°14′29″E), the multi-silique trait disappeared, and all plants had normal siliques (Additional file 12: Fig. S2). Interestingly, once we transferred zws-ms seeds harvested in Ma'erkang back to Xindu, the multi-silique trait was recovered. By contrast, zws-217 maintained the single-silique phenotype at both locations.
Thus, there might be some epigenetic reason for this trait, such as methylation of a base that does not alter the DNA sequence of this underlying locus but is strongly affected by environmental factors. We noticed that there were several genes in the environmental response category in the associated regions with SNP/ InDel variations between the two DNA pools: BnaA 09g45600D was annotated as "cellular response to water deprivation (GO: 0042631)" and BnaA09g44900D as "response to cold (GO: 0009409)". Moreover, several DEGs were also annotated to environment response, like BnaA09g45320D was also annotated as "response to cold" and so on. Since these genes are responsive to environmental factors, such as cold, water, and light intensity, their functions vary when their sequences or expression levels change. Thus, they may have different effects under different environmental conditions. Furthermore, Xindu is located in the Sichuan Basin, which has a humid subtropical monsoon climate (with an annual average temperature of 16.2°C), whereas Ma'erkang is located in a mountainous area in western Sichuan, where the annual average temperature is only 8.6°C. The environmental factor with the greatest difference between the two locations is temperature. Thus, we propose that temperature is the most important environmental factor affecting the multi-silique trait in zws-ms, although other factors such as light intensity are also likely involved in regulating this trait.

Conclusions
A novel trait in rapeseed, multi-siliques, was investigated at the genome level by association analysis based on WGR, followed by analysis at the transcriptome level. The two regions associated with this trait contain seven genes with non-synonymous mutations or frame-shift mutations annotated to floral organ-related GO terms, as well as five other vital DEGs. These genes are interesting candidate genes for the multi-silique trait.

Plant materials and population development
The original multi-silique rapeseed line was discovered at the Crop Research Institute, Sichuan Academy of Agricultural Sciences from a population derived from a cross between B. napus and B. rapa. Multi-silique plants were grown and self-pollinated for six successive generations to obtain the homozygous multi-silique zws-ms plants. Single-silique offspring were continuously backcrossed for six generations with zws-ms as the recurrent parent, followed by six continuous generations of self-pollination to obtain the near-isogenic line zws-217 with the single-silique phenotype (Fig. 1). The NILs zws-217 and zws-ms were grown in the field under standard conditions at the Sichuan Academy of Agricultural Sciences in the Xindu District of Sichuan Province, China.

DNA preparation and pool construction
The rate of multi-silique formation per zws-ms plant was determined by phenotyping at the full-bloom stage (BBCH 67). The multi-silique rate per plant was calculated as: multi−silique rate ¼ number of three−pistiled flowers per plant total number of flowers in this plant Thirty plants with high multi-silique rates were selected for DNA isolation, while 30 zws-217 plants were randomly selected. Fresh leaves were sampled from these plants (zws-ms and zws-217) and subjected to DNA extraction by the cetyltrimethyl ammonium bromide (CTAB) method [54]. RNA contamination was removed from each sample using RNase A. The DNA was quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, USA). Samples with optical density 260/280 ratios ranging from 1.8 to 2.2 were considered adequate. Equal quantities of DNA were pooled by line (multi-silique and single-silique), with 30 samples per line, to a final concentration of 40 ng/μl.

Whole genome re-sequencing
Pooled DNA samples were sheared into~350 bp fragments using a Covaris S2/E210 DNA Shearing kit. The sheared DNA was end-repaired and a single nucleotide (A) overhang was subsequently added to the repaired fragments by adding a Klenow DNA polymerase Fragment (3′ → 5′ exo-) (New England Biolabs, Ipswich, MA, USA) and dATP at 37°C. Barcodes and Illumina sequencing adapters were ligated to the A-tailed fragments using T4 DNA Ligase (Takara, Dalian, China). PCR was performed on both pooled samples using diluted prepared (sheared and ligated) DNA, deoxyribonucleotide triphosphate, Q5® High-Fidelity DNA Polymerase, and PCR primers. The PCR products were purified using Agencourt AMPure XP Beads (Beckman Coulter, High Wycombe, UK). Fragments 300 to 500 bp in length (including barcodes and adaptors) were excised and purified using a QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany). Gel-purified products were diluted for pair-end sequencing (150 bp on each end) on an Illumina HiSeq system following the standard protocol (Illumina, Inc.; San Diego, CA, USA) at the Biomarker Technologies Corporation (Beijing, China). The sequencing depth was roughly 30 × .

SNP calling
Low-quality reads with quality scores <20e were filtered out and raw reads were sorted according to their barcode sequences. After the barcodes were trimmed, clean HQ reads were mapped to the Brassica_napus.v4.1.fa genome (http://www.genoscope.cns.fr/brassicanapus/data/) using the Burrows-Wheeler Aligner software [55]. SAMtools [56] was used to mark duplicates, and GATK [57] was used for local realignment and base recalibration. An SNP set was formed by combining the results of analysis with GATK and SAMtools via SNP calling using default parameters. SNPs identified between the two lines were regarded as polymorphic for association analysis.

Association analysis
Euclidean distance (ED) association analysis is a method that calculates ED values (quadratic sum root of differences between bulks from the depth of four types of bases) and is satisfied by ED. In theory, the higher the ED value is, the closer the object site [25]. The ED values were calculated as follows [27]: In this formula, A ms , C ms , G ms , and T ms represent the depth of bases A, C, T, and G on a site in the multi-silique pool, and A 217 , C 217 , G 217 , and T 217 represent the depth of bases A, C, T, and G on a site in the single-silique pool, respectively. To remove background noise, the original ED 5 value was used, and the adjusted values were fit using the DISTANCE method. Regions over a threshold value were considered to be trait-related candidate regions. The associated threshold value was determined based on ED + 3SD (standard deviation) [27]. InDel-associated regions were obtained via a similar method.

RNA isolation for transcriptome analysis
Three individual plants of the zws-217 (T04, T05, and T06) line were randomly selected for RNA isolation. Eight to ten buds (BBCH 57) per plant were randomly selected and sampled. For the zws-ms, three plants (T01, T02, and T03) were selected, and buds were sampled from each plant. All buds were quick-frozen and stored in liquid nitrogen. Before RNA isolation, the buds from three zws-ms plants were sliced with tweezers and observed. Only buds containing three pistils were considered to be correct multi-silique buds and used for RNA isolation. Buds from each plant were mixed to isolate the RNA, yielding three samples from zws-217 (T04, T05, and T06) and three samples from zws-ms (T01, T02, and T03). Since zws-217 and zws-ms were NILs, and these plants differ from each other only in the multi-silique trait, we reasoned that DEGs between these lines might be related to this trait. Total RNA was isolated using an RNA Isolation Kit (Tiangen, Beijing, China) and the concentration measured using a Nano-Drop 2000 (Thermo). RNA integrity was assessed using an RNA Nano 6000 Assay Kit and the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

RNA library preparation and transcriptome sequencing
For each tissue sample, 1 μg RNA was used as input material for RNA sample preparation. Sequencing libraries were generated using a NEBNext Ultra™ RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's instructions with index codes added to associate sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5×). First-strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase. Second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. The remaining overhangs were converted into blunt ends using exonuclease/polymerase. After adenylation of the 3′ ends of DNA fragments, a NEBNext Adaptor with a hairpin loop structure was ligated to prepare for hybridization. To select 240 bp cDNA fragments, the library fragments were purified using the AMPure XP system (Beckman Coulter, Danvers, MA USA). The size-selected, adaptor-ligated cDNA was incubated with 3 μl USER Enzyme (NEB, USA) at 37°C for 15 min, followed by 5 min at 95°C. PCR was then performed with Phusion High-Fidelity DNA polymerase, universal PCR primers, and an Index (X) Primer. The PCR products were purified (AMPure XP system) and library quality assessed on the Agilent Bioanalyzer 2100 system.
Clustering of index-coded samples was performed on a cBot Cluster Generation System using a TruSeq PE Cluster Kit v4-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the libraries were sequenced on the Illumina HiSeq X-ten platform and paired-end reads were generated.
Raw reads in fastq format were initially processed using in-house Perl scripts. During this step, clean reads were obtained by removing adapter sequences, reads containing poly-N, and low-quality reads. At the same time, the Q20, Q30, GC-content, and sequence duplication levels of the clean reads were calculated. All downstream analyses were based on clean HQ reads.
Clean reads were mapped to the reference genome sequence, and only reads with a perfect match or one mismatch were further analyzed and annotated based on the reference genome (http://www.genoscope.cns.fr/ brassicanapus/data/). Tophat2 tools (parameters: --readmismatches 2 --read-edit-dist 2 --max-intron-length 5,000,000 --library-type fr-unstranded --mate-inner-dist 40) were used to map clean reads to the reference genome.

Differential expression analysis
Differential expression analysis of the two lines was performed using the DESeq R package [58]. DESeq provides