GWAS hints at pleiotropic roles for FLOWERING LOCUS T in flowering time and yield-related traits in canola

Background Transition to flowering at the right time is critical for local adaptation and to maximize grain yield in crops. Canola is an important oilseed crop with extensive variation in flowering time among varieties. However, our understanding of underlying genes and their role in canola productivity is limited. Results We report our analyses of a diverse GWAS panel (300–368 accessions) of canola and identify SNPs that are significantly associated with variation in flowering time and response to photoperiod across multiple locations. We show that several of these associations map in the vicinity of FLOWERING LOCUS T (FT) paralogs and its known transcriptional regulators. Complementary QTL and eQTL mapping studies, conducted in an Australian doubled haploid population, also detected consistent genomic regions close to the FT paralogs associated with flowering time and yield-related traits. FT sequences vary between accessions. Expression levels of FT in plants grown in field (or under controlled environment cabinets) correlated with flowering time. We show that markers linked to the FT paralogs display association with variation in multiple traits including flowering time, plant emergence, shoot biomass and grain yield. Conclusions Our findings suggest that FT paralogs not only control flowering time but also modulate yield-related productivity traits in canola. Electronic supplementary material The online version of this article (10.1186/s12864-019-5964-y) contains supplementary material, which is available to authorized users.


Highlight
The genetic association, eQTL and expression analyses suggest that FT paralogs have multifaceted roles in canola flowering time, plant development and productivity traits.

One sentence summary
Paralogs of FT which is known to be critical for flowering time have pleiotropic roles in yield related traits in canola.

Background
Natural variation provides a valuable resource for discovering the genetic and molecular basis of phenotypic diversity in plant development, adaptation and productivity [1,2]. Canola (rapeseed, Brassica napus L., A n A n C n C n genomes, 2n = 4× =38) is an important oil crop, varieties of which display extensive variation in life history traits such as flowering time. Precise knowledge of flowering time is fundamental for identifying locally adapted varieties. It is also essential in the development of new varieties that maximize yield and oil quality in diverse and rapidly changing environments. For example, early flowering varieties are preferred for cultivation when periods of drought and high heat are frequent, whereas winter/semi-winter crops achieve maximum yields in the longer growing seasons that occur in temperate regions [3].
In Arabidopsis thaliana, the four major pathways that regulate flowering time are photoperiod, vernalisation, autonomous and gibberellic acid pathways [4,5].
The gene FLOWERING LOCUS T (FT) is generally considered to integrate inputs from several pathways that finally result in floral transition. In A. thaliana, loss-offunction mutations in FT result in late flowering under long-day conditions [22,23]. In B. napus, six paralogs of FT have been identified [24,25] that contribute to functional divergence in flowering time between winter and spring cultivars. For example, mutations in BnC6.FTa and BnC6.FTb paralogs have been shown to alter flowering time in B. napus accessions [26]. Owing to the multiple copies of FT in canola, it has been difficult to establish the functionality and precise relationship between various paralogs in plant development and productivity traits, as shown in Arabidopsis, onion and potato [27][28][29][30][31][32]. In addition, under field conditions, it is difficult to determine the extent of genetic variation in photoperiod response, as plants undergo a series of cold temperature-episodes during vernalisation.
Here we determine the extent of flowering time variation utilizing a diverse panel of 368 canola genotypes representing different geographic locations around the world. Using GWAS we identify several underlying QTLs controlling phenotypic variation in photoperiod response and flowering time. We show that the response to photoperiod maps to FT paralogues, and their potential transcriptional regulators CIB, CO, CRY2, FVE, MSI, EMF2 and PIF4. Using a doubled haploid population of plants grown under LD and/or field conditions, we show that expression levels of FT paralogs are significantly associated with flowering time variation across diverse canola accessions. The eQTL analysis for FT expression levels map not only to FT itself (e.g., BnA7.FT) but also other loci that are known regulators of FT such as BnFLC.C3b (FLC5), FPA, SPA1 and ELF4. We also demonstrate that plant productivity traits such as plant emergence, shoot biomass accumulation, plant height, and grain yield map in the vicinity of FT. Taken together our findings suggest that FT has multifaceted role in plants and could be exploited for selection of canola varieties for improved productivity.  Table S2). A buffer (non-experimental line) row of an Australian canola variety SturtTT was seeded after every two ranges to ensure that plots are harvested at the right maturity. For WWAI single row trial, 300 accessions were arranged in a randomized block design with 60 rows (each row 10 M long) by 10 columns in two replicates (Additional file 2: Table S2), each replicate of 30 rows was separated with a buffer row of SturtTT. Each accession per replication had 100 plants. This trial was sown under Lateral Move irrigation system to match water demand for optimal plant growth. The Condobolin trial was sown as rainfed and arranged in a randomised complete block design with 100 rows by 6 columns, accommodating all 300 accessions in two replicates (Additional file 2: Table  S2). For field plot experiments, accessions were sown in plots (2 m wide × 10 m long at Wagga Wagga and 2 m wide × 12 m long at Condobolin) at density of 1400 seeds/20 m 2 plot. Seeds were counted with Kimseed machine and directly sown in plots in the field; each plot consisted of 6 rows spaced 25 cm apart. Plots were sown with a six-row cone-seeder to 10 m length. All plots were sown with a granular fertilizer (N: P: K: S, 22: 1: 0: 15) applied at 150 kg ha -P . The fertilizer was treated with the fungicide Jubilee (a.i. flutriafol at 250 g/L, Farmoz Pty Ltd., St Leonards, NSW, Australia) to protect all genotypes against the blackleg fungus, Leptosphaeria maculans. After crop establishment, plots were trimmed back to 8 m after emergence by applying Roundup (a. i. glyphosate) herbicide with a shielded spray boom. For controlled environmental cabinets (CE cabinets, Thermoline Scientific, Wetherill Park NSW, Australia), eight plants of each of the 368 accessions were grown in plastic trays as described previously [12] under long (LD) and short day (SD) conditions. For LD treatment, seeds were planted in a CE maintained at 20 ± 1°C under white fluorescent lamps (4000 K, Osram) with light intensity of approximately 150 μM/m 2 /s, with a 16-h photoperiod. In SD treatment, plants from 368 accessions were grown at the same conditions described above but for 8 h photoperiod.

Flowering time and other phenotypic measurements
Days to flower from sowing was calculated when 50% of plants had opened their first flower. In SD conditions, flowering time was recorded for up to 200 days. Plants without any flowers at the end of the experiments were classified as 'assigned (A)' (LD-A, SD-A, see Fig. 1). The response to photoperiod was calculated as the difference between 50% flowering in plants grown under SD and LD conditions. For field trials, flowering time was recorded three times in a week.
Normalised Difference Vegetative Index (NDVI) was measured as a proxy of fractional ground cover for early vigour [33,34] using a GreenSeeker® (model 505, NTech Industries Inc., Ukiah, CA, USA). The NDVI readings were taken at 7-10 days interval after 5 weeks of sowing before the onset of flowering. Multiple readings were taken in each plot and then averaged across each plot for genetic analysis. Plots were harvested by direct heading with a Kingroy plot harvester (Kingaroy Engineering Works, Queensland, Australia) in the 4th wk. of November (Condobolin, NSW) and 2-3rd wk. of December (Wagga, Australia). Grain samples were cleaned with Kimseed (Kimseed Australia, Western Australia) and plot yield was expressed into t/ha.

Field evaluation of SAgS DH population
SAgS population of 144 DH progeny from a BC 1 F 1 plant derived from the cross Skipton (less responsive to vernalisation) and Ag-Spectrum (more responsive to vernalisation) have been previously described [12,13,35]). The population was grown in 2015 (35°01′32.3″S 147°19′25.4″E) and 2016 (35°01′42.8″S, 147°20′23.3″E) in the field at the WWAI, NSW, Australia. Both trials were randomised in a complete block design with three replicates in a single block. A total of 1,400 seeds per genotype were directly sown in plots in the field as described above. The traits measured included plant emergence, first flowering, plant biomass, plant height, and grain yield. Plant (shoot) biomass was calculated from cuttings obtained from 10 randomly selected plants growing in the central row of each plot. Each sample was weighed on a digital scale and fresh weights were expressed in g/plant. Plant height (cm) was measured at the physiological maturity stage using 5 plants selected randomly in the middle row of each plot. Plots were harvested with a Kingaroy plot harvester in the 2-3rd wk. of December (Wagga, Australia).

Genome-wide genotyping
Leaf material was collected individually from 368 diverse DH canola accessions, grown under LD conditions, and then immediately snap-frozen in liquid nitrogen. Genomic DNA was isolated as described previously [13] and sent to Trait Genetics, Germany (http://www.traitgenetics.com/) for genotyping with Illumina infinium 15 k Brassica chip representing 60 K Infinium SNP array [36].

Population structure and GWA analyses
For GWA analysis, we only used SNP markers with allele frequencies > 0.05 and overall call rates (proportion of genotypes per marker) of > 90% [37]. To prevent the potential loss of genome wide associations (GWA) missing data was imputed [38]. A total of 11,804 SNP markers could be anchored to the A n and C n subgenomes of reference sequenced genome of B. napus cv. The variety 'Darmor-bzh' (Darmor) was used as reference for cluster and GWA analyses in a diversity panel of 368 accessions (Additional file 1: Table S1).  Table S1 Cluster analysis was performed with the Neighbor-Joining method [39] using MEGA version 6. In order to reduce spurious associations between markers and variation in flowering time, population structure and the relative kinship coefficients of individual genotypes were estimated as described previously [12]. Flowering time-SNP marker association analysis was performed using the EMMAx/P3D method [40,41] implemented in the R package GAPIT [42] (https:// cran.r-project.org/). Significance of GWA between markers and flowering time was tested at LOD score of 3. The P (−log 10 P) values for each SNP were exported to generate a Manhattan plot in R [43]. The proximity of candidate genes to identified associations based on the physical positions of SNPs/candidate genes was inferred based on functional annotation of the A. thaliana genome and implemented in the reference sequenced genome of Darmor [44]. After Bonferroni correction, associations with LOD score = 5.41 were also considered as significant on a p < 0.05 level. The associations detected through GWAS were compared with the QTL intervals associated with flowering time identified in the field conditions in the SAgS DH mapping population evaluated in 2013, 2014 [13], 2015 and 2016 (this study).

Statistical and QTL analysis
Flowering and other phenotypic data collected from different experiments were analysed using linear mixed models in R as described previously [45]. Essentially, we defined the individual experimental Plot as a factor with 432 levels for each of the 2015 and 2016 trials. The factors: Row and Range corresponded to the rows and ranges of the trials, with levels equal to the number of rows and ranges in each trial. The combination of levels of Row and Range completely index the levels of Plot such that Plot = Row:Range. The factor Rep has 3 levels corresponding to the replicate blocks in each trial. The plot structure for the field experiment consists of plots nested within blocks and is given by, Rep/Plot which can be expanded to give, Rep + Rep:Plot. The term Rep:Plot indexes the observational units for all traits and thus is equivalent to the residual term for these traits. The treatments for the field phase of the experiment are the lines allocated to plots so we define the treatment factor, Genotype, with 144 levels corresponding to lines grown in each trial. Due to marker data being included in the model, we need to define an additional two factors; Gkeep (corresponding to lines with both phenotypic and marker data) and Gdrop. The factor Gdrop has 16 levels corresponding to lines with phenotypic data but not marker data. Therefore treatment structure is given by Gkeep + Gdrop. Finally, marker data is incorporated into the analysis and individual markers are scanned following the approach of Nelson et al. (2014) [9] to establish a final multi-QTL model. We also used phenotypic data from 2013 and 2014 experiments that was published previously [13], in order to test multifaceted role of FT in flowering time and other productivity traits across environments. A genetic map based on 7,716 DArTseq markers representing 499 unique loci [13] was used to determine trait-marker associations. The predicted means for first flowering, and response to photoperiod for each genotype were used to detect genome wide trait-marker associations.

FT expression and eQTL analyses
FT expression analysis was carried out in two different sets of populations. First, we analysed FT expression in field-grown plants from 144 DH lines of the SAgS DH population. Second, we analysed FT expression in 24 accessions that represented extreme flowering phenotypes (i.e., early and late flowering accessions) from the 368 accessions in the GWAS panel. For both sets of experiments, five independent leaf samples collected from field/CE grown plants (at floral budding stage) per genotype were pooled and flash-frozen in liquid nitrogen (in field/ CE). For field-grown plants there were internal replications that effectively represented at least two biological replicates. For CE grown plants three biological replicates were used. RNA was isolated using TRIZol (Invitrogen) and cDNA was synthesized using a First Strand Synthesis Kit (Roche). Samples were controlled for their quality using two approaches as outlined previously [12]. Gene specific primers for each of six FT paralogs [26] used for the expression analysis are described in Additional file 3: Table S3. Since the expression levels of all FT paralogs were correlated, we used data from BnC6.FT for eQTL analysis using SVS package (Golden Helix, Bozeman, USA).

Structural variation in canola FT paralogs
We obtained sequence information for FT paralogs from a whole-genome resequencing data for the 21 canola accessions, which will be described elsewhere (Raman et al., in preparation). These 21 accessions also included the parental lines (Skipton and Ag-Spectrum) of the SAgS mapping population used in this study (Additional file 2: Table S2). Variation across the FT paralogs was extracted using the gene model information or by manually identifying gene regions based on BLAT homology (Additional file 4: Table  S4). The physical positions of different FT paralogs (NCBI GenBank accessions; genomic sequences: FJ848913 to FJ848918; promoter sequences: JX193765, JX193766, JX193767, JX193768) were confirmed with those of the sequenced FT genes on the 'Darmor' assembly as well as with published literature [24,25,46]. For each accession, the FT nucleotide sequences were aligned using MUSCLE as implemented [47] in the software package Geneious (https://www.geneious.com) Structural variation, number of polymorphic sites within the gene and the promoter region was identified using ANNOVAR [48]. The diversity indices were calculated using the MEGA version 6 [49]. The Tajima [50] and Fay and Wu [51] tests were conducted to examine whether the frequency spectrum of polymorphic nucleotide mutations conformed to the expectations of the standard neutral model. The effect of InDel mutations on functional domains was investigated using information from the NCBI conserved domain database.

Natural variation in flowering time across diverse environments
We determined the natural variation in flowering time of diverse accessions across five different environments. Across all environmental conditions, we found extensive variation in flowering time, which ranged from as little as 29.2 days up to more than 137 days (Fig. 1 Table S6). Most of this variation was genetically controlled as the broad sense heritability (h 2 , also called as reliability) ranged from 45 to 97% across different environments (Additional file 7: Table S7). We observed positive genetic correlations (r = 0.88 to 0.96) for flowering time between the different field trials, suggesting that majority of the genetic variation and underlying mechanisms are shared across environments (Fig. 2).

Relationship between flowering time and other traits
To determine whether there is any relationship between flowering time and yield-related traits in canola, we analysed their genetic correlations (Fig. 3). There were low genetic correlations between the flowering time and other agronomic traits, which suggests that the growth environment play an important role in trait expression. Flowering time showed a negative correlation with grain yield across sites (WW-Wagga Wagga and Con: Condobolin) under LD photoperiodic conditions (field and controlled environments). Early vigour (NDVI.WW) showed positive correlations with flowering time (r = 0.2 to 0.7) under LD and field conditions (WW-Wagga and Con), and with grain yield (r = 0.1 to 0.4) depending upon growing environment.
Genetic relatedness among accessions in the GWAS panel SNP marker distribution across genome is shown in Additional file 20: Figure S2. SNP markers were distributed un-evenly: most were located on chromosomes A03, A07, C03, and C04 (> 780 markers/chromosome). The lowest marker density was observed in chromosome C09. A total of 11,804 SNP markers anchored to the reference B. napus genome, with the mean marker density of 621.3 per chromosome provided coverage of~84.7 kb/marker. Cluster analysis revealed at least three main clades among accessions, representing European winter, Australian semi-spring/Canadian spring, and semi-winter of Indian/Chinese origin (Fig. 4, Additional file 21: Figure S3). The first three principal components (PC1 = 38.1%, PC2 = 11.9%, and PC3 = 5.67%) accounted for 55.7% of the genetic variation and the grouping of accessions reflected the cluster analysis (Additional file 22: Figure. S4). To estimate the extent of genome-wide linkage disequilibrium (LD) we calculated the squared allele frequency correlations (average r 2 ) for all pairs of the anchored SNPs using an LD sliding window of 500 as 0.02 (Additional file 23: Figure S5). The kinship coefficient among accessions ranged from 0.03 to 0.99 suggesting a wide-range of familial relatedness between pairs of accessions (Additional file 8: Table S8), as observed in our previous study [12].

Genetic architecture of flowering time and photoperiod response
Accounting for both population structure and kinship information, we detected a total of 142 significant associations (at the genome-wide significance thresholds of LOD score of ≥3) for flowering time in diverse environments [(under field, three experiments), LD and SD conditions)]. The markers with significant associations were distributed across all chromosomes except A01 (Additional file 9: Table S9). Majority of the associated SNPs (70%) were identified on A n subgenome (Additional file 10: Table S10), suggestive of an uneven distribution on the physical locations of Darmor assembly. Most of the associated SNPs (33.1%) were on chromosome A02 (47 SNPs), followed by 9.15% on C03 (13 SNPs), and these could explain the majority of allelic variation for flowering time in canola. We identified 22 unique SNP markers that accounted for associations that were detected at least in 2 different environments (Additional file 9: Table S9). Of the 142 significant associations, six SNPs crossed the Bonferroni threshold for flowering time in LD conditions, all of which are located on chromosome A02 ( Table 1). Two of these SNPs (Bn-A02-p9371948 and Bn-A02-p9371633) associated with flowering time under LD conditions were located near the FT locus (~0.64 Mb, BnA02.FT, BnaA02g12130D) (Fig. 5a-c). Under different environmental conditions, we detected different associations; several of these SNP associations were mapped near the vicinity of genes known to play a regulatory role in FT expression in A. thaliana such as FLC4, UPSTREAM OF FLC, CO, MSI1, LD, MAF4 on A02; BnFLC3a, CO and EMF2 on A03; NY-YB8 on A04; GI on A08; EMF2 and CRY2 on A10, and CIB1 on C08 (Additional file 11: Table S11). We also identified 28 SNPs that showed significant association above a LOD of 3 with response to photoperiod identified under controlled environment cabinet conditions on chromosomes A01, A02, A07, A09, A10, C01, C03, C06, C08 and C09 (Additional file 11: Table S11, Fig. 5c).
To identify potential candidates involved in the photoperiod response, we compared the physical positions of 28 significant SNP associations for photoperiod with the physical positions of flowering time genes (Additional file 11: Table S11). Seven significantly associated SNP markers map in the vicinity (0.  Table S11). Consistent with our previous study (Raman et al. 2016a), our results reinforces that while the major players of flowering time appear to be conserved between Arabidopsis and canola, the specific functional roles of the paralogs might differ depending on the environmental conditions.

QTL analysis in biparental population identifies loci for flowering time and productivity traits near FT paralogs
To ensure capturing the relevance of entire genetic architecture of flowering time variation, we considered the SAgS DH mapping population derived from a BC 1 F 1 cross between Australian spring type cultivars; Skipton (less responsive to vernalisation) and Ag-Spectrum (more responsive to vernalisation). We had previously utilised this cross for genetic analyses for range of traits of interest [8,13,35,[52][53][54]. The frequency distributions of the DH lines for different traits evaluated are shown (Additional file 25: Figure S7). The DH lines exhibited high broad sense heritability values (56.7 to 99%) for all traits, except for NDVI and plant emergence (29.2 to 44.3%) across environments (Additional file 12: Table  S12a). There was moderate to high genetic correlations for flowering time, early vigour, plant biomass and grain yield across environments (phenotyping years) in the SAgS DH population (Fig. 6). Flowering time showed generally negative correlations with grain yield and plant biomass, whereas it showed positive correlation with early vigour and plant height. We identified several QTLs associated with flowering time, plant emergence, shoot biomass, plant height, and grain yield across phenotypic environments in the SAgS population (Additional file 12: Table S12b).
Since we detected moderate to high genetic correlations in this population between multiple traits including flowering time (Additional file 13: Table S13), we considered whether the QTLs underlying these multiple phenotypes co-localise onto the physical map of B. napus. Genetic and physical localisation of markers on   Darmor reference genome [44] revealed that three significant, co-located, QTLs associated with multiple traits (Fig. 7). A multi-trait QTL flanked by markers 3110489 and 3075574 for plant emergence, shoot biomass, flowering time, and grain yield mapped on chromosomes A07 was located within 0.65 Mb of the FLOW-ERING LOCUS T (FT, NCBI accession FJ848914.1); BnA2.FT paralog in B. napus [24]. Consistent with GWAS analysis, we detected QTLs near the FT in the biparental population (Fig. 7). Mapping of pleiotropic trait QTL in the vicinity of FT (A07) suggest that FT may have multifaceted role in plant development and productivity traits.

Expression levels of FT paralogs explain significant variation in flowering time
To assess whether changes in the expression of different FT paralogs could explain the phenotypic variation in flowering time, we examined expression of FT paralogs among field-grown plants of all 144 DH lines. Expression levels of all 6 FT paralogs displayed significant association with flowering time (p < 0.001), with different copies accounting for genetic variation in flowering time variably; ranging from 23% (BnC2.FT) to 40% (BnC6.FTb) (Fig. 8a). The FT homologues BnA7.FTb and BnA7.FTa localised near a multiple trait QTL (Additional file 12: Table S12) could explain 30 and 31% of genetic variation in flowering time, respectively. Sequence analyses of the PCR products also confirmed that BnC6.FTb and BnA7.FTb are accurately detected in our assays.
To further assess whether a similar pattern is also observed among natural variants, we assessed the expression of BnC6.FTb, BnA2.FT2 and BnFLC.A02. We choose BnC6.FTb because it showed the highest correlation in the DH population. BnA2.FT2 was detected as a QTL in the diversity set of 24 accessions, whilst BnFLC.A02 was identified in accessions that differed significantly in their flowering time. Consistent with the QTL analysis and the expression studies in DH populations, we observed Table 1 Genome-wide highly significant SNPs associated with variation in flowering time and photoperiodic response in diverse accessions of B. napus. Photoperiod response was evaluated under long (LD) and short day (SD) conditions in the controlled environment cabinet (CE). QTL marked with * were detected in the SAgS (Skipton/Ag-Spectrum/Skipton) DH population (Raman et al. 2013 [8], 2016 [12,13,35] (Fig. 8b). Consistent with the timing of sample collection (i.e., just prior to flowering), we detected expression variation in FT rather than FLC accounting for most of the flowering time variation in these diverse set of 24 accessions. Taken together this study revealed that irrespective of the causal variation, the phenotypic variation is associated with changes in the expression levels of the floral integrator FT.
To unravel the cis and trans acting candidates associated with differential FT transcripts expression, we first sought SNPs that affect expression levels of all FT homologues in diverse canola accessions. Then, we layered this information on the physical map positions of SNPs associated with genetic variation in flowering time and photoperiod response (Additional file 14: Table S14). We identified a total of 13 SNPs mapped on chromosome A07 and C03, in the vicinity of multiple trait QTLs that we identified in the SAgS population. The candidate genes located near significant SNP associations are FT, ELF4-L2, PRR9, VIN3, BnFLC.C3b (FLC5, AY036892.1), FPA, SPA1 and TOE1 (Additional file 11: Table S11).

FT paralogs exhibit structural sequence variation in B. napus accessions
In total, nine FT copies were identified in B. napus accessions (Additional file 15: Table S15), including three putative FT copies on chromosomes A01, C02, and C04, (Additional file 15: Table S15). Sequence analyses showed considerable variation in level of synonymous and nonsynonymous SNP variations, Insertion-deletions (InDel) in promoters, as well as exonic and intronic regions. A total of 310 segregating sites were detected across FT paralogs. Our results showed that frequency spectrum of structural variants for BnA02.FT, BnC02.FT and BnC06.FT conformed to neutral expectations, while BnC04.FT and BnA07.FT showed non-conformance to neutrality, suggesting evidence of selection (Additional file 16: Table S16). We detected high level of diversity in FT paralogs mapped on A07, C04 and C06 chromosomes (Additional file 17: Table  S17, Additional file 18: Table S18). For example, BnC04.FT (21 SNPs) located in intron II (Fig. 9). Interestingly, an 8-bp deletion of the sequence 'TTCCGGAA' (coordinates BnC04: 12,437,458-12,437,465 bp) was observed in exon-IV of BnC04.FT in seven accessions; Av-Garnet, BC92157, Skipton, Charlton, BLN3614, ATR-Cobbler, ATR-Gem and in Darmor-bzh (reference genotype). This mutation creates a frameshift that removes the highly conserved C-terminal domain containing the PEBP-domain and several substratebinding sites. Cluster analysis showed that all variants formed a distinct cluster (Fig. 10). In the BnA07.FTb (BnaA07g33120D) we identified two indel mutations in the coding region (Fig. 9). The first is a single nucleotide deletion in exon 4 that is heterozygous with the wild type allele in Australian varieties; Av-Garnet, Skipton, Charlton, BC92156, Marnoo, BLN3614, Ag-Castle, Monty, Maluka, BLN3343-C00402, CB-Telfer, ATR-Gem, Surpass402, ThunderTT, ATR-Mako, Wesroona and Ag-Spectrum (the remaining lines are homozygous wild-type). The deletion results in a frameshift that affects the final 20 amino acids of the encoded peptide, including the 9 amino acids of the PEBP domain. The second InDel is a 3 base-pair mutation in exon 1 (His60-deletion) that is found in all our sequenced lines. These polymorphisms are consisted with the observed QTLs in the vicinity of FT.

Structural variation in FT promoter region
We further searched CArG box and other motifs for FLC, SOC1, SMZ and CO which can potentially bind to repress FT expressions [55] in introns (especially  [24]) was found in both BnA02.FT and BnC02.FT genes. However, this Type-II block ' A' sequence was absent in all FT paralogs located on the A07 and C06 chromosomes. 'CArG' box (CC(A/T)6GG) was absent in introns 1 of BnA02.FT and BnC02.FT genes. We also found several 'CACTA' elements in B. napus FT paralogs. For example, in BnaC04g14850, a total of four motifs were identified; three were present in introns (2 in Intron 2, antisense direction, and one in sense strand), and one CACTA motif was identified in Exon-IV. In BnA02.FT, a total of 834 CACTA motifs were identified in promoter, intron 1 and exon II. We also identified homologous sequences to FT promoter blocks C and E of A. thaliana [56] in three B. napus FT genes (BnaC06g27090D, BnaA07g25310D, and BnaA02g12130D). Block E was also identified in BnaC06g27090D with blastn (Additional file 26: Figure S8). In comparison to the Block C alignments, the binding regions were not well conserved in Block E. The structural variants for the four FT genes were plotted. Finally, in order to determine whether polymorphism in FT directly relates to flowering time variation, we performed phylogenetic analysis of 21 accessions representing GWAS panel and parents of mapping populations being used in the Australian Brassica Germplasm Improvement Program. Our results showed that grouping for both spring and winter types based on FT paralogs was not that distinct (Fig. 10) suggesting that other key flowering genes such as FLC and FRI may have contributed to diversification of these morphotypes [14,57].

Discussion
In this study we explored the genetic architecture underlying phenotypic diversity in flowering time, an important trait involved in plant development, adaptation and productivity. Our results demonstrate that there is extensive Variation in the response to photoperiod (as revealed from LD and SD conditions) appears to be another key determinant of flowering time differences among canola accessions (Fig. 1). Despite extended photoperiod at 20°C, several accessions did not flower under CE conditions. These accessions flowered when exposed to extended periods of cold temperatures suggesting that these accessions require vernalisation [12,13,52]. Thus, a combination of variation in photoperiod and vernalisation response causes phenotypic diversification of flowering time in canola (Fig. 1).
In order to have a minimum effect of vernalisation on flowering time, all field trials were conducted in the middle of June (instead of April the main canola growing season in Australia). We identified a highly significant QTL close to FT locus on chromosome A02 for flowering time variation in field-grown or CE cabinet-grown plants, suggesting that FT is a major candidate for flowering time across different growing environments (Fig. 4). This QTL was also mapped within 80 kb of a QTL for vernalisation response in our previous study [12], suggesting that FT integrates signals from both photoperiod and vernalisation pathways and regulates the transition from vegetative to reproductive phase in canola.
The functional role of FT was determined using quantitative RT-PCR using six FT paralog specific primers. Our results demonstrated that all paralogs underlie genetic variation in flowering time in canola. For the first time, we show FT expression in a canola population grown under field conditions is significantly associated with variation in flowering time. It was interesting to observe that most of variation in flowering time was explained by A02 locus in a GWAS panel, and A02 and A07 loci near FT paralogs in the SAgS DH mapping population (Fig. 6, Additional file 12: Table S12). However, the maximum correlation (R 2 = 0.4) was observed for BnC6.  [24,25] discounted the possibility of TSF controlling variation in flowering time which is shown to have much lower expression levels than FT [58][59][60].
We detect considerable structural variation in promoter, as well as in exonic and intronic regions in FT genes located on chromosomes other than A01 and C02. These high levels of polymorphism suggest that the FT gene is a major target for selection during domestication and systematic breeding of canola. FT is a member of the PEBP family and multiple paralogs have evolved from its common ancestral species, however these paralogs may have retained, lost or gained gene function in  Table S1. FT variant used for revealing diversity in BnaC04g14850D among 21 accessions are given in Additional file 18: Table S18 the polyploid genome of canola [24,61]. Our sequencing analyses reveals that different copies of FT harbour mutations including in the CArG, CACTA, Block C and Block E -the binding sites for the transcriptional factors such as FLC, SVP, GI, CO, CIB, CRY2 and SMZ proteins (Additional file 24: Figure S6), which regulate of the expression of FT [1,25,56,62,63]. Mutations in FT and TFL1 paralogs in canola have been reported to affect flowering and yield components [26]. Mutants or isogenic lines carrying different FT paralogs and/or their combination are required to establish the precise role of each paralog in both vegetative and reproductive phase of plant development. While our expression analyses of FT genes hints at functionality of these paralogs, further research is required to establish whether there is any role of transcriptional enhancers: Block C and Block E on the FT expression [56] as well as its association with other traits of agronomic interest.
We show that FT has multifaceted role in diverse traits that influence plant development. QTLs for several traits such as plant emergence, early vigour, plant biomass, plant height, grain yield, were localized with flowering QTL in a cluster and the expression level of FT showed a good association with different traits. However, this relationship was dependent upon G × E interaction (Additional file 19: Figure S1). These findings hint that flowering time, driven by FT paralogs have variable influence on different traits under different environments. However, it was difficult to establish in this study due to presence of multiple copies of FT in canola genome.
This study demonstrates multigenic inheritance of flowering in the SAgS population. However, a relatively small size population (n = 144) may have compromised the estimates of QTL identified herein. In addition, QTL only accounted for small genetic effects (2.7 to 10.3%) in this study (Additional file 12: Table S12). This is in contrast with other studies, which reported major QTLs for flowering time [64]. Recently, Tyagi et al. [65] showed that Brassica FT homeologs influence flowering time, branching pattern, plant height, silique length and width, seed size, stomatal density, and fatty-acid profile in B. juncea. Our expression analyses revealed that enhanced FT gene expression is related with early flowering in the doubled haploid lines of Skipton/Ag-Spectrum//Skipton (Fig. 8). In a previous study, Raman et al. [13] showed that early flowering DH lines having Skipton QTL alleles yield higher than those having late flowering allele (Ag-Spectrum). These results suggest that canola varieties having higher FT gene expressions can be selected for enhancing productivity.
In canola, sequence variation in BnFLC.A10 appears to underlie QTL for both flowering time as well as root biomass [21,66]. In addition, flowering time has been implicated in plasticity of water-use efficiency, carbohydrate availability, plant vigour, resistance to diseases and yield [67][68][69][70]. We propose that alleles that showed significant association with flowering time and grain yield in the water-limited years experienced in 2013 and 2014, are of high relevance even though they did not reveal genetic associations in water-unlimited years (non-stress environment, 2015 and 2016). Environmental stress tends to drive changes in flowering time in Brassica as a result of change in allele frequencies of the flowering time genes [71,72]. Our data also suggest that different FT paralogs regulate flowering time depending upon environment. For example, QTLs for flowering time were identified close to BnaA07g25310D in 2013 and 2014, however a QTL for flowering time was mapped on chromosome C04, close to a different FT paralog, Fig. 10 Neighbour-joining tree based on nucleotide variation across all FT paralogs among 21 accessions of B. napus representing GWAS and parental lines (shown in red color) of a doubled haploid population derived from Skipton/Ag-Spectrum//Skipton. Tree was generated in MEGA 6. Nucleotide variation in FT genes was also compared with the corresponding FT genes in the reference Darmor-bzh, in colour. Number refers to percent bootstrap support for branches with greater than 50% support BnaC04g14850D in 2015 and 2016. Since, flowering time showed a good correlation with plant emergence, early vigour, shoot biomass, and grain yield; and enhanced FT expression is also correlated with early flowering, it is possible that FT may be one of the drivers promoting early growth in canola, therefore contributing to higher grain yield in canola especially under terminal drought and heat stress environments prevalent in Mediterranean countries.
The findings presented here reveal that the genetic architecture of natural variation in flowering time involves multiple alleles having major effects located near FT, UPSTREAM of FLC and RAV2 paralogs on chromosome A02 (Table 1, Additional file 24: Figure S6). This is in contrast to genetic variation in flowering time regulated by vernalisation which is controlled by multiple alleles distributed across genome [8,10,12]. Both positive and negative regulators of FT were located near significant SNP associations; for example, under LD treatment FLC that repress the FT transcription by direct binding to the CArG sites in intron 1 and promoter region of FT was detected [55]. The role of the candidate genes: GI, FD, SAM, AGL18/FUL in flowering time is well documented [7]. We also identified significant SNP associations for flowering time in the vicinity of H + -ATPse (Additional file 24: Figure  S6) which is implicated in stomatal opening and enhanced FT expression in the guard cells [28]. In addition, MSI, EMF2, FVE, and CURLY LEAF which regulate FT transcription via trimethylation of H3K27me3, H3K4me3 and EARLY FLOWERING 6 [1,56] were located in the vicinity of significant SNPs. These results suggested that both approaches utilized in this study: QTL as well as GWAS analyses are suitable for revealing the genetic architecture of flowering time in canola.
Based on their photoperiodic response, all genotypes could be grouped into photoperiod sensitive, photoperiod insensitive (less sensitive), and non-flowering types (vernalisation sensitive). Classification of such genotypes based on flowering habit was also supported with our molecular marker clustering, which placed the majority of the winter type varieties from Europe, China and Japan, in a single cluster (cluster II, Additional file 21: Fig. S3). These results supported that spring (semispring in Australia), semi-winter and winter canola belong to distinct genepools. A number of semi-winter accessions from China grouped into separate clade. Previous research has shown that Chinese canola germplasm is derived as a result of intensive crossing between winter canola introduced from Europe via Japan and spring type B. rapa for local adaptation [73].
In summary, we have demonstrated through a series of complementary and exploratory analyses based on association tests using genome-wide SNPs, expression QTL and quantitative RT-PCR that the natural variation in flowering time and response to photoperiod revealed in this study is controlled by FT and other loci dispersed across the genome, and modulated by the environment. GWAS approach delineated genomic regions and provided insights into the genetic architecture of flowering time and its multifaceted role in plant development and productivity traits. Although some alleles identified in this study may not be causative of phenotypic differences in flowering time, they still represent valuable selection tools to increase rate of genetic gain in canola improvement programs. Several Illumina Infinium™ SNP and FT gene specific markers located near the QTL associated with trait variation and known flowering time genes [74][75][76] would enable the identification of canola accessions with optimal FT expression and agronomic trait performance. Further research is required to understand the role of different FT copies in canola productivity across target environments.