Quantitative trait loci
A total of 22 regions of the P. mariana genome were detected as being potentially involved in budset and growth variation, which is lower than the 52 QTLs that had been found in the congeneric white spruce for each of these traits by Pelgas et al. which used two pedigrees and larger sets of progeny. A relatively small number of progeny (283) was surveyed in the present study and only one biparental family was used to map QTLs. Given that the progeny were not clonally replicated, in contrast to the study conducted by Pelgas et al., it was not expected to identify as many QTLs with as much precision. However, the main goal of the present QTL survey was to obtain an exploratory overview of the genomic architecture of adaptive traits in black spruce for comparative purposes.
Trait variations were not normally distributed, so that the guidelines of Asins et al. were followed and, consequently, both parametric and non-parametric approaches were used to identify QTLs. The three different methods yielded overlapping complementary results. The non-parametric methods (nIM and K-W tests) were notably more efficient in QTL detection than the parametric approach (IM), despite their expected lower power to detect QTLs. However, the application of IM still yielded valuable information, such as estimates of the proportions of explained variance. Despite the limitations that were imposed by the available data, four QTLs were repeatedly identified over years or sites, thereby more likely representing genomic regions encompassing polymorphisms that were more central to trait variation. Such a restricted set of QTLs that is shared across years or environments has also been observed in the congeneric P. glauca.
On average, 8% of the observed phenotypic variance could be explained by a single QTL. This average proportion of explained variance (PVE) seemed low compared to several published values in conifer QTL mapping studies of traits related to growth or adaptive traits [41–43]. Yet, many QTL studies showing high PVE for single QTLs have been performed using interspecific crosses [44–47]. Given that phenotypic variation in such crosses may represent the sum of trait variations originating from both species, QTLs having effects on these adaptive traits would harbour higher PVE estimates accordingly to the average trait differences between the two species. In contrast, the average amplitude of PVE values that were estimated in the present study is consistent with results obtained in other QTL studies on phenology and growth trait variation which had been assessed with intraspecific crosses [13, 43, 48, 49]. Even though such PVE estimates may have been biased upward due to the Beavis effect , the congruence with results from other QTL studies relying on larger progeny [13, 48], which should be less sensitive to such effect , suggests that PVE values were quite accurately estimated in the present study. These results suggested that many QTLs are truly involved in adaptive variation. We noted that three QTL regions were involved in both budset and growth variations, suggesting pleiotropic effects. A similar pattern was also observed among P. glauca QTLs for phenology and growth . Such patterns are expected, given that significant positive correlations between growth and budset are generally observed in trees as reported previously in both P. glauca and P. mariana[25, 28]. In other words, the later the budset, the longer the growing season and, consequently, the more growth is generally accumulated.
SNPs associated with adaptive trait variation
A total of 34 SNPs from as many distinct genes were significantly associated with adaptive trait variations in genetic association tests, representing 65% of the tested candidate SNPs. Such a rate of positives is high, exceeding by more than an order of magnitude the results obtained in other genetic association studies that are based on candidate genes by more than an order of magnitude (e.g.[18, 20, 51]), and where there had been no a priori screening of the tested gene polymorphisms. Given the much smaller number of tests that were performed with a limited set of candidate polymorphisms, the issue of multiple testing and potentially high rates of false positives that are associated with intensive testing becomes less of a concern. Although “next-gen” sequencing technologies will allow detection of many more polymorphisms at an affordable cost in the near future, with approaches such as genotyping-by-sequencing , statistical issues that are related to high rates of false positives when large numbers of tests are conducted will remain a recurrent theme. As shown here, a candidate SNP approach should contribute to reducing these concerns and the number of tests to be performed, especially when quantitative characters implicating large numbers of genes are considered, or for organisms characterised by very large genomes such as conifers . The outlier analysis and bulk segregant analysis of sequence signatures that were used to identify candidate SNPs were quite efficient in reducing the number of candidate SNPs to be tested by more than an order of magnitude, in both cases (see Methods). The several hundred genes that were involved in the sequencing and preliminary screening procedure for segregating SNPs between opposite pools of extreme individuals represented quite a large number of screened candidate genes, given that we relied on traditional PCR and Sanger sequencing to discover segregating SNPs. New methods relying on massive sequencing of the gene space that use gene sequence capture approaches and next-gen sequencing should make it easier to discover much larger numbers of candidate polymorphisms segregating for a variety of phenotypic characters. In various spruce species, such strategies are currently employed, which involve gene capture and sequencing for a variety of pools using oligo arrays that represent most of the transcriptome of the spruce genome .
While screening for candidate gene SNPs was conducted using bulk segregant analysis when plants were 6- and 7-years old, more SNPs were found to be significant in association testing around these ages (6-, 7- or 8-years old) than at a later age of 12 years (Table 2). This trend suggests that different genes and SNPs (and physiological processes) could be involved at different ages, as the transition from juvenile to mature adult is taking place. This scenario is additionally supported by the fact that data from 12-year-old plants were not used in the screening procedure to identify candidate gene SNPs from pool segregant analysis, and that the number of significant genetic association tests was the lowest for field data at that age. A similar trend was not observed in a QTL study of adaptive characters that had been assessed over several years in P. glauca, probably because the trees were still at the juvenile stage of development . In the present study, genetic correlations between characters at age 12 and ages 6, 7 and 8 years were generally modest for both sites (data not shown), further pointing to partially different sets of genes and polymorphisms involved. These modest correlations are consistent with previously published estimates from a clonal test in P. mariana.
In our study, each SNP that was associated with trait variation explained a low proportion of variance (PVE ~ 1.25%), which significantly differed between the two study sites. Average PVE was generally higher for site #4 than for site #3. Site #4 was characterised by harsher climatic conditions at higher altitude (Additional file 1: Table S1), with freezing temperatures occurring earlier in the season than at site #3, the latter site being located at lower elevation and lower latitude. Since budset timing is also significantly related to temperature conditions , it was likely that quantitative trait differences between adapted and maladapted trees would have been amplified on the site with harsher climatic conditions, leading to greater ability in detecting significant associations. With respect to this likely trend, the variance in quantitative traits was generally greater on the site with harsher climatic variation (data not shown).
The low PVE values that were observed with significant SNPs suggested that a large number of adaptive polymorphisms from various genes and gene families are involved in the genetic control of adaptation. These low PVE values were not surprising given that similar values have been observed in genetic association studies involving other tree species (3% in ; <1% in ; 3% in ; 0.7-5.4% in ; 3-5% in ). This trend was consistent with that found for QTLs of the same characters (although lower PVE values were observed in association testing), and in agreement with quantitative genetics theory, which predicts that genetic variation in these quantitative traits is controlled by many genes, each exerting small effects [56, 57]. Because linkage disequilibrium usually decays rapidly within gene limits in natural populations of boreal spruces [10, 11], the co-occurrence of several physically linked adaptive polymorphisms that would explain a higher proportion of the variance was unlikely.
Several lines of evidence supported the involvement of these significant SNPs in adaptation. First, half of these polymorphisms displayed allele frequency variation significantly correlated with climatic variations (Table 2, Figure 3), hence corroborating, with an extensive population sampling, the trends that had been previously observed in black spruce molecular adaptation studies . In the absence of population structure, these correlations suggested that the alleles are under divergent selection. However, the remaining SNPs should not be considered as false positives given that variation in SNP frequencies may not follow a linear distribution . Although the distribution of an adaptive trait is expected to follow the selective environmental variable, adaptive allele frequency variation along the environmental gradient may follow various and sometimes barely predictable distributions given the complexity of interacting genes and alleles that likely govern adaptive trait variation [12, 42].
This idea of complexity of gene and allelic interactions is also reinforced by the large diversity of families and functions observed among the genes that were found to harbour significant SNPs. We noted that 53% of significant SNPs belonged to gene families previously found to harbour genetic polymorphisms and expression patterns that were related to stress resistance and adaptation in other conifers, trees or plant model organisms. For instance, genes of the MYB family have already been reported in several studies as transcription factors that are involved in growth control, response to cold stress and organ differentiation, and related to budset in P. glauca[20, 58–60]. In addition, MYB members have also been discovered in genetic association studies of cold hardiness in Pseudotsuga menziensii var. menziesii and Picea sitchensis. A SNP in a BAM gene was also associated with ring growth variation in a previous association study in P. glauca. As budset timing and cold resistance are preceded by growth cessation , we expected to find genes regulating growth development. As was the case for genes from the AP2 family, which have been identified previously in the gene network activating plant freezing tolerance in Arabidopsis and Triticum aestivum. Similarly, variation in the expression of genes from the C3HC4 RING family has been observed under drought and low temperature stress in Arabidopsis and rice [63, 64]. In the latter species (Oryza sativa L.), a protein of the WRKY family was also found to be activated by a Heat Shock protein under drought stress . Dnaj Heat Shock proteins are produced under diverse abiotic stresses  and both WRKY and Heat Shock families were represented by a gene SNP that was associated with height variation in the present study. Among other significant gene SNPs that were found in the present study, the NAC, LEA, PATATIN and RAP2 gene families were also represented. NAC proteins are over-expressed in response to drought and salinity stresses in rice , as well as in drought responses of Arabidopsis. In addition, several LEA genes are known to improve resistance to dehydration and freezing temperatures in plants  and were reported in a drought adaptation study in Pinus pinaster Aiton . It should be also noted that a PATATIN gene has been found to be stimulated by drought stress in Vigna unguiculata (L.) Walp. and a RAP2 gene was found to be up-regulated by drought stress in Arabidopsis. Hence, a wide variety of gene families was found, indicative of the large number of physiological processes involved in budset and growth.
Of the 34 SNPs significantly associated with adaptive trait variation, seven were also located in a QTL for the same character. Given the known relationships between budset timing, the length of the growing season and growth (see above), it is plausible that several SNPs that were associated to one trait were located in a QTL for the other trait. Considering such cases, 17 significant SNPs (50%) were located in QTLs. Such corroboration would support the identified polymorphisms as true positives. Furthermore, the two analytical approaches should not be expected to be congruent but complementary. QTL mapping permits the identification of polymorphisms involved in trait variation in one or few pedigree populations, thereby reducing background genetic variations as well as the number and complexity of interacting alleles, while genetic association testing permits the identification of polymorphisms of more general value over a wider range of genetic backgrounds.
In total, 65% of the significant SNPs that had been identified by genetic association tests were corroborated by additional indirect evidence of various sources, as indicated above (relationships with climatic factors, adaptation QTLs and key gene families). This evidence highlights their relevance for possible in situ monitoring of genetic variation for adaptation or for inclusion in marker-aided breeding selection schemes. The remaining gene SNPs could still represent polymorphisms of high adaptive value, even though additional research would be required to identify their specific role in adaptation.