High-density linkage mapping and evolution of paralogs and orthologs in Salix and Populus

Background Salix (willow) and Populus (poplar) are members of the Salicaceae family and they share many ecological as well as genetic and genomic characteristics. The interest of using willow for biomass production is growing, which has resulted in increased pressure on breeding of high yielding and resistant clones adapted to different environments. The main purpose of this work was to develop dense genetic linkage maps for mapping of traits related to yield and resistance in willow. We used the Populus trichocarpa genome to extract evenly spaced markers and mapped the orthologous loci in the willow genome. The marker positions in the two genomes were used to study genome evolution since the divergence of the two lineages some 45 mya. Results We constructed two linkage maps covering the 19 linkage groups in willow. The most detailed consensus map, S1, contains 495 markers with a total genetic distance of 2477 cM and an average distance of 5.0 cM between the markers. The S3 consensus map contains 221 markers and has a total genetic distance of 1793 cM and an average distance of 8.1 cM between the markers. We found high degree of synteny and gene order conservation between willow and poplar. There is however evidence for two major interchromosomal rearrangements involving poplar LG I and XVI and willow LG Ib, suggesting a fission or a fusion in one of the lineages, as well as five intrachromosomal inversions. The number of silent substitutions were three times lower (median: 0.12) between orthologs than between paralogs (median: 0.37 - 0.41). Conclusions The relatively slow rates of genomic change between willow and poplar mean that the genomic resources in poplar will be most useful in genomic research in willow, such as identifying genes underlying QTLs of important traits. Our data suggest that the whole-genome duplication occurred long before the divergence of the two genera, events which have until now been regarded as contemporary. Estimated silent substitution rates were 1.28 × 10-9 and 1.68 × 10-9 per site and year, which are close to rates found in other perennials but much lower than rates in annuals.


Background
Genomic analyses of related organisms are central for addressing the evolution of genome organisation as well as for supporting the identification of the genetic background of economically and biologically important traits. With genetic linkage maps and genome sequence information available across taxa, it is possible to study evolutionary processes ranging from major structural changes like whole-genome duplications and chromosome rearrangements to fine scale differences such as single base substitutions. The rate of chromosomal rearrangements differs widely both among and within eukaryotic lineages [1][2][3][4]. Although the exact mechanisms creating rearrangements are unknown, repetitive elements seem to be an important trigger as breakpoint regions of rearrangements are enriched in various classes of repeats [1]. Changes in chromosome numbers may occur through fission and fusion of chromosomes, or more radically through polyploidization (genome duplication). The highly variable rate observed for gross chromosomal rearrangements in different lineages is also evident at the nucleotide substitution level [5]. Base substitutions at neutral sites occur at a rate equivalent to the mutation rate. This rate is affected by the number of mitotic cell divisions in animal germ lines or in plant cell lines ultimately forming gametes, during which replication errors can cause mutations. With information on divergence times obtained from the fossil record and estimates of the number of neutral substitutions in orthologous genes (K) one can through the relationship, r = K/(2T) [5] estimate neutral substitution rates (r). Although there is uncertainty in the dating of divergence times using the fossil record, this method gives rough estimates of the rate of silent substitutions in the species under study. In addition, the formula can be applied for studying relative timing of speciation and genome duplication events by comparing the number of substitutions in paralogous and orthologus genes, assuming constant substitution rates through evolutionary time. It is generally thought that orthologs share similar functions whereas paralogs often have different functions, and therefore paralogs are expected to diverge more per unit time than orthologs. This has however recently been questioned as orthologs might obtain new functions at a similar rate as paralogs [6]. Furthermore, functional change is not expected to affect silent substitution rates.
Salix and Populus species are trees, shrubs or subshrubs and members of the Salicaceae family. They share many characteristics such as dioecy, rapid growth and seed development, and ease with which they can be vegetatively propagated. Species across both genera typically have a haploid chromosome number of 19 and small and convenient genomes sizes (~500 Mbp) compared to the large genomes of most conifers . Based on the fossil record, the divergence of the two genera has been dated to approximately 45 mya [7,8]. Populus is recognized as a model genus for genetic and genomic studies in angiosperm trees, with many resources available such as the genome sequence of Populus trichocarpa http://genome.jgi-psf.org/Poptr1_1/ Poptr1_1.home.html, [9], linkage maps [10][11][12][13][14][15] and microarrays [16,17].
The genus Salix comprises more than 300 species and is widespread in both the Northern and the Southern hemisphere, excluding Australasia and New Guinea. Many Salix species display rapid growth and high biomass yields, even when nitrogen fertilizers are used sparsely [18]. These characteristics together with ease of establishment and a broad range of adaptability explain the wide-spread use of Salix spp. for short rotation biomass production. The large phenotypic variation within and among Salix species facilitates the identification of quantitative trait loci (QTLs) in experimental populations. Also, natural populations of Salix typically display high genetic diversity and are likely to contain many low-frequency alleles that underlie outstanding phenotypes useful to breeders [19]. Salix viminalis, S. dascyclados and S. schwerinii and their hybrids are some of the most commonly used Salix species in the breeding programs in Europe. These species have also been the focus of most past investigations of Salix genetics, which include the generation of linkage maps [20][21][22][23][24] and QTL analyses i.e. frost resistance and phenology [25,26], growth, water-use efficiency and drought tolerance [27,28]. As the use of willow as a source of biomass is expected to increase in the near future (S. Larsson Lantmännen Agroenergi AB, personal communication), one can foresee a rising demand for high yielding willow clones that are adapted to different environments and resistant to a wide repertoire of pathogens. This is going to put more pressure on breeding activities and more efficient ways to select useful clones will be needed. One method that can strongly support these activities is "Marker Assisted Selection" (MAS), i.e. selection on easily detectable genetic markers in genes linked to QTLs underlying phenotypic traits. MAS should be particularly useful for traits that are difficult to measure, exhibit low heritability, and/or are expressed late in development, such as productivity, disease resistance, drought and heat tolerance. To develop efficient MAS in willow, QTLs and ultimately genes responsible for selected phenotypic traits must first be identified. Quantitative traits are expected to have complicated genetic backgrounds including many genes, and complex interactions between them. Mapping such traits requires dense linkage maps, large mapping populations and thorough phenotyping of plants grown both in greenhouse and in field experiments over multiple seasons.
In this study we present two willow genetic linkage maps based on two mapping populations with the major aim of investigating the degree of genomic conservation between willow and poplar (Populus). The first more detailed map (S 1 ) is based on a S. viminalis × (S. viminalis × S. schwerinii) cross using SNP, microsatellite and AFLP markers. The second map (S 3 ) is based on a S. viminalis × S. viminalis cross and SNP and microsatellite markers. Both maps are considerably more detailed than previously available willow linkage maps, and in contrast to those they are to a large extent based on SNP markers with a genome-wide distribution for which the corresponding positions in the poplar genome are known. It was therefore possible to investigate the degree of synteny and gene-order conservation across a significant part of the two Salicaceae genomes by comparing the willow linkage maps with the physical map constructed from the poplar genome sequence. This is a useful approach for studying genome evolution between less well studied species and model plant species with sequenced genomes. In addition, we estimated the number of substitutions in introns (K i ) between orthologous and paralogous genes in willow and poplar, and used the distributions of these estimates to infer the relative age of the divergence between the lineages leading to Salix and Populus and age of the whole genome duplication shared by these lineages.

SNP markers and genotyping
In total, 426 willow gene fragments containing SNPs were evaluated for genotyping at the SNP Technology Platform (Uppsala University). Of these 426 SNPs, 350 were identified using primers developed for the purpose of this study, representing 309 different genes, hence, a number of genes contained multiple SNPs. 12 SNPs were previously identified in phenology candidate genes (N. Gyllenstrand, personal communication) (Additional file 1) and 64 SNPs (in 58 unique genes) were identified using primers from Hanley et al. (2006). See Additional file 2 for marker information and primer sequences. Homology searches against the poplar genome with the 309 willow sequences developed in this study indicated a most significant hit at the position of the expected target in all but 24 instances (Table 1). In 20 of these cases, the willow sequence was more similar to the paralogous gene copy (paralog 2) than to the orthologous gene expected to be amplified (paralog 1). Furthermore, the positions in the willow genome match the positions of paralog 2 in the poplar genome in all but one case (XI_14om), suggesting that the paralogs were erroneously amplified and sequenced in willow. XI_14om is unique in the sense that based on sequence homology, the willow sequence is most similar to paralog 2 in poplar, however the position in the willow genome does not agree with the position of paralog 2 in poplar. In the other four cases, the willow sequences showed best sequence homology to genomic regions other than the expected (Table 1).
396 SNPs were selected and genotyped in the S 1 and S 3 mapping populations. In a first panel of 384 SNPs conversion rate was 92% (354/384) and the overall sample call rate for the approved SNPs was 97.5%. The reproducibility was 100% according to duplicate analysis of 2.3% of the genotypes. In the second 12 SNP panel, the SNP conversion rate was 83% (10/12) and the overall sample call rate for the approved SNPs was 99.0%. The reproducibility was 100% according to duplicate analysis of 39.5% of the genotypes.

Linkage mapping
After excluding SNPs that were monomorphic or contained null alleles, segregation data for 307 SNPs in 463 individuals were used to construct linkage map S 1 (see Additional file 2). 75 markers were heterozygous only in the female parent, 158 only in the male parent and 74 were heterozygous in both. In addition, 45 microsatellite markers were included, of which 25 segregated in both parents, six were maternally informative and 14 were paternally informative. Finally, 287 AFLP markers were included, with genotypic data on 89 individuals. Consensus linkage maps were constructed using all 639 markers together. In the LOD Groupings step in Joinmap, markers were grouped at a LOD threshold of 5.0 and 25 linkage groups were identified of which three were duplets and three contained only four markers. The other 19 groups represent the 19 linkage groups in poplar ( Figure 1 and Figure 2). 48 AFLP markers were unlinked and excluded from further analysis. For the markers within each group, map calculations were performed in Joinmap based on recombination frequencies, which is the step when linked markers are placed on linkage groups and ordered relative to each other. For five groups (II, V, XIV, XV and XIX) no consensus LGs could be calculated due to lack of markers segregating in both parents. These groups were split into the parental groups and the markers tested separately within each parental group. A number of markers remained unlinked after the map calculations due to insufficient linkage to the other markers. These markers were tested separately within groups to examine if any of them were linked to each other. A number of small groups were defined in this way and included in the map. Ten of the groups (eight duplets and two with four markers) contained only AFLPs and since they cannot be connected to the poplar genome they were excluded from further discussion. After inspection of LG VI, marker ph27 was removed because of inconsistency in the placement of this marker compared to S 3 and poplar.
In general, the S 1 map without the AFLP markers was in very good agreement with the map including the AFLP markers. However, in LG VIII the gene order differed slightly when constructing the map with or without the AFLP markers. Since the gene order in this linkage group without the AFLP markers was in better agreement with the gene order in the poplar genome as well as with the S 3 map, we present LG VIII without AFLP markers in Figure 1 (all other LGs in Figure 1 and Figure 2 include AFLP markers). Furthermore, II-24_sa on LG II was not mapped when no AFLP markers were included. 88 markers in map S 1 showed distorted segregation ratios (P < 0.00001). XI_21_sa and R_66_sa were removed because they were distorted and had a high number of missing genotypes. The consensus S 1 linkage map with 495 linked markers aligned to the poplar physical map is shown in Figure 1 and Figure 2. The map spanned 2477 cM with an average interval between the markers of 5.0 cM. The linkage groups A to F and the female linkage groups present on LGs II, V, XIV, XV and XIX, where no consensus maps were constructed are not included in the total map length.
214 SNPs genotyped in 282 individuals were used to construct the linkage map S 3. 94 SNPs segregated in the female parent only, 79 in the male parent only, and 41 in both parents. 41 microsatellite markers were also included, 21 of which segregated in both parents, 13 were maternally informative and seven were paternally informative (Additional file 2). Consensus linkage maps were constructed using all 255 markers. For map S 3 , marker groupings at a LOD threshold of 4.0 identified 27 linkage groups and seven markers remained unlinked. Similarly to S 1 , some markers that did not map to the other markers within linkage groups due to insufficient linkage were tested separately. Groups Ia, Ib, V, VII, VIII, IX, XII, XIV, XV, XVII and XVIII are represented by more than one linkage group in S 3 , and they were placed on the map based on positional information in S 1 (Additional file 3). The most plausible reasons why we were unable to detect linkage between the small groups and other linkage groups is likely an effect of the small mapping population and limited number of markers, and as a consequence markers flanking the gaps can be too far apart, contain too little information, or are separated by a region of high recombination rate. Distorted segregation ratios (p < 0.00001) were found for 38 markers. R_24_sa and XVI_5_pIII form a separate duplet that cannot with certainty be connected to any LG. The same applies to the group containing SB945 and VI_8_sa_pIII.
There are no major rearrangements distinguishing the two willow maps, although there are a few apparent discrepancies in marker order on some LGs. Most differences involve closely spaced markers located on small groups or at the distal ends of groups, where the exact positioning of markers is uncertain. In several cases there was a swop in the positioning of two markers with tiny genetic distances, most likely an effect of the small size of the S 3 mapping population. The S 3 consensus map, excluding the small groups S 3 A and S 3 B and the groups S 3 Ia-2 and S 3 V-2 that could not be placed in groups Ia and V respectively, consists of 221 markers Table 1 Markers representing cases when the willow sequence shows best sequence homology to the poplar paralog (paralog 2) or to some other genomic region in poplar other than the expected (paralog 1).  Figure 1 The first eight linkage groups (LGI -LGVIII) of the consensus S 1 linkage map aligned to the poplar physical map. The willow consensus S 1 linkage map aligned to the poplar genomic sequence (prefixed with P) based on SNP, microsatellite and AFLP markers with approximate positions of markers given in base pairs (bp) and centimorgans (cM) respectively. Loci that may indicate syntenic disparities are underlined. Markers showing segregation distortion are indicated by asterisks on the willow map (one asterisk: P < 0.1, two asterisks: P < 0.05, three asterisks: P < 0.01: four asterisks: P < 0.005, five asterisks: P < 0.001, six asterisks: P < 0.0005, seven asterisks: P < 0.0001). Three inversions are indicated by boxes.

Figure 2
The last eleven linkage groups (LGIX -LGXIX) of the consensus S 1 linkage map aligned to the poplar physical map. The willow consensus S 1 linkage map aligned to the poplar genomic sequence (prefixed with P) based on SNP, microsatellite and AFLP markers with approximate positions of markers given in base pairs (bp) and centimorgans (cM) respectively. Loci that may indicate syntenic disparities are underlined. Willow linkage groups S 1 A to S 1 F that could not be aligned to the poplar genome are also shown. Markers showing segregation distortion are indicated by asterisks on the willow map (one asterisk: P < 0.1, two asterisks: P < 0.05, three asterisks: P < 0.01: four asterisks: P < 0.005, five asterisks: P < 0.001, six asterisks: P < 0.0005, seven asterisks: P < 0.0001). Two inversions are indicated by boxes.
aligned to the S 1 map and is shown in Additional file 3. S 3 spanned 1793 cM with an average interval between the markers of 8.1 cM. We could link nine previously unmapped scaffolds in poplar to linkage groups in the willow genome (scaffold 64 and 134 to LG Ib + XVI, scaffold 88 to LG V, scaffold 132 to LG VIII, scaffold 899 to LG XI, scaffold 142 to XIII, scaffold 40 to LG XIV and scaffold 82 and 122 to LG XV). We were also able to merge some groups that were unlinked in Hanley et al. (2006) (IIIa and IIIb, VIb and VIc, VIIb and VIIc, IXa and IXb, XIIIa and XIIIb, XVIIa and XVIIb, XIXa and XIXb).

Comparative mapping
We found two major interchromosomal rearrangements distinguishing the karyotype of willow from poplar, involving poplar LG I and XVI and willow LG Ia and Ib. Markers located on poplar LG I are in willow placed on two different LGs; willow LG Ia aligns to the first part of poplar LG I, from 0.3 Mb to 14.8 Mb (Figure 1) and willow LG Ib aligns to markers on the second part of poplar LG I, from 16.0 Mb to 35.3 Mb. In addition, markers located on poplar XVI are in the willow map linked to LG Ib. These data support either a fusion resulting in poplar LG I or a fission resulting in willow LG Ia and Ib sometime since the divergence of the two lineages. Similarly, either a fission in the poplar lineage forming poplar LG I and XVI or a fusion in the willow lineage forming willow LG Ib must have happened since the divergence of the two species (Figure 1).
Seven markers in the willow map should based on sequence homology be true orthologs, but the position of the markers in the willow linkage map differs from the corresponding position in the poplar genome. These cases could possibly represent cases of translocation of limited chromosomal segments (Table 2).
Generally, gene order within syntenic groups is very well conserved in the willow -poplar comparison.
However, the data indicate a few differences between the two genomes, five of which are inversions involving several markers each. Three of these are located on willow LG Ib ( Figure 1) and correspond in size to~1. 6 -5.4 Mb in the poplar genome sequence. In poplar, these regions correspond to parts of LG XVI and I. The other two supported inversions are located on LG XI and XVIII and involve roughly half of the total lengths of the linkage groups (6.9 Mb on LG XI and 6.3 Mb on LG XVIII) (Figure 2). For LG XI, the inversion is not evident in the S 3 map, which shows the same marker order as poplar, while the inversion on LG XVIII cannot be detected in the S 3 map because the markers involved are located on a separate linkage group so the gene order cannot be determined. Apart from these supported differences there are a number of suggested gene order differences between the willow and the poplar genomes, it is however uncertain whether or not there are true differences or errors in either the poplar genome assembly or the willow linkage maps. The assembly of the poplar genome is not complete and most likely the genome sequence as it is presented today contains numerous gaps.

Sequence analyses
For 115 gene fragments where three sequences were present (willow, poplar paralog 1 and poplar paralog 2) alignments were constructed and used to estimate pairwise K i between orthologous loci in willow and poplar, paralogous loci in willow and poplar and paralogous loci in poplar (Figure 3). The willow sequence, the expected part of poplar paralog1 to be amplified and the total genomic sequence of poplar paralog 2 and default parameters were used to construct the alignments. A total of 48 kb genomic DNA sequence were used for estimations of K i . As can be seen in the distributions of orthologous K i (Figure 3), there are sequence pairs with high K i values (> 1). This is likely a result of erroneous Table 2 Markers representing cases where the willow sequence shows best sequence homology to the expected location in the poplar genome (paralog 1), but with a different position in the willow map than the expected, possibly representing translocations of small chromosomal segments between the genomes.

Discussion
Based on our large sample of sequence based markers and large pedigrees we have studied genome evolution in species from two tree genera, Salix and Populus by constructing linkage maps in Salix and aligning them to the Populus trichocarpa genome sequence. The large number of SNP markers with known location and even distribution in the poplar genome allowed the first detailed comparison between the two genomes. We have used the physical information from the poplar genome to approximate the proportion of the willow genome covered by our maps. The size of the poplar genome has been estimated to 485 ± 10 Mb of which approximately 410 Mb has been assembled [9] and approximately 310 Mb have been assigned to linkage groups according to the poplar genome website. Our S 1 linkage map covers about 280 Mb of the poplar genome or 90% of the DNA sequence assigned to linkage groups. Most likely this fraction (the assembled and assigned genomic sequence) of the poplar genome is mostly euchromatic DNA including the majority of transcribed genes. As much as 40% of the whole poplar genome consists of repetitive elements [29] and we assume that the majority of the unassembled genome sequence is heterochromatic DNA. This means that our maps cover most of the genic regions of the genome with less coverage in the heterochromatic regions. The willow markers form 19 major linkage groups, as expected based on the haploid chromosome number of 19. Aligning the willow linkage map to the poplar genome, we find strong support for two interchromosomal rearrangements since the divergence of the two lineages. One rearrangement involve poplar LG I and willow LG Ia and Ib, and is either a result of a fusion in the poplar lineage or a fission in the willow lineage since the divergence from a common ancestor. The second rearrangement involve poplar LG I and XVI and willow LG Ib and is either a result of a fission in the poplar linage or a fusion in the willow lineage sometime since the divergence from a common ancestor. In support of our observations, Hanley et al. (2006) identified two groups in their willow map aligning to poplar LG I, but in that study, the willow LG aligning to poplar LG XVI was not linked to willow LG Ib as was the case in the present study. The correspondence between LG Ia and Ib in Hanley et al. (2006) and Ia and Ib in the present study is supported by three common markers on Ia and seven common markers on Ib. The most plausible reason why we found support for linkage of willow Ib and poplar XVI is the significantly higher number of SNP markers in the present study while the majority of the markers in Hanley et al. (2006) were AFLPs.
Three of the five inversions we found support for involved poplar LG I and XVI and willow LG Ib. It thus appears that poplar LG I has been exposed to more changes, both inter-and intrachromosomal than any other linkage group. The analysis of the poplar genome [9] revealed that poplar LG I is a result of multiple rearrangements involving three tandem fusions, rearrangements that seem to have taken place before the evolution of modern poplar species as revealed by colinear genetic maps among multiple poplar species [9]. These rearrangements possibly took place during the genome-wide reorganization and diploidization following the whole-genome duplication (the Salicoid event) that took place before the divergence of Salix and Populus but after the split from Arabidopsis [9]. So, it seems that poplar LG I is the result of several changes occurring before the split of willow and poplar, and our results show that this linkage groups has been subjected to an elevated rate of rearrangements also after the divergence of Salix and Populus. One distinct feature of poplar LG I is that it contains significantly less euchromatin than any of the other 18 chromosomes, hence it is richer in heterochromatin and possibly repetitive DNA [9]. Since it is known that repetitive elements can trigger rearrangements this is one possible explanation to the many rearrangements involving LG I. It is also by far the largest LG in the willow map (279 cM). The same applies to its physical length, poplar I and XVI is together at least 42 Mb long (according to the poplar genome), which is much longer than any other poplar LG, in fact the second largest (LG II) is only~23 Mb long (also according to the poplar genome sequence).
In addition to the above mentioned major rearrangements, seven markers had positions in willow and poplar that may reflect syntenic disparities between the genomes. They likely represent six different events of rearrangements because in two instances, markers are located next to each other and most likely reflect the same rearrangement events (II-23_sa and II-37_sa_pI are located next to each other). In summary, a surprisingly small numbers of markers show evidence for rearrangements between willow and poplar, most likely reflecting the stability of the willow and poplar genomes.
Using a divergence time of 45 my and counting the number of rearrangements between the poplar physical map and willow linkage maps (one fission, one fusion, five intrachromosomal inversions and six single markers that were located on different linkage groups in willow and poplar), we end up with a rate of 0.14 rearrangements/my/genome. Comparisons of these figures with estimates from other species should be done with caution as they depend strongly on the resolution of the map as well as the accuracy of the divergence estimates. With this in mind we can conclude that the rate of rearrangements in willow and poplar is at the lower end compared to estimates in other plant species, (0.17 -0.6 rearrangements/My) [30,31]. The relatively few chromosomal rearrangements and greatly conserved gene order makes the use of poplar genomic resources for genetic and genomic works in willow very promising. This is of particular relevance for the identification of genes in willow QTLs with the help of the poplar genome.

Sequence analysis of orthologus and paralogous loci in willow and poplar
Pairwise substitution rates in introns, K i , between orthologous and paralogous loci in willow and poplar were estimated in 115 gene fragments covering a total of 47966 nucleotides. K i between willow and poplar orthologs were approximately three times lower (median: 0.12) than between paralogs (willow -poplar paralogs: 0.42, poplar paralogs: 0.37). The two latter estimates did not differ significantly, suggesting that the two lineages have similar evolutionary rates since their divergence. The large difference in substitution rates between orthologs and paralogs was unexpected as the genome duplication and the divergence of the two genera have been suggested to have occurred within a short time span some 60 -65 mya [9,32]. Our data clearly dispute this conclusion and instead suggest that the genome duplication occurred much earlier than the divergence of the two lineages.
According to interpretations of available fossil data, both Salix and Populus occur in middle Eocene sediments that are around 45 million years old [7,8]. Using this time as a proxy for the separation of the two lineages and K i = 0.12, the rate of substitution (r) was estimated to 1.28 × 10 -9 per site and year (T = K i /2r). Assuming that this substitution rate has been constant since the Salicoid duplication yields an estimated date of the genome duplication some 150 mya. This date is obviously much older than previous estimates, and is not consistent with other data showing that the Salicoid duplication is not shared with Arabidopsis [9]. Available data suggest that the lineages leading to Arabidopsis (Eurosids II) and Salicaceae (Eurosides I) separated 100 -120 mya [33]. Assuming instead that the duplication occurred soon after the divergence of Eurosids I and Eurosids II at 110 mya yields an estimate of the divergence between Salix and Populus of around 40 mya. Considering the uncertainty in the fossil record of Salicaceae [8,34] such a date is perhaps not inconceivable. Dating of duplication and divergence events is notoriously difficult due to uncertainties both in the fossil data and substitution rates. Acknowledging these uncertainties, our data still suggests that the Salicoid genome duplication occurred considerably earlier than previously suggested. In a recent study, the poplar genome duplication was estimated to have occurred 45 mya, although it was acknowledged this was an underestimation of the timing due to the much slower substitution rates in poplar compared to other species [35].
Our data indicate substitution rates in Salicaceae between 1.28 × 10 -9 and 1.68 × 10 -9 per site and year depending on choice of calibration date, which is close to estimates for conifers (0.7 -1.3 × 10 -9 [36]), but considerably lower than for annual plants with a reported range of 5 -33 × 10 -9 [37][38][39]. These results are in line with the generally lower rates of substitutions estimated in perennials as opposed to annuals [40].

Conclusions
In this study we have developed a large number of new SNP markers in willow by using the poplar genome sequence as a reference and genotyped them in two large mapping populations. Besides providing insights into the evolution of Salicaceae genomes, our maps will play a fundamental role in our quest to identify QTLs and candidate genes for traits important for biomass production, e.g. growth related traits, resistance to pathogens, drought tolerance and phenology (bud burst and cessation of growth). This is a prerequisite for the advancement of breeding through early marker-based selection. Although the genomes of willow and poplar appear well conserved, we have identified several previously unknown genome rearrangements. In addition, we have used a large data set of non-coding DNA sequences to estimate substitution rates in Salicaceae, which were similar to estimates found in other perennials but much slower than rates found in annuals. A conclusion is that the genomes evolve slowly both at the genomic levels and at the nucleotide level. We have also shown that the Salicoid genome duplication must have taken place much earlier than the separation of the two lineages, two events that were previously thought to have occurred within a short evolutionary time. Our data also highlights the difficulties in dating ancient whole genome duplication events, and that caution is necessary when using such estimates to draw conclusion about the evolutionary consequences of polyploidy.

Plant material and DNA extraction
The mapping population S 1 with 463 F 1 offspring was created by crossing a diploid hybrid male clone ('Björn'; Salix viminalis × Salix schwerinii) to a diploid S. viminalis female clone (78183). The S. schwerinii parent to Björn originates from Siberia while the S. viminalis parent as well as 78183 originates from southern Sweden. Initially, the parental clones to S 1 were selected based on variation in phenology traits, but have also shown variation in rust resistance (Melampsora spp.) and different growth traits.
The mapping population S 3 was made by crossing the diploid S. viminalis male clone (81084) with the diploid S. viminalis female clone (78195) to produce 282 F 1 offspring. 78195 originates from Southwest of Sweden and 81084 originates from Southeast of Sweden. The parental clones to S 3 were selected based on variation in gall midge resistance.
Genomic DNA was extracted from frozen young leaves with the FastDNA Kit (MP Biomedicals) according to the protocol provided with the kit and DNA concentrations were determined with a Nanodrop spectrophotometer (Nanodrop Technologies).

SNP marker development and genotyping
SNP primers (Additional file 2) were designed to amplify fragments of genes at distances of approximately every 1 Mb on each linkage group (n = 19) of the poplar genome (The poplar genome website: http://genome.jgi-psf. org/Poptr1_1/Poptr1_1.home.html). In order to minimize amplification of multiple loci, genomic regions with little or no homology to other regions were selected. In effect, this means that single copy loci were favoured over those with multiple copies. Primers were positioned in exonic regions and designed to amplify across introns of approximately 500-1000 bp. Primers were also positioned to amplify parts of candidate genes for drought tolerance [41], rust [42][43][44] and insect resistance [45] (Additional file 1 and 2). 698 primer pairs were designed for the purpose of this study and PCR reactions were run with DNA from the parents of the mapping populations, of which 456 (65%) gave clean PCR products suitable for sequencing. 79 primer pairs were taken from Hanley et al. (2006) and tested in the parents as well, of which 70 resulted in clean PCR products. The PCR products were directly sequenced at Macrogen Inc. (Macrogen, Seoul, South Korea).
The primers were tested in each of the four parents in 15 μl volume PCR reactions containing 10 ng genomic DNA, 1 × PCR buffer II (Applied Biosystems), 2.5 mM MgCl 2 (Applied Biosystems), 0.2 mM dNTP mix (Fermentas), 0.5 μM of each primer (Invitrogen), and 0.5 U AmpliTaq Gold DNA polymerase (Applied Biosystems). The PCR reactions were run on a MyCycler thermal cycler (Bio-Rad Laboratories) with a PCR profile consisting of 10 min denaturation at 95°C followed by 35 cycles of 30 s denaturation at 95°C, 30 s annealing at 55°C and 1 min extension at 72°C with a final 10 min 72°C step. Amplification success was determined by agarose gel electrophoresis. For the first 96 primer pairs, a poplar sample (Populus trichocarpa) was also included to test if failure to amplify a product in willow was due to differences in nucleotide sequence between willow and poplar and not a result of unsuccessful primer design.
The willow PCR products were cleaned with 1 μl of a mixture of Exonuclease I (New England BioLabs) and Shrimp Alkaline Phosphatase (SAP) (Fermentas) for every 5 μl of PCR product before they were sent for sequencing of both strands with the forward and reverse PCR primers. Sequences were aligned using the DNA Baser V. 2 software. Contigs were then examined for the presence of informative SNPs for mapping. The SNPs were genotyped in the S 1 and S 3 mapping populations using the Golden Gate Assay [46] from Illumina (San Diego) at the SNP Technology Platform, Uppsala University http://www.genotyping.se. We applied a nomenclature for the SNP markers with roman letters referring to the poplar linkage group the primers were designed to amplify, followed with an arbitrary number and the letters 'sa'. In case markers were informative only in one of the mapping populations or different SNPs were used in S 1 and S 3 populations, the suffix I or III was added to the name.

Microsatellite and AFLP markers
PCR primers for microsatellite loci were taken from literature [47][48][49][50][51] and tested in the four parents (for details of the markers see Additional file 2). The PCR products were genotyped to check for levels of polymorphisms. The microsatellite loci selected for mapping were multiplexed and separated on ABI3730XL instruments at the Uppsala Genome Center. The majority of the microsatellite markers were PCR amplified by the Uppsala Genome Center, although some were amplified in-house. The in-house PCR reactions were run according to the protocol as described above for the SNP markers. The PCR reactions run at the Uppsala Genome Center were amplified in 10 μl volumes containing 10 ng genomic DNA, 1 × PCR buffer, 5% DMSO, 0.2 mM dNTP mix, 0.02 μM forward primer, 0.2 μM reverse primer, 0.2 μM M13-tail, 0.005 U Hot Star DNA polymerase and 1.25 μg BSA. The touchdown PCR profile consisted of 5 min denaturation at 95°C, six cycles of 30 s denaturation at 94°C, 30 s annealing at 58 -53°C (-1°C per cycle) and 30 s extension at 72°C, seven cycles of 30 s denaturation at 94°C, 30 s annealing at 53 -50°C (-0,5°C per cycle) and 30 s extension at 72°C, 30 cycles of denaturation at 94°C, 30 s annealing at 50°C and 30 s extension at 72°C and finally extension at 72°C for 7 min. Microsatellites were scored using the Peak Scanner software V 1.0 (Applied Biosystems). The genotypic data from an AFLP analysis was included from an earlier study on a subset of 89 individuals from the S 1 cross [21].

Map construction
Map construction was performed with the JoinMap 3.0 software [52]. Linkage maps were constructed using a LOD (logarithm of odds threshold) of 5.0 for map S 1 and a LOD of 4.0 for map S 3 to determine marker groupings. The LOD scores were selected based on the number of groups formed for each LOD score. The Kosambi mapping function was used for map construction with the following JoinMap settings: Rec = 0.4, LOD = 1.0, Jump = 5. The resulting linkage maps were drawn using the MapChart 2.1 software [53]. We did not use the ability of the software to force problematic markers on to the map and used the map resulting after the second round of map building. Mendelian segregations of the different markers were tested with chisquare analyses. We constructed S 1 maps both with and without the AFLP markers but present the map including AFLPs.
The locations of the SNP markers developed in this study (markers with prefix 'sa') on the physical map of the poplar genome were placed according to the positions of the genes where the primers were designed. Since we did not have access to the target positions in the poplar genome of the markers taken from Hanley et al. (2006), the positions of these markers on the poplar physical map were determined by BLAST searching the poplar genome with the willow gene fragments.

Database searches and Sequence analyses
BLAST searches against the Populus trichocarpa genome database on the NCBI web site were performed with the willow gene fragments that had been sequenced for the purpose of SNP identification in this study. BLAST searches were also performed with the poplar genomic sequences of the genes where the primers were positioned (paralog 1), hence the expected ortholog to the willow sequences. Usually we found two strong hits, of which we assumed the second best BLAST hit was the paralogous gene copy resulting from the Populus whole genome duplication (paralog 2). Since when designing the primers, genes or genomic regions with multiple copies were avoided as much as possible, a second BLAST hit was many times not present. When a second BLAST hit was present, the positions of the gene fragment of the willow sequence, the poplar paralog 1 and 2 in the poplar genome were compared, to determine if the expected orthologous gene fragment had been amplified in willow or, if the paralog had been amplified.
When three sequences were present for a gene-fragment; i.e. the willow sequence, the poplar paralog 1 and paralog 2, alignments were constructed using Clustal W [54] in the Alignment Explorer tool in MEGA4 using default parameters [55]. With the aim of comparing the number of substitutions in introns (K i ) between orthologs and paralogs, pairwise K i was estimated in orthologous loci between willow and poplar, paralogous loci between willow and poplar and paralogous loci in poplar using MEGA4 with the maximum composite likelihood method. When the willow gene-fragment showed higher sequence homology to poplar paralog 2 and the location in the willow map correspond to the position of poplar paralog 2, willow and paralog 2 were regarded as orthologs. Frequency distributions of K i values for each pairwise comparison are presented in Figure 3. K i values were statistically compared between the three groups by non-parametric Mann-Whitney U tests in Mini-Tab 15. Medians and means (± standard error of the means) are presented.