Next-generation sequencing provides new opportunities for studying the genomics of non-model organisms. For recently diverged taxa connected by gene flow, such as the two subspecies of the willow warbler, genomic differences are expected to be few but informative about adaptive divergence [28, 29]. On the other hand, detecting these differences could require a dense set of markers. The sequencing of a willow warbler transcriptome has vastly increased the genomic resources available for this species. Previous studies on the same populations have used a small number of coding genes or microsatellites [14, 15, 30] or anonymous AFLPs to study genetic divergence between the subspecies. With the use of the zebra finch genome for annotation, this data set provides a large amount of sequence data that could be associated with particular genes and gene features. As such, it will be a useful resource for future research by highlighting potentially interesting genes or genomic regions and by aiding the design of primers surrounding sequence polymorphisms.
In this study, the vast majority of reads (84%) were matched to the zebra finch genome, but the applicability of cross-species genome mapping depends on the similarity between the genomes and is not expected to work equally well among different non-model species. The specific karyotype of the willow warbler has not been determined and it is not known how the arrangement of genes within and between chromosomes differs between the willow warbler and zebra finch genomes. Within the order Passeriformes (passerines), to which both the willow warbler and zebra finch belong, a remarkable conservation of genome structure has been reported, with both gene order and gene content on chromosomes being largely unchanged even between distantly related species [31–33]. It therefore seems plausible that the zebra finch genome is a good model for the genome of the willow warbler. However, it is not necessarily identical, since chromosomal re-arrangements have been observed between different species of passerines [31, 32] and a neo-sex chromosome is suggested to have arisen in a subgroup of passerines presumably including the willow warbler. On the other hand, these types of large-scale differences in genome organization have little impact on positions within or in the immediate vicinity of genes and therefore do not cause major differences in annotation at the gene level. Another possibility is that some coding genes found in the willow warbler genome are not present in the zebra finch genome. Of particular concern are genes underlying traits associated with migration, since the zebra finch does not show behavioural adaptations (e.g. migratory restlessness and hyperphagia) that are observed among long-distance avian migrants. However, migratory behaviour has been characterized as a threshold trait in which a non-migratory phenotype is switched into a migratory phenotype when the combined effects of multiple genes reach a threshold [35, 36]. As a threshold trait, it is possible for variation responsible for migratory behaviour to be maintained in a population consisting of only resident individuals. It is therefore likely that migration genes are also found in the zebra finch genome even if they do not manifest as a distinct migratory phenotype in contemporary zebra finch populations.
As expected from the enrichment of mature mRNA in the sample preparation steps, the greatest sequencing depth was observed in exons (Table 2). Unexpectedly, however, alignments in intronic features were common and, compared to alignments in exons, spread out over more positions in total in the zebra finch genome. This could be explained by an imperfect enrichment step that left unspliced mRNA in the sample. More than half of the positions in the zebra finch genome with aligned willow warbler reads were situated in intergenic regions, but the sequencing depth was lower than within annotated genes. As found in a transcriptome sequencing study of the great tit Parus major, most of the intergenic positions were located close to predicted genes in the zebra finch genome (Figure 1). A possible explanation is that a majority of the positions could be situated in uncharacterized parts of UTRs. More distantly situated alignments could potentially be situated in genes that have not been annotated in the zebra finch genome. Positions in putative UTRs are of importance because they add more sequence data to genes and thus make it possible to identify more sequence polymorphisms. Additionally, in some cases they constitute the only aligned positions associated with a certain gene.
A large number of SNPs were detected in the sequence data, and these were located on all but a handful of small chromosomes. The difference in SNP density observed across features is likely to be associated with their different evolutionary constraints. For example, exonic positions had a lower density of SNPs than both intronic and intergenic positions (Table 2). Overall, the SNP density is lower than that found in a previous sequencing study of the subspecies. A likely explanation is that the sequencing depth in general is too low to include many of the rarer alleles in the pools. This explanation is supported when restricting analyses to SNPs with a greater sequencing depth. For example, if only positions with a sequencing depth of at least 60 reads are included, the SNP density is nearly doubled and more similar to previous estimates.
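The effect of a depth threshold on SNP density can be illustrated with a minimal sketch. The record layout, the function name `snp_density_per_kb` and the toy values below are hypothetical and are not taken from the data set; the numbers are invented only to make the arithmetic easy to follow. The function simply recomputes density (SNPs per kb of covered sequence) after excluding positions below a minimum depth.

```python
from typing import Iterable, NamedTuple


class Position(NamedTuple):
    """One aligned reference position (hypothetical record layout)."""
    depth: int    # number of reads covering the position
    is_snp: bool  # True if a SNP was called at the position


def snp_density_per_kb(positions: Iterable[Position], min_depth: int = 1) -> float:
    """Return SNPs per 1,000 covered positions with depth >= min_depth."""
    covered = 0
    snps = 0
    for pos in positions:
        if pos.depth >= min_depth:
            covered += 1
            snps += int(pos.is_snp)
    if covered == 0:
        return 0.0
    return 1000.0 * snps / covered


# Toy data only (invented): 1,000 shallow and 500 deep positions.
toy = (
    [Position(depth=10, is_snp=False)] * 990
    + [Position(depth=10, is_snp=True)] * 10
    + [Position(depth=80, is_snp=False)] * 480
    + [Position(depth=80, is_snp=True)] * 20
)

density_all = snp_density_per_kb(toy)                 # every covered position
density_deep = snp_density_per_kb(toy, min_depth=60)  # depth >= 60 only
```

In this toy example the density is computed over the positions that pass the filter, not over the whole reference, so the two values are comparable even though far fewer positions remain at the higher threshold.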
Due to the construction of the cDNA library it is impossible to trace the sequencing efficiency of each individual and this could bias the genetic differentiation estimate (DI) for each SNP. To obtain a less biased estimate, we filtered SNPs by their minimum sequencing depth in each of the pools. Assuming that all or most individuals in each pool have similar expression levels of transcripts, and that these pooled transcripts are randomly sequenced, increasing the sequencing depth makes it more probable that more individuals are represented among the sequence reads. However, larger expression differences between individuals would result in a higher probability for transcripts from certain individuals to be sequenced. In this case, filtering by sequencing depth may not necessarily provide a less biased estimate. Nevertheless, the general distribution of DIs of SNPs between the subspecies pools (Figure 2, Table 3) agrees with the low background differentiation and few genetic differences previously reported between the subspecies [14, 15, 19]. For example, with a minimum requirement of eight reads from each subspecies pool, only 55 out of 84,847 SNPs had a DI ≥ 0.9 (Figure 2, Table 3). Of these, four are located within a chromosome region that was previously shown to be highly differentiated between northern and southern willow warblers in Scandinavia. Eight of the 14 remaining highly differentiated SNPs that were validated were also differentiated between the subspecies in an independent set of individuals originating from southern and northern Sweden (Table 4). The difference in DI between the 454 data and the validation data was generally smaller with an increased sequencing depth in both of the pools in the 454 data set (Figure 3). This suggests that filtering SNPs by the sequencing depth provides an efficient way of accounting for sequencing bias of individuals in the pools.
The validation set included a particularly interesting SNP that is situated close to the ADCYAP1R1 gene. This gene encodes a membrane receptor that binds to the product of the ADCYAP1 gene, which has been shown to explain some of the migratory behaviour observed within and between European blackcap populations. Even though not all individuals in the validation set could be successfully genotyped for this particular SNP, there was no difference in allele frequency between the genotyped samples.
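The reasoning behind filtering by sequencing depth, discussed above, can be illustrated with a small calculation. This is a sketch under stated assumptions and not the procedure used in the study: reads at a position are assumed to be drawn independently, each coming from an individual with probability proportional to that individual's share of transcripts in the pool. The pool size, the expression weights and the function name are hypothetical. Under this model, the expected number of individuals contributing at least one of d reads is the sum over individuals of 1 − (1 − w_i)^d, where w_i is the normalised share of individual i.

```python
from typing import Sequence


def expected_individuals_represented(weights: Sequence[float], depth: int) -> float:
    """Expected number of individuals contributing at least one of `depth` reads.

    `weights` are each individual's share of transcripts in the pool and are
    normalised here to sum to 1. Reads are assumed to be independent draws.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(1.0 - (1.0 - w / total) ** depth for w in weights)


pool_size = 8                             # hypothetical pool size
equal = [1.0] * pool_size                 # similar expression in all individuals
skewed = [8.0] + [1.0] * (pool_size - 1)  # one individual dominates (assumed)

for depth in (8, 20, 60):
    print(
        depth,
        round(expected_individuals_represented(equal, depth), 2),
        round(expected_individuals_represented(skewed, depth), 2),
    )
```

With equal weights the expected representation rises quickly with depth, whereas with skewed weights it stays lower at any given depth, which is the situation in which depth filtering alone may not remove the bias.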
The identified genetic differences between southern and northern Swedish willow warblers could also reflect adaptations to different environments. The only obvious large environmental and ecological contrast within the sampling area is between the Scandinavian mountains and the rest of Scandinavia. In Sweden, only the northern subspecies occurs in the mountains and some adaptations to this drastically different environment are expected. Indeed, two alleles of an earlier identified genetic marker show a distribution that is strongly associated with the different environments. To test this possibility, we also genotyped eight individuals caught in Lithuania for SNPs that showed a moderate to high differentiation between southern and northern Swedish samples in the validation set. Willow warblers in Lithuania belong to the northern subspecies (acredula) and express the same migratory behaviour as northern birds in Scandinavia. The environment, however, is more similar to what is found at the same latitude in southern Sweden. If the highly differentiated SNPs identified in this study were associated with adaptations to the different environments of southern and northern Sweden, we would expect the Lithuanian birds to have genotypes much more similar to birds in southern Sweden. In contrast, we observed comparable levels of differentiation for SNPs between southern Swedish and Lithuanian samples as between southern and northern Swedish samples. This corroborates the hypothesis that the identified genetic differences are associated with the subspecies in general and are potentially linked to adaptations involved in their different migratory strategies.
The majority of the highly differentiated SNPs, including most of the SNPs verified to be differentiated in the validation set, are not located in the protein coding part of genes, but in intergenic regions (Additional file 2: Table S2). Although some of these SNPs themselves might be directly under divergent selection, they are more likely to be differentiated because they are in linkage disequilibrium (LD) with divergently selected variation in the closest gene or even in genes further away on the same chromosome. The requirement of a minimum sequencing depth in each of the pools used for estimating differentiation reduces both the number of SNPs within genes and the number of genes containing SNPs. For example, the almost 85,000 identified SNPs with a minimum sequencing depth of eight reads in each of the pools are located in at least 2,469 predicted genes. Including positions overlapping multiple genes and positions 2000 bp upstream and downstream of the genes, the total number of genes is increased to 3,642, which is still only a fraction of the number of genes covered by the total set of sequence data. Hence it is possible that many sequence variants are missed because of insufficient sequencing depth. In addition, LD has previously been shown to extend over several Mb in the willow warbler. Using the zebra finch genome as a basis for gene order, Lundberg et al. identified a region on chromosome 3 that was highly differentiated between willow warblers in the Scandinavian mountains and the rest of Fennoscandia. The chromosome region is significantly more differentiated than an estimated genomic background level for at least a 2.5 Mb interval in the zebra finch genome that contains at least 24 coding genes. In the present study, SNPs that were also highly differentiated in the validation set clustered in regions comprising 8.5 Mb on chromosome 1 and 2.5 Mb on chromosome 5 in the zebra finch genome (Table 4). Since the distances are based on positions in the zebra finch genome, they should only be regarded as rough approximations of those found in the willow warbler genome and in reality these SNPs might be located closer or further away from each other. Potentially large divergent chromosome regions could be formed if selection is reducing gene flow between chromosomes possessing alleles that are favourable in different environments [24, 28, 41]. Reduced gene flow could also be facilitated by an inversion, which could maintain favourable allele combinations despite gene flow [23, 42, 43].
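The way SNP positions can be associated with genes, including positions that overlap several genes or fall within 2000 bp upstream or downstream of a gene, can be sketched as follows. This is an illustrative example only: the `Gene` record, the function `genes_near_position`, the gene names and the coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and the annotation pipeline actually used may differ.

```python
from typing import List, NamedTuple, Sequence


class Gene(NamedTuple):
    """A predicted gene on a reference chromosome (hypothetical layout)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_position(
    genes: Sequence[Gene], chrom: str, pos: int, flank: int = 2000
) -> List[str]:
    """Return names of all genes whose span, extended by `flank` bp on
    both sides, contains the given position. Several genes may match."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


# Invented coordinates for illustration only.
toy_genes = [
    Gene("GENE_A", "chr1", 10_000, 15_000),
    Gene("GENE_B", "chr1", 16_500, 20_000),
    Gene("GENE_C", "chr5", 10_000, 15_000),
]

within_gene = genes_near_position(toy_genes, "chr1", 12_000, flank=0)
intergenic_strict = genes_near_position(toy_genes, "chr1", 15_800, flank=0)
intergenic_flanked = genes_near_position(toy_genes, "chr1", 15_800)
```

A position lying between two closely spaced genes is assigned to both once the flanks are included, which is one reason why the number of genes associated with SNPs increases when flanking sequence is considered.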
With the present data set it is possible that most of the large to moderate-sized divergent chromosome regions between the subspecies could be identified, but in order to detect more genomically localized differences, a denser set of markers would be required. This has been shown in a recent study of three-spined sticklebacks Gasterosteus aculeatus, in which full-genome re-sequencing identified a number of more genomically localized signals of divergent selection that had not been detected in a previous genome scan employing 45,000 SNPs originating from Restriction-site Associated DNA (RAD) tags. Another limitation of the present study design is that the low number of individuals in each of the pools only allows for detection of SNPs that are highly differentiated between the subspecies. Smaller allele frequency differences could be expected if a locus is more loosely linked to a gene under selection, if weaker divergent selection is acting on the trait or if the genetic architecture of a trait is primarily composed of many loci with small effects.
Future work will aim at validating the remainder of the highly differentiated SNPs in other individuals from the same populations of willow warblers. If these SNPs are also found to be highly differentiated in a validation set, their allelic distribution will be investigated over a larger geographical scale to determine how well it follows the distribution of the subspecies. Putative divergent chromosome regions will be more finely mapped to get a better approximation of their size and gene content. When the differentiated regions have been properly delimited, efforts should be focused on identifying the actual targets of selection. This process will be aided by the integration with data derived from a microarray expression profiling study performed on the same set of samples (Boss et al. in prep). A particularly interesting analysis would be to compare the genomic position and the functional annotation of differentially expressed genes between migrating individuals of each subspecies with those of genes found within the differentiated chromosome regions.