In this study, RAD-Seq has been used to discover, verify and genotype novel genetic markers in pedigreed Atlantic salmon. By targeting individuals of known disease resistance phenotype and genotype at a major QTL, we discovered and scored novel QTL-linked SNPs with flanking sequence. The use of pedigreed (parent and offspring) samples allowed us to examine the segregation of RAD markers and linkage patterns, and to distinguish RAD loci containing putative SNPs from those containing putative paralogous sequence variants (PSVs). The outcomes of the study include a new SNP resource for Atlantic salmon, high-coverage sequence data at sites dispersed throughout the genome, improved knowledge of a genome region harbouring a QTL of major importance to salmon aquaculture, and improved population LD-based genetic tests for resistance to IPN.
The RAD library sequence data were analysed with the RADtools pipeline. In this method, unique RAD reads are filtered on quality score and clustered into RAD loci by sequence similarity within and across individuals. Further analyses defined putative SNPs and PSVs within RAD loci and examined the segregation of alleles within these loci by recording the presence or absence of each allele in individual animals, using methods similar to those of Baxter et al. These analyses were suitable for our main goals; the thresholds we chose for defining RAD loci and for distinguishing genuine segregation patterns from fluctuations in read counts were empirically derived and conservative. Genotypes in our dataset were defined as the ‘presence’ or ‘absence’ of a RAD allele, so the RAD markers effectively acted as dominant markers. Although we did not attempt it in this study, it may be possible to use the fragment count data to differentiate homozygous from heterozygous genotypes, or to identify putative multisite variants from an excess of one particular allele. Indeed, the recently published software pipeline ‘Stacks’ also detects and genotypes SNPs in short-read sequence data, using a maximum likelihood algorithm to call heterozygous and homozygous genotypes from read counts. This software has recently been used to construct linkage maps in the spotted gar. As RAD-Seq continues to develop as a means of genotyping by sequencing, analysis pipelines are likely to become increasingly robust, standardised and automated, which will broaden their utility and improve consistency.
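To illustrate the difference between dominant presence/absence scoring and count-based genotype calling, the Python sketch below implements both in simplified form. The read-count threshold (`min_reads`), the per-read error rate (`error`) and the function names are hypothetical; the binomial likelihood is a deliberately minimal model in the spirit of maximum-likelihood calling, not the model implemented in RADtools or Stacks.

```python
import math


def call_presence(read_count: int, min_reads: int = 3) -> int:
    """Dominant presence/absence call: 1 if the allele has at least
    min_reads supporting reads in an individual, otherwise 0.
    min_reads is a hypothetical threshold, not the study's value."""
    return 1 if read_count >= min_reads else 0


def ml_genotype(count_a: int, count_b: int, error: float = 0.01) -> str:
    """Return the genotype (AA, AB or BB) with the highest binomial
    likelihood given read counts for alleles A and B.
    error is an assumed per-read sequencing error rate."""
    if count_a + count_b == 0:
        return "NA"  # no reads: no call can be made
    # Probability that a single read shows allele A under each genotype
    p_a = {"AA": 1 - error, "AB": 0.5, "BB": error}
    log_lik = {}
    for genotype, p in p_a.items():
        # The binomial coefficient is the same for every genotype, so it is omitted
        log_lik[genotype] = count_a * math.log(p) + count_b * math.log(1 - p)
    return max(log_lik, key=log_lik.get)
```

For example, 10 reads of allele A and 9 of allele B favour the heterozygote, whereas 20 reads of A alone favour AA. A practical caller would also need to flag low-coverage individuals, where a missing allele may simply reflect unsampled reads rather than true absence.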
A notable outcome of our analyses of the most frequent segregation patterns was the degree of clustering of sire-based segregation patterns (Figure 2). In one family, 82% of sire-segregating RAD markers clustered into the 58 most frequent presence/absence patterns (i.e. 29 pairs of mirror-image patterns), which matches the chromosome number of European Atlantic salmon; no similar clustering was observed in a dam-based analysis. The remaining segregation patterns may represent male recombination, but are also likely to include artefactual patterns, for example those caused by sequencing errors or by false-negative (null) allele calls arising from fluctuations in read coverage. It is well established that recombination rate is low in regions of the male salmon genome [5–8], and the current data are consistent with an absence of recombination over much of the genome sampled with SbfI in these families. In a similar analysis of dam-based linkage patterns in the diamondback moth, a species with 31 chromosomes and no recombination in females, approximately 65% of RAD markers were assigned to 31 pairs of binary patterns. The most recent salmon linkage map suggests that the differences in recombination between males and females are mainly due to the location of crossovers, which are thought to cluster generally towards the telomeres in males. Therefore, the RAD linkage clusters in the current study are likely to correspond to non-telomeric regions where male recombination is very low. Notably, in the recent study of Miller et al., the vast majority of identified SNPs in their hybrid rainbow trout populations also clustered towards the putative centromeres. However, the physical distances encompassed by these linkage clusters are unknown and may span the majority of each chromosome.
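The pairing into mirror-image patterns follows from the absence of male recombination: all alleles on one sire chromosome are transmitted to the same offspring, so an allele carried on one sire homolog yields a given presence/absence pattern across offspring and an allele on the other homolog yields its complement. A minimal Python sketch of this grouping, assuming error-free patterns written as strings with one character per offspring in a fixed order (the function names and toy patterns are hypothetical), is:

```python
from collections import Counter


def canonical(pattern: str) -> str:
    """Return one shared key for a pattern and its mirror image.
    '1' = allele present in that offspring, '0' = absent."""
    mirror = pattern.translate(str.maketrans("01", "10"))
    return min(pattern, mirror)


def cluster_patterns(patterns: list[str]) -> Counter:
    """Count markers per mirror-pair cluster."""
    return Counter(canonical(p) for p in patterns)


def fraction_in_top(patterns: list[str], n_pairs: int) -> float:
    """Fraction of markers falling into the n_pairs largest mirror-pair clusters."""
    counts = cluster_patterns(patterns)
    top = sum(count for _, count in counts.most_common(n_pairs))
    return top / len(patterns)


# Toy data for six offspring: "110010" and "001101" are mirror images
toy = ["110010", "001101", "110010", "101010", "010101", "111000"]
print(cluster_patterns(toy))
print(fraction_in_top(toy, 2))
```

Because real patterns contain sequencing errors and coverage-related nulls, a practical version would need to tolerate a small number of mismatches rather than requiring exact matches.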
Our segregation data are based on analyses of two families containing 14 offspring each, and further insight into recombination patterns between the RAD markers will require construction of a linkage map in larger families using the SbfI RAD markers.
In both the QTL analysis and the bi-allelic segregation pattern analysis, there were notably fewer RAD markers in family 2 than in family 1. Because the sequencing technology used for the two families' libraries differed somewhat, we examined the quality scores and their drop-off by read position for both families. Family 1 offspring had marginally better average sequence quality scores than family 2, but the number of RAD alleles defined, the number of RAD alleles per locus, and the number of SNPs were all reasonably consistent between the two families (Figure 1 and Table 2). In the overall unfiltered RAD dataset, family 1 had 6,594 RAD alleles showing sire segregation patterns and 5,985 showing dam segregation patterns, versus 5,491 and 7,038, respectively, in family 2. Given that a sire-segregation pattern requires the sire to be heterozygous and the dam to be homozygous for absence of the allele, the differences could reflect greater homozygosity in the family 2 sire and/or greater heterozygosity in the family 2 dam. However, the inbreeding coefficient was zero for all four parents of these families, making substantial differences in homozygosity unlikely.
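The link between parental genotype and segregation class can be made explicit for a dominant (presence/absence) RAD allele. The sketch below, with hypothetical function names, classifies an allele from the parents' and offspring's presence calls, and provides an exact two-sided binomial test of the 1:1 offspring ratio expected when one parent is heterozygous and the other lacks the allele. It is an illustration only, not the empirically derived thresholds used in the study.

```python
from math import comb


def classify(sire: int, dam: int, offspring: list[int]) -> str:
    """Classify a dominant RAD allele (1 = present, 0 = absent) by the
    parent in which it segregates."""
    n_present = sum(offspring)
    if not 0 < n_present < len(offspring):
        # Present in all or no offspring, e.g. a homozygous-present sire
        return "non-segregating"
    if sire == 1 and dam == 0:
        return "sire"  # sire heterozygous, dam homozygous absent: 1:1 expected
    if sire == 0 and dam == 1:
        return "dam"  # dam heterozygous, sire homozygous absent: 1:1 expected
    if sire == 1 and dam == 1:
        return "both"  # both parents heterozygous: 3:1 expected
    return "inconsistent"  # absent in both parents yet present in some offspring


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    no more likely than the observed count k out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties are counted
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))
```

With only 14 offspring per family, as here, the binomial test can detect only strong departures from 1:1; 7 of 14 offspring carrying the allele, for example, gives p = 1.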
Some dissimilarity in RAD marker clustering was also observed between the two families. For example, the top-ranked linkage cluster (ranked by number of observed markers) in family 2 was only the 18th largest cluster in family 1, and RAD alleles in linkage cluster 21 in family 1 were split over two linkage clusters in family 2 (Table 4). While these observations may be due to technical bias, they may also indicate real differences in the rate and/or position of chiasma formation between the two male parents. The extent and pattern of tetravalent pairing in male salmonids, and the resulting residual tetrasomic effects, are considered to be influenced by the degree of similarity among the chromosome complements of individuals. Aberrant segregation is thought to be more common in genomes from crosses between genetically divergent individuals. Furthermore, Robertsonian polymorphisms have been observed between and within Atlantic salmon populations, with the diploid (2n) chromosome number thought to vary between 56 and 58. Therefore, genetic heterogeneity, including possible karyotypic differences within the farm strain, could explain some of the differences between the families.
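Differences such as one family-1 cluster being split across two family-2 clusters can be found systematically by cross-tabulating the cluster labels of RAD alleles scored in both families. A minimal Python sketch, in which the function name, allele identifiers and cluster labels are all hypothetical, is:

```python
from collections import defaultdict


def split_clusters(fam1: dict[str, int], fam2: dict[str, int]) -> dict[int, set[int]]:
    """For RAD alleles scored in both families, return each family-1 cluster
    whose alleles fall into more than one family-2 cluster.
    fam1 and fam2 map an allele identifier to its linkage-cluster label."""
    targets = defaultdict(set)
    for allele in fam1.keys() & fam2.keys():  # alleles shared by both families
        targets[fam1[allele]].add(fam2[allele])
    return {cluster: hits for cluster, hits in targets.items() if len(hits) > 1}
```

For example, with fam1 = {"r1": 1, "r2": 1, "r3": 1, "r4": 2} and fam2 = {"r1": 7, "r2": 7, "r3": 8, "r4": 9}, the function returns {1: {7, 8}}, flagging family-1 cluster 1 as split. Only alleles shared between families are informative, so the comparison depends on how many RAD alleles segregate in both sires.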
A subset of the QTL-linked SNP markers was genotyped at the population level and assessed for linkage and association with IPN mortality. Previous studies by our group and by Moen et al. mapped the IPN-resistance QTL to a region of linkage group 21, with confidence intervals of 10 cM and 3 cM, respectively. In the current study, the genotyped QTL-linked RAD SNPs were spread across a large region of our linkage map (37.6 cM), and the QTL confidence interval was narrowed marginally to 2 cM. In the study of Moen et al., microsatellite marker haplotypes showing population-level association with IPN mortality were identified by establishing the phase relationship between the QTL allele and the marker haplotype in QTL-heterozygous parents. However, several different marker haplotypes were associated with a particular QTL allele, which hinders the practical application of population LD-based selection. Here we demonstrate that a RAD-derived SNP (RAD_HT_01) and a previously published SNP (SSA0139ECIG) show highly significant population-level association with IPN mortality, implying strong LD between these SNPs and the QTL in the Landcatch Natural Selection broodstock population. We do not know how physically close these SNPs are to the causal mutation underlying the QTL, and the level of LD is likely to vary between populations. The short timescale and cost-efficiency of our RAD-Seq approach highlight its utility for generating QTL-linked markers and for fine-mapping. Additional QTL-linked RAD markers could be generated by using a different restriction enzyme, and the approach used here could also be applied to map loci affecting other economically important traits.
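Population-level association between a SNP and IPN mortality can be illustrated with a genotype-by-outcome contingency test. The Python sketch below computes a Pearson chi-square test for a 3 × 2 table of genotype (AA, AB, BB) against outcome (dead, survived). The function name and any counts supplied to it are hypothetical, and this section does not state which test the study used; a trend test, or a model accounting for family structure, may be more appropriate for real data.

```python
import math


def genotype_association(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 3 x 2 table of
    SNP genotype (rows: AA, AB, BB) by outcome (columns: dead, survived).
    With (3 - 1) * (2 - 1) = 2 degrees of freedom, the chi-square survival
    function reduces to exp(-x / 2), so no statistics library is needed."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genotype and outcome
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2)  # exact only for 2 degrees of freedom
    return chi2, p_value
```

The closed-form p-value holds only for this 3 × 2 layout, and the test assumes that every genotype class is observed (no zero row totals) with adequate expected counts in each cell.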