RAD sequencing and representation of the B. napus genome
In our study, 113,221 RAD clusters were sequenced (Table 1). Each RAD cluster was 102 bp long after trimming, and only a very limited number of pairs of KpnI restriction sites occur within 102 bp of each other, so the clusters are essentially non-overlapping. Our results therefore indicate that we obtained sequence information from eight genotypes for about 1% of the B. napus genome. To date, comparative sequencing of such a large part of the B. napus genome has not been performed in more than two genotypes.
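The figure of about 1% can be checked with simple arithmetic. The sketch below is illustrative only: the B. napus genome size is not stated here, so it is approximated (an assumption) as the sum of the B. rapa and B. oleracea genome sizes quoted later in this section. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the genome representation.
N_CLUSTERS = 113_221      # RAD clusters sequenced (Table 1)
CLUSTER_LEN_BP = 102      # cluster length after trimming

# ASSUMPTION: the B. napus genome is approximated as the sum of the
# B. rapa (about 529 Mbp) and B. oleracea (599-696 Mbp) genome sizes.
B_RAPA_MBP = 529
B_OLERACEA_MBP_RANGE = (599, 696)


def sequenced_bp(n_clusters: int, cluster_len_bp: int) -> int:
    """Total sequence covered, assuming non-overlapping clusters."""
    return n_clusters * cluster_len_bp


def genome_fraction(total_bp: int, genome_mbp: float) -> float:
    """Fraction of a genome of the given size (in Mbp) that was covered."""
    return total_bp / (genome_mbp * 1_000_000)


total_bp = sequenced_bp(N_CLUSTERS, CLUSTER_LEN_BP)
for c_genome_mbp in B_OLERACEA_MBP_RANGE:
    genome_mbp = B_RAPA_MBP + c_genome_mbp
    fraction = genome_fraction(total_bp, genome_mbp)
    print(f"assumed genome size {genome_mbp} Mbp: {fraction:.2%} covered")
```

Under either assumed genome size, the covered fraction stays close to 1%.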
To further increase the genome representation achieved by RAD sequencing in B. napus, paired-end sequencing might be useful. Sequencing libraries constructed with restriction enzymes that cut more frequently than KpnI would also result in a higher genome representation. Alternatively, several libraries prepared with different restriction enzymes could be sequenced on separate lanes of a flow cell to improve the read yield. The latter two approaches have been shown to be useful in sorghum, and they explain why a substantially larger percentage of the genome could be targeted by RAD sequencing in that study. However, when more than one enzyme is used and several libraries are sequenced, the need for enzyme-specific adapters increases the total costs. This problem can be circumvented by double digest RAD sequencing, in which DNA is digested with different restriction enzymes simultaneously.
Genome-wide distribution of RAD polymorphisms
The RAD clusters detected in our study for B. napus were tested for their presence in the known B. rapa genome sequence. We found only about one third of the RAD clusters in the B. rapa sequence, and slightly fewer in the B. rapa chromosome data (Figure 6(a)). The B. rapa genome has been reported to have a size of about 529 Mbp [15, 16], whereas the B. oleracea genome has been described as larger, namely 599-696 Mbp [1, 15]. Hence, more RAD clusters are expected to map to the B. oleracea genome than to the B. rapa genome, provided that the G/C content, which influences the number of restriction fragments and consequently the number of RAD clusters, is similar in the two species. Earlier studies indicate that this is the case, reporting a G/C content of 35.4% in B. rapa and 36.0% in B. oleracea. Despite the aforementioned genome size of B. rapa, the published B. rapa reference sequence has a size of only 284 Mbp. We therefore conclude that a considerable part of the B. rapa genome is not included in the reference sequence, which, together with the smaller genome size compared with B. oleracea, explains the fraction of RAD clusters we found in the B. rapa sequence. In addition, the BLAST searches might have been affected by matches against low-complexity regions.
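The two size effects named above can be put into rough numbers. The following sketch is not a calculation from the study: it assumes, purely for illustration, that RAD sites are spread uniformly over the A and C genomes and that the B. napus genome is the sum of the two diploid genome sizes. The observed fraction of about one third need not match this crude figure, since neither restriction sites nor assembled regions are likely to be uniformly distributed. All names are hypothetical.

```python
# Crude expectation for the share of RAD clusters found in the B. rapa reference.
REFERENCE_MBP = 284                  # published B. rapa reference sequence
B_RAPA_MBP = 529                     # reported B. rapa genome size
B_OLERACEA_MBP_RANGE = (599, 696)    # reported B. oleracea genome size range


def reference_completeness(reference_mbp: float, genome_mbp: float) -> float:
    """Share of the B. rapa genome contained in the reference sequence."""
    return reference_mbp / genome_mbp


def expected_fraction_in_reference(reference_mbp: float,
                                   a_genome_mbp: float,
                                   c_genome_mbp: float) -> float:
    """Expected share of B. napus RAD clusters hitting the reference.

    ASSUMPTION: uniform RAD site density over the A and C genomes, and
    B. napus genome size equal to the sum of the two diploid genomes.
    """
    return reference_mbp / (a_genome_mbp + c_genome_mbp)


completeness = reference_completeness(REFERENCE_MBP, B_RAPA_MBP)
print(f"reference covers {completeness:.1%} of the B. rapa genome")
for c_mbp in B_OLERACEA_MBP_RANGE:
    expected = expected_fraction_in_reference(REFERENCE_MBP, B_RAPA_MBP, c_mbp)
    print(f"C genome {c_mbp} Mbp: expected share in reference {expected:.1%}")
```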
We observed a uniform distribution of RAD clusters across the B. rapa chromosomes (Figure 7). This finding suggests that a similar distribution of RAD clusters can be expected for the chromosomes of the C genome, whose sequence is unknown. This in turn suggests that polymorphisms detected from RAD clusters are also uniformly distributed across the B. napus genome, which makes them an important resource not only for GWAS but also for other applications such as high-resolution linkage mapping.
Polymorphism detection and genotyping
In the eight B. napus inbreds examined, we observed a total of 20,835 SNPs and 125 InDels across 113,221 RAD clusters (Table 1). Because 82 bp (positions 7-88) of each RAD cluster were considered for SNPs and 74 bp (positions 7-80) for InDels, SNPs were detected from a total of 9,284,122 bp and InDels from a total of 8,378,354 bp. Hence, we found one SNP every 446 bp and one InDel every 67,027 bp in a very diverse set of B. napus inbreds. Westermeier et al. studied six B. napus winter oilseed rape varieties and observed a higher polymorphism frequency of one SNP every 247 bp. This difference might be due to sampling effects, because that study investigated candidate sequences with a total length of only 21.4 kb of the B. napus genome. However, with one InDel every 3,583 bp, Westermeier et al. observed a drastically higher InDel frequency. This is because the relatively short reads of 102 bp in our study, combined with the lack of a reference sequence, provide little power for the detection of InDels.
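The polymorphism densities reported above follow directly from the cluster count, the window of positions considered per cluster, and the polymorphism counts. A minimal sketch of that arithmetic, with hypothetical function names:

```python
# Reproduce the surveyed sequence length and polymorphism density.
N_CLUSTERS = 113_221
N_SNPS = 20_835
N_INDELS = 125


def window_length(first_pos: int, last_pos: int) -> int:
    """Number of positions in an inclusive window, e.g. 7-88 gives 82."""
    return last_pos - first_pos + 1


def surveyed_bp(n_clusters: int, window_bp: int) -> int:
    """Total number of base pairs screened across all clusters."""
    return n_clusters * window_bp


def bp_per_polymorphism(total_bp: int, n_polymorphisms: int) -> float:
    """Average spacing between polymorphisms in base pairs."""
    return total_bp / n_polymorphisms


snp_bp = surveyed_bp(N_CLUSTERS, window_length(7, 88))
indel_bp = surveyed_bp(N_CLUSTERS, window_length(7, 80))

print(f"SNPs:   {snp_bp:,} bp screened, "
      f"one per {bp_per_polymorphism(snp_bp, N_SNPS):.0f} bp")
print(f"InDels: {indel_bp:,} bp screened, "
      f"one per {bp_per_polymorphism(indel_bp, N_INDELS):.0f} bp")
```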
Trick et al. estimated the overall sequence polymorphism rate between the transcriptomes of the two cultivars ‘Tapidor’ and ‘Ningyou 7’ to be one SNP per 2,130 bp based on a minimum read depth of eight, or one SNP per 1,195 bp based on a minimum read depth of four. The lower SNP frequency in that study compared with ours is most likely due to the SNPs having been derived from the less polymorphic coding regions. Moreover, Trick et al. investigated only two inbreds, which leads to an underestimation of the SNP frequency in the species.
Characterization of polymorphisms
The overall representation of transitions (58.2% of all SNPs; of these, 49.7% A/G and 50.3% C/T) and transversions (41.8% of all SNPs; of these, 26.5% A/C, 29.7% A/T, 17.0% C/G, and 26.8% G/T; Figure 4) that we observed was in good accordance with that detected by Barchi et al. in eggplant through RAD sequencing (transitions: 49.7% (A/G), 50.3% (C/T); transversions: 24.0% (A/C), 28.5% (A/T), 19.9% (C/G), 27.6% (G/T)). Both studies showed a preponderance of transitions over transversions, which has been observed earlier in various species [20, 21]. However, the ratio of total transitions to transversions was lower in our work (1.39) than in the study by Barchi et al. (1.65). On the other hand, in a study on DNA polymorphism in B. rapa, more than 21,000 SNPs were discovered and characterized in eight diverse genotypes, and a transition/transversion ratio of 1.03 was observed. We therefore conclude that the ratio we found fits into the range of observations from comparable studies. The aforementioned research furthermore found a transition/transversion ratio of 1.03 in exons and introns versus a ratio of 1.63 in exons only. Hence, our observation of a lower ratio in the B. rapa sequence data (1.45) than in the coding sequence data (1.60) is in line with earlier findings for B. rapa.
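For readers who want to reproduce such figures, the sketch below classifies biallelic SNPs into transitions (A/G, C/T) and transversions and computes their ratio. It is an illustration with hypothetical function names and a made-up toy SNP list, not the pipeline used in this study. The last lines recompute the ratio from the percentages given above.

```python
# Classify SNPs as transitions or transversions and compute the Ts/Tv ratio.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")


def is_transition(allele_a: str, allele_b: str) -> bool:
    """True for purine/purine (A/G) or pyrimidine/pyrimidine (C/T) changes."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b or a not in BASES or b not in BASES:
        raise ValueError(f"not a valid biallelic SNP: {allele_a}/{allele_b}")
    return frozenset((a, b)) in TRANSITIONS


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for (allele, allele) pairs."""
    n_ts = sum(1 for a, b in snps if is_transition(a, b))
    n_tv = len(snps) - n_ts
    return n_ts / n_tv


# Toy input, invented for illustration only: three transitions, two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
print(f"toy Ts/Tv ratio: {ts_tv_ratio(toy_snps):.2f}")

# Ratio implied by the percentages reported in the text.
reported_ratio = 58.2 / 41.8
print(f"reported Ts/Tv ratio: {reported_ratio:.2f}")
```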
The BLAST search against the UG set allowed us to identify the fractions of unigenes with RAD clusters and polymorphisms. Because a large sample size was useful for this search, we refrained from differentiating between the B. napus A and C genomes in this part of the study. Consequently, the BLASTX search for each UG against the UniProtKB/Swiss-Prot dataset was also based on the B. napus data. We observed a tight correlation between the GO term representations of all UG and those of UG with RAD clusters and polymorphisms, except for an apparent over- and underrepresentation of GO terms for UG with InDels (Figure 8). This exception is due to the small number of InDels detected in this work. Hence, our results revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category.
Verification of polymorphisms
In our study, 26 out of 31 SNPs (84%) were verified to be polymorphic according to the Sanger sequencing information. However, four of the five non-polymorphic SNPs came from one specific RAD cluster. If this RAD cluster were disregarded, the percentage of verified polymorphisms would be close to 100%. It is not obvious why the SNPs from this RAD cluster could not be validated. The RAD cluster was based on a low number of reads (data not shown), but so were other RAD clusters from which SNPs could be validated. A possible reason is that this RAD cluster comes from a region that is similar between the B. rapa and B. oleracea genomes. The Illumina sequencing approach does not allow the reads to be assigned to the two different genomes, and thus hemi-SNPs were considered as SNPs. In contrast, Sanger sequencing targeted only the SNPs in one of the genomes. However, this hypothesis requires further research.
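The verification rates can be recomputed as follows. The second figure rests on an assumption that the text does not confirm: that the problematic RAD cluster contributed exactly the four unverified SNPs and no verified ones. The function name is hypothetical.

```python
# Verification rate of SNPs by Sanger sequencing.
N_TESTED = 31
N_VERIFIED = 26
N_FAILED_IN_ONE_CLUSTER = 4   # unverified SNPs from the single problematic cluster


def verification_rate(n_verified: int, n_tested: int) -> float:
    """Share of tested SNPs confirmed as polymorphic."""
    return n_verified / n_tested


overall = verification_rate(N_VERIFIED, N_TESTED)

# ASSUMPTION: the problematic cluster held only the four unverified SNPs,
# so removing it drops four tested SNPs and no verified ones.
without_cluster = verification_rate(N_VERIFIED, N_TESTED - N_FAILED_IN_ONE_CLUSTER)

print(f"all clusters:               {overall:.1%}")
print(f"without problematic cluster: {without_cluster:.1%}")
```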
The high correlation (ρ=0.92) between MRD estimates based on SSR markers and those based on RAD polymorphisms for pairs of the eight inbreds examined in our study (Figure 5) indicates that the RAD polymorphisms identified are likely to be true polymorphisms. This observation is strongly supported by the Sanger sequencing verification of 31 SNPs, in which only 13.1% of the inbred-allele combinations observed for the 31 SNPs did not agree. Trick et al. also validated candidate SNPs in their study. Out of nine SNPs that had previously been PCR amplified, eight had been called as hemi-SNPs according to the transcriptome sequencing data. Four of the hemi-SNPs (44.4%) were confirmed, four (44.4%) were uninformative, and the ninth putative SNP (11.1%) was contradicted. Moreover, Trick et al. found eight out of nine SNPs (88.9%) in the aligned regions to be polymorphic in both data sets. Therefore, the proportion of validated polymorphic RAD clusters and inbred-allele combinations in our study indicates a good rate of correctly called polymorphisms.