Whole-genome resequencing for comprehensive SNP detection
Ossowski et al. reported that 87% of the total Arabidopsis reference genome was covered by Solexa reads. However, no similar study of rice had been reported. We found that 80.1% of the rice reference genome (Nipponbare) was covered by Solexa reads derived from a closely related cultivar, Koshihikari (Table 1). Our preliminary in silico mapping indicated that 79.9% of the 35-bp split-genome sequence of Nipponbare chromosome 1 was uniquely mapped to the corresponding region of the Nipponbare genome (data not shown), showing good agreement with the genome coverage. Despite the high level of genome coverage, 372 435 contigs (56.9%) were less than 100 bp long (Figure 1); this is unavoidable because of the short reads generated by the high-throughput sequencer. The lack of coverage of some regions was caused by large changes in chromosomal structure and the resulting loss of contiguity, but most omissions were due to reads that could not be mapped because of multiple SNPs, conserved domains, repeat sequences, or small InDels. These gaps will need to be filled as sequencing technology advances.
Although we mapped the Solexa reads at an average sequencing depth of 15.7×, the depth varied among genomic regions. Small chromosomal regions that were not covered well by the Koshihikari reads were found on all chromosomes (additional file 1). A lower-depth region at 6.8 to 7.1 Mb on chromosome 8 coincided with a cluster of retrotransposons (data not shown). However, we could not fully explain the lower depth of the small chromosomal regions. Some cases might be explained by heterochromatic and repetitive regions or by chromosome segment duplication. The relationship between sequencing depth and efficacy in the comprehensive detection of SNPs is a key concern from the perspective of cost-effectiveness. Although the optimal sequencing depth depends on the study objectives, our data suggest that a sequencing depth of at least 10× would be necessary to discover genome-wide SNPs for use in haplotype analysis (additional file 2). Smith et al. reported that the redundancy resulting from increasing the sequencing depth from 10× to 15× permits accurate and cost-effective detection of polymorphisms using the Solexa analyzer. Our results support their conclusion.
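As an illustration of how regions of low depth could be flagged, the following minimal sketch averages per-base depth in fixed windows and reports windows whose mean falls below a threshold. The function name, the toy depth values, and the window size are hypothetical. Using the 10× figure as a flagging threshold is an assumption made for this example and is not the procedure used in the present study.

```python
from typing import List, Sequence, Tuple


def low_depth_windows(
    depths: Sequence[float], window: int, min_depth: float = 10.0
) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean_depth) for windows whose mean depth < min_depth.

    `depths` holds one depth value per base (0-based positions); `end` is exclusive.
    """
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth < min_depth:
            flagged.append((start, start + len(chunk), mean_depth))
    return flagged


# Toy per-base depths: a well-covered stretch, a poorly covered one, then good again.
toy_depths = [16, 15, 17, 16, 3, 2, 4, 3, 15, 16, 14, 17]
flagged = low_depth_windows(toy_depths, window=4)
print(flagged)
```

With real data, the per-base depths would come from the read alignment, and the window size would be chosen to match the scale of the regions of interest.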
Distributions of the detected SNPs between closely related rice cultivars
We detected 67 051 SNPs between Koshihikari and Nipponbare (Table 1), with an average density of 5.7 kb/SNP. In previous comparisons between indica and japonica, the densities were 0.27 kb/SNP in a 2.4-Mb region on chromosome 4 between GLA4 and Nipponbare, and 0.93 kb/SNP across the whole genome between 93-11 and Nipponbare. The density of SNPs between the Japanese cultivars in the present study was thus 1/6th to 1/21st of those in the previous reports, which may reflect differences in sequence divergence among the varietal combinations analyzed. Recently, a genome-wide resequencing analysis of 20 diverse indica and japonica cultivars demonstrated that the average SNP density between two temperate japonica cultivars, Nipponbare and M202, was 25.1 kb/SNP (about 117 Mb/4662 SNPs). Monna et al. reported a diversity analysis among nine rice accessions, including Koshihikari and Nipponbare, by means of Sanger sequencing of 1117 intergenic regions. Of these, 78 sites showed polymorphism between the two cultivars, and were distributed unevenly throughout the genome. We detected a large number of SNPs in regions that did not show any polymorphism in that analysis, indicating the potential power of the present approach for SNP discovery even among closely related rice cultivars. However, we still identified extremely low numbers of SNPs in several chromosomal regions, such as at 6.5 to 9.5 Mb on chromosome 1, at 12 to 14 Mb on chromosome 2, and at 2.5 to 3 Mb on chromosome 8 (Figure 2). It is very likely that these chromosomal regions are conserved between Koshihikari and Nipponbare, because they share a common ancestral cultivar (Norin22), and thus probably share a common chromosomal segment. This would make these regions unsuitable sources of SNPs for use as DNA markers. To overcome the uneven distribution of SNPs, we are collecting additional SNP sets to fill the gaps by whole-genome sequencing of Japanese cultivars distantly related to Koshihikari and Nipponbare. On the other hand, Wang et al. defined numerous genomic regions with low frequency of SNPs among rice cultivars as "SNP deserts", and suggested that they might be a result of rice domestication. Some of the regions with low frequencies of SNPs, such as on the distal end of the long arm of chromosome 1, at 10-12 Mb on chromosome 2 and at 10-15 Mb on chromosome 5 (Figure 2), overlap with these deserts, and may be legacies of domestication.
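The density comparisons quoted above can be checked with simple arithmetic. The short sketch below recomputes the Nipponbare/M202 density from the reported span and SNP count, and the fold differences between the present study and the two earlier indica-japonica comparisons. Variable and function names are illustrative. The "implied span" is only the product of two reported numbers and is not a figure stated in the text.

```python
def kb_per_snp(span_bp: float, snp_count: int) -> float:
    """Average spacing between SNPs, in kilobases per SNP."""
    return span_bp / snp_count / 1000.0


# Values reported in the text.
this_study_kb = 5.7          # Koshihikari vs Nipponbare
this_study_snps = 67_051
gla4_kb = 0.27               # GLA4 vs Nipponbare, 2.4-Mb region on chromosome 4
indica_9311_kb = 0.93        # 93-11 vs Nipponbare, whole genome

# Nipponbare vs M202: about 117 Mb and 4662 SNPs.
m202_kb = kb_per_snp(117_000_000, 4662)

# Fold difference in spacing between this study and the earlier comparisons.
fold_vs_gla4 = this_study_kb / gla4_kb
fold_vs_9311 = this_study_kb / indica_9311_kb

# Span implied by the reported density and SNP count (derived, not reported).
implied_span_mb = this_study_kb * this_study_snps / 1000.0

print(f"Nipponbare vs M202: {m202_kb:.1f} kb/SNP")
print(f"Spacing is {fold_vs_9311:.1f}x to {fold_vs_gla4:.1f}x wider than indica-japonica")
print(f"Implied span: {implied_span_mb:.0f} Mb")
```

The two fold values correspond to the "1/6th to 1/21st" range given for SNP density.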
Definition of the pedigree haplotype
Genome-wide SNP typing has been a powerful method for revealing the genomic constitutions of closely related lines or cultivars. Huang et al. proposed a new method of high-throughput genotyping and estimation of recombination points based on whole-genome resequencing of recombinant inbred lines. Even though the coverage of sequence reads throughout the genome was very low (only 0.02× coverage per line), their method provided useful results. However, this method can only be applied to cultivars whose genome sequences have been decoded, which makes it difficult to apply to a large set of independent lines or cultivars. To overcome this limitation, we performed a two-step analysis, with genome-wide SNP detection followed by array-based genome-wide SNP typing. Based on the relatively high sequencing depth, we detected a large number of SNPs between two cultivars (Nipponbare and Koshihikari) that are representative of the population being analyzed. We then developed a typing array consisting of 1917 SNP sites, and applied it to 151 representative cultivars from Japanese rice breeding history over the past 150 years (additional file 5). Even though the SNP density was insufficient in some regions of the chromosomes, we successfully defined distinct haplotypes.
Definition of the pedigree haplotypes enabled us to discriminate between almost all of the 151 cultivars, and to thereby gain insights into the changes in genome composition that have occurred during the history of modern rice breeding. Our analysis revealed that these breeding practices have clearly simplified the genome composition. Furthermore, we clearly demonstrated that the Koshihikari genome is dominant in certain regions of the chromosomes in the most recently bred cultivars. This dominance by the Koshihikari genome is not surprising, because Koshihikari has frequently been used as a parental line during cultivar development (additional file 5, Figure 3B). This information will be useful to guide future cross-breeding using recent Japanese cultivars, since it will allow breeders to avoid redundant haplotypes in crossing designs and will facilitate the selection of genotypes in the progeny of each cross combination.
We successfully identified 18 consensus haplotype blocks longer than 1 Mb in Koshihikari, Hitomebore, Akitakomachi, and Hinohikari (Figure 3C), which together account for about 65% of the total area of rice cultivation in Japan. Furthermore, these haplotype blocks could be assigned to genomic regions of the ancestral landraces. These highly conserved regions are the consequence of selection during modern rice breeding, and may be an essential identifying characteristic of recent Japanese rice cultivars. We have not yet developed any annotations for these haplotype blocks. From a biological point of view, it will be necessary to clarify the relationship between haplotype conservation and recombination frequency or physical structure of chromosomes. From a rice breeding point of view, it will be necessary to clarify the association between particular haplotypes and phenotypic differences. We have recently developed reciprocal chromosomal-segment-substitution lines between Nipponbare and Koshihikari. These plant materials will be invaluable for the genetic analysis of phenotypic traits that are of economic interest.
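One way to picture the search for consensus haplotype blocks is as a scan for runs of consecutive SNP sites at which all cultivars of interest carry the same allele, keeping only runs that span more than 1 Mb. The sketch below is a minimal illustration under that assumption. The function name, the allele coding ("A"/"B"), the toy positions, and the choice to measure a block from its first to its last SNP are all hypothetical, and the block definition used in the present study may differ.

```python
from typing import Dict, List, Sequence, Tuple


def consensus_blocks(
    positions: Sequence[int],
    genotypes: Dict[str, Sequence[str]],
    min_len_bp: int = 1_000_000,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) of runs of SNPs shared by all cultivars.

    `positions` are sorted SNP coordinates on one chromosome; each entry of
    `genotypes` holds one allele call per SNP, in the same order.
    """
    calls = list(genotypes.values())
    n = len(positions)
    # True where every cultivar carries the same allele at that SNP.
    shared = [len({g[i] for g in calls}) == 1 for i in range(n)]

    blocks = []
    start = None
    for i, same in enumerate(shared):
        if same and start is None:
            start = i                      # a new run begins here
        if start is not None and (not same or i == n - 1):
            end = i if same else i - 1     # last SNP that still belongs to the run
            if positions[end] - positions[start] > min_len_bp:
                blocks.append((positions[start], positions[end]))
            start = None
    return blocks


# Toy data: seven SNPs on one chromosome, four cultivars.
toy_positions = [100_000, 400_000, 900_000, 1_500_000, 1_600_000, 2_000_000, 3_200_000]
toy_genotypes = {
    "Koshihikari": "AAAABAA",
    "Hitomebore": "AAAAAAA",
    "Akitakomachi": "AAAABAA",
    "Hinohikari": "AAAAAAA",
}
blocks = consensus_blocks(toy_positions, toy_genotypes)
print(blocks)
```

In this toy example the fifth SNP differs among the cultivars and splits the chromosome into two shared runs, each longer than 1 Mb.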
Changes in haplotype diversity during modern rice breeding
It has been reported that genetic diversity, based on the estimated genetic distances among several rice ecotypes, is decreasing in modern rice cultivars . We clearly demonstrated that the haplotype number has been decreased by breeding practices (Figure 4), supporting this idea. In barley, Rostoks et al.  analyzed chromosomal haplotype diversity using 1524 SNPs based on genome-wide expressed sequence tag information. They suggested that chromosome 6H is highly diverse throughout the cultivar group. In rice, chromosome 2 showed a higher level of diversity than the other chromosomes (Figure 4 and additional file 9). Interestingly, rice chromosome 2 has a syntenic relationship with barley chromosome 6H. There is limited information on the association between genes located in those genomic regions and phenotypic variation in traits of agronomic or economic interest. Thus, it is difficult to speculate about the source of this high level of diversity. This observation may nonetheless provide a new opportunity to investigate whether the same form of selective sweep during breeding has occurred in two or more related crop species such as rice and barley.
Genetic diversity was relatively low (haplotype number <0.1) in several chromosomal regions in all three groups, such as at 2.6 Mb on chromosome 1, at 4.2 Mb on chromosome 4, at 23.5 Mb on chromosome 7, at 15.2 Mb on chromosome 8, at 6.9 Mb on chromosome 11, and at 6.8 Mb on chromosome 12 (Figure 4). This suggests that there may have been genetic bottlenecks during differentiation of the Japanese landraces. It will be interesting to learn whether this low level of genetic diversity is present in those genomic regions in other japonica rice accessions. It would also be interesting to introgress segments from the genomes of distantly related cultivars into these regions during breeding. In contrast, relatively highly variable regions (haplotype number >0.4) were identified at 19.5 Mb on chromosome 10 and at 2.2 Mb on chromosome 11. Because of their high variability, these regions are not likely to have been adversely affected by selection pressure during breeding. We therefore hypothesize that there are no genes of economic value in these chromosomal regions. Alternatively, intense recombination might have occurred in these chromosomal regions during breeding.
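To make the thresholds above concrete, the following sketch classifies a window of SNPs as low or high in diversity. It assumes, purely for illustration, that the haplotype number is the count of distinct haplotypes in the window divided by the number of cultivars; the exact normalization used for Figure 4 is not restated here and may differ. The function names and the toy genotypes are hypothetical.

```python
from typing import Sequence


def normalized_haplotype_number(genotypes: Sequence[str], start: int, width: int) -> float:
    """Distinct haplotypes in SNPs [start, start + width) divided by cultivar count.

    This normalization is an assumption made for the example.
    """
    haplotypes = {g[start:start + width] for g in genotypes}
    return len(haplotypes) / len(genotypes)


def classify(value: float, low: float = 0.1, high: float = 0.4) -> str:
    """Label a window using the <0.1 and >0.4 thresholds quoted in the text."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "intermediate"


def toy_variable_window(i: int) -> str:
    """Four-SNP pattern that cycles through ten distinct haplotypes."""
    return format(i % 10, "04b").replace("0", "A").replace("1", "B")


# Toy data: 20 cultivars, 8 SNPs each. The first four SNPs are identical in
# every cultivar; the last four cycle through ten different patterns.
toy_genotypes = ["AAAA" + toy_variable_window(i) for i in range(20)]

conserved = normalized_haplotype_number(toy_genotypes, 0, 4)
variable = normalized_haplotype_number(toy_genotypes, 4, 4)
print(conserved, classify(conserved))
print(variable, classify(variable))
```

The same classification could be applied separately to each cultivar group to compare diversity between older and more recent cultivars.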
The number of haplotypes may have decreased from Group 2 (older) to Group 3 (more recent) in chromosomal regions such as those measuring 0.9 to 1.9 Mb on chromosome 6 and 7.6 to 12.5 Mb on chromosome 12. This may be an example of decreasing genetic diversity in a particular chromosomal region as a result of breeding. This decrease might be due to the change in breeding strategy from a focus on high yield to a focus on grain and cooking quality. In fact, the short arm of chromosome 6 contains major QTLs related to flowering time  and the waxy gene, which may contribute to grain quality [37–39], and several QTLs that control grain quality are located in the middle of chromosome 12 (Gramene QTL Database, http://www.gramene.org/qtl/).
We detected regions in which the number of haplotypes increased from Group 1 to Group 3, such as at 29.8 to 33.8 Mb on chromosome 1, at 0.7 to 2.2 Mb on chromosome 3, and at 19.4 to 22.7 Mb on chromosome 12, suggesting that a new haplotype block might have been created by selection during breeding. One possible explanation is that this occurred by strong selection of two loci that are tightly linked in the repulsion phase. For example, major QTLs for eating quality (stickiness, hardness, and the appearance of cooked rice) have been mapped in a region of 0.7 to 2.2 Mb on chromosome 3 [40–42]. In addition, a major QTL related to germination ability at low temperature (qLTG-3-1) was identified in this region. It therefore seems reasonable to hypothesize that the new haplotype was created as a result of selection for a combination of eating quality and germination ability. Further analysis will be required to confirm that this type of strong selection pressure occurred during rice breeding.
Toward next-generation breeding based on haplotype selection
The haplotypes of modern rice cultivars are the results of combinations of haplotypes inherited from ancestral landraces. Therefore, some of the haplotype blocks that remain in Group 3 might have been selected by breeders, either consciously or unconsciously. Once we reveal the relationships between the haplotype blocks and phenotypic changes in future research, it should become possible to develop ideal combinations of haplotype blocks (ideotypes). Based on the densely mapped SNP information from the present study, it will be possible to design disruption and reconstruction of existing consensus haplotype blocks  to generate novel haplotype blocks with new variations (i.e., to perform haplotype shuffling). On the other hand, excessive inbreeding has generally resulted in homogenization of the genome, thereby decreasing genetic diversity and creating a more uniform genome pattern in autogamous crop plants such as rice and wheat. To enhance the potential utility of the haplotype information revealed in this study, genome or haplotype shuffling should be carefully considered in future rice breeding.
To facilitate this process, the introduction of new approaches in breeding, such as recurrent selection to generate dynamic whole-genome recombination and genomic selection [46, 47], will be needed. In these new approaches, genome-wide SNP discovery and whole-genome SNP typing will be indispensable tools. Our results in the present study will therefore open the door for next-generation breeding in rice.