Massive gene losses in Asian cultivated rice unveiled by comparative genome analysis
© Sakai and Itoh. 2010
Received: 18 September 2009
Accepted: 19 February 2010
Published: 19 February 2010
Rice is one of the most important food crops in the world. With increasing world demand for food crops, there is an urgent need to develop new cultivars that have enhanced performance with regard to yield, disease resistance, and so on. Wild rice is expected to provide useful genetic resources that could improve the present cultivated species. However, the quantity and quality of these unexplored resources remain unclear. Recent accumulation of the genomic information of both cultivated and wild rice species allows for their comparison at the molecular level. Here, we compared the genome sequence of Oryza sativa ssp. japonica with sets of bacterial artificial chromosome end sequences (BESs) from two wild rice species, O. rufipogon and O. nivara, and an African rice species, O. glaberrima.
We found that about four to five percent of the BESs of the two wild rice species and about seven percent of the African rice could not be mapped to the japonica genome, suggesting that a substantial number of genes have been lost in the japonica rice lineage; however, their close relatives still possess their counterpart genes. We estimated that during evolution, O. sativa has lost at least one thousand genes that are still preserved in the genomes of the other species. In addition, our BLASTX searches against the non-redundant protein sequence database showed that disease resistance-related proteins were significantly overrepresented in the close relative-specific genomic portions. In total, 235 unmapped BESs of the three relatives matched 83 non-redundant proteins that contained a disease resistance protein domain, most of which corresponded to an NBS-LRR domain.
We found that the O. sativa lineage appears to have recently experienced massive gene losses following divergence from its wild ancestor. Our results imply that the domestication process accelerated large-scale genomic deletions in the lineage of Asian cultivated rice and that the close relatives of cultivated rice have the potential to restore the lost traits.
With the worldwide demand for food crops increasing, the genetic improvement of cereals has for the past several decades been considered a promising approach to overcoming problems related to food supplies. Between 1966 and 2000, the Green Revolution increased food production in densely populated developing countries by 125%. The drastic improvement in crop yields has saved people in those countries from large-scale famine and economic upheaval. However, the Food and Agriculture Organization (FAO) of the United Nations estimated that 854 million people worldwide remained undernourished from 2001 to 2003 (http://www.fao.org/docrep/009/a0750e/a0750e00.HTM). Moreover, the United Nations Population Division has predicted that the world's population will increase from 6.5 billion in 2005 to 9.2 billion in 2050 (http://www.un.org/esa/population/publications/wpp2006/wpp2006.htm). We will therefore need to produce 50% more grain by 2025.
Because crop yields have suffered serious losses due to various plant diseases, disease resistance has been one of the most important challenges in ensuring stable food supplies. It is estimated that diseases and insect pests can cause losses of up to 25% per year. A further threat may be posed by global climate change, which may affect crop yields in various ways, such as increasing the risk of plant diseases or causing direct damage at specific developmental stages (e.g., flowering time). For these reasons, the development of new cultivars that show enhanced disease resistance, yield, and other aspects of performance is urgently needed. Wild relatives of cultivated species are expected to contain a wealth of genetic resources that are currently unexplored and may assist in the breeding of extant cultivars. Despite the possible agronomic importance of such genetic resources, the amount and characteristics of these resources are largely unknown. In this paper, we use a genome-wide comparative analysis to elucidate the quantity and agronomic potential of genes specific to wild rice species.
Asian cultivated rice, Oryza sativa L. (Os), contains two major cultivar groups, japonica (Oj) and indica (Oi), which are the most important grain crops and provide over 30% of the caloric intake in Asia (http://www.irri.org/science/ricestat/index.asp). Oj and Oi split about 400,000 years ago and were independently domesticated about 10,000 years ago. The two cultivar groups show significant diversity in single nucleotide polymorphisms, intergenic sequences, and individual gene duplications, suggesting that the genomes of Os have undergone dynamic evolution. In addition to Os, the genus Oryza includes the African cultivated rice, O. glaberrima (Og), as well as 22 wild species. The species have been classified into 10 genome types on the basis of chromosomal affinity during meiosis in experimental hybrids. The O. sativa complex consists of Os, Og, and five wild species that have the same AA genome type. These species can be intercrossed, whereas different genome types are incompatible because of reproductive barriers. Thus, two wild rice species of this complex, the annual O. nivara (On) and the perennial O. rufipogon (Or), which are considered the progenitors of Os, can be utilised in hybridisation-based breeding to develop new cultivars with favourable traits. The premise of such breeding efforts is that a considerable amount of useful genetic resources is preserved in wild species. During the domestication process, rice appears to have undergone severe bottlenecks, which can be attributed to the initial cultivation of limited numbers of individuals that possessed key desirable traits, such as reduced grain shattering and seed dormancy [1, 7]. The subsequent modern breeding process, which has been facilitated by extensive artificial selection for desirable traits, has resulted in the rapid loss of genetic diversity that was originally present in the wild ancestral population. Therefore, there may be a significant number of genes that were not essential to the initial cultivation and domestication process, but that might be related to agronomically useful traits.
The term "wild rice-specific genes" can be used in two distinct senses: loci and alleles. Whereas allelic variations among cultivars have been used in breeding to improve agronomically important traits [14–16], few novel loci have been utilised. The discovery of wild rice-specific loci of agronomic importance would be beneficial to the genetic improvement of modern cultivars. In order to examine this issue, the genome sequence of an Asian cultivar, Nipponbare, which was released by the International Rice Genome Sequencing Project and annotated by the Rice Annotation Project (RAP) [17, 18], is compared with the genomes of its wild relatives. The Oryza Map Alignment Project has made available a large collection of bacterial artificial chromosome end sequences (BESs) of wild rice species. Herein, we use Or, On, and Og BESs to identify lost genes in Oj and Oi. The dynamic evolutionary changes evident in the rice genomes are discussed.
Table 1. Statistical summary of the genomic sequences of four Oryza species used in this study. Rows: genome size (Mbp); no. of BESs; total length (bp); total length of repetitive sequences (bp); fraction of repetitive sequences (%); no. of BESs used for mapping; no. of mapped BESs; no. of ambiguous BESs; no. of unmapped BESs; fraction of unmapped BESs (%).
Of the total BESs used for mapping, 10.9% and 9.5% of On and Or sequences, respectively, failed to be mapped to the Oj genome under the criteria of 80% identity and 70% coverage. Some BESs might have remained unmapped because of large alignment gaps that were derived from long DNA insertions into the Oj genome after the species split. However, even under the relaxed criteria of 80% identity and 30% coverage, 5.2% and 4.0% of the BESs of On and Or, respectively, could not be mapped (Table 1). Hence, our conservative estimate indicates that about 5% of the ancestral genome was lost in the Oj lineage after the divergence between Oj and its wild relatives. Likewise, 6.7% of the Og genome was missing in the Oj genome. Nonetheless, the unmapped BESs may have been conserved in the independent Oi lineage. In fact, of the 3,992, 2,056, and 3,609 unmapped BESs of On, Or, and Og, respectively, we found that 1,775, 1,047, and 880 BESs matched the Oi genome sequence under the criteria of 80% nucleotide identity and 30% sequence coverage. These results indicate that certain genome fractions were deleted only in the Oj lineage, following the divergence of the japonica and indica groups.
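The identity and coverage criteria described above can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the toy values are hypothetical and do not represent the data or software used in this study. Coverage is assumed here to mean the aligned fraction of the BES length.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One alignment of a BES against the reference genome (hypothetical record)."""
    bes_id: str
    bes_length: int      # length of the BES in bp
    aligned_length: int  # number of BES bases covered by the alignment
    identity: float      # percent nucleotide identity over the aligned region

def is_mapped(hit, min_identity=80.0, min_coverage=70.0):
    """Apply the identity and coverage thresholds to a single alignment."""
    coverage = 100.0 * hit.aligned_length / hit.bes_length
    return hit.identity >= min_identity and coverage >= min_coverage

def unmapped_fraction(bes_ids, hits, min_identity=80.0, min_coverage=70.0):
    """Fraction of BESs with no alignment that satisfies both thresholds."""
    mapped = {h.bes_id for h in hits if is_mapped(h, min_identity, min_coverage)}
    return sum(1 for b in bes_ids if b not in mapped) / len(bes_ids)

# Toy data (assumed): four BESs, one of which has no alignment at all.
bes_ids = ["a", "b", "c", "d"]
hits = [
    Hit("a", 500, 400, 95.0),  # 80% coverage: mapped under both criteria
    Hit("b", 500, 200, 90.0),  # 40% coverage: mapped only under 30% coverage
    Hit("c", 500, 450, 75.0),  # identity below 80%: never mapped
]
strict = unmapped_fraction(bes_ids, hits, 80.0, 70.0)
relaxed = unmapped_fraction(bes_ids, hits, 80.0, 30.0)
```

Relaxing the coverage threshold can only move BESs from the unmapped to the mapped class, which is why the relaxed criterion gives the more conservative estimate of lost sequence.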
Table 2. Estimation of the numbers of species-specific genes. Values retained from the table: 1.57 × 10⁻⁴, 1.55 × 10⁻⁴, 1.45 × 10⁻⁴, 1.37 × 10⁻⁴, 1.32 × 10⁻⁴.
We regarded "ambiguous" BESs (Table 1) as mapped BESs, so that the estimated numbers of unique genes would be conservative. As a result, we estimated that 1,360, 934, and 1,260 genes lost in Oj have been preserved in On, Or, and Og, respectively (Table 2). Likewise, the number of genes lost in the Oi genome was estimated (Table 2). In addition, to count all the possible candidates, we also estimated the numbers of unique genes when the ambiguously mapped BESs were regarded as unmapped (Additional File 7). Because the ratio of h_u to h_m depends on the criterion of similarity against the nr database proteins, we tested several criteria and obtained essentially the same results (Additional File 8).
Furthermore, we examined the portions of the genomes lost after the split of the japonica and indica cultivars. Using simulated BESs of Oj and Oi, we mapped the BESs of each genome to the other genome, and estimated the number of unique genes in the Oi genome that are missing in the Oj genome, and vice versa (Table 2). We used 466 Mbp as the genome size of Oi, as reported by a previous study. We found that Oj lost 980 genes, while Oi lost 946, numbers that are comparable with those for On, Or, and Og.
Table 3. The ten most frequent domains among the unmapped BESs of O. rufipogon. Column headers (the pair appears twice): no. of genes with the domain; no. of genes without the domain. Entries retained from the table, in order: Protein kinase, core; 1.10 × 10⁻⁵; Zinc finger, CCHC-type; 7.30 × 10⁻⁵; 2.61 × 10⁻¹²; Zinc finger, SWIM-type; 9.95 × 10⁻⁴; Serine/threonine protein kinase, active site; Leucine-rich repeat, N-terminal; 1.93 × 10⁻⁶; Protein kinase ATP binding, conserved site; Serine/threonine protein kinase-related.
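Counts of genes with and without a given domain in two gene sets form a 2 × 2 contingency table, and overrepresentation in one set can be assessed with a one-sided test. The statistical test behind the values in this table is not stated in this text, so the sketch below assumes Fisher's exact test purely for illustration. The function name and the example counts are hypothetical.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: genes with / without the domain in the set of interest
    c, d: genes with / without the domain in the comparison set
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1 = a + b          # size of the set of interest
    col1 = a + c          # total number of genes carrying the domain
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)

# Hypothetical counts: 20 of 200 genes in the set of interest carry the
# domain, against 150 of 20,000 genes in the comparison set.
p_value = fisher_right_tail(20, 180, 150, 19850)
```

A small P-value indicates that the domain is more frequent in the set of interest than expected by chance. When many domains are tested, a multiple-testing correction would normally be applied as well.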
To obtain a conservative estimate of the number of wild rice-specific genes, we applied a relaxed threshold to map BESs onto the Oj genome. Therefore, there may be other candidate genes that were erroneously mapped. These might be detected by careful checking of their molecular phylogeny. In fact, a BES from On, CL619881, was mapped to the Oj genome and matched an Oj gene, BAD33147, with 84% amino acid identity. However, the large evolutionary distance of 0.145 between CL619881 and BAD33147 clearly indicates that BAD33147 is not an orthologue of CL619881 (Additional File 15). Thus, a reasonable evolutionary scenario is that BAD33147 was duplicated before divergence of the On and Oj lineages, and the Oj orthologue to CL619881 was subsequently lost. Because CL619881 is found in a group of genes related to Pib, which is known to be an effective disease resistance gene against rice blast, this On gene might encode an agronomically useful trait. If we apply more stringent criteria to map BESs onto the Oj genome, more wild rice-specific genes that have the potential to genetically improve modern cultivated rice may be identified.
Although grass genomes have been suggested to be evolutionarily stable, genome-wide studies in a variety of species, including rice, have shown dynamic evolution of their genome structures [10, 33, 34]. Our comparative analyses between cultivated and wild rice species revealed that Asian cultivated rice species have accumulated genomic deletions since their divergence from their wild ancestors. However, the deleted genes may have been preserved in other closely related species. In fact, we found that the genomes of wild rice species harbour about one thousand unique genes, which account for 3% of the total genes of the ancestor of Oj. These estimated numbers of unique genes depended on genome sizes determined by flow cytometry. Although flow cytometry was later found to have overestimated the genome size of Oj, the difference was quite small (1.29%). This indicates that our estimates were not greatly affected by possible overestimations of the genome sizes. Thus, it appears that the Asian cultivated rice lineage underwent these drastic changes over a relatively short period of time. Additionally, because large-scale deletions, rather than successive small-scale deletions, were found, the changes may have occurred in a relatively small number of steps.
The linear relationship between the number of lost genes and synonymous substitutions suggests selective neutrality of the deletion events (Figure 3). This hypothesis is consistent with our finding that the fraction of unmapped BESs that matched nr database proteins was less than that of mapped BESs (28% vs. 47% for On, and 26% vs. 46% for Or). Deletions of protein-coding regions were selected against, whereas non-coding regions that were functionally less important for the species were more likely to be lost from the genome. By contrast, the rate of recent gene losses seems to be accelerated in the Os lineage. This observation may be related to our finding that some functional domains are overrepresented among the deleted genes (Figure 4, Table 3, Additional File 9, and Additional File 12). In particular, NBS-LRR-type disease resistance genes seem to have been prone to elimination from the Oj genome. It is possible that these disease resistance genes were artificially overrepresented because of the biases in the collection of the BESs examined. However, the BESs are randomly distributed throughout the genome (Additional File 16). In addition, the functional classifications of the mapped BESs of the three close relatives were almost the same as the classification of the Oj genes (Figure 4 and Additional File 9), indicating that there were no obvious biases and that the current BES set represented the overall characteristics of the genes of the three close relatives.
The accelerated reductive evolution might be due in part to purifying selection against the disease resistance genes. It is intuitively expected that disease resistance genes should have been fully utilised in cultivated rice because of their agronomic importance. However, a study in Arabidopsis thaliana has shown that the fitness costs of resistance tend to decrease the presence of disease resistance genes. During the domestication process, Asian cultivated rice may have had less contact with pathogens in the human-controlled environment. Thus, disease resistance genes may have imposed a fitness cost on the cultivated species.
A selectively neutral process of accelerated rice genome reduction may be possible if we consider gene deletions in large gene families, such as NBS-LRR. It is known that NBS-LRR genes were subjected to birth-and-death evolution, where some genes in a gene family were maintained in the genome, while others were deleted or became nonfunctional [36, 37]. In fact, 32% of the NBS-LRR genes of Oj were shown to be nonfunctional. Thus, one possible explanation is that rapid turnover following the high rate of nonfunctionalisation has led to the elimination of NBS-LRR genes in the Oj genome. Genetic recombination is one of the mechanisms that contribute to the evolution of NBS-LRR genes in plants [37, 39]. Because NBS-LRR genes have frequently been amplified and, in many cases, are arrayed in tandem, a possible mechanism for the elimination of these genes is unequal crossing-over. In fact, Chin et al. reported that unequal crossing-over led to deletion of the lettuce disease resistance gene, Dm3. Another possible reason for the gene deletions in rice is that the mating system of rice has rapidly changed from outbreeding to inbreeding during domestication. This hypothesis is supported by the observation that Dm3 was eliminated in an inbred line. Cytological and theoretical studies have demonstrated that self-fertilisation enhances the frequency of recombination [41, 42]. Therefore, the increased self-fertilisation rate may have led to a high frequency of unequal crossing-over in the Os genome, resulting in the rapid loss of NBS-LRR genes. When we conducted BLASTN searches between the unmapped BESs of On and Or, 22% of the On unmapped BESs and 33% of the Or unmapped BESs matched each other. Furthermore, NBS or LRR domains were frequent among the matching BESs, suggesting that these genes were commonly preserved in the genomes of the wild rice species.
It is generally accepted that the domestication of rice began approximately 10,000 years ago. Although the loss of a great number of genes over such a short period may appear implausible, the fact that the loss of Dm3 in the lettuce genome occurred in merely four generations suggests that drastic large-scale deletions may be ubiquitous in cultivated species.
For the breeding of modern food crops, in addition to extant cultivars, wild species are abundant genetic resources that have been largely unexplored to date. Therefore, in this study, we focus on the quantity and quality of genes that have been lost from the genomes of Asian cultivated rice but are preserved in its relatives. Among the promising candidate genes that we have found are a significant number of possible novel disease resistance genes, such as homologues of Pib and PibH8 (Additional File 15), which are well-studied disease resistance genes against rice blast. In this study, we have examined only three accessions of three species. We expect that there are many undiscovered useful genes and allelic variations in wild rice.
Because all three relatives analysed in this study have the same AA genome type as Os, the genes found to be unique to the genomes of the three relatives can be transferred to the Os genome by conventional hybridisation-based approaches. The recent development of introgression lines of elite cultivars and wild rice species has facilitated the efficient detection of quantitative trait loci associated with yield-related and other morphological or physiological traits [44, 45]. Therefore, the introgression lines can be used to further examine the functions of the candidate genes identified in our analyses. As shown by the transfer of a bacterial blight resistance gene, Xa1, from indica to japonica, the transgenic approach may also be useful for examining the functions of the candidate genes and for conferring beneficial traits to the species of interest.
Throughout this study, we have emphasized the importance of novel genetic resources in species that have not been examined in depth at the molecular level. We found that many potentially useful genes remain unexplored in the wild relatives of Asian cultivated rice. Hence, the complete genome sequencing of a wide variety of wild rice species should further expedite genomic breeding, as well as comprehensive analyses of gene function [27, 47]. The high-throughput sequencing of rice species will unveil the retained genetic resources and will promote the development of new cultivated rice varieties.
where L is the genome size, r is the fraction of non-repetitive sequences, and m is the fraction of mapped BESs. The genome sizes reported by Kim, H. et al. were used. Genome-wide alignments between Oj and Oi were constructed using BLASTZ with the options "C = 0, H = 2000, Y = 3400, and T = 4" [50, 51]. Some unmapped BESs matched rice proteins in the nr database (see "Functional classification of genes encoded in BESs"). However, the numbers of such hits were very small (24, 19, and 24 for On, Or, and Og, respectively) and were negligible in our estimations.
We generated simulated BESs of Oj and Oi from the genome sequences by extracting the same number of sequence fragments as the total number (243,927) of BESs of the three close relatives, with the same lengths. To randomly select DNA fragments, we used the Mersenne Twister algorithm as the uniform pseudorandom number generator. We selected only fragments without any ambiguous nucleotides, such as N. To confirm the efficiency of our BES mapping, we aligned the simulated BESs of Oj to the Oj genome, and those of Oi to the Oi genome. In addition, we randomly introduced nucleotide mutations into the simulated BESs of Oj so that the number of substitutions per site became 0.025, which was equal to the average number of nucleotide differences between Oj and Og. As a result, we confirmed that 99.9% of the simulated BESs, as well as of the artificially mutated BESs, were successfully mapped (Additional File 2).
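The sampling and mutation steps above can be sketched in a few lines of Python, whose standard `random` module uses the Mersenne Twister as its core generator. This is an illustrative sketch, not the program used in this study. The function names, the seed, the retry limit, and the toy genome are assumptions, and the substitution level is modelled as an independent per-site probability.

```python
import random

def sample_fragments(genome, lengths, seed=0, max_tries=10000):
    """Draw one random fragment per requested length, rejecting any fragment
    that contains an ambiguous nucleotide (anything other than A, C, G, T)."""
    rng = random.Random(seed)  # Mersenne Twister-based generator
    fragments = []
    for length in lengths:
        for _ in range(max_tries):
            start = rng.randrange(len(genome) - length + 1)
            fragment = genome[start:start + length]
            if set(fragment) <= set("ACGT"):
                fragments.append(fragment)
                break
        else:
            raise RuntimeError("no unambiguous fragment of length %d found" % length)
    return fragments

def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Toy genome (assumed) with a short run of ambiguous bases in the middle.
genome = "ACGT" * 50 + "N" * 10 + "ACGT" * 50
simulated = sample_fragments(genome, [20, 30, 40], seed=1)
mutated = [mutate(s, 0.025, random.Random(2)) for s in simulated]
```

Mapping both `simulated` and `mutated` back to the source genome then gives an empirical check of how often true matches are recovered at the chosen thresholds.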
BESs that were mapped to multiple positions with the same identity and coverage were discarded from the dataset. If multiple BESs were mapped to the same genomic position, the BES with the highest nucleotide identity against Oj was selected. The four pairwise alignments between Oj and the other species were compared, and the regions that were covered by all five species were used. We made multiple alignments for these segments using Clustal W (ver. 1.83). The positions of the protein-coding regions were determined based on the genome annotation of Oj, which was released by the Rice Annotation Project Database (RAP-DB). We used the protein-coding regions to construct a phylogenetic tree. Genes containing one or more gaps or internal stop codons were excluded. The alignments were concatenated into one large multiple sequence alignment. Maximum parsimony (MP) and neighbour-joining (NJ) trees were constructed using MEGA4. A maximum likelihood (ML) tree was reconstructed using PAUP* 4.0b10. We used only the third positions of the codons.
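The gene filtering and third-position extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: each gene is represented as a dictionary from a species label to an aligned, in-frame coding sequence, the standard stop codons are used, and all function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_internal_stop(cds):
    """True if any codon other than the last one is a stop codon."""
    internal = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    return any(codon in STOP_CODONS for codon in internal)

def passes_filters(aligned_gene):
    """Keep a gene only if every sequence is gap-free, in frame,
    and has no internal stop codon."""
    for seq in aligned_gene.values():
        seq = seq.upper()
        if "-" in seq or len(seq) % 3 != 0 or has_internal_stop(seq):
            return False
    return True

def third_codon_positions(cds):
    """Return the third base of every codon."""
    return cds[2::3]

def concatenate_third_positions(genes):
    """Concatenate third codon positions of all retained genes per species."""
    kept = [g for g in genes if passes_filters(g)]
    species = sorted(kept[0]) if kept else []
    return {
        sp: "".join(third_codon_positions(g[sp].upper()) for g in kept)
        for sp in species
    }

# Toy alignments (assumed) for two species labels.
genes = [
    {"Oj": "ATGAAATAA", "On": "ATGAAGTAA"},  # retained
    {"Oj": "ATG---TAA", "On": "ATGCCCTAA"},  # gap: excluded
    {"Oj": "ATGTAAAAA", "On": "ATGTACAAA"},  # internal stop: excluded
    {"Oj": "ATGCCC", "On": "ATGCCA"},        # retained
]
supermatrix = concatenate_third_positions(genes)
```

The resulting per-species strings have equal length and can be written out as a single alignment for tree reconstruction.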
In order to investigate the relationship between evolutionary distances and the numbers of close relative-specific genes, we calculated the numbers of synonymous substitutions in the five species by the Nei-Gojobori method.
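For readers unfamiliar with the Nei-Gojobori method, the counting idea can be sketched in Python as below. This is a simplified illustration, not the implementation used in this study. It assumes the standard genetic code, gap-free in-frame sequences without stop codons, counts changes to stop codons as non-synonymous when tallying sites, and skips mutational pathways that pass through a stop codon. All function names are hypothetical.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered T, C, A, G at each position.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def synonymous_sites(codon):
    """Number of synonymous sites: for each position, the fraction of the
    three possible base changes that leave the amino acid unchanged."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += (CODE[mutant] == CODE[codon]) / 3
    return total

def count_differences(c1, c2):
    """Average synonymous and non-synonymous differences between two codons
    over all orders in which the differing positions could have changed."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total, non_total, n_paths = 0.0, 0.0, 0
    for order in permutations(diff):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway through a stop codon is skipped
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total, non_total, n_paths = syn_total + syn, non_total + non, n_paths + 1
    if n_paths == 0:
        return 0.0, float(len(diff))
    return syn_total / n_paths, non_total / n_paths

def synonymous_distance(seq1, seq2):
    """Synonymous substitutions per synonymous site (Jukes-Cantor corrected)."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    sites = sum(synonymous_sites(c) for c in codons1 + codons2) / 2
    diffs = sum(count_differences(a, b)[0] for a, b in zip(codons1, codons2))
    p_s = diffs / sites
    if p_s >= 0.75:
        raise ValueError("proportion of synonymous differences too large to correct")
    return -0.75 * log(1 - 4 * p_s / 3)
```

For example, the codon pair TTT and GTA differs at two positions, and averaging over the two possible pathways gives 0.5 synonymous and 1.5 non-synonymous differences.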
where N_pc is the number of BESs that overlap with RAP protein-coding regions by more than 50 bp on the Oj genome, N_all is the number of all BESs used for mapping, J_pc is the number of simulated BESs of Oj that overlap with RAP protein-coding regions by more than 50 bp, and J_all is the number of all simulated BESs of Oj. The gene densities of non-Oj species were the Oj gene density multiplied by a weight factor.
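The defining equation for the weight factor is not reproduced in this text. One natural reading of the four quantities, assumed here purely for illustration, is that the weight factor is the ratio of the protein-coding fraction of the real BESs to that of the simulated Oj BESs. The function names and all numbers below are hypothetical.

```python
def weight_factor(n_pc, n_all, j_pc, j_all):
    """Assumed form: (N_pc / N_all) / (J_pc / J_all).

    n_pc, n_all: real BESs overlapping protein-coding regions / all real BESs
    j_pc, j_all: simulated Oj BESs overlapping protein-coding regions / all
                 simulated Oj BESs
    """
    return (n_pc / n_all) / (j_pc / j_all)

def weighted_gene_density(oj_density, n_pc, n_all, j_pc, j_all):
    """Gene density of a non-Oj species as the Oj density times the weight."""
    return oj_density * weight_factor(n_pc, n_all, j_pc, j_all)

# Hypothetical counts: 25% of real BESs and 50% of simulated BESs overlap
# protein-coding regions, so the density is halved.
density = weighted_gene_density(1.0e-4, 2500, 10000, 5000, 10000)
```

Under this reading, a weight factor below one means that the BES library samples protein-coding regions less densely than random fragments of the Oj genome do.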
To infer the functions of genes encoded in mapped and unmapped BESs, we conducted BLASTX searches against the non-redundant protein sequence database (nr) using a threshold E-value of < 1.0 × 10⁻¹⁰. The best homologues of the BESs reported by BLASTX were used for indirect functional inference. Because the Oj genome is about 5% incomplete, there may be unmapped BESs that correspond to unsequenced portions of the genome. In fact, some unmapped BESs matched the rice proteins in the nr database with high identities. Therefore, we excluded those possibly mapped BESs from our analysis of gene functions. We chose threshold amino acid identities of 96.0%, 95.8%, and 94.5% for On, Or, and Og BESs, respectively, so that 5% of the homologous BESs of each species were discarded (Additional File 18). The detected proteins and the representative Oj proteins from the RAP-DB annotation were subjected to InterProScan searches. On the basis of the Gene Ontology (GO) hierarchy, the functions were categorised by the map2slim program with generic GO slims. Proteins that were classified as transposable elements (GO:0003964 and GO:0004803) were excluded. The functional classifications of the proteins were compared between the Oj proteins and the mapped or unmapped BESs.
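The choice of an identity threshold that discards a fixed fraction of the homologous BESs can be sketched as a simple quantile calculation. The function names and the toy identity values are assumptions for illustration. The thresholds reported above were derived from the real BLASTX results, not from this code.

```python
def identity_cutoff(identities, discard_fraction=0.05):
    """Return the identity at or above which the top `discard_fraction`
    of hits fall, or None if the list is too short to discard anything."""
    ranked = sorted(identities, reverse=True)
    n_discard = int(len(ranked) * discard_fraction)
    if n_discard == 0:
        return None
    return ranked[n_discard - 1]

def retain_below_cutoff(identities, discard_fraction=0.05):
    """Drop hits whose identity is at or above the cutoff, since these may
    correspond to unsequenced parts of the reference genome."""
    cutoff = identity_cutoff(identities, discard_fraction)
    if cutoff is None:
        return list(identities)
    return [x for x in identities if x < cutoff]

# Toy identities (assumed): 100 hits with identities 1.0, 2.0, ..., 100.0.
values = [float(i) for i in range(1, 101)]
cutoff = identity_cutoff(values)
kept = retain_below_cutoff(values)
```

If several hits share the cutoff identity, slightly more than the requested fraction is discarded, which errs on the side of excluding possibly mapped BESs.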
To verify the validity of this indirect method, we compared the direct and indirect functional classifications of the rice protein sets obtained from RAP-DB. Because the rice proteins themselves were included in the nr database, all self-hits were discarded. We confirmed that there was no significant difference between the classifications (Additional File 19). Therefore, the indirect functional inference should correctly reflect the actual function.
BES: bacterial artificial chromosome end sequence
NBS-LRR: nucleotide-binding site and/or leucine-rich repeat
RAP: Rice Annotation Project
BAC: bacterial artificial chromosome
NB-ARC: nucleotide-binding adaptor shared by apoptotic protease activating factor-1, R proteins, and Caenorhabditis elegans cell death gene 4
DDBJ: DNA Data Bank of Japan
MIPS: Munich Information Center for Protein Sequences
We thank Yoko Nishizawa and Shuichi Fukuoka for their comments on disease resistance genes; Rod A. Wing for helpful discussions about the OMAP data; and Masatoshi Nei for his valuable suggestions about the manuscript. This work was supported by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural Innovation, GIR-1001).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.