Global characterization of the root transcriptome of a wild species of rice, Oryza longistaminata, by deep sequencing
© Yang et al. 2010
Received: 16 July 2010
Accepted: 15 December 2010
Published: 15 December 2010
Oryza longistaminata, an AA genome type (2n = 24) species originating from Africa, is closely related to Asian cultivated rice (O. sativa L.). It carries various valuable traits, including tolerance to biotic and abiotic stresses, QTLs for agronomically important traits and a high nitrogen use efficiency (NUE). However, only limited genomic or transcriptomic data for O. longistaminata are currently available.
In this study we present the first comprehensive characterization of the O. longistaminata root transcriptome using 454 pyrosequencing. A single sequencing run of a normalized cDNA library from O. longistaminata roots adapted to low-N conditions generated 337,830 reads, which assembled into 41,189 contigs and 30,178 singletons. By similarity search against protein databases, putative functions were assigned to 34,510 uni-ESTs. Comparison with ESTs from cultivated rice collections revealed genes expressed across different plant species; however, 16.7% of the O. longistaminata ESTs had not been detected as expressed in O. sativa. Additionally, 15.7% had no significant similarity to known sequences. RT-PCR and Southern blot analyses confirmed the expression and genomic origin of selected novel transcripts in O. longistaminata.
Our results show that a single run on a Genome Sequencer FLX from 454 Life Sciences/Roche generates sufficient sequence information for adequate de novo assembly of a large number of transcripts in a wild rice species, O. longistaminata. The generated sequence data are publicly available and will facilitate gene discovery in O. longistaminata and rice functional genomic studies. The large number of novel ESTs suggests different metabolic activity in O. longistaminata roots compared with O. sativa roots.
Rice (Oryza sativa L.) is a staple food crop for about half of the world's population. In 2008, the total rice-harvested area and rough rice production in the world were 155.7 million hectares and 661.8 million tons, respectively [International Rice Research Institute (IRRI) 2009]. However, rice productivity is severely limited by soil nitrogen deficiency worldwide. Commercially available urea fertilizer is the most widely used resource to meet a rice crop's nitrogen requirement, but one third of it is lost through emission of greenhouse gases and leaching, causing adverse environmental impacts [1–3]. To meet these challenges and develop environmentally sustainable rice production systems, much attention has been given to natural biological nitrogen fixation (BNF) [4, 5] and to increasing nitrogen use efficiency (NUE) [6–8].
The genus Oryza comprises 24 species, including 2 cultivated (O. sativa and O. glaberrima) and 22 wild species with diverse ecological adaptations. These species are grouped into 10 recognizable genome types (AA, BB, CC, EE, FF, GG, BBCC, CCDD, HHJJ and HHKK) [9, 10]. Wild rice has diversified over 40 million years. Wild species are tremendous gene reservoirs for the improvement of domesticated rice, as they possess many desirable traits, such as resistance to diseases and insect pests or tolerance to different kinds of stress [11–14]. Oryza longistaminata Chev. (2n = 24, AA), broadly distributed throughout tropical Africa, is a perennial species characterized by long anthers, self-incompatibility, allogamy, strong rhizomes and high biomass production on poor soils. Despite its overall inferior appearance, O. longistaminata has furnished genes for developing perennial rice [15, 16] and for breeding varieties resistant to bacterial blight. To make better use of this potential, more genomic information is required, but only a few mRNA or full-length cDNA (FLcDNA) sequences of O. longistaminata are available in public databases, and no genome sequence is available.
Sequencing and analysis of expressed sequence tags (ESTs) has become a primary strategy for functional genomic studies in plants, including novel gene discovery, gene expression profiling, development of microarrays and molecular markers, and accurate genome annotation. After completion of the full genome sequence of O. sativa ssp. japonica cv. Nipponbare through a map-based sequencing strategy and of the draft genome sequence of O. sativa ssp. indica cv. 93-11 through a whole-genome shotgun sequencing approach [18, 19], much effort has gone into rice EST projects. Approximately 1,249,110 ESTs and more than 50,000 full-length cDNA sequences of cultivated rice are currently available in public databases. However, genomic studies of wild rice relatives are still in their infancy; the exceptions are 5,211 leaf ESTs from O. minuta (BBCC genome) and 1,888 leaf FLcDNAs from O. rufipogon (AA genome) [20, 21]. Roots in particular are underrepresented in EST studies.
Therefore, we undertook a comprehensive survey of ESTs in roots of O. longistaminata to provide an overview of the O. longistaminata root transcriptome and thus a molecular basis for the identification of useful genes. As the newly developed massively parallel 454 pyrosequencing technology allows rapid generation of sequence data and deep sequencing coverage at reduced labour and cost [22–24], we used 454 GS-FLX pyrosequencing to characterize the first global root transcriptome of the wild rice species O. longistaminata. This led to the discovery of a large number of novel ESTs, which will facilitate gene mining and provide a basis for comparative studies within the genus Oryza.
To obtain transcripts of genes that might be required for growth under nutrient stress, O. longistaminata plants were clonally propagated and adapted to low-nitrogen conditions in unfertilized soil for several months. RNA was extracted from roots of mature plants with high biomass production (see Additional file 1). As soil-grown roots often yield low-quality RNA with inhibitory effects on enzyme activity (reverse transcription or PCR), several RNA extraction methods were compared. A standard extraction protocol with TRIzol yielded degraded RNA (not shown), whereas RNA extracted by a CTAB-based method was of high quality (Additional file 1).
Pooled RNA from two extractions was used for cDNA normalization and sequencing. One GS-FLX 454 pyrosequencing run produced a total of 337,830 reads (87.3 Mb) from root cDNAs of O. longistaminata, with an average read length of 258 bp (SD = 24, range 60-925 bp). After removal of adaptor sequences, polyA tails and low-quality sequences, 337,471 reads remained, with a total length of 66.7 Mb and an average length of 197 ± 61 bp, ranging from 20 bp to 393 bp (Additional file 2). Only sequences longer than 100 bp were considered further. Clustering and assembly of these sequences produced 43,423 contigs and 32,708 singletons. These were filtered again by removing sequences showing homology (E-value cut-off e-5) to sequences of bacteria, fungi or metazoa, resulting in a total of 71,367 processed unique sequences. Contig length varied from 101 bp to 2,082 bp with an average of 299 bp, and singleton length ranged from 101 bp to 393 bp with an average of 215 bp (Additional file 2).
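As an illustration of the filtering steps described above, the following minimal Python sketch applies the length threshold (sequences longer than 100 bp), removes sequences flagged by the contaminant screen and reports the kind of summary statistics quoted in this section. It is a sketch under stated assumptions, not the pipeline used in this study: the function names and toy reads are hypothetical, and `contaminant_ids` is assumed to be precomputed from a homology search against bacterial, fungal and metazoan sequences at E ≤ 1e-5.

```python
from statistics import mean, stdev

MIN_LENGTH_BP = 101  # "only sequences above 100 bp" were kept


def length_filter(seqs, min_length=MIN_LENGTH_BP):
    """Keep sequences of at least min_length bp (applied before assembly)."""
    return {sid: s for sid, s in seqs.items() if len(s) >= min_length}


def drop_contaminants(seqs, contaminant_ids):
    """Remove sequences flagged as bacterial, fungal or metazoan hits.

    contaminant_ids is a hypothetical, precomputed set of IDs from a
    homology search (E-value <= 1e-5); it is not part of the source.
    """
    return {sid: s for sid, s in seqs.items() if sid not in contaminant_ids}


def length_summary(seqs):
    """Count, total length (bp), mean, sample SD, minimum and maximum length."""
    lengths = [len(s) for s in seqs.values()]
    return {
        "n": len(lengths),
        "total_bp": sum(lengths),
        "mean_bp": mean(lengths),
        "sd_bp": stdev(lengths) if len(lengths) > 1 else 0.0,
        "min_bp": min(lengths),
        "max_bp": max(lengths),
    }


if __name__ == "__main__":
    # Toy input (hypothetical IDs and sequences), not data from this study.
    reads = {"r1": "A" * 150, "r2": "C" * 90, "r3": "G" * 300, "r4": "T" * 120}
    kept = drop_contaminants(length_filter(reads), contaminant_ids={"r4"})
    print(sorted(kept))  # ['r1', 'r3']
    print(length_summary(kept))
```

In the study the two filters were applied at different stages (length before assembly, contaminant removal after assembly), which is why they are kept as separate functions here.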
Size distribution of Oryza longistaminata ESTs after assembly [table: numbers of contigs and singletons per length class, with average and maximum length (bp); 146 sequences (< 1%) are longer than 1000 bp]
The sequence data obtained were in a similar range to those of other plant EST sequencing projects using this technology [26, 27], although with slightly longer reads, demonstrating the power of this approach to deliver large EST datasets.
Distribution of the consensus sequences in the rice genome [table: number of ESTs mapped to each O. sativa chromosome; chromosome sizes: 1 (43.26 Mb), 2 (35.93 Mb), 3 (36.41 Mb), 4 (35.28 Mb), 5 (29.89 Mb), 6 (31.25 Mb), 7 (29.70 Mb), 8 (28.44 Mb), 9 (23.01 Mb), 10 (23.13 Mb), 11 (28.51 Mb), 12 (27.50 Mb)]
To assess how many O. longistaminata ESTs had already been detected as expressed genes in O. sativa, the ESTs mapping onto the O. sativa genomes were also compared with the Knowledge-based Oryza Molecular Biological Encyclopedia (KOME, http://cdna01.dna.affrc.go.jp/cDNA/) cDNA collection, the indica cDNA database (http://www.ncgr.ac.cn/ricd/) and the NCBI rice EST database. Of these, 83.3% matched O. sativa genes previously found to be expressed.
A large number of ESTs (9,993, or 16.7%) had not previously been detected as expressed. For most of them, we did not find homology to predicted gene models: inspection of the 30 longest ESTs showed that 67% shared sequence similarity with the O. sativa genome but not with predicted genes, 23% with genes of predicted function, and 10% with genes encoding hypothetical proteins. This was also reflected in the lack of functional assignments (see below): after in silico translation, only a small fraction (777) of these ESTs could be assigned Gene Ontology (GO) terms. This emphasizes the power of the next-generation sequencing approach to detect novel transcripts or even novel genes. As the O. sativa genome may still contain regions that are not fully annotated, our ESTs might indicate as yet unpredicted genes or UTRs that could be functional in O. sativa as well. Alternatively, O. longistaminata might express a particular set of genes compared with O. sativa, owing either to the specific growth conditions (adaptation to low availability of external nitrogen sources) or to interspecies differences in expression.
As another category of novel ESTs, a total of 11,212 (15.7%) of the 71,367 unique EST sequences could not be mapped to the O. sativa chromosomes by homology search against genomic sequences. Of these, 250 matched publicly available O. sativa mRNAs or ESTs. The remaining 10,962 sequence tags showed no significant sequence identity (cut-off e-5) with any rice genomic or expressed sequence in public databases. Among these, only a very small number (740) had a significant hit in the NCBI non-redundant (NR) nucleotide database or EST database. The remaining 10,222 ESTs may therefore represent novel genetic material present in O. longistaminata and other root-residing eukaryotes.
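The nested counts above (11,212 unmapped ESTs, of which 250 match rice transcripts, 740 match other database entries and 10,222 have no hit) follow from testing each EST against the databases in a fixed order. A minimal Python sketch of this bookkeeping, assuming the hit sets have already been computed by homology searches; the function and category names are hypothetical, and the synthetic IDs are sized only to reproduce the reported totals.

```python
from collections import Counter


def classify_unmapped(est_ids, rice_transcript_hits, nr_or_est_hits):
    """Sort ESTs that did not map to the O. sativa chromosomes into three bins.

    rice_transcript_hits: IDs with a significant hit (E <= 1e-5) to O. sativa mRNAs/ESTs.
    nr_or_est_hits: IDs with a significant hit in the NCBI NR nucleotide or EST database.
    Categories are tested in order, so each EST is counted exactly once.
    """
    counts = Counter()
    for est in est_ids:
        if est in rice_transcript_hits:
            counts["rice_mrna_or_est"] += 1
        elif est in nr_or_est_hits:
            counts["other_database_hit"] += 1
        else:
            counts["no_hit"] += 1
    return counts


if __name__ == "__main__":
    # Synthetic integer IDs sized to reproduce the counts reported in the text.
    unmapped = range(11_212)
    counts = classify_unmapped(unmapped, set(range(250)), set(range(250, 990)))
    print(counts["rice_mrna_or_est"], counts["other_database_hit"], counts["no_hit"])  # 250 740 10222
    print(round(100 * 11_212 / 71_367, 1))  # 15.7
```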
The consensus sequences were annotated for sequence similarity using BLASTX translated sequence comparison against the NCBI non-redundant (NR) protein database. Of the 71,367 contigs and singletons, 34,510 (48.4%) had at least one significant alignment to existing gene models in the NR database at an E-value cut-off of e-5. The majority (51.6%) of the O. longistaminata sequences did not match any known protein sequence. Most of the 10,962 novel sequence tags (15.4% of all unique sequences) fell into this category. This can partly be attributed to the short length of most of these uni-ESTs; alternatively, a large fraction of these ESTs might represent untranslated regions. Mapping these uni-ESTs to rice gene models supported this assumption. https://www.gabipd.org/database/cgi-bin/GreenCards.pl.cgi.
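Counting sequences with at least one significant alignment reduces to scanning the BLASTX results for E-values at or below the cut-off. The sketch below assumes the results were saved in the default 12-column BLAST+ tabular format (`-outfmt 6`), which the text does not specify; the function name and the example rows are made up for illustration.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5
EVALUE_COLUMN = 10  # 11th field of the default 12-column BLAST+ tabular format (-outfmt 6)


def queries_with_hits(tabular, cutoff=EVALUE_CUTOFF):
    """Return the set of query IDs with at least one alignment at E <= cutoff."""
    hits = set()
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[EVALUE_COLUMN]) <= cutoff:
            hits.add(row[0])
    return hits


if __name__ == "__main__":
    # Made-up rows: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    demo = io.StringIO(
        "est1\tXP_A\t85.0\t100\t15\t0\t1\t300\t10\t109\t2e-30\t120\n"
        "est1\tXP_B\t60.0\t80\t32\t0\t1\t240\t5\t84\t1e-3\t40\n"
        "est2\tXP_C\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30\n"
    )
    all_ests = {"est1", "est2", "est3"}
    annotated = queries_with_hits(demo)
    print(sorted(annotated))  # ['est1']
    print(f"{100 * len(annotated) / len(all_ests):.1f}% annotated")  # 33.3% annotated
    # The same proportion for the counts reported in the text:
    print(f"{100 * 34_510 / 71_367:.1f}%")  # 48.4%
```

Queries absent from the BLAST output (here `est3`) simply have no hit, so the annotated fraction is computed against the full set of sequences submitted.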
Approximately 15.4% of the unique EST sequences detected in the O. longistaminata root transcriptome are currently not similar to rice sequences in databases. These may represent novel genes of O. longistaminata not present in O. sativa, or they may correspond to gaps in the cultivated rice genome sequences; a small portion of the unmapped sequences might also have resulted from contamination by non-rice sources. A total of 14 novel ESTs were randomly selected for RT-PCR to determine the proportion of potential novel genes in our transcript collection that originate from O. longistaminata rather than from other organisms. RT-PCR experiments were conducted on RNA derived from root tissue of clonally propagated O. longistaminata plants grown in soil in the phytotron. Of the 13 primer pairs used for PCR, 10 generated RT-PCR products of the expected size whose sequences were confirmed by Sanger sequencing. These results demonstrated that these 10 novel transcripts detected among the 454 ESTs are indeed expressed in O. longistaminata roots grown in soil (Figure 2A). Of a further set of primer pairs for 19 additional ESTs, six yielded a positive result (Additional file 5). However, as PCR amplification conditions could not be optimized owing to the lack of an intron-free template, these results may be an underestimate. To test whether the putatively expressed genes are distributed among different accessions of the same species, O. longistaminata grains collected in the Okavango region of Namibia were used for gnotobiotic cultivation of seedlings in the phytotron, which were pooled for analysis. With root RNA extracts from these seedlings, 5 of the 10 primer pairs yielded RT-PCR products of the correct size, whose sequences were again validated by Sanger sequencing. This confirmed that these fragments indeed originated from this species and not, e.g., from root endophytes, and that their expression was conserved within the species.
To confirm the presence of these sequences in the genome, Southern blot analysis was carried out on genomic DNA extracted from leaves of O. longistaminata accession IRGC 110404 and of O. sativa. Probes generated from two of the 10 ESTs detected hybridizing fragments in wild but not in cultivated rice (Figure 2B), indicating that these 2 ESTs are indeed O. longistaminata-specific sequences. Based on these results, we estimate that a large subset of the novel sequences is derived from O. longistaminata. The remaining novel EST sequences might be due to contamination from other sources or to 454 sequencing artefacts.
In this study, we present a large-scale EST dataset comprising 71,367 unique EST sequences derived from the wild rice O. longistaminata by massively parallel pyrosequencing. Among them, 34,510 ESTs matched known gene models, and 25,448 ESTs were annotated with GO terms. The comparative analysis between wild rice and the two domesticated rice subspecies indicated that O. longistaminata is as similar to japonica as to indica rice. Notably, a large number of ESTs derived from O. longistaminata roots have not yet been detected as expressed in O. sativa, or did not show similarity to publicly available rice sequences or to any other genes. Our data will contribute to future annotation of the O. longistaminata genome, to the identification of O. longistaminata-specific genes and to comparative studies of evolution within the genus Oryza. In particular, these novel ESTs provide a basis for further identification of O. longistaminata genes underlying adaptation to nutrient-limiting conditions. All ESTs obtained in this study are provided in the supplemental data (Additional file 6).
The O. longistaminata accession IRGC 110404 (short name Xa21) was grown under nitrogen-limiting conditions in soil without nitrogen fertilizer in the phytotron in Bremen. The soil (from the Camargue) had a low total nitrogen content (0.229%) and a high C/N ratio (25.5). Roots and leaves were harvested by snap-freezing in liquid nitrogen and used for RNA and DNA isolation, respectively. Seeds of O. longistaminata collected in Namibia were surface-sterilized and cultured gnotobiotically in plant medium supplemented with agar (4 g per L).
RNA was extracted from soil-grown roots by the CTAB method described by Chang et al. and then purified using plant RNeasy columns (Qiagen, Hilden). RNA from gnotobiotically cultured seedlings was isolated using TRIzol (Invitrogen) according to the manufacturer's instructions. RNA quality was evaluated with a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Genomic DNA was isolated from leaves by the CTAB method described by Allen et al. DNA concentration was determined spectrophotometrically, and DNA quality was checked by agarose gel electrophoresis. cDNA was synthesized using the SMART PCR cDNA Synthesis Kit (Clontech, Mountain View, CA) and purified with QIAquick spin columns (Qiagen, Hilden).
Synthesis of cDNA and normalization for pyrosequencing were carried out by MWG (Ebersberg, Germany) using RNA from roots of soil-grown plants without N fertilizer. High-quality polyA+ RNA was isolated from total RNA as template for first- and second-strand synthesis. A semi-random priming approach for both strands was used to achieve an even, shotgun-like distribution of cDNA fragments. The fragments were size-fractionated and normalised by denaturation and re-association. Approximately 10 μg of cDNA was sheared by nebulisation and sequenced on a 454 GS-FLX pyrosequencing platform. A total of 337,830 raw reads were obtained. SeqClean software (http://compbio.dfci.harvard.edu/tgi/software/) was applied to eliminate low-quality sequences, poly(A/T) stretches and adaptor sequences. The cleaned sequences were clustered and assembled with the CAP3 program using default parameters. All consensus sequences were compared with the NR database (GenBank). GO accessions were obtained via the Arabidopsis gene identifiers with the strongest BLASTX alignments to the corresponding O. longistaminata ESTs. The distributions of cellular component, biological process and molecular function categories obtained from the GO annotation were compared using the GOSlim program (http://www.geneontology.org).
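The GO assignment via the best Arabidopsis hit can be sketched as follows. This is a minimal illustration, not the authors' script: "strongest alignment" is assumed here to mean the highest bitscore, `go_by_gene` stands in for an Arabidopsis gene-to-GO annotation table, and all gene identifiers are hypothetical placeholders. The GOSlim mapping step itself is not reproduced.

```python
from collections import Counter


def best_hit_per_query(alignments):
    """alignments: iterable of (est_id, arabidopsis_gene_id, bitscore).

    The 'strongest' alignment is assumed to be the one with the highest bitscore.
    """
    best = {}
    for est, gene, bitscore in alignments:
        if est not in best or bitscore > best[est][1]:
            best[est] = (gene, bitscore)
    return {est: gene for est, (gene, _) in best.items()}


def transfer_go_terms(best_hits, go_by_gene):
    """Give each EST the GO accessions of its best-hit Arabidopsis gene."""
    return {est: sorted(go_by_gene[gene])
            for est, gene in best_hits.items() if gene in go_by_gene}


def count_terms(est_to_go):
    """Number of ESTs annotated with each GO accession (input to a slim summary)."""
    return Counter(term for terms in est_to_go.values() for term in terms)


if __name__ == "__main__":
    # Hypothetical alignments and annotation table, for illustration only.
    alignments = [
        ("est1", "AT_GENE_1", 150.0),
        ("est1", "AT_GENE_2", 90.0),
        ("est2", "AT_GENE_3", 60.0),
        ("est3", "AT_GENE_4", 45.0),  # best hit has no GO annotation below
    ]
    go_by_gene = {
        "AT_GENE_1": {"GO:0003677", "GO:0005634"},
        "AT_GENE_2": {"GO:0016301"},
        "AT_GENE_3": {"GO:0005634"},
    }
    annotated = transfer_go_terms(best_hit_per_query(alignments), go_by_gene)
    print(annotated)
    print(count_terms(annotated).most_common())  # [('GO:0005634', 2), ('GO:0003677', 1)]
```

An EST whose best hit lacks GO annotations (here `est3`) stays unannotated, which is one reason the number of GO-annotated ESTs is smaller than the number with NR hits.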
The sequences are available at http://www.gabipd.org/ under the accession Xa21_454, and at GenBank (dbEST acc. No. HS317469 - HS388835).
To validate the presence of novel ESTs detected by pyrosequencing in O. longistaminata, randomly selected sequences were used for expression analysis by RT-PCR (root RNA) and for Southern blot analysis (leaf DNA). About 100 ng of total RNA was used to synthesize first-strand cDNA with SuperScript™ II Reverse Transcriptase (Invitrogen, Carlsbad, CA) and Oligo(dT)12-18 primers. Specific primer pairs for cDNA amplification were designed from the EST sequences using Primer3 software. PCR was performed in a 50 μL reaction volume containing 1 μL cDNA, 1× PCR buffer [10 mM Tris-HCl (pH 8.0), 1.5 mM MgCl2], 0.2 mM dNTPs, 0.2 μM of each primer and 1.5 U Taq polymerase (MolTaq). The annealing temperature was 60°C for all primer pairs. After an initial 5 min at 94°C, 35 cycles of 45 s at 94°C, 45 s at 60°C and 1 min at 72°C were carried out, followed by a final extension of 10 min at 72°C. The PCR products were purified and sequenced by the Sanger method (LGC Genomics, Germany). For Southern blot analysis, 5 μg of genomic DNA was digested with the restriction endonuclease HindIII and subjected to Southern blot analysis with digoxigenin-labeled probes according to the protocol described by Neuhaus-Url et al.
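Setting up the 50 μL PCR reactions described above is a C1V1 = C2V2 calculation. The sketch below assumes stock concentrations that the text does not give (10× PCR buffer, 10 mM dNTP mix, 10 μM primers, Taq at 5 U/μL); these should be replaced with the values of the actual reagents. Template cDNA is pipetted per reaction and is therefore excluded from the master mix.

```python
REACTION_UL = 50.0  # final reaction volume from the protocol
CDNA_UL = 1.0       # template volume from the protocol

# (component, stock concentration, final concentration); STOCKS ARE ASSUMED
COMPONENTS = [
    ("PCR buffer", 10.0, 1.0),      # 10x stock -> 1x
    ("dNTP mix", 10.0, 0.2),        # mM
    ("forward primer", 10.0, 0.2),  # uM
    ("reverse primer", 10.0, 0.2),  # uM
]
TAQ_UNITS = 1.5           # units per reaction (protocol)
TAQ_STOCK_U_PER_UL = 5.0  # assumed enzyme concentration


def per_reaction_volumes():
    """Volumes (uL) for one reaction using C1*V1 = C2*V2, topped up with water."""
    vols = {name: REACTION_UL * final / stock for name, stock, final in COMPONENTS}
    vols["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["cDNA"] = CDNA_UL
    vols["water"] = REACTION_UL - sum(vols.values())
    return vols


def master_mix(n_reactions, excess=0.1):
    """Scale all components except template for n reactions plus pipetting excess."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2)
            for name, vol in per_reaction_volumes().items() if name != "cDNA"}


if __name__ == "__main__":
    for name, vol in per_reaction_volumes().items():
        print(f"{name:15s} {vol:6.2f} uL")
    print(master_mix(13))  # e.g. one reaction per primer pair
```

The 10% excess is a common bench convention to compensate for pipetting losses, not part of the published protocol.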
This work was funded by a grant awarded by the BMBF (Bundesministerium für Bildung und Forschung) in the framework of GABI-FUTURE (no. 315068) to B. R.-H. and T. H. Grains from O. longistaminata were collected under the Research/Collection permit 1358/2009 and Export Permit 74439 by the Ministry of Environment and Tourism, Namibia.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.