Massive parallel sequencing of mRNA in identification of unannotated salinity stress-inducible transcripts in rice (Oryza sativa L.)
- Hiroshi Mizuno†1,
- Yoshihiro Kawahara†1,
- Hiroaki Sakai1,
- Hiroyuki Kanamori2,
- Hironobu Wakimoto1, 3,
- Harumi Yamagata2,
- Youko Oono1,
- Jianzhong Wu1,
- Hiroshi Ikawa2,
- Takeshi Itoh1 and
- Takashi Matsumoto1Email author
© Mizuno et al; licensee BioMed Central Ltd. 2010
Received: 20 April 2010
Accepted: 2 December 2010
Published: 2 December 2010
Microarray technology is limited to monitoring the expression of previously annotated genes that have corresponding probes on the array. Computationally annotated genes have not fully been validated, because ESTs and full-length cDNAs cannot cover entire transcribed regions. Here, mRNA-Seq (an Illumina cDNA sequencing application) was used to monitor whole mRNAs of salinity stress-treated rice tissues.
Thirty-six-base-pair reads from whole mRNAs were mapped to the rice genomic sequence: 72.0% to 75.2% were mapped uniquely to the genome, and 5.0% to 5.7% bridged exons. From the piling up of short reads mapped on the genome, a series of programs (Bowtie, TopHat, and Cufflinks) comprehensively predicted 51,301 (shoot) and 54,491 (root) transcripts, including 2,795 (shoot) and 3,082 (root) currently unannotated in the Rice Annotation Project database. Of these unannotated transcripts, 995 (shoot) and 1,052 (root) had ORFs similar to those encoding the amino acid sequences of functional proteins in a BLASTX search against UniProt and RefSeq databases. Among the unannotated genes, 213 (shoot) and 436 (root) were differentially expressed in response to salinity stress. Sequence-based and array-based measurements of the expression ratios of previously annotated genes were highly correlated.
Unannotated transcripts were identified on the basis of the piling up of mapped reads derived from mRNAs in rice. Some of these unannotated transcripts encoding putative functional proteins were expressed differentially in response to salinity stress.
Gene expression profiling is accelerating our progress toward a comprehensive understanding of the genetic mechanisms that control responses to environmental stress. Microarray analysis was developed to obtain overall gene expression profiles in various plants. Microarray profiling and the recently introduced tag-based sequencing approaches are proven technologies for estimating gene expression. However, array-based technologies have critical limitations [1, 2]. As most microarray probes are designed on the basis of gene annotation, arrays are limited to the analysis of transcripts from previously annotated genes of a sequenced accession of a species. Probes are designed to cover only a very small portion of a gene and so do not represent the whole structure of the gene. Moreover, computationally annotated genes have not fully been validated, because ESTs and full-length cDNAs (FL-cDNAs) cannot cover entire transcribed regions. It is therefore important to identify whole transcripts (including unannotated transcripts) for complete gene expression profiling. There is a need for the development of technologies beyond arrays.
Sequencing-based approaches could overcome the limitations of array-based technologies. Following the rapid progress of massive parallel sequencing technology, whole mRNA sequencing has been used for gene expression profiling [3–8]. This sequencing involves mapping of the reads on known annotated gene models but cannot be used to identify novel genes. Recently, a series of programs have been developed for building gene models directly from the piling up of short reads: Bowtie efficiently maps short reads on genomic sequences ; TopHat concatenates adjacent exons and identifies reads that bridge exon junctions ; and Cufflinks  constructs gene models from the exons and bridging sequences predicted by Bowtie and TopHat and then calculates their abundances of these sequences. The use of this series of programs has the potential to discover new transcripts from mRNA-Seq (an Illumina cDNA sequencing application) but has only just begun [7, 12].
In this study, we identified unannotated transcripts in rice on the basis of the piling up of mapped reads. As a model case, we give examples of salinity stress-inducible unannotated transcripts encoding putative functional proteins. For these purposes, we performed whole mRNA sequencing by using massive parallel sequencing technology. We also took advantage of various high-quality genomic resources in rice, including the genomic sequence (International Rice Genome Sequencing Project [IRGSP] build 4.0), FL-cDNA sequences , the Rice Annotation Project database (RAP-DB: http://rapdb.dna.affrc.go.jp/) [14, 15], and a rice 44K microarray (Agilent Technologies, Palo Alto, CA, USA), in our analysis of rice transcriptomes. First, to estimate the scale of the transcriptomes in rice, we mapped 36-base-pair (bp) reads from the mRNA of salinity stress-treated rice tissues on the rice genome. The coverage of previously annotated regions or of the rice genome was then calculated. Second, we attempted to identify salinity stress-inducible genes as a model system for gene expression profiling by mRNA-Seq. The number of mapped reads was counted and marked on the rice genome. Third, using the mRNA-Seq data, we used Bowtie, TopHat, and Cufflinks to construct gene models based on the piling up of short reads on the rice genome, and compared these with previous annotations and then characterized the unannotated transcripts. We conducted a BLASTX search for the unannotated transcripts, and we discuss the function of the encoded proteins. Fourth, to validate our sequence-based technology, we compared the results of quantification by the array-based and sequence-based approaches, and we discuss the advantages of the latter. This work contributes to the discovery of whole salinity stress-inducible transcripts without the need to rely on previous annotations. It should help to establish further sequence-based gene expression profiling in any organism.
Mapping of 36-bp reads to the rice genome
Numbers of mapped reads
Total filtered reads
Rice transcriptome analysis was based on response to salinity stress. mRNAs were prepared from the tissues of normal rice shoots and roots and from those subjected to 1 h of salinity stress. Of the 27 to 35 million quality-evaluated reads (Table 1; Total filtered reads), 72.0% to 75.2% were mapped uniquely to the rice genome (Table 1; Unique-genome); 5.0% to 5.7% of the reads bridged flanking exons (Table 1; Unique-bridged); 6.0% to 11.2% of the reads were repetitive sequences (Table 1; Multiple); and 10.1% to 16.7% had no match in the genome (Table 1; Unmapped). Thus, a total of 76.9% to 80.9% of the reads were mapped uniquely to the rice genome or to exon-exon junctions (Table 1; Unique-total).
Of the unmapped reads, 26.1% had high levels of identity to sequences derived from sequencing adaptors (11.0%), contaminating organisms (8.2%), or ribosomal RNA (6.9%) (Additional file 1. Table S1). A few transcripts might have been transcribed from unsequenced genomic regions of rice . However, most of the unmapped reads (71.5%) had no similarity to each other (data not shown). Our preliminary experiment showed that the ratio of these unmapped reads was higher with mRNA-Seq (10.1%-16.7%; Table 1; Unmapped) than with genomic sequencing (2.0%-3.1%; data not shown). Thus, part of the random sequences might have come from residual random primers used in cDNA synthesis. The common random sequences might have come from sequencing errors in the use of the Illumina sequencing technology.
Identification of differentially expressed genes by mRNA-Seq
mRNA-Seq quantifies the amount of transcripts on the basis of the number of sequence reads mapped on each gene. We adopted this method for transcript quantification by RPKM  and calculated the RPKM of each gene (Additional file 2: Table S2). RPKM quantification was distributed from 0 to over 104. In shoots under normal conditions, the gene encoding ribulose bisphosphate carboxylase activase (AK104332) was expressed at extremely high levels (rpkm_0 hr_shoot = 10612.237). In roots under normal conditions, the gene for metallothionein (AK105219) was expressed at extremely high levels (rpkm_0 hr_root = 23661.149). The statistical mean and median were 19.78 and 3.399, respectively, in the shoot, and 18.705 and 4.241 in the root under normal conditions.
Constructing gene models by mRNA-seq
Numbers and ORF predictions of transcripts
No. of transcripts mapped on RAP2 annotated loci
No. of transcripts mapped on unannotated region
--ORFs detected by BLASTX
--ORFs ≥ 20 a.a., no BLASTX hits
--no ORFs or ORFs < 20 a.a.
Examples of expression ratios and putative functions of unannotated transcripts
Indole-3-glycerol phosphate lyase
Inter-alpha-trypsin inhibitor heavy chain-related
Pleiotropic drug resistance protein 4
HHP2 (heptahelical transmembrane protein2)
Putative lipoxygenase 5
Probable pleiotropic drug resistance protein 1
Phosphoinositide-specific phospholipase C
Pleiotropic drug resistance protein 4
Kinesin motor protein-related
Phosphatidate cytidylyltransferase family protein
Chromosome-associated kinesin, putative
Inter-alpha-trypsin inhibitor heavy chain-related
VAD1 (vascular associated death1)
Jacalin lectin family protein
PDR11 (pleiotropic drug resistance 11)
Comparison of sequence-based and array-based technologies for gene expression profiling
Estimation of variation and abundance of whole transcripts in rice
How many reads are required to cover whole transcripts in the rice cell? As the number of reads increased, the cumulative coverage approached a plateau (Figure 1). We summed four technical replicates (Table 1). RPKM is widely used to calculate the abundance of each transcript and is linear across a dynamic range . The distribution of RPKM of rice genes ranged from 0 to over 104 (Figure 2); genes involved in photosynthesis in the shoot or in regulation of physiological metals in the root were highly expressed, whereas about 30% of genes had RPKM < 1 (Additional file 2: Table S2). The saturation of sequencing in rice (Figure 1b) was almost the same as in a previous mammalian analysis . According to that analysis, "one transcript in a cell corresponds to 1 to 3 RPKM" , so genes having RPKM < 1 might rarely be expressed. However, data on the RNA content of each rice cell are required to calculate the number of existing molecules of RNAs. As rice tissue contains cells of various sizes and types, the relationship between the number of existing molecules and their RPKM has not yet been accurately determined. When we used four technical replicates, about 20% of genes expressed at relatively low levels (RPKM 3-30) did not reach their final RPKM (Figure 1b), suggesting that these model settings were insufficient for calculating the real RPKM of genes expressed at low levels.
Summing of the four technical replicates covered 70.1% of all annotated regions, corresponding to 15.8% of 389 Mb  of the rice genome (Figure 1a). This result suggests that these regions were transcriptionally active under the experimental conditions. Even though the cumulative coverage was close to a plateau, the coverage rose gradually; the accumulation of about 95 million reads covered 77.0% of annotated regions (Figure 1a), suggesting that some of the reads expressed at low levels were not sequenced. However, the gradual increase in coverage might have been due to the presence of contaminated genomic DNA or a very small amount of partly processed nuclear RNAs, because intron retention is the most prevalent alternative splicing form in rice , as it is in Arabidopsis thaliana. Thus, we consider that the summing of four technical replicates of 36-bp reads, corresponding to a total of 1 Gbp of filtered sequences, covered almost all the transcripts in the rice cell under the experimental conditions, although more reads are required to obtain the final RPKM of genes expressed at relatively low levels.
Identification of unannotated transcripts by mRNA sequencing
mRNA-Seq provides information on whole transcribed genes without the need to rely on annotation (Figure 3), whereas array technology is limited to providing data only on those previously annotated genes and on previously identified ESTs with no known homologies that have corresponding probes on the array. On the basis of the piling up of mapped reads, we predicted 2,795 (shoot) and 3,082 (root) currently unannotated transcripts in RAP-DB (Table 2; Figure 4a). Of the RAP2 unannotated transcripts, 54.6% (1,525/2,795) in shoot and 53.8% (1,659/3,082) in root had not been annotated by Michigan State University (MSU) (data not shown), suggesting that these transcripts were novel transcripts.
Unannotated transcripts included extended parts of previously annotated genes (Figure 4b, c). Extension of 5' exons might contribute to the making of a different start codon or the shifting of the reading frame of previously annotated genes. Extension of 3' UTRs might contribute to microRNA-mediated control of translation or post-transcriptional RNA metabolism [27, 28]. For example, mRNA-Seq provided evidence of the existence of extended parts of previously annotated genes and of the differential regulation of their expression. AK240862, previously annotated as a non-protein-coding transcript, had additional predicted exons distal to the 5' end of the previous gene model, and it encoded an indole-3-glycerol phosphate lyase (Additional file 4: Figure S2). Two neighboring genes (AK072595, AK288107) were also similar to the indole-3-glycerol phosphate lyase gene, suggesting that all three genes were tandemly duplicated. Although all three genes were upregulated in response to salinity stress, their tissue specificities and expression levels differed (Additional file 4: Figure S2), suggesting that their functions diversified after gene duplication.
mRNA-Seq also provided evidence of expression of computationally predicted genes. The existence of a number of genes computationally predicted in RAP-DB  has not been supported  by ESTs  or FL-cDNAs . Here, 1,738 (shoot) and 2,297 (root) transcripts identified by mRNA-Seq have been mapped on computationally predicted genes, the presence of which was not supported by experiments, suggesting the validity of the computationally predicted gene models in RAP-DB. We will use these sequence-based transcriptome analyses to improve RAP-DB.
mRNA-Seq provided details of the bridging sequences between exons, suggesting the presence of splicing junctions, whereas array technology--including whole-genome tiling arrays --provides no information on connecting exons. Because reads that bridge exon boundaries are not mapped directly to the genomic sequence, a mapping technique was required. As a first step, the enumeration of all theoretical splicing junctions within annotated transcripts allows the mapping of bridging reads [12, 16, 30] by using statistical models . We found that 5.0% to 5.7% of reads formed primary bridges with previously annotated exons (Table 1, Unique-bridged); this was not sufficient to discover sequences bridging unannotated transcripts. Programs such as TopHat  and G-Mo. R-Se (Gene Modeling using RNA-Seq)  are designed to align reads to form potential splice junctions without relying on known splice sites. In this study, sequences flanking potential donor/acceptor splice sites were joined to form canonical (GT-AG) introns between neighboring (but not necessarily adjacent) islands by using TopHat . Even though we used TopHat for our prediction, some of the predicted transcripts remained to be separated--unlike the case with the FL-cDNA sequences--because of the lack of sufficient bridging sequences between the exons (Additional file 4: Figure S3), suggesting that more bridging reads should be sequenced to connect predicted exons. Elongation of the length of each read may also enhance the chance to connect predicted exons.
Sequence-based transcriptome analysis for capturing salinity stress-inducible genes in rice
mRNA-Seq comprehensively identified salinity stress-inducible genes. Unannotated transcripts had ORFs (Table 2) with a mean length of 123 amino acids (root) or 125 amino acids (shoot) (Figure 5), suggesting that these unannotated transcripts could encode functional proteins. Of the unannotated transcripts, 213 (shoot) and 436 (root) were differentially expressed in response to salinity stress (Table 3, Additional file 3: Table S3). These unannotated transcripts encoded proteins associated with functions such as amino acid metabolism (indole-3-glycerol phosphate lyase) in response to abiotic stress , diterpenoid biosynthesis (gibberellin 2-beta-dioxygenase), and mechanosensitive ion channel (MSL2) function . Mechanosensitive ion channels are gated directly by physical stimuli such as osmotic shock and transduce these stimuli into electrical signals . mRNA-Seq also captured previously identified genes involved in salinity tolerance, namely those associated with trehalose synthesis (OsTPP1) (Figure 3), dehydrin (LIP9), ABA synthesis (OsABA2), sugar transport (OsMST3), glycerol transferase (WSI76), and transcription factors similar to those of the DREB family (Additional file 2: Table S2). A substantial number of transcripts were exclusively upregulated only in the root (Figure 2). As only the root was directly exposed to 1 h of salinity stress, it might take time to induce the expression of more genes in the shoot; OsTPP1 (Figure 3) might be expressed in the shoot after 10 h of exposure, as has been found in Yukihikari rice . With these genes, Nipponbare may have the potential to be tolerant to salinity stress.
Rice cultivars such as Nona Bokra and Pokkali are substantially more salinity tolerant than Nipponbare , suggesting that the genuine salinity stress tolerance gene might be missing in Nipponbare. The 23 Oryza species are geographically, physiologically, and genetically diverse , and many of the genes in cultivated rices have been selected by humans under field conditions, not by environmental stress. These essentially missing genes could serve as potential genetic resources for the improvement of cultivated crops. Sequence-based technology can be used to extract such missing genes by the piling-up of short reads on their own genomes without the need to rely on sequence similarity.
Overcoming the technical inaccuracy
Microarray technology has been used as a sophisticated platform for the expression profiling of previously annotated genes. However, as an array-based technology, evaluation of signal intensities close to background levels tends to cause artifacts in array analysis because of high levels of background noise and/or cross-hybridization ; moreover, hybridization efficiency might vary with the probes used, suggesting that the calculation of real molar concentrations is inaccurate. Whereas the Agilent rice 44K Array is designed to quantify 60-mer sequences at the 3'-end of transcripts, mRNA-Seq quantifies transcript abundance on the basis of the number of mapped sequences on the whole gene model. In our study, the two measures of transcript abundance (Figure 6) and change ratios (Figure 7) were highly correlated, as in a previous report . Moreover, for genes expressed at low or extremely high levels (Figure 6, root) and for genes differentially expressed in arrays (Additional file 4: Figure S1a), mRNA-Seq seemed to be accurate. Therefore, mRNA-Seq measures the molar concentrations of genes accurately over a broad dynamic range.
Biological replication is required for identifying differentially expressed genes through statistical analysis, as in array-based analysis. Unfortunately, with sequence-based transcriptome analysis there are greater costs than with microarrays for cDNA preparation and sequencing; this prevented us from performing further experiments. Illumina has improved its sequencing technology. Each read length has been continuously increased. Efficient base calling by using the latest Illumina data analysis pipeline software improved the quality and quantity of reads from the same raw image data. Controlled hydrolysis of RNA before cDNA synthesis substantially improved the uniformity of sequence coverage, as in a previous report . These technical innovations in hardware and software will enable remarkable progress in reducing costs and in increasing the sensitivity of detection of sequences transcribed at low levels, the accuracy of quantification and detection of splice forms, and the prediction of the whole structures of transcripts.
Sequence-based transcriptome analysis has recently been applied to various organisms: Arabidopsis thaliana[4, 39], yeasts [40, 41], Drosophila melanogaster, and human . During this study, two types of rice transcriptome analysis were reported, focusing on the transcriptional differences in two rice subspecies and their reciprocal hybrids  and in eight organs from different developmental stages of Oryza sativa L. ssp. indica '93-11' . We analyzed salinity stress-inducible transcripts and constructed gene models based on the pilling up of short reads by using the Cufflinks program. This approach should help to discover novel gene models without reliance on gene annotation.
Microarray-based gene expression profiling is limited to the analysis of annotated genes. In our mRNA-Seq analysis, unannotated salinity stress-inducible transcripts were identified on the basis of the piling up of mapped reads without reliance on gene annotation or FL-cDNA sequences. Some of these novel transcripts had ORFs encoding putative functional proteins and were differentially expressed in response to salinity stress. mRNA-Seq was valid as a gene expression profiling technology for quantifying the abundance of previously annotated genes. Our findings will contribute to improvement of our RAP-DB and to further sequence-based gene expression profiling in various organisms.
Plant material and salt stress treatment
Seeds of rice (Oryza sativa L. 'Nipponbare') were germinated in the dark at 28°C on a sterilized germination tray. Germinated seeds were evenly distributed on 96-well PCR plates supported by a plastic container. Seeds were grown in a growth chamber at 28°C, as previously described . After the seedlings had been grown for 7 days, they were transferred on their 96-well plates into containers filled with 150 mM NaCl solution, or with control solution, and placed at 28°C in a growth chamber for 1 h. Four kinds of tissue (normal shoot, normal root, shoot with 1-h salinity stress, or root with 1-h salinity stress) were collected and immediately frozen in liquid nitrogen. For RNA extraction from each treatment group, 10 plants were collected and mixed, to minimize the effect of transcriptome unevenness among plants.
Total RNA was extracted by using an RNeasy Plant kit (Qiagen, Hilden, Germany). RNA quality was calculated with a Bioanalyzer 2100 algorithm (Agilent Technologies); high-quality (RNA Integrity Number > 8) RNA was used. Total RNA samples (10 μg) were subjected to cDNA construction for Illumina sequencing, in accordance with the protocol for the mRNA-Seq sample preparation kit (Illumina). Oligo(dT) magnetic beads were used to isolate poly(A) RNA from the total RNA samples. The mRNA was fragmented by heating at 94°C for 5 min. First-strand cDNA was synthesized using random hexamer primers for 10 min at 25°C, 50 min at 42°C, and 15 min at 70°C. After the first strand had been synthesized, dNTPs, RNaseH, and DNA polymerase I were added to synthesize second-strand DNA for 2.5 h at 16°C. The ends of double-stranded cDNA were repaired by using T4 DNA polymerase and Klenow DNA polymerase and phosphorylated by using T4 polynucleotide kinase. A single "A" base was added to the cDNA molecules by using Klenow exo-nuclease, and the fragments were ligated to the PE adapters. cDNAs with 200 ± 25-bp fragments were collected. The purified cDNA was amplified by 15 cycles of PCR for 10 s at 98°C, 30 s at 65°C, and 30 s at 72°C using PE1.0 and PE2.0 primers.
Mapping of short reads, detection of bridging sequences, and prediction of transcripts
For each sample, cDNA was sequenced (single read) by an Illumina Genome Analyzer II. Data on nine technical replicates (nine sequencing lanes of a cDNA sample from root after salinity stress) were accumulated for Figure 1. Data on four technical replicates (four sequencing lanes of each cDNA sample, corresponding to about 27 to 35 million 36-bp reads) were summed for Table 1. In our preliminary experiment, two independent sequencing runs using the same cDNA were highly correlated (r = 0.99). The default Illumina pipeline quality filter, which uses a threshold of CHASTITY ≥ 0.6, was used to identify clusters with low signal-to-noise ratios. CHASTITY is defined as "the ratio of the highest of the four (base-type) intensities to the sum of the highest two." Passed filter reads were mapped onto both the Nipponbare reference genome (IRGSP build 4.0) and the spliced exon junction (SEJ) sequences by SOAP ver. 1.11 , allowing up to 2 bp of mismatch or up to 3 bp of indels. SEJ sequences were constructed by concatenating the 40 bases at the 3' end of the upstream exon to the 40 bases at the 5' end of the downstream exon for all RAP2 transcripts [14, 15] at a locus. To calculate the cumulative coverage of the genome or annotated regions, reads were mapped by BWA (Burrows-Wheeler Aligner)  with the default option. To predict transcripts, a series of programs--Bowtie , TopHat , and Cufflinks --was used. Briefly, mRNA-Seq reads were mapped against the whole reference genome (IRGSP build 4.0) by using Bowtie software. An initial consensus of exon sequences was extracted from the mapped reads. The reads that did not align to the genome but that were mapped to these potential junctions by TopHat were considered to bridge splice junctions. Cufflinks constructs gene models (RPKM ≥ 2, length ≥ 100 bp) on the basis of the exons and bridging sequences predicted by Bowtie and TopHat. ORFs were predicted by BLASTX search against UniProt (Swiss-Prot) and RefSeq (reviewed and validated) or by longest-ORF search (≥20 amino acids).
The same RNA material was shared for use in the Illumina sequencing and the microarray experiments and qRT-PCR analysis. The rice 44K oligo microarray (Agilent Technologies) contained approximately 44,000 60-mer oligonucleotides synthesized on the basis of RAP annotation. For each microarray experiment, 400 ng of total RNAs was used for Cy3- or Cy5-labeled complementary RNA (cRNA) synthesis. DNA microarrays were hybridized for 16 h with 825 ng of Cy3- and Cy5-labeled probes from salinity-stressed or unstressed plants. The microarray experiment was repeated with color-swapping of Cy3 and Cy5. Agilent Feature Extraction Software (ver. 22.214.171.124) was used to quantify microarray images. GeneSpring (ver. 10) software (Agilent Technologies) was used for background subtraction, LOWESS normalization, and extraction of normalized raw signal intensities for all probe sets from each array. Normalized raw signal intensities were compared with the corresponding RPKM. Parts of the signals were removed for further analysis if they were not positive, significant, or above background levels. The hybridization experiments and array scanning were performed at an open laboratory run by the DNA Bank of the National Institute of Agrobiological Sciences (http://www.dna.affrc.go.jp/).
Quantitative RT-PCR (qRT-PCR)
qRT-PCR primers were designed on the basis of the annotation of the RAP-DB (Additional file 5: Table S4). One microgram of total RNA was reverse-transcribed in a 20-μL reaction mixture of Transcriptor First Strand cDNA Synthesis Kit (Roche, Basel, Switzerland). qRT-PCR was performed in a 20-μL reaction mixture containing 2× SYBR Master Mix (Roche) and 1 μL of cDNA template (1:10 diluted). qRT-PCR of three technical replicates for each sample was performed using a LightCycler480 System with its relative quantification software (ver. 1.2) based on the delta-delta-Ct method (Roche). qRT-PCR was performed for 10 s at 95°C, 5 s at 55°C, and 10 s at 72°C. The detection threshold cycle for each reaction was normalized against the expression level of the ubiquitin gene.
All primary sequence read data have been submitted to DDBJ (DNA Data Bank of Japan) [DRA000159], and microarray data have been submitted to the GEO (Gene Expression Omnibus) [GSE20746].
The authors thank F. Aota, K. Ohtsu, and K. Yamada for their technical assistance in sample preparation, and Dr. Y. Nagamura and R. Motoyama for their technical assistance in the microarray experiment. This work was supported by the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural Innovation, RTR-0001).
- Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P: Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 2005, 33 (18): 5914-5923. 10.1093/nar/gki890.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu F, Jenssen TK, Trimarchi J, Punzo C, Cepko CL, Ohno-Machado L, Hovig E, Kuo WP: Comparison of hybridization-based and sequencing-based gene expression technologies on biological replicates. BMC Genomics. 2007, 8: 153-10.1186/1471-2164-8-153.PubMed CentralPubMedView ArticleGoogle Scholar
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.PubMed CentralPubMedGoogle Scholar
- Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB: Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. 2007, 144 (1): 32-42. 10.1104/pp.107.096677.PubMed CentralPubMedView ArticleGoogle Scholar
- Sugarbaker DJ, Richards WG, Gordon GJ, Dong L, De Rienzo A, Maulik G, Glickman JN, Chirieac LR, Hartman ML, Taillon BE, et al: Transcriptome sequencing of malignant pleural mesothelioma tumors. Proc Natl Acad Sci USA. 2008, 105 (9): 3521-3526. 10.1073/pnas.0712399105.PubMed CentralPubMedView ArticleGoogle Scholar
- Torres TT, Metta M, Ottenwalder B, Schlotterer C: Gene expression profiling by massively parallel sequencing. Genome Res. 2008, 18 (1): 172-177. 10.1101/gr.6984908.PubMed CentralPubMedView ArticleGoogle Scholar
- Pepke S, Wold B, Mortazavi A: Computation for ChIP-seq and RNA-seq studies. Nat Methods. 2009, 6 (11 Suppl): S22-32. 10.1038/nmeth.1371.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.PubMed CentralPubMedView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-10.1186/gb-2009-10-3-r25.PubMed CentralPubMedView ArticleGoogle Scholar
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.PubMed CentralPubMedView ArticleGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28 (5): 511-515. 10.1038/nbt.1621.PubMed CentralPubMedView ArticleGoogle Scholar
- Trapnell C, Salzberg SL: How to map billions of short reads onto genomes. Nat Biotechnol. 2009, 27 (5): 455-457. 10.1038/nbt0509-455.PubMed CentralPubMedView ArticleGoogle Scholar
- Rice_Full-Length_cDNA_Consortium: Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science. 2003, 301 (5631): 376-379. 10.1126/science.1081288.View ArticleGoogle Scholar
- Rice_Annotation_Project: Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 2007, 17 (2): 175-183. 10.1101/gr.5509507.View ArticleGoogle Scholar
- Rice_Annotation_Project: The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res. 2008, D1028-1033. 36 Database
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.PubMedView ArticleGoogle Scholar
- Mizuno H, Wu J, Katayose Y, Kanamori H, Sasaki T, Matsumoto T: Chromosome-Specific Distribution of Nucleotide Substitutions in Telomeric Repeats of Rice (Oryza sativa L.). Mol Biol Evol. 2008, 25 (1): 62-68. 10.1093/molbev/msm227.PubMedView ArticleGoogle Scholar
- Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, Yoshiwara K, Seki M, Shinozaki K, Yamaguchi-Shinozaki K: Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses. Plant Physiol. 2003, 133 (4): 1755-1767. 10.1104/pp.103.025742.PubMed CentralPubMedView ArticleGoogle Scholar
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 12 (10): 1599-1610. 10.1101/gr.403602.PubMed CentralPubMedView ArticleGoogle Scholar
- Shima S, Matsui H, Tahara S, Imai R: Biochemical characterization of rice trehalose-6-phosphate phosphatases supports distinctive functions of these plant enzymes. FEBS J. 2007, 274 (5): 1192-1201. 10.1111/j.1742-4658.2007.05658.x.PubMedView ArticleGoogle Scholar
- Goddijn OJ, van Dun K: Trehalose metabolism in plants. Trends Plant Sci. 1999, 4 (8): 315-319. 10.1016/S1360-1385(99)01446-6.PubMedView ArticleGoogle Scholar
- Xu Y, Buchholz WG, DeRose RT, Hall TC: Characterization of a rice gene family encoding root-specific proteins. Plant Mol Biol. 1995, 27 (2): 237-248. 10.1007/BF00020180.PubMedView ArticleGoogle Scholar
- Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yazaki J, et al: A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell. 2002, 14 (3): 525-535. 10.1105/tpc.010274.PubMed CentralPubMedView ArticleGoogle Scholar
- IRGSP: The map-based sequence of the rice genome. Nature. 2005, 436 (7052): 793-800. 10.1038/nature03895.View ArticleGoogle Scholar
- Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O: Automated classification of alternative splicing and transcriptional initiation and construction of visual database of classified patterns. Bioinformatics. 2006, 22 (10): 1211-1216. 10.1093/bioinformatics/btl067.PubMedView ArticleGoogle Scholar
- Ner-Gaon H, Fluhr R: Whole-genome microarray in Arabidopsis facilitates global analysis of retained introns. DNA Res. 2006, 13 (3): 111-121. 10.1093/dnares/dsl003.PubMedView ArticleGoogle Scholar
- Kidner CA, Martienssen RA: The developmental role of microRNA in plants. Curr Opin Plant Biol. 2005, 8 (1): 38-44. 10.1016/j.pbi.2004.11.008.PubMedView ArticleGoogle Scholar
- Zhang B, Pan X, Cobb GP, Anderson TA: Plant microRNA: a small regulatory molecule with big impact. Dev Biol. 2006, 289 (1): 3-16. 10.1016/j.ydbio.2005.10.036.PubMedView ArticleGoogle Scholar
- Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, et al: Global identification of human transcribed sequences with genome tiling arrays. Science. 2004, 306 (5705): 2242-2246. 10.1126/science.1103388.PubMedView ArticleGoogle Scholar
- Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, et al: A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008, 321 (5891): 956-960. 10.1126/science.1160342.PubMedView ArticleGoogle Scholar
- Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Addendum: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2009, 41 (6): 762-10.1038/ng0609-762d.View ArticleGoogle Scholar
- Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, et al: Annotating genomes with massive-scale RNA sequencing. Genome Biol. 2008, 9 (12): R175-10.1186/gb-2008-9-12-r175.PubMed CentralPubMedView ArticleGoogle Scholar
- Less H, Galili G: Principal transcriptional programs regulating plant amino acid metabolism in response to abiotic stresses. Plant Physiol. 2008, 147 (1): 316-330. 10.1104/pp.108.115733.PubMed CentralPubMedView ArticleGoogle Scholar
- Haswell ES, Meyerowitz EM: MscS-like proteins control plastid size and shape in Arabidopsis thaliana. Curr Biol. 2006, 16 (1): 1-11. 10.1016/j.cub.2005.11.044.PubMedView ArticleGoogle Scholar
- Arnadottir J, Chalfie M: Eukaryotic mechanosensitive channels. Annu Rev Biophys. 2010, 39: 111-137. 10.1146/annurev.biophys.37.032807.125836.PubMedView ArticleGoogle Scholar
- Pramanik MH, Imai R: Functional identification of a trehalose 6-phosphate phosphatase gene that is involved in transient induction of trehalose biosynthesis during chilling stress in rice. Plant Mol Biol. 2005, 58 (6): 751-762. 10.1007/s11103-005-7404-4.PubMedView ArticleGoogle Scholar
- Moons A, Bauw G, Prinsen E, Van Montagu M, Van der Straeten D: Molecular and physiological responses to abscisic acid and salts in roots of salt-sensitive and salt-tolerant Indica rice varieties. Plant Physiol. 1995, 107 (1): 177-186. 10.1104/pp.107.1.177.PubMed CentralPubMedView ArticleGoogle Scholar
- Vaughan DA, Morishima H, Kadowaki K: Diversity in the Oryza genus. Curr Opin Plant Biol. 2003, 6 (2): 139-146. 10.1016/S1369-5266(03)00009-8.PubMedView ArticleGoogle Scholar
- Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR: Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008, 133 (3): 523-536. 10.1016/j.cell.2008.03.029.PubMed CentralPubMedView ArticleGoogle Scholar
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.PubMed CentralPubMedView ArticleGoogle Scholar
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008, 453 (7199): 1239-1243. 10.1038/nature07002.PubMedView ArticleGoogle Scholar
- He G, Zhu X, Elling AA, Chen L, Wang X, Guo L, Liang M, He H, Zhang H, Chen F, et al: Global epigenetic and transcriptional trends among two rice subspecies and their reciprocal hybrids. Plant Cell. 2010, 22 (1): 17-33. 10.1105/tpc.109.072041.PubMed CentralPubMedView ArticleGoogle Scholar
- Zhang G, Guo G, Hu X, Zhang Y, Li Q, Li R, Zhuang R, Lu Z, He Z, Fang X, et al: Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res. 2010Google Scholar
- Sahi C, Agarwal M, Reddy MK, Sopory SK, Grover A: Isolation and expression analysis of salt stress-associated ESTs from contrasting rice cultivars using a PCR-based subtraction method. Theor Appl Genet. 2003, 106 (4): 620-628.PubMedGoogle Scholar
- Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24 (5): 713-714. 10.1093/bioinformatics/btn025.PubMedView ArticleGoogle Scholar
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.