Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds
- Cheng-Ying Shi†1,
- Hua Yang†1,
- Chao-Ling Wei†1,
- Oliver Yu1, 2,
- Zheng-Zhu Zhang1,
- Chang-Jun Jiang1,
- Jun Sun1,
- Ye-Yun Li1,
- Qi Chen1,
- Tao Xia1 and
- Xiao-Chun Wan1
© Shi et al; licensee BioMed Central Ltd. 2011
Received: 3 October 2010
Accepted: 28 February 2011
Published: 28 February 2011
Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro and to transform, and it has a large genome, so little genomic information is available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generating large expression datasets for functional genomic analysis, which is especially suitable for non-model species with unsequenced genomes.
Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes with an average length of 355 bp and an N50 of 506 bp; these comprised 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than the number of existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as the flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real-time PCR (qRT-PCR).
An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.
The tea beverage yields many health benefits to humans due to the extensive secondary metabolites in tea leaves, including polyphenols, theanine, and volatile oils [1, 2]. The tea plant, Camellia sinensis, is both an economically important horticultural crop and a model system for studying self-incompatibility and Theaceae plants. Due to its large genome (4.0 gigabases) and the lack of developed genetic tools such as tissue culture and transformation, little genomic information is available. As of August 2010, only 810 nucleotide sequences, 12,664 expressed sequence tags (ESTs), 1 genome survey sequence (GSS), and 478 proteins from C. sinensis had been deposited in GenBank. The bulk of tea research has focused on the analysis of secondary metabolism genes, which were mostly discovered through EST sequencing. In one study, 588 ESTs derived from a subtractive cDNA library were sequenced. Approximately 8.7% of these ESTs were related to secondary metabolism, including leucoanthocyanidin reductase (LCR), which is involved in catechin synthesis. In another study, 4.8% of the 1,684 ESTs in a tender shoot cDNA library were related to secondary metabolism, including chalcone isomerase (CHI). Moreover, in a young tea root cDNA library constructed in our laboratory, 4.5% of the 4,860 valid ESTs were secondary metabolic genes.
EST sequencing has long been the core technology for reference transcript discovery [8–10]. However, EST sequencing has some inherent limitations, such as low throughput, high cost, and lack of quantitation of the expressed genes. In addition, there is some bias in the composition of cDNA libraries caused by bacterial cloning, such as vector contamination, overrepresentation of preferentially cloned sequences, and inadequate representation of rare or inherently unclonable transcripts [11–13]. These inherent limitations of EST sequencing and the small number of available ESTs suggested that our understanding of the tea transcriptome is far from complete.
The cost-effective and ultra-high-throughput DNA sequencing technology, RNA-seq, is a revolutionary advance in genome-scale sequencing. This transcriptome analysis method is fast and simple because it does not require bacterial cloning of the cDNAs. Direct sequencing of these cDNAs can generate short reads at an extraordinary depth. Following sequencing, the resulting reads can be assembled into a genome-scale transcription profile. It is a more comprehensive and efficient way to measure transcriptome composition, obtain RNA expression patterns, and discover new exons and genes [11–16]. Recent transcriptomic studies on yeast, Arabidopsis thaliana, mouse, and human cells have demonstrated that this approach is well suited for surveying the complexity of transcription in eukaryotes [11, 17–23]. Since RNA-seq is not limited to detecting transcripts that correspond to existing genomic sequences, it is particularly attractive for non-model organisms whose genomic sequences are yet to be determined [24–27]. In addition, this approach is very sensitive, allowing detection of low-abundance transcripts.
In this report, we took advantage of RNA-seq to survey the poly (A)+ transcriptome of C. sinensis. The coverage of the transcriptome, at 2.59 gigabase pairs, was comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset will serve as a public information platform for gene expression, genomics, and functional genomics in C. sinensis.
Sequencing, de novo assembly, and sequence analysis
Summary of sequence assembly after Illumina sequencing
Base pairs (Mbp)
Mean length (bp)
Raw sequencing reads
Contigs (≥100 bp)
Scaffold sequences (≥100 bp)
Singletons (≥100 bp)
Clusters (≥100 bp)
Total unigenes (≥100 bp)
The data from the paired-end sequencing was used to join the contigs into scaffolds, which represent the collections of fragments originating from a single transcript. Overall, the contig-joining procedure based on paired-end reads condensed the number of contigs from 191,376 to 127,901. Among them, 119,105 contigs (62.2% of total) merged into 55,630 unique scaffolds.
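The contig counts reported for this step can be cross-checked with simple bookkeeping. The sketch below is only an arithmetic check on the figures quoted in the text; the variable names are illustrative and are not taken from the assembly pipeline.

```python
# Bookkeeping check for the contig-joining step (all figures from the text).
contigs_total = 191_376     # contigs before joining
contigs_merged = 119_105    # contigs that were merged into scaffolds
scaffolds_formed = 55_630   # unique scaffolds built from the merged contigs

# Contigs that were not merged remain as stand-alone sequences.
contigs_unmerged = contigs_total - contigs_merged
sequences_after_joining = contigs_unmerged + scaffolds_formed
merged_fraction = contigs_merged / contigs_total

print(contigs_unmerged)            # 72271
print(sequences_after_joining)     # 127901
print(f"{merged_fraction:.1%}")    # 62.2%
```

The unmerged contigs plus the newly formed scaffolds reproduce the reported total of 127,901 sequences.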
Subsequently, gap filling and contig joining were carried out to complete as many scaffolds as possible. Through scaffold assembly, an additional 5.35 Mbp were incorporated into existing scaffolds. The amount of all processed data totalled 45.07 Mbp, which further improved the quality of de novo assembly of these short reads.
Assembly of transcriptome sequencing reads can produce more scaffolds than expressed genes, reflecting redundancy among the assembled sequences (i.e., more than one sequence per gene). To reduce this redundancy, assembled sequences from the above analysis were clustered using the Gene Indices Clustering Tools. Sequence clustering merged 1,596 scaffolds into 788 sequence clusters (also known as "consensuses"). These sequence clusters had 2-4 scaffolds per cluster, and 99% of the clusters contained two scaffolds. The remaining 126,306 scaffolds were retained as singletons, for a total of 127,094 "unigenes". This result demonstrated that the assembly and contig joining succeeded in processing a large number of short reads from the tea plant samples with relatively little redundancy.
Out of the 127,094 unigenes, 22,757 were ≥500 bp and 7,820 were ≥1,000 bp, with an average unigene length of 355 bp and an N50 of 506 bp. The size distribution of these unigenes is shown in Figure 1b. The unigene distribution followed the contig distribution closely, with the majority being shorter sequences.
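For readers unfamiliar with the N50 statistic quoted here, the following is a minimal sketch of how it is commonly computed from a list of sequence lengths. The function name and the toy lengths are made up for illustration; they are not the tea unigene lengths.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with made-up lengths (bp).
toy_lengths = [100, 150, 200, 300, 506, 1200]
print(n50(toy_lengths))  # 506
```

Because long sequences carry more bases, the N50 is usually larger than the mean length, which is the pattern seen in the reported values (N50 of 506 bp versus a mean of 355 bp).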
To evaluate the quality of the dataset, we analyzed the ratio of gap length to the total length of each assembled unigene (Figure 1c). For the majority of the unigenes (114,346 unigenes, 90.0% of the total), gaps accounted for less than 5% of the total length. In addition, sequencing bias was assessed by examining the distribution of reads along the assembled unigenes (Figure 1d). Although the 3' ends of the assembled unigenes contained relatively fewer reads, the other positions showed greater and more even coverage. This observation is consistent with previous publications, suggesting that the quality of our dataset was comparable to similar reports in other non-model species.
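The gap-ratio measure can be illustrated with a short sketch. It assumes that gaps inside a scaffold are written as runs of "N", which is a common convention but is an assumption here; the function names and toy sequences are hypothetical.

```python
def gap_ratio(sequence):
    """Fraction of a unigene sequence made up of gap characters ('N')."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("N") / len(sequence)


def fraction_below(sequences, threshold=0.05):
    """Share of sequences whose gap ratio is below the threshold."""
    below = sum(1 for s in sequences if gap_ratio(s) < threshold)
    return below / len(sequences)


# Toy sequences (100 bp each): no gaps, 20% gaps, 4% gaps.
toy = ["ACGT" * 25, "ACGTN" * 20, "ACGT" * 24 + "NNNN"]
print([gap_ratio(s) for s in toy])   # [0.0, 0.2, 0.04]
print(fraction_below(toy))           # two of the three are below 5%
```

Applied to the reported counts, 114,346 of 127,094 unigenes corresponds to about 90.0%.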
Functional annotation and classification of the C. sinensis transcriptome
Summary of annotations of the C. sinensis unigenes
All assembled unigenes
Gene annotations against plant proteins of NR
Gene annotations against Arabidopsis protein of NR
Unique gene annotations against NR
Gene annotations against UniProt
Gene annotations against InterPro
Gene annotations against Pfam
Gene annotations against COG
Gene annotations against KEGG
GO annotations for NR protein hits: 3 main categories, 43 sub-categories
GO annotations for Arabidopsis protein hits: 3 main categories, 41 sub-categories
All annotated Unigenes
Unigenes matching all six databases
Annotation of predicted proteins
Conserved domain annotation and COG classification
The 30 most frequently occurring InterPro domains/families in C. sinensis unigenes
Protein kinase, catalytic domain
Serine/threonine-protein kinase-like domain
Serine/threonine-protein kinase domain
Tyrosine-protein kinase, subgroup, catalytic domain
Tyrosine-protein kinase, catalytic domain
Zinc finger, RING-type
WD40 repeat, subgroup
WD40 repeat 2
RNA recognition motif, RNP-1
RNA-directed DNA polymerase (reverse transcriptase), related
Zinc finger, C3HC4 RING-type
ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter
Molecular chaperone, heat shock protein, Hsp40, DnaJ
Myb-type HTH DNA-binding domain
Ankyrin repeat-containing domain
Leucine-rich repeat, N-terminal
Protein phosphatase 2C
Myb transcription factor
Peptidase S8, subtilisin-related
Mitochondrial substrate carrier
ATPase, AAA-type, core
Mitochondrial substrate/solute carrier
Gene Ontology (GO) Classification
The 50,214 BLASTX matches to Arabidopsis proteins were used for further GO mapping to improve the coverage and depth of GO annotation, because the Arabidopsis dataset has been extensively annotated with GO terms and the most abundant annotation records for C. sinensis unigenes were obtained from the comparisons with Arabidopsis proteins. As a result, the number of C. sinensis unigenes assigned GO terms greatly increased to 32,107 (25% of the 127,094 unigenes). A total of 157,650 GO terms were associated with these 32,107 unigenes and classified into 41 functional sub-categories (Table 2 and Figure 5b). The biological process cluster was dominant (96,296 terms, 61% of the total GO terms); within it, the three sub-categories "response to stimulus" (GO:0050896), "developmental process" (GO:0032502), and "multicellular organismal process" (GO:0032501) were among the six most abundant sub-groups, together with "cellular process," "metabolic process" and "biological regulation". The major sub-categories in the cellular component and molecular function clusters were identical to those in the above-mentioned GO classification. Moreover, only a few genes were assigned the "biological adhesion" (GO:0022610), "locomotion" (GO:0040011), "viral reproduction" (GO:0016032) or "electron carrier" (GO:0009055) GO terms. Of the 41 functional sub-categories, two had not been represented previously: "rhythmic process" (GO:0048511; 251 unigenes) and "viral reproduction" (GO:0016032; 30 unigenes), both based on Arabidopsis GO terms. No unigenes were annotated with the "virion" (GO:0019012), "virion part" (GO:0044423), "chemoattractant" (GO:0042056) or "chemorepellent" (GO:0045499) GO terms. As a whole, 32,599 unigenes (26% of the 127,094 unigenes) were associated with at least one GO term across the two sets of GO annotations. These GO annotations demonstrate that C. sinensis expresses genes encoding diverse structural, regulatory and stress proteins.
KEGG Pathway mapping
To identify the biological pathways active in C. sinensis, the assembled unigenes were annotated with the corresponding Enzyme Commission (EC) numbers from BLASTX alignments against the KEGG database. By mapping EC numbers to the reference canonical pathways, a total of 16,939 unigenes were assigned to 214 KEGG pathways (Table 2 and Figure 3a). The pathways most represented by unique sequences were inositol phosphate metabolism (1,348 members), benzoate degradation via CoA ligation (1,262 members), starch and sucrose metabolism (781 members) and purine metabolism (371 members).
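The mapping from EC numbers to pathways, and the resulting member counts per pathway, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the function name, the unigene IDs and the small EC-to-pathway table are hypothetical toy values, not KEGG records.

```python
from collections import defaultdict


def count_unigenes_per_pathway(unigene_to_ec, ec_to_pathways):
    """Count distinct unigenes assigned to each pathway via their EC numbers."""
    members = defaultdict(set)
    for unigene, ec_numbers in unigene_to_ec.items():
        for ec in ec_numbers:
            # An EC number may belong to several pathways, or to none.
            for pathway in ec_to_pathways.get(ec, ()):
                members[pathway].add(unigene)
    return {pathway: len(ids) for pathway, ids in members.items()}


# Toy inputs for illustration only.
unigene_to_ec = {"U1": ["2.7.1.40"], "U2": ["2.7.1.40"], "U3": ["3.5.4.15"]}
ec_to_pathways = {
    "2.7.1.40": ["glycolysis", "purine metabolism"],
    "3.5.4.15": ["purine metabolism"],
}
counts = count_unigenes_per_pathway(unigene_to_ec, ec_to_pathways)
print(counts["glycolysis"], counts["purine metabolism"])  # 2 3
```

Using sets means a unigene is counted once per pathway even if several of its EC numbers map to that pathway.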
Taken together, 53,966 unique sequence-based annotations had BLAST E-values meeting our threshold (≤1e-5) in the NR, Uniprot and KEGG databases (Figure 3a). The Venn diagram (Figure 3c) shows that an additional 1,122 unigenes were annotated by domain-based alignments. Overall, 55,088 unique sequence-based or domain-based annotations using the six selected public databases were assigned to C. sinensis unigenes (43.3% of the assembled unigenes). Among them, 9,139 unigenes had hits in all six public databases with relatively well-defined functional annotations (Table 2). These annotations provide a valuable resource for investigating specific processes, structures, functions, and pathways in tea research.
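The way sequence-based and domain-based annotations combine can be expressed with set operations. The sketch below uses hypothetical unigene IDs and a hypothetical helper name; only the final arithmetic uses figures from the text.

```python
def combine_annotations(sequence_hits, domain_hits):
    """Return (all annotated IDs, IDs annotated only by domain search)."""
    all_annotated = sequence_hits | domain_hits
    domain_only = domain_hits - sequence_hits
    return all_annotated, domain_only


# Toy ID sets for illustration only.
toy_sequence_hits = {"U1", "U2", "U3"}
toy_domain_hits = {"U3", "U4"}
all_annotated, domain_only = combine_annotations(toy_sequence_hits, toy_domain_hits)
print(sorted(all_annotated))  # ['U1', 'U2', 'U3', 'U4']
print(sorted(domain_only))    # ['U4']

# Check of the reported totals.
reported_total = 53_966 + 1_122
print(reported_total)                      # 55088
print(f"{reported_total / 127_094:.1%}")   # 43.3%
```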
Analysis of metabolic pathway genes using C. sinensis unigenes
Primary metabolic pathways in C. sinensis
The primary metabolic pathways selected included glycolysis (10 genes), the citrate cycle (10 genes), the pentose phosphate cycle (6 genes), and the Calvin cycle and photosynthesis (12 genes). Our dataset includes annotated sequences for all genes in the four selected primary metabolic pathways, except for the glucose-6-phosphatase gene in the Calvin cycle (Additional File 3). Pyruvate kinase of the glycolysis pathway has the greatest number of singletons (23) matching the description of the gene. The rest of the primary metabolic genes have between 1 and 20 singletons matching each gene. The genes of the primary metabolic pathways displayed high homology to Arabidopsis or other dicot genes, with most of the genes having more than 80% similarity at the protein level (data not shown), suggesting that these genes were highly conserved during evolution.
Secondary metabolic pathways in C. sinensis
Some genes of the phenylpropanoid pathway have been reported previously in C. sinensis. Some members have been expressed in Escherichia coli and characterized as being involved in catechin biosynthesis [39–44]. We were able to detect these reported genes in our dataset (Additional File 4). Surprisingly, some genes associated with isoflavonoid metabolism were also discovered in our transcriptome, including isoflavone reductase (EC 1.3.1.-, 8 unigenes) and isoflavone 7-O-methyltransferase (EC 2.1.1.150, 1 unigene). The tea plant is not known to produce isoflavones. Similar discoveries have been reported in other non-isoflavone-accumulating plants, suggesting that homologs of isoflavone reductase and isoflavone O-methyltransferase may have general metabolic functions in many plant species.
In addition, new structural genes encoding flavonoid-modifying enzymes were discovered by a sequence or domain search of our dataset against the existing nucleotide and protein sequences of C. sinensis in the NCBI or Uniprot databases. Most of these new genes are involved in glycosylation of flavonoids, catalyzing the transfer of either glucose from UDP-glucose or rhamnose from UDP-L-rhamnose to the hydroxyl groups of flavonoids (Additional File 4). For example, we found unigene sequences encoding tetrahydroxychalcone 2'-glucosyltransferase (1 unigene), flavonol-3-O-glycoside-7-O-glucosyltransferase 1 (6 unigenes), flavonol 3-O-glucosyltransferase/flavonoid-3-O-glucosyltransferase (EC 2.4.1.91, 7 unigenes), anthocyanidin 3-O-glucosyltransferase (EC 2.4.1.115, 33 unigenes), anthocyanidin 5,3-O-glucosyltransferase (EC 2.4.1.-, 12 unigenes), flavonol 3-O-glucoside L-rhamnosyltransferase (EC 2.4.1.159, 5 unigenes), and flavanone 7-O-glucoside 2''-O-beta-L-rhamnosyltransferase (EC 2.4.1.236, 10 unigenes). Moreover, unigenes encoding flavonol 3-O-methyltransferase (EC 2.1.1.76, associated with methylation of flavonols, 9 unigenes) and flavonol 3-sulfotransferase (EC 2.8.2.25, 2 unigenes) were also found in our annotated transcriptome. These genes represent an extensive repertoire of flavonoid modification enzymes that have not been reported before in tea, although some of the conjugated compounds themselves have been identified.
Furthermore, a singleton for the flavonoid-related R2R3 transcription factor, a MYB4a repressor, was found. It is a member of the MYB transcription factor family that has been shown to interact with promoters of the phenylpropanoid pathway genes.
Theanine is a unique amino acid constituting 1-2% of the dry weight of tea leaf. It is synthesized in roots from glutamic acid and ethylamine by theanine synthetase (TS, EC 6.3.1.6; Figure 7b) [47, 48]. The substrate ethylamine is derived from decarboxylation of alanine. Newly synthesized theanine is translocated into the tender shoots through the xylem, where it either accumulates or is broken down into glutamic acid and ethylamine by theanine hydrolase (ThYD, EC 3.5.1.65). The enzymes involved in theanine synthesis also include glutamine synthetase (GS, EC 6.3.1.2), glutamate synthase (Fe-GOGAT, EC 1.4.7.1), glutamate dehydrogenase (GDH), alanine transaminase (ALT, EC 2.6.1.2), and alanine decarboxylase (AIDA). Most of these theanine pathway genes were found in our dataset, except for AIDA and ThYD, which are specific to tea plants and have no orthologs from other species in the public databases (Figure 7b; unigene IDs in the theanine biosynthesis pathway are listed in Additional File 4). The AIDA sequences here were selected from the homologues of arginine decarboxylase (ADC) and S-adenosylmethionine decarboxylase (SAMDC), which have similar domains to AIDA. The published TS was highly homologous to GS; thus, the identified singletons need to be confirmed enzymatically. Gamma-glutamyl transpeptidase (GGT, EC 2.3.2.2), used to synthesize theanine in bacteria [52–54], was chosen for a comparative study of expression abundance in three different organs (see below).
Caffeine (1,3,7-trimethylxanthine) is a purine alkaloid present in high concentrations in the tea plant. The caffeine biosynthesis pathway (Figure 7c) is part of purine metabolism and includes three S-adenosyl-L-methionine (SAM)-dependent N-methylation steps in the sequence xanthosine → 7-methylxanthosine → 7-methylxanthine → theobromine → caffeine. Different N-methyltransferases play important roles in the three N-methylation steps. All related genes could be found in our transcriptome except for two, guanosine deaminase (EC 3.5.4.15) and N-methylnucleosidase (EC 3.2.2.25) (Figure 7c; unigene IDs are listed in Additional File 4). These two genes have not been cloned previously and thus do not exist in public databases.
ORF prediction of selected housekeeping genes
To further confirm our gene prediction and annotation algorithms, six housekeeping gene families were selected for open reading frame (ORF) analysis, namely actin, tubulin, histone, glyceraldehyde-3-phosphate dehydrogenase, 28S ribosomal protein, and phosphofructokinase (Additional File 5). Out of a total of 88 unigene sequences, 62 were predicted to contain a complete ORF, suggesting that 70.5% of these housekeeping unigenes already cover the complete ORF. The deduced amino acid sequences have at least 68% homology to Arabidopsis or other dicot genes (data not shown).
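What it means for a unigene to contain a complete ORF can be illustrated with a simple scan for a start codon followed by an in-frame stop codon. This is a minimal sketch on the forward strand only; the ORF prediction tool actually used for this analysis is not stated in the text, and the function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq):
    """Return the longest ATG...stop ORF on the forward strand, or ''."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]         # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None                   # look for the next ORF
    return best


print(longest_complete_orf("CCATGAAATTTGGGTAACC"))  # ATGAAATTTGGGTAA
print(repr(longest_complete_orf("ATGAAATTT")))      # '' (no stop codon: incomplete)
```

A sequence that starts an ORF but ends before a stop codon is reported as incomplete, which corresponds to the 26 of 88 unigenes without a complete ORF.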
Comparison of C. sinensis transcriptome with four Camellia cDNA libraries
Previously, we generated four cDNA libraries from different tissues of the C. sinensis plant: young root, young leaf, subtractive young leaf, and drought-stressed root. The cDNAs were sequenced using conventional Sanger sequencing. We compared these four cDNA libraries with our Illumina sequences using local BLASTN and TBLASTX.
Gene validation and expression analysis
In the RT-PCR analysis, every selected unigene was PCR positive with a single band at the calculated size (data not shown). In the qRT-PCR analysis, relative transcript levels of the unigenes in three different organs were further compared. In the theanine pathway, tea homologues of GDH2, GS1-1 and ADC were expressed at much higher levels in young roots than in leaves and stems (Figure 9a), with the lowest expression level in the stems. In comparison, the expression levels of GS1-2, GGT, glutamate synthase (Fe-GOGAT) and SAMDC were low in young roots and high in young shoots. The results confirmed that the expression of these selected genes directly correlated with theanine metabolism.
In the flavonoid pathway, six selected unigenes have differential expression patterns (Figure 9b). Overall expression levels of PAL, 4CL, DFR and FNSII were higher than those of CHS and LCR. The tea homologues of DFR and PAL were highly expressed in young roots; 4CL expression levels in roots were similar to those in stems. In contrast, the expression levels of FNSII, CHS and LCR in leaves were higher than those in roots. The results of qRT-PCR expression analysis matched the putative functions of these unigenes.
Ultra-high-throughput mRNA sequencing technology is a fast, efficient, and cost-effective way to characterize the poly (A)+ transcriptome. It is especially suitable for gene expression profiling in non-model organisms that lack genomic sequences. To date, most sequencing efforts in C. sinensis have been based on EST sequencing, with a limited number of tags reported in public databases. In this study, we applied RNA-seq technology to C. sinensis transcriptome profiling, in which the poly (A)+ transcriptome was sequenced on the Illumina GA IIx platform. We obtained 2.59 Gbp of coverage with 34.5 million high-quality reads. We generated a total of 127,094 unigenes (≥100 bp) by de novo assembly. Among them, 55,088 assembled unigenes were annotated. Our coverage is approximately 10-fold more than all C. sinensis sequences deposited in GenBank combined (as of August 2010).
Since C. sinensis is self-incompatible and recalcitrant to genetic manipulation, little genetic or genomic information is currently available. Therefore, instead of a comprehensive in-depth investigation of the tea transcriptome, our experiment was designed to generate a quick landscape view. A number of strategies were adopted to obtain sufficient coverage of expressed transcripts, to improve the accuracy of de novo assembly, and to increase the effectiveness of the gene annotations. First, experimental materials for RNA preparation came from seven organs of the tea plant, which were selected to obtain coverage that was as comprehensive as possible. Second, an Illumina library was constructed based on the RNA fragmentation method, which has been shown both to reduce the amount of RNA secondary structure and 5' bias and to have better overall uniformity. Third, a paired-end library sequencing strategy was applied not only to increase the sequencing depth, but also to improve the efficiency of de novo assembly. Finally, six public databases were selected for gene annotation comparisons in order to acquire functional information that was as complete as possible.
As a result, 55,088 unigenes (43.3% of all assembled unigenes) returned significant hits from BLAST comparisons with the six public databases. These unigenes were assigned not only gene or protein name descriptions, but also putative conserved domains, gene ontology terms and metabolic pathways. Detailed functional information is important to understand overall expression profiles of C. sinensis. In particular, the number of unigenes that hit all six public databases summed up to 9,139. Because these genes had relatively unambiguous annotation, they were selected for tea-specific pathway analyses. The remaining 72,006 unigenes (56.7% of all assembled unigenes) did not generate significant homology to existing genes. The absence of homology could be caused by several factors. Obviously, a large proportion (82.1%) of unigenes was shorter than 500 bp, some of which were too short to allow statistically meaningful matches. However, for some unigenes, the absence of homologous sequences in the public databases may indicate specific roles for them in C. sinensis. We are currently cataloging the longer unigenes (≥500 bp; 22,757 unigenes) in tea plants.
The annotated unigenes were used to study primary and secondary metabolic pathways. For the four primary metabolic pathways investigated, all essential structural genes were found (Additional File 3). The putative pathway genes from tea were highly similar to those from the model dicot plant Arabidopsis or other dicot plants. We also analyzed six families of housekeeping genes to evaluate the completeness of our transcriptome coverage. About 70.5% of these high copy number genes had full-length ORFs. We believe future large-scale sequencing efforts on the tea genome and transcriptome will increase the coverage of our dataset even further.
The quality of tea depends in large part on its metabolic profiles. We focused on flavonoid, theanine and caffeine biosynthesis for additional analyses. We were able to find almost all metabolic genes from these pathways (Figure 7 and Additional File 4). Many of these genes appeared to form multi-gene families. This implies that the tea genome, like those of many other higher plants, went through one or more rounds of genome duplication during evolution. Because C. sinensis has a diploid genome, extensive genome rearrangement might have occurred. We are interested in using SNP analysis to better understand the genome structure when more RNA-seq data have been obtained.
A few genes are currently missing in these pathways, which might be due to their low expression, insufficient sampling, or ineffective annotations. Some of these genes, such as guanosine deaminase and N-methylnucleosidase, have not been reported in plants before. On the other hand, we found some genes that might play important roles in the above mentioned pathways. For example, many glycosylation enzymes and cytochrome P450 genes were discovered in the transcriptome, which might contribute to the extensive modifications of various secondary compounds found in tea leaf extracts.
By comparing our transcriptome data with four previously prepared cDNA libraries analyzed by EST sequencing, we showed that the number of unigenes from RNA-seq was approximately 20 times greater than that in the existing cDNA libraries. Yet, a small number of genes discovered in the cDNA libraries did not generate BLAST hits in the Illumina transcriptome, which could be resolved by increasing the sequencing depth, enhancing the accuracy of the assembly, and refining gene annotation strategies. We selected two sets of structural enzymes to validate our gene annotations. Each of them generated the expected band size by RT-PCR, and qRT-PCR analysis showed consistent expression patterns. We are confident that our transcriptome dataset is a valuable addition to the publicly available tea genomic information.
Using Illumina sequencing technology, we surveyed the poly (A)+ transcriptome of C. sinensis at an unprecedented depth (2.59 gigabase pairs) and produced 127,094 assembled unigenes, of which 55,088 were annotated. To our knowledge, our results represent approximately 10-fold more genes than all C. sinensis genes deposited in GenBank (as of August 2010) and approximately 20 times more than the existing C. sinensis cDNA libraries. This study demonstrated that Illumina sequencing technology could be applied as a rapid and cost-effective method for de novo transcriptome analysis of non-model plant organisms without prior genome annotation. These findings provided coverage comprehensive enough to discover almost all known genes of several major metabolic pathways, and they also make a substantial contribution to existing sequence resources for the tea plant. We believe that this transcriptome dataset will serve as an important public information platform to accelerate research on gene expression, genomics, and functional genomics in C. sinensis.
Total RNA was extracted by a modified CTAB method from seven different tissues of the tea plant [Camellia sinensis (L.) O. Kuntze cv. Longjing 43], including tender shoots, young leaves, mature leaves, stems, young roots, flower buds and immature seeds, which were snap-frozen and stored at -70°C until processing. RNA integrity was confirmed using the Agilent 2100 Bioanalyzer with a minimum integrity number value of 8. Equal amounts of total RNA from each tissue were pooled together for cDNA preparation.
Preparation of cDNA library for transcriptome sequencing
The poly (A)+ RNA was isolated from 20 μg of the total RNA pool using Dynal oligo(dT) 25 beads (Invitrogen) according to the manufacturer's protocol. Following purification, the mRNA was fragmented into smaller pieces at 70°C for 5 min in fragmentation buffer (Ambion) and reverse-transcribed to synthesize first-strand cDNA using SuperScript III reverse transcriptase (Invitrogen) and N6 random hexamers (Takara). Subsequently, second-strand cDNA was synthesized using RNase H (Invitrogen) and DNA polymerase (Invitrogen). These cDNA fragments were further processed by end repair using T4 DNA polymerase, Klenow DNA polymerase, and T4 polynucleotide kinase (NEB), followed by ligation of adaptors with Illumina's adaptor oligo mix and T4 DNA ligase (Invitrogen). The products were purified to select fragments of approximately 200 bp using the Qiaquick Gel Extraction Kit (Qiagen) and enriched by PCR to prepare the sequencing library. The cDNA library was checked on the Agilent 2100 Bioanalyzer.
The cDNA library was sequenced from both the 5' and 3' ends on the Illumina GA IIx platform according to the manufacturer's instructions. Processing of the fluorescent images into sequences, base-calling and quality value calculation were performed by the Illumina data processing pipeline (version 1.4), yielding 75 bp paired-end reads.
De novo assembly of sequencing reads and sequence clustering
Before assembly, the raw reads were filtered to obtain high-quality clean reads by removing adaptor sequences, duplicate sequences, reads containing more than 10% "N" (the "N" character representing ambiguous bases in reads), and low-quality reads in which more than 50% of the bases had a Q-value ≤ 5. The Q-value is the quality score assigned to each base by Illumina's base-caller Bustard from the Illumina pipeline software suite (version 1.4), similar to the Phred score of the base call. De novo assembly of the clean reads was performed using the SOAPdenovo program (version 1.03, http://soap.genomics.org.cn), which implements a de Bruijn graph algorithm and a stepwise strategy. Briefly, the clean reads were first split into smaller pieces, the 'k-mers', which were assembled into contigs using the de Bruijn graph. The resulting contigs were then joined into scaffolds using the paired-end reads. Gap filling was subsequently carried out to complete the scaffolds, using the paired-end information to retrieve read pairs that had one read well aligned on the contigs and the other read located in the gap region. To reduce sequence redundancy, the scaffolds were clustered using the Gene Indices Clustering Tools (http://compbio.dfci.harvard.edu/tgi/software/). The clustering output was passed to the CAP3 assembler for multiple alignment and consensus building. Scaffolds that did not reach the clustering threshold and did not fall into any assembly were retained as singletons.
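The two numeric read-filtering criteria described above can be sketched as a single predicate. This is an illustration, not the pipeline's own code: the function name and parameters are hypothetical, quality values are assumed to be already decoded to integers, and adaptor and duplicate removal are not covered.

```python
def passes_filter(bases, quals, max_n_frac=0.10, low_q=5, max_low_frac=0.50):
    """Return True if a read survives the N-content and low-quality filters.

    bases: read sequence as a string
    quals: per-base quality values as integers (same length as bases)
    """
    if not bases or len(bases) != len(quals):
        raise ValueError("bases and quals must be non-empty and equal in length")
    n_frac = bases.upper().count("N") / len(bases)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    # Discard reads with more than 10% N or more than 50% bases at Q <= 5.
    return n_frac <= max_n_frac and low_frac <= max_low_frac


print(passes_filter("ACGTACGTAC", [30] * 10))            # True
print(passes_filter("ACGNNACGTA", [30] * 10))            # False: 20% N
print(passes_filter("ACGTACGTAC", [2] * 6 + [30] * 4))   # False: 60% low quality
```

For paired-end data, a filter of this kind would typically be applied so that both mates of a pair are kept or dropped together; whether the original pipeline did so is not stated.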
Functional annotation and classification
All Illumina assembled unigenes (consensuses and singletons) longer than 200 bp were annotated with putative gene descriptions, conserved domains, Gene Ontology terms, and putative metabolic pathways, based on sequence similarity to previously identified genes carrying those annotations. To assign predicted gene descriptions, the assembled unigenes were compared to the plant protein dataset of NR, the Arabidopsis protein dataset of NR, and the Swiss-Prot/Uniprot protein database, respectively, using the BLASTALL program (ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.18/) with a significance threshold of E-value ≤ 1e-5. Putative gene names, coding sequences ('CDS'), and predicted proteins of the corresponding assembled sequences were obtained by parsing the features of the best BLASTX hits. The orientation of the Illumina sequences, which cannot be obtained directly from sequencing, was likewise derived from the BLAST annotations. For sequences without BLAST hits, the ESTScan program (version 3.0.1, http://www.ch.embnet.org/software/ESTScan.html) was used to predict the CDS and orientation. Because a large portion of the assembled unigenes had not yet been annotated, conserved domains/families were further identified in the assembled unigenes using the InterPro database (version 30.0; HMMpfam, HMMsmart, HMMpanther, FPrintScan, ProfileScan, and BlastProDom), the Pfam database (version 24.0), and the Clusters of Orthologous Groups (COGs) database at NCBI (as of December 2009, ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/). Domain-based comparisons with the InterPro, Pfam and COGs databases were performed using InterProScan (version 4.5, ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/), HMMER3 (http://hmmer.janelia.org) and BLAST programs (E-value threshold: 1e-5), respectively.
Functional categorization by Gene Ontology terms (GO; http://www.geneontology.org) was carried out based on the two sets of best BLASTX hits, from the plant and Arabidopsis protein datasets of the NR database, using Blast2GO software (version 2.3.5, http://www.blast2go.de/) with an E-value threshold of 1e-5. KEGG pathway annotation was performed by sequence comparisons against the Kyoto Encyclopedia of Genes and Genomes database using the BLASTX algorithm (E-value threshold: 1e-5).
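As an illustration of how best BLASTX hits and sequence orientation might be extracted, the following Python sketch parses 12-column tabular BLAST output. It assumes the searches were written in tabular form, which the text does not state. The function name and the example identifiers are hypothetical, and "best" is taken to mean the highest bit score among hits with E-value ≤ 1e-5.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return the highest-scoring hit per query from 12-column tabular BLAST output.

    Columns: query, subject, % identity, alignment length, mismatches, gap opens,
    query start, query end, subject start, subject end, E-value, bit score.
    """
    best = {}
    for line in tabular_text.splitlines():
        row = line.split("\t")
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        q_start, q_end = int(row[6]), int(row[7])
        evalue, bit_score = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        current = best.get(row[0])
        if current is None or bit_score > current["bit_score"]:
            best[row[0]] = {
                "subject": row[1],
                "evalue": evalue,
                "bit_score": bit_score,
                # In BLASTX output, reversed query coordinates mark a minus-strand hit.
                "strand": "+" if q_start <= q_end else "-",
            }
    return best


# Made-up rows for illustration; all identifiers are hypothetical.
example = "\n".join([
    "Unigene1\tprotA\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200",
    "Unigene1\tprotB\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-60\t250",
    "Unigene2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t35",
    "Unigene3\tprotD\t70.0\t80\t24\t0\t240\t1\t1\t80\t1e-20\t110",
])
annotated = best_hits(example)
```

In this example Unigene2 is left unannotated because its only hit exceeds the E-value threshold; such sequences correspond to those passed to ESTScan in the text.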
Files containing the raw read sequences and their quality scores are available from the National Center for Biotechnology Information (NCBI) Short Read Archive with the accession number: SRX020193. The assembled sequences (longer than 150 bp) have been deposited in the Transcriptome Shotgun Assembly Sequence Database (TSA) at NCBI with the accession numbers: HP701085-HP777243.
Analysis of metabolic pathway genes using C. sinensis unigenes
Genes involved in four primary pathways (glycolysis, citrate cycle, pentose phosphate cycle, and Calvin cycle and photosynthesis pathways) and three secondary metabolic pathways (flavonoid biosynthesis, theanine biosynthesis, and caffeine biosynthesis pathways) that are related to tea quality were analyzed using C. sinensis unigenes, as illustrated in Figure 6. Candidate unigenes were first searched by standard gene names and synonyms in the combined functional annotations, and each search result was then confirmed with BLAST searches. For these, Arabidopsis or other dicot protein sequences from public databases were aligned against the corresponding unigenes obtained from the keyword searches using local TBLASTN with an E-value threshold of 1e-5. When TBLASTN produced no hits, Arabidopsis or other dicot nucleotide sequences were downloaded and used for additional TBLASTX alignments (E-value threshold: 1e-5). Agreement between the keyword searches and the BLAST searches was taken as evidence that the corresponding genes are expressed in C. sinensis. To discover new genes, all unigenes retrieved in this way were analyzed by BLAST alignments against the existing tea uniEST database [5–7].
Six housekeeping genes were also searched using the same combination of simple text searches and BLAST searches. The ORFs of the selected housekeeping genes were predicted using the online ORF finder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and their homology to genes of the model dicot plant Arabidopsis or other dicot plants was also analyzed.
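The two-step search described above amounts to intersecting two sets of unigene identifiers and then setting aside those already represented among known tea uniESTs. The Python sketch below shows that logic under simplifying assumptions: the function names and identifiers are hypothetical, the keyword match is a plain case-insensitive substring test, and the BLAST results are assumed to have already been reduced to sets of unigene IDs.

```python
def keyword_hits(annotations, names):
    """Return unigene IDs whose annotation text mentions any gene name or synonym."""
    lowered = [name.lower() for name in names]
    return {
        unigene_id
        for unigene_id, description in annotations.items()
        if any(name in description.lower() for name in lowered)
    }


def confirm_candidates(keyword_ids, blast_ids, known_est_ids=frozenset()):
    """Return (confirmed, novel) sets of unigene IDs.

    confirmed: found by both the keyword search and the BLAST search.
    novel: confirmed unigenes with no match among existing tea uniESTs.
    """
    confirmed = set(keyword_ids) & set(blast_ids)
    novel = confirmed - set(known_est_ids)
    return confirmed, novel


# Made-up annotations and hit lists for illustration only.
annotations = {
    "Unigene10": "chalcone synthase [Camellia sinensis]",
    "Unigene11": "putative chalcone synthase",
    "Unigene12": "actin 7",
}
by_keyword = keyword_hits(annotations, ["chalcone synthase"])
by_tblastn = {"Unigene10", "Unigene11", "Unigene99"}
matched_known_uniest = {"Unigene10"}
confirmed, novel = confirm_candidates(by_keyword, by_tblastn, matched_known_uniest)
```

A substring test can over-match short synonyms, so a real search would likely need word-boundary matching or manual review of the retrieved descriptions.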
Comparisons with four Camellia cDNA libraries
Four Camellia cDNA libraries based on conventional Sanger sequencing were selected for comparison with the C. sinensis transcriptome. The corresponding EST sequences were downloaded from the Camellia ESTs available in GenBank. They comprised the EST sequences from the young root cDNA library of the tea plant (C. sinensis) submitted by our laboratory (GenBank accession: GE652554.1-FE861258.1); two reported C. sinensis cDNA libraries, namely a subtractive cDNA library specific to young leaves of the tea plant (GenBank accession: CV699876.1-CV699527.1) and a young leaf cDNA library of the tea plant (GenBank accession: CV067174.1-CV013548.1); and a drought-stressed root SSH cDNA library of C. sinensis var. assamica (GenBank accession: GW316945.1-GT969202.1). All EST sequences were assembled with CAP3 to obtain uniESTs, using an overlap length cutoff of 30 nt and an overlap percent identity cutoff of 80. Comparisons of the C. sinensis transcriptome with the four Camellia cDNA libraries were performed with local BLASTN and TBLASTX (1e-5 cut-off), respectively.
Gene validation and expression analysis
Thirteen unigenes with potential roles in theanine and flavonoid synthesis were chosen for validation by real-time qPCR with gene-specific primers designed using Primer Premier software (version 5.0) (Additional File 6). Total RNA was extracted from young roots, stems and young shoots of the tea plant using a modified CTAB method and purified with an RNA purification kit (Tiangen, China). One microgram of total RNA was reverse-transcribed in a total volume of 20 μL in the presence of 6-mer random primer and oligo primer according to the Takara protocol. The standard curve for each gene was obtained by real-time PCR with several dilutions of cDNA. Each reaction was performed in 20 μL containing 10 μL of 2×SYBR Green Mastermix (Takara), 300 nM of each primer and 2 μL of 10-fold diluted cDNA template.
The PCR reactions were run on a Bio-Rad Sequence Detection System using the following program: 95°C for 10 s, followed by 40 cycles of 95°C for 15 s and annealing at 60°C for 30 s. The specificity of each PCR amplification was then checked using a heat dissociation protocol from 55 to 95°C following the final PCR cycle, and by agarose gel electrophoresis. Each reaction was performed in triplicate, and the actin gene was chosen as the internal control for normalization after comparing the expression of four reference genes (actin, GAPDH, 18S and β-tubulin) in different organs. The relative expression of the genes in the three organs was quantified using the delta-delta Ct method as described by Livak and Schmittgen. All data were expressed as the mean ± SD after normalization.
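For readers unfamiliar with the delta-delta Ct calculation, a minimal Python sketch follows. It assumes near-100% amplification efficiency for both the target gene and the actin reference, and it uses one organ as the calibrator. The choice of calibrator organ, the function name, and the Ct values are illustrative assumptions, not data from this study.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-delta delta Ct) method.

    Each argument is a list of replicate Ct values (e.g. triplicates).
    delta Ct       = mean Ct(target) - mean Ct(reference gene)
    delta delta Ct = delta Ct(sample) - delta Ct(calibrator)
    """
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values: a target gene and actin in roots,
# with stems used as the calibrator organ.
fold_change = relative_expression(
    ct_target_sample=[24.1, 23.9, 24.0],
    ct_ref_sample=[18.0, 18.1, 17.9],
    ct_target_calibrator=[26.0, 26.1, 25.9],
    ct_ref_calibrator=[18.0, 18.0, 18.0],
)
```

With these made-up values the target gene comes out roughly four-fold higher in roots than in stems, since its normalized Ct is about two cycles lower.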
Acknowledgements
We appreciate the technical support for Illumina sequencing and initial data analysis from Beijing Genome Institute at Shenzhen, China. We are grateful to Cheng-Bin Xiang from the University of Science and Technology of China and Anita K. Snyder for comments on the manuscript. We thank Jia-Yue Jiang, Yun-Sheng Wang, Zhuo-Chen Wang, Yi-Lin Wu, Zhen Zhu, Lin-Li Lu and Yang Luo for material collection and RNA extraction. This work was supported in part by the National Natural Science Foundation of China (Project 30972400), and Special Capital for the Construction of Modern Agriculture Industry Technical System, Ministry of Agriculture & Ministry of Finance of China, grants to Xiao-Chun Wan, and US DOE (DE-SC0001295), US NSF (MCB-0923779) and USDA (2010-65116-20514) grants to Oliver Yu.
References
- Yamamoto T, Juneja LR, Chu DC, Kim M (Eds): Chemistry and Application of Green Tea. 1998, CRC Press, New York
- Rogers PJ, Smith JE, Heatherley SV, Pleydell-Pearce CW: Time for tea: mood, blood pressure and cognitive performance effects of caffeine and theanine administered alone and together. Psychopharmacology. 2008, 195: 569-577. 10.1007/s00213-007-0938-1.
- Wang Y, Jiang CJ, Zhang HY: Observation on the Self-incompatibility of Pollen Tubes in Self-pollination of Tea Plant in Style in vivo. Tea Sci. 2008, 28: 429-435.
- Tanaka J, Taniguchi F: Estimation of the genome size of tea (Camellia sinensis), camellia (C. japonica), and their interspecific hybrids by flow cytometry. Journal of the Remote Sensing Society of Japan. 2006, 101: 1-7.
- Park JS, Kim JB, Hahn BS, Kim KH, Ha SH, Kim YH: EST analysis of genes involved in secondary metabolism in Camellia sinensis (tea), using suppression subtractive hybridization. Plant Sci. 2004, 166: 953-961. 10.1016/j.plantsci.2003.12.010.
- Chen L, Zhao LP, Gao QK: Generation and analysis of expressed sequence tags from the tender shoots cDNA library of tea plant (Camellia sinensis). Plant Sci. 2005, 168: 359-363. 10.1016/j.plantsci.2004.08.009.
- Shi CY, Wan XC: Construction of cDNA library for young roots of tea plant (Camellia sinensis (L.) O. Kuntze) and analysis of its expressed sequence tags. Journal of Nanjing Agricultural University. 2009, 32: 126-130.
- Boguski MS, Toltoshev CM, Bassett DE: Gene discovery in dbEST. Science. 1994, 265: 1993-1994. 10.1126/science.8091218.
- Ohlrogge J, Benning C: Unravelling plant metabolism by EST analysis. Curr Opin Plant Biol. 2000, 3: 224-228.
- Goossens A, Hakkinen ST, Laakso I, Seppanen-Laakso T, Biondi S, De Sutter V, Lammertyn F, Nuutila AM, Soderlund H, Zabeau M, et al: A functional genomics approach toward the understanding of secondary metabolism in plant cells. PNAS. 2003, 100: 8595-8600. 10.1073/pnas.1032967100.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
- Simon SA, Zhai J, Nandety RS, McCormick KP, Zeng J, Mejia D, Meyers BC: Short-read sequencing technologies for transcriptional analyses. Annu Rev Plant Biol. 2009, 60: 305-333. 10.1146/annurev.arplant.043008.092032.
- Morozova O, Hirst M, Marra MA: Applications of New Sequencing Technologies for Transcriptome Analysis. Annu Rev Genomics Hum Genet. 2009, 10: 135-151. 10.1146/annurev-genom-082908-145957.
- Wold B, Myers RM: Sequence census methods for functional genomics. Nature Methods. 2008, 5: 19-21. 10.1038/nmeth1157.
- Shendure J: The beginning of the end for microarrays?. Nature Methods. 2008, 5: 585-587. 10.1038/nmeth0708-585.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009, 10: 57-63. 10.1038/nrg2484.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344-1349. 10.1126/science.1158441.
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bähler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008, 453: 1239-1243. 10.1038/nature07002.
- Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JA: Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008, 133 (3): 523-536. 10.1016/j.cell.2008.03.029.
- Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature Methods. 2008, 5: 613-619. 10.1038/nmeth.1223.
- Marioni J, Mason C, Mane S, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.
- Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, et al: A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome. Science. 2008, 321: 956-960. 10.1126/science.1160342.
- Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques. 2008, 45: 81-94. 10.2144/000112900.
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular Ecology. 2008, 17: 1636-1647. 10.1111/j.1365-294X.2008.03666.x.
- Kristiansson E, Asker N, Forlin L, Larsson DGJ: Characterization of the Zoarces viviparus liver transcriptome using massively parallel pyrosequencing. BMC Genomics. 2009, 10: 345. 10.1186/1471-2164-10-345.
- Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV: Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics. 2009, 10: 219. 10.1186/1471-2164-10-219.
- Wang XW, Luan JB, Li JM, Bao YY, Zhang CX, Liu SS: De novo characterization of a whitefly transcriptome and analysis its gene expression during development. BMC Genomics. 2010, 11: 400. 10.1186/1471-2164-11-400.
- Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, et al: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20: 265-272. 10.1101/gr.097261.109.
- Pertea G, Huang XQ, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST Datasets. Bioinformatics. 2003, 19: 651-652. 10.1093/bioinformatics/btg034.
- Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37: D211-D215. 10.1093/nar/gkn785.
- Sompompailin K, Makita Y, Saito K: A WD-repeat-containing putative regulatory protein in anthocyanin biosynthesis in Perilla frutescens. Plant Mol Biol. 2002, 50: 485-495. 10.1023/A:1019850921627.
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al: The Pfam protein families database. Nucleic Acids Res. 2010, 38: D211-D222. 10.1093/nar/gkp985.
- Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001, 29: 22-28. 10.1093/nar/29.1.22.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene Ontology: tool for the unification of biology. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32: D277-D280. 10.1093/nar/gkh063.
- Khan N, Mukhtar H: Tea polyphenols for health promotion. Life Sci. 2007, 81: 519-533. 10.1016/j.lfs.2007.06.011.
- Winkel-Shirley B: Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol. 2001, 126 (2): 485-493. 10.1104/pp.126.2.485.
- Takeuchi A, Matsumoto S, Hayatsu M: Chalcone synthase from Camellia sinensis: isolation of the cDNAs and the organ-specific and sugar-responsive expression of the genes. Plant Cell Physiol. 1994, 35: 1011-1018.
- Lin GZ, Lian YJ, Ryu JH, Sung MK, Park JS, Park HJ, Park BK, Shin JS, Lee MS, Cheon C: Expression and purification of His-tagged flavonol synthase of Camellia sinensis from Escherichia coli. Protein Expr Purif. 2007, 55: 287-292. 10.1016/j.pep.2007.05.013.
- Singh K, Rani A, Kumar S, Sood P, Mahajan M, Yadav SK, Singh B, Ahuja PS: An early gene of the flavonoid pathway, flavanone 3-hydroxylase, exhibits a positive relationship with the concentration of catechins in tea (Camellia sinensis). Tree Physiol. 2008, 28: 1349-1356.
- Rani A, Singh K, Sood P, Kumar S, Ahuja PS: p-Coumarate:CoA ligase as a key gene in the yield of catechins in tea [Camellia sinensis (L.) O. Kuntze]. Funct Integr Genomics. 2009, 9: 271-275. 10.1007/s10142-008-0098-3.
- Singh K, Kumar S, Rani A, Gulati A, Ahuja PS: Phenylalanine ammonia-lyase (PAL) and cinnamate 4-hydroxylase (C4H) and catechins (flavan-3-ols) accumulation in tea. Funct Integr Genomics. 2009, 9: 125-134. 10.1007/s10142-008-0092-9.
- Singh K, Rani A, Paul A, Dutt S, Joshi R, Gulati A, Ahuja PS, Kumar S: Differential display mediated cloning of anthocyanidin reductase gene from tea (Camellia sinensis) and its relationship with the concentration of epicatechins. Tree Physiol. 2009, 29: 837-846. 10.1093/treephys/tpp022.
- Yu O, McGonigle B: Metabolic Engineering of Isoflavone Biosynthesis. Advances in Agronomy. 2005, 86: 147-190.
- Hellens RP, Allan AC, Friel EN, Bolitho K, Grafton K, Templeton MD, Karunairetnam S, Gleave AP, Laing WA: Transient expression vectors for functional genomics, quantification of promoter activity and RNA silencing in plants. Plant Methods. 2005, 1: 13-26. 10.1186/1746-4811-1-13.
- Sasaoka K, Kito M, Inagaki H: Biosynthesis of theanine in tea seedlings. Synthesis of theanine by homogenate of tea seedlings. Agric Biol Chem. 1963, 27: 467-468.
- Sasaoka K, Kito M, Onishi Y: Some properties of the theanine synthesizing enzyme in tea seedings. Agric Biol Chem. 1965, 29: 984-988.
- Takeo T: L-Alanine decarboxylase in Camellia sinensis. Phytochemistry. 1978, 17: 313-314. 10.1016/S0031-9422(00)94173-6.
- Tsushida T, Takeo T: An enzyme hydrolyzing L-theanine in tea leaves. Agric Biol Chem. 1985, 49: 2913-2917.
- Kidron H, Repo S, Johnson MS, Salminen TA: Functional classification of amino acid decarboxylases from the alanine racemase structural family by phylogenetic studies. Mol Biol Evol. 2007, 24: 79-89. 10.1093/molbev/msl133.
- Miflin BJ, Habash DZ: The role of glutamine synthetase and glutamate dehydrogenase in nitrogen assimilation and possibilities for improvement in the nitrogen utilization of crops. J Exp Bot. 2002, 53: 979-987. 10.1093/jexbot/53.370.979.
- Suzuki H, Izuka S, Minami H: Use of bacterial gamma-glutamyltranspeptidase for enzymatic synthesis of gamma-D-glutamyl compounds. Appl Environ Microbiol. 2003, 69: 6399-6404. 10.1128/AEM.69.11.6399-6404.2003.
- Yamamoto S, Wakayama M, Tachiki T: Theanine production by coupled fermentation with energy transfer employing Pseudomonas taetrolens Y-30 glutamine synthetase and Baker's yeast cells. Biosci Biotechnol Biochem. 2005, 69: 784-789. 10.1271/bbb.69.784.
- Kato M, Mizuno K: Caffeine synthase and related methyltransferases in plants. Front Biosci. 2004, 9: 1833-1842. 10.2741/1364.
- Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
- Shi CY, Wan XC, Jiang CJ, Sun J: Method for high-quality total RNA isolation from tea plants [Camellia sinensis (L.) O. Kuntze)]. Journal of Anhui Agricultural University. 2007, 34: 360-363.
- Li RQ, Wei F, Geng T, Zhu HM, He L, Cai J, Li B, Bai YQ, Zhang ZH, Zhang YP, et al: The sequence and de novo assembly of the giant panda genome. Nature. 2010, 463: 311-317. 10.1038/nature08696.
- Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method. Methods. 2001, 25: 402-408.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.