Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing
© Wang et al. 2009
Received: 13 February 2009
Accepted: 9 October 2009
Published: 9 October 2009
Glandular trichomes produce a wide variety of commercially important secondary metabolites in many plant species. The most prominent anti-malarial drug artemisinin, a sesquiterpene lactone, is produced in glandular trichomes of Artemisia annua. However, only limited genomic information is currently available in this non-model plant species.
We present a global characterization of the A. annua glandular trichome transcriptome using 454 pyrosequencing. Sequencing runs using two normalized cDNA collections from glandular trichomes yielded 406,044 expressed sequence tags (average length = 210 nucleotides), which assembled into 42,678 contigs and 147,699 singletons. Performing a second sequencing run increased the number of genes identified by only ~30%, indicating that massively parallel pyrosequencing provides deep coverage of the A. annua trichome transcriptome. By BLAST search against the NCBI non-redundant protein database, putative functions were assigned to 28,573 unigenes, including previously undescribed enzymes likely involved in sesquiterpene biosynthesis. Comparison with ESTs derived from trichome collections of other plant species revealed expressed genes in common functional categories across different plant species. RT-PCR analysis confirmed the expression of selected unigenes and novel transcripts in A. annua glandular trichomes.
The presence of contigs corresponding to enzymes for terpenoid and flavonoid biosynthesis suggests important metabolic activity in A. annua glandular trichomes. Our comprehensive survey of genes expressed in glandular trichomes will facilitate new gene discovery and shed light on the regulatory mechanism of artemisinin metabolism and trichome function in A. annua.
Secreting glandular trichomes (GTs) are a major site for biosynthesis and accumulation of a wide range of plant natural products. These plant natural products often function to protect the plants against insect predation [1, 2], and contribute to the flavour and aroma of plants. Many of the natural products also have pharmacological effects, such as the analgesic drug morphine, the anticancer compound taxol, and the antimalarial drug artemisinin. Artemisinin, a sesquiterpene lactone, is currently recognized as one of the most prominent anti-malarial treatments. A complete understanding of the artemisinin biosynthetic pathway and its regulatory mechanism holds the key to efficient metabolic engineering for increased artemisinin yield. In the past decades, research efforts have been dedicated to the identification of enzymes and intermediate compounds leading to artemisinin production. Many genes encoding enzymes that participate in the pathway have been cloned and functionally characterized [4–10]. However, little is known about the regulatory aspects of sesquiterpene metabolism. This is partly because A. annua is a non-model plant with limited genomic information available, and sequencing a limited number of randomly selected cDNA clones often gives insufficient coverage of less abundant transcripts, including important regulatory transcription factors (TFs). In addition, genes uniquely or preferentially expressed in trichomes may be under-represented in non-tissue-targeted EST sequencing projects. A comprehensive survey of genes expressed in glandular trichomes will facilitate new gene discovery and contribute significantly to elucidating terpenoid pathway regulation and trichome function in A. annua.
Whole genome or transcriptome sequencing enables functional genomic studies based on global gene expression. The newly developed high-throughput pyrosequencing technology allows rapid production of sequence data with dramatically reduced time, labor, and cost [11–15]. So far, most applications of pyrosequencing have involved analysis of genomic DNA. Published reports on 454 pyrosequencing of transcriptomes have been mostly restricted to model species with genomic or comprehensive Sanger EST data available [11, 17–19]. Previous studies [11, 19] that used genome or Sanger EST sequences for mapping and annotation of 454 ESTs were not able to accomplish de novo assembly of their 454 ESTs. Here we present the global transcriptome characterization of the A. annua glandular trichome, the so-called biofactory for the production of artemisinin and other plant secondary metabolites. We assigned putative functions to 28,573 unigenes, including previously undescribed enzymes likely involved in sesquiterpene biosynthesis. We verified the expression of 32 selected unigenes and novel transcripts in glandular trichomes using semi-quantitative RT-PCR. These 454 ESTs were linked to metabolic processes specific to glandular trichomes and form the basis for further investigation.
Length distribution of assembled contigs and singletons (by nucleotide length, bp)
Summary of component reads per assembly (number of reads vs. number of contigs; read-count bins include 2 to 10 and 11 to 20)
Summary of BLAST hits from two pyrosequencing runs (NCBI database unique hits)
Shared common GO terms (biological process) in all trichome EST databases (with No. of unigenes): positive regulation of protein metabolic process; positive regulation of metabolic process; glucose metabolic process; positive regulation of biosynthetic process; positive regulation of glycolysis; positive regulation of lipid transport
Unigenes encoding the MEP and MVA pathway enzymes and all the sesquiterpene artemisinin pathway enzymes were present in our pyrosequencing collection. It is noteworthy that although the sequences were derived from normalized cDNA collections, unigenes corresponding to MEP pathway enzymes were two-fold more abundant than MVA pathway transcripts. This suggests that the MEP pathway may serve as a major route for DMAPP/IPP production in the A. annua trichomes. The MEP pathway has previously been shown to provide precursors for both mono- and sesquiterpene biosynthesis in snapdragon flowers. In a recent report on hops, the ESTs encoding MEP pathway enzymes were also found to be more abundant than those of the MVA pathway.
Furthermore, a large number of unigenes annotated as phenylpropanoid and flavonoid pathway enzymes were present in the assembled pyrosequencing EST collection (see Additional file 3), indicating the metabolic function of glandular trichomes in A. annua secondary metabolism.
RT-PCR was also used to confirm the expression of novel transcripts and singletons. A set of 18 novel transcripts and singletons was randomly selected to test whether they are indeed expressed in GTs (Figure 4B). Of the 20 primer pairs, 13 produced RT-PCR products that were of the correct size and whose sequence matched the sequences from which the primers were designed. Based on these results, we conclude that many of the novel transcripts and singletons detected among the 454 ESTs are not sequencing artifacts. This result provides further evidence for the value of tissue-specific 454 sequencing for gene discovery.
As the sole plant source for artemisinin production, A. annua has been studied extensively over the past decades. Like most other non-model plant species, it has lacked the genetic and genomic resources necessary for mechanistic study. Although a precise estimate of transcriptome coverage is unattainable without the full genomic sequence, we appear to have recovered a significant portion of the A. annua glandular trichome transcriptome. The novel transcripts detected highlight the hypothesis-expanding aspects of the 454 deep pyrosequencing approach, which can potentially facilitate the understanding of glandular trichome metabolic function. The assembled sequence data also provide a rich source of information for further investigation.
Two consecutive pyrosequencing runs identified a large number of genes expressed in glandular trichomes. In data analysis, approximately 85% of the pyrosequencing assemblies did not align to any ESTs available in GenBank. This high proportion could reflect the specialized cell type that was sampled or perhaps the greater complexity of the A. annua genome. Because our priority in this study was gene discovery, we chose a normalized cDNA population to reduce oversampling of abundant transcripts and to maximize coverage of less abundant transcripts present in the sample. The average contig length was fairly short (~334 bp), and only 62% of the sequence reads assembled into contigs, leaving 147,699 singletons.
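The assembly figures quoted in this article can be related to one another with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, the unigene count is taken as contigs plus singletons (the usual convention), and it assumes that all 406,044 reads entered the assembly. Because the number of reads remaining after cleaning is not stated here, the computed fraction need not match the ~62% quoted above exactly.

```python
def assembly_summary(total_reads: int, contigs: int, singletons: int) -> dict:
    """Derive simple summary numbers from an EST assembly.

    Assumes every read is either a singleton or a member of some contig.
    """
    if singletons > total_reads:
        raise ValueError("singletons cannot exceed total reads")
    reads_in_contigs = total_reads - singletons
    return {
        # Conventional definition: each contig and each singleton is one unigene.
        "unigenes": contigs + singletons,
        "reads_in_contigs": reads_in_contigs,
        "fraction_assembled": reads_in_contigs / total_reads,
        "mean_reads_per_contig": reads_in_contigs / contigs,
    }


# Figures reported in the text (total ESTs, contigs, singletons).
summary = assembly_summary(total_reads=406_044, contigs=42_678, singletons=147_699)
for key, value in summary.items():
    print(f"{key}: {value:,.3f}" if isinstance(value, float) else f"{key}: {value:,}")
```

Under these assumptions each contig is built from roughly six reads on average, which is consistent with the short mean contig length reported.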
Genes involved in plant secondary metabolism have frequently been identified by the EST approach. The lower cost and greater sequence coverage offered by pyrosequencing make it possible to identify more candidate genes involved in plant natural product biosynthetic pathways, especially those with low abundance that are often missed by conventional EST projects. For non-model species with little or no genomic data available, such as A. annua, pyrosequencing offers rapid characterization of a large portion of the transcriptome and therefore provides a comprehensive tool for gene discovery. However, one limitation of pyrosequencing is that one must rely on RACE PCR to obtain full-length sequence data for a given gene of interest.
Comparison of our glandular trichome 454 ESTs with conventional ESTs generated from trichomes of other plant species revealed likely common functions in non-glandular and glandular trichomes. In addition, some unigenes corresponding to enzymes in sesquiterpene biosynthesis were found to be highly expressed in both trichome types in our RT-PCR analysis. Although it has been suggested that glandular trichomes are the site for synthesis and accumulation of plant secondary metabolites, it will be interesting to further investigate the functional roles of non-glandular trichomes in artemisinin biosynthesis.
In conclusion, we describe the global analysis of the glandular trichome transcriptome of A. annua using massively parallel pyrosequencing. Mining the pyrosequencing ESTs resulted in the identification of many contigs likely involved in terpenoid biosynthesis and trichome function. Functional characterization of selected genes is being carried out. These pyrosequencing data form the basis for further characterization of the molecular mechanism of glandular trichome function in A. annua. The results also highlight the value of using tissue-specific high-throughput pyrosequencing technology for gene discovery in non-model plants. All EST contigs obtained in this study are available in a file in the supplemental data (see Additional file 4).
A. annua seeds were purchased from Youyang, Sichuan province, China. Seeds were sown into commercial potting mixture for germination. The germinated plantlets were grown under natural light conditions in the greenhouse located at The Chinese University of Hong Kong. Flower buds were collected for trichome isolation before flowering.
Total RNA was extracted from glandular trichomes isolated from 30 g of flower buds following the standard protocol of the RNeasy Plant Mini Kit (Qiagen). cDNA was synthesized using the BD SMART™ PCR cDNA Synthesis Kit (Clontech). First-strand cDNA synthesis was performed with oligo(dT) primer as described in the provided protocol using 500 ng total RNA. Double-stranded cDNA was prepared from 2 μL of the first-strand reaction by PCR with the provided primers in a 100 μL reaction. cDNA was purified using Qiagen QIAquick PCR purification spin columns. Normalization was performed using the TRIMMER cDNA normalization kit (Evrogen) to decrease the prevalence of abundant transcripts before sequencing. Approximately 1 μg of normalized double-stranded cDNA was used for 454 pyrosequencing.
Approximately 1 μg of the adaptor-ligated cDNA population was sheared by nebulization, and DNA sequencing was performed following protocols for the Genome Sequencer GS FLX System (Roche Diagnostics). Reads generated by the FLX sequencer were trimmed of low-quality, low-complexity [poly(A)] and adaptor sequences using the SeqClean software (http://compbio.dfci.harvard.edu/tgi/). The cleaned sequences were subjected to the CAP3 program for clustering and assembly using default parameters.
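To make the idea of read cleaning concrete, the following is a minimal sketch of one cleaning step, removal of poly(A) tails (and leading poly(T) runs from reverse-oriented reads) followed by a length filter. It is not SeqClean and does not reproduce its behaviour; the function name, the 10-base homopolymer threshold and the 50-base minimum length are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
POLY_A_TAIL = re.compile(r"A{10,}$")   # run of >= 10 A at the 3' end
POLY_T_HEAD = re.compile(r"^T{10,}")   # run of >= 10 T at the 5' end


def trim_read(seq: str, min_len: int = 50) -> Optional[str]:
    """Strip terminal poly(A)/poly(T) runs; drop reads that end up too short.

    Returns the trimmed sequence, or None if it is shorter than min_len.
    """
    seq = seq.upper()
    seq = POLY_A_TAIL.sub("", seq)
    seq = POLY_T_HEAD.sub("", seq)
    return seq if len(seq) >= min_len else None


# Toy reads (not real data).
reads = {
    "read_with_tail": "ACGT" * 20 + "A" * 15,
    "read_with_head": "T" * 12 + "GCGC" * 15,
    "short_read": "ACGT" * 5,
}
cleaned = {name: trim_read(seq) for name, seq in reads.items()}
kept = {name: seq for name, seq in cleaned.items() if seq is not None}
print(sorted(kept))
```

A real cleaning pass would also mask adaptor and low-quality regions, which this sketch leaves out.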
After assembly, the resulting contigs and singletons were aligned against the NCBI non-redundant protein database using the Blast2GO software with a cut-off e-value of 1e-10. The GI accessions of the best hits were retrieved, and the GO accessions were mapped to GO terms according to the molecular function, biological process and cellular component ontologies (http://www.geneontology.org/).
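The e-value cut-off and best-hit selection described above can be sketched as a filter over tab-separated BLAST results. This is an illustration rather than the Blast2GO workflow used in the study: it assumes the standard 12-column BLAST tabular layout (query ID, subject ID, ..., e-value in column 11, bit score in column 12), and the function name and example identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text: str, max_evalue: float = 1e-10) -> dict:
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit
    of each query whose e-value is at or below max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit fails the significance cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy input with made-up identifiers and scores.
example = "\n".join([
    "contig1\tgi|111\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-50\t200",
    "contig1\tgi|222\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t90",
    "contig2\tgi|333\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.001\t35",
])
hits = best_hits(example)
print(hits)
```

Here contig2 receives no annotation because its only hit is weaker than the cut-off, which is how unigenes end up without an assigned putative function.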
To verify the presence of pyrosequencing ESTs in glandular trichomes, we selected a total of 35 unigenes and novel transcripts for RT-PCR analysis. Total RNA was extracted from glandular trichomes, non-glandular hairy trichomes, leaves and hairy roots, respectively. The first-strand cDNA was synthesized from 10 μL (about 1 μg) total RNA using SuperScript™ II Reverse Transcriptase (Invitrogen) with Oligo(dT)12-18 Primer. PCR was performed using 0.5 to 2 μL of the cDNA in a total reaction volume of 50 μL. The PCR conditions were 2 min at 95°C, followed by 30 cycles of 30 s at 95°C, 30 s at 47-56°C and 1 min at 72°C, and a final 5 min at 72°C. These conditions were chosen because none of the samples analyzed reached a plateau at the end of the amplification (i.e. they were in the exponential phase of the amplification). Actin was used as a loading control, and loading was estimated by staining the gel with ethidium bromide. Expression analysis of each gene was confirmed in at least 2 independent RT reactions using forward and reverse primers.
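As a small worked example, the cycling program can be written down as data and its nominal hold time added up. The sketch reads the conditions above as an initial denaturation, 30 cycles of three steps, and a final extension, which is an assumed interpretation; the class and function names are hypothetical, and instrument ramp times are ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    name: str
    seconds: int


def program_seconds(initial: Step, cycle: Sequence[Step],
                    n_cycles: int, final: Step) -> int:
    """Total hold time of a PCR program, excluding temperature ramping."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


initial = Step("initial denaturation, 95 C", 120)
cycle = [
    Step("denaturation, 95 C", 30),
    Step("annealing, 47-56 C (primer dependent)", 30),
    Step("extension, 72 C", 60),
]
final = Step("final extension, 72 C", 300)

total = program_seconds(initial, cycle, n_cycles=30, final=final)
print(f"{total} s = {total / 60:.0f} min")
```

Keeping the cycle count as an explicit parameter makes it easy to compare shorter programs when checking that amplification is still in the exponential phase.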
EST: expressed sequence tag
ABC transporter: ATP-binding cassette transporter
ISS: inferred from sequence similarity
MEP pathway: 2-C-methyl-D-erythritol 4-phosphate pathway
MVA pathway: mevalonic acid pathway
We thank Mr. Patrick Lau from the core facility in the Faculty of Science at CUHK for performing the 454 pyrosequencing. The work was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project no. CUHK 4603/06M).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.