- Research article
- Open Access
De novo sequencing and analysis of root transcriptome using 454 pyrosequencing to discover putative genes associated with drought tolerance in Ammopiptanthus mongolicus
BMC Genomics volume 13, Article number: 266 (2012)
De novo assembly of transcript sequences produced by next-generation sequencing technologies offers a rapid approach to obtain expressed gene sequences for non-model organisms. Ammopiptanthus mongolicus, a super-xerophytic broadleaf evergreen wood, is an ecologically important foundation species in desert ecosystems and exhibits substantial drought tolerance in Mid-Asia desert. Root plays an important role in water absorption of plant. There are insufficient transcriptomic and genomic data in public databases for understanding of the molecular mechanism underlying the drought tolerance of A. mongolicus. Thus, high throughput transcriptome sequencing from A. mongolicus root is helpful to generate a large amount of transcript sequences for gene discovery and molecular marker development.
A total of 672,002 sequencing reads were obtained from a 454 GS XLR70 Titanium pyrosequencer with a mean length of 279 bp. These reads were assembled into 29,056 unique sequences including 15,173 contigs and 13,883 singlets. In our assembled sequences, 1,827 potential simple sequence repeats (SSR) molecular markers were discovered. Based on sequence similarity with known plant proteins, the assembled sequences represent approximately 9,771 proteins in PlantGDB. Based on the Gene ontology (GO) analysis, hundreds of drought stress-related genes were found. We further analyzed the gene expression profiles of 27 putative genes involved in drought tolerance using quantitative real-time PCR (qRT-PCR) assay.
Our sequence collection represents a major transcriptomic resource for A. mongolicus, and the large number of genetic markers predicted should contribute to future research in Ammopiptanthus genus. The potential drought stress related transcripts identified in this study provide a good start for further investigation into the drought adaptation in Ammopiptanthus.
Desert ecosystems currently cover at least 35% of the Earth’s land surface and, in China, the area of desert land amounts to approximately 2,080 million km2, covering 22% of total land area of the country . Furthermore, the desert region worldwide is still expanding partly due to the ongoing global warming. Conservation of the genetic resources of endemic desert plants is critical to global efforts to curb desertification, prevent further deterioration of the fragile ecosystems in arid and semi-arid regions, and maintain biodiversity in deserts. Ammopiptanthus, the only genus with evergreen broadleaf habit in the desert and arid regions of Mid-Asia, including Northern China, plays a critical role in maintaining desert ecosystems and delaying further desertification. A deeper understanding of the genetic control of adaptation to desert environment in Ammopiptanthus would be beneficial and timely.
According to fossil evidence, the vegetation in northwestern China was dominated by evergreen broadleaf forest in the early Tertiary period, but with the climate becoming drier and colder in central Asia, the forest was gradually replaced by steppe and then by desert . Ammopiptanthus is a relict survivor of the evergreen broadleaf forest of this region from the Tertiary period possibly owing to its high tolerance to drought and cold.
The genus Ammopiptanthus (Leguminosae) comprises of two species: Ammopiptanthus mongolicus (Maxim. ex Kom.) Cheng f. and Ammopiptanthus nanus (M. Pop.) Cheng f. In China, A. mongolicus mainly distributes in the desert and arid regions of Inner Mongolia and Ningxia Autonomous Regions, as well as Gansu Province. A. mongolicus is one of the constructive species of desert ecosystems and serves a vital function in maintaining desert vegetation. The habitats of A. mongolicus are stony and/or sandy deserts with an annual precipitation ranging from 100 mm to 160 mm and a mean annual potential evaporation up to 3,000 mm. To adapt to the harsh environment, A. mongolicus have developed sophisticated mechanisms to maintain the capacity of water absorption from soil. The deep flourishing root system is essential in the high drought tolerance of A. mongolicus; however, the genetic mechanism is still unknown. Because of the ecological importance and the high academic value in A. mongolicus, several studies have focused on anatomy and physiology , genetic marker and diversity [1, 4], freeze resistance protein  and cold tolerance mechanisms , and transgenic functional analysis of AmNHX2, AmLEA, and AmCBL1. Few studies have addressed the drought tolerance mechanism of A. mongolicus except that Xu et al. reported that more osmolyte was found in drought-stressed Ammopiptanthus leaves .
A large number of nucleotide sequences are prerequisite for identifying drought related genes and further understanding the molecular mechanism underlying drought tolerance of A. mongolicus. However, little resources exist for A. mongolicus in GenBank (749 EST and 125 Nucleotide sequences prior to 1 Sept. 2011) and A. nanus, another species in the genus Ammopiptanthus, despite of the importance of the genus. Considering the large genome size of the woody plants, whole genome sequencing of A. mongolicus is difficult. The construction of large EST collections is thus the most promising approach for providing functional genomic level information in A. mongolicus. Sequencing and analysis of ESTs is one of primary tools for discovery of novel genes, especially in non-model plants. In addition, ESTs can also be used for other functional genomic projects, including gene expression profiling, microarrays, molecular markers development, and physical mapping [11, 12].
In recent years, next-generation sequencing (NGS) technologies, including Roche/454 pyrosequencing, Illumina/Solexa sequencing technology, and Applied Biosystems SOLiD sequencing technology, have led to a revolution in genomics and provided cheaper and faster delivery of sequencing information . The newest 454 sequencing platform, the GS FLX Titanium, can generate one million reads with an average length of up to 400 base pairs (bp) at 99.5% accuracy per run. The 454 sequencing platform has been successfully applied in transcriptome sequencing of Brassica napu s , Artemisia annua, Eucalyptus grandis, Olea europaea, Arabidopsis thaliana[18, 19], Medicago truncatula, and other plant species . To date, the 454 pyrosequencing technique is the most widely used NGS technology for the de novo sequencing and analysis of transcriptomes in non-model organisms.
Simple sequence repeat (SSR) markers are microsatellite loci that can be amplified by polymerase chain reaction (PCR) using primers designed for unique flanking sequences. Compared with other types of molecular markers, SSRs have many advantages, such as simplicity, effectiveness, abundance, hypervariability, reproducibility, codominant inheritance, and extensive genomic coverage . Based on the original sequences used to identify the simple repeats, SSRs can be divided into genomic SSRs and EST-SSRs. Genomic SSR markers have some disadvantages. Firstly, genomic SSR markers are derived from genomic BAC library, most of which come from the intergenic regions with no gene function. Secondly, the procedures for developing such markers are difficult, complex, and high-cost. In addition, the interspecific transferability of genomic SSRs is limited because of either a disappearance of the repeat region or degeneration of the primer binding sites . Alternatively, EST-SSRs are derived from expressed sequences, which are more evolutionary conserved than noncoding sequences; therefore, EST-SSR markers have a relatively high transferability . With the increasing number of ESTs deposited in public databases, an expanding number of EST-SSRs have been developed, and the polymorphism and transferability of EST-SSRs have been evaluated in many plant species [24–29]; however, there is no report on development of EST-SSR markers in A. mongolicus by now.
In order to significantly expand the transcript catalog of A. mongolicus, identify candidate genes involved in drought tolerance, and develop more SSR markers, we performed large-scale transcriptome sequencing of A. mongolicus root using Roche/454 next-generation sequencing technology. A total of 672,002 root-specific ESTs were obtained and assembled into 29,056 unique sequences. Bioinformatics analysis indicated that these unique sequences represent at least 9,771 protein coding transcripts. Thousands of potential simple sequence repeats molecular markers are discovered and 27 genes that were differentially expressed under drought treatment were identified by further quantitative real-time PCR analysis. This study will provide novel insights into the molecular mechanism underlying the drought tolerance in A. mongolicus.
454 sequencing of the Ammopiptanthus root transcriptome
A cDNA library constructed by SMART technology from the pooled RNA from drought-stressed and unstressed root samples of A. mongolicus was subjected to a half plate run with the 454 GS FLX Titanium platform. This half plate run produced 672,002 high quantity (HQ) reads with an average sequence length of 279 bp (SD = 92.2, range = 40–902) (Table 1). Of the HQ reads, 81.8% were over 200 bp in length, and 44.6% were over 300 bp in length. The size distribution of the reads is shown in Figure 1. All HQ reads were also deposited in the National Center for Biotechnology Information (NCBI) and can be accessed in the Short Read Archive (SRA) under the accession number SRX142053.
Prior to assembly, the low quality reads, adapter/primer sequences and sequences of less than 50 bp were removed using SeqClean (latest version) and Lucy (1.20p) first, and then Newbler v2.5.3. As a result, a total of 654,834 (97.4% of total HQ reads) sequencing reads was used for de novo assembly. The length distribution of these sequencing reads is shown in Figure 2.
De novo assembly of sequencing data using three assemblers and comparison of the assemblies
To get a better assembly result, three assembly programs, Newbler (version 2.5.3), Mira (version 3.2.1) and Cap3 with default or optimized parameters were used for de novo assembly of our 454 sequencing data. We aimed at more long contigs and more contigs with homologs in soybean protein database (Phytozome v7.0, http://www.phytozome.net/).
We first run assemblies using the three assemblers with their default parameters, and similar assembly results were obtained in assemblies using Mira and Cap3. However, remarkably less contigs quantity and less contigs with homologs in soybean protein database were shown in the assemblies using Newbler with default parameters (Table 2). To increase the number of reads used in the assembly and get more amount of contigs, we then run assemblies using Newbler with a set of optimized parameters according to the assembler manual by checking “Use duplicate reads”, “Extend low depth overlaps”, “Reads limited to one contig”, and “Single Ace file” options.
We compared the four assemblies using the following standard metrics: total number of reads used in the assembly, number of contigs generated, N50 length of contigs, number of contigs, mean contig length, and summed contig length. We also evaluated assembly integrity and completeness by comparing with the soybean protein datasets (Table 2).
Ideally, the optimal assembler will use almost all the reads given. In this respect, Newbler (optimized parameter) behaved best, and then Cap3 and Newbler, and MIRA use the least reads. The optimal assembler will produce the longest summed length of contigs, with a relatively longer mean contig length, while avoiding over-assembly of reads into in silico chimaeras. Although Newbler with default parameter generated an assembly with the largest N50, mean contig length and number of contig no less than 1,000 bp, it also produced the smallest summed length of contigs, and startlingly low total number of contigs. MIRA with default parameter generated an assembly with the longest summed length of contigs and maximum total number of contigs, but it also produced the smallest N50 and mean contig length. Cap3 generated a relatively larger assembly size than Newbler (optimal parameter), but with shorter N50, mean contig length, and number of contig no less than 1,000 bp.
Another optimality criterion for a novel de novo assembled transcriptome we used in this study is how well the assembly represents protein sequences from soybean, the most related organism to A. mongolicus with sequenced genome (Table 2). A better assembler will return contigs that hit soybean data well, and will show a high coverage of the soybean protein datasets. The assembly generated by MIRA had the largest quantity of contigs with significant hits and soybean protein hits, while the assembly generated by Newbler (optimized parameter) had the largest quantity of contigs with 80% or greater coverage and soybean proteins with 80% or greater coverage.
Of the four assemblies we generated using the three assemblers, the assembly generated by Newbler (optimized parameter) was selected for further analysis, since it used the largest quantity of sequencing reads for assembly and had relatively large assembly size, longer contig length, and better assembly integrity and completeness. Another reason that we choose Newbler was due to its frequent use in de novo assembly of 454 pyrosequencing transcriptome projects .
Characteristics of the Ammopiptanthus root transcriptome
Using Roche Newbler (version 2.5.3) with optimized parameter, the 654,834 preprocessed sequencing reads were assembled into 29,056 unique sequences including 15,173 contig and 13,883 singlets. The sequencing coverage ranged from 2- to 17,162-fold, with an average 43.2-fold coverage. In total, 640,951 reads were assembled into contigs, accounting for 97.88% of the assembled reads and 95.38% of all sequencing reads. The contigs ranged from 100 to 4,659 bp, with an average size of 484 ± 349 bp. About 78.07% of the contigs were assembled from three or more reads. The size distribution for these contigs and singlets is shown in Figure 3.
To study the sequence conservation of A. mongolicus in other plant species, we used BLAST  to align both contigs and singlets to the non-redundant database (nr) of the NCBI (the last update time: Jan. 23, 2011) using an E value threshold of 1e-10. Of 29,056 unique sequences, 8,790 (30.25%) had BLAST hits in nucleotide sequence database in NCBI. The majority of the annotated sequences corresponded to known nucleotide sequences of plant species, with 25.1%, 11.0%, 9.0%, 2.5%, and 1.9% matching with Glycine max Lotus japonicus Medicago truncatula Vitis vinifera, and Populus trichocarpa sequences, respectively (Figure 4).
Frequency and distribution of EST-SSRs in the A. mongolicus root transcriptome
After screening EST-SSRs using MISA software in the 29,056 unique sequences (15,173 contigs, 13,883 singlets, and 9,958,274 bp total length), 1,827 SSRs distributed in 1,684 sequences were identified. The EST-SSR frequency in the A. mongolicus transcriptome was 5.80%, and the distribution density was 5.45 per kb. Two hundred and forty-six sequences contained more than two EST-SSRs. Based on the repeat motifs, all SSR loci were divided into mono-nucleotide, di-nucleotide, tri-nucleotide, tetra-nucleotide, penta-nucleotide, hexa-nucleotide, and multi-nucleotide. The most abundant type of repeat motif was tri-nucleotide (554, 30.32%), followed by mono-nucleotide (526, 28.80%), di-nucleotide (434, 23.75%), multi-nucleotide (198, 10.84%), tetra-nucleotide (75, 4.11%), hexa-nucleotide (19, 1.04%), and penta-nucleotide (21, 1.15%) repeat units (Table 3).
The frequencies of EST-SSRs with different numbers of tandem repeats were calculated and are shown in Table 4. The SSRs with six tandem repeats (28.44%) were the most common, followed by five tandem repeats (26.92%), > 10 tandem repeats (13.36%), seven tandem repeats (13.26%), eight tandem repeats (8.50%), nine tandem repeats (5.87%), and ten tandem repeats (3.64%). The dominant repeat motif in EST-SSRs was AG/CT (26.72%), followed by AAG/CTT (13.77%), AAC/GTT (9.62%), AC/GT (9.31%), and AAT/ATT (8.81%) (Table 4). However, very few CG/CG (0.1%) repeats were identified in the databases.
To find potential genes involved in drought response in our assembly, we used BLASTx  to align both contigs and singlets to the PlantGDB (http://www.plantgdb.org/), the protein database of soybean (Gmax_109, http://www.phytozome.net/soybean), and TAIR10 protein database using an E threshold of 1e-3 and protein identity no less than 30%.
Of 15,173 contigs, 6,486 (42.75%) had the BLAST hits to known proteins in PlantGDB (Table 5).
Numbers and percentages of 454 ESTs in the assembled contigs, singlets, and the combined sequence set with matches to known proteins in BLASTx searches of three annotated protein databases (PlantGDB, Gmax_109, and TAIR10)
As expected, a remarkably lower percentage of the shorter singlet reads had BLAST hits to PlantGDB proteins. Of 13,883 singlet reads, 3285 (23.66%) had blast hits to PlantGDB proteins (Table 5). Smaller percentages of contigs and singlets had BLAST hits to the Gmax_109 and TAIR10 database (Table 5). This seemingly low percentage of BLAST hits is partially due to the shortage of protein sequences from Leguminosae woody plants in the public database, although annotation of only 30%-40% of sequences is common in analyses of large EST collections [32, 33]. Nonetheless, BLAST searches identified a total of approximately 9,771 unique protein accessions, indicating that our transcriptome assembly datasets represented a substantial fraction of A. mongolicus root genes.
Gene ontology assignments were used to classify the functions of the A. mongolicus transcripts. Based on sequence homology, the 9,771 annotated sequences, which had BLAST hits to PlantGDB proteins, were categorized into 40 functional groups (Figure 5). In each of the three main categories (biological process, cellular component, and molecular function) of the GO classification, “metabolic process”, “cell & cell part”, and “binding” terms were dominant, respectively. We also noticed a high-percentage of genes from categories of “cellular process”, “organelle”, and “catalytic activity” and only a few genes from terms of “carbon utilization”, “cell killing”, “extracellular region part”, and “protein binding transcription factor activity” (Figure 5).
To identify the biological pathways that are active in root of A. mongolicus, we mapped the 9,771 annotated sequences (annotation by PlantGDB) to the reference canonical pathways in Kyoto Encyclopedia of Genes and Genomes (KEGG)  and the top 26 KEGG pathways are shown in Figure 6. The pathways with most representation by the unique sequences were “metabolic pathways”, “Ribosome”, and “Biosynthesis of secondary metabolites” (Figure 6). These results indicate that the diversifying metabolic processes are active in A. mongolicus root, and a variety of metabolites are synthesized in the root. In short, these annotations provide a valuable resource for investigating specific processes, functions, and pathways and facilitate the identification of novel genes involved in drought stress tolerance in root of A. mongolicus.
Expression analysis of genes possibly involved in drought response in A. mongolicus root
To identify drought responsive genes, 27 unigenes were selected from the unique sequences classified in GO categories “response to osmotic stress” (unigene 1–11 in Figure 7), “response to oxidative stress” (unigene 12–18 in Figure 7), “response to hormone stimulus” (unigene 19–21 in Figure 7), and “response to light stimulus” (unigene 23–27 in Figure 7). Quantitative real-time PCR assay were performed using the primers ( Additional files 1) designed according to these unigenes to monitor their expression profiles under 1 h and 72 h exposure to 20% PEG-6000 treatment.
The results indicated the expression of 27 unigenes that showed significantly up-regulated or down-regulated patterns at least at one time-point under exposure to PEG-6000 treatment. According to their expression patterns, the 27 drought-responsive unigenes were classified into four groups ( Additional file 2), U-I increased at both 1 h and 72 h, U-II increased at 1 h but decreased at 72 h, -I decreased at both 1 h and 72 h, and D-II decreased at 1 h but increased at 72 h. Among the 27 unigenes responsive to PEG-6000 treatment, 12 showed D-II pattern and 9 shows U-I pattern; in contrast, four unigenes behaved D-I pattern and only two unigenes behaved U-II pattern.
As a relic survivor of the evergreen broadleaf forest of central Asia from the Tertiary period, A. mongolicus can tolerate serious drought stress. The stress tolerance of A. mongolicus may not only associated with the epicuticular wax and stomata, which reduce the water evaporation, but also the deep flourishing root system, which enables the pant to absorb water deep below the soil surface. Our previous work (unpublished observations) revealed that, comparing with the shoot, the physiological index (i.e., proline content and antioxidants) in the root of A. mongolicus responded to the drought stress faster and more significant. Investigation of the gene expression regulation network under drought stress will be helpful to understand the biochemical and physiological adaptation process in A. mongolicus, since there are only 748 Ammopiptanthus ESTs in GenBank. In the present study, large-scale root-specific transcriptome data were obtained by high throughput 454 sequencing as the first step of our endeavor to provide a clear insight into the molecular mechanism of drought tolerance in A. mongolicus.
Most plant transcriptomic studies sequenced the pooled cDNA samples from different tissues [33, 35–37], or assembly transcriptomic data using sequencing reads from different tissues , only a few work perform root-specific transcriptomic sequencing and assembly [39, 40]. Although more extensive transcriptomic data can be obtained using the former strategy, more accurate data can be produced using the later method, since alternative splicing may exist in different tissues , which will make the contig assembly difficult. Furthermore, the tissue-specific transcriptomic study will provided a good reference data for gene expression profiling, especially in non-model plant.
There are three high throughput sequencing methods that can be used for transcriptomic study, including the classic and the most popular 454 pyrosequencing, and the low-cost solexa sequencing, which were employed more and more frequently in recent years . In this study 454 pyrosequencing was adopted to gain a longer and more reliable transcriptomic dataset.
Choosing suitable assembler and parameters is critical to getting a better assembly performance, which is even more important in transcriptomic studies in non-model organisms. However, most previous analyses of transcriptomic data generated by Roche 454 pyrosequencing have almost always used only one software program for assembly  except a recent study  in which the assembles from six assemblers were compared including Velvet, ABySS, MIRA, Newbler v2.3, Newbler v2.5p, CLC, and TGICL. In the present study, we compared the assembly from the three most frequently used assemblers, i.e. MIRA, Newbler v2.5.3, and Cap3 , since Velvet and ABySS are not developed for relatively long sequence assembly.
Evaluation of assembly performance is a challenging work, especially in non-model organisms. We adopted two groups of index for assembly evaluation according to an earlier study . The first group of index included total number of reads used in the assembly, number of contigs generated, N50 length of contigs, number of contigs, mean contig length, and summed contig length (Table 2). The second group of index was obtained by comparing with the soybean protein datasets (Table 2).
Indeed, the comparison (Table 2) revealed that the assemblies generated from different software programs showed advantages and disadvantages in different aspect. Anyway, the assembly generated by Newbler (optimized parameter) was selected for further analysis according to the comparison result and its frequent application.
From 672,002 sequence reads, 29,056 unigenes were assembled, which consisted of 15,173 contigs and 13,883 singlets from drought-stressed and unstressed roots of A. mongolicus. Although a high number of unigenes were not long enough to cover the complete protein-coding regions as revealed by BLASTX aligment, up to now, the dataset we reported here still provided the largest dataset of different genes representing a substantial part of the transcriptome of A. mongolicus, which probably embraces the majority part of genes involved in the sophisticated regulation networks for sensing and acclimating the water-deficit soil environment.
Relatively large portion (97.26%) of reads were assembled into contigs, which is significantly higher than that reported for several other recent 454 transcriptome assemblies (e.g., 48% ; 88% ; and 90% ). As a consequence, our A. mongolicus root transcriptomic data showed a relatively high coverage depth (ranging from 1 to 17,162-fold with an average 45.3-fold), comparing with some other transcriptomic data from other plants (e.g., 3.6 ; 8 ; 3.1 ). This may indicate that half-plate 454 pyrosequencing is deep enough for root transcriptome. Nonetheless, our contig length (484 bp) is not higher than other transcriptomic data (e.g., 345 ; 364 ; 452 ; 526 ; and 618 ).
SSRs consist of tandem repeats of short (1–6 bp) nucleotide motifs . These repeat sequences are distributed throughout the genome. Polymorphism revealed by SSRs results from variation in repeat number, which primarily results from slipped-strand mispairing during DNA replication. Thus, SSRs reveal much higher levels of polymorphism than most other marker systems [45, 46]. SSRs have proven to be more reliable than other markers, and the utility of SSRs in genetics studies is well established.
We screened 1,827 SSR loci, and EST-SSR frequency in the A. mongolicus transcriptome was 5.80%. The AG/CT and AAG/CTT repeat motifs were the most SSR motifs in all nucleotide repeat motifs, and tri-nucleotide repeats was the most frequent type of SSR motif. This finding is consistent with the results reported in cereals such as rice (Oryza sativa), wheat (Triticum aestivum), and barley (Hordeum vulgare) . Di-nucleotide repeats were the most abundant class of SSRs in many plant species such as Arabidopsis, peanut (Arachis hypogaea), canola (Brassica napus), sugar beet (Beta vulgaris), cabbage (Brassica oleracea), soybean (Glycine max), sunflower (Helianthus annuus), sweet potato (Ipomoea batatas), pea (Pisum sativum), and grape (Vitis vinifera) [24, 48]. Among the di-nucleotide repeats, AG/CT was the most frequent motif in our study, whereas CG/CG motif was very rare. Among the tri-nucleotide repeats, the AAG/CTT motif was the most frequent one. Our results are consistent with those in other plant species [24, 48–50]. In plants, CT and CTT repeats are found in both transcribed regions and 5'-untranslated regions (UTRs); CT microsatellites in 5' UTRs may be involved in antisense transcription and play a role in gene regulation .
Drought tolerance is a complex trait and involves multiple mechanisms that act in combination to avoid or tolerate periods of water deficit. It is well-established that, under drought stress, the genes involved in osmotic and redox homeostasis will be regulated and hormones such as ABA will participate in the readjustment process. Recently, light-mediated root growth is believed to be relevant to drought tolerance of root . Hence, 27 unigenes classified in GO categories “response to osmotic stress”, “response to oxidative stress”, “response to hormone stimulus”, and “response to light stimulus” were selected for further expression analysis. As expected, some ion channel and transporter genes (i.e., sdq_isotig00642, sdq_isotig01704, sdq_isotig11437, sdq_isotig00259, sdq_isotig01086, and sdq_isotig10416), as well as several anti-oxidant (i.e., sdq_isotig08490, sdq_isotig01610, sdq_isotig00634, sdq_isotig11067, sdq_isotig07261, and sdq_isotig00577) were shown to be involved into the drought response. Quantitative real-time PCR also revealed that the gene expressions of some blue light photoreceptor NPH3 (i.e., sdq_isotig01737, sdq_isotig01131, sdq_isotig3894, and sdq_isotig07698) and an interacting protein of NPH1 (sdq_isotig00917) were regulated under drought stress, which confirmed the relevance of light-mediated root growth to drought tolerance of root. Furthermore, an ethylene receptor gene was shown to be up-regulated only at 72 h, and an auxin receptor and an auxin induced gene, IAA9, were up-regulated only at 1 h, suggesting that the ethylene and auxin may participate in drought response of root in A. mongolicus.
Our study identified 27 drought responsive genes. The functions of these genes in drought tolerance of root will be analyzed by transgenic study. At the same time, more drought response genes will be discovered by digital gene expression analysis based on the transcriptome data obtained in this study. We are confident that more light will soon be shed on the adaptive significance of A. mongolicus root for plant adaptation to the drought environment.
Ammopiptanthus mongolicus is an ecologically important plant species in Mid-Asia desert and exhibits substantial tolerance to drought condition. Insufficient transcriptomic and genomic data in public databases has limited our understanding of the molecular mechanism underlying the stress tolerance of A. mongolicus. The 29,056 unique sequences in this 454 EST collection represent a major transcriptomic level resource for A. mongolicus, and will be useful for further functional genomics study in Ammopiptanthus genus. The thousands of SSR markers predicted in our 454 ESTs should facilitate population genomic studies in Ammopiptanthus. The potential drought stress related transcripts identified in this study provide a good start for further investigation into the drought adaptation in Ammopiptanthus. Additionally, our results also highlight the utility of high-throughput transcriptome sequencing as a fast and cost-effective approach for marker development and gene discovery in non-model species.
Sample preparation and 454 pyrosequencing
Seeds of A. mongolicus (collected from the desert region in Zhongwei County, Ningxia Autonomous Region, China) were soaked in water for 48 h at 25 °C and then sown in 9 cm diameter commercial pots containing vermiculite and perlite (with 1:1 ratio of vermiculite to perlite) in a greenhouse at approximately 25 °C and 35% relative humidity under a photosynthetic photon flux density of 120 μmol m-2 s-1 with a photoperiod of 16 h light and 8 h dark. The plantlets were watered in a three-day interval with half strength of Hoagland’s solution. Two weeks after germination, the seedlings were divided into two groups. The first group served as the control (CK) whilst the second (T) was irrigated with 20% PEG-6000. The roots of both samples were harvested after 72 h and used for RNA extraction immediately.
Total RNA extraction, mRNA purification, and cDNA library construction were conducted by LC Sciences (Houston, TX, USA). In brief, total RNA was obtained from roots using the total RNA purification kit (LC Sciences, Houston, TX, USA) as instructed, treated with RNAase free DNAase, and re-purified with the RNeasy kit (Qiagen, Valencia, CA, USA) following the manufacturer’s protocol. Equal quantity of RNA from both CK and T samples were blended for cDNA library construction.
cDNA synthesis was performed using SMART II™ cDNA Synthesis kit (Clontech Laboratories, Inc., Mountain View, CA, USA) following manufacturer’s recommendations. Double stranded cDNA was separated on a 2% agarose gel, and the cDNA with a length no less than 100 bp was separated by gel extraction. The concentration of cDNA was determined using Bioanalyzer 2100 (Agilent Technologies, Inc., Waldbronn, Germany). Approximately 5 μg of cDNA sample was sheared via sonication into small fragments, and then Roche GS-FLX 454 pyrosequencing was conducted according to the manufacturer’s recommendations.
De novo assembly
Raw data generated from 454 pyrosequencing were preprocessed to remove nonsense sequences including (1) adapters that were added for reverse transcription and 454 sequencing, (2) primers, (3) very short (<50 bp) sequences, and (4) low quality sequences using Lucy, Seqclean and Newbler program.
The preprocessed sequences were then assembled using assembly program with default or optimal parameter. Among various programs available, we used publicly available programs Cap3 (http://seq.cs.iastate.edu/cap3.html), and MIRA (version 3.2.1; http://sourceforge.net/projects/miraassembler/), as well as GS De novo Assembler (Newbler v2.5.3; http://www.454.com/products-solutions/analysis-tools/gs-de-novoassembler.asp) supplied with the GS FLX Titanium sequencer.
To examine the coverage of the sequences, all unique sequences (contigs and singlets) generated from different assembler with default or optimal parameter were compared with the publicly available soybean protein dataset (Phytozome v7.0, http://www.phytozome.net/) using Blastx program and a typical cutoff value of E < 1e-5 was used.
Functional annotation and EST-SSRs marker identification
Sequence assembly and annotation were carried out by Zhongxin Biotechology Shanghai Co., Ltd (Shanghai, China). For annotation, all unique sequences were searched against protein database PlantGDB (http://www.plantgdb.org/, the update time: April 20, 2011), soybean (Gmax_109, http://www.phytozome.net/soybean, version: 7.0, the last update time: Mar. 31, 2011) , and TAIR10 protein database  using a threshold of E < 1e-5 and protein identity >30%. Gene ontology analysis was conducted on the annotated sequences through custom Perl script.
Pathway assignments were carried out according to KEGG mapping  using custom Perl script. MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to identify the potent EST-SSR markers in all unique sequences. Dinucleotides repeats of more than six times and trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats of more than five times were considered as the search criteria for SSRs in MISA script.
Quantitative real-time PCR analysis
Approximately 1 μg of DNase I-treated total RNA was converted into single-stranded cDNA using M-MLV Reverse Transcriptase (Promega, USA). The cDNA products were then diluted 50-fold with deionized water before using as a template in real-time PCR. The quantitative reaction was performed on an MyiQ2 two-color real-time PCR detection system (Bio-Rad Laboratories, Hercules, CA, USA) using the SsoFast EvaGreen Supermix (Bio-Rad Laboratories, Hercules, CA, USA). The reaction mixture (20 μL) contained 2× SsoFast EvaGreen Supermix, 0.9 μM each of the forward and reverse primers, and 1 μL of template cDNA. PCR amplification was performed under the following conditions: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 10 s. Two independent biological replicates for each sample and three technical replicates of each biological replicate were analyzed in quantitative real-time PCR analysis. The gene expressions of selected unigenes were normalized against an internal reference gene, 18SrRNA. The relative gene expression was calculated using the 2-ΔΔCt method . All primers used in this study are listed in Additional File 1.
Ge XJ, Yu Y, Yuan YM, Huang HW, Yan C: Genetic diversity and geographic differentiation in endangered Ammopiptanthus (Leguminosae) populations in desert regions of northwest China as revealed by ISSR analysis. Ann Bot. 2005, 95: 843-851. 10.1093/aob/mci089.
Yan S, Mu GJ, Xu YQ: Quaternary environmental evolution of the Lop Nur region, NW China. Acta Micropalaeontologica Sin. 2000, 17: 165-169.
Liu JQ, Qiu MX: Ecological, physiological and anatomical traits of Ammopiptanthus mongolicus grown in desert of China. Acta Bot Sin. 1982, 24: 568-573.
Chen GQ, Huang HW, Kang M, Ge XJ: Development and characterization of microsatellite markers for an endangered shrub, Ammopiptanthus mongolicus (Leguminosae) and cross-species amplification in Ammopiptanthus nanus. Conserv Genet. 2007, 8: 1495-1497. 10.1007/s10592-007-9306-2.
Jiang Y, Wei LB, Fei YB, Shu NH, Gao SQ: Purification and identification of antifreeze proteins in Ammopiptanthus mongolicus. Acta Bot Sin. 1999, 41: 967-971.
Lu CF, Yin LK, Li KH: Proteome expression patterns in the stress tolerant evergreen Ammopiptanthus nanus under conditions of extreme cold. Plant Growth Regul. 2010, 62: 65-70. 10.1007/s10725-010-9487-4.
Wei Q, Guo YJ, Cao HM, Kuai BK: Cloning and characterization of an AtNHX2-like Na+/H+antiporter gene from Ammopiptanthus mongolicus (Leguminosae) and its ectopic expression enhanced drought and salt tolerance in Arabidopsis thaliana. Plant Cell Tiss Organ Cult. 2011, 105: 309-316. 10.1007/s11240-010-9869-3.
Liu RL, Liu MQ, Liu J, Chen YZ, Chen YY, Lu CF: Heterologous expression of a Ammopiptanthus mongolicus late embryogenesis abundant protein gene (AmLEA) enhances Escherichia coli viability under cold and heat stress. Plant Growth Regul. 2010, 60: 163-168. 10.1007/s10725-009-9432-6.
Chen JH, Sun Y, Sun F, Xia XL, Yin WL: Tobacco plants ectopically expressing the Ammopiptanthus mongolicus AmCBL1 gene display enhanced tolerance to multiple abiotic stresses. Plant Growth Regul. 2011, 63: 259-269. 10.1007/s10725-010-9523-4.
Xu S, An L, Feng H, Wang X, Li X: The seasonal effects of water stress on Ammopiptanthus mongolicus in a desert environment. J Arid Environ. 2002, 51: 437-447. 10.1006/jare.2001.0949.
Parkinson J, Blaxter M: Expressed sequence tags: an overview. Methods Mol Biol. 2009, 533: 1-12. 10.1007/978-1-60327-136-3_1.
Bouck A, Vision T: The molecular ecologist's guide to expressed sequence tags. Mol Ecol. 2007, 16: 907-924.
Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics. 2008, 92: 255-264. 10.1016/j.ygeno.2008.07.001.
Trick M, Long Y, Meng J, Bancroft I: Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J. 2009, 7: 334-346. 10.1111/j.1467-7652.2008.00396.x.
Wang W, Wang Y, Zhang Q, Qi Y, Guo D: Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing. BMC Genomics. 2009, 10: 465-10.1186/1471-2164-10-465.
Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9: 312-10.1186/1471-2164-9-312.
Alagna F, D’Agostino N, Torchia L, Servili M, Rao R, Pietrella M, Giuliano G, Chiusano ML, Baldoni L, Perrotta G: Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development. BMC Genomics. 2009, 10: 399-10.1186/1471-2164-10-399.
Jones-Rhoades MW, Borevitz JO, Preuss D: Genome-wide expression profiling of the Arabidopsis female gametophyte identifies families of small, secreted proteins. PLoS Genet. 2007, 3: 1848-1861.
Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB: Sampling the Arabidopsis transcriptome with massive parallel pyrosequencing. Plant Physiol. 2007, 144: 32-42. 10.1104/pp.107.096677.
Hsiao YY, Jeng MF, Tsai WC, Chung YC, Li CY, Wu TS, Kuoh CS, Chen WH, Chen HH: A novel homodimeric geranyl diphosphate synthase from the orchid Phalaenopsis bellina lacking a DD(X)2-4D motif. Plant J. 2008, 55: 719-733. 10.1111/j.1365-313X.2008.03547.x.
Varshney RK, Nayak SN, May GD, Jackson SA: Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol. 2009, 27: 522-530. 10.1016/j.tibtech.2009.05.006.
Powell W, Morgante M, Andre C, Hanafey M, Vogel J, Tingey S, Rafalski A: The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for germplasm analysis. Mol Breed. 1996, 2: 225-238. 10.1007/BF00564200.
Rungis D, Berube Y, Zhang J, Ralph S, Ritland CE, Ellis BE, Douglas C, Bohlmann J, Ritland K: Robust simple sequence repeat markers for spruce (Picea spp.) from expressed sequence tags. Theor Appl Genet. 2004, 109: 1283-1294. 10.1007/s00122-004-1742-5.
Wei WL, Qi XQ, Wang LH, Zhang YX, Hua W, Li DH, Lv HX, Zhang XR: Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics. 2011, 12: 451-10.1186/1471-2164-12-451.
Peakall R, Gilmore S, Keys W, Morgante M, Rafalski A: Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol Biol Evol. 1998, 15: 1275-1287. 10.1093/oxfordjournals.molbev.a025856.
Cho YG, Ishii T, Temnykh S, Chen X, Lipovich L, McCouch SR, Park WD, Ayres N, Cartinhour S: Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theor Appl Genet. 2000, 100: 713-722. 10.1007/s001220051343.
Poncet V, Rondeau M, Tranchant C, Cayrel A, Hamon S, de Kochko A, Hamon P: SSR mining in coffee tree EST databases: potential use of EST-SSRs as markers for the Coffea genus. Mol Genet Genomics. 2006, 276: 436-449. 10.1007/s00438-006-0153-5.
Luro FL, Costantino G, Terol J, Argout X, Allario T, Wincker P, Talon M, Ollitrault P, Morillon R: Transferability of the EST-SSRs developed on Nules clementine (Citrus clementina Hort ex Tan) to other Citrus species and their effectiveness for genetic mapping. BMC Genomic. 2008, 9: 287-10.1186/1471-2164-9-287.
Wang JY, Pan LJ, Yang QL, Yu SL: Development and Characterization of EST-SSR Markers from NCBI and cDNA Library in Cultivated Peanut (Arachis hypogaea L.). Mol Plant Breed. 2009, 7: 806-810.
Kumar S, Blaxter ML: Comparing de novo assemblers for 454 transcriptome data. BMC Genomics. 2010, 11: 571-10.1186/1471-2164-11-571.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV: Sequencing and de novo analysis of a coral larval transcriptome using 454 GS Flx. BMC Genomics. 2009, 10: 219-10.1186/1471-2164-10-219.
Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11: 180-10.1186/1471-2164-11-180.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32: D277-D280. 10.1093/nar/gkh063.
Sui C, Zhang J, Wei J, Chen S, Li Y, Xu J, Jin Y, Xie C, Gao Z, Chen H, Yang C, Zhang Z, Xu Y: Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins. BMC Genomics. 2011, 12: 539-10.1186/1471-2164-12-539.
Hsiao YY, Chen YW, Huang SC, Pan ZJ, Fu CH, Chen WH, Tsai WC, Chen HH: Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids. BMC Genomics. 2011, 12: 360-10.1186/1471-2164-12-360.
Hou R, Bao Z, Wang S, Su H, Li Y, Du H, Hu J, Wang S, Hu X: Transcriptome sequencing and de novo analysis for Yesso scallop (Patinopecten yessoensis) using 454 GS FLX. PLoS One. 2011, 6 (6): e21560-10.1371/journal.pone.0021560.
Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, Yadav G, Bhatia S, Chattopadhyay D, Tyagi AK, Jain M: Gene discovery and tissue-specific transcriptome analysis in chickpea with massively parallel pyrosequencing and web resource development. Plant Physiol. 2011, 156: 1661-1678. 10.1104/pp.111.178616.
Sun C, Li Y, Wu Q, Luo H, Sun Y, Song J, Lui EM, Chen S: De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC Genomics. 2010, 11: 262-10.1186/1471-2164-11-262.
Barrero RA, Chapman B, Yang Y, Moolhuijzen P, Keeble-Gagnere G, Zhang N, Tang Q, Bellgard MI, Qiu D: De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes. BMC Genomics. 2011, 12: 600-10.1186/1471-2164-12-600.
Barash Y, Calarco JA, Gao W, Pan Q, Wang X, Shai O, Blencowe BJ, Frey BJ: Deciphering the splicing code. Nature. 2010, 465: 53-59. 10.1038/nature09000.
Edwards CE, Parchman TL, Weekley CW: Assembly, gene annotation and marker development using 454 floral transcriptome sequences in Ziziphus celata (Rhamnaceae), a highly endangered, Florida endemic plant. DNA Res. 2012, 19: 1-9. 10.1093/dnares/dsr037.
Pazos-Navarro M, Dabauza M, Correal E, Hanson K, Teakle N, Real D, Nelson MN: Next generation DNA sequencing technology delivers valuable genetic markers for the genomic orphan legume species.Bituminaria bituminosa. BMC Genet. 2011, 12: 104-
Gupta PK, Balyan IS, Sharma PC, Ramesh B: Microsatellites in plants: A new class of molecular markers. Curr Sci. 1996, 70: 45-54.
Toth G, Gaspari Z, Jurka J: Microsatellites in different eukaryotic genomes: Survey an analysis. Genome Res. 2000, 10: 967-981. 10.1101/gr.10.7.967.
Li YC, Korol AB, Fahima T, Beiles A, Nevo E: Microsatellites: genomic distribution, putative functions and mutational mechanisms: A review. Mol Ecol. 2002, 11: 2453-2465. 10.1046/j.1365-294X.2002.01643.x.
La Rota M, Kantety R, Yu J-K, Sorrells M: Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics. 2005, 6: 23-10.1186/1471-2164-6-23.
Kumpatla SP, Mukhopadhyay S: Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome. 2005, 48: 985-998. 10.1139/g05-060.
Kantety RV, La Rota M, Matthews DE, Sorrells ME: Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol. 2002, 48: 501-510. 10.1023/A:1014875206165.
Zeng S, Xiao G, Guo J, Fei Z, Xu Y, Roe B, Wang Y: Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim. BMC Genomics. 2010, 11: 94-10.1186/1471-2164-11-94.
Martienssen RA, Colot V: DNA methylation and epigenetic inheritance in plants and filamentous fungi. Science. 2001, 293: 1070-1074. 10.1126/science.293.5532.1070.
Galen C, Rabenold JJ, Liscum E: Functional ecology of a blue light photoreceptor: effects of phototropin-1 on root growth enhance drought tolerance in Arabidopsis thaliana. New Phytol. 2007, 173: 91-99. 10.1111/j.1469-8137.2006.01893.x.
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang XC, Shinozaki K, Nguyen HT, Wing RA, Cregan P, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA: Genome sequence of the palaeopolyploid soybean. Nature. 2010, 463: 178-183. 10.1038/nature08670.
Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E: The Arabidopsis Information Resource (TAIR): Improved gene annotation and new tools. Nucleic Acids Res. 2012, 40: D1202-D1210. 10.1093/nar/gkr1090.
Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.
Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta Delta C(T)) Method. Methods. 2001, 25: 402-408. 10.1006/meth.2001.1262.
This work was supported by the National Natural Science Foundation of China (grant No. 31070361), Key Project of Chinese Ministry of Education (grant No. 210266), Scientific Research Project of State Ethnic Affairs Commission (grant No. 10ZY01); the Fundamental Research Funds for the Central Universities (grant No. 0910KYZY43 & 1112KYQN31); “985 Project” of Minzu University of China (grant No. MUC98504-14 & MUC98507-08) and 111 Project for Minzu University of China (grant No. B08044) to YZ and FG.
The authors declare that they have no competing interests.
The study was conceived by YZ, JF, and FG. JF collected the seeds of A. mongolicus. The plant material preparation and gene expression analyses were carried out by RL. YZ, FG, and HL contributed to data analysis, bioinformatics analysis, and manuscript preparation. All authors had read and approved the final manuscript.
Electronic supplementary material
Additional file 1: The primers used in quantitative real-time PCR analysis. This table lists all the primers used in quantitative real-time PCR analysis. (DOC 84 KB)
Additional file 2: The annotation, GO category and relative transcript abundance of the 27 unigenes selected for quantitative real-time PCR analysis. This table lists the annotation, GO category, relative transcript abundance of the 27 unigenes selected for quantitative real-time PCR analysis. (DOC 72 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
About this article
Cite this article
Zhou, Y., Gao, F., Liu, R. et al. De novo sequencing and analysis of root transcriptome using 454 pyrosequencing to discover putative genes associated with drought tolerance in Ammopiptanthus mongolicus. BMC Genomics 13, 266 (2012). https://doi.org/10.1186/1471-2164-13-266
- Ammopiptanthus mongolicus
- 454 pyrosequencing