As a relict survivor of the evergreen broadleaf forest of central Asia from the Tertiary period, A. mongolicus can tolerate severe drought stress. Its stress tolerance may be associated not only with the epicuticular wax and stomata, which reduce water evaporation, but also with the deep, flourishing root system, which enables the plant to absorb water from deep below the soil surface. Our previous work (unpublished observations) revealed that, compared with the shoot, the physiological indices (i.e., proline content and antioxidants) in the root of A. mongolicus responded to drought stress faster and more markedly. Investigating the gene expression regulatory network under drought stress will help to elucidate the biochemical and physiological adaptation process in A. mongolicus, yet sequence resources are scarce: only 748 Ammopiptanthus ESTs are available in GenBank. In the present study, large-scale root-specific transcriptome data were obtained by high-throughput 454 sequencing as the first step of our endeavor to provide a clear insight into the molecular mechanism of drought tolerance in A. mongolicus.
Most plant transcriptomic studies have sequenced pooled cDNA samples from different tissues [33, 35–37] or assembled transcriptomic data using sequencing reads from different tissues; only a few studies have performed root-specific transcriptomic sequencing and assembly [39, 40]. Although more extensive transcriptomic data can be obtained using the former strategy, more accurate data can be produced using the latter, since alternative splicing may differ among tissues, which makes contig assembly difficult. Furthermore, a tissue-specific transcriptomic study provides good reference data for gene expression profiling, especially in non-model plants.
Several high-throughput sequencing methods can be used for transcriptomic studies, including the classic and most popular 454 pyrosequencing and the low-cost Solexa sequencing, which have been employed more and more frequently in recent years. In this study, 454 pyrosequencing was adopted to obtain a longer and more reliable transcriptomic dataset.
Choosing a suitable assembler and parameters is critical for assembly performance, and even more so in transcriptomic studies of non-model organisms. However, most previous analyses of transcriptomic data generated by Roche 454 pyrosequencing have used only one software program for assembly, except for a recent study in which the assemblies from six assemblers were compared, including Velvet, ABySS, MIRA, Newbler v2.3, Newbler v2.5p, CLC, and TGICL. In the present study, we compared the assemblies from the three most frequently used assemblers, i.e., MIRA, Newbler v2.5.3, and CAP3, since Velvet and ABySS were not developed for the assembly of relatively long sequences.
Evaluating assembly performance is challenging, especially in non-model organisms. Following an earlier study, we adopted two groups of indices for assembly evaluation. The first group included the total number of reads used in the assembly, the number of contigs generated, the N50 length of contigs, the mean contig length, and the summed contig length (Table 2). The second group was obtained by comparison with the soybean protein datasets (Table 2).
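The contig-based indices in the first group can be computed directly from a list of contig lengths. The following is a minimal sketch, not the procedure used to produce Table 2; the function name `assembly_stats` and the toy lengths are hypothetical, and N50 is taken here in its usual sense as the length of the shortest contig in the set of longest contigs that together cover at least half of the summed contig length.

```python
def assembly_stats(contig_lengths):
    """Summarize an assembly from its contig lengths (in bp).

    Hypothetical helper for illustration; not taken from the study.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # Walk from the longest contig down until half of the total is covered.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "num_contigs": len(lengths),
        "summed_length": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "n50": n50,
    }


if __name__ == "__main__":
    # Toy contig lengths, not data from this study.
    print(assembly_stats([100, 200, 300, 400, 500]))
```

Because N50 is weighted toward long contigs, it is usually reported alongside the mean contig length rather than in place of it.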
The comparison (Table 2) revealed that the assemblies generated by the different software programs each had advantages and disadvantages in different respects. The assembly generated by Newbler (with optimized parameters) was selected for further analysis on the basis of the comparison and the frequent application of this assembler.
From 672,002 sequence reads obtained from drought-stressed and unstressed roots of A. mongolicus, 29,056 unigenes were assembled, consisting of 15,173 contigs and 13,883 singlets. Although many unigenes were not long enough to cover the complete protein-coding regions, as revealed by BLASTX alignment, the dataset reported here is, to date, the largest collection of distinct genes from A. mongolicus. It represents a substantial part of the transcriptome and probably includes the majority of genes involved in the sophisticated regulatory networks for sensing and acclimating to a water-deficient soil environment.
A relatively large proportion (97.26%) of reads was assembled into contigs, which is notably higher than that reported for several other recent 454 transcriptome assemblies (e.g., 48%, 88%, and 90%). As a consequence, our A. mongolicus root transcriptomic data showed a relatively high coverage depth (ranging from 1- to 17,162-fold, with an average of 45.3-fold) compared with transcriptomic data from other plants (e.g., 3.6-, 8-, and 3.1-fold). This may indicate that half-plate 454 pyrosequencing is deep enough for a root transcriptome. Nonetheless, our contig length (484 bp) is not greater than that in other transcriptomic datasets (e.g., 345, 364, 452, 526, and 618 bp).
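For readers who wish to reproduce this kind of summary, the sketch below shows one common way of deriving the fraction of assembled reads and the per-contig coverage depth. It is an illustration under stated assumptions rather than the calculation used in this study: depth is taken as the summed length of the reads placed in a contig divided by the contig length, the average is the unweighted mean over contigs, and all function names and numbers are hypothetical.

```python
def fraction_assembled(total_reads, reads_in_contigs):
    """Fraction of all reads that were placed into contigs."""
    return reads_in_contigs / total_reads


def contig_depth(read_lengths, contig_length):
    """Coverage depth of one contig: read bases divided by contig bases.

    Assumes every read base aligns within the contig (a simplification).
    """
    return sum(read_lengths) / contig_length


def mean_depth(contigs):
    """Unweighted mean depth over (contig_length, read_lengths) pairs."""
    depths = [contig_depth(reads, length) for length, reads in contigs]
    return sum(depths) / len(depths)


if __name__ == "__main__":
    # Toy values, not data from this study.
    toy_contigs = [(600, [400, 350, 450]), (300, [300])]
    print(fraction_assembled(1000, 970))
    print(mean_depth(toy_contigs))
```

An average weighted by contig length (total read bases divided by total contig bases) is an equally defensible definition and can give a noticeably different value when a few contigs are very deeply covered.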
SSRs consist of tandem repeats of short (1–6 bp) nucleotide motifs. These repeat sequences are distributed throughout the genome. Polymorphism revealed by SSRs arises from variation in repeat number, which results primarily from slipped-strand mispairing during DNA replication. Thus, SSRs reveal much higher levels of polymorphism than most other marker systems [45, 46]. SSRs have proven to be more reliable than other markers, and their utility in genetic studies is well established.
We identified 1,827 SSR loci, and the EST-SSR frequency in the A. mongolicus transcriptome was 5.80%. AG/CT and AAG/CTT were the most frequent repeat motifs overall, and tri-nucleotide repeats were the most frequent type of SSR. This finding is consistent with the results reported in cereals such as rice (Oryza sativa), wheat (Triticum aestivum), and barley (Hordeum vulgare). In contrast, di-nucleotide repeats are the most abundant class of SSRs in many other plant species, such as Arabidopsis, peanut (Arachis hypogaea), canola (Brassica napus), sugar beet (Beta vulgaris), cabbage (Brassica oleracea), soybean (Glycine max), sunflower (Helianthus annuus), sweet potato (Ipomoea batatas), pea (Pisum sativum), and grape (Vitis vinifera) [24, 48]. Among the di-nucleotide repeats, AG/CT was the most frequent motif in our study, whereas the CG/CG motif was very rare. Among the tri-nucleotide repeats, AAG/CTT was the most frequent motif. These motif preferences are consistent with those in other plant species [24, 48–50]. In plants, CT and CTT repeats are found in both transcribed regions and 5'-untranslated regions (UTRs); CT microsatellites in 5' UTRs may be involved in antisense transcription and play a role in gene regulation.
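The way SSR loci are detected and grouped into strand-independent classes such as AG/CT and AAG/CTT can be sketched as follows. This is a minimal illustration, not the screening tool or settings used in this study: the minimum repeat counts in `MIN_REPEATS` are assumed values, the function names are hypothetical, and only perfect (uninterrupted) repeats of 1–6 bp motifs are reported.

```python
import re
from collections import Counter

# Assumed minimum number of repeats per motif length; not from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Strand- and rotation-independent label, e.g. 'CT' -> 'AG'."""
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AGAG')."""
    n = len(motif)
    return not any(
        motif == motif[:k] * (n // k) for k in range(1, n) if n % k == 0
    )


def find_ssrs(seq):
    """Return (start, motif, repeat_count, canonical_motif) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(MIN_REPEATS.items()):
        # One unit followed by at least (min_rep - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start(), motif, repeats, canonical_motif(motif)))
                pos = match.end()
            else:
                # Belongs to a shorter motif class; step forward and rescan.
                pos = match.start() + 1
    return hits


if __name__ == "__main__":
    # Toy sequence, not from the A. mongolicus dataset.
    toy = "TTGACC" + "AG" * 8 + "CCGTA" + "CTT" * 5 + "GGA"
    ssrs = find_ssrs(toy)
    print(ssrs)
    print(Counter(hit[3] for hit in ssrs))
```

Reducing each motif to the smallest of its rotations and reverse-complement rotations is what allows CT, TC, GA, and AG repeats to be counted as a single AG/CT class.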
Drought tolerance is a complex trait that involves multiple mechanisms acting in combination to avoid or tolerate periods of water deficit. It is well established that, under drought stress, genes involved in osmotic and redox homeostasis are regulated and hormones such as ABA participate in the readjustment process. Recently, light-mediated root growth has been proposed to be relevant to the drought tolerance of roots. Hence, 27 unigenes classified in the GO categories “response to osmotic stress”, “response to oxidative stress”, “response to hormone stimulus”, and “response to light stimulus” were selected for further expression analysis. As expected, some ion channel and transporter genes (i.e., sdq_isotig00642, sdq_isotig01704, sdq_isotig11437, sdq_isotig00259, sdq_isotig01086, and sdq_isotig10416), as well as several antioxidant genes (i.e., sdq_isotig08490, sdq_isotig01610, sdq_isotig00634, sdq_isotig11067, sdq_isotig07261, and sdq_isotig00577), were shown to be involved in the drought response. Quantitative real-time PCR also revealed that the expression of several blue light photoreceptor NPH3 genes (i.e., sdq_isotig01737, sdq_isotig01131, sdq_isotig3894, and sdq_isotig07698) and of a gene encoding an NPH1-interacting protein (sdq_isotig00917) was regulated under drought stress, which supports the relevance of light-mediated root growth to the drought tolerance of roots. Furthermore, an ethylene receptor gene was up-regulated only at 72 h, whereas an auxin receptor gene and an auxin-induced gene, IAA9, were up-regulated only at 1 h, suggesting that ethylene and auxin may participate in the drought response of A. mongolicus roots.
Our study identified 27 drought-responsive genes. The functions of these genes in the drought tolerance of roots will be analyzed in transgenic studies. In parallel, more drought-responsive genes will be discovered by digital gene expression analysis based on the transcriptome data obtained in this study. We are confident that more light will soon be shed on the significance of the A. mongolicus root for the adaptation of this plant to drought environments.