Transcriptome sequencing and microarray design for functional genomics in the extremophile Arabidopsis relative Thellungiella salsuginea (Eutrema salsugineum)
© Lee et al.; licensee BioMed Central Ltd. 2013
Received: 12 December 2012
Accepted: 11 November 2013
Published: 14 November 2013
Most molecular studies of plant stress tolerance have been performed with Arabidopsis thaliana, although it is not particularly stress tolerant and may lack protective mechanisms required to survive extreme environmental conditions. Thellungiella salsuginea has attracted interest as an alternative plant model species with high tolerance of various abiotic stresses. While the T. salsuginea genome has recently been sequenced, its annotation is still incomplete and transcriptomic information is scarce. In addition, functional genomics investigations in this species are severely hampered by a lack of affordable tools for genome-wide gene expression studies.
Here, we report the results of Thellungiella de novo transcriptome assembly and annotation based on 454 pyrosequencing and development and validation of a T. salsuginea microarray. ESTs were generated from a non-normalized and a normalized library synthesized from RNA pooled from samples covering different tissues and abiotic stress conditions. Both libraries yielded partially unique sequences, indicating their necessity to obtain comprehensive transcriptome coverage. More than 1 million sequence reads were assembled into 42,810 unigenes, approximately 50% of which could be functionally annotated. These unigenes were compared to all available Thellungiella genome sequence information. In addition, the groups of Late Embryogenesis Abundant (LEA) proteins, Mitogen Activated Protein (MAP) kinases and protein phosphatases were annotated in detail. We also predicted the target genes for 384 putative miRNAs. From the sequence information, we constructed a 44 k Agilent oligonucleotide microarray. Comparison of same-species and cross-species hybridization results showed superior performance of the newly designed array for T. salsuginea samples. The developed microarrays were used to investigate transcriptional responses of T. salsuginea and Arabidopsis during cold acclimation using the MapMan software.
This study provides the first comprehensive transcriptome information for the extremophile Arabidopsis relative T. salsuginea. The data constitute a more than three-fold increase in the number of publicly available unigene sequences and will greatly facilitate genome annotation. In addition, we have designed and validated the first genome-wide microarray for T. salsuginea, which will be commercially available. Together with the publicly available MapMan software this will become an important tool for functional genomics of plant stress tolerance.
KeywordsArabidopsis thaliana Cold acclimation Gene annotation LEA proteins MAP kinases Microarray design microRNAs Protein phosphatases Thellungiella salsuginea Transcriptome sequencing
The majority of molecular studies of plant stress tolerance have been performed with Arabidopsis thaliana, although it is not an extremophile and can therefore be expected to lack many protective mechanisms required to survive extreme environmental conditions. Thellungiella salsuginea has therefore attracted increasing interest as an alternative plant model species that possesses the characteristics of an extremophile, i.e. high tolerance of salinity, freezing, nitrogen-deficiency and drought stress [1–7]. The genus Thellungiella is part of the Brassicaceae family and therefore related to Arabidopsis[8, 9]. The species name has sometimes been used synonymously with Thellungiella halophila, but it is clear now that three distinct Thellungiella species (T. halophila, T. salsuginea, and T. botschantzevii) have to be distinguished . T. salsuginea is also referred to as Eutrema salsugineum, however, we prefer to stay with the older and better established species name as that is not in conflict with the taxonomy . On the other hand, Thellungiella parvula is not as closely related to these other species as previously thought and should be called Schrenkiella parvula. However, to avoid unnecessary confusion in the assignment of sequence information, we have used the species names used in the respective publications in all comparisons with our data.
T. salsuginea resembles Arabidopsis in many important features such as short life cycle, self-fertility and the possibility to transform plants by the floral-dip method . While the genomes of T. salsuginea[11, 12] and T. parvula have recently been sequenced, important functional genomics tools such as dedicated microarrays are still lacking for these species.
Arabidopsis-based microarrays [2, 4, 14–17] and a cDNA array comprising 3,628 unique T. salsuginea ESTs  have been used to profile the effects of abiotic stress conditions on the T. salsuginea transcriptome. However, T. salsuginea is not in the Arabidopsis genus and therefore has several distinguishing genetic and developmental properties [1, 4]. Identity between the Arabidopsis genome sequence and T. salsuginea ESTs (Expressed Sequence Tags) is between 95% and 60%, but there are also specific ESTs in T. salsuginea[4, 7, 19, 20]. Importantly, analysis of T. salsuginea mRNA using the Affymetrix Arabidopsis GeneChip ATH1 produced signals significantly above background for only 20% of the probe sets , indicating the necessity for a dedicated T. salsuginea microarray for cost-effective transcriptome studies.
Transcriptome sequencing is playing an increasingly important role in functional genomic studies of organisms without full genome sequences (e.g. [21–25]). The 454 technology is an efficient and cost effective way to generate high-throughput EST data for such organisms [26–30], providing the possibility for de novo assembly of transcriptomes from uncharacterized genomes [31, 32].
Large scale isolation of ESTs is a powerful approach for the discovery of novel genes, the investigation of genes of unknown function, comparative genomic studies and identification of exon/intron boundaries . There are currently 44,559 T. salsuginea EST sequences in the GenBank dbEST (as of June 19, 2012; ) and 19,429 full-length cDNAs are available from RIKEN . These numbers are far smaller than for Arabidopsis (1,529,700 ESTs as of June 19, 2012), indicating the need for further gene discovery and characterization in T. salsuginea.
Here, we have sequenced, assembled and annotated the transcriptome of the T. salsuginea Yukon accession using 454 pyrosequencing. On the basis of these transcriptome sequences, we have developed a dedicated T. salsuginea microarray using the Agilent long oligonucleotide array platform. This array has been benchmarked against the corresponding Arabidopsis array and tested in comparative cold acclimation experiments .
454 sequencing and transcriptome assembly
Summary of T. salsuginea ESTs generated from 454 sequencing and assembled contigs
Number of reads
Number of contaminated reads2
Average read length (bp)
Total reads (Mbp)
Number of assembled reads
Number of contigs3
Number of contigs with only 454 reads4
Average contig length (bp)
Number of contigs with ORF5
Number of Arabidopsis peptides found6
To assess the quality of the assembly, we predicted open reading frames (ORFs) of the contigs and used BLASTX to align all contigs to the TAIR9 Arabidopsis annotated protein database using an E-value threshold of 10-3. Of 27,379 Arabidopsis peptides, 24,457 unique peptides were matched to contigs generated from the combined libraries (Table 1), indicating a high quality of the transcriptome assembly.
The transcriptome assembly yielded 4,020 contigs sharing more than 99% sequence identity and coverage with at least one other contig in the assembly, likely constituting alleles of the same genes (or very close paralogs) with single nucleotide polymorphysms (SNPs). Because the seed material was not derived from single seed descent, the plants used in the experiments were not genetically identical, making the presence of SNPs in this plant population likely. We therefore condensed these 4,020 contigs into 610 clusters with ambiguous nucleotides at the SNP sites, reducing the final number of unigenes to 42,810. Since T. salsuginea and Arabidopsis are both members of the Brassicaceae family, we estimated the number of T. salsuginea unigenes that were orthologous to Arabidopsis genes using BLASTX. This identified 32,029 unigenes as orthologs of Arabidopsis genes, while approximately 25% were unique to T. salsuginea. No consensus overlap was found for the number of matched sequences between both species when a reciprocal search was performed, because every given reference sequence may hit more than one paralog in the other species, while also more than one homologous sequence in one species may hit a single sequence in the other species (Figure 3B). The genome sequence of T. parvula was published recently  and although T. salsuginea and T. parvula are related, about 26% of T. salsuginea unigenes have no counterparts in T. parvula, while 21% of the T. parvula genes have no orthologs in T. salsuginea (Figure 3C), probably reflecting the fact that the relationship between T. salsuginea and T. parvula (Schrenkiella parvula) is not as close as originally assumed. On the other hand, approximately 88% (37,622 of 42,810) of the T. salsuginea unigenes could be aligned to 3,703 T. salsuginea genomic contig sequences comprising 135.6 Mbp of genome sequence  using BLASTN with a stringent threshold (E-value < 10-50).
More recently the T. salsuginea (Shandong accession) genome sequence obtained by Sanger sequencing technology has been published . We estimated that 72% (18,970 out of 26,351) of the gene models predicted from the genome were expressed under our experimental conditions (BLASTN, E-value < 10-10). On the other hand, 37,447 T. salsuginea Yukon unigenes were best matched (BLASTN, E-value < 10-10) with the Shandong gene models (Figure 3D). Therefore, 5,363 unigenes from the Yukon accession have no counterpart among the predicted gene models from the Shandong accession and may be Yukon-specific transcripts (Figure 3D). Approximately 56.7% of matched unigenes (21,228 out of 37,447) were estimated to be full length in relation to the published gene models from the genome sequence (E-value <10-10 and total query coverage > 90%).
Enrichment of unigenes in MapMan terms for 12,272 duplicated Yukon unigenes
Mapman bin code
Number of duplicated Yukon unigenes
Fisher’s exact test P-value (Bonferroni-corrected)*
lipid metabolism – Fatty Acid synthesis and Fatty Acid elongation
nucleotide metabolism.phosphotransfer and pyrophosphatases.nucleoside diphosphate kinase
lipid metabolism.FA synthesis and FA elongation.ACP proteins
RNA.regulation of transcription.MYB domain transcription factor family
RNA.regulation of transcription.Global transcription factor group
Non-ambiguous SNPs identified between Shandong gene models and Yukon unigenes
Number of Shandong gene models
Higher plant transcriptomes contain a large number of non-coding RNAs including microRNA (miRNA) that function in diverse regulatory processes that influence e.g. growth and development . Since primary miRNA transcripts are polyadenylated [45–47] it seemed possible that the T. salsuginea unigenes contain primary miRNA transcripts that code for one or more miRNA stem loops that could be further processed to mature miRNAs. We scanned all unigenes using BLASTN with a threshold E-value of 10-10 against the complete set of miRNA precursor transcripts from the plant miRNA database  and obtained 384 unigenes that putatively encode miRNA precursor transcripts. The predicted target transcripts of the corresponding miRNAs were classified according to their functions using MapMan ontology and the predicted targets were compared with those of Arabidopsis miRNAs (Figure 4B). The proportion of predicted targets in the different functional classes was generally lower for T. salsuginea than for Arabidopsis except for target functions related to protein metabolism (54.5% vs. 20.1%) and mitochondrial electron transport (1.3% vs. 0.2%), where T. salsuginea showed higher proportions compared to Arabidopsis.
Genes encoding protein kinases and protein phosphatases
Mitogen activated protein kinases (MAPK) and MAPK phosphatases are important components that mediate abiotic stress signaling in plant cells . Members of the families of protein tyrosine phosphatases (PTPs), including dual specificity phosphatases (DSPs) and protein phosphatases 2C (PP2Cs) have been characterized as plant MAPK phosphatases that regulate stress responses [50, 51]. In the PP2C gene family the A-type PP2C phosphatases are generally upregulated in response to abscisic acid (ABA), whereas B-type PP2Cs (AP2Cs) have been characterized as MAPK phosphatases . Since drought or salt stress induces ABA biosynthesis in plants and MAPK cascades have been related to salt stress tolerance , it was of high interest to identify putative orthologous proteins of PP2Cs and PTPs/DSPs, as well as MAPKs and MAPK kinases (MAPKKs) in the extremophile T. salsuginea.
Number of putative protein tyrosine phosphatase (PTP), dual specificity phosphatase (DSP), protein phosphatase 2C (PP2C), mitogen activated protein kinase (MAPK) and MAPK kinase (MAPKK) unigenes and proteins in the T. salsuginea ( T.s. ), T. parvula ( T.p. ) and A. thaliana ( A.t. ) genomes
Proteins from unigenes
Proteins from genome
All plant species studied so far revealed a single gene encoding a PTP . Correspondingly, a single unigene of a PTP was identified in the T. salsuginea transcriptome. However, two PTP genes were identified in the genome of T. parvula (Table 4). Reciprocal BLAST of 22 and 20 DSP sequences from Arabidopsis and T. parvula, respectively, revealed 28 DSP encoding unigenes (Table 4) that code for 18 T. salsuginea DSP proteins (Additional file 8).
The genome sequences of T. salsuginea and T. parvula[11–13] revealed 18 and 19 MAPK encoding genes, respectively, in comparison to 20 MAPKs  found in Arabidopsis. Here, analysis of MAPKs in the transcriptome of T. salsuginea identified 25 unigenes, which were translated into 13 putative MAPK proteins (Table 4, Additional file 8). MAPKK proteins are represented by 10 and 11 members in the genomes of Arabidopis and T. parvula, respectively. In the T. salsuginea trancriptome we identified eight ESTs representing MAPKK, corresponding to six MAPKK proteins (Table 4, Additional file 8).
Genes encoding Late Embryogenesis Abundant (LEA) proteins
Following this strategy for all 148 putative T. salsuginea LEA proteins identified 15 Arabidopsis LEA proteins without an orthologous protein in the T. salsuginea transcriptome. However, a BLASTN search (E < 1e-50) revealed the presence of genes encoding 12 of these proteins in the T. salsuginea genome. For example, there were no apparent orthologs of LEA_5 proteins among the unigenes, but the BLASTN search of the gene sequences (At2g40170 and At3g51810) against the T. salsuginea genome retrieved two genomic sequences (AHIU01008450.1 and AHIU01008580.1), indicating that two LEA_5 proteins are encoded in the T. salsuginea genome, but probably not expressed under our experimental conditions. LEA_4 and dehydrin are the two largest groups of LEA proteins in Arabidopsis and this was also true for T. salsuginea with 18 LEA_4 proteins in both species. Interestingly, T. salsuginea contains more dehydrins (17 vs 10) than Arabidopsis. This is due to the fact that some Arabidopsis LEA proteins were represented by several T. salsuginea orthologs. For instance, LEA23 (At2g42530; COR15B) and LEA51 (At5g66400; RAB18) were each represented by four distinct protein sequences and several others by three (Additional file 12).
Overall, the majority of putative T. salsuginea LEA proteins showed negative GRAVY (grand average of hydropathy) values indicating the expected high hydrophilicity (Additional file 12). Since a precise in-vivo function has not been established for any Arabidopsis LEA protein, it is currently impossible to evaluate the functional significance of the multiple orthologous T. salsuginea proteins.
Design and performance of a dedicated T. salsuginea microarray
Between 2 and 12 60-mer oligonucleotide probes were designed for each T. salsuginea unigene, depending on the length of the predicted gene. With this oligonucleotide sequence information as input, 400 K arrays were produced by Agilent. A Pre-Selection Strategy (PSS)  was used to select the best-performing probes by hybridizing either cRNA derived from pooled total RNA (identical to the RNA used in cDNA library construction for sequencing) or genomic DNA to these arrays. On the basis of the hybridization signal strength, the 43,800 best-performing probes were selected. 40,952 probes were selected on the basis of their hybridization signals with the cRNA sample and 2,848 additional probes were derived from the hybridization with genomic DNA. Using the Agilent eArray Platform , 44 K arrays were produced from these oligonucleotide sequences. For 2,237 unigenes, the number of features on the array allowed the design of two different probes for the same contig, while for the remaining 39,326 unigenes only one probe was designed. The resulting 43,800 probes therefore represented 41,563 unigenes.
To test the newly designed arrays, they were hybridized with total RNA derived from control (NA) or 14-day cold acclimated (ACC) T. salsuginea plants from three independent biological experiments. Spot intensities between replicates were highly correlated with r-values of Pearson correlations between 0.985 and 0.993, p < 2.2 × 10-16 (Additional file 13). In addition, the log2-transformed fold change in hybridization signals between samples from non-acclimated and cold acclimated plants for the 2,237 pairs of probes targeting the same unigene were also highly correlated (Additional file 14, r = 0.951, p < 2.2 × 10-16). These results validate the probe design and selection with the PSS method.
Significantly, the distribution of raw signal intensities from all probes on the arrays revealed a bimodal distribution [61, 62] with a dominant density peak at high signal intensities (likely derived from perfect-match hybridizations) and a minor peak at low signal intensities (likely derived from noise and mismatch hybridization) for the same-species array hybridizations (black and blue lines, Figure 6C and F). In contrast, cross-species array hybridizations yielded predominantly low signal intensities (green and red lines, Figure 6C and F), suggesting that these hybridizations were dominated by non-specific signals.
To facilitate data visualization and biological interpretation of array hybridization results, a T. salsuginea ontology/mapping file (available freely on-line at ) was developed by transferring the Arabidopsis MapMan ontology  to the T. salsuginea transcripts for use with the MapMan software [42, 43].
T. salsuginea has many features similar to Arabidopsis and is in addition considered an “extremophile” with a higher tolerance to various abiotic stresses and has therefore been proposed as an alternative model species to Arabidopsis. While the genomes of T. salsuginea and of the related species T. parvula have recently been sequenced [11–13], genome annotation is still far from complete and, due to the low number of ESTs available, evidence for transcriptional activity is also scarce for most genes. In addition, the effective use of T. salsuginea for functional genomics investigations of mechanisms of plant abiotic stress tolerance is severely hampered by a lack of affordable tools for genome-wide gene expression studies. We therefore used next generation sequencing to obtain comprehensive information about the T. salsuginea transcriptome and used this information for gene annotation and the development of a dedicated microarray for global expression studies. To cover the transcriptome as fully as possible, we performed 454 pyrosequencing on ESTs generated from both a non-normalized and a normalized library synthesized from RNA samples pooled from a wide range of plant samples from different developmental stages and from plants grown under different abiotic stress conditions. Although the number of reads obtained from the normalized library was only about half of that obtained from the non-normalized library, both libraries yielded a large number of unique sequences that were not detected in the other library, indicating that sequencing of normalized and non-normalized libraries is necessary to obtain optimal coverage of a transcriptome.
A combination of 44,551 publicly available ESTs and more than 1 million new sequence reads were assembled into 42,810 contigs for the prediction of gene functions. Of these contigs, 33,147 were novel because they were not covered by publicly available T. salsuginea ESTs. In total, the number of publicly available T. salsuginea unigenes increased more than 3-fold (from 13,073 to 42,810) with the present study. The number of genes estimated from the two draft genomes of T. salsuginea is 29,509  and 26,351 , while the draft genome of T. parvula contains 30,419 genes . This suggests that the unigenes identified in this study represent a comprehensive T. salsuginea transcriptome. The average contig/unigene length in our assembly is 567 bp, which is shorter than the length of the predicted Shandong gene models (1,224 bp; ). The likely reason for this discrepancy is the fragmentation procedure necessary for the construction of normalized cDNA libraries. Nevertheless, more than 50% of our unigenes are full length (21,228 out of 42,810) based on the BLASTN alignment (E-value < 10-10 and query coverage > 90% in relation to Shandong gene models). In addition, 87.5% of our unigenes (37,447 out of 42,810) were mapped to Shandong gene models and 94.6% (40,480 out of 42,810) were mapped to the Shandong genomic sequence . These results indicate that most of our sequence assembly is correct based on its high similarity to a closely related accession.
Many SNPs were found between our Yukon unigenes and the Shandong gene models  when the Yukon unigenes were aligned with the reference Shandong gene models (Table 3, Additional file 1). The significance of the putative SNPs in the context of abiotic stress tolerance or other phenotypic variation between the Yukon and Shandong accessions remains to be determined.
Based on the MapMan ontology [42, 43], functional annotation of both T. salsuginea unigenes and Arabidopsis genes revealed similar proportions of genes in each functional group, with the exception of genes annotated as encoding proteins with a function in photosynthesis, which were more abundant in T. salsuginea, and genes encoding proteins annotated as functioning in DNA metabolism, which were less abundant. A similar observation was also made when we identified 12,272 Yukon unigenes that were likely derived from gene duplication events relative to Shandong gene models (Additional file 17). These duplicated unigenes map for most of their length (coverage > 90%) over Shandong genes and are significantly enriched for functional categories of protein synthesis, lipid metabolism, photosynthesis and abiotic/biotic stress (Table 2). The higher abundance of photosynthesis-related genes, together with the large number of such genes that were found to be regulated during cold acclimation in the microarray study, may be related to the importance of an effective regulation of photosynthesis for the abiotic stress tolerance of plants to sustain growth and avoid oxidative damage (see  for a review).
The percentage of genes in different functional groups in T. salsuginea that were predicted to be targeted by miRNAs was generally lower than the corresponding percentage in Arabidopsis, except for target functions related to protein metabolism and mitochondrial electron transport, where T. salsuginea showed much higher proportions. Whether any of the relevant miRNAs play a role in abiotic stress tolerance, however, remains to be investigated. In addition, some genes encoding LEA proteins were present in higher copy numbers of orthologous genes in the T. salsuginea than in the Arabidopsis genome. This is in agreement with the observed higher frequency of gene duplications in the T. salsuginea and T. parvula genomes . This may suggest that both Thellungiella species retained a higher fraction of duplicated genes since the last whole genome duplication event [71, 72] or after tandem duplications that diverged in function (neofunctionalization; ) as an adaptive strategy to survive in habitats with harsher environmental stress conditions than Arabidopsis.
On the other hand, the analysis of PP2C, PTP(DSP) and MAPK(K) in T. salsuginea, T. parvula and A. thaliana indicated very similar numbers of putative genes and proteins of these families in all three species. This may indicate that the retention of duplicated genes was highly selective, a hypothesis that was also put forward in the most recent T. salsuginea genome sequencing study .
The assembled unigenes were used for the construction of a 44 K Agilent microarray that showed superior specificity for T. salsuginea compared to Arabidopsis samples, high reproducibility and the ability to detect the complex cold-acclimation induced changes in transcript abundance. In addition, it showed interesting differences in the cold regulated transcript patterns between Arabidopsis and T. salsuginea. Interestingly, a previous paper provided evidence for possible differences in the metabolic adaptation strategies of the two species , based on targeted biochemical measurements of compatible solutes. In the future, the availability of microarrays will allow us to investigate such differences in adaptation strategies at the functional genomics level.
The T. salsuginea microarray consists of 43,800 oligonucleotide probes representing 41,563 unigenes. Based on a BLASTN search of all unigenes against the T. salsuginea genomic contig sequences , 37,622 T. salsuginea unigenes were aligned to the published genome sequence. The remaining 5,188 unigenes showed no significant BLAST hits and filtering out unigenes that were not represented on the array still left 4,092 unigenes that had produced above-background signal intensities in the hybridization experiments performed during the pre-selection procedure. Likely reasons for this discrepancy are that approximately 10% of the genome have not been sequenced yet , that the corresponding transcripts are encoded by genes that consist of short exons and long introns that may result in low BLASTN hit score values, or that these unigenes have relatively high sequencing or sequence assembly errors. The latter point may have resulted in the relatively high fraction of 71% (2,942 out of 4,092) among these unigenes that were annotated as having an unknown function, while for the whole unigene set this fraction was about 50%.
To assess whether the use of a dedicated T. salsuginea microarray really yields data of a higher quality than cross-species hybridization on an Arabidopsi s array from the same manufacturer, we hybridized identical samples from either non- or cold-acclimated plants on same-species T. salsuginea and cross-species Arabidopsis arrays. In the reciprocal experiment, Arabidopsis samples were also hybridized to the same set of arrays. The results demonstrated the expected bimodal signal distribution [61, 62]. However, only signals obtained from same-species array hybridizations showed a distribution consistent with a high proportion of specific hybridization reactions and a low proportion of non-specific hybridizations and noise. In fact, using the 25-mer oligonucleotide-based Affymetrix GeneChip ATH1, only 20% of the probes produced signals significantly above background with T. salsuginea samples . In contrast, using longer oligonucleotide probes (70-mer) showed that approximately 80% of the probes on the Arabidopsis microarray  yielded significant signals after hybridization with T. salsuginea samples. However, the detection of differentially expressed T. salsuginea genes was limited to Arabidopsis homologs with an overall sequence identity of 92-95% with this type of array . Obviously, the expression of T. salsuginea-specific unigenes could not be detected using Arabidopsis-based arrays and approximately 58% (2,553 out of 4,425) of the cold-responsive genes detected using the same-species array were not identified as differentially expressed from the Arabidopsis array, further stressing the need for a dedicated T. salsuginea microarray for the reliable identification of stress-regulated genes at the genomic level.
One of the challenges faced when using microarrays is to visualize the expression data in an efficient way . To make the T. salsuginea microarray as useful as possible for the research community, we adapted the MapMan visualization [43, 77] and the Robin  and mefisto statistical analysis tools to T. salsuginea by designing a mapping file for the annotated unigenes by transfer into the MapMan ontology . To validate its performance, the T. salsuginea microarray was used to identify cold-responsive genes in comparison to Arabidopsis. The results were visualized using the MapMan tool. Further functional analysis was provided by statistical tests for the significant under- or over-represention of cold regulated genes in different functional groups. These results demonstrate that the newly designed T. salsuginea microarray reliably detects genes and pathways that are known to be differentially affected during cold acclimation in Arabidopsis and other plants. We are currently using the T. salsuginea array to investigate the cold and salt responsiveness of the T. salsuginea transcriptomes in the diverse set of accessions from all three Thellungiella species recently described .
While model species such as Arabidopsis are valuable tools to study many important traits in plants, additional models are needed to investigate the adaptation mechanisms to more extreme environmental conditions. T. salsuginea is such an emerging model species and the present study provides the first comprehensive transcriptome information for this species. The large number of additional publicly available unigene sequences will greatly facilitate genome annotation. In addition, the newly designed and validated genome-wide microarray for T. salsuginea, which will be commercially available, will be an important tool for future functional genomics investigations of plant stress tolerance. Also, we hope that the approaches developed in this study will help in the establishment of similar genomic tools for other non-model species with interesting physiological and ecological traits.
Plant material and abiotic stress treatments
Seeds of the Thellungiella salsuginea ((Pallas) O.E. Schulz) (Eutrema salsugineum) accession Yukon were kindly provided by Prof. Ray A. Bressan (Purdue University, West Lafayette, IN). For transcriptome sequencing, plants were either grown in soil or hydroponically under non-sterile conditions in a greenhouse at 16-h day length with light supplementation to reach at least 200 μE m-2 s-1 and a temperature of 20°C during the day, 18°C during the night. The nutrient solution for hydroponically-grown plants contained 1 mM MgSO4, 1 mM KH2PO4, 1 mM NH4NO3, 0.5 mM CaCl2, 0.1 mM FeNA-EDTA, 25 mM NaCl, 0.1 mM H3BO3, 1.5 μM CuSO4, 50 μm KCl, 10 μm MnSO4, 0.075 μM Na2MoO4, 0.1 mM Na2O3Si, and 2 μM ZnSO4. Six-week-old soil-grown and five-week-old hydroponically-grown plants were treated for 24 h or 2 weeks at 4°C in a growth cabinet at 16 h day length with 90 μE m-2 s-1, or stressed with 200 mM NaCl solution by irrigation or supplementation of the nutrient solution. For drought treatment, water was withheld from soil-grown plants for 5 days (mild drought with slight curling of leaves) or 10 days (severe drought with strong curling of leaves). Shoots and roots were collected from soil- and hydroponically-grown plants, except for soil-grown roots after stress treatments. In addition, rosette leaves, cauline leaves, opened flowers and siliques were collected from control plants grown in soil or hydroponically. For all samples, material from 5 to 10 individual plants was pooled. All samples were immediately frozen in liquid nitrogen and stored at −80°C.
For the microarray experiments, T. salsuginea seeds were sown in soil and exposed to 4°C under the conditions described above for one week to promote germination. Seedlings were transferred to a greenhouse as above for 8 weeks (nonacclimated plants). For cold acclimation, plants were transferred to a 4°C growth cabinet under the conditions described above for an additional 14 days . Arabidopsis plants were grown and acclimated under identical conditions [79, 80], but were grown under nonacclimating conditions for 6 weeks to reach the same developmental state .
RNA isolation, library construction and sequencing
Total RNA was isolated and on-column DNA digestion was performed with the RNeasy Plant mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. Equal amounts of RNA from 23 individual samples obtained as described above were pooled and cDNA was synthesized with the Mint-Universal kit (Evrogen, Moscow, Russia). Half of the cDNA was normalized to equalize the abundance of transcripts by using the duplex-specific nuclease normalization approach  with the TRIMMER-DIRECT cDNA normalization kit (Evrogen) according to the manufacturer’s protocol. The cDNA in both libraries was size-fractionated using Chroma Spin-1000 columns (Clontech, Mountainview, CA). 454 libraries were constructed using custom-made adapters with MIDs (Multiplex Identifiers) following the Roche Technical Bulletin TCB 09004, introducing Sfi I-sites for creating directed sequencing libraries. The 454 libraries were immobilized on beads and clonally amplified using the GS FLX Titanium LV emPCR Kit (Lib-L) (Roche, Basel, Switzerland). The libraries were then sequenced using the GS FLX Titanium Sequencing Kit XLR70 (Roche) and GS FLX Titanium PicoTiterPlate Kit (Roche) according to the manufacturer’s protocols.
Transcriptome assembly and annotation
Adapter and polyA sequences were removed from the data with the SFF Tools of the Genome Sequencer FLX System software package 2.3 (Roche) and the cross_match (http://www.phrap.org) algorithm. The clipped reads were pre-checked for contamination by non-plant sequences using BLAST  searches in the NCBI database of non-redundant sequences (nr; updated at 12-02-2010) with an E-value threshold of 10-10. All reads matching as best hit a sequence originating from an organism outside the Streptophyta phylum were considered “contaminant”. These reads were a minor part of the population and were thus included during the assembly process, but contigs including them were marked as “contaminant contigs”. The assembly was conducted using the MIRA assembler program version 3.1.15  with default settings and the 454 reads from the normalized, the non-normalized and the combined libraries. To improve the accuracy of sequence assembly, we downloaded the 44,551 publicly available T. halophila and T. salsuginea ESTs from GenBank, NCBI (as of April 19, 2010) and included them in all assemblies. Reads not alignining to any other read (singletons) were not included in the final contig population. Average contig coverage was calculated as the mean number of reads per base per contig. The N50 was calculated as the contig length above which 50% of the sequence information is contained, i.e. 50% of the nucleotides in the output are contained in contigs of length N50 or longer.
All contigs were checked for presence of open reading frames (ORFs) as previously described . To assess the completeness of the transcriptome assemblies and the degree of overlap between T. salsuginea and Arabidopsis, we used BLASTX to align the contig sequences to the 27,739 Arabidopsis Information Resource (TAIR9; ) peptide sequences . We calculated the percentage of Arabidopsis proteins matching T. salsuginea contigs with a loose threshold to account for interspecies variation (E-value < 10-3). All following steps were conducted on the assembly based on the combined library only. Since the population of plants used for RNA extraction was not homozygous, SNPs generated several nearly identical contigs. We aligned each contig to all contigs in the population via BLAST, to identify clusters of contigs matching each other with a sequence coverage and identity higher than 99%. A multiple alignment was produced for each cluster using MUSCLE . Consensus sequences for each cluster were extracted from the multiple alignments using the consambig tool from the EMBOSS suite . Where disagreeing base pairs were found, the resulting cluster sequence was dubbed using the International Union of Pure and Applied Chemistry (IUPAC) code for nucleotide ambiguity.
Functional classification of the putative transcripts was performed using the Mercator pipeline . Mercator aligns all sequences against five different databases: TAIR9 proteins , SwissProt/Uniprot plant proteins (PPAP) , Conserved Domain Database (CDD) , Clusters of Orthologous Groups (COG) , and InterProScan  and subsequently computes preliminary MapMan bin  codes/ontology based on manually curated reference classifications using a majority vote scheme. The programs used to perform the searches were RPSBLAST  for CDD and COG and BLASTX  for TAIR9 and PPAP. Hits with bit scores lower than 50 were ignored as not significantly similar. The sequencing procedure does not guarantee that the orientation of the original mRNAs is kept. Hence any of the 42,810 transcripts can be either 5″-3′ or 3′-5′ oriented. To unify the orientation, we used the protein models present in the NCBI nr database and, where available, we used the best BLAST hit (E-value < 10-3) to define the orientation of the transcript.
To estimate the number of T. salsuginea unigenes that were orthologs of Arabidopsis or T. parvula genes, we used BLASTX to search the unigenes with a stringent threshold (E-value < 10-10) against Arabdopsis TAIR10 database or coding sequence of T. parvula genome assembly version 2 . To estimate the number of T. salsuginea unigenes/transcripts that were derived from the T. salsuginea genome, we used BLASTN to search the unigenes with a stringent threshold (E-value < 10-50) against the T. salsuginea genome assembly . In addition, we took advantage of the recently published gene models of the Shandong accession of T. salsuginea (Eutrema salsugineum)  by comparing them to the unigenes from the Yukon accession used in our study, using BLASTN with an E-value < 10-10. We estimated the full length genes of Yukon by aligning the unigene sequences with Shandong gene models using BLASTN (E-value < 10-10) and > 90% coverage of the aligned query region with the nucleotide sequence length of the published gene models (query coverage). An estimate of single nucleotide variation between Shandong gene models and Yukon unigenes was obtained using VarScan version 2.3.5 with default parameters  with Shandong gene models  as reference sequences. Ambiguous single nucleotide variations (more than one nucleotide variation at a given nucleotide position) were filtered out.
Orthologs were identified to determine duplicated genes among Yukon unigenes relative to Shandong gene models based on BLASTN analysis (E-value < 10-10) and aligned query coverage > 90% compared to Shandong gene models.
All T. salsuginea contigs were searched by BLASTN against the complete set of miRNA precursor transcripts from the plant miRNA database  with an E-value < 10-10. The functions of the predicted miRNAs were inferred from the target transcripts that were extracted based on the tables provided as search results .These inferred functions were classified based on MapMan ontology. Bincodes were truncated at the primary level of the hierarchy. We aslo performed a similar miRNA prediction of Arabidopsis TAIR9 transcripts and classified their functions based on MapMan ontology with identical search parameter settings.
PP2C/DSP/PTP/MAPK or MAPKK protein sequences derived from the A. thaliana, T. parvula and T. salsuginea genomes (http://www.arabidopsis.org/, [11, 13]) were compared to T. salsuginea unigenes using BLASTN (e < 10-20). BLASTX (e < 10-60) scans were performed using putative T. salsuginea protein models to verify BLASTN hits . Unigenes were translated according to BLASTX results. Individual T. salsuginea putative MAPKK, MAPK, PP2C, DSP and PTP proteins were designated according to the corresponding orthologous protein annotations in Arabidopsis. Multiple alignments of Arabidopsis and T. salsuginea putative PP2Cs, PTPs, DSPs, MAPKKs and MAPKs were performed using MUSCLE [86, 95] and were edited manually afterwards. Unrooted phylogenetic trees were computed with UGENE  using a JTT model with standard parameters (gamma distribution and a coefficient of variation of substitution rates among sites of 0.5).
Genes encoding putative LEA proteins of T. salsuginea were identified based on reciprocal BLASTX (E < 10-10) searches of the 51 Arabidopsis LEA genes  against all T. salsuginea unigenes. All resulting LEA proteins were aligned using ClustalW and an unrooted dendogram was drawn . GRAVY (grand average of hydropathy) values were determined using protparam . Subcellular localization was predicted from protein sequence analysis using TargetP .
Microarray design and hybridization
We designed a T. salsuginea microarray based on the custom gene expression 4 × 44 K platform from Agilent, consisting of four arrays per slide with 45,220 features, 43,803 of which are user-defined 60-mer oligonucleotide probes. Probes and microarrays were designed by imaGenes GmbH (Berlin, Germany). Briefly, 415,000 oligonucleotide probes were designed from the 42,810 contig sequences and printed as 2 × 400 K arrays. The same pooled RNA that was used for the sequencing libraries was used for the synthesis of Cy3-labeled cRNA that was hybridized to one of these arrays. Genomic DNA was extracted from rosette leaves of 6-week-old T. salsuginea plants using the DNeasy mini kit (Qiagen). Fragmented and labeled genomic DNA was hybridized to the other 400 K array. The 43,800 best-performing probes were selected and submitted to Agilent eArray platform to fabricate 4 × 44 K arrays. The arrays are commercially available from Agilent  with the design number 031554.
Microarray data analysis
The feature extraction files were analyzed using the Limma bioconductor package implemented in the Robin/RobiNA tool [78, 100]. Background signals were subtracted and values were quantile-normalized before a linear model was fitted for statistical testing of differential expression. Transcripts showing a log2 fold change ≥ 1 or ≤ −1 with a p-value below 0.05 after false discovery rate correction  in the contrast between cold acclimated and nonacclimated samples were accepted as significantly changed. Non-acclimated replicate 1 of Col-0 hybridized to the Arabidopsis array was an outlier and was excluded from the analysis of differential gene expression. Otherwise, all RNA samples were included in all analyses. Statistical comparisons of functional groups of cold-responsive genes were made against all genes in a particular group represented on the array using Fisher’s exact test implemented in the mefisto software . Significantly regulated genes were considered as significantly over- or under-represented in a MapMan functional group at p < 0.05 after false discovery rate correction . Microsoft Excel 2010 and the R software  were used to perform general analyses.
Availability of supporting data
The 454 reads for Thellungiella salsuginea reported here have been submitted to NCBI sequence read archive (SRA, http://www.ncbi.nlm.nih.gov/sra) under the accession numbers SRX197603 and SRX197604. Microarray hybridization results are available at GEO (http://www.ncbi.nlm.nih.gov/geo) under the accession number GSE42156.
We would like to thank Prof. Ray Bressan (Purdue University, West Lafayette, IN) for making his Thellungiella salsuginea seed collection available to us. YPL thanks the Swiss National Science Foundation and the Max Planck Society for post-doctoral fellowships.
- Bressan RA, Zhang C, Zhang H, Hasegawa PM, Bohnert HJ, Zhu JK: Learning from the Arabidopsis experience. The next gene search paradigm. Plant Physiol. 2001, 127: 1354-1360. 10.1104/pp.010752.PubMed CentralView ArticlePubMedGoogle Scholar
- Gong Q, Li P, Ma S, Indu RS, Bohnert HJ: Salinity stress adaptation competence in the extremophile Thellungiella halophila in comparison with its relative Arabidopsis thaliana. Plant J. 2005, 44: 826-839. 10.1111/j.1365-313X.2005.02587.x.View ArticlePubMedGoogle Scholar
- Griffith M, Timonin M, Wong ACE, Gray GR, Akhter SR, Saldanha M, Rogers MA, Weretilnyk EA, Moffatt BA: Thellungiella: an Arabidopsis-related model plant adapted to cold temperatures. Plant Cell Environ. 2007, 30: 529-538. 10.1111/j.1365-3040.2007.01653.x.View ArticlePubMedGoogle Scholar
- Inan G, Zhang Q, Li P, Wang Z, Cao Z, Zhang H, Zhang C, Quist TM, Goodwin SM, Zhu J: Salt cress. A halophyte and cryophyte Arabidopsis relative model system and its applicability to molecular genetic analyses of growth and development of extremophiles. Plant Physiol. 2004, 135: 1718-1737. 10.1104/pp.104.041723.PubMed CentralView ArticlePubMedGoogle Scholar
- Kant S, Bi YM, Weretilnyk E, Barak S, Rothstein SJ: The Arabidopsis halophytic relative Thellungiella halophila tolerates nitrogen-limiting conditions by maintaining growth, nitrogen uptake, and assimilation. Plant Physiol. 2008, 147: 1168-1180. 10.1104/pp.108.118125.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee YP, Babakov A, de Boer B, Zuther E, Hincha DK: Comparison of freezing tolerance, compatible solutes and polyamines in geographically diverse collections of Thellungiella spec. and Arabidopsis thaliana accessions. BMC Plant Biol. 2012, 12: 131-10.1186/1471-2229-12-131.PubMed CentralView ArticlePubMedGoogle Scholar
- Wong CE, Li Y, Whitty BR, Díaz-Camino C, Akhter SR, Brandle JE, Golding GB, Weretilnyk EA, Moffatt BA, Griffith M: Expressed sequence tags from the Yukon ecotype of Thellungiella reveal that gene expression in response to cold, drought and salinity shows little overlap. Plant Mol Biol. 2005, 58: 561-574. 10.1007/s11103-005-6163-6.View ArticlePubMedGoogle Scholar
- Al-Shehbaz IA, O’Kane SL, Price RA: Generic placement of species excluded from Arabidopsis (Brassicaceae). Novon. 1999, 9: 296-307. 10.2307/3391724.View ArticleGoogle Scholar
- Amtmann A: Learning from evolution: Thellungiella generates new knowledge on essential and critical components of abiotic stress tolerance in plants. Mol Plant. 2009, 2: 3-12. 10.1093/mp/ssn094.PubMed CentralView ArticlePubMedGoogle Scholar
- Koch MA, German DA: Taxonomy and systematics are key to biological information: Arabidopsis, Eutrema (Thellungiella), Noccaea and Schrenkiella (Brassicaceae) as examples. Front Plant Sci. 2013, 4: 267-PubMed CentralView ArticlePubMedGoogle Scholar
- Wu H-J, Zhang Z, Wang J-Y, Oh D-H, Dassanayake M, Liu B, Huang Q, Sun H-X, Xia R, Wu Y: Insights into salt tolerance from the genome of Thellungiella salsuginea. Proc Natl Acad Sci USA. 2012, 109: 12219-12224. 10.1073/pnas.1209954109.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang R, Jarvis DJ, Chen H, Beilstein M, Grimwood J, Jenkins J, Shu SQ, Prochnik S, Xin M, Ma C: The reference genome of the halophytic plant Eutrema salsugineum. Front Plant Sci. 2013, 4: 46-PubMed CentralPubMedGoogle Scholar
- Dassanayake M, Oh D-H, Haas JS, Hernandez A, Hong H, Ali S, Yun D-J, Bressan RA, Zhu J-K, Bohnert HJ: The genome of the extremophile crucifer Thellungiella parvula. Nat Genet. 2011, 43: 913-918. 10.1038/ng.889.PubMed CentralView ArticlePubMedGoogle Scholar
- Li P, Mane SP, Sioson AA, Robinet CV, Heath LS, Bohnert HJ, Grene R: Effects of chronic ozone exposure on gene expression in Arabidopsis thaliana ecotypes and in Thellungiella halophila. Plant Cell Environ. 2006, 29: 854-868. 10.1111/j.1365-3040.2005.01465.x.View ArticlePubMedGoogle Scholar
- Oh D, Gong Q, Ulanov A, Zhang Q, Li Y, Ma W, Yun D, Bressan RA, Bohnert HJ: Sodium stress in the halophyte Thellungiella halophila and transcriptional changes in a thsos1-RNA interference line. J Integr Plant Biol. 2007, 49: 1484-1496. 10.1111/j.1672-9072.2007.00548.x.View ArticleGoogle Scholar
- Taji T, Seki M, Satou M, Sakurai T, Kobayashi M, Ishiyama K, Narusaka Y, Narusaka M, Zhu JK, Shinozaki K: Comparative genomics in salt tolerance between Arabidopsis and Arabidopsis-related halophyte salt cress using Arabidopsis microarray. Plant Physiol. 2004, 135: 1697-1709. 10.1104/pp.104.039909.PubMed CentralView ArticlePubMedGoogle Scholar
- Volkov V, Wang B, Dominy PJ, Fricke W, Amtmann A: Thellungiella halophila, a salt-tolerant relative of Arabidopsis thaliana, possesses effective mechanisms to discriminate between potassium and sodium. Plant Cell Environ. 2003, 27: 1-14.View ArticleGoogle Scholar
- Wong CE, Li Y, Labbe A, Guevara D, Nuin P, Whitty B, Diaz C, Golding GB, Gray GR, Weretilnyk EA: Transcriptional profiling implicates novel interactions between abiotic stress and hormonal responses in Thellungiella, a close relative of Arabidopsis. Plant Physiol. 2006, 140: 1437-1450. 10.1104/pp.105.070508.PubMed CentralView ArticlePubMedGoogle Scholar
- Taji T, Sakurai T, Mochida K, Ishiwata A, Kurotani A, Totoki Y, Toyoda A, Sakaki Y, Seki M, Ono H: Large-scale collection and annotation of full-length enriched cDNAs from a model halophyte, Thellungiella halophila. BMC Genomics. 2008, 11: 262-Google Scholar
- Wang Z, Li P, Fredricksen M, Gong Z, Kim CS, Zhang C, Bohnert HJ, Zhu J, Bressan RA, Hasegawa PM: Expressed sequence tags from Thellungiella halophila, a new model to study plant salt-tolerance. Plant Sci. 2004, 166: 609-616. 10.1016/j.plantsci.2003.10.030.View ArticleGoogle Scholar
- Hahn D, Ragland G, Shoemaker D, Denlinger D: Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis. BMC Genomics. 2009, 10: 234-10.1186/1471-2164-10-234.PubMed CentralView ArticlePubMedGoogle Scholar
- Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11: 180-10.1186/1471-2164-11-180.PubMed CentralView ArticlePubMedGoogle Scholar
- Sun C, Li Y, Wu Q, Luo H, Sun Y, Song J, Lui E, Chen S: De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC Genomics. 2010, 11: 262-10.1186/1471-2164-11-262.PubMed CentralView ArticlePubMedGoogle Scholar
- Toulza E, Shin M, Blanc G, Audic S, Laabir M, Collos Y, Claverie J, Grzebyk D: Gene expression in proliferating cells of the dinoflagellate Alexandrium catenella (Dinophyceae). Appl Environ Microbiol. 2010, 76: 4521-4529. 10.1128/AEM.02345-09.PubMed CentralView ArticlePubMedGoogle Scholar
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17: 1636-1647. 10.1111/j.1365-294X.2008.03666.x.View ArticlePubMedGoogle Scholar
- Ellegren H: Sequencing goes 454 and takes large-scale genomics into the wild. Mol Ecol. 2008, 17: 1629-1635. 10.1111/j.1365-294X.2008.03699.x.View ArticlePubMedGoogle Scholar
- Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11: 31-46. 10.1038/nrg2626.View ArticlePubMedGoogle Scholar
- Novaes E, Drost D, Farmerie W, Pappas G, Grattapaglia D, Sederoff R, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9: 312-10.1186/1471-2164-9-312.PubMed CentralView ArticlePubMedGoogle Scholar
- Ronaghi M: Pyrosequencing sheds light on DNA sequencing. Genome Res. 2001, 11: 3-11. 10.1101/gr.11.1.3.View ArticlePubMedGoogle Scholar
- Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol. 2008, 26: 1135-1145. 10.1038/nbt1486.View ArticlePubMedGoogle Scholar
- Emrich SJ, Barbazuk WB, Li L, Schnable PS: Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007, 17: 69-73.PubMed CentralView ArticlePubMedGoogle Scholar
- Jarvie T, Harkins T: Transcriptome sequencing with the Genome Sequencer FLX system. Nat Meth. 2008, 5: v-viii.Google Scholar
- Parkinson J, Blaxter M: Expressed sequence tags: an overview. Meth Mol Biol. 2009, 533: 1-12. 10.1007/978-1-60327-136-3_1.View ArticleGoogle Scholar
- GenBank dbEST.http://www.ncbi.nlm.nih.gov/,
- RIKEN full length Thellungiella cDNAs.http://www.brc.riken.jp/lab/epd/Eng/catalog/halophila.shtml,
- Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004, 14: 1147-1159. 10.1101/gr.1917404.PubMed CentralView ArticlePubMedGoogle Scholar
- Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L: VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009, 25: 2283-2285. 10.1093/bioinformatics/btp373.PubMed CentralView ArticlePubMedGoogle Scholar
- Guo M, Rupe MA, Zinselmeier C, Habben J, Bowen BA, Smith OS: Allelic variation of gene expression in maize hybrids. Plant Cell. 2004, 16: 1707-1716. 10.1105/tpc.022087.PubMed CentralView ArticlePubMedGoogle Scholar
- Padovan A, Keszei A, Foley W, Kulheim C: Differences in gene expression within a striking phenotypic mosaic Eucalyptus tree that varies in susceptibility to herbivory. BMC Plant Biol. 2013, 13: 29-10.1186/1471-2229-13-29.PubMed CentralView ArticlePubMedGoogle Scholar
- Shastry BS: SNPs: impact on gene function and phenotype. Single Nucleotide Polymorphisms. Edited by: Komar AA. 2009, Totowa, NJ: Humana Press, 3-22. 578View ArticleGoogle Scholar
- Usadel B, Poree F, Nagel A, Lohse M, Czedik-Eysenberg A, Stitt M: A guide to using MapMan to visualize and compare omics data in plants: a case study in the crop species, maize. Plant Cell Environ. 2009, 32: 1211-1229. 10.1111/j.1365-3040.2009.01978.x.View ArticlePubMedGoogle Scholar
- Thimm O, Bläsing O, Gibon Y, Nagel A, Meyer S, Krüger P, Selbig J, Müller LA, Rhee SY, Stitt M: MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. 2004, 37: 914-939. 10.1111/j.1365-313X.2004.02016.x.View ArticlePubMedGoogle Scholar
- Meins F, Si-Ammour A, Blevins T: RNA silencing systems and their relevance to plant development. Annu Rev Cell Dev Biol. 2005, 21: 297-318. 10.1146/annurev.cellbio.21.122303.114706.View ArticlePubMedGoogle Scholar
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116: 281-297. 10.1016/S0092-8674(04)00045-5.View ArticlePubMedGoogle Scholar
- Cai X, Hagedorn CH, Cullen BR: Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA. 2004, 10: 1957-1966. 10.1261/rna.7135204.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee Y, Kim M, Han J, Yeom K-H, Lee S, Baek SH, Kim VN: MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 2004, 23: 4051-4060. 10.1038/sj.emboj.7600385.PubMed CentralView ArticlePubMedGoogle Scholar
- Plant microRNA database.http://bioinformatics.cau.edu.cn/PMRD/,
- Rodriguez MC, Petersen M, Mundy J: Mitogen-activated protein kinase signaling in plants. Annu Rev Plant Biol. 2010, 61: 621-649. 10.1146/annurev-arplant-042809-112252.View ArticlePubMedGoogle Scholar
- Bartels S, Besteiro MA, Lang D, Ulm R: Emerging functions for plant MAP kinase phosphatases. Trends Plant Sci. 2010, 15: 322-329. 10.1016/j.tplants.2010.04.003.View ArticlePubMedGoogle Scholar
- Fuchs S, Grill E, Meskiene I, Schweighofer A: Type 2C protein phosphatases in plants. FEBS J. 2013, 280: 681-693. 10.1111/j.1742-4658.2012.08670.x.View ArticlePubMedGoogle Scholar
- Schweighofer A, Hirt H, Meskiene I: Plant PP2C phosphatases: emerging functions in stress signaling. Trends Plant Sci. 2004, 9: 236-243. 10.1016/j.tplants.2004.03.007.View ArticlePubMedGoogle Scholar
- Xue T, Wang D, Zhang S, Ehlting J, Ni F, Jakab S, Zheng C, Zhong Y: Genome-wide and expression analysis of protein phosphatase 2C in rice and Arabidopsis. BMC Genomics. 2008, 9: 550-10.1186/1471-2164-9-550.PubMed CentralView ArticlePubMedGoogle Scholar
- Kerk D, Templeton G, Moorhead GB: Evolutionary radiation pattern of novel protein phosphatases revealed by analysis of protein data from the completely sequenced genomes of humans, green algae, and higher plants. Plant Physiol. 2008, 146: 351-367.PubMed CentralView ArticlePubMedGoogle Scholar
- Hamel LP, Nicole MC, Sritubtim S, Morency MJ, Ellis M, Ehlting J, Beaudoin N, Barbazuk B, Klessig D, Lee J: Ancient signals: comparative genomics of plant MAPK and MAPKK gene families. Trends Plant Sci. 2006, 11: 192-198. 10.1016/j.tplants.2006.02.007.View ArticlePubMedGoogle Scholar
- Tunnacliffe A, Hincha DK, Leprince O, Macherel D: LEA proteins: versatility of form and function. Sleeping Beauties - Dormancy and Resistance in Harsh Environments. Edited by: Lubzens E, Cerda J, Clark M. 2010, Berlin: Springer, 91-108.View ArticleGoogle Scholar
- Tunnacliffe A, Wise MJ: The continuing conundrum of LEA proteins. Naturwissenschaften. 2007, 94: 791-812. 10.1007/s00114-007-0254-y.View ArticlePubMedGoogle Scholar
- Hundertmark M, Hincha DK: LEA (Late Embryogenesis Abundant) proteins and their encoding genes in Arabidopsis thaliana. BMC Genomics. 2008, 9: 118-10.1186/1471-2164-9-118.PubMed CentralView ArticlePubMedGoogle Scholar
- imaGenes Pre-Selection Strategy.http://www.lifesciences.sourcebioscience.com/genomic-services/gene-expression-/microarray-platforms.aspx,
- Tan PK, Downey TJ, Spitznagel EL, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM, Cam MC: Evaluation of gene expression measurements from commercial microarray platforms. Nucl Acids Res. 2003, 31: 5676-5684. 10.1093/nar/gkg763.PubMed CentralView ArticlePubMedGoogle Scholar
- Zahurak M, Parmigiani G, Yu W, Scharpf RB, Berman D, Schaeffer E, Shabbeer S, Cope L: Pre-processing Agilent microarray data. BMC Bioinforma. 2007, 8: 142-10.1186/1471-2105-8-142.View ArticleGoogle Scholar
- Thellungiella MapMan ontology/mapping file.http://mapman.gabipd.org/web/guest/mapmanstore,
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995, 57: 289-300.Google Scholar
- Fowler S, Thomashow MF: Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. Plant Cell. 2002, 14: 1675-1690. 10.1105/tpc.003483.PubMed CentralView ArticlePubMedGoogle Scholar
- Hannah MA, Heyer AG, Hincha DK: A global survey of gene regulation during cold acclimation in Arabidopsis thaliana. PLoS Genet. 2005, 1: e26-10.1371/journal.pgen.0010026.PubMed CentralView ArticlePubMedGoogle Scholar
- Kreps JA, Wu Y, Chang HS, Zhu T, Wang X, Harper JF: Transcriptome changes for Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiol. 2002, 130: 2129-2141. 10.1104/pp.008532.PubMed CentralView ArticlePubMedGoogle Scholar
- Seki M, Narusaka M, Abe H, Kasuga M, Yamaguchi-Shinozaki K, Carninci P, Hayashizaki Y, Shinozaki K: Monitoring the expression pattern of 1300 Arabidopsis genes under drought and cold stresses by using a full-length cDNA microarray. Plant Cell. 2001, 13: 61-72.PubMed CentralView ArticlePubMedGoogle Scholar
- Seki M, Narusaka M, Ishida J, Nanjo T, Fujita M, Oono Y, Kamiya A, Nakajima M, Enju A, Sakurai T: Monitoring of the expression profiles of 7000 Arabidopsis genes under drought, cold, and high-salinity stresses using a full-length cDNA microarray. Plant J. 2002, 31: 279-292. 10.1046/j.1365-313X.2002.01359.x.View ArticlePubMedGoogle Scholar
- Foyer CH, Noctor G: Redox regulation in photosynthetic organisms: signaling, acclimation, and practical implications. Antioxid Redox Signal. 2009, 11: 861-905. 10.1089/ars.2008.2177.View ArticlePubMedGoogle Scholar
- Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004, 16: 1667-1678. 10.1105/tpc.021345.PubMed CentralView ArticlePubMedGoogle Scholar
- Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y: Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA. 2005, 102: 5454-5459. 10.1073/pnas.0501102102.PubMed CentralView ArticlePubMedGoogle Scholar
- Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004, 16: 1679-1691. 10.1105/tpc.021410.PubMed CentralView ArticlePubMedGoogle Scholar
- Oh DH, Dassanayake M, Bohnert HJ, Cheeseman JM: Life at the extreme: lessons from the genome. Genome Biol. 2012, 13: 241-10.1186/gb4003.PubMed CentralView ArticlePubMedGoogle Scholar
- Arabidopsis spotted microarray.http://ag.arizona.edu/microarray,
- Gehlenborg N, O’Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D: Visualization of omics data for systems biology. Nat Meth. 2010, 7: S56-S68. 10.1038/nmeth.1436.View ArticleGoogle Scholar
- Usadel B, Nagel A, Thimm O, Redestig H, Blaesing OE, Palacios-Rojas N, Selbig J, Hannemann J, Piques MC, Steinhauser D: Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of coresponding genes, and comparison with known responses. Plant Physiol. 2005, 138: 1195-1204. 10.1104/pp.105.060459.PubMed CentralView ArticlePubMedGoogle Scholar
- Lohse M, Nunes-Nesi A, Krüger P, Nagel A, Hannemann J, Giorgi FM, Childs L, Osorio S, Walther D, Selbig J: Robin: an intuitive wizard application for R-based expression microarray quality assessment and analysis. Plant Physiol. 2010, 153: 642-651. 10.1104/pp.109.152553.PubMed CentralView ArticlePubMedGoogle Scholar
- Hannah MA, Wiese D, Freund S, Fiehn O, Heyer AG, Hincha DK: Natural genetic variation of freezing tolerance in Arabidopsis. Plant Physiol. 2006, 142: 98-112. 10.1104/pp.106.081141.PubMed CentralView ArticlePubMedGoogle Scholar
- Rohde P, Hincha DK, Heyer AG: Heterosis in the freezing tolerance of crosses between two Arabidopsis thaliana accessions (Columbia-0 and C24) that show differences in non-acclimated and acclimated freezing tolerance. Plant J. 2004, 38: 790-799. 10.1111/j.1365-313X.2004.02080.x.View ArticlePubMedGoogle Scholar
- Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, Matz MV, Meleshkevitch E, Moroz LL, Lukyanov SA: Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucl Acids Res. 2004, 32: e37-10.1093/nar/gnh031.PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.View ArticlePubMedGoogle Scholar
- Min XJ, Butler G, Storms R, Tsang A: OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucl Acids Res. 2005, 33: W677-W680. 10.1093/nar/gki394.PubMed CentralView ArticlePubMedGoogle Scholar
- Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl Acids Res. 2003, 31: 224-228. 10.1093/nar/gkg076.View ArticlePubMedGoogle Scholar
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.PubMed CentralView ArticlePubMedGoogle Scholar
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.View ArticlePubMedGoogle Scholar
- Schneider M, Bairoch A, Wu CH, Apweiler R: Plant protein annotation in the UniProt knowledgebase. Plant Physiol. 2005, 138: 59-66. 10.1104/pp.104.058933.PubMed CentralView ArticlePubMedGoogle Scholar
- Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z: CDD: a Conserved Domain Database for protein classification. Nucl Acids Res. 2005, 33: D192-D196. 10.1093/nar/gni191.PubMed CentralView ArticlePubMedGoogle Scholar
- Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D, Mazumder R, Mekhedov S, Nikolskaya A: The COG database: an updated version includes eukaryotes. BMC Bioinforma. 2003, 4: 41-10.1186/1471-2105-4-41.View ArticleGoogle Scholar
- Zdobnov EM, Apweiler R: InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.View ArticlePubMedGoogle Scholar
- Schäffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF: Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucl Acids Res. 2001, 29: 2994-3005. 10.1093/nar/29.14.2994.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Z, Yu J, Li D, Zhang Z, Liu F, Zhou X, Wang T, Ling Y, Su Z: PMRD: plant microRNA database. Nucl Acids Res. 2010, 38: D806-D813. 10.1093/nar/gkp818.PubMed CentralView ArticlePubMedGoogle Scholar
- Thellungiella salsuginea genome sequence.http://www.phytozome.org/search.php,
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 2004, 5: 113-10.1186/1471-2105-5-113.View ArticleGoogle Scholar
- Okonechnikov K, Golosova O, Fursov M: Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012, 28: 1166-1167. 10.1093/bioinformatics/bts091.View ArticlePubMedGoogle Scholar
- Unrooted dendrograms.http://align.genome.jp,
- Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP, and related tools. Nat Prot. 2007, 2: 953-971. 10.1038/nprot.2007.131.View ArticleGoogle Scholar
- Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B: RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucl Acids Res. 2012, 40: W622-W627. 10.1093/nar/gks540.PubMed CentralView ArticlePubMedGoogle Scholar
- Mefisto software.http://www.usadellab.org/cms/index.php?page=mefisto,
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMed CentralView ArticlePubMedGoogle Scholar
- MAPK Group: Mitogen-activated protein kinase cascades in plants: a new nomenclature. Trends Plant Sci. 2002, 7: 301-308. 10.1016/S1360-1385(02)02302-6.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.