The rehydration transcriptome of the desiccation-tolerant bryophyte Tortula ruralis: transcript classification and analysis

Background The cellular response of plants to water deficits has both economic and evolutionary importance, directly affecting plant productivity in agriculture and plant survival in the natural environment. Genes induced by water-deficit stress have been successfully enumerated in plants that are relatively sensitive to cellular dehydration; however, we have little knowledge as to the adaptive role of these genes in establishing tolerance to water loss at the cellular level. Our approach to address this problem has been to investigate the genetic responses of plants that are capable of tolerating extremes of dehydration, in particular the desiccation-tolerant bryophyte, Tortula ruralis. To establish a sound basis for characterizing the Tortula genome with regard to desiccation tolerance, we analyzed 10,368 expressed sequence tags (ESTs) from rehydrated rapid-dried Tortula gametophytes, a stage previously determined to exhibit the maximum stress-induced change in gene expression. Results The 10,368 ESTs formed 5,563 EST clusters (contig groups representing individual genes) of which 3,321 (59.7%) exhibited similarity to genes present in the public databases and 2,242 were categorized as unknowns based on protein homology scores. The 3,321 clusters were classified by function using the Gene Ontology (GO) hierarchy and the KEGG database. The results indicate that the transcriptome contains a diverse population of transcripts that reflects, as expected, a period of metabolic upheaval in the gametophyte cells. Much of the emphasis within the transcriptome is centered on the protein synthetic machinery, ion and metabolite transport, and membrane biosynthesis and repair. Rehydrating gametophytes also have an abundance of transcripts that code for enzymes involved in oxidative stress metabolism and phosphorylating activities. 
The functional classifications reflect a remarkable consistency with what we have previously established with regards to the metabolic activities that are important in the recovery of the gametophytes from desiccation. A comparison of the GO distribution of Tortula clusters with an identical analysis of 9,981 clusters from the desiccation sensitive bryophyte species Physcomitrella patens, revealed, and accentuated, the differences between stressed and unstressed transcriptomes. Cross species sequence comparisons indicated that on the whole the Tortula clusters were more closely related to those from Physcomitrella than Arabidopsis (complete genome BLASTx comparison) although because of the differences in the databases there were more high scoring matches to the Arabidopsis sequences. The most abundant transcripts contained within the Tortula ESTs encode Late Embryogenesis Abundant (LEA) proteins that are normally associated with drying plant tissues. This suggests that LEAs may also play a role in recovery from desiccation when water is reintroduced into a dried tissue. Conclusion The establishment of a rehydration EST collection for Tortula ruralis, an important plant model for plant stress responses and vegetative desiccation tolerance, is an important step in understanding the genome level response to cellular dehydration. The type of transcript analysis performed here has laid the foundation for more detailed functional and genome level analyses of the genes involved in desiccation tolerance in plants.


Background
The cellular response of plants to water deficits has both economic and evolutionary importance, directly affecting plant productivity in agriculture and plant survival in the natural environment. Ramanathan [1] has argued, based on predictions of global environmental changes, that developing crops which are more tolerant to water deficits while maintaining productivity will become a critical requirement in the early part of the 21st century. Understanding how plant cells tolerate water loss is a vital prerequisite for developing strategies for improving tolerance of, and biomass and seed production under, drought conditions. In the last decade the genes induced by water-deficit stress have been successfully enumerated in plants that are relatively sensitive to cellular dehydration, in particular Arabidopsis thaliana [2][3][4][5][6]. In addition, the mechanisms by which the genetic response to water deficit is controlled by abscisic acid (ABA)-dependent and independent pathways have also been extensively elucidated [6,7]. However, even with the recent addition of in-depth examination of gene expression patterns using Arabidopsis microarrays [8,9] we have little functional knowledge of the genes that respond to water deficits. Of critical importance is the question of which of the genes identified as responding to water deficits actually have an adaptive role in establishing tolerance, in particular tolerance of cellular dehydration, and which are genes that are only responding to the injury incurred by the imposition of the stress. Injury may induce, or repress, specific genes that are not involved in promoting adaptation to cellular dehydration. 
An indication that at least some of the genes have an adaptive function in dehydration tolerance derives from the observation that they are expressed in tissues that acquire desiccation tolerance, the extreme manifestation of dehydration tolerance, such as in maturing seeds and in leaves of desiccation-tolerant plants during drying [3,10,11]. However, in order to fully address the question of the adaptive importance of genes involved in responses to cellular dehydration it is necessary to gain an evolutionary perspective of the involvement of a gene in dehydration tolerance mechanisms. To this end we have established an ongoing comparative genomics program to study the genetic responses to dehydration in species that span the evolution of dehydration (desiccation) tolerance mechanisms within the land plants. Here we present an analysis of ESTs derived from the desiccation-tolerant bryophyte Tortula ruralis [Hedw.] Gaertn. Meyer & Scherb., that is representative of the primitive genetic strategy for the acquisition of desiccation-tolerance [see [12,13]].
Desiccation tolerance, the ability to recover from the almost complete loss (90%) of protoplasmic water, is a phenomenon common in the reproductive structures of green plants: pollen, spores and seeds. However, the ability to survive vegetative desiccation is a demonstrable but uncommon occurrence in the plant kingdom [13][14][15][16][17][18]. Within the flowering plants, only approximately 300 species are known to tolerate vegetative desiccation [16,17].
Recent physiological phylogenetic analyses indicate that vegetative desiccation tolerance was primitively present in the bryophytes (the basal-most living clades of land plants), but was lost in the evolution of tracheophytes. Desiccation-tolerant bryophytes are found worldwide and occupy a variety of ecological niches, most of which could, during some period of the year, be considered as extreme either on a macro or microhabitat level. In most cases the extremes that these plants experience are both in water availability and temperature [19][20][21][22][23].
Desiccation-tolerant bryophytes, because of their simple architecture, have few, if any, morphological (or indeed physiological) characteristics or adaptations that can limit water loss or regulate plant temperature. As a result of this, the internal water content of their photosynthetic tissues rapidly equilibrates to the water potential of the environment once free water is lost from the surface of the plant. This in turn means that these plants experience drying rates that are much faster than those experienced by their more complex pteridophyte or angiosperm counterparts. In fact, the drying rates that desiccation-tolerant bryophytes experience are generally lethal to desiccation-tolerant ferns and flowering plants [14]. The rapid equilibration of protoplasmic water potential with that of the environment in bryophyte tissues appears to demand a type of desiccation tolerance that is significantly different from that exhibited by desiccation tolerant angiosperms [15]. Rather than acquiring desiccation tolerance in response to a dehydration event as seen in Craterostigma plantagineum, Sporobolus stapfianus, and other desiccation-tolerant angiosperms, desiccation-tolerant bryophytes appear to express this trait constitutively [15,24]. This form of desiccation tolerance is considered the most primitive of those that have received attention so far [13]. In this type of tolerance the primary response to a desiccation event, at least at the level of gene expression, occurs after the fact, during the first hour or two following rehydration. This has led to the suggestion that a major component of the mechanism of desiccation tolerance in bryophytes is a rehydration-induced cellular repair response [15,24]. 
The implication is that although cellular protection and hence desiccation tolerance is constitutive, it is not sufficient to prevent some damage from occurring (or being manifested) upon rehydration, and thus repair processes are needed and induced when water returns to the protoplasm of the cells.
The repair aspect of the mechanism of desiccation tolerance in these plants, although demonstrated to be a major component of tolerance, is difficult to detail and characterize. Most work has focused on the proteins whose synthesis is induced immediately upon rehydration of desiccated gametophytic tissue. Early work [25] established the ability of T. ruralis and other mosses to rapidly recover synthetic metabolism when rehydrated. The speed of this recovery was inversely dependent upon the rate of prior desiccation: the faster the rate of desiccation, the slower the recovery. In addition, although the pattern of protein synthesis in the first two hours of rehydration of T. ruralis is distinctly different from that of hydrated controls, novel transcripts were not made in response to desiccation [26]. Hence it was suggested that T. ruralis responds to desiccation by an alteration in protein synthesis upon rehydration that is in large measure the result of a change in translational control. Changes in transcriptional activity were observed for nearly all transcripts studied [27] but did not result in a qualitative change in the transcript population during desiccation or rehydration. It thus appears that T. ruralis relies more upon the activation of pre-existing repair mechanisms for desiccation tolerance than it does on either pre-established or activated protection systems.
In a detailed study of the changes in protein synthesis initiated by rehydration in T. ruralis, Oliver [26] demonstrated that during the first two hours of hydration the synthesis of 25 proteins is terminated, or substantially decreased, and the synthesis of 74 proteins is initiated, or substantially increased. Controls over changes in synthesis of these two groups of proteins, the former termed hydrins and the latter rehydrins, are not mechanistically linked. It takes a certain amount of prior water loss to fully activate the synthesis of rehydrins upon rehydration. RNA blots revealed that several rehydrin transcripts accumulate during slow drying [28,29] at a time when it is assumed that transcriptional activity is rapidly declining. These transcripts do not accumulate during rapid desiccation, nor is their accumulation during slow drying associated with an increase in endogenous ABA accumulation. ABA is undetectable in this moss [ [30], M. J. Oliver, unpubl data], and T. ruralis does not synthesize specific proteins in response to applied ABA. The accumulation of these transcripts was postulated to be the result of an increase in mRNA stability brought about by the removal of water from the cells [27]. Recent studies clearly demonstrate that these transcripts are sequestered in the dried gametophytes in mRNP particles [29] and that this results in the change in their stability. The implication from this work is that the sequestration of mRNAs required for recovery hastens the repair of damage induced by desiccation or rehydration and thus minimizes the time needed to restart growth upon rehydration.
The major question arising from these studies concerns the identity of rehydrins and what possible functions and roles they may play with regard to the response to desiccation and rehydration and to desiccation tolerance per se. We have some limited knowledge of the functions (or postulated functions) of a few of the rehydrins from classical molecular analyses [31,32] and from a small-scale EST collection analysis [33]. However, there is an obvious need to extend this base and to develop testable hypotheses that will help us to elucidate the metabolic and genetic mechanisms that control the recovery and repair of dried plant cells and their role in the development and evolution of desiccation tolerance. To gain an appreciation of the number of possible rehydrin genes and the range of possible functions encompassed by their expression we have initiated a genomics-level analysis of gene expression during the recovery of Tortula gametophytes from the desiccated state. The first step in this process was to establish an EST collection that is representative of the transcripts available to the moss during the first few hours following rehydration. In this report we present a bioinformatic analysis of 10,368 ESTs from the early phases of recovery following rehydration of rapidly-dried Tortula ruralis gametophytes; rapid-dried gametophytes were chosen in order to maximize the recovery and repair response upon rehydration. The bioinformatics approach we have taken is based on that used by McCarter et al. [34] to conduct a comprehensive analysis of 5,700 Meloidogyne incognita L2 ESTs, and includes cluster analyses, transcript abundance estimations, and functional classifications based on InterPro domains, Gene Ontology hierarchy, and KEGG biochemical classifications. The overall goal was to gain an appreciation of the Tortula transcriptome during the time period following rehydration when the desiccation-driven alteration in gene expression is at its peak. 
During this period we hypothesize the processes of cellular repair and recovery are the main focus of the metabolism of the gametophytic cells.

Results and discussion
Ten thousand three hundred and sixty-eight individual cDNA clones were selected from a Tortula ruralis rehydration library and subjected to single-pass 5' directional sequencing to generate 10,368 primary ESTs of which 9,159 (88%) passed through quality control, vector trimming, and removal of E. coli contamination and cloning artifacts. The 9,159 ESTs averaged 648 nucleotides in length and totaled 5.93 million nucleotides submitted to GenBank. These submitted ESTs form the basis of the subsequent transcriptome analysis utilizing the High Throughput-Gene Ontology-Genome Annotation Toolkit (HT-GO-GAT), software developed by S.E. Dowd (unpublished).

Cluster analysis
Utilizing the assembly algorithms incorporated in the SeqManII software (part of the DNASTAR suite from DNASTAR Inc, Madison WI), the 9,159 ESTs were grouped into contigs and clusters by establishing assembly stringencies that generated groupings defined by the analysis protocols of McCarter et al., 2003 [34]. Contigs contain EST members that appear to originate from single transcripts whereas clusters are assemblies of ESTs that could represent transcripts from the same gene but alternate splice isoforms, or in the case of Tortula ESTs, which derive from a population of individual gametophytes, alleles of the same gene. The 9,159 ESTs formed 7,272 contigs and 5,563 clusters, both of which exhibit an average size of 669 nucleotides. However, the longest sequence increased from 1,575 nucleotides for contigs to 2,229 nucleotides for clusters. Clusters varied in content from a single EST (singletons) in 4,362 cases (78%) to 48 ESTs for a single cluster (Figure 1). The elimination of redundancy during contig building and cluster formation reduced the total number of nucleotides for further analyses from 5.93 million to 4.87 million (contigs) and 3.71 million (clusters). Overall the 9,159 ESTs potentially represent 5,563 genes, a discovery rate of 60%, with 47.6% of the ESTs as singletons. This is an overestimation of gene discovery since several non-overlapping clusters can represent a single gene, and from our BLAST search data this appears to be a possibility, at least in the case of clusters 121 and 204, which appear to independently represent a gene that has a weak similarity to the LEA protein of Caenorhabditis elegans. Even though the ESTs derive from a non-normalized library, 96.5% of the clusters still have 5 or fewer EST members.
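The summary percentages quoted in this section follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable and function names are illustrative only and are not part of HT-GO-GAT or SeqManII.

```python
# Counts taken from the text; all names here are illustrative.
raw_ests = 10_368      # primary ESTs sequenced
clean_ests = 9_159     # ESTs passing quality control and trimming
clusters = 5_563       # clusters, i.e. putative genes
singletons = 4_362     # clusters containing a single EST


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {
    "qc_pass_pct": pct(clean_ests, raw_ests),          # reported as 88%
    "discovery_rate_pct": pct(clusters, clean_ests),   # reported as 60%
    "singleton_clusters_pct": pct(singletons, clusters),  # reported as 78%
    "ests_in_singletons_pct": pct(singletons, clean_ests),  # reported as 47.6%
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Because a singleton cluster contains exactly one EST, the number of singleton clusters and the number of ESTs that are singletons are the same count (4,362); only the denominator differs between the 78% and 47.6% figures.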

Transcript abundance
The consensus cluster sequences were subjected to a BLASTx-style search of a custom curated non-redundant database derived from UniProt and annotated according to the degree of similarity of the cluster to the highest scoring match in the database of known gene sequences. The degree of similarity was based on the quality of the BLASTx statistical outputs as well as a visual inspection of the aligned sequences between the query and the target. Clusters that generated an HSP bit score below 70 and/or E-values higher than 10^-7 were assessed manually for all possible alignments, taking into account the alignment length, number of identical matches, gaps, and positive replacements. Utilizing these criteria we were able to annotate 3,321 (59.7%) of the 5,563 clusters by their similarity to known genes within the database. This also meant that 2,242 of the clusters, or 40.3%, represent sequences that have no known counterpart in the public databases and that we categorize as unknowns. Table 1 lists the 30 most abundant EST clusters derived from the Tortula rehydration EST collection. These transcripts only account for 8.4% of the generated ESTs, however. Ten of the most abundantly represented transcripts encode proteins that do not match any of the sequences in the databases searched in this study and are designated as unknowns. Seven abundant transcripts appear to encode proteins that belong to a class of proteins known as Late Embryogenesis Abundant (LEA) proteins, although the relatively low HSP bit scores and high E-values for most of these BLASTx matches have to be considered as a caveat in this assessment. LEA proteins have long been ascribed a protective role for cells that are experiencing dehydration [11,35]. 
This is a conclusion drawn from the strong correlation between LEA transcript accumulation and water loss, rapid decline in transcript levels upon rehydration, and especially as LEA gene expression relates to the programmed desiccation stage of seed maturation. If one can conclude that LEA protein transcripts are abundant in the rehydration transcriptome of Tortula ruralis, which is consistent with some of our earlier findings [32], then it is also possible that these proteins are involved in either protection of cellular integrity during the initial phases following rehydration when cell disruption is apparent in bryophytes [36], or are actively involved in the restoration of cells damaged by a desiccation event. This difference in the response of LEA gene expression between Tortula and what has been reported for angiosperms may also be a reflection of what we believe to be a more primitive mechanism of dehydration tolerance, and perhaps a more primitive role for and control of LEA gene expression.
Figure 1. Histogram of distribution of ESTs by cluster size.

Other abundant transcripts appear to encode proteins involved in membrane transport (aquaporin, cysteine-rich proteins, channel and pore proteins) and proteins that can be associated with plant stress events (metallothionein, esterase, and rubredoxin). The Early Light Inducible Protein A (ELIP-A) transcript is the only member of the abundant transcripts that we have previously reported [33] as belonging to the group of proteins we have termed rehydrins [31]. These proteins have been suggested to be synthesized in response to stress-induced photo-damage within the Tortula chloroplast and may play a protective or repair function for the photosynthetic apparatus [37].
Although transcript abundance may reflect the metabolic or physiological needs of the moss during the rehydration phase of a wet/dry/wet cycle it would be more desirable to know how these transcripts are recruited and utilized by the translational machinery to make the proteins that actually contribute to the recovery of the gametophytes following rehydration. Such information is beyond the scope of this analysis but the identification and isolation of the Clusters that represent the Tortula rehydration transcriptome does represent the first major step for such a pursuit.
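The annotation criteria described in this section can be sketched as a simple triage step. The Python example below assumes that a hit is routed to manual inspection when either threshold fails (our reading of "and/or"); the `Hit` record, the cluster identifiers, and the example scores are hypothetical and are not taken from the Tortula data or from HT-GO-GAT.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
BIT_SCORE_MIN = 70.0   # HSP bit scores below this trigger manual review
E_VALUE_MAX = 1e-7     # E-values above this trigger manual review


@dataclass
class Hit:
    """Best BLASTx hit for one cluster (hypothetical record layout)."""
    cluster_id: str
    bit_score: float
    e_value: float


def needs_manual_review(hit: Hit) -> bool:
    """Flag a hit when either statistic falls outside its threshold."""
    return hit.bit_score < BIT_SCORE_MIN or hit.e_value > E_VALUE_MAX


# Invented example hits, for illustration only.
hits = [
    Hit("cluster_A", 250.0, 1e-60),  # strong match on both criteria
    Hit("cluster_B", 55.0, 1e-9),    # low bit score
    Hit("cluster_C", 82.0, 1e-5),    # E-value too high
]

flagged = [h.cluster_id for h in hits if needs_manual_review(h)]
print(flagged)
```

A flag in this sketch only marks a cluster for the visual alignment inspection described in the text; it does not by itself assign the cluster to the unknown category.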

Functional classification of transcripts
Physiological, biochemical, and molecular data all point towards an active period of cellular activity, presumably for repair and recovery from desiccation-induced damage, during the first two hours following rehydration of dried gametophytes [13,15]. The identity of the more abundant transcripts (Table 1) does provide some insight into the nature of the metabolic activity that is associated with rehydration; at least it gives an indication of what metabolic processes may be of prime concern as the plant recovers from desiccation. However, the functional classification of the Tortula transcripts present during the initial phases following rehydration using the Gene Ontology (GO) classification system paints a broader view of the possible metabolic activity of the gametophytic cells at this time. This does, of course, come with the understanding that these are transcript-based analyses and do not directly reflect protein levels, which would offer a more definitive assessment of the metabolic capability of the cells during rehydration.
Functional classification of the Tortula rehydration transcripts was achieved by matching the Tortula clusters to characterized protein domains in a combined protein database, using HT-GO-GAT (Materials and Methods), which allowed us to assign GO terms to each cluster. The assignment of GO terms made it possible to place the clusters into the GO hierarchy, which can be viewed by use of an AmiGO browser. Of the 3,321 clusters that exhibited significant similarity to known genes in the public databases, 2,203 (66% of annotated or 40% of all clusters) represent genes that contain conserved protein domains that have known biochemical and physiological functions in other organisms and map to the GO hierarchy. The GO representations for the Tortula rehydration clusters are presented in Tables 2 through 4. The representations are segregated into the three main organizing principles of GO: biological process (Table 2), cellular component (Table 3), and molecular function (Table 4). [...] (Table 2). Within physiological processes 80% of the clusters were associated with metabolism and 20% with cell growth and maintenance (it is within this group that most of the overlap occurs with the Cellular Processes category). The distribution is not surprising for ESTs (clusters) derived from a tissue that is harvested at a time of metabolic upheaval such as rehydration and recovery from the desiccated state. In support of this notion are the almost identical distributions of ESTs observed for cDNA collections from protonemal tissues of the moss Physcomitrella patens following various hormonal treatments designed to elicit developmental perturbations and metabolic switching (Table 5, and data reported by Nishiyama et al., 2003 [38]).
The subcategory distributions within the Physiological and Cellular processes also seem to reflect the nature of the cellular disturbances that result from a desiccation-rehydration event. Processes involved in metabolite and ion transport within and between cells are represented by 17% of the clusters that map to Biological Processes, and almost 86% of those that map to the Cell Growth and Maintenance subcategory. Within the Metabolism subcategory of Physiological processes 40% (32% of total) of the clusters map to protein metabolism (synthesis) and 37% (30% of total) map to biosynthetic processes. The considerable representation within the aforementioned three subcategories is consistent with much of our biochemical and physiological evidence concerning the metabolic activity and emphasis in gametophytic cells during rehydration [15,24,25]. In particular, following rehydration the protein synthetic machinery is rapidly reconstituted, having been dismantled during drying, to direct the synthesis of the pattern of proteins termed rehydrins that appear to be a crucial aspect of the desiccation tolerance mechanism of Tortula ruralis (as described above). The importance of the protein synthetic machinery and its re-establishment following rehydration is also highlighted by the preponderance of clusters associated with the Ribosomal subcategory of the Cellular Component Classification: 40% of the Cytoplasm category (23% overall).
Only 982 clusters (45% of the 2,203 that constitute the mapped population) map within the Cellular Component Classification and are presumably associated with structural functions (Table 3). Of these 982 clusters, almost all map to the Cell classification, within which 72% map to Intracellular components and 38% to the Membrane category. Within these categories, representation is most significant in the ribosomal, integral membrane protein, and plastid subcategories. As discussed above, the importance of protein synthesis during recovery from desiccation may explain the preponderance of clusters associated with ribosomal structural components as well as the number of clusters that are associated with the membrane and plastid subcategories. Ultrastructural studies of dried and rehydrated gametophytes clearly indicate that the inrush of water during rehydration disrupts membranes and causes a disorganization of the internal granal structures and swelling of the large chloroplasts of the Tortula leaf cells [36,39].
Of the clusters that can be mapped into GO hierarchies, 90% can be ascribed molecular functions (Table 4). Of the major categories, Binding activity (38%), Catalytic activity (58.5%), Structural Molecule activity (13%), and Transporter activity (14%) are best represented in the cluster collection. Metal ion, nucleic acid, and nucleotide binding are the most represented subcategories within the Binding activity category, perhaps reflective of the need for biosynthetic and repair activity associated with rehydration of moss cells. Within the Catalytic activity category the majority of the clusters are associated with Hydrolase (19%), Transferase (15.5%), Oxidoreductase (13.5%), and Kinase (7%) activities. Almost half of the clusters associated with the Transferase activity subcategory map as transferring phosphate-containing groups. Each one of these subcategories represents catalytic activities that could be argued as important for a cell to recover from a major metabolic perturbation such as that seen during rehydration. The significant representation within the Kinase and phosphate transfer categories also suggests that an active metabolic control "program" operates when the desiccated cells receive water and attempt to recover from the damage. It is also intriguing that there is a significant representation within the Transporter subcategory as little is known of this group within the context of desiccation tolerance in bryophytes, although solute (osmolytes) and sugar [...] Of particular interest to this study are clusters that represent gene expression control factors both at the transcriptional (41 clusters) and translational level (54 clusters). 
These clusters, along with those that represent biochemical control mechanisms for signaling and gene expression at the protein level, such as kinases (90 clusters) and phosphate-transfer activities (154 clusters), may represent critical elements in the activation and execution of the cellular recovery processes necessary for the mechanism of desiccation tolerance exhibited by Tortula ruralis. This is of importance because of the key position of bryophytes in the evolution of desiccation tolerance in plants. Our main hypothesis is that an elucidation of the signaling and activation pathways for the rehydration response in this assumed primitive tolerance mechanism could have major implications for the study of stress tolerance mechanisms in all plants and thus these clusters represent important targets for further study at the molecular and biochemical levels.

GO based comparison with Physcomitrella patens EST collections
The representation of the Tortula rehydration clusters throughout the GO mapping system is indicative of the emphases on, but not expression levels of, particular cellular activities represented in the moss gametophytes during this period of a wet and dry cycle. In an attempt to assess if the accents on particular cellular activities indicated for rehydrated gametophytes are characteristic of the rehydration-induced metabolic state or are simply indicative of processes associated with normally active bryophyte cells, we compared the representation of the Tortula rehydration clusters within the GO categories with similar "clusters" from Physcomitrella patens, the only other bryophyte that has similar genomic-level information. The majority of the Physcomitrella "clusters" are derived from a large EST collection representing transcripts from both untreated and hormone-induced (to switch developmental pathways) cells of protonemal cultures. The Physcomitrella ESTs are described by Nishiyama et al. [38] and were obtained from Physcobase (http://moss.nibb.ac.jp/) as assembled contigs (assembled in an identical fashion to what we designate as clusters). In total 22,885 Physcomitrella contigs, derived from 102,553 ESTs obtained from Physcobase and GenBank, were subjected to a BLASTx search, as described for the Tortula clusters using HT-GO-GAT. Of the 22,885 contigs, 9,981 (43.6%) represent genes that contain conserved protein domains that have known biochemical and physiological functions in other organisms and map to the GO hierarchy.
There are two caveats for this comparison: 1) differences in representation may simply reflect species differences in the emphasis on individual classes of cellular activities between Tortula and Physcomitrella, or 2) differences in representation may reflect differences in the emphasis on individual classes of cellular activities between mature gametophytes (Tortula) and protonema (Physcomitrella). Until a comparison of GO mapping distributions can be made directly between clusters derived from an EST collection from hydrated control gametophytes from Tortula with those from the rehydration collection these caveats [...] [4,23] and what is known about plant responses to abiotic stress in general [8,13]. The differences evident in the comparison are presented in Table 5, where the extent of the similarity in the distribution of the clusters is expressed as the ratio of representation for Tortula to representation for Physcomitrella (T:P). In the Biological Process category the most striking differences in GO representation of Tortula clusters occur in categories where the percentage representations are relatively low but the numbers of clusters are substantial. In the level 4 relationship, Photosynthesis, the representation for Tortula is 3.31% compared to 1.38% for Physcomitrella, a 2.4-fold difference, the majority of which is accounted for by the difference in the representation levels for the light reaction category. The rapid recovery of photosynthesis is critical for the recovery of bryophyte cells, particularly with regard to the production of energy and reducing power for the metabolic activity associated with repair and reconstitution of the gametophytic cells. Chloroplast structure in Tortula is severely disrupted, especially if desiccation occurred rapidly, in the first few hours following rehydration [36,40] but recovers quickly. 
The difference in representation in this category is consistent with these observations and thus may reflect the greater need for a supply of a diversity of photosynthetic components in rehydrated Tortula than in Physcomitrella cells that have not experienced a disruption in the photosynthetic apparatus. Similar inferences can be made concerning the differences in representation observed for carbon utilization (T:P of 2.0) and oxidative phosphorylation (T:P of 2.1) for Tortula in that mitochondrial activity and integrity are also compromised during rehydration [25,36]. Other differences in representation between Tortula and Physcomitrella mappings relate to a higher representation for Tortula in the categories that relate to responses to external stimuli and responses to stress, both of which would seem consistent with the emphasis that cellular activity for Tortula would have in comparison to unstressed Physcomitrella cells. In particular the increased representation within the response to oxidative stress is of note, as an elevated protection of cellular integrity from the damaging reactive oxygen species (ROS) typically associated with a desiccation-rehydration event is a distinctive component of desiccation-tolerant bryophytes when compared to their desiccation-sensitive relatives such as Physcomitrella [41].
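The T:P values quoted for Table 5 are ratios of percentage representation in a GO category, Tortula over Physcomitrella. A minimal Python sketch of that arithmetic, using the Photosynthesis percentages given in the text (the function name `tp_ratio` is illustrative only and is not part of HT-GO-GAT):

```python
def tp_ratio(tortula_pct: float, physcomitrella_pct: float) -> float:
    """Ratio of GO-category representation, Tortula : Physcomitrella (T:P)."""
    if physcomitrella_pct <= 0:
        raise ValueError("Physcomitrella representation must be positive")
    return tortula_pct / physcomitrella_pct


# Photosynthesis (level 4): 3.31% of Tortula clusters vs 1.38% of
# Physcomitrella clusters, as quoted in the text.
ratio = tp_ratio(3.31, 1.38)
print(f"T:P = {ratio:.1f}")  # prints "T:P = 2.4"
```

A ratio near 1 indicates similar emphasis in the two collections; it says nothing about transcript abundance, only about how many distinct clusters map to the category.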
The above observations concerning the more extensive representation of clusters within the GO mappings related to organelle function in Tortula are mirrored in the Cellular Component Level 2 classification mappings. In the Cellular Component classification Tortula exhibits an almost two-fold difference in representation within categories associated with either the chloroplast or mitochondria, such as Extrachromosomal DNA (T:P of 1.8), Thylakoid (T:P of 1.8), Mitochondrial membrane (T:P of 1.9), and both Inner and Outer membranes (T:P of 1.8).
In this case the categories are generally related to genes representing membrane components and since it is the organelle membranes that exhibit the majority of the damage during desiccation and rehydration it is consistent that these categories would be better represented in the Tortula cluster mappings than those for Physcomitrella.
In addition to the organelle-related classifications, the Tortula clusters also exhibit a higher representation within the Ribonucleoprotein complex category (T:P of 1.8), which in all likelihood reflects an emphasis on ribosomal components since a similar difference in representation is seen in the Structural Constituent of the Ribosome category (T:P of 2.0) within the Molecular Function Level 2 classification. Again, such differences in representation in the comparison between Tortula and Physcomitrella GO mappings are consistent with our previous studies on the responses of Tortula gametophytes to desiccation and rehydration and comparisons to non-stressed bryophyte tissues. Protein synthesis is critical to the recovery of Tortula cells following a desiccation event [11,13,15], not only for the replacement of proteins damaged by the stress of desiccation but also for directing the response to the stress at the level of gene expression [26,29]. Early studies determined that the speed at which desiccation occurred has a marked effect on both the rate of recovery of protein synthesis and the rate at which either new ribosomes are formed or pre-existing ones are repaired [42,43]: rapid desiccation results in a more prolonged recovery of normal protein synthetic levels and also slows the reconstitution of ribosomes upon rehydration. Since the Tortula clusters are derived from ESTs of rehydrated moss that was dried rapidly, the greater representation in ribosome-related GO mappings for this collection compared to the Physcomitrella clusters, which represent transcripts from cells where presumably normal ribosomal turnover and synthetic rates are prevalent, is consistent with the biological state of the Tortula cells. The emphasis on protein synthesis in rehydrated Tortula cells compared to those of Physcomitrella is also evident in the comparison of representations within the Translation regulator, Translation factor, and Translation elongation factor activity GO mappings.
Other differences seen in the Molecular Function classification, such as the greater representation within the Tortula collection of clusters involved in Two-component Sensor or Channel/pore class Transporter activity designated mappings, offer novel possibilities for investigation into rehydration metabolism that have not been indicated as important until now. The individual identity of the clusters that map to these categories should offer possible hypotheses that can be tested in our future research.  (24 enzymes). All of these pathways have been previously associated with the cellular recovery processes associated with rehydrated moss gametophytes [14,15]. Within the metabolic activities not represented by the Tortula clusters, only the lack of ascorbate metabolizing enzymes appears unusual, as ascorbate has been well documented as an important metabolite in the protection of moss gametophytes from oxidative damage during stress. This, however, appears to be the result of a limitation in the assignment of EC numbers, since several Tortula clusters show significant similarities to enzymes involved in ascorbate metabolism in the original BLASTx search used to generate the GO mappings. The limitation of the KEGG based classification can also be seen in the poor representation of Tortula clusters in the other KEGG pathways (Genetic Information Processing, Environmental Information Processing, and Cellular Processing), which is surprising given the number of clusters that were identified in the BLASTx search and in the GO databases as being contained within these classifications. As an example, none of the 239 clusters that map in the GO hierarchy as structural components of the ribosome (Table 4) or any of the clusters that mapped to transcriptional and translational components were contained in the corresponding KEGG pathways. 
Thus although useful information can be gleaned from the KEGG classification system and metabolic pathway mappings, especially in practical terms for functional studies using individual clusters, it has some major limitations for drawing any broad based hypotheses from the representation of Tortula clusters within each pathway.

ORF based assessment of Tortula clusters
Of the 5,563 consensus cluster sequences used in the BLASTx search of our database (see above), 40.3% failed to exhibit sufficient similarity (did not meet set criteria, see above) with known sequences to allow for an accurate annotation of the cluster. It is possible that these contigs, rather than containing novel amino-acid coding regions, contain mainly 3' or 5' untranslated regions (UTRs) or coding regions that are so short as to render them incapable of generating a significant similarity score. To investigate this possibility we determined the longest open reading frame (ORF) for three classes of contigs: those with significant similarity scores ("good hits"), those that generated poor similarity scores ("false hits"), and those that failed to generate any scored similarity ("no hits"). We limited the ORF determination to those clusters that contain an AUG codon in the 5' to 3' direction of the clone in any one of the three possible reading frames (the cDNAs were directionally cloned). The results of this analysis are shown in Figure 2. Of the 5,563 clusters generated in the study, 4,789 generated ORFs under the limitations imposed by the analysis. Of these, 2,983 were classified as "good hits", 1,564 as "false hits" and 242 as "no hits". Those clusters that are classified as "good hits" exhibit ORFs that are in general evenly distributed from 20-40 amino acids long to 220-240 amino acids long. The clusters that are classified as "false hits" from the BLASTx search do have a relatively larger proportion of shorter ORFs, in the 20-40 and 40-60 amino acid range, but also a substantial proportion that are much longer. The distribution of ORFs in this category ("false hits") does not appear to be sufficiently skewed from that for the "good hit" clusters to render them incapable of generating similarity scores in the BLASTx search. 
This would suggest that these clusters do contain novel amino acid sequences that are not represented in the public protein databases by sequences sufficiently similar to generate significant HSP Bit scores. In addition, the ORFs in the "no hit" cluster category are distributed in a similar manner to those of the "good hit" classification and so are also likely to represent clusters encoding proteins with novel amino acid sequences. The clusters contained within the "false hit" and "no hit" categories are of particular interest in our search for novel genes and pathways that are associated with the ability of certain plants to acquire vegetative desiccation tolerance.
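The ORF determination described in this section can be illustrated with a short Python sketch. It is not the procedure actually used in the study, whose implementation details are not given; it assumes that an ORF starts at ATG on the sense strand (AUG in the transcript), is searched in the three forward reading frames only, and ends at the first in-frame stop codon or at the end of the sequence. The function names and the 20-residue bin width helper are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start codon included) of the longest forward ORF.

    Assumption: an ORF runs from ATG to the first in-frame stop codon,
    or to the 3' end of the sequence if no stop codon is reached.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = 0  # codons in the currently open ORF (0 means none open)
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current == 0:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = 0
            else:
                current += 1
        best = max(best, current)  # ORF that runs off the end of the read
    return best


def length_bin(n_codons: int, width: int = 20) -> str:
    """Label of the size class an ORF falls in, e.g. '20-40'."""
    low = (n_codons // width) * width
    return f"{low}-{low + width}"


example = "CCATGAAAAAAAAAAAATAAGG"  # ATG + 4 x AAA + TAA in the third frame
print(longest_orf_codons(example), length_bin(35))
```

Restricting the search to the forward frames mirrors the directional cloning of the cDNAs; for a non-directional library the reverse complement would also have to be scanned.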

Conserved gene comparison to Physcomitrella and Arabidopsis
In this analysis the Physcomitrella and Arabidopsis databases were independently used in a BLASTx search using the 5,563 Tortula clusters as query sequences. Even though we used all 5,563 cluster sequences as individual queries, only 3,321 actually represent sequences that could be annotated using the criteria for similarity discussed previously. When the Tortula clusters were used as queries against the Physcomitrella database, 5,554 generated HSP Bit scores; however, only 282 exhibited a level of conservation of sequence that passed the criteria established for annotation of our Tortula clusters (HSP Bit score above 70 and/or E values of 10^-7 or less, see above). Against the Arabidopsis database only 927 Tortula clusters generated HSP Bit scores, but 612 had a level of conservation sufficient to be considered reliable matches to Arabidopsis genes. These observations are difficult to rationalize and may simply reflect the inequality of the target databases. The Physcomitrella database, which is generated from a relatively small EST collection, is much smaller than that for Arabidopsis, which is gleaned from the full genomic sequence. This may explain why, although most Tortula clusters generated HSP Bit scores, only a relative few were capable of producing scores sufficiently high, and with low enough error probabilities, for confident assignment of co-identity with a Physcomitrella contig. Tortula ruralis is more closely related to Physcomitrella patens, as they are both bryophytes, than to Arabidopsis thaliana, but the phylogenetic distance between the two bryophytes is still substantial, which may also help explain the paucity of high value matches and the large number of low value hits. The Arabidopsis database is much more comprehensive and would be expected to generate more significant matches, as observed, and, perhaps because of the evolutionary distance between the two plants, more queries that do not generate an HSP.
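The annotation criteria and the three hit classes used in these comparisons can be expressed as a simple filter. The Python sketch below is an illustration only: it reads "and/or" as an inclusive OR (either threshold suffices), which is an assumption about how the criteria were applied, and the function and constant names are hypothetical rather than taken from HT-GO-GAT.

```python
from typing import Optional

BIT_SCORE_MIN = 70.0  # "HSP Bit score above 70"
E_VALUE_MAX = 1e-7    # "E values of 10^-7 or less"


def classify_hit(bit_score: Optional[float], e_value: Optional[float]) -> str:
    """Assign a query to 'good hit', 'false hit' or 'no hit'.

    Assumption: a hit is reliable if EITHER threshold is met; None for
    either value means the search returned no scored similarity.
    """
    if bit_score is None or e_value is None:
        return "no hit"
    if bit_score > BIT_SCORE_MIN or e_value <= E_VALUE_MAX:
        return "good hit"
    return "false hit"


# Values reported in the text for the chlorophyll a/b-binding protein
print(classify_hit(205, 3.0e-53))  # Tortula vs Physcomitrella
print(classify_hit(82, 8.9e-15))   # Tortula vs Arabidopsis
```

Under a stricter reading that requires both thresholds, the `or` would become `and`; the two readings differ only for hits that satisfy one criterion but not the other.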
The twenty gene products that exhibit the highest level of conservation for both comparisons, Physcomitrella (E values of 0 to -39) and Arabidopsis (E values of 0 to -55), are presented in Table 6. Several common highly conserved genes are present in both comparisons, including genes involved in cell structure (tubulin and actin), protein synthesis (ribosomal proteins and elongation factors), protein turnover (polyubiquitins), stress proteins (heat shock), chromosomal proteins (histones), signal transduction (ADP-ribosylation factor), and binding proteins (calmodulin). Interestingly, there are differences between the two lists, which may reflect the relative phylogenetic distances between the three species. The three most conserved proteins between Tortula and Physcomitrella did not register as highly conserved between Tortula and Arabidopsis; the most conserved protein between Tortula and Physcomitrella, the rRNA intron-encoded homing endonuclease, did not generate a match at all with an Arabidopsis counterpart. The most conserved photosynthesis-related protein between Tortula and Physcomitrella is a chlorophyll a/b-binding protein, which generates an HSP Bit score of 205 and E value of 3.0E-53; the Arabidopsis counterpart generates an HSP Bit score of 82 and E value of 8.9E-15. In this case the level of conservation of this protein within the comparisons appears to reflect the phylogenetic relationships that exist between the three plants. However, the most conserved photosynthesis-related protein between Tortula and Arabidopsis is a photosystem I P700 apoprotein A2 which generates a HSP Bit

Conclusions
Bryophytes have an important and underestimated place in the study of plant responses to water deficits, in particular desiccation tolerance. Bryophytes occupy what we believe to be one of the most primitive states, along with algae, in the evolution of desiccation tolerance and represent, in all probability, the stage in the emergence of plants from a fresh water environment to occupy the various niches available on dry land [13]. Unfortunately, it is only with the development of Physcomitrella patens as a model plant for molecular genetic studies, driven by its particular ability to perform efficient and homologous recombination in vitro, that bryophyte genomics has become a topic of some interest. Physcomitrella is rapidly becoming the model of choice for developmental and transgenic studies [44] because of its ease of manipulation, and indeed there are plans in place to sequence its genome [45]. Tortula ruralis, on the other hand, has long been established as an attractive model for the analysis of environmental stress tolerance, in particular desiccation tolerance. It has been a very useful model in assessing structural, physiological, biochemical and genetic (gene expression) aspects of severe dehydration of plant cells and mechanisms by which primitive plants respond to and survive protoplasmic water loss [15,25,46]. The progression of the Tortula model into genomics, along with a transformation system and an assessment of its ability for efficient homologous recombination, is a critical aspect in its development into a more useful and manipulable model for understanding desiccation tolerance and the nature of extremophiles. 
In addition to the importance of Tortula for stress biology, the establishment of a second and contrasting bryophyte model, especially with regards to genomics, is essential for the validation and usefulness of the information gained from the analysis of the Physcomitrella genome and the general principles gleaned from its use as a plant model. It is to these ends that we have initiated this study into the transcriptome of Tortula gametophytes as they respond to a major stress event, in this case a combination of desiccation and rehydration.
The rehydration transcriptome of Tortula, as defined by the clusters presented herein, is remarkably consistent with what we know about the desiccation response for this bryophyte and its metabolic activity during the first two hours following rehydration. The GO mapping of the Tortula clusters enabled a broad look at what cellular activities appear to be emphasized in the rehydrated gametophytes and in agreement with our previous biochemical analyses highlighted the prominence of the protein synthetic machinery, both in structure and control, membrane structure and metabolism, and the need to reestablish plastid integrity. These observations were bolstered by the comparative GO analysis using the extensive EST collection generated for the desiccation sensitive moss Physcomitrella patens. The GO analysis has also provided fuel for new investigations and hypotheses into the role of other cellular processes, such as membrane transport, phosphorylation and signal transduction, in the mechanisms that enable desiccation tolerance in plants.
Signal transduction is especially intriguing with regards to desiccation tolerance in this bryophyte as it appears to rely on alterations in translational control to effect a response to desiccation, in contrast to the well characterized transcriptional responses, and associated signaling pathways, associated with abiotic stress in the Angiosperms [5,7]. In addition to the functional based analyses, the simple abundance estimates have also correlated well with previous work and have given further credence to the notion that LEA proteins may also play a role in maintaining cellular integrity when water is reintroduced into desiccated plant tissues. The strong correlation between what is known about the mechanism for desiccation tolerance employed by Tortula ruralis and what can be inferred from the analyses of the Tortula rehydration EST collection gives a measure of confidence not only in the value of a bioinformatics approach to gain a view of a particular transcriptome but also in the basis for new hypotheses and research directions that are generated from them.
Although the types of analyses presented here are extremely useful in assessing the types of transcripts present in a particular tissue at a particular time and in response to some perturbation, either external or internal in origin, and in generating hypotheses concerning the functional aspects of the transcriptome, they are intrinsically correlative in nature. In order to gain a more direct picture of the transcriptome and, more importantly with regards to functional assessments, the "translatome", the bioinformatics must be linked to a detailed expression profile of the transcripts represented within the ESTs generated to investigate the particular biological response or mechanism of interest. To this end the clusters described in this report form the basis of a Tortula gametophyte microarray designed for the profiling of both the extant transcriptome and the associated translatome and their response to desiccation and rehydration under various conditions. The array and the experimental design of the expression profiling will allow us to generate a more accurate assessment of both transcript levels and transcript recruitment into the protein synthetic machinery during, and recovery from, desiccation in Tortula ruralis. In combination with the bioinformatics analyses presented here, the expression profiling will allow us to generate a more complete picture of the cellular response of a tolerant plant species, an extremophile, to an extreme abiotic stress event.

Source material
Tortula ruralis ([Hedw.] Gaertn, Meyer and Scherb), also classified as Syntrichia ruralis, gametophytes were collected, harvested, and stored as described previously [27]. For experimental purposes, gametophyte tissue was hydrated for 48 h to fully recover from dried storage and trimmed to remove stem material. Rapid-dried moss was prepared by placing the cropped gametophytes in a closed atmosphere of 0% relative humidity (RH) on 3-mm filter paper over activated silica gel in a Petri dish. This drying regime resulted in the attainment of the air-dried state within 30 min. The gametophytes remained in this atmosphere overnight to ensure desiccation and, prior to library construction, were rehydrated for 2 h in deionized water at 18°C in the light.

Library construction
Total RNA was isolated from the rehydrated gametophytes by a series of phenol extractions as described by Lane and Tumiatis Kennedy [47]. PolyA RNA was isolated from the total RNA fraction by oligo-(dT) chromatography [48], using DynaBeads oligo-(dT25) (Dynal, Inc., Lake Success, NY, USA), through two rounds of selection according to the manufacturer's instructions. The purified polyA fraction was used as a template for double-stranded cDNA synthesis using the Superscript Plasmid System (Invitrogen Life Sciences, Carlsbad, CA, USA). The resultant cDNA population was cloned into the pSPORT1 vector, according to the manufacturer's instructions, to construct the unidirectional rehydration cDNA library. Small scale sequencing of 384 random clones confirmed the directional aspect of the inserts, the plant nature of the source cDNAs, and the frequency of positive clones. The average insert length for the library was assessed at 1.2 kb. A subset of 10,368 randomly picked positive clones (white in a blue-white X-Gal/IPTG based screen) were transferred to individual wells in 384 well plates containing suitable growth medium for storage, replication, and sequencing.

Sequence analysis
High throughput sequencing of the inserts contained in the 10,368 individual clones was performed using "rolling circle amplification" of the individual plasmids to generate suitable sequencing templates at the Joint Genome Institute, Walnut Creek, CA, USA. Clones were sequenced using primers specific for vector sequence upstream of the multiple cloning site and at the 5' end of the cDNA insert. The sequences were delivered as primary binary files (raw trace files), which were then processed through our sequencing pipeline as described below.

Sequence preparation
Trace files were entered into SeqMan II and quality screening was performed with medium stringency, corresponding to a phred threshold value of 12. Vector searching was performed using the pSPORT™ vector in both forward and reverse orientations with a minimum match length of 7, connect distance of 3, maximum register shift of 10, minimum NW percent match of 90, gap weight of 0 and length weight of 2. Contaminant screening was performed with a minimum match of 25. Following the quality screening the sequences were exported as a single FASTA file and polyA tails were removed using a custom Perl script. Several hundred sequences that contained internal polyA stretches were manually identified. These sequences were manually edited to remove polyA and trailing sequences.
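The polyA removal step can be sketched as follows. This is not the custom Perl script used in the study, which is not reproduced here; it is a Python illustration that assumes a polyA tract is any run of at least 10 consecutive A residues (a hypothetical threshold, not stated in the text) and, as a simplification, handles terminal tails and internal stretches with the same rule by discarding the tract and everything 3' of it.

```python
import re

MIN_RUN = 10  # assumed minimum number of consecutive A's treated as polyA


def trim_polya(seq: str, min_run: int = MIN_RUN) -> str:
    """Cut the read at the first run of >= min_run A's.

    The run itself and all trailing sequence are removed, which covers
    both a terminal polyA tail and an internal polyA stretch.
    """
    match = re.search(rf"A{{{min_run},}}", seq, flags=re.IGNORECASE)
    return seq[:match.start()] if match else seq


print(trim_polya("GATCGT" + "A" * 12))           # terminal tail removed
print(trim_polya("GATCGT" + "A" * 12 + "CCGG"))  # internal stretch and trailing bases removed
print(trim_polya("GATAAAACGT"))                  # short A run, left unchanged
```

In practice the threshold matters: too short a run would truncate genuine A-rich coding sequence, which is presumably one reason the internal stretches were inspected manually in the original workflow.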

Contigs
Edited sequences were then entered into SeqMan for assembly into contigs. Assembly into contigs was performed with a match size of 50, a minimum match percentage of 97, and a minimum sequence length of 100.

Clusters
Consensus contig sequences were exported from SeqMan as individual files and entered into a separate SeqMan project and reassembled using the same parameters used for contigs but with a minimum match percentage of 90. These were considered clusters.

EST submission
EST submission to GenBank was performed using the USDA-ARS Livestock Issues Research Unit's (LIRU) High Throughput-Gene Ontology-Genome Annotation Toolkit (HT-GO-GAT). HT-GO-GAT can be obtained from the LIRU website at http://199.133.147.108. A total of 9,159 EST sequences were submitted to GenBank and assigned accession numbers CN200321-CN209479.

Functional genetics
Functional annotations, Enzyme Commission (EC) numbers, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway assignments were assigned to cluster sequences using HT-GO-GAT. Consensus sequences derived from the clusters were entered into the software, which utilizes custom BLASTx, RPS-BLAST, and relational MySQL databases to identify potential functional assignments based upon sequence and functional domain similarity matching. The