Transcriptome profiling using pyrosequencing shows genes associated with bast fiber development in ramie (Boehmeria nivea L.)

Background Ramie (Boehmeria nivea L.), popularly known as “China grass”, is one of the oldest crops in China and the second most important fiber crop in terms of area sown. Ramie fiber, extracted from the plant bast, is important in the textile industry. However, the molecular mechanism of ramie fiber development remains unknown. Results A whole sequencing run was performed on the 454 GS FLX + platform using four separately pooled parts of ramie bast. This generated 1,030,057 reads with an average length of 457 bp. Among the 58,369 unigenes (13,386 contigs and 44,983 isotigs) that were generated through de novo assembly, 780 were differentially expressed. As a result, 13 genes that belong to the cellulose synthase gene family (four), the expansin gene family (three) and the xyloglucan endotransglucosylase/hydrolase (XTH) gene family (six) were up-regulated in the top part of the bast, which was in contrast to the other three parts. The identification of these 13 concurrently up-regulated unigenes indicated that the early stage (represented by the top part of the bast) might be important for the molecular regulation of ramie fiber development. Further analysis indicated that four of the 13 unigenes from the expansin (two) and XTH (two) families shared a coincident expression pattern during the whole growth season, which implied they were more relevant to ramie fiber development (fiber quality, etc.) during the different seasons than the other genes. Conclusions To the best of our knowledge, this study is the first to characterize ramie fiber development at different developmental stages. The identified transcripts will improve our understanding of the molecular mechanisms involved in ramie fiber development. Moreover, the identified differentially expressed genes will accelerate molecular research on ramie fiber growth and the breeding of ramie with better fiber yields and quality. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-919) contains supplementary material, which is available to authorized users.


Background
Ramie (Boehmeria nivea L.), usually called "China grass", is a perennial herbaceous plant in the nettle family, Urticaceae. Ramie fiber, generated from the stem bast, is an important natural fiber in the textile industry because of the ability to retain shape, reduce wrinkling and introduce a silky luster to the appearance of a fabric [1]. Nevertheless, the flaws, such as resistance to dyeing, confined elasticity and elongation potential and the stiff and brittle nature of the cloth, have encouraged breeding of ramie with better fiber quality. Thus, focusing on the developmental process will benefit in improving ramie fiber yield and quality. To increase our understanding of ramie fiber development and related molecular mechanism, researchers have cloned genes that may related, including GalAT [2], Actin [3], CesA [4], COMT [5] and UDPGDH [6]. However, the restricted single-gene cloning strategy has limited benefits. One way to improve the process is the application of the next-generation sequencing platform, including the 454 FLX Instrument, the ABI SOLiD System and the Illumina Genome Analyzer, which have been applied very successfully in previous studies [7][8][9][10][11][12][13]. For example, the ramie universal transcriptome has been determined, with a preliminary quantification of 51 CesA (cellulose synthase) genes [11]. However, there has been no further research related to ramie fiber development, especially at different growth stages.
A previous study on ramie was carried out using the cultivar, "ZhongZhu 1" [11], which is an elite variety with good fiber quality (with a fiber fineness of about 2000 m/g). However, we chose "1504" (with a fiber fineness of about 2800 m/g) as the best qualified material to study ramie fiber development. In field production, the growth period of ramie fiber could be roughly divided into initiation, fast growth and maturation stages. However, these descriptions did not meet the need for exact experimental conditions, and the timings of their boundaries are vague, which vary in the different growth seasons of ramie. Previous studies on ramie used a fixed timeline [11,14], which may be appropriate for other experiments (e.g., for drought stress), but was not suitable for studies of ramie fiber development. Fortunately, previous studies had extracted the fiber of flax from stem bark [15,16], which we then followed as the sampling method.
Plant cell walls are complex structures composed of polysaccharides, proteins, and lignins. Among the cell wall polysaccharides, cellulose is the main load-bearing wall component [17]. It is the first and most important family in the cell wall, especially with regards to fiber growth and development. Genes in the cellulose synthase superfamily have been studied in model plants, such as rice [18] and Arabidopsis [17]. Expansin, known to have cell wall loosening activity and is involved in cell expansion and other developmental events during which cell wall modification occurs [19], plays important roles during cell elongation [20] and expansion [21], and specifically regulates fiber elongation [22] and fruit ripening [23]. Other studies have highlighted the potential relationship between expansin genes and fiber growth in cotton [24,25]. Moreover, the polysaccharide, xyloglucan, is thought to play an important structural role in the primary cell wall of dicotyledons [26]. Consequently, the xyloglucan endotransglucosylase/hydrolase (XTH) family that was shown to be involved in the formation of the secondary cell walls of vascular tissue [27] (especially in regulating cotton fiber elongation [28][29][30]) indicating the importance of this gene family in studying ramie fiber development. In this study, the ramie transcriptome was constructed from mixed samples, and the differentially expressed genes among these samples were used to verify potential genes that may be related to ramie fiber development. We inferred that the early developmental stage (represented by the top part of the stem bark sample) of ramie fiber growth may play a more important role in the whole developmental period, and the 13 genes that all-up-regulated (respectively up-regulated in one of the four samples than rest three) in sample T (top part of the bast), especially the four from the expansin (two) and XTH (two) gene families, should be subjected to further in-depth characterization.

Results
De novo assembly of the transcriptome library The read lengths of four separately pooled samples, which were evenly distributed throughout the ramie stem (stem shoot with Leaves: L; Top part of stem bark: T; Middle part of stem bark: M; Bottom part of stem bark: B), as well as the overall assembled sequences, are shown in Table 1 and Additional file 1. The average read lengths were 449, 453, 442 and 460 bp for the four samples, respectively. The whole run generated 1,030,057 reads with an average length of 457 bp. After removing the low-quality reads (Q value <20), the clean reads were assembled de novo by the Newbler software, which is frequently used in de novo pyrosequencing projects [31], using default settings. As a result, 17,881 contigs and 52,996 isotigs were generated. After elimination of redundancy, 58,369 unigenes (13,386 contigs and 44,983 isotigs) were generated; the contigs had an average length of 1,343 bp (Table 1). Among the unigenes, 51,025 (87.42%) and 8,025 (13.75%) sequences were longer than 200 bp and 1000 bp, respectively. The lengths of the assembled unigenes ranged from 90 to 7641 bp (Additional file 1).

Gene annotation and functional pathway construction
The GetORF tool from EMBOSS [32] was used for gene prediction, which identified all of the assembled contigs and isotigs to encode protein sequences and were qualified for further annotation. The predicted protein sequences were then annotated to the non-redundant protein databases in GenBank and Swiss-Prot using BLASTp under the threshold of 1e-5. The first entry was regarded as the annotation information for the corresponding unigenes. However, there might be a situation where the "true" BLAST result was concealed by the first non-informative entry (e.g., an unnamed protein product) when choosing the first entry as the annotation information. To overcome this obstacle, we repeated the BLAST of the unigenes that had been annotated to meaningless results, and the output results were expanded to 15 entries (if possible) for manual selection. As a result, 156,970 entries from 11,555  [33]. In addition, the predicted protein sequences were compared to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [34] using a bidirectional BLAST under the threshold of 1e-5, which resulted in a total of 3076 KO (KEGG Orthology) numbers that corresponded to particular sequences. The pathway distribution is shown in Additional file 7 and detailed in Additional file 8.

Differential expression redundancy and enrichment analyses
As indicated in Table 2 Table 2). This suggested that either the degree of fiber development was similar between samples M and B or gene regulation only has a minor influence on fiber growth during the later developmental stages. Among the differentially expressed unigenes, the enrichment of GO terms (Table 3) and KEGG pathways ( Table 4) was dependent on the ratio of divergence to all unigenes (false discovery rate (FDR) <0.01). As indicated in Table 3, the two conjunct GO terms enriched in sample T (compared with L, M and B) were the external encapsulating structure (GO: 0030312) and the extracellular region (GO:0005576). Thus, the enrichment of conjunct GO terms in one of the four samples compared with the other three happened only in sample T, which implied that the early stage of ramie fiber development (represented by sample T) may be more sensitive to molecular regulation. Meanwhile, the numbers of unigenes that made up these two GO terms were fairly small (388 for the external encapsulating structure and 681 for the extracellular region, shown in Additional file 5 and listed in Additional file 6). Additionally, four KEGG pathways: pathogenic Escherichia coli infection; the gap junction; the photosynthesis-antenna proteins and the cutin and the suberine and the wax biosynthesis pathways (Table 4) were simultaneously enriched in sample L compared with the other three samples.

Unigenes up-regulated in the top part of the bast (sample T)
Among all the unigenes that had differential expression patterns, most of the unigenes that belong to the cellulose synthase (four out of five), expansin (three out of four), and the XTH (six out of eight, listed in Additional file 9) gene families were included in the all-up pattern from sample T. Specifically, isotig01514_77, isotig06919_78, iso-tig06943_25 and isotig07154_19 (Figure 2A, B, C and D, respectively) from the cellulose synthase superfamily; iso-tig10345_23, isotig02054_18 and isotig01251_6 ( Figure 2E, F and G, respectively) from the expansin family; and HRX1MBH01BLWDW_9, isotig00610_25, isotig01663_22, isotig04660_8, isotig09773_10 and contig04902_19 ( Figure 2H, I, J, K, L and M, respectively) from the XTH family were highlighted for their coincident all-up expression patterns in sample T (up-regulated in sample T than in samples L, M and B, respectively).
Therefore, we hypothesized that the early fiber developmental stage, represented by sample T (Figure 3), has a more important role in ramie fiber growth than the other stages. Subsequently, these 13 unigenes were picked out for quantitative real-time reverse transcription PCR (qRT-PCR) quantification using similarly pooled samples from the first season (FirS), the second season Table 2 Numbers of unigenes that had differential expression patterns between each two-sample combination (SecS) and the third season (ThiS). The overall expression quantities are shown by their colors ( Figure 2N) that were translated from the quantification results (−log 10 E). Generally speaking, the expression pattern trends for FirS ( Figure 2A1 to M1) were similar to the Reads Per Kilobase of exon model per Million mapped reads (RPKM) values (A0 to M0), whereas there were differences between every unigene in terms of timing, which coincided with the phenomenon that ramie fiber has differing qualities during each harvest period [35].   When we compared the 13 unigenes to investigate whether they were included in the two GO terms that were enriched in sample T (GO: 0030312 and GO: 0005576), six (isotig01251_6 and isotig02054_18 from the expansin family and isotig00610_25, isotig01663_22, isotig04660_8 and isotig09773_10 from the XTH family) were assigned to both GO terms, while two (isotig10345_ 23 from expansin and HRX1MBH01BLWDW_9 from the XTH families) were only included in the former GO term (GO: 0030312) and one (contig04902_19 from XTH family) was only assigned to the latter one (GO: 0005576). None of the specifically expressed cellulose synthase genes (isotig01514_77, isotig06919_78, isotig06943_25and iso-tig07154_19) were assigned to either of the two GO terms (Additional file 6).

Additional features included in the transcriptome data
After annotation, the statistical numbers of sequences from different species that matched ramie unigenes were calculated from the annotation features. As displayed in Figure 4, the five most abundant species were Vitis vinifera (32.05%), Ricinus communis (22.28%), Glycine max (14.17%), Medicago truncatula (6.39%) and Arabidopsis thaliana (4.53%), representing around 80% of all the species that were annotated. This was very similar to previous results for ramie transcriptome sequencing [11]. Any dissimilarity may be attributed to the differences in the parts sampled in this study. In addition, the codon usage frequency was calculated (Additional file 11) for homologous sequence cloning using the COnsensus-DEgenerate Hybrid Oligonucleotide Primer (CODEHOP) method [36].

Sample preparation and pyrosequencing
Ramie is a non-model highly heterozygosis perennial crop, known as "China grass". Little previous genetic research has been conducted on this crop, especially on fiber development. To standardize experimental conditions, a fixed timeline was set in previous studies [11,14]. However, the growth periods vary between ramie plants in different seasons; therefore, it is difficult for researchers to ascertain the exact standard (fixed timing or assured phenotypes) for sampling that represents the different fiber growth stages. In this study, we used the sampling method that has been successfully applied to flax [15,16]. Accordingly, samples that represented no bast fiber cell formation (sample L), initial thickening (sample T), obvious thickening (sample M) and end of thickening (samples B) were generated ( Figure 3A, B, C and D, respectively) from different parts of ramie from the start of fiber ripening (the aboveground stem about to turn pitchy, as indicated in Figure 3E). Correspondingly, the pyrosequencing output presented obvious and interesting dissimilarities among the four samples, represented as differentially expressed genes ( Table 2 and Figure 1), and differentially enriched GO terms (Table 3) and KEGG pathways ( Table 4). The 454 FLX + platform produced average read lengths in our library of 457 bp, which improved the subsequent de novo assembly results. However, the vague boundary of our sampling standard (difficult to ascertain the exact boundary among the four samples through different timings, as well as the initiation of the fiber ripening stage) resulted in the quantitative results in FirS ( Figure 2A1 to M1) being dissimilar to the RPKM values ( Figure 2A0 to M0), which suggested that our sampling method could be further standardized for future studies of ramie fiber development.

The three gene families participating in ramie fiber development
Cellulose is the main component in plant cell walls [17]. Therefore, genes in the cellulose synthase superfamily, which have been shown to be associated with primary [37][38][39] or secondary [40][41][42] cell wall formation in Arabidopsis, were the first genes we investigated. A previous study identified 51 expressed sequence tags that belong to the cellulose synthase family [11] in ramie. Along with cellulose, polysaccharide xyloglucan is thought to play an important structural role in the primary cell walls of dicotyledons [26], which means that the XTH family, proven to have a function during the formation of secondary cell walls in vascular tissue [27] (especially in regulating cotton fiber elongation [28][29][30]), would be important in ramie fiber development. Finally, expansin, which has important roles in cell elongation [20] and expansion [21], is also associated with cotton fiber development [22,24,25]. Thus, we concentrated on the above three gene families in this study. As a result, most of the differentially expressed unigenes from these three gene families shared the all-up expression pattern in sample T (four out of five unigenes from the cellulose synthase superfamily, three out of four unigenes from the expansin gene family and six out of eight unigenes from the XTH gene family; Additional file 9), which showed that the early stage of ramie fiber growth (initiation of fiber cell thickening) is the most important for molecular regulation (especially for these three gene families) and for fiber development. In total, 96 unigenes showed the all-up expression pattern in sample T (Additional file 9), which should be subjected to further studies as a priority.

Expression quantities of the unigenes that shared the all-up pattern in sample T
The variant numbers of all-up expressed unigenes (those that were respectively up-regulated in one of the four samples compared with the other three) in samples L (99) and T (96) compared with samples M (20) and B (13, listed in Additional file 9) were interesting. This result suggested that ramie grows more vigorously at the start of stem fiber growth. Furthermore, the results that: (1) two GO terms (GO: 0030312 and GO: 0005576) were concurrently enriched in sample T compared with the other three samples (samples L, M and B); and (2) most unigenes belonging to the three gene families (cellulose synthase superfamily, expansin gene family and XTH gene family) were all-up-regulated in sample T also suggested that the top part of the stem bark (sample T) was the most important part for fiber growth.
The results of qRT-PCR analysis of these 13 unigenes from similarly assayed samples across the whole growing season (FirS, SecS and ThiS) were interesting. First, similar trends between the FirS and RPKM results (A1 to M1 and A0 to M0, respectively) could indicate the effectiveness for the whole transcriptome data. Nevertheless, the quantitative result for isotig09773_10 was Figure 2 Relative expression quantities of the 13 unigenes. The relative expression levels within the four parts (samples L, T, M and B) among the 13 unigenes (columns A to M, which contained four unigenes (isotig01514_77, isotig06919_78, isotig06943_25 and isotig07154_19, A to D) from the cellulose synthase superfamily, three unigenes (isotig10345_23, isotig02054_18 and isotig01251_6, E to G) from the expansin gene family and six unigenes (HRX1MBH01BLWDW_9, isotig00610_25, isotig01663_22, isotig04660_8, isotig09773_10 and contig04902_19, H to M) from the XTH gene family, respectively) at three seasons (FirS, SecS and ThiS, which displayed in rows 2 to 4, respectively) are shown by line charts (A1 to M3). Each line chart (A0 to M3) was formed as the order of samples L, T, M and B (X-axis) for their quantitative results (Y-axis). Quantitative qRT-PCR was carried out (using the primers listed in Additional file 12) on the four sample parts (L, T, M, B) in the first (FirS), second (SecS) and third (ThiS) seasons with three biological replicates. The expression levels (using GAPDH as control) were transformed (−log 10 E) into colors (N) in order to assess their overall expression levels. Meanwhile, the significance analyses were performed within each unigene among four samples, using the Holm-Sidak method of Sigmaplot software, under the P-value of 0.05. The RPKM values that were obtained from the transcriptome sequencing results are also shown (row 1, A0 to M0). very different ( Figure 2L1) to RPKM result (L0), and we could not determine the precise cause of this difference.
Second, despite the similar trends (expressed more abundantly in sample T than in samples M and B) for each unigene between FirS ( Figure 2A1 to M1) and the RPKM results (A0 to M0), the relative expression quantities of each unigene in sample L were universally higher from reflected data (FirS) than that from the predicted data (RPKM). The sampling method was implicated as the origin of this discrepancy. As we previously mentioned, the stem shoot apex, including the top three (or so) young leaves, comprised sample L and contained no bast fiber ( Figure 3A). Yet the upper part of the ramie stem (including sample L and sample T) grows more vigorously (more unigenes shared the all-up pattern in samples L and T), which made it more sensitive to environmental elements, such as temperature and sampling time. Additionally, despite the uniform sampling lengths and the validation of the slicing results (Figure 3), samples that were generated at different times (the samples for transcriptome sequencing and the corresponding quantitative experiments in FirS were generated in 2012 and in 2014, respectively) were still different.
Finally, despite the similarity in trends between the data from FirS ( Figure 2A1 to M1) and RPKM (A0 to M0), which assured the reliability of the transcriptome results, discrepancies rose between the growth stages (FirS, SecS and ThiS). To determine their possible relationship with ramie fiber quality, further analysis was conducted. Among the 13 unigenes that were chosen for further quantitative experiments, six of them (iso-tig01251_6 and isotig02054_18 from the expansin family and isotig00610_25, isotig01663_22, isotig04660_8 and isotig09773_10 from the XTH family) were assigned to both GO terms (GO: 0030312 and GO: 0005576) that were enriched in sample T compared with the other three samples (samples L, M and B, displayed in Table 3). Furthermore, four of the six (isotig10345_23, isotig02054_18, isotig01663_22 and contig04902_19) unigenes shared congruent expression patterns in SecS and ThiS ( Figure 2E2, E3; F2, F3; J2, J3 and M2, M3, respectively). If this phenomenon was relevant to the differences in ramie fiber quality during the three seasons, the slightly higher relevant expression quantities of sample T (compared with sample M or sample B) in SecS compared with   Figure 2N) could further confirm this hypothesis. In support, GhExp2, which shared a high similarity with isotig02054_18 (based on the sequence BLAST results, data not shown), is expressed specifically in developing cotton fiber [24], while the sequence similarity was high between con-tig040902_19 and AtXTH31/ AtXTH32, which are highly expressed in tissues undergoing elongation/stem and shoot apical system [43]. Furthermore, considering the importance of xyloglucan-cellulose cross links in modulating the strength and extensibility of the primary plant cell wall, which is a key feature of classical models of this composite structure [44,45] and that xyloglucan endo-transglycosylase activity, along with expansins, are the primary catalyst in cell wall loosening [43], we suggest that the early ramie fiber developmental stages were more sensitive to molecular regulation, which was reflected by the up-regulations of expansin and XTH genes (which control the deposition of cellulose fibrils, etc.). However, we could not draw any conclusions on whether these four unigenes were specifically relevant to ramie fiber fineness or crystallinity based on our limited results. Further studies should focus on the functional analysis or diversity among different cultivars during different seasons to improve our understanding of ramie fiber development. Finally, future studies should focus on identifying simple sequence repeats in unigenes of interesting sites and designing primers from them, to further our understanding of the underlying molecular systems.

Additional features included in the transcriptome data
Although lacking of duplication in transcriptome data construction, the effectiveness of our transcriptome sequencing result was assured by a comparison of the quantitative output for FirS in 2014 ( Figure 2A1 to M1) to RPKM values ( Figure 2A0 to M0) conducted in 2012. Meanwhile, manual selection significantly improved the annotation information (Additional files 2, 3 and 4). In addition, the four most abundant species that were annotated to Vitis vinifera, Ricinus communis, Glycine max and Medicago truncatula ( Figure 4) were similar to previous results (Vitis vinifera, Ricinus communis, Populus trichocarpa and Glycine max) [11], in which the authors declared that the Malpighiales (Ricinus communis and Populus trichocarpa), Fabales (Glycine max) and Rosales (ramie) are commonly placed in the superorder of rosids, while the relationship between Vitales (Vitis vinifera) and Rosales (ramie) is probably closer than that between Rosales, Malpighiales and Fabales [11]. In this study, our results further supported the relationship between Vitales (Vitis vinifera), Rosales (ramie), Malpighiales (Ricinus communis) and Fabales (Glycine max and Medicago truncatula). However, Populus trichocarpa did not appear in the most abundant annotated species list, which may indicate a dissimilarity of stem growth between ramie and Populus trichocarpa, especially in bast fiber development or cell wall thickening.

Conclusions
In this study, transcriptome sequencing of samples from different ramie fiber developmental stages was carried out via pyrosequencing. The 58,369 unigenes (13,386 contigs and 44,983 isotigs) that were generated by de novo assembly had a high correlation with ramie fiber development, which increased our knowledge of the molecular mechanisms underlying bast fiber development in ramie. Most of the differentially expressed genes from the three targeted gene families (the cellulose synthase superfamily, the expansin gene family and the XTH gene family) shared an all-up expression pattern in sample T, which suggested that the early fiber developmental stage may play a crucial role in molecular regulation throughout the entire fiber growth period. These 13 unigenes, especially the four of them that from the expansin (two) and the XTH (two) families, could be applied in further in-depth expression and functional analyses. In addition, the sampling method used in this study could be applied and further standardized in other ramie fiber development studies.

Sample preparation and RNA extraction
Ramie cultivar 1504 (with a fiber finesse of 2800 m/g) was transplanted with pot from our Germplasm Resources Garden and was grown under natural conditions in 2010. Samples from the stem shoot (with about three leaves, sample L), the top part of stem bark (sample T), the middle part of stem bark (sample M) and the bottom part of stem bark (sample B, representing different stages of ramie fiber development, as displayed in Figure 3) were collected when the aboveground stem was turning pitchy in May 2012 (the first season of ramie fiber growth). Each sample was made up from three plants and consisted of samples that were about 5 cm long. They were mixed and frozen in liquid nitrogen for subsequent RNA extraction. Total RNA was separately extracted from the four samples, based on the method described previously [46]. Afterwards, the RNA quality was confirmed by gel electrophoresis and by NanoDrop 2000 spectrophotometer (Thermo, MA, USA).

High throughput transcriptome sequencing
Transcriptome sequencing was performed on four equally pooled samples at Hanyu Biotech, Shanghai, China (http://www.hanyubio.com/) using the Roche 454 FLX + platform, according to the product manual. Briefly, after digestion by DNase I (Ambion, USA) at 37°C for 1 h, poly (A) RNA was isolated from 20 μg of total RNA using a Micropoly(A) Purist™ mRNA purification kit (Ambion, USA). Afterwards, cDNA was synthesized, based on a method described previously [47] with improvements: firstly, the first-strand cDNA was synthesized by Superscript II reverse transcriptase (Invitrogen, USA) at 42°C for 1 h using 10 μg of total RNA as template and GsuI-oligo dT as primer. Secondly, Dynal M280 magnetic beads (Invitrogen, USA) were used to select the mRNA/cDNA that ligated with biotin, which was performed by oxidizing the 5′ cap of mRNA using NaIO 4 (Sigma, USA). After that, the first-strand cDNA was released by alkaline lysis, then added adaptor at the 5′ end by T4 DNA ligase (TAKARA, Japan) and synthesized the second-strand cDNA by Ex Taq polymerase (TAKARA, Japan). Finally, the polyA tail at 3′ end and the adaptor at 5′ end were removed by Gsu I. Before loading into the Roche 454 Genome Sequencer FLX + machine, the double-stranded cDNA with lengths ranging from 300 bp to 800 bp (broken by ultrasonoscope (Fisher, USA) and purified by Ampure beads (Agencourt, USA) immediately) were converted into a single strand template DNA (sstDNA) library using a GS DNA Library Preparation kit (Roche Applied Science, USA), and then fixed onto magnetic beads using GS emPCR kit (Roche Applied Science, USA). A whole run was performed from equivalent mixture of the four separately pooled samples. The sequence data generated in this study were deposited at the NCBI in the Short Read Archive (SRA) database under the accession number SRP040605.
De novo assembling, annotation and GO terms/KEGG pathway construction Raw reads, obtained from 454 pyrosequencing, were preprocessed by removing low quality reads (Q value <20) and assembled by Newbler under default parameters. The predicted protein coding sequences (obtained using the GetORF tool in EMBOSS [32]) were annotated to the non-redundant protein databases in GenBank and Swiss-Prot by BLASTp with a threshold of 1e-5. GO mapping was carried out using GoPipe software [33]. The predicted protein sequences were compared with the KEGG database [34] to obtain the KO numbers using bidirectional BLAST with threshold of 1e-5.

Differential expression redundancy and enrichment analyses
After assembly and annotation, the universal reads from the four separately pooled samples were mapped to unigenes, and the RPKM values [48] were calculated. Then the differences between each unigene within the four samples were determined from the RPKM values using the MA-plot-based method with Random Sampling (MARS) model from the DEGseq program package [49], with a threshold of p-value less than 0.001. The differentially expressed genes were used in the GO terms/KEGG pathway enrichment analysis using the hyper geometric test to measure the significantly enriched terms. The formula was: P ¼ 1− Each sample was duplicated three times. Dissociation curve analysis was performed after each assay by ramping from 55°C to 95°C, at a rate of 0.5°C for 81 cycles and with each cycles held 6 s. Relative expression levels were calculated as described previously [50]. All primers for qRT-PCR (Additional file 12) were designed by the primer3 online tool (http://primer3.ut.ee/) and adjusted using oligo software version 7.56 [51]. GAPDH was selected as internal control based on preliminary experiments (data not shown).