Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression
- Elie Raherison†1,
- Philippe Rigault†2,
- Sébastien Caron†1,
- Pier-Luc Poulin1,
- Brian Boyle1,
- Jukka-Pekka Verta1,
- Isabelle Giguère1,
- Claude Bomal1,
- Jörg Bohlmann3 and
- John MacKay1Email author
© Raherison et al.; licensee BioMed Central Ltd. 2012
Received: 3 February 2012
Accepted: 11 July 2012
Published: 29 August 2012
Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues.
An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity was observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs.
Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group comprised of conifer trees.
Microarray (MA) transcript profiling and RNA sequencing (RNA-Seq) represent powerful approaches to rapidly gain functional information on a genome-wide scale. Information on RNA transcript abundance is a key to assessing the biological role of gene products and cannot be directly deduced from a gene’s sequence. This has lead researchers to develop databases of RNA abundance profiles, first and foremost for model organisms. For example, the AtGenExpress database was created for the model-plant Arabidopsis from a host of tissue preferential and stress response expression profiles . Databases such as AtGenExpress are particularly useful for the identification of groups of co-expressed genes. Other plant oriented databases include the poplar PopGenIE made up of tissue, developmental and stress response profiles . Reflecting the value of gene expression data, public organizations and institutes also maintain generic databases like the Gene Expression Omnibus or GEO (NCBI) and ArrayExpress (EBI), among others, which host datasets from a wide array of organisms.
Recent transcriptome-wide analyses underscore the importance of gene expression in the genetic architecture of complex traits. Studies in fruit flies, mice, humans and maize show that a proportion of the genetic variants underlying complex phenotypes exert their effects through gene expression [3, 4]; so, discovering the genetic basis for the variation in transcript abundance is central to understanding phenotypic variation . Gene expression studies also provide insights into the molecular impacts of natural selection. For example, expression profiling showed the differential action of selection pressure on different tissues and organs in humans . A comparative analysis of mouse and human showed a high level of conservation in the expression of orthologous genes, showing the stability of house-keeping genes and the variability of tissue specific genes .
Transcriptome profiling is facilitated by the availability of a reference genome but many studies have also been based on large-scale cDNA sequence datasets. In plants, many angiosperm genomes have been sequenced, including the model plant Arabidopsis thaliana , rice , poplar  and grapevine ; however, reference genomes are still lacking for plant phyla belonging to the gymnosperms. The best studied gymnosperms are conifers, which as a group have extremely large genomes (ranging from 13 to 30 Gb). In conifers including pines (Pinus spp.), spruces (Picea spp.), Douglas-fir (Pseudotsuga menziesii) and Japanese cedar (Cryptomeria japonica) over 1 million expressed sequence tags (ESTs) have been obtained from dideoxy sequencing and assembled to infer putative unigenes or transcript sets (reviewed in ). Large collections of cDNAs are available for white spruce (Picea glauca)  and Sitka spruce (P. sitchensis) . From 30% to 40% of conifer sequences cannot be annotated because they lack sequence similarity to known genes [13–16].
This report describes a large-scale oligonucleotide microarray developed from the extensive cDNA datasets available for spruce trees (P. glauca, P. sitchensis) to achieve broad transcriptome coverage. Previously, microarrays were developed from PCR amplicons (cDNA microarrays) primarily in pines and spruces (reviewed in ). Many of the cDNA microarrays have ranged from a few hundred to several thousand cDNAs, and a few of them have included over 20,000 spots, i.e. in Picea sitchensis  and Pinus taeda . They have essentially been used in comparative experiments (using two-dye designs) to investigate transcriptome remodeling during tissue differentiation, development, or in response to environmental cues , but a general characterization of conifer transcriptomes has been lacking.
A major goal of the present study was to assemble transcript profiles from spruce trees (Picea spp.) into a database called PiceaGenExpress, aiming to characterize the basic features of a conifer transcriptome such as the number of transcribed genes in a variety of tissues. The reference profiles in PiceaGenExpress enabled exploratory analyses of the diversity of expression patterns within and among gene families and the expression of retrotransposons. In addition, conservation of gene expression in secondary vascular tissue was studied based on interspecific comparisons of tissue preferential expression. The accuracy of microarray profiles and design was directly evaluated by RNA-Seq analysis of the same samples as those used for one of the microarray experiments included in PiceaGenExpress.
Results and Discussion
Development of an oligonucleotide microarray for spruces (Picea spp.)
Development a large-scale oligonucleotide array for spruces ( Picea spp ): sequence information used to design oligonucleotide probes from Picea glauca and P. sitchensis sequences
Number of probes
Confirmation of P. glauca (Pgl) cDNA clone or other
Confirmed in Pgl and Psi cDNAs
Confirmed by Pgl 454 seqs or Psi cDNAs
Confirmed by Pgl 454 seq
Unconfirmed but Pgl clones ≥ 2
Psi full-length cDNA only (not found Pgl)
Analysis of probes: sequence similarity of probes aligned to Picea glauca and P. sitchensis sequences
Number of probes
Differential expression in the vascular transcriptome is conserved among Picea species
Different MA experiments comparing xylem and phloem tissues in angiosperms including Arabidopsis [19, 20] and poplar , and in P. glauca have helped to delineate groups of genes whose expression was of particular relevance to secondary vascular growth. A MA profiling study in Arabidopsis root-hypocotyl defined a set of 319 genes specifically regulated in secondary xylem compared to phloem or non-vascular tissues . In young spruce trees, 360 sequences were shown to be xylem preferential compared to needles and phloem . A core set of 52 xylem genes was identified by Ko et al.  based on transcriptome analyses of secondary xylem in Arabidopsis thaliana and poplar, and of cotton fibres. The expression patterns reported here for three spruces indicated that tissue preferential expression for xylem compared to phelloderm were conserved among spruces. These conserved patterns could be the basis for comparative genomics of conifers and angiosperms trees. This finding is also relevant for studying the genetic architecture of wood traits because it was shown that xylem preferential expression was a feature of genes associated with genetic variation in wood properties .
Validation of the microarray and profiling results by RNA-Sequencing
Total or Both
Total reads (HQ) (millions)
Reads mapped (millions)
Genes - RPKM1 > 1
Genes - RPKM > 3
Validation of microarray results by RNA-Seq
RNA-Seq and MA2
RPKM > 1
Tissue specificity in RNA-Seq
Genes RPKM > 1
Genes RPKM > 3
PiceaGenExpress contains reference profiles that reveal patterns of tissue preferential expression
The PiceaGenExpress database: sample characteristics, hybridizations and detected genes
Non-annotated genes are expressed at low levels and in fewer tissues
Second, our data showed that annotated and non-annotated genes were strongly contrasted in regard to the number of tissues in which transcripts were detected (Figure 4D). The majority of the annotated sequences were detected in seven or eight of the tissues. In contrast, the non-annotated genes were much more likely to be found in few tissues. This observation is consistent with the idea that less conserved (including non-annotated) sequences may generally play more specialized roles. It is also consistent with findings from comparative expression studies of mice and humans showing that house-keeping genes were more highly expressed and were more conserved among species both in terms of their sequence and their expression .
Diversity of expression profiles varies within and among gene families with related functions
Expression profiles of the 49 gene sequences containing a LEA domain also varied between tissues and appeared more diversified than observed for dehydrins (Figure 5B). The extent of divergence among the members in each family was analyzed by clustering and determination of Euclidean distances among the sequences (Figure 5C). Overall most of the nodes (38 out of 49) were separated by a greater distance in the LEA family than in the dehydrin family, indicating that during normal development, the regulation of the dehydrin family members is less diversified. This observation may point at greater functional diversification among LEAs.
Both dehydrins and LEAs have been shown to be expressed during normal development and to be water stress responsive in conifers [18, 27, 28]. Our study only considered expression during normal development. Previous studies in conifers have monitored the expression of a small subset of these large gene families. For example, five and eight dehydrins transcript sequences were studied in foliage of maritime pine  and vegetative buds of Norway spruce, respectively . We detected 22 distinct sequences by MA profiling in young foliage, strongly suggesting that a comprehensive view of this protein family in response to stress remains to be developed. The expression profiles presented here indicated that osmotic-regulation during normal development may involve more genes (especially dehydrin genes) and potentially more varied functions in some of the tissues (megagametophytes, roots and phelloderm) than others (foliage, embryos). This observation indicates that osmotic response monitoring, whether it is related to drought conditions  or to normal developmental processes  may be sensitive to tissue preference. In the PiceaGenExpress dataset, several dehydrins were down regulated in vegetative buds that were sampled at the time of bud flush in the spring, as was observed through detailed time series analyses of Norway spruce dehydrins . An interesting feature of the data is that the two seed derived tissues, i.e. immature somatic embryos and megagametophytes from germinating seeds, were highly contrasted in regard to the expression of specific dehydrin and LEA sequences.
Expression of LTR retrotransposon sequences
The MA sequences matching putative LTR were estimated to represent up to 78,520 unique copies in the P. glauca genome (Figure 6B) and a total of 1.18 Million sequences (not shown) based on their occurrence in shotgun sample sequencing data for P. glauca . Data are not currently available to estimate what proportion of the genome these sequences may represent; however, if the sequences were derived from an intact LTR of 4000 bp on average, they would represent nearly 5 Gbp or 20% of the P. glauca genome. In other words, such a large number of copies is expected to occupy a sizable fraction of the genome. The number of predicted copies did not appear to correlate with transcript accumulation, i.e. number of tissues in which they were detected or relative levels (Figure 6A, see right panel; 6B). This observation indicates that the data were more likely to result from transcription than genomic contaminations of the RNA samples. It also shows that the number of copies accumulated in the genome is not useful for predicting the level of transcript production. In fact, many of the high copy sequences (>30,000 copies) were not detected at all in any of the tissues. These results point to specific LTR sequences that have the highest potential for mobilization in P. glauca.
An oligonucleotide microarray was developed from P. glauca and P. sitchensis datasets. It represents 23,853 unique P. glauca genes or 85% of the recently reported gene catalogue . Single dye analysis and a ranking procedure were used to develop PiceaGenExpress, a database of reference transcript profiles, based on 150 hybridizations in eight different tissue sample types. These data represent the first resource of this kind for a gymnosperm plant.
The pine family comprises over two hundred species belonging to eight genera. It is the largest and the most economically important of the conifers. Interspecific comparison experiments presented in this report indicated that the microarray may be applied to at least three of these genera. It could be a valuable tool for species where cDNA resources are lacking or underdeveloped. Our findings also indicate that expression profiles from P. glauca are likely to be representative of other conifers. RNA-sequencing has become a method of choice for transcriptome profiling but the analysis of RNA-Seq data can be complex owing to factors such as sequence polymorphisms, gene paralogs, and alternate splicing. Therefore, successful application of RNA-Seq depends on the availability or the development of a good quality reference genome or gene catalogue , and both are lacking in many non-model species. By mapping P. glauca RNA-Seq and microarray data back to the same cDNAs sequences, we were able to show that the two methods were highly congruent. New sequencing technologies promise to generate high quality sequences in addition to very large volumes of data. Given the large size of conifer genomes, it may be advantageous to use these technologies to develop high quality gene catalogues rather than attempt to assemble entire genomes. Once a gene catalogue is produced, these high throughput sequences represent a powerful methodology for unrestricted gene expression studies .
Conifer genomes have a number of characteristics that make them unique, most prominently their enormous size which can reach or even exceed 30 Gbp  and their highly repetitive sequences . They are also known to have high levels of heterozygocity and are believed to harbour many gene paralogs, at least in some gene families . For the MA described here, the empirical tests of probe specificity (Additional file 2: Figure S1) and the probe design parameters appeared sufficient to discriminate between paralogs that are known in the P. glauca catalogue of 27,720 genes . The very high level of congruence observed between MA and RNA-Seq results, 99.3% of confirmation of tissue preference from over 2100 genes, also suggest that the MA results accurately reflect the expression of the target sequences. In contrast, issues of cross-hybridization between gene paralogs with different expression patterns would have likely resulted in a lower validation rate.
The tools and methods presented in this report may lead to diverse applications for fundamental discovery in forest genetics and evolutionary biology, such as understanding phenotypic variation in economic and adaptive traits. The genetic architecture of complex phenotypes in plants and trees is routinely probed by scanning the genome for DNA sequence polymorphisms through QTL mapping and association studies . However, Huang et al.  summarized several recent studies by stating that discovering the genetic basis for the variation in transcript abundance was central to understanding phenotypic variation. Examining the genetic architecture of gene expression can provide functional insights into physiology and metabolism, for example by revealing the organization of gene networks [35, 36].
Evaluation of array design and manufacture parameters with test oligonucleotide microarray
A custom MA comprised of 3,900 oligonucleotide probes was developed to evaluate the impact of design parameters (for details see Additional file 1, Additional file 2: Figure S1 and Additional file 3: Figure S2). It contained multiple probes for 929 distinct genes as well as cDNA amplicons for 96 of those genes. The parameters tested include oligonucleotide lengths of 50, 60 and 70 nucleotides, the position of probes within the transcript, the impact of SNPs and indels. The presence of one or three SNP mismatches (distributed throughout the oligonucleotide probes) had a small effect on hybridization signal intensities and in many cases the expression ratio between tissues was largely conserved (See Additional file 2: Figure S1). In contrast, the presence of seven SNP mismatches in the probe had a large effect on intensities and expression ratios. Based on these data, 70-mer probes that vary by up to three SNPs distributed throughout the probe (95.7% sequence identity) are expected to hybridize to the same RNA and on average produce a signal of similar strength, where as probes with seven SNPs or more (less than 90% sequence identity) give very little cross hybridization. These observations establish thresholds of sensitivity to sequence variation, i.e. up to three mismatches have little impact on sensitivity, and specificity, i.e. specificity is achieved with seven mismatches or more to other sequences. The cut-off is situated between four and six SNPs, i.e. between 95% and 91% sequence identity. The presence of short insertions and deletions (up to six nucleotides) located at the center of the 70-mer probes had a small impact on probe performance. The impacts of other parameters tested were generally small and less predictable.
The impact of different spotting buffers and surface chemistries used to manufacture the microarray were also assessed in regard to image quality and data reproducibility. For details on methods and findings, see Additional file 1. We found that the optimum conditions were obtained by using aminosilane coated slides and 3X SSC without betaine as a spotting buffer.
Microarray design and manufacture
The sequences included in the microarray were selected on the basis of reproducible sequence quality from all of the ESTs and FL-cDNA described for P. glauca in Rigault et al.  and P. sitchensis in Ralph et al.  (Table 1). To obtain a robust probe set, we selected sequences that were either detected in the two species, verified with two technologies (Sanger and 454) or were derived from a FL-cDNA (Table 1). The probes were 70 nucleotides in length, and were designed to minimize similarity with other sequences in the dataset. Sequence similarity between the probes and known sequences in P. sitchensis and P. glauca was determined from sequence alignments. The microarray contains 25,045 probes that match with known P. glauca sequences; however, they represent 23,853 unique genes based on the most recent clustering , such that 1,017 genes are represented by more than one probe.
The microarray consists of 33,984 spotted features including 33,024 sample spots (Invitrogen, Carlsbad, CA, USA), 240 negative buffer spots, and 480 Spot Report Alien Oligos (Agilent Technologies, Inc., Santa Clara, CA, USA). Oligonucleotides were consolidated into 384-well plates, lyophilized by speed-Vac, and resuspended in 3X SSC to a printing concentration of 30 μM. Oligos were printed on aminosilane slides (Erie, Hudson, NH, USA) with a QArrayMax microarray printer (Genetix Limited, Hampshire, UK) using 946MP2 microarray pins (ArrayIt Corp, Sunnyvale, CA, USA) in a 48-pin tool depositing ~0.5 nL per spot onto the slide. The resulting microarrays had a 4 x 12 subgrid layout with 708 spots per subgrid, each spot having approximate diameter and pitch of 90 μm and 160 μm, respectively. A 280-bp GFP (green fluorescent protein) oligonucleotide was printed in subgrid corners to assist in grid alignment during image processing. The slides were crosslinked in a UV Stratalinker 2400 (Agilent Technologies, Inc.) at 300 mJ. Array quality was assessed by visual inspection and hybridization of representative slides from a print run by dye-labeled random 9-mer oligonucleotides. The quality control images were acquired via the GenePix 4200AL scanner (Molecular Devices, CA, USA) at a 10 micron resolution and quantified with the Imagene 8.0 software suite. Oligonucleotide library management, printing of microarrays and quality control was performed by the Genome BC Microarray Platform (Vancouver, BC, Canada).
The array design details are available in the Gene Expression Omnibus (accession No. GPL15033). All of the transcript profiling data is also deposited in GEO (available upon acceptance of this manuscript for publication).
The origin, genotypes and method of collection of each type of material are described in Table 6, except for details described here. All of the tissue samples were frozen in liquid nitrogen immediately after removal from the trees, the seed or tissue culture vessels, and stored at −80°C until further use.
Vegetative Buds: Buds were collected from branches of several clonal replicates of 9-year-old trees of P. glauca regenerated from two genetically distinct somatic embryogenesis lines as described  during the mid-Spring when the buds were just beginning to grow. For each genotype, five biological samples each consisting of 6 buds (approximately 80 mg fresh weight) were used for analysis.
Secondary xylem, phelloderm (including phloem), young needles of juvenile trees: Nursery planting stock (from open-pollinated seed lots) were obtained as 3-year-old seedlings of Picea glauca, Picea mariana, Picea abies, Picea Sitchensis, Pinus strobus, Pinus resinosa and Larix laricina, were transferred to 8-inch pots and grown in a greenhouse under natural light conditions. Sampling of tissues was timed with the ending of primary shoot elongation, i.e. after 6 to 8 weeks of growth, and was as described . For each tissue type five biological samples were prepared by pooling 6 independent trees within each species. These materials were used for interspecific comparisons (all 6 species) and for the development of PiceaGenExpress (P. glauca only).
Megagametophytes: Control-pollinated seed (cross C962856) were obtained from P. glauca tree (80112) described in , were surface-sterilized for 1 minute in 70% EtOH and 10 minutes in 3% Na-hypochlorite, washing the seeds with sterile water three times between and after treatments. The seeds were then immersed in 4°C sterile water for 24 hours and stratified at 4°C for 28 days. Next, the seeds were moved to 26°C on petri dishes with a moist paper and kept in the dark to start the germination process. After 4 hours of incubation the seeds were opened and the embryo was separated from the megagametophyte under a dissecting microscope. A total of 3 biological samples comprised of a single megagametophyte were used for analysis.
Adventitious roots: Norway spruce (P. abies) seedlings were grown in the experimental nursery of Finnish Forest Research Institute for about one and a half years before sampling. They were grown in standard nursery growing media, light Sphagnum peat, and fertilized with mineral nutrients. Roots were washed under tap water to remove surrounding peat and approximately 1 cm of the root ends was collected for analysis. Between two and four biological samples comprised of several root ends were analyzed for each of eight different genotypes.
RNA extraction, labelling and hybridization
Total RNA was extracted following Chang et al.  as described in Pavy et al.  for all of the sample types, except for megagametophytes where poly A + RNA was extracted directly by using polyT coated magnetic beads (Dynal). One μg of total RNA was amplified for each sample replicates, except for the megagametophyte samples where 10 ng of poly A + RNA was used, with the Amino Allyl MessageAmp II aRNA Amplification Kit (Applied Biosystems by Life Technologies, Carlsbad, CA, USA) according to manufacturer’s instructions. Five μg of amplified RNA (aRNA) was then labelled with Alexa Fluor 555 or 647 dyes (Invitrogen, Carlsbad, CA, USA) and purified as per the manufacturer’s instructions. Dye incorporation efficiency was determined by using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s instructions. Depending on the experiment, each microarray was hybridized with one labeled aRNA sample or two samples labelled with different dyes. The sample(s) to be hybridized to a microarray were mixed and the volume was reduced to ~10 μl by evaporating excess water in a DNA 120 speedvac (Thermo Fisher Scientific). Labelled aRNAs were fragmented for 15 minutes at 70°C using Ambion’s ”RNA Fragmentation Reagents“ (Applied Biosystems), placed on ice for 1 minute, denatured for 2 minutes at 95°C, put on ice for 2 min and resuspended in 120 μl hybridization buffer (50% formamide, 5X SSC, 0,1% SDS, 0,1 mg/mL Herring sperm DNA) preheated to 55°C. Samples were kept in a heating block at 50°C until hybridization.
Hybridizations were performed in HS400Pro hybridization stations (Tecan Group Ltd., Männedorf, Switzerland). The slides were heated at 80°C for 10 minutes, then washed once at 37°C with 0.5X SSC, 0.1% SDS for 20 seconds and once at 50°C with 2X SSC, 0.5% SDS for 20 seconds, and prehybridized for 1 hour at 65°C in 5X SSC, 0.1% SDS, 0.1 mg/ml BSA, 0.1 mg/ml Herring Sperm DNA. Next the slides were washed at 55°C with 2X SSC, 0.5% SDS for 1 minute with a 30 second soak and washed again at 45°C for 1 minute with the same solution. The resuspended labeled targets were injected into the chambers and hybridized for 16 hours at 45°C with sample agitation. The slides were then washed as follows: 2 times 1 minute 30 seconds at 45°C with 30 seconds soaking in 2X SSC, 0.5% SDS, 1 time 1 minute at 45°C in 2X SSC, 0.5% SDS, 2 times 1 minute 30 seconds at 45°C with 30 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute at 37°C with 20 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute at 23°C with 20 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute 30 seconds at 23°C with 30 seconds soaking in 0.1X SSC, 1 time 30 seconds at 23°C in 0.1X SSC and 2 times 30 seconds at 23°C in milliQ filtered water. Finally slides were dried for 2 minutes 30 seconds with nitrogen gaz. Slide scanning and image processing were performed as described in Table 6.
Microarray data processing and analysis
A procedure was developed to process and analyze microarray intensity data from single dyes, as opposed to fold change methods routinely used for spotted arrays. Data analyses were performed using customized scripts for R and Bioconductor (http://www.r-project.org and http://www.bioconductor.org). Spots that were flagged as presenting abnormal morphology during the image processing were replaced by mean value of the remaining spots of the same probe from the other slides from the same sample type. Background intensities were subtracted from the foreground intensities. Background-subtracted data were log2-transformed and normalized using quantile correction approach.
A filtering step was applied to select positive genes to be used for further analysis. The mean intensity of spots containing buffer only was calculated for each row of sub-grids, and was taken as the minimum intensity of probes for that subgrid. A probe was called positive (detected above background) when its signal intensity was above the buffer intensity on at least 50% of slides within a given sample type. When determining differential expression, positive probes were probes that were detected according to this criterion in at least one of the tissues (e.g. phelloderm and xylem juvenile). Mean probe intensity was determined for genes represented by more than one positive probe. All microarray experiment data has been submitted to the Gene Expression Omnibus (GEO) under accession numbers GSE35624, GSE35847 and GSE35922.
Statistical testing for differential gene expression used the linear modeling approach and the empirical Bayes statistics ; the p-values were adjusted for multiple testing according to Benjamini and Hochberg . Differential genes were those meeting an adjusted p-value ≤0.01, unless stated otherwise.
The PiceaGenExpress database
The PiceaGenExpress database was developed from transcript profiles obtained for eight different tissue types coming from five independent experiments (see Table 6). For this, the average signal intensity was determined from all of the slides available for a sample type. The genes were then ranked based on their average signal intensities within a tissue type and equally divided into 10 separate classes according to their signal intensity. Genes from class 1 or class 10 were the 10% with lowest and highest signal intensities, respectively (for the number of genes per class in each type, see Additional file 6: Table S2 and Additional file 8: Figure S5). Functional annotations were based on the matches with Arabidopsis proteins (TAIR 9 release) with E-value <1e-10 and on the detection of Pfam domains as described . PiceaGenExpress is made available as a flat file (Additional file 5: Table S1), which may be uploaded to any type of data processing or spreadsheet platform.
Agglomerative hierarchical clustering with the complete linkage method was performed using hclust function in R  on the expression levels for two gene families, i.e. dehydrins and LEA proteins. This approach used a similarity matrix based on Euclidean distance. A smaller distance means that two genes or clusters have more similar expression levels in the tissues analyzed. First, the clustering analysis placed each gene into its own singleton group or cluster. Second, the closest clusters were iteratively joined together until all genes were merged into a single cluster based upon similarity/distance measures between clusters. Dendrograms showing clusters of genes were drawn (Additional file 9: Figure S6).
RNA-Seq data processing and analysis
Two composite samples were analysed by RNA- Sequencing for validation and comparison to MA results. Samples were prepared by combining equal molar amounts of the P. glauca RNAs isolated from secondary xylem (juvenile trees) and phelloderm. These samples were also used in the MA profiling of each of the tissues. RNA-Sequencing, filtering of quality reads and mapping of reads onto the cDNA clusters were described . The number of sequences matching each cDNA cluster was normalized by transforming the data to the number of reads per kilobase of exon model per million mapped reads (RPKM) following the method of Mortazavi et al. . Unless specified otherwise, an RPKM >1 was used as a minimum threshold of detection for RNA-Seq. Differential expression of genes was determined by using the Chi-squared test corrected for multiple testing according to Benjamini and Hochberg .
The authors thank A. Séguin, I. Duval, K. Klimaszewska, S. Velmala, and T. Pennanen for sharing data before publication, for incorporation into PiceaGenExpress reference transcript profiles. The authors would like to acknowledge Stephane LeBihan of the Genome BC Microarray Platform for advice and dedicated support of microarray printing. Funding for this work was obtained from Genome Canada, Genome Quebec, Genome BC for the Arborea, Treemomix , and SMarTForests Projects, and the Discovery grant program of the Natural Sciences and Engineering Council (JM).
- Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU: A gene expression map of Arabidopsis thaliana development. Nat Genet. 2005, 37 (5): 501-506. 10.1038/ng1543.View ArticlePubMedGoogle Scholar
- Sjödin A, Street NR, Sandberg G, Gustafsson P, Jansson S: The Populus Genome Integrative Explorer (PopGenIE): a new resource for exploring the Populus genome. New Phytol. 2009, 182 (4): 1013-1025. 10.1111/j.1469-8137.2009.02807.x.View ArticlePubMedGoogle Scholar
- Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422 (6929): 297-302. 10.1038/nature01434.View ArticlePubMedGoogle Scholar
- Lemos B, Meiklejohn CD, Cáceres M, Hartl DL: Rates of divergence in gene expression profiles of primates, mice, and flies: stabilizing selection and variability among functional categories. Evolution. 2005, 59 (1): 126-137.View ArticlePubMedGoogle Scholar
- Huang GJ, Shifman S, Valdar W, Johannesson M, Yalcin B, Taylor MS, Taylor JM, Mott R, Flint J: High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res. 2009, 19 (6): 1133-1140. 10.1101/gr.088120.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M: The evolution of gene expression levels in mammalian organs. Nature. 2011, 478 (7369): 343-348. 10.1038/nature10532.View ArticlePubMedGoogle Scholar
- Zheng-Bradley X, Rung J, Parkinson H, Brazma A: Large scale comparison of global gene expression patterns in human and mouse. Genome Biol. 2010, 11 (12): R124-10.1186/gb-2010-11-12-r124.PubMed CentralView ArticlePubMedGoogle Scholar
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408 (6814): 796-815. 10.1038/35048692.View ArticleGoogle Scholar
- Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296 (5565): 79-92. 10.1126/science.1068037.View ArticlePubMedGoogle Scholar
- Tuskan G, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al: The genome of black cottonwood, Populus trichocarpa (Torr & Gray ex Brayshaw). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.View ArticlePubMedGoogle Scholar
- Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449 (7161): 463-467. 10.1038/nature06148.View ArticlePubMedGoogle Scholar
- MacKay J, Dean J: Transcriptomics. Genomics and Breeding of Conifers. Edited by: Plomion C, Bousquet J, Kole C. 2011, New York: Edenbridge Science Publishers and CRC Press, 333-357.Google Scholar
- Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ: A White Spruce Gene Catalog for Conifer Genome Analyses. Plant Physiol. 2011, 157 (1): 14-28. 10.1104/pp.111.179663.PubMed CentralView ArticlePubMedGoogle Scholar
- Ralph S, Chun H, Kolosova N, Cooper D, Oddy C, Ritland C, Kirkpatrick R, Moore R, Barber S, Holt R: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics. 2008, 9: 484-10.1186/1471-2164-9-484.PubMed CentralView ArticlePubMedGoogle Scholar
- Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Nat Acad Sc USA. 2003, 100 (12): 7383-7388. 10.1073/pnas.1132171100.View ArticleGoogle Scholar
- Cairney J, Pullman GS: The cellular and molecular biology of conifer embryogenesis. New Phytol. 2007, 176 (3): 511-536. 10.1111/j.1469-8137.2007.02239.x.View ArticlePubMedGoogle Scholar
- Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN: Global monitoring of autumn gene expression within and among phenotypically divergent populations of Sitka spruce (Picea sitchensis). New Phytol. 2008, 178 (1): 103-122. 10.1111/j.1469-8137.2007.02346.x.View ArticlePubMedGoogle Scholar
- Lorenz WW, Sun F, Liang C, Kolychev D, Wang H, Zhao X, Cordonnier-Pratt MM, Pratt LH, Dean JFD: Water stress-responsive genes in loblolly pine (Pinus taeda) roots identified by analyses of expressed sequence tag libraries. Tree Physiol. 2006, 26: 1-16. 10.1093/treephys/26.1.1.View ArticlePubMedGoogle Scholar
- Zhao C, Craig JC, Petzold HE, Dickerman AW, Beers EP: The xylem and phloem transcriptomes from secondary tissues of the Arabidopsis root-hypocotyl. Plant Physiol. 2005, 138 (2): 803-818. 10.1104/pp.105.060202.PubMed CentralView ArticlePubMedGoogle Scholar
- Oh S, Park S, Han KH: Transcriptional regulation of secondary growth in Arabidopsis thaliana. J Exper Bot. 2003, 54 (393): 2709-2722. 10.1093/jxb/erg304.View ArticleGoogle Scholar
- Ko JH, Beers EP, Han KH: Global comparative transcriptome analysis identifies gene network regulating secondary xylem development in Arabidopsis thaliana. Mol. Gen. Gen. 2006, 276 (6): 517-531.View ArticleGoogle Scholar
- Pavy N, Boyle B, Nelson C, Paule C, Giguère I, Caron S, Parsons LS, Dallaire N, Bedon F, Bérubé H: Identification of conserved core xylem gene sets: conifer cDNA microarray development, transcript profiling and computational analyses. New Phytol. 2008, 180 (4): 766-786. 10.1111/j.1469-8137.2008.02615.x.View ArticlePubMedGoogle Scholar
- Beaulieu J, Doerksen T, Boyle B, Clément S, Deslauriers M, Beauseigle S, Blais S, Poulin PL, Lenz P, Caron S: Association genetics of wood physical traits in the conifer white spruce and relationships with gene expression. Genetics. 2011, 188 (1): 197-214. 10.1534/genetics.110.125781.PubMed CentralView ArticlePubMedGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.View ArticlePubMedGoogle Scholar
- Nairn CJ, Haselkorn T: Three loblolly pine CesA genes expressed in developing xylem are orthologous to secondary cell wall CesA genes of angiosperms. New Phytol. 2005, 166 (3): 907-915. 10.1111/j.1469-8137.2005.01372.x.View ArticlePubMedGoogle Scholar
- Suzuki S, Li L, Sun YH, Chiang VL: The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiol. 2006, 142 (3): 1233-1245. 10.1104/pp.106.086678.PubMed CentralView ArticlePubMedGoogle Scholar
- Velasco-Conde T, Yakovlev IP, Majada JP, Aranda I, Johnsen O: Dehydrins in maritime pine (Pinus pinaster) and their expression related to drought stress response. Tree Genetics & Genomes. 2012Google Scholar
- Yakovlev IA, Asante DK, Fossdal CG, Partanen J, Junttila O, Johnsen O: Dehydrins expression related to timing of bud burst in Norway spruce. Planta. 2008, 228 (3): 459-472. 10.1007/s00425-008-0750-0.View ArticlePubMedGoogle Scholar
- Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA, Kubisiak TL, Amerson HV, Carlson JE, Nelson CD: Evolution of genome size and complexity in Pinus. PLoS One. 2009, 4 (2): e4332-10.1371/journal.pone.0004332.PubMed CentralView ArticlePubMedGoogle Scholar
- Morgante M, De Paoli E: Toward the conifer genome sequence. Genetics, Genomics and Breeding of Conifers Trees. Edited by: Plomion C, Bousquet J, Kole C. 2011, New York: Edenbridge Science Publishers and CRC Press, 389-407.Google Scholar
- Kovach AS, Wegrzyn JL, Parra G, Holt C, Bruening GE, Loopstra CA, Hartigan J, Yandell M, Langley CH, Korf I, Neale DB: The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics. 2010, 11: 420-10.1186/1471-2164-11-420.PubMed CentralView ArticlePubMedGoogle Scholar
- Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11: 180-10.1186/1471-2164-11-180.PubMed CentralView ArticlePubMedGoogle Scholar
- Pavy N, Laroche J, Bousquet J, Mackay J: Large-scale statistical analysis of secondary xylem ESTs in pine. Plant Mol Biol. 2005, 57 (2): 203-224. 10.1007/s11103-004-6969-7.View ArticlePubMedGoogle Scholar
- Grattapaglia D, Plomion C, Kirst M, Sederoff RR: Genomics of growth traits in forest trees. Curr Opin Plant Biol. 2009, 12 (2): 148-156. 10.1016/j.pbi.2008.12.008.View ArticlePubMedGoogle Scholar
- Keurentjes JJB, Fu J, Terpstra IR, Garcia JM, Van Den Ackerveken G, Snoek LB, Peeters AJM, Vreugdenhil D, Koornneef M, Jansen RC: Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Nat Acad Sci USA. 2007, 104 (5): 1708-1713. 10.1073/pnas.0610429104.PubMed CentralView ArticlePubMedGoogle Scholar
- Drost DR, Benedict CI, Berg A, Novaes E, Novaes CRDB, Yu Q, Dervinis C, Maia JM, Yap J, Miles B: Diversification in the genetic architecture of gene expression and transcriptional networks in organ differentiation of Populus. Proc Nat Acad Sci USA. 2010, 107 (18): 8492-8497. 10.1073/pnas.0914709107.PubMed CentralView ArticlePubMedGoogle Scholar
- Klimaszewska K, Overton C, Stewart D, Rutledge RG: Initiation of somatic embryos and regeneration of plants from primordial shoots of 10-year-old somatic white spruce and expression profiles of 11 genes followed during the tissue culture process. Planta. 2011, 233 (3): 635-647. 10.1007/s00425-010-1325-4.View ArticlePubMedGoogle Scholar
- Pelgas B, Bousquet J, Meirmans PG, Ritland K, Isabel N: QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments. BMC Genomics. 2011, 12: 145-10.1186/1471-2164-12-145.PubMed CentralView ArticlePubMedGoogle Scholar
- Chang S, Puryear J, Cairney J: A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Rep. 1993, 11 (2): 113-116. 10.1007/BF02670468.View ArticleGoogle Scholar
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Applic Gen Mol Biol. 2004, 3: 1-Google Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Society Series B (Methodological). 1995, 57 (1): 289-300.Google Scholar
- Kaufman L, Rousseeuw P: Finding Groups in Data: An Introduction to Cluster Analysis. 1990, New York: John Wiley & Sons, IncView ArticleGoogle Scholar
- Zhao N, Boyle B, Duval I, Ferrer JL, Lin H, Seguin A, Mackay J, Chen F: SABATH methyltransferases from white spruce (Picea glauca): gene cloning, functional characterization and structural analysis. Tree Physiol. 2009, 29 (7): 947-957. 10.1093/treephys/tpp023.View ArticlePubMedGoogle Scholar
- Korkama T, Pakkanen A, Pennanen T: Ectomycorrhizal community structure varies among Norway spruce (Picea abies) clones. New Phytol. 2006, 171 (4): 815-824. 10.1111/j.1469-8137.2006.01786.x.View ArticlePubMedGoogle Scholar