Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don
© Li et al. 2009
Received: 02 September 2008
Accepted: 21 January 2009
Published: 21 January 2009
Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine.
Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high-quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAOMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database, 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of wood development.
The first large scale genomic resource in radiata pine was generated from six developing xylem cDNA libraries. Cell wall-related genes and transcription factors were identified. Juvenile earlywood has a distinct transcriptome, which is likely to contribute to the undesirable properties of juvenile wood in radiata pine. The publicly available resource of radiata pine will also be valuable for gene function studies and comparative genomics in forest trees.
Radiata pine (Pinus radiata D. Don) is the dominant forest plantation species for the sawmill industry in Australia, New Zealand, Chile and some other countries. Breeding programs in radiata pine have been conducted in Australia since the early 1950s. The first generation of breeding increased the growth rate by 33%, thus reducing the rotation period to 27–30 yrs from the previous 40–45 yrs. Consequently, the faster growth rate resulted in a large proportion (30–50%) of juvenile wood in the harvested logs [2, 3]. Juvenile wood has a number of undesirable wood properties [1, 3] and its higher proportion in the harvested logs reduces the value of timber products. Improving juvenile wood quality and reducing its proportion have been identified as priorities in the next generation breeding program of radiata pine. Understanding wood formation at the molecular level would underpin more efficient breeding strategies for the improvement of juvenile wood.
Genomics approaches have been applied to explore the molecular basis of growth and development in forest tree species. Expressed sequence tags (ESTs) and microarray gene expression studies have been carried out in poplar, loblolly pine, spruce and eucalypts [4–15]. Due to the economic value of wood, all forest genomic projects were primarily focused on the transcriptional regulation of wood formation (xylogenesis). Xylogenesis is initiated in the vascular cambium and proceeds from cell division to expansion, secondary wall formation, lignification, and finally programmed cell death [5, 12]. Large numbers of xylogenesis ESTs from forest tree species have been deposited in public databases, including 59,797 ESTs from loblolly pine, 25,218 ESTs from poplars, 16,430 ESTs from white spruce and 52,330 ESTs from Sitka spruce (extracted from [6, 8, 14, 16], respectively). Furthermore, the genome of Populus trichocarpa (~550 Mbp) has been published, and efforts to sequence the Eucalyptus grandis genome (~640 Mbp) are in progress. However, the large conifer genome (~20,000 Mbp) is unlikely to be sequenced in the near future, so EST sequencing remains an important approach for gene discovery in conifers.
Despite the commercial importance of radiata pine in many countries, little genomic research has been done for this species compared to loblolly pine, Populus, spruce, maritime pine and eucalypts. As of January 20, 2009, only 151 radiata pine ESTs appear in NCBI GenBank (dbEST), and no unigene information is available. Recently, 455 genes were observed to be differentially expressed from the base to the crown of radiata pine trees using modified differential display. Gene expression in the early embryogenesis of Pinus radiata was studied using the cDNA-AFLP strategy, which revealed a total of 82 up- or down-regulated transcript-derived fragments (TDFs). Development of juvenile and mature wood in radiata pine is still poorly characterised at the genomics level.
We applied genomics approaches to investigate the transcriptional regulation of xylogenesis in radiata pine, with a focus on juvenile wood formation. Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues collected from juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) trees, respectively. The sampled xylem tissues represent the major stages of wood development in a typical rotation period of radiata pine. A total of 6,389 high quality xylogenesis ESTs were sequenced from 5,952 cDNA clones and assembled into 3,304 unigenes. Here we report the generation and analysis of a genomic resource for wood formation in radiata pine.
Assembly of radiata pine xylogenesis ESTs from six cDNA libraries. (Table body not recovered; surviving labels: Assembly for six libraries [a]; Assembly for each library [b]; Total of each library; Redundancy (%) [c].)
BLAST searches of the 5,952 ESTs against the NCBI nr database revealed 73.4% and 69.0% matches with tblastx and blastx (E-value ≤ 10^-5), respectively. However, 71.3% of all matches with blastx are either unknown (47.5%), unnamed (13.6%), hypothetical (5.4%) or predicted proteins (4.7%). When using the UniProt database with blastx (E-value ≤ 10^-5) and the TIGR database with blastn (E-value ≤ 10^-15), 97.6% of the 5,952 ESTs have homologs and only 20.5% of all matches are unknown or uncharacterized proteins. Of the 139 ESTs (2.3%) with no matches in the UniProt and TIGR databases, 41 ESTs are greater than 500 bp in length and remain as singletons after assembly, so some of them are likely to represent novel ESTs in radiata pine wood formation. Based on the Pfam known protein family database, 52.1% of the ESTs have homologs with blastx (E-value ≤ 10^-5) and were classified into 772 protein families. Nearly half of the ESTs (47.9%) did not match the known protein families in the Pfam database and were therefore regarded as belonging to unknown protein families.
Summary of annotation and functional classification of 3,304 xylogenesis unigenes in radiata pine*. (Table body not recovered; surviving labels: With GO terms; No GO terms.)
The most enriched GO terms in the radiata pine xylogenesis genomic resource*.
GO:0044267 cellular protein metabolic process
GO:0016043 cellular component organization and biogenesis
GO:0009056 catabolic process
GO:0044444 cytoplasmic part
GO:0032991 macromolecular complex
GO:0043228 non-membrane-bound organelle
GO:0000166 nucleotide binding
GO:0005215 transporter activity
GO:0008233 peptidase activity
GO:0005198 structural molecule activity
Thirty highly abundant genes (or gene families) in the 5,952 xylogenesis ESTs of radiata pine.
Gene or gene family
Cellulose synthase (CesA)
Arabinogalactan protein (AGP)
S-adenosylmethionine synthetase (SAMS)
Methionine synthase (cobalamin-independent)(MetE)
Photoassimilate-responsive protein (PAR)
Caffeoyl-CoA O-methyltransferase (CCoAOMT)
Unknown (Emb| CAB86899.1)
Sucrose synthase (SuSy)
Eukaryotic translation initiation factor
Metallothionein-like protein class II (MT-II)
Pollen-specific protein C13
Some cell wall genes were moderately abundant (10–20 ESTs) in the radiata pine EST resource, including protective protein for beta-galactosidase (PPGB), methionine synthase, proline-rich protein (PRP), translationally controlled tumor protein (TCTP), alpha-galactosidase, UDP-glucose glucosyltransferase, malate dehydrogenase, pectinesterase, and glycine-rich protein (GRP). Other cell wall related genes, including pectin methylesterases (PMEs) and xyloglucan endotransglycosylases (XETs), have five ESTs each. Eight ESTs encoding cyclin or cyclin-like proteins and seven ESTs encoding cell cycle proteins were identified in radiata pine. These cell cycle genes can activate cell-cycle machinery and cell division. Interestingly, most genes involved in lignin biosynthesis were present in radiata pine wood formation. Laccases, SAMS and CCoAOMT are among the 30 highly abundant genes, with 29 to 46 ESTs (Table 4). C4H, 4CL, peroxidase, CAD, C3H, COMT and dirigent-like protein are moderately abundant, with seven to 18 ESTs. CCR and PAL also occurred, with two ESTs each, in the radiata pine EST resource.
Thirty highly abundant protein families in the radiata pine xylogenesis EST resource.
Major intrinsic protein
Protein kinase domain
Tubulin/FtsZ family, C-terminal domain
S-adenosylmethionine synthetase, Central domain
RNA recognition motif
Cobalamin-independent synthase, Catalytic domain
Unknown function protein
Harpin-induced protein 1 (Hin1)
Protease inhibitor/seed storage/LTP family
Tubulin/FtsZ family, GTPases domain
Multicopper oxidase-like domains
Signaling proteins and buffering/transport proteins
Mitochondrial carrier protein
Papain family cysteine protease
Glycosyl hydrolase family 9
ADP ribosylation factor (Arf)
Ricin-type beta-trefoil lectin domain
Chitinase class I
NAD dependent epimerase/dehydratase family
Forty-one transcription factor families in the radiata pine xylogenesis EST resource.
Cys4--His--Cys3 zinc finger
Zinc finger, C-x8-C-x5-C-x3-H type (and similar)
Zinc finger, C2H2 type
No apical meristem (NAM) protein
Myb-like DNA-binding domain
Polycomb group (PcG) proteins
Helix-loop-helix DNA-binding domain
Ethylene insensitive1 (EIN3)
HMG (high mobility group) box
GATA zinc finger
WRKY DNA-binding domain
Basic leucine zipper (bZIP) motif
Trihelix DNA-binding domain
DNA-binding and dimerisation domain
Zinc-finger protein expressed in inflorescence meristem
Cys4 zinc finger and His/Cys3
TAZ zinc finger
Calmodulin-binding transcription activators
Multiprotein bridging factor 1
Auxin response factor
AT-rich interaction domain
Heat shock factor
Nodule inception protein
ZF-HD class homeobox domain
Comparative analysis of protein families revealed that some known protein families were more abundant in particular libraries (Additional file 3). Pectinesterase, pfkB family carbohydrate kinase, UDP-glucose/GDP-mannose dehydrogenase, homeobox domain, UDP-glucosyl transferase, profilin, thioredoxin, and plastocyanin-like domain were more abundant in earlywood. In contrast, CesA, SAMS, TCTP, tubulin, dehydrin, metallothionein, protein tyrosine kinase, aminotransferase, and WD domain protein were more abundant in latewood. The protein families more abundant in juvenile wood include pectate lyase, skp1, zinc-binding dehydrogenase and inorganic H+ pyrophosphatase, whereas plastocyanin-like, peroxidase, elongation factor, calreticulin, UDP-glucose/GDP-mannose dehydrogenase, SAMS, glycosyl hydrolases family 17, aldehyde dehydrogenase, major intrinsic protein (MIP), pectinesterase, CesA and O-methyltransferase are more abundant in mature wood. Some protein families are highly represented in both latewood and mature wood, suggesting possibly similar transcriptional regulation in these developmental stages.
Genes or gene families more highly expressed in particular libraries were also identified in the radiata pine EST resource (Additional file 4). Homeodomain, AGP4, LTP, peroxidase, glycosyl transferase, actin 2, PRP, malate dehydrogenase, phytocyanin and PPGB have higher expression in earlywood. In contrast, expansin (ripening-related), dehydrin, tubulin alpha 1, COMT, CC-NBS-LRR resistance-like, and green ripe-like 1 are more abundant in latewood. Genes more highly expressed in juvenile wood include methionine synthase, HB1, cytochrome c oxidase, alpha tubulin 1, metallothionein-like and AGP6, among others. Some genes related to secondary wall biosynthesis (i.e., peroxidase, AGP5, FLA17, CAD and CesA1) are more highly expressed in mature wood. Interestingly, in the large AGP gene family, PrAGP4 is more highly expressed in earlywood, while PrAGP5 and PrFLA17 are more highly expressed in mature wood, suggesting divergent roles of different AGP genes during wood formation.
GO term enrichments in different stages of wood development were also revealed using DAVID Bioinformatics Resources 2008 (Additional file 5). Specifically enriched GO terms in earlywood included: cellular component organization and biogenesis, membrane, protein complex, catalytic activity, hydrolase activity and transporter activity. Latewood specifically enriched terms included: cellular macromolecule metabolic process, protein modification process, cytoplasm, nucleotide binding, and protein kinase activity. The number of specifically enriched GO terms in juvenile wood is about twice that in mature wood, suggesting more unique gene expression in juvenile wood than in mature wood (Additional file 5). The most enriched specific GO terms in juvenile wood included: cellular macromolecule metabolic process, intracellular organelle part, protein complex, hydrolase activity, and nucleotide binding; while in latewood cellular component organization and biogenesis, membrane, and transporter activity are the most enriched specific GO terms. The specifically enriched GO terms in different development stages are likely to reflect their unique transcriptomes.
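As a rough illustration of what a GO term enrichment test measures, the sketch below computes a one-sided hypergeometric (Fisher-type) probability. It is not DAVID's own calculation: DAVID reports an EASE score, which is a more conservative variant of the Fisher exact test, and all counts shown here are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): probability of seeing at least k genes carrying a GO term
    in a list of n genes, when K of the N background genes carry that term."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical counts, for illustration only (not taken from the paper):
# 3,304 background unigenes, 120 of which carry the term; a stage-specific
# list of 400 unigenes, 30 of which carry the term.
p_value = hypergeom_upper_tail(k=30, n=400, K=120, N=3304)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

A small p-value indicates that the term occurs in the stage-specific list more often than expected from the background frequency.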
The 6,389 xylogenesis ESTs and 3,304 unigenes described here form a significant genomic resource for radiata pine. Being derived from major stages of wood development in a typical rotation period, this resource represents a broad xylogenesis transcriptome in the species. The highest proportion of ESTs is derived from juvenile wood and will assist in understanding transcriptional regulation at this stage. Most families of genes involved in secondary cell wall development are represented in the EST resource.
Lignin biosynthesis is a key developmental feature that distinguishes secondary from primary cell walls. The synthesized monolignols are transported to the apoplast, where polymerisation starts in the middle lamella and cell corners. Most genes in the monolignol pathway are represented with high or moderate abundance in radiata pine xylogenesis, including SAMS, MetE, laccase, CCoAOMT, C4H, 4CL, methionine synthase, peroxidase, CAD, C3H, COMT and cytochrome c oxidase. A total of 46 radiata pine ESTs (four contigs and two singletons) were found to encode SAMS protein. Lignin precursors are extensively methylated in the S-adenosyl methionine (SAM) dependent reaction. In white spruce, seven SAMS genes were previously identified. Some other lignin biosynthetic genes were also identified at lower abundance, including CCR, PAL and dirigent-like. The dirigent protein was presumed to act as a template for lignin polymerization. The presence and abundance of most lignin biosynthetic genes in the radiata pine EST collection suggest active secondary cell wall biosynthesis in the xylem tissues sampled.
Hemicelluloses and pectin form a large group of heteropolysaccharides composed of D-xylose, L-arabinose, L-rhamnose, L-fucose, D-mannose, D-galactose, D-galacturonate, or D-glucose. In the radiata pine EST collection we identified genes involved in the synthesis and degradation of hemicellulose and pectin, including 21 ESTs (three contigs) of UDP-glucose dehydrogenase. UDP-D-glucuronate synthesis is the rate-limiting step for the biosynthesis of both hemicellulose and pectin. We also identified four radiata pine ESTs of xyloglucan endotransglycosylases (XETs). XETs can cut and rejoin xyloglucan (XG) chains, and are believed to be important regulators of primary cell wall expansion. One radiata pine XET is most homologous to Arabidopsis XTH8, which belongs to the same class as PttXET16A, a poplar secondary wall XET. The involvement of XETs in secondary walls is likely to lie in the breakage and reconnection of linkages between adjacent microfibrils soon after their synthesis.
AGPs, AGP-like proteins and FLAs are highly abundant in the radiata pine EST collection, with a total of 89 ESTs. The ortholog of loblolly pine PtaAGP4 is the most abundant AGP in radiata pine, with 45 ESTs. Other identified radiata pine AGPs include PrAGP5, PrAGP6, PrAGP-like and PrFLAs (PrFLA1, 8, 10, 12, 16, 17 and 26). Six loblolly pine PtaAGPs (AGP3, AGP4, AGP5, AGP6, 3H6 and 14A9) were predominantly expressed in xylem. The role of these AGPs in secondary wall development is currently unknown. In loblolly pine, PtaAGP6 epitopes are restricted to cells formed immediately before secondary cell wall thickening. However, the preponderance of genes in the radiata pine EST resource that are clearly involved in secondary wall development suggests that AGPs may also play an important role in secondary wall synthesis. AGPs were also found to be expressed in xylem tissues of angiosperms. In Populus, 15 FLAs are expressed in xylem, ten of which are up-regulated in tension wood. In eucalypts, two FLAs are strongly up-regulated in upper branch wood. A tobacco AGP is a candidate linker at the cell surface to mediate signal transduction between the plasma membrane and the cytoskeleton.
Conspicuous in the radiata pine xylem EST resource are genes encoding the cytoskeletal proteins: tubulin, actin and other cytoskeletal-associated proteins. The cytoskeleton provides the structural basis for cell polarity establishment and maintenance [49, 50]. Cellulose microfibril arrangement and deposition is believed to be directed by cortical microtubules, the dynamic heteropolymer arrays of α-tubulin and β-tubulin proteins. A Eucalyptus grandis β-tubulin gene (EgrTUB1) has been implicated in determining the orientation of cellulose microfibrils in plant secondary walls. Two Populus tremuloides α-tubulin genes (TUA1 and TUA5) are highly expressed in wood tissues and ten tubulin genes in Populus tremula × P. tremuloides are up-regulated in secondary walls. In the radiata pine EST resource, a total of 102 ESTs were annotated to encode tubulin proteins, among which three tubulin genes (PrTUA1, PrTUB2 and PrTUB3) are highly or moderately abundant, with 52, 26 and 15 ESTs, respectively. These three radiata pine tubulin genes are most homologous to poplar tubulin genes that are strongly expressed in xylem tissues.
Actin microfilaments ensure the delivery of vesicles to specific sites in plant cells and are involved in cell shape determination. Actin genes are highly abundant in the radiata pine EST collection, with a total of 58 ESTs (six contigs and five singletons). Other actin-related and actin-interacting genes were also found, including actin depolymerisation factor (ADF), actin binding protein (ABP), actin-related protein (ARP), profilin and villin. Arabidopsis has ten actin genes, eight actin-related proteins (ARPs) and many actin-interacting proteins including profilin, ADF, fimbrin, villin, rho-type small GTPase, capping protein, and actin-interacting cyclase-associated protein [55, 56]. The radiata pine ARPs are the putative homologs of Arabidopsis ARP2, ARP3 and ARP6, suggesting the possible involvement of the ARP2/3 complex in actin cytoskeleton development during radiata pine xylem formation.
Cytoskeleton formation involves cycles of polymerization and depolymerization of its basic structural subunits, such as G-actin and tubulin dimers. Microtubule-associated proteins (MAPs) and actin-binding proteins (ABPs) are believed to regulate the dynamic cytoskeletal changes. In the radiata pine EST resource, several transcripts are present for MAPs (three ESTs), microtubule-binding protein (one EST) and actin-binding protein (two ESTs). Kinesin-1 (conventional kinesin) is a dimeric motor protein that carries cellular cargo along microtubules. A novel kinesin (GhKCH1) from cotton fibers plays a role in coordinating the actin network with the cortical microtubule array. We identified eight radiata pine ESTs annotated as kinesin-1 or kinesin-like proteins, which may have similar roles in actin and microtubule interactions.
Of the 64 transcription factor families in the poplar genome, 41 families (64.1%) are represented in the radiata pine xylogenesis genomic resource (Table 6). These transcription factor families include many regulatory genes, such as MYBs, MADS-box, LIM domain, zinc finger, Class III HD-Zip, WRKY and AUX/IAA. Several members of the NAC, MYB, zinc finger, LIM domain, MADS-box, AUX/IAA and homeodomain (HD) families are believed to regulate secondary wall biosynthesis [60–65]. MYBs and zinc fingers can recognise AC elements in the promoters of many monolignol biosynthesis genes [65–67]. Some members of the NAC and MYB families are the key switches in the transcriptional network for secondary wall development.
Three radiata pine ESTs of R2R3-type MYB are most homologous to the three secondary xylem MYB genes (MYB2, MYB4 and MYB8) in loblolly pine and Picea glauca. In loblolly pine, PtMYB1 and PtMYB4 are transcriptional activators of lignin synthesis [66, 67]. The Eucalyptus EgMYB2 regulates lignin biosynthesis through binding to AC elements. Homeobox domain (HD) transcription factors were also identified in the radiata pine EST resource, with 29 ESTs (two contigs). Three radiata pine ESTs were annotated as AUX/IAA genes. Auxin can rapidly induce AUX/IAA gene expression and five AUX/IAA genes from hybrid aspen are up-regulated in xylem.
In the radiata pine EST resource, Class III HD-Zip genes were represented by 15 ESTs (six unigenes). Five Arabidopsis HD-Zip III genes are up-regulated in secondary xylem [63, 72] and the hybrid aspen class III HD-Zip gene (PtaHB1) is closely associated with wood formation. Eleven radiata pine ESTs (four unigenes) were annotated as LIM domain and four ESTs (one unigene) as MADS-box. A tobacco LIM protein can bind to the Pal-box sequence in the promoter regions of several phenylpropanoid biosynthesis genes. Some Arabidopsis MADS-box genes have been implicated in the regulation of lignin biosynthesis.
Wood formation is a complex and continuous process involving a series of developmental stages during which considerable variation in wood properties is observed. Compared to mature wood, juvenile wood shows higher spiral grain, lower density, higher microfibril angle (MFA), more longitudinal shrinkage, lower cellulose content, higher lignin content, more pectin and shorter tracheids [1, 3, 74]. Compared to latewood, earlywood has higher growth rate, thinner cell walls, larger radial lumen, more lignin, lower hemicellulose, lower pectin, slightly lower cellulose, higher MFA and lower density [74–76].
Comparisons of EST abundance of genes related to lignin biosynthesis in different wood development stages. (Table body not recovered; surviving labels: EST abundance in EW and LW; EST abundance in JW and MW; Cytochrome c oxidase.)
RT-MLPA validation of four genes preferentially represented in earlywood and latewood. (Table body not recovered; surviving label: Proline-rich protein (PRP).)
The radiata pine genomic resource was developed from six developing xylem libraries, with 6,389 high-quality ESTs and 3,304 unigenes. This is the first large-scale and publicly available genomic resource for radiata pine. Many genes involved in cell wall biosynthesis, as well as transcription factors, were identified in the wood formation of radiata pine. Comparative analysis among different development stages revealed a distinct transcriptome in juvenile earlywood. Genes with relatively higher expression in earlywood, latewood, juvenile wood or mature wood were also identified. The genes identified in this study will be candidates for functional genomics and association studies in radiata pine. This genomic resource will also be valuable for comparative genomics of wood formation in forest trees.
In order to broadly represent the xylogenesis transcriptome in radiata pine, developing xylem tissues were sampled at six critical development stages: earlywood (spring) and latewood (autumn) tissues collected at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages. Two 7-year-old and two 30-year-old trees were sampled at Yarralumla ACT in August 2004 (spring) and April 2005 (autumn). Three 11-year-old trees were sampled at Bondo NSW in September 2003 (spring) and April 2004 (autumn). Developing xylem tissues were collected by scraping the thin (approximately 1.0 mm) and partially lignified layer on the exposed xylem surface at breast height. All xylem tissues were immediately frozen in liquid nitrogen in the field, and then stored in the laboratory at -80°C for later RNA extraction.
Total RNA was extracted using the method of Chang et al. with slight modification. Poly(A)+ mRNA was isolated using the polyATtract system III/IV (Promega, CA). Five micrograms of mRNA were used as starting template for cDNA library construction using the ZAP-cDNA Gigapack III Gold Cloning Kit (Stratagene, CA). After first and second strand cDNA synthesis, the radioactive cDNA samples were checked on an alkaline agarose gel and exposed to x-ray film. The double-stranded cDNAs were fractionated using a drip column filled with Sepharose CL-2B gel. The fractions at approximately 600 bp or more were pooled and ligated into the EcoRI/XhoI cloning site in the Uni-ZAP XR vector. One to two microliters of ligated cDNA inserts were packaged using Gigapack III Gold Packaging Extract. To make a large quantity of high-titer stock of the lambda phage library, the primary libraries were amplified according to the manufacturer's protocols (Stratagene, CA). The pBluescript phagemids containing cDNA inserts were mass excised in vivo from the Uni-ZAP XR vector using Ex-Assist helper phage and XL1-Blue MRF'. The titer of the lambda phage and phagemid libraries was measured using host cells of the XL1-Blue MRF' strain and the SOLR strain, respectively. Library quality and the size of cDNA inserts were determined by PCR screening of 96 randomly selected clones. All lambda phage cDNA libraries were stored at -80°C.
The excised phagemids were replicated in SOLR cells and grown on LB-ampicillin agar plates overnight at 37°C for colony isolation. Phagemid colonies were picked into LB broth with ampicillin, either with a Versarray Colony Picking Robot (Bio-Rad, CA) or manually with a toothpick. The picked colonies were cultured overnight at 37°C with shaking. These suspension cultures were used for PCR reactions and EST sequencing, or stored at -80°C as glycerol stocks for later use.
An average of 1,200 cDNA clones from each of the six libraries was used for EST sequencing. Approximately 8,000 sequencing reactions were performed on a total of 7,200 cDNA clones. The cDNA inserts were PCR amplified using the M13 forward and reverse primers followed by clean-up with ExoSap (GE Healthcare, UK). The sequencing primer for the 5' end of cDNA inserts was the SK, T3 or M13 primer. A small proportion of sequences were obtained from the 3' end using the M13 forward primer. The sequencing reaction was performed using BigDye 3.1. After a quick start at 96°C 1 min, the reaction was cycled 25 times at 96°C 1 sec, 50°C 5 sec and 60°C 4 min, followed by holding at 4°C. After sodium acetate-EDTA-ethanol precipitation, the sequencing reaction was run using ABI 3730 capillary sequencers (Applied Biosystems, CA). The trace files were base called by Sequencing Analysis v5.2 which was integrated into the ABI 3730 capillary sequencers.
The raw EST sequences were processed using Sequencher 4.7 (Gene Codes Corporation, MI) to trim vector and ambiguous ends. The poly(A) in the EST sequences was deleted manually. A total of 6,389 high-quality ESTs of at least 100 bp in length were obtained. These ESTs were submitted to dbEST at the National Center for Biotechnology Information with GenBank accession numbers FE518213 to FE524601. The 5' end ESTs from 5,952 different clones were assembled to generate consensus sequences for unigenes, including contigs and singletons. The EST assembly was performed using the CAP3 program integrated in BIO301, an automated EST sequence management and functional annotation system. In order to minimize chimeric contigs in assembly, an overlap of at least 40 bp and a sequence identity of at least 95% were used as thresholds in the assembly. The number of ESTs in each contig was used to estimate the redundancy in the EST sequencing.
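The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below counts the accessions in the deposited GenBank range and derives two redundancy estimates from the reported assembly totals. The two redundancy formulas are common conventions and are assumptions here; the exact definition used for the published redundancy values is not recoverable from this text.

```python
def accession_count(first: str, last: str, prefix_len: int = 2) -> int:
    """Number of accessions in an inclusive range such as FE518213..FE524601."""
    if first[:prefix_len] != last[:prefix_len]:
        raise ValueError("accessions must share the same prefix")
    return int(last[prefix_len:]) - int(first[prefix_len:]) + 1

def redundancy_by_unigenes(n_ests: int, n_unigenes: int) -> float:
    """Assumed convention 1: percentage of ESTs that did not add a new unigene."""
    return 100.0 * (1.0 - n_unigenes / n_ests)

def redundancy_by_contig_membership(n_ests: int, n_singletons: int) -> float:
    """Assumed convention 2: percentage of ESTs that fall into contigs."""
    return 100.0 * (n_ests - n_singletons) / n_ests

# Totals reported in the text: 5,952 assembled ESTs, 952 contigs, 2,352 singletons.
n_ests, n_contigs, n_singletons = 5952, 952, 2352
n_unigenes = n_contigs + n_singletons

print(accession_count("FE518213", "FE524601"))                         # 6389
print(n_unigenes)                                                      # 3304
print(round(redundancy_by_unigenes(n_ests, n_unigenes), 1))            # 44.5
print(round(redundancy_by_contig_membership(n_ests, n_singletons), 1)) # 60.5
```

The accession range spans exactly 6,389 records, which agrees with the number of high-quality ESTs reported.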
The high-quality ESTs and derived unigenes were used to search the NCBI nr (non-redundant) database using blastx and tblastx (E-value ≤ 10^-5). They were also used to search the UniProt known protein database using blastx (E-value ≤ 10^-5). The sequences with no matches in UniProt were further searched against the TIGR all tentative consensus sequence gene indices database using blastn (E-value ≤ 10^-15). Putative functions for ESTs and unigenes were assigned with GO (Gene Ontology) terms. To identify the known protein families, the ESTs were also searched against the Pfam protein family database using blastx (E-value ≤ 10^-5). Initial functional classification was based on the assigned GO terms with two levels of classification. Gene enrichment analysis and further functional classification on GO terms were conducted using the DAVID Bioinformatics Resources 2008 [23, 24].
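The E-value cut-offs above amount to a simple filter on BLAST hits. The following sketch, which assumes the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), keeps the best-scoring hit per query below a chosen threshold. The sequence identifiers and hit values are hypothetical, and this is not the BIO301 pipeline used by the authors.

```python
import csv
import io

def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Return {query: (subject, evalue, bitscore)} for the highest-scoring hit
    of each query whose E-value does not exceed max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical hits: est_0001 has two significant hits, est_0002 has none.
example = "\n".join("\t".join(fields) for fields in [
    ["est_0001", "protA", "85.0", "120", "18", "0", "1", "360", "5", "124", "2e-40", "160"],
    ["est_0001", "protB", "60.0", "110", "44", "0", "1", "330", "9", "118", "1e-12", "70.1"],
    ["est_0002", "protC", "40.0", "50", "30", "0", "10", "160", "3", "52", "0.003", "35.0"],
])

hits = best_hits(example, max_evalue=1e-5)
total_queries = 2
print(hits)
print(f"{100.0 * len(hits) / total_queries:.1f}% of queries have a match")
```

Passing `max_evalue=1e-15` would reproduce the stricter threshold applied to the blastn search.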
Comparison of EST abundance was used to identify differentially represented genes in different libraries or combined libraries. The number of ESTs for each gene in each library was first normalized to a common library size, so that counts from libraries of different sizes could be compared. A normalized ratio of EST abundance was then calculated for each comparison. The significance of differentially represented genes was statistically tested using the method of Audic and Claverie.
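For readers unfamiliar with the Audic and Claverie test, a minimal sketch follows. Given x ESTs for a gene in a library of N1 ESTs, the probability of observing y ESTs in a second library of N2 ESTs is p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. Summing this distribution over the tail containing y, and the example counts and library sizes, are assumptions made for illustration; the authors' exact implementation is not described in the text.

```python
import math

def ac_prob(y: int, x: int, n1: int, n2: int) -> float:
    """Audic-Claverie probability p(y | x) for library sizes n1 and n2."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)

def ac_tail_pvalue(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided p-value: the smaller of P(Y <= y | x) and P(Y >= y | x)."""
    lower = sum(ac_prob(k, x, n1, n2) for k in range(y + 1))
    upper = 1.0 - lower + ac_prob(y, x, n1, n2)
    return max(0.0, min(lower, upper, 1.0))

def normalized_ratio(x: int, y: int, n1: int, n2: int) -> float:
    """Ratio of EST abundance after scaling both libraries to the same size."""
    if x == 0:
        return math.inf
    return (y / n2) / (x / n1)

# Hypothetical counts and library sizes, for illustration only.
x, y, n1, n2 = 45, 12, 3000, 2952
print(f"normalized ratio (library 2 / library 1): {normalized_ratio(x, y, n1, n2):.2f}")
print(f"Audic-Claverie one-sided p-value: {ac_tail_pvalue(x, y, n1, n2):.3g}")
```

Working in log space with `math.lgamma` avoids overflow in the factorials when EST counts are large.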
Preferentially represented genes were validated using multiplex ligation-dependent probe amplification (MLPA). More specifically, we used the RT-MLPA protocol for mRNA detection and quantification. Four genes (AGP4, proline-rich protein, peroxidase and dehydrin) preferentially represented in the combined earlywood or latewood libraries were included in the validation. Expression of these four genes was measured in earlywood and latewood tissues from three radiata pine trees at transition age (11 yrs). In addition, expression of AGP4 and peroxidase was measured in earlywood and latewood tissues from eight radiata pine trees at juvenile age (5 yrs). We used four technical replicates in RT-MLPA. Reverse transcription of ~400 ng total RNA after DNase treatment was performed using the ImProm-II Reverse Transcription System (Promega, WI) and oligo(dT)15. Synthesized cDNA was hybridized with mixed NPK and LIG probes designed for the validated genes (Additional file 7) at 60°C overnight, followed by ligation and PCR amplification with SALSA D4 primer. Mixed PCR products from multiple genes were separated and analysed using the CEQ™ 8000 Genetic Analysis System (Beckman Coulter, CA).
Mbp: million base pairs
EST: expressed sequence tag
4CL: 4-coumarate:CoA ligase
CAD: cinnamyl alcohol dehydrogenase
CCoAOMT: caffeoyl CoA O-methyltransferase
CCR: cinnamoyl CoA reductase
COMT: caffeic acid O-methyltransferase
MetE: methionine synthase (cobalamin-independent)
FLA: fasciclin-like arabinogalactan protein
MIP: major intrinsic protein
water deficit inducible protein
SNP: single nucleotide polymorphism
MLPA: multiplex ligation-dependent probe amplification
RT-MLPA: reverse transcriptase MLPA.
The authors would like to thank Wei Li, Philippe Matter and Maureen Nolan for their help with EST sequencing, Bala Thumma, Colleen MacMillan and Charlie Bell for their technical assistance in the lab, Chris Boland from CSIRO Entomology for his assistance with the robot colony picking, and Colleen MacMillan and Iain Wilson from CSIRO Plant Industry for their critical internal review of the manuscript. We also thank the four anonymous reviewers for their valuable comments and suggestions to improve this manuscript. This work is part of the Juvenile Wood Initiative (JWI), a project funded by Forest and Wood Products Australia (FWPA), ArborGen LLC, the Southern Tree Breeding Association (STBA), Queensland Department of Primary Industry (QDPI) and CSIRO. Wei Li was a postdoctoral fellow under an agreement between CSIRO and the Chinese Academy of Forestry.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.