Generation, functional annotation and comparative analysis of black spruce (Picea mariana) ESTs: an important conifer genomic resource
© Mann et al.; licensee BioMed Central Ltd. 2013
Received: 1 October 2013
Accepted: 8 October 2013
Published: 11 October 2013
Expressed sequence tag (EST) sequences and their annotation provide a highly valuable resource for gene discovery, genome sequence annotation and other genomics studies that can be applied in genetics, breeding and conservation programs for non-model organisms. Conifers are long-lived plants that are ecologically and economically important globally and have large genomes. Black spruce (Picea mariana) is a transcontinental species of the North American boreal and temperate forests. However, transcriptomic and genomic resources for this species are limited. The primary objective of our study was to develop a black spruce transcriptomic resource to facilitate ongoing functional genomics projects related to growth and adaptation to climate change.
We conducted bidirectional sequencing of cDNA clones from a standard cDNA library constructed from black spruce needle tissues. We obtained 4,594 high-quality sequence reads (2,455 5′-end and 2,139 3′-end reads) with an average read length of 532 bp. Clustering and assembly of the ESTs resulted in 2,731 unique sequences, consisting of 2,234 singletons and 497 contigs. Approximately two-thirds (63%) of the unique sequences were functionally annotated. Genes involved in 36 molecular functions and 90 biological processes were discovered, including 24 putative transcription factors and 232 genes involved in photosynthesis. The most abundantly expressed transcripts were associated with photosynthesis, growth factors, stress and disease response, and transcription factors. A total of 216 full-length genes were identified. About 18% (493) of the transcripts were novel, representing an important addition to the GenBank EST database (dbEST). Fifty-seven di-, tri-, tetra- and pentanucleotide simple sequence repeats were identified.
We have developed the first high-quality EST resource for black spruce and identified 493 novel transcripts, some of which may be species-specific and related to life history and ecological traits. We have also identified full-length genes and microsatellite-containing ESTs. Based on EST sequence similarities, black spruce showed closer evolutionary relationships with the congeneric Picea glauca and Picea sitchensis than with other Pinaceae members and angiosperms. The EST sequences reported here provide an important resource for genome annotation, functional and comparative genomics, molecular breeding, and conservation and management studies and applications in black spruce and related conifer species.
Keywords: Picea mariana; Expressed sequence tag; Gene discovery; Gene expression; Gene ontology; Microsatellites
In non-model species with large genomes, expressed sequence tag (EST) sequencing and annotation can provide a first step towards understanding the transcriptome and the expression patterns of specific genes; ESTs complement whole-genome sequencing and can assist with genome sequence annotation. Traditionally, EST sequencing was conducted with the Sanger method [1–5]. More recently, next-generation sequencing (NGS) platforms have been used to generate enormous amounts of genome and transcriptome sequence [6–10]. NGS methods allow whole-transcriptome sequencing at a fraction of the time and cost required by the Sanger method [11, 12]. However, commonly used NGS platforms produce shorter reads and/or have lower per-base accuracy. The greater length and accuracy of Sanger reads can complement NGS workflows: Sanger sequences can serve as a reference against which short reads are aligned and corrected, which helps to validate NGS sequences. Therefore, EST sequences derived from the Sanger method remain a valuable resource in the NGS era.
Conifers are ecologically and economically important, long-lived plants with large genomes (~18-35 Gbp). They form a major part of the northern boreal and temperate forests, which constitute one of the world's major biomes. Pinus (pine) and Picea (spruce) are two important conifer genera. Black spruce (Picea mariana (Mill.) B.S.P.) is a widely distributed transcontinental species of the North American boreal and temperate forests with high ecological and economic importance. It is one of the most important softwood species for pulp and paper production in Canada. It is an early successional species with a corresponding suite of species-specific life history, growth, eco-physiological and adaptive traits. The estimated haploid genome size of black spruce is about 17.5 Gbp, with a 1C content of 17.4 pg.
As of dbEST release 130101 (January 1, 2013), approximately 74.19 million ESTs from 2,473 species were available in GenBank at the National Center for Biotechnology Information (NCBI). Among conifers, the species contributing the most ESTs are loblolly pine (Pinus taeda; 328,662), followed by white spruce (Picea glauca; 313,110) and Sitka spruce (Picea sitchensis; 186,637). Among spruce species, white spruce has the largest number of ESTs, followed by P. sitchensis and P. engelmannii × P. sitchensis. A set of 27,720 unique cDNA clusters (unigene set) has also recently been reported for P. glauca, and draft genomes of Norway spruce (Picea abies) and white spruce have very recently been published [9, 10]. However, the black spruce transcriptome is not yet fully characterized: only 4 ESTs and 699 cDNA sequences are reported in NCBI's dbEST (excluding the ESTs reported in the current study). Because black spruce differs from white spruce and Sitka spruce in a number of life history, morphological, adaptive, eco-physiological and insect resistance traits, as well as phylogenetically [14, 19–21], we expect the black spruce transcriptome to contain some unique genes. Black spruce is an early successional, shade-intolerant species, whereas white spruce and Sitka spruce are late successional, shade-tolerant species. Black spruce can grow in poor conditions, such as bogs, whereas white spruce grows on well-drained soils. These species-specific traits highlight the need to sequence and characterize the black spruce transcriptome.
Much of the EST sequencing in conifers has used wood-forming tissues and secondary xylem because of the economic importance of wood [1, 2, 4, 22–25]. A number of studies have sequenced transcripts from needle tissues, including lodgepole pine (Pinus contorta), sugar pine (Pinus lambertiana), loblolly pine, maritime pine (Pinus pinaster) and Norway spruce. These studies provide an initial catalogue of genes expressed in needle tissue; however, more comparative work is needed to understand the roles of these genes in important metabolic pathways, such as photosynthesis.
The objective of our study was to develop a black spruce transcript resource to facilitate structural and functional spruce genomics projects related to growth and adaptation. Here, we report the results of the first EST sequencing project for black spruce, in which cDNA clones were isolated and sequenced from a standard cDNA library constructed from needle tissues in 2002. We used bidirectional Sanger sequencing of ESTs to produce high-quality, long reads that assist the identification of full-length genes. We assembled the ESTs into contigs and singletons and then performed comparative protein annotation against the non-redundant (NR) protein database and the NCBI UniGene clusters for model plant species. We further conducted nucleotide similarity analyses against EST sequences from all major plant species (dbEST), as well as species-specific sequences from various gymnosperms and angiosperms. Gene Ontology terms were assigned, and the ESTs were manually evaluated for specific categories, including transcription factors and photosynthetic genes. Finally, we used the black spruce EST data to detect simple sequence repeats (SSRs).
Plant material and cDNA library construction
Total RNA was extracted from 2 g of freshly growing needles of three black spruce seedlings established in the greenhouse at Dalhousie University, following the protocol described in . RNA quality and quantity were determined using a spectrophotometer (SPECTRAmax PLUS, Molecular Devices Corporation, Sunnyvale, CA, USA); the extracted RNA was of high quality (OD260/OD280 = 1.82), and the yield was approximately 120 μg per g of needle tissue. PolyA RNA was purified using the RNeasy Mini Kit (Qiagen Inc., Mississauga, ON, Canada) following the manufacturer's instructions. The cDNA library was constructed using the Creator SMART cDNA library construction kit (Clontech Laboratories Inc., CA, USA). The oligo(dT)-primed cDNA inserts were directionally cloned into the pDNR-LIB vector and transformed into XL10-Gold ultracompetent Escherichia coli cells. Plasmid DNA was isolated using the QIAprep Spin Miniprep kit (Qiagen Inc., Mississauga, ON, Canada) from white colonies selected after overnight growth on Luria broth agar plates containing chloramphenicol (30 μg/ml). The quality and quantity of the isolated plasmid DNA were confirmed on 0.8% agarose gels against known amounts of lambda DNA before sequencing.
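The spectrophotometric quality and yield checks above rest on simple arithmetic. The sketch below assumes the conventional factor of about 40 μg/ml RNA per A260 unit (1 cm path length). The absorbance readings, dilution factor and stock volume are invented for illustration and were picked only so that the output lands on the ratio and yield reported above; they are not the study's measurements, and the function names are hypothetical.

```python
# Conventional conversion: an A260 of 1.0 (1 cm path length) corresponds
# to roughly 40 ug/ml of RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float) -> float:
    """RNA concentration (ug/ml) of the undiluted stock."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; roughly 1.8-2.1 is commonly read as protein-free RNA."""
    return a260 / a280


def yield_per_gram(conc_ug_per_ml: float, volume_ml: float, tissue_g: float) -> float:
    """Total RNA recovered (ug) per gram of fresh tissue."""
    return conc_ug_per_ml * volume_ml / tissue_g


# Hypothetical readings: a 1:50 dilution of a 0.12 ml stock from 2 g of needles
a260, a280 = 1.0, 0.55
conc = rna_concentration_ug_per_ml(a260, dilution_factor=50)
ratio = purity_ratio(a260, a280)
per_gram = yield_per_gram(conc, volume_ml=0.12, tissue_g=2.0)
print(f"{conc:.0f} ug/ml, A260/A280 = {ratio:.2f}, {per_gram:.0f} ug/g tissue")
# prints: 2000 ug/ml, A260/A280 = 1.82, 120 ug/g tissue
```

In practice the dilution factor matters most: readings are only reliable within the instrument's linear range, so stocks are diluted before measurement and the factor is multiplied back in.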
Sequencing reactions were performed in a PTC-200 thermal cycler (MJ Research, Reno, NV, USA) using the Thermo Sequenase fluorescent labelled primer cycle sequencing kit with 7-deaza-dGTP (Amersham Pharmacia Biotech, Freiburg, Germany) according to the manufacturer's instructions. The sequencing products were resolved on a LI-COR 4200L sequencing system (LI-COR Biosciences, Lincoln, NE, USA). A total of 2,486 cDNA clones were sequenced in both directions using IRD-labeled M13F (5′-AAA CAG CTA TGA CCA TGT TCA-3′) and M13R (5′-GTA AAA CGA CGG CCA GT-3′) primers.
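As a consistency check between the Methods and the Abstract, the sketch below compares the number of reads attempted for the 2,486 bidirectionally sequenced clones with the 2,455 5′ and 2,139 3′ high-quality reads reported. It assumes exactly one attempted read per clone end, which the text does not state explicitly.

```python
CLONES = 2486     # clones sequenced in both directions (Methods)
PASSED_5P = 2455  # high-quality 5' reads (Abstract)
PASSED_3P = 2139  # high-quality 3' reads (Abstract)


def pass_rate(passed: int, attempted: int) -> float:
    """Percentage of attempted reads retained after quality filtering."""
    return 100.0 * passed / attempted


# Assumption: exactly one read was attempted per clone end.
total_reads = 2 * CLONES
total_passed = PASSED_5P + PASSED_3P
print(total_reads, total_passed)                                 # 4972 4594
print(f"overall: {pass_rate(total_passed, total_reads):.1f}%")   # 92.4%
print(f"5' reads: {pass_rate(PASSED_5P, CLONES):.1f}%")          # 98.8%
print(f"3' reads: {pass_rate(PASSED_3P, CLONES):.1f}%")          # 86.0%
```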
Preliminary sequence processing
Raw trace files were processed through the customized TreeGenes EST pipeline. Base calling and quality assignment were conducted with Phred (versions 0.000925.c and 0.020425.c) [29, 30]. Bases with quality below Phred20 were masked, and vector sequences were trimmed from the ends using the cross_match program with minmatch 12 and minscore 20. Sequences with fewer than 100 high-quality bases (Phred20 or better) after trimming, and sequences with polyA tails of ≥ 100 bases, were removed from the analysis. The resulting sequence set was compared against the non-redundant (NR) protein database, and top-ranked BLAST matches to non-plant species with scores > 70 were flagged as contaminants; no such sequences were found in our dataset. The processed sequences were assembled into contigs and singletons using USEARCH v6.0 at 95% identity. EST and contig redundancy were calculated as described in Kirst et al. Simple sequence repeats (SSRs) in the EST sequences were identified and analyzed using the Simple Sequence Repeat Identification Tool (SSRIT). The parameters were set to detect perfect di-, tri-, tetra- and pentanucleotide motifs with a minimum of 10, 7, 5 and 4 repeats, respectively.
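The quality filters above can be sketched as follows. Phred scores relate to the base-call error probability P by Q = -10 log10(P), so Phred20 corresponds to a 1% error rate. This minimal sketch masks bases below Phred20 and applies the 100-high-quality-base and poly(A) rules to a single read; it does not reproduce Phred, cross_match vector trimming or the TreeGenes pipeline, and all function names are hypothetical.

```python
PHRED_CUTOFF = 20   # Phred20: about 1 error per 100 base calls
MIN_HQ_BASES = 100  # reads with fewer high-quality bases are discarded
MAX_POLYA = 100     # reads with a 3' poly(A) run of >= 100 bases are discarded


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


def mask_low_quality(seq: str, quals: list[int]) -> str:
    """Replace every base called below the Phred cutoff with 'N'."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return "".join(b if q >= PHRED_CUTOFF else "N" for b, q in zip(seq, quals))


def trailing_polya(seq: str) -> int:
    """Length of the run of 'A' at the 3' end of the read."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


def passes_filters(seq: str, quals: list[int]) -> bool:
    """Apply the minimum high-quality-base and poly(A) rules to one read."""
    masked = mask_low_quality(seq, quals)
    hq_bases = sum(1 for b in masked if b != "N")
    return hq_bases >= MIN_HQ_BASES and trailing_polya(seq) < MAX_POLYA


read = "ACGT" * 30               # 120 bases
quals = [30] * 110 + [10] * 10   # last 10 bases below Phred20
print(passes_filters(read, quals))      # True (110 high-quality bases)
print(round(error_probability(20), 4))  # 0.01
```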
Comparative sequence analysis
The following databases were used for BLASTX and BLASTN analyses to annotate the EST singletons and contigs: 1) Arabidopsis thaliana, UniGene Build #74, 30,633 clusters; 2) Populus, UniGene Build #11, 15,056 clusters; 3) Oryza sativa, UniGene Build #86, 44,118 clusters; 4) Vitis vinifera, UniGene Build #13, 22,101 clusters; 5) Physcomitrella patens, UniGene Build #4, 17,573 clusters; 6) Pinus and Picea, UniGene Build #13, 61,706 clusters; 7) the GenBank NR database, NCBI release 192 (October 15, 2012); 8) EST_OTHERS at NCBI (downloaded October 21, 2012); and 9) the UniProt plant protein databank at NCBI (downloaded October 9, 2012). All BLAST searches used an e-value cut-off of 1e-05. BLAST results are reported as bit scores; the e-value is derived from the bit score together with query length and database size, so the bit score reflects the overall similarity of a hit to the query independently of database size. BLASTX searches targeted model species, whereas BLASTN searches focused on conifer species with public sequence resources. In addition to BLAST annotations, pipeline-directed Gene Ontology (GO) assignments were made from applicable results in the Molecular Function and Biological Process categories. The hierarchical GO structure was stored locally so that annotations could be resolved to consistent levels. To classify sequences into comparable categories, InterProScan wrappers were used to generate BRENDA enzyme, SignalP, TMHMM and PFAM protein domain results. Full-length unique ESTs were identified from BLASTX sequence similarity searches: to be considered full-length, a sequence was required to have greater than 80% identity and to include the start codon of the translated protein.
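The e-value cut-off and best-hit selection described above can be illustrated on BLAST+ tabular output (-outfmt 6 with its default 12 columns). The sketch keeps, for each query, the highest-bit-score hit with an e-value ≤ 1e-05. The full-length test is only a proxy resting on our own assumption: identity above 80% plus an alignment that starts at residue 1 of the subject protein, taken to stand in for the presence of the start codon. The sequence and protein identifiers in the example are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5
# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict[str, dict]:
    """For each query, keep the highest-bit-score hit passing the e-value cut-off."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hit["sstart"] = int(hit["sstart"])
        if hit["evalue"] > EVALUE_CUTOFF:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def looks_full_length(hit: dict) -> bool:
    """Proxy only: > 80% identity and an alignment starting at residue 1
    of the subject protein (assumed to be its initiating methionine)."""
    return hit["pident"] > 80 and hit["sstart"] == 1


# Hypothetical BLASTX output for two black spruce unique sequences
example = (
    "contig1\tprotA\t92.5\t200\t15\t0\t30\t629\t1\t200\t1e-80\t350\n"
    "contig1\tprotB\t60.0\t180\t72\t2\t40\t579\t12\t191\t1e-20\t95\n"
    "single7\tprotC\t45.0\t60\t33\t1\t5\t184\t40\t99\t0.01\t30\n"
)
hits = best_hits(example)
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["bitscore"], looks_full_length(hit))
# prints: contig1 protA 350.0 True   (single7 fails the e-value cut-off)
```

Ranking by bit score rather than e-value is convenient here because bit scores stay comparable when the same query is searched against databases of different sizes.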
Results and discussion
EST sequence quality, contigs and unique sequences
Summary of EST sequencing and assembly results
EST sequences and contigs
Total EST sequences
Number of 5′ sequences
Number of 3′ sequences
Number of contigs
Number of singletons
Average assembled EST length
Number of full-length cDNA sequences
Number of assembled ESTs with:
Significant BLASTX annotations
Significant BLASTX annotations with known function
No BLASTX annotation information
Average number of sequences per contig
Number of contigs containing:
Comparison of EST sequencing statistics with representative Picea and Pinus studies
Average length, bp
Number of ESTs (a)
No. of unigenes (b + c)
EST redundancy, % [(a - b)/a]
Contig redundancy [(a-b)/c]
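Applying the table's redundancy formulas to the totals given elsewhere in the text shows how they work. The sketch assumes that a is the number of ESTs, b the number of singletons and c the number of contigs (so that b + c is the number of unigenes), and that every non-singleton EST belongs to exactly one contig. The printed values are recomputed here from the text's totals, not copied from the original table.

```python
def est_redundancy(a: int, b: int) -> float:
    """EST redundancy (%) = (a - b) / a: share of ESTs that fall into contigs."""
    return 100.0 * (a - b) / a


def contig_redundancy(a: int, b: int, c: int) -> float:
    """Contig redundancy = (a - b) / c: mean number of ESTs per contig."""
    return (a - b) / c


# a = ESTs, b = singletons, c = contigs, as reported in the text
a, b, c = 4594, 2234, 497
print("unigenes:", b + c)                                        # 2731
print(f"EST redundancy: {est_redundancy(a, b):.1f}%")            # 51.4%
print(f"contig redundancy: {contig_redundancy(a, b, c):.2f}")    # 4.75
```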
Functional gene annotations for unique transcript sequences and gene discovery
Translated nucleotide-to-protein comparisons (BLASTX) were made for the 2,731 P. mariana unique sequences (2,234 singletons and 497 contigs) against the non-redundant (NR) protein database. In total, 1,319 (59.0%) of the 2,234 singletons and 398 (80.1%) of the 497 contigs had significant BLASTX hits to known proteins, yielding annotations for 1,717 (62.9%) black spruce unique sequences (Table 1; Additional file 1: Table S1). As expected, a higher percentage of contigs (80.1%) than singletons (59.0%) showed significant similarity to the NR database, possibly because contigs are longer than singletons. Of the 1,717 annotated unique sequences, 1,478 (54% of all unique sequences) had annotations with known gene functions; in all cases, the most significant informative annotation was selected. The remaining 239 annotated sequences had annotations that were predicted, hypothetical or unknown (Additional file 1: Table S1). No contaminant sequences were detected in the BLASTX results, consistent with the cDNA library having been constructed from fresh needles of greenhouse-grown seedlings.
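The annotation rates above can be recomputed directly from the reported counts. The sketch also makes explicit that the 54% figure for sequences with known functions is relative to all 2,731 unique sequences rather than to the 1,717 annotated ones.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


singletons, contigs = 2234, 497
annotated_singletons, annotated_contigs = 1319, 398
known_function = 1478

unique = singletons + contigs                          # 2,731
annotated = annotated_singletons + annotated_contigs   # 1,717
no_hit = unique - annotated                            # 1,014

print(f"singletons annotated: {pct(annotated_singletons, singletons):.1f}%")  # 59.0%
print(f"contigs annotated: {pct(annotated_contigs, contigs):.1f}%")           # 80.1%
print(f"all unique annotated: {pct(annotated, unique):.1f}%")                 # 62.9%
print(f"no BLASTX hit: {no_hit} ({pct(no_hit, unique):.1f}%)")                # 1014 (37.1%)
print(f"known function: {pct(known_function, unique):.1f}% of unique, "
      f"{pct(known_function, annotated):.1f}% of annotated")                  # 54.1%, 86.1%
```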
A total of 1,014 (37.1%) sequences had no significant BLASTX hits in the NR protein database. Sequence divergence between gymnosperms and angiosperms limits gene annotation in conifers. Similar results were obtained in BLASTX similarity analyses of ESTs against publicly available databases for white spruce, Sitka spruce and Norway spruce, which reported no annotations for 15-30% of the transcripts. These results show that the available datasets are not sufficient for annotating conifer transcripts. In theory, these 1,014 un-annotated sequences could be P. mariana-specific transcripts, or short gene segments that would be recognized as homologs if more substantial sequence sets were available. Alternatively, they may represent protein regions that have diverged too much to meet our similarity search criteria. Finally, they could be partial transcripts consisting mostly of UTRs, which generally show a lower degree of conservation among species. Following the initial analysis against the NR database, sequences were annotated against the highly curated UniProt plant protein databank, producing a total of 1,478 significant annotations with known functions (Table 1 and Additional file 1: Table S1). The gene annotations from the ESTs in this study represent only a portion of the gene repertoire of P. mariana; more transcriptome sequencing is needed to characterize the needle tissue transcriptome fully.
Predicted proteins from the first whole-genome sequence of Norway spruce have become available. However, the Norway spruce genome assembly and protein predictions are at an early stage, whereas we used highly curated and reliable plant protein and NR protein databases to annotate our black spruce unique contigs and singletons. Thus, the functional annotations reported here, although conservative, should be reliable. A first draft genome of white spruce has also been published, but no information is yet available on its predicted proteins. As improved assemblies of the Norway spruce genome and functionally characterized proteins become available, the black spruce unique transcript sequences should be analyzed against the predicted and functionally characterized proteins of Norway spruce and, where available, other conifers. This may reveal whether an EST is part of a longer protein that is actually or predictably expressed.
Gene ontology classification, full-length genes, and gene families
Biological process describes the major biochemical pathways in which the sequences may be involved (a much higher resolution than molecular function). The primary categories in our study include cellular metabolic process (17%), primary metabolic process (13%), macromolecule metabolic process (8%) and biosynthetic process (8%) (Figure 3B). The functional distribution shows that transcripts from diverse categories are represented among the P. mariana unique sequences. The molecular functions and biological processes assigned to the black spruce unique contigs and singletons are consistent with the metabolic pathways expected to be active in vigorously growing seedlings under the greenhouse conditions from which samples for RNA preparation were collected. These Gene Ontology assignments are also consistent with results reported in other studies that used needles for EST or transcriptome sequencing [6, 23, 24, 26].
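Resolving GO assignments to a consistent level, as mentioned in the Methods, requires walking the GO hierarchy. The sketch below uses a small, hand-made toy hierarchy (term names only, not a real GO release and not the pipeline's stored structure) to map each annotation to its ancestors at a chosen depth and then count categories. Depth is taken as the shortest is_a path to the root, which is one of several possible conventions; the sequence annotations are hypothetical.

```python
from collections import Counter, deque

# Toy is_a hierarchy (child -> parents). Names only; NOT a real GO release.
PARENTS = {
    "metabolic process": ["biological_process"],
    "cellular process": ["biological_process"],
    "cellular metabolic process": ["metabolic process", "cellular process"],
    "primary metabolic process": ["metabolic process"],
    "biosynthetic process": ["metabolic process"],
    "macromolecule metabolic process": ["primary metabolic process"],
    "protein metabolic process": ["macromolecule metabolic process"],
}
ROOT = "biological_process"


def depth(term: str) -> int:
    """Shortest number of is_a steps from `term` up to the root (breadth-first)."""
    queue, seen = deque([(term, 0)]), {term}
    while queue:
        node, d = queue.popleft()
        if node == ROOT:
            return d
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, d + 1))
    raise ValueError(f"{term!r} is not connected to {ROOT!r}")


def ancestors_at_level(term: str, level: int) -> set[str]:
    """The term and/or its ancestors whose depth equals `level` (empty if the
    term is shallower than `level`)."""
    found, stack, seen = set(), [term], {term}
    while stack:
        node = stack.pop()
        if depth(node) == level:
            found.add(node)
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return found


# Hypothetical GO assignments for three unique sequences
annotations = {
    "seq1": ["protein metabolic process"],
    "seq2": ["cellular metabolic process", "biosynthetic process"],
    "seq3": ["macromolecule metabolic process"],
}
counts = Counter()
for terms in annotations.values():
    level_terms = set().union(*(ancestors_at_level(t, 2) for t in terms))
    counts.update(level_terms)  # each sequence counted once per category
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} ({100 * n / total:.0f}%)")
```

Counting each sequence at most once per category keeps a sequence with several closely related terms from inflating a single category.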
EST singletons and contigs annotated as putative transcription factors
Annotation sequence identifier
Transcription factor 25-like
RNA polymerase II transcriptional coactivator KIWI
Myb family transcription factor
Transcription elongation factor 1 homolog isoform
Transcription initiation factor IIA subunit 2-like
Transcription factor ILR3-like
Transcription factor, putative
Similar to C-Myc binding protein
PAP-specific phosphatase HAL2-like
Similar to C-Myc binding protein
Ethylene-responsive transcription factor 7-like
Transcription factor jumonji domain-containing protein
Transcription repressor MYB4
Transcription factor, putative
WRKY transcription factor, putative
RNA polymerase II transcriptional coactivator KIWI
Associate of C-myc, putative
mTERF domain-containing protein
Nuclear transcription factor Y subunit C-9
Transcription elongation factor SPT6
Transcription factor jumonji domain-containing protein
Transcription initiation factor iia (tfiia), gamma chain, putative
WRKY transcription factor 6
Since the cDNA library was constructed from needle tissues, genes related to photosynthesis were expected to be abundantly expressed. A total of 232 P. mariana unique transcripts related to the photosynthetic pathway were identified (Additional file 3: Table S3). This group included chlorophyll a/b binding proteins, light-harvesting complex proteins, photosystem I and II reaction center proteins, ribulose bisphosphate carboxylase (RuBisCo), oxygen-evolving enhancer protein, granule-bound starch synthase and nucleoside diphosphate. In plants, RuBisCo, which fixes carbon dioxide in photosynthesis, is more abundant during the day, when its expression is transcriptionally regulated by the light receptor phytochrome. Our needle samples for RNA extraction were collected during the daytime.
Gene expression and highly abundant transcripts
Estimation of gene expression: unique EST sequences with > 10 ESTs
Putative protein identification
Number of unique EST sequences
Number of ESTs
Non-specific lipid-transfer protein
Bet v I allergen family protein. (Os04t0465600-01)
Ribulose bisphosphate carboxylase small chain 1A
Antimicrobial peptide 1
Non-protein coding transcript. (Os07t0139600-01), partial
Germin-like protein 8-14-like
Metallothionein-like protein 3B-like
Photosystem I reaction center subunit V
Photosystem II subunit X
Cell wall-associated hydrolase, partial
LOW QUALITY PROTEIN: photosystem II 10 kDa Polypeptide, chloroplastic-like
Translation machinery associated protein TMA7
Photosystem I reaction center subunit N, chloroplast precursor, putative
Protein ralf-like 34
Transmembrane protein TPARL, putative
Hypothetical protein BrnapMp036 (mitochondrion)
Hypothetical protein EAAG1_11607
ATP synthase subunit beta
Metallothionein-like protein 2-like isoform 2
Ribosomal protein S14 (chloroplast)
Auxin-binding protein ABP19a precursor, putative
Photosystem II 5 kDa protein, chloroplastic-like
Similar to Anth (Pollen-specific desiccation-associated LLA23 protein). (Os11t0167800-01)
Chlorophyll a-b binding protein M9, chloroplastic precursor
Photosystem I subunit O
The high abundance of transcripts involved in photosynthesis and growth, and of transcription factors, is expected because the cDNA library was constructed from needles of actively growing black spruce seedlings under greenhouse conditions. The abundance of stress- and disease-responsive genes in seedlings growing under optimal greenhouse conditions suggests that these genes are also involved in plant functions other than responses to abiotic and biotic stress. Nevertheless, the stress- and disease-responsive genes identified in our study provide a valuable transcriptomic resource for structural and functional genomics studies in black spruce.
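Expression estimates of this kind are digital counts: the number of ESTs assembled into each unique sequence. The sketch below, using hypothetical EST-to-contig assignments and annotations, keeps unique sequences built from more than 10 ESTs and groups them by putative protein to produce the two count columns of the expression table above. Grouping by annotation text is our assumption about how such a table is assembled, not a documented step of the study.

```python
from collections import Counter

MIN_ESTS = 10  # unique sequences must contain more than this many ESTs


def abundant_transcripts(membership: dict[str, str],
                         annotation: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (putative protein, number of unique sequences, number of ESTs),
    sorted by EST count, for unique sequences built from > MIN_ESTS ESTs.
    `membership` maps each EST id to the unique sequence it was assembled into."""
    ests_per_sequence = Counter(membership.values())
    per_protein: dict[str, list[int]] = {}
    for seq_id, n_ests in ests_per_sequence.items():
        if n_ests > MIN_ESTS:
            protein = annotation.get(seq_id, "no annotation")
            per_protein.setdefault(protein, []).append(n_ests)
    rows = [(protein, len(ns), sum(ns)) for protein, ns in per_protein.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical assembly: 15 ESTs in c1, 12 in c2, 5 in c3
membership = {f"est{i}": "c1" for i in range(15)}
membership.update({f"est{i}": "c2" for i in range(15, 27)})
membership.update({f"est{i}": "c3" for i in range(27, 32)})
annotation = {"c1": "Non-specific lipid-transfer protein",
              "c2": "Non-specific lipid-transfer protein",
              "c3": "Photosystem II subunit X"}
print(abundant_transcripts(membership, annotation))
# prints: [('Non-specific lipid-transfer protein', 2, 27)]
```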
Sequence similarities, life history and ecological traits and evolutionary relationships
Among the remaining 2,238 sequences (the 2,731 unique sequences minus the 493 novel transcripts), 96% had a hit to a member of the genus Picea and 6% had significant similarity to a member of the genus Pinus (Figure 5A). When the results are viewed by species, Picea glauca and Picea sitchensis accounted for the majority of significant matches, with more than 65% of the sequences generating a BLAST score > 200 (Figure 5B). These similarity results suggest that the majority of the P. mariana genes discovered are orthologs of genes in other Picea species and may have originated from a common ancestor. The strong similarity with P. glauca ESTs is not surprising, as both are sympatric transcontinental boreal species that can hybridize naturally, although rarely. Within Pinus, the greatest number of hits was to Pinus taeda (68 unique sequences), as expected given the very large EST resource available for that species (328,662 ESTs). Pinus contorta followed with 30 unique sequences with scores > 200 (Figure 5B). Only 17 (0.6%) sequences had significant similarity to plant species outside the genera Picea and Pinus. These hits to distant species may represent sequences that are not yet well characterized in closely related conifer species.
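The species and genus tallies above can be sketched as follows, assuming the BLASTN results have already been reduced to (species, score) pairs per query; the example hits are hypothetical. Each query is counted at most once per species and once per genus, so with this kind of counting a sequence matching both Picea and Pinus is counted under each, and genus percentages need not sum to 100%.

```python
from collections import Counter

SCORE_CUTOFF = 200


def species_tally(hits: dict[str, list[tuple[str, float]]]) -> tuple[Counter, Counter]:
    """Count query sequences with at least one hit scoring above SCORE_CUTOFF,
    per species and per genus (first word of the species name)."""
    by_species, by_genus = Counter(), Counter()
    for query_hits in hits.values():
        species = {name for name, score in query_hits if score > SCORE_CUTOFF}
        by_species.update(species)
        by_genus.update({name.split()[0] for name in species})
    return by_species, by_genus


# Hypothetical BLASTN hits: query -> [(species, score), ...]
hits = {
    "seqA": [("Picea glauca", 850.0), ("Picea sitchensis", 820.0), ("Pinus taeda", 410.0)],
    "seqB": [("Picea glauca", 350.0), ("Pinus contorta", 150.0)],
    "seqC": [("Pinus taeda", 240.0)],
}
by_species, by_genus = species_tally(hits)
print(by_species.most_common())
print(dict(by_genus))  # seqA counts for both genera
```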
Simple sequence repeats
Types and distribution of simple sequence repeats
Type of repeat
% of sequences having repeat motif
0.11 or 0.07
0.22 or 0.07
0.07 or 0.07
0.15 or 0.11 or 0.07
0.07 or 0.11
0.11 or 0.11
0.11 or 0.07
0.22 or 0.07
0.11 or 0.07
0.22 or 0.07
0.07 or 0.07
0.15 or 0.07
0.11 or 0.15
0.77 or 0.18 or 0.11
1.39 or 0.15
2.27 or 0.70 or 0.11 or 0.15 or 0.15
0.81 or 0.07 or 0.11
0.29 or 0.11 or 0.07
0.84 or 0.18
0.99 or 0.07 or 0.18
0.40 or 0.15 or 0.07
0.37 or 0.07 or 0.07
2.05 or 0.37 or 0.11 or 0.11
1.28 or 0.22 or 0.07
0.73 or 0.15
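The SSR search parameters given in the Methods (perfect di-, tri-, tetra- and pentanucleotide motifs repeated at least 10, 7, 5 and 4 times, respectively) can be illustrated with regular expressions. This is a minimal sketch, not SSRIT itself: it scans each motif length separately, reports perfect repeats only, and skips motifs that are themselves repeats of a shorter unit (such as AA or ATAT). The test sequence is hypothetical.

```python
import re

# Minimum number of perfect repeat units per motif length (Methods)
MIN_REPEATS = {2: 10, 3: 7, 4: 5, 5: 4}


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> list[tuple[str, int, int]]:
    """Return (motif, repeat_count, start) for perfect di- to pentanucleotide SSRs."""
    seq = seq.upper()
    found = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer captured in group 1, followed by at least min_rep - 1 copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                found.append((motif, len(m.group(0)) // k, m.start()))
    return found


example = "GG" + "AT" * 12 + "C" + "CAG" * 8 + "T"
print(find_ssrs(example))  # [('AT', 12, 2), ('CAG', 8, 27)]
```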
We report here the first high-quality EST resource for black spruce, a widely distributed, ecologically and economically important boreal conifer. Despite the relatively small number of EST sequences compared with Picea glauca and P. sitchensis, our study identified 493 novel transcripts with no nucleotide similarity to sequences in dbEST, which therefore represent an important addition to dbEST. We identified genes involved in 36 molecular functions and 90 biological processes. Genes involved in stress response, the photosynthetic pathway and growth were the most abundant among the ESTs. We identified 216 full-length genes, ranging from 18 to 265 amino acids in length. The sequences showed the greatest similarity to ESTs from the congeneric and sympatric species Picea glauca. A total of 57 di-, tri-, tetra- and pentanucleotide repeats were identified in the black spruce ESTs; these sequences could be used to develop microsatellite DNA markers.
The ESTs and their annotations provide a valuable genomics resource to the forest tree genomics community in particular and to the plant genomics community in general. Markers developed from some of these EST sequences have already been mapped on a black spruce genetic linkage map. The ESTs reported here will provide an excellent resource for the future assembly and annotation of transcriptome sequences from NGS platforms, as well as for annotation of spruce whole-genome sequences. A comparison of 454 and Sanger reads showed that Sanger reads can improve the assembly and annotation of 454 datasets [13, 53].
Availability of supporting data
All 4,594 high-quality EST sequences have been deposited in GenBank under dbEST accession numbers JZ079173 - JZ083766. They have also been submitted to the TreeGenes database, which supports multiple queries on the occurrence of ESTs in the library and on their functional annotation.
EST: Expressed sequence tag
SSR: Simple sequence repeat
We thank John Major for providing black spruce seedlings, Dr Daoquian Xiang for assistance with cDNA library construction and EST sequencing, Taralynn Cluney, Brent Higgins and Waleed Abousamak for help in running LiCor sequencing gels, and Minyoung Choi and Ben Figueroa for assistance with bioinformatics analysis. The research was funded by a Natural Sciences and Engineering Research Council of Canada (NSERC) Strategic Project grant (STPGP 234783 – 00) to O.P. Rajora. I.K. Mann was financially supported by NSERC Strategic Grants funds (STPGP 234783 – 00) to O.P. Rajora and Dalhousie University Graduate Scholarship. O.P. Rajora held the Stora Enso Senior Chair in Forest Genetics and Biotechnology at Dalhousie University, which was supported by Stora Enso Port Hawkesbury Ltd., and the Senior Canada Research Chair in Forest and Conservation Genomics and Biotechnology at UNB, which was supported by the Canada Research Chair Program (CRC950-201869).
- Pavy N, Laroche J, Bousquet J, Mackay J: Large-scale statistical analysis of secondary xylem ESTs in pine. Plant Mol Biol. 2005, 57: 203-224.
- Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci USA. 2003, 100: 7383-7388.
- Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R: Gene discovery in the wood-forming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci USA. 1998, 95 (22): 13330-13335.
- Ralph S, Chun H, Kolosova N, Cooper D, Oddy C, Ritland C, Kirkpatrick R, Moore R, Barber S, Holt R: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics. 2008, 9 (1): 484.
- Ewing R, Poirot O, Claverie JM: Comparative analysis of the Arabidopsis and rice expressed sequence tag (EST) sets. In Silico Biol. 1999, 1 (4): 197-213.
- Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11 (1): 180.
- Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11 (1): 31-46.
- Zhang J, Chiodini R, Badr A, Zhang G: The impact of next-generation sequencing on genomics. J Genet Genom. 2011, 38 (3): 95-109.
- Birol I, Raymond A, Jackman SD, Pleasance S, Coope R: Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics. 2013, doi:10.1093/bioinformatics/btt178.
- Nystedt B, Street N, Wetterbom A, Zuccolo A, Lin Y-C: The Norway spruce genome sequence and conifer genome evolution. Nature. 2013, doi:10.1038/nature12211.
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17 (7): 1636-1647.
- Hudson ME: Sequencing breakthroughs for genomic ecology and evolutionary biology. Mol Ecol Resour. 2008, 8 (1): 3-17.
- Ueno S, Le Provost G, Leger V, Klopp C, Noirot C, Frigerio J-M, Salin F, Salse J, Abrouk M, Murat F: Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics. 2010, 11 (1): 650.
- Viereck LA, Johnston WF: Picea mariana (Mill.) B.S.P. - Black spruce. Silvics of North America vol. 1, Conifers. U.S.D.A. Forest Service Handbook 654. Edited by: Burns RM, Honkala BH. 1990, Washington, DC, USA: U.S.D.A. Forest Service.
- Morgenstern EK, Wang BSP: Trends in forest depletion, seed supply, and reforestation in Canada during the past four decades. Forest Chron. 2001, 6: 1014-1021.
- Bai C, Alverson WS, Follansbee A, Waller DM: New reports of nuclear DNA content for 407 vascular plant taxa from the United States. Ann Botany. 2012, 110: 1623-1629.
- National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/
- Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ: A white spruce gene catalog for conifer genome analyses. Plant Physiol. 2011, 157: 14-28.
- Nienstaedt H, Zasada JC: Picea glauca (Moench) Voss White spruce. Silvics of North America vol. 1, Conifers. U.S.D.A. Forest Service Handbook 654. Edited by: Burns RM, Honkala BH. 1990, Washington, DC, USA: U.S.D.A. Forest Service.
- Harris AS: Picea sitchensis (Bong.) Carr. Sitka spruce. Silvics of North America vol. 1, Conifers. U.S.D.A. Forest Service Handbook 654. Edited by: Burns RM, Honkala BH. 1990, Washington, DC, USA: U.S.D.A. Forest Service.
- Ran J-H, Wei X-X, Wang X-Q: Molecular phylogeny and biogeography of Picea (Pinaceae): Implications for phylogeographical studies using cytoplasmic haplotypes. Mol Phylogenet Evol. 2006, 41: 405-419.
- Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell M, Sederoff R: Analysis of xylem formation in pine by cDNA sequencing. Proc Natl Acad Sci USA. 1998, 95: 9693-9698.
- Fernandez-Pozo N, Canales J, Guerrero-Fernandez D, Villalobos D, Diaz-Moreno S, Bautista R, Flores-Monterroso A, Guevara MA, Perdiguero P, Collada C: EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics. 2011, 12 (1): 366.
- Lorenz WW, Ayyampalayam S, Bordeaux J, Howe G, Jermstad K, Neale D, Rogers D, Dean JD: Conifer DBMagic: a database housing multiple de novo transcriptome assemblies for 12 diverse conifer species. Tree Genet Genom. 2012, 8 (6): 1477-1485.
- Pavy N, Paule C, Parsons L, Crow J, Morency M-J, Cooke J, Johnson J, Noumen E, Guillet-Claude C, Butterfield Y: Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics. 2005, 6 (1): 144.
- Chen J, Uebbing S, Gyllenstrand N, Lagercrantz U, Lascoux M, Kallman T: Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms. BMC Genomics. 2012, 13 (1): 589.
- Chang S, Puryear J, Cairney J: A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Report. 1993, 11 (2): 113-116.
- Wegrzyn JL, Lee JM, Tearse BR, Neale DB: TreeGenes: a forest tree genome database. Int J Plant Genom. 2008, 2008: Article ID 412875, 7 pages, doi: 10.1155/2008/412875.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Valledor L, Jorrín JV, Rodríguez JL, Lenz C, Meijón M, Rodríguez R, Cañal MJ: Combined proteomic and transcriptomic analysis identifies differentially expressed pathways associated to Pinus radiata needle maturation. J Proteome Res. 2010, 9 (8): 3954-3979.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010, 26 (19): 2460-2461.
- Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S: Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 2001, 11 (8): 1441-1452.
- Dendrome project. http://dendrome.ucdavis.edu/dfgp/about.html
- Rounsley SD, Glodek A, Sutton G, Adams MD, Somerville CR, Venter JC, Kerlavage AR: The construction of Arabidopsis expressed sequence tag assemblies (a new resource to facilitate gene identification). Plant Physiol. 1996, 112 (3): 1177-1183.
- Kinlaw CS, Neale DB: Complex gene families in pine genomes. Trends Plant Sci. 1997, 2: 356-359.
- Nagaraj SH, Gasser RB, Ranganathan S: A hitchhiker’s guide to expressed sequence tag (EST) analysis. Brief Bioinform. 2007, 8 (1): 6-21.
- Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF: Complementary DNA sequencing: expressed sequence tags and human genome project. Science. 1991, 252 (5013): 1651-1656.
- Rai HS, Reeves PA, Peakall R, Olmstead RG, Graham SW: Inference of higher-order conifer relationships from a multi-locus plastid data set. Botany. 2008, 86: 658-669.
- Etscheid M, Klümper S, Riesner D: Accumulation of a metallothionein-like mRNA in Norway spruce under environmental stress. J Phytopathol. 1999, 147 (4): 207-213.
- Maret W: The Function of Zinc Metallothionein: A link between cellular zinc and redox state. J Nutrit. 2000, 130: 1455S-1458S.
- Zhou G-K, Xu Y-F, Liu J-Y: Characterization of a rice class II metallothionein gene: Tissue expression patterns and induction in response to abiotic factors. J Plant Physiol. 2005, 162 (6): 686-696.
- Liu P, Goh C-J, Loh C-S, Pua E-C: Differential expression and characterization of three metallothionein-like genes in Cavendish banana (Musa acuminata). Physiologia Plantarum. 2002, 114 (2): 241-250.
- Dynan WS, Tjian R: Control of eukaryotic messenger RNA synthesis by sequence-specific DNA-binding proteins. Nature. 1985, 316 (6031): 774-778.
- Tandre K, Albert VA, Sundås A, Engström P: Conifer homologues to genes that control floral development in angiosperms. Plant Mol Biol. 1995, 27 (1): 69-78.
- Sundström J, Engström P: Conifer reproductive development involves B-type MADS-box genes with distinct and different activities in male organ primordia. Plant J. 2002, 31 (2): 161-169.
- Eulgem T, Rushton PJ, Robatzek S, Somssich IE: The WRKY superfamily of plant transcription factors. Trends Plant Sci. 2000, 5 (5): 199-206.
- Rushton PJ, Somssich IE, Ringler P, Shen QJ: WRKY transcription factors. Trends Plant Sci. 2010, 15 (5): 247-258.
- Sabala I, Elfstrand M, Farbos I, Clapham D, von Arnold S: Tissue-specific expression of Pa18, a putative lipid transfer protein gene, during embryo development in Norway spruce (Picea abies). Plant Mol Biol. 2000, 42 (3): 461-478.
- Little EL, Pauley SS: A natural hybrid between black and white spruce in Minnesota. Am Midland Natur. 1958, 60: 202-211.
- Kang B-Y, Mann IK, Major JE, Rajora OP: Near-saturated and complete genetic linkage map of black spruce (Picea mariana). BMC Genom. 2010, 11 (1): 515.
- Tedersoo L, Nilsson RH, Abarenkov K, Jairus T, Sadam A, Saar I, Bahram M, Bechem E, Chuyong G, Kõljalg U: 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytolog. 2010, 188 (1): 291-301.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.