A comprehensive assessment of the transcriptome of cork oak (Quercus suber) through EST sequencing

Pereira-Leal, José B; Abreu, Isabel A; Alabaça, Cláudia S; Almeida, Maria Helena; Almeida, Paulo; Almeida, Tânia; Amorim, Maria Isabel; Araújo, Susana; Azevedo, Herlânder; Badia, Aleix; Batista, Dora; Bohn, Andreas; Capote, Tiago; Carrasquinho, Isabel; Chaves, Inês; Coelho, Ana Cristina; Costa, Maria Manuela Ribeiro; Costa, Rita; Cravador, Alfredo; Egas, Conceição; Faro, Carlos; Fortes, Ana M; Fortunato, Ana S; Gaspar, Maria João; Gonçalves, Sónia; Graça, José; Horta, Marília; Inácio, Vera; Leitão, José M; Lino-Neto, Teresa; Marum, Liliana; Matos, José; Mendonça, Diogo; Miguel, Andreia; Miguel, Célia M; Morais-Cecílio, Leonor; Neves, Isabel; Nóbrega, Filomena; Oliveira, Maria Margarida; Oliveira, Rute; Pais, Maria Salomé; Paiva, Jorge A; Paulo, Octávio S; Pinheiro, Miguel; Raimundo, João AP; Ramalho, José C; Ribeiro, Ana I; Ribeiro, Teresa; Rocheta, Margarida; Rodrigues, Ana Isabel; Rodrigues, José C; Saibo, Nelson JM; Santo, Tatiana E; Santos, Ana Margarida; Sá-Pereira, Paula; Sebastiana, Mónica; Simões, Fernanda; Sobral, Rómulo S; Tavares, Rui; Teixeira, Rita; Varela, Carolina; Veloso, Maria Manuela; Ricardo, Cândido PP

doi:10.1186/1471-2164-15-371

Research article
Open access
Published: 15 May 2014


BMC Genomics volume 15, Article number: 371 (2014)


Abstract

Background

Cork oak (Quercus suber) is one of the few trees able to produce cork, a material widely used to make wine bottle stoppers, flooring and insulation materials, among many other uses. The molecular mechanisms of cork formation are still poorly understood, largely because the species has a long life cycle and little molecular and genomic information is available for it. Cork oak forests are of great ecological importance and represent a major economic and social resource in Southern Europe and Northern Africa. However, global warming is threatening these forests by imposing thermal and hydric stresses, as well as many types of novel biotic stresses. Despite the economic and social value of Q. suber, few genomic resources useful for biotechnological applications and improved forest management have been developed.

Results

We generated more than 7 million sequence reads by pyrosequencing 21 normalized cDNA libraries derived from multiple Q. suber tissues and organs, developmental stages and physiological conditions. A stringent sequence processing and assembly pipeline identified ~159,000 unigenes. These were annotated according to their similarity to known plant genes, InterPro domains, GO classes and EC numbers. We investigated the phylogenetic extent of this EST set and found that cork oak has a significant new gene space that is not covered by other model species or EST sequencing projects. The raw data and the full annotated assembly are available to the community in a dedicated web portal at http://www.corkoakdb.org.

Conclusions

This genomic resource represents the first transcriptome study of a cork-producing species. It can be explored to develop new tools and approaches for understanding stress responses and developmental processes in forest trees, as well as the molecular cascades underlying cork differentiation and disease response.

Background

Oaks (Quercus spp.) are important trees of the Northern Hemisphere, and in Europe they form highly valuable, widespread forests. Together with chestnut and beech, oaks belong to the Fagaceae, and Quercus is probably the best-known genus of the family. The evergreen cork oak (Q. suber) grows in the Western Mediterranean Basin, with a natural range covering Algeria, France, Italy, Morocco, Portugal, Spain and Tunisia, where it is managed as low-density, anthropogenic open woodland. Quercus spp. are important for the conservation of soil and water, biodiversity, natural landscape and climate, and for the production of highly valuable materials, and thus have high ecological, social and economic value.

Quercus suber shares with Phellodendron amurense (Amur cork tree) and Q. variabilis (Chinese cork oak) the unusual ability to produce a continuous and renewable outer bark of cork, although only Q. suber cork has the fine physical and chemical properties required for highly profitable industrial use.

Portugal holds the world's leading position in cork oak forest area (740,000 ha out of 2,200,000 ha worldwide), cork production (60% of the world's exported cork volume) and cork processing (74% of the world's processed cork). In the past, oaks dominated Portugal's native forests, but their area has decreased rapidly as a result of human activity. Still, cork oak forests account for about 26% of the Portuguese forest [1].

However, the decline of cork oak (Q. suber) and holm oak (Q. ilex ssp. rotundifolia) reported in the Iberian Peninsula over the last 20 years has caused the death of numerous trees, threatening the rural economy in this part of Europe [2–5]. Oak diseases in Europe have been predicted to become more severe and to expand northwards and eastwards within the next few hundred years [6].

The species also faces many other threats, such as drought, extreme temperatures and pests, which have led to a marked decline of cork oak stands, possibly related to repeated successions of extremely dry and hot years with a significant reduction in springtime precipitation [7].

The relevance of Q. suber and the scarce information available on its genetics, biochemistry and physiology [8–14] fully justify the generation of transcriptomic data that will provide new insight into cork oak biology and genetics. These data are fundamental for designing selection programs and for understanding plant adaptation to both biotic and abiotic factors, plant plasticity, ecophysiological interactions, interspecific hybridization and gene flow.

For a species with neither a sequenced genome nor a physical map, expressed sequence tags (ESTs) are a practical means of gene discovery and a way to start elucidating its physiology and functional genome. When this project started (in 2010), fewer than 300 ESTs were available for Q. suber. This number has recently increased to almost 7,000 (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html).

Other oak species have also been subjected to transcriptomic studies, namely two European white oak species (Q. petraea, sessile oak, and Q. robur, English oak) [15, 16] and two American oak species (Q. alba, white oak, and Q. rubra, red oak) (reviewed in [17]). Ueno et al. [15] generated 222,671 non-redundant sequences (including alternative transcripts) from multiple cDNA libraries prepared from Q. petraea and Q. robur, a relevant resource for genomic studies and for the identification of genes of adaptive significance. In 2011, the same team produced another useful tool for genome analysis in Q. robur, a BAC library [18]. Another important tool for developing a physical map of a Fagaceae species came from the work of Durand and co-workers [19], who produced a total of 256 oak EST-SSRs that were assigned to bins and whose map positions were further validated by linkage mapping (http://www.fagaceae.org). More recently, the authors of [16] generated the largest set of reads to date from the transcriptome of an oak species (Q. robur), combining 454 and Illumina sequencing.

As a national initiative, Portugal organized a consortium to study cork oak ESTs (COEC – Cork oak ESTs Consortium, http://coec.fc.ul.pt/), in which 12 projects were designed to obtain a deeper understanding of Q. suber functional genomics. Developmental aspects (gametophytes, fruit and embryo development, acorn germination, bud sprouting, vascular and leaf development), cork formation and quality, abiotic stresses (oxidative stress, drought, heat, cold and salinity) and biotic interactions (including symbiosis and pathogenesis) were addressed by 20 teams from across the country. Two of these projects were fully dedicated to the bioinformatic analysis of the generated data and to the development of bioinformatics platforms, one of them further focusing on polymorphism detection and validation.

This paper presents the large-scale sequencing of 21 cDNA libraries and the construction of a cork oak transcriptome database containing ~159,000 unigenes. This database is currently one of the largest genomic resources available for oaks and was structured to accommodate future data on the genomics and physiology of woody species. The tools generated are crucial for studying cork oak biology and diversity, and for understanding gene regulation and adaptation to a changing environment. Future developments will make the early detection of traits of interest possible. This initiative will contribute to genomic research in cork oak and the Fagaceae family, paving the way for further studies.

Results and discussion

Sequencing

We constructed 21 libraries from Q. suber, as described in Table 1, using total RNA extracted from multiple tissues, developmental stages and stress conditions. To increase gene-space coverage, the libraries were normalized with Duplex-Specific Nuclease (DSN) technology [20] and sequenced on a 454 GS-FLX with Titanium chemistry (Roche). A total of 7,445,712 reads were produced, ranging from 40 to 587 bp, with per-library average lengths between 185 and 310 bp (Table 2). An initial pre-processing step that removed contaminants, low-quality sequences and short sequences reduced this to nearly 5 million nuclear reads (4,968,463), with per-library average lengths between 209 and 321 bp (Table 2). Our approach yielded more reads, of comparable length, than other multi-library projects (Moser et al. 2005; Ueno et al. 2010 [15]; O'Neil et al. 2010; [21]).

Table 1 Tissues and conditions used to produce the RNA libraries


Table 2 Sequencing statistics


Assembly

A stringent assembly pipeline was implemented; it is summarized in Figure 1 and described in the Materials and Methods section. It consists of two stages: first, each library was assembled individually; second, all assembled libraries were assembled together (an assembly of assemblies). This two-step protocol was chosen because the libraries were sequenced asynchronously and because future libraries are expected to be generated for other conditions and stress types. Our choice of parameters maximized the number and length of contigs (in MIRA, -AL:egp=no:mrs=85 reduces gap penalties and permits longer matches, and -AS:mrpc=1 allows single-read contigs, thus increasing the number of contigs); it was extensively validated and is described in greater detail in a companion paper (in preparation). We opted for de novo assembly because, in the absence of a closely related species with a completely sequenced genome, reference-based assembly performed poorly (not shown). The assembly statistics for each library are shown in Table 2. A total of 577,852 putative unigenes was obtained, including 501,257 contigs and 76,122 singlets, with each library producing between 8,442 and 50,522 putative unigenes. These were all subjected to one additional assembly step (see Materials and Methods), which reduced the number of putative unigenes to 159,298. The final unigene length distribution is shown in Figure 2A. The average unigene length of 148.5 bp is smaller than that obtained for another oak using the same sequencing platform combined with Sanger sequencing [15, 16] (see Table 3). A BLASTp search of all unigenes against the NR database found plant best hits in 97.3% of cases; the remaining hits were to other species and are likely contaminants not removed by our pipeline. A plot of the species distribution of these non-plant hits is available at CorkOakDB.org.

Table 3 Assembly metrics of this project compared with those of two large oak transcriptome sequencing projects


Coverage and depth

The large number of libraries, together with the two-step assembly, resulted in high redundancy. Most of the nearly 5 million filtered ESTs were assembled into a large number of unigenes (~159,000). We obtained an average coverage depth (the number of times each nucleotide was sequenced) of 3.9, with a maximum depth of 429 (25th percentile = 1; 75th percentile = 5). This is higher than in other recent tree EST projects using the same sequencing platform (e.g. [22]), likely owing to the large number of libraries sequenced in this project, prepared from multiple tissues, developmental stages and stress conditions. After the two rounds of assembly, 61,687 high-quality reads remained unassembled and were treated as singletons. Thus, 65% of our unigenes derive from contigs, a higher proportion than in other recent comparable projects (see Table nine in [15]).
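Depth statistics of this kind can be obtained by counting, for every unigene position, how many reads cover it, and then pooling those counts. The Python sketch below is illustrative only: it assumes read placements are already available as half-open (start, end) intervals per unigene (a hypothetical input format, not MIRA's own output), and it uses a nearest-rank percentile, one of several common definitions.

```python
from statistics import mean


def per_base_depth(length, intervals):
    """Read depth at every position of one unigene.

    `intervals` is a hypothetical list of half-open (start, end) read placements.
    A difference array adds +1 where a read starts and -1 where it ends.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depths, running = [], 0
    for pos in range(length):
        running += diff[pos]
        depths.append(running)
    return depths


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile for an integer q in 0..100."""
    rank = -(-q * len(sorted_values) // 100)  # ceil(q * n / 100)
    return sorted_values[max(rank, 1) - 1]


def depth_summary(unigenes):
    """Pool per-base depths over all unigenes given as {id: (length, intervals)}."""
    pooled = sorted(
        d for length, intervals in unigenes.values()
        for d in per_base_depth(length, intervals)
    )
    return {
        "mean": mean(pooled),
        "max": pooled[-1],
        "p25": nearest_rank(pooled, 25),
        "p75": nearest_rank(pooled, 75),
    }


toy = {"u1": (10, [(0, 6), (4, 10)]), "u2": (5, [(0, 5)])}
print(depth_summary(toy))
```

A difference array keeps the per-unigene cost linear in unigene length plus read count, which matters when depth must be computed for every base of ~159,000 unigenes.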

In the absence of a complete genome sequence, the true coverage of the cork oak gene space achieved by this project cannot be known. However, to estimate the number of unique genes detected, we queried the proteomes of Arabidopsis thaliana and Populus trichocarpa using BLASTp with a cutoff of e < 10^-5. We found that 65% of cork oak unigenes hit 23,482 of the 27,379 predicted proteins in A. thaliana (85%) and 30,318 of the 45,555 predicted proteins in P. trichocarpa (67%) [23]. These numbers provide rough upper (85%) and lower (67%) bounds for the expected coverage of the Q. suber transcriptome. The figures do not change significantly with a more lenient cutoff of e < 10^-2, at which we hit 24,093 (88%) and 30,719 (67%) proteins, respectively. Because multiple unigenes hit the same target genes in either species, our unigene set appears highly redundant. The remaining 55,921 unigenes, about 35% of the cork oak transcriptome, have no hit in either A. thaliana or P. trichocarpa. These include short unigenes that would not reach significance in BLASTp comparisons (see Figure 2A), as well as potential novel genes absent from these two genomes. This number may be overestimated if some under-assembly occurred in our libraries.
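The coverage estimate above amounts to counting the distinct reference proteins that receive at least one hit below the E-value cutoff, and the distinct unigenes that produce one. A minimal Python sketch, assuming the BLASTp results are in BLAST+ tabular format (-outfmt 6, whose eleventh column is the E-value); the function name and the toy report are hypothetical:

```python
import csv
import io


def reference_coverage(tabular_report, n_reference, n_unigenes, max_evalue=1e-5):
    """Return (fraction of reference proteins hit, fraction of unigenes with a hit).

    Expects the 12 default columns of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    queries, subjects = set(), set()
    for row in csv.reader(io.StringIO(tabular_report), delimiter="\t"):
        if not row:
            continue
        if float(row[10]) < max_evalue:
            queries.add(row[0])
            subjects.add(row[1])
    return len(subjects) / n_reference, len(queries) / n_unigenes


report = (
    "u1\tAT1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "u2\tAT1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-8\t150\n"  # redundant hit to AT1
    "u3\tAT2\t40.0\t50\t30\t1\t1\t50\t1\t50\t1e-3\t30\n"      # passes only a lenient cutoff
)
strict = reference_coverage(report, n_reference=4, n_unigenes=4)
lenient = reference_coverage(report, n_reference=4, n_unigenes=4, max_evalue=1e-2)
```

In the toy report, u1 and u2 both hit AT1, which is the kind of redundancy described above: several unigenes map to one reference protein, so the fraction of unigenes with a hit can exceed the fraction of reference proteins covered.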

We performed serial clustering at increasing levels of identity to evaluate the degree of redundancy in our assembly (Figure 2C). At the protein level, the number of clusters decreased sharply at 95% identity, indicating that approximately 8,000 predicted peptides are highly similar to one another, comparable to what was found in other oak species [15]. This could indicate a recent polyploidization event giving rise to many highly similar genes. Alternatively, and probably more likely, it could be accounted for by the high genetic diversity among the multiple unrelated trees used to prepare the libraries [9]. Sequencing errors not fully resolved because of the relatively low coverage of many unigenes could also contribute. In the first scenario, our decision to filter out redundancy at 98% identity at the cDNA level could have been excessive, leading to an underestimate of the number of unigenes. In contrast, the second and third scenarios would suggest that clustering at 98% does not collapse such variants and that we are overestimating the number of unigenes, which may be closer to 151,000. We do not have enough data to favour any of these scenarios, particularly because all three may coexist. We have thus chosen 98% cDNA clustering as a conservative parameter that we hope does not over-cluster paralogues. As data accumulate, it will be easier to fuse unigenes than to resolve incorrectly clustered paralogues.
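The effect of the identity threshold on cluster counts can be illustrated with a greedy incremental scheme in the spirit of CD-HIT: sequences are processed from longest to shortest, and each one either joins an existing representative at or above the threshold or founds a new cluster. This Python sketch deliberately simplifies identity to an ungapped, position-by-position comparison over the shorter sequence; CD-HIT itself uses short-word filtering and alignment, so the sketch only shows how the number of clusters responds as the threshold is lowered.

```python
def ungapped_identity(a, b):
    """Fraction of identical residues over the shorter sequence, compared
    position by position from the start (no gaps, no offset search)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def greedy_cluster(seqs, threshold):
    """Longest sequences become representatives; every other sequence joins
    the first representative it matches at >= threshold identity."""
    representatives = []
    for seq in sorted(seqs, key=len, reverse=True):
        if not any(ungapped_identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


def serial_clustering(seqs, thresholds):
    """Number of clusters obtained at each identity threshold."""
    return {t: len(greedy_cluster(seqs, t)) for t in thresholds}


a = "ACDEFGHIKLMNPQRSTVWY"
b = "ACDEFGHIKLMNPQRSTVWA"  # one substitution: 95% identical to a
c = "W" * 20                # unrelated
counts = serial_clustering([a, b, c], [1.0, 0.95, 0.90])
```

Here a and b stay separate at 100% identity but merge at 95%, giving a sharp drop in cluster count at that threshold, the same signature the serial clustering looks for.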

Functional annotation

We mapped the cork oak unigenes to the functional classes defined in Gene Ontology (GO) [24]. A total of 73,766 sequences mapped to at least one GO term, and the unigenes covered 2,273 different GO terms, with 3.66 terms per unigene on average. Most terms occur at low frequency, while a few functional classes dominate. The Biological Process “Metabolism” was the most frequent, and other metabolic categories were among the top five; metabolism-related categories cover 68% of the terms assigned (Figure 3). Consistently, enzyme functions dominate the Molecular Function classes (“Catalytic activity”, “Transferase activity”, “Hydrolase activity”) (Figure 3). This contrasts with the combined ESTs of two other oaks, Q. petraea and Q. robur, in which Transport (Biological Process) and Nucleotide Binding (Molecular Function) dominate [15]. Note, however, that this difference may simply reflect the use of non-normalized libraries in that study, which under-represent lowly expressed genes. It may also reflect the fact that, to the best of our knowledge, nuclear and organelle transcriptomes were assembled together in that study, whereas we removed both chloroplast and mitochondrial sequences from our assembly. This is supported by the GO Cellular Component classification: the “Plastid” class is the most frequent in the Q. petraea/Q. robur ESTs, whereas intracellular classes (“Cell”, “Intracellular”, “Cytoplasm”, etc.) dominate in cork oak (Figure 3).
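Summary figures such as the number of annotated unigenes, the number of distinct terms and the mean number of terms per unigene follow directly from a unigene-to-GO mapping. A minimal Python sketch, assuming the mapping has already been parsed into a dictionary (a hypothetical input; no Blast2GO export format is reproduced here). The two GO identifiers in the toy data are metabolic process (GO:0008152) and catalytic activity (GO:0003824).

```python
from collections import Counter


def go_summary(annotations, top_n=5):
    """Summarize a {unigene_id: [GO terms]} mapping (hypothetical input format)."""
    annotated = {u: set(terms) for u, terms in annotations.items() if terms}
    counts = Counter(t for terms in annotated.values() for t in terms)
    n_annotated = len(annotated)
    return {
        "annotated_unigenes": n_annotated,
        "distinct_terms": len(counts),
        "mean_terms_per_unigene": sum(counts.values()) / n_annotated if n_annotated else 0.0,
        "top_terms": counts.most_common(top_n),
    }


toy = {
    "u1": ["GO:0008152", "GO:0003824"],
    "u2": ["GO:0008152"],
    "u3": [],  # unannotated unigenes are excluded from the mean
}
summary = go_summary(toy)
```

Grouping terms into broader classes such as “Metabolism” would additionally require mapping each term to its ancestor categories (for example through a GO slim), which this sketch does not attempt.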

We used a simple and conservative scheme for naming the cork oak unigenes. Besides its accession number (see below for details), each unigene was given a name based on its similarity to proteins in A. thaliana and P. trichocarpa (Table 4). For nearly 40% of the unigenes we could not assign a clear annotation at a cutoff of e < 10^-5 (Figure 4), consistent with the number of unigenes that are not similar to any gene in other model plants. Conversely, we identified conserved domains in 44% of the unigenes and established clear homology relationships for an additional 16%, giving a total of 60% of unigenes with clear functional assignments in GO.

Table 4 Unigene naming criteria are as follows


We were able to map InterPro domains to 108,341 unigenes (68%). Nearly half of the domains are widespread in evolution, being present in both Eukaryota and Bacteria (Figure 5). The other half is dominated by general eukaryotic domains, and fewer than 10% of the domains are plant specific. These results are comparable to those reported for the complete genomes of A. thaliana, P. trichocarpa and P. persica, and for the transcriptomes of the closely related Quercus robur and Castanea mollissima, which are also depicted in Figure 5.
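The evolutionary classes in Figure 5 can be thought of as assigning each domain to a class according to the lineages in which it occurs. A rough Python sketch, assuming each domain has already been annotated with the set of lineages where it is found; the lineage labels, the precedence order of the rules and the toy domains are all hypothetical:

```python
from collections import Counter

# Hypothetical lineage labels; any eukaryotic group would be listed here.
EUKARYOTIC_GROUPS = {"Viridiplantae", "Metazoa", "Fungi", "OtherEukaryota"}


def span_class(lineages):
    """Assign one domain to a taxonomic-span class from the lineages it occurs in."""
    eukaryotic = bool(lineages & EUKARYOTIC_GROUPS)
    if eukaryotic and "Bacteria" in lineages:
        return "Eukaryota and Bacteria"
    if lineages == {"Viridiplantae"}:
        return "Plant specific"
    if eukaryotic:
        return "Eukaryota"
    return "Other"


def span_profile(domain_lineages):
    """Fraction of domains falling into each class."""
    counts = Counter(span_class(lin) for lin in domain_lineages.values())
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


toy = {
    "dom1": {"Bacteria", "Viridiplantae", "Metazoa"},
    "dom2": {"Viridiplantae", "Fungi"},
    "dom3": {"Viridiplantae"},
    "dom4": {"Bacteria", "Fungi"},
}
profile = span_profile(toy)
```

Applying the same classifier to domain sets from other genomes and transcriptomes yields directly comparable profiles, which is the kind of comparison shown in Figure 5.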

Evolution

We compared the gene content of cork oak, as estimated by our EST sequencing project, with that of 31 completely sequenced plant genomes, using BLASTp at e < 10^-5 to determine how many predicted proteins in each species are similar to at least one cork oak unigene. The results, shown in Figure 6, broadly agree with the taxonomic/evolutionary distance between the species. They do not change when we use the more permissive cutoff of e < 10^-2 (not shown).

We compared the cork oak unigenes with those of the red oak (Q. rubra), the pedunculate oak (Q. robur, also known as English or French oak) and the Chinese chestnut (Castanea mollissima). For Q. rubra and C. mollissima we used data from the Fagaceae Genome Web, which include multiple tissues also sequenced on the 454 pyrosequencing platform (http://www.fagaceae.org/node/87455 and http://www.fagaceae.org/node/181796/, respectively); the Q. robur data, which include 454- and Illumina-generated sequences, were obtained from http://www.ufz.de/trophinoak/index.php?de=31205 [16, 26]. We ran our own assembly pipeline on these sequences to ensure that no additional differences were introduced on methodological grounds. The comparison is shown in Figure 7. The total number of distinct unigenes is higher in the cork oak project, probably reflecting the larger number of tissues and conditions sampled in our libraries, as well as incomplete assembly due to library biases and genetic heterogeneity of the samples. Between 77% and 82% of the unigenes from those species are similar to at least one cork oak unigene, as expected for evolutionarily close species. The remaining 18% to 23% of the red oak, English oak and chestnut unigenes are likely species specific, but may also be partially accounted for by incomplete coverage of the Q. suber transcriptome. However, the large number of cork oak unigenes without a hit in the other transcriptomes (30% to 44% at e < 10^-5) suggests that this is most likely not a major factor. This cork-oak-specific set is a mixture of short sequences that fail to reach statistical significance (e.g. from incomplete assembly) and a putative set of cork oak-specific genes. Note that when we compare Q. suber with the completely sequenced genome of Prunus persica, 94% of the P. persica genes find a hit in Q. suber, further suggesting that incomplete coverage of the gene space was probably not a major problem in our project.

Database and interface

To support the assembly and annotation pipeline, we use a data warehouse system that records the data and metadata associated with each step of the pipeline; it is described in a companion paper (in preparation). From this warehouse we generated a public portal as a community resource for cork oak genomics, available at http://www.corkoakdb.org. The assembled genes, the proteins they encode and their functional annotations are accessible through a web interface, partially shown in Figure 8. The gene view shows the cDNA and protein sequences, as well as plots of base-by-base coverage for the unigene. Users are shown pre-computed phylogenetic profiles against other plants, based on two standard methods for identifying orthologs and paralogs: bidirectional best BLAST hits and InParanoid [27]. The gene view further includes functional annotations, namely GO annotations, InterPro domain assignments, KEGG pathways and best BLAST hits against general and plant-specific databases. Genes of interest can be found by searching specific fields or by running a nucleotide or protein BLAST search against the Cork Oak database.
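The bidirectional best-hit criterion behind the phylogenetic profiles can be stated compactly: two sequences from different species are paired when each is the other's highest-scoring hit. A minimal Python sketch, assuming BLAST results have been reduced to (query, subject, bit score) tuples; the identifiers are hypothetical, and InParanoid's additional clustering of in-paralogs around each pair is not shown.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (ties keep the first seen)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


a_vs_b = [("q1", "s1", 200), ("q1", "s2", 150), ("q2", "s1", 180), ("q3", "s3", 90)]
b_vs_a = [("s1", "q1", 210), ("s2", "q2", 100), ("s3", "q3", 95)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In the toy data q2's best hit is s1, but s1's best hit is q1, so q2 is not paired; requiring reciprocity is what filters out such one-sided matches, which often involve paralogues.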

Conclusions

We have developed the first large-scale EST resource for cork oak, an important economic resource in Southern Europe and Northern Africa. We carried out a preliminary analysis of its gene content and functional annotation, and built a public platform for data sharing. Nineteen of the sequenced libraries, covering genes expressed in multiple tissues, developmental stages and stress conditions, were analysed in this study. Our results suggest that we covered a large fraction of the cork oak gene space. Many of the unigenes are dissimilar to any other plant genes. These likely represent incomplete assemblies due to library biases, but may also include several true cork oak-specific genes which, once identified, will represent a promising avenue for understanding the molecular basis of the response leading to cork formation. We believe that this sequencing effort will enable the community to explore the molecular basis of cork oak physiology, as well as its responses to the multiple abiotic and biotic challenges that cork oak forests are currently experiencing.

Methods

Samples, collection and preparation

Within this initiative, to guarantee high transcript coverage and to increase gene diversity, total RNA was isolated from Quercus suber samples obtained from different organs and tissues at varying developmental stages (roots, leaves, buds, flowers, fruits, phellogen, vascular tissue, and good- and bad-quality cork), as well as from plants that had been infected with Phytophthora cinnamomi, placed in symbiosis with the mycorrhizal fungus Pisolithus tinctorius, or exposed to different abiotic stresses (cold, heat, drought, salinity and oxidative stress). In addition, total RNA was isolated on two dates (May and September) from annual shoots of 30-year-old Quercus suber x cerris hybrid trees that either do or do not produce cork, to cover different developmental stages of the phellogen meristem. No approval or licenses were required for sample collection. Each library used plant material either from half-siblings (e.g. the abiotic and biotic stress libraries) or from several unrelated trees. All plant material came from Portuguese trees, except for the trees used to detect polymorphisms, which came from different Mediterranean countries [28]. The detailed conditions applied in each case are described at http://www.corkoakdb.org/libraries, and the full set of libraries is described in Table 1.

cDNA preparation, library normalization and pyrosequencing

Total RNA from each tissue/condition was the starting material for cDNA synthesis and for the production of normalized cDNA libraries for 454 sequencing. Briefly, total RNA quality was verified on an Agilent 2100 Bioanalyzer with the RNA 6000 Pico kit (Agilent Technologies, Waldbronn, Germany), and quantity was assessed by fluorimetry with the Quant-iT RiboGreen RNA kit (Invitrogen, CA, USA). An aliquot of 1–2 μg of total RNA was used for cDNA synthesis with the MINT cDNA synthesis kit (Evrogen, Moscow, Russia). This kit is based on the SMART double-stranded cDNA synthesis methodology and uses a modified template-switching approach that introduces known adapter sequences at both ends of the first-strand cDNA. The amplified cDNA was then normalized with the TRIMMER cDNA Normalization kit (Evrogen, Moscow, Russia) using Duplex-Specific Nuclease (DSN) technology [20, 29].

Normalized cDNA was quantified by fluorescence and sequenced on a 454 GS FLX Titanium according to the manufacturer's standard instructions (Roche-454 Life Sciences, Branford, CT, USA) at Biocant (Cantanhede, Portugal).

Sequence processing and assembly

The sequence analysis strategy began with a pre-processing stage, performed on each library, in which contaminant, low-quality, redundant and repeat-rich sequences were removed before each library was assembled. This was followed by a multi-library assembly (described below and summarized in Figure 1). First, each read, with its quality scores and ancillary information, was extracted from the sequencing machine output (.sff) using the open-source software sff_extract (http://bioinf.comav.upv.es/sff_extract/). Reads were assigned to samples by a Python pipeline that screens them for primer sequences, classifies them by sample origin and writes them to separate files. For each sample we generated a file with the sequences (.fasta) and a corresponding file with the quality scores (.qual). At this stage we removed adaptors and reads shorter than 40 bp. Artificial duplicates associated with pyrosequencing were then removed using cd-hit-454 [30] at a 98% identity threshold, and Seq-trim [31] was used to remove short sequences (length < 100 bp) and low-quality sequences (QV > 20, quality window = 10), as well as poly-A or poly-T tails and adaptors.
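The trimming and length filters can be sketched as below. This is a generic Python illustration, not Seq-trim's actual algorithm: it assumes Phred quality values, cuts a read at the first 10-base window whose mean quality falls below 20, strips poly-A/poly-T runs of at least 8 bases (a hypothetical run length), and drops reads shorter than 100 bp.

```python
import re

POLY_A_TAIL = re.compile(r"A{8,}$")  # minimum run length of 8 is an assumption
POLY_T_HEAD = re.compile(r"^T{8,}")


def strip_poly_tails(seq, quals):
    """Remove a 3' poly-A run and a 5' poly-T run, keeping qualities in step."""
    m = POLY_A_TAIL.search(seq)
    if m:
        seq, quals = seq[:m.start()], quals[:m.start()]
    m = POLY_T_HEAD.match(seq)
    if m:
        seq, quals = seq[m.end():], quals[m.end():]
    return seq, quals


def trim_low_quality(seq, quals, window=10, min_mean_q=20):
    """Cut the read at the first window whose mean quality is below min_mean_q."""
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


def filter_read(seq, quals, min_length=100):
    """Return the cleaned (seq, quals), or None if the read becomes too short."""
    seq, quals = strip_poly_tails(seq, quals)
    seq, quals = trim_low_quality(seq, quals)
    return (seq, quals) if len(seq) >= min_length else None
```

Trimming before the length check matters: a 120 bp read whose last 25 bases are poor is cut to 90 bp and then discarded, whereas a length check alone would have kept it.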

Contaminant sequences were removed in the next step. A database of possible contaminants was prepared (ContaminantsDB; see supplementary material for details) and queried with the Q. suber reads using BLASTn (5, -E 3 -e 1e-09 -q -5 -b 1 -G 3). Reads with a match in this database were subsequently searched against a database of plant proteins (PlantDB; see supplementary material for details) using the same parameters. If the E-value of the hit in ContaminantsDB was smaller than that of the hit in PlantDB, the read was considered a contaminant and removed from the pipeline. The remaining reads were screened for repetitive elements with RepeatMasker 3.2.9 (http://www.repeatmasker.org) against PlantRepeatsDB [32], and sequences masked over more than 90% of their length were discarded.
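The contaminant decision rule and the repeat filter reduce to two comparisons. In this Python sketch, best-hit E-values are passed in directly (None meaning no hit), which is an assumption about how the BLAST results are summarized; masked positions are assumed to be written as N, RepeatMasker's default masking character.

```python
def is_contaminant(contaminant_evalue, plant_evalue):
    """A read hitting ContaminantsDB is a contaminant if that hit's E-value is
    smaller than its best PlantDB E-value (None means no hit in that database)."""
    if contaminant_evalue is None:
        return False
    if plant_evalue is None:
        return True
    return contaminant_evalue < plant_evalue


def masked_fraction(masked_seq, mask_char="N"):
    """Fraction of positions replaced by the mask character."""
    if not masked_seq:
        return 0.0
    return masked_seq.count(mask_char) / len(masked_seq)


def passes_repeat_filter(masked_seq, max_masked=0.90):
    """Keep sequences masked over at most 90% of their length."""
    return masked_fraction(masked_seq) <= max_masked
```

Comparing the two E-values, rather than discarding every read with a contaminant hit, keeps genuine plant reads that also show weak similarity to a contaminant sequence.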

The final step of the pre-processing stage classified all trimmed reads as potential mitochondrial, chloroplast or nuclear sequences. First, a BLASTn search (-e 0.001) was performed against a database of coding-region sequences from complete plant mitochondrial genomes (Arabidopsis thaliana, Medicago truncatula and Populus trichocarpa). Sequences with a hit were considered potential mitochondrial sequences and were kept in a FASTA file reserved for these organelle sequences. The same process was then applied against a database of coding-region sequences from complete plastid genomes of the same organisms.
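The sequential classification can be summarized as follows: the mitochondrial search takes precedence, the plastid search applies to what remains, and reads matching neither are treated as nuclear. A minimal Python sketch, assuming the two BLASTn searches have already been reduced to sets of read identifiers with a hit at e < 0.001 (hypothetical inputs):

```python
def classify_compartment(read_id, mito_hits, plastid_hits):
    """Mitochondrial hits are set aside first, then plastid hits; the rest is nuclear."""
    if read_id in mito_hits:
        return "mitochondrial"
    if read_id in plastid_hits:
        return "plastid"
    return "nuclear"


def partition_reads(read_ids, mito_hits, plastid_hits):
    """Split read identifiers into the three compartment bins, preserving order."""
    bins = {"mitochondrial": [], "plastid": [], "nuclear": []}
    for rid in read_ids:
        bins[classify_compartment(rid, mito_hits, plastid_hits)].append(rid)
    return bins


bins = partition_reads(["r1", "r2", "r3", "r4"], {"r1", "r2"}, {"r2", "r3"})
```

Because the searches are sequential, a read such as r2 that matches both databases ends up in the mitochondrial bin only.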

Assembly

We chose MIRA 3.2.0 [33] to assemble the resulting sequences, as it has been shown to achieve higher coverage than other assemblers [34]. For each library, we obtained contigs and singletons with the following parameters: --job=denovo,est,accurate,454 -GE:not=20 -SK:not=20 454_SETTINGS -LR:mxti=no -CL:qc=no:cpat=no:mbc=yes -AL:egp=no:mrs=85 -OUT:sssip=yes -AS:mrpc=1. All contigs and singlets from the per-library assemblies were then clustered with CD-HIT [35] to remove redundancy, and the resulting non-redundant sequence collection was re-assembled with the same parameters. The resulting sequences were considered unigenes, and each was given a unigene accession number at this point. Libraries L20 and L21 were not used in the analysis presented in this manuscript, but are available in the full assembly on CorkOakDB.

Protein prediction

To translate nucleotide sequences into protein sequences, the pipeline first performs a BLAST search (blastx) against an RNA database [36] to remove non-protein-coding unigenes. It then queries all Viridiplantae protein sequences in the UniProt database [37]. Prot4EST [38] takes the outputs of these BLAST searches and translates the sequences into putative peptides. Unigenes without significant hits are translated with ESTscan [39], and for any sequences still untranslated, the longest ORF across the six reading frames is selected.
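The final fallback, choosing the longest ORF across the six reading frames, can be sketched in Python as follows. Here an ORF is taken to be any stop-free run of complete codons, without requiring an initial ATG, because EST fragments are often truncated; whether the actual pipeline requires a start codon is not stated, so this definition is an assumption. The function returns the strand, frame and nucleotide ORF; translation to a peptide is omitted.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Longest stop-free run of complete codons over the six reading frames.

    Returns (strand, frame, orf_nucleotides); ties keep the first run found.
    """
    seq = seq.upper()
    best = ("+", 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            # Walk complete codons; a stop codon closes the current run.
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if i - run_start > len(best[2]):
                        best = (strand, frame, s[run_start:i])
                    run_start = i + 3
            # Close the final run, keeping only complete codons.
            end = run_start + (len(s) - run_start) // 3 * 3
            if end - run_start > len(best[2]):
                best = (strand, frame, s[run_start:end])
    return best


result = longest_orf("ATGAAATAGCCC")
```

In this toy call the reverse strand wins because its frame 0 contains no stop codon at all, even though the forward strand starts with ATG, which illustrates why all six frames are scanned.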

Sequence naming

To assign names to the genes/proteins found, the putative peptides were used as BLASTp queries (E-value cut-off e < 10^-5) against a database of UniProt sequences from A. thaliana and P. trichocarpa. A putative peptide with no hit is labelled “Predicted hypothetical protein”. When a hit is found, its protein name is assigned to the Q. suber putative peptide, together with a label that describes the confidence level of the annotation (see Table 4).
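The naming rule can be sketched as a best-hit lookup over BLASTp tabular output. The sketch assumes that the best hit is the one with the lowest E-value (ties broken by the higher bit score) and that a separate mapping from UniProt accessions to protein names is available as input; the confidence labels of Table 4 are not reproduced. All identifiers in the usage example are hypothetical.

```python
def name_peptides(peptide_ids, blast_lines, protein_names, max_evalue=1e-5):
    """Name each putative peptide after its best BLASTp hit with E < max_evalue.

    blast_lines: 12-column tabular lines (E-value in column 11, bit score in 12).
    protein_names: mapping from subject accession to protein name (assumed input).
    """
    best = {}  # query -> (evalue, bitscore, subject)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # does not pass the e < 10^-5 cut-off
        current = best.get(query)
        # Prefer the lowest E-value; break ties with the higher bit score.
        if current is None or (evalue, -bits) < (current[0], -current[1]):
            best[query] = (evalue, bits, subject)
    names = {}
    for pid in peptide_ids:
        if pid in best:
            subject = best[pid][2]
            names[pid] = protein_names.get(subject, subject)
        else:
            names[pid] = "Predicted hypothetical protein"
    return names
```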

Functional annotation

To identify domains and functional sites in the putative peptides, an InterPro search was executed with InterProScan [40]. The InterPro database [41] integrates different classification methods based on amino-acid patterns and profiles, protein family fingerprints, protein sequences and structural domains, together with functional information. InterPro release 28.0 was downloaded and the searches were run locally. A BLASTp search against the non-redundant (nr) protein database was then executed and its results were imported into Blast2GO [42]. We used B2G4Pipe, the pipeline version of Blast2GO, to obtain GO terms and EC numbers. The same pipeline was used to assign InterPro domains for the transcriptomes analysed in Figure 5.
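Blast2GO's annotation step weighs hit similarity, evidence codes and the GO hierarchy in a rule that is not reproduced here. As a much simpler illustration of the underlying idea, transferring GO terms from similar sequences, the sketch below keeps a term only when it is carried by at least two hits that pass an E-value threshold. Both thresholds are hypothetical, not values used by the authors.

```python
from collections import Counter


def transfer_go_terms(hits, go_by_subject, max_evalue=1e-5, min_support=2):
    """Simplified GO transfer (NOT Blast2GO's annotation rule).

    hits: list of (subject_id, evalue) pairs for one query peptide.
    go_by_subject: mapping from subject ID to the GO terms annotated to it.
    Returns the sorted GO terms carried by at least min_support passing hits.
    """
    support = Counter()
    for subject, evalue in hits:
        if evalue <= max_evalue:
            # set() so a subject listing the same term twice counts once
            support.update(set(go_by_subject.get(subject, ())))
    return sorted(term for term, n in support.items() if n >= min_support)
```

Requiring support from more than one hit is a crude guard against propagating an annotation from a single, possibly mis-annotated, database entry.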

Database implementation

A MySQL relational database was deployed using the InnoDB storage engine, which allows transactions to be rolled back in case of failure. This was essential because the data were loaded progressively. Every EST sequence was stored in the database, together with metadata on the EST libraries, and as each step of the pipeline was run, its results, up to the functional annotation of the assembled unigenes, were added to the corresponding tables. Some intermediate outputs, such as large FASTA and XML files, were kept on the file system. The web interface is a Python application built on Django (an open-source web framework), HTML/CSS and JavaScript. KEGG data are displayed using the KEGG SOAP API.
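The benefit of a transactional engine for progressive loading can be shown with a small sketch. To stay self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB; the table name and rows are made up for illustration. Each pipeline step is loaded inside one transaction, so a failure part-way through leaves no partial results behind.

```python
import sqlite3


def load_step(conn, rows):
    """Insert one pipeline step's results atomically.

    Returns True if the whole batch was committed, False if it was rolled back.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany(
                "INSERT INTO annotation (unigene, go_term) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (unigene TEXT NOT NULL, go_term TEXT NOT NULL, "
    "PRIMARY KEY (unigene, go_term))"
)

ok = load_step(conn, [("QS_000001", "GO:0005524"), ("QS_000002", "GO:0003677")])
# The second row of this batch duplicates an existing key, so the whole batch,
# including its valid first row, is rolled back.
failed = load_step(conn, [("QS_000003", "GO:0016301"), ("QS_000001", "GO:0005524")])
rows_stored = conn.execute("SELECT COUNT(*) FROM annotation").fetchone()[0]
print(ok, failed, rows_stored)  # True False 2
```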

Accession numbers and unigene naming

Accession numbers in CorkOakDB have the format QS_000000 for unigenes and QS_P_000000 for putative peptides. Putative mitochondrial and putative chloroplast sequences start with QSm and QSc, respectively.
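A parser for these identifiers is sketched below. The six-digit width is taken from the QS_000000 template; the exact form of organelle identifiers (assumed here to be QSm_000000, QSc_P_000000 and so on) and the labelling of unprefixed sequences as nuclear are assumptions.

```python
import re

# Assumed pattern: QS + optional organelle letter, optional "P_" for peptides,
# then a six-digit serial number.
ACCESSION = re.compile(r"^QS(?P<organelle>[mc]?)_(?P<peptide>P_)?(?P<number>\d{6})$")
COMPARTMENT = {"": "nuclear", "m": "mitochondrial", "c": "chloroplast"}


def parse_accession(acc):
    """Return compartment, kind and serial number, or None if acc does not match."""
    m = ACCESSION.match(acc)
    if m is None:
        return None
    return {
        "compartment": COMPARTMENT[m.group("organelle")],
        "kind": "peptide" if m.group("peptide") else "unigene",
        "number": int(m.group("number")),
    }
```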

Evolutionary analysis

Comparisons to other organisms were made using predicted proteomes obtained from the SUPERFAMILY database [43], release 1.75. We used BLASTp for the comparisons, always filtering low-complexity regions and using the cut-offs indicated in the text. We used the standard NCBI taxonomy tree as a reference for Figure 6. Red oak libraries were obtained from the Fagaceae Genomics Web (http://www.fagaceae.org/node/87455) and processed with our own pipeline, yielding 38,346 predicted unigenes. We then used BLASTp with an E-value cut-off of 0.01 to determine how many cork oak unigenes were similar to at least one red oak unigene.
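The red oak comparison reduces to counting distinct cork oak queries that have at least one hit at or below the cut-off. A minimal sketch, assuming BLASTp output in 12-column tabular format and treating the cut-off as inclusive; the identifiers in the usage example are hypothetical.

```python
def fraction_with_hit(blast_lines, total_queries, max_evalue=0.01):
    """Count query sequences with >= 1 hit at E <= max_evalue.

    blast_lines: 12-column tabular lines, E-value in column 11.
    Returns (number of queries with a hit, fraction of total_queries).
    """
    with_hit = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            with_hit.add(fields[0])  # counted once, however many hits it has
    return len(with_hit), len(with_hit) / total_queries
```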

Availability of supporting data

All sequenced ESTs were submitted to the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession number ERP001762 and accession name “Cork Oak”.

Authors’ contributions

JBPL, ACC, AC, CF, MF, SG, MH, JML, JM, CMM, LMC, MMO, JAPP, OSP, MMV, CPPR- Fund raising, consortium planning and organization. JBPL, IAA, MHA, TA, HA, ABohn, ICarrasquinho, IChaves, ACC, MMRC, RC, AC, CF, SG, MH, TLN, JM, CMM, LMC, FN, MMO, MSP, JAPP, OSP, NJMS, MS, FS, RTavares, RTeixeira, CV, MMV, CPPR- Project organization and writing. IAA, CSA, TA, MIA, SA, HA, DB, TC, ICarrasquinho, IChaves, ACC, MMRC, RC, ASF, MJG, SG, JG, MH, JML, TLN, LM, DM, AM, CMM, FN, MMO, RO, JAPP, OSP, JAPR, JCRamalho, AIRibeiro, TR, AIRodrigues, JCRodrigues, NJMS, TES, MS, FS, RSS, RTavares, CPPR- Preparation of the plant material and assays. CSA, TA, MIA, SA, HA, DB, TC, IChaves, ACC, MMRC, RC, ASF, SG, MH, VI, TLN, DM, AM, FN, JAPP, JCRamalho, AIRibeiro, MR, TES, PSP, MS, FS, RSS, RTavares- RNA preparation. CE, CF, MP- Transcriptome sequencing and analyses. JBPL, PA, ABadia, ABohn, IN, MP, AMS- Bioinformatics. JBPL, IAA, PA, HA, DB, ABohn, ICarrasquinho, IChaves, ACC, MMRC, RC, AC, CE, CF, MF, ASF, SG, MH, JML, TLN, LM, JM, AM, CMM, LMC, FN, MMO, JAPP, OSP, MP, JCRamalho, AIRibeiro, NJMS, AMS, MS, FS, RTavares, RTeixeira, CV, CPPR- Paper writing and discussion. All authors read and approved the final manuscript.

References

de Gestão Florestal DN: Inventário Florestal Nacional- Portugal Continental. IFN 2005–2006. 2010, Autoridade Florestal Nacional: Lisbon
Brasier MD, Robredo F, Ferraz J: Evidence for Phytophthora cinnamomi involvement in Iberian oak decline. Plant Pathol. 1993, 42: 140-145. 10.1111/j.1365-3059.1993.tb01482.x.
Sanchez ME, Caetano P, Ferraz J, Trapero A: Phytophthora disease of Quercus ilex in south-western Spain. Forest Pathol. 2002, 32: 5-18. 10.1046/j.1439-0329.2002.00261.x.
Moreira AC, Martins J: Influence of site factors on the impact of Phytophthora cinnamomi in cork oak stands in Portugal. Forest Pathol. 2005, 35: 145-162. 10.1111/j.1439-0329.2005.00397.x.
de Sousa E, Santos M, Varela MC, Henriques J: Perda de vigor dos montados de sobro e azinho: Análise da situação e perspectivas. 2007
Bergot M, Cloppet E, Pérarnaud V: Simulation of potential range expansion of oak disease caused by Phytophthora cinnamomi under climate change. Glob Change Biol. 2004, 10: 1539-1552. 10.1111/j.1365-2486.2004.00824.x.
Pereira JS, Kurz-Besson C: Coping with drought. Cork Oak Woodlands on the Edge – Ecology, Adaptive Management and Restoration. 2009, Washington: Island Press, 73-80. 1
Marum L, Miguel A, Ricardo CP, Miguel C: Reference gene selection for quantitative real-time PCR normalization in Quercus suber. PLoS ONE. 2012, 7: e35113-10.1371/journal.pone.0035113.
Coelho AC, Lima MB, Neves D, Cravador A: Genetic diversity of two evergreen oaks (Quercus suber L. and Q (ilex) rotundifolia Lam.) in Portugal using AFLP markers. Silvae Genetica. 2006, 55: 105-118.
Chaves I, Passarinho JAP, Capitão C, Chaves MM, Fevereiro P, Ricardo CPP: Temperature stress effects in Quercus suber leaf metabolism. J Plant Physiol. 2011, 168: 1729-1734. 10.1016/j.jplph.2011.05.013.
Graça J, Santos S: Suberin: a biopolyester of plants’ skin. Macromol Biosci. 2007, 7: 128-135. 10.1002/mabi.200600218.
Soler M, Serra O, Molinas M, Huguet G, Fluch S, Figueras M: A genomic approach to suberin biosynthesis and cork differentiation. Plant Physiol. 2007, 144: 419-431. 10.1104/pp.106.094227.
Vaz M, Pereira JS, Gazarini LC, David TS, David JS, Rodrigues A, Maroco J, Chaves MM: Drought-induced photosynthetic inhibition and autumn recovery in two Mediterranean oak species (Quercus ilex and Quercus suber). Tree Physiol. 2010, 30: 946-956. 10.1093/treephys/tpq044.
Almeida T, Menéndez E, Capote T, Ribeiro T, Santos C, Gonçalves S: Molecular characterization of Quercus suber MYB1, a transcription factor up-regulated in cork tissues. J Plant Physiol. 2013, 170: 172-178. 10.1016/j.jplph.2012.08.023.
Ueno S, Le Provost G, Léger V, Klopp C, Noirot C, Frigerio J-M, Salin F, Salse J, Abrouk M, Murat F, Brendel O, Derory J, Abadie P, Léger P, Cabane C, Barré A, de Daruvar A, Couloux A, Wincker P, Reviron M-P, Kremer A, Plomion C: Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics. 2010, 11: 650-10.1186/1471-2164-11-650.
Tarkka MT, Herrmann S, Wubet T, Feldhahn L, Recht S, Kurth F, Mailänder S, Bönn M, Neef M, Angay O, Bacht M, Graf M, Maboreke H, Fleischmann F, Grams TEE, Ruess L, Schädler M, Brandl R, Scheu S, Schrey SD, Grosse I, Buscot F: OakContigDF159.1, a reference library for studying differential gene expression in Quercus robur during controlled biotic interactions: use for quantitative transcriptomic profiling of oak roots in ectomycorrhizal symbiosis. New Phytol. 2013, 199: 529-540. 10.1111/nph.12317.
Kremer A, Abbott AG, Carlson JE, Manos PS, Plomion C, Sisco P, Staton ME, Ueno S, Vendramin GG: Genomics of Fagaceae. Tree Genetics & Genomes. 2012, 8: 583-610. 10.1007/s11295-012-0498-3.
Rampant PF, Lesur I, Boussardon C, Bitton F, Martin-Magniette M-L, Bodénès C, Le Provost G, Bergès H, Fluch S, Kremer A, Plomion C: Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome. BMC Genomics. 2011, 12: 292-10.1186/1471-2164-12-292.
Durand J, Bodénès C, Chancerel E, Frigerio J-M, Vendramin G, Sebastiani F, Buonamici A, Gailing O, Koelewijn H-P, Villani F, Mattioni C, Cherubini M, Goicoechea PG, Herrán A, Ikaran Z, Cabane C, Ueno S, Alberto F, Dumoulin P-Y, Guichoux E, de Daruvar A, Kremer A, Plomion C: A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study. BMC Genomics. 2010, 11: 570-10.1186/1471-2164-11-570.
Zhulidov PA, Bogdanova EA, Shcheglov AS, Shagina IA, Wagner LL, Khazpekov GL, Kozhemyako VV, Lukyanov SA, Shagin DA: A method for the preparation of normalized cDNA libraries enriched with full-length sequences. Russ J Bioorg Chem. 2005, 31: 170-177. 10.1007/s11171-005-0023-7.
Timme RE, Delwiche CF: Uncovering the evolutionary origin of plant molecular processes: comparison of Coleochaete (Coleochaetales) and Spirogyra (Zygnematales) transcriptomes. BMC Plant Biol. 2010, 10: 96-10.1186/1471-2229-10-96.
Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11: 1-16. 10.1186/1471-2164-11-1.
Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, et al: The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
Zhi-Liang H, Bao J: CateGOrizer: a web-based program to batch analyze Gene Ontology Classification Categories. Online J Bioinformatics. 2008, 9 (2): 108-112.
Barakat A, DiLoreto DS, Zhang Y, Smith C, Baier K, Powell W, Wheeler N, Sederoff R, Carlson JE: Comparison of transcriptome from cankers and healthy stems in American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima). BMC Plant Biol. 2009, 9: 51-62. 10.1186/1471-2229-9-51.
Altenhoff AM, Dessimoz C: Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comp Biol. 2009, 5: e1000262-10.1371/journal.pcbi.1000262.
Varela MC: Handbook of the EU Concerted Action on cork oak: FAIR 1 CT 95-0202; European network for the evaluation of genetic resources of cork oak for appropriate use in breeding and gene conservation strategies. 2003, Lisboa (Portugal): INIA
Shcheglov AS, Zhulidov PA, Bogdanova EA, Shagin DA: Nucleic Acids Hybridization Modern Applications. 2007, Dordrecht: Springer Netherlands, 97-124.
Niu B, Fu L, Sun S, Li W: Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics. 2010, 11: 187-10.1186/1471-2105-11-187.
Falgueras J, Lara AJ, Fernandez-Pozo N, Canton FR, Perez-Trabado G, Claros MG: SeqTrim: a high-throughput pipeline for preprocessing any type of sequence reads. BMC Bioinformatics. 2010, 11: 38-10.1186/1471-2105-11-38.
Ouyang S: The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants. Nucleic Acids Res. 2004, 32: 360D-363D. 10.1093/nar/gkh099.
Chevreux B, Pfisterer T, Wetter T: Assembly of Genomic Sequences Assisted by Automatic Finishing. German Conf Bioinformatics. 1999, 183-184.
Papanicolaou A, Stierli R, ffrench-Constant RH, Heckel DG: Next generation transcriptomes for next generation genomes using est2assembly. BMC Bioinformatics. 2009, 10: 447-10.1186/1471-2105-10-447.
Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM: The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009, 37 (suppl 1): D141-D145.
Apweiler R: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32: 115D-119D. 10.1093/nar/gkh131.
Wasmuth JD, Blaxter ML: prot4EST: translating expressed sequence tags from neglected genomes. BMC Bioinformatics. 2004, 5: 187-10.1186/1471-2105-5-187.
Iseli C, Jongeneel C, Bucher P: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Bio. 1999, 138-148.
Zdobnov EM, Apweiler R: InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, et al: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37 (Database issue): D211-D215.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.
Gough J, Chothia C: SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 2002, 30: 268-272. 10.1093/nar/30.1.268.

Acknowledgments

This project was funded by “Fundação para a Ciência e a Tecnologia” (FCT) within a National Consortium (COEC – Cork Oak ESTs Consortium) that supported 12 sub-projects (SOBREIRO/033, 035, 014, 034, 015, 017, 038, 019, 029, 039, 030, 036/2009). The authors further wish to acknowledge FCT for ten doctoral (BD) and post-doctoral (BPD) fellowships (Tânia Almeida: SFRH/BD/44410/2008, Tiago Capote: SFRH/BD/69785/2010, Inês Chaves: SFRH/BPD/20833/2004, Ana S. Fortunato: SFRH/BPD/47563/2008, Marília Horta: SFRH/BPD/63213/2009, Liliana Marum: SFRH/BPD/47679/2008, Andreia Miguel: SFRH/BD/44474/2008, Margarida Rocheta: SFRH/BPD/64905/2009, Tatiana E. Santo: SFRH/BD/47450/2008, Mónica Sebastiana: SFRH/BPD/25661/2005). Andreas Bohn, Nelson J.M. Saibo and Rita Teixeira were supported by Programa Ciência 2007, financed by POPH (QREN), and Isabel A. Abreu, Susana Araújo, Dora Batista, A. Margarida Fortes, Jorge A.P. Paiva and Sónia Gonçalves by Programa Ciência 2008, also funded by POPH (QREN). A. Margarida Santos was funded through iBET (PEst-OE/EQB/LA0004/2011). Maintenance of CorkOakDB is supported by the Instituto Gulbenkian de Ciência.

Author information

Herlânder Azevedo
Present address: CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, 4485-661, Portugal

Authors and Affiliations

Instituto Gulbenkian de Ciência, Rua da Quinta Grande 6, Oeiras, 2780-156, Portugal
José B Pereira-Leal, Paulo Almeida, Isabel Neves & Ana Margarida Santos
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Genomics of Plant Stress Lab, Av. da República, Oeiras, 2780-157, Portugal
Isabel A Abreu, Maria Margarida Oliveira, Nelson JM Saibo & Ana Margarida Santos
Instituto de Biologia Experimental e Tecnológica, Genomics of Plant Stress Lab, Apartado 12, Oeiras, 2781-901, Portugal
Isabel A Abreu, Maria Margarida Oliveira, Nelson JM Saibo & Ana Margarida Santos
Laboratory of Genomics and Genetic Improvement, BioFIG, FCT, Universidade do Algarve, E.8, Campus de Gambelas, Faro, 8300, Portugal
Cláudia S Alabaça, José M Leitão & Tatiana E Santo
Centro Estudos Florestais (CEF), Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa, 1349-017, Portugal
Maria Helena Almeida, Ana Isabel Rodrigues & Rita Teixeira
Centro de Biotecnologia Agrícola e Agro-Alimentar do Alentejo (CEBAL)/ Instituto Politécnico de Beja (IPBeja), Beja, 7801-908, Portugal
Tânia Almeida, Tiago Capote, Sónia Gonçalves & Teresa Ribeiro
Centre for Research in Ceramics & Composite Materials (CICECO), Universidade de Aveiro, Campus Universitário de Santiago, Aveiro, 3810-193, Portugal
Tânia Almeida, Tiago Capote, Sónia Gonçalves & Teresa Ribeiro
Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, s/n, FC4, Porto, 4169-007, Portugal
Maria Isabel Amorim
Instituto de Biologia Experimental e Tecnológica, Plant Cell Biotechnology Lab, Apartado 12, Oeiras, 2781-901, Portugal
Susana Araújo & Jorge A Paiva
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Plant Cell Biotechnology Lab, Av. da República, Oeiras, 2780-157, Portugal
Susana Araújo & Jorge A Paiva
Instituto de Investigação Científica Tropical (IICT), BIOTROP/Veterinária e Zootecnia, R. da Junqueira, 86 - 1, Lisboa, 1300-344, Portugal
Susana Araújo
Centre for Biodiversity, Functional & Integrative Genomics (BioFIG), Plant Functional Biology Centre, Universidade do Minho, Campus de Gualtar, Braga, 4710-057, Portugal
Herlânder Azevedo, Maria Manuela Ribeiro Costa, Teresa Lino-Neto, Rute Oliveira, João AP Raimundo, Rómulo S Sobral & Rui Tavares
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Systems Biodynamics Lab, Av. da República, Oeiras, 2780-157, Portugal
Aleix Badia & Andreas Bohn
Instituto de Biologia Experimental e Tecnológica, Systems Biodynamics Lab, Apartado 12, Oeiras, 2781-901, Portugal
Aleix Badia & Andreas Bohn
Centro de Investigação das Ferrugens do Cafeeiro/BioTrop, Instituto de Investigação Científica Tropical, Quinta do Marquês, Oeiras, 2784-505, Portugal
Dora Batista
INIAV- Instituto Nacional de Investigação Agrária e Veterinária, IP, Quinta do Marquês, Oeiras, 2780-159, Portugal
Isabel Carrasquinho, Rita Costa, José Matos, Diogo Mendonça, Filomena Nóbrega, Paula Sá-Pereira, Fernanda Simões, Carolina Varela & Maria Manuela Veloso
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Plant Biochemistry Lab, Av. da República, Oeiras, 2780-157, Portugal
Inês Chaves & Cândido PP Ricardo
Instituto de Biologia Experimental e Tecnológica, Plant Biochemistry Lab, Apartado 12, Oeiras, 2781-901, Portugal
Inês Chaves & Cândido PP Ricardo
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Forest Biotech Lab, Av. da República, Oeiras, 2780-157, Portugal
Inês Chaves, Liliana Marum, Andreia Miguel & Célia M Miguel
Instituto de Biologia Experimental e Tecnológica, Forest Biotech Lab, Apartado 12, Oeiras, 2781-901, Portugal
Inês Chaves, Liliana Marum, Andreia Miguel & Célia M Miguel
Centro de Electrónica, Optoelectrónica e Telecomunicações (CEOT), Universidade do Algarve, Campus de Gambelas, Faro, 8005-139, Portugal
Ana Cristina Coelho
Institute for Biotechnology and Bioengineering - Centre of Genomics and Biotechnology (IBB-CGB), Plant and Animal Genomic Group, Universidade do Algarve - Campus de Gambelas, Faro, 8005-139, Portugal
Alfredo Cravador & Marília Horta
Biocant, Parque Tecnológico de Cantanhede, Cantanhede, 3060 - 197, Portugal
Conceição Egas, Carlos Faro & Miguel Pinheiro
Centre for Biodiversity, Functional & Integrative Genomics (BioFIG), Faculdade de Ciências da Universidade de Lisboa, Lisboa, 1749-016, Portugal
Ana M Fortes
Unidade de Ecofisiologia, Bioquímica e Biotecnologia Vegetal/BioTrop, Instituto de Investigação Científica Tropical, Quinta do Marquês, Av. da República, Oeiras, 2784-505, Portugal
Ana S Fortunato, José C Ramalho & Ana I Ribeiro
Departamento de Genética e Biotecnologia, Universidade de Trás-os-Montes e Alto Douro, Vila Real, 5001-801, Portugal
Maria João Gaspar
CEF, ISA Technical University Lisbon, Tapada da Ajuda, Lisboa, 1349-017, Portugal
Maria João Gaspar & José Graça
Centro de Botânica Aplicada à Agricultura (CBAA), Instituto Superior de Agronomia, Universidade Técnica de Lisboa, Tapada da Ajuda, Lisboa, 1349-017, Portugal
Vera Inácio, Leonor Morais-Cecílio, Teresa Ribeiro & Margarida Rocheta
Centre for Biodiversity, Functional & Integrative Genomics (BioFIG), Plant Systems Biology Lab, Faculdade de Ciências da Universidade de Lisboa, Lisboa, 1749-016, Portugal
Maria Salomé Pais & Mónica Sebastiana
Instituto de Investigação Científica Tropical (IICT), BIOTROP/Florestas e dos Produtos Florestais, Tapada da Ajuda, Lisboa, 1349-017, Portugal
Jorge A Paiva & José C Rodrigues
Centro de Biologia Ambiental, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa, 1749-016, Portugal
Octávio S Paulo

Authors

José B Pereira-Leal
Isabel A Abreu
Cláudia S Alabaça
Maria Helena Almeida
Paulo Almeida
Tânia Almeida
Maria Isabel Amorim
Susana Araújo
Herlânder Azevedo
Aleix Badia
Dora Batista
Andreas Bohn
Tiago Capote
Isabel Carrasquinho
Inês Chaves
Ana Cristina Coelho
Maria Manuela Ribeiro Costa
Rita Costa
Alfredo Cravador
Conceição Egas
Carlos Faro
Ana M Fortes
Ana S Fortunato
Maria João Gaspar
Sónia Gonçalves
José Graça
Marília Horta
Vera Inácio
José M Leitão
Teresa Lino-Neto
Liliana Marum
José Matos
Diogo Mendonça
Andreia Miguel
Célia M Miguel
Leonor Morais-Cecílio
Isabel Neves
Filomena Nóbrega
Maria Margarida Oliveira
Rute Oliveira
Maria Salomé Pais
Jorge A Paiva
Octávio S Paulo
Miguel Pinheiro
João AP Raimundo
José C Ramalho
Ana I Ribeiro
Teresa Ribeiro
Margarida Rocheta
Ana Isabel Rodrigues
José C Rodrigues
Nelson JM Saibo
Tatiana E Santo
Ana Margarida Santos
Paula Sá-Pereira
Mónica Sebastiana
Fernanda Simões
Rómulo S Sobral
Rui Tavares
Rita Teixeira
Carolina Varela
Maria Manuela Veloso
Cândido PP Ricardo

Corresponding author

Correspondence to José B Pereira-Leal.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.


About this article

Cite this article

Pereira-Leal, J.B., Abreu, I.A., Alabaça, C.S. et al. A comprehensive assessment of the transcriptome of cork oak (Quercus suber) through EST sequencing. BMC Genomics 15, 371 (2014). https://doi.org/10.1186/1471-2164-15-371


Received: 14 March 2013
Accepted: 15 April 2014
Published: 15 May 2014
DOI: https://doi.org/10.1186/1471-2164-15-371
