High-throughput transcriptome sequencing and preliminary functional analysis in four Neotropical tree species
© Brousseau et al.; licensee BioMed Central Ltd. 2014
Received: 12 February 2014
Accepted: 13 March 2014
Published: 27 March 2014
The Amazonian rainforest is predicted to suffer from ongoing environmental changes. Despite the need to evaluate the impact of such changes on tree genetic diversity, we almost entirely lack genomic resources.
In this study, we analysed the transcriptome of four tropical tree species (Carapa guianensis, Eperua falcata, Symphonia globulifera and Virola michelii) with contrasting ecological features, belonging to four widespread botanical families (respectively Meliaceae, Fabaceae, Clusiaceae and Myristicaceae). We sequenced cDNA libraries from three organs (leaves, stems, and roots) using 454 pyrosequencing. We have developed an R and bioperl-based bioinformatic procedure for de novo assembly, gene functional annotation and marker discovery. Mismatch identification takes into account single-base quality values as well as the likelihood of false variants as a function of contig depth and number of sequenced chromosomes. Between 17103 (for Symphonia globulifera) and 23390 (for Eperua falcata) contigs were assembled. Organs varied in the numbers of unigenes they apparently express, with higher number in roots. Patterns of gene expression were similar across species, with metabolism of aromatic compounds standing out as an overrepresented gene function. Transcripts corresponding to several gene functions were found to be over- or underrepresented in each organ. We identified between 4434 (for Symphonia globulifera) and 9076 (for Virola surinamensis) well-supported mismatches. The resulting overall mismatch density was comprised between 0.89 (S. globulifera) and 1.05 (V. surinamensis) mismatches/100 bp in variation-containing contigs.
The relative representation of gene functions in the four transcriptomes suggests that secondary metabolism may be particularly important in tropical trees. The differential representation of transcripts among tissues suggests differential gene expression, which opens the way to functional studies in these non-model, ecologically important species. We found substantial amounts of mismatches in the four species. These newly identified putative variants are a first step towards acquiring much needed genomic resources for tropical tree species.
The Amazonian rainforest hosts one of the greatest pools of terrestrial biodiversity, including very large tree species diversity [1–3]. In forest genetics, most efforts so far have focused on temperate and boreal tree species. While ongoing anthropogenic climate change is suspected to deeply affect the stability of Neotropical rainforests , tropical tree species genetic resources and adaptive potential are still poorly known , despite the availability of sequence data for several species [6–8]. Identification of polymorphisms and robust estimates of tropical tree species’ standing genetic diversity are thus needed to evaluate the vulnerability to environmental changes of populations and their ability to endure them [9, 10].
A thorough assessment of tropical tree species’ genetic diversity requires large amounts of genomic data and informative molecular markers [11, 12]. Single-nucleotide polymorphisms (SNPs) have become the most popular genome-wide genetic markers [13, 14] and are increasingly used to characterize potentially adaptive genetic variation (e.g. [15–17]).
High-throughput sequencing and genotyping methods have paved the way to genomic studies in non-model species [14, 18, 19], by permitting cost-effective sequencing and the generation of very large genetic data collections. Thus, NGS provides a valuable tool to describe genome properties and variation in non-model species [14, 20]. While assembling whole genomes without a reference sequence can be very complex and in the best cases incomplete, transcriptome sequencing constitutes an efficient alternative in information-poor organisms . Transcriptomes also include a large number of loci with known or predictable functions [22, 23] and have been applied to comparative genomics , marker discovery , and population genomic studies .
An array of next-generation sequencing strategies, varying in read length range and absolute throughput  can be used to sequence transcriptomes. The Roche 454-pyrosequencing technology, in spite of being the oldest among these, is the one producing on average the longest reads [23, 28, 29], which makes de novo assembly easier in non-model species without prior genomic resources [25, 30, 31] and allows preliminary screening of DNA variation  and transcriptome analysis (gene expression profiling by mRNA identification and quantification; ).
In this study we describe the transcriptomes of four widespread Neotropical tree genera chosen to represent different botanical families, ecological properties and patterns of local and range distribution (see Methods).
The objectives of the present study are (i) to describe the transcriptomes of these four tropical genera, (ii) to compare expression profiles among species and organs (leaves, stems and roots), and (iii) to provide an initial catalogue of well-supported mismatches, as candidates for validation as SNPs.
Study species and sampling
Species description: distribution range, ecological properties relative to light (successional status) and soil, spatial population structure and seed dispersal properties
Ecology - light
Spatial population structure
Gravity, rodents 
Guiana shield 
Shade tolerant 
Mostly seasonally flooded 
Neotropics, paleotropics 
Shade tolerant 
Seasonally flooded 
Gravity, vertebrates 
Seasonally flooded 
cDNA library preparation and sequencing
Total RNA from each fresh sample was extracted using a CTAB protocol as described by Le Provost et al.,  (with minor modifications for a subset of the samples). mRNAs were converted to double stranded cDNA using either SMARTer PCR cDNA Synthesis Kit (Clontech) or Mint cDNA synthesis kit (Evrogen) according to the manufacturer’s instructions.
For each species, cDNA libraries from the different organs (leaves, stems and roots) were identified by a specific molecular identifier (MID) tag. Samples from the same organ of different conspecific individuals were pooled for sequencing (MID1 = leaves, MID2 = stems, MID3 = roots). Libraries of the different species were sequenced separately (one run per species) according to a standard Roche-454 protocol . The raw data were submitted to the European Nucleotide Archive (ENA) database (study number: PRJEB3286; http://www.ebi.ac.uk/ena/) and given the accession numbers ERS177107 through ERS177110.
Assembly and functional annotation
Clipped-ends reads were de novo assembled into contigs using MIRA v.3.4.0. The software is rather flexible, has a large range of parameter choices  and it has been used efficiently for transcriptome assemblies . We applied the “accurate” mode (with ‘job’ arguments: ‘de novo, est, accurate’) to limit the assembly of paralogous genes. Singletons (i.e. unassembled reads) were discarded for all subsequent steps.
Because different numbers of reads were obtained from different organs, comparisons in the number of contigs (unigenes) among organs may suffer from ascertainment bias, with libraries containing fewer reads displaying fewer contigs due to more limited sampling. To test for this effect we have applied the RaBoT method , which compares observed values of a given statistic (here, number of contigs) in a smaller sample (the ‘empirical’ value) with the value obtained from repeated sub-samples of the same size, drawn from a larger sample (the ‘bootstrapped’ values). The statistic in the larger sample is thus evaluated in the same conditions as in the smaller one, which allows an unbiased comparison and their difference to be tested statistically. RaBoT was applied with N = 100 sub-samples. Because the sub-samples were not independent, only the non-parametric test and P-value (i.e. the fraction of the distribution of ‘bootstrapped’ values that is above the ‘empirical’ value) are reported.
Assembled contig consensus sequences were submitted to Blast2Go (B2G) analysis (http://www.blast2go.de/b2ghome), which permits large-scale blasting, mapping and annotation of novel sequence data particularly in non-model species . BlastX search was performed on species assemblies against the NCBI non-redundant protein database (with BlastX minimum e-value of 10−3, Number of Blast Hits = 20). We realized a semi-automated search for contaminants by verifying the organism identity of each blast hit as follows: NCBI Taxonomy CommonTree Browser (http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi) was searched with a non-redundant list of species extracted from B2G. Contigs for which at least one of the ten hits with the lowest e-values (< 10−25) identified a sequence from a genus belonging to the "green plant" node of the generated tree were further considered as non-contaminants; contigs with no hits to any “green plant” genus were treated as contaminants and excluded. Contigs were then assigned to the minimum e-value informative functional annotations from plant species hits, provided that their e-value was smaller than 10−25.
The contigs associated to each level 4 GO term were identified, and the number R ci of reads obtained for each contig i from each organ was recorded. The following steps were executed separately for each organ;
the observed average number of reads across all contigs associated to a given Biological Process was computed; this statistic was considered as an estimator of the average expression level of all genes involved in that Biological Process (contigs with zero counts were excluded);
then, the values R ci of reads per contig (within each organ) were permuted over all contigs 1000 times. At each permutation, the average read count of all contigs associated to a given biological process was recorded again, and the difference between empirical (observed) and was recorded.
the distribution of thses differences indicates how close to to average is the expression of genes belonging to a given GO term; i.e., for a Biological Process whose genes exhibit an average level of expression, the distribution of mean differences obtained from permutations overlaps zero; biological processes whose genes have expression levels above average have a distribution of permuted differences above zero, and vice versa for biological processes with genes showing less than average expression levels.
if, for a given biological process, the observed average read count per contig was larger than 95% of the average values obtained by permutation, then the group of genes associated to that biological process was considered as over-expressed, and consequently the biological process was considered functionally important for that organ.
Because a contig may be associated to different biological processes, steps (ii)-(v) above were performed for each biological process separately. Because all permutation tests were performed within organs, this analysis is not prone to biases in the number of reads per organ (see above and Results and Discussion). Comparisons among organs for variations in expression among processes were done qualitatively.
Assemblies were post-processed using both bioperl scripts from the SeqQual pipeline (Lang et al. in preparation) (Additional file 2), and home-made R scripts (Additional file 3) that followed various steps of filtering the data by integrating a number of quality criteria (and Additional file 4: Table S1 for a description of programs used). The different steps of the procedure used were as follows:
Splitting .ace assembly files and linking to quality
Assembled contig sequence files were extracted from the .ace files given by MIRA and linked to their original base quality scores contained in the .fasta.qual files.
Nucleotide differences were screened in assembled contigs and particular bases were masked according to several criterions:
being a singleton
being a variant with a frequency lower than 0.1 (see also 4.3 below)
having a quality score lower than 20 for polymorphic sites.
Following this ‘masking step’, a ‘cleaning step’ removed all positions (i.e. corresponding to one base) of the assembled contigs that contained only indels and masked bases. This last step is particularly relevant for 454 data where false insertions due to homopolymers were very common and drastically affect contig consensus, hampering further re-sequencing and SNP design for genotyping. Consensus (using IUPAC codes) were edited from cleaned assembled data and used both for estimating the total transcriptome length obtained and for identifying well supported mismatches.
Computing mismatch statistics and post-filtering
All mismatches contained in the cleaned assemblies were used to build a summary statistics table (number of occurrences and frequency of the different variants, depth, mean quality, minor allele frequency (maf)). This table was used to identify the highest-quality mismatches a posteriori (without affecting assembly and consensus). In particular, we chose to avoid:
mismatches with lower-than-expected frequencies based on the number of gametes sequenced. With two genotypes, four different gametes were sequenced with the probability of having a variant being 0.25 at minimum. The following rationale can be applied to any number of gametes 2N. The probability of observing a particular number of times (or fewer) the minority variant (with expected frequency in the sequence pool, p=1/2N) follows a binomial distribution. The probability of observing the variant exactly t times out of x reads is computed as and the probability of observing it t times or fewer is given by . All polymorphisms that were present in a configuration (e.g. 3 variants among 29 reads) with a cumulative probability P < 0.05 were considered as false positives and were discarded. This led to the exclusion of additional variants with frequencies between 0.1 and 0.15 but with probability below 5%.
mismatches having a depth lower than 8X, which can be considered as a stringent criteria, given the 20 quality score for each base, a minimum SNP frequency of 2/8= 0.25 here (since singletons have been previously excluded), and the fact that this configuration has a probability of 0.31 based on the binomial distribution rationale, which is well above the 5% threshold chosen before.
Following the filtering steps described above, mismatches were counted and their density per base was computed as the total number of putative variants (including those at contig ends that passed the quality and singleton filters) divided by the total number of bases where the depth was at least 8 reads. Numbers of transitions, transversions, and deletions were also reported.
Results and discussion
Partitioning of reads among different organs (leaves, stems, roots) in each species cDNA library ( C. guianensis , E. falcata , S. globulifera and V. surinamensis ) with percentages in parenthesis
Number of reads
From leaves [MID1]
63016 [43334 (28%)]
17421 [11417 (9%)]
49894 [32190 (30%)]
31526 [22077 (11%)]
From stems [MID2]
47100 [29720 (20%)]
28362 [18088 (14%)]
110373 [66874 (66%)]
41435 [28284 (14%)]
From roots [MID3]
132030 [77052 (50%)]
175551 [100909 (76%)]
7 [2 (0%)]
141948 [89918 (72%)]
5999 [3435 (2%)]
3260 [1799 (1%)]
6866 [4367 (4%)]
4314 [2691 (2%)]
Assembly results: number of assembled reads, number of contigs, total transcriptome coverage, average length per contig, and average number of reads per contig
Number of reads
Number of assembled reads
Number of contigs
Total length (bp)
Average length per contig (bp)
Average number of reads per contig
Proportion of contigs with 10 reads or fewer
BlastX statistics per species, performed on consensus sequences obtained from the MIRA assemblies
No of unigenes that did not return any blast result
No of blasted unigenes
[No unigenes after contaminant removal]
No of mapped unigenes
No of annotated unigenes
Total assembly length without contaminant (bp)
[Total length of blasted unigenes after removal of contaminant and unigenes with e-values >10−25]
In stems, we detected between eight (S. globulifera) and twenty-five (V. surinamensis) biological processes (Figure 4 middle column) that had significantly higher-than-average expression levels, fifteen of them being shared among different species. At least a subset of these processes (‘cellular biosynthetic process’, ‘cellular component movement’,‘organic substance biosynthetic process’, ‘organic substance catabolic process’,‘secondary metabolic process’) are potentially related to cell differentiation events that occur during wood formation.
In roots, between seven (C. guianensis) and twenty-six (E. falcata) biological processes appeared as particularly over-expressed, eleven being shared by different species. They reflect two main functions of roots: water and nutrient uptake (‘response to inorganic substance’, ‘response to ‘organic substance transmembrane transport’) and response to stresses caused by soil constraints, which fall in two classes: (a) soil water depletion (e.g. ‘response to osmotic stress’) which frequently occurs in tropical rainforests during the dry season; (b) oxidative stresses caused by soil hypoxia, to which the processes ‘reactive oxygen species metabolic process’, ‘response to oxidative stress’, and ‘response to oxygen containing compound’ are related; flooding-induced hypoxia is particularly frequent in water-logged bottomlands.
rRNA intron-encoded homing endonucleases were very abundant in the E. falcata assembly (581 unigenes against 43, 39 and 17 unigenes in C. guianensis, S. globulifera and V. surniamensis respectively). In E. falcata, these unigenes comprised between two and 920 reads with a mean of 15.3 (s.d.=69.77). Homing endonucleases from group I introns are self-splicing genetic elements or parasitic genes mostly found in organellar genomes [64–66]. Among contigs that showed BLASTX hits with rRNA-intron-encoded homing endonucleases in E. falcata, 69 were potentially polymorphic and contained from 1 to 18 mismatches with many haplotypes . High transcription levels of such elements, combined with the high numbers of mutations that they have accumulated, suggests a massive but ancient genome invasion event [67, 68] in the E. falcata genome compared to the other three species. The evolutionary implications of transfers of such elements remain poorly understood, because of their ‘super-Mendelian’ inheritance (such elements may be both vertically and horizontally transmitted ), and because they have no known function .
Total length with depth ≥ 8X after assembly cleaning (bases)
Before post-filtering based on binomial test
N variant-containing contigs
mismatch density (/100 bp)
N mismatches with 2 variants
N mismatches > 2 variants
After post-filtering based on binomial test
N variant-containing contigs
mismatch density (/100 bp)
N mismatches with 2 variants
N mismatches > 2 variants
Candidate transcriptome polymorphism and its usefulness in population genetics studies
Next-generation sequencing, allowing massive de novo acquisition of molecular data, provides a range of new potential applications for evolutionary and ecological-genetic studies in non-model species. High-throughput SNP data have indeed shown their potential for inferences about demographic and adaptive processes in natural populations [16, 73–79]; for examples in tree species, see [80, 81]. However, SNP design and validation has often frustratingly low success rates, because candidate variant identification is not stringent enough; in this paper, we have proposed a strategy to filter out false positives based on multiple criteria.
The genomic resources obtained here will trigger new exciting fields of research on tropical biodiversity. Providing a catalogue of putative functions for genomic regions with a high potential diversity will help identifying useful candidate genes for further resequencing or SNP genotyping [12, 82, 83]. These genes belong to a large range of biological processes, including growth, reproduction, light and nutrient acquisitions, as well as plant response to biotic and abiotic stresses. Focusing on genes potentially involved in adaptive processes in Neotropical forest tree species will permit to test hypotheses about evolutionary processes underlying genome evolution and the build-up of biological diversity in tropical forest ecosystems.
Availability of supporting data
The raw data were submitted to the ENA database (study number: PRJEB3286) and given the accession numbers ERS177107 through ERS177110.
We thank the GENOTOUL platform who performed the sequencing. We also thank Valérie Léger, who helped designing and setting up the laboratory protocols, and Myriam Heuertz for critically reading the manuscript. This project was supported by PO FEDER ENERGIRAVI, the MEDD-ECOFOR “Ecosystèmes tropicaux” program and an "Investissement d’Avenir" grant managed by Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-0025).
- Hoorn C, Wesselingh FP, ter Steege H, Bermudez MA, Mora A, Sevink J, Sanmartín I, Sanchez-Meseguer A, Anderson CL, Figueiredo JP, Jaramillo C, Riff D, Negri FR, Hooghiemstra H, Lundberg J, Stadler T, Särkinen T, Antonelli A: Amazonia through time: andean uplift, climate change, landscape evolution, and biodiversity. Science. 2010, 330 (6006): 927-931. 10.1126/science.1194585.PubMedView ArticleGoogle Scholar
- Hubbell SP, He F, Condit R, Borda-de-Água L, Kellner J, ter Steege H: How many tree species are there in the Amazon and how many of them will go extinct?. Proc Natl Acad Sci. 2008, 105 (Supplement 1): 11498-11504. 10.1073/pnas.0801915105.PubMed CentralPubMedView ArticleGoogle Scholar
- Hawkins BA, Rodríguez MÁ, Weller SG: Global angiosperm family richness revisited: linking ecology and evolution to climate. J Biogeogr. 2011, 38 (7): 1253-1266. 10.1111/j.1365-2699.2011.02490.x.View ArticleGoogle Scholar
- Phillips OL, Aragao LEOC, Lewis SL, Fisher JB, Lloyd J, Lopez-Gonzalez G, Malhi Y, Monteagudo A, Peacock J, Quesada CA, van der Heijden G, Almeida S, Amaral I, Arroyo L, Aymard G, Baker TR, Banki O, Blanc L, Bonal D, Brando P, Chave J, de Oliveira ACA, Cardozo ND, Czimczik CI, Feldpausch TR, Freitas MA, Gloor E, Higuchi N, Jimenez E, Lloyd G, et al: Drought Sensitivity of the Amazon Rainforest. Science. 2009, 323 (5919): 1344-1347. 10.1126/science.1164033.PubMedView ArticleGoogle Scholar
- Savolainen O, Pyhäjärvi T, Knürr T: Gene Flow and Local Adaptation in Trees. Annu Rev Ecol Evol Syst. 2007, 38 (1): 595-619. 10.1146/annurev.ecolsys.38.091206.095646.View ArticleGoogle Scholar
- Audigeos D, Brousseau L, Traissac S, Scotti-Saintagne C, Scotti I: Molecular divergence in tropical tree populations occupying environmental mosaics. J Evol Biol. 2013, 26: 529-544. 10.1111/jeb.12069.PubMedView ArticleGoogle Scholar
- Audigeos D: Relations entre diversité génétique et environnement: quels sont les processus évolutifs mis en jeu ? Cas d’une espèce d’arbre tropical: Eperua falcata Aublet. Ph. D. thesis.2010, Kourou, French Guiana: University of French West Indes and French Guiana,Google Scholar
- Argout X, Salse J, Aury J-M, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M, Murat F, Fouet O, Poulain J, Ruiz M, Roguet Y, Rodier-Goud M, Barbosa-Neto JF, Sabot F, Kudrna D, Ammiraju JSS, Schuster SC, Carlson JE, Sallet E, Schiex T, Dievart A, Kramer M, Gelley L, Shi Z, Berard A, et al: The genome of Theobroma cacao. Nat Genet. 2010, 43 (2): 101-108.PubMedView ArticleGoogle Scholar
- Scotti I: Adaptive potential in forest tree populations: what is it, and how can we measure it?. Ann For Sci. 2010, 67 (8): 801-801. 10.1051/forest/2010053.View ArticleGoogle Scholar
- Jump AS, Marchant R, Peñuelas J: Environmental change and the option value of genetic diversity. Trends Plant Sci. 2008, 14 (1): 51-58.PubMedView ArticleGoogle Scholar
- Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-McLane S: Adaptation, migration or extirpation: climate change outcomes for tree populations. Evol Appl. 2008, 1 (1): 95-111. 10.1111/j.1752-4571.2007.00013.x.PubMed CentralPubMedView ArticleGoogle Scholar
- Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball AD, Beckerman AP, Slate J: Adaptation genomics: the next generation. Trends Ecol Evol. 2010, 25 (12): 705-712. 10.1016/j.tree.2010.09.002.PubMedView ArticleGoogle Scholar
- Helyar SJ, Hemmer-Hansen J, Bekkevold D, Taylor MI, Ogden R, Limborg MT, Cariani A, Maes GE, Diopere E, Carvalho GR, Nielsen EE: Application of SNPs for population genetics of nonmodel organisms: new opportunities and challenges. Mol Ecol Resour. 2011, 11: 123-136.PubMedView ArticleGoogle Scholar
- Seeb JE, Carvalho G, Hauser L, Naish K, Roberts S, Seeb LW: Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. Mol Ecol Resour. 2011, 11: 1-8.PubMedView ArticleGoogle Scholar
- Schlötterer C: Towards a molecular characterization of adaptation in local populations. Curr Opin Genet Dev. 2002, 12 (6): 683-687. 10.1016/S0959-437X(02)00349-0.PubMedView ArticleGoogle Scholar
- Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, Gonzalez-Martinez SC, Neale DB: Patterns of Population Structure and Environmental Associations to Aridity Across the Range of Loblolly Pine (Pinus taeda L., Pinaceae). Genetics. 2010, 185 (3): 969-982. 10.1534/genetics.110.115543.PubMed CentralPubMedView ArticleGoogle Scholar
- Eveno E, Collada C, Guevara MA, Leger V, Soto A, Diaz L, Leger P, Gonzalez-Martinez SC, Cervera MT, Plomion C, Garnier-Gere PH: Contrasting Patterns of Selection at Pinus pinaster Ait. Drought Stress Candidate Genes as Revealed by Genetic Differentiation Analyses. Mol Biol Evol. 2008, 25 (2): 417-437. 10.1093/molbev/msm272.PubMedView ArticleGoogle Scholar
- Allendorf FW, Hohenlohe PA, Luikart G: Genomics and the future of conservation genetics. Nat Rev Genet. 2010, 11 (10): 697-709. 10.1038/nrg2844.PubMedView ArticleGoogle Scholar
- Ellegren H: Sequencing goes 454 and takes large-scale genomics into the wild. Mol Ecol. 2008, 17 (7): 1629-1631. 10.1111/j.1365-294X.2008.03699.x.PubMedView ArticleGoogle Scholar
- Egan AN, Schlueter J, Spooner DM: Applications of next-generation sequencing in plant biology. Am J Bot. 2012, 99 (2): 175-185. 10.3732/ajb.1200020.PubMedView ArticleGoogle Scholar
- Pop M, Salzberg SL: Bioinformatics challenges of new sequencing technology. Trends Genet. 2008, 24 (3): 142-149. 10.1016/j.tig.2007.12.006.PubMed CentralPubMedView ArticleGoogle Scholar
- Bouck AMY, Vision T: The molecular ecologist's guide to expressed sequence tags. Mol Ecol. 2007, 16 (5): 907-924.PubMedView ArticleGoogle Scholar
- Emrich SJ, Barbazuk WB, Li L, Schnable PS: Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007, 17 (1): 69-73.PubMed CentralPubMedView ArticleGoogle Scholar
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17 (7): 1636-1647. 10.1111/j.1365-294X.2008.03666.x.PubMedView ArticleGoogle Scholar
- Novaes E, Drost D, Farmerie W, Pappas G, Grattapaglia D, Sederoff R, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9 (1): 312-10.1186/1471-2164-9-312.PubMed CentralPubMedView ArticleGoogle Scholar
- Namroud M-C, Beaulieu J, Juge N, Laroche J, Bousquet J: Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Mol Ecol. 2008, 17: 3599-3613. 10.1111/j.1365-294X.2008.03840.x.PubMed CentralPubMedView ArticleGoogle Scholar
- Mutz K-O, Heilkenbrinker A, Lönne M, Walter J-G, Stahl F: Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol. 2013, 24 (1): 22-30. 10.1016/j.copbio.2012.09.004.PubMedView ArticleGoogle Scholar
- Weber APM, Weber KL, Carr K, Wilkerson C, Ohlrogge JB: Sampling the Arabidopsis Transcriptome with Massively Parallel Pyrosequencing. Plant physiology. 2007, 144 (1): 32-42. 10.1104/pp.107.096677.PubMed CentralPubMedView ArticleGoogle Scholar
- Wicker T, Schlagenhauf E, Graner A, Close T, Keller B, Stein N: 454 sequencing put to the test using the complex genome of barley. BMC Genomics. 2006, 7 (1): 275-10.1186/1471-2164-7-275.PubMed CentralPubMedView ArticleGoogle Scholar
- Blanca J, Pascual L, Ziarsolo P, Nuez F, Canizares J: ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 2011, 12 (1): 285-10.1186/1471-2164-12-285.PubMed CentralPubMedView ArticleGoogle Scholar
- Margam VM, Coates BS, Bayles DO, Hellmich RL, Agunbiade T, Seufferheld MJ, Sun W, Kroemer JA, Ba MN, Binso-Dabire CL, Baoua I, Ishiyaku MF, Covas FG, Srinivasan R, Armstrong J, Murdock LL, Pittendrigh BR: Transcriptome Sequencing, and Rapid Development and Application of SNP Markers for the Legume Pod Borer Maruca vitrata (Lepidoptera: Crambidae). PLOS One. 2011, 6 (7): e21388-10.1371/journal.pone.0021388.PubMed CentralPubMedView ArticleGoogle Scholar
- Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS: SNP discovery via 454 transcriptome sequencing. Plant J. 2007, 51 (5): 910-918. 10.1111/j.1365-313X.2007.03193.x.PubMed CentralPubMedView ArticleGoogle Scholar
- Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics. 2008, 92 (5): 255-264. 10.1016/j.ygeno.2008.07.001.PubMedView ArticleGoogle Scholar
- Kenfack D: A Synoptic Revision of Carapa (Meliaceae). Harv Pap Bot. 2011, 16 (2): 171-231. 10.3100/0.25.016.0201.View ArticleGoogle Scholar
- Vincent G, Molino J-F, Marescot L, Barkaoui K, Sabatier D, Freycon V, Roelens J: The relative importance of dispersal limitation and habitat preference in shaping spatial distribution of saplings in a tropical moist forest: a case study along a combination of hydromorphic and canopy disturbance gradients. Ann For Sci. 2011, 68 (2): 357-370. 10.1007/s13595-011-0024-z.View ArticleGoogle Scholar
- Degen B, Caron H, Bandou E, Maggia L, Chevallier MH, Leveau A, Kremer A: Fine-scale spatial genetic structure of eight tropical tree species as analysed by RAPDs. Heredity. 2001, 87: 497-507. 10.1046/j.1365-2540.2001.00942.x.PubMedView ArticleGoogle Scholar
- Forget P-M, Cuijpers L: Survival and Scatterhoarding of Frugivores-Dispersed Seeds as a Function of Forest Disturbance. Biotropica. 2008, 40 (3): 380-385. 10.1111/j.1744-7429.2007.00358.x.View ArticleGoogle Scholar
- Cowan RS: A monograph of the genus Eperua (Leguminosae: Caesalpinioideae). Smithsonian Contr Bot. 1975, 28: 26-28.Google Scholar
- Ter Steege H, Zondervan G, ter Steege H: A preliminary analysis of large-scale forest inventory data of the Guiana Shield. Plant Diversity in Guyana. 2000, Wageningen, NL: Tropenbos Foundation, 18Google Scholar
- Dick Christopher W, Abdulah Salim K, Bermingham E: Molecular systematic analysis reveals cryptic tertiary diversification of a widespread tropical rain forest tree. Am Nat. 2003, 162 (6): 691-703. 10.1086/379795.PubMedView ArticleGoogle Scholar
- Pr A, Hamrick JL, Chavarriaga P, Kochert G: Microsatellite analysis of demographic genetic structure in fragmented populations of the tropical tree Symphonia globulifera. Mol Ecol. 1998, 7 (8): 933-944. 10.1046/j.1365-294x.1998.00396.x.View ArticleGoogle Scholar
- Wilson TK: Myristicaceae. Flowering plants of the Neotropics. Edited by: Smith N, Mori SA, Henderson DW, Heald SV. 2004, Princeton, NJ: New York Botanical garden & Princeton University PressGoogle Scholar
- Forget PM, Sabatier D: Dynamics of the seedling shadow of a frugivore-dispersed tree species in French Guiana. Journal of tropical ecology. 1997, 13: 767-773. 10.1017/S0266467400010920.View ArticleGoogle Scholar
- Baraloto C, Morneau F, Bonal D, Blanc L, Ferry B: Seasonal water stress tolerance and habitat associations within four neotropical tree genera. Ecology. 2007, 88 (2): 478-489. 10.1890/0012-9658(2007)88[478:SWSTAH]2.0.CO;2.PubMedView ArticleGoogle Scholar
- Le Provost G, Paiva J, Pot D, Brach J, Plomion C: Seasonal variation in transcript accumulation in wood-forming tissues of maritime pine (Pinus pinaster Ait.) with emphasis on a cell wall glycine-rich protein. Planta. 2003, 217 (5): 820-830. 10.1007/s00425-003-1051-2.PubMedView ArticleGoogle Scholar
- Meyer M, Stenzel U, Hofreiter M: Parallel tagged sequencing on the 454 platform. Nat Protocols. 2008, 3 (2): 267-278. 10.1038/nprot.2007.520.View ArticleGoogle Scholar
- Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S: Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs. Genome research. 2004, 14 (6): 1147-1159. 10.1101/gr.1917404.PubMed CentralPubMedView ArticleGoogle Scholar
- Kumar S, Blaxter M: Comparing de novo assemblers for 454 transcriptome data. BMC Genomics. 2010, 11 (1): 571-10.1186/1471-2164-11-571.PubMed CentralPubMedView ArticleGoogle Scholar
- Conesa A, Götz S: Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008, 2008: 1-13.View ArticleGoogle Scholar
- Scotti I, Montaigne W, Cseke K, Traissac S: RaBoT: a rarefaction-by-bootstrap method to compare genome-wide levels of genetic diversity. Ann For Sci. 2013, 70 (6): 631-635. 10.1007/s13595-013-0302-z.View ArticleGoogle Scholar
- Consortium TGO: The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 2010, 38 (suppl 1): D331-D335.View ArticleGoogle Scholar
- Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, Hub tA: AmiGO: online access to ontology and annotation data. Bioinformatics. 2009, 25 (2): 288-289. 10.1093/bioinformatics/btn615.PubMed CentralPubMedView ArticleGoogle Scholar
- Consortium TGO: The Gene Ontology project in 2008. Nucleic Acids Res. 2008, 36 (suppl 1): D440-D444.Google Scholar
- Torres TT, Metta M, Ottenwälder B, Schlötterer C: Gene expression profiling by massively parallel sequencing. Genome Res. 2008, 18 (1): 172-177.PubMed CentralPubMedView ArticleGoogle Scholar
- Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW, DeLong EF: Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci. 2008, 105 (10): 3805-3810. 10.1073/pnas.0708897105.PubMed CentralPubMedView ArticleGoogle Scholar
- Craft JA, Gilbert JA, Temperton B, Dempsey KE, Ashelford K, Tiwari B, Hutchinson TH, Chipman JK: Pyrosequencing of Mytilus galloprovincialis cDNAs: Tissue-Specific Expression Patterns. PLOS One. 2010, 5 (1): e8875-10.1371/journal.pone.0008875.PubMed CentralPubMedView ArticleGoogle Scholar
- You F, Huo N, Deal K, Gu Y, Luo M-C, McGuire P, Dvorak J, Anderson O: Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics. 2011, 12 (1): 59-10.1186/1471-2164-12-59.PubMed CentralPubMedView ArticleGoogle Scholar
- Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Scholkopf B, Weigel D, Lohmann JU: A gene expression map of Arabidopsis thaliana development. Nat Genet. 2005, 37 (5): 501-506. 10.1038/ng1543.PubMedView ArticleGoogle Scholar
- Sloan DB, Keller SR, Berardi AE, Sanderson BJ, Karpovich JF, Taylor DR: De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol Ecol Resour. 2012, 12 (2): 333-343. 10.1111/j.1755-0998.2011.03079.x.PubMedView ArticleGoogle Scholar
- Blanca J, Canizares J, Roig C, Ziarsolo P, Nuez F, Pico B: Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics. 2011, 12 (1): 104-10.1186/1471-2164-12-104.PubMed CentralPubMedView ArticleGoogle Scholar
- Parchman T, Geist K, Grahnen J, Benkman C, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11 (1): 180-10.1186/1471-2164-11-180.PubMed CentralPubMedView ArticleGoogle Scholar
- Cottet K, Genta-Jouve G, Fromentin Y, Duplais C, Laprévote O, Michel S, Lallemand M-C: Comparative LC-MS-based metabolite profiling of the ancient tropical rainforest tree Symphonia globulifera. Phytochemistry. 2014, in pressGoogle Scholar
- Lamarre GPA, Baraloto C, Fortunel C, Dávila N, Mesones I, Rios JG, Ríos M, Valderrama E, Pilco MV, Fine PVA: Herbivory, growth rates, and habitat specialization in tropical tree lineages: implications for Amazonian beta-diversity. Ecology. 2012, 93 (sp8): S195-S210. 10.1890/11-0397.1.View ArticleGoogle Scholar
- Bagchi R, Gallery RE, Gripenberg S, Gurr SJ, Narayan L, Addis CE, Freckleton RP, Lewis OT: Pathogens and insect herbivores drive rainforest plant diversity and composition. Nature. 2014, advance online publicationGoogle Scholar
- Burt A, Koufopanou V: Homing endonuclease genes: the rise and fall and rise again of a selfish element. Curr Opin Genet Dev. 2004, 14 (6): 609-615. 10.1016/j.gde.2004.09.010.PubMedView ArticleGoogle Scholar
- Cho Y, Qiu Y-L, Kuhlman P, Palmer JD: Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci. 1998, 95 (24): 14244-14249. 10.1073/pnas.95.24.14244.PubMed CentralPubMedView ArticleGoogle Scholar
- Yahara K, Fukuyo M, Sasaki A, Kobayashi I: Evolutionary maintenance of selfish homing endonuclease genes in the absence of horizontal transfer. Proc Natl Acad Sci. 2009, 106 (44): 18861-18866. 10.1073/pnas.0908404106.PubMed CentralPubMedView ArticleGoogle Scholar
- Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A, Vicedomini R, Sahlin K, Sherwood E, Elfstrand M, Gramzow L, Holmberg K, Hallman J, Keech O, Klasson L, Koriabine M, Kucukoglu M, Kaller M, Luthman J, Lysholm F, Niittyla T, Olson A, Rilakovic N, Ritland C, Rossello JA, Sena J, et al: The Norway spruce genome sequence and conifer genome evolution. Nature. 2013, Advance online publicationGoogle Scholar
- Koufopanou V, Goddard MR, Burt A: Adaptation for Horizontal Transfer in a Homing Endonuclease. Mol Biol Evol. 2002, 19 (3): 239-246. 10.1093/oxfordjournals.molbev.a004077.PubMedView ArticleGoogle Scholar
- Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM: Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 2007, 8 (7): R143-10.1186/gb-2007-8-7-r143.PubMed CentralPubMedView ArticleGoogle Scholar
- DePristo MA: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 2011, 43 (5): 491-498. 10.1038/ng.806.PubMed CentralPubMedView ArticleGoogle Scholar
- Parchman TL, Gompert Z, Mudge J, Schilkey FD, Benkman CW, Buerkle CA: Genome-wide association genetics of an adaptive trait in lodgepole pine. Mol Ecol. 2012, 21 (12): 2991-3005. 10.1111/j.1365-294X.2012.05513.x.PubMedView ArticleGoogle Scholar
- Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C: Genomic scans for selective sweeps using SNP data. Genome Res. 2005, 15 (11): 1566-1575. 10.1101/gr.4252305.PubMed CentralPubMedView ArticleGoogle Scholar
- Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A, Bustamante CD, Clark AG: Darwinian and demographic forces affecting human protein coding genes. Genome Res. 2009, 19 (5): 838-849. 10.1101/gr.088336.108.PubMed CentralPubMedView ArticleGoogle Scholar
- Li H, Stephan W: Inferring the Demographic History and Rate of Adaptive Substitution in Drosophila. PLoS Genet. 2006, 2 (10): e166-10.1371/journal.pgen.0020166.PubMed CentralPubMedView ArticleGoogle Scholar
- Siol M, Wright SI, Barrett SCH: The population genomics of plant adaptation. New Phytol. 2010, 188 (2): 313-332. 10.1111/j.1469-8137.2010.03401.x.PubMedView ArticleGoogle Scholar
- Turner TL, Bourne EC, Von Wettberg EJ, Hu TT, Nuzhdin SV: Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nat Genet. 2010, 42 (3): 260-263. 10.1038/ng.515.PubMedView ArticleGoogle Scholar
- Fournier-Level A, Korte A, Cooper MD, Nordborg M, Schmitt J, Wilczek AM: A Map of Local Adaptation in Arabidopsis thaliana. Science. 2011, 334 (6052): 86-89. 10.1126/science.1209271.PubMedView ArticleGoogle Scholar
- Hancock AM, Brachi B, Faure N, Horton MW, Jarymowycz LB, Sperone FG, Toomajian C, Roux F, Bergelson J: Adaptation to Climate Across the Arabidopsis thaliana Genome. Science. 2011, 334 (6052): 83-86. 10.1126/science.1209244.PubMedView ArticleGoogle Scholar
- Eckert AJ, Wegrzyn JL, Pande B, Jermstad KD, Lee JM, Liechty JD, Tearse BR, Krutovsky KV, Neale DB: Multilocus Patterns of Nucleotide Diversity and Divergence Reveal Positive Selection at Candidate Genes Related to Cold Hardiness in Coastal Douglas Fir (Pseudotsuga menziesii var. menziesii). Genetics. 2009, 183 (1): 289-298. 10.1534/genetics.109.103895.PubMed CentralPubMedView ArticleGoogle Scholar
- Holliday JA, Suren H, Aitken SN: Divergent selection and heterogeneous migration rates across the range of Sitka spruce (Picea sitchensis). Proc R Soc B-Biol Sci. 2012, 279 (1734): 1675-1683. 10.1098/rspb.2011.1805.View ArticleGoogle Scholar
- Lister R, Gregory BD, Ecker JR: Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol. 2009, 12 (2): 107-118. 10.1016/j.pbi.2008.11.004.PubMed CentralPubMedView ArticleGoogle Scholar
- Morozova O, Hirst M, Marra MA: Applications of New Sequencing Technologies for Transcriptome Analysis. Annu Rev Genomics Hum Genet. 2009, 10 (1): 135-151. 10.1146/annurev-genom-082908-145957.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.