Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions
- Xavier Argout1,
- Olivier Fouet1,
- Patrick Wincker2,
- Karina Gramacho3,
- Thierry Legavre1,
- Xavier Sabau1,
- Ange Marie Risterucci1,
- Corinne Da Silva2,
- Julio Cascardo4,
- Mathilde Allegre1,
- David Kuhn5,
- Joseph Verica6,
- Brigitte Courtois1,
- Gaston Loor7,
- Regis Babin8, 9,
- Olivier Sounigo8, 9,
- Michel Ducamp10,
- Mark J Guiltinan6,
- Manuel Ruiz1,
- Laurence Alemanno11,
- Regina Machado12,
- Wilberth Phillips13,
- Ray Schnell5,
- Martin Gilmour14,
- Eric Rosenquist15,
- David Butler16,
- Siela Maximova6 and
- Claire Lanaud1
© Argout et al; licensee BioMed Central Ltd. 2008
Received: 04 June 2008
Accepted: 30 October 2008
Published: 30 October 2008
Theobroma cacao L. is a tree originating from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, which are responsible for more than 40% yield losses, and quality improvement, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao.
Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species.
Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories.
A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database.
To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection.
A large collection of new genetic markers was provided by this EST collection.
This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation.
Theobroma cacao is a diploid species (2n = 2X = 20) with a small genome size of 380 Mbp [1, 2]. It is a fruit tree originating from the tropical rainforest of South America. According to Cheesman (1944), its center of origin is the lower eastern equatorial slopes of the Andes. T. cacao is now cultivated in all tropical lowlands of the world, and its beans are used to produce chocolate and cocoa butter after a post-harvest treatment including fermentation, drying and torrefaction steps. T. cacao is one of the major cash crops for several tropical countries. Its economic importance is high, and cocoa is presently the third most important internationally traded raw material after sugar and coffee.
Cocoa is mainly produced on smallholdings. It is estimated that approximately 14 million people around the world rely on cacao plantations for income. T. cacao production is seriously affected by several fungal diseases and insect attacks. Oomycetes, especially Phytophthora spp. (black pod), are responsible worldwide for 30% of losses. Several species are involved. P. palmivora is present in the entire cacao growing area, whereas P. capsici and P. citrophthora are prevalent in South America. P. megakarya is limited to some countries in West Africa; however, it is by far the most aggressive species, causing production losses of up to 50%. Harvest losses due to Phytophthora species were estimated at 450,000 tons.
Two basidiomycetes, Moniliophthora roreri (frosty pod) and Moniliophthora perniciosa (witches' broom), are also responsible for important harvest losses. In Brazil, M. perniciosa was responsible for a drastic yield loss, with production falling from 405,000 tons in 1986 to less than 130,000 tons in 1998. Moniliophthora roreri causes a very destructive pod rot and has already had dramatic effects in some countries such as Ecuador and Costa Rica. M. roreri was confined to several countries of Central and northern South America, but is continuously spreading towards other Central American countries like Mexico or southward towards countries like Peru.
Several sources of disease resistance have been identified in different genetic backgrounds, and the search for sustainable disease resistance that accumulates the different resistance genes is one of the major challenges of T. cacao breeding programs.
Other traits of importance in T. cacao are quality traits. Food quality improvement, nutritional as well as organoleptic, is now a strong demand of consumers. Fundamental knowledge of the genetic basis of quality is an important challenge that can address this demand.
Flavor is among the main criteria of quality for chocolate manufacturers, but these characteristics are largely understudied by the cocoa research and breeding community due to their complexity and a dramatic lack of fundamental knowledge about these traits. Flavor components depend strongly on conditions of post-harvest processing. After pod harvest, fresh seeds need to be fermented for 4 to 6 days, then dried and roasted to develop good cocoa aromas. Raw seeds, embedded in a pulp rich in sugar, undergo biochemical changes under the effect of various microorganisms present in the environment. The initial anaerobic, low pH and high sugar conditions of the pulp favour yeast activity, converting sugars in the pulp to alcohol and carbon dioxide. Bacteria then start oxidising the alcohol into lactic acid and then into acetic acid as conditions become more aerobic. These biochemical changes are accompanied by changes in the amount and composition of several compounds having a major effect on cocoa flavor, such as peptide aroma precursor formation and procyanidin or terpene content.
However, it is now well recognized that the genetic origin is also a strong determinant of flavor, independent of the conditions of post-harvest processing.
Although some aromas are prominently defined by a single molecule, most aromas are composed of a bulk of volatile compounds responsible for aroma perception, and belonging to different classes of organic compounds. Interestingly, despite the vast number of chemical structures involved, the large majority of scent compounds are biosynthesized by a surprisingly small number of metabolic pathways. Parts of these metabolic pathways are ubiquitous, and have been developed by small but important modifications of ancestral genes and pathways. In T. cacao more than 500 volatile compounds have been detected. However, only a small number are thought to play a key role in natural aroma variations.
Cocoa is classified into two classes: the «standard quality cocoa» corresponding to 95% of the total market, and the «fine flavor cocoa» produced by T. cacao trees originating from two main varieties, Criollo and Nacional, which bring a higher price in the market.
An important class of volatile compounds, the terpenes, plays an important role in the aromatic flavor of these varieties.
For example, a high level of linalool, a monoterpene, has been observed in Nacional varieties from Ecuador, characterised by a floral taste, and could be at the origin of this specific flavor, which represents an important economic «niche» for the country. However, the modern and hybrid Nacional varieties present a wide range of flavor variations due to introgressions of foreign and more vigorous varieties, leading to a dilution of this specific floral flavor, and recently a part of Ecuador cocoa production was declassified from fine flavor to "bulk cocoa" with a lower price. An increased knowledge of the metabolic pathways and expression of genes involved in terpene synthesis could help to improve the aromatic flavor of new "Nacional" varieties.
Independently of volatile compounds, some other biochemical compounds are known to interact with T. cacao organoleptic traits. This is the case with polyphenols. Catechin, epicatechin and procyanidins are the main polyphenols present in T. cacao. They have well known antioxidant biological activities and beneficial effects on the cardiovascular system [12–14]. Contributing to bitterness and astringency, polyphenols influence T. cacao organoleptic quality [15, 12]. They influence the aromatic profiles of T. cacao by restricting Maillard reactions, which generate a majority of the aromatic compounds of T. cacao.
Genomic research provides new tools to study the genetic and molecular bases of important trait variations: EST sequencing projects carried out on other plant models have allowed the characterization of the transcriptome and facilitated the discovery of genes underlying important trait variations. In tree crops, except for poplar, whose genome has been recently sequenced, genomic resources are generally limited, and few large EST collections have been produced. Recently, a citrus EST collection comprising 15,664 putative transcription units has been produced, allowing the identification of clusters associated with fruit quality, production and salinity tolerance. A cotton study identified 51,107 unigenes from a global assembly of 185,000 cotton ESTs, providing a framework for future investigation of cotton genomics. The same approach was used to characterize the grape transcriptome during berry development by the analysis and annotation of 25,746 unigenes from 146,075 ESTs.
The objective of this study was to produce a large T. cacao EST collection from a wide range of organs, providing a good representation of T. cacao genes expressed during T. cacao development and suitable for further analysis of all kinds of traits in T. cacao. Moreover, we emphasized the production of tools to further study T. cacao diseases, a major constraint for cocoa production, and quality features. Therefore, we also produced cDNA libraries relevant to disease resistance and quality traits. ESTs were produced from T. cacao tissues interacting with various pests and fungal diseases, from seeds at different stages of development and during the fermentation steps. This large EST collection will provide valuable tools to carry out functional genomic studies and discover genes essential to important agronomic and quality trait variation in T. cacao, aiming to accelerate T. cacao improvement. A multidisciplinary approach combining functional genomic and quantitative genetic approaches could lead to a better understanding of gene function involved in disease resistance mechanisms or quality trait variations. T. cacao's phylogenetic proximity to the model plant Arabidopsis will facilitate our understanding of most metabolic pathways. However, T. cacao is a tree and expresses traits not found in Arabidopsis; thus we hypothesize that genes not found in Arabidopsis play important roles in cacao development.
Results and Discussion
Summary of T. cacao libraries
Good quality ESTs
stem tissues inoculated by Ceratocystis fimbriata
cherels from 1 week to 1 month stage of development
pod tissue inoculated by Phytophthora palmivora
cortex tissue, external part
cortex tissue, internal part with lignified channels
SSH library from tissues inoculated/non inoculated by Phytophthora palmivora
SSH library from tissues inoculated/non inoculated by Phytophthora palmivora
cotyledons from germinated seeds (1 to 3 weeks)
leaves submitted to drought stresses
roots submitted to drought stresses
epicotyl and hypocotyl from 1 week germinated seeds
epicotyl from 2–3 week germinated seeds
flowers at different stages of development
SSH library from ovaries submitted to compatible/incompatible pollinations
hypocotyl from 2–3 week germinated seeds
young and adult leaves at different stages of development
leaves inoculated by Phytophthora palmivora
leaves inoculated by Phytophthora palmivora
SSH library from leaves inoculated by Phytophthora megakarya from susceptible-resistant PNG seedlings
SSH library from leaves inoculated by Phytophthora megakarya from resistant – susceptible PNG seedlings
SSH library from leaves inoculated by Phytophthora palmivora from resistant – susceptible PNG seedlings
young shoot tissues attacked by Sahlbergella singularis (mirids)
pod tissues inoculated by Moniliophthora roreri
pod tissues inoculated by Monilia roreri
ovaries from 1 to 7 days after pollinations
ovules collected 2 to 3 months after pollination
pod tissues inoculated by Phytophthora megakarya
SSH library from pod tissues inoculated-non inoculated by Moniliophthora perniciosa less than 60 days after inoculation
SSH library from pod tissues inoculated-non inoculated by Moniliophthora perniciosa between 60 to 120 days after inoculation
pod tissues inoculated by Moniliophthora perniciosa less than 60 days after inoculation
pod tissues inoculated by Moniliophthora perniciosa between 60 to 120 days after inoculation
SSH library from leaves of resistant seedlings inoculated-non inoculated by Phytophthora megakarya
SSH library from leaves of resistant seedlings non inoculated- inoculated by Phytophthora palmivora
SSH library from leaves of resistant seedlings inoculated-non inoculated by Phytophthora palmivora
seeds 3 to 3,5 months after pollinations
seeds 4 to 5 months after pollinations
Cotyledons from seeds fermented between 6 H and 4 days
seeds from mature pods 5,5 to 6 months after pollinations
seeds 2 to 5 months after pollinations
fermented seeds during 6 to 26 H
fermented seeds during 32 to 40 H
SSH library from stems inoculated-non inoculated by Ceratocystis fimbriata
SSH library from stems non inoculated-inoculated by Ceratocystis fimbriata
SSH library from young shoots non attacked-attacked by Sahlbergella singularis
SSH library from young shoots attacked-non attacked by Sahlbergella singularis
complete disc of stems 1 cm diameter
SSH library from (and reverse sense) shoot tissues inoculated/non inoculated by Moniliophthora perniciosa less than 18 days after inoculation
SSH library from shoot tissues inoculated-non inoculated by Moniliophthora perniciosa between 18 to 120 days after inoculation
testa from seeds fermented between 6 H and 4 days
testa with pulp from mature seeds
embryogenic and non embryogenic callus in vitro culture
fermented testa during 6 to 40 H
young wilted cherels 7 to 10 days after pollination
bark and cambium part of wood
EST sequencing and assembly
From the 56 libraries, 8565 clones were first sequenced on both strands using forward and reverse primers to give an overview of the quality of the libraries, and then 163,868 clones were single-pass sequenced from the 5' or the 3' end (Table 1). This represented a total of 180,998 chromatograms that were used in this analysis. After trimming of low-quality regions, vector and adapters, 149,650 sequences longer than 100 bp remained as good quality sequences. The average sequence length was 472 bp and 62% were longer than 400 bp. These individual ESTs (available through EMBL-Bank) were assembled using the TIGR Gene Indices clustering tools (TGICL). The assembly process produced 12,692 contigs and 35,902 singletons that represented a total of 25.6 Mb of transcribed sequences. The combined set of contigs and singletons resulted in 48,594 unigenes, which might correspond to different putative transcripts or to different parts of the same transcript found in the Theobroma cacao transcriptome. The average length of the sequences in this non-redundant T. cacao dataset was 527 bp.
An assembly of ESTs has already been published for Theobroma cacao but has been limited to 1380 unigenes (4433 ESTs) from two leaf and bean cacao libraries, to the isolation of 1256 unigenes (2114 ESTs) from cacao leaves treated with inducers of defense response, and to 2926 non-redundant sequences from libraries of cacao meristems inoculated by Moniliophthora perniciosa.
The results of this study are more comparable to a cotton EST project involving 30 cDNA libraries. This analysis detected 51,107 unigenes in approximately 185,000 Gossypium ESTs.
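The counts quoted in this paragraph can be cross-checked with simple arithmetic. The short sketch below only re-derives the totals from the figures stated in the text; the variable names are illustrative and are not taken from the authors' pipeline.

```python
# Figures quoted in the text (variable names are illustrative only).
paired_clones = 8_565          # clones sequenced from both ends
single_pass_clones = 163_868   # clones sequenced from one end only
contigs = 12_692
singletons = 35_902
mean_unigene_length_bp = 527

# Each paired clone yields two chromatograms, each single-pass clone one.
chromatograms = 2 * paired_clones + single_pass_clones

# Unigenes are the union of contigs and singletons.
unigenes = contigs + singletons

# Approximate size of the non-redundant dataset, in megabases.
total_mb = unigenes * mean_unigene_length_bp / 1e6

print(chromatograms)        # 180998
print(unigenes)             # 48594
print(round(total_mb, 1))   # 25.6
```

The three derived values agree with the 180,998 chromatograms, 48,594 unigenes and 25.6 Mb reported above.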
Unigene set annotation
BLASTN against cacao ESTs
The unigene dataset was used to determine how many cacao sequences had not already been described in public databases. To answer this question, we collected all 2539 T. cacao unique sequences already published by the Dana Farber Cancer Institute (DFCI) gene index and ran a BLASTN search against our unigenes. An e-value cutoff of 1e-50 was used to ensure that only highly similar sequences were detected. A total of 3901 unigenes produced a significant hit with 1788 unique sequences from the DFCI gene index; these sequences may therefore correspond to T. cacao sequences already published or may match different parts of the gene index sequences. They may also come from closely related genes (multigene families). Finally, 44,693 unique sequences did not produce a significant hit; these sequences may therefore be new.
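As an illustration of this kind of filtering, the sketch below counts the sequences retained at the 1e-50 cutoff. It assumes the BLASTN results were exported in the standard 12-column tabular format (query id, subject id, ..., e-value, bit score); the export format, the function name and the sequence identifiers are assumptions made for the example and are not described in the paper.

```python
import csv
import io

def significant_pairs(tabular_text, max_evalue=1e-50):
    """Return (queries, subjects) seen in rows with e-value <= max_evalue.

    Assumes the 12-column BLAST tabular layout, where column 0 is the
    query id, column 1 the subject id and column 10 the e-value.
    """
    queries, subjects = set(), set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= max_evalue:
            queries.add(row[0])
            subjects.add(row[1])
    return queries, subjects

# Hypothetical rows: DFCI sequences as queries, unigenes as subjects.
sample = "\n".join([
    "DFCI_1\tUG_A\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-120\t700",
    "DFCI_1\tUG_B\t95.0\t300\t15\t0\t1\t300\t1\t300\t3e-60\t420",
    "DFCI_2\tUG_C\t88.0\t150\t18\t0\t1\t150\t1\t150\t1e-20\t110",
    "DFCI_3\tUG_A\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900",
])

queries, subjects = significant_pairs(sample)
print(sorted(queries))   # ['DFCI_1', 'DFCI_3']
print(sorted(subjects))  # ['UG_A', 'UG_B']

# Unigenes without a significant hit, from the figures in the text.
unmatched = 48_594 - 3_901
print(unmatched)         # 44693
```

One subject can be hit by several queries and vice versa, which is why the number of matched unigenes (3901) and the number of matched gene index sequences (1788) differ.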
BLASTX and BLASTN annotation
To further investigate this unexpected result, we used the BLASTX program to compare the cacao unigene dataset against the proteomes of Arabidopsis thaliana and Vitis vinifera (Figures 3B, C). For each BLAST result, we selected the species found in the first hit having an expected value lower than 1e-15 to detect similar sequences. A total of 25,049 Theobroma cacao sequences (56%) presented at least one significant hit with an Arabidopsis thaliana or Vitis vinifera protein. The results showed that 18,643 Theobroma cacao sequences presented a higher similarity to the Vitis vinifera proteome, whereas only 6406 Theobroma cacao sequences presented a first BLAST hit similar to the Arabidopsis thaliana proteome (Figure 3B). Moreover, these first significant hits involved 9943 Vitis vinifera proteins (33% of the proteome) and 4246 Arabidopsis thaliana proteins (12% of the proteome) (Figure 3C).
These surprising results suggest that the genes expressed in Theobroma cacao are more similar to Vitis vinifera proteins than to those of Arabidopsis thaliana. These findings could be explained by the fact that Theobroma cacao and Vitis vinifera are both fruit trees. This idea could be supported by the large number of BLAST hits found with other tree crops such as Populus trichocarpa (8605 BLAST hits), despite the small number of non-redundant proteins in the databases for this species.
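The first-hit tally described above can be sketched as follows. This is a minimal illustration that assumes the hits for each unigene are already listed in rank order (best hit first); the function name, tuple layout and identifiers are hypothetical and do not come from the authors' pipeline.

```python
from collections import Counter

def tally_first_hits(hits, max_evalue=1e-15):
    """Count unigenes and distinct proteins per species of the first hit.

    `hits` is an iterable of (unigene, protein, species, evalue) tuples,
    assumed to be in rank order for each unigene.
    """
    first = {}
    for unigene, protein, species, evalue in hits:
        # Keep only the first (top-ranked) hit seen for each unigene.
        first.setdefault(unigene, (protein, species, evalue))

    unigene_counts = Counter()
    proteins = {}
    for protein, species, evalue in first.values():
        if evalue < max_evalue:
            unigene_counts[species] += 1
            proteins.setdefault(species, set()).add(protein)
    protein_counts = {sp: len(ids) for sp, ids in proteins.items()}
    return unigene_counts, protein_counts

# Hypothetical hits for four unigenes.
hits = [
    ("UG1", "VvP1", "Vitis vinifera", 1e-80),
    ("UG1", "AtP1", "Arabidopsis thaliana", 1e-70),
    ("UG2", "AtP2", "Arabidopsis thaliana", 1e-30),
    ("UG3", "VvP1", "Vitis vinifera", 1e-40),
    ("UG4", "VvP9", "Vitis vinifera", 1e-5),   # above the cutoff, ignored
]

unigene_counts, protein_counts = tally_first_hits(hits)
print(unigene_counts["Vitis vinifera"])        # 2
print(unigene_counts["Arabidopsis thaliana"])  # 1
print(protein_counts["Vitis vinifera"])        # 1

# The two per-species totals in the text add up to the reported overall count.
print(18_643 + 6_406)                          # 25049
```

Counting distinct proteins separately from unigenes mirrors the distinction in the text between the number of cacao sequences with a first hit and the number of proteins involved in those hits.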
Gene Ontology annotation
Genes involved in defense and resistance mechanisms
Some of the libraries provide an important resource to study plant/pathogen interactions. Using the annotations provided by BLAST and Gene Ontology, we specifically focussed on genes known to play a crucial role in plant pathogen resistance and defense mechanisms. Using the AmiGO browser, we identified 1001 gene product associations to "response to stress" (GO:0006950). Searches of both the BLAST results and the Gene Ontology annotation resulted in the identification of unigenes similar to known proteins involved in resistance or defense mechanisms, such as LRR-NBS (8 contigs and 32 singletons), chitinase (19 contigs and 37 singletons), 1–3 beta glucanase (5 contigs and 7 singletons) or pathogenesis-related proteins (24 contigs and 24 singletons).
Other genes related to resistance/defense mechanisms were also found more specifically in libraries produced from pathogen infected tissues, such as those involved in regulation of pathogen-induced genes like transcription factors (6 contigs and 7 singletons), in signal transduction (like MAPKinase with 5 contigs and 3 singletons) or in the cell death program.
The identification of a unigene set gathering sequences from all genes known to be involved in plant resistance and defense mechanisms, and the construction of a corresponding microarray could constitute a valuable tool to progress in the understanding of plant/pathogens interactions.
Genes involved in particular metabolic pathways or biological activities
To check the representativeness of our EST collection, we looked for ESTs encoding proteins known to be involved in the flavonoid and the terpene pathways, already studied in other plant species, and at the basis of important traits of interest in T. cacao. Generally, polyphenols play a major role in chocolate quality, acting as colour precursors or taste agents. Moreover, they are strongly implicated in health benefits associated with chocolate consumption [37–40].
The flavonoid pathway
The flavonoid pathway has already been studied in several plants. In T. cacao, this pathway is the source of numerous components essential for the human health benefits of chocolate [37–40] and for resistance against pathogens.
The terpene pathway
One of our goals was to identify enzymes involved in the terpenoid pathway that could be responsible for linalool content variations among Nacional clones. As a first step we identified sequences encoding isoprenoid pathway enzymes (42 contigs and 55 singletons). The final step enzyme for linalool synthesis, linalool synthase, was represented by 2 contigs and 4 singletons. Nearly all enzymes reported to be involved in this biochemical pathway were present in our ESTtik database, allowing the analysis of the T. cacao terpene pathway based on oligonucleotide microarrays derived from these ESTs.
The fact that nearly all of the genes involved in these two pathways as described in other plant species were identified in ESTs from our collection demonstrates the high level of representation of this resource and suggests that the majority of cacao genes have been sampled. Thus, this EST collection offers a comprehensive resource to search for candidate genes involved in quality traits and other important agronomical traits variation.
Production of SSR and SNP markers
Molecular markers derived from ESTs are part of, or adjacent to genes, and therefore they provide an efficient means of gene mapping.
Table: Distribution of motif lengths in the SSR dataset (column: Number of SSRs)
Table: Distribution of dimer and trimer motifs in the SSR dataset (column: Number of SSRs)
For each SSR identified, three primer pairs were defined, where possible, using Primer3. A total of 5265 flanking primer pairs were designed, and it was possible to define at least one primer pair for 1755 SSRs.
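The SSR search criteria are not stated in this section. As a minimal sketch of how dimer and trimer repeats can be located in an EST sequence, the example below uses a regular expression with assumed minimum repeat numbers (six for dimers, five for trimers). Both thresholds, the function name and the test sequence are assumptions for illustration, not the parameters used in the study.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5}

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for each perfect dimer/trimer SSR."""
    if min_repeats is None:
        min_repeats = MIN_REPEATS
    seq = seq.upper()
    found = []
    for unit_len, min_rep in min_repeats.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            repeats = len(match.group(0)) // unit_len
            found.append((match.start(), motif, repeats))
    return found

# Hypothetical sequence containing (AG)8 and (GAA)5.
sequence = "GGC" + "AG" * 8 + "TTC" + "GAA" * 5 + "CC"
ssrs = find_ssrs(sequence)
print(ssrs)  # [(3, 'AG', 8), (22, 'GAA', 5)]

# Three primer pairs for each of the 1755 SSRs gives the reported total.
print(3 * 1_755)  # 5265
```

The reported positions are where primer design would start: the sequence on either side of each match is what a tool such as Primer3 would be given as flanking regions.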
The exploration of redundant ESTs in contigs has been shown to be a valuable source of Single Nucleotide Polymorphisms (SNPs). SNPs were detected in unigene contigs using the QualitySNP pipeline. We assumed that contigs with at least 100 members contained paralogous sequences [50, 51]; therefore we selected 4818 contigs that contained at least 4 sequences but no more than 100 sequences. A preliminary study identified 5246 SNPs in 2012 contigs. Transitions (A/G and C/T) represented 54.2% of the SNPs found, transversions 32.1% and InDels 13.7%.
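The contig selection rule and the three SNP classes can be written down compactly. The sketch below is independent of QualitySNP and only illustrates the definitions: a transition exchanges two purines (A/G) or two pyrimidines (C/T), a transversion exchanges a purine and a pyrimidine, and a gap in one allele marks an InDel. The function names and the use of "-" for a gap are assumptions made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_eligible_contig(n_members, min_members=4, max_members=100):
    """Apply the size window quoted in the text (4 to 100 sequences)."""
    return min_members <= n_members <= max_members

def classify_snp(allele_a, allele_b):
    """Classify a biallelic polymorphism; '-' denotes a gap (assumed notation)."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b:
        raise ValueError("alleles must differ")
    if "-" in (a, b):
        return "indel"
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_snp("A", "G"))   # transition
print(classify_snp("C", "T"))   # transition
print(classify_snp("A", "T"))   # transversion
print(classify_snp("G", "-"))   # indel
print(is_eligible_contig(3), is_eligible_contig(100), is_eligible_contig(101))
# False True False

# The three reported percentages account for all SNPs found.
print(round(54.2 + 32.1 + 13.7, 1))  # 100.0
```

Excluding very deep contigs is a heuristic: large clusters are more likely to merge paralogous genes, in which case apparent SNPs may reflect differences between gene copies rather than allelic variation.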
The present assembly of 149,650 T. cacao ESTs produced from 56 cDNA libraries constructed from different organs and environmental conditions is the largest transcriptome dataset produced so far for T. cacao, and among the largest ones generated for any tree fruit crop. It provides a major resource for cacao genetic and functional genomic analyses of important T. cacao traits, with the identification and annotation of 48,594 different putative transcripts.
The improved knowledge of the T. cacao transcriptome will enhance our understanding of main disease resistance mechanisms and will be useful to improve new varieties and establish a sustainable T. cacao resistance to pests and diseases. Towards this goal, a large number of cDNA libraries have been produced from T. cacao/pathogens or pest interactions, and an important set of unique transcripts homologous to genes known in other species involved in defense and resistance mechanisms have been identified in the whole EST collection using keywords and Gene Ontology tools. It provides a cDNA resource available for the broad scientific community and suitable for cDNA-based microarray analyses.
This collection of ESTs also provides a valuable framework for the discovery of candidate genes involved in chocolate quality traits. Tested for two distinct metabolic pathways, this collection displays a good representation of the T. cacao transcriptome involved in quality trait elaboration and will allow the comparative analysis of contrasting genotypes for T. cacao qualities to better understand the genetic basis of quality.
This EST collection will also provide a large number of genetic tools, such as SSR and SNP markers, which will be used to construct high density gene maps, facilitating the integration of genetic and genomic approaches to discover the genes that affect trait variations, and also facilitating sequence assembly in future whole-genome sequencing of T. cacao.
Finally, the assembly and annotation associated will also provide a valuable resource for future investigation of T. cacao evolutionary genomics with related species such as Gossypium hirsutum or Arabidopsis thaliana.
Material used for libraries construction
In total, 56 different libraries were constructed. The organs and T. cacao genotypes used for cDNA construction, and the treatments carried out on these organs are reported in Table 1.
Scavina 6 (SCA6) is a self incompatible Forastero genotype originating from the Upper Amazonian region of Peru. SCA6 is highly resistant to Phytophthora species and Moniliophthora perniciosa diseases. It has been widely used in the breeding programs.
ICS1 is a self compatible Trinitario genotype, a hybrid involving Criollo, the first T. cacao variety domesticated in Central America, and a Forastero variety originating from the Lower Amazonia of Brazil; ICS1 is known for its large beans and good quality traits. This clone was used for RNA production during the different stages of development of the T. cacao seeds.
A post-harvest treatment is generally applied to T. cacao seeds to develop chocolate, involving fermentation steps, drying and torrefaction. Tissues from ICS1 seeds were collected during the first 2 days of fermentation to construct cDNA libraries.
Other genotypes were used more specifically to represent particular traits or genetic origins:
- Jaca is a Brazilian Forastero genotype from the Upper Amazonian region, and resistant to Ceratocystis fimbriata. Inoculation was done according to Silva et al. 
- B97 C-C-2 is a pure and homozygous Criollo genotype. This material was collected in Belize by a mission conducted by the CRU (Cocoa Research Unit, Univ. West Indies, Trinidad) in conjunction with The Maya Mountain Archaeological Project (MMAP – Cleveland State Univ.) and is now grown in the international collection of CRU.
- GU255V is a genotype originating from French Guiana, resistant to Phytophthora palmivora. Inoculation was done according to Tahi et al. 
- PNG seedlings are from a progeny produced in Papua New Guinea from the cross of two hybrids: 17/3-1 × 36/3-1, and segregating for Phytophthora resistance. Inoculation was done according to Tahi et al. 
- UF676 is a Trinitario genotype tolerant to mirids. Insect attack was carried out using the protocol described by Babin et al.
- P7, IMC47, UPA134 are Forastero genotypes originating from the Upper Amazonian region of Peru, known for their resistance to Phytophthora palmivora or P. megakarya. Inoculation was done according to Tahi et al. 
- UF 273 is a Trinitario genotype resistant to Moniliophthora roreri. Inoculation was done according to Khun et al. 
- 33–49 and BE240 are Nacional genotypes from Ecuador known for their aromatic and floral taste.
SSH libraries or direct libraries were constructed from these genotypes. More information related to these genotypes is available through the International Cocoa Germplasm Database.
Drought stress libraries were constructed from total RNA isolated from leaves and roots of Scavina 6 plants that were initially grown under standard conditions in a greenhouse. Rooted cuttings were generated and grown to about 6 months old, then moved into a Conviron growth chamber and not watered until leaves were visibly wilted (approx. 36 hours), at which time tissues were flash frozen in liquid nitrogen.
Plant tissues were frozen in liquid nitrogen or placed in RNA stabilization reagent (RNAlater™, Qiagen) and stored at -20°C before RNA extraction. Approximately 100 mg of plant tissue was crushed in liquid nitrogen with polyvinylpolypyrrolidone. The powder was transferred into a tube containing 1 ml of extraction buffer "TE3D" (14.8 g EDTA, 84.4 g Tris, 20 g Nonidet P-40, 30 g lithium dodecyl sulfate, 20 g sodium deoxycholate, 95 ml H2O). After 15 min incubation at room temperature, 1 ml of sodium acetate (3 M) and one volume of chloroform:isoamyl alcohol (24:1) were added. Following centrifugation, the aqueous phase was purified by adding one volume of mixed alkyltrimethylammonium bromide solution (2% MATAB, 3 M NaCl), followed by 15 min at 74°C. The residual polysaccharides were then eliminated by addition of one volume of chloroform:isoamyl alcohol (24:1) and centrifugation; the aqueous phase was precipitated by the addition of one volume of isopropyl alcohol. After centrifugation, the pellet was resuspended in 50 μl of ribonuclease-free water containing 1 μl of ribonuclease inhibitor (RiboLock™, Fermentas).
RNA samples from cacao tissues were isolated following the procedure of Charbit et al. with modifications. Following DNase treatment (DNase I, Fermentas), RNA was extracted with phenol:chloroform:isoamyl alcohol (25:24:1) and precipitated with one-tenth volume of 3 M sodium acetate, pH 5.3, and 2.5 volumes of 100% ethyl alcohol. An aliquot of RNA was then run by electrophoresis on a 1.2% agarose gel and stained with ethidium bromide to confirm RNA integrity.
Construction of full-length enriched cDNA library
First strand cDNA were synthesized using the Clontech BD SMART PCR cDNA Synthesis KIT (cat No 634902) as recommended by the supplier. 0.5–1 μg of total RNA was incubated at 72°C for 2 min with 1 μl 3' BD SMART CDS Primer II A (12 μM) and 1 μl BD SMART II A Oligonucleotide (12 μM) in a total volume of 5 μl. Then 2 μl 5× First-Strand Buffer, 1 μl DTT (20 mM); 1 μl dNTP Mix (10 mM of each dNTP), 1 μl BD PowerScript Reverse Transcriptase were added and the mix was incubated at 42°C for 1 hour in an air incubator. According to Glen K Fu (2003) , 3 μl Biotin-dATP (Invitrogen), 3 μl Biotin-dCTP (Invitrogen), 1 μl 5'-NVVVVV-3' primer 30 μM (50 ng), 2 μl 5× First-Strand Buffer, 1 μl BD PowerScript Reverse Transcriptase were added, and the mix was kept at 42°C for 30 min. For capture of the unfinished strand, the reaction was mixed with 600 μl of Streptavidine MagneSphere Paramagnetic Particles (Promega) and eluted as recommended by the supplier.
A 2 μl aliquot from the first strand synthesis was used for the cDNA Amplification by LD PCR (Clontech). Each reaction was performed with 80 μl deionized water, 10 μl 10× BD Advantage 2 PCR Buffer, 2 μl 50× dNTP Mix (10 mM of each dNTP), 4 μl 5' PCR Primer II A (12 μM), 2 μl 50× BD Advantage 2 Polymerase Mix in a 98 μl total volume. The PCR reaction consisted of 18 to 25 PCR cycles at 95°C for 15 sec, 65°C for 30 sec, 68°C for 6 min, following with a final extension at 70°c for 10 min.
After comparison of fragment sizes with those of model species (rice and Arabidopsis), fragment sizes of some cDNA libraries were improved using cDNA size fractionation. These libraries were submitted to an "agarase step" after 18 PCR cycles. Double-stranded cDNA was separated on a 1% low-melting agarose gel, and the DNA ladder lane was stained and photographed alongside a ruler. Two size fractions (< 1.2 kb and > 1.2 kb) were excised from the unstained cDNA lane using the DNA ladder lane as a guide. cDNAs were extracted from the gel slices with agarase (Fermentas) according to the supplier's instructions. After agarase digestion, the cDNA was precipitated with one volume of isopropanol. The pellets were dried and suspended in ribonuclease-free water. Four to five additional PCR cycles were performed in order to improve the efficiency of ligation in pGEM®-T Easy Vector.
For SSH cDNA libraries, the procedure was performed with the PCR-Select cDNA Subtraction Kit (Clontech) according to the manufacturer's recommendations with slight modifications. The cDNA generated from the SMART procedure was digested with 15 U of Rsa I (Fermentas), and the two aliquots of the tester cDNA were ligated to adaptors 1 and 2R, respectively, with 30 U of T4 DNA ligase (Fermentas). The PCR mixture enriched for differentially expressed sequences was cloned using pGEM®-T (Promega) as mentioned above.
One μl of the second-strand product was cloned in the pGEM®-T Easy Vector System (Promega) and transformed by electroporation into the DH10B T1-resistant strain of Escherichia coli (Invitrogen); transformation products were plated on LB-ampicillin agar plates and incubated overnight at 37°C. White colonies were picked using a Qpix 2 XT biorobot (Genetix) and stored in 384-well plates at -80°C.
All clones were end-sequenced using either forward or reverse M13 primers. The sequencing reactions were performed with Applied Biosystems BigDye V3.1 kits and were resolved on ABI3730xl DNA Analysers.
The software Phred was used for base calling, linked to Vecscreen for vector and adapter trimming. Sequences were cleaned with the standalone low-complexity filter mdust and BioPerl modules. The forward and reverse ESTs of each clone were assembled individually with the CAP3 program, using an overlap percent identity cutoff (p) of 65 and an overlap length cutoff (o) of 20.
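As an illustration of the per-clone assembly settings, the sketch below builds a CAP3 command line with the two cutoffs given in the text (`-p` for overlap percent identity, `-o` for overlap length). The file name and the helper names `build_cap3_command` and `assemble_clone` are hypothetical; the authors' actual pipeline scripts are not described in the text.

```python
import shutil
import subprocess


def build_cap3_command(fasta_path, percent_identity=65, overlap_length=20):
    """Return the CAP3 argument list with the cutoffs reported in the text."""
    return [
        "cap3", fasta_path,
        "-p", str(percent_identity),  # overlap percent identity cutoff
        "-o", str(overlap_length),    # overlap length cutoff
    ]


def assemble_clone(fasta_path):
    """Run CAP3 on one clone's reads if the program is installed; return True if it ran."""
    if shutil.which("cap3") is None:
        return False  # CAP3 not available on this machine
    subprocess.run(build_cap3_command(fasta_path), check=True,
                   stdout=subprocess.DEVNULL)
    return True


if __name__ == "__main__":
    # 'clone_0001.fasta' is a placeholder file holding one clone's forward and reverse reads.
    print(" ".join(build_cap3_command("clone_0001.fasta")))
```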
Special attention was paid to the global assembly of ESTs in order to obtain the most representative transcription units. The TGI Clustering tools (TGICL) were used because they provide an optimized protocol for the analysis of EST sequences. This package performs a clustering phase (using megablast) without multiple alignments, and then creates contigs (consensus sequences) with the assembly program CAP3. Many parameters were tested and, because some clusters contained ESTs from several highly expressed genes, we increased the clustering and assembly stringency. For the clustering step, we used a minimum percent identity for overlaps (p) of 94, a minimum overlap length (l) of 30 and a maximum length of unmatched overhangs (v) of 30. For the assembly, we used an overlap percent identity cutoff (p) of 93.
Similarity searches were performed with the standalone version 2.2.16 of BLAST against the non-redundant protein and nucleotide databases. The BLAST XML output was used, and results were parsed with the Bio::SearchIO module of the BioPerl toolkit.
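To make the clustering thresholds concrete, the sketch below applies the three criteria from the text (identity of at least 94%, overlap of at least 30 bases, overhang of at most 30 bases) to a list of pairwise overlaps and groups the ESTs by single linkage. It is a simplified illustration, not TGICL's implementation: the `Overlap` record, the function names and the example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    """One pairwise overlap between two ESTs (hypothetical record layout)."""
    query: str
    subject: str
    percent_identity: float
    length: int     # overlap length in bases
    overhang: int   # longest unmatched overhang in bases


def accept(ov, min_pid=94.0, min_len=30, max_overhang=30):
    """Apply the clustering thresholds p, l and v given in the text."""
    return (ov.percent_identity >= min_pid
            and ov.length >= min_len
            and ov.overhang <= max_overhang)


def cluster(est_ids, overlaps):
    """Single-linkage clustering with a union-find structure."""
    parent = {est: est for est in est_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for ov in overlaps:
        if accept(ov):
            parent[find(ov.query)] = find(ov.subject)

    groups = {}
    for est in est_ids:
        groups.setdefault(find(est), []).append(est)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    ests = ["est1", "est2", "est3", "est4"]
    pairs = [
        Overlap("est1", "est2", 97.0, 120, 5),   # passes all three thresholds
        Overlap("est2", "est3", 92.0, 200, 0),   # identity below 94
        Overlap("est3", "est4", 99.0, 25, 0),    # overlap shorter than 30
    ]
    print(cluster(ests, pairs))
```

In this toy input, only the first overlap satisfies all three thresholds, so est1 and est2 form one cluster while est3 and est4 remain singletons.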
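The parsing step can be illustrated as follows. The text reports using the Perl module Bio::SearchIO; the sketch below instead uses Python's standard-library XML parser on a tiny hand-written sample that follows the element names of the NCBI BLAST XML format. The sample records, the accession strings and the `max_evalue` threshold are made up for illustration and are not taken from the study.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the NCBI BLAST XML layout (values are invented).
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>unigene_0001</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical protein A</Hit_def>
          <Hit_accession>ACC0001</Hit_accession>
          <Hit_hsps>
            <Hsp><Hsp_bit-score>250.0</Hsp_bit-score><Hsp_evalue>1e-60</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical protein B</Hit_def>
          <Hit_accession>ACC0002</Hit_accession>
          <Hit_hsps>
            <Hsp><Hsp_bit-score>80.0</Hsp_bit-score><Hsp_evalue>1e-12</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>unigene_0002</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>weak match</Hit_def>
          <Hit_accession>ACC0003</Hit_accession>
          <Hit_hsps>
            <Hsp><Hsp_bit-score>30.0</Hsp_bit-score><Hsp_evalue>0.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def best_hits(xml_text, max_evalue=1e-5):
    """Map each query to (accession, description, e-value) of its best hit, or None."""
    root = ET.fromstring(xml_text.strip())
    results = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        best = None
        for hit in iteration.iter("Hit"):
            evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
            if not evalues:
                continue
            evalue = min(evalues)  # best HSP of this hit
            if evalue <= max_evalue and (best is None or evalue < best[2]):
                best = (hit.findtext("Hit_accession"), hit.findtext("Hit_def"), evalue)
        results[query] = best
    return results


if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE_XML).items():
        print(query, hit)
```

Queries with no hit under the threshold are kept in the result with a value of `None`, so that unigenes without significant similarity can be counted separately.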
We built a local Blast2GO MySQL database and we first used the Blast2GO program  with default parameters to assign Gene Ontology (GO) terms to the unigenes based on the BLAST definitions. To best exploit GO annotations, results were integrated into a local AmiGO browser and database.
The QualitySNP pipeline  was used for detecting single nucleotide polymorphisms in the unigenes.
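For readers unfamiliar with SNP mining in EST contigs, the sketch below shows the basic idea of scanning the columns of aligned reads for positions where at least two alleles are each supported by several reads. This is a deliberately naive illustration and not the QualitySNP algorithm, which applies additional filters; the function name, the `min_allele_reads` threshold and the example reads are assumptions.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_allele_reads=2):
    """Return (position, allele counts) for columns with two or more supported alleles.

    'aligned_reads' are equal-length strings from one contig alignment,
    with '-' marking gaps. Positions are 0-based.
    """
    snps = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_allele_reads}
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps


if __name__ == "__main__":
    reads = [
        "ACGTAC",
        "ACGTAC",
        "ACATAC",
        "ACATAC",
        "ACGTAT",  # the final T is seen in one read only and is ignored
    ]
    print(candidate_snps(reads))
```

Requiring each allele to appear in more than one read is a simple way to reduce the influence of isolated sequencing errors.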
Sequence data, molecular markers and high-quality annotation will be integrated into CocoaGen DB, a Web portal developed for combining T. cacao molecular genetic and genomic information from TropgeneDB and phenotypic data from The International Cocoa Germplasm Database. The individual ESTs of the 56 libraries were deposited in the EMBL database under accession numbers CU469588 to CU633156.
We thank USDA and MARS for their financial support in this project. We also gratefully acknowledge CNRG for having funded and carried out the sequencing work of the project. Finally we wish to thank Renaud Boulanger for critically reading the manuscript.
- Figueira A, Janik J, Goldsbrough P: Genome size and DNA polymorphism in Theobroma cacao. Journal of the American Society for Horticultural Science. 1992, 117: 673-677.
- Lanaud C, Hamon P, Duperray C: Estimation of nuclear DNA content of Theobroma cacao L. by flow cytometry. Café, Cacao, Thé. 1992, 36: 3-8.
- Cheesman EE: Notes on the nomenclature, classification and possible relationships of cocoa populations. Tropical Agriculture. 1944, 21: 144-159.
- Bowers JH, Bailey BA, Hebbar PK, Sanogo S, Lumsden RD: The impact of plant diseases on world chocolate production. Plant Health Progress. 2001.
- Ampuero E: Monilia pod rot of cocoa. Cocoa Grower's Bulletin. 1967, 9: 15-18.
- Enriquez GA, Brenes O, Delgado JC: Development and impact of Monilia pod rot of cacao in Costa Rica. Proceedings of the 8th International Cocoa Research Conference, Cartagena, Colombia, 18–23 Oct, 1981. 1982, 375-380.
- Guiltinan MJ, Verica JA, Zhang D, Figueira A: Genomics of Theobroma cacao, the chocolate tree. Genomics of Tropical Crop Plants. Edited by: PaM Moore R. 2007.
- Chanliau S, Cros E: Influence du traitement post-récolte et de la torréfaction sur le développement de l'arôme cacao. 12th Alliance's Inter Cocoa Conf, Salvador de Bahia (Brazil). 1996, 959-964.
- Clapperton JF, Yow STK, Chan J, Lim DHK: Effects of planting materials on flavour. Cocoa Growers' Bulletin. 1994, 48: 47-59.
- Pichersky E, Gang DR: Genetics and biochemistry of secondary metabolites in plants: An evolutionary perspective. Trends Plant Sci. 2000, 205: 439-445. 10.1016/S1360-1385(00)01741-6.
- Ziegleder G: Linalol contents as characteristics of some flavour grade cocoas. Z Lebensm Unters Forsch. 1990, 191: 306-309. 10.1007/BF01202432.
- Counet C, Ouwerx C, Rosoux D, Collin S: Relationship between Procyanidin and Flavor Contents of Cocoa Liquors from Different Origins. J Agric Food Chem. 2004, 52: 6243-6249. 10.1021/jf040105b.
- Miller KB, Stuart DA, Smith NL, Lee CY, McHale NL, Flanagan JA, Ou B, Hurst WJ: Antioxidant activity and polyphenol and procyanidin contents of selected commercially available cocoa-containing and chocolate products in the United States. J Agric Food Chem. 2006, 54: 4062-4068. 10.1021/jf060290o.
- Gu L, House SE, Wu X, Ou B, Prior RL: Procyanidin and catechin contents and antioxidant capacity of cocoa and chocolate products. J Agric Food Chem. 2006, 54: 4057-4061. 10.1021/jf060360r.
- Stark T, Bareuther S, Hofmann T: Sensory-guided decomposition of roasted cocoa nibs (Theobroma cacao) and structure determination of taste-active polyphenols. J Agric Food Chem. 2005, 53: 5407-5418. 10.1021/jf050457y.
- Ewing RM, Ben Kahla A, Poirot O, Lopez F, Audic S, Claverie JM: Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome research. 1999, 9 (10): 950-959. 10.1101/gr.9.10.950.
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science (New York, NY). 2006, 313 (5793): 1596-1604.
- Terol J, Conesa A, Colmenero JM, Cercos M, Tadeo F, Agusti J, Alos E, Andres F, Soler G, Brumos J: Analysis of 13000 unique Citrus clusters associated with fruit quality, production and salinity tolerance. BMC genomics. 2007, 8: 31. 10.1186/1471-2164-8-31.
- Udall JA, Swanson JM, Haller K, Rapp RA, Sparks ME, Hatfield J, Yu Y, Wu Y, Dowd C, Arpat AB: A global assembly of cotton ESTs. Genome research. 2006, 16 (3): 441-450. 10.1101/gr.4602906.
- da Silva FG, Iandolino A, Al-Kayal F, Bohlmann MC, Cushman MA, Lim H, Ergul A, Figueroa R, Kabuloglu EK, Osborne C: Characterizing the grape transcriptome. Analysis of expressed sequence tags from multiple Vitis species and development of a compendium of gene expression during berry development. Plant physiology. 2005, 139 (2): 574-597. 10.1104/pp.105.065748.
- Jones PG, Allaway D, Gilmour DM, Harris C, Rankin D, Retzel ER, Jones CA: Gene discovery and microarray analysis of cacao (Theobroma cacao L.) varieties. Planta. 2002, 216 (2): 255-264. 10.1007/s00425-002-0882-6.
- Leal GALJ, Albuquerque PSB, Figueira A: Genes differentially expressed in Theobroma cacao associated with resistance to witches' broom disease caused by Crinipellis perniciosa. Molecular Plant Pathology. 2007, 8 (3): 279-292. 10.1111/j.1364-3703.2007.00393.x.
- Verica JA, Maximova SN, Strem MD, Carlson JE, Bailey BA, Guiltinan MJ: Isolation of ESTs from cacao (Theobroma cacao L.) leaves treated with inducers of the defense response. Plant cell reports. 2004, 23 (6): 404-413. 10.1007/s00299-004-0852-5.
- Gesteira AS, Micheli F, Carels N, Da Silva AC, Gramacho KP, Schuster I, Macedo JN, Pereira GA, Cascardo JC: Comparative analysis of expressed genes from cacao meristems infected by Moniliophthora perniciosa. Annals of botany. 2007, 100 (1): 129-140. 10.1093/aob/mcm092.
- Kulikova T, Akhtar R, Aldebert P, Althorpe N, Andersson M, Baldwin A, Bates K, Bhattacharyya S, Bower L, Browne P: EMBL Nucleotide Sequence Database in 2006. Nucleic acids research. 2007, 35 (Database issue): D16-20. 10.1093/nar/gkl913.
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics (Oxford, England). 2003, 19 (5): 651-652. 10.1093/bioinformatics/btg034.
- Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J: The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic acids research. 2005, 33 (Database issue): D71-74.
- Zhu XY, Chase MW, Qiu YL, Kong HZ, Dilcher DL, Li JH, Chen ZD: Mitochondrial matR sequences help to resolve deep phylogenetic relationships in rosids. BMC evolutionary biology. 2007, 7: 217. 10.1186/1471-2148-7-217.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (Oxford, England). 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000, 25 (1): 25-29. 10.1038/75556.
- AmiGO browser. [http://amigo.geneontology.org/cgi-bin/amigo/go.cgi]
- Walters D, Newton A, Lyon G: Induced resistance for plant defence. 2007, Blackwell Publishing.
- DeYoung BJ, Innes RW: Plant NBS-LRR proteins in pathogen sensing and host defense. Nat Immunol. 2006, 7 (12): 1243-1249. 10.1038/ni1410.
- Mishra NS, Tuteja R, Tuteja N: Signaling through MAP kinase networks in plants. Arch Biochem Biophys. 2006, 452 (1): 55-68. 10.1016/j.abb.2006.05.001.
- Wróbel-Kwiatkowska M, Lorenc-Kukula K, Starzycki M, Oszmianski J, Kepczynska E, Szopa J: Expression of [beta]-1,3-glucanase in flax causes increased resistance to fungi. Physiological and Molecular Plant Pathology. 2004, 65 (5): 245-256. 10.1016/j.pmpp.2005.02.008.
- Wollgast J, Anklam E: Review on polyphenols in Theobroma cacao: changes in composition during the manufacture of chocolate and methodology for identification and quantification. Food Research International. 2000, 33: 423-447. 10.1016/S0963-9969(00)00068-5.
- Dreosti IE: Antioxidant Polyphenols in Tea, Cocoa, and Wine. Nutrition. 2000, 16 (7/8): 692-694. 10.1016/S0899-9007(00)00304-X.
- Othman A, Ismail A, Ghani NA, Adenan I: Antioxidant capacity and phenolic content of cocoa beans. Food Chemistry. 2007, 100: 1523-1530. 10.1016/j.foodchem.2005.12.021.
- Steinberg FM, Bearden MM, Keen CL: Cocoa and chocolate flavonoids: implications for cardiovascular health. Journal of the American Dietetic Association. 2003, 103 (2): 215-223. 10.1053/jada.2003.50028.
- Wollgast J, Anklam E: Polyphenols in chocolate: is there a contribution to human health?. Food Research International. 2000, 33: 449-459. 10.1016/S0963-9969(00)00069-7.
- Winkel-Shirley B: Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant physiology. 2001, 126 (2): 485-493. 10.1104/pp.126.2.485.
- Djocgoue PF, Boudjeko T, Mbouobda HD, Nankeu DJ, El Hadrami I, Omokolo ND: Heritability of phenols in the resistance of Theobroma cacao against Phytophthora megakarya, the causal agent of black pod disease. Journal of Phytopathology. 2007, 155: 519-525. 10.1111/j.1439-0434.2007.01268.x.
- Chanliau S, Cros E: Influence du traitement post-récolte et de la torréfaction sur le développement de l'arôme cacao. 12th International Cocoa Research Conference, Salvador de Bahia (Brazil). 1996, 959-964.
- Loor RG, Risterucci AM, Fouet O, Courtois B, Amores F, Suarez C, Rosenquist E, Vasco A, Madina M, Lanaud C: Genetic diversity analysis of the Nacional cacao type from Equator. 15th International Cocoa Research Conference, San José, Costa Rica. 2006.
- Cros E: Cocoa flavor development. Effects of post-harvest processing. Manufacturing Confectioner. 1999, 79: 70-77.
- MISA – MIcroSAtellite identification tool. [http://pgrc.ipk-gatersleben.de/misa/]
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods in molecular biology (Clifton, NJ). 2000, 132: 365-386.
- Buetow KH, Edmonson MN, Cassidy AB: Reliable identification of large numbers of candidate SNPs from public EST data. Nature genetics. 1999, 21 (3): 323-325. 10.1038/6851.
- Tang J, Vosman B, Voorrips RE, van der Linden CG, Leunissen JA: QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC bioinformatics. 2006, 7: 438. 10.1186/1471-2105-7-438.
- Batley J, Barker G, O'Sullivan H, Edwards KJ, Edwards D: Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant physiology. 2003, 132 (1): 84-91. 10.1104/pp.102.019422.
- Dantec LL, Chagne D, Pot D, Cantin O, Garnier-Gere P, Bedon F, Frigerio JM, Chaumeil P, Leger P, Garcia V: Automated SNP detection in expressed sequence tags: statistical considerations and application to maritime pine sequences. Plant molecular biology. 2004, 54 (3): 461-470. 10.1023/B:PLAN.0000036376.11710.6f.
- Silva SDVM, Mandarino EP, Damaceno VO, Santos Filho LP: Reação de genótipos de cacaueiros a isolados de Ceratocystis cacaofunesta. Fitopatologia Brasileira. 2007, 32: 504-506.
- Mooleedhar V: A Study of the Morphological Variation in a Relic Criollo Cacao Population from Belize. Annual report CRU/The University of West Indies. 1997, 5-14.
- Tahi M, Kebe I, Eskes AB, Ouattara S, Sangaré A, Mondeil F: Rapid screening of cacao genotypes for field resistance to Phytophthora palmivora using leaves, twigs and roots. Eur J Plant Pathol. 2000, 106: 87-94. 10.1023/A:1008747800191.
- Babin R, Sounigo O, Dibog L, Nyassé S: Field tests for antixenosis and tolerance of cocoa towards mirids. Ingenic Newsletter. 2004, 9: 45-50.
- Kuhn DN, MacArthur HC, Nakamura K, Borrone JW, Schnell RJ, Brown JS, Johnson ES, Phillips-Mora W: Development of molecular genetic markers from a cDNA subtraction library of frosty pod inoculated cacao. 15th International Cocoa Research Conference, San Jose, Costa Rica. 2006, 179-184.
- The International Cocoa Germplasm Database. [http://www.icgd.rdg.ac.uk/]
- Maximova S, Miller C, Antunez de Mayolo G, Pishak S, Young A, Guiltinan MJ: Stable transformation of Theobroma cacao L. and influence of matrix attachment regions on GFP expression. Plant cell reports. 2003, 21 (9): 872-883.
- Charbit E, Legavre T, Lardet L, Bourgeois E, Ferriere N, Carron M: Identification of differentially expressed cDNA sequences and histological characteristics of Hevea brasiliensis calli in relation to their embryogenic and regenerative capacities. Plant cell reports. 2004, 8: 539-548. 10.1007/s00299-003-0737-z.
- Fu GK, Stuve LL: Improved method for the construction of full-length enriched cDNA libraries. Biotechniques. 2003, 34: 954-957.
- Wellenreuther R, Schupp I, The German cDNA Consortium, Poustka A, Wiemann S: SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones. BMC genomics. 2004, 5: 36-44. 10.1186/1471-2164-5-36.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. Journal of molecular biology. 1990, 215 (3): 403-410.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome research. 1998, 8 (3): 175-185.
- Vecscreen. [http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html]
- Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, Quackenbush J: An optimized protocol for analysis of EST sequences. Nucleic acids research. 2000, 28 (18): 3657-3665. 10.1093/nar/28.18.3657.
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H: The Bioperl toolkit: Perl modules for the life sciences. Genome research. 2002, 12 (10): 1611-1618. 10.1101/gr.361602.
- CocoaGen DB. [http://cocoagendb.cirad.fr]
- Ruiz M, Rouard M, Raboin LM, Lartaud M, Lagoda P, Courtois B: TropGENE-DB, a multi-tropical crop information system. Nucleic acids research. 2004, 32 (Database issue): D364-367. 10.1093/nar/gkh105.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.