Generation and analysis of ESTs from strawberry (Fragaria xananassa) fruits and evaluation of their utility in genetic and molecular studies

Background: Cultivated strawberry is a hybrid octoploid species (Fragaria xananassa Duchesne ex. Rozier) whose fruit is highly appreciated due to its organoleptic properties and health benefits. Despite recent studies on the control of its growth and ripening processes, information about the role played by different hormones in these processes remains elusive. Further advancement of this knowledge is hampered by the limited sequence information on genes from this species, despite the abundant information available on genes from the wild diploid relative Fragaria vesca. However, the diploid species, or one ancestor, only partially contributes to the genome of the cultivated octoploid. We have produced a collection of expressed sequence tags (ESTs) from different cDNA libraries prepared from different fruit parts and developmental stages. The collection has been analysed and the sequence information used to explore the involvement of different hormones in fruit developmental processes, and to compare transcripts in the receptacle of ripe fruits of the diploid and octoploid species. The study is particularly important since the commercial fruit is in fact an enlarged flower receptacle with the true fruits, the achenes, on the surface and connected through a network of vascular vessels to the central pith.
Results: We have sequenced over 4,500 ESTs from Fragaria xananassa, thus doubling the number of ESTs available in GenBank for this species. We then assembled this information together with that already available from F. xananassa, resulting in a total of 7,096 unigenes. The identification of SSRs and SNPs in many of the ESTs allowed their conversion into functional molecular markers. The availability of libraries prepared from green growing fruits has allowed the cloning of cDNAs for genes of the auxin, ethylene and brassinosteroid signalling processes, followed by expression studies in selected fruit parts and developmental stages. In addition, the sequence information generated in the project, jointly with previous information on sequences from both F. xananassa and F. vesca, has allowed the design of an oligo-based microarray that has been used to compare the transcriptome of the ripe receptacle of the diploid and octoploid species. Comparison of the transcriptomes, grouping the genes by biological processes, points to differences being quantitative rather than qualitative.
Conclusions: The present study generates essential knowledge and molecular tools that will be useful in improving investigations at the molecular level in cultivated strawberry (F. xananassa). This knowledge is likely to provide useful resources in the ongoing breeding programs. The sequence information has already allowed the development of molecular markers that have been applied to germplasm characterization and could eventually be used in QTL analysis. Massive transcription analysis can be useful for targeting specific genes for further study, based on their involvement in the different plant developmental processes.


Background
Strawberry (Fragaria xananassa Duchesne ex. Rozier) is one of the most important berry crops in the world; in 2008 its production was approximately 4 million metric tons [1]. The benefits of strawberry fruit consumption for cardiovascular, neurodegenerative, and other conditions such as aging, obesity, and cancer have been a subject of increased study over recent years [2]. The strawberry belongs to the family Rosaceae in the genus Fragaria. There are four basic fertility groups in Fragaria that are associated primarily with their ploidy level or chromosome number. The most common native species, F. vesca L., has 14 chromosomes; it is considered to be a diploid and has been proposed as a model for the genus [3]. The most important cultivated strawberry is a perennial and herbaceous octoploid plant, with 56 chromosomes (2n = 8× = 56), that stems from the cross of the octoploids F. virginiana Duchesne from eastern North America, which was noted for its fine flavour, and F. chiloensis (L.) Mill. from Chile, noted for its large size [3]. Numerous varieties of strawberries have been developed in the temperate zones of the world by different breeding programs.
Strawberry has been considered a non-climacteric fruit, since there is no concomitant burst of respiration and production of the hormone ethylene that triggers the ripening process [4,5]. The berry results from the development of the flower receptacle that consists of a pith at the centre, a fleshy cortex, an epidermis, and a ring of vascular bundles with branches leading to the achenes, the true fruits. Each achene contains a single seed and a hard pericarp. The achenes are attached to the receptacle by vascular strands. When classifying the strawberry as non-climacteric, no distinction was made between the receptacle and the achenes. Growth and ripening of strawberry fruits is an important field of research, which includes the role played by hormones, the synthesis of anthocyanins and flavour compounds, and the cell wall changes occurring during the late stages of ripening. It is reasonable to think that those changes that are important for fruit quality, like anthocyanins and flavour content, as well as fruit softening, mostly rest on the receptacle, whereas hormone control of the process might be supported by the achenes. Therefore, the generation of tools to distinguish the functional roles of these two parts in the growth and ripening of the whole berry is important.
The hormone auxin, which is supplied by the achenes, is considered a key regulator of growth and ripening. Removal of the achenes from the receptacle has different effects depending on the developmental stage. In the early green stage it stops receptacle growth, whereas in the late green and white stages it accelerates ripening [6]. Interestingly, both effects are suppressed by the exogenous application of auxin, which restores normal development [7], [8]. Therefore, the role of ethylene in fruit ripening has been considered negligible. Recently, however, it has been reported that the achenes of red fruits produce ethylene at low concentrations, although its role in fruit ripening is unclear [5].
Genes related to biochemical processes and metabolites with important roles in modulating fruit quality, such as the health-promoting metabolites anthocyanin [9] and vitamin C [10], have been studied. The aroma, an important criterion defining strawberry quality, is dependent on more than 360 volatile compounds, many of them esters, whose synthesis is dependent on the strawberry alcohol acyltransferase (SAAT) activity encoded by the FaSAAT gene [11]. Of all the volatiles, furaneol (HDMF) is the main one responsible for the aroma of the strawberry fruit [12]. The genes of two enzymes related to the biosynthesis of HDMF have been cloned [13], [14]. Due to the importance of the cell wall in the integrity of the strawberry fruit, genes encoding cell wall modifying enzymes have been analysed, including expansins [15], cellulases [16], beta-galactosidase [17], pectate lyases [18], [19], and pectinmethylesterases [20], [21].
Despite all the previous molecular studies, including a recent report on metabolic changes during fruit growth and ripening [22], information on regulatory genes involved in the strawberry fruit development is still scarce. The development of genomic tools will, no doubt, constitute important input that will facilitate strawberry research. In recent years molecular markers for this species have been developed [23], [24], and microarray gene expression experiments during fruit ripening [25], [26], and in relation to fruit firmness have been reported [27].
One of the most useful tools in the gene discovery, and further assignment of function, is the availability of expressed sequence tags (ESTs). These sequences stem from cDNA libraries constructed from different tissues and organs, under different environmental conditions and stages of development, so they represent a broad set of expressed genes. ESTs collections have been used in gene expression studies [28] and to saturate genetic maps with simple sequence repeats (EST-SSRs) [29] or single nucleotide polymorphisms (SNPs) [30]. They also allow the identification of miRNA precursors and targets [31], and massive transcriptome analysis using microarrays [32], [33]. At present there are more than 50 million ESTs in the GenBank database, a quarter of which are from plants. Although fruit crops have been less studied than other plants like Arabidopsis, rice, soybean, maize or pine, there is a significant number of ESTs obtained from fruits like tomato [34], grape [35], apple [36], citrus [37] and melon [38].
In this report we have analysed around 10,000 ESTs from F. xananassa, 4,600 of which originated from our own sequencing project and 5,400 from the GenBank database. These ESTs have been processed, clustered, annotated and classified into different functional categories. We have searched for SSRs and SNPs in the EST set in order to evaluate their potential in marker-assisted breeding programs. Creation of a gene index [39] and comparisons with other species showed that the highest average sequence identity was with the wild diploid relative F. vesca, reaching 93.27% between sequences of orthologous genes. Expression studies of selected ESTs using QRT-PCR allowed us to investigate the possible involvement of hormones like auxin, ethylene, and brassinosteroid in strawberry fruit ripening. In addition, the set of non-redundant sequences from F. xananassa, jointly with an equivalent number of sequences from F. vesca, has been used to design and perform microarray-based expression studies in the ripe receptacle of these two species.

EST Sequencing and Clustering
More than 4,500 clones were sequenced from six cDNA libraries prepared from fruits of several varieties of the cultivated strawberry (F. xananassa) (Table 1). Because we are interested in fruit ripening, transcripts were extracted from two ripening stages, green and red, and two different fruit parts: achenes and receptacle. In addition, transcripts from red fruits were favoured over transcripts from green fruits using two different subtraction procedures (see Methods). Sequences were also obtained from transcripts corresponding to genes differentially expressed in ethylene-treated ripe fruits. A web-accessible database containing all the EST sequences, contigs, and bioinformatic tools for their analysis and data mining has been created and named FREST (http://fresa.uco.uma.es/srs71). The set was completed with the dbEST GenBank sequences of F. xananassa. In total, 10,018 sequences were analyzed in the present study (Table 2).
The raw sequences were processed to remove vector and adaptor sequences and to discard sequences either containing more than 3% of N or being less than 100 bp in length. The mean length ranged from 343 to 612 bp. Accuracy was evaluated by the frequency of appearance of an undetermined nucleotide, which ranged from an average of once every 51 bp to once every 548 bp (Table 2). All ESTs (9,790) that passed the quality control were used for clustering. A total of 5,976 singletons and 1,120 contigs/tentative consensuses were obtained, resulting in 7,096 unigenes/non-redundant sequences (Table 2). Some genes were represented by multiple ESTs, as shown in Table 3, which includes the contigs with more than 15 ESTs. The contigs corresponding to metallothionein-like and prunin, with more than 100 ESTs each, are notably overrepresented in the M1 (green receptacle) and M2 (green achenes) libraries, respectively (Table 3). This is related to the high expression level of these genes in these fruit parts, i.e. the receptacle and achenes, at this developmental stage.
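The quality control described above can be illustrated with a minimal Python sketch. It assumes that reads have already been trimmed of vector and adaptor sequence and are held as plain strings; the function names `passes_quality` and `bases_per_n` are hypothetical and are not part of the pipeline used in this study.

```python
def passes_quality(seq, max_n_fraction=0.03, min_length=100):
    """Return True if a trimmed read would be kept for clustering.

    Thresholds follow the text: reads shorter than 100 bp or with
    more than 3% undetermined bases (N) are discarded.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    return seq.count("N") / len(seq) <= max_n_fraction


def bases_per_n(seqs):
    """Average number of bases per undetermined nucleotide.

    Returns None when the set contains no N at all. This is only an
    illustrative reading of the accuracy measure quoted in the text.
    """
    total = sum(len(s) for s in seqs)
    n_count = sum(s.upper().count("N") for s in seqs)
    return total / n_count if n_count else None
```

A library in which an N appears once every 51 bp would give `bases_per_n` close to 51, whereas the cleanest libraries reported here would give values near 548.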

Functional annotation
A summary of the different parameters studied in the annotation of the complete set of unigenes from F. xananassa is shown in Table 4. The number of chimeras was very low based on the BlastX sequence searches against the Arabidopsis TAIR Database. Annotation included not only sequence homology comparisons in GenBank at two e-value cutoffs (47.7% of sequences at e-value < 1e-10, and 1.5% at e-value < 1e-100), but also searches for domains using InterProScan (9.2% of sequences), for signal peptides using the SignalP tool (17.5% of sequences), association to gene ontology (GO) terms (29.9% of sequences), and Enzyme Commission (EC) numbers (7.2% of sequences). In total, 56.1% of the unigenes were annotated in at least one of the categories of Table 4. This means that 43.9% of the unigenes still have no putative function assigned.
Further, a global analysis by gene ontology (GO) groups was performed with the Blast2GO software [40]. Blast2GO uses different tools, such as BlastX and InterProScan, to annotate sequences. Figure 1 shows the result of this analysis. Metabolic processes account for almost 60% of the annotated sequences, including primary, macromolecule, and cellular metabolic processes. The dominance of biosynthetic processes (12.07%) over catabolic processes (4.41%) is remarkable. Proteins involved in transport represent 5.61%, and the other groups of proteins encoded by ESTs correspond to a wide variety of biological processes.

Sequence analyses allowed the identification of genes involved in metabolic and regulatory processes of fruit ripening
We were interested in the fruit ripening process; therefore the RNA used in the preparation of the cDNA libraries of the present publication was extracted from two parts of the berry at two developmental stages, and was also ripening-enriched by subtraction (Table 1). There has been a previous sequencing project in strawberry focused on fruit ripening [25], and we now complete this previous information. We have performed manual assignment of F. xananassa unigenes to specific metabolic and signalling pathways (Additional files 1, 2, 3 and 4), providing an exhaustive catalogue of F. xananassa sequences to be further used in specific research projects. We summarize in Table 5 the contribution of the new sequences to the information previously available on strawberry genes relevant for the fruit ripening process. In 8 of the 21 metabolic pathways the new genes account for more than 50 percent of the total number of genes known. More interestingly, in hormone signalling the information on new genes is very significant, being over 50 percent in 5 of the 6 pathways. In some cases, like brassinosteroid, gibberellins and abscisic acid, there was no previous information on gene sequences of the corresponding signalling pathways.
(Table note: Clustering and determination of the consensus sequences were performed through the TGICL pipeline as described in the Methods section.)

Overall comparison of F. xananassa sequences with other species. Gene Index
We determined the F. xananassa gene index and related it to different plant species using the DFCI Gene Index Database (for species like Arabidopsis thaliana, Oryza sativa or Vitis vinifera), and "ad hoc" gene indices created from the GenBank dbEST (for species like Prunus persica, Prunus armeniaca, Citrus spp. or Fragaria vesca). The homology search was performed using the BlastN tool against non-redundant sequences, and hits with E values ≤ 1e-20 were considered true orthologues. Results of this analysis are shown in Table 6. The orthologous groups were analysed from three different perspectives: the percentage of orthologous unigenes of each species relative to F. xananassa unigenes, the percentage of F. xananassa orthologous unigenes relative to each species' unigenes, and the average identity, after the alignments, of unigenes from each species with the corresponding F. xananassa orthologous unigene. Values for the first two comparisons ( [...] putative orthologues to F. vesca unigenes (Table 6, column 3). Thus, although the number of ESTs available for F. vesca is more than four-fold the number of ESTs for F. xananassa, there is a high number of sequences of cultivated strawberry that have not been revealed in the F. vesca sequencing projects. The average identity was calculated after the alignments of these putative orthologous sequences from different species with the F. xananassa sequences (Table 6, column 4). As expected, the highest value was for F. vesca, reaching 93.27 percent, as the genome of this species probably shares a common ancestor with F. xananassa [42]. The order of the species in this column reflects the taxonomic proximity with close relatives, with Rosa hybrida, Prunus and Malus having the highest values. However, this is not an analysis of phylogeny, but the result of the multiple alignments of sequences available in the databases for the different species. Therefore, it is not possible to gain taxonomic information from the results presented here on species outside the Rosaceae family (Table 6, column 4).
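The three perspectives used for Table 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: BlastN hits are supplied as (query unigene, subject unigene, e-value, percent identity) tuples, only the lowest e-value hit per query is retained (the text does not say how multiple hits were handled), and which set serves as the denominator in each percentage is our reading of the description. The function name `gene_index_summary` is hypothetical.

```python
from statistics import mean


def gene_index_summary(hits, n_query_unigenes, n_subject_unigenes,
                       max_evalue=1e-20):
    """Summarise putative orthologues between two unigene sets.

    hits: iterable of (query_id, subject_id, evalue, percent_identity).
    Assumption: one best hit (lowest e-value) is kept per query.
    """
    best = {}
    for query, subject, evalue, identity in hits:
        if evalue > max_evalue:
            continue  # not accepted as a putative orthologue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, identity)

    if not best:
        return {"pct_query": 0.0, "pct_subject": 0.0, "mean_identity": None}

    matched_subjects = {subject for subject, _, _ in best.values()}
    return {
        # share of query unigenes with a putative orthologue
        "pct_query": 100 * len(best) / n_query_unigenes,
        # share of subject unigenes hit by at least one query
        "pct_subject": 100 * len(matched_subjects) / n_subject_unigenes,
        # average identity over the retained alignments
        "mean_identity": mean(i for _, _, i in best.values()),
    }
```

Because the two percentages use different denominators, they can differ widely when the two EST collections are of very different size, as is the case for F. vesca and F. xananassa.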

Evaluation of actual polymorphisms within the EST collection
Microsatellites, or simple sequence repeats (SSRs), are stretches of DNA consisting of tandemly repeated short units of 1-6 base pairs in length. The uniqueness and the value of microsatellites as molecular markers arise from their multiallelic nature, codominant inheritance, relative abundance, extensive genome coverage and simple detection by PCR. A total of 383 SSRs were identified in 329 (4.64%) of the 7,096 unigenes. Fifty sequences contained more than one SSR, and 47 of them had less than 100 bp between two consecutive SSRs. The frequency of SSRs was one every 9.1 kb of sequence. As shown in Table 7, dinucleotides are the most frequent motifs (47.3%), followed by trinucleotides (45.9%). Other nucleotide combinations are poorly represented (3.9% tetranucleotides, 2.9% pentanucleotides). Most of the SSRs found were in the 5' non-coding regions upstream of putative ORFs, close to the initial ATG. A total of 102 SSRs have been amplified and 10 have already been used for studies of F. xananassa varieties and Fragaria species [24].
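As an illustration of how perfect SSRs of this kind can be located in a unigene, the following Python sketch uses regular expressions. The minimum repeat numbers per motif length are assumptions chosen for the example (the thresholds applied in this study are not given in this section), and the function name `find_ssrs` is hypothetical.

```python
import re


def _is_repeat_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AGAG)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k)
               for k in range(1, n))


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length to the minimum number of tandem
    copies; the defaults below are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 4, 4: 3, 5: 3}
    seq = seq.upper()
    found = []
    for unit_len, reps in sorted(min_repeats.items()):
        # one copy of the motif followed by at least reps - 1 more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_repeat_of_shorter_unit(motif):
                continue  # already counted under the shorter motif
            found.append((match.start(), motif,
                          len(match.group(0)) // unit_len))
    return sorted(found)
```

Dividing the total length of the unigene set by the number of SSRs returned gives a density comparable to the one every 9.1 kb reported above, although the value depends strongly on the chosen thresholds.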
Of the 1,120 contigs generated in the present study, 242 contained a minimum of two alleles, 128 of them with potential SNPs. In these contigs the changes corresponded to 636 potential SNPs and 148 indels. The final number of good-quality true SNPs was 372, of which 192 were transitions, 124 were transversions, 2 were tri-allelic polymorphisms, and 54 were indels. The frequency of SNPs was one every 256 bp, with a mean value of 2.9 SNPs per contig.
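The classes used above can be made explicit with a short Python sketch. It assumes that each polymorphic site is given as the set of alleles observed in the contig alignment, with "-" denoting a gap; the function name `classify_variant` is hypothetical and the SNP calling itself is not reproduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(alleles):
    """Classify a polymorphic site from its observed alleles.

    alleles: iterable of single characters from A, C, G, T or '-'.
    """
    observed = {a.upper() for a in alleles}
    if "-" in observed:
        return "indel"
    if len(observed) == 3:
        return "tri-allelic"
    if len(observed) != 2:
        raise ValueError("expected two or three distinct alleles")
    # purine<->purine or pyrimidine<->pyrimidine changes are transitions
    if observed <= PURINES or observed <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Consistency check on the counts reported in the text.
reported = {"transition": 192, "transversion": 124,
            "tri-allelic": 2, "indel": 54}
total_reported = sum(reported.values())  # 372, the number of true SNPs
```

The four classes add up to the 372 polymorphisms reported, and 372 polymorphisms over 128 contigs gives the mean of about 2.9 per contig.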

Expression analysis of selected genes during fruit ripening
A detailed catalogue of strawberry sequences of genes related to hormone biosynthesis and signalling pathways is shown in Table 8. The expression of some of the hormone-related genes whose sequence information is provided in the present paper (Table 9) was further studied in fruits. For auxin we selected genes encoding ARF (auxin response factor) proteins, which are transcription factors controlling the expression of auxin-induced genes [43]. Regarding ethylene, we studied genes encoding ethylene response transcription factors (ERF) that belong to the large AP2/ERF family regulating ethylene-responsive genes [44]. Patterns of the expression of these genes in achenes and receptacle at three developmental stages are shown in Figure 2(A, B). Expression values obtained by qRT-PCR are relative for each gene; therefore it is not possible to obtain information on the absolute expression levels of different genes. However, it is apparent that each gene presents a tissue- and developmental-specific expression pattern with significant differences among samples (Figure 2). Thus, transcriptional activity of FaARF1 is highest in red receptacle, whereas for FaARF2 the highest level of transcripts is detected in white receptacle. For the FaERF genes, large and significant changes occur for FaERF1 and FaERF3, the former having its highest value in green achenes and the latter in green receptacle. A more conspicuous case is that of brassinosteroids, whose involvement in fruit developmental processes has been studied in only a few species [45], [46]. We have identified ESTs homologous to the receptor and two components of the signalling pathway (FaBRI1, FaBZR1, FaBIN2) (Table 9). Their expression also varies with the fruit part, achene or receptacle, and the developmental stage (Figure 2C). The largest changes occur for the receptor FaBRI1, whose expression is higher in receptacle and clearly increases with ripening.

(Table caption: The number of di-, tri-, tetra- and pentanucleotide repeats identified in the strawberry database is shown for the complete set of putative SSRs (pSSRs).)

[...] [3]. These differences have their origin in the receptacle tissue and become more apparent at the ripe stage.
Therefore, analysis of transcripts was performed in the receptacle of ripe fruits from cultivated strawberry F. xananassa (cv. Camarosa) and F. vesca. Expression values are provided in Additional File 5. Prior to the analysis of the results, redundancy between the two sets of sequences was determined. A BlastN comparison between both datasets, with an e-value < 1e-100 and a similarity percentage > 90%, was used as the discriminatory criterion. Global analysis was restricted to genes with very different expression levels in the two species. Thus, Figure 3 shows the results for the genes that were over 4-fold up-regulated (Figure 3A, 892 genes) and down-regulated (Figure 3B, 269 genes) in F. xananassa relative to F. vesca, analyzed by GO terms, when differences were statistically significant (p value ≤ 0.1). In general, the distribution of genes among categories was similar for up- and down-regulated genes, and also similar to the distribution in the F. xananassa EST collection analyzed here (Figure 1). However, there are two categories where differences, albeit minor, appear meaningful. The category "response to stress" was more abundant among the genes up-regulated in F. vesca (13.1%) in comparison to those up-regulated in F. xananassa (4.46%). Most of these genes correspond to heat shock proteins (Table 10), which have been reported to play a role not only in thermotolerance but also in plant development [47], [48]. A second difference was observed in the category of "regulation of cellular processes", which was more highly represented among the genes up-regulated in F. xananassa. Detailed analysis of the genes reveals that most of them encode proteins involved in signalling processes, some of them related to hormone action, especially auxin (Table 11).
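The selection criteria described above (over 4-fold difference and p value ≤ 0.1) can be sketched in Python as follows. The sketch assumes linear-scale, strictly positive expression values and a p value already computed for each gene; the statistical test and normalisation used for the microarray are not reproduced, and the function name `split_by_fold_change` is hypothetical.

```python
def split_by_fold_change(genes, min_fold=4.0, max_p=0.1):
    """Split genes into up- and down-regulated lists.

    genes: iterable of (name, expr_ananassa, expr_vesca, p_value),
    with expression on a linear scale (assumption). A gene is kept
    only if p_value <= max_p and the ratio exceeds min_fold in
    either direction.
    """
    up, down = [], []
    for name, expr_ananassa, expr_vesca, p_value in genes:
        if p_value > max_p:
            continue  # difference not considered significant
        ratio = expr_ananassa / expr_vesca
        if ratio > min_fold:
            up.append(name)      # higher in F. xananassa
        elif ratio < 1 / min_fold:
            down.append(name)    # higher in F. vesca
    return up, down
```

Applied to the full dataset, the two lists would correspond to the gene sets summarised in Figure 3A and Figure 3B, respectively, which are then grouped by GO term.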

Discussion
Sequencing information has produced important data that is being used to investigate both basic and applied aspects of plant growth and development. It is the first step towards functional genomics, and a basic tool for molecular breeding. However, this information has been mainly generated either in model species or in species with great impact on the global food supply. Fruits of cultivated strawberry (F. xananassa) are appreciated both as fresh and as processed foods. However, only limited genetic and genomic resources have been developed in this species due to its growing characteristics and the inherent difficulty of working with an octoploid. Despite this, genetic and genomic information is slowly appearing, and recently the first genetic map has been reported [42]. In this work we analyzed more than 10,000 ESTs from F. xananassa, assembled into more than 7,000 unigenes. Half of these sequences came from our own sequencing project; a second set of sequences was obtained from the GenBank dbEST Database. Regarding the new sequences reported here, it is worth emphasizing that they come from different fruit parts (achenes and receptacle), at different developmental stages (green and red fruit), and after hormone treatment (ethylene).

Figure 2 Relative expression of ARF, ERF, and brassinosteroid signalling pathway genes from strawberry in achenes and receptacle of fruits at three developmental stages, evaluated by QRT-PCR. RNA was extracted separately from achenes and receptacle of strawberry fruits at three developmental stages corresponding to green, white and red receptacle, as previously described [79]. Real time quantitative PCR was performed as described in the Methods section. The values are the results of two biological and three technical repetitions ± standard error.

In addition to the genetic characteristics, difficulties in analyzing strawberry fruit growth and ripening arise from the fact that the commercial fruit is not a true fruit but includes an enlarged flower receptacle with the true fruits, the achenes, attached on its surface. Moreover, the development of these two parts of the commercial fruit is not synchronous, in that the achenes reach their mature stage much earlier than the receptacle [49]. Thus, the sequence information provided in this report, specific for achene and receptacle libraries, is highly valuable. This is highlighted by the high number of ESTs encoding prunins in the achene library, which are absent in the receptacle library. Similarly, a large number of ESTs encoding metallothioneins were identified in the receptacle library, with a low number of ESTs in the achene library. Prunins are known as the globulins of the genus Prunus, which comprise the main family of storage proteins synthesized in seeds during embryogenesis [50]. Metallothioneins belong to a family of cysteine-rich, low molecular weight proteins that have the capacity to bind metals through the thiol group of the cysteine residues, which represent nearly 30% of their amino acid residues. These proteins have been shown to be involved in metal scavenging and detoxification [51], as well as in biotic and abiotic plant responses [52], [53]. Their high abundance in green receptacle suggests that they play an important role in this organ.

The gene index analysis of the sequences reflected the genetic proximity of strawberry to other species of Rosaceae. Indeed, Fragaria sp. belongs to the Rosaceae family, which includes apple, peach and apricot, and to the Rosoideae supertribe [54], which includes rose. The highest identity in the alignment was with F. vesca, from the same genus, followed by Rosa hybrida from the same supertribe, and Prunus and Malus from the same family.
Previous studies on genomic resources of Fragaria and Rosa have also shown a high level of genetic proximity [54], [55]. There are more than 50,000 ESTs available from the diploid F. vesca [55], which has been proposed as a model plant for genomic studies. Recent studies have predicted a genome size of approximately 200 Mb [56], which might facilitate its complete sequencing. However, cultivated strawberry is an octoploid species with at least two genomes involved in its origin; one is thought to be an ancestor of F. vesca or F. manchurica, and the other an ancestor of F. iinumae, or potentially other species [42].

Overall comparison between F. vesca and F. xananassa has revealed that only 32.42% of the unigenes of the diploid species had a corresponding putative orthologous gene in the octoploid. A possible explanation for this low value would be that the F. vesca derived subgenome is silenced in F. xananassa, as has been previously described for specific genes [57], or even that the donor subgenome could be an ancestor of F. vesca. However, these hypotheses need further study, since the low value could also be just a consequence of the different origin of the EST sequences used in this comparison, mostly from plantlets in F. vesca and from fruits in F. xananassa. In any case, cultivated strawberry still represents a great potential source of alleles that might be important for selected traits, since in other species it has been shown that polyploidization is accompanied by changes in gene expression, and accordingly in phenotypic variation [58].
In addition, the strawberry fruit produces some metabolites that are not found in other fruit models, such as tomato. These aspects make the ESTs information provided here valuable since it might eventually be used to probe for specific genes in other species, some of them closely related as some berries of the Rubus genus, like raspberry and blackberry that are classified in the same supertribe of Rosoideae as Fragaria [54].
SSRs derived from ESTs have been used as functional markers in the generation of maps and in breeding programs. In strawberry, we have previously used some of these markers to study genetic diversity within the species [59]. Based on the high level of identity found with corresponding genes of genetically close species, like those of the Rosaceae family, we foresee their transferability to these species, as other authors have shown [60], [61]. For this purpose, it is important to indicate that strawberry comparative map reveals a high level of co-linearity between diploid and octoploid Fragaria species [42]. For other species of the Rosaceae family this transferability deserves to be evaluated.
The function played by hormones in the development of strawberry fruits is still an unresolved question. Considered as a non-climacteric fruit, the main role has been attributed to the auxin synthesized in the achenes [62]. A search for genes involved in hormones response was performed. Auxin response factors (ARF) are transcription factors acting on the signalling pathway of this hormone [63]. We have unequivocally identified two of them in the strawberry ESTs Database, FaARF1 and FaARF3. For the FaARF1, the highest homology corresponds to a gene expressed in tomato [64], and to the Arabidopsis ARF1 gene [65]. FaARF3 has high homology to both ARF3 genes from tomato and Arabidopsis. The strawberry gene FaARF3 is mostly expressed in the receptacle at the white stage. At this stage the content of auxin is decreasing but still high in comparison to red fruits [8], [62], and cell expansion determines the final size of the receptacle.
The ethylene response factors (ERF) constitute a family of transcription factors that were identified by their capacity to bind ethylene-responsive elements (ERE) present as cis-sequences in ethylene-inducible genes. Further studies revealed that they act as transcriptional activators or repressors of GCC box-mediated gene expression [44]. In tomato fruits it has been reported that some of them participate in the signalling pathway initiated by ethylene during the ripening of the fruits [32]. In the EST collection we have identified three putative ERFs (FaERF1, FaERF2, FaERF3) originating from the library prepared from the achenes, which is consistent with the finding that achenes produced four- to ten-fold more ethylene than fruit epidermal peels [5]. Both FaERF1 and FaERF3 have their highest expression at the green stage and show high homology with SlERF2 [66] and MdERF1 [67], respectively, which are involved in tomato and apple fruit ripening. The corresponding Arabidopsis genes for FaERF1 and FaERF3 belong to the subfamily B-2 (Group VII) [68]. In contrast, the Arabidopsis gene homologous to FaERF2, which shows minor variation, is classified in the subfamily B-3 (Group IX) [68]. The genes in group IX have often been linked to defensive gene expression in response to pathogen infection.
In strawberry there is no information on the content of active brassinosteroid in the ripening fruit. The preferential expression of FaBRI1 in red receptacle suggests an increased concentration of this hormone in this tissue at later stages of ripening. However, the relationship between FaBRI1 expression and an increased hormone concentration is not direct, since it would also be necessary to know the expression of other important elements in the brassinosteroid signalling pathway such as BAK1 (BRI1 associated receptor kinase) and BKI1 (inhibitor of the association of BRI1 and BAK1) [69]. BZR1 is a transcription factor [70] whose cell location depends on its phosphorylation status, mainly controlled by BIN2 [71]. When BZR1 is phosphorylated it goes to the nucleus, where it induces the expression of brassinosteroid-dependent genes. Expression of the genes FaBZR and FaBIN2 occurs in achenes and receptacle at all stages, but the expression ratio FaBZR/FaBIN2 is higher in the white achene and lower in the white receptacle. These expression patterns must be interpreted in light of the interaction of the encoded proteins, as indicated above. In summary, the functional relevance of all these expression studies in terms of the role of hormones in fruit ripening is limited. However, they illustrate the possibility of using the sequence information reported here to initiate the molecular dissection of the problem with gene-specific tools.
The database reported here allowed the comparison of the transcriptome in the ripe receptacle of F. xananassa (cv. Camarosa) and the diploid F. vesca. As expected, there are very specific changes in genes related to secondary metabolism (see Additional file 6). However, global analysis revealed that the differences between the transcriptomes are more quantitative than qualitative, i.e. supported by activation/depletion rather than by gain/loss of biological processes. The two minor differences found in "response to stress", up-regulated in F. vesca, and "regulation of cellular processes", up-regulated in F. xananassa, are probably related to the domestication of the species. The natural environment of the wild F. vesca is colder and at higher altitude than that of F. xananassa [3], and it is probable that its cultivation under temperate conditions triggers the heat stress response. On the other hand, it is not surprising that hormone signalling pathways are more efficient in F. xananassa, especially those related to auxin action, since it has been reported that increasing auxin content in both F. xananassa and F. vesca increases the weight and size of fruits [72]. The relevance of the changes reported here deserves further investigation through a detailed study of specific genes, which is currently in progress.

Conclusions
We anticipate that the generation of this strawberry gene dataset will be important in further genomic studies of this species. It has doubled the number of ESTs available for this species, and it combines and analyses all the information presently available for the strawberry. The analysis of the information gathered for the cultivated strawberry, when compared with the available information on the wild diploid strawberry Fragaria vesca, is valuable for establishing their genetic relationship. It is an essential source of information for the study of gene expression, either by qRT-PCR or by microarray. It will also allow the establishment of a set of tools for the analysis of the metabolic and hormone signalling pathways that play a role in the different developmental processes of this species.

Plant material
Strawberry plants (F. xananassa Duchesne ex. Rozier) were grown under field conditions in Huelva, in the southwest of Spain. The fruits were sampled at selected developmental stages that we previously established [10]. For the expression studies, receptacle and achenes of the cultivar Camarosa were sampled separately at three stages: green fruits (green receptacle and green achenes), white fruits (white receptacle and green achenes), and red fruits (red receptacle and brown achenes). The cDNA libraries were prepared from different tissues of the strawberry fruits at various developmental stages. The M1 and M2 libraries were prepared from receptacle and achenes, respectively, of fruits of the cultivar Carisma at the green stage. The C1, C2, and C3 libraries were prepared from fruits of the cultivar Chandler, C1 and C3 being subtractive libraries. Whereas libraries C1 and C2 were prepared from whole fruits, the C3 library was prepared only from receptacle. The L1 library was prepared from red fruits (receptacle and achenes) of the cultivar Elsanta treated with ethylene.
In the microarray studies, plants of F. xananassa (cv. Camarosa) and F. vesca were cultivated in a greenhouse under natural light conditions in Churriana (Málaga, Spain), and fruits of the two species were sampled during their overlapping growing season.

Construction of cDNA libraries and EST sequencing
For the M1 and M2 libraries, achenes were removed from fruits at the green stage and total RNA was extracted separately from the remaining receptacle and the achenes. Total RNA isolation was performed as previously described [73]. Poly(A+) mRNA was purified from total RNA using the 'PolyATtract mRNA Isolation Systems' kit according to the manufacturer's instructions (Promega). This poly(A+) RNA was used for the construction of the directional cDNA library in the Lambda ZAP Express phage using the 'ZAP Express cDNA Synthesis Kit', 'Gigapack III Gold Cloning Kit', and 'Gigapack III Gold Packaging Extract' kits according to the manufacturer's instructions (Stratagene, La Jolla, CA).
The C1 subtractive library (red stage versus green stage) was generated from the whole fruit (receptacle and achenes) as previously described [74]. The C2 library was prepared from RNA extracted from whole red strawberry fruits [16]. The C3 library was prepared by suppression subtractive hybridization (SSH) [75]. The subtraction (red stage versus green stage) was normalized and prepared, only from receptacle tissue, according to the Clontech PCR-Select cDNA Subtraction Kit (BD Biosciences) protocol. For the L1 library, ripe strawberry fruits were exposed to a constant stream of air containing 50 vpm ethylene. RNA was extracted after 2, 4, 24, 48 and 72 hours and also used in a suppression subtractive hybridisation (SSH, Clontech Inc.) protocol.
Sequencing of the M1 and M2 libraries was performed from the 5'-end of the inserts using the M13 reverse primer by a custom service (Sistemas Genómicos S.L., Spain). The C1, C2, and C3 libraries were sequenced on an ABI PRISM™ 310 (Perkin Elmer) by the Central Services of the Universidad de Córdoba. The primers used were T3, T7, M13 forward and M13 reverse when cloned in pBSII, and SP6 when cloned in pGEM-T.

Bioinformatics
The strawberry EST sequences for the libraries CO3, CO8 and SGBL were obtained from the dbEST database of GenBank. Libraries with fewer than 100 sequences were placed in the group SGBL (Small GenBank libraries).
EST sequences were cleaned with the seqclean software [41] using the default parameters. The Univec and Univec_core datasets from NCBI were used to screen for fragments of vectors and adaptors. To remove contaminants, ColiBank95, an Escherichia coli genome dataset from NCBI, was used. The program was applied repeatedly to the sequences in FASTA format until no further sequence was excluded. Clustering and creation of the consensus sequences were performed through the TGICL pipeline [41], with the programs Megablast for clustering and Cap3 for the consensus sequences. Variations on the default parameters in Megablast revealed that the minimum percentage of identity was the only determinant of the final number of clusters. Thus, the parameters established for clustering were: 95 percent for the minimum identity, 40 bp for the minimum overlapping region, and 20 bp for the maximum non-overlapping ends. Those sequences from a cluster that allowed the establishment of a consensus sequence were included in a contig. In this process, we defined singlets as clustered sequences that could not be included in a consensus sequence and singletons as sequences that were not grouped in a cluster. The unigenes were then the sum of singletons, singlets, and contigs.
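The definitions of contigs, singlets and singletons given above can be illustrated with a minimal Python sketch. It is not part of the TGICL pipeline; the function name, the input structures and the EST identifiers are hypothetical, and it assumes that the cluster and contig memberships have already been parsed from the pipeline output.

```python
from typing import Dict, List, Set


def tally_unigenes(all_ests: Set[str],
                   clusters: List[Set[str]],
                   contigs: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count contigs, singlets and singletons as defined in the text.

    all_ests : identifiers of every cleaned EST
    clusters : groups of ESTs produced by the clustering step
    contigs  : contig name -> ESTs merged into that consensus sequence
    (All three inputs are hypothetical, pre-parsed structures.)
    """
    clustered = set().union(*clusters) if clusters else set()
    in_contig = set().union(*contigs.values()) if contigs else set()
    singlets = clustered - in_contig    # clustered, but left out of any consensus
    singletons = all_ests - clustered   # never grouped into a cluster
    counts = {
        "contigs": len(contigs),
        "singlets": len(singlets),
        "singletons": len(singletons),
    }
    # Unigenes are the sum of the three classes above.
    counts["unigenes"] = sum(counts.values())
    return counts


# Illustrative data only: six ESTs, two clusters, two contigs.
ests = {"e1", "e2", "e3", "e4", "e5", "e6"}
clusters = [{"e1", "e2", "e3"}, {"e4", "e5"}]
contigs = {"contig1": {"e1", "e2"}, "contig2": {"e4", "e5"}}
summary = tally_unigenes(ests, clusters, contigs)
```

In this toy example e3 is a singlet (clustered but not assembled) and e6 is a singleton, so the tally gives four unigenes.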
The chimera analysis was performed by parsing the results of a BlastX search of the 5' and 3' ends (300 nt) of each unigene, using TAIR 8 as the blast database. Unigenes whose two ends gave different, unrelated blast hits were annotated as putative chimeras.
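As an illustration of this rule, the following Python sketch flags a unigene when its two ends have different best hits. The text does not specify how "related" hits were recognised, so the optional family map used here is an assumption, as are the function name and the identifiers in the example.

```python
from typing import Dict, List, Optional, Tuple


def flag_putative_chimeras(
    end_hits: Dict[str, Tuple[Optional[str], Optional[str]]],
    family: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return unigenes whose 5' and 3' ends hit different, unrelated proteins.

    end_hits : unigene -> (best hit of the 5' end, best hit of the 3' end);
               None means that end gave no hit.
    family   : hypothetical map from hit identifier to a family label, used
               as a stand-in for "related" hits (an assumption of this sketch).
    """
    family = family or {}
    flagged = []
    for unigene, (hit5, hit3) in sorted(end_hits.items()):
        if hit5 is None or hit3 is None:
            continue  # a comparison needs a hit at both ends
        # Hits are compared by family label when one is known, else by identifier.
        if family.get(hit5, hit5) != family.get(hit3, hit3):
            flagged.append(unigene)
    return flagged


# Illustrative identifiers only.
hits = {
    "u1": ("AT1G01010", "AT1G01010"),  # same hit at both ends
    "u2": ("AT1G01010", "AT5G20000"),  # different, unrelated hits
    "u3": (None, "AT2G30000"),         # only one end has a hit
    "u4": ("AT3G10000", "AT3G20000"),  # different hits, same family
}
families = {"AT3G10000": "famA", "AT3G20000": "famA"}
chimeras = flag_putative_chimeras(hits, families)  # ["u2"]
```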
Functional annotation was performed using the package Blast2GO [40]. Tools of this package were used for BlastX (using GenBank nr as database and 1e-10 as initial cutoff e-value), InterProScan (for protein domain search and signal peptide prediction), and enzyme code and GO term mapping. The functional category analysis was based on the distribution of biological process GO terms at a cutoff level of 3.
The datasets for the comparison with other species were prepared as follows: the sequences were downloaded from the dbEST database in GenBank, and were then cleaned and clustered in the same way as the strawberry sequences. The homology search between strawberry and these species was made with the program Blastall and the subprogram TblastX, with a threshold e-value of 1e-20.

SSRs and SNPs
The identification and localization of SSRs was accomplished with the PERL5 script MISA [76]. SSRs were only considered when they contained motifs between two and five nucleotides in size, with 2, 3, 4 and 5 repeats for di-, tri-, tetra- and pentanucleotides, respectively. For SNP location we used the pipeline QualitySNP [77], which implements an algorithm to detect reliable SNPs and insertions/deletions in EST data from diploid and polyploid species. The default parameters were used, i.e. CAP3 similarity of overlap 95%, minimum size of alleles of each SNP 2, length of the low quality region at the 5' end of the sequence 30 nucleotides, similarity on one polymorphic site 0.75, similarity on all polymorphic sites 0.8, low quality region of the 3' side 0.2 (20% of the whole sequence), weight value of the low quality region 0.5, and minimal confidence score 2.
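The SSR criteria can be illustrated with a short Python sketch based on regular expressions. This is not the MISA script; it only detects perfect repeats, the function names are hypothetical, and the repeat thresholds are read as minimum repeat numbers, which is an interpretation of the text.

```python
import re
from typing import Dict, List, Tuple

# Motif length -> minimum number of repeats, as read from the text (assumed minima).
TEXT_MIN_REPEATS = {2: 2, 3: 3, 4: 4, 5: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. not 'AA' or 'ACAC')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = TEXT_MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each perfect SSR found."""
    seq = seq.upper()
    found = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                found.append((m.start(), m.group(1), len(m.group(0)) // k))
                pos = m.end()  # continue after the reported repeat
            else:
                pos += 1
    return sorted(found)


# Illustrative sequence containing (AC)3 and (GAT)3.
ssrs = find_ssrs("GGACACACCGATGATGATC")  # [(2, "AC", 3), (9, "GAT", 3)]
```

Scanning position by position, rather than relying on non-overlapping regex matches, avoids missing a repeat that starts inside a rejected homopolymer run.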

Expression studies
Total RNA was extracted from F. xananassa fruits, from receptacle and achenes separately, according to the method described in [74]. Two biological and three technical replicates were performed for every sample. The RT reaction was done using the iScript™ cDNA Synthesis Kit (Bio-Rad, http://www.bio-rad.com) according to the manufacturer's instructions. Expression was analysed by real-time quantitative RT-PCR using iQ™ SYBR® Green Supermix in an iCycler detection system (Bio-Rad, http://www.bio-rad.com) according to the manufacturer's instructions, with gene-specific primers. The results obtained were normalized against the expression of FaRIB413, which was reported to be constitutive [78]. The primers used in the PCR reactions are indicated in Additional File 6.
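Normalization against a reference gene can be sketched as follows. The text does not state the formula that was used, so this example assumes the common 2^-ΔCt calculation with 100% amplification efficiency; the function name and the Ct values are illustrative only.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Expression of a target gene relative to the reference gene for one biological replicate.

    Technical replicates are averaged first; the 2^-dCt formula and the
    100% amplification efficiency are assumptions of this sketch.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


# Two biological replicates, each with three technical replicates (made-up Ct values).
bio_reps = [
    {"target": [24.0, 24.5, 23.5], "reference": [20.0, 20.0, 20.0]},
    {"target": [25.0, 25.0, 25.0], "reference": [20.0, 20.5, 19.5]},
]
per_replicate = [relative_expression(r["target"], r["reference"]) for r in bio_reps]
mean_expression = mean(per_replicate)
```

Here the reference values would be the Ct of FaRIB413 measured in the same sample as the target gene.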

Microarray analysis
Oligo (60-mer) design for expression analysis was performed by NimbleGen Systems Inc. from 6,349 non-redundant sequences of F. xananassa (http://fresa.uco.uma.es/srs71) and 7,734 non-redundant sequences of F. vesca (GenBank). A minimum of 7-10 oligos were printed per probe and three blocks were printed per dataset. Samples corresponding to two growing seasons were prepared as high quality double-stranded cDNA, synthesized from total RNA extracted from the receptacle of ripe fruit as described above, following the protocol of the Invitrogen SuperScript™ Double-Stranded cDNA Synthesis Kit. Sample labeling, hybridization with three probes per target, and data normalization were performed by NimbleGen Systems Inc. according to the procedures described in the expression analysis section of http://www.nimblegen.com/.
Data analysis of the microarray expression studies was performed with the gene expression analysis software ArrayStar (DNASTAR). The t-test with FDR (Benjamini-Hochberg) correction for multiple testing was used, with a p-value threshold of < 0.1, to identify statistically significant differences.
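The Benjamini-Hochberg correction can be illustrated with a minimal Python sketch. It shows the standard step-up adjustment and is not the ArrayStar implementation, whose details may differ; the function name and the p-values are illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up t-test p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.1 for q in qvals]  # [True, True, True, False]
```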
The redundancy between probes of the two species was analysed using BlastN with an e-value cutoff < 1e-100 and a similarity percentage > 90%.
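A filter of this kind can be sketched in Python as follows. The sketch assumes that the BlastN results are available in the standard 12-column tabular format (query, subject, percent identity, ..., e-value, bit score), and that the similarity percentage corresponds to the percent identity column; the function name and the sequence identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def redundant_pairs(tabular_lines: Iterable[str],
                    max_evalue: float = 1e-100,
                    min_identity: float = 90.0) -> Set[Tuple[str, str]]:
    """Return (query, subject) pairs that pass both thresholds.

    Assumes tab-separated BLAST tabular output with percent identity in
    column 3 and e-value in column 11.
    """
    pairs = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if evalue < max_evalue and identity > min_identity:
            pairs.add((query, subject))
    return pairs


# Illustrative hits only.
example = [
    "Fa_0001\tFv_0100\t97.50\t600\t15\t0\t1\t600\t1\t600\t0.0\t1050",
    "Fa_0002\tFv_0200\t88.00\t500\t60\t0\t1\t500\t1\t500\t1e-130\t480",  # identity too low
    "Fa_0003\tFv_0300\t95.00\t120\t6\t0\t1\t120\t1\t120\t2e-45\t180",    # e-value too high
]
kept = redundant_pairs(example)  # {("Fa_0001", "Fv_0100")}
```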