Transcriptome analyses of early cucumber fruit growth identifies distinct gene modules associated with phases of development

Background Early stages of fruit development from initial set through exponential growth are critical determinants of size and yield; however, there has been little detailed analysis of this phase of development. In this study we combined morphological analysis with 454 pyrosequencing to study transcript level changes occurring in young cucumber fruit at five ages from anthesis through the end of exponential growth. Results The fruit samples produced 1.13 million ESTs, which were assembled into 27,859 contigs with a mean length of 834 base pairs and a mean of 67 reads per contig. All contigs were mapped to the cucumber genome. Principal component analysis separated the fruit ages into three groups corresponding with cell division/pre-exponential growth (0 and 4 days post pollination (dpp)), peak exponential expansion (8 dpp), and late/post-exponential expansion stages of growth (12 and 16 dpp). Transcripts predominantly expressed at 0 and 4 dpp included homologs of histones, cyclins, and plastid and photosynthesis related genes. The group of genes with peak transcript levels at 8 dpp included cytoskeleton, cell wall, lipid metabolism and phloem related proteins. This group was also dominated by genes with unknown function or without known homologs outside of cucurbits. A second shift in transcript profile was observed at 12-16 dpp, which was characterized by abiotic and biotic stress related genes and significant enrichment for transcription factor gene homologs, including many associated with stress response and development. Conclusions The transcriptome data coupled with morphological analyses provide an informative picture of early fruit development. Progressive waves of transcript abundance were associated with cell division, development of photosynthetic capacity, cell expansion and fruit growth, phloem activity, protection of the fruit surface, and finally transition away from fruit growth toward a stage of enhanced stress responses. 
These results suggest that the interval between expansive growth and ripening includes further developmental differentiation with an emphasis on defense. The increased transcript levels of cucurbit-specific genes during the exponential growth stage may indicate unique factors contributing to rapid growth in cucurbits.


Background
Fleshy fruits are highly prized for nutritional content, flavor, fragrance, and appearance. While most fruits are eaten when ripe, a subset, including many that for culinary purposes are viewed as vegetables, are consumed immature. Cucumbers (Cucumis sativus), which are used as fresh product and processed into pickles, are typically harvested at the middle or end of the exponential growth phase, 1-2 weeks post-pollination, and approximately 2-3 weeks prior to fruit maturation.
Early fruit development is typified by phases of cell division and expansion [1]. In cucumber fruit, which develop from an enlarged inferior ovary, cell division occurs most rapidly prior to anthesis and then continues more slowly in the first 0-5 days post anthesis [2][3][4][5]. This phase largely overlaps with the period of highest respiration [4]. Fruit elongation begins almost immediately after pollination, with the most rapid increase occurring approximately 4-12 days post pollination (dpp) [6]. The rapid increase in cell size mirrors the rapid increase in fruit length, with obvious increase in vacuolization of mesocarp cells, and thickening in epidermal cell walls occurring between 8 and 16 dpp [6]. Cell division and expansion are largely completed by 12-16 dpp, with some variation depending on cultivar and season [4,6,7].
In addition to cell division and expansion, early development also includes specialized tissue and organ development and interaction with the abiotic and biotic environment. For example, developing cucumber fruit exhibit a distinct change in susceptibility to the soilborne oomycete pathogen Phytophthora capsici; young fruit are highly susceptible, while older fruit are resistant [8,9]. There is a sharp transition in susceptibility that occurs at approximately 10-12 dpp, coinciding with the end of the period of rapid fruit elongation. This age-related resistance suggests additional kinds of developmental changes occurring in the young cucumber fruit.
Although only a limited number of studies have examined gene expression during early fruit development, a picture reflecting cell division and expansion is beginning to emerge based on transcriptomic studies of apple, cucumber, grape, tomato and watermelon. Among the enriched categories associated with tomato fruit set were genes associated with protein biosynthesis, histones, nucleosome and chromosome assembly and cell cycle, suggesting a profile reflective of active cell division [10][11][12]. In contrast, various water, sugar and organic acid transport-associated genes were under-represented, but then increased with the transition from cell division to cell expansion. Highly expressed categories of genes in expanding cucumber, as well as apple, grape, tomato, melon and watermelon fruits, included cytoskeleton and cell wall modifying genes such as tubulins, expansins, endo-1,2-B-glucanase, beta glucosidases, pectate lyases, and pectin methylesterases, and transport associated genes such as aquaporins, vacuolar H+ATPases, and phloem-associated proteins [6,10,[13][14][15][16][17][18]. The most highly represented transcripts in rapidly expanding cucumber fruit (8 dpp) also were strongly enriched for defense related homologs, including lipid, latex, and defense-related genes, e.g., chitinase, thionin, hevein, snakin, peroxidase, catalase, thioredoxin, and dehydrins [6].
The early stages of fruit development, including fruit set and exponential growth, are clearly essential for all fruits. However, despite their importance as determinants of fruit size and yield, there has been little detailed analysis of this phase of development. Most studies to date, including recent transcriptomic studies, have focused on late development, or a broad range of developmental stages, with only a single snapshot during early development (e.g., [19][20][21][22]). In this study we combined morphological characterization with transcriptome analysis to provide new insight into important early fruit developmental stages and processes. Our observations, performed at five time points during the period from fruit set through the end of exponential fruit growth, indicate that this is a dynamic period of cucumber fruit development involving an array of internal and external morphological, physiological, and transcriptomic changes that act in concert with phases of active cell division, expansion, and response to the environment. Relative to anthesis and early fruit set, the period of peak- and late-exponential growth includes a large portion of highly represented transcripts, either of unknown function or without homologs in Arabidopsis, suggesting unique factors contributing to the rapid growth phase in cucurbits. The end of exponential growth was marked by a shift in transcriptome profile characterized by abiotic and biotic stress related genes and significant enrichment for transcription factor gene homologs associated with stress response and development, suggesting that the interval between expansive growth and ripening may include a programmed transition toward enhanced defense.

Results and discussion
Morphological changes during early cucumber fruit development
Young Vlaspik cucumber fruit followed a highly reproducible progression of growth and development including visible external and internal morphological changes. Increase in size occurred rapidly after fertilization, with most rapid growth occurring between 4 and 12 dpp (Figure 1A). After approximately 16 dpp, fruit size remained largely constant until fruit maturation at approximately 30 dpp. At 0 dpp (anthesis), deep ridges along the length of fruit covered the surface of the fruit. Densely spaced spines were randomly scattered relative to the ridges (Figure 1B). In contrast to ridges, which were most prominent at anthesis, warts, which are typically formed at the base of spines, were diminutive at 0 dpp. They rapidly developed to become highly prominent at 4 dpp but then flattened out with further fruit expansion. Both ridges and warts were nearly absent by 12 dpp. The spines followed a maturation process culminating in abscission. At 0 dpp spine color was translucent light green. At approximately 8 dpp they started to senesce, turning yellow, then white at 12-16 dpp. By 16 dpp many had abscised from the fruit surface.
At anthesis, the exocarp was dark green. Dark green/light green stripes and specks on the surface of the fruit began to emerge around 8 dpp. The fruit surface at anthesis also had a dull appearance due to 'bloom' (Figure 1B), a fine white powder primarily composed of silica oxide (SiO2) [23]. The bloom disappeared first from the peduncle end around 4 dpp, then the blossom end by 8 dpp; by 12 dpp, it had disappeared completely, leaving a shiny fruit surface. The cuticle layer showed increased thickness with age. At 12 and 16 dpp it stained more darkly with Sudan IV, indicating increased cutin or wax content that appeared to penetrate between the palisade cells in the epidermal layer (Figure 1C).
With respect to internal fruit morphology, both placenta and pericarp rapidly expanded from 4-16 dpp. The rate and amount of expansion were very similar for both tissues (Figure 1D). The mesocarp was initially green at 0 and 4 dpp, but became progressively lighter with age. Increase in mesocarp cell size is accompanied by increased vacuolization between 4 and 12 dpp [6]. The placenta tissue became gelatinous between 8 and 12 dpp and hardening of seed coats occurred between 12 and 16 dpp (Figure 1E).

Table S1). The resulting data were assembled into 27,859 contigs with a mean length of 834 base pairs (bp). All transcripts were mapped to the assembled cucumber genome of Huang et al. [24], although in some cases more than one transcript mapped to the same location. The number of reads per contig ranged from 2 to more than 14,000, with a mean of 67 reads per contig and a median of 7 reads/contig. Assembled contig length increased steadily with the number of ESTs/contig until approximately 30 reads/contig, where it leveled off with an average length of approximately 1400 bp (Additional file 2: Figure S1). Similarly, frequency of identification of homologs in Arabidopsis increased with number of ESTs/contig, leveling off at approximately 90% with approximately 30 reads/contig (Additional file 2: Figure S1). Gene ontology (GO) assignment to those contigs with putative homologs in Arabidopsis showed a similar distribution of gene functions as is present in the full Arabidopsis genome (generally within 2-fold relative to the distribution in Arabidopsis), suggesting broad representation of the genome (Additional file 2: Figure S1). Approximately half of the contigs with ≥30 reads but without homologs in Arabidopsis had putative homologs in other species. 
The final portion, approximately 5% of the total (275 contigs), either did not have any identified homologs in the current NCBI nr database, or only had putative homologs in cucurbit species, suggesting that these transcripts may be unique to cucumber or cucurbits relative to the plant species sequenced to date. These potentially cucurbit-unique transcripts included 91 very highly expressed contigs, represented by at least 100 ESTs (average length >1000 bp). Eighteen had putative functional assignments, eight of which were known cucurbit specific phloem-related proteins, such as phloem lectins and phloem proteins (Table 1).

Changes in transcript abundance during early fruit growth
Based on the observed relationship between ESTs/contig, contig length, and putative homologs in Arabidopsis (Additional file 2: Figure S1), subsequent bioinformatic analyses were performed on contigs represented by at least 30 ESTs. Contigs represented by at least 30 ESTs that did not have putative homologs outside of cucurbit species were not evenly distributed across fruit age (Figure 2A). The 8, 12, and 16 dpp libraries contained nearly twice as many contigs without identified homologs in Arabidopsis as were observed for the 0 and 4 dpp libraries. Of the 91 very highly abundant transcripts without known homologs outside of cucurbits, only three were not observed in the 8, 12 or 16 dpp samples. In contrast, 17 of the cucurbit specific transcripts did not appear in 0 or 4 dpp samples.
To validate usefulness of the 454 sequence data for analysis of transcript abundance, a set of fourteen genes representing different levels of EST representation/contig across the different fruit ages was selected for quantitative real time (qRT)-PCR analysis (Additional file 3: Figure S2). These included genes such as cyclin-dependent kinase B2;2, with high transcript levels early in development (0-4 dpp), and expansin A5, with higher transcript levels at 8-16 dpp. Comparison of transcript level at a given age relative to baseline expression at 0 dpp (56 gene/time comparisons) showed good correspondence between values obtained by 454 sequencing and qRT-PCR (Pearson's correlation, R² = 0.85; Additional file 3: Figure S2). There was also good correspondence between the qRT-PCR results obtained from two different growth experiments in the greenhouse (R² = 0.91), indicating biological reproducibility of patterns of gene expression across fruit ages, and validity of the use of frequency of EST representation in the 454 library as a measure of level of gene expression.

Principal component analysis (PCA) was performed on transcript levels among the libraries from the five fruit ages (Figure 2B). The first two components, which accounted for nearly 90% of the variation, separated the fruit ages into three groups: 0 and 4 dpp, 8 dpp, and 12 and 16 dpp. Examination of fruit growth rate indicated that these age groups correspond with cell division/pre-exponential growth, peak exponential expansion, and late/post-exponential expansion stages of growth, respectively (Figure 2C). Comparison of the transcripts present in each of the age groups showed that the great majority were detected in all three age groups. The fewest unique transcripts were present in the 8 dpp sample, consistent with a developmental gradient of transcription moving from 0-4 to 8 to 12-16 dpp. Both the PCA and Venn diagram (Figure 2B,D) show the least commonality between the 0-4 and 12-16 dpp age groups.
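The correspondence between 454 and qRT-PCR values reported above is a Pearson correlation over paired gene/time comparisons. The following is a minimal sketch of that calculation, assuming each comparison is expressed by both methods as a ratio relative to 0 dpp. The function name and the example values are hypothetical and are not data from this study; any transformation the authors may have applied before correlating is not stated and is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fold changes relative to 0 dpp (illustrative values only).
fold_454 = [0.5, 1.0, 2.0, 4.0, 8.0]
fold_qpcr = [0.6, 0.9, 2.2, 3.5, 9.0]
r = pearson_r(fold_454, fold_qpcr)
print(f"R^2 = {r * r:.2f}")
```

Squaring the coefficient gives the R² form in which the values are reported in the text.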
The most highly represented contigs in each age group (0.1% of transcript pool; 79, 111 and 107 contigs for 0-4, 8, and 12-16dpp, respectively) exhibited markedly different profiles of putative gene function. Among those in common to all three groups were housekeeping genes including numerous ribosomal protein genes, and several tubulins, actins, and redox-related genes (catalase, ascorbate oxidase, ascorbate peroxidase), as well as several with unknown function or no identifiable homolog in Arabidopsis. Examination of the transcripts that were very highly represented in only one age group ( Table 2), showed that 0-4 dpp was the only one to include histone genes. This observation is consistent with the high level  Similarly, ribosomal protein genes were among the most highly represented transcripts at 0-4 dpp (8/21 genes) but minimally present in the 8 or 12-16 dpp ages (2 and 0 times, respectively). The 12-16 dpp group was marked by numerous abiotic stress related genes. Strikingly, genes with unknown function or without Arabidopsis homologs, dominated the group at 8 dpp, accounting for more than half of the contigs (14/27 genes, 52%).
The exponential growth stage of tomato also was associated with a larger proportion of ESTs with unknown function relative to other ages [10]. Fewer genes with unknown function or without Arabidopsis homologs occurred in the 12-16 dpp group (5/21) and only 1 member of the 0-4dpp group had no assigned putative function or was without a homolog in Arabidopsis.
To identify less highly represented genes that were strongly enriched at a specific age group, contigs were normalized for portion of reads observed at different time points. If transcription levels were constant during development, 20% of the transcript reads would be observed at each of the five sample ages (i.e., 40% for 0+4 dpp, 20% for 8 dpp and 40% for 12+16 dpp). Overall distribution of portion of transcripts observed at a given age followed this expectation for the transcriptome set, with mean values of 41.05%, 19.77%, and 39.18%, respectively, for the 0+4, 8, and 12+16 dpp age groups (Figure 3A). The tails of the distribution (top 2.5%) were examined for genes for which transcript levels were strongly enriched in a specific age group (approximately 120 genes/group). This resulted in three non-overlapping sets of genes (Additional file 4: Table S2). There also was minimal overlap with the genes listed in Table 2. The genes listed in Table 2 had an average of 862 ESTs/contig, whereas the mean number of ESTs/contig for the genes identified in this manner was 166. As was seen for the most highly represented group of genes, there was uneven distribution of genes without homologs in Arabidopsis or with unknown function; those accounted for 18.3% of the 0+4 dpp enriched transcripts, but for 34.1% and 33.8% of the 8 dpp and 12+16 dpp enriched transcripts, respectively.
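The portion-of-reads approach described above can be sketched as follows. This is a minimal illustration, assuming read counts per contig have already been made comparable across the five libraries; the exact normalization used in the study is not specified here, and the function names, group labels, and example counts are hypothetical.

```python
def age_group_portions(reads_by_age):
    """Fraction of a contig's reads falling in the 0+4, 8, and 12+16 dpp groups.

    reads_by_age maps fruit age in dpp (0, 4, 8, 12, 16) to a read count.
    """
    total = sum(reads_by_age.values())
    if total == 0:
        raise ValueError("contig has no reads")
    groups = {"0+4": (0, 4), "8": (8,), "12+16": (12, 16)}
    return {
        name: sum(reads_by_age.get(age, 0) for age in ages) / total
        for name, ages in groups.items()
    }


def top_tail(portions_by_contig, group, fraction=0.025):
    """Contig IDs in the upper tail of the portion distribution for one group."""
    ranked = sorted(
        portions_by_contig,
        key=lambda contig: portions_by_contig[contig][group],
        reverse=True,
    )
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# A contig with constant expression gives the 40% / 20% / 40% expectation.
flat = age_group_portions({0: 20, 4: 20, 8: 20, 12: 20, 16: 20})
print(flat)
```

Ranking contigs by their portion in one age group and keeping the upper 2.5% yields a list of age-enriched contigs that does not depend on absolute read depth, which is why this list overlaps little with the most highly represented contigs.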
Fruit set/pre-exponential growth
Functional enrichment analysis of those transcripts with age-group enriched transcript levels indicated that the 0-4 dpp age group had significantly increased representation of genes associated with cell organization and biogenesis, and DNA or RNA metabolism, that subsided with age (Figure 3B). In addition to histone genes, which were also among the most highly abundant transcripts for this age group (Table 2), numerous putative cell cycle genes, including cyclin- and cyclin-dependent kinase-related gene family members, exhibited greater than 90% of transcript reads at 0-4 dpp (Additional file 4: Table S2, Figure 4A). Extensive protein interaction and gene expression data from Arabidopsis have allowed for the development of a picture of the cyclin interactome, including characterization of complexes associated with different cell phases [26]. Cyclin related genes strongly enriched at 0-4 dpp in the cucumber fruit transcriptome, such as putative homologs of CDKB1;2, CDKB2;2, CYCB1;2, CYCD3;1, CYCD3;3, and CYCD5;1, were among those associated with the mitosis and post-mitosis (M and G1) phases in the Arabidopsis interactome. Elevated expression of several of these genes was also observed during fruit set in pollinated vs. unpollinated apple and cucumber flowers [3,27]. In contrast, the homolog of CDKA;1 [TAIR:At3g48750], which was uniformly represented in the young cucumber fruit transcriptome, was associated with cyclin complexes throughout the Arabidopsis cell cycle.

Figure caption (fragment): Functional distribution, normalized frequency, and bootstrap standard deviation (SD) of contigs with putative Arabidopsis homologs were determined using the categories classification from the Classification SuperViewer from Bio-Array Resource for Arabidopsis Functional Genomics for Gene Ontology [25]. Shading indicates those categories that are significantly enriched (P < 0.05).
The categories of plastid and chloroplast also were significantly enriched in the 0-4 dpp group, then declined with age. This is consistent with the decrease in chlorophyll observed after 4dpp; chlorophyll content per gram fresh weight peaked at 4 dpp, and then decreased until 12 dpp ( Figure 5A). The assembled contigs included 91 transcripts whose homologs in Arabidopsis had annotations including one or more of the following terms: chlorophyll, chloroplast, photosystem, or thylakoid (Additional file 5: Table S3). Overall patterns of transcript abundance for these genes paralleled chlorophyll content in the developing fruit ( Figure 5B).
K-means cluster analysis allowed for further identification of transcripts showing progressive patterns of representation with fruit age (Figure 6). The chloroplast and other photosynthesis related genes described above, along with homologs of at least 10 additional chloroplast located proteins and enzymes predominated in the 4 or 4+8 clusters, but were minimally observed in the 0, 0+4, or 8 dpp clusters, and did not appear in the later clusters ( Figure 6, Additional file 6: Table S4).
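As an illustration of how K-means groups contigs by the shape of their expression profile across the five ages, the sketch below implements the basic algorithm (Lloyd's iteration with Euclidean distance) in plain Python. It is not the Cluster 3.0 implementation used for the analysis, and the function name, the random initialization, and the example profiles are assumptions made for illustration only.

```python
import random


def kmeans(profiles, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on equal-length numeric profiles."""
    rng = random.Random(seed)
    # Start from k profiles chosen at random as the initial centers.
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in profiles
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    return labels, centers


# Hypothetical portions of reads at 0, 4, 8, 12, and 16 dpp (illustrative only).
profiles = [
    [0.50, 0.40, 0.05, 0.03, 0.02],  # early-peaking
    [0.45, 0.45, 0.05, 0.03, 0.02],  # early-peaking
    [0.02, 0.03, 0.05, 0.40, 0.50],  # late-peaking
    [0.02, 0.03, 0.05, 0.45, 0.45],  # late-peaking
]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Because profiles are expressed as portions of reads per age, contigs cluster by when they peak rather than by how abundant they are overall.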

Exponential growth
The group of genes with peak abundance at the 8 dpp exponential growth stage included cytoskeleton, cell wall, and water and carbohydrate transport genes. Tubulins, actin-related proteins, extensins, expansins, cellulose synthases, pectinase modifying enzymes, aquaporins, vacuolar H+ATPases, and phloem filament and lectin proteins were among those strongly represented, as has been observed for other rapidly growing fleshy fruits such as tomato, apple, grape, and watermelon [10,[13][14][15]17,18]. The major latex protein related genes also exhibited peak levels at 8 dpp, including two extremely highly transcribed genes that together accounted for more than 17,000 reads (Additional file 6: Table S4). Putative homologs of vacuolar ATP synthase subunits B, D, H and P2 [TAIR:At4g38510, At3g58630, At3g42050, At1g19910] showed coordinate transcript abundance, with comparable levels increasing steadily until 8 dpp, and then gradually declining. Two very highly represented homologs of the vacuolar aquaporin gene [TAIR:At2g36830], gamma tip tonoplast intrinsic protein, also peaked at 4-8 dpp (Additional file 6: Table S4).
All of the cucurbit specific phloem proteins listed in Table 1 and the four putative homologs of the Arabidopsis phloem protein (ATPP) A2 family members observed in the data set peaked somewhat later, at 8-16 dpp, with minimal transcript levels at 0 and 4 dpp (Figure 4D). Cucurbits are characterized by a unique and functionally divergent network of extrafascicular phloem external to the vascular bundles [28][29][30]. The highly expressed proteinaceous phloem filaments, comprised of the cucurbit-specific PP1 proteins, and the more widely distributed PP2 phloem lectin proteins [31], were found to be primarily associated with the extrafascicular phloem [30]. Strong expression of phloem protein genes during rapid growth has been observed in other studies, including PP1 expression in green stage watermelon fruit [18,31,32]. Specific expression of PP2 (a group A member [31]) was observed in young pumpkin (Cucurbita pepo) hypocotyls, peaking at 12 days after germination in concert with the period of peak growth and vascular differentiation [32]. In contrast, cucumber homologs of the ATPP2-B family had a nearly inverted pattern of transcript levels relative to PP2-A genes, peaking at 0 dpp and dropping during exponential growth, suggesting possible functional divergence (Figure 4D).
The period of rapid fruit enlargement was also associated with marked changes in fruit surface, including an increase in cuticle thickness as is typically observed during rapid plant growth [33], and loss of the silica oxide powder based 'bloom'. The homolog of the Cucurbita moschata silicon transporter [GenBank:327187680; ref. 23] showed age specific transcript abundance peaking at 8 dpp then dropping sharply, coinciding with the time of bloom loss from the middle of the fruit (the region from which samples were taken).
Among the genes identified in other systems to be associated with cuticle biosynthesis are the extracellular GDSL motif lipase/hydrolase proteins and lipid transfer proteins, which have been implicated in lipid transport to extracellular surfaces [33][34][35][36]. The cucumber fruit transcriptome set included eleven GDSL motif lipase/hydrolase protein family members that were represented by at least 30 ESTs, including five with more than 100 ESTs. The majority showed peak levels at 8 or 12-16 dpp, with virtually no measured reads until either 8 or 12 dpp (Figure 4B). Twelve lipid transfer protein (LTP) family members with greater than 30 ESTs/contig also were observed in the transcriptome data set, including four with greater than 700 ESTs. As for the GDSL motif lipase/hydrolase protein genes, the majority of the lipid transfer proteins were most highly represented from 8-16 dpp; transcript levels of one gene peaked at 4-8 dpp (Figure 4C). A homolog of the transcription factor gene SHINE1 [TAIR:At1g15360], which is associated with cuticle production in Arabidopsis (Figure 4B) [37], also exhibited peak transcript abundance at 8 dpp. Additionally, transcript levels of two cytochrome P450 family members (CYP86A and CYP77A) that have been associated with cutin biosynthesis [38], and two putative beta amyrin synthases, enzymes which have been associated with cuticular wax synthesis in tomato [39], also peaked at 8 dpp (Additional file 4: Table S2). In contrast, two putative GDSL family members and one lipid transfer protein with moderate transcript levels (45-55) [homologs of TAIR:At5g62930, At5g03610, and At2g45180, respectively] were observed almost exclusively at 0 dpp, suggesting possible floral, rather than fruit, expression (Additional file 6: Table S4).

Late/post exponential growth
Stress-related genes (response to stress and response to abiotic and biotic stimulus categories) were overrepresented at all stages, but considerably more so at 12-16 dpp than at the younger ages of 0-4 and 8 dpp (Figure 3B). The 12+16 dpp age group had the highest representation of abiotic and biotic stress related genes, including a variety of heat shock, redox, biotic defense and ethylene-related transcripts (Additional file 4: Table S2). Of the 120 genes in this group, 44 have high homology with genes associated with plant stress, including at least 13 stress-related transcription factors such as WRKY70, an activator of SA-dependent defense; radical induced cell death; and ethylene response, salt stress, and heat shock transcription factors (Figure 4E; Additional file 4: Table S2).
Figure caption (fragment): Profiles of cucumber fruit transcripts showing age-specific expression patterns as determined by K-means analysis. Analyses were performed using Cluster 3.0 software [25].

Overall, the group of genes with peak abundance at 12+16 dpp was significantly enriched for transcription factor genes (2.48-fold enrichment normalized frequency relative to Arabidopsis; P value = 3.19E-04) (Figure 3B), accounting for 16% of the top 2.5% set. This may be contrasted with the total cucumber fruit transcriptome data set, where transcription and transcription factor activity related genes were represented at a normalized frequency of 0.94 relative to occurrence in the Arabidopsis genome. Transcription factors in the top 2.5% of 0+4 and 8 dpp groups also were represented at a comparable frequency to the Arabidopsis genome, accounting for 3.7% and 4.6% of the gene list, respectively. In addition to the stress related transcription factors with specific representation at 12-16 dpp, several putative transcription factor homologs were annotated to be associated with development [e.g., embryo sac development (BEL1-LIKE HOMEODOMAIN 1), morphogenesis (anac036/NAC domain containing protein 36), and cell expansion (ATHB-2 homeobox protein)] (Additional file 4: Table S4). Furthermore, transcripts of other genes with homologs that have been implicated in development related processes are specifically observed at 12-16 dpp, such as putative homologs of TCTP (TRANSLATIONALLY CONTROLLED TUMOR PROTEIN); BTB AND TAZ DOMAIN PROTEIN 1; calcium-binding EF hand family protein; seed development related (E12A11); and BAX INHIBITOR 1.
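The fold enrichment values reported above compare how often a functional category occurs in a gene set with how often it occurs in the reference Arabidopsis genome. A minimal sketch of that ratio follows, assuming normalized frequency is defined as the category's share of the classified genes in the set divided by its share in the reference. The function name and the counts are hypothetical, and the bootstrap standard deviation and P value calculations are not reproduced.

```python
def normalized_frequency(n_class_set, n_total_set, n_class_ref, n_total_ref):
    """Share of a category in a gene set relative to its share in a reference."""
    if min(n_total_set, n_total_ref, n_class_ref) <= 0:
        raise ValueError("totals and reference class count must be positive")
    share_set = n_class_set / n_total_set
    share_ref = n_class_ref / n_total_ref
    return share_set / share_ref


# Hypothetical counts (not from this study): 16 of 100 genes in the set and
# 1,600 of 25,000 reference genes fall in the category.
enrichment = normalized_frequency(16, 100, 1600, 25000)
print(f"{enrichment:.2f}-fold")
```

A value near 1 indicates representation comparable to the reference genome, as reported for the full transcriptome set, while values well above 1 indicate enrichment.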

Conclusions
Examination of early cucumber fruit growth from the period of pollination and initial fruit set through the end of the exponential growth phase shows a dynamic series of physiological and morphological changes (Figure 7). Transcriptomic analysis of the predominant genes represented in the different age groups as identified either by total number of reads (most highly represented among the genes at that age), portion of transcript reads observed at that age, or genes grouped by K-means cluster analysis, told a story aligned with the sequential stages of development.
Transcript representation in the youngest ages, 0-4 dpp, was uniquely characterized by genes associated with cell division, cell organization and biogenesis. At 4 dpp, transcription of the cell cycle genes was declining, while chloroplast, photosynthesis, and chloroplast-localized genes were peaking. Transcripts highly abundant during the exponential growth phase, 4-12 dpp, included extensive representation of genes associated with cell structure such as cytoskeleton, vacuoles and cell walls, along with surface lipid metabolism related genes, in concert with the period of greatest increase in cuticle thickness.
A second shift in the transcriptome profile was observed at 12-16 dpp with significant enrichment of abiotic and biotic stress related genes and stress-related and developmental transcription factor gene homologs. The enriched representation of numerous transcription factors relative to earlier ages suggests a programmatic change away from fruit growth, toward defense, and ultimately fruit maturation. This is also the time period where we have observed transition of cucumber fruit from susceptibility to resistance to P. capsici [8,9]. Classically, fleshy fruit development is described to consist of three stages post pollination: cell division, cell expansion, and ripening [1]. These results suggest that the interval between expansive growth and ripening may include further developmental differentiation; an emphasis on defense would be consistent with the role of fruit in protecting the developing seeds during embryo maturation prior to facilitating seed dispersal.
Finally, approximately 5% of the contigs represented by ≥30 reads either did not have identified putative homologs, or did not have homologs outside of cucurbits suggesting potentially unique genes specific to cucumber or cucurbits. The observation that these genes, as well as genes with homologs but with no annotated function, rarely occurred in the 0-4 dpp group, suggests commonality among processes associated with early fruit set and cell division and/or greater knowledge about the fruit set stage. The predominance of transcripts without non-cucurbit homologs or with unknown predicted functions during the peak exponential growth stage may reflect fewer studies to date about this phase of growth, or unique adaptations of cucurbits to allow for extreme fruit growth rates associated with these species.
Collectively, the transcriptomic information provided by the young cucumber fruit samples, coupled with morphological analyses, provides an informative picture of early fruit development characterized by phases of active cell division, fruit expansion including novel or uncharacterized genes, and response to the environment, as summarized in Figure 7. The progressive modules of transcript abundance tell a story of cell division, development of photosynthetic capacity, cell expansion and fruit growth, phloem activity, protection of the fruit surface, and finally transition away from fruit growth toward defense and maturation.

Plant material, fruit growth, chlorophyll and cuticle measurements
Sets of 80 cucumber plants per experiment (pickling type, cv. Vlaspik; Seminis Vegetable Seed Inc, Oxnard, CA) were grown in the greenhouse in 3.78 L plastic pots filled with BACCTO (Michigan Peat Co., Houston, TX) media and fertilized once per week. Temperature was kept between 21 and 25°C, and supplemental lights were used to provide an 18 h light period. Pest control was performed according to standard management practices. All flowers for each experiment were hand pollinated on a single date (1-2 flowers per plant). The experiment was repeated three times. Prior to the harvests, which were performed at 4 day intervals from 0-16 dpp, fruit were measured for length and diameter, and examined for external appearances including: presence or absence of wax along the length of the fruit; wart development; color patterns (e.g., stripes); and changes in presence, color, and densities of spines. Pericarp and placenta size were measured from the cross section of the fruit after harvest.
Exocarp samples (upper 1 to 2 mm) for chlorophyll measurement were removed with a fruit peeler from the center portion of five fruit at each age and stored at -20°C. Samples were subsequently thawed at room temperature and blotted on paper to remove excess water, and 1 g portions were immersed in N,N-dimethylformamide for at least 24 hours at 4°C in the dark. Total chlorophyll was calculated based on spectrophotometer absorbance measurements at 665 and 647 nm [40]. Samples to measure cuticle thickness were stained with Sudan IV (as per [41]) and measured using a Spot RT3 Digital Camera System at 200x magnification (SPOT Imaging Solutions, Diagnostic Instruments, Inc., MI).
[Figure caption fragment] Gene expression data refer to periods of peak expression for indicated gene categories. Data for respiration, cell division, and susceptibility to P. capsici are from Marcelis and Hofman-Eijer [4], Colle et al. [7], and Gevens et al. [9], respectively.

cDNA library production and 454 sequencing
Randomly assigned groups of twenty fruit were harvested at 0, 4, 8, 12, and 16 dpp and ranked by size; the middle ten fruits were used for RNA extraction. Pericarp samples consisting of exocarp, mesocarp, and placenta tissue, but not seeds, were isolated from the center portion of the fruit by razor blade, immediately frozen in liquid nitrogen, and stored at -80°C until RNA was isolated. Samples from the ten fruits were pooled for RNA extraction; RNA and oligo(dT)-primed cDNA sample preparation were based on the procedures of Schilmiller et al. [42] and Ando and Grumet [6]. Final concentration was assessed with a NanoDrop ND-1000, and subsequent steps for 454 Titanium pyrosequencing analysis (0, 4, 12, and 16 dpp samples) were performed by the Michigan State University Research Technology Support Facility (RTSF). Each sample was loaded on a 1/4 plate of a 454 PicoTiterPlate (454 Life Sciences, a Roche Corporation, CT). The 8 dpp sample was sequenced previously [6].

Contig assembly and gene annotation
Contigs were assembled by the MSU RTSF Bioinformatics Group. Reads were processed through The Institute for Genomic Research (TIGR) SeqClean pipeline to trim residual sequences from the cDNA preparation, poly(A) tails, and other low quality or low complexity regions [43]. Trimmed sequences were assembled into contigs using the TIGR Gene Indices Clustering Tools (TGICL) [44]. Stringent clustering and alignment parameters were used to limit the size of clusters for assembly. Contigs from the first pass of assembly were then combined and subjected to a second assembly pass with CAP3 [45]. Less stringent alignment parameters were used for this pass to allow for minor sequencing errors or allelic differences in the cDNA sequence. Read data for the 8 dpp sample are available from the Sequence Read Archive (SRA), accessible through NCBI BioProject ID PRJNA79541. Read data for the 0, 4, 12, and 16 dpp samples in SRA, as well as assembled contig sequences deposited as Transcriptome Shotgun Assemblies (TSA) and expression profiling data in the Gene Expression Omnibus (GEO), are available through NCBI BioProject ID PRJNA169904.
To estimate relative expression, the number of reads originating from each cDNA library was counted for each contig and reported relative to the total number of reads generated for that library as transcripts per thousand (TPT). The final contigs were subjected to BLASTX search against the green plant subdivision of the NCBI nr protein database and/or the Arabidopsis protein (TAIR9) database to search for similarity to previously identified genes and assign possible gene functions. BLASTN analysis was performed for highly expressed contigs for which homologs were not identified by BLASTX searches.
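As an illustration of the TPT normalization described above, the following minimal Python sketch divides each contig's read count in a library by that library's total read count and scales by 1000. The function name, data layout, and all counts are hypothetical and chosen only for illustration; this is not the pipeline used in the study.

```python
def transcripts_per_thousand(read_counts, library_totals):
    """Scale per-library read counts to transcripts per thousand (TPT).

    read_counts:    {contig: {library: reads assigned to the contig}}
    library_totals: {library: total reads generated for that library}
    Both layouts are assumptions made for this sketch.
    """
    tpt = {}
    for contig, per_library in read_counts.items():
        tpt[contig] = {
            # A library with no reads for this contig contributes 0.
            library: 1000.0 * per_library.get(library, 0) / total
            for library, total in library_totals.items()
        }
    return tpt


# Hypothetical read counts for two libraries (not data from the study).
library_totals = {"0dpp": 200_000, "8dpp": 250_000}
read_counts = {
    "contig_A": {"0dpp": 40, "8dpp": 125},
    "contig_B": {"0dpp": 10},
}
tpt = transcripts_per_thousand(read_counts, library_totals)
```

Scaling by library size makes values comparable between libraries that yielded different numbers of reads.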

Transcriptome analysis
The Classification SuperViewer Tool w/Bootstrap web database [25] was used for GO categorization, determination of normalized frequencies relative to Arabidopsis, and calculation of bootstrap standard deviations and P-values. The PRINCOMP procedure of SAS 9.1 (SAS Institute, Cary, NC) was used for principal component analysis. The first two principal components, which explain nearly 90% of the total variation, were extracted from the covariance matrix. To examine relative gene expression at each age, the portion of reads at that age relative to the total reads for the transcript was calculated for each transcript with >30 reads. Expression profiles were clustered by the K-means method using Cluster 3.0 software [46].
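The per-age expression proportions described above can be sketched as follows. This is a minimal Python illustration that assumes raw read counts per age as input and applies the >30 read cutoff to the total across ages; the function names and counts are hypothetical. The clustering step itself (K-means in Cluster 3.0) is not reproduced here.

```python
AGES = (0, 4, 8, 12, 16)  # days post pollination (dpp)


def expression_profiles(read_counts, min_reads=30):
    """Return the fraction of each transcript's reads found at each age.

    read_counts: {transcript: [reads at 0, 4, 8, 12, 16 dpp]}
    Transcripts whose total is not strictly greater than min_reads are
    skipped (assumed reading of the ">30 reads" threshold).
    """
    profiles = {}
    for transcript, counts in read_counts.items():
        total = sum(counts)
        if total <= min_reads:
            continue
        profiles[transcript] = [count / total for count in counts]
    return profiles


def peak_age(profile, ages=AGES):
    """Age (dpp) at which the profile reaches its maximum fraction."""
    best_index = max(range(len(profile)), key=profile.__getitem__)
    return ages[best_index]


# Hypothetical counts (not data from the study).
counts = {
    "t1": [50, 30, 10, 5, 5],
    "t2": [2, 3, 5, 4, 6],      # total of 20 reads: filtered out
    "t3": [0, 10, 60, 20, 10],
}
profiles = expression_profiles(counts)
```

Each retained profile sums to 1, so transcripts with very different overall abundance can be compared by the shape of their expression over time, which is the property a clustering method such as K-means would then group on.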

qRT-PCR
Total RNA was isolated and assessed for quality and quantity as above. RT reactions were performed using the High Capacity RNA-to-cDNA kit (Applied Biosystems, Foster City, CA). Gene-specific primers (Additional file 7: Table S5) were designed using Primer Express software. An ABI Prism 7900HT Sequence Detection System was used for qRT-PCR analysis. Power SYBR Green PCR Master Mix (Applied Biosystems) was used for PCR quantification. Actin from C. sativus was used as an endogenous control and for normalization. Each qRT-PCR experiment was repeated three times. PCR products from each gene were quantified with reference to corresponding standard curves.
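Quantification against a standard curve with normalization to an endogenous control can be sketched as below. This minimal Python example assumes the common linear model Ct = slope × log10(quantity) + intercept fitted to a dilution series, then divides the target quantity by the actin quantity. The function names, dilution series, and Ct values are hypothetical; the source does not give the exact calculation used.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_ct, target_curve, actin_ct, actin_curve):
    """Target quantity relative to the actin (endogenous control) quantity."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    actin_qty = quantity_from_ct(actin_ct, *actin_curve)
    return target_qty / actin_qty


# Hypothetical 10-fold dilution series (arbitrary units) and Ct values.
dilutions = [1, 10, 100, 1000]
target_curve = fit_standard_curve(dilutions, [30.0, 26.7, 23.4, 20.1])
actin_curve = fit_standard_curve(dilutions, [28.0, 24.7, 21.4, 18.1])

# A sample with target Ct 26.7 and actin Ct 21.4.
ratio = normalized_expression(26.7, target_curve, 21.4, actin_curve)
```

Using a separate curve for each gene accounts for differences in amplification efficiency between primer pairs, which a single shared curve would not capture.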