The botanical family Cucurbitaceae, commonly known as cucurbits, includes several economically and nutritionally important vegetable crops cultivated worldwide, such as cucumber, melon, watermelon, pumpkins, gourds and squashes. The family displays rich diversity in many traits, and its members are primary models for the analysis of sex expression, for the study of vascular biology and for the analysis of the mechanisms involved in fruit ripening [2–5].
Despite the agricultural and biological importance of cucurbits, knowledge of their genetics and genomes has been very limited until now. So far, genomic efforts have largely focused on cucumber and melon. Recently, the whole genome sequence of cucumber, Cucumis sativus var. sativus L., was completed by combining traditional Sanger and next-generation Illumina GA sequencing technologies. An effort is also in progress, through a Spanish initiative, to obtain the whole genome sequence of melon, Cucumis melo L. Many genomic resources are available for both crops and also for watermelon, Citrullus lanatus (Thunb.) Matsum. & Nakai. BAC libraries, collections of genetic markers, detailed physical and genetic maps, mapping populations, microarrays, sequence databases and mutant collections [8–11] are facilitating the use of cucurbits by the research community. Many of these resources are accessible through the website of the International Cucurbit Genomics Initiative (ICuGI).
The genus Cucurbita (2n = 2x = 40), which includes squashes, gourds and pumpkins, has been less studied. It contains some of the earliest domesticated plant species. Today, three of them, C. pepo L., C. moschata Duchesne, and C. maxima Duchesne, have considerable impact on human nutrition and are valued for their nutritional and medicinal properties [14–17]. In addition to the edible fruits, the flowers, leaves and vine tips are consumed, and the seeds are important as snacks, as a source of edible oil and protein for human and animal consumption, and in the pharmaceutical industry. Squashes are also popular as containers and for ornamental purposes. Cucurbita spp. also have significant economic value as rootstocks for overcoming soil-borne diseases in cucurbits.
C. pepo is the most economically important species of the genus. It is distributed worldwide and is one of the most variable species in the plant kingdom. Cultivated C. pepo is considered to comprise two subspecies, each including several cultivar-groups: ssp. pepo (Pumpkin, Vegetable Marrow, Cocozelle, and Zucchini) and ssp. ovifera (Acorn, Scallop, Crookneck, and Straightneck) [19, 20]. Its great economic value is based mainly on the culinary use of the immature fruits as vegetables, often referred to collectively as "summer squashes", although the Pumpkin and Acorn groups are also widely used as mature fruits, known as "winter squashes". This diversity of uses makes breeding objectives quite variable.
The genetic and genomic tools currently available for Cucurbita are very limited. Until now, three genetic maps have been constructed: two from inter-specific crosses between C. pepo and C. moschata [21, 22] and a third from an intra-specific cross of C. pepo (a USA oil-Pumpkin variety and an Italian Crookneck variety). These maps contained mostly RAPD (Random Amplified Polymorphic DNA) and AFLP (Amplified Fragment Length Polymorphism) markers. Only recently has a collection of genomic microsatellites (Simple Sequence Repeats, SSRs) been developed and used to increase the map density. The latest map version comprises 178 SSRs, 244 AFLPs, 230 RAPDs, and two morphological traits, h (hull-less seed) and Bu (bush growth habit). It contains 20 linkage groups with a map density of 2.9 cM and a genome coverage of 86.8%. These SSRs were also used to study synteny between C. pepo and C. moschata.
The lack of denser genetic maps, larger high-throughput marker collections, and suitable mapping populations is limiting gene isolation and squash breeding. Many C. pepo genes have been reported, mainly related to fruit quality and to resistance to potyviruses, other viruses and several fungal diseases, such as downy and powdery mildew. However, the transcripts of only a few have been cloned and molecularly characterized, in individual studies in C. pepo or other Cucurbita spp. Examples are genes involved in the biosynthesis or signaling pathways of growth regulators, which affect plant development, sex expression and the response to stress [27–32].
Single nucleotide polymorphisms (SNPs) are the most abundant variations in genomes and, therefore, constitute a powerful tool for mapping and marker-assisted breeding. These markers are replacing microsatellites in many model and non-model plants for saturating genetic maps [10, 33]. In genomes that have been poorly studied, sequence availability is the limiting factor for the discovery of SNPs.
Expressed sequence tags (ESTs) are a valuable sequence resource for research and breeding, as they provide comprehensive information on the transcriptome. ESTs have played significant roles in accelerating gene discovery, allowing large-scale expression analysis, improving genome annotation, elucidating phylogenetic relationships and facilitating breeding programs for both plants and animals by providing SSR and SNP markers [6, 8, 11, 34–37].
Currently, there are more than 66 million ESTs in the NCBI public collection. However, fewer than 1,000 EST sequences are available for Cucurbita spp. (C. maxima, C. moschata and C. pepo), and approximately 500,000 for all species of the Cucurbitaceae family, most of them from cucumber and melon and included in the ICuGI Cucurbit Genomics Database. By comparison, more than 1.5 and 2 million ESTs are available for Arabidopsis and maize, respectively.
Recent advances in next-generation sequencing technologies allow the efficient and cost-effective large-scale generation of ESTs [39, 40]. A growing number of studies use 454 technologies, alone or combined with Solexa/Illumina, to characterize transcriptomes in cereals and legumes [41–43]. Even in model species, such as Arabidopsis, this deep sequencing is allowing the identification of new transcripts not present in previous EST collections. Specific transcriptomes are also being generated in species for which previous genomic resources are lacking [45–47]. The new transcripts are being used for microarray design, and also for high-throughput identification of SSRs and SNPs. SNP detection is performed by aligning raw reads from different genotypes to a previously available reference genome or transcriptome, as in maize, cucumber and even polyploid species such as Brassica napus [49–51]. De novo assembly of raw sequences from a set of genotypes, followed by pairwise comparison of the overlapping assembled reads, has also been used successfully in species lacking any significant genomic or transcriptomic resources.
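The alignment-based SNP detection described above can be illustrated with a minimal sketch. It is not the pipeline used in any of the cited studies: it assumes ungapped alignments given as (start, sequence) pairs, two genotypes, and hypothetical thresholds (min_depth, min_fraction). All function names and the toy reads are invented for illustration.

```python
from collections import Counter

def pileup(reads, ref_len):
    """Count bases per reference position from ungapped (start, sequence) alignments."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns

def major_allele(column, min_depth, min_fraction):
    """Return the dominant base of a pileup column, or None if support is too weak."""
    depth = sum(column.values())
    if depth < min_depth:
        return None
    base, count = column.most_common(1)[0]
    return base if count / depth >= min_fraction else None

def call_snps(reads_a, reads_b, ref_len, min_depth=2, min_fraction=0.9):
    """Report positions where the two genotypes carry different, well-supported bases.

    min_depth and min_fraction are hypothetical thresholds chosen for this toy example.
    """
    cols_a = pileup(reads_a, ref_len)
    cols_b = pileup(reads_b, ref_len)
    snps = []
    for pos in range(ref_len):
        a = major_allele(cols_a[pos], min_depth, min_fraction)
        b = major_allele(cols_b[pos], min_depth, min_fraction)
        if a and b and a != b:
            snps.append((pos, a, b))  # 0-based position, allele in A, allele in B
    return snps

# Toy data (invented): reads from two genotypes aligned to a 10-base reference.
REF_LEN = 10  # reference: ACGTACGTAC
READS_A = [(0, "ACGTACG"), (2, "GTACGTAC"), (0, "ACGTA")]
READS_B = [(0, "ACGTTCG"), (2, "GTTCGTAC"), (1, "CGTTC")]

if __name__ == "__main__":
    print(call_snps(READS_A, READS_B, REF_LEN))
```

Requiring a minimum depth in both genotypes and a near-fixed allele within each genotype is one simple way to separate candidate SNPs from sequencing errors. Real pipelines also handle gaps, base qualities and paralogous sequences.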
In this study, we describe the generation of 49,610 Cucurbita unigenes assembled de novo from about 500,000 ESTs obtained from roots, leaves and flowers of two contrasting C. pepo cultivars (Zucchini and Scallop, each belonging to one of the two C. pepo subspecies) using Roche/454 GS FLX Titanium massively parallel pyrosequencing technology. These unigenes are functionally annotated and represent the first C. pepo transcriptome. They have also been screened for SSR motifs and used to identify a large collection of SNPs suitable for high-throughput mapping purposes. This sequence collection will help to accelerate genetics and breeding of this crop. It is also an important advance for cucurbit genomics, as it is the first genomic resource for this genus and allows comparisons among members of the three most economically important cucurbit genera, Cucumis, Citrullus and Cucurbita.
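Screening unigenes for SSR motifs amounts to searching each sequence for short units repeated in tandem. The sketch below shows one simple way to do this with regular expressions. It is an illustration only: the minimum repeat numbers (MIN_REPEATS), the function names and the example sequence are assumptions, not the criteria or software used in this study.

```python
import re

# Hypothetical thresholds: unit length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 4, 4: 3}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. ATAT, AA)."""
    return (motif + motif).find(motif, 1) == len(motif)

def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect SSRs; start is 0-based."""
    sequence = sequence.upper()
    hits = []
    for unit_len, copies in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least copies - 1 identical units.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already covered by a shorter unit, or a homopolymer run
            n_copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), motif, n_copies))
    return sorted(hits)

# Invented example sequence: one (AT)6 and one (GAA)4 repeat.
EXAMPLE = "GGC" + "AT" * 6 + "CCG" + "GAA" * 4 + "TT"

if __name__ == "__main__":
    for start, motif, n_copies in find_ssrs(EXAMPLE):
        print(start, motif, n_copies)
```

The primitive-motif check stops a dinucleotide repeat such as (AT)6 from being reported a second time as a tetranucleotide repeat (ATAT)3.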