The first de novo transcriptome of pepino (Solanum muricatum): assembly, comprehensive analysis and comparison with the closely related species S. caripense, potato and tomato
© Herraiz et al. 2016
Received: 23 December 2015
Accepted: 25 April 2016
Published: 4 May 2016
Solanum sect. Basarthrum is phylogenetically very close to potatoes (Solanum sect. Petota) and tomatoes (Solanum sect. Lycopersicon), two groups with great economic importance, and for which Solanum sect. Basarthrum represents a tertiary gene pool for breeding. This section includes the important regional cultigen, the pepino (Solanum muricatum), and several wild species. Among the wild species, S. caripense is prominent due to its major involvement in the origin of pepino and its wide geographical distribution. Despite the value of the pepino as an emerging crop, and the potential for gene transfer from both the pepino and S. caripense to potatoes and tomatoes, there has been virtually no genomic study of these species.
Using Illumina HiSeq 2000, RNA-Seq was performed on a pool of three tissues (young leaf, flowers in pre-anthesis and mature fruits) from S. muricatum and S. caripense, generating almost 111,000,000 reads across the two species. A high quality de novo transcriptome was assembled from S. muricatum clean reads, resulting in 75,832 unigenes with an average length of 704 bp. These unigenes were functionally annotated based on similarity to sequences in public databases. We used Blast2GO to conduct an exhaustive study of the gene ontology, including GO terms, EC numbers and KEGG pathways. Pepino unigenes were compared to both potato and tomato genomes in order to determine their estimated relative position, and to infer gene prediction models. Candidate genes related to traits of interest in other Solanaceae were evaluated by presence or absence and compared with S. caripense transcripts. In addition, the phylogeny of pepino and five other members of the family Solanaceae was studied using five genes. The comparison of S. caripense reads against S. muricatum assembled transcripts resulted in thousands of intra- and interspecific nucleotide-level variants. In addition, more than 1000 SSRs were identified in the pepino transcriptome.
This study represents the first genomic resource for the pepino. We suggest that the data will be useful not only for improvement of the pepino, but also for potato and tomato breeding and gene transfer. The high quality of the transcriptome presented here also facilitates comparative studies in the genus Solanum. The accurate transcript annotation will help to elucidate the function of genes underlying particular traits of interest. The high number of markers (SSRs and nucleotide-level variants) obtained will be useful for breeding programs, as well as for studies of synteny, diversity, evolution, and phylogeny.
Keywords: Solanum muricatum, Transcriptome, S. caripense, Pepino, Potato, Tomato, Solanaceae, Functional annotation, Phylogeny, Candidate genes, Molecular markers
The pepino (Solanum muricatum Aiton) is a neglected herbaceous domesticate native to the Andean region, where wild relatives (Solanum section Basarthrum) are naturally distributed [1, 2]. The pepino is a vegetatively propagated cultigen grown for its fruit. The fruits are juicy berries, of variable shape and color depending on the cultivar, which typically weigh between 100 and 400 g. The fruit has an attractive appearance, with most common cultivars producing fruits with a golden yellow skin covered with purple stripes. Nutritionally, the pepino has very high levels of potassium and vitamin C, and a low calorie content. Apart from being cultivated in its region of origin, the pepino has been introduced in other countries like New Zealand, China and Turkey as a potential new horticultural crop [4, 5].
One of the most interesting features of the pepino is its phylogenetically close relationship with the potato and tomato [6, 7]. In fact, the pepino and its wild relatives in Solanum sect. Basarthrum are part of the tertiary gene pool of both potato and tomato [8, 9]. Cultivated potato, tomato and pepino share the same basic number of chromosomes (x = 12) [10, 11], although tomato and pepino are diploid and most cultivated potatoes are polyploid. Importantly, the close phylogenetic relationship among these species allows the use of genomic resources from tomato and potato for pepino breeding, as has been demonstrated by the high transferability of tomato SSRs to the pepino. Reciprocally, the close relationship may also facilitate the use of the pepino as a genetic source for tomato and potato breeding, including resistance to several diseases in both crops, and the transfer of parthenocarpy and improved flavor to the tomato [3, 8, 14]. A first step in the introgression of pepino traits into tomato has been achieved through the construction of tomato-pepino somatic hybrids.
Wild relatives of domesticates are a source of variation for improving cultivated species and for studying the domestication process. In this context, S. caripense Dunal, locally known as “mamoncillo” or “tzimbalo”, is important: it is easily hybridized with the pepino, and the hybrids are highly fertile. AFLP and genic DNA sequence studies indicated that S. caripense is one of the species involved in the origin and evolution of the cultivated pepino. In the Andean region the widely distributed S. caripense and other cross-compatible wild relatives frequently grow in close proximity to the fields and gardens where pepino is cultivated; as a consequence, there is evidence of introgression and gene flow among them. Solanum caripense is of particular interest for pepino breeding because of traits such as a high soluble solids content and a high content of bioactive phenolic acids. In addition, some accessions of S. caripense have displayed resistance to Tomato Mosaic Virus (ToMV) and to Phytophthora infestans, the pathogen responsible for the most important disease of potato, and could offer alternative sources of variation for breeding for resistance to these diseases.
Despite being an important crop in the Andean region during pre-Columbian times [23, 24] and despite its potential as a new crop for many areas with mild climates, there have been few molecular studies of S. muricatum, and few molecular tools have been developed for it. Neither the pepino nor its main wild relatives have been thoroughly studied at a genome-wide level. As of July 2015, only 126 nucleotide sequences had been deposited in the NCBI’s GenBank database, all of them resulting from a single study. In addition, there are few studies of molecular markers and their application in the pepino. Some of the previous studies used chloroplast DNA (cpDNA) restriction fragment length polymorphisms (RFLPs), AFLPs and gene sequence haplotypes, RAPDs and EST-SSRs derived from tomato to study diversity in the pepino and its wild relatives. Apart from these studies, an intra-specific low-density genetic map with SNPs obtained by sequencing a set of COSII markers was produced in the pepino wild relative S. caripense with the aim of mapping resistance to Phytophthora infestans.
High throughput sequencing of transcriptomes (RNA-Seq) has opened the way to studying the genetic and functional information in neglected crops and species. RNA-Seq is genome-independent and is especially useful for analyzing the transcriptome of species without complete genome information or a reference genome [26, 27], as is the case for the pepino and its wild relatives. In this context, RNA-Seq can be helpful for: (1) listing the transcripts and other RNAs from one or several tissues; (2) investigating the transcriptional structure of genes, splicing patterns, and gene isoforms; (3) studying post-transcriptional modifications and mutations; and (4) quantifying gene expression. Transcriptomic studies have provided a basis for examining the evolution of polyploidy in plants [29, 30], studying phylogenies in some families including the Solanaceae, comparing patterns associated with domestication, and developing markers en masse [33–35].
In the present work, we used Illumina paired-end sequencing technology to perform RNA-Seq of one modern cultivar of the pepino and of one accession of the pepino wild relative S. caripense. We obtained almost 111 million reads across both species. Our transcriptome analysis included de novo assembly, structural and functional annotation and comparison with the tomato and potato genomes [36, 37], which gave us the opportunity to establish a dated phylogeny of the pepino and related species. Candidate genes for agronomic traits of interest, mainly described in tomato, have also been identified. These genes offer a comparative view of patterns of selection during domestication, and will allow us to identify genes useful for the genetic improvement of the pepino. Another important goal is the discovery of markers suitable for high throughput genotyping (SSRs and SNPs). These gene-derived markers are functionally relevant in that they may correspond to changes in the expressed proteins; they are also an essential tool for the construction of genetic maps and can be used in marker-assisted selection. The rest of the dataset will serve as a public information platform for gene expression and genomics in the pepino and its related species, and will be particularly useful for future studies in pepino, potato, and tomato genomics and breeding. This work paves the way for broader genomic studies in pepino, a neglected species of great interest for future development and a reservoir of important genes for tomato and potato improvement.
Transcriptome sequencing (mRNA-Seq) output and assembly
Summary of raw and clean reads after processing for S. muricatum and S. caripense
Total raw reads: 58,327,154 × 2 (S. muricatum); 52,646,045 × 2 (S. caripense)
Total raw reads data size (Gb)
Total clean reads: 33,963,075 × 2 (S. muricatum); 36,228,181 × 2 (S. caripense)
Total clean reads data size (Gb)
Summary of the Solanum muricatum transcriptome assembly. Statistics are given after assembly (Transcripts) and after filtering by level of expression (Most expressed transcripts)
Functional annotation summary of the pepino sequences against protein databases. The most expressed transcripts were first annotated against the Swiss-Prot database. Transcripts without a match were then evaluated against the next database, ITAG2.4. Finally, transcripts still without a match were evaluated against the UniRef90 database
Columns: Number of transcripts; % of total
Rows: Annotated in Swiss-Prot; Annotated in ITAG2.4; Annotated in UniRef90; Total annotated in protein databases
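The sequential annotation described in the caption above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `cascade_annotate` and the representation of search results as dictionaries are hypothetical, and in practice the hits would come from a similarity search against each database rather than from hand-built mappings.

```python
def cascade_annotate(transcripts, databases):
    """Assign each transcript to the first database in which it has a hit.

    transcripts: iterable of transcript identifiers.
    databases: ordered list of (name, hits) pairs, where `hits` maps a
               transcript identifier to a description (hypothetical input
               format; real hits would come from a similarity search).
    Returns (annotations, unannotated).
    """
    annotations = {}
    remaining = list(transcripts)
    for name, hits in databases:
        still_unannotated = []
        for transcript in remaining:
            if transcript in hits:
                annotations[transcript] = (name, hits[transcript])
            else:
                still_unannotated.append(transcript)
        # Only transcripts without a match are passed on to the next database.
        remaining = still_unannotated
    return annotations, remaining
```

With this design a transcript is never counted twice, so the per-database counts add up to the total number of annotated transcripts.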
Using Blast2GO against the NR database, we recovered gene ontology (GO) terms and enzyme commission numbers (EC) for the most expressed transcripts or unigenes in S. muricatum. A total of 197,221 GO terms were assigned to 37,031 transcripts. The distribution of unigenes relative to the number of GOs to which they were assigned is shown in Additional file 5: Figure S2. Slightly more than half of the unigenes (50.7 %) have between 1 and 5 GO terms, and 12 % have more than 10 GO terms. The maximum number of GO terms annotated in a unigene was 45. Among all the GO terms extracted, 89,060 (45.2 %) belong to the molecular function class (MF), 59,856 (30.3 %) to biological process class (BP) and 48,305 (24.5 %) to cellular components class (CC).
In order to understand the function of the unigenes in pepino, a BLASTX search against the KEGG protein database with a cut-off e value of 1e−5 was performed. Out of the 75,832 transcripts, 16,027 were annotated in the KEGG pathway database and assigned to 144 unique pathways. These pathways include amino acid metabolism, sugar metabolism, fatty acid metabolism, as well as biosynthesis of secondary metabolites like flavonoids and terpenoids. Our results show that the three largest pathway groups were purine metabolism, starch and sucrose metabolism, and phenylalanine metabolism (see Additional file 7). Given that the pepino is largely a dessert fruit in which sugars and bioactive compounds are important for quality, we paid special attention to the pathways pertaining to starch and sucrose metabolism, and to biosynthesis of carotenoids, anthocyanins, and several vitamins. A considerable number of genes were related to relevant metabolic pathways, including starch and sucrose metabolism (map00500, 727 genes), carotenoid biosynthesis (map00906, 33 genes), anthocyanin biosynthesis (map00942, 31 genes), ascorbate and aldarate metabolism (map00053, 123 genes), vitamin B6 metabolism (map00750, 28 genes), retinol (vitamin A) metabolism (map00830, 89 genes), thiamine (vitamin B1) metabolism (map00730, 325 genes), riboflavin (vitamin B2) metabolism (map00740, 117 genes), and biotin (vitamin H) metabolism (map00780, 98 genes). Finally, we compared the number of genes assigned to every KEGG pathway in our analysis with the analogous genes assigned in the tomato and potato genomes. This comparison, like the other comparisons we made, revealed many similarities, consistent with the close relationship among the three species (Additional file 7). These data also suggest that the transcriptome is well represented. The number of genes annotated in the pepino was notably lower than in the tomato and potato in only a few pathways.
This is probably because our samples derive from mRNA of three tissues rather than from the whole genome, so some processes may not be properly represented. Other processes, by contrast, are better represented. The results of this comparison are presented in Additional file 7.
Candidate genes affecting traits of importance in different Solanaceae. Genes affecting inflorescence, fruit stripes, fruit shape, the anthocyanin pathway, the chlorogenic acid pathway, the saponin pathway, and sucrose accumulation are included. More information is given in the Candidate genes section of Material and Methods and in Additional file 8: Table S1
Gene - F-box protein
Gene - microtubule cytoskeleton organization
SNP in downstream-regulatory
Gene - Kinase
Gene - Hydroxylase
Gene - Acyltransferase
Gene - Glucosyltransferase
Gene - Anthocyanidin synthase
Gene - Dihydroflavonol 4-reductase
Gene - Flavanone 3-hydroxylase
Gene - Chalcone isomerase
Gene - Chalcone synthase
Gene - Chalcone synthase
Gene - Chalcone synthase
Gene - Acyltransferase
Chlorogenic acid pathway
Gene - 4-Coumarate-CoA ligase
Gene - Transferase
Gene - Transferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Glycosyltransferase
Gene - Acid invertase
The majority of the genes described in other related species are present in our assembled transcriptome (83.8 %). It should be noted that each of these genes is also present in S. caripense. In most cases, there are few differences in these sequences, and only nine are identical between the two species. These results are summarized in Additional file 8: Table S1. Interestingly, the greatest differences between the cultigen and the wild species were found in genes related to characters like fruit stripes, anthocyanins and chlorogenic acid synthesis. These differences may relate to characters targeted by artificial selection during domestication, and they also give an idea of the variability available in the wild species that may be utilized in pepino breeding.
Comparison with potato and tomato genomes
We generated gene model predictions by comparing our assembled transcriptome of S. muricatum with the tomato genome. The alignment of the unigenes to the tomato genomic DNA was performed using the est2genome software, and the ORF annotation was carried out using the ESTScan software. A large number (48,440) of the most expressed transcripts were predicted to have one ORF (63.9 %). In addition, we predicted the presence of introns in our unigenes. We found 130,528 introns in a total of 24,979 unigenes (32.9 %), which corresponds to an average of 5.2 introns per intron-containing unigene, with a maximum of 19. Knowledge of the positions of these intronic regions is particularly important for the discovery of SNPs and INDELs. The intron map allows us to discard variants located in the vicinity of an intron, because such a location would make it difficult to design primers to amplify these regions. The annotation results (ORFs, introns, descriptions, GO terms, orthologs, nucleotide-level variants and SSRs) are deposited in Additional file 4 in GFF3 format.
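To illustrate how an intron map can be used to discard variants near introns, the following Python sketch keeps only variants that lie at least a given distance from every predicted intron junction on a unigene. The function name, the 0-based coordinate convention and the 50 bp margin are assumptions made for illustration; they are not values reported in this study.

```python
from bisect import bisect_left


def far_from_introns(variant_positions, intron_junctions, margin=50):
    """Return the variants lying at least `margin` bp from every intron junction.

    variant_positions: iterable of 0-based positions on a unigene.
    intron_junctions: 0-based positions where a predicted intron interrupts
                      the transcript.
    margin: hypothetical minimum distance (bp) needed for primer design.
    """
    junctions = sorted(intron_junctions)
    kept = []
    for pos in variant_positions:
        i = bisect_left(junctions, pos)
        # Only the closest junction on each side can violate the margin.
        neighbours = junctions[max(i - 1, 0):i + 1]
        if all(abs(pos - j) >= margin for j in neighbours):
            kept.append(pos)
    return kept
```

Sorting the junctions once and using a binary search keeps the check fast even for unigenes with many introns.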
Molecular phylogeny among Solanaceae species
We used five genes for a phylogenetic study of the pepino and five other Solanaceae crops (potato, tomato, eggplant, capsicum pepper and tobacco). These genes were waxy or GBSSI [7, 51], SAMT, ADH, β-amylase and CesA. After confirming that these genes are represented in our transcriptome, we concatenated and aligned them using ClustalW. The total length of the sequence analyzed was 9407 bp including the five genes. Variation among the sequences was found at a total of 1809 positions, of which 507 were parsimony-informative, i.e., these sites contain at least two types of nucleotides, and at least two of them occur with a minimum frequency of two. Results of this alignment are presented in Additional file 9.
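The definition of a parsimony-informative site given above can be made concrete with a short Python sketch that counts variable and parsimony-informative columns in an alignment. The function name is hypothetical, and ignoring gaps and ambiguous bases is an assumption of this sketch, not necessarily the treatment used in the study.

```python
from collections import Counter


def count_sites(alignment):
    """Count variable and parsimony-informative columns in an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Gaps ('-') and ambiguous bases (e.g. 'N') are ignored, which is an
    assumption of this sketch.
    Returns (variable, informative).
    """
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if len(counts) < 2:
            continue  # constant site
        variable += 1
        # Informative: at least two nucleotide types, each seen at least twice.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative
```

A site where a single sequence differs from all the others is variable but not parsimony-informative, which is why the second count is always the smaller one.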
SSR and nucleotide-level variants discovery
Simple sequence repeat (SSR) statistics according to the type of motif, the percentage of each motif and the number of unigenes with SSRs. Complete information about these markers is shown in Additional file 10
Number of Di-SSR
Number of Tri-SSR
ACT, ACG, CCG
Number of Tetra-SSR
High throughput sequencing of both transcriptomes has made it possible to obtain a large collection of SNPs and INDELs. Variant calling was carried out using the default parameters recommended by the Freebayes software, which distinguishes true sequence variation from sequencing errors and from mutations introduced during cDNA synthesis. The application of several filters described in the Methods has also yielded markers of potentially high quality, suitable for use in high throughput genotyping platforms. Apart from this, the CAPS filter can be especially useful when other methods for SNP detection are not available.
Single nucleotide variants statistics for the S. muricatum and S. caripense transcriptomes
Solanum muricatum (SL)
Solanum caripense (EC-40)
S. muricatum (SL) vs. S. caripense (EC-40)
Single nucleotide polymorphism (SNPs) statistics. Type and number of transitions and transversions are shown for high quality SNPs identified in each species and between them
53,490 (62.9 %)
31,363 (36.9 %)
212 (0.3 %)
Distribution of pepino nucleotide-level variants on chromosomes of tomato and potato
We employed NGS technology for the first time to sequence the transcriptome of pepino and its related wild species S. caripense. Because there is no reference genome for these species, we used a de novo assembly method [45, 57]. Using this methodology, a large number of most expressed transcripts or unigenes (75,832) were obtained, with an average length of 704 bp. The number of unigenes obtained in this project is similar to or better than that obtained in previous studies using similar technologies, which supports the quality and potential utility of our work, both in sample preparation and in the assembly protocol [37, 38]. The high initial quality of the unigenes we have obtained is of fundamental importance for the remainder of the work.
The mRNA-Seq data allow the G/C content (the proportion of guanine and cytosine bases) of the unigenes to be estimated. Because G and C pair through three hydrogen bonds whereas A and T pair through two, the G/C base pair is considered more stable than the A/T base pair. Thus, during evolution, variation in the G/C content would accumulate more slowly (although this assertion has been contested). Accordingly, the G/C content, which varies markedly among different organisms, would be an indicator of closeness between species. The G/C contents obtained (41.7 % in S. muricatum and 42.5 % in S. caripense) were consistent with values found in other Solanum species. In particular, the tomato G/C content for cDNA reported in previous studies was 40.3 %, and for potato the value is similar (43.1 %). This suggests that the S. muricatum transcriptome is a typical example of a Solanaceae transcriptome, thereby raising the probability of successfully employing our data for broader comparative studies in this genus.
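For readers who wish to reproduce this kind of figure, a G/C percentage can be computed from a set of unigene sequences as in the Python sketch below. It pools all bases across sequences and counts only unambiguous bases; both choices are assumptions, since the text does not state whether the reported values are pooled or averaged per unigene.

```python
def gc_content(sequences):
    """Return the pooled G/C percentage over an iterable of nucleotide strings.

    Only unambiguous bases (A, C, G, T/U) enter the denominator; this is an
    assumption of the sketch.
    """
    gc = total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGTU")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total
```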
The assembled unigenes were functionally annotated in order to better understand the role of the genes represented. Knowledge of the functional role of the genes increases the probability of utilizing them for pepino breeding and indicates possible ways in which the wild relatives of pepino could be of interest in developing new varieties. The annotation percentages in the different protein databases were similar to those obtained in other similar projects [43–45]. Most of the sequences were annotated in Swiss-Prot. Swiss-Prot is a highly curated, highly cross-referenced and non-redundant database with a lower error rate than automatically created databases, which gives high annotation reliability. The percentage of protein annotations in the other databases (ITAG2.4 and UniRef90) was smaller, as expected. The GO annotation indicated that the majority of the unigenes were involved in molecular, cellular and biological processes. Within the molecular function category, the majority of the unigenes were assigned to different binding processes, hydrolase activity and catalytic activity. Within the cellular component category, the majority of the unigenes were assigned to cell and organelle structures (Fig. 3 and Additional file 6).
The KEGG pathway database is a resource for the systematic analysis of gene functions in terms of networks of genes and molecules in cells and their variants specific to particular organisms. In our case, this analysis included 144 pathways involving 16,027 unigenes, including pathways that are key to quality in a dessert fruit like pepino. For example, important pathways such as starch and sucrose metabolism, biosynthesis of carotenoids and anthocyanins, and metabolism of several vitamins (B1, B2, B6, H and A) were represented. Identifying changes in these genes and associating them with phenological differences will enable us to manage future breeding programs involving these species more efficiently. Perhaps because we used transcripts representing only three tissues (i.e., not the whole genome), the comparison of the KEGG pathways we identified with those obtained from the potato and tomato genomes shows that some processes were not very well represented in the pepino or S. caripense. Despite this, our transcriptome data are a good representation of the metabolic processes and, with further analysis of these pathway-related genes, will improve our understanding of the pepino features, some of them unique and others shared with the rest of the Solanaceae [23, 62]. The transcriptome data can thus contribute to expanding and enhancing the breeding resources for and with these species.
As a preliminary test of the application of our results to breeding programs, a set of candidate genes described in other Solanaceae were evaluated for their presence in the pepino transcriptome, together with the associated nucleotide changes in S. caripense. The selected genes were related either to characters associated with domestication or to characters with potential for enhancement through breeding. These preliminary tests revealed large differences in nucleotide changes between S. muricatum and S. caripense for some characters of interest, like anthocyanin and chlorogenic acid synthesis [63–65]. There are significant morphological and fruit composition differences between the pepino and wild S. caripense; for example, differences related to plant habit, leaf complexity, trailing habit and seediness of the fruits [13, 66]. Correspondingly, our data show many differences at the genomic level, including in genes that are of great interest for breeding. Thus, we think this foundational work will provide the basis for broader studies, such as those where an in-depth and accurate phenotypic characterization can be related to changes at the nucleotide level. These kinds of studies may be helpful for understanding the genetics of these key characters, for studying positive selection in the domestication process, and for establishing a breeding program [32, 67, 68]. For example, we have found concentrations of chlorogenic acid, a powerful bioactive molecule with application to human health as an antioxidant [53, 54], to be much higher in S. caripense than in the cultivated S. muricatum (unpublished data). The sequence differences found can be used as functional markers for marker-assisted breeding to transfer alleles from S. caripense to pepino.
In this regard, differentially expressed gene (DEG) analysis could help to clarify differences at the expression level across developmental stages and tissues, which may provide relevant information about pepino domestication. Furthermore, the information obtained may be used for tomato or potato improvement in the near future using modern gene editing technologies like CRISPR/Cas, or by transformation using cisgenic approaches.
Another issue to consider in a de novo assembly of a transcriptome is the genic structure. In this regard, we have generated gene model predictions by comparing our assembled transcriptome of S. muricatum with the tomato genome. Because these two species are very closely related [6, 7], this is a valid approach until a full genome sequence is available.
Other genes present in our assembled transcriptome have been used to study phylogenetic relationships within this large and economically important family (for example, waxy or GBSSI [7, 51], SAMT, ADH, β-amylase and CesA), and the sequence differences between pepino and other Solanaceae were used to elucidate its relationships. Further confidence in our results comes from the fact that they are consistent with previous studies such as Spooner et al., Wang et al. and Garzon-Martinez et al., and from the fact that the estimated divergence times are congruent with data deposited in TimeTree (a public knowledge-base of divergence times among organisms). Our data indicate that the divergence among all the Solanaceae studied here (pepino, tomato, potato, eggplant, pepper and tobacco) occurred in the last 24 million years. The pepino and the tomato-potato clade shared a common ancestor from which they diverged 9.26 Mya. Other divergence estimates indicate that the split between the eggplant, an African member of the family Solanaceae, and the American Solanum species occurred 14 Mya.
The total number of potential SSRs obtained was 1072 in 1049 unigenes; that is, approximately 1.4 % of the transcripts contain SSRs (Table 5). The number of SSRs is slightly lower than expected, or at least lower than that obtained in similar studies [73, 74]. This may be due to the application of stricter criteria in our study, the advantage being that the markers obtained should be of better quality. In any case, the number of markers is adequate to develop a high density genetic map, and for genetic diversity studies and marker-assisted breeding studies.
As stated above, tri-nucleotide repeats were the most common repeats in our transcriptome, accounting for almost 66 % of the SSRs identified. Tri-nucleotide repeats may be the most common because changes in their repeat number do not alter the reading frame, so mutations have a less dramatic effect. There is considerable evidence that genic SSRs have important functions. For example, it has been postulated that SSRs may affect chromatin organization, and they may also be related to the regulation of gene activity, recombination, and DNA replication. Genic SSR markers have several advantages over genomic SSRs because they are associated with coding sequences, and thus can be used as candidate genes to study association with phenotypic variation. Genic SSR markers can be useful for genetic diversity studies, as demonstrated for pepino when using tomato EST-SSRs, for the development of genetic maps and for fingerprinting commercial cultivars, breeding lines or landraces.
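As a rough illustration of how perfect di-, tri- and tetra-nucleotide SSRs can be located in a unigene, the Python sketch below uses regular expressions with a back-reference. The minimum repeat counts in `MIN_REPEATS` are hypothetical placeholders, not the criteria applied in this study, and the function is not the tool that was actually used.

```python
import re

# Hypothetical minimum repeat counts per motif length (not the study's criteria).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in `sequence`."""
    sequence = sequence.upper()
    hits = []
    for size, minimum in sorted(min_repeats.items()):
        # ([ACGT]{size}) captures a motif; \1{minimum-1,} requires the
        # remaining tandem copies of exactly that motif.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. AAA (homopolymer) or ATAT (really a di-nucleotide repeat).
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits
```

Positions are 0-based, and the repeat count is the number of complete motif copies in the matched run.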
By applying several bioinformatic approaches, we obtained a total of 11,735 SNPs and 766 INDELs in Solanum muricatum, and 30,668 SNPs and 1494 INDELs in S. caripense, as well as 84,972 SNPs and 4058 INDELs between the two species (interspecific) (Table 6). These nucleotide-level variants show that both clones have a considerable degree of heterozygosity, although the highest number of intraclone markers was obtained in S. caripense. This makes sense, given that S. caripense is an obligately allogamous wild species with a gametophytic self-incompatibility system [18, 78, 79].
The large number of SNP markers developed can be readily used in pepino research. SNP markers exhibit co-dominant inheritance and, owing to their abundance, are widely used for different applications. These applications include diversity studies, development of saturated molecular genetic and physical maps, identification of QTLs or genes controlling traits of economic importance, marker-assisted selection, and association mapping with genome-wide association studies (GWAS).
This study constitutes the first genomic resource for pepino, a cultigen closely related to tomato and potato [6, 62]. Such genomic studies are especially important because they promote the understanding of crop evolution in this group, as well as pepino enhancement. Furthermore, because the pepino is part of the tertiary gene pool of tomatoes and potatoes, these genomic studies provide a wide array of genomic information that may be useful for breeding in those groups as well. The high quality of the transcriptome presented here will enhance comparative studies within the genus Solanum, and will be useful for future annotations of the S. muricatum genome sequence. The detailed annotation provided in this work will facilitate the use of sequenced unigenes for gene discovery, in particular for traits of interest within pepino (such as soluble solids content, chlorogenic acid content and fruit size). In addition to the pepino, sequencing of the transcriptome of its sister wild relative S. caripense has allowed identification of a large number of molecular markers (SSRs and nucleotide-level variants) within each species, as well as between them. The filtering process applied in the search for these variants has facilitated the selection of the most suitable markers for high throughput genotyping platforms. Our results are an example of how high throughput sequencing technologies can contribute to knowledge of domesticates with no or limited genomic information, but for which closely related species with a whole genome sequence are available. Our assembled transcriptome, and the large collection of markers found, will enhance pepino breeding, facilitate molecular studies in this crop, and be useful for developing the first genetic map of the pepino. Ultimately, the genomic information obtained will be of interest for tomato and potato breeding and for studying genomic changes during evolution and crop domestication in these important crops.
RNA preparation, Illumina paired-end cDNA library construction and sequencing
Total RNA was extracted from each tissue using the TRI reagent (Sigma-Aldrich, St. Louis, USA). RNA integrity was confirmed by agarose electrophoresis, and RNA quantification was performed using a Nanodrop Spectrophotometer ND-1000 (Thermo Scientific, Wilmington, USA). For each of the two accessions, we combined equivalent amounts of RNA from each tissue into a single pool, giving two pools in total. A total of 10 μg of total RNA from each pool was sent to Macrogen Korea (Seoul, South Korea) for Illumina RNA-seq performed on a HiSeq 2000 sequencer (Illumina, San Diego, USA).
The cDNA library was constructed according to the manufacturer’s instructions (Illumina/Hiseq-2000 RNA-seq) by Macrogen Korea. Essentially, the mRNA molecules containing poly (A) were purified using Sera-mag Magnetic Oligo (dT) Beads from the RNA samples. A fragmentation buffer was added to break the mRNA into small fragments. Using these fragments as templates, the first strand of cDNA was synthesized. The second strand of cDNA was synthesized using the buffers containing dNTPs, RNase H, and DNA polymerase I. The synthesized cDNA was purified and connected with the sequencing adapters. Finally, a range of cDNA fragments (200 ± 25 bp) were excised from an agarose gel using a gel extraction kit. Then, the library was sequenced using the Illumina/Hiseq-2000 RNA-seq. These raw sequences are available at the NCBI Sequence Read Archive (SRA) as stated in the section titled, “Availability of Data and Material”.
DNA sequence processing and de novo transcriptome assembly
In the case of S. muricatum Sweet Long, we found two overrepresented sequences after initial quality filtering. A Blast search against public databases (NCBI-GenBank) showed that these sequences belong to the Pepino mosaic virus (PepMV), although the plants were asymptomatic. These reads were eliminated using Bowtie2. We did not find other overrepresented sequences that would indicate the presence of further contaminants.
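The removal of viral reads can be illustrated with a short sketch. It assumes, since the text does not give the details, that the reads were aligned against a PepMV reference and that reads left unaligned were retained. The SAM records below are invented, the reference name `PepMV` is hypothetical, and the sketch handles single reads only (for paired-end data, a pair would normally be discarded when either mate aligns).

```python
def is_unmapped(sam_line: str) -> bool:
    """Return True if the SAM record carries the 'segment unmapped' flag (0x4)."""
    flag = int(sam_line.split("\t")[1])
    return bool(flag & 0x4)


def keep_uncontaminated(sam_lines):
    """Yield the names of reads that did NOT align to the contaminant reference."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        if is_unmapped(line):
            yield line.split("\t")[0]


# Invented example: read1 aligns to the (hypothetical) PepMV reference, read2 does not.
example_sam = [
    "@SQ\tSN:PepMV\tLN:6400",
    "read1\t0\tPepMV\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
clean_ids = list(keep_uncontaminated(example_sam))
```

Only the names in `clean_ids` would be carried forward to the assembly in this simplified scheme.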
High-quality reads are required for a reliable assembly. We therefore performed the following steps: trimming of adapter contamination, removal of reads containing “N”, and trimming of low-quality nucleotides (Phred quality below 20) using NGS_CRUMBS (http://bioinf.comav.upv.es/ngs_crumbs).
We used the Trinity software [26, 27] to build the primary assembly. This first assembly was post-processed in the following steps. First, we reduced redundancy using CAP3. Then we removed low-complexity transcripts based on their DUST score. Next, we split some of the subcomponents into new subclusters, using Blast results and the transitivity property, as a way to improve the results previously obtained. Under this criterion, if one transcript is similar to a second, and the second is similar to a third, and so on, all of them are treated as the same unigene and can be merged. For this purpose we used a custom script. Finally, we removed low-expression transcripts using RSEM (RNA-Seq by Expectation-Maximization). From the final assembly, we made a subset by selecting only the most highly expressed transcript from each Trinity transcript cluster. A detailed explanation of the post-processing steps is given in Additional file 12.
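The read-cleaning criteria can be sketched as follows. This is an illustration only and does not reproduce the NGS_CRUMBS implementation: it assumes Phred+33 encoded qualities, trims only from the 3' end, and applies the "N" filter before quality trimming. The function names and the `min_len` parameter are hypothetical.

```python
def phred33(qual: str):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str, min_q: int = 20, min_len: int = 1):
    """Return the cleaned (seq, qual) pair, or None if the read is discarded."""
    if "N" in seq.upper():  # discard reads containing undetermined bases
        return None
    seq, qual = trim_3prime(seq, qual, min_q)
    if len(seq) < min_len:  # nothing useful left after trimming
        return None
    return seq, qual


# 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2 in Phred+33.
kept = clean_read("ACGTAC", "IIII##")
dropped = clean_read("ACNTAC", "IIIIII")
```

Here `kept` retains the first four bases, while `dropped` is `None` because the read contains an "N".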
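The transitivity criterion described above corresponds to finding connected components in a graph whose edges are pairwise similarities. The custom script used in this work is not reproduced here; the following is a minimal sketch using a union-find structure, with invented transcript names and expression values. For illustration, the selection of the most highly expressed transcript is applied to the merged clusters rather than to the Trinity clusters.

```python
from collections import defaultdict


def cluster_transitively(transcripts, similar_pairs):
    """Group transcripts so that any chain of pairwise similarities ends up in one cluster."""
    parent = {t: t for t in transcripts}

    def find(x):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for t in transcripts:
        clusters[find(t)].append(t)
    return sorted(sorted(members) for members in clusters.values())


def most_expressed(clusters, expression):
    """Pick the transcript with the highest expression value from each cluster."""
    return [max(members, key=lambda t: expression[t]) for members in clusters]


transcripts = ["t1", "t2", "t3", "t4", "t5"]
similar_pairs = [("t1", "t2"), ("t2", "t3"), ("t4", "t5")]  # e.g. from Blast hits
clusters = cluster_transitively(transcripts, similar_pairs)

expression = {"t1": 5.0, "t2": 12.5, "t3": 1.0, "t4": 0.8, "t5": 3.2}  # invented values
representatives = most_expressed(clusters, expression)
```

In this toy case t1 and t3 are never compared directly, yet they fall into the same unigene because both are similar to t2.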
Structural and functional annotation
Annotation of the assembled transcript sequences was performed using the bioinformatic application ngs_backbone, based on the BLASTX algorithm, against different databases. The search order prioritized manually curated annotation databases. Accordingly, the databases used, in order, were Swiss-Prot, ITAG2.4, and UniRef90. This analysis was performed in February 2015. The first search compared all transcripts with the first database, the second compared the transcripts left unmatched by the first, and so on. A typical Blast cut-off e-value of 1e−20 was used; other details of this analysis are shown in Additional file 12.
Additionally, we performed a functional classification of the transcripts following the Gene Ontology (GO) scheme using Blast2GO. This analysis comprises three steps: (1) sequence alignment via Blastx against the NR (Non Redundant) database (cut-off e-value of 1e−20), (2) gene ontology mapping, and (3) functional annotation, including molecular functions, biological processes, and cellular components. To summarize the functional information of our pepino transcriptome, we applied a plant-specific GO slim. When possible, Blast2GO also assigns an Enzyme Commission number (EC number). KEGG pathways were retrieved from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database (version 73.0, January 1, 2015). This KEGG analysis includes a collection of manually drawn pathway maps representing experimental knowledge on metabolism and various other functions of the cell and the organism.
Taking the pepino transcriptome as a reference database, we evaluated the sequences of several genes associated with breeding traits of interest found in other related species. In total, we selected 12 genes related to fruit shape, two related to inflorescence type, 11 involved in the anthocyanin synthesis route, 13 related to the synthesis of saponins, four involved in the chlorogenic acid synthesis pathway, one related to sucrose accumulation and one related to fruit stripes. Some of these genes belong to gene families; consequently, we evaluated the principal gene and the rest of its family. The total number of sequences evaluated was 115. Descriptions of the genes and their features are shown in Table 4 and in Additional file 8: Table S1.
Using Blastn (cut-off of 1e−60), these genes were compared with pepino unigenes to determine their presence or absence in our assembled transcriptome. Once identified in our transcriptome, they were compared with the transcripts of S. caripense in order to detect nucleotide variants between these two species.
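The sequential search strategy can be sketched as a cascade in which each database only receives the transcripts that the previous ones left unannotated. This is not the ngs_backbone code; the data structure, the function name and all identifiers in the example are hypothetical, and the best hit per transcript is assumed to have been extracted beforehand.

```python
E_VALUE_CUTOFF = 1e-20
DATABASE_ORDER = ["Swiss-Prot", "ITAG2.4", "UniRef90"]


def annotate_in_cascade(transcripts, hits_by_db, order=DATABASE_ORDER, cutoff=E_VALUE_CUTOFF):
    """Annotate each transcript with the first database (in order) giving a significant hit.

    hits_by_db maps database name -> {transcript: (subject_id, e_value)} for the best hit.
    Returns (annotation, unannotated transcripts).
    """
    annotation = {}
    pending = list(transcripts)
    for db in order:
        still_pending = []
        for t in pending:
            hit = hits_by_db.get(db, {}).get(t)
            if hit is not None and hit[1] <= cutoff:
                annotation[t] = (db, hit[0])
            else:
                still_pending.append(t)  # passed on to the next database
        pending = still_pending
    return annotation, pending


# Invented best hits for three hypothetical unigenes.
hits_by_db = {
    "Swiss-Prot": {"unigene_1": ("sp_hit_A", 1e-50), "unigene_2": ("sp_hit_B", 1e-5)},
    "ITAG2.4": {"unigene_2": ("itag_hit_C", 1e-30)},
    "UniRef90": {"unigene_3": ("uniref_hit_D", 1e-10)},
}
annotation, unannotated = annotate_in_cascade(
    ["unigene_1", "unigene_2", "unigene_3"], hits_by_db
)
```

In the example, unigene_2 fails the cut-off in Swiss-Prot and is therefore annotated from ITAG2.4, whereas unigene_3 has no hit below the cut-off in any database and remains unannotated.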
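The presence or absence call can be illustrated with a short parser. It assumes that the Blastn results are available in the standard 12-column tabular format (query, subject, identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score); the output format actually used is not stated in the text, and the gene and unigene names below are invented.

```python
def genes_present(blast_lines, candidate_genes, cutoff=1e-60):
    """Return {gene: True/False} according to whether any hit passes the e-value cut-off."""
    found = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        query, e_value = fields[0], float(fields[10])
        if e_value <= cutoff:
            found.add(query)
    return {gene: gene in found for gene in candidate_genes}


# Invented tabular hits: geneA has a strong hit, geneB only a weak one, geneC none.
blast_lines = [
    "geneA\tunigene_001\t97.5\t800\t20\t0\t1\t800\t15\t814\t1e-180\t1200",
    "geneB\tunigene_002\t85.0\t120\t18\t0\t1\t120\t40\t159\t2e-25\t150",
]
presence = genes_present(blast_lines, ["geneA", "geneB", "geneC"])
```

With the 1e−60 cut-off, only geneA would be scored as present in the transcriptome in this toy example.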
Comparison with tomato and potato genomes
All of the most highly expressed transcripts were compared to the S. tuberosum and S. lycopersicum genomes using Blastn (cut-off value of 1e−20) in order to obtain the physical position of our assembled sequences. Gene model prediction was performed using the Est2genome software [93, 94], which allows EST sequences to be aligned with genomic DNA sequences with high efficiency. Gene models were predicted by sequence homology with the tomato genome. Additionally, we used the open reading frame detector ESTScan for annotation of ORFs.
Circos, software that allows visualization of data and information in a circular layout, was used to represent our sequences over the tomato and potato reference genomes; this enabled visual estimation of the distribution of our coding sequences.
Molecular phylogeny between Solanaceae species
Using sequence data available in databases, we chose five nuclear protein-coding genes to investigate the phylogenetic relationships of the pepino with five of the most important Solanaceae crops (potato, tomato, eggplant, pepper and tobacco). These genes were: (1) the widely used granule-bound starch synthase gene (waxy or GBSSI) [7, 51], (2) the salicylic acid methyltransferase gene (SAMT), (3) the alcohol dehydrogenase gene (ADH), (4) the β-amylase gene, and (5) the cellulose synthase gene (CesA). Once isolated, the genes were concatenated one after the other and aligned using ClustalW2, a multiple sequence alignment program. The resulting alignment file was used to build a phylogenetic tree using the maximum likelihood method with 500 bootstrap replications in MEGA6. Divergence times were estimated with the same program, and the tomato/potato split (5.1–7.3 million years ago) was used for time calibration.
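Two of the operations mentioned above, concatenation of genes per species and bootstrap resampling, can be illustrated with a toy example. The sketch does not reproduce ClustalW2 or MEGA6: the sequences are invented, are treated as already aligned, and only the resampling of alignment columns with replacement is shown (tree inference for each replicate is omitted). Function and gene names are hypothetical.

```python
import random


def concatenate_genes(gene_alignments, gene_order):
    """Join the sequences of each species across genes, in a fixed gene order."""
    species = gene_alignments[gene_order[0]].keys()
    return {sp: "".join(gene_alignments[g][sp] for g in gene_order) for sp in species}


def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same sampled columns are used for every species, so columns stay intact.
    return {sp: "".join(seq[c] for c in cols) for sp, seq in alignment.items()}


# Invented, pre-aligned toy sequences for two hypothetical genes.
toy_genes = {
    "gene_1": {"pepino": "ATGGC", "tomato": "ATGGT", "potato": "ATGAC"},
    "gene_2": {"pepino": "CCTA", "tomato": "CCTG", "potato": "CTTA"},
}
supermatrix = concatenate_genes(toy_genes, ["gene_1", "gene_2"])

rng = random.Random(0)  # fixed seed so the resampling is repeatable
replicates = [bootstrap_replicate(supermatrix, rng) for _ in range(500)]
```

Each of the 500 replicates has the same length as the concatenated alignment; the proportion of replicate trees that recover a given clade gives its bootstrap support.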
SSR and nucleotide-level variants discovery
Reads of S. caripense were mapped to the reference transcriptome of S. muricatum. Mining of SSRs was carried out using the Sputnik software, which is specifically designed for this purpose. Once the contigs with SSRs were isolated, they were filtered by quality, closeness to introns, number of repetitions and position in the tomato genome.
Calling of nucleotide-level variants (SNPs and INDELs) was performed by comparing the assembled transcriptome of S. muricatum with the clean reads of both species. We mapped the reads with Bowtie2. For SNP and INDEL calling we used Freebayes. Several filters, shown in Additional file 11, were applied in order to maximize the likelihood of successful validation and the future use of the variants in high throughput genotyping platforms. First, filters IV0, IV1 and IV2 were used to select the variants within and between the two species. The filter vks was used to separate authentic SNPs from INDELs. Other filters were used to optimize the variants for future use in high throughput genotyping platforms (Additional file 8: Table S1). Circos was also used to display the density (variants per Mb) and distribution of all these markers over both reference genomes.
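The idea behind SSR mining can be shown with a simple regular-expression search. This is not the algorithm implemented in Sputnik, and the minimum repeat numbers below are hypothetical thresholds chosen only for the example, since the values used in this work are not given in this section.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(unit: str) -> bool:
    """True if the motif is not itself a repetition of a shorter motif (e.g. 'ATAT', 'AA')."""
    n = len(unit)
    return not any(n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, number_of_repeats) for perfect SSRs."""
    seq = seq.upper()
    ssrs = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(2)
            if not is_primitive(unit):
                continue  # homopolymers and motifs built from shorter motifs
            ssrs.append((match.start(), unit, len(match.group(1)) // unit_len))
    return sorted(ssrs)


# Invented sequence containing (AT)7 and (GAA)5.
toy_sequence = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
ssrs = find_ssrs(toy_sequence)
```

The start positions returned are 0-based offsets within the contig; the subsequent filters (quality, closeness to introns, position in the tomato genome) would be applied to this list.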
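Two of the steps above lend themselves to a brief sketch: separating SNPs from INDELs, and computing the density of variants per Mb that is displayed with Circos. The filters IV0, IV1, IV2 and vks are not reproduced here; the length-based classification rule, the function names and the chromosome labels are assumptions made for illustration, and positions are taken to be 1-based.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from the lengths of its reference and alternative alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions of equal length


def density_per_mb(variants, bin_size=1_000_000):
    """Count variants in consecutive bins of bin_size bases along each chromosome."""
    counts = Counter()
    for chrom, pos in variants:
        counts[(chrom, (pos - 1) // bin_size)] += 1  # 1-based position -> 0-based bin
    return dict(counts)


# Invented variant positions on two hypothetical chromosomes.
variants = [("ch01", 10), ("ch01", 999_999), ("ch01", 1_000_001), ("ch02", 2_500_000)]
density = density_per_mb(variants)
kinds = [classify_variant("A", "G"), classify_variant("AT", "A"), classify_variant("AC", "GT")]
```

Each key in `density` identifies a chromosome and a 1 Mb window, which is the kind of table that can be supplied to a plotting tool as a histogram track.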
Availability of data and materials
The raw reads data are available at NCBI Sequence Read Archive (SRA) with accession number SRS1052501 (S. muricatum) and SRS1054035 (S. caripense), available at http://www.ncbi.nlm.nih.gov. The transcriptome assembly of S. muricatum is deposited into the NCBI Transcriptome Shotgun Assembly (TSA) repository within the bioproject number PRJNA294064.
AFLP, amplified fragment length polymorphism
COS, conserved ortholog set
CRISPR, clustered regularly interspaced short palindromic repeats
EC, Enzyme commission
EST, expressed sequence tag
KEGG, Kyoto Encyclopedia of Genes and Genomes
MYA, million years ago
ORF, open reading frame
QTL, quantitative trait locus
RAPD, random amplification of polymorphic DNA
SNP, single nucleotide polymorphism
SSC, soluble solid content
SSR, simple sequence repeat
The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the PAB (Andalusian Bioinformatics Platform) center located at the SCBI of the University of Malaga (www.scbi.uma.es). Pietro Gramazio is grateful to the Universitat Politècnica de València for a pre-doctoral (Programa FPI de la UPV-Subprograma 1/2013 call) contract.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Anderson GJ, Jansen RK, Kim Y. The origin and relationships of the pepino, Solanum muricatum (Solanaceae): DNA restriction fragment evidence. Econ Bot. 1996;50:369–80.
- Anderson GJ, Martine CT, Prohens J, Nuez F. Solanum perlongistylum and S. catilliflorum, new endemic Peruvian species of Solanum, Section Basarthrum, are close relatives of the domesticated pepino, S. muricatum. Novon. 2006;16:161–7.
- Rodríguez-Burruezo A, Prohens J, Fita AM. Breeding strategies for improving the performance and fruit quality of the pepino (Solanum muricatum): A model for the enhancement of underutilized exotic fruits. Food Res Int. 2011;44:1927–35.
- Yalçin H. Effect of ripening period on composition of pepino (Solanum muricatum) fruit grown in Turkey. Afr J Biotechnol. 2010;9:3901–3.
- Abouelnasr H, Li Y-Y, Zhang Z-Y, Liu J-Y, Li S-F, Li D-W, Yu J-L, McBeath JH, Han C-G. First Report of Potato Virus H on Solanum muricatum in China. Plant Dis. 2014;98:1016.
- Spooner DM, Anderson GJ, Jansen RK. Chloroplast DNA evidence for the interrelationships of tomatoes, potatoes, and pepinos (Solanaceae). Am J Bot. 1993;80:676–88.
- Sarkinen T, Bohs L, Olmstead RG, Knapp S. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree. BMC Evol Biol. 2013;13:214.
- Nakitandwe J, Trognitz FCH, Trognitz BR. Genetic mapping of Solanum caripense, a wild relative of pepino dulce, tomato and potato, and a genetic resource for resistance to potato late blight. In: VI International Solanaceae Conference: Genomics Meets Biodiversity 745. 2006. p. 333–42.
- Sakomoto K, Taguchi T. Regeneration of intergeneric somatic hybrid plants between Lycopersicon esculentum and Solanum muricatum. Theor Appl Genet. 1991;81:509–13.
- Bernardello LM, Anderson GJ. Karyotypic studies in Solanum section Basarthrum (Solanaceae). Am J Bot. 1990;77:420–31.
- Arumuganathan K, Earle ED. Nuclear DNA content of some important plant species. Plant Mol Biol Report. 2004;9:208–18.
- Spooner DM, Rodríguez F, Polgár Z, Ballard HE, Jansky SH. Genomic origins of potato polyploids: GBSSI gene sequencing data. Crop Sci. 2008;48(Supplement to crop science):27–36.
- Herraiz FJ, Vilanova S, Andújar I, Torrent D, Plazas M, Gramazio P, Prohens J. Morphological and molecular characterization of local varieties, modern cultivars and wild relatives of an emerging vegetable crop, the pepino (Solanum muricatum), provides insight into its diversity, relationships and breeding history. Euphytica. 2015;206:301–18.
- Trognitz FC, Trognitz BR. Survey of resistance gene analogs in Solanum caripense, a relative of potato and tomato, and update on R gene genealogy. Mol Genet Genomics. 2005;274:595–605.
- Hajjar R, Hodgkin T. The use of wild relatives in crop improvement: a survey of developments over the last 20 years. Euphytica. 2007;156:1–13.
- Doebley JF, Gaut BS, Smith BD. The molecular genetics of crop domestication. Cell. 2006;127:1309–21.
- Blanca JM, Prohens J, Anderson GJ, Zuriaga E, Canizares J, Nuez F. AFLP and DNA sequence variation in an Andean domesticate, pepino (Solanum muricatum, Solanaceae): implications for evolution and domestication. Am J Bot. 2007;94:1219–29.
- Rodríguez-Burruezo A, Prohens J, Nuez F. Wild relatives can contribute to the improvement of fruit quality in pepino (Solanum muricatum). Euphytica. 2003;129:311–8.
- Herraiz FJ, Villaño D, Plazas M, Vilanova S, Ferreres F, Prohens J, Moreno DA. Phenolic profile and biological activities of the pepino (Solanum muricatum) fruit and its wild relative S. caripense. Int J Mol Sci. 2016;17:394.
- Leiva-Brondo M, Prohens J, Nuez F. Characterization of pepino accessions and hybrids resistant to Tomato mosaic virus (ToMV). J Food Agric Env. 2006;4:138.
- Nakitandwe J, Trognitz F, Trognitz B. Reliable allele detection using SNP-based PCR primers containing Locked Nucleic Acid: application in genetic mapping. Plant Methods. 2007;3:2.
- Andrivon D. The origin of Phytophthora infestans populations present in Europe in the 1840s: a critical review of historical and scientific evidence. Plant Pathol. 1996;45:1027–35.
- Prohens J, Ruiz JJ, Nuez F. The pepino (Solanum muricatum, Solanaceae): A “new” crop with a history. Econ Bot. 1996;50:355–68.
- Heiser CB. Origin and Variability of the Pepino (Solanum Muricatum). In: Preliminary Report. 1964.
- Ahmad H, Khan A, Muhammad K, Nadeem MS, Ahmad W, Iqbal S, Nosheen A, Akbar N, Ahmad I, Que Y. Morphogenetic study of pepino and other members of solanaceae family. Am J Plant Sci. 2014;5:3761.
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.
- McKain MR, Wickett N, Zhang Y, Ayyampalayam S, McCombie WR, Chase MW, Pires JC, de Pamphilis CW, Leebens-Mack J. Phylogenomic analysis of transcriptome data elucidates co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae (Asparagaceae). Am J Bot. 2012;99:397–406.
- Barker MS, Vogel H, Schranz ME. Paleopolyploidy in the Brassicales: analyses of the Cleome transcriptome elucidate the history of genome duplications in Arabidopsis and other Brassicales. Genome Biol Evol. 2009;1:391–9.
- Rensink W, Lee Y, Liu J, Iobst S, Ouyang S, Buell CR. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts. BMC Genomics. 2005;6:124.
- Koenig D, Jimenez-Gomez JM, Kimura S, Fulop D, Chitwood DH, Headland LR, Kumar R, Covington MF, Devisetty UK, Tat A V, Tohge T, Bolger A, Schneeberger K, Ossowski S, Lanz C, Xiong G, Taylor-Teeples M, Brady SM, Pauly M, Weigel D, Usadel B, Fernie AR, Peng J, Sinha NR, Maloof JN. Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. Proc Natl Acad Sci U S A. 2013;110:E2655–62.
- Blanca JM, Cañizares J, Ziarsolo P, Esteras C, Mir G, Nuez F, Garcia-Mas J, Picó MB. Melon transcriptome characterization: Simple sequence repeats and single nucleotide polymorphisms discovery for high throughput genotyping across the species. Plant Genome. 2011;4:118–31.
- Blanca J, Canizares J, Roig C, Ziarsolo P, Nuez F, Pico B. Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics. 2011;12:104.
- Howe GT, Yu J, Knaus B, Cronn R, Kolpak S, Dolan P, Lorenz WW, Dean JF. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation. BMC Genomics. 2013;14:137.
- Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012;485:635–41.
- Potato Genome Sequencing Consortium. Genome sequence and analysis of the tuber crop potato. Nature. 2011;475:189–95.
- Anderson GJ, Jansen RK. Biosystematic and molecular systematic studies of Solanum section Basarthrum and the origin and relationships of the pepino (S. muricatum). In: Proceedings of the VI Congreso Latinoamericano de botanica: Mar del Plata, Argentina. 1994. p. 2–8.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
- Swiss Prot [http://web.expasy.org/docs/swiss-prot_guideline.html]. Accessed 29 Apr 2016.
- SGN release version ITAG2.4 [ftp://ftp.sgn.cornell.edu/tomato_genome/annotation/]. Accessed 29 Apr 2016.
- Uniref [http://www.ebi.ac.uk/uniprot/database/download.html]. Accessed 29 Apr 2016.
- Wei D-D, Chen E-H, Ding T-B, Chen S-C, Dou W, Wang J-J. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences. PLoS One. 2013;8:e80046.
- Li D, Deng Z, Qin B, Liu X, Men Z. De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.). BMC Genomics. 2012;13:192.
- Lulin H, Xiao Y, Pei S, Wen T, Shangqin H. The first Illumina-based de novo transcriptome sequencing and analysis of safflower flowers. PLoS One. 2012;7:e38653.
- Mitraki A, Barge A, Chroboczek J, Andrieu JP, Gagnon J, Ruigrok RWH. Nomenclature committee of the international union of biochemistry and molecular biology (NC-IUBMB). Eur J Biochem. 1999;264:610–50.
- Sierro N, Battey JN, Ouadi S, Bovet L, Goepfert S, Bakaher N, Peitsch MC, Ivanov N V. Reference genomes and transcriptomes of Nicotiana sylvestris and Nicotiana tomentosiformis. Genome Biol. 2013;14:R60.
- Garzon-Martinez GA, Zhu ZI, Landsman D, Barrero LS, Marino-Ramirez L. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction. BMC Genomics. 2012;13:151.
- Wang L, Li J, Zhao J, He C. Evolutionary developmental genetics of fruit morphological variation within the Solanaceae. Front Plant Sci. 2015;6:248.
- Iseli C, Jongeneel CV, Bucher P. ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol. 1999;99:138–48.
- Peralta IE, Spooner DM. Granule-bound starch synthase (GBSSI) gene phylogeny of wild tomatoes (Solanum L. section Lycopersicon [Mill.] Wettst. subsection Lycopersicon). Am J Bot. 2001;88:1888–902.
- Martins TR, Barkman TJ, Smith JF. Reconstruction of Solanaceae phylogeny using the nuclear gene SAMT. Syst Bot. 2005;30:435–47.
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
- Wang Y, Diehl A, Wu F, Vrebalov J, Giovannoni J, Siepel A, Tanksley SD. Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics. 2008;180:391–408.
- Garrison E. FreeBayes. In: Marth Lab. 2010.
- Collins DW, Jukes TH. Rates of transition and transversion in coding sequences since the human-rodent divergence. Genomics. 1994;20:386–96.
- Xie F, Burklew CE, Yang Y, Liu M, Xiao P, Zhang B, Qiu D. De novo sequencing and a comprehensive analysis of purple sweet potato (Ipomoea batatas L.) transcriptome. Planta. 2012;236:101–13.
- Mooers AØ, Holmes EC. The evolution of base composition and phylogenetic inference. Trends Ecol Evol. 2000;15:365–9.
- Aoki K, Yano K, Suzuki A, Kawamura S, Sakurai N, Suda K, Kurabayashi A, Suzuki T, Tsugane T, Watanabe M, Ooga K, Torii M, Narita T, Shin-I T, Kohara Y, Yamamoto N, Takahashi H, Watanabe Y, Egusa M, Kodama M, Ichinose Y, Kikuchi M, Fukushima S, Okabe A, Arie T, Sato Y, Yazawa K, Satoh S, Omura T, Ezura H, et al. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics. BMC Genomics. 2010;11:210.
- Crookshanks M, Emmersen J, Welinder KG, Nielsen KL. The potato tuber transcriptome: analysis of 6077 expressed sequence tags. FEBS Lett. 2001;506:123–6.
- Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.
- Lester RN. Evolutionary relationships of tomato, potato, pepino, and wild species of Lycopersicon and Solanum. In: Hawkes JG, Lester RN, Nee M, Estrad N, editors. Solanaceae III Taxonomy, Chem Evol Kew Linn Soc London. 1991. p. 283–301.
- Butelli E, Titta L, Giorgio M, Mock H-P, Matros A, Peterek S, Schijlen EGWM, Hall RD, Bovy AG, Luo J, Martin C. Enrichment of tomato fruit with health-promoting anthocyanins by expression of select transcription factors. Nat Biotech. 2008;26:1301–8.
- Clé C, Hill LM, Niggeweg R, Martin CR, Guisez Y, Prinsen E, Jansen MAK. Modulation of chlorogenic acid biosynthesis in Solanum lycopersicum; consequences for phenolic accumulation and UV-tolerance. Phytochemistry. 2008;69:2149–56.
- Niggeweg R, Michael AJ, Martin C. Engineering plants with increased levels of the antioxidant chlorogenic acid. Nat Biotechnol. 2004;22:746–54.
- Prohens J, Sánchez MC, Rodríguez-Burruezo A, Cámara M, Torija E, Nuez F. Morphological and physico-chemical characteristics of fruits of pepino (Solanum muricatum), wild relatives (S. caripense and S. tabanoense) and interspecific hybrids. Implications in pepino breeding. Eur J Hortic Sci. 2005;70:224.
- Blanca J, Montero-Pau J, Sauvage C, Bauchet G, Illa E, Díez MJ, Francis D, Causse M, van der Knaap E, Cañizares J. Genomic variation in tomato, from wild ancestors to contemporary breeding accessions. BMC Genomics. 2015;16:1–19.
- Rong J, Lammers Y, Strasburg JL, Schidlo NS, Ariyurek Y, de Jong TJ, Klinkhamer PGL, Smulders MJM, Vrieling K. New insights into domestication of carrot from root transcriptome analyses. BMC Genomics. 2014;15:895.
- Swanson-Wagner R, Briskine R, Schaefer R, Hufford MB, Ross-Ibarra J, Myers CL, Tiffin P, Springer NM. Reshaping of the maize transcriptome by domestication. Proc Natl Acad Sci. 2012;109(29):11878–83.
- Feng Z, Zhang B, Ding W, Liu X, Yang D-L, Wei P, Cao F, Zhu S, Zhang F, Mao Y. Efficient genome editing in plants using a CRISPR/Cas system. Cell Res. 2013;23:1229–32.
- Park T, Vleeshouwers V, Jacobsen E, Van Der Vossen E, Visser RGF. Molecular breeding for resistance to Phytophthora infestans (Mont.) de Bary in potato (Solanum tuberosum L.): a perspective of cisgenesis. Plant Breed. 2009;128:109–17.
- Hedges SB, Dudley J, Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22:2971–2.
- Zhai L, Xu L, Wang Y, Cheng H, Chen Y, Gong Y, Liu L. Novel and useful genic-SSR markers from de novo transcriptome sequencing of radish (Raphanus sativus L.). Mol Breed. 2014;33:611–24.
- Ahn Y-K, Tripathi S, Kim J-H, Cho Y-I, Lee H-E, Kim D-S, Woo J-G, Yoon M-K. Microsatellite marker information from high-throughput next-generation sequence data of Capsicum annuum varieties Mandarin and Blackcluster. Sci Hortic. 2014;170:123–30.
- Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000;10:72–80.
- Li Y, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol. 2002;11:2453–65.
- Varshney RK, Graner A, Sorrells ME. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 2005;23:48–55.
- Anderson GJ. The variation and evolution of selected species of Solanum section Basarthrum. Brittonia. 1975;27:209–22.
- Murray BG, Hammett KRW, Grigg FDW. Seed set and breeding system in the pepino Solanum muricatum Ait., Solanaceae. Sci Hortic (Amsterdam). 1992;49:83–92.
- Perez-de-Castro AM, Vilanova S, Canizares J, Pascual L, Blanca JM, Diez MJ, Prohens J, Pico B. Application of genomic tools in plant breeding. Curr Genomics. 2012;13:179–95.
- Ruiz JJ, Prohens J, Nuez F. “Sweet Round” and “Sweet Long”: Two pepino cultivars for Mediterranean climates. HortSci. 1997;32:751–2.
- FastQC [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/]. Accessed 29 Apr 2016.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
- Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
- Blanca JM, Pascual L, Ziarsolo P, Nuez F, Cañizares J. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 2011;12:1–8.
- Conesa A, Gotz S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;2008:619832.
- Lippman ZB, Cohen O, Alvarez JP, Abu-Abied M, Pekker I, Paran I, Eshed Y, Zamir D. The making of a compound inflorescence in tomato and related nightshades. PLoS Biol. 2008;6:e288.
- Zhang Y, Hu Z, Chu G, Huang C, Tian S, Zhao Z, Chen G. Anthocyanin accumulation and molecular analysis of anthocyanin biosynthesis-associated genes in eggplant (Solanum melongena L.). J Agric Food Chem. 2014;62:2906–12.
- Kohara A, Nakajima C, Hashimoto K, Ikenaga T, Tanaka H, Shoyama Y, Yoshida S, Muranaka T. A novel glucosyltransferase involved in steroid saponin biosynthesis in Solanum aculeatissimum. Plant Mol Biol. 2005;57:225–39.
- Gramazio P, Prohens J, Plazas M, Andujar I, Herraiz FJ, Castillo E, Knapp S, Meyer RS, Vilanova S. Location of chlorogenic acid biosynthesis pathway and polyphenol oxidase genes in a new interspecific anchored linkage map of eggplant. BMC Plant Biol. 2014;14:350.
- Klann E, Yelle S, Bennett AB. Tomato fruit Acid invertase complementary DNA: nucleotide and deduced amino Acid sequences. Plant Physiol. 1992;99:351–3.
- Lam Cheng KL. Golden2-like (GLK2) Transcription Factor: Developmental Control of Tomato Fruit Photosynthesis and Its Contribution to Ripe Fruit Characteristics. Davis: University of California; 2013.
- Mott R. EST_GENOME: A program to align spliced DNA sequences to unspliced genomic DNA. Comput Appl Biosci. 1997;13:477–8.
- EMBOSS [http://www.bioinformatics.nl/emboss-explorer/]. Accessed 29 Apr 2016.
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–8.
- Abajian C. Sputnik. University of Washington Department of Molecular Biotechnology. 1994. [http://wheat.pw.usda.gov/ITMI/EST-SSR/LaRota]. Accessed 29 Apr 2016.