De novotranscriptome characterization of Vitis vinifera cv. Corvina unveils varietal diversity
- Luca Venturini†1,
- Alberto Ferrarini†1,
- Sara Zenoni1,
- Giovanni Battista Tornielli1,
- Marianna Fasoli1,
- Silvia Dal Santo1,
- Andrea Minio1,
- Genny Buson1,
- Paola Tononi1,
- Elisa Debora Zago1,
- Gianpiero Zamperin1,
- Diana Bellin1,
- Mario Pezzotti1 and
- Massimo Delledonne1Email author
© Venturini et al.; licensee BioMed Central Ltd. 2013
Received: 11 October 2012
Accepted: 11 January 2013
Published: 18 January 2013
Plants such as grapevine (Vitis spp.) display significant inter-cultivar genetic and phenotypic variation. The genetic components underlying phenotypic diversity in grapevine must be understood in order to disentangle genetic and environmental factors.
We have shown that cDNA sequencing by RNA-seq is a robust approach for the characterization of varietal diversity between a local grapevine cultivar (Corvina) and the PN40024 reference genome. We detected 15,161 known genes including 9463 with novel splice isoforms, and identified 2321 potentially novel protein-coding genes in non-annotated or unassembled regions of the reference genome. We also discovered 180 apparent private genes in the Corvina genome which were missing from the reference genome.
The de novo assembly approach allowed a substantial amount of the Corvina transcriptome to be reconstructed, improving known gene annotations by robustly defining gene structures, annotating splice isoforms and detecting genes without annotations. The private genes we discovered are likely to be nonessential but could influence certain cultivar-specific characteristics. Therefore, the application of de novo transcriptome assembly should not be restricted to species lacking a reference genome because it can also improve existing reference genome annotations and identify novel, cultivar-specific genes.
KeywordsTranscriptomics RNA-Seq de novo assembly Grape Varietal diversity
Grapevine is the most cultivated fruit crop in the world, covering approximately 7.8 million hectares in 2011 and producing more than 67 million tons of berries (http://www.fao.org). The modern grapevine (Vitis vinifera sbs. sativa) was domesticated 8000 years ago in the Southern Caucasus region  from its wild ancestor V.vinifera sbs. sylvestris. The grapevine genome is highly polymorphic  and vegetative propagation is preferred over seed germination because the extensive heterozygosity results in erratic yields and produces offspring with diverse characteristics . At least 14,000 grapevine varieties have been cataloged  but breeding is restricted to a relatively small number of cultivars, such as Pinot and Traminer . However, the global demand for high-quality wines is increasing, awakening interest in the use of local cultivars to create premium products and in the molecular analysis of their prized organoleptic traits .
The genetic analysis of grapevine has been hindered by the long generation time (3 years), extensive heterozygosity and phenotypic plasticity. Even berries of the same cultivar may differ markedly in their properties because of environmental factors, from which arises the concept of terroir in viticulture . Detailed characterization of the genome is therefore necessary to separate the genetic and environmental components underlying the phenotype. Grapevine was the first fruit species to be sequenced, but the reference genome is that of a near-homozygous and non-cultivated accession, PN40024 . This resource has facilitated the detailed phylogenetic analysis of specific gene families [9, 10], the creation of SNP catalogs which can be used as genetic markers for cultivar differentiation  and the development of microarrays for transcriptomic analysis [12, 13], but recent deep sequencing experiments have shown that relying on a single reference genome may underestimate the variability among different genotypes. The comparison of Asian and African human genomes with the reference sequence has revealed 5 Mb of novel sequence in each assembly containing population-specific coding regions . Furthermore, the assembly of genomes and transcriptomes from 18 different Arabidopsis thaliana ecotypes led to the identification of 221 genes that are not present in the Col-0 reference . These data indicate that the comparison of polymorphisms in a reference genotype may not represent the full genetic diversity of a species, and this is particularly relevant in grapevine because V. vinifera is more genetically diverse than both Homo sapiens and A. thaliana [2, 16]. These challenges could be addressed by de novo sequencing and annotating each grapevine cultivar. De novo assembly of a complex genome is however hindered by repetitive DNA sequences and low complexity regions. In order to address these problems multiple paired-end and mate-pair libraries with different insert sizes are necessary but require consistent efforts to be produced . Moreover, gene annotation of de novo assembled genomes is a time and labor intensive task which includes both the use of gene prediction and annotation methods and a lengthy manual curation.
An alternative to whole-genome sequencing is the direct reconstruction of the transcriptome by de novo assembly. The potential of this approach has been demonstrated in animals lacking reference genome sequences such as the coral Acropora millepora , the whitefly Bemisia tabaci , the butterfly Melitaea cinxia , the mosquito Anopheles funestus  and the planarian Schmidtea mediterranea [22, 23]. Transcript sequences generally lack the repetitive sequences that complicate genome assembly.
De novo transcriptome assembly was used to characterize the varietal diversity of V. vinifera cv. Corvina, an indigenous cultivar of the Verona area in north Italy which has recently been subject to comprehensive transcriptomic, proteomic and metabolomic analysis [24–28], including an RNA-Seq based expression profiling of berry development . The direct comparison of potential polymorphisms by projection onto the reference genome indicated that up to one third of the PN40024 proteins could be affected by disruptive mutations, suggesting that a full reconstruction and re-annotation of Corvina genes is required. The de novo transcriptome assembly strategy allowed us to identify 19,517 novel splice isoforms among 9463 known genes, and 2321 potentially novel protein-coding genes in the raw PN40024 reads but not in the assembled sequence. We also identified 180 apparently private Corvina genes, 27% of which are modulated during berry development and withering.
Sequencing the Corvina transcriptome and the characterization of sequence variation
We identified 646,982 polymorphisms between the Corvina and PN40024 sequences, including 137,871 insertion/deletion polymorphisms (indels) and 509,111 single nucleotide polymorphisms (SNPs). Approximately 13% of the indels and 29% of the SNPs were located in regions annotated as coding sequences in the V1 reference annotation (http://genomes.cribi.unipd.it/; Figure 1c,d). We further filtered this dataset with a minimum frequency threshold (≥0.75) of the alternative polymorphism calculated on the total of read pairs aligning on the region This final set contained 67,281 putative mutations, of which 59,064 are SNPs and 8217 indels. Putative mutations were annotated to determine their potential effect on the encoded proteins (Additional file 2: Figure S1). A simple projection of the polymorphisms onto the PN40024 annotation showed that 5808 Corvina proteins were potentially changed by substitutions and 579 proteins were potentially destroyed by frameshifts, premature stop codons, stop-codon mutations and mutations at splice sites (Figure 1e). These data are reminiscent of the situation reported in different A. thaliana ecotypes supporting earlier claims that reference annotations cannot be transferred reliably to any cultivar/accession without the reassembly and re-annotation of the genome and/or the transcriptome .
Reconstruction of the Corvina gene catalog and comparison with reference annotations
General statistics of grape V1 annotations and contig assemblies
Grape V1 annotation
Number of sequences
Comparison of contig mapping coordinates with grapevine annotations, public ESTs and cDNAs
Contigs classification (12x assembly; V1 annotation)
Contigs assigned to known genes
Contigs assigned to novel genes
Contigs assigned to private genes
The coordinates of the mapped putative transcripts were then compared with the current V1 annotation (http://genomes.cribi.unipd.it/). The genomic coordinates of the final set of 40,610 putative transcripts corresponded to 17,425 annotated gene loci, 554 known repetitive regions and 4488 putative novel gene loci (Figure 2c; Additional file 3: Table S2; Additional file 2: Figure S2). A similar number of putative novel gene loci (4431) was obtained by a reference alignment and prediction approach starting from the same dataset  (data not shown).
By comparing our putative transcripts to the corresponding known genes, we identified 3788 adjacent genes that merged into 1677 putative loci. We compared our 1677 putative loci with the group of 1429 genes that were recently shown to be erroneously split in the V1 annotation , finding 346 of the loci in common between the two studies. Based on genome alignments of reconstructed contigs, we compared their structures with known transcripts and were able to identify 5383 de novo reconstructed transcripts with an exon-intron structure identical to known PN40024 transcripts. Finally, the analysis allowed us to identify 19,517 novel isoforms representing 9463 genes annotated in the PN40024 reference sequence, 7902 of which generated multiple isoforms (Additional file 2: Figure S2).
Identification of Corvina private genes
Dynamic gene expression during Corvina berry development and withering
We assessed the biological significance of the novel genes and Corvina private genes by analyzing berry samples at two developmental stages and one withering stage. Three biological replicates were collected at each stage and were processed to generate indexed RNA-seq libraries, which were sequenced using an Illumina HiSeq 1000. We obtained 147 million 50-bp paired-end reads (14.7 Gb), comprising 46.5 million reads representing the post fruit-set (PFS) stage, 34.7 million reads from the pre-ripening (PR) stage and 65.9 million reads from the post-harvest withering (PHWII) stage (Additional file 4: Table S3).
Gene expression levels at all three stages were quantified using the PN40024 genome and the Corvina private genes as reference sequences. The abundance of each transcript was expressed as fragments per kilobase of exon model per million mapped reads (FPKM) as implemented in Cufflinks . A gene was considered to be expressed if the FPKM 95% confidence interval lower boundary was greater than zero and if the FPKM value was higher than 0.001. We detected 23,538 expressed genes in at least one of the three samples, including 1226 of the 2353 novel genes and 108 of the 180 Corvina private genes. This represented 72% of the grapevine transcriptome (including novel and private genes). Although more genes were detected in the berry samples than in the reconstructed pool, 5264 genes expressed in the pooled samples were not represented in any of the three berry samples.
Interestingly, 50 of the 180 Corvina private genes were differentially expressed and 15 (8.3%) were induced specifically during withering. Similarly, 524 of the 2353 novel loci were differentially expressed in at least one sample. Genes were grouped according to their expression profiles by transforming the expression data into moderate fold change estimates using a variance stabilizing transformation . Differentially-expressed genes were thus grouped into four clusters according to their expression profiles, comprising those repressed at PR and PHWII or exclusively at PHWII but not PFS (cluster 1), those transiently repressed at PR (cluster 2), those transiently induced at PR (cluster 3) and those induced at the PR and PHWII stages or at the PHWII stage alone (cluster 4) (Figure 7a).
To gain insight into the functions of the modulated genes in each cluster, each group was enriched with GO terms (FDR <5%) associated with 10,842 of the 13,866 differentially-expressed genes (Additional file 5: Table S4). Looking specifically at genes induced during the PHWII phase (24.79% of the total) given that withering is peculiar to wines produced from this cultivar, the statistics show an enrichment for genes involved in stress responses such as programmed cell death and in the synthesis of flavonoids, as previously described .
Sequence diversity is usually described in comparison to a reference genome [38, 39]. Given the high degree of genetic diversity among plant cultivars, this approach might fail to recognize highly polymorphic regions and will not detect the presence or absence of genes residing in private (cultivar-specific) regions of the genome . Whole-genome sequencing and re-annotation is therefore recommended for each variety, but in predominantly heterozygous species such as grapevine the sequence diversity would make contig assembly a daunting and resource-intensive task .
When a reference genome is available, genes and transcript isoforms are built de novo by mapping RNA-seq reads, but this does not solve the problem of hypervariable sequences and private genes [34, 40]. However, the de novo assembly strategy does not depend on the genome and has been applied successfully to reconstruct the transcriptomes of non-model species lacking reference genomes.
We have demonstrated the feasibility cDNA sequencing by RNA-seq for the analysis of varietal diversity between a local grapevine cultivar (Corvina) and the PN40024 reference genome without genomic data. The availability of a reference genome allows the reconstruction procedure to be validated and highlights the diversity between the two genomes.
Improved annotation of the reference genome
The latest grapevine genome annotation (v1 produced by CRIBI; http://genomes.cribi.unipd.it/) comprises 29,971 genes identified by a combination of ab initio prediction and cDNA mapping. By comparing this annotation to the transcripts we identified, we found our method had detected 51% of the annotated genes, the remainder probably representing tissue/condition-specific transcripts that were not present in our pooled samples. The genes overlapping our sample and the v1 annotation have a higher expression level than the v1-specific genes (mean = 35.67 vs 14.31 FPKM, median= 13.03 vs 1 FPKM). These data indicate that many of the v1 annotations undetected using our method were missed because of the paucity of sequencing reads generated from the corresponding loci. A large number (2249) of potential protein-coding genes were detected in the non-annotated parts of the genome. A recent comparison of the 8x, 12x v0 and 12x v1 annotations showed that 6089 genes present in either the 8x or 12x v0 assemblies were not present in the v1 annotation . Interestingly, 1171 of our 2353 potential protein-coding genes (72 of which are only present in raw reads) were represented in the 8x or 12x v0 annotations. Current annotation is therefore incomplete and insufficient to describe the full gene space of a cultivar other than the reference Pinot Noir clone. Our method provided experimental support for 72 protein-coding genes missing from the final assembly because they were excluded from the 12x consensus, and for 2249 additional genes that appear to have been missed in the v1 annotation. Novel genes excluded from the v1 annotation appear to have meaningful biological roles, including those modulated during berry ripening and/or withering e.g. eight disease-resistance genes (Novel_1755, Novel_2241, Novel_0853, Novel_2382, Novel_2375, Novel_1428, Novel_2207, Novel_1998), two stress-inducible genes (Novel_4520 and Novel_4511), a heat shock protein 70 gene (Novel_4478) and a senescence-associated gene (Novel_1324). The expression of the disease-resistance genes generally declined during berry development and withering (clusters 1 and 2) suggesting their role is to protect the berry from pathogens and pests during early development. In contrast, the stress-inducible genes and heat shock protein gene were induced during ripening and withering, supporting a protective role against abiotic stress during the accumulation of sugars and secondary metabolites as previously reported [25, 41, 42]. The RNA-seq data therefore provide a comprehensive insight into the biologically-relevant landscape of gene expression during berry development and ripening.
Our method not only offers a way to annotate previously uncharacterized genes but also improves the annotation of known genes by helping to define their boundaries more robustly and to identify splice variants. Our data indicate that up to 11% of the genes in the v1 annotation are split incorrectly, similar to the error rate in other annotated plant genomes . A previous in silico analysis identified 1429 instances of erroneously split genes in the v1 annotation . We also detected 462 of these genes and our analysis suggested that 75% of them were split incorrectly in the v1 annotation. Furthermore, our data resulted in the 3′ and 5′ extension of nearly 90% of the genes we detected compared to the boundaries in the v1 annotation, indicating that the untranslated regions were longer than previously reported, using in silico prediction methods . Our approach may therefore provide a useful complement to ab initio gene prediction methods to establish gene boundaries and define UTRs. Finally, our de novo transcriptome assembly method detected an average of 1.75 transcripts per locus, in line with previous reports using a reference-guided assembly of grapevine transcripts (1.25 transcripts per locus ). Although beyond the scope of our investigation, the de novo reconstruction indicated alternative splice variants for 9463 loci, providing a much more exhaustive description of the grapevine transcriptome compared to in silico predictions. The number of studies which try to describe alternative splicing events in plants are still scarce, however many recent studies point to an extensive diffusion of the phenomenon and to its important role in modulating gene expression and stress response ([46–48]). Our results indicate that the transcriptional landscape in Vitis is more complex than previously thought and therefore warrants further investigation.
Expression of Corvina private genes during berry development
Recent data from the deep sequencing of human individuals and Arabidopsis ecotypes revealed portions of genome that are not shared among all genotypes and the reference genome [14, 15]. Interestingly, the novel genomic sequences included a set of protein-coding genes (private genes) potentially contributing to the intra-species variability. Similarly, we detected 180 putative protein-coding genes with a high coding potential or matches to plant ESTs that represent potential Corvina private genes.
We identified 146 private genes expressed in at least one berry-sampling phase, 50 of which were differentially expressed between samples, and these could represent a group of genes that directly contribute to the specific characteristics of the Corvina berry. Some of these private genes could have been selected by ancient breeders looking for particular berry quality traits, such as the ability to withstand the lengthy drying phase (rasinate) required to make passito wines (straw wines) such as Amarone and Recioto. For example, we identified a heat shock protein gene (Private_087) and a stress-inducible gene (Private_101) induced during ripening, consistent with the ability of Corvina berries to undergo dehydration for up to 100 days [26, 27]. Furthermore, we detected the induction of genes involved in translation and protein metabolism during withering, including three ribosomal proteins (Private_068, Private_108 and Private_116), three elongation factors (Private_166, Private_164 and Private_152), ubiquitin (Private_122), a 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (Private_094) and a DNA-binding protein (Private_171). This supports cDNA-AFLP data indicating the induction of genes with similar functions during withering .
Thirty-three of the Corvina private genes matched homologs in other grape varieties but not the reference genome. This is expected because the dispensable part of the genome may be partly shared among different cultivars and only a few genes may be truly unique to a particular accession . For example, we found two Flowering Locus T (FT) genes (Private_100 and Private_113) the first corresponding to the previously-described VvFT gene found in the cultivars Cabernet Sauvignon  and Tempranillo . At least six members of the FT/TFL1 gene family were identified in the Tempranillo genome, including VvFT which appears to be the ortholog of Arabidopsis FT and therefore induces precocious flowering when expressed in Arabidopsis, consistent with reported expression patterns associated with seasonal floral induction in latent buds and with the development of inflorescences, flowers and fruits . There is no evidence for the presence of classical floral regulatory pathways in grapevine, and the expression profile of VvFT suggests that it only partially corresponds to the florigen role of Arabidopsis FT. We also observed the expression of VvFT during berry formation, suggesting an additional and uncharacterized role of this gene during early berry formation.
We were able to reconstruct a substantial part of the grapevine transcriptome (51% of known genes), improve the annotations of known genes by defining their boundaries and splice variants, add 2353 apparently novel genes representing non-annotated or unassembled regions of the reference genome, and also add 180 potentially Corvina-specific private genes that are not present in the reference sequence. Our results are consistent with data from other plant species showing that different genotypes share a common majority of genes but also possess smaller sets of private genes that are likely to be dispensable, that contribute to intra-specific variation and that produce unique, variety-dependent characteristics [15, 51]. Given the substantial divergence among registered ecotypes and cultivars [52, 53], we argue that in plant biology a de novo transcriptome assembly approach should not be limited to species lacking reference genome (e.g. [21, 54, 55]) because it can improve the annotation of diverse cultivars and identify cultivar-specific private genes without embarking on a labor-intensive reconstructing of the entire genome.
To cover most of the grapevine transcriptome, we created a pool of RNA samples representing different organs and developmental stages of V. vinifera cv Corvina (clone 48). We selected 45 of the 54 samples described by Fasoli et al.  and combined 1 μg of total RNA from each sample (Additional file 1: Table S1). Berries were collected from a vineyard in Verona (Italy) at three time points: post-fruit set (PFS), mid-ripening (MR) and mid-withering approximately 2 months post-harvest (PHW II). At the PFS stage (35 days after flowering (DAF); E-L 32), the berries were >7 mm in diameter and touching, whereas at the MR stage (84 DAF; E-L 36) they had reached their final size and the sugar content was 15.5° Brix. At the PHW II stage, the berry weight was 69.7% of the weight at harvest and the sugar content was 25.9° Brix (ripe values). The sugar content (mean Brix degree value) was recorded at each time point using a PR-32 bench refractometer (Atago Co., Ltd, Tokyo, Japan).
Total RNA was isolated from ~200 mg of the ground berry pericarp using the Spectrum™ Plant Total RNA kit (Sigma-Aldrich, St. Louis, MO) following the manufacturer’s protocol. RNA quality and quantity were determined using a Nanodrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE) and a Bioanalyzer Chip RNA 7500 series II (Agilent, Santa Clara, CA).
Total RNA samples were assessed for quality using an RNA 6000 Nano Kit (Agilent, Wokingham, UK) and 2.5-μg aliquots were used to isolate poly(A) mRNA for the preparation of a non-directional Illumina RNA-seq library using the TruSeq RNA Sample Prep Kit v2 (Illumina Inc., San Diego, CA, USA). The quality of the library was checked with a High Sensitivity DNA Kit (Agilent, Wokingham, UK). Libraries were sequenced with an Illumina HiSeq 1000 sequencer (Illumina Inc., San Diego, CA, USA) and 100-bp paired-end sequences were generated.
Pre-processing of reads
Low-quality reads (> 50 bases with quality < 7 or > 10% undetermined bases) and putative PCR duplicate reads were removed and Illumina TruSeq adapter sequences were clipped. Low-quality bases at read ends were trimmed (minimum quality 16, minimum read length 50 bp) with cutadapt (http://code.google.com/p/cutadapt/).
Mapping and polymorphism detection
The mean read insert size and standard deviation were estimated using a BWA alignment  with default parameters of a random sample of one million reads against the 12x PN42004 grapevine genome. The reads were then aligned using TopHat v1.3.0 , giving as parameters the derived mean read spacer size (110 bp) and its standard deviation (189 bp). Polymorphisms were called using FreeBayes  with default parameters and further filtered for read depth ≥ 5 and a polymorphism call quality ≥ 80 with vcftools . A custom script (vcf_filter.py) was used to select putative mutations based on a frequency of the alternative alleles (≥ 0.75) calculated on the total number of read pairs aligned on the region. The putative mutations were annotated using the Variant Effect Predictor on EnsEMBL version 64 .
De novo assembly was carried out using the Velvet/Oases package, using a k-mer value of 41, a minimum contig length of 200, an insert length of 310 and a standard deviation of 189 . Multiple k-mers would potentially have given a better sensitivity, especially against low expressed genes. However, the step of k-merge introduces uncertainties (Ns) in the sequences which we preferred to avoid when mapping reads against the contigs to evaluate expression levels. For this reason, a more “conservative” approach was adopted, using just one k-mer optimized by running different assemblies (from 21 to 51). We chose 41 as a tradeoff value, as using higher k-mers improved the N50 and average length by a minimal percentage (<5% passing from 41 to 47; data not shown) while avoiding a loss in sensitivity. As Oases does not cluster assembled contigs if used with only one k-mer, we used CdHit to cluster the Oases contigs with identity >90% and a coverage of 100% .
Contigs were aligned against the PN40024 genome  using GMAP under the following parameters: -B 4 -t 6 -x 30 -f 2 -t 6 . Sequence alignments with BLAST were carried out using a threshold E-value of 10-5 and a minimum alignment coverage of 20% of the query sequence.
Comparison of annotations
The Tuxedo suite programs Cufflinks 1.4.1 and Cuffcompare (default parameters)  were used to annotate contigs against the current V1 grapevine annotation (http://genomes.cribi.unipd.it/) . Comparisons with the 8x V0 annotation (http://www.genoscope.cns.fr/spip/Vitis-vinifera-whole-genome.html)  and cDNAs, ESTs and UniProt sequences were carried out with the intersectBed program from the BedTools suite .
Coding potential estimation
Differential gene expression analysis
Reads were aligned against the PN40024 genome and the putative new grape transcripts, using TopHat v1.4 . We calculated the FPKM (fragments per kilobase of exon per million fragments mapped) expression values of known genes, new genomic loci, and sequences outside the reference genome using Cufflinks v1.2 with default parameters [34, 40]. We used the R package DESeq and its included script  to obtain raw read counts from the alignments and to identify differentially-expressed genes (parameters: FDR ≤ 5%, lfc ≥ 1).
All next-generation sequencing data are available in the Sequence Reads Archive (SRA) with accession number SRA055265. Contigs assembled from Illumina reads have been submitted to NCBI TSA, under the accession codes KA133930 – KA174709.
This work was supported by Fondazione Cariverona (Completamento e attività del Centro di Genomica Funzionale Vegetale), Verona, Italy, and by the Valorizzazione dei Principali Vitigni Autoctoni Italiani e dei loro Terroir (Vigneto) project funded by the Italian Ministry of Agricultural and Forestry Policies.
- McGovern PE: Ancient Wine: The Search for the Origins of Viniculture. 2003, Princeton: Princeton University Press, 365-
- Lijavetzky D, Cabezas JA, Ibáñez A, Rodríguez V, Martínez-Zapater JM: High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology. BMC Genom. 2007, 8: 424-10.1186/1471-2164-8-424.View Article
- Forneck A, Benjak A, Rühl E: Grapevine (Vitis ssp.): example of clonal reproduction in agricultural important plants. Lost Sex The Evolutionary Biology of Parthenogenesis. Edited by: Dijk P, Martens K, Schön I. 2009, Netherlands: Springer, Dordrecht, 581-598.
- Alleweldt G, Spiegel-Roy P, Reisch B: Grape (Vitis). Genetic Resources of Temperate Fruit, Acta Horticulturae 290. Edited by: Moore JN, Ballington JR Jr. 1991, Leuven, Belgium: International Society for Horticultural Science (ISHS), 291-330.
- Myles S, Boyko AR, Owens CL, Brown PJ, Grassi F, Aradhya MK, Prins B, Reynolds A, Chia J-M, Ware D, Bustamante CD, Buckler ES: Genetic structure and domestication history of the grape. Proc Natl Acad Sci USA. 2011, 108: 3530-3535. 10.1073/pnas.1009363108.PubMed CentralView ArticlePubMed
- Bisson LF, Waterhouse AL, Ebeler SE, Walker MA, Lapsley JT: The present and future of the international wine industry. Nature. 2002, 418: 696-699. 10.1038/nature01018.View ArticlePubMed
- Vaudour E: The quality of grapes and wine in relation to geography: notions of terroir at various scales. J Wine Res. 2002, 13: 117-141. 10.1080/0957126022000017981.View Article
- Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-467. 10.1038/nature06148.View ArticlePubMed
- Matus JT, Aquea F, Arce-Johnson P: Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol. 2008, 8: 83-10.1186/1471-2229-8-83.PubMed CentralView ArticlePubMed
- Zhang Y, Mao L, Wang H, Brocker C, Yin X, Vasiliou V, Fei Z, Wang X: Genome-wide identification and analysis of grape aldehyde dehydrogenase (ALDH) gene superfamily. PLoS One. 2012, 7: e32153-10.1371/journal.pone.0032153.PubMed CentralView ArticlePubMed
- Myles S, Chia J-M, Hurwitz B, Simon C, Zhong GY, Buckler E, Ware D: Rapid genomic characterization of the genus vitis. PLoS One. 2010, 5: e8219-10.1371/journal.pone.0008219.PubMed CentralView ArticlePubMed
- Bellin D, Ferrarini A, Chimento A, Kaiser O, Levenkova N, Bouffard P, Delledonne M: Combining next-generation pyrosequencing with microarray for large scale expression analysis in non-model species. BMC Genom. 2009, 10: 555-10.1186/1471-2164-10-555.View Article
- Fasoli M, Dal Santo S, Zenoni S, Tornielli GB, Farina L, Zamboni A, Porceddu A, Venturini L, Bicego M, Murino V, Ferrarini A, Delledonne M, Pezzotti M: The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program. Plant Cell. 2012, 24: 3489-3505. 10.1105/tpc.112.100230.PubMed CentralView ArticlePubMed
- Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, Qian W, Ren Y, Tian G, Li J, Zhou G, Zhu X, Wu H, Qin J, Jin X, Li D, Cao H, Hu X, Blanche H, Cann H, Zhang X, Li S, Bolund L, Kristiansen K, Yang H, Wang J, Wang J: Building the sequence map of the human pan-genome. Nat Biotechnol. 2010, 28: 57-63. 10.1038/nbt.1596.View ArticlePubMed
- Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, Kahles A, Bohnert R, Jean G, Derwent P, Kersey P, Belfield EJ, Harberd NP, Kemen E, Toomajian C, Kover PX, Clark RM, Rätsch G, Mott R: Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature. 2011, 477: 419-423. 10.1038/nature10414.View ArticlePubMed
- Schmid KJ, Sorensen TR, Stracke R, Torjek O, Altmann T, Mitchell-Olds T, Weisshaar B: Large-scale identification and analysis of genome-wide single-nucleotide polymorphisms for mapping in Arabidopsis thaliana. Genome Res. 2003, 13: 1250-1257. 10.1101/gr.728603.PubMed CentralView ArticlePubMed
- Alkan C, Sajjadian S, Eichler EE: Limitations of next-generation genome sequence assembly. Nat Meth. 2011, 8: 61-65. 10.1038/nmeth.1527.View Article
- Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV: Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genom. 2009, 10: 219-10.1186/1471-2164-10-219.View Article
- Wang X-W, Luan J-B, Li J-M, Bao Y-Y, Zhang C-X, Liu S-S: De novo characterization of a whitefly transcriptome and analysis of its gene expression during development. BMC Genom. 2010, 11: 400-10.1186/1471-2164-11-400.View Article
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17: 1636-1647. 10.1111/j.1365-294X.2008.03666.x.View ArticlePubMed
- Crawford JE, Guelbeogo WM, Sanou A, Traoré A, Vernick KD, Sagnon N, Lazzaro BP: De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology. PLoS One. 2010, 5: e14202-10.1371/journal.pone.0014202.PubMed CentralView ArticlePubMed
- Adamidi C, Wang Y, Gruen D, Mastrobuoni G, You X, Tolle D, Dodt M, Mackowiak SD, Gogol-Doering A, Oenal P, Rybak A, Ross E, Sánchez Alvarado A, Kempa S, Dieterich C, Rajewsky N, Chen W: De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics. Genome Res. 2011, 21: 1193-1200. 10.1101/gr.113779.110.PubMed CentralView ArticlePubMed
- Sandmann T, Vogg MC, Owlarn S, Boutros M, Bartscherer K: The head-regeneration transcriptome of the planarian Schmidtea mediterranea. Genome Biol. 2011, 12: R76-10.1186/gb-2011-12-8-r76.PubMed CentralView ArticlePubMed
- Versari A, Parpinello GP, Tornielli GB, Ferrarini R, Giulivo C: Stilbene compounds and stilbene synthase expression during ripening, Wilting, and UV treatment in Grape cv. Corvina. J Agric Food Chem. 2001, 49: 5531-5536. 10.1021/jf010672o.View ArticlePubMed
- Zamboni A, Minoia L, Ferrarini A, Tornielli GB, Zago E, Delledonne M, Pezzotti M: Molecular analysis of post-harvest withering in grape by AFLP transcriptional profiling. J Exp Bot. 2008, 59: 4145-4159. 10.1093/jxb/ern256.PubMed CentralView ArticlePubMed
- Zamboni A, Di Carli M, Guzzo F, Stocchero M, Zenoni S, Ferrarini A, Tononi P, Toffali K, Desiderio A, Lilley KS, Pe ME, Benvenuto E, Delledonne M, Pezzotti M: Identification of putative stage-specific grapevine berry biomarkers and omics data integration into networks. Plant Physiol. 2010, 154: 1439-1459. 10.1104/pp.110.160275.PubMed CentralView ArticlePubMed
- Toffali K, Zamboni A, Anesi A, Stocchero M, Pezzotti M, Levi M, Guzzo F: Novel aspects of grape berry ripening and post-harvest withering revealed by untargeted LC-ESI-MS metabolomics analysis. Metabolomics. 2011, 7: 424-436. 10.1007/s11306-010-0259-y.View Article
- Di Carli M, Zamboni A, Pè ME, Pezzotti M, Lilley KS, Benvenuto E, Desiderio A: Two-dimensional differential in gel electrophoresis (2D-DIGE) analysis of grape berry proteome during postharvest withering. J Proteome Res. 2011, 10: 429-446. 10.1021/pr1005313.View ArticlePubMed
- Zenoni S, Ferrarini A, Giacomelli E, Xumerle L, Fasoli M, Malerba G, Bellin D, Pezzotti M, Delledonne M: Characterization of transcriptional complexity during berry development in Vitis vinifera using RNA-Seq. Plant Physiol. 2010, 152: 1787-1795. 10.1104/pp.109.149716.PubMed CentralView ArticlePubMed
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest ARR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563.View ArticlePubMed
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308: 1149-1154. 10.1126/science.1108625.View ArticlePubMed
- Zhang G, Guo G, Hu X, Zhang Y, Li Q, Li R, Zhuang R, Lu Z, He Z, Fang X, Chen L, Tian W, Tao Y, Kristiansen K, Zhang X, Li S, Yang H, Wang J, Wang J: Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res. 2010, 20: 646-654. 10.1101/gr.100677.109.PubMed CentralView ArticlePubMed
- Forcato C: Gene prediction and functional annotation in the Vitis vinifera genome. PhD Thesis. 2010, 1: 120-
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28: 516-520. 10.1038/nbt.1626.View Article
- Kong L, Zhang Y, Ye Z-Q, Liu X-Q, Zhao S-Q, Wei L, Gao G: CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35: W345-W349. 10.1093/nar/gkm391.PubMed CentralView ArticlePubMed
- Grimplet J, Van Hemert J, Carbonell-Bejerano P, Díaz-Riquelme J, Dickerson J, Fennell A, Pezzotti M, Martínez-Zapater JM: Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences. BMC Res Notes. 2012, 5: 213-10.1186/1756-0500-5-213.PubMed CentralView ArticlePubMed
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11: R106-10.1186/gb-2010-11-10-r106.PubMed CentralView ArticlePubMed
- Wicker T, Krattinger SG, Lagudah ES, Komatsuda T, Pourkheirandish M, Matsumoto T, Cloutier S, Reiser L, Kanamori H, Sato K, Perovic D, Stein N, Keller B: Analysis of intraspecies diversity in wheat and barley genomes identifies breakpoints of ancient haplotypes and provides insight into the structure of diploid and hexaploid triticeae gene pools. Plant Physiol. 2009, 149: 258-270. 10.1104/pp.108.129734.PubMed CentralView ArticlePubMed
- Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjálmsson BJ, Nordborg M, Borevitz JO, Bergelson J: Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet. 2012, 44: 212-216. 10.1038/ng.1042.PubMed CentralView ArticlePubMed
- Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L: Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 2011, 12: R22-10.1186/gb-2011-12-3-r22.PubMed CentralView ArticlePubMed
- Deluc LG, Grimplet J, Wheatley MD, Tillett RL, Quilici DR, Osborne C, Schooley Da, Schlauch Ka, Cushman JC, Cramer GR: Transcriptomic and metabolite analyses of Cabernet Sauvignon grape berry development. BMC Genom. 2007, 8: 429-10.1186/1471-2164-8-429.View Article
- Pilati S, Perazzolli M, Malossini A, Cestaro A, Demattè L, Fontana P, Dal Ri A, Viola R, Velasco R, Moser C: Genome-wide transcriptional analysis of grapevine berry ripening reveals a set of genes similarly modulated during three seasons and the occurrence of an oxidative burst at vèraison. BMC Genom. 2007, 8: 428-10.1186/1471-2164-8-428.View Article
- Liang C, Mao L, Ware D, Stein L: Evidence-based gene predictions in plant genomes. Genome Res. 2009, 19: 1912-1923. 10.1101/gr.088997.108.PubMed CentralView ArticlePubMed
- Majoros WH, Korf I, Ohler U: Gene Prediction Methods. Bioinformatics. Edited by: Edwards D, Stajich J, Hansen D. 2009, New York: Springer, 99-119.View Article
- Denoeud F, Aury J-M, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, Jaillon O, Artiguenave F: Annotating genomes with massive-scale RNA sequencing. Genome Biol. 2008, 9: R175-10.1186/gb-2008-9-12-r175.PubMed CentralView ArticlePubMed
- Filichkin S, Priest HD, Givan S, Shen R, Bryant DW, Fox SE, Wong W-K, Mockler TC: Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 2010, 20: 45-58. 10.1101/gr.093302.109.PubMed CentralView ArticlePubMed
- Kalyna M, Simpson CG, Syed NH, Lewandowska D, Marquez Y, Kusenda B, Marshall J, Fuller J, Cardle L, McNicol J, Dinh HQ, Barta A, Brown JWS: Alternative splicing and nonsense-mediated decay modulate expression of important regulatory genes in Arabidopsis. Nucleic Acids Res. 2012, 40: 2454-2469. 10.1093/nar/gkr932.PubMed CentralView ArticlePubMed
- Marquez Y, Brown JW, Simpson CG, Barta A, Kalyna M: Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 2012, 22: 1184-1195. 10.1101/gr.134106.111.PubMed CentralView ArticlePubMed
- Sreekantan L, Thomas MR: VvFT and VvMADS8, the grapevine homologues of the floral integrators FT and SOC1, have unique expression patterns in grapevine and hasten flowering in Arabidopsis. Funct Plant Biol. 2006, 33: 1129-10.1071/FP06144.View Article
- Carmona MJ, Calonje M, Martínez-Zapater JM: The FT/TFL1 gene family in grapevine. Plant Mol Biol. 2007, 63: 637-650. 10.1007/s11103-006-9113-z.View ArticlePubMed
- Hansey CN, Vaillancourt B, Sekhon RS, De Leon N, Kaeppler SM, Buell CR: Maize (Zea mays L.) Genome Diversity as Revealed by RNA-Sequencing. PLoS One. 2012, 7: e33071-10.1371/journal.pone.0033071.PubMed CentralView ArticlePubMed
- Juenger TE, Sen S, Bray E, Stahl E, Wayne T, McKay J, Richards JH: Exploring genetic and expression differences between physiologically extreme ecotypes: comparative genomic hybridization and gene expression studies of Kas-1 and Tsu-1 accessions of Arabidopsis thaliana. Plant Cell Environ. 2010, 33: 1268-1284. 10.1111/j.1365-3040.2010.02146.x.View ArticlePubMed
- Zalapa JE, Price DL, Kaeppler SM, Tobias CM, Okada M, Casler MD: Hierarchical classification of switchgrass genotypes using SSR and chloroplast sequences: ecotypes, ploidies, gene pools, and cultivars. Theor Appl Genet. 2011, 122: 805-817. 10.1007/s00122-010-1488-1.View ArticlePubMed
- Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010, 20: 1432-1440. 10.1101/gr.103846.109.PubMed CentralView ArticlePubMed
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson D, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, Di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29: 644-652. 10.1038/nbt.1883.PubMed CentralView ArticlePubMed
- Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010, 26: 589-595. 10.1093/bioinformatics/btp698.PubMed CentralView ArticlePubMed
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.PubMed CentralView ArticlePubMed
- Garrison E, Marth G: Haplotype-based variant detection from short-read sequencing. 2012, 9-[http://arxiv.org/abs/1207.3907]
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England). 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.View Article
- McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F: Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010, 26: 2069-2070. 10.1093/bioinformatics/btq330.PubMed CentralView ArticlePubMed
- Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012, 28: 1086-1092. 10.1093/bioinformatics/bts094.PubMed CentralView ArticlePubMed
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.View ArticlePubMed
- Wu TD, Watanabe CK: GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005, 21: 1859-1875. 10.1093/bioinformatics/bti310.View ArticlePubMed
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.PubMed CentralView ArticlePubMed
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh L-SL: UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004, 32: D115-D119. 10.1093/nar/gkh131.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.