Re-annotation of the woodland strawberry (Fragaria vesca) genome

Darwish, Omar; Shahan, Rachel; Liu, Zhongchi; Slovin, Janet P; Alkharouf, Nadim W

doi:10.1186/s12864-015-1221-1

Research article
Open access
Published: 27 January 2015

Omar Darwish¹, Rachel Shahan², Zhongchi Liu², Janet P Slovin³ & Nadim W Alkharouf¹

BMC Genomics volume 16, Article number: 29 (2015)

Abstract

Background

Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America and, while it produces edible fruits, it is most useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223,2011]. The first generation annotation (version 1.1) was developed using GeneMark-ES+ [Nuc Acids Res 33:6494,2005], a self-training gene prediction tool that relies primarily on combining ab initio predictions with mapping of high confidence ESTs, in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1.

Results

The new annotation, which was achieved using Maker, describes many more predicted protein coding genes compared to the GeneMark generated annotation that is currently hosted at the Genome Database for Rosaceae (http://www.rosaceae.org/). Our new annotation also results in an increase in the overall total coding length, and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results.

Conclusions

Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved existing annotations by increasing the annotation accuracy based on extensive transcriptome data. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

Background

The diploid strawberry Fragaria vesca is native to temperate regions of Eurasia and North America and is commonly known as the alpine or woodland strawberry. Due to its small size and small genome it is a versatile experimental perennial plant system and an emerging model for the Rosaceae family. The extant genome exhibits synteny with other commercially important members of the Rosaceae family such as apple (Malus domestica) and peach (Prunus persica) [1] and an ancestral F. vesca genome contributed to the genome of the octoploid dessert strawberry (F. × ananassa). Information obtained from studies of all aspects of plant growth, biochemistry, and physiology of F. vesca should be applicable to or inform studies of other Rosaceae species [1].

The time, cost, and difficulty of generating transcriptome sequences have been greatly reduced by recent advances in sequencing technology, and RNA-Seq is now dominant over microarrays for in-depth transcriptome studies. The Illumina HiSeq 2000 platform was previously used to sequence 50 RNA-Seq libraries of 25 different F. vesca tissue types from early developing fruit at various stages, young leaves, and seedlings [2] of the 7th generation inbred line Yellow Wonder 5AF7 (YW5AF7) [3]. The 50 libraries represent two biological replicates of 25 tissue types, and each library yielded between 12 and 40 million 51 bp, single-end reads, for a total of ~70 gigabytes of sequence data [2,4].

The genome sequence of the F. vesca inbred line Hawaii4×4 was published in 2011 [5] and the first version of the gene predictions is hosted at the Genome Database for Rosaceae (GDR; http://www.rosaceae.org/projects/strawberry_genome/v1.1/assembly) [4]. In 2013, the National Center for Biotechnology Information (NCBI) published a new F. vesca annotation using the NCBI eukaryotic gene prediction tool Gnomon. Both annotations, from the GDR and the NCBI, are based on ab initio gene predictions and alignment of high confidence ESTs.

Using Bowtie2 [6] with default parameters, an average of 80.32% of the transcriptome reads from each library aligned to the genome (version 1.1), while an average of only 60.58% of these reads aligned to the current gene predictions at GDR. Visualization of the mapped reads using GBrowse [7] uncovered instances of genes of incorrect size, mis-annotated intron/exon junctions, and reads mapping to the genome that could represent non-coding transcripts, indicating that the current annotation would be improved by incorporating the RNA-Seq data.
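The per-library averages quoted above can be obtained by collecting the final line of each Bowtie 2 alignment summary. The following is a minimal sketch of that bookkeeping, not the authors' script: the function names and the two example summaries are hypothetical, and the only thing taken from Bowtie 2 itself is the wording of its "overall alignment rate" summary line.

```python
import re
from statistics import mean

# Bowtie 2 ends its alignment summary (written to stderr) with a line such as
# "80.32% overall alignment rate".
RATE_LINE = re.compile(r"([\d.]+)% overall alignment rate")


def overall_rate(summary_text):
    """Return the overall alignment rate (percent) from one Bowtie 2 summary."""
    match = RATE_LINE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def mean_rate(summaries):
    """Average the overall alignment rate over several library summaries."""
    return mean(overall_rate(text) for text in summaries)


# Hypothetical summaries for two libraries (values invented for illustration).
example = [
    "1000 reads; of these:\n78.00% overall alignment rate\n",
    "1000 reads; of these:\n82.00% overall alignment rate\n",
]
print(mean_rate(example))  # 80.0
```

Running the same function once on the genome alignments and once on the alignments to the GDR gene predictions would give the two averages being compared.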

For the new annotation we used the MAKER2 annotation pipeline [8,9] to combine the following data sources: 1) de novo and reference based assemblies of the 50 RNA-Seq transcriptomes, 2) RefSeq alignments of publicly available plant transcripts, 3) current annotations from GDR, and 4) ab initio gene predictions based on analysis by SNAP, Augustus and GeneMark [10-13].

The resulting F. vesca genome re-annotation increases the number of coding regions and the total coding length across all seven linkage groups (LG1 - LG7) and the non-anchored scaffolds (LG0). This increase is due to the addition of exons to existing genes, the extension or merging of current exons, correction of intron/exon junctions, and the discovery of additional genes. Overall, this new annotation, named TowU_Fve, provides an improved annotation file and facilitates future gene isolation and identification in strawberry and other Rosaceae species.

Results and discussion

De novo transcriptome generation and assembly

The de novo assembly pipeline shown in Figure 1A was used to assemble the reads from the 50 stage and tissue libraries, resulting in 754,400 transcripts. The average number of assembled transcripts per sample is 30,176 (Table 1), with the minimum number of assembled transcripts found in the early stage embryo (Embryo3) and the maximum number found in the leaf (Leaf1). All de novo assembled transcripts were aligned to the F. vesca genome using GMAP [14] within PASA, with the aims of eliminating sequences not aligning to the genome and merging de novo assembled sequences to remove redundancy. An average of 88.32% of the de novo assembled transcripts from each sample aligned to the genome.

Table 1 Statistical summary of the de novo and reference based assembly results

Reference based assembly

Next, we carried out reference-guided assembly using TopHat (http://ccb.jhu.edu/software/tophat/) and Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/) (Figure 1B). TopHat aligned RNA-Seq reads to the reference genome and identified exon-exon splice junctions. Cufflinks then used the alignment generated by TopHat and GeneMark gene models to assemble a total of 1,302,739 reference based transcripts, with the average being 52,110 and the minimum and maximum number found in the same tissues as for the de novo assembly (Table 1).
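As a quick consistency check, dividing each reported transcript total by the number of samples reproduces the per-sample averages given for the de novo and reference based assemblies. The sketch assumes the averages were taken over the 25 merged tissue samples rather than the 50 individual libraries, which the text implies but does not state outright.

```python
SAMPLES = 25  # tissue types; the two replicates of each were merged before assembly

de_novo_total = 754_400      # de novo assembled transcripts
reference_total = 1_302_739  # reference based transcripts

de_novo_mean = de_novo_total / SAMPLES
reference_mean = reference_total / SAMPLES

print(de_novo_mean)           # 30176.0
print(round(reference_mean))  # 52110
```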

F. vesca genome re-annotation pipeline

We then used the MAKER annotation pipeline (Figure 2) to generate the revised F. vesca annotation. MAKER can generate ab initio gene predictions using several tools within its pipeline; it identifies repeats, aligns proteins and ESTs to a genome, and automatically combines all classes of evidence data into gene annotations. Data analyzed in the MAKER pipeline included: 1) 754,400 de novo assembled transcripts from 25 samples (Table 1), each with two biological replicates; 2) trained ab initio predictions from the SNAP gene prediction tool; 3) Augustus trained datasets of Arabidopsis thaliana and Solanum lycopersicum (tomato) transcriptomes; 4) first generation F. vesca gene predictions obtained from the Genome Database for Rosaceae; 5) reference-based assemblies (1,302,739) obtained by aligning all RNA-Seq samples to the F. vesca genome version 1.1 and assembling them with Cufflinks; and 6) plant reference proteins downloaded from the Universal Protein Resource (UniProt) database. This second-generation annotation for F. vesca is named TowU_Fve and is available at GDR (http://www.rosaceae.org).

Comparison of TowU_Fve annotation with the prior annotation (version 1.1)

The TowU_Fve annotation increased the number of coding regions by 9,139 compared to the version 1.1 annotation. This translates into over two million base pairs of extra coding DNA sequence (CDS) (Table 2).

Table 2 Statistical comparisons between first generation annotation and TowU_Fve annotation

As summarized in Table 3, there are 2,286 newly predicted gene models (gene models that do not overlap with any gene in the version 1.1 annotation) in TowU_Fve. The number of newly identified coding exons is 6,006, and the average length of these exons is 183 bp. The total coding length in all 7 linkage groups was found to be over 1.1 Mb.
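The definition of a newly predicted gene model used here, one that overlaps no gene in the previous annotation, can be illustrated with a simple interval comparison. This is a sketch under stated assumptions: the tuple layout, function names, and example coordinates are hypothetical, any shared base counts as an overlap, and strand is ignored. The text does not say whether the authors applied a strand or minimum-overlap criterion.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end


def novel_models(new_genes, old_genes):
    """Return IDs of new gene models that overlap no old gene on the same sequence.

    Each gene is a tuple (gene_id, seqid, start, end) with 1-based inclusive
    coordinates, as in GFF3. Strand is ignored in this sketch.
    """
    old_by_seq = defaultdict(list)
    for _gene_id, seqid, start, end in old_genes:
        old_by_seq[seqid].append((start, end))

    novel = []
    for gene_id, seqid, start, end in new_genes:
        if not any(overlaps(start, end, s, e) for s, e in old_by_seq[seqid]):
            novel.append(gene_id)
    return novel


# Hypothetical coordinates, for illustration only.
old = [("old1", "LG1", 100, 500), ("old2", "LG2", 100, 500)]
new = [("newA", "LG1", 450, 900), ("newB", "LG1", 600, 900), ("newC", "LG3", 1, 50)]
print(novel_models(new, old))  # ['newB', 'newC']
```

The pairwise comparison is quadratic per sequence, which is acceptable for tens of thousands of genes; a sorted sweep or interval tree would be the usual choice for larger inputs.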

Table 3 Statistical summary of the newly predicted gene models by TowU_Fve annotation

The increased numbers of coding regions were discovered based on the RNA-Seq reads from different tissue libraries, as illustrated in Figure 3. For example, Figure 3A illustrates that the transcriptome data uncovered potential splice variation for gene21088, a putative receptor protein kinase, and re-annotation resulted in retention of a previously annotated intron. Figure 3B shows the addition of an exon to gene31621, a bZIP transcription factor. Figure 3C illustrates the discovery of a new gene, a putative hydrolase, absent from previous annotations.

We PCR amplified, cloned and sequenced the cDNA of gene11268 encoding a MADS box protein to confirm experimentally the TowU_Fve annotation. TowU_Fve predicts additional exons in the second intron based on the RNA-seq data (Figure 4A,B). Figure 4C illustrates the amplified coding region found by sequencing two independent cDNA clones, which contained the additional TowU_Fve predicted exons.

Because the RNA-seq data was obtained from the inbred line YW5AF7, which is different from Hawaii4×4 on which the prior genome assembly was based, there remains some possibility that TowU_Fve predictions differ from the prior annotations because of genome sequence differences between the two lines. Nevertheless, TowU_Fve represents a substantial improvement over previous annotations and annotation differences due to sequence differences between YW5AF7 and Hawaii4×4 may potentially underlie interesting phenotypic differences between these two lines.

Functional annotation of new gene models

Orthologous relationships between the new gene models predicted by the TowU_Fve annotation and other plants were evaluated by Blast2GO against the NCBI non-redundant protein database [15-17]. The goals of the GO analysis were to obtain support for the plant origin of the newly identified genes and to acquire information about their function. The Blast2GO analysis showed that about 70% of the new gene models had sequence homology to plant proteins in GenBank (E-value threshold of 10⁻⁵) and could be assigned GO functions, or contained a known protein domain identified by InterProScan. About 30% of the new gene models did not have any significant BLAST hits or InterProScan-identified domains.
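To make the E-value criterion concrete, the sketch below counts which query sequences have at least one hit at or below 10⁻⁵. It assumes the hits are available in the standard 12-column BLAST tabular layout (query ID first, E-value in the eleventh column); Blast2GO itself was run as described above, so the input format, the function name, and the example rows are assumptions made for illustration, as is treating the threshold as inclusive.

```python
def queries_with_hits(tabular_lines, max_evalue=1e-5):
    """Return the set of query IDs having at least one hit with E-value <= max_evalue.

    Expects the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical rows, for illustration only.
rows = [
    "newgene_1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t2e-30\t150",
    "newgene_2\tprotB\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25",
    "newgene_3\tprotC\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-5\t300",
]
supported = queries_with_hits(rows)
print(sorted(supported))  # ['newgene_1', 'newgene_3']
```

Dividing the size of the returned set by the total number of new gene models gives the fraction with significant hits, the quantity reported as about 70% once InterProScan domain matches are also counted.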

Conclusions

The F. vesca genome was sequenced and released in 2010, along with a first generation annotation (version 1.0) [5] that was subsequently replaced by version 1.1. Recently published deep transcriptome sequencing has shown that these previous annotations were not completely accurate, as might be expected given that they were derived mainly from ab initio predictions combined with mapping of high confidence ESTs and of gene deserts from transposable elements. An accurate and detailed genome annotation for diploid strawberry would be a valuable resource for Fragaria and the entire Rosaceae family. About seventy percent of the 2,286 new gene models identified by TowU_Fve have homologs in other plant species and/or have GO annotations. The remaining 30% potentially encode proteins of special interest to the Fragaria research community. The revised annotation, based on transcriptome sequences from a large number of different tissue samples, represents an important milestone in improving the accuracy of the diploid strawberry genome annotation. This improved genome annotation, TowU_Fve, provides a valuable resource for comparative and functional studies in flowering plants.

Methods

RNA-Seq Data

Tissue collection from the YW5AF7 cultivar of Fragaria vesca [3], RNA extraction, and sequencing were previously described in detail [2]. Briefly, plants were grown in growth chambers with 12 hours light at 25°C and 12 hours dark at 20°C. Samples were manually dissected from 25 different tissues (listed in Table 1) with two biological replicates for each tissue. cDNAs resulting from reverse transcription of RNA extracted from each of the 50 samples were sequenced on the Illumina HiSeq2000 platform using single-end chemistry with read lengths of 51 bp [2].

De novo assembly and PASA alignments

More than 600 million single-end reads were assembled using a two-step de novo assembly pipeline (Figure 1A). Sequence reads of the replicates were first merged into one library, and the merged libraries were then assembled using Trinity [18]. This step generated redundant transcripts and transcripts that do not align to the genome. In the second step of the pipeline these issues were resolved using the Genomic Mapping and Alignment Program for mRNA and EST Sequences (GMAP) [14] within the Program to Assemble Spliced Alignments (PASA) [19]. All de novo assembled transcripts from the first step were aligned to the F. vesca genome version 1.1, and all transcripts not aligning to the genome were thereby discarded. The resulting alignments were then used as one of the inputs of the re-annotation pipeline.
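The first step, pooling the reads of the two replicates of each tissue into one library before assembly, amounts to grouping the read files by tissue and concatenating them. The sketch below assumes a hypothetical file naming scheme of the form `Leaf1_rep1.fastq`; the actual file names and merging commands used are not given in the text.

```python
from collections import defaultdict
from pathlib import Path
import shutil


def group_replicates(fastq_paths):
    """Group FASTQ files by tissue, assuming names like 'Leaf1_rep2.fastq'."""
    groups = defaultdict(list)
    for path in sorted(Path(p) for p in fastq_paths):
        tissue = path.stem.rsplit("_rep", 1)[0]  # 'Leaf1_rep2' -> 'Leaf1'
        groups[tissue].append(path)
    return dict(groups)


def merge_fastq(inputs, output):
    """Concatenate FASTQ files byte for byte into one merged library."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


# Hypothetical file names, for illustration only.
libraries = ["Leaf1_rep2.fastq", "Embryo3_rep1.fastq",
             "Leaf1_rep1.fastq", "Embryo3_rep2.fastq"]
for tissue, files in group_replicates(libraries).items():
    print(tissue, [f.name for f in files])
```

Applied to the 50 libraries described above, the grouping would yield 25 merged libraries, each of which would then be passed to the assembler.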

Reference based assembly

The bioinformatics pipeline for the reference based assemblies of the transcriptome data is shown in Figure 1B. The first step was to align all reads from the 50 RNA-Seq libraries to the F. vesca genome version 1.1 by passing each of the RNA-Seq samples through TopHat [20], resulting in 50 BAM files. TopHat aligns RNA-Seq reads to the reference genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie [20,21]. The BAM files from sample replicates were then merged using the “merge” command within SAMtools [22], reducing the number of BAM files to 25. All 25 BAM files were then sorted using the SAMtools “sort” command. The final step was to pass each of the 25 sorted BAMs to Cufflinks [23-25] to generate assemblies. Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples [23-25]. Here it used the alignments generated by TopHat, together with the GeneMark gene models, to assemble the reference based transcripts.
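The sequence of steps for one tissue (align each replicate, merge, sort, assemble) can be laid out as a list of command lines. The sketch below only builds the argument lists and executes nothing. The file and directory names are hypothetical, the `samtools sort -o` form assumes a recent SAMtools release, and passing the gene models to Cufflinks with `-g` (guided assembly) is an assumption, since the exact options used are not stated in the text.

```python
def reference_assembly_commands(tissue, replicate_fastqs, bowtie_index, guide_gtf):
    """Return the argv lists for one tissue; nothing is executed here."""
    commands = []
    hit_bams = []
    for i, fastq in enumerate(replicate_fastqs, start=1):
        out_dir = f"{tissue}_rep{i}_tophat"
        # TopHat writes its alignments to <out_dir>/accepted_hits.bam
        commands.append(["tophat", "-o", out_dir, bowtie_index, fastq])
        hit_bams.append(f"{out_dir}/accepted_hits.bam")

    merged = f"{tissue}.merged.bam"
    sorted_bam = f"{tissue}.sorted.bam"
    # Merge the replicate BAM files, then coordinate-sort the merged file
    commands.append(["samtools", "merge", merged, *hit_bams])
    commands.append(["samtools", "sort", "-o", sorted_bam, merged])
    # Assumption: gene models are supplied as a guide (-g), which still
    # allows Cufflinks to report transcripts absent from the guide
    commands.append(["cufflinks", "-o", f"{tissue}_cufflinks", "-g", guide_gtf, sorted_bam])
    return commands


# Hypothetical inputs, for illustration only.
for argv in reference_assembly_commands(
    "Leaf1", ["Leaf1_rep1.fastq", "Leaf1_rep2.fastq"], "fvesca_v1.1", "genemark.gtf"
):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` in order; keeping command construction separate from execution makes the pipeline easy to inspect before it is run on all 25 tissues.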

Training ab initio gene finding tools (GeneMark, Augustus, SNAP)

GeneMark models were built by training the GeneMark tool on the F. vesca genome version 1.1. Augustus pre-trained datasets of A. thaliana and S. lycopersicum (tomato) were used along with the following to run the first round of the MAKER annotation pipeline: GeneMark models, de novo assembled transcripts, reference-guided assemblies, the GDR annotation, and reference sequence proteins. The gene models generated from this first round were then used to train the SNAP tool.

MAKER annotation pipeline

All de novo assembly steps were executed on the Data Intensive Academic Grid (DIAG), a shared computational cloud that is available to academic and non-profit institutions for performing bioinformatics analyses (http://diagcomputing.org/about/investigators.php).

All MAKER runs were executed on the iPlantCollaborative (http://www.iplantcollaborative.org/) cloud infrastructure service platform. We used the virtual machine instance emi-490420DC size c1.xlarge (16 CPUs, 16 GB memory and 50 GB disk) to run the MAKER annotation pipeline.

The following tools were installed on a personal computer running 64-bit CentOS 6: Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/, used to align the RNA-Seq libraries to both the F. vesca genome and GDR predictions); PASA (http://pasa.sourceforge.net/, used to align de novo assembled transcripts to the F. vesca genome); and Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/, used to perform the reference-guided assembly).

The MAKER annotation pipeline was used to automatically synthesize the following input data, in a final run, into gene annotations with evidence-based quality values. The data used as input to the MAKER pipeline (Figure 2) are: de novo assemblies, reference-guided assemblies, reference sequence proteins, Augustus trained datasets, SNAP trained models, GeneMark models and the GDR annotation.

Experimental verification of gene11268

Stage 12 anthers were dissected from YW5AF7 flowers and total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen, www.qiagen.com) in conjunction with RNase-free DNase (Qiagen). PolyA selection and cDNA synthesis were conducted using the iScript cDNA Synthesis Kit (BioRad).

The full length cDNA of gene11268 (621 bp; spanning exon 1 to 7) was PCR amplified using Phusion (NEB) polymerase with YW5AF7 stage 12 anther cDNA as template. PCR primer sequences were: F: 5’ ATG GGG AGG GGT AAG ATT GAG 3’ and R: 5’ TTA CAT TAT GTC GTG GAG ATT GGG CTG 3’. PCR conditions were as follows: 98°C 30 s, 98°C 10 s, 57°C 30 s, 72°C 30 s, repeat steps 2-4 34 times, 72°C 10 min. The resulting fragment was cloned into pCR8/GW/TOPO using a TA cloning kit (Invitrogen). Plasmid DNA from two such cDNA clones was commercially Sanger sequenced using the insertion-flanking GW1 and GW2 primers (Invitrogen).
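The primer sequences above can be checked against what a full-length coding sequence amplification requires: the forward primer should begin at the start codon, the reverse primer should end the amplicon on a stop codon, and the stated 621 bp product should be a whole number of codons. The following sketch performs only these sanity checks on the sequences as printed; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


forward = clean("ATG GGG AGG GGT AAG ATT GAG")
reverse = clean("TTA CAT TAT GTC GTG GAG ATT GGG CTG")

starts_with_atg = forward.startswith("ATG")
# The reverse primer anneals to the sense strand, so its reverse complement
# gives the last bases of the amplified coding sequence.
ends_with_stop = reverse_complement(reverse)[-3:] in STOP_CODONS
whole_codons = 621 % 3 == 0

print(len(forward), len(reverse))                      # 21 27
print(starts_with_atg, ends_with_stop, whole_codons)   # True True True
```

The reverse primer begins with TTA, whose reverse complement is the stop codon TAA, consistent with an amplicon spanning the complete coding region from exon 1 to exon 7.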

Availability of supporting data

The TowU_Fve annotation files have been deposited at the GDR (http://www.rosaceae.org) for public release through their web portal. They can also be found at the Strawberry Genomic Resources database (SGR) (http://bioinformatics.towson.edu/strawberry/TowU_Fve_Annotation.aspx). RNA-Seq data are available at BioProject Accession number: PRJNA187983 (http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA187983) and ArrayExpress Accession Number: SRR674059 (http://www.ebi.ac.uk/ena/data/view/SRR674059).

References

1. Illa E, Sargent DJ, Lopez Girona E, Bushakra J, Cestaro A, Crowhurst R, et al. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family. BMC Evol Biol. 2011;11:9.
2. Kang C, Darwish O, Geretz A, Shahan R, Alkharouf N, Liu Z. Genome-scale transcriptomic insights into early-stage fruit development in woodland strawberry Fragaria vesca. Plant Cell. 2013;25(6):1960–78.
3. Slovin J, Schmitt K, Folta M. An inbred line of the diploid strawberry Fragaria vesca f. semperflorens for genomic and molecular genetic studies in the Rosaceae. Plant Methods. 2009;5(1):15.
4. Darwish O, Slovin JP, Kang C, Hollender CA, Geretz A, Houston S, et al. SGR: an online genomic resource for the woodland strawberry. BMC Plant Biol. 2013;13(1):223.
5. Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, et al. The genome of woodland strawberry (Fragaria vesca). Nat Genet. 2010;43(2):109–16.
6. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9.
7. Stein LD. Using GBrowse 2.0 to visualize and share next-generation sequence data. Brief Bioinform. 2013;14(2):162–71.
8. Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2007;18(1):188–96.
9. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12(1):491.
10. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59.
11. Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19 suppl 2:ii215–25.
12. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7(1):62.
13. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–506.
14. Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21(9):1859–75.
15. Conesa A, Götz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
16. Conesa A, Götz S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;2008:619832.
17. Götz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35.
18. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
19. Haas J, Delcher L, Mount M, Wortman R, Smith Jr RK, Hannick I, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66.
20. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–11.
21. Langmead B, Trapnell C, Pop M. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(1):R25.
22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
23. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.
24. Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011;27(17):2325–9.
25. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.

Acknowledgements

This work was supported by National Science Foundation Grant MCB0923913 to Z.L., J.S., and N.A. All bioinformatics pipelines were executed on the iPlant Atmosphere cloud service.

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Towson University, 7800 York Road, Towson, Maryland, 21252, USA
Omar Darwish & Nadim W Alkharouf
Department of Cell Biology and Molecular Genetics, 0229 Biological Science Research Building, University of Maryland, College Park, Maryland, 20742, USA
Rachel Shahan & Zhongchi Liu
USDA/ARS Genetic Improvement of Fruits and Vegetables Laboratory, BARC-W 10300 Baltimore Ave, Beltsville, Maryland, 20705, USA
Janet P Slovin

Corresponding author

Correspondence to Nadim W Alkharouf.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

OD conceived of the project, performed the bioinformatics work and wrote the manuscript. RS performed experimental verification. JS, ZL and NA supervised the work, validated results and edited the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Darwish, O., Shahan, R., Liu, Z. et al. Re-annotation of the woodland strawberry (Fragaria vesca) genome. BMC Genomics 16, 29 (2015). https://doi.org/10.1186/s12864-015-1221-1

Received: 10 October 2014
Accepted: 05 January 2015
Published: 27 January 2015
DOI: https://doi.org/10.1186/s12864-015-1221-1