The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction
© Garzón-Martínez et al; licensee BioMed Central Ltd. 2012
Received: 23 November 2011
Accepted: 25 April 2012
Published: 25 April 2012
Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry.
We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs.
We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp.
KeywordsP. peruviana Solanaceae ESTs Functional annotation Gene model Phylogenetics
Physalis peruviana, also known as Cape gooseberry is a tropical fruit from the Solanaceae family, which includes many agriculturally important crops including potato, tomato, pepper, eggplant and tobacco . The Cape gooseberry fruit contains high levels of vitamin A, C and B-complex, as well as compounds of anti-inflammatory and antioxidant properties . Supercritical carbon dioxide extracts of P. peruviana leaves were shown to induce cell cycle arrest and apoptosis in human lung cancer H661 cells . Recently, 4β-Hydroxywithanolide (4βHWE) isolated from P. peruviana aerial parts (stems and leaves) was demonstrated to be a potential DNA-damaging and chemotherapeutic agent against lung cancer . In Colombia, this fruit has become promissory with high demand in European markets, mainly due to its unique taste, attractive color and shape as well as its potential health value. P. peruviana is a source of health related compounds found in the fruit and other parts of the plant including leaves and steams. Despite its nutritional and medical importance, current absence of P. peruviana genetic and genomic resources makes in-depth molecular studies on the plant difficult. Until this study, there were only a few partial P. peruviana gene sequences in public databases, mainly as a result of phylogenetic studies in the Solanaceae family [5, 6]. Therefore, there is a pressing need for efforts to obtain global genetic and genomic information from the Cape gooseberry, P. peruviana.
Advances in next generation sequencing (NGS) technology over the past few years have made it possible to rapidly perform de novo transcriptome and even genome assembly for non-model organisms with no or little prior genomic data available  . However, polyploidy and the large size of many plant genomes, which is predominantly due to amplification of repetitive elements or sometimes partial genome duplication , pose challenges to de novo whole genome assembly of plants. As such, EST sequencing, which avoids non-coding and repetitive DNA components, is a cost-effective and commonly used strategy to analyze the transcribed portion of a genome. Availability of ESTs represent a valuable resource for research as they provide comprehensive information regarding the transcriptome facilitating gene discovery and genome annotation and aiding in the determination of phylogenetic relationships . An increasing number of successful studies have been published describing EST sequencing and de novo transcriptome assembly for large-scale gene discovery [9–18].
Results and discussion
EST sequencing and assembly
P. peruviana transcriptome assembly overview
Filtered EST reads
Average length (bp)
N50 size (bp)
P. peruviana transcriptome functional annotation overview
Sequences with BLAST hits
Sequences annotated with GO terms
GO Terms associated with the sequences
Sequences associated with EC numbers
Main metabolic pathways associated to P. peruviana transcripts
KEGG* metabolic pathways
Number of transcripts
General metabolic pathways
Biosynthesis of secondary metabolites
Biosynthesis of phenylpropanoids
Drug metabolism - other enzymes
Tropane, piperidine and pyridine alkaloid biosynthesis
Biosynthesis of plant hormones
Biosynthesis of alkaloids derived from terpenoid/polyketide
Biosynthesis of terpenoids and steroids
Protein domains encoded by the P. Peruviana leaf transcriptome
A total of 12,974 P. peruviana cDNAs were found to have significant similarities to 3,117 protein domains present in the NCBI CDD (Conserved Domain Database) . The most abundant domain present in proteins encoded by the P. peruviana transcriptome is the pentatricopeptide repeat domain (PPR), found in 350 cDNAs. The PPR containing proteins are commonly found in the plant kingdom and although its function is still unclear, the PPR domain has been found in proteins involved in RNA editing in a number of recent studies [31–34]. Following the PPR domain, the next three most commonly found domains in the P. peruviana transcriptome are: protein kinase domain (294 cDNAs), NB-ARC domain (190 cDNAs) and WD40 domain (123 cDNAs). Protein kinases are one of the largest protein families in plants, involved in a wide variety of physiological processes , like calcium-dependent protein kinases and MAP kinases which are involved in the recognition of elicitors or pathogens and the subsequent activation of defense response in plants . The NB-ARC domain is a nucleotide-binding motif shared by plant resistant gene products involved in regulated cell death [37, 38]. The WD40 domain, whose common function is coordinating multi-protein complex assemblies, is found in a large number of eukaryotic proteins that cover a wide variety of functions including adaptor and regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly [39, 40]. Additionally, the WD40 domain is critically involved in the ubiquitin proteasome pathway which regulates photomorphogenesis, flowering and abiotic stress response in plants .
Protein domains identified in P. peruviana transcriptome
Number of cDNAs
Pentatricopeptide repeat domain (PPR motif)
Protein kinase domain
RNA recognition motif
Leucine rich repeat N-terminal domain
Tyrosine kinase catalytic domain
Out of the 110,921 singletons, there are 9,909 of them (length >200 bp) where GO term(s) were assigned to the sequence through Blast2GO (see Materials) or where a significant similarity to a well-characterized protein domain from NCBI CDD was found. We deposited the 9,909 singletons described above, in addition to the 24,024 assembled isotigs in the NCBI’s TSA (Transcriptome Shotgun Assembly) sequence database, which is available at GenBank (accessions JO124085-JO157957). Those sequences with their functional annotations, including GO terms and domain similarity related description, are also provided as Additional file 1: ‘Cape gooseberry cDNAs’.
In silico SSR marker identification
SSRs identified in P. peruviana cDNAs
We recently reported the first set of microsatellite markers developed for P. peruviana and related species  where the large majority of SSR loci was found in untranslated regions (UTRs) of transcripts with similarity to known proteins in public databases, leading to the identification of two novel polymorphic SSRs related to proteins involved in pathogen defense response. SSRs prioritization for plant breeding programs can be done via functional annotation of cDNAs associated with predicted SSRs and Gene Ontology annotations like ones involved in plant defense. Here we used and updated functional annotation of the transcriptome and the entire collection of assembled transcripts to report ten novel predictions for cDNA-derived SSRs in Cape gooseberry. These SSRs are associated with proteins with gene ontology annotations involved in plant defense to biotic stress such as defense response to fungus, programmed cell death, callose deposition in cell wall during defense response, plant hypersensitive response, and jasmonic acid, ethylene and salicylic acid hormones ( Additional file 2: ‘Functional annotation of ten Physalis peruviana SSRs markers related to plant defense’). The SRRs obtained in this study are the raw materials for future studies in genetic variation among Physalis populations, which can be used for: construction of genetic maps, quantitative trait loci (QTL) identification in this species and plant breeding programs focused on phytosanitary Cape gooseberry problems.
Gene model prediction in P. Peruviana
Cape gooseberry gene model prediction overview from alignments to the tomato and potato genomes
S. lycopersicum genome
S. tuberosum genome
Intron length variation is exemplified in Figure 5C, where a P. peruviana cDNA (ID Php00a06743.16696) was mapped to both the S. lycopersicum and S. tuberosum genomes, resulting in two identical sets of exons, but different sets of intron lengths (a, b). There is also a number of S. lycopersicum and S. tuberosum cDNAs that have the same predicted gene model in their own genome, respectively (all the cDNAs are aligned by Splign). Figure 5C (c) shows the nucleotide sequences around the first intron site of the 3 cDNAs from P. peruviana, S. tuberosum and S. lycopersicum. Primers targeting conserved flanking exonic regions as indicated can be used to amplify intronic fragments from all three species, P. peruviana, S. lycopersicum and S. tuberosum.
The conserved orthologous set (COS) markers are sets of genes conserved throughout evolution in both sequence and copy number [49, 50] that have been used extensively in comparative genomic and phylogenetic studies in Solanaceae. The COS marker strategy involves design of universal exonic primers among closely related species based on ortholog identification and multiple sequence alignment to amplify intronic/exonic regions. In the present study, we present another convenient approach to find universal exon regions - gene model prediction by Splign using two or more related genomes to define common models. Given the fact that the P. peruviana genome is not available yet, and genomes of both S. lycopersicum and S. tuberosum are only in their initial versions, gene model predictions would be particularly valuable in obtaining specific intronic regions for marker and SNP discovery in non-model species, as well as for comparative genomic and phylogenetic studies.
We also aligned the S. lycopersicum transcriptome to its own genome (data from http://solgenomics.net/organism/solanum_lycopersicum/genome) and also to the S. tuberosum genome  using Splign. We mapped 34,704 from a total of 34,727 S. lycopersicum cDNAs to its own genome with 100% identity (data not shown). However, only 28,366 (81.7%) S. lycopersicum cDNAs can be mapped to S. tuberosum genome with an average identity of 90%. In the Cape gooseberry case only 42.9% P. peruviana cDNAs get mapped to the S. tuberosum genome, suggesting that P. peruviana is evolutionarily more distant from S. lycopersicum and S. tuberosum than the two species from each other. We then conducted further analysis to estimate the phylogenetic location of P. peruviana in the Solanaceae family.
Experimental validation of intron positions
Primers used for experimental validation of intron positions
Primer sequences (5′ – 3′)
P. peruvianaunique identifier
Primer position in the cDNA
C2_At 2 g35920
C2_At 3 g06580
C2_At 5 g41480
C2_At 5 g27620
C2_At 3 g04870
C2_At 5 g60160
Comparison of validated exon/intron boundaries between PCR results and the predicted gene models
P. peruviana unique identifier
Position of boundary of the 2 adjacent exons at the mRNA
Match predicted gene model?
3 bp shift
6 bp shift
2 bp shift
3 bp shift
1 bp shift
Phylogenetic relationship of P. Peruviana with other solanaceae species
We found five putative orthologs among P. peruviana, S. lycopersicum, S. tuberosum, Capsicum spp (pepper), S. melongena (eggplant) and Petunia spp. The proteins are: xyloglucanase inhibitor containing pepsin_retropepsin superfamily domain, mitochondrial catalytic protein containing PP2Cc superfamily domain, mitochondrial small ribosomal subunit protein containing RPS2 superfamily domain, phosphate transporter and a functionally unknown protein.
The phylogenetic relationship among S. lycopersicum S. tuberosum Capsicum spp S. melongena and Petunia spp is consistent with a previous study by Wang Y et al. , in which the tree was constructed based on an unduplicated conserved syntenic segment in the genomes of the five plants. Our results showed that P. peruviana branched out before the divergence of the other five Solanaceae family members. Details of the phylogenetic analysis are summarized in Additional file 5: ‘Phylogenetic analysis workflow’.
This report constitutes the first genomic resource for the Physalis genus providing a large collection of assembled and functionally annotated cDNAs. The Physalis genus is part of the Solanaceae family, whose members are important sources of food, spice and medicine. However, genomic data for other members of the Physalis genus is limited. Therefore, this resource will enhance comparative studies within the family and the transcriptome will serve as a starting point for gene discovery in Physalis and for future annotations of the Physalis peruviana genome sequence. A number of the genes identified in this study provide candidates for resistance genes against viruses, fungal or bacterial pathogens. Additionally, this study is a potentially invaluable resource for mapping and marker-assisted breeding in Physalis peruviana and closely related species like Physalis philadelphica, commonly known as tomatillo, which are food staples in Central American countries.
cDNA synthesis and cDNA library normalization
Fresh leaf tissue from the Cape gooseberry Physalis peruviana Colombian ecotype plants from the Colombian in vitro germplasm bank (accession number 09U216-6) at the Corporacion Colombiana de Investigacion Agropecuaria (CORPOICA) were processed and flash frozen in liquid nitrogen. Tissues were immediately sent to Bio S&T Inc. (Montreal, QC, Canada) where RNA extraction, cDNA synthesis and normalization were performed. Briefly, RNA was extracted using a modified TRIzol method (Invitrogen, USA). cDNA synthesis was carried out using 16 μg total RNA by a modified SMART™ cDNA synthesis method and then were normalized by a modified normalization method [54, 55] where full-length cDNA was synthesized with two set of primers for driver and tester cDNA. Single-stranded cDNA was used for hybridization instead of double-stranded cDNA. Excess amounts of sense-stranded cDNA hybridized with antisense-stranded cDNA. After hybridization, duplex DNA was removed by hydroxyapatite chromatography. Normalized tester cDNA was re-amplified and purified with tester specific primer L4N by failsafeTm PCR (Epicentre Biotechnologies, USA), while driver cDNA was unable to amplify using L4N primer. Size fractionation of re-amplified cDNA was done in a 1% agarose gel. Greater than 0.5 kb cDNA fragments were purified by electroelution and after determining the concentrations, purified cDNAs were precipitated and stored in 80% EtOH at −80°C.
cDNA sequencing and assembly
The normalized cDNA library was prepared for sequencing at Emory Genomics Center (Atlanta, GA, USA). Approximately 5 μg of purified cDNA was sheared into small fragments via Covaris E210 Acoustic Focusing Instrument and sequenced in three-fourths 454 plate run on a 454 GS-FLX Titanium platform (Roche). The SFF files containing raw sequences and quality scores were submitted to the NCBI Sequence Read Archive (accession number SRP005904).
SeqClean [19, 56] was used before and after the assembly, for automated trimming and validation of the raw read files and the assembled file. SeqClean was launched with a minimum and maximum length cut-off of 50pb and 600pb. We used the Newbler software, GS de novo Assembler (Roche, version 2.5.3) with default parameters, to assemble reads into contigs, then further into isotigs. Isotigs within an isogroup represent putative alternatively spliced transcripts of a gene. Reads that cannot be connected with any others were defined as singletons.
After assembly, a local BLASTX [22, 23] was used to compare the assembled isotigs and singletons against the UniProtKB/Swiss-Prot database (released on April-2011) using an expect value threshold of 1e-5. The remaining cDNAs that did not get hits from UniProtKB/Swiss-Prot were compared against the NCBI RefSeq database (Release 47). The BLASTX output (XML format) was subjected to Blast2GO  for Gene Ontology (GO) analysis. Blast2GO retrieves the most significant GO terms associated with the obtained hits to the query sequence. When possible, Blast2GO also provides Enzyme commission (EC) numbers and the metabolic pathways they participate. We also compared all the P. peruviana cDNAs against the NCBI CDD database  using an expect value threshold of 1e-5 and selected all the hits where the aligned length is more than 2/3 of the targeted CDD length for domain identification.
Phobos (version 3.3.11) (http://www.rub.de/spezzoo/cm/cm_phobos.htm) was used to identify microsatellites (SSRs) in the publicly available collection of assembled transcripts and singletons [GenBank: JO124085-JO157957]. Perfect and imperfect searches were performed using default parameters.
Gene model prediction
Gene model prediction was carried out using the software package Splign , which has been proven to be able to accurately compute cDNA-to-Genome alignment with high efficiency. At the heart of the program is a compartmentization algorithm which identifies possible gene duplications, and a refined alignment algorithm recognizing introns and splice signals. The complete genome of P. peruviana is not available yet. The two closest relatives of P. peruviana that have genomic sequences available are S. lycopersicum (tomato) (data from http://solgenomics.net/organism/solanum_lycopersicum/genome; ITAG Release 2.3) and S. tuberosum (potato; PGSC_DM_v3.4) . Therefore we used the draft genomes of potato and tomato as the reference genome to map the assembled P. peruviana leaf transcriptome.
We selected orthologous proteins using the all-to-all alignment and mutual best hits selection strategy . Pairwise alignments were performed using BLASTP (expect value < 1e-5) using the RefSeq proteins from S. lycopersicum S. tuberosum Capsicum spp S. melongena and Petunia spp. At the time of analysis, the numbers of RefSeq proteins from the five species were: 4,788 for S. tuberosum; 6,008 for S. lycopersicum; 263 for S. melongena; 1,701 for Capsicum spp and 1,226 for Petunia spp. Fifteen putative orthologous proteins were found, which are present in all five species. Next, we aligned the assembled P. peruviana isotigs using BLASTX (expect value < 1e-5) against the database made of the fifteen orthologous groups obtained from the previous step (altogether 75 proteins). We identified eleven orthologous groups of proteins from all the five plants with hit(s) from the P. peruviana transcriptome. The best hit was chosen when multiple P. peruviana proteins hit a given group. We manually examined the alignments in eleven clusters and removed those with large length variation (the longest one is >20% of the shortest one) and susceptible similarities (< 65%). Thereafter we ended up with five orthologous groups among the six species.
To obtain higher accuracy phylogenetic tree, we further compared the five orthologous groups against the entire plant RefSeq protein database using BLASTP. There are altogether seven more plants that have significant hit(s) (expect value < 1e-5) for all the five orthologous groups. To this step we have thirteen plants for the five orthologous groups. We concatenated the five proteins in each species (in the same order) and aligned them using the program MUSCLE . The alignment results were manually refined and subjected to Phyml  and MEGA version 5  for phylogenetic tree construction. Bootstrapping was repeated 1,000 times. Both programs produced the same results.
Gina A. Garzón-Martínez and Z. Iris Zhu contributed equally to this work.
Support for this research was provided by a grant from the Colombian Ministry of Agriculture Contract Nos. 054/08072-2008 L4787-3281 to Luz Stella Barrero and 054/08190-2008 L7922-3322 to Victor Manuel Núñez Zarantes. Gina Garzón-Martínez was supported by a Colciencias "Joven Investigador" Fellowship during 2010. Leonardo Mariño-Ramírez expresses his deepest gratitude to his friend and colleague Dr. Alba Marina Cotes Prado for all the advice and support she gave him to conduct this project. This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Library of Medicine, and National Center for Biotechnology Information.
- Knapp S: Tobacco to tomatoes: a phylogenetic perspective on fruit diversity in the Solanaceae. J Exp Bot. 2002, 53 (377): 2001-10.1093/jxb/erf068.View ArticlePubMed
- Ramadan MF: Bioactive phytochemicals, nutritional value, and functional properties of cape gooseberry (Physalis peruviana): An overview. Food Res Int. 2011, 44 (7): 1830-1836. 10.1016/j.foodres.2010.12.042.View Article
- Wu S-J, Chang S-P, Lin D-L, Wang S-S, Hou F-F, Ng L-T: Supercritical carbon dioxide extract of Physalis peruviana induced cell cycle arrest and apoptosis in human lung cancer H661 cells. Food and Chemical Toxicology. 2009, 47 (6): 1132-1138. 10.1016/j.fct.2009.01.044.View ArticlePubMed
- Yen C, Chiu C, Chang F, Chen J, Hwang C, Hseu Y, Yang H, Lee A, Tsai M, Guo Z: 4 -Hydroxywithanolide E from Physalis peruviana(golden berry) inhibits growth of human lung cancer cells through DNA damage, apoptosis and G 2/M arrest. BMC cancer. 2010, 10 (1): 46-10.1186/1471-2407-10-46.PubMed CentralView ArticlePubMed
- He C, Saedler H: Heterotopic expression of MPF2 is the key to the evolution of the Chinese lantern of Physalis, a morphological novelty in Solanaceae. Proc Natl Acad Sci U S A. 2005, 102 (16): 5779-5784. 10.1073/pnas.0501877102.PubMed CentralView ArticlePubMed
- Hu JY, Saedler H: Evolution of the inflated calyx syndrome in Solanaceae. Mol Biol Evol. 2007, 24 (11): 2443-2453. 10.1093/molbev/msm177.View ArticlePubMed
- Su Z, Ning B, Fang H, Hong H, Perkins R, Tong W, Shi L: Next-generation sequencing and its applications in molecular diagnostics. Expert Rev Mol Diagn. 2011, 11 (3): 333-343.PubMed
- Imelfort M, Edwards D: De novo sequencing of plant genomes using second-generation technologies. Briefings in Bioinformatics. 2009, 10 (6): 609-618. 10.1093/bib/bbp039.View ArticlePubMed
- Blanca J, Canizares J, Roig C, Ziarsolo P, Nuez F, Pico B: Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics. 2011, 12: 104-10.1186/1471-2164-12-104.PubMed CentralView ArticlePubMed
- Li Y, Luo H, Sun C, Song J, Sun Y, Wu Q, Wang N, Yao H, Steinmetz A, Chen S: EST analysis reveals putative genes involved in glycyrrhizin biosynthesis. BMC Genomics. 2010, 11 (1): 268-10.1186/1471-2164-11-268.PubMed CentralView ArticlePubMed
- Novaes E, Drost D, Farmerie W, Pappas G, Grattapaglia D, Sederoff R, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9 (1): 312-10.1186/1471-2164-9-312.PubMed CentralView ArticlePubMed
- Parchman T, Geist K, Grahnen J, Benkman C, Buerkle C: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11 (1): 180-10.1186/1471-2164-11-180.PubMed CentralView ArticlePubMed
- Sun C, Li Y, Wu Q, Luo H, Sun Y, Song J, Lui E, Chen S: De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC Genomics. 2010, 11 (1): 262-10.1186/1471-2164-11-262.PubMed CentralView ArticlePubMed
- Alagna F, D'Agostino N, Torchia L, Servili M, Rao R, Pietrella M, Giuliano G, Chiusano ML, Baldoni L, Perrotta G: Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development. BMC Genomics. 2009, 10: 399-10.1186/1471-2164-10-399.PubMed CentralView ArticlePubMed
- Wang W, Wang Y, Zhang Q, Qi Y, Guo D: Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing. BMC Genomics. 2009, 10 (1): 465-10.1186/1471-2164-10-465.PubMed CentralView ArticlePubMed
- Rismani-Yazdi H, Haznedaroglu BZ, Bibby K, Peccia J: Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: pathway description and gene discovery for production of next-generation biofuels. BMC Genomics. 2011, 12: 148-10.1186/1471-2164-12-148.PubMed CentralView ArticlePubMed
- Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD: Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics. 2006, 7: 272-10.1186/1471-2164-7-272.PubMed CentralView ArticlePubMed
- Zeng S, Xiao G, Guo J, Fei Z, Xu Y, Roe BA, Wang Y: Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim. BMC Genomics. 2010, 11: 94-10.1186/1471-2164-11-94.PubMed CentralView ArticlePubMed
- SeqClean: [http://compbio.dfci.harvard.edu/tgi/software/]
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.PubMed CentralPubMed
- Kumar S, Blaxter ML: Comparing de novo assemblers for 454 transcriptome data. BMC Genomics. 2010, 11: 571-10.1186/1471-2164-11-571.PubMed CentralView ArticlePubMed
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.PubMed CentralView ArticlePubMed
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL: BLAST+: architecture and applications. BMC Bioinforma. 2009, 10: 421-10.1186/1471-2105-10-421.View Article
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.View ArticlePubMed
- Aharoni A, Galili G: Metabolic engineering of the plant primary-secondary metabolism interface. Curr Opin Biotechnol. 2011, 22 (2): 239-244. 10.1016/j.copbio.2010.11.004.View ArticlePubMed
- Ahn IP, Kim S, Lee YH: Vitamin B1 functions as an activator of plant disease resistance. Plant Physiol. 2005, 138 (3): 1505-1515. 10.1104/pp.104.058693.PubMed CentralView ArticlePubMed
- Goyer A: Thiamine in plants: aspects of its metabolism and functions. Phytochemistry. 2010, 71 (14–15): 1615-1624.View ArticlePubMed
- Korkina LG: Phenylpropanoids as naturally occurring antioxidants: from plant defense to human health. Cell Mol Biol (Noisy-le-grand). 2007, 53 (1): 15-25.
- Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2011
- Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR: CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011, 39 (Database issue): D225-D229.PubMed CentralView ArticlePubMed
- Shikanai T, Okuda K: In Vitro RNA-Binding Assay for Studying Trans-Factors for RNA Editing in Chloroplasts. Methods Mol Biol. 2011, 774: 199-208. 10.1007/978-1-61779-234-2_13.View ArticlePubMed
- Zehrmann A, Verbitskiy D, Hartel B, Brennicke A, Takenaka M: PPR proteins network as site-specific RNA editing factors in plant organelles. RNA Biol. 2011, 8 (1): 67-70. 10.4161/rna.8.1.14298.View ArticlePubMed
- Takenaka M, Verbitskiy D, Zehrmann A, Brennicke A: Reverse Genetic Screening Identifies Five E-class PPR Proteins Involved in RNA Editing in Mitochondria of Arabidopsis thaliana. J Biol Chem. 2010, 285 (35): 27122-27129. 10.1074/jbc.M110.128611.PubMed CentralView ArticlePubMed
- Fujii S, Small I: The evolution of RNA editing and pentatricopeptide repeat genes. New Phytol. 2011, 191 (1): 37-47. 10.1111/j.1469-8137.2011.03746.x.View ArticlePubMed
- Chevalier D, Walker JC: Functional genomics of protein kinases in plants. Brief Funct Genomic Proteomic. 2005, 3 (4): 362-371. 10.1093/bfgp/3.4.362.View ArticlePubMed
- Romeis T: Protein kinases in the plant defence response. Current opinion in plant biology. 2001, 4 (5): 407-414. 10.1016/S1369-5266(00)00193-X.View ArticlePubMed
- Liu J, Liu X, Dai L, Wang G: Recent progress in elucidating the structure, function and evolution of disease resistance genes in plants. J Genet Genomics. 2007, 34 (9): 765-776. 10.1016/S1673-8527(07)60087-3.View ArticlePubMed
- van der Biezen EA, Jones JD: The NB-ARC domain: a novel signalling motif shared by plant resistance gene products and regulators of cell death in animals. Curr Biol. 1998, 8 (7): R226-R227. 10.1016/S0960-9822(98)70145-9.View ArticlePubMed
- Stirnimann CU, Petsalaki E, Russell RB, Muller CW: WD40 proteins propel cellular networks. Trends Biochem Sci. 2010, 35 (10): 565-574. 10.1016/j.tibs.2010.04.003.View ArticlePubMed
- Xu C, Min J: Structure and function of WD40 domain proteins. Protein Cell. 2011, 2 (3): 202-214. 10.1007/s13238-011-1018-1.View ArticlePubMed
- Biedermann S, Hellmann H: WD40 and CUL4-based E3 ligases: lubricating all aspects of life. Trends Plant Sci. 2011, 16 (1): 38-46. 10.1016/j.tplants.2010.09.007.View ArticlePubMed
- Phobos 3.3.11: [http://www.rub.de/spezzoo/cm/cm_phobos.htm]
- Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X, Risterucci AM, Da Silva C, Cascardo J, Allegre M: Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics. 2008, 9 (1): 512-10.1186/1471-2164-9-512.PubMed CentralView ArticlePubMed
- Luro FL, Costantino G, Terol J, Argout X, Allario T, Wincker P, Talon M, Ollitrault P, Morillon R: Transferability of the EST-SSRs developed on Nules clementine(Citrus clementina Hort ex Tan) to other Citrus species and their effectiveness for genetic mapping. BMC Genomics. 2008, 9 (1): 287-10.1186/1471-2164-9-287.PubMed CentralView ArticlePubMed
- Simbaqueba J, Sanchez P, Sanchez E: Nunez Zarantes VM, Chacon MI, Barrero LS, Marino-Ramirez L: Development and Characterization of Microsatellite Markers for the Cape Gooseberry Physalis peruviana. PLoS One. 2011, 6 (10): e26719-10.1371/journal.pone.0026719.PubMed CentralView ArticlePubMed
- Mueller LA, Tanksley SD, Giovannoni JJ, van Eck J, Stack S, Choi D, Kim BD, Chen M, Cheng Z, Li C: The Tomato Sequencing Project, the first cornerstone of the International Solanaceae Project (SOL). Comp Funct Genomics. 2005, 6 (3): 153-158. 10.1002/cfg.468.PubMed CentralView ArticlePubMed
- Xu X, Pan S, Cheng S, Zhang B, Mu D, Ni P, Zhang G, Yang S, Li R, Wang J: Genome sequence and analysis of the tuber crop potato. Nature. 2011, 475 (7355): 189-195. 10.1038/nature10158.View ArticlePubMed
- Kapustin Y, Souvorov A, Tatusova T, Lipman D: Splign: algorithms for computing spliced alignments with identification of paralogs. Biol Direct. 2008, 3: 20-10.1186/1745-6150-3-20.PubMed CentralView ArticlePubMed
- Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD: Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell. 2002, 14 (7): 1457-1467. 10.1105/tpc.010479.PubMed CentralView ArticlePubMed
- Wu F, Mueller LA, Crouzillat D, Petiard V, Tanksley SD: Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics. 2006, 174 (3): 1407-1420. 10.1534/genetics.106.062455.PubMed CentralView ArticlePubMed
- Boussau B, Gueguen L, Gouy M: A mixture model and a hidden markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evol Bioinform Online. 2009, 5: 67-79.PubMed CentralPubMed
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol. 2011, 10: 2731-2739.View Article
- Wang Y, Diehl A, Wu F, Vrebalov J, Giovannoni J, Siepel A, Tanksley SD: Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics. 2008, 180 (1): 391-408. 10.1534/genetics.108.087981.PubMed CentralView ArticlePubMed
- Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, Efstratiadis A: Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci U S A. 1994, 91 (20): 9228-9232. 10.1073/pnas.91.20.9228.PubMed CentralView ArticlePubMed
- Patanjali SR, Parimoo S, Weissman SM: Construction of a uniform-abundance (normalized) cDNA library. Proc Natl Acad Sci U S A. 1991, 88 (5): 1943-1947. 10.1073/pnas.88.5.1943.PubMed CentralView ArticlePubMed
- Chen YA, Lin CC, Wang CD, Wu HB, Hwang PI: An optimized procedure greatly improves EST vector contamination removal. BMC Genomics. 2007, 8: 416-10.1186/1471-2164-8-416.PubMed CentralView ArticlePubMed
- Li L, Stoeckert CJ: Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13 (9): 2178-2189. 10.1101/gr.1224503.PubMed CentralView ArticlePubMed
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.