Skip to main content


De novo assembly and characterization of transcriptomes of early-stage fruit from two genotypes of Annona squamosa L. with contrast in seed number



Annona squamosa L., a popular fruit tree, is the most widely cultivated species of the genus Annona. The lack of transcriptomic and genomic information limits the scope of genome investigations in this important shrub. It bears aggregate fruits with numerous seeds. A few rare accessions with very few seeds have been reported for Annona. A massive pyrosequencing (Roche, 454 GS FLX+) of transcriptome from early stages of fruit development (0, 4, 8 and 12 days after pollination) was performed to produce expression datasets in two genotypes, Sitaphal and NMK-1, that show a contrast in the number of seeds set in fruits. The data reported here is the first source of genome-wide differential transcriptome sequence in two genotypes of A. squamosa, and identifies several candidate genes related to seed development.


Approximately 1.9 million high-quality clean reads were obtained in the cDNA library from the developing fruits of both the genotypes, with an average length of about 568 bp. Quality-reads were assembled de novo into 2074 to 11004 contigs in the developing fruit samples at different stages of development. The contig sequence data of all the four stages of each genotype were combined into larger units resulting into 14921 (Sitaphal) and 14178 (NMK-1) unigenes, with a mean size of more than 1 Kb. Assembled unigenes were functionally annotated by querying against the protein sequences of five different public databases (NCBI non redundant, Prunus persica, Vitis vinifera, Fragaria vesca, and Amborella trichopoda), with an E-value cut-off of 10−5. A total of 4588 (Sitaphal) and 2502 (NMK-1) unigenes did not match any known protein in the NR database. These sequences could be genes specific to Annona sp. or belong to untranslated regions. Several of the unigenes representing pathways related to primary and secondary metabolism, and seed and fruit development expressed at a higher level in Sitaphal, the densely seeded cultivar in comparison to the poorly seeded NMK-1. A total of 2629 (Sitaphal) and 3445 (NMK-1) Simple Sequence Repeat (SSR) motifs were identified respectively in the two genotypes. These could be potential candidates for transcript based microsatellite analysis in A. squamosa.


The present work provides early-stage fruit specific transcriptome sequence resource for A. squamosa. This repository will serve as a useful resource for investigating the molecular mechanisms of fruit development, and improvement of fruit related traits in A. squamosa and related species.


Annona squamosa L., commonly known as sugar-apple (or sweetsop or custard-apple), is a popular fruit throughout the tropics, mainly southern Mexico, Antilles, Central and South America, tropical Africa, Australia, India, Indonesia, Polynesia and US (Hawaii and Florida) [1]. It is native to the tropical America and West Indies. In India, it was introduced by the Spanish and Portuguese in the 16th century [1,2]. It is known by several names in India: ata, aarticum, shareefa, sitaphal, seethaphal or seetha pazham, aathachakka and atna kothal etc. [1]. Annona sp. belongs to family Annonaceae which is the largest living family among magnoliids (primitive angiosperms). The genus Annona contains about 166 species [3], out of which six produce edible fruits; A. squamosa, A. reticulata, A. cherimola, A. muricata, A. atemoya and A. diversifolia [4]. A. squamosa is the most widely cultivated species [5]. The flower of A. squamosa comprises of a gynoecium of several loosely cohering carpels, surrounded by an androecium of numerous stamens, encircled by three small, inconspicuous sepals, and three green colored fleshy petals [6]. It is an apocarpous flower i.e. carpels are separate in individual pistils. Fruit is a syncarpium i.e. formed by amalgation of many ripened pistils and a fleshy receptacle. Each carpel has a single anatropous ovule that may develop into a single seed. The pulp is creamy white to light yellow, sweetly aromatic, and tastes like custard. The pulp is of nutritional and medicinal value [7,8], rich in calories, vitamin C, and minerals [1,9,10]. Annona fruits have been mentioned as ‘one of the most delicious fruits known to man’ and as ‘aristocrat of fruits’, considering its nutritional and medicinal value [11,12].

There have been very few genomic studies on A. squamosa, as only 158 and 12 sequences are available in nucleotide and protein databases, respectively, in NCBI GenBank as on 20th December, 2014 ( Next generation sequencing (NGS) technologies have facilitated rapid investigation of transcriptome [13-16]. The GS FLX+ platform is a high-throughput system, which can generate long sequence reads (up to ~1 kb), with high accuracy ( We report de novo assembly and transcriptome catalogue from A. squamosa. The data provides an important resource for gene discovery, gene expression, functional analysis, molecular breeding, and comparative genomic analysis of A. squamosa and related species.

In most angiosperms, including A. squamosa, ovule and ovary develop into seed and fruit, respectively. This transition is a complex physiological process with coordinated development of maternal and filial tissues. Understanding the early phase of fruit development is important, since the molecular and biochemical pathways of seed and fruit set, soon after fertilization, determine seed number, fruit size, and other fruit quality traits such as accumulation of sugar and organic acids [17-19]. Less number of seeds in fruit or seedlessness is important to consumers, both for fresh fruit consumption and fruit processing, especially when the seeds in Annona are hard and have a bad taste. Differences in fruit related traits, such as seed number have been reported among the Annona species and cultivars [9]. The presence of parthenocarpic fruits has not been reported in Annona sp. However, absence of the outer integument and change in ovule structure have been suggested as the causes for failure in seed formation due to interruption in the reproductive program in a spontaneous mutant of A. squamosa (Thai seedless) [11]. In India, some accessions have been reported with significantly reduced number of seeds, as compared to the common sugar-apple, Sitaphal [20,21]. In order to gain molecular insight into early-stage fruit development and to create groundwork for molecular characterization of fruit development, it is desirable to profile the transcriptome of developing fruits of A. squamosa.

In the present study, a massive pyrosequencing of transcriptome from early stages of fruit development was performed in two Annona genotypes (Sitaphal and NMK-1), showing significant difference in fruit seed number, using NGS technology (Roche 454 GS FLX+). De novo transcriptome assembly, functional annotation, and in silico discovery of potential molecular markers have been described here. Various genes, related to hormone, seed and fruit development, transcription factors, and metabolic pathways were identified. The information will be helpful in functional genomic studies and in furthering the understanding of molecular mechanisms of fruit development in Annona sp.


Plant material and RNA extraction

Two Annona genotypes with contrast in fruit seed number (Figure 1), Sitaphal and NMK-1, were used in this study. Sitaphal is a well known cultivar of A. squamosa [22]. NMK-1 was developed by selection for desirable characteristics from a population of Annona genotypes [21]. However, systematic information on the development of the cultivars is not available. Phylogenetic analysis, using two marker sequences (rbcl and LMCH10) in seventeen species of Annona, placed both the genotypes close to A. squamosa (Additional file 1: Figure S1). The two genotypes were collected from the field of Madhuban Nursery (17.68° N 75.92° E coordinates, at an elevation of 457 m), Solapur, Maharshtra, India, where these are clonally propagated.

Figure 1

Mature fruits of Sitaphal (a) and NMK-1 (b), showing densely seeded and nearly seedless ripened carpels (Scale 2 cm), respectively. Bar diagram shows the difference between the two genotypes in fruit seed number (c). The error bars indicate standard error in thirty mature fruits, harvested from three different plants (10 fruits from each plant) of each genotype.

Pollens were collected from flowers, in male stage, as described by Jalikop and Kumar [4]. The flowers, in female stage, were hand self-pollinated, using freshly collected pollens, in the morning (06.00 and 10.00 h). All the flowers were pollinated at the same time to avoid confounding effect of environment on fruit development. In each pollinated flower, the floral tube was plugged with cotton to prevent contamination of outside pollen. Flowers in similar stages were tagged and left as un-pollinated controls to examine seed numbers in fruits, developed from hand self-pollination and natural open-pollination (Figure 1c). The experiment was performed on three plants (three biological replicates) each of both the genotypes, during July, 2012. Developing fruits were harvested at 4, 8, and 12 days after pollination (DAP) (Figure 2). The gynoecium comprising of unfertilized ovules (0 DAP) was harvested. All the stamens were removed surrounding the gynoecium before harvesting. The 0, 4, 8 and 12 DAP samples were surface sterilized by using absolute ethanol before harvesting. The samples were frozen in liquid nitrogen immediately after harvest, and stored at −80°C until use.

Figure 2

Early-stage developing fruits (0, 4, 8, and 12 DAP) in Sitaphal and NMK-1.

Total RNA was isolated from the developing fruits (hand self-pollinated) using RNA isolation kit (Sigma) following the manufacturer’s instructions. For RNA extraction, at least three developing fruits of same stage (0, 4, 8, and 12 DAP) were taken for each sample. The RNA was extracted in three biological replicates for each genotype. The quality of RNA was confirmed by using 2100 Bioanalyzer (Agilent). For sequencing, in case of each sample (0, 4, 8, and 12 DAP of Sitaphal or NMK-1), the equivalent quantity of total RNA of three biological replicates was pooled.

Library preparation and 454 pyrosequencing

Approximately 1 μg total RNA was used for preparing mRNA sequencing library of each sample. Poly (A+) RNA was isolated from total RNA mixture by using NEBNext® Poly(A) mRNA Magnetic Isolation Module (New England Biolabs), following the manufacturer’s protocol. The purified Poly (A+) RNA was sheared and cDNA library was prepared by using NEBNext® mRNA Library Prep Reagent Set for 454™ (New England Biolabs) as per the instruction manual. The cDNA libraries were sequenced on a 454 Genome Sequencer FLX+ platform (Roche, USA).

De novo assembly

The raw data obtained from 454 GS FLX+ was filtered with a quality cut-off value of 40. The adaptor and primer sequences were removed using NGS QC Tool kit and GS Run processor. The low-quality sequences and sequences with less than 40 bp were removed before contig assembly. De novo contig assembly of the reads was performed using GS De Novo Assembler v2.6 (Roche, USA). In the assembled transcript sequence data at the four developmental stages (0, 4, 8 and 12 DAP), repeat sequences were identified with at least 40 bp overlap and 95% overlap identity. The repeat sequences of the transcriptome data from four libraries (0, 4, 8 and 12 DAP) were combined into larger units- unigenes, by using CAP3 [23].

Functional annotation

Accelerated large scale functional annotation of all contigs was done using WImpiBLAST tool [24] on high performance computing cluster. For functional annotation, the assembled transcript sequences were used as queries against the non-redundant (NR) protein database using NCBI BLASTx algorithm [25]. The BLAST E-value threshold was set at 10−5.

Gene ontology analysis

The gene ontology (GO) annotation was derived for the unigenes, using Uniprot annotation [26]. The categorization and visualization of GO terms was done using WEGO web tool [27].

Comparative analysis of A. squamosa unigenes

Comparative analysis was performed using the unigenes as queries against the protein databases of some fruit crops such as, Prunus persica (, Vitis vinifera ( and Fragaria vesca (, and a primitive angiosperm, Amborella trichopoda (

Detection of sequences associated with hormone related signalling pathways, transcription factors and seed development

The unigene sequences were used to blast (BLASTx) against the transcription factor (, seed development ( and hormone related ( protein sequence database of Arabidopsis thaliana, at the criteria of E-value ≤ 10−5 and query coverage ≥ 50%.

Single nucleotide polymorphism (SNP) analysis

Reads from the transcriptome libraries were mapped on unigenes of the respective genotype using program 'clc_ref_assemble_long' of CLC Assembly Cell version 3.2.2. Variants were detected using 'find_variations' program. SNP with read depth of more than five for each allele was only considered as heterozygous.

Detection of simple sequence repeats (SSRs)

The unigene sequences were searched for SSRs using the perl script program MISA (MIcroSAtellite; The repeats of mono- to hexa-nucleotide motifs with a minimum of five repetitions were considered as search criteria in MISA script [15].

Web resource

A web resource, comprising of entire transcriptome contigs, has been developed using open source sequence server [28], and was hosted on Linux server.

Identification of differentially expressed unigenes

A total 5689 unigenes common between the two genotypes with more than 80% query coverage, were used as reference sequences for mapping individual reads from each library. The calculation of reads per kilobase of transcript per million mapped reads (RPKM) was done by using the program 'clc_ref_assemble_long' of CLC Assembly Cell version 3.2.2.

Enrichment of Gene Ontology (GO) terms in the differentially expressed genes was performed using AgriGO analysis tool (, with Fisher tests and Bonferroni multiple testing correction (False Discovery Rate ≤ 0.05). Kyoto Encyclopedia of Genes and Genomes (KEGG) categories was assigned by the plant gene set enrichment analysis toolkit ( with fisher test function (False Discovery Rate ≤ 0.05).

Quantitative real time PCR

First strand cDNA was synthesized using cDNA Synthesis Kit RT-PCR (Roche, USA), with oligodT anchored primers following the manufacturer’s instructions. Gene-specific primers were designed using Primer Express software. QuantiTect SYBR Green RT-PCR Master mix (Qiagen) was used to perform real time PCR assay in an ABI 7700 Sequence Detector Real-Time PCR system (Applied Biosystems, USA). Three biological replications were conducted for each transcript for both the genotypes. The expression data was analyzed using ABI PRISM 7700 Sequence Detection System software (Applied Biosystems). The expression values were normalized with respect to Actin gene from A. squamosa. Dissociation curves confirmed the presence of a single amplicon in each PCR reaction. Relative expression was calculated as described previously [29].

Results and discussion

454 pyrosequencing, sequence assembly and annotation

In total, 1,801,608 and 1,901,179 raw reads were produced in the four cDNA library preparations of developing fruits (0, 4, 8 and 12 DAP) from the two genotypes of A. squamosa- Sitaphal and NMK-1 (Figure 2), respectively, with an average length of 568 bp (Additional file 2). The raw reads were filtered by removing low-quality reads, adapters, primer sequences, and sequences of less than 40 bp. Finally, 9,37,270 and 9,92,439 quality reads were obtained in the four cDNA library preparations (0, 4, 8 and 12 DAP) of Sitaphal and NMK-1, respectively. The average number of reads produced for each library was 0.24 million (Table 1). The filtered raw reads (sff files) were deposited in the NCBI Short Read Archive (SRA) database (accession number SRP042646). The quality-reads were assembled, giving 2074 to 11004 contigs, with more than 200 bp length, in the eight different cDNA libraries (Table 1). The contig sequences were searched against the known sequences in NCBI non redundant (NR) database, using BLASTx algorithm. At the E-value ≤ 10−5, 1808 to 9038 contigs were annotated across different libraries (Table 1, Additional file 3). The results provide sequence information for genes expressed during early developmental stages of fruits of A. squamosa.

Table 1 Summary of the sequencing-reads, assembly and functional annotation (using NCBI NR database) of the A . squamosa transcriptome

The contig sequence data in the four stages of fruit development (0, 4, 8, and 12 DAP) was combined into larger units, mentioned here as unigenes, by using CAP3. A total of 14921 (Sitaphal) and 14178 (NMK-1) unigenes were obtained. Out of the 14921 unigenes in Sitaphal, 2905 were ≥ 500 bp, 5239 were ≥ 1000 bp, 3663 were ≥ 1500 bp and 3114 were ≥ 2000 bp in length. Out of the 14178 unigenes in NMK-1, 2697 were ≥ 500 bp, 4883 were ≥ 1000 bp, 3516 were ≥ 1500 bp and 3082 were ≥ 2000 bp in length. The average lengths of the unigenes were 1086 bp and 1100 bp for Sitaphal and NMK-1, respectively. The sequence information is a useful resource for identification, cloning and functional genomic studies in future.

The 14178 unigenes of NMK-1 were mapped over 14921 unigenes of Sitaphal. A total of 5689 unigenes were common between the two genotypes with more than 80% query coverage. Single nucleotide polymorphism was investigated in the 1160 unigenes with at least 500 bp length and showing at least 95% similarity between the two genotypes. The SNP analysis estimated about 0.35 and 0.33% heterozygosity in Sitaphal and NMK-1, respectively, after examining about 2.2 and 1.3 million nucleotide positions. The low level of heterozygosity agrees with the previous reports, notifying the development of true-to-type and uniform seedlings in A. squamosa [30,31].

Functional categorization by GO annotation

In total, 5401 (Sitaphal) and 6421 (NMK-1) unigenes, having sequence homology with uniprot annotations, were subjected to GO assignments for biological processes, cellular components and molecular functions categories. In the category of biological processes, unigenes related to metabolic processes (49.2% in Sitaphal and 75.3% in NMK-1), cellular processes (42.9% in Sitaphal and 77.3% in NMK-1), and response to stimulus (8.4% in Sitaphal and 26.2% in NMK-1) were predominant. In cellular components, genes related to cell parts (39.2% in Sitaphal and 81.8% in NMK-1) and organelles (23.5% in Sitaphal and 62.4% in NMK-1) were the most abundant classes. In molecular functions, genes involved in binding (38.1% in Sitaphal and 60.8% in NMK-1) and catalytic activities (38.3% Sitaphal and 49.1% in NMK-1) were abundantly expressed (Figure 3).

Figure 3

GO classifications of assembled unigenes, having sequence homology with uniprot proteins, assigned to 51 functional groups.

Comparative analysis with available public databases

The assembled unigenes were functionally annotated using BLASTx algorithm against the protein sequences of five public databases: NCBI NR, Prunus persica, Vitis vinifera and Fragaria vesca, and a primitive species, Amborella trichopoda, with an E-value cut-off of 10−5. The number and percentage of annotated unigenes of A. squmosa genotypes are given in Table 2. Of the 14921 (Sitaphal) and 14178 (NMK-1) unigenes, 10333 (69.25%) and 11676 (82.35%), respectively, showed significant similarity to NCBI NR protein database (Additional file 4). Furthermore, 60.33% to 61.57% (Sitaphal) and 77.32% to 78.47% (NMK-1) of the unigenes showed significant homology with the four plant species (Table 2). We obtained 1928 (12.92%) and 2825 (19.92%) unigenes with more than 90% subject coverage, suggesting quasi-full length genes in Sitaphal and NMK-1, respectively. However, 4588 (Sitaphal) and 2502 (NMK-1) unigenes did not match with any known protein in the NR database. These un-assigned transcripts may be novel genes or belong to untranslated regions, and could play specific roles in A. squamosa. The unigene sequence information would be useful as a reference for molecular biology research on A. squamosa and its related species.

Table 2 Number and percentage (in bracket) of unigenes in A . squamosa genotypes (Sitaphal and NMK-1) from BLASTx searches against public protein databases of fruits crop and a closely related genus

Detection of transcript sequences related to hormone pathway, transcription factors and seed development

Fruit development is a complex process and involves numerous physiological and biochemical events which are initiated and regulated by hormonal signals [32]. Plant hormones, such as auxins, gibberellins, cytokinins, abscisic acid, ethylene, and brassinosteroids, play important role in fruit set and development [17,33]. Brassinosteroids are important for early fruit development [34], and the regulation of seed size [35] and number [36]. A total of 148 unigenes encoding putative hormone related genes were identified in A. squamosa (Table 3, Additional file 5), by BLASTx searches against the protein database of hormone pathway genes of A. thaliana.

Table 3 Summary of hormone related unigenes identified in the transcriptome of A . squamosa genotypes (Sitaphal and NMK-1)

Transcription factors (TFs) control gene expression quantitatively, spatially and temporally [37]. It is desirable to identify the gene regulatory networks responsible for programming of early fruit development. The unigene sequences were annotated against the Plant-TFDB database of A. thaliana, to identify TFs which express during early phases of fruit development in A. squamosa. The BLASTx searches revealed a total of 319 unigenes matched with putative homologs of Arabidopsis TFs (Table 4, Additional file 6). The three most abundant families of transcription factors were bHLH, MYB and MYB-related, and C3H, represented by 34, 34 and 25 unigenes, respectively. Basic helix–loop–helix (bHLH) proteins participate in the regulation of a myriad of essential developmental and physiological processes, including reproductive development, determination of plant organ size, fruit and seed development [38,39]. The interplay of MYB factors, apart from transcription control on many crucial biological processes, regulates fruit and seed development [40,41]. Some of the C3H type TFs are embryo specific and play regulatory role in seed development [42].

Table 4 Summary of transcription factor related unigenes identified in the transcriptome of A . squamosa genotypes (Sitaphal and NMK-1)

The BLAST search on the transcriptome data, using Arabidopsis protein sequences obtained from SeedGenes Project (, identified 379 transcripts associated with the development of seeds (Additional file 7).

The sequence information on TFs, hormone and seed development related putative genes will be useful in examining the differential expression in the two genotypes of A. squamosa, with contrasting trait related to fruit and seed development.

Differentially expressed unigenes

The transcript abundance profile was examined for the 5689 unigenes common between the two genotypes, in the developing fruits at 0 and 8 DAP. At these stages, comparable numbers of contigs were identified in the two genotypes (Table 1). A total of 5504 unigenes were differentially expressed between the two genotypes in at least one time point (0 or 8 DAP) (Additional file 8). Among these, 1792 and 721 unigenes were up- and down-regulated, respectively, by ≥ 2 fold in Sitaphal at 8 DAP. By using the information of BLASTx searches against the protein database of A. thaliana, the differentially expressed unigenes (≥2 fold, 8 DAP) were mapped to terms in AgriGO and KEGG databases [43,44]. The GO enrichment patterns showed a disproportionate representation of unigenes involved in the biological process of reproductive structure, embryo, seed and fruit development in the two genotypes (Table 5, Additional file 9). The ontology analysis based on KEGG revealed the abundance of transcripts related to hormones, alkaloids, terpanoids, steroids, phenylpropanoids, spliceosome and other metabolic pathways in Sitaphal (Table 6, Additional file 9). The results indicate a distinctly more active primary and secondary metabolism in the early-stage fruits of Sitaphal as compared to the less seeded NMK-1. Hence, development of multiple seeds in Sitaphal was accompanied by a higher rate of metabolism in developing fruits.

Table 5 AgriGO categories (False discovery rate ≤ 0.05) for putative genes up-regulated (≥2 fold) in early-stage fruits (8 DAP) of Sitaphal and NMK-1
Table 6 Pathway assignment based on KEGG (False Discovery Rate ≤ 0.05) for putative genes up-regulated (≥2 fold) in early-stage fruits (8 DAP) of Sitaphal and NMK-1

The transcript level of several unigenes associated with hormones, transcription factors, and seed development were also differentially expressed between the two genotypes at 0 and 8 DAP (Additional file 8). Many of the putative orthologous genes which give a defective embryo and/or seed phenotype in Arabidopsis mutants [45], showed reduced expression in NMK-1 at 8 DAP (Figure 4, Additional file 8). Lower level of expression of the embryogenesis-related genes (Figure 4) could be indicative of aberrant embryo development, eventually affecting seed development in NMK-1 plants. The underlying transcriptional changes need to be validated with the accompanying anatomical and metabolic changes in the developing ovules. Moreover, further in-depth RNA-sequencing is required to generate comprehensive transcriptional profile for each developing stage of fruits.

Figure 4

Differential accumulation (≥2 fold, 8 DAP vs 0 DAP) of transcripts for embryogenesis related putative genes in early-stage fruits of Sitaphal and NMK-1. The orthologous genes give a defective embryo and/or seed phenotype in Arabidopsis mutants. The details of the differentially expressed transcripts are given in Additional file 8.

Real time PCR

To validate the usefulness of the transcript sequences identified in the transcriptome resource, expression of five randomly selected unigenes was examined by real time PCR and compared with the RPKM expression values in the transcriptome data of 5689 unigenes. qRT-PCR was performed in the developing fruit (8 DAP) of Sitaphal and NMK-1, in three biological replicates. At 8 DAP initial cell division takes place in the zygote, which leads to the formation of the embryo [46]. Interestingly, the qRT-PCR analysis (Figure 5a; Additional file 10) suggested preferentially lower expression of the orthologous genes such as Clavata-3 (regulates seed formation [47]), Abnormal Suspensor-2 (involved in embryogenesis [48]), Embryo Defective-1144 (role in embryo development [49]), Embryo Defective-2742 (role in embryo development [50]), and Ovule abortion-9 (role in ovule development [51]). The qRT-PCR fold change was comparable to the RPKM values in the transcriptome data (Figure 5b). Thus, transcriptome data for the two contrasting Annona genotypes presented here is useful for identifying candidate genes for the development of less seeded fruits.

Figure 5

Quantitative RT-PCR analyses and RPKM expression value of 5 randomly selected candidate genes for seed development in Sitaphal and NMK-1, at 8 DAP. Quantitative RT-PCR analyses (a). Each bar indicates standard error in three biological replicates (*p ≤ 0.05). A detail of the primers is given in Additional file 10. The qRT-PCR fold change is comparable with RPKM values in transcriptome data (b).

Mining of SSRs

Identification of SSRs was carried out to generate information for the development of molecular markers for future studies on genetic diversity in A. squamosa. In total, 2629 and 3445 SSR motifs were identified in 2045 and 2678 transcripts for Sitaphal and NMK-1, respectively (Table 7). Out of the transcripts analyzed, 417 and 541 contained more than one SSR, whereas, 324 and 428 were in compound form in Sitaphal and NMK-1, respectively. The mono-nucleotide SSRs represented the largest fraction of SSRs (35.71% in Sitaphal and 44.06% in NMK-1), followed by di-nucleotide (30.88% in Sitaphal and 25.54% in NMK-1) and/or tri-nucleotide (29.25% in Sitaphal and 26.47% in NMK-1) SSRs. Tetra-, penta- and hexa-nucleotide SSRs were identified in a small fraction (0.030-0.004%) (Table 8). The 5689 unigenes, common between the two genotypes, were examined for the presence of SSRs with differences in length. In total, 18 SSRs were identified with variable number of tandem repeat loci between the two genotypes (Additional file 11). The SSR motifs could be potential candidates for transcript based microsatellite marker development, useful in determining functional genetic variation in A. squamosa [52].

Table 7 Statistics of SSRs identified in the transcripts of A . squamosa genotypes (Sitaphal and NMK-1)
Table 8 Classes of SSR repeat motifs in the transcriptome of A . squamosa genotypes (Sitaphal and NMK-1)

Annona transcriptome web resource

A web resource has been developed where entire assembled transcripts are available in BLAST enabled search format ( The web resource is useful for researchers in data-mining and to access pre-computed annotations.


The study provides transcriptome information on A. squamosa. We report sequencing, de novo assembly and analysis of early-stage fruit transcriptome of two genotypes with contrasting level of seed number in fruits. Orthologous genes related to hormone pathways, transcription factors and seed development were determined in the early-stage fruit tramscriptome. Differentially expressed unigenes were identified between the two genotypes. Several of such unigenes were related to seed and fruit related traits, and expressed at a higher level in the densely seeded genotype, Sitaphal. Additionally, a large number of SSRs were identified, which will be a useful resource in marker development for future genetic studies in Annona sp. This repository will serve as a useful resource for investigating the molecular mechanisms of fruit development, and improvement of fruit related traits in A. squamosa and related species.

Availability of supporting data

The RNA-seq data is available in the NCBI Sequence Read Archive (SRA) (, under accession number SRP042646.


  1. 1.

    In: Encyclopedia: Annona squamosa.

  2. 2.

    Flora of North America

  3. 3.

    Species of Annona; The Plant List 2013.

  4. 4.

    Jalikop S, Kumar R. Pseudo-xenic Effect of Allied Annona spp. Pollen in Hand Pollination of cv. ‘Arka Sahan’ [(A. cherimola × A. squamosa) × A. squamosa]. HortScience. 2007;42(7):1534–8.

  5. 5.

    Global invasive species database []

  6. 6.

    Xu F, Ronse De Craene L. Floral ontogeny of Annonaceae: evidence for high variability in floral form. Ann Bot. 2010;106(4):591–605.

  7. 7.

    Gupta RK, Kesari AN, Watal G, Murthy PS, Chandra R, Tandon V. Nutritional and hypoglycemic effect of fruit pulp of Annona squamosa in normal healthy and alloxan-induced diabetic rabbits. Ann Nutr Metab. 2005;49(6):407–13.

  8. 8.

    Andrade E, Zoghbi M, Maia J, Fabricius H, Marx F. Chemical characterization of the fruit of Annona squamosa L. occurring in the Amazon. J Food Compos Anal. 2001;14:227–32.

  9. 9.

    SCUC. Annona: Annona cherimola, A. muricata, A. reticulata, A. senegalensis, A. squamosa. Southampton, UK: University of Southampton; 2006.

  10. 10.

    National Nutrient Database for Standard Reference Release 26; Agricultural Research Service United States Department of Agriculture (

  11. 11.

    Lora J, Hormaza JI, Herrero M, Gasser CS. Seedless fruits and the disruption of a conserved genetic pathway in angiosperm ovule development. Proc Natl Acad Sci U S A. 2010;108(13):5461–5.

  12. 12.

    Levetin E, McMahon K. Plants and Society: The Botanical Connections to Our Lives. 5th ed. New York: The McGraw−Hill; 2008. p. 88–102.

  13. 13.

    Guo S, Liu J, Zheng Y, Huang M, Zhang H, Gong G, et al. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles. BMC Genomics. 2011;12:454.

  14. 14.

    Surget-Groba Y, Montoya-Burgos JI. Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010;20(10):1432–40.

  15. 15.

    Dong S, Liu Y, Niu J, Ning Y, Lin S, Zhang Z. De novo transcriptome analysis of the Siberian apricot (Prunus sibirica L.) and search for potential SSR markers by 454 pyrosequencing. Gene. 2014;544(2):220–7.

  16. 16.

    Li C, Wang Y, Huang X, Li J, Wang H. De novo assembly and characterization of fruit transcriptome in Litchi chinensis Sonn and analysis of differentially regulated genes in fruit in response to shading. BMC Genomics. 2013;14:552.

  17. 17.

    Ruan YL, Patrick JW, Bouzayen M, Osorio S, Fernie AR. Molecular regulation of seed and fruit set. Trends Plant Sci. 2012;17(11):656–65.

  18. 18.

    Yang XY, Wang Y, Jiang WJ, Liu XL, Zhang XM, Yu HJ, et al. Characterization and expression profiling of cucumber kinesin genes during early fruit development: revealing the roles of kinesins in exponential cell production and enlargement in cucumber fruit. J Exp Bot. 2013;64(14):4541–57.

  19. 19.

    Mounet F, Moing A, Garcia V, Petit J, Maucourt M, Deborde C, et al. Gene and metabolite regulatory network analysis of early developing fruit tissues highlights new candidate genes for the control of tomato fruit composition and development. Plant Physiol. 2009;149(3):1505–28.

  20. 20.

    Jalikop SH, Kumar PS. New fruit varieties for arid regions, pomegranate ‘Ruby’ and custard apple ‘Arka Sahan'. Indian Hort. 2000;45:19–20.

  21. 21.

    Indian Council of Agricultural Research (

  22. 22.

    Jalikop SH. Annonaceous fruits. In: Chadha KL, editor. Handbook of Hort. New Delhi: ICAR; 2000. p. 109–14.

  23. 23.

    Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res. 1999;9(9):868–77.

  24. 24.

    Sharma P, Mantri SS. WImpiBLAST: Web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing. PLoS One. 2014;9(6):e101144.

  25. 25.

    National Centre for Biotechnology Information BLAST (

  26. 26.

    UniProtConsortium. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2010;38:D142–148.

  27. 27.

    Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, et al. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 2006;34:W293–297.

  28. 28.

    Priyam A, Woodcroft BJ, Wurum Y: SequenceServer: BLAST searching made easy. in prep.

  29. 29.

    Bhati KK, Aggarwal S, Sharma S, Mantri S, Singh SP, Bhalla S, et al. Differential expression of structural genes for the late phase of phytic acid biosynthesis in developing seeds of wheat (Triticum aestivum L.). Plant Sci. 2014;224:74–85.

  30. 30.

    Ronning CM, Schnell RJ. Using randomly amplified polymorphic DNA (RAPD) markers to identify Annona cultivars. J Am Soc Hortic Sci. 1995;120(5):726–9.

  31. 31.

    Pinto AC, Cordeiro MCR, Andrade SRM, Ferreira FR, Filgueiras HA, Alves RE, et al. Annona species. Southampton, UK: International Centre for Underutilized Crops, University of Southampton; 2005.

  32. 32.

    Wu J, Xu Z, Zhang Y, Chai L, Yi H, Deng X. An integrative analysis of the transcriptome and proteome of the pulp of a spontaneous late-ripening sweet orange mutant and its wild type improves our understanding of fruit ripening in citrus. J Exp Bot. 2014;65(6):1651–71.

  33. 33.

    McAtee P, Karim S, Schaffer R, David K. A dynamic interplay between phytohormones is required for fruit development, maturation, and ripening. Front Plant Sci. 2013;4(79):1–7.

  34. 34.

    Fu FQ, Mao WH, Shi K, Zhou YH, Asami T, Yu JQ. A role of brassinosteroids in early fruit development in cucumber. J Exp Bot. 2008;59(9):2299–308.

  35. 35.

    Jiang WB, Huang HY, Hu YW, Zhu SW, Wang ZY, Lin WH. Brassinosteroid regulates seed size and shape in Arabidopsis. Plant Physiol. 2013;162(4):1965–77.

  36. 36.

    Huang HY, Jiang WB, Hu YW, Wu P, Zhu JY, Liang WQ, et al. BR signal influences Arabidopsis ovule and seed number through regulating related genes expression by BZR1. Mol Plant. 2013;6(2):456–69.

  37. 37.

    Mishima K, Fujiwara T, Iki T, Kuroda K, Yamashita K, Tamura M, et al. Transcriptome sequencing and profiling of expressed genes in cambial zone and differentiating xylem of Japanese cedar (Cryptomeria japonica). BMC Genomics. 2014;15(1):219.

  38. 38.

    Nicolas P, Lecourieux D, Gomes E, Delrot S, Lecourieux F. The grape berry-specific basic helix-loop-helix transcription factor VvCEB1 affects cell size. J Exp Bot. 2013;64(4):991–1003.

  39. 39.

    Denay G, Creff A, Moussu S, Wagnon P, Thevenin J, Gerentes MF, et al. Endosperm breakdown in Arabidopsis requires heterodimers of the basic helix-loop-helix proteins ZHOUPI and INDUCER OF CBP EXPRESSION 1. Development. 2014;141(6):1222–7.

  40. 40.

    Machemer K, Shaiman O, Salts Y, Shabtai S, Sobolev I, Belausov E, et al. Interplay of MYB factors in differential cell expansion, and consequences for tomato fruit development. Plant J. 2011;68(2):337–50.

  41. 41.

    Zhang Y, Liang W, Shi J, Xu J, Zhang D. MYB56 encoding a R2R3 MYB transcription factor regulates seed size in Arabidopsis thaliana. J Integr Plant Biol. 2013;55(11):1166–78.

  42. 42.

    Dekkers BJ, Pearce S, van Bolderen-Veldkamp RP, Marshall A, Widera P, Gilbert J, et al. Transcriptional dynamics of two seed compartments with opposing roles in Arabidopsis seed germination. Plant Physiol. 2013;163(1):205–15.

  43. 43.

    Du Z, Zhou X, Ling Y, Zhang Z, Su Z. agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 2010;38:W64–70.

  44. 44.

    Yi X, Du Z, Su Z. PlantGSEA: a gene set enrichment analysis toolkit for plant community. Nucleic Acids Res. 2013;41:W98–103.

  45. 45.

    Meinke D, Muralla R, Sweeney C, Dickerman A. Identifying essential genes in Arabidopsis thaliana. Trends Plant Sci. 2008;13(9):483–91.

  46. 46.

    Lora J, Hormaza JI, Herrero M. The progamic phase of an early-divergent angiosperm, Annona cherimola (Annonaceae). Ann Bot. 2010;105(2):221–31.

  47. 47.

    Fiume E, Fletcher JC. Regulation of Arabidopsis embryo and endosperm development by the polypeptide signaling molecule CLE8. Plant Cell. 2012;24(3):1000–12.

  48. 48.

    Casson S, Spencer M, Walker K, Lindsey K. Laser capture microdissection for the analysis of gene expression during embryogenesis of Arabidopsis. Plant J. 2005;42(1):111–23.

  49. 49.

    Deng W, Chen G, Peng F, Truksa M, Snyder CL, Weselake RJ. Transparent testa16 plays multiple roles in plant development and is involved in lipid synthesis and embryo development in canola. Plant Physiol. 2012;160(2):978–89.

  50. 50.

    Thierry-Mieg D, Thierry-Mieg J. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 2006;7 Suppl 1:S12 11–14.

  51. 51.

    Berg M, Rogers R, Muralla R, Meinke D. Requirement of aminoacyl-tRNA synthetases for gametogenesis and embryo development in Arabidopsis. Plant J. 2005;44(5):866–78.

  52. 52.

    Garg R, Patel RK, Tyagi AK, Jain M. De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res. 2011;18(1):53–63.

Download references


This work was supported by the institutional core research fund (National Agri-Food Biotechnology Institute, Department of Biotechnology (DBT), Govt. of India). Mr. N. M. Kaspate is cordially acknowledged for facilitating experiments in the field of Madhuban Nursery, Solapur, India. YG and AKP acknowledge D.B.T. for providing fellowship. RT thanks DST for JC Bose fellowship.

Author information

Correspondence to Shrikant S Mantri or Sudhir P Singh.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SPS and RT conceived of the study. SSM supervised bioinformatic work. YG and SPS performed sample collection, RNA extraction, qRT-PCR and data analysis. AKP and KS helped in experiments and data analysis. SPS wrote the manuscript. SPS, SSM, RT, and YG prepared the final manuscript. All authors read and approved the manuscript.

Additional files

Additional file 1: Figure S1.

A phylogenetic tree on the basis of two markers- the chloroplast gene- rbcl, and the microsatellite locus- LMCH10.

Additional file 2:

Summary of 454 sequencing reads and contig assembly in developing fruit libraries of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 3:

Details of the contigs identified in the developing fruits (0, 4, 8 and 12 DAP) of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 4:

Details of the unigenes identified in the developing fruit transcriptome of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 5:

Details of hormone related unigenes identified in the developing fruit transcriptome of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 6:

Details of transcription factor related unigenes identified in the developing fruit transcriptome of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 7:

Details of seed development related unigenes identified in the developing fruit transcriptome of A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 8:

Unigenes with RPKM values in the transcriptome data of developing fruits (0 and 8 DAP) in A. squamosa genotypes (Sitaphal and NMK-1).

Additional file 9:

AgriGO and KEGG categories (False Discovery Rate ≤ 0.05) for putative genes up-regulated (≥2 fold) in early-stage fruits (8 DAP) of Sitaphal and NMK-1.

Additional file 10:

Primers used in real time PCR analysis.

Additional file 11:

Polymorphic SSRs with variable number of tandem repeat loci in Sitaphal and NMK-1.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark


  • Annona squamosa L. transcriptomics
  • Early-stage developing fruit
  • De novo transcriptome assembly
  • Simple sequence repeats
  • Web resource