Transcriptome analysis of the almond moth, Cadra cautella, female abdominal tissues and identification of reproduction control genes

Background The almond moth, Cadra cautella is a destructive pest of stored food commodities including dates that causes severe economic losses for the farming community worldwide. To date, no genetic information related to the molecular mechanism/strategies of its reproduction is available. Thus, transcriptome analysis of C. cautella female abdominal tissues was performed via next-generation sequencing (NGS) to recognize the genes responsible for reproduction. Results The NGS was performed with an Illumina Hiseq 2000 sequencer (Beijing Genomics Institute: BGI). From the transcriptome data, 9,804,804,120 nucleotides were generated and their assemblage resulted in 62,687 unigenes. The functional annotation analyses done by different databases, annotated, 27,836 unigenes in total. The transcriptome data of C. cautella female abdominal tissue was submitted to the National Center for Biotechnology Information (accession no: PRJNA484692). The transcriptome analysis yielded several genes responsible for C. cautella reproduction including six Vg gene transcripts. Among the six Vg gene transcripts, only one was highly expressed with 3234.95 FPKM value (fragments per kilobase per million mapped reads) that was much higher than that of the other five transcripts. Higher differences in the expression level of the six Vg transcripts were confirmed by running the RT-PCR using gene specific primers, where the expression was observed only in one transcript it was named as the CcVg. Conclusions This is the first study to explore C. cautella reproduction control genes and it might be supportive to explore the reproduction mechanism in this pest at the molecular level. The NGS based transcriptome pool is valuable to study the functional genomics and will support to design biotech-based management strategies for C. cautella.


Background
Date palm, Phoenix dactylifera is an important fruit tree of the Arabian Peninsula and temperate regions worldwide [1]. In hot dry regions globally, dates have a very important history and are considered one of the most important nutritional fruits. Dates can be consumed in many ways, such as eaten directly as fresh dates, eaten as dried dates, and also used in the preparation of date cookies, date paste, date syrup, and many other products. Additionally, dates have a very important medicinal value as they contain a rich source of minerals [2]. The presence of amino acids, flavonoids, steroids, antioxidants, anti-inflammatory, and anticancer elements in the flesh highlights the medicinal and nutritional importance of dates [3,4]. The by-products of dates are used for the production of organic acids, antibiotics, and fermented yeast. In the Gulf region, the populace prefer to consumes a certain quantity of dates [5].
Several devastating pests can infest date fruits causing great economic losses. These pests include the almond moth, Cadra cautella (Walker) (Lepidoptera: Pyralidae) and the sawtoothed grain beetle, Oryzaephilus surinamensis [1]. In the Middle East as well as in many other regions of the world, C. cautella is a destructive polyphagous storage pest of date fruits, cereals, dried fruits, ground nuts, and maize [6][7][8]. The life cycle of C. cautella is short with many generations per year and a single female can produce 213 and 422 eggs/female, when reared on artificial diet and "khodari" date fruits, respectively [7,[9][10][11][12].
The moth, C. cautella infests date fruits both in the field as well as in the warehouses and deteriorates the quantity and quality of dates, which leads to trade restrictions. Many countries enforce strict quarantine limitations, which bound the world trade in agricultural produce [13]. The control of C. cautella mostly depends on fumigation with methyl bromide and phosphine gas, which are effective and inexpensive and have been widely applied over the last few decades. However, recently the use of such control treatments have been questioned because the excessive use of these chemicals poses environmental concerns for human health as well as the phosphine resistance that has been reported in several stored product insect species [14][15][16]. In addition, methyl bromide, that was an efficient and cost effective fumigant; has been declared an ozone depleting chemical and has been phased out of production and use [17].
Several studies have reported on the basic ecological and biological characteristics of C. cautella [11,[18][19][20]. Therefore, there is an urgent need to develop environmentally friendly strategies to manage this serious pest. However, the molecular mechanism of its reproduction remains unknown. Over the last two decades, genomes of different insects have been sequenced. Genes related to reproduction, physiology, and sex pheromone biosynthesis and their receptors have been intensively studied for further analysis [21][22][23][24][25][26]. Thus, the objective of the present study was to identify the reproduction control genes through transcriptome data analysis especially the vitellogenin (Vg). Vg is the key component of egg yolk protein, synthesized extra-ovarially in the fat body tissues, and transported to the developing oocytes where it is internalized in the egg by the VgR and serves as a nutrient source for the developing embryo. Vg and VgR have been reported at the genetic and molecular level in many insect species [21,22,[27][28][29][30][31].
The transcriptome is an entire set of transcripts in a cell, tissue, or organism. De novo transcriptome sequencing is a method of creating a transcriptome profile via the Illumina HiSeq 2000/2500 platform [32]. Nextgeneration sequencing (NGS), can extensively explore the structure and provide indication about functional role of a particular gene product in a given tissue without the aid of any reference genome [33,34]. The NGS is an analytical technique that sequences RNA molecules with a large number of reads [35][36][37]. Transcriptome analysis has been used to study fatal diseases in humans, plants, and other organisms [38][39][40]. Transcriptomes from many insect species have been sequenced such as the silkworm, Bombyx mori, red flour beetle, Tribolium castaneum, and oriental fruit fly, Bactrocera dorsalis [41][42][43].
Sequencing of C. cautella abdominal tissues transcriptome would clarify the reproduction strategies of at the molecular level.
To the best of our knowledge, the present study is the first to report on the transcriptome analysis of C. cautella abdominal tissues, provides evidence-based knowledge to facilitate the development of future eco-friendly management strategies for this pest.

Cadra cautella transcriptome sequencing and sequence assembly
A library of C. cautella adult female abdominal tissue was sequenced by the Illumina Hiseq 2000 system. The transcriptome generated raw reads, these reads were cleaned with the help of filter-fq software (version: internal filter_fq software of BGI). The de novo assembly detected 62,687 unigenes. The details of unigenes total length, average length, and N50 is presented in (Additional file 1: Table S1).

Structural and functional annotation of unigenes
For functional annotation analysis, we obtained 25,880,15,432,17,738,16,106,8828, 9494 unigenes, which annotated to the NR, NT, Swiss-Prot, KEGG, COG, and GO databases, respectively. The total annotated unigenes were 27,836 (Table 1). For protein coding region prediction analysis, the number of coding DNA sequence (CDS) that mapped to the protein database was 25,715, whereas the number of predicted CDS was 2719 (Additional file 3: Table S2). Among the unigenes, 6789, 2, 13, and 36 were annotated exclusively to the NR, COG, KEGG, and Swiss-Prot protein databases, respectively, with 1297 unigenes annotated using both the NR and KEGG databases. In addition, 42 unigenes were commonly annotated using the NR, COG, and KEGG databases whereas no unigenes were commonly annotated using the KEGG and COG protein databases. Furthermore, 8401 common elements were annotated in the NR, COG, KEGG, and Swiss-Prot databases (Fig. 1).
A total of 27,836 unigenes sequences shared some similarity to known genes from the National Center for Biotechnology Information (NCBI) database. The ranges in e-value and sequence similarity of the top hits in the NR database were comparable, with 49% (e-value of 0 to 60) and 28.5% (100-80%), respectively, of the sequences possessing homology (Fig. 2a, b). On a species basis, the highest proportion of matching sequences in the NR database were derived from Bombyx mori (45.59%), followed by Danaus plexippus (31%) (Fig. 2c).

Protein coding region prediction
Unigenes were aligned by BLASTX (e-value < 0.00001) to protein databases in the following order: NR, Swiss-Prot, KEGG, and COG. Proteins with the highest ranks in the BLAST results were taken to decide the coding region sequences of unigenes, and the coding region sequences were translated into amino sequences. Unigenes that could not be aligned to any database were scanned by ESTScan (Version = V3.0.2) to predict the protein coding region, which is very important to determine the sequence direction (5′ -> 3′). The number of CDS that mapped to the protein databases was 25,715, whereas the ESTScan predicted that the CDS would be 2719 unigenes. The total number of CDS obtained in the study was 28,434 (Additional file 3: Table S2). The prediction of the protein coding region is very important to determine the accurate functioning of a gene, because the DNA is a long molecule that carries genes and these genes contain introns and exons. The exons are the only segments of a gene that carries the code for protein formation. The protein-coding sequenc and distribution of ESTScan sequences from Cadra cautella female abdominal tissue transcriptome are presented in (Figs. 5 and 6).

Most highly abundant transcripts in the Cadra cautella female abdominal tissue
The transcripts that were most highly expressed in the C. cautella adult female abdominal tissues are presented in Table 2. The highly abundant transcripts were yolk polypeptide 2 and follicular epithelium yolk protein subunits with FPKM values of 19,538.56 and 6939.47, respectively. Moreover, apolipophorin III and Vg genes were also among the highly expressed transcripts in the

Identification of reproduction control genes from Cadra cautella female abdominal tissue
By means of BLASTX, almost 57 genes potentially responsible for C. cautella reproduction were identified from the transcriptome analysis of female abdominal tissue. The genes identified were Vg, VgR, and lipid carrier protein (apolipophorin), sulfur containing amino acids carrying proteins that enhance vitellogenesis (hexamerins) and egg shell protein (chorion). All of these genes were submitted to NCBI and their accession numbers obtained (see Table 3). The details regarding FPKM values, blast hit score, putative identification of the gene, and resemblance with closely related species are presented in Table 3. There were also the transcripts that  Table S3. It was very important to check how many of the Vg transcripts were functional in C. cautella. Therefore, the expression levels of all Vg transcripts were verified by RT-PCR using gene specific primers (Additional file 5: Table S4). The gene specific primers were designed based on the partial transcripts identified in the transcriptome assembly by using Primer3 software (http://bioinfo.ut.ee/primer3-0.4.0/). The amplified cDNA was sequenced and aligned by using (BioEdit Sequence Alignment Editor) with the 6 Vg transcripts, result showed that the amplified sequence was exactly similar with the partial sequence of CcVg transcript. It reflects that CcVg had a higher expression level (over 3000 times) than that of the other five Vgs transcripts, and it might be the primarily functional Vg gene in C. cautella (Fig. 7).

Discussion
The order Lepidoptera is one of the most important groups of insect pests, which cause severe losses to agricultural products worldwide. The majority of lepidopterans (approximately 90%) are moths, with their caterpillars in particular being notorious pests of agricultural produce. Approximately 70% of moths are linked to stored product infestations. The almond moth, C. cautella (Walker), is an economically important pest of dates [6,12,45]. Recent studies have focused on its biology and ecology, and have proposed several management strategies to control these pests, including use of botanical extracts [46], heat treatments [47], freezing effects [48], essential oil extract [49,50], and modified atmosphere [12,51]. However, due to a lack of genetic information nothing is known about the reproductive mechanism of this economically important pest. Thus, the objective of the present study was to isolate the reproduction control genes from C. cautella by deploying the NGS approach.  Illumina NGS sequencing of C. cautella resulted in 62,687 unigenes discovered, with 44.4% of these (27836) having remarkable homology to operating genes encoding precise proteins with BLASTX analysis in GenBank. The analysis of unigenes homology indicated that 45.59% of genes showed the highest resemblance with Bombyx mori followed by 31% similarity with D. plexippus. These results indicate that C. cautella has a closer relationship to Bombyx mori and D. plexippus then to other lepidopteran members [52]. Bombyx mori is an extremely significant model organism for insect biology, in particular, and other life sciences, in general. The species distribution of C. cautella unigenes was almost in accordance with the transcriptome analysis results of other lepidopteran species such as Galleria mellonella and Heliothis virescens [53,54].
In the present study, 8828 unigenes were annotated using the COG database. In COG analyses, the most frequently identified class was related to the general function prediction, followed by replication, recombination and repair, translation, ribosomal structure and biogenesis, function unknown, transcription, and post translational modification (Fig. 3). The general function prediction class (3636 unigenes, 41.18%) was the largest COG class, which was similar to the results of Shen, Dou [42] and Yan, Liu [55].
We surveyed our transcriptome data and identified several important enzymes and genes involved in reproduction. The Vg, VgR, lipophorin, lipophorin receptor, apolipophorins, doublesex, transmembrane protein, juvenile hormone esterase, ecdysone oxidase, rab5, and many others were identified (Table 3). In the present study, 57 genes encoding proteins vital for reproduction have been submitted to the NCBI genomic database and their accession numbers obtained ( Table 3).
The Vg gene play a major role in insect reproduction and proliferation. The specificity of Vg with sex, tissues, and stage has been reported in many insect species [30,56]. Vg gene expression in female fat body tissues and the evidence of Vg protein in adult female hemolymph and ovariole extracts have been reported in the American cockroach Periplaneta americana [21], madeira cockroach, Leucophae maderae [28], and oriental leafworm moth, Spodoptera litura [57]. It has been reported that different insect species have different numbers of Vg genes [58]. In some insects, there is one Vg gene, whereas others have two or multiple Vg genes. Multiple Vg genes have been described from numerous insect species including Aedes aegypti [59], brown winged green bug, Plautia stali [60], Periplaneta americana [21,27], and Leucophae maderae [28,29]. However, in the lepidopteran species till date only one Vg transcript has been reported which might yield different numbers of yolk polypeptides as identified in pyralid moths including C. cautella comprising two true vitellogenin subunits (+/−160KDa and 47KDa) [61]. Whereas, we reported only one functional vitellogenin transcript in the present data, because there are post transcriptional modifications and the Vg transcript cleaves into two subunits of different size and in this regards the two polypeptides of different size can bee observed. Previously we have cloned and characterized several vitellogenin and its receptors genes in different insect species and reported very clearly about the cleavage process of Vg transcript in insects. For detail plz. See [28,56,58].
To date, the complete Vg mRNA has been sequenced from 23 lepidopteran species; however, among these there is only one species, the rice moth Corcyra cephalonica that is associated with stored grain infestations. Thus, the addition of C. cautella Vg/VgR and other transcripts in the GenBank will strengthen the amount of available genomic data regarding reproductive physiology. Additionally, to date there is no report on the sequencing of the VgR from any moth species, which is also associated with stored grain infestations. The Vg protein is carried by the hemolymph to the ovaries where it is taken up by its counterpart, the VgR, and deposited in the developing oocyte. The VgR is an important carrier for the uptake of Vg into the developing oocytes of all oviparous species [58]. The VgRs of insects have large membrane bound proteins approximately 180-214 kDa in size [62]. The molecular characterization of insect VgRs has revealed that these receptors, regardless of their origin, are extremely conserved not only in their structure but also in terms of their regulation [21,63]. VgR plays a crucial role in insect reproduction but little is known about this receptor in insects compared to its ligand, the Vg.
The higher expression of some transcripts in the C. cautella adult female abdominal tissue has revealed the importance of these genes in the biological process, physiology, and reproduction of C. cautella. Vg and apolipophorin are very important for the nourishment of the developing embryo inside the egg. Yolk polypeptide 2, follicular epithelium yolk protein, and ribosomal proteins were among the most abundant transcripts, which play crucial roles in the reproduction of insects [64,65]. Similarly, lipid carrier protein (apolipophorins) and sulfur containing amino acids carrying proteins (hexamerins), might play a role to enhance the vitellogenesis, whereas, the egg shell protein (chorion), juvenile hormone, and ecdysteroids play crucial roles in insect metamorphosis and reproduction [66][67][68][69][70]. In the desert locust, the silencing of ecdysone receptor affected the choriogenesis and ovarion development. The effect was not only limited up to the disruption of oogenesis it also has affected the JH biosynthesis in corpora allata [71]. Insect development and reproduction are primarily associated with the fluctuating levels of Fig. 7 Confirmation of Cadra cautella Vg gene transcripts. Cadra cautella Vg gene transcripts identified by next-generation sequencing with reverse transcription polymerases chain reaction (RT-PCR). Agarose gel 1.2% was used to analyze the amplified PCR products. The CcVg and actin genes amplified products size are shown on the right. The amplified bands were visualized under ultra violet light and photographed using gel documentation BioDocAnalyze system (Biometra). M = molecular weight marker, bp = base pairs these genes [72,73]. Methoprene tolerant protein, works as a nuclear receptor for the JH functioning and plays a key role in the larval metamorphosis as well as vitellogenesis in adult females. In H. armigera, the knockdown of methoprene tolerant gene has adversely affected the larval development and adult female oogenesis [74].
The present study is the foremost distinctive study that has provided a wealth of genes related to molecular mechanism of reproduction in C. cautella, which is the key pest of stored grains and dates.

Conclusions
The warehouse moth, C. cautella, is a serious pest of dates, both in the field and under storage conditions. The present study provides comprehensive data on reproduction control genes including Vg that has vital importance and the genes expressed in the abdominal tissues related to different physiological functions such as juvenile hormone and ecdysone receptor. Results from the present study have greatly strengthened the genetic understanding of different life processes of this pest. The availability of a huge number of transcripts will provide a foundation for future studies. Although NGS data provided 6 CcVg partial transcripts, RT-PCR analysis together with high expression level identified in terms of FPKM values showed that there might be one functional Vg gene (CcVg) in C. cautella. Next efforts will be made to get full sequence of these genes, their characterization, expression analysis, and knockdown deploying RNAi technology. The sequencing/ characterization and silencing of reproduction control genes will elucidate the developmental strategies of C. cautella at the molecular level and, it could lead toward the development of an environmentally benign strategy for the management of this key pest.

Insect rearing
The C. cautella culture was maintained at the Economic Entomology Research Unit, Department of Plant Protection, College of Food and Agriculture Sciences, King Saud University, Riyadh, Saudi Arabia. The colony was maintained in an environmental chamber (Steridium, Australia) at 25 ± 2°C and 65 ± 5% relative humidity under a 15:9 (light/dark cycle) on a slightly modified artificial diet media developed by [75]. Wandering instar larvae were separated to pupate and female pupae were placed separately in the growth chamber under the same conditions as those for the tissue collection.

Tissue preparation for transcriptome analysis
One-day old virgin adult C. cautella females were selected for tissue preparation because from the previous studies, it is obvious that the expression of vitellogenin receptor and its ligand remains maximum in one-two days old females confirmed through semi quantitative and qRT-PCR results of several studies. In silk moth, the maximum expression of Vg was reported at the age of 24 h old female moth. Several other studies have also reported the same findings in lepidopteran species [28,31,57,66]. The female moths were hold gently, their wings were removed and the last 5-7 abdominal segments were cut out with micro scissors and placed directly into phosphate buffered saline (PBS; pH 8.0) solution for washing [28]. Tissues were washed for 3-4 min in the PBS solution, and then transferred into a 1 mL Eppendorf tube containing liquid nitrogen, and preserved at − 80°C until subsequent analysis.

RNA isolation and construction of cDNA library for transcriptome analysis
The abdominal tissues of one-day-old virgin female moths 800 mg in size were used for RNA extraction with Tri-RNA reagent (Favorgen Biotech CORP, Taiwan). The total RNA concentration, RNA integrity number, 28S/18S, and size of the RNA sample were determined using an Agilent 2100 bioanalyzer and Agilent RNA 6000 nano kit, and the purity of the sample was assessed using a nanodrop instrument. The concentration and total volume of the RNA samples were 378 ng/μL and 70 μL, respectively, and the RNA integrity number was 5.2. The integrity of RNA was confirmed by 1% agarose gel electrophoresis.
After the confirmation of RNA integrity, total RNA was used for cDNA synthesis. The total RNA sample was digested by DNaseІ (New England Biolab), purified by oligo-dT beads (Dynabeads mRNA purification kit, Invitrogen), and then poly (A)-containing mRNA were fragmented into 130 base pairs (bp) with the first-strand buffer. First-strand cDNA was generated by random hexamer primers (N6), first-strand master mix, and super script II reverse transcription (Invitrogen) (reaction conditions: 25°C for 10 min, 42°C for 50 min, and 70°C for 15 min).
For the second-strand cDNA synthesis, a second-strand master mix was added to the first-strand cDNA and the prepared mixture was incubated for 1 h at 16°C. AMPure XP magnetic beads were used to purify the double strand cDNA. The purified cDNA was subjected to the End-Repair mix to recover any damaged or incompatible ends, incubated for 30 min at 20°C, and then purified. The products were ligated with one another using a sequencing adapter and, after agarose gel electrophoresis, a suitable size range of fragments were selected for polymerase chain reaction (PCR) amplification with a PCR primer cocktail and PCR master mix at 20°C for 20 min. Finally, PCR products were purified using AMPure XP beads, the library was quantitated, and the qualified libraries were sequenced using the Illumina HiSeqTM 2000 system.

Illumina sequencing and de novo assembly
The library was quantitated and the qualified libraries were amplified on cBot to generate clusters on the flow cell (Tru-Seq PE Cluster Kit V3-cBot-HS Illumina), and the amplified flow cell was sequenced pair end on the HiSeq 2000 system (TruSeq SBS KIT-HS V3, Illumina). The sequences with a read length of 50 bp were sequenced with a paired end strategy (Additional file 2: Figure S1). Raw reads produced from the sequencing machine contain dirty reads composed of adopters, which are unknown or low-quality bases that have a negative effect on the bioinformatics analysis. Therefore, the raw reads produced from the sequencing data were cleaned by removing the reads with adopters and reads with unknown nucleotides larger than 5% with the help of filter-fq software (version internal fil-ter_fq software of BGI). Transcriptome de novo assembly was carried out with the short reads assembling program Trinity (version = release-20,130,225) [76]. Further, the TGICL (version = v2.1) and Phrap (version = Release 23.0) software were used for the downstream processing of large volumes RNA-sequence reads into unigenes.

Unigenes annotation and functional organization
In the final step, unigenes were aligned to the nucleotide database (NT) with the blastN and protein databases: nonredundant protein (NR), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and clusters of orthologous groups of protein (COG) via BLASTX with an e-value < 0.00001. The sequence alignment outcomes with greatest sequence resemblance were selected and annotated to unigenes. The unigenes that were unsuccessful in lining up the above mentioned databases were separated out with ESTScan software to decide sequence direction and detect coding region. Blast2GO software (version = v2.5.0) was used in NR annotation to obtain gene ontology (GO) annotation (i.e., biological process, molecular function, and cellular component) [77].
WEGO software was applied to deduce the functional classification of all annotated unigenes [78]. All unigenes were aligned with the COG database to classify and investigate their possible functions. Similarly, the KEGG pathway database was surveyed with the BLASTX program to predict the possible pathways where each of the unigenes were involved.
Validation Vg gene transcripts via reverse transcription (RT) PCR Six transcripts of the Vg gene with dissimilar fragments per kilobase of transcript per million mapped reads (FPKM) values were recognized from the C. cautella female abdominal tissue transcriptome. The 6 Vg transcripts were evaluated through RT-PCR with gene specific primers synthesized from the 6 transcripts they had identified in the transcriptome assembly. Actin gene primers, Cc-Act-F1 and Cc-Act-R1, were used as internal controls (Additional file 5: Table S4). For validation, a cDNA library was exposed to PCR with the Gene Amp PCR system 9700 thermo cycler (Applied Biosystems, Foster City, CA, USA), and the following PCR conditions were used: initial denaturation at 94°C for 1 min, followed by 32 cycles of denaturation at 94°C for 30 s, and annealing at 68°C for 3 min. The PCR-amplified products were run on 1.2% agarose gel, stained with ethidium bromide for 30 min, and visually observed under ultra violet light with the gel documentation system BioDocAnalyze (Biometra). The successful amplified samples were sent to BGI for sequencing.