MicroRNAs in the oriental fruit fly, Bactrocera dorsalis: extending Drosophilid miRNA conservation to the Tephritidae

Background The oriental fruit fly, Bactrocera dorsalis, is an important plant pest species in the family Tephritidae. It is a phytophagous species with broad host range, and while not established in the mainland United States, is a species of great concern for introduction. Despite the vast amount of information available from the closely related model organism Drosophila melanogaster, information at the genome and transcriptome level is still very limited for this species. Small RNAs act as regulatory molecules capable of determining transcript levels in the cells. The most studied small RNAs are micro RNAs, which may impact as much as 30 % of all protein coding genes in animals. Results We have sequenced small RNAs (sRNAs) from the Tephritid fruit fly, B. dorsalis (oriental fruit fly), specifically sRNAs corresponding to the 17 to 28 nucleotides long fraction of total RNA. Sequencing yielded more than 16 million reads in total. Seventy five miRNAs orthologous to known miRNAs were identified, as well as five additional novel miRNAs that might be specific to the genera, or to the Tephritid family. We constructed a gene expression profile for the identified miRNAs, and used comparative analysis with D. melanogaster to support our expression data. In addition, several miRNA clusters were identified in the genome that show conservancy with D. melanogaster. Potential targets for the identified miRNAs were also searched. Conclusions The data presented here adds to our growing pool of information concerning the genome structure and characteristics of true fruit flies. It provides a basis for comparative studies with other Dipteran and within Tephritid species, and can be used for applied research such as in the development of new control strategies based on gene silencing and transgenesis. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1835-3) contains supplementary material, which is available to authorized users.


Background
True fruit flies (Diptera: Tephritidae), constitute a family of mostly phytophagous species, many of which are considered to be serious pests of numerous plants. Within this family, species in the highly invasive genera Anastrepha, Bactrocera, Ceratitis and Rhagoletis are of worldwide economic importance, restricted by quarantine listings in Europe [1], and subject to constant eradication and establishment prevention in the United States. Research on this family of flies has been heavily concentrated on field strategies and quarantine methods [2]. Despite the vast amount of information available from the closely related model organism Drosophila melanogaster, information at the genome and transcriptome level is still very limited in true fruit flies.
The regulation of gene expression in the cell is driven by fine-tuned molecular mechanisms that respond to developmental and environmental cues. Playing an important role in this system are the small RNAs (sRNA) which act as regulatory molecules capable of determining target mRNA expression levels [3,4]. Several types of sRNAs have been documented to date. Among the most studied are micro RNAs (miRNAs) and small interfering RNAs (siRNAs); both types present in most species. Classes of sRNAs and their biogenesis pathways have been extensively described in the literature for many model species [3][4][5][6]. MiRNAs originate in genomic loci and are very often expressed in a tissue-specific manner. A large proportion, probably more than 30 %, of all protein coding genes of animals may be regulated by miRNAs.
We have sequenced sRNAs from the Tephritid fruit fly, Bactrocera dorsalis (Oriental fruit fly), specifically sRNAs corresponding to the 17 to 28 nucleotides long fraction of total RNA, looking at variation in composition and expression between different developmental stages, and between male and female sexes in the pupa stage. We were able to identify several miRNAs orthologous to known miRNAs and additional novel miRNAs that might be specific to the genera or to the Tephritid family. We constructed a profile of gene expression for the identified miRNAs, and used comparative analysis with D. melanogaster to support our expression data, identify conserved miRNA clusters in the genome, and mine for potential transcript targets for this miRNAs. The data presented here adds to the biological information concerning the genome structure and characteristics of true fruit flies. It provides a basis for comparative studies in other Dipteran species, and can be used for applied research such as in the development of new control strategies based on gene silencing and transgenesis.

Fly sample collection
A white pupal translocated strain (DTWP) was used, in which female pupae are white and male pupae are brown, allowing for separation of sex in the pupal stage [7]. Flies were grown in liquid diet [8], (200 ml diet for approximately 3400 eggs) as previously described [9].
Three biological replications were carried out for sample collection, RNA extraction and sequencing. For each replicate, eggs were allowed to develop and samples collected at the following times and developmental stages: embryos (12 mg at 0-1 h after oviposition), young larvae (approximately 20 mg at 0-12 h after egg hatch), early male (brown) pupae (0-24 h after pre-pupal formation), and early female (white) pupae (0-24 h after pre-pupal formation). For embryos and larvae, samples were sieve washed and then blot dried. All samples were collected in 1.5 ml microcentrifuge tubes and flash frozen in liquid nitrogen immediately after collection. Samples were then stored at −80°C until processed. For fertilized ovary collection, 7 day old females and males were left in a cage to mate. Thirty seven mating pairs were separated in cups and left for at least 90 min to make sure that the females were fertilized. Non-mating flies were removed and mating pairs were left until the next day. Mated females were sedated by exposing them to 4°C for 10 min, and the ovaries were dissected. Ovaries from ten female flies were collected per replication in 1.5 ml microcentrifuge tubes and flash frozen in liquid nitrogen. Samples were preserved at −80°C until RNA extraction.

RNA extraction
RNA from each of the collected samples was extracted utilizing NucleoSpin® miRNA kit (Macherey-Nagel, Duren, Germany) following manufacturer's protocol and recommendations. Initial tissue lysis was performed by grinding the frozen tissue in 1.5 ml tubes with plastic micro pestles, followed by addition of 300ul of lysis buffer. The NucleoSpin miRNA kit allows for separation of small RNA and large RNA fractions in silica membrane columns by differential ethanol concentrations. After purification, the quality and quantity of both the small and large RNA fractions for each sample was determined using a Quibit 2.0 fluorometer (Life Technologies, Carlsbad, California), and an Agilent 2100 Bioanalyzer with anAgilent small RNA kit (Ambion, Santa Clara, CA, USA).

Library preparation and sequencing
To prepare small RNA sequencing libraries, the Ion Total RNA-seq kit v2 for small RNA libraries was used following manufacturer protocols with some modifications. The small RNA fraction of each sample was ligated to adapters and reverse transcribed to cDNA. The cDNA was purified, size selected and each sample was differentially barcoded before amplification to allow subsequent sample identification. Amplified cDNA was checked for quality and size distribution using an Agilent 2100 Bioanalyzer with the High Sensitivity DNA assay kit. To further size select the cDNA, a BluePippin® (Sage Science, Beverly, MA) was used with a 3 % agarose gel cassette, enabling the enrichment of cDNAs between approximately 92 and 118 bp, (corresponding to small RNAs of 18-27 nucleotides length with the addition of sequencing adaptors and barcodes). Afterward, equimolar amounts of samples (identifiable by differential barcodes) were pooled, and the pooled library was quantified using a KAPA library quantification kit (KAPA Biosystems Woburn, MA) to assess the optimal amount for emulsion PCR. Emulsion PCR was performed on an Ion One Touch 2 System and the amplified beads were subjected to sequencing with the Ion Personal Genome Machine using an Ion PGM Sequencing 400 kit and Ion 318 Chip v2. To maximize reads per sample, sequencing runs were performed using 120 flows per run, followed by repeated sequencing of the library a total of 6 times across two initializations of the PGM instrument with the 400 kit. Post sequencing base calling, adapter trimming, and demultiplexing was performed using the Torrent Suite Software using default parameters for small RNA sequencing, and exported as fastq files.

Identification and classifications of small RNA sequences
The resulting 14 fastq files corresponding to a replicate sample from each life stage were entered separately in the mapper.pl module of the mirDeep2 package [10,11], using a config.txt file to track the original sample in the final results. The mapper.pl module discarded reads smaller than 17 nt (option -l 17), collapsed identical reads and performed counting. Additionally, reads were mapped to the B. dorsalis genome (GeneBank Accession # JFBF00000000.1) using bowtie [12] with the following stringency parameters: 0 mismatches allowed in the seed, seed length of 18 nt, up to two mismatches after the seed, discarding reads mapping more than 5 times to the genome, and reporting only the best alignment for each read. The pipeline was designed to predict highconfidence miRNAs, and while discarding mature sequences that map to too many loci compromised detection, it also prevented false positive predictions. The collapsed reads obtained from the mapper.pl module were input into the miRdeep2 core module (miR-deep2.pl) with no reference miRNAs from a closely related species supplied, in this manner all the potential miRNAs and precursors from the data could be obtained. In the miRDeep2 module, read mappings were used to excise putative miRNA precursors according to stack height and all sequenced reads were aligned to these potential precursors. Additionally, the secondary hairpin structure and its stability were predicted for the excised precursors utilizing RNAfold [13][14][15] and Ranfold [16], and a score was assigned to each precursor. Scores are used to select precursors with highest probability of being genuine (the program kept precursors with scores higher than −50). To find other non-miRNA small RNAs, mapped reads (from mapper.pl module output) were scanned against the covariance models of the Rfam 11.0 release [17], using Infernal 1.1.1 cmscan [18].
The list of potential mature miRNAs and hairpin precursors obtained from the miRDeep2 core module were subjected to nucleotide BLAST search against the Sanger miRBase mature.fa and hairpin.fa database respectively (database release 21, http://www.mirbase.org) [19][20][21], this was done with the aim of discriminating known miRNAs and iso-miRNAs from potential novel miRNAs. Nucleotide BLAST for mature sequences was performed utilizing the blastn program with the blastn-short task default parameters, except for word-size which was changed from the default of 7 to 16; in this manner, only nearly identical sequences were identified. The resulting list was further filtered to keep only perfect and nearperfect matches to known miRNAS (near perfect defined as having one mismatch or up to a 2 nucleotide length difference between the query and subject sequences). BLAST for hairpin precursors was run with standard blastn with a word size 16 and cut-off e-value 1e-7, and the result was filtered to remove instances where the potential precursor was more than 8 bases longer than the aligned stretch.

Differential expression of miRNAs
To assess the expression changes for the identified miR-NAS across the life stages tested, the raw counts for the identified miRNAs obtained from the MirDeep2 quantifier.pl module were input into the EdgeR [24,25], utilizing the pairwise exact-test modality to test each of the 11 possible comparisons on TMM (trimmed means of M-values) normalized counts. Correlations between miRNA expression in B. dorsalis and D. melanogaster were computed utilizing Spearman correlation (ρ). The method was chosen because we do not expect a perfect linear relationship between miRNA levels in both species due to developmental timing differences, Spearman correlation calculates coefficients based on rank. All figure generation and statistics were performed in R.

Sequencing results
After quality filtering, the six sequencing runs of the pooled libraries yielded more than 16 million reads in total. While most libraries had approximately one million reads or more, the "ovary replication 1" library had less than 100,000 usable reads, representing an outlier in our dataset. The "female pupa replication 2" sample failed to produce the necessary amount of cDNA in three attempts, and thus was not sequenced. Because of the stringent size-selection used in cDNA previous to sequencing of the libraries, only a small portion of the total reads (4.7 %) were removed at the 17 nt cutoff. The remaining reads (with more than 17 nt) were collapsed into identical reads in a library per library basis resulting in libraries with as low as 18 thousand unique reads (ovary replication 1), to libraries with more than 500 thousand unique reads (ovary replication 3). About 23 % of the unique reads across libraries mapped to at least one location in the B. dorsalis genome (Table 1).
Mapped reads were searched for homologs of structural RNA sequences in the Rfam database to assess the non-coding RNA composition of the dataset. Over 1 million reads matched at least one type of RNA, the most abundant being tRNAs (64 %) and miRNAs (22.5 %). The most frequent RNA types (number of unique sequences in a family type) were tRNAs (16 % of all unique sequences having a match in Rfam), miscellaneous cis-regulatory elements (14 %), miRNAs (14 %), and snoRNAs (11 %) ( Table 2). A relatively high number of tmRNAs (transfer-messenger RNA) and CRISPR-like sequences were found (8.3 and 7.3 % of unique sequences per library respectively), indicating the presence of bacteria in the sequenced tissues, however, the total amount of reads in these categories was relatively low: 3.3 and 2.9 % respectively. The full Rfam classification including frequency, abundance and diversity is detailed in Additional file 1: Table S1.

Identification of known and novel miRNA in the dataset
MirDeep2 identified 149 potential miRNAs among the mapped reads in the pooled library, and the software assigned a provisional identification (provisional ID) to each of them. In five instances, two or three different provisional IDs were assigned to the exact same sequence because they aligned to more than one precursor excised from different genome scaffolds or from different regions in the same scaffold. The output was further filtered for miRNAs with a miRDeep2 score below or equal to 2, as a visual examination of the predicted hairpin structures and "stacks" built with the reads revealed that these hairpins had low signal to noise ratio (number of miRNA vs miRNA* and other reads in the region), and unusual secondary structures that did not resemble typical metazoan miRNA precursors [26,27]. The remaining potential miRNA hairpins were screened to verify if they were contained within or overlapped predicted gene exons in the B. dorsalis genome. While the majority of the hairpin precursors were located outside of gene-coding regions and in introns, 13 of them were found either contained within an exon (10), contained a full exon (2), or overlapped an exon start (1). Whereas miRNAs are traditionally found in non-gene-coding-regions, we did not exclude these sequences from further analyses, as some more recently identified miRNAs in D. melanogaster were found within exonic regions, including 3'UTRs and coding sequences (CDSs) [28]. In addition, because the gene set being used is largely computationally predicted, some errors in gene structure may be present. The described filtering yielded 109 potential miRNAs, which were used for BLAST homology search against the miRBase mature miRNAs (mature.fa) and precursors (hairpin.fa) datasets for further validation. A total of 74 of the potential miRNAs had homology to either a hairpin precursor, a mature miRNA, or both in miRBase (Additional file 2: Table S2). For the remaining 35 sequences that did not match known sequences in the miRBase database, additional information was collected to prove or disprove their authenticity as novel miRNAs. For that purpose, and given that the fold-back precursors for these sequences were already detected in the genome, expression levels were used as a secondary supporting criterion [29]. One of these reads was present in all libraries, while one other miRNA was missing in only one of the larvae replications, but was consistently low in the other two replications for this stage, indicating possible low tissue or developmental stage expression. These two sequences were selected as putative novel miRNAs. Additionally, seven other sequences were present in all  Type  E01  E02  E03  L01  L02  L03  BP1  BP2  BP3  WP1  WP3  OV1  OV2  OV3   tRNA  788  927  317  1071  504  614  998  730  928  659  867  143  592  839   miRNA  433  901  515  653  782  542  912  551  599  588  660  280  604  748   sno-RNA  395  1018  918  535  577  355  291  186  176  157  189  23  928  1165   tmRNA  416  347  199  773  299  438  635  615  759  328  612  51  270  427   sRNA  248  593  496  321  383  221  238  137  163  108  146  17  606  778   CRISPR  209  584  524  272  372  198  224  136  143  115  128  20  652  815   rRNA  122  283  225  170  248  151  230  174  191  141  162  23  238  381   HACA-box  94  201  187  132  126  112  68  47  45  45  50  6  205  270   splicing  63  87  80  318  208  157  34  27  31  35  35  2  152  197   riboswitch  33  124  106  48  74  40  52  27  33  24  19  3  128  146   misc. cis-reg  507  1212  1157  581  756  424  403  250  276  198  264  18  1250  1619   others  163  446  393  282  316  200  213  108  128  103  107  12  489  640   TOTAL  3471  6723  5117  5156  4645  3452  4298  2988  3472  2501  3239  598  6114  8025 except male and female pupae libraries. Three of these seven had low read count and/or there were no reads detected for their corresponding miRNA*. For one other, visual examination of the hairpin revealed unusual secondary structure. These four sequences were eliminated, leaving three additional potential novel miRNAs for a total of five. The remaining 29 sequences were present in only one replicate of a single developmental stage, and were disregarded. The predominant sequence in all of the five selected cases was the miRNA relative to the miRNA*, and none of these were found in or overlapping exonic regions, further supporting these sequences as true novel miRNAs (Table 3 and Fig. 1).
In animals, miRNAs regulate transcript abundance by complimentary base pairing to the 3' UTR of the target RNA, with some exceptions [5]. Potential targets for the 109 identified miRNAs were screened within the set of 3'UTR regions extracted from our most current B. dorsalis genome assembly (NCBI Assembly# ASM78921v2) utilizing the miRANDA software. The number of targets identified per each miRNA ranged between 5 and 469, even with the stringent parameters used for target detection, giving a total of 6506 predicted miRNA-target pairs, where many mRNA were targeted by multiple miRNAs. (Additional file 3: Table S3).

Differential expression of B. dorsalis miRNAs
The relative differential abundance of miRNAs between life stages was calculated using the negative binomial distribution with edgeR [25,30]. In total, 47 of the 80 high confidence microRNAs identified showed differential expression (FDR corrected p-value <0.05) in at least one of the 10 possible pairwise comparisons between life stages. These significantly differentially regulated miR-NAs included all 11 novel miRNA sequences. The comparison between female and male pupae yielded no differentially regulated miRNAs, and the most pronounced changes were observed between ovaries and pupae, and between eggs and pupae (Additional file 4: Table S4).
Hierarchical clustering was used to identify patterns of expression across developmental stages. Some welldefined groups of miRNAs could be detected in a heatmap representation of the clustering (Fig. 2). The first group (Fig. 2, group 1) comprised miRNAs with increased expression during the developmental progression from embryo to larva, embryo to pupa and embryo to female reproductive tract. A subgroup of these showed little or no change between larvae to pupa and reduced expression in the transition between pupa to ovary, while a second subgroup had strong up-regulation in pupa compared to larva and little or no change between ovary and pupa. In this last subgroup, the three miRNAs with homology to the D. melanogaster polycistronic locus let-7-Complex (let-7-C) (i.e. mir-100, mir-125, and let-7) were present. This cluster of miRNAs is known to be involved in the timing of cellular development in D. melanogaster [31]. In particular, let-7 is known to promote the transition from larva to adult, and is required for remodeling the neuromusculature during metamorphosis. In conformance with our B. dorsalis data, let-7 in D. melanogaster is also expressed mainly at the pupal stage [32,33]. A second group in the cluster (Fig. 2, group 2) shows potential larva-specific miRNAs. These miRNAs increased in abundance in larva compared to embryo, but were markedly reduced in female and male pupa compared to larva. Although this group of miRNAs is mainly composed of miRNAs with homology to known miRNAs in D. melanogaster or other species, they have been found mainly in genome-wide screenings and their specific functions are unknown [34,35]. Finally, another welldefined cluster was formed, showing 11 miRNAs highly reduced in abundance in pupa compared to embryo and larva. These miRNAs showed increased abundance in ovary compared to pupa and all the other comparisons between larva, embryo and ovary showed little or no change in expression, indicating that these miRNAs are depleted in pupae and/or highly and equally expressed in embryo and ovary. Six out of the 11 novel B. dorsalis miRNAs belong to this group, as did miRNAs belonging to the D. melanogaster miR-309-6 cluster (i.e. mir-4, mir-5 and mir-286), (Fig. 2, group 3).
Reciprocal best BLAST hits between the hairpin precursors of the set of 109 initially identified B. dorsalis miRNAs, and those of the 256 known D. melanogaster miRNAs (as of miRBase relase#21), identified 68 presumed ortholog between both species, and these orthologus pairs were used to compare miRNA  [28,36] (GEO Accessions: GSM322208, GSM322219, GSM22245, GSM322338, GSM360256, GSM360257, GSM360260, GSM360262, GSM385744, GSM385748, GSM385821, and GSM385822). Data for both organisms was compared using counts per million reads (cpm). The Spearman correlation coefficients (ρ) between orthologous miRNA expression for all the life stages across both species ranged between 0.3 and 0.9, and all were highly significant, with p-values ranging between 0 and 0.01 (Additional file 5: Table S5 and Fig. 3). Strong correlation (ρ > 0.65) was observed between the B. dorsalis embryo library and the four embryo libraries from D. melanogaster, with the strongest correlation to the D. melanogaster 6 to 10 day embryo. This precise stage (6 to 10 days old embryo) in D. melanogaster is referred to as the 'phylotypic' stage, meaning that the most homologous developmental stage between different species occurs at this particular point [37]. Supporting this theory, expression of miRNAs in the B. dorsalis first instar library from [36], showed stronger correlation with this late D. melanogaster embryo than with first instar larva for this species. Pupae libraries from both species showed high correlation with other pupae libraries and larvae across the three studies, indicating that the population of miRNAs is more constant between these two stages. The ovary miRNA expression data from our B. dorsalis dataset correlated best with D. melanogaster eggs of up to 10 h. Correlation of the B. dorsalis ovary with D. melanogaster ovary somatic sheet cells was relatively poor (ρ = 0.5), this finding is not surprising since the B. dorsalis ovary library was composed of fertilized and fully developed ovaries, perhaps more reminiscent of just laid eggs, whereas the ovary somatic sheet cell (SSC) line represents only a fraction of the ovarioles. To investigate whether the trend of expression of a particular miRNA in B. dorsalis was similar to that of the corresponding D. melanogaster orthologs, the 37 significantly differentially regulated B. dorsalis miRNAs side-by-side with their D. melanogaster counterparts, and a trend line of counts per million across the developmental stages on a free scale was generated (Fig. 4).
Visually inspecting each of the miRNA trend plots, there were 12 miRNAs with nearly identical expression patterns, and an additional miRNA (mir-276a), with a pattern varying only in the libraries derived from ovary tissue, which were the libraries with lowest correlation between the species. Among these twelve miRNAs with identical expression pattern, mir-282 was highly expressed in pupae of both D. melanogaster and B. dorsalis studies. Fig. 1 Sequencing of the small RNA fraction of Bactrocera dorsalis at five different developmental stages identified five potential novel miRNAS. The precursor hairpin structures and counts of miRNA and miRNA* are shown. In the hairpin, the mature miRNA is colored red, the loop sequence is colored yellow, and the miRNA* sequence is colored purple Other studies have consistently found miR-282 to be expressed both in pupae and female adults of D. melanogaster [35,38], and this miRNA has been shown to regulate viability and production of eggs through the targeting of the nervous-specific adenylate cyclase in pupae during metamorphosis [39]. Mir-276a, had an equivalent trend in expression from embryo to pupa; however, the high expression of this miRNA in D. melanogaster ovary SSC was not observed in the fertilized B. dorsalis ovary library. This miRNA also had highest expression in pupae in both species, and is also implicated in neural development, specifically in olfactory response [40]. Mir-137, mir-981, mir-87, and mir-927 had analogous expression patterns across both species, with highest expression observed in larval tissues, although their absolute expression was low compared to other significantly differentially regulated miR-NAs. Even though these miRNAs were identified in genome-wide studies and computationally predicted in D. melanogaster [34,36], we could only find one report of Dme-mir-87, showing its expression in relation to hormonal signaling [41], where mir-87 was found poorly expressed in early larva (in agreement with our data for B. dorsalis and the genome-wide studies in D. melanogaster), but then highly expressed in pupae. Finally, of the four conserved genomic miRNA clusters identified in B. dorsalis, only the mir-309-6 cluster (dme-mir-4, dme-mir-6-3, dme-mir-5, dme-mir-286, and dme-mir-309) had an analogous expression pattern across life stages.

Genomic clusters of miRNAs in B. dorsalis
Clustered miRNA genes are fairly common in metazoan genomes. Approximately 40 % of described miRNA in nematodes, flies and mammals are localized in tandem clusters of two or more miRNAs less than 10 kb apart [27]. Ten genomic scaffolds with two or more of the 110 initially identified miRNAs were on the same strand, with a maximum distance of 3.5 kb between the most distant miRNAs in the largest cluster. An inspection of orthologs to these miRNA clusters in the D. melanogaster genome revealed that nine of these miRNA clusters Fig. 2 Hierarchical clustering of statistically differentially regulated B. dorsalis miRNAs. The clustering allowed for the classification of groups of miRNAs with increased expression during the developmental progression from embryo to larva, embryo to pupa and embryo to female reproductive tract (group 1). A group of potential larva-specific miRNAs (group 2), and a group of miRNA with very low expression in pupa (group 3). The values used for clustering were in the form of log2 ratios to give equal weight to up-regulated miRNAs and down-regulated miRNAs (e.g. MalePupa/Embryo means the log2 of the ratio of MalePupa counts per million over Embryo counts per million). Enclosed in rectangles are genes with orthology to two of the known D. melanogaster miRNA clusters are conserved in D. melanogaster (Table 4). The arrangement of miRNA genes in the cluster (order and direction) was the same in all B. dorsalis and D. melanogaster clusters except in cluster mir309-6, however, the B. dorsalis clusters appear to consistently span larger regions (of about 3 Kbp), compared to D. melanogaster (1kpb).
The D. melanogaster mir-309-6, was previously reported as Drosophilid-specific [42,43]. However, eight members of the mir-309-6 cluster were mapped in a region of 3.5 kb in Scaffold000054 from the B. dorsalis assembly. Although the general arrangement of the region differed significantly to that of D. melanogaster, a search of this sequence in the recently published B. cucurbitae (melon fly) genome assembly (NCBI accession GCA_000806345.1) revealed the same arrangement of miRNAs in B. dorsalis, except for the region where the two duplicated miRNAs were found. Moreover, comparison to the Mediterranean fruit fly (Ceratitis capitata) genome assembly (NCBI accession GCA_000347755.1), showed the presence of the same miRNAs in a different arrangement (Fig. 5).
The mir-309-6 cluster has been implicated in maternal transcript destabilization, the removal of transcripts maternally provided during oogenesis [44,45]. Consistent with this observation, the expression of the mir-309-6 cluster was highest at the start of zygotic transcription in the early embryo in D. melanogaster. Transcripts for this in D. melanogaster were found subsequently depleted, but mature miRNAs could still be detected in larval stages [46,47]. In our data for B. dorsalis, all miRNA members of this cluster were found to be very abundant in embryos, with counts per million reads comparable to those in D. melanogaster. While some miRNAs were still detected in the larval stage, no mature miRNAs were found in pupal samples, and a significant number of miRNAs were detected in fertilized ovary tissue (see next section). Taken together, this data indicates that this miRNA cluster is not specific to Drosophilids as previously believed, and that it has undergone extensive evolutionary divergence. Functionally, while the D. melanogaster mir-309-6 cluster was found to act as part of the zygotic machinery in the removal of maternal mRNAs, and is highly Fig. 3 Correlation between miRNA abundance in B. dorsalis and D. melanogaster across developmental stages. Two publicly available D. melanogaster datasets were used to compare with the present study. Circle size and color represent correlation strength (spearman rho-value). All correlations were significant at p-value <0.01 (see Additional file 5: Table S5) expressed in embryos, our B. dorsalis data indicates that the mir-309-6 cluster is functional before egg laying, either because zygotic transcription starts earlier than in D. melanogaster, or because these miRNAs are not only part of the zygotic machinery for maternal transcript decay, but also part of the maternal machinery, which plays the same role (removal of maternal mRNA) before the onset of zygotic transcription.

Conclusions
Deep sequencing of small RNAs has allowed the identification of miRNAs present at four different life stages of the Tephritid fruit fly Bactrocera dorsalis. Sixty-nine miRNAs homologs to miRNAs in other species were identified with high confidence, and sufficient evidence was gathered to identify eleven additional miRNAs that were not previously reported. The latter may include Fig. 4 Trend plots of expression of statistically differentially regulated miRNAs in D. melanogaster and B. dorsalis. Plots were draw on a free scale to compare trends. Red asterisk show trends that are similar in both species. Green boxes show known miRNA clusters in D. melanogaster that were also found in this study. ¥ Maximum count per million reads conserved miRNAs with relatively low expression, and/or miRNAs that have evolved independently and are specific to the Tephritid family. The three replications per library allowed for a robust differential expression analysis of the identified miRNAs and their classification into life stage specific groups; miRNAs falling in these categories could be considered of importance because they are likely involved during transitional stages in development. To complement the data, and to provide more biological insight, we attempted to provide a list of potential targets for the identified miRNAs. Given that in metazoans, perfect complementarity to only six nucleotides in the seed region of the small RNA and the target is sufficient to promote RNA silencing, like in Drosophila, the resulting list of candidate mRNA targets was very extensive [21,48], even with the stringent parameters we set for the miRANDA output. Although the dataset of potential targets was narrowed down to a short list of miRNAs for which RNAseq information was available, there is still need for wet lab experiments for confirmation of the targets.
Taking advantage of the vast resources and data available for the model species D. melanogaster, comparative analysis across conserved orthologous miRNAs were utilized to further support our findings. High correlation was identified between datasets at the level of abundance across developmental stages. Moreover, groups of miR-NAs that are physically clustered in genomic regions were also found to be conserved between both B. dorsalis and D. melanogaster. Although these miRNA clusters differed in genomic spacing between the two organisms, this difference was consistent for all the clusters, being B. dorsalis clusters arranged in regions up to three times longer than D. melanogaster miRNAs but with the same order of the miRNAs in the cluster. Only one of the miRNA clusters, the mir309-6 showed poor conservation including repeated miRNAs and a different arrangement. Although genome assembly errors were a possibility, the same cluster, with the same arrangement was found in the sequenced genomes of the two other Tephritids, namely B. cucurbitae, and C. capitata. With the data available for B. dorsalis, we hypothesize that this cluster, which has highly diverged from the D. melanogaster dme-mir-309-6 cluster, may also function in maternal transcript destabilization machinery as it does in Drosophilids, however, because it is also expressed in ovaries, it may not be specific for the zygotic machinery. Importantly, this cluster was previously reported as being specific for drosophilids, and we proved that is not, demonstrating that this dataset, and similar datasets from Tephritids can be used as comparative tools for flies and other insects, to draw more robust conclusions about evolutionary questions. Knowledge on miRNAs in B. dorsalis could help in developing novel pest control strategies, for example, miRNAs that are specific for egg and larval stages, likely involved in key pathways for developmental transitions, can be further characterized and utilized in miRNA mimics feeding and plant expression [48,49]. Because miRNAs are very important in controlling developmental states, miRNA mimics targeting female specific sex determination and development transcripts could be used to generate genetic sexing strains that can be utilized in Sterile Insect Technique (SIT). Finally, this dataset could be further explored to find other specific regulatory pathways of interest, and as an aid for functional characterization of genes. Fig. 5 Comparison of the mir309-6 cluster across Drosophilids and three Tephritid fruit flies. Sequencing of small RNAs from B. dorsalis, identified six members of this cluster previously believed to be specific for Drosophilids. The search was expanded to other two Teprhitid species for which genome information is available (Bactrocera cucurbitae, melon fly; and Ceratitis capitata, mediterranean fruit fly) yielding similar clusters. The arrangement and spanning length of the miRNAs in the cluster in the Tephritid cluster seem to significantly differ from that in Drosophilids