MicroRNAs of Bombyx mori identified by Solexa sequencing
© Liu et al. 2010
Received: 10 March 2009
Accepted: 3 March 2010
Published: 3 March 2010
Skip to main content
© Liu et al. 2010
Received: 10 March 2009
Accepted: 3 March 2010
Published: 3 March 2010
MicroRNA (miRNA) and other small regulatory RNAs contribute to the modulation of a large number of cellular processes. We sequenced three small RNA libraries prepared from the whole body, and the anterior-middle and posterior silk glands of Bombyx mori, with a view to expanding the repertoire of silkworm miRNAs and exploring transcriptional differences in miRNAs between segments of the silk gland.
With the aid of large-scale Solexa sequencing technology, we validated 257 unique miRNA genes, including 202 novel and 55 previously reported genes, corresponding to 324 loci in the silkworm genome. Over 30 known silkworm miRNAs were further corrected in their sequence constitutes and length. A number of reads originated from the loop regions of the precursors of two previously reported miRNAs (bmo-miR-1920 and miR-1921). Interestingly, the majority of the newly identified miRNAs were silkworm-specific, 23 unique miRNAs were widely conserved from invertebrates to vertebrates, 13 unique miRNAs were limited to invertebrates, and 32 were confined to insects. We identified 24 closely positioned clusters and 45 paralogs of miRNAs in the silkworm genome. However, sequence tags showed that paralogs or clusters were not prerequisites for coordinated transcription and accumulation. The majority of silkworm-specific miRNAs were located in transposable elements, and displayed significant differences in abundance between the anterior-middle and posterior silk gland.
Conservative analysis revealed that miRNAs can serve as phylogenetic markers and function in evolutionary signaling. The newly identified miRNAs greatly enrich the repertoire of insect miRNAs, and provide insights into miRNA evolution, biogenesis, and expression in insects. The differential expression of miRNAs in the anterior-middle and posterior silk glands supports their involvement as new levels in the regulation of the silkworm silk gland.
Following their initial discovery in worms, an increasing number of 18-30 nt-sized small RNAs have been identified as crucial regulatory molecules in multicellular organisms, animal viruses, and unicellular organisms [1–7]. Identification of abundant miRNAs and other small regulatory RNAs in different organisms is critical in improving our understanding of genome organization, genome biology, and evolution . The silkworm, Bombx mori (B. mori), an important model organism used to investigate several fundamental biological phenomena (including development, gene regulation, and morphological innovation ), has been employed for silk production for about 5,000 years. The recently sequenced B. mori is the first lepidopteran insect genome that provides a resource for comparative genomics studies, facilitating our understanding of insect evolution . The latest miRNA database release (miRBase 14.0) presents 91 silkworm miRNAs and two so-called miRNA* sequences originating from the RNA hairpin arm opposite the annotated mature miRNA-containing arm [2, 11]. However, some of these miRNAs have been identified solely on the basis of sequence similarity to known orthologs, and have never been confirmed experimentally. Furthermore, the total number of silkworm miRNA genes is significantly lower than that in fruit fly (152) and human (701), and it is likely that further miRNAs remain to be discovered in the silkworm.
To extend the known repertoire of small regulatory RNAs expressed in the silkworm, we constructed and sequenced three small RNA libraries prepared from the whole body (WB) as well as the anterior-middle and posterior silk glands (AMSG and PSG) of day-3 fifth instar larvae. The silk gland of B. mori is differentiated into anterior, middle, and posterior sections [12, 13]. Expression of all sericin genes is limited to the anterior and middle parts of the middle silk gland [14, 15], whereas the fibroin genes are expressed exclusively in the posterior silk gland [16, 17]. Both sericin and fibroin genes are topologically and temporally regulated at the transcriptional level in a concerted manner during larval development [18, 19]. The spatial distribution of miRNAs may contribute to the mechanistic understanding of concerted silk protein synthesis. Each library was individually sequenced, and generated more than 5 million short reads, resulting in a total of 36 million reads, of which 1,819,103 were miRNA reads. The newly identified miRNAs significantly enhance our knowledge of insect miRNA species and provide insights into miRNA evolution, biogenesis, and expression in insects.
To date, only 94 miRNAs of Bombyx mori have been reported (Additional file 7) [24–30], and are available from latest miRBase database (Release 14.0) . Some of these miRNAs were identified based solely on sequence similarity to known orthologous miRNAs, and have never been confirmed experimentally (Additional file 7) [24, 26, 32]. Here, we present evidence to support the authenticity of 55 known miRNAs. The remaining 39 known miRNAs, including bmo-mir-124 and bmo-mir-1926, were not successfully sequenced, possibly because of low expression levels or stage-/tissue-specific transcription. In total, 10 conserved miRNAs were firstly identified in the silkworm, 54 conserved miRNAs were sequenced in all three RNA libraries, while 12 were detected in only one or two small RNA libraries (Figure 1A; Additional file 7). For example, bmo-miR-133, miR-137, and miR-932 were identified in the whole body, but not in the anterior-middle and posterior silk gland, opposite to the patterns shown by bmomiR-285, miR-929, and miR-9b. Sequence comparisons between silkworm miRNA candidates and other miRNAs present in miRBase (miRBase 14.0) revealed that 23 silkworm miRNAs (including the new homolog of dme-miR-33) are widely distributed in over 20 species from invertebrates to vertebrates, 13 (including the novel homolog of lgi-miR-1175) are evolutionarily conserved throughout invertebrates, 32 (including 7 new sequences) exist only in the Insecta, and 37 known miRNAs (bmo-mir-1920, mir-1921, mir-1923, mir-1926 and 34 latest members) are presently confined to B. mori (Additional file 7). All conserved silkworm miRNAs were classified into known families or currently undefined groups on the basis of sequence similarity (Additional file 8). Various families of these miRNAs may have evolved for purposes as diverse as the phyla in which they occur. The identification of conserved miRNAs as potential phylogenetic markers raises the possibility that miRNAs serve as rapid evolutionary signaling molecules.
In total, 209 unique miRNA and 87 star sequences were identified in this study (see Additional file 3). Sequence comparisons between the silkworm candidates and miRNAs of other organisms present in miRBase (miRBase 14.0) revealed that 11 novel silkworm miRNAs were orthologous to those identified in other species (Additional file 7) whereas most unique miRNAs displayed no evidence of evolutionary conservation, even within the Insecta, and thus appeared to be silkworm-specific (Additional file 3). The genome loci of all silkworm miRNA genes were characterized, as presented in Additional file 3 and Figure 1B. We identified 12 miRNA genes in the exons of protein-coding genes, including four conserved and eight silkworm-specific miRNAs. Four novel members, bmo-mir-2783, bmo-mir-2811, bmo-mir-2825, and bmo-mir-2829, were located within the introns and exons of single protein-coding genes. In total, 52 miRNAs, including 13 conserved and 39 silkworm-specific members were intronic, whereas 95 miRNA genes were widely distributed within intergenic regions (IG regions) of the silkworm genome. In contrast to conserved and known miRNAs, 190 silkworm-specific miRNA genes were located in predicted transposable elements, and were repeat-associated (Figure 1C, Additional file 3). Strikingly, these transposable elements are preferentially found outside protein-coding genes; only 30 of 181 unique transposable elements seem to be contained within predicted protein-coding genes. We identified 24 clusters consisting of 63 novel miRNA and 14 annotated genes. Clusters 3 and 4 were located in the second exon of the protein-coding genes, BGIBMGA010447 and BGIBMGA008683, respectively. Clusters 2, 7, 9, 16, 17, 18 were composed of intronic miRNA genes. Both clusters 6 and 12 consisted of intronic as well as intergenic miRNAs (Additional file 11), and are thus unlikely to be transcribed as polycistronic primary transcripts. Clusters 10, 11, 13-15 and 24 were located in intergenic regions, whereas five clusters (19 to 23) were derived from predicted transposable elements. We identified 45 groups of paralogs, 38 consisting of 110 silkworm-specific miRNA genes and 7 comprising 21 conserved miRNAs (Additional file 12). Paralogous miRNAs existed in 14 clusters, specifically, 2, 4, 7, 8, 13-15,, 18-24 (Additional file 11). These clusters possibly originated via a series of duplication and deletion events during silkworm evolution. The clustered paralogous miRNAs may have overlapping functions in regulating a similar set of genes, as reported for the three clusters, miR-106b~25, miR-106a~363, and miR-17~92 in mice .
A number of evolutionarily conserved miRNAs (let-7a, mir-1, and bantam) were among the most abundant miRNAs, as demonstrated previously [8, 34–37], whereas the majority of novel miRNAs (particularly the silkworm-specific components) were among the least abundant (Additional file 13, Figures 1C,1D), consistent with the correlation between evolutionary conservation of miRNAs and their expression levels [35, 38, 39]. However, some conserved miRNAs displayed very low read counts; these included bmo-mir-282, bmo-mir-307, and bmo-mir-252. Some silkworm-specific miRNAs (bmo-mir-2755, bmo-mir-2756, bmo-mir-2758, and bmo-mir-2766) exhibited very high sequence reads, comparable to those of the abundant conserved miRNAs (Additional file 13). Nevertheless, momentary and local read counts cannot represent transcriptional levels during the entire life-cycle of the silkworm. For example, we could not identify bmo-mir-124 in the present study, but this miRNA was confirmed to be strongly expressed in the embryo and early larval stages of the silkworm . Only 26 silkworm-specific miRNAs were common to the three libraries. In addition, 7 were shared by the whole body and the anterior-middle silk gland, 3 by the whole body and the posterior silk gland, 24 by the anterior-middle and posterior silk gland, 19 were in the whole body only, 94 were in the anterior-middle silk gland only, and 21 were identified only in the posterior silk gland (Additional file 13, Figure 1A). These data support the theory of several levels of complexity in miRNA processing and regulation in response to specific physiological circumstances.
The present results provide experimental evidence supporting the authenticity of 257 unique miRNA genes in the silkworm, including 202 novel and 55 known ones (Additional file 3). Analysis of the evolutionary conservation of all silkworm miRNAs revealed that only 23 were widely conserved from invertebrates to vertebrates, 13 were limited to invertebrates, and 32 had homologs in other insects (Additional file 7), whereas the majority of miRNAs were specific to the silkworm. Nearly 430 of the 447 newly identified chicken miRNAs were also likely to be exclusively expressed in chicken lineages . These data indicate that most of the newly identified miRNAs are present in only a small group of organisms, and in some cases, in a single species [8, 35–37]. Species-specific miRNAs may be large in number and evolutionarily dynamic as a result of gene duplication, sequence divergence and gene loss. A gray pawn hypothesis has been proposed for the species-specific chemoreceptor gene families in Caenorhabditis species ; this hypothesis suggests that individual genes are of little significance, but the aggregate activities of a large number of diverse genomic loci are required to establish a large phenotype space. The evolutionarily divergent miRNAs may also contribute to establishing and maintaining phenotypic diversity between different groups of organisms [41, 42], and highly specific miRNAs have a specialized function in particular organisms, possibly involving regulation of lineage-specific pathways . Furthermore, over 60% of the matched sequence tag fraction was attributed to unannotated small RNAs. The substantial proportion of unclassified small RNAs identified in this study may represent other classes of small regulatory RNAs in the silkworm that have not been covered in our analyses. Some of these may include rare miRNAs represented by very low sequence reads, thus not passing our filtering criteria. These candidates should be explored in future studies applying the deep sequencing approach to developmental stages, tissues, or cells.
Our sequence tag analysis led to the identification of miRNA and miRNA* sequences for 42 previously annotated miRNA genes and unilateral sequence tags for 11 known miRNA genes (Additional file 9). The total reads of 9 miRstar sequences (miR-10*, miR-276*, miR-281*, miR-282*, miR-2a-1*, miR-965*, miR-993a*, miR-993b*, and miR-9b*) were heavily skewed toward the RNA hairpin arm containing annotated miRNAs in at least one library (Additional file 9, names in green). Furthermore, we observed that bmo-miR-iab-4-5p levels exceeded those of bmo-miR-iab-4-3p in all three libraries (Additional file 9), thus revealing strand bias, even for the twin miRNAs. The reversal in the ratios of 5'- and 3'-derived sequence tags indicates preferential use of mature miRNAs originating from different arms of pre-miRNA precursors, and suggests additional levels of complexity in miRNA processing, which remain incompletely understood . These findings are evidently inconsistent with the current knowledge of miRNA biogenesis and strand selection. The miRNA* strand is probably degraded rapidly on exclusion from the RNA-induced silencing complex (RISC), as the recovery rate of miRNAs* from endogenous tissues is 100-fold lower than that of miRNAs [43, 44]. Consequently, in many cases, miRNA* cannot be detected using conventional methods, because of rapid turnover . However, in the silkworm, a number of miRNA genes (mir-2a-1, mir-2b, mir-10, and mir-282) exhibited a similar number of sequence reads originating from the 5' and 3' arms of the miRNA hairpin precursors (5p and 3p, respectively) (Additional file 9 *). The equivalent expression rates of miRNA and miRNA* largely result from similar 5' end stability that leads to equal incorporation of either strand into the RISC and protection from degradation . In some cases, sequence tags originate from the terminal loop region of the pre-miRNA precursor (Additional file 10). Such examples have also been reported in other studies , and are explained as genuine products of pre-miRNA processing or random degradation products of unprocessed pre-miRNA .
We have sequenced miRNAs in the whole body and silk gland of the silkworm. Our data confirm the authenticity of 257 miRNA genes in the silkworm, including 202 novel and 55 known miRNAs. Conservative analyses imply that miRNAs act as phyla markers in evolutionary signaling. The silkworm-specific miRNAs were significantly different between the anterior-middle and posterior silk glands. Target predictions revealed that the sense/antisense miRNAs target the 3' and 5' UTRs of their host genes. Identification of novel miRNAs resulted in significant enrichment of the repertoire of insect miRNAs and provided insights into miRNA evolution, biogenesis, and expression in insects.
Domesticated silkworm (Bombyx mori) strain DaZao, was reared at 25°C on a natural diet of clean mulberry leaves. Silk glands were manually dissected from day-3 fifth larvae in 0.9% NaCl, rinsed with DEPC-treated water, and promptly immersed in liquid nitrogen.
Silk glands and whole bodies of day-3 fifth instar larvae were collected for RNA isolation. Following purification, total RNA samples were immediately preserved in ethanol and stored at -80°C until further use. For deep sequencing, the small RNA samples were prepared as follows: total RNA of each sample was size-fractionated on a 15% PAGE gel, and a 16-30 nt fraction was collected. The 5' RNA adapter (5'-GUUCAGAGUUCUACAGUCCGACGAUC-3') was ligated to the RNA pool with T4 RNA ligase. Ligated RNA was size-fractionated on a 15% agarose gel, and a 40-60 nt fraction excised. The 3'RNA adapter (5'-pUCGUAUGCCGUCUUCUGCUUGidT-3'; p, phosphate; idT, inverted deoxythymidine) was subsequently ligated to precipitated RNA using T4 RNA ligase. Ligated RNA was size-fractionated on a 10% agarose gel, and the 70-90 nt fraction (small RNA + adaptors) excised. Small RNAs ligated with adaptors were subjected to RT-PCR (Superscript II reverse transcriptase, 15 cycles of amplification) to produce sequencing libraries. PCR products were purified and small RNA libraries were sequenced using Solexa, a massively parallel sequencing technology.
Small 35 nt RNA reads were produced using an Illumina 1G Genome Analyzer at BGI-Shenzhen. Low quality reads were trimmed with our own perl script. Adaptor sequences were accurately clipped with the aid of a dynamic programming algorithm. After elimination of redundancy, sequences ≥ 18 nt were mapped to the silkworm Build2 genome using SOAP v1.11 . Sequences that perfectly matched the genome along their entire length were considered for subsequent analyses. Genome sequences and annotations of the silkworm Build2 genome were downloaded from SilkDB . Sequences matching silkworm rRNA, tRNA, snRNA and snoRNA deposited at the NCBI GenBank database or overlapping with rRNA and tRNA annotations of the Build2 genome were discarded. Repeat overlapping sequences were annotated as repeat-associated small RNAs. The majority of sequences overlapping with predicted exons were excluded from further analysis in view of the possibility that they were derived from messenger RNAs, and only those with precursors presenting p-values below 0.01 were collected.
After loading small RNAs and mapping information, small RNAs were sorted according to their position on the reference genome. Each read start position was examined to establish its similarity to a Drosha/Dicer processing site. For each candidate site, the mature sequence was extended to obtain two possible pre-miRNAs. One sequence encompassed 10 nt upstream and 70 nt downstream, and the other included 70 upstream and 10 downstream of the respective miRNAs. The two possible pre-miRNAs were evaluated using Mfold to determine ability to form characteristic hairpin structures . A well-designed computational filter was employed to identify miRNA-like hairpins . RNAs displaying folding energy values ≤ -25 kcal/mol were subjected to further analysis. Both miRNA and miRNA* had to reside in different arms of a hairpin structure, each with no more than 6 unpaired bases. We ensured that the maximum bulge over the miRNA/miRNA* duplex was not more than 4 bases, and asymmetry of the miRNA/miRNA* duplex equal to or less than 3. In addition to the above requirements, sequencing of both miRNA and miRNA* required that the miRNA/miRNA* duplex had 3' overhangs at both ends, a typical feature of Drosha and Dicer processing. Candidates corresponding to known miRNAs deposited at the miRBase 14.0  and supported by two reads of mature sequences were considered real miRNA genes.
The criteria used to identify novel miRNAs from sequencing data of the three small RNA libraries were as follows: (1) genomic loci annotated as known silkworm miRNAs or other classes of noncoding RNA were excluded; (2) an individual locus had to be supported by a minimum of five independent sequence reads originating from at least one small RNA library to be considered for further analysis; (3) loci lacking hairpin-like RNA secondary structures, including positions of the small RNA tags, were eliminated. The resulting set of sequences and their respective RNA structures were further analyzed to distinguish genuine miRNA precursors from other RNAs containing similar RNA structures. All involved target sites were predicted using RNAhybrid2.2  and filtered with MirTif . The SVM score value was higher than 1.0.
anterior-middle silk gland
posterior silk gland
RNA-induced silencing complex
support vector machine
We thank Sam Griffiths-Jones for helpful discussion and naming all newly identified miRNAs. This work was supported by grants from the National Basic Research Program of China (2005CB121000), the Program for Changjiang Scholars and Innovative Research Team in University (IRT0750), and the Doctor Foundation (SWUB2008063) and the Doctoral Innovation Fund (b2007002) of Southwest University.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.