Burst expansion, distribution and diversification of MITEs in the silkworm genome
© Han et al. 2010
Received: 22 March 2010
Accepted: 27 September 2010
Published: 27 September 2010
Skip to main content
© Han et al. 2010
Received: 22 March 2010
Accepted: 27 September 2010
Published: 27 September 2010
Miniature inverted-repeat transposable elements (MITEs) are widespread in plants and animals. Although silkworm (Bombyx mori) has a large amount of and a variety of transposable elements, the genome-wide information of the silkworm MITEs is unknown.
We used structure-based and homology approaches to search for MITEs in the silkworm genome. We identified 17 MITE families with a total of 5785 members, accounting for ~0.4% of the genome. 7 of 17 MITE families are completely novel based on the nucleotide composition of target site duplication (TSD) and/or terminal inverted repeats (TIR). Silkworm MITEs were widely and nonrandom distributed in the genome. One family named BmMITE-2 might experience a recent burst expansion. Network and diversity analyses for each family revealed different diversification patterns of the silkworm MITEs, reflecting the signatures of genome-shocks that silkworm experienced. Most silkworm MITEs preferentially inserted into or near genes and BmMITE-11 that encodes a germline-restricted small RNA might silence its the closest genes in silkworm ovary through a small RNA pathway.
Silkworm harbors 17 MITE families. The silkworm MITEs preferred to reside in or near genes and one MITE might be involved in gene silence. Our results emphasize the exceptional role of MITEs in transcriptional regulation of genes and have general implications to understand interaction between MITEs and their host genome.
Since transposable elements (TEs) were first discovered in maize by McClintock , TEs have been a hotspot subject in genetics. As the advent of genomics era, it is now known that TEs constitute a significant component of eukaryotic genomes and that there are two major classes of TEs [2, 3]. Class I elements (retrotransposon) transpose by the mechanism of 'copy-and-paste' through a RNA intermediate, while class II elements (DNA transposons) by the mechanism of 'cut-and-paste' through a DNA intermediate.
In addition to transposition through a DNA intermediate rather than a RNA intermediate, DNA transposable element is distinguished by terminal inverted repeats (TIRs). With the action of transposase, DNA transposons usually excise from one site and reinsert elsewhere in the genome. Furthermore, DNA transposons can be also further classified into autonomous and non-autonomous. Of non-autonomous DNA transposons, a group of miniature inverted repeat transposable elements (MITEs) first discovered in maize were found to be widespread in various higher organisms [4–9]. MITEs have some common characteristics: short, terminal inverted repeats (TIR), target site duplication (TSD), high AT content, potential to form stable secondary structure, high number of copies in a genome [3, 9–11]. Importantly, recent genome-wide analyses revealed that MITEs insert preferentially into or near genes [5, 12, 13] and that several families of miRNAs in humans, Arabidopsis, rice and Solanaceae were derived from MITEs [14–16]. These results suggest that MITEs play important roles not only in genome evolution but also in transcriptional regulation of genes.
MITEs were originally discovered in plants and have two major superfamilies. They are Tourist -like and Stowaway -like on the basis of their similarity to two elements previously identified in maize and sorghum [3, 4, 10]. It is now clear that these two MITEs originated by internal deletion of the corresponding autonomous elements  and could transpose by other autonomous DNA elements [17, 18]. Virtually, all MITEs have been identified through computer-assisted database searches . As more and more genome sequences become available, therefore, several programs have been developed for MITE identification [8, 19, 20]. Strikingly, a recently developed program, known as MUST, has been verified to be very powerful and efficient for identification of MITEs . This facilitates discovery of more novel MITEs.
Silkworm, Bombyx mori, is a model insect for the order Lepidoptera and has importantly economic value for silk production and bioreactor. The draft genome sequences of silkworm were released by Mita et al.  and Xia et al. , respectively. Recently, a new assembly has been completed . Analyses of the silkworm genome sequence suggested that ~40% of the genome is composed of the known TEs . This number is only smaller than the TE proportion (47%) in Aedes aegypti genome that has the largest proportion of TEs in the insect genomes sequenced to date . DNA transposons (Tc1-mariner, Helitron, Harbinger, hAT, P and Piggybac) are only ~3% of the genome while most of the silkworm TEs are retrotransposons . Although two MITEs, Hoshidandy and Organdy, were previously discovered in silkworm  (also see GenBank accession no. AB455941), the genome-wide information about the silkworm MITEs remains unclear.
In this study, we scanned the new assembly of the silkworm genome sequence to identify MITEs by using a recently developed program, MUST and a strict filtering approach . The results indicated that the silkworm genome harbors 17 MITE families. Estimates of insertion date and diversity for each MITE family showed that the silkworm MITE families might have experienced burst expansions at different time points of evolution and exhibited various patterns of diversification. That MITEs preferentially insert into or near genes has been also confirmed in silkworm. In addition, we found evidence that a sRNA derived from a MITE might silence its host and neighbor genes.
Families and characteristics of silkworm MITEs
AT content (%)
Three families flanked respectively by TA and by TWA have no any similarities in TIR and intra-sequence; they may have independent origins but may be transposed by the same transposase. Four families flanked by NNNNNNNN have no any similarities in intra-sequence but have high similarities in TIR regions; they may derive from a common ancestor and may be transposed by the same transposase . Seven families with different TSD, TIR and intra-sequences may have different origins and may be transposed by different transposases.
Next, a homology search was used to estimate number of copies for each MITE family in the silkworm genome. That is, a BLASTN (E < e-5) search  was used against the silkworm genome sequence. With this approach, we identified 5785 MITEs in total, which constitute ~1.86 Mb (0.4%) of the silkworm genome. Then, each MITE was classified as intact (full length) or fragmentary: individual with both complete TIRs was regarded as intact and otherwise as fragmentary. As a result, 3337 MITEs are intact whereas 2448 MITEs are fragmentary. As mentioned above, 5785 MITEs belong to 17 families. However, family size varies greatly among MITEs, which ranges from 9 to 2790 (Table 1). The information about insertion sites of the MITEs into scaffold was shown in Additional file 2. The largest family is BmMITE-2 with 2790 copies. The copies of both BmMITE-2 and BmMITE-8 occupy approximately 74% of all MITEs. The ratio of full length to all copies of each family ranges from 10% to 100%. Four MITE families (BmMITE-1, BmMITE-4, BmMITE-6 and BmMITE-8) have the ratios less than 50% and all others more than 50%. Especially, BmMITT-2, BmMITE-11, BmMITE-12, and BmMITE-15 have 85%, 94%, 86% and 100% of high ratios, respectively, suggesting that they may be very young and active recently. It should be noted that BmMITE-2 has the largest number (2371) of full length MITEs whereas BmMITE-8 has the largest number (1364) of fragmentary MITEs among the families, suggesting that the former might experience a recent burst expansion whereas the latter might undergo an old burst expansion.
The average AT content for each MITE family ranges from 44% to 73%. Only BmMITE-4 has less than 50% AT content and all others more than 50%, suggesting that the silkworm MITEs are AT rich. The average AT content of the silkworm genome is approximately 62% . Eight families BmMITE-1, BmMITE-2, BmMITE-8, BmMITE-9, BmMITE-10, BmMITE-11, BmMITE-12 and BmMITE-15 have AT content more than the average of the silkworm genome.
The identified silkworm MITE families were annotated based on BLAST search. Using a representative member of each MITE family as query, we searched ISfinder, RepBase and NCBI nr databases, respectively. These three databases almost include all known MITEs and transposable elements. We found that BmMITE-2 perfectly matched to one silkworm MITE named Hoshidandy (No. AB455941) in NCBI nr database and BmMITE-7 perfectly matched to Organdy identified previously in silkworm  (Table 1). Another 15 MITE families have no match to any known MITEs or transposable elements. This suggested that the program MUST used in this study is feasible and efficient for identification of MITEs because 2 known families of the silkworm MITEs were all recovered and 15 novel families were discovered by MUST .
Four predicted insertion sites of BmMITE-2 were selected for verification by PCR in 14 silkworm accessions representing four main geographic strains (Chinese, Japanese, European, and Tropical). Primers were designed based on flanking regions of each insertion site (additional file 4). The PCR results for the four insertion sites were summarized in additional file 5. Furthermore, sequencing of the corresponding PCR products confirmed the presence or absence of BmMITE-2 (additional file 6). The polymorphic distribution of indel for all four tested insertion sites suggested that they are not fixed residents of the silkworm genome and validated the prediction by the program MUST .
Pattern of regional variation for each family provides details of divergence between copies. We performed diversity (pi) sliding window analysis for the alignment of full length sequence of each MITE family. Window size is 20 nucleotides and step size is 20 nucleotides . The regions with diversity equal or higher than average sequence diversity (pi) +2SD was defined as variable whereas those with diversity less than average sequence diversity (pi)-2SD were considered as conserved regions (additional file 7). It was found that regional variation pattern varies greatly among these 17 silkworm MITE families. Almost all families have highly conserved TIRs, however, 6 families (BmMITE-4, BmMITE-5, BmMITE-9, BmMITE-12, BmMITE-14, and BmMITE-15) have only one conserved TIR. Because TIRs must be first recognized by corresponding transposase for transposition, it is not surprising to see more conserved in TIR than intra-sequence.
We recorded information about position of each MITE on a chromosome to look at distribution of MITEs in the genome. The results suggested that all MITE families are widely distributed on all 28 silkworm chromosomes. Then, we examined whether MITEs were randomly distributed among 28 silkworm chromosomes using χ2 test. The null hypothesis was rejected (P < 0.01), suggesting that the distribution of MITEs is nonrandom in the silkworm genome (additional file 8).
Characteristics of insertion sites of the silkworm MITEs.
No. analyzed copies
5'-flank < 5 kb 1
3'-flank < 5 kb 1
To investigate whether silkworm MITEs preferentially insert in a special gene family, we annotated all the closest genes using WEGO, an online gene ontology website, and compared the closest genes with all silkworm genes (additional file 10). The results indicated that functionalities of the closest genes were randomly selected.
To determine whether BmMITE-11 can regulate expressions of the closest genes through the small RNA, we examined expression profiles of the closest genes using the available microarray data . First, we found the six closest genes of BmMITE-11: BGIBMGA009213, BGIBMGA001511, BGIBMGA013723, BGIBMGA013305, and BGIBMGA014223 in SilkDB http://silkworm.swu.edu.cn/silkdb/. With information of these six genes, corresponding expression profiles were extracted from the microarray data. Because BGIBMGA002049 has no information in microarray data, so it was not included in the study. Figure 5B showed that the five closest genes have different expression patterns. However, they have a common feature that all the five genes were not expressed in the ovary and testis. There are 10393 active genes in the silkworm microarray data. 8431 of 10393 genes were expressed in ovary or testis . Thus, the probability of 5 random selected genes that were not expressed in ovary or testis is 0.000241 (P = (1- 8431/10393)5 = 0.000241). Chance cannot explain the observed expression pattern. Therefore, BmMITE-11 might self-silence germline-specifically by small RNA. As a result, BmMITE-11 might silence these five closest genes in silkworm ovary through small RNA pathway.
In this study, we carried out a systematic and genome-wide analysis to search for MITEs in silkworm using a novel computational approach, MUST . First, 143333 MITEs were predicted and were grouped into 1350 families. Having further examined these predicted MITEs, however, we found that a large number of MITEs were pseudo-MITEs. After removing pseudo-MITEs by a strict filtering process, we identified only 17 silkworm MITE families. Of 17 families, Hoshidandy and Organdy have been reported previously . However, 7 of 17 families are novel according to the classification of nucleotide composition of TSD. This suggests that the procedure for MITE identification used in this study is reliable.
Previous studies identified various MITEs in other insects. For example, eight MITE families were found in mosquito . Our results indicated that silkworm harbors 17 MITE families and 7 of which are completely novel. Furthermore, the silkworm MITEs show huge diversification in both TSD composition and full length sequence (Table 1). However, all families have MITE-specific features: high number of copies, high AT content and potential to form a stable secondary structure. This also suggested validity of MUST in predicting the silkworm MITE. In addition, the verification of predicted MITEs by PCR and sequencing of the PCR products showed existence of MITEs.
It should be pointed out that a large number of candidate MITEs were predicted by MUST at first. However, it was found that most of these candidates were false positive after filtering out by a strict process. Probably, a high rate of false positive in MITE identification by MUST is due to incompleteness of the silkworm genome sequence, that is, there are many gaps to be sequenced in the genome. Again, that silkworm has a large proportion of repetitive sequences may affect correct identification of MITEs. In addition, a high sensitivity of MUST in identifying MITEs may be a reason. It can be known from considerations above that 17 MITE families contained in silkworm should be a conserved number. More silkworm MITEs are to be identified in future.
The ages of 17 MITE families vary greatly (Figure 3), ranging from 0 to 4 mya ago. Strikingly, BmMITE-2 might be dramatically amplified during a period from 0 mya to 1 mya (Figure 3B). A similar result has been reported that mPing underwent a dramatic amplification during rice domestication . It is interesting to determine whether silkworm MITEs experienced recent expansions is due to bottleneck of silkworm domestication in future.
It should also be noted that the method used to estimate time of insertion of MITE copies in this study has an implicit assumption that all full sequences evolve neutrally and rate-equally . However, this may not be true: Not all sequences are under neutral evolution; there may be differential rate of substitution among nucleotides as some regions may be important for transposition and there may be different rate of change in different genomic regions. All these will affect the accuracy of estimate time of MITE insertion. To solve these problems, a sophisticated estimation method needs to be established in future. Nevertheless, the results of MITE insertion time presented in this study should provide some useful information about the evolutionary dynamics of the silkworm MITEs.
Our diversity analyses suggested that members of each MITE family have a high level of similarity and similar sequence lengths. This shows that they can be considered as a population that experienced several successive steps of amplifications from a handful of master copies . Right after amplification, copies are almost identical but will diverge in sequence and length over time due to random mutation. Therefore, a large number of identical or very similar copies within a MITE family imply a recent burst. Our network analyses revealed that 9 MITE families (BmMITE-1-3, BmMITE-5, BmMITE-8-10, BmMITE-12, and BmMITE-13) might experience several successive recent expansions (Figure 4). If these recent expansions were associated with the demographic events that silkworm experienced such as domestication bottleneck, then network topologies of above 9 MITE families can be explained by the genome-shock theory proposed by McClintock . Thus, it is interesting to investigate dynamics of these silkworm MITE families during domestication in future.
Our observations suggested that the silkworm MITEs are widely distributed in the genome. However, they are not randomly distributed on chromosomes (additional file 8). Since the silkworm MITEs preferentially insert into gene regions like MITEs in other higher organisms, a nonrandom distribution of the silkworm MITEs may be due to different gene densities on chromosomes. However, we did not find the significant correlation between the densities of genes and MITEs among the silkworm chromosomes (R 2 = 0.055, P > 0.05, df = 26). Thus, the causes of nonrandom distribution of the silkworm MITEs on chromosomes are to be explored.
We observed that the silkworm MITEs in introns, 3'flank and 5'flank are in total much more than in exons (Table 2). It should be noted that the silkworm MITEs inserted into introns and 3' flanking regions significantly more than 5' flanking regions (χ2 test: P < 0.01, respectively). These observations could be explained by two reasons: MITE insertions in exons were rapidly purged out from a population because they are deleterious ; because many MITEs have been found to contain poly (A) signal , these MITEs are likely to be maintained in 3' flanking regions to act in regulation. Since introns have been suggested to harbor regulatory elements , much more silkworm MITE insertions into introns imply that the MITEs may play important roles in gene expression by changing regulatory motifs. In addition, that MITEs preferentially insert into gene regions provides a material basis for establishment of a TE-derived genetic regulatory network  and this is also an interesting topic to be studied in future.
A recent study tried to relate MITEs to biogenesis of their siRNAs in Solanaceae . However, evidence for functional implications of MITEs in gene regulation through small RNA pathways is still lacking. Most importantly, we found a BmMITE-11 from which a silkworm ovarian small RNA (RNA-36850: No. AB423040) derived (Figure 5A). The expression data further suggested that the closest genes of BmMITE-11 were germline-specifically silenced (Figure 5B). Because this small RNA (RNA-36850) belongs to a new class of germline-restricted small RNAs (26-33 bp) called piRNAs [30, 31], our result may be the first evidence for effects of a MITE on its neighbor genes in transcriptional regulation through the sRNA pathway. The validation of this mechanism is underway. Given the high copy number of MITEs, many small RNAs and miRNAs derived from MITEs [14–16] and their preferential insertion into gene regions, it will be important to systematically account for the different mechanisms of MITEs and their potential functional roles in transcriptional regulation of genes.
Although MITEs in various higher organisms including mosquito and Drosophila as well as beetle have been extensively investigated, little is known about the genome-wide information of MITEs in the silkworm genome. We identified the 17 silkworm MITE families by using a recently developed algorism to scan the genome sequence. Silkworm has 17 MITE families, and furthermore, 7 of 17 families are completely novel based on the nucleotide composition of TSD. These results added new knowledge for understanding the evolution of MITEs. Importantly, we not only corroborated the preference of MITEs inserted into or near genes seen in the other genomes but also found that BmMITE-11 might silence its closest genes through sRNA pathway. These results emphasize the exceptional role of MITEs in the transcriptional regulation of genes and have general implications to understand the interaction between MITEs and their host genome.
The new assembly of the silkworm genome sequence was downloaded from SilkDB http://silkworm.swu.edu.cn/silkdb. First, a structure-based approach implemented in the MUST program was used to search the silkworm genome sequence for candidate MITEs that have characteristics : TIR with 8-50 bp of length; TSD with 2-30 bp of length; 100-600 bp of the sequence length. The parameters were selected based on the common features of known MITEs. Candidate MITEs were grouped by all-blast-all. The MITEs for which any pair has an identity >0.8 were defined as a family. Families that have <3 members were excluded to reduce false positive rate for identification of MITEs.
Next, a homology BLAST search was used to scan the silkworm genome sequence with a representative member for estimating copy number of each MITE family. At this step, a MITE family was defined by a sequence similarity with E < e-5 for BLASTN. This standard has been recently used by Kuang et al. . The BLAST results were filtered out using a Perl script according to criteria: a minimum nucleotide identity rate >90% and query coverage >80%. Those including both complete TIRs were defined as intact MITEs and others as fragmentary MITEs.
UNAFOLD http://www.bioinfo.rpi.edu/applications/hybrid/twostate-fold.php was used to predict secondary structures of MITEs . Then, all MITE families were used against the ISfinder database http://www.is.biotoul.f/is.html, RepBase database (version 14.11)  and NCBI nr database respectively to find known families.
Fourteen silkworm accessions (Wu-B, Wu-D, DongDe-201, Wu-E, Lu-10, DaXianTuZhong, FuRongHuiLuan, PeiXianZhong, Ri-9, HeiZi, ChunSi, QiongShanHaiNan, Ri-110, ShangSanHuBan) that represent four main geographic strains (Chinese, Japanese, European, and Tropical) were used in insertion validation of a predicted MITE at four sites. A MITE used in insertion validation was randomly selected and the primers were designed based on flanking regions of each of insertion sites (additional file 4).
To estimate the age of MITEs, DNA sequences of each MITE family were aligned using Clustal W  and the family consensus sequences were constructed using the program DAMBE , then the level of nucleotide substitutions (k) between each MITE element and the family consensus sequence was estimated using Kimura 2-parameter distance . The each MITE age was estimated using the formula T = k/2r, assumed r = 1.56 × 10-8, it's the fruitfly neutral rate of substitutions per year and has been used in silkworm .
All full length sequences of each MITE family were aligned using Clustal W . Then, using the alignment of all full length sequences of each family and the program Network4.5 http://www.fluxus-engineering.com/sharenet.htm, we constructed median-joining (MJ) networks to estimate BmMITE intrafamily diversity. This method has been used in maize . Additionally, a sliding window analysis with a window size 20 bp and a step size 20 bp was used to look at pattern of regional nucleotide diversity (pi) for each family and completed by DnaSP (version5.10) .
All identified members of each MITE family were mapped to chromosomes by SilkMap http://silkworm.swu.edu.cn/silksoft/silkmap.html and copy numbers of each family on chromosomes were counted.
Two files for positions of predicted genes in scaffolds and for lengths of scaffolds were downloaded from SilkDB http://silkworm.swu.edu.cn/silkdb. Then, Perl script was written to scan the files for extracting the information of MITEs close to or in predicted genes. To determine whether insertions of MITEs close to gene regions are due to chance, a computer simulation, which is similar to that used by Naito et al. , was performed. Briefly, the fragments of up to 10 kb were randomly sampled from the silkworm genome sequence, the middle of each 10 kb sequence was presumed as the insertion site and the information about the insertion site close to or in predicted genes (i.e., in intron and exon) was recorded accordingly. The genes that have less than 5 kb distances to MITEs or contain MITEs were defined as the closest genes.
The closest genes were annotated by WEGO http://silkworm.swu.edu.cn/cgi-bin/wego/index.pl, and gene expression profiles were examined by using the available microarray data as described in Xia et al. .
We thank Dr. Tian Li for his help in drafting Perl scripts used in the study. We have also benefited from the discussion with Dr. Jun Duan when the study was launched. We also thank three reviewers for their constructive comments that improved the manuscript. This work was supported by the Hi-Tech Research and Development (863) Program of China (2006AA10A117), a grant from National Science Foundation of China (No. 30970409), and the Programme of Introducing Talents of Discipline to Universities (B07045).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.