Skip to main content
  • Research article
  • Open access
  • Published:

Genome-wide identification of MITE-derived microRNAs and their targets in bread wheat



Plant miRNAs are a class of small non-coding RNAs that can repress gene expression at the post-transcriptional level by targeting RNA degradation or promoting translational repression. There is increasing evidence that some miRNAs can derive from a group of non-autonomous class II transposable elements called Miniature Inverted-repeat Transposable Elements (MITEs).


We used public small RNA and degradome libraries from Triticum aestivum to screen for microRNAs production and predict their cleavage target sites. In parallel, we also created a comprehensive wheat MITE database by identifying novel elements and compiling known ones. When comparing both data sets, we found high homology between MITEs and 14% of all the miRNAs production sites detected. Furthermore, we show that MITE-derived miRNAs have preference for targeting degradation sites with MITE insertions in the 3’ UTR regions of the transcripts.


Our results revealed that MITE-derived miRNAs can underlay the origin of some miRNAs and potentially shape a regulatory gene network. Since MITEs are found in millions of insertions in the wheat genome and are closely linked to genic regions, this kind of regulatory network could have a significant impact on the post-transcriptional control of gene expression.


In plants, the miRNA biogenesis pathway is an intricate, well-studied mechanism, which requires Dicer or Dicer-like enzymes, Argonaute proteins as part of the RISC (RNA Induced Silencing Complex) complex, and nucleotide complementarity to exert their function [13]. The miRNA precursors are single-stranded transcripts that can fold into characteristic hairpin structures, which are transcribed from pol II loci. Usually, each miRNA gene in a specific location gives rise to only one functional RNA duplex formed by miRNA and miRNA*. Mature miRNAs (20-24 nt) can downregulate gene expression of target transcripts by either of two post-transcriptional mechanisms: mRNA cleavage or translational repression [3, 4].

Knowledge of the origin and diversification of miRNAs contributes to a better understanding of the complexity of the regulatory networks where they participate [5]. Several hypotheses have been proposed to explain the origin of miRNA genes. The more accepted ones are miRNA genes originated by inverted duplication of target genes [6] or by fortuitous foldbacks in the genomes [7]. Furthermore, there is increasing evidence that some miRNAs can derive from certain transposable elements (TEs) [2, 8].

TEs are ubiquitous components of genomes and one of the forces driving genome evolution [9]. The hexaploid (bread) wheat (Triticum aestivum L. 2n = 6x = 42, genome AABBDD) is one of the most important crop species. Remarkably, 85% of its 17Gb genome is derived from repetitive elements that underwent massive amplification [10]. Therefore, wheat is a particularly useful system to study the contribution of TEs in regulatory networks involving small RNAs (sRNAs) and/or miRNAs [11]. Miniature Inverted-repeat Transposable Elements (MITEs) are a small type of TEs (about 500 bp) and have an inverted repeat structure. In rice, 80% of the TE-derived miRNAs came from MITEs, while 10% are derived from retrotransposons and 9% from other DNA transposons [2].

According to a previous study, MITEs account for approximately 0.16% of the wheat genome, are distributed along the 21 chromosomes and map within gene-rich regions. Approximately 8% of the wheat genes contain MITEs within their coding or regulatory sequences [12].

In this work, using public wheat sRNAs libraries, we identified several miRNA loci showing high homology with MITEs. Through bioinformatics analyses and the use of public degradome libraries we established that several of them have target transcripts. Furthermore, many of the target genes have a MITE inserted, exclusively in the 3’ UTR region of the transcripts. These results allow us to propose that MITE-derived miRNA integrate a genetic regulatory network in the wheat genome. Since MITEs are found in millions across the wheat genome and are closely linked to genic regions, this kind of regulatory network could have a great impact on the post-transcriptional control of gene expression of this important crop.


Identification and annotation of MITEs in the wheat genome

To identify and annotate MITEs in the wheat genome, we started by creating a comprehensive MITE wheat database using 569 sequences from the TREP database [13] and 6,013 sequences obtained from the wheat genome using the software MITE Tracker [12]. After clustering all sequences using VSEARCH [14], a FASTA file containing 2,002 unique MITEs was obtained, where 223 elements (11%) showed high homology with elements already described in the TREP database and 1.779 (89%) displayed high homology to new elements discovered by MITE Tracker (Additional file 1). Next, a BLAST search was performed against the wheat genome using the MITEs FASTA file as input. After filtering the results using the parameters described in Zavallo et al. for transposable element annotation [15], 1,211,340 complete MITEs sequences were finally identified and annotated. These results are available as a GFF file in Additional file 2.

Genome wide identification of wheat miRNAs derived from sRNAs libraries

To detect valid wheat miRNAs loci, ShortStack [16] software was run combining 11 wheat sRNAs libraries from [17] and four wheat sRNAs libraries described in [18].

First, ShortStack identified sRNA clusters using a de-novo method; next, a loci file was fed to the program containing the MITE annotation file obtained in the previous step. A total of 280,891 non-duplicated sRNA clusters with at least 10 reads of coverage were identified using all 15 sRNA libraries. Of those, only 270 (0.1%) sRNA clusters met all ShortStack requirements to be classified as valid miRNAs (Table 1). Most of the valid miRNAs (65%) were 21-nt in length, followed by 22 and 24-nt (13% each), and in a lower proportion, 20 (7%) and 23-nt (3%) in length, according to the ShortStack results (Additional file 4). Additional file 3 shows the predicted secondary structure plots of all found pre-miRNAs. Regarding the position in the genome, valid miRNAs were detected in all wheat chromosomes: 195 (72%) were found in intergenic regions of the genome, 12 (5%) in promoter regions, and the remaining 63 (23%) within genes.

Table 1 Aggregated Shortstack outputs for all sRNA libraries and the counts of sRNA clusters for each step

To explore how well we covered the already described miRNAs candidates, the 270 mature miRNA sequences were compared using BLAST against three well-known comprehensive wheat miRNA libraries: miRBase [19], Plant sRNA gene server [20] and PmiREN [21].

Our de-novo approach was able to cover 42 miRNAs (34.4%) of the 125 Triticum aestivum elements reported in miRBase, 114 (80%) of the 141 available in Plant sRNA gene server and 496 (46.6%) of the 1,063 elements described in PmiREN. These results indicate that, due to the high similarity between elements, some mature miRNA sequences found in our analysis match with more than one entry in the database we used for comparison. Additionally, based on the level of stringency we chose for comparing mature miRNAs, we found 87 novel miRNAs locus in our analysis.

MITE-derived miRNAs

Regions surrounding each miRNA loci reported by ShortStack were compared against all MITEs sequences. Interestingly, of the 270 putative miRNAs detected, 38 (14.1%) had high homology (80% of identity in more than 80% of the sequence) with a MITE sequence (see Table 2 and Additional file 4). The alignments between each pre-miRNA and MITE are shown in Additional file 5.

Table 2 MITE-derived miRNAs detected in wheat genome

As an example, Fig. 1A shows the high sequence homology in the alignment between the MITE-derived miRNA precursor sequence (given by ShortStack) located in the intron 2 of the gene TraesCS2B02G010300, the most similar MITE identified (DTT_Hvul_ Pan_M801L24-1), and the mature MITE-derived miRNA 7 (Table 2). The small RNA sequences that mapped against the region are also shown. Figure 1B shows the hairpin structure of the precursor sequence of MITE-derived miRNA 7. Finally, Fig. 1C shows the TraesCS2D02G449600.1 transcript where a portion of the 3’ UTR corresponds to the cleavage target site of MITE-derived miRNA 7. Figure S5 in Additional file 6 shows the similarities between the predicted secondary structures of four examples of highly conserved miRNAs and four examples of MITE-derived miRNAs, which is an important feature for predicting miRNAs loci. Regarding their size, 28 of 38 (74%) of the MITE-derived miRNAs produce predominantly 21-nt sRNAs (Additional file 6, Fig. S1) indicating that, in wheat, MITE-derived miRNAs gave rise to 21-nt sRNAs as well as canonical miRNAs. Since these 21-nt MITE-derived sRNAs also presented hairpin structures typical of canonical miRNAs (see Additional file 3), we consider it is unlikely they were DCL3-dependent. On the other hand, ten MITE-derived miRNAs produced predominantly 24-nt sRNAs (Additional file 6, Fig. S1) (Table 2). The MITE-derived miRNAs were detected in all wheat chromosomes except for 2D and 7D. The genomic annotation indicated that 19 of those MITE-derived miRNAs were intergenic, 15 mapped inside introns, two in promoter regions and one (MITE-derived miRNA 13) in the promoter and the 3’ UTR of two different genes (Table 2).

Fig. 1
figure 1

A Scheme of the production site of MITE-derived miRNA 7 (highlighted in red) located at the intron 2 of TraesCS2B02G010300 gene. The DNA alignment sho ws the high sequence homology between the MITE-derived miRNA precursor, the most similar MITE identified (DTT_Hvul_ Pan_M801L24-1) and the mature MITE-derived miRNA 7. Highlighted in green is the Target Site Duplication (TSD) where the MITE was inserted and in yellow the mismatches between the intron 2 and MITE sequence. Also, mature miRNA and miRNA* as well as all small RNA reads mapped to the region as reported by Shortstack are shown below B Hairpin structure found in the production site of MITE-derived miRNA 7 in the intron 2 of the TraesCS2B02G010300 gene. C Scheme of the TraesCS2D02G449600.1 transcript showing a MITE-derived miRNA 7 (highlighted in red) target site at its 3’ UTR.The DNA alignment shows the high sequence homology between the target region, the most similar MITE (DTT_Taes_Pan_42j2-6) and the complementary position of the MITE-derived miRNA 7. Highlighted in green the TSD, in yellow the mismatches between the 3’ UTR, and the MITE and light blue the mismatches between the mature MITE-derived miRNA 7 and the complementary site in the 3’ UTR of TraesCS2D02G449600.1. D Signature plot of the cleavage site in the target transcript

Identification of miRNAs and MITE-derived miRNA targets

To further explore the functional activity of the previously detected 270 miRNAs, target analysis was performed by sequentially using two programs that give complementary information. Cleaveland [22] is based on target degradation whereas psRNATarget [23] detects the homology between miRNA and their potential targets. Four previously described degradome libraries [17] and the 270 mature miRNAs sequences were used as input for Cleaveland, while the miRNA wheat cDNAs sequences were used in the psRNATarget pipeline. The results showed that 93 (34.4%) of the detected miRNAs had at least one confirmed target identified by both programs, producing in total 277 cleavage sites (from categories 0 to 3 and P-value <0.05 in Cleaveland plus positive cleavage in psRNATarget) for 227 different wheat transcripts. Additional file 7 shows all the miRNAs and their identified target transcripts. Additional file 8 shows all the cleavage site plots ordered by experiment, miRNA code name, and transcript name. Noticeably, when considering only the subgroup of the 38 MITE-derived miRNAs (Table 2), 16 (42.4%) of them had at least one target under the conditions mentioned above. These MITE-derived miRNAs targeted 36 cleavage sites in 34 different transcripts (Fig. 2; Additional file 7).

Fig. 2
figure 2

An overview of all wheat miRNAs and their targets showing how many of them are related to MITEs

It is worth noting that miRNAs that are not derived from MITEs target both in 5’ and 3’ UTR regions as well as in coding regions (CDS), although these target regions do not show homology with MITEs (Fig. 3). Conversely, the MITE-derived miRNAs mainly had target regions with high homology with MITEs, and these regions were all located in the 3’ UTR regions of the transcripts. An example of this is shown in Fig. 1D, where the target site in the 3’ UTR of the TraesCS1B02G479800.1 transcript and the sequence alignment between the target region, the most similar MITE (DTT_Taes_Pan_42j2-6), and the complementary position of the MITE-derived miRNA 7 promoting the cleavage of the 3’ UTR of the transcript are shown. Additional examples are listed in Table 3. However, seven MITE-derived miRNA target sites did not have high homology with MITEs at the cleavage site. In this case, four of them were located in 3’ UTRs, two in 5’ UTRs and one in a CDS (Table 3, Fig. 3).

Fig. 3
figure 3

Wheat miRNAs and their targets. The X-axis has the miRNA names. Y-axis represents the P-value of the target sites according to Cleaveland software. MiRNA names in orange (left most of X-axis) have high homology with MITEs. Similarly, targets in orange have high homology with MITEs. Targets represented by are located in coding sequences, represents 5 UTR and , 3 UTR

Table 3 MITE-derived miRNAs and its targets transcripts with MITE insertions detected in wheat genome

Overall, our results support the existence of two fairly independent miRNA-based regulatory networks. In the first and more “conventional” one, miRNAs and their targets show no homology with MITEs. The target sites of this network can be found equally in coding regions as well as in the 5’ and 3’ UTR of the transcripts. Instead, in the second putative regulatory network, the sequences of the miRNA-producing region as well as those of the target regions of the miRNAs showed high homology with MITEs. The target sites of this network were aiming to the 3’ UTR ends of the transcripts.

Figure 3 shows all the miRNAs and their targets found in the wheat genome, grouped by unique miRNAs. Out of the 38 MITE-derived miRNAs, 16 (42%) had targets and 12 of them (75%) had also a MITE inserted in one of their targets. Noticeably, none of the non-MITE-derived miRNAs with detected targets, report a MITE insertion in their target regions.

The MITE-derived miRNA regulatory network

For a MITE to produce a miRNA, it must be inserted into an active transcription site. Of the 38 MITE-derived miRNAs detected, 17 (45%) were found in known active transcription sites, gene promoters or introns, and the remaining 21 (55%) were located in intergenic regions, with no additional evidence of being in an active transcribed site (Table 2). We explored the idea that the incorporation of a target transcript into the MITE-derived miRNA regulatory network could depend on the evolutionary time of the MITE insertion in the target region. Taking the wheat A genome as a reference, the MITE-derived miRNA 5 (Table 2) is produced by the MITE Icarus_BG607724-1 inserted in the promoter of the gene TraesCS2A02G175300. The MITE-derived miRNA 5 targets the 3’ UTR of the transcript TraesCS2A02G281000.1 where a MITE Pan_M801L24-1 is inserted (Table 3). The Pan_M801L24-1 insertion was also observed in the A genome of wheat relatives T. urartu (2n = 2x = 14, genome AA), T.turgidum spp. diccocoides (2n = 4x = 28, genome AABB) and domesticated T. turgidum spp. durum (2n = 4x = 28, genome AABB) but the insertion was absent in the B and D genome homeologous of TraesCS2A02G281000 (TraesCS2B02G298400; TraesCS2D02G279900 respectively), suggesting that the insertion of TraesCS2A02G281000.1 transcript in the MITE-derived miRNA regulatory network occurred relatively early in the evolution of the A genome lineage (Additional file 6, Fig. S2). Instead, the MITE-derived miRNA 37 (Table 2), which is produced by Athos_103H9-1 in an intergenic region, targets the AthosAF326781-1 MITE inserted in the 3’ UTR of the TraesCS6A02G276700.1 transcript. This insertion was observed in the A genome of the domesticated T. turgidum spp. durum but is absent in the wild relatives T. urartu and T. turgidum spp. diccocoides and also in the TraesCS6A02G276700 homeologous in the B and D genomes (TraesCS6B02G304200; TraesCS6D02G257000 respectively) (Additional file 6, Fig. S3). These results suggest that the incorporation of the TraesCS6A02G276700.1 transcript into the MITE-derived miRNA_MITE_37 regulatory network occurred more recently in evolutionary time. Finally, another target of MITE-derived miRNA_MITE_37, is the TraesCS3A02G274100.1 which has an Athos_103H9-1 insertion in the 3’ UTR. Such insertion was observed in T. aestivum A genome and not present neither in wild relatives T. urartu and T. diccocoides, nor in the domesticated T. turgidum spp. durum, or in the homeologous B and D genomes (TraesCS3B02G307800; TraesCS3D02G273300) (Additional file 6, Fig. S4) indicating that this transcript was incorporated into the regulatory network very recently in terms of evolutionary time.

Expression profiles of MITE-derived miRNAs and their targets in different tissues

The expression level of MITE-derived miRNAs with validated targets in different tissues was analyzed by using Illumina-sequencing profiles from [17] (Fig. 4, Additional file 9). The miRNA_MITE_3 and miRNA_MITE_7 showed expression in all the tissues evaluated. On the other hand, miRNA_MITE_10 showed expression in all tissues except in seedling leaf. The miRNA_MITE_15 showed expression in most tissues except in seedling root, 0-5 mm young spikes and 10-15 mm young spike. Moreover, miRNA_MITE_22 showed expression in almost all tissues, except seedling leaf and grain of 8 days after pollination. miRNA_MITE_25 presented an expression similar to miRNA_MITE_22 with the exception that it is not expressed in flag leaf. The miRNA_MITE_32 is expressed in almost all tissues except 10-15 mm young spike and dry grain. On the other hand, miRNA_MITE_34 is expressed in most tissues except in embryo of germinating seed, seedling leaf and grain of 8 days after pollination. Interestingly, miRNA_MITE_5 is expressed in higher proportions in sRNAs libraries derived from grain tissues, especially in 8 and 15 days after pollination (Fig. 4A). Additionally, the miRNA_MITE_5 showed no expression in seedling root and seedling leaf tissues libraries. According to our search, the mature the miRNA_MITE_5 (UGAGACGGGUAAUUUGGAACGGAG) is not similar to any of those already described in wheat. In contrast, in a less stringent search using the miRBASE tool, we found that this miRNA is similar to tae-MIR5175 (BLAST score of 71 and an e-value of 1.5). Our findings indicates that miRNA_MITE_5 can target the 3’ UTRs of three transcripts, TraesCS2A02G281000.1 (Dynamin-like family protein), TraesCS7B02G175300.2 (Monodehydroasorbate reductase) and TraesCS1A02G090300.1 (Anaphase-promoting complex subunit 7) (Additional file 7). By using public wheat expression data [24], we compared the expression level of these three target transcripts in different tissues. Remarkably, we found that two of them (TraesCS7B02G175300.2 and TraesCS1A02G090300.1) are down-regulated in grain and reproductive stages (Fig. 4B). These results lead us to speculate that the high expression of the miRNA_MITE_5 during grain filling may be one of the causes of down-regulation of the two aforementioned transcripts.

Fig. 4
figure 4

A Expression profiles of MITE-derived miRNA with target genes showing the insertion of a MITE in the miRNA recognition site in the 3’ UTR. The expression was evaluated in 11 different tissues, counts normalized to the total mapped reads in each library (RPM) B Expression profiles of target transcripts of miRNA_MITE_5 (TPM)


Characterization of the MITE-derived miRNAs in wheat

The association between miRNA and MITEs has been previously described in wheat [25, 26]. Still, to our knowledge, no genome-wide studies were conducted to decipher the connection between these two important genomics elements. Using a very stringent approach (all ShortStack miRNA discovery checkpoints passed) we identified 38 MITE-derived miRNAs throughout the entire wheat genome. However, when relaxing some parameters and taking the 2,081 sRNAs that were arrested in ShortStack N15 step (where the miRNA-star sequence was missing, Table 1) as putative miRNAs (for more detail, see Methods section), the number of miRNA derived from MITEs raised to 1,507 (72%) which would give a much more extensive and complex putative network.

Previously Yu et al. [25] found that a MITE inserted in the promoter of the spring Vrn-A1a allele would be responsible for the encoding of TamiR1123. Here, we have detected two new MITE-derived miRNAs that originated in promoter regions (Table 2) and both showed target transcripts (Table 3).

Intronic miRNAs or so-called “mirtrons” were first discovered in animals [27], and are being increasingly reported in plants. So far, mirtrons transcribed from the spliced out introns of the protein-coding genes have been reported in Arabidopsis, rice, cassava foxtail millet, and recently in wheat [26, 2831]. Here, we detected 16 new MITE-derived miRNAs originated in intronic regions with evidence that four of them targets transcripts (Tables 2 and 3). These results reinforce the evidence for the presence of functional mirtrons within the wheat genome and their association with TEs. Regarding the size of the MITE-derived miRNAs, in rice Li et al. [2] detected that the miRNAs derived from TE (80% from MITEs) give rise to sRNA populations with different sizes in each loci. The authors suggest that the length distribution of sRNA products may reflect heterogeneity in the processing of the miRNA derived from TE and proposed that may be due to the transition of some TEs from the heterochromatic siRNAs pathway (which acts mainly via 24-nt siRNAs) to the hairpin siRNA or miRNA pathways (which act mainly via 21-nt sRNAs). Finally, the authors found that only 5.5% of the miRNAs derived from MITEs in rice were 21-nt long and 39.5% were 24-nt long. In contrast, in wheat, we found that 74% of the MITE-derived miRNAs were 21-nt long, while 26% were 24-nt long (Additional file 6, Fig. S1), suggesting that MITEs-derived miRNAs may have different origins and evolution.

MITE-derived miRNA targets

In animals, miRNAs control the expression of target genes through binding to complementary target sites in mRNA targets [3]. In mammals, some miRNAs are similar to TEs and the presence of cognate elements in the 3’ UTR of protein-coding genes may confer susceptibility to miRNA regulation [3236]

Otherwise, in plants, a few reports in Arabidopsis [37], rice [38], and more recently in wheat [26] evidenced the presence of miRNA complementary sites located at the 3’ UTR regions and not within the CDS. In this work, we present substantive evidence that MITE-derived miRNAs would target preferentially the 3’ UTRs of target genes, where a large number of MITEs are inserted (Table 3, Figs. 1C and 2).

The evolutionary separation of the diploid species Aegilops and Triticum occurred between 5.3 and 1.8 MYA, the creation of the allotetraploid species of Triticum between 1.8 and 0.01 MYA, and the creation and domestication of hexaploid (bread) wheat, by the incorporation of the A. tauschii genome and domesticated forms of tetraploid wheat occurred very recently in the last 10,000 years [39]. Several TE families underwent either proliferation or reduction during species diversification. Markers from MITEs showed polymorphic insertions (about 80%) among species and accessions of Aegilops and Triticum [40]. Interestingly, we found evidence of a regulatory network produced by MITE-derived miRNAs, on transcript targets with MITEs insertions early in evolutionary time, from T. urartu (5.3-1.8 MYA) to T. aestivum (Additional file 6, Fig. S2). We also presented evidence of the functionality of this network in more recent evolutionary periods from T. durum to T. aestivum (Additional File 6, Fig. S3) and even in very recent insertions present in T. aestivum (<10,000 years old) (Additional file 6, Fig. S4).

In general, our results lead us to propose that there are three crucial factors to define whether a given target transcript can be controlled by a miRNA derived from a MITE: a) the family to which the inserted MITE belongs; b) the precise place of insertion in the target gene (3’ UTR); and c) the rate of mutations of the target region. In contrast, the evolutionary time of insertion of a certain MITE into its target gene is not a crucial factor since we found old (5.3-1.8 MYA) and new (<10,000 years old) MITE elements already incorporated into the MITE-derived miRNAs regulatory network. An example of this is that MITE-derived miRNA 3 (Table 2) produced by the MITE 1125 detected by MITETracker [12], targets an element of the family Oleus (Table 3), suggesting that the variability between elements from different or the same family is important for the proper functioning of the regulatory network.

Previously Crescente et al. [12] and more recently Poretti et al. [26] proposed that genome-wide distribution of thousands of MITE copies and the frequent association of these with genes, combined with a functionally diversified regulatory network, might provide a selective advantage for frequent domestication of MITEs into miRNA genes relevant in the development and/or for adaptation to biotic stress. Here we extended these findings using a genome-wide approach to study the miRNA MITE regulatory network in wheat and unrevealed that this network is much more extensive and deep than it was foreseen in the past.

Regarding the co-expression studies of miRNAs and their target transcripts, in this work it was observed that a large part of the miRNAs derived from MITE presented expression in numerous wheat tissues and in some cases their expression was very abundant, as in the case of miRNA_MITE_5. In our study using public expression libraries we were able to establish at least a negative correlation between the expression of the miRNA and the target gene. However, deeper and more precise studies must be carried out in the future to establish accurately and reliably the expression associations between MITE-derived miRNAs and their target transcripts throughout the wheat genome.

Concluding remarks

In this work, we identified 38 MITE-derived miRNAs out of a total of 270 miRNAs identified in the entire wheat genome. For several of these MITE-derived miRNAs, we reliably established the presence of at least one cleavage target transcript, which mostly also has a MITE inserted in the 3’ UTR region. These data revealed a regulatory network parallel to the classical miRNA network described so far. This work was carried out following stringent rules for miRNA and targets discovery. The data suggested that the scope of the MITE-derived miRNA regulatory network could be much broader and with deeper implications. Because MITEs are found in millions of copy numbers in the wheat genome and are mostly located in gene-rich regions, this regulatory network could have major impact in the wheat post-transcriptional control of gene expression.


MITEs database

A comprehensive wheat MITEs database was created using families retrieved with MITE Tracker [12] and from TREP database [13]. All MITEs sequences were merged into a single file and clustered using VSearch [14] (parameters –iddef 1 –id 0.8) to collapse redundant sequences. If one of the clusters obtained contained elements from TREP, a single of those elements was picked to represent each family, otherwise a random element was selected. Next, to annotate MITEs in the entire wheat genome, a genome-wide search was performed using BLAST [41]. The results were subjected to a set of filters using python scripts previously implemented [15] where elements shorter than 50 and larger than 1100 nucleotides were removed and a minimum distance between the start and end position of two different elements of five nucleotides is required to avoid duplicated hits. Also, a minimum percentage of identity between the query and the subject of 80% is required and the alignment length must be at least 80% the size of the query length. Ultimately, the filtered BLAST results file was converted to a GFF file.

miRNAs assessment

To find putative miRNA production loci, we used two comprehensive sets of sRNA libraries. The first one corresponds to 11 libraries from different tissues of the hexaploid wheat cultivar Chinese Spring (Triticum aestivum L.) [17]. The second set is from four libraries from fully developed T. aestivum leaves were two of them came from plants infected with Mal de Río Cuarto virus (MRCV) and the remaining two were mock-infected controls [18]. Libraries from [17] were trimmed using Trim Galore [42] with the adapter 5’ TCGTATGCCGTCTTCTGCTTG 3’. The script used for trimming is available in the provided code repository. The two sRNA libraries and the wheat genome were used as input for ShortStack v3.8.5 software with default parameters [16, 43]. Using the alignments files from the two sets given by Shortstack, a second genome-wide identification step was done analyzing only intervals annotated with MITEs (using the –locifile parameter) using the MITE annotation obtained from the GFF file from the previous step. Secondary structure prediction plots were done using RNAfold web server [44].

miRNAs production loci

Only loci annotated as valid miRNAs according to ShortStack criteria were kept and merged into a single file. In addition, all clusters with less than 10 reads were discarded. Duplicated elements were identified based on the miRNA sequence and the position of the sRNA cluster and removed to avoid redundancy.

To search for known mature miRNAs, a BLAST search was executed within the most common miRNA sequence of each cluster and mature miRNAs sequences from miRBase [19] and Plant sRNA gene server [20]. Results were filtered so that no more than four mismatches and gaps were allowed. Also, only alignments with a length of at least the matching mature miRNA length minus four were kept.

To annotate MITE-derived miRNAs we used ShortStack’s output which contains genomic sequences surrounding each sRNA cluster (between 60 and 900 nucleotides length). This procedure was used to find sequence homology between the production site of the miRNAs and the MITEs by using BLAST. Only results with at least 80% of identity were kept. Also a minimum alignment length of 80% of the length of the query was required.

Using the genomic annotation of [43], a GFF file containing introns and promoter’s regions (2000-nt upstream each gene) was created. Using these expanded genomic annotations, each miRNA production site was labelled according to the genomic coordinates that overlaps with it. If there was no match, the miRNA was labelled as intergenic.

Cleavage of targets

To find putative miRNA target sites, we used two approaches that complemented each other. First, sliced targets of miRNAs from four degradome libraries from [17] were searched using the Cleaveland pipeline [22]. Only results with a P-value lower than 0.05 were kept. Second, transcript targets were identified using psRNATarget website with the Schema V2 (2017 release) scoring parameters [23]. The results from both approaches were merged by using the cleavage alignment positions in the transcripts reported by each program. Only those results simultaneously reported by both programs were considered. MITE insertions in target regions were identified using BLAST between the target genes sequences and the MITEs library. Only results with at least 80% identity and 80% query/alignment length were considered. Also, those hits which location included the exact miRNA cleavage genomic location were kept. Each region was labeled as 5’ UTR, 3’ UTR, or exon by using a python script that matches the position of the target sites in the transcripts and genomic annotations. Python scripts were orchestrated using several Jupyter notebooks [45]. Pandas [46] was used to read and transform miRNA production and target sites as well as genomic annotations. Sequences were read and manipulated using biopython [47]. All scripts are available in a Github repository (see Availability of data and materials).

miRNAs expression profiles

The expression level of the miRNAs were obtained from ShortStack’s output using the sRNA libraries of [17]. These specific libraries were chosen because they represent expression levels of miRNAs in different age and high level tissues. The read counts were normalized in RPM using a python script available in the provided repository inside a Jupyter notebook. A heatmap plot was created using the Seaborn library [48]. The expression level of the target transcripts was obtained from the website [24].

Availability of data and materials

All scripts, commands and software used are detailed in Relevant data is provided as additional files.



Generic file format



MITE-derived miRNA:

microRNA produced by a MITE-related region


Miniature Inverted-repeat Transposable Element


Million Years Ago


Small RNA


Transposable Element


untranslated region


  1. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, et al.High-throughput sequencing of arabidopsis micrornas: evidence for frequent birth and death of mirna genes.PloS ONE. 2007; 2(2):e219.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Li Y, Li C, Xia J, Jin Y. Domestication of transposable elements into microrna genes in plants. Plos ONE. 2011; 6(5):19212.

    Article  Google Scholar 

  3. Bartel DP. Micrornas: genomics, biogenesis, mechanism, and function. Cell. 2004; 116(2):281–97.

    Article  CAS  PubMed  Google Scholar 

  4. Zhang Z, Teotia S, Tang J, Tang G. Perspectives on micrornas and phased small interfering rnas in maize (zea mays l.): functions and big impact on agronomic traits enhancement. Plants. 2019; 8(6):170.

    Article  CAS  PubMed Central  Google Scholar 

  5. Cui J, You C, Chen X. The evolution of micrornas in plants. Curr Opin Plant Biol. 2017; 35:61–7.

    Article  CAS  PubMed  Google Scholar 

  6. Allen E, Xie Z, Gustafson AM, Sung G-H, Spatafora JW, Carrington JC. Evolution of microrna genes by inverted duplication of target gene sequences in arabidopsis thaliana. Nat Genet. 2004; 36(12):1282–90.

    Article  CAS  PubMed  Google Scholar 

  7. De Felippes FF, Schneeberger K, Dezulian T, Huson DH, Weigel D. Evolution of arabidopsis thaliana micrornas from random sequences. RNA. 2008; 14(12):2455–9.

    Article  Google Scholar 

  8. Piriyapongsa J, Jordan IK. Dual coding of sirnas and mirnas by plant transposable elements. RNA. 2008; 14(5):814–21.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Kazazian HH. Mobile elements: drivers of genome evolution. Science. 2004; 303(5664):1626–32.

    Article  CAS  PubMed  Google Scholar 

  10. Wicker T, Gundlach H, Spannagl M, Uauy C, Borrill P, Ramirez-Gonzalez RH, De Oliveira R, Mayer KF, Paux E, Choulet F. Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 2018; 19(1):1–18.

    Article  Google Scholar 

  11. Cantu D, Vanzetti LS, Sumner A, Dubcovsky M, Matvienko M, Distelfeld A, Michelmore RW, Dubcovsky J. Small rnas, dna methylation and transposable elements in wheat. BMC Genomics. 2010; 11(1):408.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Crescente JM, Zavallo D, Helguera M, Vanzetti LS. Mite tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes. BMC Bioinformatics. 2018; 19(1):348.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Wicker T, Matthews DE, Keller B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 2002; 7(12):561–562.

    Article  CAS  Google Scholar 

  14. Rognes T, Flouri T, Nichols B, Quince C, Mahé F. Vsearch: a versatile open source tool for metagenomics. PeerJ. 2016; 4:2584.

    Article  Google Scholar 

  15. Zavallo D, Crescente JM, Gantuz M, Leone M, Vanzetti LS, Masuelli RW, Asurmendi S. Genomic re-assessment of the transposable element landscape of the potato genome. Plant Cell Rep. 2020; 39(9):1161–74.

    Article  CAS  PubMed  Google Scholar 

  16. Johnson NR, Yeoh JM, Coruh C, Axtell MJ. Improved placement of multi-mapping small rnas. G3 Genes Genomes Genet. 2016; 6(7):2103–11.

    CAS  Google Scholar 

  17. Sun F, Guo G, Du J, Guo W, Peng H, Ni Z, Sun Q, Yao Y. Whole-genome discovery of mirnas and their targets in wheat (triticum aestivum l.)BMC Plant Biol. 2014; 14(1):142.

    Article  PubMed  PubMed Central  Google Scholar 

  18. de Haro LA, Dumón AD, Mattio MF, Argüello Caro EB, Llauger G, Zavallo D, Blanc H, Mongelli VC, Truol G, Saleh M-C, et al.Mal de río cuarto virus infection triggers the production of distinctive viral-derived sirna profiles in wheat and its planthopper vector. Frontiers Plant Sci. 2017; 8:766.

    Article  Google Scholar 

  19. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. mirbase: tools for microrna genomics. Nucleic Acids Res. 2007; 36(suppl_1):154–8.

    Article  Google Scholar 

  20. Lunardon A, Johnson NR, Hagerott E, Phifer T, Polydore S, Coruh C, Axtell MJ. Integrated annotations and analyses of small rna–producing loci from 47 diverse plants. Genome Res. 2020; 30(3):497–513.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Guo Z, Kuang Z, Wang Y, Zhao Y, Tao Y, Cheng C, Yang J, Lu X, Hao C, Wang T, et al.Pmiren: a comprehensive encyclopedia of plant mirnas. Nucleic Acids Res. 2020; 48(D1):1114–21.

    Article  Google Scholar 

  22. Addo-Quaye C, Miller W, Axtell MJ. Cleaveland: a pipeline for using degradome data to find cleaved small rna targets. Bioinformatics. 2009; 25(1):130–1.

    Article  CAS  PubMed  Google Scholar 

  23. Dai X, Zhuang Z, Zhao PX. psrnatarget: a plant small rna target analysis server (2017 release). Nucleic Acids Res. 2018; 46(W1):49–54.

    Article  Google Scholar 

  24. Ramírez-González R, Borrill P, Lang D, Harrington S, Brinton J, Venturini L, Davey M, Jacobs J, Van Ex F, Pasha A, et al.The transcriptional landscape of polyploid wheat. Science. 2018; 361(6403):1–12.

    Article  Google Scholar 

  25. Yu M, Carver BF, Yan L. Tamir1123 originated from a family of miniature inverted-repeat transposable elements (mite) including one inserted in the vrn-a1a promoter in wheat. Plant Sci. 2014; 215:117–23.

    Article  PubMed  Google Scholar 

  26. Poretti M, Praz CR, Meile L, Kälin C, Schaefer LK, Schläfli M, Widrig V, Sanchez-Vallet A, Wicker T, Bourras S. Domestication of high-copy transposons underlays the wheat small rna response to an obligate pathogen. Mol Biol Evol. 2020; 37(3):839–48.

    Article  CAS  PubMed  Google Scholar 

  27. Ruby JG, Jan CH, Bartel DP. Intronic microrna precursors that bypass drosha processing. Nature. 2007; 448(7149):83–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Meng Y, Shao C. Large-scale identification of mirtrons in arabidopsis and rice. PLoS ONE. 2012; 7(2):31163.

    Article  Google Scholar 

  29. Joshi PK, Gupta D, Nandal UK, Khan Y, Mukherjee SK, Sanan-Mishra N. Identification of mirtrons in rice using mirtronpred: a tool for predicting plant mirtrons. Genomics. 2012; 99(6):370–5.

    Article  CAS  PubMed  Google Scholar 

  30. Yi F, Xie S, Liu Y, Qi X, Yu J. Genome-wide characterization of microrna in foxtail millet (setaria italica). BMC Plant Biol. 2013; 13(1):212.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Patanun O, Lertpanyasampatha M, Sojikul P, Viboonjun U, Narangajavana J. Computational identification of micrornas and their targets in cassava (manihot esculenta crantz.)Mol Biotechnol. 2013; 53(3):257–69.

    Article  CAS  PubMed  Google Scholar 

  32. Smalheiser NR, Torvik VI. Mammalian micrornas derived from genomic repeats. TRENDS in Genetics. 2005; 21(6):322–326.

    Article  CAS  PubMed  Google Scholar 

  33. Niwa R, Slack FJ. The evolution of animal microrna function. Curr Opin Genet Dev. 2007; 17(2):145–50.

    Article  CAS  PubMed  Google Scholar 

  34. Piriyapongsa J, Jordan IK. A family of human microrna genes from miniature inverted-repeat transposable elements. PloS ONE. 2007; 2(2):203.

    Article  Google Scholar 

  35. Piriyapongsa J, Mariño-Ramírez L, Jordan IK. Origin and evolution of human micrornas from transposable elements. Genetics. 2007; 176(2):1323–37.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G, Schuit FC. Evidence for co-evolution between human micrornas and alu-repeats. PLoS ONE. 2009; 4(2):4456.

    Article  Google Scholar 

  37. Gandikota M, Birkenbihl RP, Höhmann S, Cardon GH, Saedler H, Huijser P. The mirna156/157 recognition element in the 3 utr of the arabidopsis sbp box gene spl3 prevents early flowering by translational inhibition in seedlings. Plant J. 2007; 49(4):683–93.

    Article  CAS  PubMed  Google Scholar 

  38. Campo S, Peris-Peris C, Siré C, Moreno AB, Donaire L, Zytnicki M, Notredame C, Llave C, San Segundo B. Identification of a novel micro rna (mi rna) from rice that targets an alternatively spliced transcript of the n ramp6 (n atural resistance-associated macrophage protein 6) gene involved in pathogen resistance. New Phytol. 2013; 199(1):212–27.

    Article  CAS  PubMed  Google Scholar 

  39. Feldman M, Levy AA. Origin and evolution of wheat and related triticeae species. In: Alien Introgression Wheat. Springer: 2015. p. 21–76.

  40. Yaakov B, Ceylan E, Domb K, Kashkush K. Marker utility of miniature inverted-repeat transposable elements for wheat biodiversity and evolution. Theor Appl Genet. 2012; 124(7):1365–73.

    Article  CAS  PubMed  Google Scholar 

  41. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. Ncbi blast: a better web interface. Nucleic Acids Res. 2008; 36(suppl_2):5–9.

    Article  Google Scholar 

  42. Krueger F. Trim galore. A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. 2012.

  43. Appels R, Eversole K, Feuillet C, Keller B, Rogers J, Stein N, Pozniak CJ, Choulet F, Distelfeld A, Poland J, et al.Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018; 361(6403):7191.

    Article  Google Scholar 

  44. Zuker M, Stiegler P. Optimal computer folding of large rna sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981; 9(1):133–48.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, Kelley K, Hamrick JB, Grout J, Corlay S, et al.Jupyter notebooks-a publishing format for reproducible computational workflows. In: Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press: 2016. p. 87–90.

  46. The pandas development team. pandas-dev/pandas: Pandas: Zenodo; 2020.

  47. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al.Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25(11):1422–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Waskom ML. seaborn: statistical data visualization. J Open Source Softw. 2021; 6(60):3021.

    Article  Google Scholar 

Download references


Not applicable.


The present work was funded by the Instituto Nacional de Tecnología Agropecuaria grants PNBIO 1131022 and 2019-PE-E6-I116-001. JMC is a research fellow of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) at the EEA INTA Marcos Juárez, Córdoba, Argentina.

Author information

Authors and Affiliations



This work was conceived by JMC and LSV. JMC and LSV led the bioinformatic analyses with collaboration from DZ, MDV and SA. JMC, LSV and DZ wrote the first version of the manuscript while MDV, SA, MH and EF contributed to the improvement of the final version. All authors approved the final manuscript.

Corresponding author

Correspondence to Juan M. Crescente.

Ethics declarations

Ethics approval and consent to participate

This study does not include human or animal subjects.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

FASTA file containing 2002 unique MITEs used in this study.

Additional file 2

GFF file containing 1.211.340 MITEs locations in the IWGSC wheat genome.

Additional file 3

pre-miRNAs secondary structures predictions.

Additional file 4

CSV file containing detailed information about the 270 valid miRNAs detected through ShortStack. Contains additional information about homology with MITEs and genomic annotations.

Additional file 5

MITEs and pre-miRNAs sequences and alignments between them.

Additional file 6

Figure S1: Length distribution of the sRNAs produced by the 38 MITE-derived miRNAs. Figure S2: Description of the MITE-derived miRNA 5 production site and an alignment of the target transcript TraesCS2A02G281000.2 region considering the wheat relatives for the A genome. Figure S3: Description of the MITE-derived miRNA 36 production site and an alignment of the target transcript TraesCS6A02G276700.1 region considering the wheat relatives for the A genome. Figure S4: Description of the MITE-derived miRNA 36 production site and an alignment of the target transcript TraesCS3A02G274100.1 region considering the wheat relatives for the A genome. Figure S5: Hairpin plots of the secondary structures of 4 examples of highly conserved miRNAs and 4 examples of MITE-derived miRNAs.

Additional file 7

CSV file containing detailed information about all targets detected by Cleaveland and psRNATarget. Contains additional information about homology with MITEs and genomic annotations.

Additional file 8

Degradome plots of all miRNA and targets discovered.

Additional file 9

MITE-derived miRNA expression profiles in different ages and high level tissues.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Crescente, J.M., Zavallo, D., del Vas, M. et al. Genome-wide identification of MITE-derived microRNAs and their targets in bread wheat. BMC Genomics 23, 154 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: