Identification of miRNA-eQTLs in maize mature leaf by GWAS

Background MiRNAs play essential roles in plant development and response to biotic and abiotic stresses through interaction with their target genes. The expression level of miRNAs shows great variations among different plant accessions, developmental stages, and tissues. Little is known about the content within the plant genome contributing to the variations in plants. This study aims to identify miRNA expression-related quantitative trait loci (miR-QTLs) in the maize genome. Results The miRNA expression level from next generation sequencing (NGS) small RNA libraries derived from mature leaf samples of the maize panel (200 maize lines) was estimated as phenotypes, and maize Hapmap v3.2.1 was chosen as the genotype for the genome-wide association study (GWAS). A total of four significant miR-eQTLs were identified contributing to miR156k-5p, miR159a-3p, miR390a-5p and miR396e-5p, and all of them are trans-eQTLs. In addition, a strong positive coexpression of miRNA was found among five miRNA families. Investigation of the effects of these miRNAs on the expression levels and target genes provided evidence that miRNAs control the expression of their targets by suppression and enhancement. Conclusions These identified significant miR-eQTLs contribute to the diversity of miRNA expression in the maize penal at the developmental stages of mature leaves in maize, and the positive and negative regulation between miRNA and its target genes has also been uncovered.


Background
MicroRNAs (miRNAs) are a class of small RNA fragments widely distributed in plants and animal genomes that regulate their target genes through translational inhibition, mature messenger RNA cleavage, and RNA-dependent DNA methylation (RdDM) [1]. In plants, pri-miRNA is transcribed by RNA polymerase II from the miRNA gene with a 5′ cap and 3′ polyadenylation modification and subsequently removed by a microprocessor while cutting the end of the stem-loop structure to produce pre-miRNA. Subsequently, Dicer like-1 (DCL-1) and Dicer like-4 (DCL-4) nucleases cut the pre-miRNA into 21-24 nt short double-strand RNAs, which are methylated by protein Hua-Enhancer1 (HEN1) to produce mature miRNA. The mature miRNA is later exported to the cytoplasm by HASTY (HST) protein. In the cytoplasm, only a single strand of the mature miRNA is loaded into the RNA-induced ribonucleoprotein silencing complex (RISC), which includes Argonaute (AGO1), to guide the miRNA by finding its target sites to cleave the corresponding mRNA or to inhibit translation [2]. In plants, miRNAs regulate target genes by being nearly complementary to the target sequences. In contrast, in animals, they regulate target regions by relying on the recognition of 7-8 nucleotide "seed sequences" [3]. The identification of miRNAs and their target genes has been well documented in both plants [4] and animals [5]. As far as the strategy is concerned, methods such as microarray, miRNA silencing and immunoprecipitation (IP)-based approaches and computational prediction after highthroughput sequencing are also applied for miRNA target identification [6].
The target gene number of miRNAs has been reported to be much lower in plants than in animals. Most of the targets are transcription factors, indicating the importance of plant miRNAs in gene regulation [7,8]. The mechanisms of miRNA regulating different plant developmental stages have been reported, including (1) posttranscriptional regulation, e.g., miR160 and miR166 regulate the plant height of Gossypium hirsutum through auxin and ABA response factors [9]. In Arabidopsis, phosphate starvation regulated by miR399 suppressed its target by binding to the 5′ untranslated region (UTR) [10]. An increase in related miRNAs to regulate response genes under biotic and abiotic stresses has also been reported [11]. (2) Posttranslational regulation: miR172 adjusted the flowering time by inhibiting the translation of flowering gene AP2 (APETALA2) in Arabidopsis [12]. Recently, in maize, many miRNAs and their targets have been identified by computational prediction, such as miRNAs involved in maize kernel development [13,14] and grain filling stages [15]. In addition, the global repression of miRNA expression has been reported in maize hybrid lines, which may be critical for increasing yield [16,17]. These reports demonstrate the important function of miRNAs in plants [7].
Numerous miRNA studies have been reported in maize due to its importance as a major source of food starch worldwide. Whole genome miRNA expression profiles from roots, seedlings, tassels, ears, and pollen grains have been reported, in which approximately 35% of ancestral sites were found as duplicated homoeologous miRNAs [18]. Additionally, miRNAs responding to various stresses in different tissues in maize have been reported. The miRNA responses to long-term waterlogging in seedlings [19]; low phosphorus-related miRNAs [20], and low nitrate-related miRNAs [21,22] in maize seedling roots; and the miRNAs associated with the resistance of Exserohilum turcicum [23], and diazotrophic bacteria [24]. All these studies have mainly focused on the identification of miRNAs, miRNA target genes, and miRNA functions by computational prediction. However, it remains unclear how genomic loci affect miRNA expression in the maize genome.
Genome-wide association study (GWAS) has been known as an efficient way to search the locations associated with traits of interest, in particular, quantitative traits. It is similar to QTL mapping, which considers the relationship between genotypes of each marker and the variance of corresponding phenotypes. However, GWAS provides higher resolution than QTL mapping because more recombination events have occurred in the breeding history of tested populations [25,26]. GWAS has been broadly applied to analyze the correlation between different plant species and hundreds of target traits [27], including agro climatic traits in sorghum [28], carbon isotope ratio in soybean [29], and the association of genome location with head smut resistance in cotton [30]. In addition, when target gene expression was input as a phenotype in GWAS processing, the locations associated with the variations of gene expression could be found as expressed QTLs (eQTLs) in the genome [31]. This approach has been applied in a previous study in which maize root tissue genotyped by 1731 SNPs and the gene expression profile from maize microarray as phenotypes, many cis-and trans-regulators have been successfully identified as causes affecting the gene expression patterns in maize root [32]. In addition, in humans, miRNA expression-related quantitative trait loci (miR-eQTLs) in different organs or stimulations have been reported, including human fibroblasts [33], glioblastoma [34], adipose tissue [35], infected dendritic cells [36], and 5239 human whole blood samples [37].
Although the locations and functions of miRNAs in plants have been revealed by numerous studies, to the best of our knowledge, none of them have focused on miR-eQTLs in plants. Therefore, the relationship between eQTLs responsible for variations in miRNA expression levels in a plant population remains unknown. The objective of this study is to reveal this relationship by combining the maize HapMap and miRNA expression profile of a maize panel (200 maize lines) to identify miR-eQTLs. This is the first report on the identification of four miR-eQTLs in maize and the finding that miRNAs regulate their target genes in both negative and positive ways.

Result
Identification of the highly expressed miRNAs in the maize panel To obtain the miRNA expression data among the 200 maize lines (hereafter called the maize panel), small RNA (sRNA) libraries were developed individually and pooled together for sequencing. Reads with lengths ranging from 20 nt to 32 nt after trimming were kept. The distribution from total libraries showed that 24 nt, 22 nt and 21 nt sRNAs were most abundant (Fig. 1a). This distribution is similar to the previous report by Liu [38], in which the highest peak at read length was 24 nt in an sRNA library from maize developing ear tissue. This indicated that the library developed in this study is suitable for the following analysis. A total of 31 miRNAs belonging to 17 miRNA families were identified and defined as highly expressed miRNAs based on the sum of total read counts greater than 2000 among 200 maize lines of the panel (Fig. 1b). Among highly expressed miRNAs, miR159a-3p showed the highest expression level with 1,369,775 reads, and miR171d-3p had the lowest expression level with 2160 reads.
To understand the diversity of miRNA expression patterns in the maize panel, 31 highly expressed miRNAs were analyzed by multidimensional scaling (MDS) analysis, and no cluster was detected (Fig. 2). Furthermore, the Boxcox transformation was applied for highly expressed miRNAs individually, which transformed the distribution of miRNA expression to a normal distribution for GWAS analysis.

The effect of miR-eQTLs on their target genes
It is evident that miRNAs control plant development by regulating their target genes. To understand whether miR-eQTLs identified in this study could affect the expression of the associated miRNAs and their target genes, the expression levels of miRNA and their target genes in the maize panel were obtained from the same mature leaf RNA samples for both miRNA library and  mRNA library construction. In this way, we could ensure the accuracy of the expression level of miRNA relative to their target genes. To understand the regulation of miRNAs on their target genes, the expression levels of miRNAs and their target genes were collected, separated and compared based on the genotypes of significantly associated SNPs in the identified miR-eQTLs individually ( Table 4). For miR-eQTL to contribute to miR156k-5p, the miRNA expression level was lower when the most associated SNP type (marker 6_96,303,448) was A but higher when the SNP type was G (Fig. 3c). In contrast, the target gene sbp29 of miR156k-5p displayed a higher expression level if the SNP type was A and a lower level when the SNP type was G (Fig. 3d). Therefore, a significantly negative relationship between the expression levels of miR156k-5p and the target gene sbp29 was identified. This indicated that miRNA might exert its control over target genes by suppression. A similar trend was shown between miR390a-5p ( Fig. 5c) and its target gene GRMZM5G806469 (Fig. 5d). On the other hand, the expression levels of miR159a-3p and its target gene GRMZM2G416652 (MYB-transcription factor 122, myb122) were both lower when the strongest associated SNP type was A and higher when the SNP type was C ( Fig. 4b and c), indicating a positive relationship between them. The same positive relationship was also discovered between miR396e-5p (Fig. 6c) and its target genes GRMZM2G478709, GRMZM2G124566 (GRF-transcription factor 9, grftf9), and GRMZM2G033612 (GRF-transcription factor 5, grf5) ( Fig. 6d to f). In summary, the miRNAs associated with miR-eQTLs identified in this study might exert their control over their targets

Highly expressed miRNAs and coexpressed miRNAs
This study aimed to identify miR-eQTLs in a specific developmental stage using the mature leaf tissue (leaf at the flowering time) of the maize panel as RNA sources for both mRNA and small RNA libraries. Gene regulation at this stage is not only important for maize yield but also a very complex network with many genes controlled in a highly and precisely regulated way. The highly expressed miRNAs might be different among plant tissues. Zhang et al. [18] studied the miRNA expression levels of 26 maize miRNA families among five different tissues (root, seedling, tassel, ear and pollen). It was shown that the highest expressed miRNAs varied based on different maize tissues, suggesting tissue-and developmental stage-specific miRNA expression [18]. In this study, miRNAs of the mature leaf tissue analyzed showed that the expression level of miR159a-3p was the highest, while miR171d-3p was the lowest, which was quite different from the reported miRNA types in a previous study [18]. This might be caused by the difference in environmental factors in the maize field including water, sunlight, winds, and soil conditions between materials reported in Zhang et al. [18] and this study. According to the pervious study, the correlation between different miRNAs which physically clustered miRNAs (< 10 kb) showed a higher positive correlation in humans. It was suggested that coexpression might occur among those miRNAs in the proximity of the genomic regions [39]. To understand if 31 highly expressed miRNAs were response to the environment or the physically clustered, the coexpression of highly expressed miRNA were detected. In our study, none of those miRNAs with a strong positive correlation were located on the same chromosome, even though some of them were within the same miRNA family. This indicated that the physical locations of miRNAs might not be a major factor responsible for the coexpression of miRNAs during the maize mature leaf developmental stage. With regard to the coexpression of miRNAs, the function of the miRNA target gene is one of the possible reasons why miRNAs would be coexpressed within the same miRNA family or between members of different miRNA families. For example, coexpression of members within or between miR166 and miR167 was identified in this study, and maize miR166 and miR167 families have been reported to all target bZIP transcription factors and auxin response factors, respectively [40]. The bZIP transcription factors have been reported to be involved in pathogen defense responses, light and stress signaling, seed maturation, and flower development [40]. Auxin response factors were reported to affect leaf and flower development [41]. During maize plant growth, leaves might need bZIP transcription factors to respond to stress and auxin response factors for developmental regulation. This might explain why the coexpression within the same family members of miR166 (miR166a-3p vs miR166e, miR166e vs miR166k-3p, miR166a-3p vs miR166I-3p, miR166k-3p vs miR166I-3p) and miR167 (miR167a-5p vs miR167e-5p) and members between different miRNAs (miR166k-3p and miR167h-3p) were detected in this study. On the other hand, Sunkar [11] reported that maize miR166 and miR167 showed similar expression patterns after treatment with hypoxia stress, which might also explain the coexpression of members between the two miRNA families. Nevertheless, the target gene functions of miR398, miR408, and miR528, which were coexpressed in this study, remain unknown, and more experiments are required to understand the relationship between members of these miRNA families. Furthermore, the maize panel we used could be distinguished into six subgroups based on their genetic background [42]. We have tested if the six genetic groups affect the miRNA expression of the final four targeted miRNAs, and found that no differences was shown among six different maize subgroups.

The trans miR-eQTLs and their candidate genes
Trans-regulators tend to be identified more than cisregulators when they contribute to miRNA expression in  fibroblasts [33] and whole blood [37] in humans. Similarly, 70% of found eQTLs contributed as transregulators in maize, rice, and brassica rape (reviewed by [32]). Furthermore, the trans-eQTLs for regulatory genes were mostly found > 100 kb away (71.6%) from their regulated target genes, and 8% were located on different chromosomes in maize [43]. It has also been discovered that trans-eQTLs are highly specific to the environment in C. elegans [44] and plants [32]. In this study, the identified eQTLs were trans-regulators, with one located  100 kb away and three located on different chromosomes to their target miRNA. This suggests that each eQTL detected in this study might be specific to a certain environment. The four miRNAs identified with eQTLs in the present study indicate that miRNAs play an important role in plant development and environmental adaptation. For instance, miR156k and miR159a have been proven to respond to salt stress in maize roots [45] or change their expression under hormone depletion and light exposure in maize somatic embryogenesis [46]. In addition, miR396e has been revealed to respond to UV-B regulation in maize leaves [47], and the miR390 family has been found to accumulate during the development of the maize shoot apex [48]. Maize growth in the field may owe its endurance to the above special environmental factors. This could explain why the eQTLs we found were all trans-regulators. Furthermore, three targeted miRNAs in our study, miR156, miR159 and miR390 have been found functional involved in flowering time, a sensitive trait that response to the environment, in which played important roles to regulate the transcription factors and control the time of transition from the vegetative stage into the productive stage in a previous report by Spanudakis and Jackson [49]. This may indirectly provide the evidence that trans-regulators identified in this study have shown the interaction with the environment. However, more experiments were required in the future to validate the assumptions. A total of eight candidate genes were found within the genomic region of miR-eQTL that contribute to the expression variance of miR156k-5p ( Table 3). One of the candidate genes, hagtf27 (GRMZM2G049730), showed homology to histone acetyltransferases involved in several metabolic pathways in plants [50], such as lipid metabolism and jasmonic acid biogenesis in cotton [51]. The function of ubiquitin family proteins is a key characteristic of cell autophagy and proteasome metabolism [52]. Another candidate gene, dmag1 (GRMZM2G171317), is a DNA glycosylase superfamily protein participating in the DNA damage repair pathway [53]. Additionally, the candidate gene GRMZM2G171328 showed homology to a DOMON domain-containing protein in rice, indicating an important role in heme and sugar recognition in maize cells [54]. The candidate gene AC233936.1_GF003, similar to proteins with the WD domain, has been reported to have multiple functions in genome integrity and cell cycle progression [55]. It has been reported that DNA damage could regulate miRNA expression through a p53dependent pathway or modulation of the steps of miRNA processing and maturation. MiRNAs were also shown to be related to DNA damage repair and apoptosis [53]. The sampled tissue in this study was mature leaves in which cells might undergo DNA damage, even the progression of autophagy and apoptosis. This might explain why these candidate genes were identified within the miR-eQTL genomic region and contribute to the expression of miR156k-5p in this study. For another two miR-eQTLs, MiR390a-5p and miR396e-5p, no annotation of the candidates GRMZM2G108694 and GRMZM2G027282 in maize genome databases could be found. However, the translated sequences of GRMZM2G108694 showed homology to Arabidopsis cytokinin-independent 1 (CKI 1), which functions as a cytokinin receptor [56]. In addition, the translated sequences of GRMZM2G027282 showed homology with Arabidopsis ATS6A.2 and RPT5A, which played an essential role in gametophyte development [57]. The expression of these genes has no significant correlation to the miRNA expression after analysis by Pearson's correlation coefficient. This suggested that there might be other factors affecting miRNA expression in additon to the genes within the eQTL region.
The possible regulatory mechanisms of the four miR-eQTLs and their target genes The relationship between miRNAs and their target genes is of great interest. This study identified that miR-eQTLs contribute to miRNAs, which showed both negative (miR156k-5p, miR390a-5p) and positive (miR159a-3p, miR396e-5p) ways of regulating their target genes individually (Table 4, Fig. S2). In humans, the localization of miRNA has been reported to play a key role in determining the mechanisms of target gene regulation [58]. In this study, the target binding sites of miR156k-5p and miR390a-5p were all at the 3'UTR of miRNA target genes. On the other hand, miR159a-3p and miR396e-5p were at the CDS of miRNA target genes. For the negative regulation of miRNAs on their target genes, such as miR156k-5p and miR390a-5p in this study, it has been reported that human miRNAs repress their target genes through RNA degradation and translational repression pathways. For positive regulation, in humans, miRNA was reported to accumulate in the cell nucleus [59]. Several types of human miRNAs in which target gene expression is upregulated have been reported, including promoter-targeting, TATA-box-activating and enhancer-associated miRNAs. However, this study identified positive regulation of miR159a-3p and miR396e-5p through binding the CDS of target genes that did not include any type of upregulated miRNA described above [60]. This indicated that the positive regulation mechanisms of miRNAs in maize might be different from those in humans, and this might be a novel or plant-specific regulation method.
Conclusion miRNAs have been proven to universally exist in animals and plants. Its biogeneration methods, functions, and target genes have also been investigated carefully. In this study, we estimated the correlation among 31 highly expressed miRNAs using NGS on small RNA libraries derived from mature leaf samples of the maize panel (200 maize lines) to understand the network of miRNAs. A strong positive correlation was found within and between miRNA families, and environmental factors might be the cause of the coexpression. Four miR-eQTLs contributing to miR156k-5p, miR159a-3p, miR390a-5p and miR396e-5p were identified by GWAS analysis on the maize panel. This study is the first report on the identification of maize miR-eQTLs and demonstrates that both negative and positive regulatory relationships exist between miR-eQTLs and their target genes. The positive regulation of miR159a-3p and miR396e-5p on the targets suggests that some plant miRNAs might regulate their target genes positively by a mechanism different from that of humans.

Plant materials and total RNA extraction
The maize 282 panel as mentioned in the previous study [61] was applied. The mature leaf tissues (half individuals of each row flowering in the field) of each maize line was collected with two different date (8 August 2014 and 26 August 2014) to minimize the effects from the collection date. The leaf defined as matured when at least half of individuals per genotype were flowering in the field before the two harvest dates above. The second leaf from tassel was collected and the leaf section was made based on 1 cm square in the center of the leaf, three plants were combined per genotype for further analysis. The total RNA was extracted in a previous report [61]. Total RNA was prepared for further small RNA libraries.

Small RNA library development and Next Generation Sequencing
For developing small RNA libraries through the NEB small RNA library prep kit, a 2 ng fraction of the total RNA of each sample from 282 maize lines was prepared and shipped to the company "Global Biologics". Small RNA libraries were diluted to 4.3 mM individually, and 96 libraries were pooled together for next-generation sequencing by NextSeq 500 with the condition of 75 bp, single end. The raw reads were uploaded to NCBI (Bio-Project ID: PRJNA599406).

Library analysis and estimation of miRNA expression
A total of 200 out of the original 282 libraries derived from mature leaf samples were selected based on the quality of sequencing, and reads with lengths between 20 and 32 bp were kept after trimming the adapter. Then, reads were eliminated from rRNA, tRNA, and snoRNA by the Bowtie alignment tool [62]. From miRbase (http://www.mirbase.org/), 203 maize unique mature miRNA sequences were downloaded as a reference of mature miRNAs. Reads from libraries were classified by length and matched with the relative length of the miRNA reference. For example, reads lengths equal to 20 bp were separated from each library and matched with an miRNA reference of length equal to 20 bp. Highly expressed miRNAs were selected and defined as the sum of matched read numbers among all maize lines in the panel higher than 2000 and kept for further analysis. Data transformation was performed by Boxcox [63] with the matched read number as a phenotype of the specific miRNA. The correlation among all highly expression miRNA was performed by the Pearson correlation coefficient under R program.
Multidimensional scaling (MDS) and Genome-wide association study (GWAS) MDS was performed by the R program to identify the distance relationship among all samples based on the highly expressed miRNAs. Maize hapmap 3.21 was chosen as a genotype [64] and imputed by the K-nearest neighbor imputation (KNNi) method [65]. GWAS analysis was performed by TASSEL 5.0 [66] under GLM with 5 genetic PCAs plus 2 latent PCAs as covariates to control the population structure. SNPs were kept at only a p-value lower than 0.1 after the analysis. To test if the miRNA and target gene show the different expression level based on the highest associated SNP, T-test was performed under R program.
Additional file 2: Fig. S2. Regulation network among most significant SNP, miRNA and miRNA target genes. a. Regulation network of miR156k-5p and its target genes based on the most significant SNP. b. Regulation network of miR159a-3p and its target genes based on the most significant SNP. c. Regulation network of miR390a-5p and its target genes based on the most significant SNP. d. Regulation network of miR396e-5p and its target genes based on the most significant SNP.