Genetic analysis of three maize husk traits by QTL mapping in a maize-teosinte population

Background The maize husk consists of numerous leafy layers and plays vital roles in protecting the ear from pathogen infection and dehydration. Teosinte, the wild ancestor of maize, has about three layers of small husk outer covering the ear. Although several quantitative trait loci (QTL) underlying husk morphology variation have been reported, the genetic basis of husk traits between teosinte and maize remains unclear. Results A linkage population including 191 BC2F8 inbred lines generated from the maize line Mo17 and the teosinte line X26–4 was used to identify QTL associated with three husk traits: i.e., husk length (HL), husk width (HW) and the number of husk layers (HN). The best linear unbiased predictor (BLUP) depicted wide phenotypic variation and high heritability of all three traits. The HL exhibited greater correlation with HW than HN. A total of 4 QTLs were identified including 1, 1, 2, which are associated with HL, HW and HN, respectively. The proportion of phenotypic variation explained by these QTLs was 9.6, 8.9 and 8.1% for HL, HN and HW, respectively. Conclusions The QTLs identified in this study will pave a path to explore candidate genes regulating husk growth and development, and benefit the molecular breeding program based on molecular marker-assisted selection to cultivate maize varieties with an ideal husk morphology.

chromosome 1S, 1 L, 3 L and 7 L [9]. In 2016, the first genome wide association study (GWAS) for NH and husk weight were performed using 3 K SNP markers and identified a total of 24 and 29 SNPs associated with HN and husk weight, respectively [19]. At the same year, our group also performed GWAS using a larger scale of population with higher marker density (508 lines with 0.5 million of SNP markers) [13]. Under the stringent threshold P < 1.04 × 10 − 5 , nine variants significantly associated with HN, HW and HL were identified [13]. In 2018, the linkage mapping integrated with GWAS revealed five candidate genes related to HL and HN [14]. In 2020, utilizing denser marker (1.25 million) coupled with advanced statistical method, the other five candidate genes associated with HN and HT were detected [2]. Overall, these studies have unambiguously addressed that husk traits are complex and genetically controlled by multiple genes.
Teosinte (Zea mays ssp. parviglumis) is the wild progenitor of maize [20][21][22]. It exhibits significant resistance to cold [23,24], drought [23], waterlogging [25,26], pests [27] and diseases [27]. Maize-teosinte populations have been emerged as the ideal materials to identify important genes or QTLs related to multifaced maize traits. In addition, it is helpful to reveal the genetic mechanism of maize adapting to domestication and facilitate continued improvement of maize yield and quality [28][29][30][31][32]. In this study, we utilized a maizeteosinte population (MX) to analyze the genetic basis of three phenotypic husk traits, including HL, HN and HW. In addition, we positioned the large-effect QTL intervals using the bin map and predicted candidate genes associated with husk traits. A total of 4 QTLs were mapped out and 6 candidate genes were identified.

Phenotypic variation and heritability of husk traits
The phenotypic variation and heritability of three husk traits in the parental line Mo17 and the recombinant inbred line (RIL) population in three environments were summarized in Table 1. Analyses of the best unbiased linear predictive (BLUP) values revealed that there was a broad range of phenotypic variation while the mean values were close to Mo17-parent value for all the three traits ( Table 1). The three husk traits exhibited continuous and approximately normal distributions (Fig. 1). HL and HW were positively correlated, suggesting that the husk growth and development were coordinated in the aspects of length and width. The calculation of Broad-sense heritability revealed the high heritability for all the three husk traits (0.91, 0.86, 0.86 for HL, HN and HW, respectively) ( Table 1), indicating that the majority of husk phenotypic variations are controlled by genetic factors and suitable for further QTL mapping analysis.

Identification of QTLs for three husk traits
With the ultra-high-density linkage maps, a total of four QTLs were identified after 1000 permutations with an empirical logarithm of the odds (LOD) threshold of 3.5, 4.0 and 3.5 for HL, HN and HW, respectively (Table 2 and Fig. 2). The average of QTL intervals was 6.0 Mb with a range of 5.1-8.9 Mb. For HL, one QTL (qHL-1-1) was detected on chromosome 1 and the phenotypic variation explained by this QTL was 9.6%. For HN, a total of two QTLs (qHN-1-1 and qHN-1-2) were identified on chromosome 1 and the phenotypic variation explained by each individual QTL was 8.9%, respectively. For HW, one QTL (qHW-3-1) was identified on chromosome 3 and explained 8.1% of phenotypic variation. All the four QTLs explained less than 10% of phenotypic variation, indicating that HL, HN and HW are controlled by multiple small-effect QTLs in this BC 2 F 8 teosinte-maize population.

Genetic overlap of QTL in MX and other RIL populations
To assess the genetic overlap related to different husk traits, a 1.5-LOD support interval of QTL for HL, HW, and HN in MX population and the other three RIL populations [14] were compared (Fig. 3). This analysis revealed a minimal number of overlap with only qHL-1-1 and qHN-1-2 with a HW QTL in BYK population. These results suggest that genetic loci controlling husk traits in the MX population may largely differ from the other RIL populations.

Identification of candidate genes and the corresponding tissue-specific expression pattern
To explore the candidate genes underlying husk traits, four QTLs were further narrowed by bin map analysis  Fig. 4). The physical distance of peak bins ranged from 0.54 Mb -2.72 Mb (Table 3). According to the annotation in the MaizeGDB database (www.maizegdb.org), a total of 10, 58, 62 and 16 protein-coding genes were identified within peak bin for qHL-1-1, qHN-1-1, qHN-1-2 and qHW-3-1, respectively. Next, the in-silico expression pattern analysis was performed using RNA-seq data collected from husk and other nine different tissues, which are published available in an online comparative RNA-seq expression platform (https://qteller.maizegdb. org) (Fig. 5). Judged from the specific and high expression in husk, a total of six candidate genes with annotated function were identified, including 1, 2, 2 and 1 for HL-1-1, HN-1-1, HN-1-2 and HW-3-1, respectively.

Genetic basis of husk traits in the MX population
All the three husk traits in the MX population exhibited a broad range of phenotypic variation with normal distribution. The genetic analysis showed that the heritability of three husk traits is fairly high, indicative of superior genetic effect in this population studied [33]. In addition, except for environment variation, none of significant difference were detected in genetic or interaction between genetic and environment within RILs. Considering this population was conducted by twice back-cross with Mo17, the high consistency of linkage maps among RILs may result in the genetic similarity among MX RILs [34]. Moreover, significantly positive correlation was observed between HL and HW, indicating that the growth and development of husk is coordinated in the dimension of length and width. By comparison, HN was not correlated with HL or HW, suggesting that the molecular pathways regulating the numerous initiations of husk layer may be independent from husk growth. The husk traits were regulated by one, two and one QTL with small-effects (8.1-9.6%), indicating that each of three husk traits is polygenic and controlled by multiple genes with small effects in the MX population. Interestingly, when compared four QTLs of husk traits in MX with QTLs identified previously in other maize linkage populations [14], we did not detect any overlap of QTLs for the same trait. This result implies that the genetic basis of husk morphology in the MX population is kind of unique compared to other maize linkage population, highlighting the power of the MX population to interpret the genetic variation, which may be never accomplished in the regular modern maize population.

Candidate genes underlying husk QTLs
To successfully obtain candidate genes, fine-mapping is considered as the general strategy in QTL study. However, it often takes long period for back-cross to get near   isogenic lines (NIL) lines. Bin map is an alternative strategy to fine-mapping the yield-associated loci applied in sorghum [35], rice [36], and maize [2,13,14,18]. In this study, four husk related QTLs were narrowed from the original 5.1 Mb -8.9 Mb interval to 0.54 Mb -2.72 Mb region according to a bin map. Within four peak bin intervals, there are a total of 102 putative protein-coding genes. By retrieving tissue-specific expression pattern, we could identify six candidate genes with known molecular function and highly expressed in husk. For qHL-1-1, the only candidate gene was GRMZM2G106928, which encodes Copper/zinc superoxide dismutase 2 (Cu/Zn SOD2) involving in the photosynthetic antioxidant system [37]. If this gene could be proved by the future functional study, it will provide evident that the photosynthesis may play a role in regulating husk development alike the foliar leaves. For qHN-1-1, two candidate genes GRMZM2G162749 and GRMZM2G034302 were identified, which encode Cycling DOF factor 1 (CDF1) and Sucrose transporter 2 (SUC2), respectively. It has reported in Arabidopsis that Cycling DOF factors are essential for a photoperiodic flowering response [38]. In our previous study, maize flowering time showed significantly positively correlated with HN [13]. In this scenario, it is likely that the maize CDF1 could control HN via mediating the flowering time. It is well known that SUC2 functions in transporting the sucrose into phloem vascular in crops [39]. Therefore, it is likely that sucrose transported by SUC2 plays a role in husk development.
For qHN-1-2, two candidate genes GRMZM2G032339 and GRMZM5G826714 were identified, which encode a MADS-box transcription factor and a COBRA-like extracellular glycosyl-phosphatidyl inositol-anchored protein, respectively. It has been reported that the MADS-box transcription factor plays a key role in plant flowering time and node number development [40,41].  Therefore, this MADS-box transcription factor may also regulate HN through mediating maize flowering time. COBRA-family proteins have been documented as regulators of cellulose biogenesis [42], and act as essential factors in anisotropic expansion via cellulose microfibril orientation of plant morphogenesis [43]. As HN is determined by inner husk organ elongation related to anisotropic expansion, it is conceivable that COBRA-family proteins are involved in the formation of husk. For qHW-3-1, the only candidate gene GRMZM2G037650 encodes a Myb-family transcription factor, which is known to participate in multifaced molecular pathways through regulating down-stream gene expression.

Importance of QTLs relevant to husk traits in maize genetic and breeding
As the wild ancestor of maize, teosinte exhibits many advantages relative to modern maize, such as significant resistance to biotic or abiotic stresses [23][24][25][26][27]. However, during the maize domestication, hundreds of genes lost. In this context, recovering and utilizing teosinte genes became a promising strategy to further improve modern maize satisfying the requirement of varieties growing in different area. Indeed, a recent study has demonstrated that introgressing the wild UPA2 allele originated from teosinte into modern hybrids could reduce leaf angle, leading to the enhanced yield under high-density condition [44]. The heavy coverage of maize husk offers nursery and healthy environment safeguarding the early stage of ear growth and development. However, it may turn into the major barrier against kernel dehydration after physiological maturation of maize, challenging the mechanical harvest of corn production. Till now, none of genes specially regulating husk development have been identified, raising a critical issue that we do not have any objective to fulfil gene editing. Therefore, the husk-relevant QTLs offers prospective routes to modify husk morphology through molecular marker-assisted selection in maize breeding program.

Conclusion
In this work, we describe the interpretation of the genetic basis and QTL mapping of three husk traits in a teosinte-maize population. A total of four QTLs underlying husk length and width as well as the number of husk layer were identified. Importantly, all four QTLs were not overlapped with other husk-relevant QTLs identified in the previous population. Therefore, the newly-identified QTLs in this study will greatly enlarge genomic targets to explore candidate genes regulating husk growth and development, and benefit the breeding program based on molecular marker-assisted selection to pursue new varieties with proper husk morphology.

Plant materials and phenotyping
The Each line was grown in a single-row plot with a row length of 250 cm and 60 cm between rows under natural field conditions. The details about husk trait measurement were described previously [13]. We declare that all the collections of plant and seed specimens related to this study were performed in accordance with the relevant guidelines and regulations by Ministry of Agriculture (MOA) of the People's Republic of China.

Analysis of phenotypic data
The phenotypic variation of husk traits was analyzed using R software 4.0.1 with the "aov" function (ANOVA). The model for the ANOVA was y = + i + j +, where i is the effect of ith genotype, j is the effect of the jth environment with error. The broad-sense heritability of husk traits was calculated as: h2 = G2/(G2 + GE2/n + 2/n) [46], where G2 is genetic variance, GE2 is the interaction of genotype with environment, 2 is the resident error and n is the number of the environments. The BLUP value was calculated using a linear mixed model. Both genotype and environment were treated as random effects in the R function "lme4".

Genotyping and constructing the bin map
The genotype of the MX population was obtained by utilizing the Illumina MaizeSNP50 array (Illumina Inc., San Diego, CA, USA) [47], containing 56,110 SNPs. Quality control was performed by removing monomorphic markers (MAF < 5%) with a missing rate higher than 10% by PLINK software [48]. Finally, 12,390 high-quality SNPs were selected to build the genetic linkage map with CarthaGene software [49] using Perl scripts in a Linux system (www.maizego.org/Resources. html). The details about the construction of genetic linkage maps has been described previously [34]. The cosegregating markers were merged into a bin. With the logarithm of the odds (LOD) of each bin marker, a bin map could be drawn following the physical position of bin marker.

QTL mapping
The QTLs were analyzed by composite interval mapping method implemented in Windows QTL Cartographer 2.5 [50]. Genome was scanned at every 1.0 cM interval between markers using a 10 cM window size. A forward and backward stepwise regression with five controlling markers was conducted to control background from flanking markers. The LOD threshold was determined by the 1000 permutations at a significance (P < 0.05) and used to identify the significant QTL [51]. With the 1.5-LOD support interval method, the confidence interval for each QTL position was estimated [52].

Gene annotation
QTLs were delimited to a single peak bin interval based on bin map. The protein-coding genes within intervals were listed according to MaizeGDB database (V2). Each of the corresponding gene were annotated by performing BLASTP searches at the NCBI (blast.ncbi.nlm.nih.gov/ Blast.cgi).
Tissue-specific expression pattern of candidate genes RNA-seq dataset from husk were downloaded from NCBI's Sequence Read Archive (SRA) database. The quality of RNA-seq reads were controlled by FastQC software. Sequence reads were aligned to B73 RefGen_ v2 by the TopHat (v2.1.0) pipeline with a built-in Bowtie (v0.12.9) program. Unique-mapped reads were retained for counting FPKM. All the RNA-seq datasets from other nine tissues were obtained by qTeller RNA-seq expression platform (https://qteller.maizegdb.org). Then the FPKM was calculated to TPM by the model: Values used in heat-map plot were the log 10 -transformed ratios of normalized TPM counts in husk relative to other tissues.