Significance of the study for taro improvement
Farmers and breeders have selected crops for desirable phenotypes for many years [45], which leads to a loss of genetic and phenotypic variation. This selection is a major cause of genetic bottlenecks, especially when stress occurs [46]. For example, Markwei et al. [47] reported the loss of the cocoyam cultivar Amankani kyirepe and noted that others, such as Amankani fita and Amankani Serwaa, are at risk of being lost. Hence, evaluating genetic diversity and conducting genome-wide association studies are important steps towards genetic conservation and breeding programs for the crop.
Taro has a large genome, estimated at 4.08 Gbp [48]. However, the taro genome currently available in the NCBI database, submitted by the Jiangsu Academy of Agricultural Sciences, comprises only 2.2 Gbp assigned to chromosomes and 0.27 Gbp of sequence from unknown regions [49]. This is promising progress towards understanding taro genetics, but further sequencing is needed to obtain a high-quality reference genome. The incomplete reference might have contributed to the uneven distribution of SNPs across the chromosomes in this study. The size of the sequenced chromosomes also varied, from 212.14 Mbp (Chromosome 1) to 102.22 Mbp (Chromosome 12), which may be another cause of the uneven distribution of SNP markers across the chromosomes.
Southeast Nigerian taro has untapped phenotypic variability
Significant variability was observed among the landraces for all studied morphological traits. The phenotypic variation among landraces was also high, which is desirable for selection. Specifically, genetic variation that exceeds environmental variation among landraces indicates potential for selection on a given trait. Corm diameter, corm length, cormel diameter, cormel length, number of suckers per plant, plant height, yield per hectare, and yield per plant had a genetic coefficient of variation that exceeded the environmental variation. These traits might be used in clonal selection for further improvement of taro landraces. Similarly, Mukherjee et al. [50] reported high genotypic coefficient of variability (GCV) values for weight of cormels per plant and number of cormels per plant. Trait heritability ranged from medium to high, except for number of cormels per plant. Both heritability and genetic advance were high for corm diameter and yield per hectare. High heritability accompanied by high genetic advance indicates that clonal selection may be effective for improving such characters. The phenotypic coefficient of variability (PCV) was higher than the GCV for all the studied traits, but the differences were small except for the number of cormels per plant. This suggests that environmental effects constitute a minor portion of the total phenotypic variation in these traits [51].
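The variability parameters discussed above can be illustrated with a minimal sketch. It assumes the commonly used definitions (GCV and PCV as the square root of the genotypic or phenotypic variance divided by the trait mean, broad-sense heritability as the ratio of genotypic to phenotypic variance, and genetic advance with a selection differential of k = 2.06 at 5% selection intensity) and a simplified model in which the phenotypic variance is the sum of the genotypic and environmental variances. The function name and the input values are hypothetical and are not taken from this study.

```python
import math


def variability_parameters(var_g, var_e, mean, k=2.06):
    """Compute common variability parameters from variance components.

    Assumptions (not from the study): var_p = var_g + var_e, and
    k = 2.06 corresponds to a 5% selection intensity.
    """
    var_p = var_g + var_e                   # phenotypic variance (simplified)
    gcv = math.sqrt(var_g) / mean * 100     # genotypic coefficient of variation (%)
    pcv = math.sqrt(var_p) / mean * 100     # phenotypic coefficient of variation (%)
    ecv = math.sqrt(var_e) / mean * 100     # environmental coefficient of variation (%)
    h2 = var_g / var_p                      # broad-sense heritability (0 to 1)
    ga = k * math.sqrt(var_p) * h2          # expected genetic advance (trait units)
    gam = ga / mean * 100                   # genetic advance as % of the mean
    return {"GCV": gcv, "PCV": pcv, "ECV": ecv, "H2": h2, "GA": ga, "GAM": gam}


# Hypothetical variance components for one trait (illustration only).
example = variability_parameters(var_g=4.0, var_e=1.0, mean=10.0)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions PCV can never be smaller than GCV, and a small gap between the two corresponds to a small environmental variance, which is the pattern interpreted in the paragraph above.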
Collinear variables explained more SNP variation than single climatic variables
Although RDA and LFMM are efficient methods for identifying candidate SNPs associated with variability in environmental conditions [52, 53], no significant relationship was detected between any of the SNPs and the climatic variables (temperature and precipitation) alone. In total, only 10% of the SNP variation among the 92 Nigerian taro landraces was explained by the collinear effects of altitude, annual temperature, annual precipitation, and space. The largest share of SNP variation (4%) was explained by the collinear effect of annual temperature, altitude, and space. This suggests that collinear climatic variables are more important than single climatic variables in shaping variation for clinal adaptation in taro. A considerable percentage of the variance was not explained by either geographic location or climatic variables, implying that other factors, such as human activities or human habitation, may be important. According to recent studies, sorghum genetic structure has also been shaped by seed sharing and ethnolinguistic grouping [8, 54]. Markwei et al. [47] also reported that human selection based on people’s interests and their cultural communication habits has had a great impact on taro diversity and distribution in China. Taro seeds that are exchanged among farmers and grown as landraces often harbour unique genetic diversity [55].
Southeast Nigerian taro has low admixture
The success of plant breeding depends on access to landraces and wild relatives of crop species as new sources of variation [56]. Hence, knowledge of the genetic diversity and population structure of landraces is needed to access the reservoir of favourable alleles within landrace or wild germplasm. The collection (92 taro landraces) was grouped into four subpopulations with low admixture (6.5%) among individuals. The low admixture is likely due to low gene flow among subpopulations or among individuals within a subpopulation, which indicates that new genetic lineages are rarely introduced into the population. Different studies have reported that taro is not native to Africa and that it arrived through human migration as a single clone introduced from a single point of origin, followed by the accumulation of mutations that produced different multi-locus genotypes during dissemination [10, 11]. Such a narrow genetic base may lead to the loss of genetic resources during outbreaks (such as new pests and diseases or climatic changes). Recently, loss of genetic resources began with the outbreaks of taro leaf blight disease in West Africa, including Nigeria [57]. Hence, taro breeding through hybridization is important in Nigeria. However, taro is a clonally propagated crop with different ploidy levels (2n = 2x = 14, 28, 42) [58, 59]. One challenge is cross-pollination: taro flowers rarely, and its flower anatomy discourages natural pollination when it does. Nevertheless, Wilson and Cable [60] reported that applying gibberellic acid induces flower formation in taro, which increases the possibility of producing new taro varieties or hybrids. Another option is to introduce germplasm from the centre of origin or centre of diversity. Regions such as Oceania, New Guinea, and Hainan Island may have germplasm suitable for hybridization breeding [12, 61].
No single model is suitable for all studied traits in taro
The Q-Q plot shows how well the observed p-values fit the null hypothesis of no association between phenotype and SNP. Under the null hypothesis, the expected and observed distributions should overlap, and most SNPs should fall on the diagonal. Power et al. [62] reported that some SNP deviations may reflect inflated p-values due to population structure, but that very few SNPs deviate from the diagonal for a truly polygenic trait. Overall, the results showed that, across all the traits investigated, no single model controlled population-confounding effects better than the others. However, given the overall performance of the five models, Blink appeared to have a slight advantage over the other models and was selected for the subsequent GWAS evaluation.
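The Q-Q comparison described above can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the procedure used in this study: expected p-values are taken as (rank - 0.5)/n, and the genomic inflation factor is computed as the median observed 1-df chi-square statistic divided by its median under the null. The function names are hypothetical, and all p-values are assumed to lie in (0, 1].

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot.

    Expected p-values use the (rank - 0.5) / n plotting position,
    which is one common convention (an assumption here).
    """
    n = len(p_values)
    pairs = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected_p = (rank - 0.5) / n
        pairs.append((-math.log10(expected_p), -math.log10(p)))
    return pairs


def genomic_inflation(p_values):
    """Estimate the genomic inflation factor (lambda) from p-values.

    Each two-sided p-value is converted to a 1-df chi-square statistic
    via the squared normal quantile; lambda near 1 suggests little
    residual confounding, lambda well above 1 suggests inflation.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]   # requires 0 < p <= 1
    null_median = nd.inv_cdf(0.25) ** 2                 # about 0.455
    return median(chi2) / null_median


# Hypothetical p-values: a perfectly uniform set, as expected under the null.
uniform_p = [(i - 0.5) / 999 for i in range(1, 1000)]
print(round(genomic_inflation(uniform_p), 3))
```

Points from `qq_points` that lie on the diagonal correspond to SNPs that behave as the null hypothesis predicts; a systematic upward departure along the whole curve, together with a lambda well above 1, is the pattern usually attributed to uncorrected population structure.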
GEA identified locally adapted loci and candidate genes
Signatures of selection and local adaptation can be evaluated in populations, across entire genomes or genome samples, using population differentiation approaches (i.e., outliers) or associations with environmental variables that test the influence of biotic and abiotic factors on spatial genomic structure. A total of nine SNP markers were associated with environmental variables. Specifically, marker S_101024366 was significantly associated with all environmental variables except precipitation of the warmest quarter. The scaffold containing this marker has the GenBank accession number NMUH01001869.1 in NCBI. This scaffold contains six candidate genes (Fig. 6), all of which are annotated as hypothetical proteins in the taro genome. The nucleotide sequences of the genes were searched with BLAST in NCBI using default parameters. One of the genes, accession number MQL96045.1 (Taro_0284712), matched a homologous region in Diospyros lotus (date-plum) DNA associated with the astringency trait, with an E-value of 2e-15 and 85.86% identity. Taro_0284712 lies within the LD window (35 kb). Astringency is one of the most essential aspects of fruit sensory quality [63, 64] and might therefore have been a target of human selection. Astringency is dominant in tannin sorghums [65]. Traditional sorghum varieties with medium tannin levels (moderate astringency) are widely cultivated and used for staple foods and alcoholic beverages in eastern and southern Africa [66]. However, some African cultures prefer tannin sorghums (more astringent) because porridge made from them stays in the stomach longer, giving the farmer a feeling of fullness for most of the working day. Taro leaves are known to be astringent because of the plant's acridity [65].
Another significantly associated marker is S_100991964. It was associated with all precipitation variables: annual precipitation, precipitation of the wettest quarter, precipitation of the driest quarter, precipitation of the warmest quarter, and precipitation of the coldest quarter. This SNP marker is located on scaffold NMUH01002301.1 (Colocasia esculenta cultivar Niue isolate Niue_2 TARO_scaffold_002301). Seven genes, all annotated as hypothetical proteins in taro, were found within this scaffold (Fig. 7). Again, the nucleotide sequences of the genes were searched with BLAST in NCBI using default parameters. One of the genes (MQL99127.1, Taro_031845) is homologous to the diaminopimelate decarboxylase gene in several species (Hevea brasiliensis, Gossypium arboreum, Manihot esculenta, Jatropha curcas, Ricinus communis, Populus alba, and Citrus sinensis), with E-values from 2e-37 to 4e-19. Interestingly, the diaminopimelate decarboxylase gene is highly expressed under induced drought stress in different crops [67].
Another gene (MQL99126.1), co-located with the S_100991964 SNP marker, was found to be homologous to a cyclin-dependent kinase (CDK) gene in different plants (Populus alba, Daucus carota, Prosopis alba, Zingiber officinale, Glycine max, and Brassica rapa), with E-values ranging from 5e-08 to 4e-04. CDKs are core cell cycle regulators and play important roles in different aspects of plant growth and development [67, 68]. Several studies have indicated the involvement of CDKs in plant stress responses [68, 69, 70, 71]. Magwanga et al. [70] also reported that CDKF-4s and CDKG-2s may be primary regulators of drought responses in cotton.
The gene MQL99125.1, co-located with the S_100991964 SNP marker, has a region homologous to MYB transcription factor (MYB) genes in different plants (Anthurium andraeanum, Elaeis guineensis, Ricinus communis, Pinus radiata, Triticum aestivum, and Hordeum vulgare). MYB family transcription factors play crucial roles in the response to abiotic stresses [72, 73]. For instance, TaMYB31 is transcriptionally induced by drought stress in Arabidopsis thaliana [74].
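The candidate-gene lookups in this section depend on which annotated genes fall within the LD window around a significant SNP. A minimal sketch of that interval check follows. It assumes the 35 kb window extends on each side of the SNP, which the text does not specify, and the gene names and coordinates are hypothetical placeholders rather than positions from the taro annotation.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene on the scaffold
    end: int     # last base of the gene on the scaffold


def genes_in_window(snp_pos, genes, window_bp=35_000):
    """Return names of genes overlapping [snp_pos - window_bp, snp_pos + window_bp].

    Assumption: the LD window is applied symmetrically around the SNP.
    """
    lo = snp_pos - window_bp
    hi = snp_pos + window_bp
    # Two intervals overlap when each one starts before the other ends.
    return [g.name for g in genes if g.start <= hi and g.end >= lo]


# Hypothetical genes on one scaffold (illustration only).
genes = [
    Gene("gene_A", 10_000, 16_000),     # overlaps the lower edge of the window
    Gene("gene_B", 60_000, 64_000),     # fully inside the window
    Gene("gene_C", 120_000, 125_000),   # outside the window
]
hits = genes_in_window(50_000, genes)
print(hits)
```

Genes returned by such a check are only positional candidates; the homology searches reported above are what link them to putative functions.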
Genome-wide association study (GWAS)
Mapping traits in a taro population provides another opportunity to validate and use the SNP markers in further breeding programs. GWAS identified a total of 45, 40, and 34 significant SNP markers associated with the studied traits in the combined, year 1, and year 2 data sets, respectively. Of these, five markers were identified in two of the three data sets: S1_18891752, S3_100795476, S1_100584471, S1_100896936, and S2_100587991. Additionally, a single SNP marker (S2_100587991) was associated with both a climatic variable (mean temperature of warmest quarter) and a phenotypic trait (yield per hectare). S2_100587991 is located on scaffold NMUH01001840.1, which contains 17 genes identified as hypothetical proteins in the taro genome. Several genes within the 35 kb window of the five significant SNP markers are likewise identified as hypothetical proteins in the taro genome. The BLAST results are presented in detail in Table S5.
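Identifying markers that recur across data sets amounts to counting, for each marker, the number of data sets in which it is significant. The sketch below illustrates this under the assumption that each data set is available as a collection of significant marker names; the function name and the placeholder marker names are hypothetical and do not reproduce the marker lists of this study.

```python
from collections import Counter


def recurrent_markers(datasets, min_sets=2):
    """Return markers significant in at least `min_sets` data sets, sorted by name.

    `datasets` maps a data set label to an iterable of marker names.
    """
    counts = Counter(
        marker
        for markers in datasets.values()
        for marker in set(markers)       # count each marker once per data set
    )
    return sorted(m for m, c in counts.items() if c >= min_sets)


# Hypothetical significant-marker lists (placeholder names only).
datasets = {
    "combined": ["snp_a", "snp_b", "snp_c"],
    "year1": ["snp_a", "snp_d"],
    "year2": ["snp_b", "snp_e"],
}
print(recurrent_markers(datasets))
```

Markers that pass this filter are supported by more than one data set, which makes them more stable candidates for validation than markers detected in a single year.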