Expression quantitative trait loci infer the regulation of isoflavone accumulation in soybean (Glycine max L. Merr.) seed

Background Mapping expression quantitative trait loci (eQTL) of targeted genes represents a powerful and widely adopted approach to identify putative regulatory variants. Linking regulation differences to specific genes might assist in the identification of networks and interactions. The objective of this study is to identify eQTL underlying expression of four gene families encoding isoflavone synthetic enzymes involved in the phenylpropanoid pathway, which are phenylalanine ammonia-lyase (PAL; EC 4.3.1.5), chalcone synthase (CHS; EC 2.3.1.74), 2-hydroxyisoflavanone synthase (IFS; EC1.14.13.136) and flavanone 3-hydroxylase (F3H; EC 1.14.11.9). A population of 130 recombinant inbred lines (F5:11), derived from a cross between soybean cultivar ‘Zhongdou 27’ (high isoflavone) and ‘Jiunong 20’ (low isoflavone), and a total of 194 simple sequence repeat (SSR) markers were used in this study. Overlapped loci of eQTLs and phenotypic QTLs (pQTLs) were analyzed to identify the potential candidate genes underlying the accumulation of isoflavone in soybean seed. Results Thirty three eQTLs (thirteen cis-eQTLs and twenty trans-eQTLs) underlying the transcript abundance of the four gene families were identified on fifteen chromosomes. The eQTLs between Satt278-Sat_134, Sat_134-Sct_010 and Satt149-Sat_234 underlie the expression of both IFS and CHS genes. Five eQTL intervals were overlapped with pQTLs. A total of eleven candidate genes within the overlapped eQTL and pQTL were identified. Conclusions These results will be useful for the development of marker-assisted selection to breed soybean cultivars with high or low isoflavone contents and for map-based cloning of new isoflavone related genes.


Background
Soy food has been taken as a functional food because it contains many health beneficial molecules such as isoflavones [1]. Studies on human nutrition have shown that soybean isoflavones play an important role in preventing a number of chronic diseases [2,3]. Equally, isoflavones are critical factors in defending soybean crops against pests [4,5], in promoting nodulation by rhizobia [6], and in changing or adjusting the microorganisms around plant roots [7]. The major bioactive components of soybean isoflavones in human nutrition are daidzein (DZ), genistein (GT) and glycitein (GC). Isoflavone contents in soybean seed are inherited as complex quantitative traits [8][9][10][11]. Since soy seed isoflavones are regulated by multiple genetic factors, their concentrations in seed are highly variable [1,[12][13][14]. Over fifty QTLs underlying individual and/ or total soybean isoflavone content have been reported [8,[15][16][17][18][19][20][21][22][23]. However, only 12 of these QTLs were in genomic regions encoding isoflavone synthesis enzymes.
Regulating transcript abundance is an effective approach to improve phenotypes [36]. The integrated analysis of genotype and transcript abundance data for association with complex traits can be used to identify novel genetic pathways involved in complex traits. 'Expression QTL' (eQTL), first defined by Jansen and Nap [37], could identify the genetic determinants of transcript abundances and is widely used for investigating gene regulation pathways. This approach treats transcript abundance of individual genes as quantitative traits in a segregating population. The eQTL map information enables genetic regulatory networks to be modeled that can provide a better understanding of the underlying phenotypic variation. It has been successfully applied in humans [38][39][40], plants [41][42][43][44], yeasts [45,46], worms [47], flies [48], mice [49,50], pigs [51] and rats [52] populations. These studies showed that transcript abundance was highly heritable and could be linked to either a local locus (cis-eQTL) or a distant locus (trans-eQTL). Cis-eQTL is mapped to the same genomic location like an expressed gene (within 5 Mb), and trans-eQTL is mapped to a different genomic location from an expressed gene (>5 Mb or on different chromosomes) [40,53]. In general, cis-eQTL tends to produce stronger statistical associations than does by trans-eQTL [54]. This phenomenon is regarded as evidence of greater biological plausibility for the existence of true functional cis-eQTL [55]. Trans-eQTL could occur individually at a single genomic locus or could occur collectively as part of eQTL trans-bands [55]. This genomics approach has been employed to identify eQTL related genes in soybean [36,[56][57][58]. To date, no information concerning eQTLs underlying soybean isoflavone synthetic enzyme genes is available.
It has been proved that many enzymes in the phenylpropanoid pathway underlie QTLs that determine the accumulation of isoflavone contents in soybean seeds [11]. Meanwhile, the modification of enzyme encoded genes that are involved in phenylpropanoid pathway could promote the biosynthesis of isoflavone [31,35]. In this study, PAL, CHS, IFS and F3H in the phenylpropanoid pathway were selected as the target genes (TGs) to analyze isoflavone-relative eQTL. Potential candidate genes underlying the accumulation of isoflavone contents in soybean seed were also evaluated. In addition, overlapped loci both for eQTL and phenotypic QTL (pQTL) were identified.

Results
Total and individual isoflavone contents, target gene transcript abundance and correlation analysis Transcript abundances of target genes (TGs) between parents from R3 to R8 developmental stages were compared. Total and individual isoflavone contents and transcript abundances of TGs at R6 stage of soybean development were measured in the F 5:11 population. The results showed that significant differences among the transcript abundances of TGs between the two parents existed at the R6 stage. The phenotypic variation of individual and total isoflavones showed a continuous distribution (Table 1).
GT showed a high positive correlation coefficient with DZ (r = 0.762, P < 0.01; Table 2). The transcript abundance of PAL was positively correlated with both GT and TI, but exhibited no significant correlation with DZ and GC. The transcript abundance of CHS was positive correlated with DZ, GT and TI, but negatively associated with GC amount. The transcript abundance of IFS displayed a positive correlation with DZ, but showed no correlation with other isoflavone components. The transcript abundance of F3H showed significantly negative correlation with individual and total isoflavone contents.

eQTL analysis for four TGs
The linkage map that included 194 SSR markers (accepted by Molecular Biology Reports) and covered 2,312 cM with mean distance of about 12 cM between markers was used to identify eQTLs associated with the expression of the four TGs. Thirty-three eQTLs that appeared to underlie transcript abundance of the four TGs are detected and located on fifteen LGs (Table 3, Figure 1). Regarding to the locational relationships between the eQTL and the genes, thirteen of the eQTLs were cis-acting (within 5 Mb upstream or downstream of the genes) and twenty of the eQTLs were trans-acting (more than 5 Mb away or on different chromosomes) [40,53].
Among the identified eQTLs (Table 3), qPALB2_1 and qPALD2_1 were associated with PAL transcript abundance, and could explain 8.11% and 6.67% of the phenotypic variation, respectively. Eight eQTLs, underlying CHS transcript abundance, were located on six LGs, and could explain 2.07-15.65% of the phenotypic variation. qCHSDla_1 (Satt436-Sat_345, Gm01) was detected with a higher LOD score (8.64) in the regions where ciselements and CHS family genes were located.

Discussion
Soybean isoflavones have been broadly used in food, medicine, cosmetics and animal husbandry [59]. Increasing and decreasing seed isoflavone content will be an important target of soybean breeding. MAS based on genotype selection rather than solely on phenotype selection provides additional power for the selections during soybean breeding [60]. Cultivar 'Zhongdou 27' proved to have highisoflavone content (3,791 μg/g isoflavone in seed) as reported previously [16]. Meng et al. [19] identified two QTL underlying resistance to soybean aphid through leaf isoflavone-mediated antibiosis in soybean cultivar 'Zhongdou 27'. A number of pQTLs associated with seed isoflavone were identified in multiple environments from cultivar 'Zhongdou 27' using 194 SSR markers (accepted by Molecular Biology Reports). Therefore, 'Zhongdou 27' should be given more attention as an elite germplasm to improve soybean seed isoflavone concentration, disease and pest resistances. In our previous studies, some identified QTLs associated with individual/total isoflavone contents showed higher contribution to phenotypic variation. Some specific copies of genes (PAL, CHS, IFS, F3H) in the phenylpropanoid pathway were near or falling into these quantitative trait loci by browsing the reference genome sequence of Williams 82 (http://www.phytozome.net/soybean).
To investigate the regulation mechanism of isoflavone synthetic enzyme genes, the transcript abundances of PAL, CHS, IFS and F3H in the mapping population were examined, and the genomic regions affecting the expression of the TGs were identified using the eQTL methodology [61]. A global microarray eQTL analysis of a limited number of samples can be used for exploring functional and regulatory gene networks and for scanning cis-eQTL, whereas the subsequent analysis of a subset of likely cis-regulated genes by real-time RT-PCR in a larger number of samples may identify QTL region by targeting these positional candidate genes [62]. In this study, real-time PCR reactions were used to analyze the transcript abundance variations of the four TGs in the F 5:11 RI lines.
When combined with classical QTL phenotypes, correlation analysis can directly provide an overview of potential genes underlying isoflavone traits [63,64]. Through the comparison of the transcript abundances of the four TGs (PAL, CHS, IFS and F3H), the parents ('Zhongdou 27' and 'Jiunong 20') showed different patterns at the R6 stage. This observation was consistent with the previous report by Sarah et al. [65]. Significant correlations between the transcript abundances of TGs and isoflavone contents were found in developing seeds at the R6 stage, indicating that these genes could affect total and individual isoflavone accumulations (Table 2).
Previously, two major QTLs that affect isoflavone content across multiple environments were mapped on Gm05 (LG A1) and Gm08 (LG A2) by Gutierrez et al. [17] and Yang et al. [20], respectively. In the present work, one eQTL qIFSA2_1 (Sat_129-Sat_181) was mapped close to qGCA2_1 on Gm08 (LG A2) (Figure 1, Table 5). This result suggested that qIFSA2_1 might be a cis-enzyme related locus. Some of these identified eQTLs associated with seed isoflavone content did not coincide with the TGs, suggesting that the differences in TGs transcript abundances might be caused by several trans-acting factors [66].
In this study, since the 194 markers were not uniformly distributed, large gaps appeared with low marker density on chromosomes Gm02, 04, 13, 16 and 18, implying that more markers should be developed among these gaps and the authenticity of pQTL or eQTL should be further clarified. Among these gaps, special attention should be paid to eQTL qF3HDlb_2 on chromosome Gm02 and qIFSC1_1 on chromosome Gm04 because of their higher LOD score and contribution to phenotypic variation (Table 3). Overlapped loci of qF3HF_1 and qDZF_1, and genes that fall into this region should also be further clarified with more markers. Consequently, fine mapping on these intervals with more SSR or SNP markers and to determine the authenticity of these loci as well as the underlying genes were extremely essential in the future work.
The analysis of eQTL overlapped with pQTL suggested that the candidate genes or elements among the marker intervals could affect phenotypic traits [49,67,68]. Therefore, overlapped loci of eQTLs and pQTLs were analyzed to find the potential candidate genes affecting the accumulation of isoflavone contents in soybean seed. Five eQTL intervals were overlapped with pQTLs according to the comparison of genomic regions between pQTLs and eQTLs ( Table 5). These results indicated that some candidate genes or elements in these intervals could regulate the biosynthesis of isoflavone components, and affect their accumulation. Additionally, some eQTLs overlapped with other eQTLs or shared the same markers with pQTLs, suggesting that some candidate genes or elements were located near these loci. Several genes involved in isoflavone accumulation in soybean seed had been identified [22,27,31]. 11 candidate genes falling into the overlapped intervals of pQTL and eQTL were found (Table 4). Bolon et al. [58] identified eQTL for genes with seed-specific expression and discovered striking eQTL hotspots at distinct genomic intervals on chromosome Gm13. A chalcone isomerase (CHI3) and IFS2 gene were located in the same region identified by qGEN13 on Gm13 [11]. Another QTL for GC that encoded PAL and 4CL paralog was also reported on Gm13 [10,11]. In the present work, seven candidate genes on Gm13 (LG F) were identified, implying that there could be a hotspot of gene cluster that regulated seed isoflavone content on Gm13. Among them, CHS (Glyma13g09640.1) and FLS (Glyma13g02740.1) were identified on three overlapped loci, implying that they could interact or trans-regulate other genes in the phenylpropanoid pathway. Furthermore, PAL1 (Gly-ma13g20800.1) and IFS (Glyma13g24200.1) paralogs were identified within two overlapped loci. In the marker interval (Satt149-Sat_234) associated with qCHSF_1, qIFSF_2 and qGTF_2, both Glyma13g24200.1 and Gly-ma13g09640.1 were found to encode CHS and IFS, indicating that they could be the potential candidate genes. It was supposed that Glyma13g09640.1 could interact or trans-regulate the expression of IFS. However, the function of these potential candidate genes should be tested in future works.
Although open questions about the biology and applications of eQTL mapping still exist [69], there are considerable advances in the eQTL studies. Detailed analysis of eQTL combined with cluster analysis of transcript abundance and eventually gene expression patterns could assist map-based cloning of genes underlying these traits. Markers based on underlying genes are also desirable for MAS in soybean breeding programs. The mechanism underlying seed isoflavone synthesis and its accumulation may contribute to the development of marker-assisted selection for soybean cultivars with high or low isoflavone contents.

Conclusions
A total of thirty three eQTLs (thirteen cis-eQTLs and twenty trans-eQTLs) were identified on fifteen chromosomes. Five eQTL intervals were overlapped with pQTLs and a total of eleven candidate genes within the overlapped eQTL and pQTL were identified. These results might be beneficial for the development of markerassisted selection to breed soybean cultivars with high isoflavone contents.
To detect eQTL, the parents and the 130 F 5:11 RI lines were planted at Harbin, Heilongjiang Province, China, in 2011. Randomized complete block designs were used for all experiments with rows 3 m long, 0.65 m apart, and a space of 6 cm between plants. Mature and immature seeds in the reproductive stages (from soybean growth stage R3 to R8) [70] were harvested from a bulked sample collected from three plants in each plot. These samples were quantified for individual and total seed isoflavone contents and transcript abundances.

Isoflavone extraction and quantification
Approximately 150 g of soybean seed samples were ground to a fine power using a commercial coffee grinder. Isoflavones were extracted from flour and separated using HPLC as described previously [16]. Measurements were done as micrograms of isoflavone per gram of seeds plus and minus the standard deviations (μg/g ± SD).

Synthesis of cDNA, Real-Time PCR and data collection
To investigate the expressions of four TGs, total RNA was isolated from soybean seed samples from R3 to R8 stages using plant RNA purification reagent Kit (D9108A, TaKaRa, Japan). RNAs were transcribed to cDNA using the first strand DNA synthesis reagent Kit (D6110A, TaKaRa, Otsu, Shiga, Japan). Four TGs (PAL, GenBank accession: GQ220305; CHS, GenBank accession: EU526827; IFS, GenBank accession: FJ770473 and F3H, GenBank accession: AY595420) in the phenylpropanoid pathway, were selected to analyze the transcript abundance variations in the F 5:11 RI line population. These four TGs were analyzed by real-time PCR (Kit DRR081A, TaKaRa, Japan). Genespecific primers for expression analysis of the four TGs were listed in Table 6. Primer specificity was confirmed based on each primer pair sequence against soybean genome sequences by BLASTing (http://www.phytozome.net/ soybean) using the BLASTN algorithm. Moreover, through the BLASTN of the sequences of the TGs, PAL2 (located on Gm10 (LG O)) of the PAL gene family, CHS8 (located on Gm11 (LG B1)) of the CHS gene family, IFS1 (located on Gm07 (LG M)) of the IFS gene family, and F3H1 and F3H2 (located on Gm02 (LG D1b)) of the F3H gene family were amplified [11].
PCR amplification was performed as follows: 95°C for 60 s, followed by 40 cycles of 95°C for 11 s, 60°C for 12 s and 72°C for 18 s. The soybean actin4 (GenBank accession: AF049106) gene was used as a reference to quantify the expression levels of the target genes [71]. Three replicates for each reaction were performed. The relative transcript abundance of TGs in different samples was calculated using 2 -ΔΔCt method [72], defined as: ΔCt = Ct (target) -Ct (actin). Pearson correlations between total/individual isoflavone contents and the expression of the four TGs in F 5:11 RILs were evaluated using SAS 8.2 (Cary, NC, USA) [73].

Identification of genomic region of target genes
The whole genome sequence Glyma1 assembly for Williams 82 [74] provided a powerful tool for interrogating QTL data. Previously reported genes for isoflavone biosynthesis [75] were used in BLAST searches against the whole genome sequence to identify homologous regions in the genome with assigned or putative functions. All twenty soybean chromosomes have regions sharing a high percentage of homology with genes of known function in the phenylpropanoid pathway [11]. The coding regions of TGs were compared with genome of Williams 82 through BLAST (E-value ≤ 1.0E-05, http://www.phytozome.net/soybean) to identify homologous regions.

eQTL analysis
In previous work, fifteen QTL underlying seed isoflavone contents of soybean were identified based on RI line populations derived from a cross between 'Zhongdou 27' (high isoflavone) and 'Jiunong 20' (low isoflavone) through a genetic linkage map including 99 SSR markers [16]. Another 95 SSR markers were added to the map of Zeng et al. [16] to identify novel phenotypic QTLs (pQTLs) associated with seed isoflavone contents of soybean (accepted by Molecular Biology Reports). In this study, 194 polymorphic markers were assembled onto the 20 linkage groups (LGs) by Mapmaker 3.0b with the Kosambi mapping function [76]. WinQTLCart2.1 [77] was used to detect eQTL between marker intervals by 1,000 permutations at significance (P ≤ 0.05). The genetic linkage map was  [78]. The nomenclature of the eQTLs/pQTLs included four parts following the recommendations of the Soybean Germplasm Coordination Committee. For example, qCHSF_1, q, CHS, F and 1 represent eQTL, trait (CHS), linkage group name and eQTL order in the linkage group, respectively.
Identification of candidate genes underlying overlapped loci of pQTL and eQTL Coincident genetic locations of eQTL and pQTL may be available to identify important regulatory genes underlying traits, and lead to the identification of molecular mechanisms [49,67,68]. Previous studies have combined eQTL and pQTL mapping to gain insight into regulatory pathways involved in determining phenotypic traits [49,68,[79][80][81]. eQTL located in the same marker intervals of pQTL might contribute to significant phenotypic variations [49,67,68]. In this study, thirty four phenotypic QTL (pQTL) identified with the 194 SSR markers were compared with eQTL to identify overlapped loci. Genetic map positions were estimated by identifying the nearest flanking SSR markers using the genome browser (http://www. soybase.org). The candidate genes underlying overlapped loci of pQTL and eQTL were identified by browsing after using BLAST search of flanking markers against the whole genome sequence of Williams 82 (available at: http:// www.phytozome.net/soybean).