Meiotic recombination is a major source of genetic variation in eukaryotes. The role of recombination in evolution is recognized but little is known about how evolutionary forces affect the recombination pathway itself. Although the recombination pathway is fundamentally conserved across different species, genetic variation in recombination components and outcomes has been observed. Theoretical predictions and empirical studies suggest that changes in the recombination pathway are likely to provide adaptive abilities to populations experiencing directional or strong selection pressures, such as those occurring during species domestication. We hypothesized that adaptive changes in recombination may be associated with adaptive evolution patterns of genes involved in meiotic recombination.
To examine how maize evolution and domestication affected meiotic recombination genes, we studied patterns of sequence polymorphism and divergence in eleven genes controlling key steps in the meiotic recombination pathway in a diverse set of maize inbred lines and several accessions of teosinte, the wild ancestor of maize. We discovered that, even though the recombination genes generally exhibited high sequence conservation expected in a pathway controlling a key cellular process, they showed substantial levels and diverse patterns of sequence polymorphism. Among others, we found differences in sequence polymorphism patterns between tropical and temperate maize germplasms. Several recombination genes displayed patterns of polymorphism indicative of adaptive evolution.
Despite their ancient origin and overall sequence conservation, meiotic recombination genes can exhibit extensive and complex patterns of molecular evolution. Changes in these genes could affect the functioning of the recombination pathway, and may have contributed to the successful domestication of maize and its expansion to new cultivation areas.
Meiotic recombination produces genetic variation by creating new combinations of alleles, and facilitates purging of deleterious mutations from genomes and populations . While the role of recombination in evolution is well recognized, the evolution of the recombination pathway itself has received little attention. Components of the recombination pathway exhibit a high degree of overall conservation across eukaryotes , suggesting a preponderance of purifying selection during evolution. Nevertheless, frequencies of recombination events and their distribution across the genome differ between as well as within species [3–9]. Differences also exist between species in the presence or absence of certain recombination pathway components [2, 10]. These observations suggest that at least some recombination genes are under more flexible evolutionary constraints. Indeed, patterns of sequence polymorphisms indicative of positive selection have been found in meiotic and recombination-related genes in Drosophila melanogaster and D. simulans , as well as diploid and tetraploid Arabidopsis arenosa [12–14]. Furthermore, it has been proposed that environmental as well as genomic changes, such as whole-genome duplication, are triggers of adaptive changes in meiosis and meiotic recombination .
To gain more insight into the patterns of recombination gene evolution, we studied sequence polymorphism in genes involved in key steps of the meiotic recombination pathway in maize and teosinte. Modern maize is a product of a single domestication event from Balsas teosinte (Zea mays ssp. parviglumis) that occurred about 8700 years ago in the Balsas River valley in southern Mexico [16, 17]. Maize has maintained a substantial proportion (60–70%) of the genetic variation found in Z. mays ssp. parviglumis, and is more diverse than its more distantly related wild relative Z. luxurians [18–21]. However, during domestication and subsequent breeding, a small fraction of maize genes (“domestication genes”) have been subject to very strong selection pressures and have lost most if not all of their pre-domestication diversity [19, 20, 22, 23]. Theoretical predictions  as well as empirical studies  indicate that populations experiencing directional or strong selection pressures are likely to evolve increased recombination rates. Higher recombination rates should be of most value when selection is strong and genetic variability is limited , such as during domestication. Indeed, increases in meiotic recombination rates have been shown to accompany domestication of several plant species, including maize . Selection to alter recombination patterns might leave footprints in sequences of recombination genes. To examine this issue, we analyzed patterns of sequence polymorphisms in several genes controlling key steps of meiotic recombination in maize (Fig. 1).
Meiotic recombination is initiated by formation of double-strand breaks (DSBs) in chromosomal DNA by SPO11, a protein belonging to the topoisomerase family [28–30]. The DSBs are subsequently resected from 5′ to 3′ by a protein complex known in plants and animals as MRN to generate single-stranded DNA (ssDNA) overhangs [31, 32]. MRE11 is a key component of this complex. It possesses endonuclease, exonuclease, and helicase activities that directly facilitate the ssDNA overhang formation [32, 33]. The ssDNA ends are coated by two DNA strand-exchange proteins, RAD51A and DMC1, and invade homologous double-stranded DNA regions . Eventually, meiotic DSBs are repaired into either crossovers (COs) or non-crossovers (NCOs). A RecQ helicase SGS1, is one of the best studied of several regulators thought to control the CO/NCO decision . The functional homolog of SGS1 in plants is RECQ4 [36, 37]. CO formation is, furthermore, regulated by a phenomenon of CO interference, which prevents formation of multiple COs in the same chromosome region [38, 39]. In plants, as well as yeast and mammals, there are two CO types, type I COs that are subject to interference, and type II COs that are not [39–44]. The two CO classes are outcomes of parallel pathways, which are facilitated by different complexes of recombination proteins. MSH4 and MLH1 act in formation of type I COs [42, 45], while MUS81 is involved in type II CO formation [43, 46].
Overall, we examined sequence polymorphisms in eleven recombination genes in maize and teosinte. We found that, despite their overall conservation, most of the recombination genes exhibited detectable patterns of sequence evolution. Several genes showed polymorphism patterns indicative of adaptive evolution.
Genomic organization of maize meiotic recombination genes
To study patterns of sequence polymorphism in meiotic recombination genes, we selected genes encoding eight proteins that facilitate key steps of the recombination pathway (Fig. 1): DMC1, MLH1, MRE11, MSH4, MUS81, RAD51A, and SPO11. As the first step of the study, we investigated the presence and organization of these genes in the maize genome. MRE11 and RAD51A were studied in maize before and examined at the functional level [31, 47–49]. These studies showed that the maize genome contains two homologs of Mre11 (Mre11A and Mre11B) and Rad51A (Rad51A1 and Rad51A2). We confirmed these findings (Additional file 1: Table S1) using the complete draft of the maize genome sequence . DMC1, MLH1, MSH4, MUS81, RECQ4, and SPO11 have not yet been studied in maize at the functional level but they have been examined in Arabidopsis [29, 30, 36, 37, 42, 43, 45, 46, 51–53]. We identified maize homologs of the genes encoding these six proteins by searching maize genomic and EST sequence resources using TBLASTN with Arabidopsis protein sequences as queries. In Arabidopsis, DMC1, MLH1, MSH4, MUS81 are encoded by single genes. Analysis of the maize genome sequence revealed that Dmc1, Mlh1, and Msh4 were also present as single full-length genes in maize (Additional file 1: Table S1). In contrast, we found two sequence homologs of Mus81. These two genes, Mus81-1 and Mus81-2, shared a rather limited 46% identity and 62% similarity at the amino acid level.
The Arabidopsis genome contains two closely related copies of RECQ4, RECQ4A and RECQ4B [52, 54], which are products of a fairly recent duplication event . In contrast, maize, as well as most other plant species, have single RECQ4 homologs (Additional file 1: Figure S1).
SPO11 in Arabidopsis is represented by three isoforms thought to stem from an ancient gene duplication in now extinct basal eukaryotes [55, 56]. However, only two of the three SPO11 genes, SPO11-1 and SPO11-2, function in meiotic recombination [29, 30]. The maize genome contains single homologs of all three Arabidopsis SPO11 genes.
Origin of the duplicated recombination genes in maize
To examine if presence of duplicated copies of Mre11, Mus81-1, Rad51A, and Spo11 was unique to maize in the context of homologs of these genes present in other eukaryotes, we conducted phylogenetic analyses of amino acid sequences from several representative species using Bayesian as well as maximum parsimony (MP) methods (Fig. 2 and Additional file 1: Figure S1). Both methods produced essentially identical trees, except for SPO11, where the Bayesian tree provided a finer resolution of the phylogenetic relationships than the MP tree.
The maize genome is thought to be a product of an allopolyploidization event that occurred about 5 to 12 Mya between two ancestors that diverged from each other about 12 Mya [57–59]. Franklin et al. proposed that the presence of two Rad51A homologs in maize is a result of this duplication . However, phylogeny reconstructions revealed that Rad51A, as well as Mre11, became duplicated before the divergence of the maize and rice lineages (Fig. 2). Mus81 also has undergone a duplication before the maize-rice divergence but in the rice lineage the copy corresponding to maize Mus81-2 was subsequently lost. According to Tajima’s 1D relative rate test, maize Mus81-2 had a much accelerated evolution rate compared to Mus81-1 (P = 0.00465; Arabidopsis was used as outgroup). This finding created uncertainty whether Mus81-2 retained the same function in meiosis as Mus81-1. The Mus81-1 gene has been shown to play a role in recombination in rice . In contrast, no functional information exists for Mus81-2 in any species. Consequently, we decided to only use Mus81-1 in further analyses. Interestingly, Hartung et al.  also identified a second MUS81-like gene in Arabidopsis (At5g39770) but were unable to amplify any, even partial, cDNA using primers to different regions of the predicted mRNA, leading them to conclude that this sequence represented a non-functional pseudogene.
Reconstruction of the Spo11 phylogeny (Fig. 2) revealed that the three Spo11 genes present in maize, Spo11-1, Spo11-2, and Spo11-3, are orthologs of the three Arabidopsis SPO11 genes. Interestingly, the rice genome contains two additional homologs of SPO11, SPO11-4 and SPO11-5, which are not present in maize, sorghum, or any other plant species with a sequenced genome, and have been so far only identified in japonica and indica rice [61, 62]. These data suggest that both SPO11-4 and SPO11-5 originated after the maize-rice divergence. However, they do not appear to belong to either of the plant SPO11-1, SPO11-2, or SPO11-3 clades, so their exact origin is not clear.
Sequence diversity in maize recombination genes
As the first step to characterize polymorphism patterns in the recombination genes, we examined their sequence diversity in maize and teosintes. To do this, we sequenced eleven genes, Dmc1, Mlh1, Mre11A, Mre11B, Msh4, Mus81-1, Rad51A1, Rad51A2, Recq4, Spo11-1, and Spo11-2, from a set of 31 diverse maize inbred lines. The inbreds were selected to maximize genetic diversity  and included 25 of the 26 founders of the Nested Association Mapping (NAM) population, representing more than 80% of the allelic diversity of maize . The 31 inbreds included both tropical (A188, A344, CML52, CML69, CML103, CML228, CML247, CML277, CML322, CML333, Ki3, Ki11, M37W, Mo18w, NC350, NC358, and Tx303) and temperate (B73, B97, CO106, CO125, CO255, HP301, Il14h, Ky21, M162w, Mo17, MS71, Oh7b, Oh43, and P39) lines. We also used nine lines of Zea mays ssp. parviglumis (Balsas teosinte), the direct wild ancestor of cultivated maize [16, 17]. In addition, we included several more-distantly related teosinte accessions representing Z. mays ssp. mexicana, Z. mays ssp. huehuetenangensis, Z. diploperennis, and Z. luxurians. For each gene, we sequenced the entire coding region (Additional file 1: Table S1), up to 240 bp of the region upstream from the ATG codon, and between 826 and 3605 bp of intron fragments.
To characterize sequence diversity of the recombination genes, we focused on their coding regions, as our interest was to study sequence diversity patterns that may affect protein function. We examined nucleotide sequence diversity by calculating two commonly used diversity statistics: π, the average number of nucleotide differences per site between any two sequences in sample , and θW, which is a scaled measure of the number of polymorphic nucleotide sites per nucleotide . The calculations were conducted on a set of 25 maize inbreds that were in common for the eleven gene datasets. The nucleotide diversity estimates varied more than seven-fold among the genes (Table 1). Furthermore, we detected larger-than-four-fold differences between paralogs in the three duplicated genes, Mre11, Rad51A, and Spo11.
We also investigated sequence diversity at the amino acid level. To do this, we calculated the ratio of nucleotide diversity at non-synonymous vs. synonymous sites (πa/πs) and the number of polymorphic amino acid residues per 100 residues. Here, we also observed substantial differences among the genes, although the patterns were different from those of nucleotide diversity (Table 1).
In addition to studying the coding regions, we examined the 5′ and intron regions of the genes (Additional file 1: Figure S2). As expected, non-coding regions exhibited higher levels of sequence diversity than coding regions. We anticipated observing the highest level of sequence diversity in introns but found that in some genes the level of sequence polymorphism was higher in the 5′ regions than in introns. The average π value for the whole sequenced length of the eleven recombination genes was 0.0026, which is 60% lower than the maize genic site average of 0.0067 .
Overall, the analyses we performed revealed that different genes exhibit distinct sequence diversity patterns, rather than showing similar evolution trajectories, as generally exhibited by domestication genes. Some recombination genes in maize, including Rad51A1 and Spo11-1, showed low diversity in both DNA and amino acid sequences. In contrast, Rad51A2 had relatively high DNA as well as amino acid sequence diversity. In yet another group of genes, which included Mre11B and Mus81-1, the diversity at the nucleotide sequence level was not related to the diversity at the amino acid sequence level.
Sequence diversity in Z. mays ssp. parviglumis, tropical, and temperate maize inbreds
Following domestication, maize has adapted to a number of distinct environmental conditions after domestication and is now grown in many parts of the world. To gain insight into whether germplasms grown in different environments have accumulated different polymorphisms in meiotic recombination genes, we examined diversity rates separately in temperate and tropical maize inbreds. We also included in the analysis accessions of Z. mays ssp. parviglumis. Following its initial domestication from parviglumis and cultivation in Central America, maize was brought to North America about 4000 years ago . This migration resulted in a split into tropical and temperate germplasms. The set of 31 diverse maize inbreds used in our study contained 14 temperate and 17 tropical lines. To compare the three germplasm pools, we computed the π statistic to examine DNA sequence diversity of the genes’ coding regions and the πa/πs ratio to examine amino acid sequence diversity. Tropical maize is known to harbor higher overall genetic diversity than temperate maize [63, 68]. However, in the recombination genes, we observed a somewhat lower average π value (by about 15%) in the tropical inbred set than in the temperate set (Fig. 3a). Several genes, Dmc1, Rad51A2, Recq4, and Spo11-1, showed π values that were substantially lower in tropical germplasm than in temperate germplasm. On the other hand, tropical inbreds showed a higher average πa/πs ratio (Fig. 3b).
To further examine differences between temperate and tropical maize, we calculated the fixation index (FST), which is a measure of population differentiation based on genetic structure . FST value of one indicates that two populations do not share any genetic diversity, while low FST values suggest lack of genetic differentiation. Of the eleven genes, only Rad51A2 exhibited substantial differentiation between temperate and tropical inbreds (Fig. 3c).
Altogether, the data suggested that several recombination genes exhibited differences in sequence polymorphisms between temperate and tropical inbreds. However, only in Rad51A2 these differences were large enough to suggest clear differentiation between temperate and tropical maize.
Patterns of sequence divergence in recombination genes
A simple explanation for different recombination genes exhibiting different levels of sequence diversity may be that some recombination proteins exhibit fewer functional constraints and evolve faster than others. To investigate if this was indeed the reason behind our findings, we examined how fast different recombination genes evolve. To do this, we used two measures of sequence divergence. First, we calculated dN/dS, the ratio of the number of non-synonymous substitutions per non-synonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) relative to a closely related sister species, which is a measure of sequence change over a fairly short evolutionary time (Table 1). As an outgroup, we used Z. luxurians for all genes except Mus81-1, in which we used Z. diploperennis because we could not obtain a sufficiently long sequence fragment for Z. luxurians Mus81-1. We detected variation in the dN/dS ratios among the eleven recombination genes (Table 1). However, except Spo11-1, the ratios were relatively low (Table 1), which is consistent with presence of purifying selection. The higher dN/dS ratio for Spo11-1, indicative of directional selection, may be a reflection of the fact that eukaryotic SPO11 proteins exhibit sequence conservation mainly in several domains and motifs, while the rest of the protein sequence shows higher levels of divergence [55, 70, 71].
To gain a measure of how fast recombination proteins diverge across longer evolutionary time, we calculated K scale factors using several fairly distant taxa (Table 1). This analysis compared the overall sizes of phylogenetic trees for each recombination protein to an arbitrary reference tree. The K scale factor is a measurement of the overall size of a phylogenetic tree in comparison to an arbitrary reference tree. K scale factors are smaller for proteins diverging faster than the reference, while proteins that diverge slower than the reference exhibit larger K scale factors. Core recombination proteins are considered to be generally very conserved but we found that the proteins in our study differed among each other in the K scale factor values by more than five-fold (Table 1).
Altogether, the dN/dS and K scale factor analyses indicated that there were substantial differences in how fast different recombination genes evolved. Rad51A1 and Dmc1 exhibited the lowest levels of divergence while Mlh1, Recq4, Spo11-1, and Mus81-1 exhibited the highest divergence levels among the 11 genes. However, differences in divergence rates among the different genes were not good predictors of differences in sequence diversity within maize. Out of the five genes with above-average divergence rates (Mlh1, Mre11B, Mus81-1, Recq4, and Spo11-1) only Mlh1 and Mre11B exhibited above-average amino acid sequence diversity within maize. On the other hand, Rad51A2, one of the genes with low divergence rates, showed above-average diversity within maize.
Selection patterns in recombination genes in maize and teosinte
To gain better understanding of molecular evolution patterns of recombination genes, we examined protein coding regions of the eleven recombination genes for signatures of selection. Selection in regulatory gene regions is thought to be associated with adaptive changes during maize domestication in many classes of genes, for example those controlling plant architecture, such as teosinte branched 1 (tb1) . However, expression differences are unlikely to underlie evolution of meiotic recombination genes, since there is little evidence of tight transcriptional regulation of recombination genes in plants. Even genes that are tightly regulated in other taxa, such as Spo11 or Hop2, are ubiquitously expressed in plants [29, 73, 74].
To study selection patterns, we first examined the distribution of alleles in the population employing the commonly used frequency spectrum-based statistics Tajima’s D , and Fu and Li’s D and F  (Table 2). The three tests are based on a similar principle but are not identical and often provide complementary results. Tajima’s D is calculated using a difference between the mean number of pairwise differences sequences and the number of segregating sites. The Fu and Li’s D and F tests are based on similar algorithms but take into account the polarity of nucleotide changes relative to an outgroup. We used Z. luxurians as outgroup for all genes, except Mus81-1, for which we used Z. diploperennis. Tajima’s D and Fu and Li’s D and F tests found significant departures from neutral evolution in several maize genes. Dmc1, Msh4, Rad51A2, and Spo11-2, exhibited significant positive values of at least one of the tests, indicating that they could be under balancing (diversifying) selection [75, 76]. Mlh1 and Recq4 were marginally significant (P < 0.10). In contrast, Mre11B exhibited significant negative values, suggesting history of a selective sweep [75, 76].
For comparison, we conducted frequency spectrum-based tests for the Z. mays ssp. parviglumis accessions in the same manner as in maize. Here, we did not find significant departures from neutrality for any of the genes (Table 3). It should be noted, though, that the number of accessions that we used for Z. mays ssp. parviglumis was smaller than the number of accessions used in maize, which could have affected our results.
Frequency spectrum statistics are known to be sensitive to demographic factors. A population bottleneck can result in strongly positive test values whereas a population expansion may cause negative values of the statistics . To test the impact of selection vs. demographics, we used two tools to assess the statistical significance of the results of the three tests: (i) we compared Tajima’s D values for the eleven recombination genes to a genome-wide distribution of Tajima’s D values using an approach similar to the one employed to examine selection patterns in cell cycle genes in Arabidopsis , and (ii) we compared the values of the three statistics to critical values derived from coalescent simulations (CS) .
The comparison of Tajima’s D values for specific loci to a genome-wide distribution of the statistic is based on a tenet that while selection acts on individual loci, demographics is likely to have a genome-wide effect. We compared Tajima’s D values for the recombination genes to a genome-wide distribution of Tajima’s D values based on a survey of 703 random polymorphic maize loci by Wright et al. . This survey was conducted on a set of 14 maize inbreds, all of which were included in our inbred set. To conduct the comparison, we recalculated Tajima’s D values using only a subset of the 14 lines that were in common with the Wright et al.  study. Loci that showed no polymorphism in the Wright et al.  study were excluded from the dataset, as Tajima’s D statistic cannot be calculated for them. Tajima’s D values for Mre11B and Rad51A2 fell into the extreme 2.5% fractions of the genome-wide Tajima’s D distribution (Table 2), indicating that they are significant at P < 0.05. As we did not have a genome-wide outgroup sequence data we could not make a similar comparison for the Fu and Li’s D and F tests.
To further evaluate the statistical significance of the Tajima’s D, and Fu and Li’s D and F statistics, we examined the probability of obtaining our empirical values under conditions of neutral evolution in coalescent simulations using Hudson’s ms program . The simulations incorporated a population bottleneck under the parameters proposed for the maize domestication bottleneck by Wright et al. . These analyses (Table 2) showed that the empirical Tajima’s D values for Mre11B fell into the low extreme 2.5% of simulated Tajima’s D values, indicating that they were significant at P < 0.05. Rad51A2 was marginally significant (P < 0.07). Computation of CS-derived critical values for the Fu and Li’s F and D tests indicated that the empirical values for maize Mlh1 and Mre11B were unlikely to result from neutral evolution alone, even under a domestication bottleneck affecting population demographics (Table 2).
As a complementary approach to investigating signatures of selection, we also used the likelihood ratio test (LRT) , which is based on a very different principle than the frequency spectrum tests. LRT examines ω, the ratio of the non-synonymous substitution rate dN to the synonymous substitution rate dS in gene coding regions. This method allows different ω values for individual codons, as different regions in the protein sequence may be under very different selection pressures and constraints, and is highly sensitive in detecting adaptive selection signatures. LRT can be used for within-species comparisons as long as sequence diversity is high and the level of intragenic recombination is low . Therefore, prior to the analyses, we examined the data set for presence of intragenic recombination using the Genetic Algorithm for Recombination Detection (GARD) method . In all of the eleven genes, we found no or low levels of recombination frequencies that did not exceed the rates acceptable for LRT analyses . To further ensure that recombination did not affect the results of the test, for the genes where GARD detected recombination breakpoints, we individually tested the fragments separated by the recombination breakpoints. In each case, the results of the LRT analysis were identical to the results obtained using the entire gene coding region. For the LRT analysis, we only used lines in which the entire coding region of the gene was available. Overall, we found that coding regions of five genes, Mlh1, Mre11B, Mus81-1, Rad51A2, and Spo11-2, showed statistically significant signatures of positive selection (Table 4).
Presence of different selection patterns among maize inbreds
Because tropical and temperate maize inbreds differed in levels of sequence diversity in several recombination genes, we examined the patterns of selection separately in the tropical and temperate inbred pools using frequency spectrum tests. In Mlh1, Mre11B, and Mus81-1 we found statistically significant differences in frequency spectrum test results between tropical and temperate inbreds (Table 5). Mre11B and Mus81-1 showed departures from neutral evolution in the tropical inbred pool but not in the temperate pools whereas Mlh1 exhibited the opposite trend.
To further explore differences in evolution patterns among inbreds, we investigated whether different gene genealogy branches exhibited different dN/dS ratios. To conduct these analyses, we used LRT  to test whether any branches exhibit dN/dS values statistically different from those of other branches in the genealogy . We examined five genes, Mlh1, Mre11B, Mus81-1, Rad51A2, and Spo11-1, in which the overall dN/dS ratios indicated presence of positive selection. We found that for three of the genes, Mlh1, Mre11B, Mus81-1, a model with two ratios, each for different branches, was marginally statistically significant (0.05 < P < 0.1) compared to a one-ratio model. The branches showing elevated dN/dS ratios were those leading to CML52, CML69, CML333, CO255, and Ki11 for Mlh1, CML277 for Mre11B, and A344, CO255, Oh43, and P39 for Mus81-1.
Polymorphism and selection patterns in recombination genes in maize and teosintes
Even though most core recombination proteins exhibit high degree of sequence conservation across eukaryotes, our analyses revealed fairly substantial levels of diversity and a variety of selection patterns. Using two very different approaches to identify sequence patterns indicative of selection, LRT and frequency spectrum tests, we identified a largely overlapping set of genes as possible selection targets. Frequency spectrum tests revealed patterns indicative of selection in coding regions of four genes, Mlh1, Mre11B, Mus81-1, and Rad51A2. The four genes, along with Spo11-2, were also identified by LRT as subject to positive selection. These observations were further corroborated by analyses of sequence diversity and divergence patterns. Mlh1, Mre11B, and Mus81-1 displayed some of the highest sequence divergence rates among the recombination genes that we analyzed, which is consistent with them being subject to directional selection specifically in the maize lineage. Rad51A2 showed low sequence divergence, which is typical of Rad51A homologs, but its diversity rates in maize were above the average for the eleven genes, which is consistent with the gene being subject to diversifying selection. Because we examined a group of genes, results of a single test alone may not provide very strong evidence for a particular gene to be a selection target because of multiple testing. However, consistently identifying the same genes in different tests provides stronger evidence for them indeed experiencing selection pressure.
Presence of a selection signature at a specific locus using frequency spectrum tests may imply that the locus itself is a target of selection but it also may be caused by proximity to another locus that is a strong selection target (hitchhiking effect) . However, in maize, linkage disequilibrium (LD) decays quite rapidly  and the gene density is quite low, raging from 9 to 200 kbp per gene. Consequently, the hitchhiking effect is unlikely to be a major source of non-neutral evolution patterns. Furthermore, LRT, which produced very similar results to those of the frequency spectrum tests, is not sensitive to the hitchhiking effect .
To further explore the possibility of hitchhiking, we examined the locations of the 11 recombination genes with regard to regions of selective sweep previously identified in the maize genome [86, 87]. We found that Mre11B was in a known selective sweep region , while the other ten genes were 100 kbp away (RAD51A1) or more from regions of selective sweep. More detailed analyses of the sweep region encompassing Mre11B showed that Mre11B was the only bona fide gene located within the region, whereas all other ORFs represented transposons or low-confidence genes. Consequently, we believe that Mre11B may be the target of the selective sweep.
Previous analyses of genome-wide diversity patterns in maize have shown that a relatively small number of maize genes, about 2–4%, have experienced extremely strong selective sweeps during maize domestication [19, 20, 22]. These sweeps led to very severe diversity losses at the affected loci and their targets were mostly genes controlling plant architecture and critical agronomic traits (“domestication traits”) [22, 72]. The recombination genes examined in this study do not appear to be in the same category of selection targets and exhibit higher levels of polymorphism in maize than the domestication genes regulating agronomic and plant architecture traits. Consequently, the selective pressures experienced by the recombination genes are likely to be lower than those experienced by the domestication genes.
The predominant type of selection we uncovered was diversifying selection, suggesting that selection pressures experienced by recombination genes vary among maize lineages. This observation is consistent with the finding of differences in evolution patterns between the tropical and temperate inbred sets. It is conceivable that selection pressure affecting recombination genes is episodic. In some situations, increased meiotic recombination rates may be favored, as they facilitate formation of new gene combinations. In other lineages, lower recombination rates may help preserve linkage blocks of advantageous alleles. The fact that QTLs for recombination frequencies are not widely shared among diverse maize inbreds , although they are detected in individual mapping populations , provides additional credibility to this claim. Future analyses of larger sets of maize inbreds may help discern whether lineage-specific selection patterns in recombination genes indeed exist.
Potential functional consequences of sequence polymorphisms
One of the tests of adaptive evolution is identifying functional differences between polymorphic alleles of the gene. However, this kind of analysis is not currently feasible for recombination pathway components. Nevertheless, numerous functional domains have been identified in recombination proteins and their three-dimensional structures have been elucidated. To investigate if any of the amino acid polymorphisms that we identified could be associated with functional changes in the proteins, we examined the positions of the polymorphic residues in predicted three-dimensional proteins structures. These analyses identified several polymorphisms that have the potential to considerably affect protein structure. One of notable polymorphisms was an alanine to phenylalanine change in RAD51A2 at residue 110, which is located in a conserved ATP-binding site of the protein. We found that this residue flanks a small loop on the protein surface that is adjacent to the Mg2+-binding pocket in the ATPase domain (Fig. 4). Mg2+ binding is known to induce a conformational change in the RAD51 protein, which is required for DNA binding . Alanine is present at this residue in all eukaryotes that we examined, except for the two unusual RAD51A proteins in Physcomitrella , which instead contain phenylalanine. Another example of a polymorphism with a potential to result in a functional change was a substitution of highly conserved hydrophilic serine by hydrophobic glycine at position 330 in SPO11-2. This residue is located on the surface of the TOPRIM domain in the SPO11 protein, on the face of the protein that directly interacts with the DNA . Overall, sequence polymorphisms in recombination genes could, for example, be associated with increased meiotic recombination rates. They could also affect the distribution of crossovers across the genome or affect the frequency of ectopic recombination between repetitive or homeologous genome regions.
Several proteins involved in the meiotic recombination pathway, including MLH1, MRE11, MUS81, RAD51, and RECQ4, also function in somatic DNA repair. However, we did not find obvious differences between the evolution patterns of genes encoding these proteins and genes encoding proteins with exclusively meiotic functions, such as Dmc1, Msh4, and Spo11. It could be speculated that perhaps the meiotic functions are predominant in the dual-function genes or that meiotic functions are the ones that predominantly experienced selection pressures during maize evolution. Mutants in most meiotic recombination genes in plants do not show somatic defects, unless artificially exposed to genotoxic stress [43, 45, 52, 92, 93], suggesting that these genes are not absolutely required for somatic growth and development. However, growth conditions in controlled experimental environments may not mimic the conditions that plans encounter in their natural environments and evolution of somatic DNA repair functions of recombination genes cannot be categorically excluded. For example, maize could have experienced changing levels of genotoxic stress, such as UV radiation during its geographic diversification. Finally, many recombination proteins act as components of large protein complexes, and changes in some proteins could induce co-evolutionary changes in proteins that interact with them.
Duplicated recombination genes
Three of the recombination genes examined in this study, Mre11, Rad51, and Spo11, are present in the maize genome as duplicated copies. Two duplications (Rad51 and Mre11) likely took place at the base of the grass lineage and one (Spo11) in the ancestor of all extant eukaryotes. Interestingly, we did not find any duplications that would trace their origin to the allopolyploidization event that took place in the direct ancestry of maize . This observation is consistent with previous predictions, based on analyses of gene duplications in Arabidopsis, that DNA metabolism genes are preferentially subject to gene loss following whole-genome duplication .
We observed substantial differences in evolution patterns between the Mre11, Rad51A, and Spo11 paralogs. In all three cases, the paralogs exhibited large differences in the levels of sequence diversity and divergence. We should note, however, that even though we see commonalities in the behavior of the three duplicated gene pairs, the mechanisms of their evolution may not necessarily be the same because of the extreme difference in the ages of these duplication events (i.e. the Spo11 duplication being much more ancient that the Rad51 and Mre11 duplications).
Overall, our data suggested that the paralogs in the three pairs of duplicated maize recombination genes have acquired distinct functions, which is known to follow gene duplications , and are under differing selection pressures and constraints. Although the functions of the two Spo11 genes have not been studied in maize yet, in Arabidopsis both Spo11-1 and Spo11-2 are required for proper progression of recombination, which is consistent with each of the genes having a somewhat distinct function. In contrast, the two Rad51A genes in maize were found to be redundant in mutant analyses . Presence of different selection patterns in Rad51A1 and Rad51A2 suggests that differences in the functions of the two genes are likely to exist after all. Interestingly, a recent study in rice found that although both RAD51A1 and RAD51A2 can bind ssDNA and dsDNA, RAD51A2 exhibits much higher strand exchange activity than RAD51A1, indicating that the two proteins are functionally different .
Analysis of eleven genes controlling key steps in the meiotic recombination pathway in a diverse set of maize and teosinte lines uncovered substantial levels of sequence polymorphism in most of the genes and identified signatures of adaptive evolution in several of them. The data show that despite their ancient origin and overall conservation, recombination genes can exhibit extensive and complex patterns of molecular evolution. At least some of the sequence polymorphisms that we found have the potential to cause changes in the functioning of the recombination pathway. Such changes could have contributed to the successful domestication of maize and its subsequent expansion to new cultivation areas.
Six of the eleven recombination genes represented ancient gene duplications events. We found that the duplicated genes exhibited distinct patterns of sequence diversity even in the case of duplicates that appear to be functionally redundant in genetic experiments. These results show that evolutionary analyses are useful in complementing genetic analyses when studying functions of duplicated genes.
Sequence diversity in recombination pathway genes was examined in 31 maize inbreds and 14 teosinte accessions. The maize inbreds were: A188, A344, B73, B97, CML52, CML69, CML103, CML228, CML247, CML277, CML322, CML333, CO106, CO125, CO255, HP301, Il14h, Ki3, Ki11, Ky21, M37W, M162w, Mo17, Mo18w, MS71, NC350, NC358, Oh7b, Oh43, P39, and Tx303). The teosinte accessions included nine Z. mays ssp. parviglumis lines, eight of which came from a set developed by John Doebley (University of Wisconsin, Madison) (TIL01, TIL02, TIL05, TIL07, TIL11, TIL15, TIL16, and TIL17), and one from the CIMMYT collection (TL74A J2 K67-5). The other teosinte lines were Z. mays ssp. mexicana (K69-7 and BA93 WST 85-2), Z. mays ssp. huehuetenangensis (TL93B Teo Huehue), Z. diploperennis (JAL87 Las Joyas), and Z. luxurians (TL92B TEO-Guate). Seeds and/or tissue samples for all lines except A188, A344, and Mo17 were kindly provided by Ed Buckler (USDA-ARS and Cornell University, Ithaca, NY).
Genomic regions of maize recombination genes were identified in the whole-genome sequence of the maize B73 inbred  using sequences of known Arabidopsis and maize recombination genes as BLAST queries. Gene coding regions were delineated using full-length cDNA and EST sequences available in the GenBank. For Mlh1, Msh4, Mus81, and Recq4, only partial EST sequences were present in the GenBank. We determined the full coding regions of these genes using RT-PCR, which was performed as previously described .
In addition to full-length gene copies, we discovered in the maize genome truncated fragments of Dmc1, Mlh1, and Spo11-2 (Additional file 1: Table S1). These partial gene copies represented different fragments of the corresponding genes, always included several exons and introns, and exhibited nearly 100% sequence identity to the corresponding regions of their full-length relatives. They were flanked by DNA showing no similarities to the full-length genes. These observations suggested that the truncated copies of Dmc1, Mlh1, and Spo11-2 are relatively recent pseudogenes.
To obtain sequences of the eleven recombination genes from the set of maize inbreds and teosinte lines for sequence polymorphism analyses, PCR primers (Additional file 1: Table S3) were designed to amplify full-length genomic regions of each gene. Nearly all primers were designed to anneal in introns to obtain entire coding regions and ensure that only orthologous sequences were amplified. When selecting primer sites, we avoided regions present in multiple copies in the maize genome sequence.
Sequencing was performed directly on PCR products in both orientations with BigDye v3.1 (Applied Biosystems, Foster City, CA), and analyzed using the Applied Biosystems 3730 automated sequence analyzer. Manual sequence editing was conducted using Sequencher (Gene Codes Corp., Ann Arbor, MI).
Alignments of protein sequences from several species of eukaryotes were performed using ClustalX  and adjusted manually. Alignment gaps were excluded from analyses. Maximum parsimony analyses of protein sequences were conducted using PAUP 4.0 . Bayesian analyses were performed using MrBayes 3.1.2  using the Poisson model . TreeviewPPC  was used to display phylogenetic trees. To compare evolution rates between different branches of phylogenetic trees, we used the Tajima’s 1D relative rate test  implemented in MEGA4 .
Sequence divergence and diversity analyses
Nucleotide diversity measures: π  and θW  were calculated using DNAsp v.5 . K tree analysis was used to examine the rates of divergence of recombination genes across eukaryotes . One of the outcomes of this analysis is the K scale factor, which is a comparison of the overall sizes of two phylogenetic trees. We constructed Bayesian trees based on protein sequences of the recombination proteins and compared them to an arbitrarily selected reference tree constructed from concatenated sequences of all of the examined recombination proteins. To prevent the comparison from being confounded by differences in tree topology, we arbitrarily selected five fairly distantly related taxa, S. cerevisiae, human, Arabidopsis, rice, and the B73 inbred of maize. The K scale factor analysis was conducted with the Ktreedist_v1 program .
To examine DNA sequences for presence of selection signatures, we used the Tajima’s D , Fu and Li’s D , Fu and Li’s F  tests as implemented in DNAsp v.5 . We used coalescent simulations to validate the results of these tests and examine whether deviations from neutral evolution may have been caused by demographic factors rather than selection. To conduct coalescent simulations, we utilized Hudson’s ms program . We generated 10,000 coalescent simulations using previously described parameters . A conservative assumption of no intra-locus recombination was used in the simulations . The population mutation parameter θ was estimated from teosinte data. To simulate the domestication bottleneck, the value for the bottleneck severity parameter (k) was set for 2.45 . This parameter is the ratio of population size during bottleneck to bottleneck duration. Critical values for neutrality tests were computed using the msstats software (https://github.com/molpopgen/msstats) modified by Eli Stahl.
We also used the likelihood ratio test (LRT) , which examines ratios of non-synonymous to synonymous nucleotide substitution rates and conducts pair-wise comparisons between several models that describe different selection patterns defined by the ratio of non-synonymous (dN) to synonymous (dS) substitution rates to identify the best-fitting model for each gene. These patterns include purifying selection (dN/dS < 1), neutral evolution (dN/dS = 1), and positive selection (dN/dS > 1). The analysis was performed with the codeml program in the PAML package  using coding regions of the recombination genes. The LRT method is not reliable when recombination is frequent among the examined haplotypes . Therefore, prior to the LRT analysis, we determined that recombination frequencies in the coding region of each gene did not exceed the acceptable limits  using the Genetic Algorithm for Recombination Detection (GARD) method  conducted using a web interface (http://www.datamonkey.org/dataupload.php). This method identifies recombination breakpoints in sequence alignments by searching for phylogenetic incongruence.
Protein structure predictions
To determine the locations of polymorphic amino acid residues in three-dimensional protein structures, we conducted protein structure prediction analyses. Maize protein sequences were threaded using Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml) on the available empirical three-dimensional structures of the MutL transducer domain (MMDB ID: 10447) in MLH1, the phosphoesterase domain (MMDB ID: 34451) in MRE11, the ERCC domain (MMDB ID: 52594) in MSH4, the ERCC domain (MMDB ID: 52594) in MUS81, the BRC repeat (MMDB ID: 21264) in RAD51, the DEXDc domain (MMDB ID: 13107), the HELICc domain (MMDB ID: 12961), the REQC domain (MMDB ID: 36694), and the HRDC domain (MMDB ID: 34680) in SGS1, and the TOPRIM domain (MMDB ID: 11634) in SPO11.
Number of non-synonymous substitutions per non-synonymous site
Number of synonymous substitutions per synonymous site
Genetic Algorithm for Recombination Detection
Likelihood ratio test
Scaled measure of the number of polymorphic nucleotide sites per nucleotide
Nucleotide diversity; average number of nucleotide differences per site between any two sequences in sample
Nucleotide diversity at non-synonymous sites
Nucleotide diversity at synonymous sites
Gaut BS, Wright SI, Rizzon C, Dvorak J, Anderson LK. Recombination: an underappreciated factor in the evolution of plant genomes. Nat Rev Genet. 2007;8:77–84.
Koehler KE, Cherry JP, Lynn A, Hunt PA, Hassold TJ. Genetic control of mammalian meiotic recombination. I. Variation in exchange frequencies among males from inbred mouse strains. Genetics. 2002;162:297–306.
Piperno DR, Ranere AJ, Holst I, Iriarte J, Dickau R. Starch grain and phytolith evidence for early ninth millennium B.P. maize from the Central Balsas River Valley, Mexico. Proc Natl Acad Sci U S A. 2009;106:5019–24.
Yamasaki M, Tenaillon MI, Bi IV, Schroeder SG, Sanchez-Villeda H, Doebley JF, Gaut BS, McMullen MD. A large-scale screen for artificial selection in maize identifies candidate agronomic loci for domestication and crop improvement. Plant Cell. 2005;17:2859–72.
Waterworth WM, Altun C, Armstrong SJ, Roberts N, Dean PJ, Young K, Weil CF, Bray CM, West CE. NBS1 is involved in DNA repair and plays a synergistic role with ATM in mediating meiotic homologous recombination in plants. Plant J. 2007;52:41–52.
Buis J, Wu Y, Deng Y, Leddon J, Westfield G, Eckersdorff M, Sekiguchi JM, Chang S, Ferguson DO. Mre11 nuclease activity has essential roles in DNA repair and genomic stability distinct from ATM activation. Cell. 2008;135:85–96.
Mannuss A, Dukowic-Schulze S, Suer S, Hartung F, Pacher M, Puchta H. RAD5A, RECQ4A, and MUS81 have specific functions in homologous recombination and define different pathways of DNA repair in Arabidopsis thaliana. Plant Cell. 2010;22:3318–30.
Seguela-Arnaud M, Crismani W, Larcheveque C, Mazel J, Froger N, Choinard S, Lemhemdi A, Macaisne N, Van Leene J, Gevaert K, et al. Multiple mechanisms limit meiotic crossovers: TOP3alpha and two BLM homologs antagonize crossovers in parallel to FANCM. Proc Natl Acad Sci U S A. 2015;112:4713–8.
Mercier R, Jolivet S, Vezon D, Huppe E, Chelysheva L, Giovanni M, Nogue F, Doutriaux MP, Horlow C, Grelon M, et al. Two meiotic crossover classes cohabit in Arabidopsis: one is dependent on MER3, whereas the other one is not. Curr Biol. 2005;15:692–701.
Chen C, Zhang W, Timofejeva L, Gerardin Y, Ma H. The Arabidopsis ROCK-N-ROLLERS gene encodes a homolog of the yeast ATP-dependent DNA helicase MER3 and is required for normal meiotic crossover formation. Plant J. 2005;43:321–34.
Higgins JD, Armstrong SJ, Franklin FC, Jones GH. The Arabidopsis MutS homolog AtMSH4 functions at an early step in recombination: evidence for two classes of recombination in Arabidopsis. Genes Dev. 2004;18:2557–70.
Li J, Harper LC, Golubovskaya I, Wang CR, Weber D, Meeley RB, McElver J, Bowen B, Cande WZ, Schnable PS. Functional analysis of maize RAD51 in meiosis and double-strand break repair. Genetics. 2007;176:1469–82.
Couteau F, Belzile F, Horlow C, Grandjean O, Vezon D, Doutriaux M-P. Random chromosome segregation without meiotic arrest in both male and female meiocytes of a dmc1 mutant of Arabidopsis. Plant Cell. 1999;11:1623–34.
Higgins JD, Ferdous M, Osman K, Franklin FC. The RecQ helicase AtRECQ4A is required to remove inter-chromosomal telomeric connections that arise during meiotic recombination in Arabidopsis. Plant J. 2011;65:492–502.
Malik SB, Ramesh MA, Hulstrand AM, Logsdon Jr JM. Protist homologs of the meiotic Spo11 gene and topoisomerase VI reveal an evolutionary history of gene duplication and lineage-specific loss. Mol Biol Evol. 2007;24:2827–41.
Mimida N, Kitamoto H, Osakabe K, Nakashima M, Ito Y, Heyer WD, Toki S, Ichikawa H. Two alternatively spliced transcripts generated from OsMUS81, a rice homolog of yeast MUS81, are up-regulated by DNA-damaging treatments. Plant Cell Physiol. 2007;48:648–54.
Shingu Y, Tokai T, Agawa Y, Toyota K, Ahamed S, Kawagishi-Kobayashi M, Komatsu A, Mikawa T, Yamamoto MT, Wakasa K, et al. The double-stranded break-forming activity of plant SPO11s and a novel rice SPO11 revealed by a Drosophila bioassay. BMC Mol Biol. 2012;13:1.
McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, Flint-Garcia S, Thornsberry J, Acharya C, Bottoms C, et al. Genetic properties of the maize nested association mapping population. Science. 2009;325:737–40.
Pawlowski WP, Wang CJ, Golubovskaya IN, Szymaniak JM, Shi L, Hamant O, Zhu T, Harper L, Sheridan WF, Cande WZ. Maize AMEIOTIC1 is essential for multiple early meiotic processes and likely required for the initiation of meiosis. Proc Natl Acad Sci U S A. 2009;106:3603–8.
Sterken R, Kiekens R, Coppens E, Vercauteren I, Zabeau M, Inze D, Flowers J, Vuylsteke M. A population genomics study of the Arabidopsis core cell cycle genes shows the signature of natural selection. Plant Cell. 2009;21:2987–98.
Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, Doebley J, Elshire RJ, Gaut B, Geller L, Glaubitz JC, et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet. 2012;44:803–7.
Esch E, Szymaniak JM, Yates H, Pawlowski WP, Buckler ES. Using crossover breakpoints in recombinant inbred lines to identify quantitative trait loci controlling the global recombination frequency. Genetics. 2007;177:1851–8.
Markmann-Mulisch U, Hadi MZ, Koepchen K, Alonso JC, Russo VE, Schell J, Reiss B. The organization of Physcomitrella patens RAD51 genes is unique among eukaryotic organisms. Proc Natl Acad Sci U S A. 2002;99:2959–64.
Puizina J, Siroky J, Mokros P, Schweizer D, Riha K. Mre11 deficiency in Arabidopsis is associated with chromosomal instability in somatic cells and Spo11-dependent genome fragmentation during meiosis. Plant Cell. 2004;16:1968–78.
Li W, Chen C, Markmann-Mulisch U, Timofejeva L, Schmelzer E, Ma H, Reiss B. The Arabidopsis AtRAD51 gene is dispensable for vegetative development but required for meiosis. Proc Natl Acad Sci U S A. 2004;101:10596–601.
Bishop MJ, Friday AE. Tetropad relationships: the molecular evidence. In: Patterson C, editor. Molecules and morphology in evolution: conflict or compromise? Cambridge, England: Cambridge University Press; 1987. p. 123–39.
We thank Teresa Pawlowska, Susan McCouch, and members of the Pawlowska and Pawlowski labs for advice and comments.
This research was supported by the National Science Foundation grants DBI-0702454 and IOS-1025881, and a grant US-4828-15 from BARD, the United States–Israel Binational Agricultural Research and Development Fund.
Availability of data and materials
Sequences generated in this study can be found in the GenBank under accession numbers: KY423515–KY423555 (Dmc1), KY308233–KY308265 (Mlh1), KY308195–KY308232 (Mre11A), KY423556–KY423595 (Mre11B), KY423596–KY423637 (Msh4), KY423638–KY423665 (Mus81-1), KY423666–KY423707 (Rad51A1), KY423708–KY423748 (Rad51A2), KY423749–KY423784 (Recq4), KY423785–KY423827 (Spo11-1), and KY423828–KY423867 (Spo11-2).
GKS and WPP designed the study. GKS and TW performed the experiments. GKS and WPP conducted the analyses. GKS and WPP wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Authors and Affiliations
School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
Gaganpreet K. Sidhu, Tomasz Warzecha & Wojciech P. Pawlowski
Current address: Institute for Cancer Genetics, Columbia University, New York, NY, 10032, USA
Gaganpreet K. Sidhu
Permanent address: Department of Plant Breeding and Seed Science, Agricultural University, Krakow, Poland
Table S1. Maize genes encoding key meiotic recombination proteins. Table S2. Accession numbers for sequences from GenBank used in the phylogenetic analysis of recombination genes. Table S3. PCR primers that were used to amplify sequences of meiotic recombination genes from maize. Figure S1. Phylogeny reconstructions of the DMC1, MLH1, MSH4, and RECQ4 proteins in eukaryotes based on the Bayesian and maximum parsimony methods. Figure S2. Comparison of nucleotide sequence diversity rates (π) in the promoter, coding, and intron regions of meiotic recombination genes in Z. mays ssp. parviglumis and in tropical and temperate maize hybrids. (PDF 264 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.