In a previous paper, the analysis of 12 plant proteomes revealed an occurrence of IDPs similar to that found in other eukaryotic organisms, and no differences in taxonomic distribution were observed for IDPs among plant species. However, in some cases, homologous sequences displayed variations in the frequency of disordered segments. The inspection of 5 representative plant proteomes performed in this work indicated that on average 36% of paralogues do not conserve their composition of disordered segments. These proteins seem to be involved in regulatory processes, as most IDPs are, and therefore there is no obvious functional argument to explain their differential conservation behaviour. This result fits well with a previous study in yeast, which reported that non-conserved disordered proteins cannot be clearly associated with any function and are expressed at low levels.
Gene duplication is a prominent feature of plant genome evolution, with likely implications for genetic diversity and adaptation, although there is no direct causal link between an adaptive phenotype and a specific gene duplication event because they usually occur at different times. Duplicate genes arise either by regional genomic events or by genome-wide polyploidization. In plants, the latter is the most common mechanism. For instance, in Arabidopsis, duplications most probably resulted from a single tetraploidization event that occurred some 65 million years ago. This phenomenon presumably involved most genomic regions, although it has been found that centromeric regions have significantly fewer duplicated genes than chromosome arms [25, 26]. In addition to these events, which are charted in physical maps, the available genetic maps expose the empirical recombination rates along each chromosome. It is known that recombination rates vary substantially along genomic regions. For instance, the average recombination rate ranges from 0.3 cM/Mb to 251 cM/Mb in A. thaliana and from 0.39 to 0.42 cM/Mb in O. sativa. Peak recombination rates can indicate hotspots, as opposed to regions of suppressed recombination (coldspots). An overall positive correlation between gene density and recombination rate has been reported in the model plant Brachypodium distachyon. In contrast, a negative correlation has been observed between gene density and the frequency of repetitive regions, and rearranged chromosomal segments that retained centromeric repetitive sequences.
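As an illustration of how a genetic map and a physical map combine into a local recombination rate, the following minimal sketch divides the genetic distance between adjacent markers (cM) by their physical distance (Mb). The marker positions and the function name `interval_rates` are hypothetical and are not taken from the data sets analysed in this work.

```python
# Hypothetical markers on one chromosome:
# (physical position in bp, genetic position in cM). Values are invented.
markers = [
    (1_000_000, 0.0),
    (2_000_000, 4.0),
    (4_000_000, 5.0),
    (4_500_000, 10.0),
]


def interval_rates(markers):
    """Return (start_bp, end_bp, rate in cM/Mb) for each pair of adjacent markers."""
    ordered = sorted(markers)  # sort by physical position
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        span_mb = (bp2 - bp1) / 1_000_000
        if span_mb <= 0:
            continue  # skip markers that share a physical position
        rates.append((bp1, bp2, (cm2 - cm1) / span_mb))
    return rates


rates = interval_rates(markers)
for start, end, rate in rates:
    print(f"{start}-{end}: {rate:.2f} cM/Mb")
```

Intervals with rates well above the chromosome-wide value would be candidate hotspots, and intervals with rates near zero candidate coldspots; the threshold separating them is a choice left open here.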
The analyses reported in this work show, for the first time, positive correlations between genetic recombination rates and protein disorder frequency in A. thaliana and O. sativa. Moreover, the results reveal that certain proteins with substantially more predicted disordered segments (i.e., 5–7 segments) than the average (i.e., 2–3 segments) are located within recombination hotspots (Figure 3B). These findings suggest that the physical location of paralogous genes along chromosomes could partially explain the differences found in their protein disorder composition. Genetic recombination could then be considered an evolutionary force contributing to structural disorder in proteins, at least in plants. Previous reports have already discussed a relationship between low complexity proteins (LCPs) and recombination rate in Plasmodium falciparum. Interestingly, in this parasite up to 50% of proteins are longer than their yeast orthologues, likely owing to insertions or expansions of LC regions.
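A correlation of this kind can be sketched with a rank-based (Spearman) coefficient computed over genomic windows. This is only an illustration: the choice of Spearman rather than another coefficient is an assumption, and the per-window values below are invented rather than taken from the results of this work.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-window values (not real data):
recombination_cm_per_mb = [0.5, 1.2, 3.0, 4.5, 8.0, 2.0]
disorder_fraction = [0.18, 0.22, 0.25, 0.31, 0.40, 0.21]

rho = spearman(recombination_cm_per_mb, disorder_fraction)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient is used in this sketch because recombination rates are typically skewed by a few hotspot windows, which would dominate a coefficient computed on raw values.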
Changes in genomic architecture are a formidable force in the evolution of plants, and structural chromosome rearrangements similar to those of A. lyrata and A. thaliana [21, 22] are frequent. As a side effect, these processes can drive domain sorting in proteins or the formation of novel domains. Indeed, it has been reported that a significant portion of the novel domains that emerged during evolution are highly disordered. Thus, the evolutionary increase of protein disorder could be driven by modular or domain exchanges. The link between intrinsic structural disorder and modularity has recently been investigated in the human genome, with the finding that high levels of disorder within proteins are encoded by symmetric exons, possibly derived from internal tandem duplications. The data in this work clearly indicate that disordered segments are mostly located outside annotated domains, with a similar frequency at both N- and C-termini, and a rather low occurrence in linker regions.
This paper reports strong positive correlations between G + C content in coding sequences and predicted protein disorder in 5 plant proteomes. This finding is in agreement with computational studies in Archaea and Bacteria, which established relationships between G + C composition and intrinsic protein disorder. During meiotic recombination, parental chromosomes undergo either large-scale genetic exchanges by crossover or small-scale exchanges by gene conversion. There is evidence that in some eukaryote species gene conversion affecting G/C:A/T heterozygous sites yields G/C alleles more frequently than A/T alleles. This process is known as GC-biased gene conversion (gBGC) and increases the GC content of recombining DNA over evolutionary time [34–37]. Indeed, gBGC is considered the major mechanism explaining the variation of G + C content within and between eukaryote genomes, as coding sequences rich in G + C bases have a higher content of Arg, Gly, Ala and Pro codons, precisely those amino acids overrepresented in IDPs [2, 15, 33, 38]. These composition differences explain the G + C content reported in this work for ordered and disordered regions.
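The codon argument above can be checked directly against the standard genetic code. The sketch below computes, for each amino acid, the mean G + C fraction of its synonymous codons. It assumes that all synonymous codons are used equally, which real genomes do not do, so the values are only a first approximation and not a result of this work.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... and "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mean_codon_gc(table):
    """Mean G + C fraction over the synonymous codons of each amino acid.

    Assumption: every synonymous codon is weighted equally (no codon usage bias).
    """
    per_aa = {}
    for codon, aa in table.items():
        if aa == "*":
            continue  # ignore stop codons
        gc_fraction = sum(base in "GC" for base in codon) / 3
        per_aa.setdefault(aa, []).append(gc_fraction)
    return {aa: sum(v) / len(v) for aa, v in per_aa.items()}


gc_by_aa = mean_codon_gc(CODON_TABLE)
ranked = sorted(gc_by_aa, key=gc_by_aa.get, reverse=True)
for aa in ranked:
    print(f"{aa}: {gc_by_aa[aa]:.3f}")
```

Under this equal-usage assumption, Ala, Gly and Pro (codons GCN, GGN and CCN) and Arg (CGN, AGA, AGG) are the four amino acids with the most G + C-rich codon sets, which is consistent with the relationship between G + C content and disorder-promoting residues discussed above.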
Previous papers have reported strong positive correlations between crossover rates and G + C composition in human, yeast, Caenorhabditis elegans, Drosophila melanogaster and two rice species [39–42]. In contrast, the work of Wu et al. on recombination hotspots and coldspots in O. sativa did not reveal a clear relationship between these two variables. Moreover, Pessia et al. found no significant correlations in the genomes of A. thaliana, P. trichocarpa and Vitis vinifera and even reported a negative correlation not consistent with gBGC in S. bicolor. A negative correlation was also reported for A. thaliana chromosome 4. At first sight, these apparent contradictions could suggest that the relationship between recombination and G + C composition might depend on the plant species. Yet, a review of these studies reveals that G + C measurements are not always comparable, and that recombination rates are estimated with different resolution thresholds. For instance, theoretical equilibrium G + C values cannot be directly compared to empirical G + C counts in sequenced genomes. Regarding this open question, this paper reports a significant but weak association between recombination and G + C content in A. thaliana and O. sativa. When a multiple regression analysis was carried out to delineate their influence on protein disorder, the effect of G + C content was clearly stronger than that of recombination. Taken together, these observations support a strong molecular-based dependency between protein disorder and G + C content, while suggesting a much weaker relationship between G + C and recombination. In other words, the codon composition of amino acid residues common in disordered segments translates directly into higher G + C values. However, the proposed link between gBGC and G + C content is much harder to capture with the kind of data used in this work.
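The comparison of two predictors in a multiple regression can be illustrated with standardized coefficients, which for two predictors have a closed form in terms of pairwise Pearson correlations. The sketch below is an assumption about how such a comparison might be set up, not the regression procedure used in this work, and all values are invented.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def standardized_betas(y, x1, x2):
    """Standardized coefficients of the least-squares fit y ~ x1 + x2.

    Uses the two-predictor closed form:
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2), and symmetrically for beta2.
    """
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    denom = 1 - r_12 ** 2  # zero only if the predictors are perfectly collinear
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Hypothetical per-window values (not real data):
gc_content = [0.40, 0.42, 0.44, 0.46, 0.48, 0.50]
recombination = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
disorder = [0.20, 0.22, 0.25, 0.26, 0.29, 0.31]

beta_gc, beta_rec = standardized_betas(disorder, gc_content, recombination)
print(f"beta(G+C) = {beta_gc:.3f}, beta(recombination) = {beta_rec:.3f}")
```

Because both predictors are put on the same scale, the absolute values of the two coefficients can be compared directly; when the predictors are themselves correlated, as G + C content and recombination are here, the coefficients describe the contribution of each one with the other held fixed.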