Genetic recombination is associated with intrinsic disorder in plant proteomes
© Yruela and Contreras-Moreira; licensee BioMed Central Ltd.Contreras-Moreira 2013
Received: 24 May 2013
Accepted: 31 October 2013
Published: 9 November 2013
Intrinsically disordered proteins, found in all living organisms, are essential for basic cellular functions and complement the function of ordered proteins. It has been shown that protein disorder is linked to the G + C content of the genome. Furthermore, recent investigations have suggested that the evolutionary dynamics of the plant nucleus adds disordered segments to open reading frames alike, and these segments are not necessarily conserved among orthologous genes.
In the present work the distribution of intrinsically disordered proteins along the chromosomes of several representative plants was analyzed. The reported results support a non-random distribution of disordered proteins along the chromosomes of Arabidopsis thaliana and Oryza sativa, two model eudicot and monocot plant species, respectively. In fact, for most chromosomes positive correlations between the frequency of disordered segments of 30+ amino acids and both recombination rates and G + C content were observed.
These analyses demonstrate that the presence of disordered segments among plant proteins is associated with the rates of genetic recombination of their encoding genes. Altogether, these findings suggest that high recombination rates, as well as chromosomal rearrangements, could induce disordered segments in proteins during evolution.
KeywordsChromosome Evolution Intrinsically disordered proteins Orthologues Plant genome Recombination rate
A significant fraction of known eukaryotic genomes encode for proteins that contain regions that do not fold into a well-defined three-dimensional (3D) structure. These proteins are named intrinsically unstructured or disordered proteins (IDPs) and normally carry out signalling and regulatory functions [1–5]. These proteins might be either entirely disordered or partially disordered, characterised by regions spanning just a few (<10) consecutive disordered residues (loops in otherwise well-structured proteins) or long stretches (≥30) of contiguously disordered residues. It is thought that disordered regions confer dynamic flexibility to proteins, allowing transitions between different structural states . The possible utility of such regions was first proposed by Linus Pauling . More recently, computational predictions of disordered regions have discovered that IDPs are prevalent in proteomes, and have increased during evolution. Indeed, it is predicted that 30% to 60% of proteins contain stretches of 30 or more disordered residues, being multicellular eukaryotes more enriched in predicted disordered segments than unicellular eukaryotes and prokaryotes . These results suggest that proteome size, organism complexity and proteome disorder are related. Nevertheless, no overall correlations have been found apart from the clear gain in predicted disorder from prokaryotes to eukaryotes . The relationship between low complexity proteins (LCPs)  and recombination rate has been discussed . These authors suggested that the evolution of LCPs in malaria parasite Plasmodium falciparum might be related to their high genomic A + T content and recombination rates. However, although low complexity and intrinsic disorder share some structural and sequence similarities, they are distinct phenomena . There is evidence that the unstructured state, common to all living organisms, is essential for basic cellular functions linked with complex responses to environmental stimuli and communication between cells [9–14]. Moreover, structural disorder is critical for some protein-protein interactions, the assembly of large protein complexes and the modulation of protein activity.
The frequency of IDPs in 12 complete plant proteomes, including vascular plants, bryophyte and chlorophyta, has been previously estimated by applying the DISOPRED2 algorithm . That work focused on proteins encoded by genes transferred from the chloroplast to the nucleus and reported a strong correlation between the frequency of disorder of transferred and nuclear-encoded proteins, even for polypeptides that play functional roles back in the chloroplast. Moreover, it suggested that the distribution of disordered and non-disordered segments in proteins could be to some extent random, as it showed that orthologous proteins across different species do not necessarily conserve disordered segments, despite presumably carrying out similar functions.
The evolutionary history of IDPs seems to be multi-parametric, as high disorder content in proteomes has been linked to a variety of observations: i) high G + C content, (i.e., in order to explain some exceptions in bacteria, such as Mycobacterium tuberculosis, Myxococcus xanthus and Streptomyces coelicolor) [9, 16, 17]; ii) expanding multi-domain protein families ; iii) domain arrangements ; or iv) alternative splicing events . In order to gain additional biological and evolutionary insights into this topic, the frequency of long disordered segments among homologous proteins of 5 plant proteomes is further studied in this work. Likewise, the distribution of IDPs along the chromosomes of model eudicot (Arabidopsis thaliana) and monocot (Oryza sativa) plant species is analyzed. Finally, the possible correlations between structural disorder, genetic recombination rates and G + C content are investigated.
Distribution of disordered segments along proteins
Percentages of disordered residues 1 within and outside of protein domains, including linkers, N-terminal and C-terminal regions
Analysis of disorder in paralogous proteins of plants
Percentages of predicted disordered segments and their conservation across plant paralogues
Genetic recombination correlates with the evolutionary dynamics of disordered segments in A. thaliana and O. sativa
G + C content correlates with disordered protein segments in A. thaliana and O. sativa
An investigation about the relationship between gene G + C content and the frequency of disordered residues within the complete proteomes of A. thaliana and O. sativa was carried out. The G + C content of complete nucleotide sequences (G + Ctotal), disordered (G + Cdisordered) and ordered (G + Cordered) segments was calculated for all predicted IDPs (Additional file 1: Table S2). The results show that disordered regions are modestly but significantly enriched in G + C bases (+3.0% in O. sativa and +1.6% in A. thaliana, p < 0.01). Similar enrichments were observed in Sorghum bicolor (monocot) and Arabidopsis lyrata and Populus trichocarpa (eudicots) (Additional file 1: Table S2).
The G + Cdisordered frequency was plotted versus the frequency of disordered residues calculated for chromosome windows in monocot (2) and eudicot (3) plant species (Additional file 1: Figure S2). In the case of A. thaliana the obtained Pearson correlation coefficients were between r = 0.773 (p < 1E-4) for chromosome 5 and r = 0.928 (p < 1E-4) for chromosome 3. In the case of O. sativa the correlation coefficients were between r = 0.737 (p < 1E-4) for chromosome 8 and r = 0.869 (p < 1E-4) for chromosome 11. These results unveil a strong dependence (r2 = 0.78 for A. thaliana and r2 = 0.66 for O. sativa) of these two variables. Similar results were obtained for the eudicots A. lyrata (r2 = 0.84) and P. trichocarpa (r2 = 0.76) and the monocot S. bicolor (r2 = 0.80).
G + C content correlates with recombination rates in A. thaliana and O. sativa
In order to re-assess the dependencies among protein disorder, G + C content and recombination rates, a multiple regression analysis was performed. Taking the A. thaliana data a linear model was obtained with multiple r2 = 0.52, indicating that protein disorder was only significantly dependent on G + C content (p < 2E-16). The rice model (r2 = 0.46) confirmed the main contribution of G + C content to protein disorder (p < 2E-16), but also supported a minor, but significant role of recombination rates (p = 0.013). Note that gene density within A. thaliana chromosomal regions where recombination rates have been reported is about an order of magnitude larger than in O. sativa (see Methods). These analyses required the calculation of frequencies of disordered residues as explained in Methods (Additional file 1: Figures S3A and S3B).
In a previous paper, the analysis of 12 plant proteomes revealed a similar occurrence of IDPs to that found in other eukaryotic organisms , and concerning their taxonomic distribution, no differences were observed for IDPs among plant species. However, in some cases, homologous sequences displayed variations in the frequency of disordered segments. The inspection of 5 representative plant proteomes performed in this work indicated that on average 36% of paralogues do not conserve their composition of disordered segments. These proteins seem to be involved in regulatory processes, as most IDPs are, and therefore there is no obvious functional argument to explain their differential conservation behaviour. This result fits well with a previous study in yeast, which reported that non-conserved disordered proteins cannot be clearly associated with any function, and are expressed at low levels .
Gene duplication is a prominent feature in plant genome evolution with likely implications in genetic diversity and adaptation, although there is not a direct causal link between an adaptive phenotype and a specific gene duplication event because they usually occur at different times . Duplicate genes arise either by regional genomic events or genome-wide polyploidization. In plants, the last is the most common mechanism. For instance, in Arabidopsis, duplications most probably resulted from a single tetraploidization event occurred some 65 million years ago . This phenomenon presumably involved most genomic regions, although it has been found that centromeric regions have significantly fewer duplicated genes than chromosome arms [25, 26]. In addition to these events, which are charted in physical maps, the available genetic maps expose the empirical recombination rates along each chromosome. It is known that recombination rates vary substantially along genomic regions. For instance, the average recombination rate ranges from 0.3 cM/Mb to 251 cM/Mb in A. thaliana and from 0.39 to 0.42 cM/Mb in O. sativa. Peak recombination rates can indicate hotspots, which are opposed to regions of suppressed recombination (coldspots). An overall positive correlation between gene density and recombination rate has been reported in model plant Brachypodium distachyon. On the contrary, a negative correlation has been observed between gene density and the frequency of repetitive regions, and rearranged chromosomal segments that retained centromeric repetitive sequences .
The analyses reported in this work show, for the first time, positive correlations between genetic recombination rates and protein disorder frequency in A. thaliana and O. sativa. Moreover, the results expose that certain proteins with substantially more predicted disordered segments (i.e., 5– 7 segments) than the average (i.e., 2– 3 segments) are located within recombination hotspots  (Figure 3B). These findings suggest that the physical location of paralogous genes along chromosomes could partially explain the differences found in their protein disorder composition. Genetic recombination could then be considered an evolutionary force contributing to structural disorder in proteins, at least in plants. Previous reports already discussed a relationship between low complexity proteins (LCPs) and recombination rate in Plasmodium falciparum. Interestingly, in this parasite up to 50% proteins are longer than their yeast orthologues due likely to insertions or expansions of LC regions .
Changes in genomic architecture are a formidable force in the evolution of plants, and structural chromosome rearrangements similar to those of A. lyrata and A. thaliana[21, 22] are frequent. As a side-effect, these processes can drive domain sorting in proteins or the formation of novel domains . Indeed, it has been reported that a significant portion of emerged novel domains during evolution are highly disordered . Thus, evolutionary increase of protein disorder could be driven by modular or domain exchanges. The link between intrinsic structural disorder and modularity has been recently investigated in the human genome, finding that high levels of disorder within proteins are encoded by symmetric exons, possibly derived from internal tandem duplications . The data in this work clearly indicate that disordered segments are mostly located outside annotated domains, with a similar frequency at both N- and C- termini, and a rather low occurrence in linker regions.
This paper reports strong positive correlations between G + C content in coding sequences and predicted protein disorder in 5 plant proteomes. This finding is in agreement with computational studies in Archaea and Bacteria, which established relationships between G + C composition and intrinsic protein disorder . During meiotic recombination, parental chromosomes undergo either large-scale genetic exchanges by crossover or small-scale exchanges by gene conversion. There is evidence that in some eukaryote species gene conversion affecting G/C:A/T heterozygous sites yields more frequently G/C than A/T alleles. This process is known as GC-biased gene conversion (gBGC) and increases the GC content of recombining DNA over evolutionary time [34–37]. Indeed gBGC is considered the major mechanism explaining the variation of G + C content within and between eukaryote genomes, as coding sequences rich in G + C bases have a higher content of Arg, Gly, Ala and Pro codons, precisely those amino acids overrepresented in IDPs [2, 15, 33, 38]. These composition differences explain the G + C content reported in this work for ordered and disordered regions.
Previous papers have published strong positive correlations in human, yeast, Caenorhabditis elegans, Drosophila melanogaster and two rice species between crossover rates and G + C composition [39–42]. On the contrary, the work of Wu et al. about recombination hotspots and coldspots in O. sativa did not reveal a clear relationship between these two variables. Moreover, Pessia et al.  found no significant correlations in the genomes of A. thaliana, P. trichocarpa and Vitis vinifera and even reported a negative correlation not consistent with gBGC in S. bicolor. A negative correlation was also reported for A. thaliana chromosome 4 . At first sight these apparent contradictions could be telling that the relationship between recombination and G + C composition might be dependent on the plant species. Yet, a review of these studies reveals that G + C measurements are not always comparable, and that recombination rates are estimated with different resolution thresholds. For instance, theoretical equilibrium G + C values cannot be directly compared to empirical G + C counts in sequenced genomes. Regarding this open question, this paper reports a significant but weak association between recombination and G + C content in A. thaliana and O. sativa. When a multiple regression analysis was carried out to delineate their influence on protein disorder, clearly the effect of G + C content was stronger than recombination. Taken together, these observations support a strong molecular-based dependency of protein disorder and G + C content, while suggesting a much weaker relationship between G + C and recombination. In other words, codon composition of amino acid residues common in disordered segments is directly translated into higher G + C values. However, the proposed link between gBGC and G + C content is much harder to capture with the kind of data used in this work.
The results demonstrate that the presence of disordered segments among plant protiens is associated with the rates of genetic recombination of their encoding genes. High recombination rates, as well as chromosomal rearrangements, could induce disordered segments in proteins during evolution. Additionally, the results indicated a stronger molecular-based dependency of protein disorder and G + C content and much weaker dependency between G + C content and recombination rate in plant genomes.
Proteomic, GO and chromosome map databases
Complete plant proteomes, orthology and paralogy assignments and GO annotations for Arabidopsis thaliana (AT), Oryza sativa (OS), Populus trichocarpa (PT), Sorghum bicolor (SB) were retrieved from PLAZA v.1 (http://bioinformatics.psb.ugent.be/plaza_v1/), and Arabidopsis lyrata (AL) from PLAZA v.2.5 (http://bioinformatics.psb.ugent.be/plaza/). Note that the same data versions of a previous paper  were employed to facilitate comparisons to the results published here. Genetic maps from A. thaliana and O. sativa were retrieved from http://www.arabidopsis.org/servlets/mapper and http://rapdb.dna.affrc.go.jp/, respectively. Superfamily-defined (http://supfam.cs.bris.ac.uk) protein domains for A. thaliana and O. sativa were retrieved from http://bioinformatics.psb.ugent.be.
Empirical rates of recombination in A. thaliana and O. sativa were taken from Colomé-Tatché et al. and IRGSP/RAP build 5 annotation data (http://rapdb.dna.affrc.go.jp/), respectively. For Pearson correlation analyses, chromosomes were split in fragments corresponding to the regions of the genetic map where recombination rates had been empirically determined. The mean sizes of those fragments were 0.83 kb ± 0.14 and 0.14 ± 0.005 kb for A. thaliana and O. sativa, respectively. The mean numbers of contained genes were 178 ± 36 and 19 ± 7, respectively. Recombination rates (cM) were normalized by dividing by window size (Mb).
Predictions of intrinsic disorder
DISOPRED2 v2.42  disorder predictions were performed for all protein sequences annotated in 5 plant species. All input sequences, plus the reference database uniref90, were low-complexity filtered with PFILT and scanned with 3 iterations of blastpgp with an E-value cut-off of 0.001. Please check the previous paper for a benchmark on disorder predictions in plant proteins . In order to put these results on a genomic scale, and to correct for regions with distinct gene density, chromosomes were split in non-overlapping windows and the observed number of disordered segments of length (L) ≥30 amino acids in each window divided by the total number of open reading frames contained therein.
Frequencies of disordered residues were also computed for their direct comparison with windowed G + C contents, which are residue-based, in a multiple regression analysis, as explained below. These frequencies were defined as the number of disordered residues contained in segments of L ≥ 30 over the length of the encoding gene. It is worth mentioning that linear regressions using frequencies of disordered residues yield coefficients somewhat lower than those shown in Figure 3.
Calculation of G + C content
Three variants of G + C content were computed: i) G + Ctotal, the total number of G + C bases over the complete nucleotide sequence of a gene; ii) G + Cdisordered, the number of G + C bases spanning the predicted disordered segments in a gene; and iii) G + Cordered = G + Ctotal - G + Cdisordered. The G + Cdisordered frequency was defined as G + Cdisordered / gene_length.
Student’s t tests on G + C content were performed after checking the normality of data. The index of dispersion is defined as the ratio of the variance to the mean physical location of annotated genes, and was calculated for individual chromosomes. Multiple regression linear models of frequency of disordered residues as a function of both G + C content and recombination rates were calculated with the lm function of the R package (http://www.R-project.org) with A. thaliana and O. sativa data. The obtained linear models were subsequentially evaluated with ANOVA tests in order to assess the contribution of each variable. Heteroscedasticity and normality diagnostic plots were performed to validate the model.
We thank S. Begueria for help in statistical analysis. This work was supported by grants from Ministerio de Economía y Competitividad (MAT2011-23861) and Gobierno de Aragón (DGA-GC B18 to I.Y. and DGA-GC A06 to B.C.-M.). All these grants were partially funded by the EU FEDER Program. We acknowledge support from CSIC Open Access Publication Support Initiative, through its Unit of Information Resources for Research (URICI), to help cover the Open Access fee.
- Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ: Intrinsic protein disorder in complete genomes. Genome Inform. 2000, 11: 161-171.Google Scholar
- Dyson HJ, Wright PE: Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005, 6: 197-208.View ArticlePubMedGoogle Scholar
- Tompa P, Taylor and Francis Group: Structure and Function of Instrinsically Disordered Proteins. 2009, Boca Raton, FL: CRC PressView ArticleGoogle Scholar
- Schlessinger A, Schaefer C, Vicedo E, Schmidberger M, Punta M, Rost B: Protein disorder – a breakthrough invention of evolution?. Curr Opin Struct Biol. 2011, 21: 412-418.View ArticlePubMedGoogle Scholar
- Bellay J, Han S, Michaut M, Kim T, Costanzo M, Andrews BJ, Boone C, Bader GD, Myers CL, Kim PM: Bringing order to protein disorder through comparative genomics and genetic interactions. Genome Biol. 2011, 12: R14-PubMed CentralView ArticlePubMedGoogle Scholar
- Radivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD, Dunker AK: Protein flexibility and intrinsic disorder. Protein Sci. 2004, 13: 71-80.PubMed CentralView ArticlePubMedGoogle Scholar
- Pauling L: A theory of the structure and process of formation of antibodies. J Am Chem Soc. 1940, 62: 2643-2657.View ArticleGoogle Scholar
- Radivojak P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK: Intrinsic disorder and functional proteomics. Biophys J. 2007, 92: 1439-1456.View ArticleGoogle Scholar
- Schad E, Tompa P, Hegyi H: The relationship between proteome size, structural disorder and organism complexity. Genome Biol. 2011, 12: R120-PubMed CentralView ArticlePubMedGoogle Scholar
- Wootton JC: Non-globular domains in protein sequences: automated segmentation using complexity measures. Compt Chem. 1994, 17: 149-163.View ArticleGoogle Scholar
- De Pristo MA, Zilversmit MM, Hartl DL: On the abundance, amino acid composition, and evolutionary dynamics of low complexity regions in proteins. Gene. 2006, 378: 19-30.View ArticleGoogle Scholar
- Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK: Sequence complexity of disordered protein. Proteins. 2001, 42: 38-48.View ArticlePubMedGoogle Scholar
- Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK: Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol. 2002, 323: 573-584.View ArticlePubMedGoogle Scholar
- Tompa P: Intrinsically unstructured proteins. Trends Biochem Sci. 2002, 27: 527-533.View ArticlePubMedGoogle Scholar
- Yruela I, Contreras-Moreira B: Protein disorder in plants: a view from the chloroplast. BMC Plant Biol. 2012, 12: 165-PubMed CentralView ArticlePubMedGoogle Scholar
- Hegyi H, Tompa P: Increased structural disorder of proteins encoded on human sex chromosomes. Mol Biosyst. 2012, 8: 229-236.View ArticlePubMedGoogle Scholar
- Singer T, Fan Y, Chang HS, Zhu T, Hazen SP, Briggs SP: A high-resolution map of Arabidopsis recombinant inbred lines by whole-genome exon array hybridization. PLoS Genet. 2006, 15: e144-View ArticleGoogle Scholar
- Bornberg E, Albà MM: Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol. 2013, doi:10.1016/j.sbi.2013.02.012Google Scholar
- Light S, Elofsson A: The impact of splicing on protein domain architecture. Curr Opin Struct Biol. 2013, doi:10.1016/j.sbi.2013.02.013Google Scholar
- Dunker AK, Silman I, Uversky VN, Sussman JL: Function and structure of inherently disordered proteins. Curr Opin Struct Biol. 2008, 18: 756-764.View ArticlePubMedGoogle Scholar
- Kawabe A, Hansson B, Forrest A, Hagenblad J, Charlesworth D: Comparative gene mapping in Arabidopsis lyrata chromosomes 6 and 7 and A. thaliana chromosome IV: evolutionary history, rearrangements and local recombination rates. Genetical Res. 2006, 88: 45-56.View ArticleGoogle Scholar
- Hansson B, Kawabe A, Preuss S, Kuittinen H, Charlesworth D: Comparative gene mapping in Arabidopsis lyrata chromosomes 1 and 2 and the corresponding A. thaliana chromosome 1: recombination rates, rearrangements and centromere location. Genetical Res. 2006, 87: 75-85.View ArticleGoogle Scholar
- Lawton-Rauth A: Evolutionary dynamics of duplicate genes in plants. Mol Phylogenet Evol. 2003, 29: 396-409.View ArticleGoogle Scholar
- Ziolkowski PA, Blanc G, Sadowski J: Structural divergence of chromosomal segments that arose from successive duplication events in the Arabidopsis genome. Nucleic Acids Res. 2003, 31: 1339-1350.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang L, Gaut BS: Does recombination shape the distribution and evolution of tandemly arrayed genes (TAGs) in the Arabidopsis thaliana genome?. Genome Res. 2003, 13: 2533-2540.PubMed CentralView ArticlePubMedGoogle Scholar
- Kawabe A, Hansson B, Hagenblad J, Charlesworth D: Centromere locations and associated chromosome rearrangements in Arabidopsis lyrata and A. thaliana. Genetics. 2006, 173 (Forrest A): 1613-1619.PubMed CentralView ArticlePubMedGoogle Scholar
- Wu J, Mizuno H, Hayashi-Tsugane M, Ito Y, Chiden Y, Fujisawa M, Katagiri S, Saji S, Yoshiki S, Karasawa W, Yoshihara R, Hayashi A, Kobayashi H, Ito K, Hamada M, Okamoto M, Ikeno M, Ichikawa Y, Katayose Y, Yano M, Matsumoto T, Sasaki T: Physical maps and recombination frequency of six rice chromosomes. Plant J. 2003, 36: 720-730.View ArticlePubMedGoogle Scholar
- Huo N, Garvin DF, You FM, McMahon S, Luo MC, Gu YQ, Lazo GR, Vogel JP: Comparison of a high-density genetic linkage map to genome features in the model grass Brachypodium distachyon. Theor Appl Genet. 2011, 123: 455-464.View ArticlePubMedGoogle Scholar
- Colomé-Tatché M, Cortijo S, Wardenaar R, Morgado L, Lahouze B, Sarazin A, Etcheverry M, Martin A, Feng S, Duvernois-Berthet E, Labadie K, Wincker P, Jacobsen SE, Jansen RC, Colot V, Johannes F: Features of the Arabidopsis recombination landscape resulting from the combined loss of sequence variation and DNA methylation. Proc Natl Acad Sci U S A. 2012, 109: 16240-16245.PubMed CentralView ArticlePubMedGoogle Scholar
- Aravind L, Iyer LM, Wellems TE, Miller LH: Plasmodium biology: genomic gleanings. Cell. 2003, 115: 771-785.View ArticlePubMedGoogle Scholar
- Toll-Riera M, Albà MM: Emergence of novel domains in proteins. BMC Evol Biol. 2013, 13: 47-PubMed CentralView ArticlePubMedGoogle Scholar
- Schad E, Kalmar L, Tompa P: Exon-phase symmetry and intrinsic structural disorder promote modular evolution in the human genome. Nucleic Acids Res. 2013, 41: 4409-4422.PubMed CentralView ArticlePubMedGoogle Scholar
- Pavlović-Lažetić GM, Mitić NS, Kovačević JJ, Obradović Z, Malkov SN, Beljanski MV: Bioinformatics analysis of disordered proteins in prokaryotes. BMC Bioinforma. 2011, 12: 66-View ArticleGoogle Scholar
- Eyre-Walker A: Recombination and mammalian genome evolution. Proc Biol Sci. 1993, 22: 237-243.View ArticleGoogle Scholar
- Galtier N, Piganeau G, Mouchiroud D, Duret L: GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics. 2001, 159: 907-911.PubMed CentralPubMedGoogle Scholar
- Marais G: Biased gene conversion: implications for genome and sex evolution. Trends Genet. 2003, 19: 330-338.View ArticlePubMedGoogle Scholar
- Duret L, Galtier N: Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2003, 10: 285-311.View ArticleGoogle Scholar
- Uversky VN, Oldfield CJ, Dunker AK: Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Plant Physiol Plant Mol Biol. 2008, 37: 215-246.Google Scholar
- Meunier J, Duret L: Recombination drives the evolution of GC content in the human genome. Mol Biol Evol. 2004, 21: 984-990.View ArticlePubMedGoogle Scholar
- Muyle A, Serres-Giardi L, Ressayre A, Escobar J, Glémin S: GC-biased gene conversion and selection affect GC content in the Oryza genus (rice). Mol Biol Evol. 2011, 28: 2695-2706.View ArticlePubMedGoogle Scholar
- Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, Chen CF, Thomas MA, Haussler D, Jacob HJ: Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 2004, 14: 528-538.PubMed CentralView ArticlePubMedGoogle Scholar
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005, 14: 310-321.Google Scholar
- Pessia E, Popa A, Mousset S, Rezvoy C, Duret L, Marais GAB: Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol. 2012, 4: 675-682.View ArticlePubMedGoogle Scholar
- Drouaud J, Camilleri C, Bourguignon PY, Canaguier A, Bérard A, Vezon D, Giancola S, Brunel D, Colot V, Prum B, Quesneville H, Mézard C: Variation in crossing-over rates across chromosome 4 of Arabidopsis thaliana reveals the presence of meiotic recombination “hot spots”. Genome Res. 2006, 16: 106-114.PubMed CentralView ArticlePubMedGoogle Scholar
- Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT: The DISOPRED server for the prediction of protein disorder. Bioinformatics. 2004, 20: 2138-2139.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.