Evolutionary implications of inversions that have caused intra-strand parity in DNA
BMC Genomics volume 8, Article number: 160 (2007)
Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%C = %G), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule stating that the base proportions of double-stranded DNA are also reflected in single-stranded DNA (%A = %T, %C = %G) is more obscure, likely because its biological basis and significance are still unresolved. Within each strand, the symmetry of single nucleotide composition extends even further, being demonstrated in the balance of di-, tri-, and multi-nucleotides with their respective complementary oligonucleotides.
Here, we propose that inversions are sufficient to account for the symmetry within each single-stranded DNA. This concept is supported by the recent observation that inversions occur frequently. Human mitochondrial DNA does not demonstrate such intra-strand parity, and we consider how its different functional drivers may relate to our theory.
Along with chromosomal duplications, inversions must have been shaping the architecture of genomes since the origin of life.
The most famous of Chargaff's rules is that in DNA, the proportion of A equals that of T, and C that of G. This nucleotide balance is governed by complementary base-pairing rules fundamental to the structure of the double helix. Astonishingly, the nucleotides retain almost the same equality balance in either of the two single strands of DNA, and this phenomenon is sometimes named Chargaff's second parity rule [4–10]. Table 1 provides an illustration, with analysis of large contiguous segments from each human chromosome.
When there is no bias in mutation and selection between complementary strands, base substitution may explain the parity phenomenon [11, 12]. In fact, strand bias has been demonstrated with mutational skews between the two strands, which cause deviation from parity [13, 15]. Bacterial origins of replication were successfully identified by the distribution of such skews [16, 17]. The strand bias of mutations, which can be associated with direction of transcription, is also found in mammalian genomes [18, 19]. In spite of these anomalies, any violation of the second parity phenomenon is generally small in magnitude [8, 20].
Although different explanations for this parity phenomenon have been put forth, such as intra-strand base pairing, a simpler explanation for the rule may be DNA duplication and inversion [4, 8, 10]. If double-stranded DNA of any composition undergoes duplication followed by an inversion of the duplicated region, then each strand of the resulting DNA molecule would precisely satisfy Chargaff's second parity rule, so that %A = %T and %C = %G (Fig. 1A).
Not only single nucleotides but also oligonucleotides up to 30 nucleotides (nt) in length can demonstrate the parity phenomenon within strands [5, 7, 8]. In other words, the frequency of a particular oligonucleotide is approximately equal to that of its reverse complementary sequence in the same strand. Since DNA strands are complementary, the frequency of a particular oligonucleotide in one strand approximates that in the opposite strand. Hence, this double-stranded DNA characteristic can also be called "symmetry of complementary DNA strands" [5, 8]. Chargaff's second parity rule ordinarily considers only mononucleotides, which have been extensively studied. However, since a single nucleotide could be deemed a one-nt oligonucleotide, it is plausible that addressing the symmetry of oligonucleotides (high-order strand symmetry) is a more general way of assessing biological meaning. Hereafter, we designate this comprehensive symmetry as "intra-strand parity" and attempt to explain it based on the mechanism of chromosomal inversion. Single nucleotide mutations may be considered to explain mononucleotide parity within strands [11, 12] but have not been effective in explaining the extended parity of oligonucleotides.
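The duplication-plus-inversion argument can be checked on a toy sequence. The following is a minimal sketch, not the authors' code: the helper names and the short skewed sequence are illustrative assumptions. An inverted copy of a region appears on the same strand as its reverse complement, so appending it balances every base against its complement.

```python
from collections import Counter

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. how an inverted region reads on the same strand."""
    return seq.translate(COMPLEMENT)[::-1]

def duplicate_and_invert(seq: str) -> str:
    """Model a duplication followed by inversion of the duplicated copy (one strand shown)."""
    return seq + reverse_complement(seq)

original = "AAAAAGGGCA"  # hypothetical toy sequence with no parity (6 A, 0 T, 1 C, 3 G)
product = duplicate_and_invert(original)
counts = Counter(product)

# After duplication and inversion, A matches T and C matches G on this strand.
print(counts["A"], counts["T"], counts["C"], counts["G"])  # prints: 6 6 4 4
```

The balance is exact here only because the whole sequence was duplicated and inverted; partial inversions move a sequence toward parity gradually.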
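To make the notion of intra-strand parity for oligonucleotides concrete, the sketch below counts every overlapping k-mer on one strand and pairs it with the count of its reverse complement on that same strand. The function names and the two toy sequences are assumptions chosen for illustration; the Perl scripts used in the study are not reproduced here.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on the given strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def parity_pairs(seq: str, k: int) -> dict:
    """Map each observed k-mer to (its count, the count of its reverse complement)."""
    counts = kmer_counts(seq, k)
    return {kmer: (n, counts.get(reverse_complement(kmer), 0))
            for kmer, n in counts.items()}

# Hypothetical toy inputs: the first obeys dinucleotide parity, the second does not.
balanced = parity_pairs("AACGTT", 2)
skewed = parity_pairs("AAAAC", 2)
print(balanced)
print(skewed)
```

Under intra-strand parity the two numbers in each pair are approximately equal; a sequence that deviates from parity shows pairs such as `(3, 0)`.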
We propose that inversion events (with or without underlying duplications) might be a sufficient mechanism to explain the phenomenon. To test this, we consider a double-stranded DNA molecule without intra-strand parity but which is long enough to undergo various (stochastic) inversions (Fig. 1B). $A_n$ and $T_n$ are defined as the frequency of any particular oligonucleotide sequence and its reverse complementary sequence, respectively, in the same strand after $n$ inversions ($n > 0$). $A_0$ ($0 < A_0 < 1$) is the initial frequency of any particular oligonucleotide sequence (which can also be a mononucleotide) in the upper strand. $T_0$ ($0 < T_0 < 1$) is the initial frequency of its reverse complementary sequence in the same strand. If we define $r_n$ ($0 < r_n \ll 1$) as the relative length of the $n$th inversion (Fig. 1B), we obtain these two equations.
$$A_n = A_{n-1} - r_n (A_{n-1} - T_{n-1}) \tag{1}$$
$$T_n = T_{n-1} - r_n (T_{n-1} - A_{n-1}) \tag{2}$$
Equations (1) and (2) mean that each inversion moves $A_n$ toward $T_{n-1}$ and $T_n$ toward $A_{n-1}$. When the whole sequence is long enough, $r_n$ is close to 0. Nevertheless, whatever the size of the inverted regions, the frequency of any oligonucleotide sequence will eventually be homogenized between the two strands. In other words, $A_n$ and $T_n$ ultimately converge to the same value, regardless of $r_n$, as long as $r_n$ is stochastic (see mathematical derivation in Methods).
$$\lim_{n \to \infty} A_n = \lim_{n \to \infty} T_n = \frac{A_0 + T_0}{2} \tag{3}$$
Equation (3) is a mathematical explanation of intra-strand parity based on our hypothesis that inversions are sufficient to cause any DNA segment to conform to parity. In this way, the vast majority of naturally occurring DNA molecules (chromosomes) will evolve to intra-strand parity via many inversions. Those few that deviate, such as mitochondrial DNA (mtDNA) [8, 9, 17], will have special properties (see below). We presume that any DNA can be made to evolve to intra-strand parity through a process of inversions, and that deviations from parity have been rare in evolution. Inversions must have been occurring as genomes of ancestral organisms were growing in complexity with the acquisition or creation of new genes.
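The recurrences (1) and (2) can also be iterated numerically to watch the two frequencies converge. This is a minimal sketch under stated assumptions: the relative inversion lengths are drawn uniformly between 0 and a hypothetical `max_r`, and the starting frequencies 0.30 and 0.10 are arbitrary illustrative values, not data from the study.

```python
import random

def iterate_inversions(a0, t0, n_inversions, max_r=0.05, seed=0):
    """Apply recurrences (1) and (2) for n_inversions random relative lengths r_n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    a, t = a0, t0
    for _ in range(n_inversions):
        r = rng.uniform(0.0, max_r)  # assumed distribution of r_n
        # Both right-hand sides use the previous values of a and t.
        a, t = a - r * (a - t), t - r * (t - a)
    return a, t

a, t = iterate_inversions(0.30, 0.10, 2000)
print(a, t, a + t)
```

In this setup both values approach 0.20, the mean of the starting frequencies, while their sum stays at 0.40, which is the behaviour the derivation in Methods describes.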
The insertion of repetitive sequences was proposed to be a possible source underlying parity [8, 10]. However, removing apparent repeats from the human and other genomes prior to analysis (see Methods) did not alter the symmetry characteristics of the remaining sequences. (An example of a 28.6-Mb contig from human chromosome 21 is shown in Table 2). Therefore, it is unlikely that insertion of such sequences accounts for the intra-strand parity, either in humans or organisms that have fewer repetitive sequences in their genomes.
We employ radar charts to allow simple visual perception of the high-order symmetry and asymmetry of exemplary DNAs (Fig. 2). Mitochondria are thought to have been derived from bacteria. Mammalian mtDNA (Fig. 2C) is an exception that does not demonstrate intra-strand parity [8, 9, 17], whereas mtDNAs from plants and lower eukaryotes do. Mammalian mtDNA may have gradually deviated from its ancestral form. The small circular size, its unique replication mechanism, and extra-nuclear localization could introduce different selective pressures against tolerance of inversions and thus deviation from the more general observation of intra-strand parity.
The mammalian mtDNA offers a natural source of sequence sufficiently deviating from parity to allow us to further test our mathematical explanation. We produced in silico semi-random inversions in human mtDNA. As few as eight 1-kb regularly-distributed inversions (see Methods) would be sufficient to homogenize the two strands of the 16.6-kb mtDNA and create intra-strand parity (Fig. 2D). We also depict a hypothetical inversion in the mtDNA to show the potential for rapid homogenization (Fig. 1C).
Although the lack of intra-strand parity in mammalian mtDNA could be ascribed to its small length, other loci of comparable length (e.g. the TP53 gene, Fig. 2B) do adhere to parity. Unlike other mtDNAs, those of mammals have no intergenic segments and have only one regulatory region per strand. Moreover, unlike among nuclear genomes, the order and direction of genes – as well as biased gene density between the two strands – are strictly conserved among mammalian species. Therefore, it seems that the configuration is already fixed, and that inversions are not tolerated in mammalian mtDNA.
The ubiquity of inversions suggests that they had some advantage in natural selection. Duplications are thought to play an important role in creating genetic variety; however, some duplications are deleterious for organisms, due to sudden increases of gene dosage. To avoid being negatively selected, one of the duplicated copies could undergo mutation such as deletion. Inversions or interchromosomal rearrangements could render the duplicated gene nonfunctional due to its release from interaction with its promoter or other regulatory elements. This may be one reason why many inverted and interchromosomal segmental duplications are found in the human genome [25, 26]. An approximately symmetrical gene distribution between the two strands may have been brought about by these rearrangements.
In some cases, a rearranged genome might confer positive selection. Although we can find syntenic regions among vertebrates, chromosomal organizations can be quite different among species. This suggests an advantage for evolution or speciation. Recently, the importance of gene order and gene position in the three-dimensional nucleus has been suggested. It is likely that genomes continually undergo rearrangement toward optimal positions for each gene and each gene cluster. Our group showed an unexpectedly large number of inversions (from 23 bp to 62 Mb in size) between human and chimpanzee genomes, species which diverged only six million years ago. Although most may be selectively neutral, some likely were selected for, and contributed to the speciation. Many more inversions may also have occurred and may have been negatively selected. Inversions can also give rise to new transcripts, some of which will be selected for and become new genes. We identified hybrid transcripts of the AZGP1 and GJE1 genes on human chromosome 7 (manuscript in preparation) and are intrigued that the orthologues of these genes in non-primate mammals reside in a head-to-head manner. It is likely that the common ancestor of primates underwent inversion of the AZGP1 gene to produce the hybrid transcripts, creating an opportunity for primate diversity.
In summary, we propose that the relatively frequent occurrence and accumulation of inversions in genomes may be a major contributor to the phenomenon of intra-strand parity. Whereas single base substitutions might explain Chargaff's second parity rule at the level of mononucleotides, they can explain neither the high-order intra-strand parity nor the exceptional deviation of mammalian mtDNAs. In contrast, inversion events are not limited by size and can involve millions of bases of sequence. Other mechanisms may have contributed to some extent; nevertheless, they are not necessary to account for intra-strand parity if inversions are considered.
Inversions are one process contributing to genome evolution that allows for rearrangement toward optimal position, order, and orientation of genes and regulatory elements, and for escape from deleterious effects caused, for example, by some duplications. Although we acknowledge the possibility of preferential sites, our mathematical explanation treats inversions as occurring randomly. Many of these are expected to be deleterious and would presumably be selected against, but others should be neutral or positively selected and could therefore become fixed in the genome. Quantitative estimation of inversion rates using genomic sequences of extant organisms is unfortunately not meaningful, as it cannot account for those events lost to natural selection. Further, inversions must have contributed to the basic character of DNA sequences since the origin of life. There are now substantial data supporting the frequency of inversions within genomes of a variety of organisms, including plants, insects and primates [29–33], and these observable events are but the tip of the iceberg. Chromosomal rearrangements such as inversions reduce the rate of meiotic recombination between homologous chromosomes, with subsequent reproductive isolation. Moreover, in these regions, mutations tend to be positively selected to give rise to speciation. Ohno's seminal work and that of others have emphasized the importance of duplications in evolution. Our suppositions further these ideas, in particular suggesting how inversions and duplications can complement each other to yield the properties of extant genomes.
Calculation of frequencies of oligonucleotides
The genomic sequences (human contigs, the TP53 gene, and the mtDNA sequence) were downloaded from NCBI (Build 36). Calculation of frequencies of oligonucleotides (including mononucleotides) was performed using Perl scripts, which are available upon request. The "plus" strand, which is stored in the database, was analyzed. We generated sequence free of repetitive elements using RepeatMasker with which 46.4% of the 28,617,429 nucleotides were masked. The coordinates of the eight 1-kb regularly-scattered in silico inversions were 1001–2000, 3001–4000, 5001–6000, 7001–8000, 9001–10000, 11001–12000, 13001–14000, and 15001–16000 in NC_001807.
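The in silico inversion step can be sketched as follows. This is an illustration only: the sequence is a synthetic, deliberately skewed stand-in generated at random (it is not NC_001807), the length of 16,600 nt and the base weights are assumptions, and `invert` and `at_skew` are hypothetical helper names. Only the eight coordinate pairs are taken from the text above.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def invert(seq, start, end):
    """Invert the 1-based, inclusive region start..end of the stored strand."""
    segment = seq[start - 1:end]
    return seq[:start - 1] + segment.translate(COMPLEMENT)[::-1] + seq[end:]

def at_skew(seq):
    """Return |A - T| / (A + T); 0 means exact parity between A and T."""
    a, t = seq.count("A"), seq.count("T")
    return abs(a - t) / (a + t)

# Synthetic parity-violating sequence (assumed length and base weights).
rng = random.Random(1)
genome = "".join(rng.choices("ACGT", weights=[0.4, 0.2, 0.3, 0.1], k=16600))

# The eight 1-kb regions listed above: 1001-2000, 3001-4000, ..., 15001-16000.
regions = [(start, start + 999) for start in range(1001, 16000, 2000)]

inverted = genome
for start, end in regions:
    inverted = invert(inverted, start, end)

print(len(regions), regions[0], regions[-1])
print(round(at_skew(genome), 3), round(at_skew(inverted), 3))
```

Because the inverted regions cover roughly half of the sequence, the A/T skew of the synthetic strand drops sharply, which mirrors the homogenization reported for Fig. 2D without reproducing that analysis.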
For the frequency of a particular oligonucleotide $A_n$ ($n > 0$), via the $n$th inversion, $(1 - r_n) A_{n-1}$ remains, $r_n A_{n-1}$ is lost, and $r_n T_{n-1}$ is gained, if we suppose that the oligonucleotide is evenly distributed along the whole sequence. In this way, the two recurrence formulas (1) and (2) are derived (see text). The following equations are obtained by adding equations (1) and (2).
$$A_n + T_n = A_{n-1} + T_{n-1} \tag{4}$$
$$A_n + T_n = A_0 + T_0 \tag{5}$$
These mean that inversions do not change the sum of the two frequencies. Using (5), other forms of (1) and (2) are derived.
$$A_n = (1 - 2r_n) A_{n-1} + r_n (A_0 + T_0) \tag{6}$$
$$T_n = (1 - 2r_n) T_{n-1} + r_n (A_0 + T_0) \tag{7}$$
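When we subtract $(A_0 + T_0)/2$ from (6) and define $B_n = A_n - (A_0 + T_0)/2$, (9) is derived.
$$B_n = (1 - 2r_n) B_{n-1} \tag{9}$$
Using $-1 \ll 1 - 2r_k < 1$ ($0 < r_k \ll 1$), $B_n$ tends to 0 as $n$ increases, so that $A_n$ and $T_n$ both approach $(A_0 + T_0)/2$.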
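The final step can be written out explicitly. The block below is a reconstruction from equations (5) and (6), not the authors' typeset derivation; the symbol $B_n$ is defined inside the block, and the condition that the sum of the $r_k$ diverges is an added assumption that makes the limit rigorous.

```latex
\begin{align*}
B_n &:= A_n - \tfrac{A_0 + T_0}{2}, \qquad
B_n = (1 - 2r_n)\,B_{n-1} = B_0 \prod_{k=1}^{n} (1 - 2r_k),\\
|B_n| &\le |B_0| \exp\!\Bigl(-2 \sum_{k=1}^{n} r_k\Bigr)
  \quad\text{since } 0 < 1 - 2r_k \le e^{-2r_k} \text{ for } 0 < r_k < \tfrac{1}{2},\\
\lim_{n \to \infty} A_n &= \lim_{n \to \infty} T_n = \frac{A_0 + T_0}{2}
  \quad\text{whenever } \sum_{k} r_k \text{ diverges.}
\end{align*}
```

As a worked number, if every inversion covered 1% of the sequence ($r_k = 0.01$), the deviation from parity would shrink by a factor of 0.98 per inversion, leaving about 13% of the initial deviation after 100 inversions.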
1. Chargaff E: Structure and function of nucleic acids as cell constituents. Fed Proc. 1951, 10: 654-659.
2. Watson JD, Crick FH: Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature. 1953, 171: 737-738. 10.1038/171737a0.
3. Rudner R, Karkas JD, Chargaff E: Separation of B. subtilis DNA into complementary strands. 3. Direct analysis. Proc Natl Acad Sci USA. 1968, 60: 921-922. 10.1073/pnas.60.3.921.
4. Fickett JW, Torney DC, Wolf DR: Base compositional structure of genomes. Genomics. 1992, 13: 1056-1064. 10.1016/0888-7543(92)90019-O.
5. Prabhu VV: Symmetry observations in long nucleotide sequences. Nucleic Acids Res. 1993, 21: 2797-2800. 10.1093/nar/21.12.2797.
6. Forsdyke DR, Mortimer JR: Chargaff's legacy. Gene. 2000, 261: 127-137. 10.1016/S0378-1119(00)00472-8.
7. Qi D, Cuticchia AJ: Compositional symmetries in complete genomes. Bioinformatics. 2001, 17: 557-559. 10.1093/bioinformatics/17.6.557.
8. Baisnée PF, Hampson S, Baldi P: Why are complementary DNA strands symmetric? Bioinformatics. 2002, 18: 1021-1033. 10.1093/bioinformatics/18.8.1021.
9. Mitchell D, Bridge R: A test of Chargaff's second rule. Biochem Biophys Res Commun. 2006, 340: 90-94.
10. Albrecht-Buehler G: Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. Proc Natl Acad Sci USA. 2006, 103: 17828-17833. 10.1073/pnas.0605553103.
11. Sueoka N: Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J Mol Evol. 1995, 40: 318-325. 10.1007/BF00163236.
12. Lobry JR: Properties of a general model of DNA evolution under no-strand-bias conditions. J Mol Evol. 1995, 40: 326-330. 10.1007/BF00163237.
13. McLean MJ, Wolfe KH, Devine KM: Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes. J Mol Evol. 1998, 47: 691-696. 10.1007/PL00006428.
14. Bell SJ, Forsdyke DR: Deviations from Chargaff's second parity rule correlate with direction of transcription. J Theor Biol. 1999, 197: 63-76. 10.1006/jtbi.1998.0858.
15. Daubin V, Perriere G: G+C3 structuring along the genome: a common feature in prokaryotes. Mol Biol Evol. 2003, 20: 471-483. 10.1093/molbev/msg022.
16. Nikolaou C, Almirantis Y: A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res. 2005, 33: 6816-6822. 10.1093/nar/gki988.
17. Nikolaou C, Almirantis Y: Deviations from Chargaff's second parity rule in organellar DNA: insights into the evolution of organellar genomes. Gene. 2006, 381: 34-41. 10.1016/j.gene.2006.06.010.
18. Green P, Ewing B, Miller W, Thomas PJ, NISC Comparative Sequencing Program, Green ED: Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003, 33: 514-517. 10.1038/ng1103.
19. Louie E, Ott J, Majewski J: Nucleotide frequency variation across human genes. Genome Res. 2003, 13: 2594-2601. 10.1101/gr.1317703.
20. Prescott DM, Dizick SJ: A unique pattern of intrastrand anomalies in base composition of the DNA in hypotrichs. Nucleic Acids Res. 2000, 28: 4679-4688. 10.1093/nar/28.23.4679.
21. Filée J, Forterre P: Viral proteins functioning in organelles: a cryptic origin? Trends Microbiol. 2005, 13: 510-513. 10.1016/j.tim.2005.08.012.
22. Clayton DA: Replication of animal mitochondrial DNA. Cell. 1982, 28: 693-705. 10.1016/0092-8674(82)90049-6.
23. Pääbo S, Thomas WK, Whitfield KM, Kumazawa Y, Wilson AC: Rearrangements of mitochondrial transfer RNA genes in marsupials. J Mol Evol. 1991, 33: 426-430. 10.1007/BF02103134.
24. Ohno S: Evolution by Gene and Genome Duplication. 1970, Springer, Berlin.
25. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297: 1003-1007. 10.1126/science.1072047.
26. Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW: Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 2003, 4: R25. 10.1186/gb-2003-4-4-r25.
27. Dunham I, Shimizu N, Roe BA, Chissoe S, Hunt AR, Collins JE, Bruskiewich R, Beare DM, Clamp M, Smink LJ, Ainscough R, Almeida JP, Babbage A, Bagguley C, Bailey J, Barlow K, Bates KN, Beasley O, Bird CP, Blakey S, Bridgeman AM, Buck D, Burgess J, Burrill WD, O'Brien KP, et al: The DNA sequence of human chromosome 22. Nature. 1999, 402: 489-495. 10.1038/990031.
28. Kosak ST, Groudine M: Gene order and dynamic domains. Science. 2004, 306: 644-647. 10.1126/science.1103864.
29. Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW: Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet. 2005, 1: e56. 10.1371/journal.pgen.0010056.
30. Hoffmann AA, Sgrò CM, Weeks AR: Chromosomal inversion polymorphisms and adaptation. Trends Ecol Evol. 2004, 19 (9): 482-488. 10.1016/j.tree.2004.06.013.
31. Blanc G, Barakat A, Guyot R, Cooke R, Delseny M: Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell. 2000, 12: 1093-1101. 10.1105/tpc.12.7.1093.
32. Coluzzi M, Sabatini A, della Torre A, Di Deco MA, Petrarca V: A polytene chromosome analysis of the Anopheles gambiae species complex. Science. 2002, 298: 1415-1418. 10.1126/science.1077769.
33. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE: Fine-scale structural variation of the human genome. Nat Genet. 2005, 37: 727-732. 10.1038/ng1562.
34. Rieseberg LH: Chromosomal rearrangements and speciation. Trends Ecol Evol. 2001, 16 (7): 351-358. 10.1016/S0169-5347(01)02187-5.
35. Navarro A, Barton NH: Chromosomal speciation and molecular divergence – accelerated evolution in rearranged chromosomes. Science. 2003, 300: 321-324. 10.1126/science.1080600.
The authors thank J. Buchanan, O. Akiyama, S. Horike, C. R. Marshall, A. Navarro, P. Pevzner, R. F. Wintle and J. Zhang for discussions and critical reading of the manuscript. We acknowledge the Centre for Computational Biology and The Centre for Applied Genomics for computational assistance. The work is supported by Genome Canada/Ontario Genomics Institute, the McLaughlin Centre for Molecular Medicine, and The Hospital for Sick Children Foundation. S.W.S. is an Investigator of the Canadian Institutes for Health Research and International Scholar of the Howard Hughes Medical Institute.
KO conceived the study, performed the computational analyses, mathematical derivation, and drafted the manuscript. JW participated in the coordination of the study and performed the computational analyses. SWS participated in the design and coordination of the study and helped draft the manuscript. All authors read and approved the final manuscript.