Evolutionary implications of inversions that have caused intra-strand parity in DNA
© Okamura et al; licensee BioMed Central Ltd. 2007
Received: 25 January 2007
Accepted: 11 June 2007
Published: 11 June 2007
Chargaff's rule of DNA base composition, stating that DNA comprises equal amounts of adenine and thymine (%A = %T) and of guanine and cytosine (%G = %C), is well known because it was fundamental to the conception of the Watson-Crick model of DNA structure. His second parity rule, stating that the base proportions of double-stranded DNA are also reflected in single-stranded DNA (%A = %T, %C = %G), is more obscure, likely because its biological basis and significance are still unresolved. Within each strand, this symmetry extends beyond single nucleotides: di-, tri-, and multi-nucleotides are also balanced with their respective reverse complementary oligonucleotides.
Here, we propose that inversions are sufficient to account for the symmetry within each single-stranded DNA. This concept is supported by the recent observation that inversions occur frequently. Human mitochondrial DNA does not demonstrate such intra-strand parity, and we consider how its different functional drivers may relate to our theory.
Along with chromosomal duplications, inversions must have been shaping the architecture of genomes since the origin of life.
Mononucleotide content in contiguous single-stranded DNA scaffolds from each human chromosome *
When there is no bias in mutation and selection between complementary strands, base substitution may explain the parity phenomenon [11, 12]. In fact, strand bias has been demonstrated in the form of mutational skews between the two strands, which cause deviation from parity [13, 15]. Bacterial origins of replication were successfully identified by the distribution of such skews [16, 17]. The strand bias of mutations, which can be associated with direction of transcription, is also found in mammalian genomes [18, 19]. In spite of these anomalies, any violation of the second parity phenomenon is generally small in magnitude [8, 20].
Not only single nucleotides but also oligonucleotides up to 30 nucleotides (nt) in length can demonstrate the parity phenomenon within strands [5, 7, 8]. In other words, the frequency of a particular oligonucleotide is approximately equal to that of its reverse complementary sequence in the same strand. Since DNA strands are complementary, the frequency of a particular oligonucleotide in one strand approximates that in the opposite strand. Hence, this double-stranded DNA characteristic can also be called "symmetry of complementary DNA strands" [5, 8]. Chargaff's second parity rule ordinarily considers only mononucleotides, which have been extensively studied. However, since a single nucleotide can be regarded as a one-nt oligonucleotide, addressing the symmetry of oligonucleotides (high-order strand symmetry) is plausibly a more general way of assessing biological meaning. Hereafter, we designate this comprehensive symmetry as "intra-strand parity" and attempt to explain it based on the mechanism of chromosomal inversion. Single nucleotide mutations may explain mononucleotide parity within strands [11, 12] but have not been effective in explaining the extended parity of oligonucleotides.
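The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Perl scripts used for the analyses; the function names (`reverse_complement`, `kmer_frequencies`, `parity_pairs`) and the toy sequence are assumptions introduced here. It counts overlapping oligonucleotides of length k on a single strand and pairs each frequency with that of its reverse complement on the same strand.

```python
from collections import Counter

# Translation table for complementing the four standard bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping k-mer on the given strand."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def parity_pairs(seq, k):
    """Map each observed k-mer to (its frequency, frequency of its reverse
    complement on the same strand). Intra-strand parity holds when the two
    numbers are close for every k-mer."""
    freqs = kmer_frequencies(seq, k)
    return {
        kmer: (f, freqs.get(reverse_complement(kmer), 0.0))
        for kmer, f in freqs.items()
    }


if __name__ == "__main__":
    toy = "AAAACGTTTTGGCC"  # hypothetical toy sequence, not real data
    for kmer, (f, f_rc) in sorted(parity_pairs(toy, 2).items()):
        print(kmer, reverse_complement(kmer), round(f, 3), round(f_rc, 3))
```

With k = 1 the same function reduces to the mononucleotide form of the second parity rule (%A against %T, %C against %G).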
We propose that inversion events (with or without underlying duplications) might be a sufficient mechanism to explain the phenomenon. To test this, we consider a double-stranded DNA molecule without intra-strand parity but which is long enough to undergo various (stochastic) inversions (Fig. 1B). $A_n$ and $T_n$ are defined as the frequencies of any particular oligonucleotide sequence and of its reverse complementary sequence, respectively, in the same strand after $n$ inversions ($n > 0$). $A_0$ ($0 < A_0 < 1$) is the initial frequency of any particular oligonucleotide sequence (which can also be a mononucleotide) in the upper strand. $T_0$ ($0 < T_0 < 1$) is the initial frequency of its reverse complementary sequence in the same strand. If we define $r_n$ ($0 < r_n \ll 1$) as the relative length of the $n$th inversion (Fig. 1B), we obtain these two equations.
$$A_n = A_{n-1} - r_n (A_{n-1} - T_{n-1}) \qquad (1)$$
$$T_n = T_{n-1} - r_n (T_{n-1} - A_{n-1}) \qquad (2)$$
From these two recurrences it follows (see Methods) that both frequencies converge to their common mean as inversions accumulate:
$$\lim_{n \to \infty} A_n = \lim_{n \to \infty} T_n = \frac{A_0 + T_0}{2} \qquad (3)$$
Equation (3) is a mathematical explanation of intra-strand parity based on our hypothesis that inversions are sufficient to cause any DNA segment to conform to parity. In this way, the vast majority of naturally occurring DNA molecules (chromosomes) will evolve to intra-strand parity via many inversions. Those few that deviate, such as mitochondrial DNA (mtDNA) [8, 9, 17], will have special properties (see below). We presume that any DNA can be made to evolve to intra-strand parity through a process of inversions, and that deviations from parity have been rare in evolution. Inversions must have been occurring as genomes of ancestral organisms were growing in complexity with the acquisition or creation of new genes.
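The behaviour of recurrences (1) and (2) can be checked numerically. The following Python sketch iterates them with randomly drawn relative inversion lengths; the function name `simulate_inversions`, the uniform distribution for the lengths, and the starting frequencies are assumptions made for illustration and are not taken from the analyses reported here.

```python
import random


def simulate_inversions(a0, t0, n_inversions, max_r=0.05, seed=0):
    """Iterate recurrences (1) and (2).

    a0, t0       -- initial frequencies of an oligonucleotide and of its
                    reverse complement on the same strand
    n_inversions -- number of successive inversions to apply
    max_r        -- upper bound on the relative inversion length (assumed)
    Returns the list of (A_n, T_n) pairs for n = 0 .. n_inversions.
    """
    rng = random.Random(seed)
    a, t = a0, t0
    history = [(a, t)]
    for _ in range(n_inversions):
        # Relative length of this inversion, small and positive (assumed uniform).
        r = rng.uniform(0.001, max_r)
        # Both updates use the values from the previous step.
        a, t = a - r * (a - t), t - r * (t - a)
        history.append((a, t))
    return history


if __name__ == "__main__":
    trajectory = simulate_inversions(0.40, 0.10, 200)
    for n in (0, 10, 50, 200):
        a_n, t_n = trajectory[n]
        print(n, round(a_n, 4), round(t_n, 4), round(a_n + t_n, 4))
```

In such a run the sum of the two frequencies stays constant while their difference shrinks toward zero, which is the behaviour the recurrences predict under the assumption that content is evenly distributed along the sequence.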
Dinucleotide frequencies in a human genomic contig without repetitive sequences *
The mammalian mtDNA offers a natural source of sequence sufficiently deviating from parity to allow us to further test our mathematical explanation. We produced in silico semi-random inversions in human mtDNA. As few as eight 1-kb regularly-distributed inversions (see Methods) would be sufficient to homogenize the two strands of the 16.6-kb mtDNA and create intra-strand parity (Fig. 2D). We also depict a hypothetical inversion in the mtDNA to show the potential for rapid homogenization (Fig. 1C).
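An in silico inversion of this kind amounts to replacing a segment of the analyzed strand with its reverse complement. The Python sketch below illustrates the idea on a randomly generated, strand-biased sequence; the sequence, its base weights, and the helper names (`invert`, `base_fractions`, `demo`) are hypothetical stand-ins and do not reproduce the mtDNA analysis itself. The 1-based inclusive coordinates mirror the regularly spaced 1-kb layout described in Methods.

```python
import random

# Translation table for complementing the four standard bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(seq, start, end):
    """Invert the 1-based inclusive interval [start, end] of one strand by
    replacing it with its reverse complement."""
    segment = seq[start - 1:end]
    return seq[:start - 1] + segment.translate(COMPLEMENT)[::-1] + seq[end:]


def base_fractions(seq):
    """Fraction of each base on the analyzed strand."""
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


def demo(length=16000, seed=1):
    """Apply eight regularly spaced 1-kb inversions to a synthetic sequence."""
    rng = random.Random(seed)
    # Hypothetical strand-biased sequence (arbitrary weights), NOT real mtDNA.
    seq = "".join(rng.choices("ACGT", weights=[31, 31, 13, 25], k=length))
    before = base_fractions(seq)
    for start in range(1001, length, 2000):  # 1001, 3001, ..., 15001
        seq = invert(seq, start, start + 999)
    after = base_fractions(seq)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", {b: round(f, 3) for b, f in before.items()})
    print("after: ", {b: round(f, 3) for b, f in after.items()})
```

Because the eight segments do not overlap and together cover about half of the synthetic sequence, roughly half of each base is exchanged for its complement, so the A/T and C/G fractions on the analyzed strand move close to each other.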
Although the lack of intra-strand parity in mammalian mtDNA could be ascribed to its small length, other loci of comparable length (e.g. the TP53 gene, Fig. 2B) do adhere to parity. Unlike other mtDNAs, those of mammals have no intergenic segments and have only one regulatory region per strand. Moreover, unlike in nuclear genomes, the order and direction of genes, as well as the biased gene density between the two strands, are strictly conserved among mammalian species. Therefore, it seems that the configuration is already fixed, and that inversions are not tolerated in mammalian mtDNA.
The ubiquity of inversions suggests that they had some advantage in natural selection. Duplications are thought to play an important role in creating genetic variety; however, some duplications are deleterious for organisms because of sudden increases in gene dosage. To avoid being negatively selected, one of the duplicated copies could undergo a mutation such as a deletion. Inversions or interchromosomal rearrangements could render the duplicated gene nonfunctional by releasing it from interaction with its promoter or other regulatory elements. This may be one reason why many inverted and interchromosomal segmental duplications are found in the human genome [25, 26]. An approximately symmetrical gene distribution between the two strands may have been brought about by these rearrangements.
In some cases, a rearranged genome might be positively selected. Although we can find syntenic regions among vertebrates, chromosomal organizations can be quite different among species. This suggests an advantage for evolution or speciation. Recently, the importance of gene order and gene position in the three-dimensional nucleus has been suggested. It is likely that genomes continually undergo rearrangement toward optimal positions for each gene and each gene cluster. Our group showed an unexpectedly large number of inversions (from 23 bp to 62 Mb in size) between the human and chimpanzee genomes, species which diverged only six million years ago. Although most may be selectively neutral, some were likely selected for and contributed to speciation. Many more inversions may also have occurred and been negatively selected. Inversions can also give rise to new transcripts, some of which will be selected for and become new genes. We identified hybrid transcripts of the AZGP1 and GJE1 genes on human chromosome 7 (manuscript in preparation) and are intrigued that the orthologues of these genes in non-primate mammals reside in a head-to-head manner. It is likely that the common ancestor of primates underwent inversion of the AZGP1 gene to produce the hybrid transcripts, creating an opportunity for primate diversity.
In summary, we propose that the relatively frequent occurrence and accumulation of inversions in genomes may be a major contributor to the phenomenon of intra-strand parity. Whereas single base substitutions might explain Chargaff's second parity rule at the level of mononucleotides, they can explain neither the high-order intra-strand parity nor the exceptional deviation of mammalian mtDNAs. In contrast, inversion events are not limited by size and can involve millions of bases of sequence. Other mechanisms may have contributed to some extent; nevertheless, they are not necessary to account for intra-strand parity if inversions are considered.
Inversions are one process contributing to genome evolution that allows for rearrangement toward optimal position, order, and orientation of genes and regulatory elements, and for escape from deleterious effects caused, for example, by some duplications. Although we acknowledge the possibility of preferential sites, we treat inversions as occurring randomly, as in our mathematical explanation. Many of these are expected to be deleterious and would presumably be selected against, but others should be neutral or positively selected and could therefore become fixed in the genome. Quantitative estimation of inversion using genomic sequences of extant organisms is unfortunately meaningless, as it cannot account for those events lost to natural selection. Further, inversions must have contributed to the basic character of DNA sequences since the origin of life. There are now substantial data supporting the frequency of inversions within genomes of a variety of organisms, including plants, insects and primates [29–33], and these observable events are but the tip of the iceberg. Chromosomal rearrangements such as inversions reduce the rate of meiotic recombination between homologous chromosomes, with subsequent reproductive isolation. Moreover, in these regions, mutations tend to be positively selected to give rise to speciation. Ohno's seminal work and that of others have emphasized the importance of duplications in evolution. Our suppositions further these ideas, in particular suggesting how inversions and duplications can complement each other to yield the properties of extant genomes.
Calculation of frequencies of oligonucleotides
The genomic sequences (human contigs, the TP53 gene, and the mtDNA sequence) were downloaded from NCBI (Build 36). Calculation of frequencies of oligonucleotides (including mononucleotides) was performed using Perl scripts, which are available upon request. The "plus" strand, which is the strand stored in the database, was analyzed. We generated sequence free of repetitive elements using RepeatMasker, with which 46.4% of the 28,617,429 nucleotides were masked. The coordinates of the eight 1-kb regularly-scattered in silico inversions were 1001–2000, 3001–4000, 5001–6000, 7001–8000, 9001–10000, 11001–12000, 13001–14000, and 15001–16000 in NC_001807.
For the frequency of a particular oligonucleotide $A_n$ ($n > 0$), via the $n$th inversion, $(1 - r_n) A_{n-1}$ remains; $r_n A_{n-1}$ decreases; $r_n T_{n-1}$ increases if we suppose the distribution of contents is even in the whole sequence. In this way, the two recurrence formulas (1) and (2) are derived (see text). The following equations are obtained by adding equations (1) and (2).
$$A_n + T_n = A_{n-1} + T_{n-1} \qquad (4)$$
$$A_n + T_n = A_0 + T_0 \qquad (5)$$
These mean that inversions do not change the sum of the two frequencies. Using (5), other forms of (1) and (2) are derived.
$$A_n = (1 - 2r_n) A_{n-1} + r_n (A_0 + T_0) \qquad (6)$$
$$T_n = (1 - 2r_n) T_{n-1} + r_n (A_0 + T_0) \qquad (7)$$
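Applying (6) and (7) repeatedly gives $A_n - \frac{A_0 + T_0}{2} = \frac{A_0 - T_0}{2} \prod_{k=1}^{n} (1 - 2r_k)$ and $T_n - \frac{A_0 + T_0}{2} = -\frac{A_0 - T_0}{2} \prod_{k=1}^{n} (1 - 2r_k)$. Using $-1 \ll 1 - 2r_k < 1$ ($0 < r_k \ll 1$), the product $\prod_{k=1}^{n} (1 - 2r_k)$ tends to 0 as $n \to \infty$, provided the relative lengths $r_k$ do not shrink so quickly that their sum remains finite. Both $A_n$ and $T_n$ therefore converge to $(A_0 + T_0)/2$, which is equation (3).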
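As a rough numerical illustration of how quickly parity is approached, the following worked example assumes that every inversion has the same relative length $r$. This is a simplification introduced here for illustration only; the value $r = 0.05$ is likewise an arbitrary assumption. Subtracting (2) from (1) gives a recurrence for the difference $D_n = A_n - T_n$.

```latex
\begin{align*}
D_n &= A_n - T_n = (1 - 2r_n)\,D_{n-1}
      && \text{(subtracting (2) from (1))} \\
D_n &= (1 - 2r)^n\,D_0
      && \text{(assuming } r_k = r \text{ for all } k\text{)} \\
r = 0.05:\quad
\frac{D_{10}}{D_0} &= 0.9^{10} \approx 0.35, \qquad
\frac{D_{50}}{D_0} = 0.9^{50} \approx 0.005 \\
n_{1/2} &= \frac{\ln 2}{-\ln(1 - 2r)} \approx 6.6
      && \text{(inversions needed to halve } D_n\text{)}
\end{align*}
```

Under these assumptions, a few dozen inversions each covering 5% of the molecule would reduce an initial deviation from parity to well under 1% of its starting value.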
The authors thank J. Buchanan, O. Akiyama, S. Horike, C. R. Marshall, A. Navarro, P. Pevzner, R. F. Wintle and J. Zhang for discussions and critical reading of the manuscript. We acknowledge the Centre for Computational Biology and The Centre for Applied Genomics for computational assistance. The work is supported by Genome Canada/Ontario Genomics Institute, the McLaughlin Centre for Molecular Medicine, and The Hospital for Sick Children Foundation. S.W.S. is an Investigator of the Canadian Institutes for Health Research and International Scholar of the Howard Hughes Medical Institute.
1. Chargaff E: Structure and function of nucleic acids as cell constituents. Fed Proc. 1951, 10: 654-659.
2. Watson JD, Crick FH: Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature. 1953, 171: 737-738. 10.1038/171737a0.
3. Rudner R, Karkas JD, Chargaff E: Separation of B. subtilis DNA into complementary strands. 3. Direct analysis. Proc Natl Acad Sci USA. 1968, 60: 921-922. 10.1073/pnas.60.3.921.
4. Fickett JW, Torney DC, Wolf DR: Base compositional structure of genomes. Genomics. 1992, 13: 1056-1064. 10.1016/0888-7543(92)90019-O.
5. Prabhu VV: Symmetry observations in long nucleotide sequences. Nucleic Acids Res. 1993, 21: 2797-2800. 10.1093/nar/21.12.2797.
6. Forsdyke DR, Mortimer JR: Chargaff's legacy. Gene. 2000, 261: 127-137. 10.1016/S0378-1119(00)00472-8.
7. Qi D, Cuticchia AJ: Compositional symmetries in complete genomes. Bioinformatics. 2001, 17: 557-559. 10.1093/bioinformatics/17.6.557.
8. Baisnée PF, Hampson S, Baldi P: Why are complementary DNA strands symmetric?. Bioinformatics. 2002, 18: 1021-1033. 10.1093/bioinformatics/18.8.1021.
9. Mitchell D, Bridge R: A test of Chargaff's second rule. Biochem Biophys Res Commun. 2006, 340: 90-94.
10. Albrecht-Buehler G: Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. Proc Natl Acad Sci USA. 2006, 103: 17828-17833. 10.1073/pnas.0605553103.
11. Sueoka N: Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J Mol Evol. 1995, 40: 318-325. 10.1007/BF00163236.
12. Lobry JR: Properties of a general model of DNA evolution under no-strand-bias conditions. J Mol Evol. 1995, 40: 326-330. 10.1007/BF00163237.
13. McLean MJ, Wolfe KH, Devine KM: Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes. J Mol Evol. 1998, 47: 691-696. 10.1007/PL00006428.
14. Bell SJ, Forsdyke DR: Deviations from Chargaff's second parity rule correlate with direction of transcription. J Theor Biol. 1999, 197: 63-76. 10.1006/jtbi.1998.0858.
15. Daubin V, Perriere G: G+C3 structuring along the genome: a common feature in prokaryotes. Mol Biol Evol. 2003, 20: 471-483. 10.1093/molbev/msg022.
16. Nikolaou C, Almirantis Y: A study on the correlation of nucleotide skews and the positioning of the origin of replication: different modes of replication in bacterial species. Nucleic Acids Res. 2005, 33: 6816-6822. 10.1093/nar/gki988.
17. Nikolaou C, Almirantis Y: Deviations from Chargaff's second parity rule in organellar DNA insights into the evolution of organellar genomes. Gene. 2006, 381: 34-41. 10.1016/j.gene.2006.06.010.
18. Green P, Ewing B, Miller W, Thomas PJ, NISC Comparative Sequencing Program, Green ED: Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003, 33: 514-517. 10.1038/ng1103.
19. Louie E, Ott J, Majewski J: Nucleotide frequency variation across human genes. Genome Res. 2003, 13: 2594-2601. 10.1101/gr.1317703.
20. Prescott DM, Dizick SJ: A unique pattern of intrastrand anomalies in base composition of the DNA in hypotrichs. Nucleic Acids Res. 2000, 28: 4679-4688. 10.1093/nar/28.23.4679.
21. Filée J, Forterre P: Viral proteins functioning in organelles: a cryptic origin?. Trends Microbiol. 2005, 13: 510-513. 10.1016/j.tim.2005.08.012.
22. Clayton DA: Replication of animal mitochondrial DNA. Cell. 1982, 28: 693-705. 10.1016/0092-8674(82)90049-6.
23. Pääbo S, Thomas WK, Whitfield KM, Kumazawa Y, Wilson AC: Rearrangements of mitochondrial transfer RNA genes in marsupials. J Mol Evol. 1991, 33: 426-430. 10.1007/BF02103134.
24. Ohno S: Evolution by Gene and Genome Duplication. 1970, Springer, Berlin.
25. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297: 1003-1007. 10.1126/science.1072047.
26. Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW: Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 2003, 4: R25. 10.1186/gb-2003-4-4-r25.
27. Dunham I, Shimizu N, Roe BA, Chissoe S, Hunt AR, Collins JE, Bruskiewich R, Beare DM, Clamp M, Smink LJ, Ainscough R, Almeida JP, Babbage A, Bagguley C, Bailey J, Barlow K, Bates KN, Beasley O, Bird CP, Blakey S, Bridgeman AM, Buck D, Burgess J, Burrill WD, O'Brien KP, et al: The DNA sequence of human chromosome 22. Nature. 1999, 402: 489-495. 10.1038/990031.
28. Kosak ST, Groudine M: Gene order and dynamic domains. Science. 2004, 306: 644-647. 10.1126/science.1103864.
29. Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW: Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet. 2005, 1: e56. 10.1371/journal.pgen.0010056.
30. Hoffmann AA, Sgrò CM, Weeks AR: Chromosomal inversion polymorphisms and adaptation. Trends Ecol Evol. 2004, 19 (9): 482-488. 10.1016/j.tree.2004.06.013.
31. Blanc G, Barakat A, Guyot R, Cooke R, Delseny M: Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell. 2000, 12: 1093-1101. 10.1105/tpc.12.7.1093.
32. Coluzzi M, Sabatini A, della Torre A, Di Deco MA, Petrarca V: A polytene chromosome analysis of the Anopheles gambiae species complex. Science. 2002, 298: 1415-1418. 10.1126/science.1077769.
33. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE: Fine-scale structural variation of the human genome. Nat Genet. 2005, 37: 727-732. 10.1038/ng1562.
34. Rieseberg LH: Chromosomal rearrangements and speciation. Trends Ecol Evol. 2001, 16 (7): 351-358. 10.1016/S0169-5347(01)02187-5.
35. Navarro A, Barton NH: Chromosomal speciation and molecular divergence – accelerated evolution in rearranged chromosomes. Science. 2003, 300: 321-324. 10.1126/science.1080600.