 Research article
 Open Access
Inversion symmetry of DNA kmer counts: validity and deviations
 Sagi Shporer^{1},
 Benny Chor^{1},
 Saharon Rosset^{2} and
 David Horn^{3}Email authorView ORCID ID profile
 Received: 13 April 2016
 Accepted: 11 August 2016
 Published: 31 August 2016
Abstract
Background
The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse (reversecomplement) kmer. This Inversion Symmetry (IS) holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2Mbp to 200 Mbp. The existence of IS has been demonstrated in the literature, and other pairwise candidate symmetries (e.g. reverse or complement) have been ruled out.
Results
Studying IS in the human genome, we find that IS holds up to k = 10. It holds for complete chromosomes, also after applying the low complexity mask. We introduce a numerical IS criterion, and define the klimit, KL, as the highest k for which this criterion is valid. We demonstrate that chromosomes of different species, as well as different human chromosomal sections, follow a universal logarithmic dependence of KL ~ 0.7 ln(L), where L is the length of the chromosome.
We introduce a statistical ISPoisson model that allows us to apply confidence measures to our numerical findings. We find good agreement for large k, where the variance of the Poisson distribution determines the outcome of the analysis. This model predicts the observed logarithmic increase of KL with length. The model allows us to conclude that for low k, e.g. k = 1 where IS becomes the 2^{nd} Chargaff rule, IS violation, although extremely small, is significant. Studying this violation we come up with an unexpected observation for human chromosomes, finding a meaningful correlation with the excess of genes on particular strands.
Conclusions
Our ISPoisson model agrees well with genomic data, and accounts for the universal behavior of klimits. For low k we point out minute, yet significant, deviations from the model, including excess of counts of nucleotides T vs A and G vs C on positive strands of human chromosomes. Interestingly, this correlates with a significant (but small) excess of genes on the same positive strands.
Keywords
 Generalized Chargaff rules
 Chromosome kmer distributions
 Inversion symmetry
Background
Erwin Chargaff has made, in 1950, the important observation that the numbers of nucleotides in DNA satisfy #A = #T and #G = #C [1, 2]. This statement, made on the basis of experimental observations with fairly large errors, played a crucial role in realizing that DNA has an underlying basepair grouping, as subsequently proposed by Crick and Watson [3] in their doublehelix structure.
The second Chargaff rule [4] states that the same sets of identities of nucleotide pairs hold for each long enough single DNA strand. This rule has been tested [5] for genome assemblies of many species, and found to be globally valid for eukaryotic chromosomes, as well as for bacterial and archaeal chromosomes. It fails for mitochondria, plasmids, singlestranded DNA viruses and RNA viruses.
The validity of the second Chargaff rule was unexpected. Obviously it should be regarded as a global rule, i.e. applicable to large sections of chromosomes. Nonetheless, not being derived from a compelling principle, such as the one underlying the first rule, it remains a mystery. This is even more so, when one studies extended versions of Chargaff’s second rule. Indeed, AlbrechtBuehler [6] observed that for triplet oligonucleotides, or 3mers, it remains true that their chromosomewide frequencies are almost equal to those of their reversecomplement 3mers. Prabhu [7] has shown that this symmetry holds up to 5mers in various species. This has been reviewed by Baldi and Brunak [8] who have argued that such symmetry rules have to be incorporated in Markov models of genomic sequences.
We refer to the symmetry between counts of kmers and their reverse complements as
Inversion Symmetry (IS) : the counts of a kmer of nucleotides on a chromosomal strand are almost equal to those of its inverse (reversecomplement) string.
Note that this implies that the number of times a string of nucleotides of length k is observed on a strand, when read from 5’ to 3’, is almost equal to the number of times it is observed on the other strand when the latter is read from its 5’ end to 3’ end.
Recent analyses of inversion symmetry include the following: Qi and Cuttichia [9] who have shown that inversion symmetry exists while reverse symmetry fails, i.e. kmers and their reverses do not appear with equal rates; Baisnee, Hampson and Baldi [10], who introduced a measure S1 to analyze inversion symmetry in a systematic fashion; Kong et al. [11], who established the validity of IS on 786 chromosomes of many species and showed that reverse or complement symmetry do not hold, and argued that IS may be due to segmental or wholegenome inverse duplications; Wang et al. [12] who argued that values of k for which kmer IS is valid increase with organismal complexity; and Afreixo et al. [13] who applied various criteria to demonstrate the statistical significance of IS up to k = 10. Studies of symmetries related to IS appear in [14, 15].
We introduce an IS measure which is different from S1 of [10], albeit the numerical results of both measures are correlated (see section 4 in Methods). Our measure is based on the ratio between differences of counts of inverse kmer pairs and their sum. We propose the criterion that if the average of this normalized measure (over all strings of length k) is less than about 0.1, IS will be regarded as a valid approximate symmetry. The average is taken over all M_{k} ≤ 4^{k} strings of length k which exist at least once on the chromosome. The value of k, for which the IS measure is closest to 0.1, is defined as the klimit (KL). This turns out to be KL = 10 on long human chromosomes (see Additional file 1), and KL = 7 or KL = 8 for bacteria.
Using this measure, one can readily demonstrate the existence of inversion symmetry, and the absence of analog symmetries between reverse pairs or complement pairs, as well as compare between different species. We will show that the klimit of inversion symmetry, KL, is logarithmically dependent on the length L of the chromosome, or of a chromosomal section on which it is measured. Moreover, this dependence is universal, i.e. it is valid for most species.
To analyze all these observations on a rigorous statistical basis, we introduce a Poisson model for the random occurrence of counts, regarding N(S) as a stochastic variable for any string S of length k. We define X(S,S^{*}) = N(S)N(S^{*})/(N(S) + N(S^{*})), which is a stochastic variable having positive values 0 ≤ X ≤ 1. In general S^{*} is some permutation of S over the set of all strings of length k. When S^{*} ≡ S^{inv}, i.e. where S^{*} is the inverse of S, IS implies that X < <1. For strict IS, X = 0. In practice we may observe small deviations when checking for its realization on a chromosome. The important question we address is whether these deviations mean that the IS rule is not valid, or that the data are consistent with IS yet the observed values of X reflect statistical fluctuations.
To answer this question we introduce the stochastic variable Z(S,S^{*}) = (N(S)N(S^{*}))/(N(S) + N(S^{*}))^{½}. The symmetry assumption means that N(S) = N(S*), i.e. these two stochastic variables have the same distribution. If we further assume that N(S) ~ Poisson then Z should be approximately distributed as a standard normal (see Methods).

For small k, E_{k}[X] is extremely small, yet E_{k}[Z] = E_{k}[X/σ_{X}] > 2, where σ_{X} is the (theoretically estimated) standard deviation of X(S,S^{inv}). Therefore we conclude that there exists a systematic small breaking of IS, observed for k < 4 on human chromosomes.

For large k (k > 5 on human chromosomes) E_{k}[Z] = E_{k}[X/σ_{X}] < 1 hence, due to the large variance, one may state that the observed X values are consistent with IS. Moreover, the data are consistent with Z ~ Standard Normal for large k.

The empirical values of X(S,S^{inv}) for large k are of the order of magnitude of (N(S) + N(S^{inv}))^{½}.

The logarithmic variation of the klimit, KL, as function of chromosomal length L, is correctly predicted by our ISPoisson model.
We use the italicized notation N, X, Z, for the stochastic variables of our model, and employ N, X, Z for their empirical counts on chromosomes. KL is defined as the value of k for which E_{k}[X] is closest to 0.1.
Results
Inversion symmetry (Generalized 2^{nd} Chargaff Rule)
Let S and S^{*} be two strings of nucleotides of same length k, i.e. two kmers. Suppose they appear N(S) and N(S^{*}) times respectively on a particular chromosome. We denote by X(S,S^{*}) the normalized difference X(S,S^{*}) = N(S)N(S^{*})/(N(S) + N(S^{*})) where S is one of the M_{k} different kmers over the 4 nucleotides, which are being counted on the chromosome at least once, i.e. N(S) > 0 and/or N(S^{*}) > 0. If both N(S) = N(S^{*}) = 0, X(S,S^{*}) is defined to be 0. In general 0 ≤ X(S,S^{*}) ≤ 1.
We use E_{k}[X] = ∑_{S} X(S,S^{*})/M_{k}, where M_{k} is the number of different kmers encountered empirically, as a measure to demonstrate and quantify the studied symmetry. For low and moderate k, we find that M_{k} = 4^{k}, but for large kvalues, such as k > 10 in the human genome, many of the kmers may not be realized empirically, leading to lower M_{k}. In the following we will look at values of X(S,S^{*}) over various possible choices of string pairs S and S^{*}, and demonstrate that for inverse pairs they are distributed differently than for other types of kmer pairs.
Repetitive structures are wellknown to constitute major fractions of eukaryotic chromosomes, hence one may wonder to what extent they are responsible for the observed inversion symmetry. To resolve this issue, we employed the same operations on the masked output of the UCSC genome browser, after filtering chromosomes for interspersed repeats and low complexity sequences. The results keep displaying the same behavior, with negligible differences for high values of k. Even chrY, which is well known for containing numerous repeats, with only 36 % of it surviving the masking filter, keeps showing the same qualitative behavior as in Fig. 1. In Additional file 1 we provide a list of the highest kvalues for which E_{k}[X] ≈ 0.1, both before and after masking (which removes repetitive and lowcomplexity stretches of the chromosome). We define the klimit (KL) of IS, as the value of k for which E_{k}[X] is closest to 0.1. The observed reduction in KL from 10 to 9 for the largest chromosomes, is due to the fact that masking shortens the effective chromosome length. The dependence of KL on length is an issue to which we will return below.
We have performed the same analysis on the older genome assembly HG18, leading to very similar results (see Additional file 2). We find similar IS results for mouse, frog, fly, worm, and yeast. Moreover, we find that inversion symmetry holds also for bacteria, but it is valid for a lower range of kmers, only up to KL = 6 or 7.
Outstanding features of inverse kmer pairs
 a.
Inverse pairs (e.g. CGA vs TCG)
 b.
Random pairs
 c.
Reverse pairs (e.g. CGA vs AGC)
 d.
Complement pairs (e.g. CGA vs GCT)
We have evaluated histograms of X(S,S^{*}) = N(S)N(S^{*})/(N(S) + N(S^{*})) for all pairings, and computed their averages E_{k}[X] = 4^{k}∑_{S} X(S,S^{*}) for different k. Calculations were performed both for human chromosomes as well as for many other species.
comparisons of averages E_{k}[X] of μ_{ka} = inverse pairs, μ_{kb} = random pairs, and μ_{kc} = reverse pairs, for chr1 of HG38
k  μ_{ka}  μ_{kb}  μ_{kc} 

1  0.0009  0.083  0 
2  0.0008  0.20  0.15 
3  0.0031  0.26  0.21 
4  0.0055  0.33  0.27 
5  0.0090  0.40  0.32 
6  0.013  0.44  0.36 
7  0.017  0.49  0.40 
8  0.025  0.52  0.43 
9  0.043  0.55  0.46 
10  0.085  0.57  0.49 
11  0.18  0.60  0.53 
12  0.32  0.67  0.60 
We can further use these distributions to establish that a symmetry relation holds only for inverse pairs, leading to very low E_{k}[X] values, and not for any other pairing. Table 1 lists the values of μ_{k} = E_{k}[X] for the three cases a, b and c, making it quite evident that IS holds and other symmetries do not. We will not dwell on it further, since Kong et al. [11] have already established the validity of IS (albeit using different measures) on 786 chromosomes of various species, and showed that reverse or complement symmetries do not hold.
Statistical analysis of inversion symmetry
In the Methods section we point out that, for large enough counts, if the counts N(S) and N(S*) are drawn from the same distribution, then the variable Y = (N(S)  N(S*))/(N(S) + N(S*)) should have an approximately Gaussian distribution with mean 0 and standard deviation σ_{G.} Moreover, the distribution of X = Y will have an expectation value E_{k}[X] = 0.8 σ_{G} and standard deviation σ_{X} =0.6 σ_{G}. If the counts N(S) are drawn from a Poisson model, we expect for each pair to find σ_{G} = (N(S) + N(S*))^{½}. Hence Z = (N(S)  N(S*))/(N(S) + N(S*))^{½} should follow a standard normal distribution, i.e. a Gaussian with mean = 0 and variance = 1. Hence the ISPoisson model predicts E_{k}(Z) = 0.8 and σ_{k}(Z) = 0.6, when rounded up to first decimal point.
Results of the evaluation of averages and variances over kmers of X and Z distributions on human chr 1. Large kvalues approach the results E_{k}(Z) = 0.8 and σ_{k}(Z) = 0.6 expected from standard normal Z distributions
k  E_{k}[X]  E_{k}[Z] = E_{k}[X/σ_{X}]  σ_{k}[Z] 

1  .0004  4.56  3.7 
2  .0006  3.26  2.4 
3  .00075  1.98  1.58 
4  .00125  1.34  1.12 
5  .002  1.07  .86 
6  .004  .93  .75 
7  .0085  0.89  .72 
8  .018  0.866  .72 
9  .038  0.843  .69 
10  0.083  0.825  .67 
For low kvalues, where E_{k}[Z] = E_{k}[X/σ_{X}] ≥ 2, one may say that a mathematical hypothesis of strict IS is invalid, since the peak of the Zdistribution lies outside the allowed confidence interval. On the other hand, clearly for all k < 4, E_{k}[X] < <0.01. Although the violation of IS is very small numerically, it is still statistically significant.
For large k > 5 we see that the data tend toward the prediction of our ISPoisson model, approaching the limit of E_{k}(Z) ± σ_{k}(Z) = 0.8 ± 0.6. This means that, due to the large variance, arising from relatively small values of N(S) + N(S^{inv}), the mathematical IS hypothesis cannot be refuted. It also means that variance plays a dominant role leading to the observed values of X(S,S^{inv}) which are of the order of magnitude of (N(S) + N(S^{inv}))^{½}. This implies that we should be able to deduce the behavior of the klimit, which is indeed the case as will be shown below.
Considering the peak at Z = 0 displayed in Fig. 3b, it is important to note that there exists a subset of kmers which obey S = S^{rev}, i.e. they are palindroms. They will contribute to the peak at Z = 0, with small variations of palindroms contributing to the region around this peak. Nonetheless, their numbers are small compared to all 8mers: only 3224 out of 65536 8mers lie within Z < 1.65, which is where 90 % of a standard normal distribution are expected to reside. Hence the variance of the reversepair Zditribution is very large.
Inversion symmetry for chromosomal sections
We next test to what degree IS is valid within various sections of human chromosomes. In Additional file 3, we display a characteristic distribution of inverse pairs drawn from a section of length 10Mbp, and in Additional file 4 we show an analogous distribution for length of 1Mbp. The IS quality, as determined by our convention, deteriorates leading to lower klimits as the length of the section decreases, but it remains valid. The distributions in Additional file 4 are evidently noisier than their analogs in Additional file 3; however they are much narrower than those of the reverse and random pairs (not shown here).
To study systematically different sections of chromosomes, we evaluate the E_{k}[X] values of inverse, random and reverse pairs, on nonoverlapping windows of given lengths L. In general, inversepairs lead to smaller E_{k}[X] than the other pairing choices. To determine the klimit we impose the condition E_{k}[X] <0.1 on the average over all chromosomal sections. The example displayed in Additional file 5 is of chr1, which is being tested with windows of length L = 5Kbp for inversepairs of k = 2. Although the average value is 0.07, obeying our criterion for IS validity, it is quite obvious that on many 5 K windows the values are higher. The value KL = 2 is chosen as the klimit of IS validity in this case. Reducing the section length further down to L = 1Kbp, in Additional file 6, we find that IS fails even at order k = 1, i.e. the second Chargaff rule does not hold for such short sectors.
klimits for human data as well as other eukaryotes and prokaryotes
Species  Length  KL 

HG38 chr1  230 M  10 
HG18 chr1  225 M  10 
Chimpanzee chr1  217 M  10 
Mouse chr1  192 M  10 
HG18 chrX  151 M  9 
Zebrafish chr7  77 M  9 
D. melanogaster chr3R  28 M  9 
C. elegans chrV  21 M  9 
HG18 chrY  26 M  8 
Human section 10 M  10 M  8 
E. coli K12  4.6 M  8 
B. subtilis  4.2 M  8 
Human section 5 M  5 M  7 
M. avium paratubercolosis  4.8 M  7 
P. furyosus  1.91 M  7 
T. maritima  1.86 M  7 
S. cerevisiae chr IV  1.53 M  7 
Human section 1 M  1 M  6 
Human section 100 K  100 K  5 
Human section 50 K  50 K  4 
Human section 10 K  10 K  3 
Human section 5 K  5 K  2 
The expression in {} is the expectation value of (f(S))^{½} within the Zdistribution. Let us denote it by 0.8c_{k} since E_{k}(Z) = 0.8. It follows that E_{k}[X] < 0.1 means that 0.8 c_{k} (4^{k}/2L)^{½} < 0.1. If c_{k} is a slowly varying function of k then k < lnL/ln4 + const = 0.72lnL + const.
This result is evidently borne out by the experimental fit k < 0.73 ln(L) + const. in Fig. 4. Furthermore, studying human chr1 we find that the experimental averages of (f(S))^{½} (without weighting by Z) obtain the values 1.24, 1.45, 1.47 for k = 4, 6, 8 respectively. This verifies that c_{k} is indeed a slowly varying function.
An early observation of inversion symmetry measures increasing logarithmically with sequence size has been made by [10] for various DNA and RNA sequences (see Fig. 1 in [10]).
Modeling inversion symmetry
Evaluation of E[Z], E[X], fraction of unrealized inverse pairs, and chromosomal length
k  HG38  HG38M  HG18  HG18M  Mouse  MouseM  C eleg  Cerevisiae  Ecoli  

E[Z]  1  4.154  3.943  4.560  5.406  6.928  11.001  2.814  2.057  1.273 
2  2.581  2.316  3.260  3.417  3.695  5.652  1.548  1.682  1.479  
3  1.707  1.769  1.983  2.152  2.780  3.904  1.589  1.434  1.318  
4  1.446  1.392  1.339  1.492  1.809  2.342  1.397  1.000  1.012  
5  1.202  1.186  1.069  1.133  1.262  1.490  1.216  0.867  0.921  
6  1.057  1.001  0.930  0.943  0.990  1.070  1.075  0.791  0.852  
7  0.984  0.935  0.894  0.884  0.892  0.902  0.980  0.780  0.837  
8  0.929  0.883  0.867  0.845  0.843  0.839  0.893  0.787  0.815  
9  0.881  0.855  0.843  0.828  0.823  0.819  0.851  0.841  0.811  
10  0.844  0.831  0.825  0.816  0.815  0.813  0.824  0.902  0.815  
11  0.825  0.821  0.816  0.814  0.813  0.814  0.835  0.940  0.856  
12  0.824  0.829  0.821  0.826  0.822  0.828  0.881  0.956  0.916  
E[X]  1  0.00038  0.00050  0.00041  0.00067  0.00068  0.00152  0.00099  0.00672  0.00083 
2  0.00046  0.00058  0.00058  0.00083  0.00070  0.00150  0.00111  0.01021  0.00196  
3  0.00067  0.00095  0.00077  0.00106  0.00121  0.00218  0.00260  0.01752  0.00350  
4  0.00134  0.00179  0.00115  0.00170  0.00165  0.00283  0.00474  0.02527  0.00554  
5  0.00247  0.00329  0.00206  0.00284  0.00260  0.00397  0.00839  0.04547  0.01067  
6  0.00470  0.00593  0.00402  0.00537  0.00461  0.00636  0.01535  0.08576  0.02075  
7  0.00942  0.01205  0.00852  0.01123  0.00941  0.01222  0.02905  0.18223  0.04362  
8  0.01918  0.02472  0.01809  0.02355  0.01954  0.02505  0.05593  0.38663  0.08975  
9  0.03951  0.05169  0.03850  0.04998  0.04226  0.05334  0.11437  0.64905  0.18551  
10  0.08380  0.10979  0.08334  0.10736  0.09343  0.11601  0.24551  0.82906  0.36850  
11  0.17518  0.22274  0.17538  0.21909  0.19196  0.23044  0.47655  0.91443  0.61571  
12  0.31969  0.38249  0.32051  0.37838  0.33829  0.38843  0.68957  0.94564  0.81471  
Fraction of null pairs  7  0.00110  
8  0.04863  0.00079  
9  0.00001  0.00001  0.00002  0.48397  0.00954  
10  0.00042  0.00130  0.00042  0.00127  0.00101  0.00217  0.00552  2.39166  0.06538  
11  0.01460  0.02590  0.01471  0.02515  0.02289  0.03312  0.14178  9.83436  0.30279  
12  0.09259  0.14336  0.09292  0.13934  0.11537  0.15551  0.85693  39.18  0.66052  
length  2.3E + 08  1.1E + 08  2.2E + 08  1.2E + 08  1.9E + 08  1.1E + 08  1.5E + 07  230218  4639664 
The simplest random model is that of a uniform distribution, which is generated on the basis of the second Chargaff rule (i.e. #A = #T and different from #C = #G). Such a distribution will trivially account for low E_{k} [X] values for inverse pairs at large values of k, limited by the length of the model chromosome. However it will also give rise to very low values for reverse pairs at a similar range of k, because any comparison of kmers with one of their permutations will lead to similar E_{k}[X]. In other words, this random independent (but not IID) model satisfies additional symmetries that are not observed in genomic data. Therefore it is not a realistic model of inversion symmetry.
A plausible explanation of the observed IS can be based on the fact that genomes evolve through rearrangement processes. By comparing synteny blocks in human and mouse, Pevzner and Tesler [17] have argued that rearrangements occur on many scales in the genome, and intrachromosomal rearrangements are more frequent than interchromosomal ones. Rearrangements may be viewed as inversions of sections between two breakpoints on the chromosome, and they may even follow one another in a nested fashion. Their study [17] demonstrated that human and mouse chr X share 281 synteny blocks of size >1 Mb, and at least 245 rearrangements occurred since the divergence of the two species.
Building on this intuition, derived from comparative genomics, it seems reasonable to assume that a series of such rearrangements on different scales may lead to IS. This mechanism has already been suggested by [6], and has been studied by [18] and by [11]. We have tested it on a simple model, starting from the human mitochondrial chromosome, which does not satisfy the second Chargaff rule. Since the mitochondrial chromosome is only 16Kbp long, we first construct out of it an enlarged model chromosome with length L = 100Mbp, by concatenating random selections of subsequences of chr M. We then apply to it rearrangements at various scales. We found that 5000 rearrangements at scales of 100 K have led to good IS effects, but best results were obtained for 50,000 rearrangements, whose breakpoints were randomly chosen, and their section lengths befit a uniform random distribution of length < 10 K. These results exhibit a high degree of IS, as displayed in Additional file 7.
Inversion symmetry: validation and deviations
Figure 4 provides an experimental validation of our ISPoisson model, in so far that it predicts correctly the behavior of KL as function of ln(L).
To look further into it, we present in Table 4 a comparative analysis of chr1 of different species, in both its unmasked and masked formats, as well as the analysis of the E coli genome. Shown are E[Z] values, E[X] values, the fraction of unrealized pairs (where both N(S) = N(S^{inv}) = 0), and the relevant length of the studied chromosomes. Highlighted are all E[X] values which are closest to 0.1 (defining the relevant KL) for the different chromosomes. By comparing with the upper part of the table one realizes that for k = KL, E[Z] is indeed close to 0.8, hence the success of the KL formula. By comparing with the 3^{rd} part of the table, we see that for these values of k, only for a very small fraction of all possible kmers, both S or S^{inv} are not realized on the studied chromosomes.
Our criterion for approximate IS, E_{k}[X] ≤ 0.1, was introduced as an intuitive but somewhat arbitrary decision. From Table 4 we learn that this is where the Z distribution approximates very well the data. For larger k we observe that some of the kmers do not appear, and their fraction increases rapidly with k. Hence our criterion selects also the range where almost all kmer strings are being realized. This serves as a posterior justification of our IS criterion.
Violations of the 2^{nd} Chargaff rule on HG38. Columns contain the values of #T/#A, #G/#C on different chromosomes, as well as their Y and Z values. The latter reflect the significance of the inequality
T/A  G/C  Y(T,A)  Y(G,C)  Z(T,A)  Z(G,C)  

chr1  1.002593  1.001175  0.001295  0.000587  15  5.76 
chr2  1.00274  1.002747  0.001368  0.001372  16.41  13.49 
chr3  1.002416  1.002824  0.001207  0.00141  13.19  12.5 
chr4  1.001062  1.002595  0.000531  0.001296  5.75  11.04 
chr5  1.004679  1.004144  0.002334  0.002068  24.44  17.5 
chr6  1.000537  1.001981  0.000268  0.000989  2.72  8.12 
chr7  1.003332  1.001884  0.001663  0.000941  16.15  7.57 
chr8  0.999241  1.002536  −0.00038  0.001266  −3.53  9.65 
chr9  1.001327  1.002823  0.000663  0.001409  5.61  9.99 
chr10  1.0039  1.002911  0.001946  0.001454  17.18  10.82 
chr11  1.001915  1.002815  0.000956  0.001405  8.48  10.51 
chr12  1.003102  1.003317  0.001548  0.001656  13.75  12.2 
chr13  1.003831  1.005012  0.001912  0.002499  14.83  15.36 
chr14  1.008943  1.007342  0.004451  0.003658  32.58  22.24 
chr15  1.001842  1.00411  0.00092  0.002051  6.44  12.23 
chr16  1.009601  1.007001  0.004778  0.003488  32.17  21.07 
chr17  1.002905  1.006812  0.00145  0.003395  9.77  20.81 
chr18  1.005494  1.016917  0.00274  0.008388  19.03  47.34 
chr19  1.009276  1.007636  0.004617  0.003803  25.46  20.13 
chr20  1.011147  1.012815  0.005542  0.006367  33.22  33.7 
chr21  1.003017  1.005026  0.001506  0.002507  7.33  10.15 
chr22  0.998893  1.009337  −0.00055  0.004647  −2.52  19.94 
chrX  1.003463  1.005699  0.001728  0.002842  16.73  22.23 
chrY  1.008873  1.000209  0.004417  0.000105  17.58  0.34 
It is wellknown that there exist local violations of the 2^{nd}Chargaff rule; in particular, there exists an excess of #G over #C and #T over #A on the coding strand within most genes. Green et al. [19] have argued that mutational asymmetry has acted over long periods of time to produce such a compositional asymmetry, and discontinuities of such asymmetries are associated with loci of replication origin. These questions have also been studied by Huvet et al. [20]. Could it be that the asymmetry that we have encountered is somehow connected to these findings? Since the gene coding strand may be either the plus (P) or the minus (M) strand of the conventional genomic notation, this may seem to be unrelated, assuming there is equal probability for genes to occur on each strand.
Gene occurrences on the plus (#P) and minus (#M) strands of HG38 display abundance of the former
chr  P  M  Y(P,M)  Z(P,M)  p values  Z(T,A)  Z(G,C)  corr 

1  4488  4291  0.022  2.103  0.018  15.00  5.76  v 
2  4106  3367  0.099  8.549  0  16.41  13.49  v 
3  2938  2516  0.077  5.714  5.65E09  13.19  12.50  v 
4  2542  1792  0.173  11.392  0  5.75  11.04  v 
5  2777  2186  0.119  8.389  0  24.44  17.50  v 
6  4840  3563  0.152  13.931  0  2.72  8.12  v 
7  3024  2402  0.115  8.444  0  16.15  7.57  v 
8  2135  2032  0.025  1.596  0.055  −3.53  9.65  
9  3032  2180  0.163  11.802  0  5.61  9.99  v 
10  2532  2156  0.080  5.492  2.01E08  17.18  10.82  v 
11  2879  4047  −0.169  −14.035  0  8.48  10.51  x 
12  3003  2771  0.040  3.053  0.0011  13.75  12.20  x 
13  1261  1227  0.014  0.682  0.25  14.83  15.36  
14  2092  1906  0.047  2.942  0.0016  32.58  22.24  v 
15  4226  3547  0.087  7.702  6.77E15  6.44  12.23  v 
16  2529  1875  0.149  9.855  0  32.17  21.07  v 
17  3582  2902  0.105  8.445  0  9.77  20.81  v 
18  1182  1490  −0.115  −5.958  1.26E09  19.03  47.34  x 
19  3287  3036  0.040  3.157  0.00079  25.46  20.13  v 
20  1258  1193  0.027  1.313  0.09500  33.22  33.70  
21  670  779  −0.075  −2.863  0.00212  7.33  10.15  x 
22  1429  1793  −0.113  −6.413  7.28E11  −2.52  19.94  ? 
X  1927  1572  0.101  6.001  9.87E10  16.73  22.23  v 
Y  491  184  0.455  11.816  0.00E + 00  17.58  0.34  
P < M  p > 0.05  T < A  p > 0.05 
There are two different issues which are noteworthy in Table 6. One is the correlation of the preference of #T > #A and #G > #C with the positive labeling of the strand. The other is the correlation of #T > #A and #G > #C with the preference for gene counts. Whereas the first may be coincidental (although it could be related to the labelling convention whose sources we were unable to trace), we believe that the second can be meaningful.
Next we looked for the violation of the 2^{nd} Chargaff rule on mouse and yeast, with the purpose of characterizing the asymmetries and looking for correlations with gene occurrences. The gene counts were obtained from MGI (MRK_list2 in ftp://ftp.informatics.jax.org/pub/reports/index.html) for mouse, and from SGD snapshot (http://www.yeastgenome.org/genomesnapshot) for yeast. While asymmetries of nucleotide occurrences are evident and significant in both, gene data are quite smaller than in human and no conclusive correlations can be deduced. The analyses are listed in Additional files 10 and 11. Finally we test the 2^{nd} Chargaff rule on C elegans and E coli in Additional file 12. While the former shows some significant inconsistencies, the latter is completely consistent. This behavior correlates well with the trends already noted in the first raw of Table 4, indicating large values of E[Z] for k = 1.
Discussion
Inversion symmetry may be stated as the equality N(S) = N(S^{inv}) where N are counts and S is some arbitrary string existing on a chromosome. Conventionally one studies such equalities over the space of all S which are kmers of some given length k. In addition to this equality one requires that, if S^{inv} is replaced by other permutations over the space of all kmers, analog rules will not hold.
After reinvestigating these questions on various genomic data, with special attention devoted to human data, we turned to a rigorous statistical study. For this purpose we defined the normalized differences Y = (N(S)N(S^{inv}))/(N(S) + N(S^{inv})) and X = Y. If the equality N(S) = N(S^{inv}) holds for stochastic variables N, we expect the variable Y to have approximately Gaussian behavior. If, moreover, N is a Poisson distribution, then Z = (N(S)N(S^{inv}))/(N(S) + N(S^{inv}))^{½} should have approximately a standard normal distribution. The stochastic variable Z is the appropriate one to be used for a ztest, characterizing the significance of IS values displayed by Y or X, under the IPPoisson model.
In order to characterize approximate IS we have employed E_{k}[X] ≤ 0.1 as a convenient measure. We saw in Table 4 that it captures the region for which significant results are obtained, and almost all kmers appear on the chromosome. Defining the klimit KL as the kvalue for which E_{k}[X] turns out to be closest to 0.1, we uncover a logarithmic increase of KL with chromosomal length. It turns out that this behavior is accounted for by our ISPoisson model.
Our original definition of IS regarded it as an approximate symmetry. As such it was seen to be valid for all ranges of k up to KL. With the advent of the IPPoisson model, we may investigate to what extent it can serve as an exact symmetry. It turns out that, for very low k, E_{k}[X] though extremely low, is significantly different from 0. In other words, the confidence intervals derived from ISPoisson, exclude a peak at Z = 0. This has lead us to investigate the violation of the 2^{nd} Chargaff rule, i.e. deviations from the relations N(T) = N(A) and N(G) = N(C). We find that deviations are very significant in human and in mouse, and quite significant on chromosomes of other eukaryotes. Moreover, in human we observe that, for most chromosomes, N(T) > N(A) and N(G) > N(C), i.e. these excesses are observed to occur on chromosomal plus (P) strands. Investigating the occurrences of genes on both strands, we find a similar excess with significant slight preference for the P strand. These results, for nucleotide excess and gene excess, are displayed in Table 6, and are seen to hold for a large majority of chromosomes.^{1} Still, there exist also some counterexamples. Could it be that the known asymmetries of complement nucleotides on gene coding strands are related to the observed correlation of the two effects in the human genome? This remains an interesting question for future studies.
Conclusions
Inversion symmetry is valid for almost all chromosomes, even after filtering out their lowcomplexity regions. We have defined an empirical criterion of IS, and a corresponding klimit (KL), which is the highest k for which all kmer distributions abide by the symmetry. Analyzing the IS behavior using rigorous statistical methods, and comparing empirical results with our ISPoisson model, we account for the universal increase of KL with respect to the chromosomal length.
For low k we find minute, yet significant, deviations from strict IS. This includes excess of counts of nucleotides T vs A and G vs C on positive strands of human chromosomes. We point out that this finding correlates with a significant (but small) excess of genes on the same positive strands.
Methods
All three variables can be evaluated for each specific kmer, hence they carry implicit indices of the space of 4^{k} kmers. The empirical values of the different variables will be denoted by N, X, Y and Z. Measures similar to Y appear in the literature for k = 1 and k = 2 and are known as skews.
 1.
The semiGaussian distribution of inversepair differences.
Let us consider a pair of kmers, with counts N(S) and N(S*) respectively, for which we evaluate the ratios Y = (N(S)  N(S*))/(N(S) + N(S*)) and X = Y. Moreover, we assume that these counts are due to two random variables drawn from the same distribution, thus having the same average, E[N(S)N(S*)] = 0, and follow a Gaussian distribution G = exp(−Y^{2}/2 σ_{G} ^{2})/ σ_{G}(2π)^{½}. The counts of the distribution of X = Y will then follow a semiGaussian distribution P = 2 exp(−X ^{2}/2 σ_{G} ^{2})/ σ_{G}(2π)^{½}, defined for positive X only. The mean and variance of this semiGaussian are E[X] = σ_{G} (2/π) ^{½} ≈ 0.798σ_{G} and V[X] = σ_{G} ^{2} (12/π). Hence σ_{X} = 0.603σ_{G} = 0.755E_{X}.
Empirical verification of inverse pair distributions can be carried out by choosing counts for all nonoverlapping windows of some size L (e.g. L = 100 K on human chr 1). Testing the X and Y distributions for inverse pairs of k = 8 we find the above description to be valid.
 2.
Poisson distributions of counts.
Let us now assume that the counts N(S) observed on a chromosome, are realizations of stochastic variables which follow Poisson distributions, each with its own mean = variance. In the IS limit the distribution of the inverse N(S^{inv}) coincides with that of N(S). Their difference should have a mean of 0, and variance which is the sum of the variances. Thus, for each inverse pair of kmers, we expect Y = (N(S)  N(S^{inv}))/(N(S) + N(S^{inv})) to become approximately Gaussian with mean 0 and standarddeviation σ_{G} =1/(N(S) + N(S^{inv}))^{½}. Alternatively we can state that Z = (N(S)  N(S^{inv}))/(N(S) + N(S^{inv}))^{½} should approximately follow a standard normal distribution with mean = 0 and variance = 1. It follows then, from the previous paragraph, that in this regime we should obtain, after averaging over all kmers, the results E_{k}(Z) = 0.8 and σ_{k}(Z) = 0.6.
Note that E_{k}(Z) may also be viewed as E_{k}(X/σ_{X}), where σ_{X} = 1/(N(S) + N(S^{inv}))^{½} for every particular kmer under consideration. Tables 2 and 4 demonstrate that the experimental results for large k are close to the predicted theoretical expectation.
 3.
Monotonic increase of E_{k}[X] as function of k in the IS limit.
For perfect IS it is trivial to prove that E_{k}[X] = 0 implies that this equality holds for lower k, i.e. E_{k1}[X] = 0. Here we study the case of approximate inversion symmetry, with the purpose of proving that small E_{k}[X] < <1 implies even smaller E_{k1}[X]. For simplicity we assume that all 4^{k} kmers are being realized on the chromosomal strings.
Let {S_{j}, j = 1…4^{k}} be the set of all kmers, and {S’_{i}, i = 1…4^{k1}}be the set of all (k1)mers. Each (k1)mer can be extended to the right by one nucleotide, resulting in four kmers. Let us refer to this extension of S’_{i} as the set S_{jϵI}. Corresponding relations will hold for their inverse partners, extended by nucleotides to their left. It follows then that the counts of these sets can be related N(S’_{i}) = ∑_{jϵI} N(S_{j}). Hence$$ \begin{array}{l}{\mathrm{X}}_{\mathrm{k}\hbox{} 1}\left(\mathrm{i}\right) = \left\mathrm{N}\left(\mathrm{S}{'}_{\mathrm{i}}\right)\ \hbox{}\ \mathrm{N}\left(\mathrm{S}{'_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right/\left(\mathrm{N}\left(\mathrm{S}{'}_{\mathrm{i}}\right) + \mathrm{N}\left(\mathrm{S}{'_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right) = \left\left({\displaystyle {\sum}_{\mathrm{j}\in \mathrm{I}}\mathrm{N}\left({\mathrm{S}}_{\mathrm{j}}\right)}\hbox{}\ {\displaystyle {\sum}_{\mathrm{j}\in \mathrm{I}}\mathrm{N}\left({{\mathrm{S}}_{\mathrm{j}}}^{\mathrm{i}\mathrm{nv}}\right)}\right)\right\ \\ {}/\ \left({\displaystyle {\sum}_{\mathrm{j}\in \mathrm{I}}\mathrm{N}\left({\mathrm{S}}_{\mathrm{j}}\right)} + {\displaystyle {\sum}_{\mathrm{j}\in \mathrm{I}}\mathrm{N}\left({{\mathrm{S}}_{\mathrm{j}}}^{\mathrm{i}\mathrm{nv}}\right)}\right).\end{array} $$Using the notation N(S_{j}) – N(S_{j} ^{inv}) = ∆N(S_{j}), we note that the numerator on the right obeys ∑_{jϵI} ∆N(S_{j}) ≤ ∑_{jϵI} ∆N(S_{j}). Because of varying signs this inequality may imply a strong decrease.
We may now compare the expressions of E_{k1}[X] =4^{k+1}∑_{i}X_{k1}(i) and E_{k}[X] = 4^{k}∑_{j}X_{k1}(j). Using the results of the previous paragraph we conclude that the numerators of X_{k1}(i) in E_{k1}[X] are smaller (or equal) than the numerators of X_{k}(j), where jϵI, in E_{k}[X]. Note however that the denominators of X_{k1}(i) and X_{k}(j), where jϵI, are different. To the extent that all N(S_{jϵI}) have similar values within the group jϵI when we approach the IS limit, this leads to X_{k1}(i) ≤ ∑_{jϵI} X_{k}(j)/4, which implies that$$ {\mathrm{E}}_{\mathrm{k}\hbox{} 1}\left[\mathrm{X}\right]\ \le {\mathrm{E}}_{\mathrm{k}}\left[\mathrm{X}\right]. $$In practice, for large k, we find in Tables 2 and 4 that E_{k1} [X] ≈ E_{k}[X]/2.
It should be emphasized that the monotonic increase holds in the IS limit, i.e. when E_{k}[X] < <1, but it is not a general property of kmers on any chromosomal section. Synthetic counter examples can be constructed.
 4.
Comparison of E_{k}[X] with the S1 measure.
The measure S1, introduced by Baisnee Hampson and Baldi [10], comparing counts of all kmers with their inverses, is defined by$$ \mathrm{S}1=1{\textstyle \hbox{} }{\displaystyle {\sum}_{\mathrm{i}}}\left\mathrm{N}\left({\mathrm{S}}_{\mathrm{i}}\right){\textstyle \hbox{}}\mathrm{N}\left({{\mathrm{S}}_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right/{\displaystyle {\sum}_{\mathrm{i}}}\left(\mathrm{N}\left({\mathrm{S}}_{\mathrm{i}}\right)+\mathrm{N}\left({{\mathrm{S}}_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right). $$The denominator in this expression equals twice the length of the chromosome. The numerator may be regarded as an L1 distance between two sets of sequences.
Note the difference from our measure E_{k}[X], which may be written as$$ {\mathrm{E}}_{\mathrm{k}}\left[\mathrm{X}\right]={{\mathrm{M}}_{\mathrm{k}}}^{{\textstyle \hbox{} }1}{\displaystyle {\sum}_{\mathrm{i}}}\left\mathrm{N}\left({\mathrm{S}}_{\mathrm{i}}\right){\textstyle \hbox{}}\mathrm{N}\left({{\mathrm{S}}_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right/\left(\mathrm{N}\left({\mathrm{S}}_{\mathrm{i}}\right)+\mathrm{N}\left({{\mathrm{S}}_{\mathrm{i}}}^{\mathrm{i}\mathrm{nv}}\right)\right). $$E_{k}[X] averages the relative difference of all kmers on equal footing, whereas S1 sums all absolute differences.
A comparison of the two different measures on human chr1 is presented in Table 7. We find that E_{k}[X] is roughly twice (1S1)_{k}, and the latter is approximately equal to E_{k1}[X].Table 7Comparison of two measures of inversion symmetry on chr1 of HG18 and HG38
HG18 chr1
HG38 chr1
k
1S1
E_{k}[X]
1S1
E_{k}[X]
5
0.0016
0.0021
0.0072
0.009
6
0.0026
0.0040
0.010
0.013
7
0.0048
0.0085
0.014
0.017
8
0.0091
0.018
0.018
0.025
9
0.017
0.038
0.027
0.043
10
0.033
0.083
0.043
0.085
Skew analyses, i.e. nonvanishing Y(A,T) and Y(C,G), have been carried out before. One example is table 2 in [21]. The correlation with gene numbers on human chromosomes observed in Table 6 is new, to the best of our knowledge. Forsdyke et al. [22] have investigated the correlation of conventional positive (or “top”) strands with the difference of #A#T in chromosomes of C elegans and D melanogaster. These organisms, which have low numbers of chromosomes, do not exhibit a clear preference for excess of either #A or #T.
Declarations
Acknowledgments
We thank Martin Kupiec, Doron Lancet, Isaco Melijson, Erez Persi and Tal Pupko for helpful discussions.
Funding
This research was partially supported by the research fund of the Blavatnik School of Computer Science.
Availability of data and materials
The code we have developed for kmer counting on genomic sequences is available at https://genomeutils.codeplex.com.
Data were obtained from public depositories listed below under Data deposition.
Authors’ contributions
BC and DH initiated the study and contributed to its design. SR contributed to the design of the statistical analysis. SS carried out the numerical data analysis. BC and DH prepared the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare they have no competing interests.
Consent for publication
The authors consent to publication by BMC genomics.
Ethics approval and consent to participate
Not applicable.
Data deposition
Eukaryote genomes have been downloaded from http://hgdownload.cse.ucsc.edu
These include Human HG18 & Masked, Human HG38 & Masked, Mouse MM10 & Masked, Yeast sacSer3 & Masked, C. Elegans CE10 & Masked, Chimpanzee panTro2, Zebrafish danRer6, D. melanogaster dm3.
Prokaryote genomes were downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Bacteria/
These include B. subtilis AL009126 UID76, M. avium paratubercolosis CP005928 MAP4 UID168471, P. furyosus AE009950 UID287, T. maritima AE000512 UID111.
E.coli.K12. mg1655.U00096.2 genome was downloaded from
https://www.genome.wisc.edu/pub/sequence/U00096.2.fas
Information regarding gene counts on plus and minus strands for human has been received from the GeneCards administrator at http://www.genecards.org
Mouse gene counts were obtained from MGI MRK_list2 in ftp://ftp.informatics.jax.org/pub/reports/index.html.
Yeast gene counts were obtained from SGD snapshot at http://www.yeastgenome.org/genomesnapshot
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Chargaff E. Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia. 1950;6(6):201–9.PubMedView ArticleGoogle Scholar
 Chargaff E. Structure and function of nucleic acids as cell constituents. Federal Proc. 1951;10:654–9.Google Scholar
 Crick F, Watson JD. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature. 1953;171:737–8.PubMedView ArticleGoogle Scholar
 Rudner R, Karkas JD, Chargaff E. Separation of B. subtilis DNA into complementary strands. III. Direct Analysis. Proc Natl Acad Sci U S A. 1968;60:921–2.PubMedPubMed CentralView ArticleGoogle Scholar
 Mitchell D, Bridge R. A test of Chargaff’s second rule. Biochem Biophys Res Commun. 2006;340(1):90–4.PubMedView ArticleGoogle Scholar
 AlbrechtBuehler G. Asymptotically increasing compliance of genomes with Chargaff’s second parity rules through inversions and inverse transpositions. Proc Natl Acad Sci U S A. 2006;103(47):17828–33.PubMedPubMed CentralView ArticleGoogle Scholar
 Prabhu VV. Symmetry observations in long nucleotide sequences. Nuc. Acids Res. 1993;21(12):2797–800.View ArticleGoogle Scholar
 Baldi P, Brunak S. Bioinformatics, the machine learning approach. MIT Press. 2001Google Scholar
 Qi D, Cuticchia AJ. Compositional symmetries in complete genomes. Bioinformatics. 2001;17:557–9.PubMedView ArticleGoogle Scholar
 Baisnee PF, Hampson S, Baldi P. Why are reverseary DNA strands symmetric? Bioinformatics. 2002;18:1021–33.PubMedView ArticleGoogle Scholar
 Kong SG, Fan WL, Chen HD, Hsu ZT, Zhou N, Zheng B, Lee HC. Inverse symmetry in complete genomes and wholegenome inverse duplication. PlosOne. 2009;4:e7553.View ArticleGoogle Scholar
 Wang S, Tu J, Jia Z, Lu Z. High order intrastrand partial symmetry increases with organismal complexity in animal evolution. Sci Rep. 2014;4:6400.PubMedPubMed CentralView ArticleGoogle Scholar
 Afreixo V, Bastos CAC, Garcia SP, Rodrigues JMOS, Pinho AJ, Ferreira PJSG. The breakdown of the word symmetry in the human genome. J Theor Biol. 2013;335:153–9.PubMedView ArticleGoogle Scholar
 Powdel BR, Satapathy SS, Kumar A, Jha PK, Buragohan AK, Borah M, Ray SK. A Study in Entire Chromosomes of Violations of the Intrastrand Parity of Complementary Nucleotides (Chargaff’s Second Parity Rule). DNA Res. 2009;16:325–43.PubMedPubMed CentralView ArticleGoogle Scholar
 Afreixo V, Rodrigues JMOS, Bastos CAC. Analysis of singlestrand exceptional word symmetry in the human genome: new measures. Biostatistics. 2015;16(2):209–21.PubMedView ArticleGoogle Scholar
 Chor B, Horn D, Goldman N, Levy Y, Massingham T. Genomic DNA kmer spectra: models and modalities. Genome Biol. 2009;10:R108.PubMedPubMed CentralView ArticleGoogle Scholar
 Pevzner P, Tesler G. Genome rearrangements in Mammalian Evolution: Lessons from Human and Mouse Genomes. Genome Res. 2003;13:37–45.PubMedPubMed CentralView ArticleGoogle Scholar
 Okamura K, Wei J, Scherer SW. Evolutionary implications of inversions that have caused intrastrand parity in DNA. BMC Genomics. 2007;8:160.PubMedPubMed CentralView ArticleGoogle Scholar
 Green P, Ewing B, Miller W, Thomas PJ. NISC Comparative Sequencing Program & Green ED. Transcriptionassociated mutational asymmetry in mammalian evolution. Nat Gen. 2003;33:514–7.View ArticleGoogle Scholar
 Huvet M, Nicolay S, Touchon M, Audit B, d’AubentonCarafa Y, Arneodo A, Thermes C. Human gene organization driven by the coordination of replication and transcription. Gen Res. 2007;17:1278–85.View ArticleGoogle Scholar
 Mascher M, Schubert I, Scholz U, Friedel S. Patterns of nucleotide asymmetries in plant and animal genomes. BioSystems. 2013;111:181–9.PubMedView ArticleGoogle Scholar
 Forsdyke DR, Zhang C, Wei JF. chromosomes as interdependent accounting units. J Biol Syst. 2010;18:1–16.Google Scholar