From: Single genome retrieval of context-dependent variability in mutation rates for human germline

Comparison of the in silico evolved sequence and the actual human genome. a The 5-million-nt starting sequence is randomly generated with 60% G+C content. b, c The sequence is then neutrally evolved using \(r_{i,j}^{core}\) only, until the base-compositional equilibrium is established (c). Equilibrium was reached after about 20 million substitutions, an average of 4 substitutions per site (b; the x-axis shows the number of substitutions divided by the simulated sequence length). Equilibration is faster when starting from a sequence with lower G+C content. d–g Correlation of the k-mer contents in the equilibrated genome with the corresponding contents in the real human genome. The k-mer length and the correlation coefficients are shown in the bottom right corner of each plot. Two correlation coefficients are given: one excluding and one including (value in brackets) CpG-containing oligomers (red points in the plots). The dashed lines mark the diagonals, which correspond to an ideal match of the k-mer contents.
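
The procedure summarised in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `RATES` table holds made-up, strand-symmetric values standing in for \(r_{i,j}^{core}\), which are not given here, and the function names (`random_sequence`, `evolve`, `kmer_frequencies`, `pearson`) are hypothetical. The sketch assumes context-independent substitutions applied by rejection sampling: a site and a target base are proposed uniformly, and the proposal is accepted in proportion to its rate. For these illustrative rates the expected equilibrium G+C fraction is 1/3, because each A or T mutates to G or C at total rate 4 and each G or C mutates to A or T at total rate 8.

```python
import itertools
import math
import random
from collections import Counter

BASES = "ACGT"

# Hypothetical relative substitution rates (NOT the paper's r_core values).
# RATES[i][j] is the rate at which base i is replaced by base j.
RATES = {
    "A": {"C": 1.0, "G": 3.0, "T": 1.0},
    "C": {"A": 2.0, "G": 1.0, "T": 6.0},
    "G": {"A": 6.0, "C": 1.0, "T": 2.0},
    "T": {"A": 1.0, "C": 3.0, "G": 1.0},
}
MAX_RATE = max(r for row in RATES.values() for r in row.values())


def random_sequence(length, gc, rng):
    """Return a list of bases with expected G+C fraction `gc`."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return rng.choices(BASES, weights=weights, k=length)


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def evolve(seq, n_substitutions, rng):
    """Apply `n_substitutions` context-independent substitutions in place."""
    done = 0
    while done < n_substitutions:
        site = rng.randrange(len(seq))
        old = seq[site]
        new = rng.choice([b for b in BASES if b != old])
        # Rejection step: accept in proportion to the rate old -> new.
        if rng.random() < RATES[old][new] / MAX_RATE:
            seq[site] = new
            done += 1
    return seq


def kmer_frequencies(seq, k):
    """Frequencies of all 4**k overlapping k-mers, keyed by k-mer."""
    text = "".join(seq)
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    total = sum(counts.values())
    kmers = ("".join(p) for p in itertools.product(BASES, repeat=k))
    return {kmer: counts[kmer] / total for kmer in kmers}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = random_sequence(50_000, gc=0.60, rng=rng)
    evolve(seq, 4 * len(seq), rng)  # about 4 substitutions per site
    print(f"G+C after equilibration: {gc_content(seq):.3f}")
```

To mimic panels d–g, the dictionary returned by `kmer_frequencies` for the equilibrated sequence would be compared, via `pearson`, with the k-mer frequencies counted in a reference genome, once over all k-mers and once with CpG-containing k-mers filtered out.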

