Three approaches were followed to examine multifractality in the human genome from the Chaos Game Representation (CGR) (Figure 1A).
1) Multifractal analysis by chromosome fragment
1.1) Analyses of multifractal parameters
The multifractal parameters for 9,379 chromosome fragments were calculated and analyzed (Additional file 1). Initially, the generalized dimension spectrum and MD for all chromosome fragments were determined. The extreme generalized dimension spectra and a medium spectrum are depicted for comparison (Figure 1B). Note that the Dq maximum varies very little because negative q values are associated with the structure and properties of sparse regions, which contain few points in the CGR of the human genome. In contrast, the Dq minimum varies widely because positive q values emphasize regions where the points are dense.
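The calculation described above can be sketched in a few lines. The following is a minimal illustration, not the pipeline used in this study: the corner assignment, the grid sizes, the least-squares fit, and the choice of q = ±20 as the extremes of the spectrum are all assumptions made for the example, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

# Corner assignment is one common CGR convention (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Map a DNA string to CGR points in the unit square."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguous symbols
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points

def box_probabilities(points, k):
    """Fraction of points in each occupied box of a 2**k x 2**k grid."""
    n = 2 ** k
    counts = Counter((min(int(x * n), n - 1), min(int(y * n), n - 1))
                     for x, y in points)
    total = len(points)
    return [c / total for c in counts.values()]

def renyi_term(probs, q):
    """Quantity whose slope against log(box size) estimates D_q."""
    if abs(q - 1.0) < 1e-9:
        return sum(p * math.log(p) for p in probs)  # information dimension (q = 1)
    return math.log(sum(p ** q for p in probs)) / (q - 1.0)

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def generalized_dimension(points, q, ks=(2, 3, 4, 5)):
    """Estimate D_q by box counting over grids with box size 2**-k."""
    xs = [math.log(2.0 ** -k) for k in ks]
    ys = [renyi_term(box_probabilities(points, k), q) for k in ks]
    return fit_slope(xs, ys)

def delta_dq(points, q_extreme=20):
    """Degree of multifractality, assumed here to be D_(-q) - D_(+q)."""
    return (generalized_dimension(points, -q_extreme)
            - generalized_dimension(points, q_extreme))

# A random sequence fills the CGR square evenly, so D_q should be close to 2.
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
pts = cgr_points(random_seq)
d0, d1, d2 = (generalized_dimension(pts, q) for q in (0, 1, 2))
```

A uniformly random sequence gives a nearly flat spectrum, whereas a real chromosome fragment with unevenly populated CGR regions would be expected to give Dq values that decrease with q.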
Subsequently, the corresponding scaling exponents τ(q) were calculated for each fragment (Additional file 2). The three τ(q) spectra differ from one another (Figure 1C). The scaling exponent τ(q) can reveal aspects of chromosome fragment structure. Monofractal behavior would correspond to a straight line for τ(q); for multifractal behavior, τ(q) is nonlinear. The changing curvature of τ(q) for the chromosome fragments indicates multifractality. In contrast, τ(q) tends to be linear for the chromosome fragment with the lowest multifractality, indicating partial loss of multifractality.
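The link between τ(q) and the generalized dimensions can be written explicitly. The relation below is the one used in the standard multifractal formalism; it is assumed here, since this section does not restate the definition.

```latex
\tau(q) = (q - 1)\, D_q ,
\qquad
D_q \equiv D \ \text{for all } q
\;\Longrightarrow\;
\tau(q) = D\,q - D .
```

Under this assumption, a monofractal set (constant D_q) yields a straight line of slope D, and any dependence of D_q on q appears as curvature in τ(q).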
Using the whole data set for each chromosome, we calculated the MD from each generalized dimension spectrum. The degree of multifractality for all chromosome fragments ranges from ~0.79 to 1.56, with an average of 1.042 and a median of 1.018 (Additional file 1). Analysis by range of multifractality (RM) reveals that the multifractal behavior for the whole data set is biased toward low multifractal values, as expected (Figure 1D).
Next, we used a discrimination method based on 2-D distributions to study the information dimension for all chromosome fragments. The data show two different informational patterns (together resembling a > symbol), one with high information content (Figure 1E, dots on top) and the other with low and medium information content (Figure 1E, dots on bottom), the latter being more numerous than the former. We hypothesize that these behaviors are related to some molecular parameter, which is analyzed in the following section.
1.2) Analyses of molecular parameters
The annotated contents of coding and non-coding sequences for each fragment were determined (Additional file 1). These counts were similar to those reported by other studies [1, 2], suggesting that our results are consistent at the chromosome level. We hypothesized that these multifractal behaviors might be explained by different repetitive DNA contents in the human genome, similar to the results found in C. elegans [24]. Therefore, we examined several molecular density parameters against the MD. We focused especially on the Alu sequence content, given its high polymorphism. We observed 1,078,720 Alu sequences, equivalent to about 10.58% of the human genome; chromosome fragments contain 0-563 Alu sequences with an average of 115 Alus, i.e., about one Alu element for every 2,600 bp of genomic DNA. The MD and the Alu content are strongly related (Figure 2A).
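The averages quoted above can be checked with simple arithmetic. The sketch below assumes a fragment length of 300 kb, a value taken from the Discussion ("a 300 kb chromosome fragment") rather than stated in this paragraph; the variable names are illustrative.

```python
TOTAL_ALUS = 1_078_720   # Alu sequences observed (from the text)
N_FRAGMENTS = 9_379      # chromosome fragments analyzed (from the text)
FRAGMENT_BP = 300_000    # assumption: fragment length quoted in the Discussion

mean_alus_per_fragment = TOTAL_ALUS / N_FRAGMENTS
bp_per_alu = FRAGMENT_BP / mean_alus_per_fragment

print(f"mean Alus per fragment: {mean_alus_per_fragment:.1f}")
print(f"one Alu every {bp_per_alu:.0f} bp")
```

Under that assumption, both figures agree with the reported average of 115 Alus per fragment and roughly one Alu per 2,600 bp.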
This relationship was assessed in terms of Alu families, and Alu-S was found to be more strongly correlated with the MD than the other Alu families (Figure 2B). Furthermore, the Alu contents (in conjunction with CGI) are biased toward high multifractal ranges, suggesting a significant role for these sequences in determining the non-linearity in the human genome (Figure 2C).
Sequencing of the human genome also revealed a strong relationship between Alu and CGI contents [1, 2]. We observe that CGI have a weaker relationship with multifractality than that found for the Alu elements (Figure 2D). However, when both parameters are combined, a significant fit is obtained (R2 = 0.85, p < 0.05). Other molecular parameters such as gene density, exons, introns, LINE, MIR, MER, and LTRs did not show a significant fit by simple linear regression. However, when all repetitive elements (Alu, LINE, MIR, MER, and LTRs) are taken into account, R2 is ~0.57. Thus, among the studied genomic features, Alu has the highest correlation with multifractal degree.
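The determination coefficient reported for these fits comes from ordinary least-squares regression. The sketch below shows the single-predictor case only; the data points are invented for illustration and are not values from this study, and `linear_fit` is a hypothetical helper name.

```python
def linear_fit(xs, ys):
    """Simple least-squares line y = intercept + slope * x, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot  # fraction of variance explained by the line
    return slope, intercept, r2

# Hypothetical (Alu count, MD) pairs, invented for illustration only.
alu_counts = [20, 80, 150, 230, 310, 400]
md_values = [0.90, 0.99, 1.05, 1.15, 1.21, 1.33]

slope, intercept, r2 = linear_fit(alu_counts, md_values)
print(f"slope={slope:.5f}, intercept={intercept:.3f}, R2={r2:.3f}")
```

Combining two predictors, as done for Alu and CGI together, requires a multiple regression, which this single-variable sketch does not cover.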
Multivariate analyses of ΔDq versus all variables (Alu, G+C, CGI, LINE, MIR, MER, LTRs, nCoding, nNonCoding, exons, genes, and SNPs) per chromosome were carried out, and for each case the most relevant variables explaining ΔDq were selected. The most frequently selected variables are (G+C), Alu, and CGI, which are significant in 23, 23, and 21 of the 24 chromosomes, respectively (Additional file 3). CGI coefficients in all regressions are negative, probably compensating for the high positive (G+C) coefficients, given that (G+C) and CGI are strongly correlated (R ~0.805). The position of Alus among the most relevant variables confirms our prior analyses based on one- and two-dimensional regression. In the same way, we analyzed ΔDq for the whole genome, again obtaining (G+C), Alu, and CGI as the most relevant variables explaining ΔDq (Additional file 4). Moreover, when the long interspersed repeats are analyzed by RM, they tend to be located in low and medium multifractality (LMM) ranges (Figure 2E).
Given that the information dimension takes the form of a > symbol (Figure 1E), we studied its behavior using a discrimination method based on 3-D distributions. In this analysis, the high information content is related to Alu content, whereas the low and medium information contents are related to low Alu contents and other genomic structures (Figure 2F).
1.3) Multifractal map of the human genome
We examined the multifractality and Alu content across the genomic landscape to map these relationships in each human chromosome. The analysis reveals how similarly these two variables behave. This is particularly clear when the determination coefficient for the linear regression is calculated for all chromosomes (Figure 3, Additional file 5). All R2 values lie between ~0.78 and 0.92, with the exception of chromosomes Y, 21, 19, X, and 11. The apparently low correlation (0.24 ≤ R2 ≤ 0.76) of these chromosomes can be explained by the presence of some atypical chromosome fragments: they may contain some kind of repeat (chrs. 4, 21, and Y) or lack Alu contents (chrs. 11, 12, and 19). Once these fragments (nine in total) are removed from the analysis, the R2 values for all chromosomes improve significantly (0.78 ≤ R2 ≤ 0.92, p < 0.05), including chromosome Y (R2 = 0.52). Chr. 17 has the highest determination coefficient between multifractality and Alu content, and chromosome Y has the lowest. With the exception of the atypical chr. Y, these results indicate that multifractality in each human chromosome depends on the content of repetitive DNA of the Alu type. Additionally, other determination coefficients for several molecular parameters were calculated in this study (Additional file 5). Thus, among the studied molecular parameters, Alu shows the highest correlation with MD. A multivariate regression analysis also showed a similar result (Additional file 3).
Given that other repetitive elements could contribute to increasing the local multifractality, five chromosome fragments with a low number of Alus and high MD in chromosomes 4, 21, and Y drew our attention (Figure 3, asterisks, Additional file 5). We analyzed these sequences and found many variable short repeats in tandem (VSRTs) (Additional file 6). Thus, the presence of these repeats increases local multifractality but reduces the multifractality of the entire chromosome for these chromosomes, as mentioned before.
1.4) Chromosomal location of the most multifractal chromosome fragments
Human genome sequencing revealed that chromosomes 19, 16, 17, and 22 are richer in genes, CGI, and Alu elements [1]. Based on averages for ΔDq and Alus for these chromosomes, we defined a threshold for chromosome fragments as ΔDq ≥ 1.159 and Alu contents ≥ 217.9 (Additional file 7). This allows chromosome fragments with the highest multifractality to be separated from those with LMM (Additional file 8). A discrimination method based on distributions of 3-D points shows how both groups of chromosome fragments can be easily differentiated (Figure 4A). The plot reveals that the highest multifractality and Alu contents are observed in 1,292 fragments, suggesting the existence of an abundant number of multifractal regions in the human genome with an average multifractality around 1.24 and an average of ~305 Alus. As expected, many fragments (~29%) are located on chromosomes 19, 17, 16, and 22 (Figure 4B, above). Similar results were obtained by using a 3-D plot with MD-Alu-Dq(q = 1) (data not shown). Chromosome fragments with LMM and low Alu contents, in contrast, are situated mainly on the other chromosomes (~86.2%), with chromosomes 4, 13, 18, 5, and Y showing the lowest multifractality (Figure 4B, below); these fragments have an average multifractality of 1.0 and an average Alu content of ~79.
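The threshold rule above amounts to a two-condition filter. The sketch below applies the thresholds quoted in the text to a few invented fragments; the fragment names and values are hypothetical and serve only to show the rule. It also checks one plausible reading of the ~86.2% figure, namely the share of fragments that fall outside the 1,292 highly multifractal ones.

```python
MD_THRESHOLD = 1.159    # ΔDq threshold from the text
ALU_THRESHOLD = 217.9   # Alu content threshold from the text

def is_highly_multifractal(delta_dq, alu_count):
    """True when a fragment meets both thresholds."""
    return delta_dq >= MD_THRESHOLD and alu_count >= ALU_THRESHOLD

# Hypothetical fragments: (name, ΔDq, Alu count). Not data from the study.
fragments = [
    ("frag_1", 1.24, 305),
    ("frag_2", 1.00, 79),
    ("frag_3", 1.20, 150),   # high ΔDq but too few Alus
    ("frag_4", 1.10, 260),   # many Alus but ΔDq below threshold
]

high = [name for name, md, alu in fragments if is_highly_multifractal(md, alu)]
lmm = [name for name, md, alu in fragments if not is_highly_multifractal(md, alu)]

# Share of fragments outside the high group, using counts from the text.
lmm_share_percent = 100.0 * (1 - 1292 / 9379)
print(high, lmm, round(lmm_share_percent, 1))
```

Requiring both conditions means that fragments with high ΔDq but few Alus, such as those rich in tandem repeats, are not placed in the high group.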
1.5) Analyses by gene function, gene family, and gene length
One would expect to find other molecular characteristics of genes associated with multifractality; hence, other related molecular parameters were examined. Several distributions biased toward high ranges of multifractality were found for gene functions, clusters of orthologous genes (KOGs), metabolic pathways (KEGGs), and number of exons (Figure 5A, Additional file 9). We found gene function information for only 5,823 chromosome fragments, with an average multifractality degree (AMD) of 1.126 and a median of 1.132. For example, many genes for the cell division cycle lie on chromosome fragments with an AMD of 1.203; many genes of the major histocompatibility complex, classes I and II, are situated on fragments with AMD = 1.06; and many members of the melanoma antigen family lie on fragments with AMD = 0.96, to mention a few.
To gain further insights into the gene function, we focused on about 208 human gene families, consisting of 4,614 genes [25, 26]. We asked about the multifractal genomic context for these gene families. The distributions obtained show three different multifractal behaviors (Figure 5B): low-skewed (for OR, KCN, HLA, IFN, KRT, CDH, and RGS), high-skewed (for ZNF, SNORA, USP, RPS, SNORD, GTF, DHX, ALOX, and UBE2), and "medium" for most gene families. Other gene families can be placed within some of these categories (Additional file 10).
When multifractality is related to the information content (for example, number of exons, Figure 5A), it is expected that the more genetic information exists, the greater is the extent of genetic information fragmentation. To verify this assumption we looked for the average lengths of genes, exons, and introns in relation to the RM. The three corresponding distributions show how the average lengths decrease as multifractality increases (Figure 5C, Additional file 11). Another approach to validate this assumption is to observe the number of information units (IU) (exons plus introns) per RM. Here, the distribution shows that the number of IUs increases when the RM increases (data not shown).
2) Multifractal analysis by chromosome
We next explored the multifractal behavior of each chromosome. We found that the AMD and Alu content profiles have very similar behaviors (Figure 6A). This is particularly evident when observing how well these two variables fit (Figure 6B, Additional file 12). Following the linear regression line, three groups of chromosomes can be distinguished (by visual inspection): a first group in which chromosomes 19, 17, 22, and 16 exhibit the highest multifractality (and the highest Alu contents); a second group consisting of chromosomes 15, 20, 1, 10, 12, 9, 7, 14, and 21 with medium multifractality; and a third group of chromosomes 2, 11, 8, 6, Y, 3, 18, 5, 13, X, and 4 with the lowest multifractality. A similar analysis showed that the CGI were also highly correlated with the AMD (R2 ~0.86, p < 0.05) (Additional file 13).
We asked whether this subjective classification could be obtained by hierarchical clustering of the complete data set of averaged multifractal parameters, using multifractality as a similarity measure (Additional file 14). The clustering process classified the chromosomes into three multifractality groups (low, medium, and high; on top of Figure 6B), with among-group similarities of 0.84 and 0.4, respectively, confirming (in part) the visual observation. Nearly all chromosomes (92%) lie on the consecutive, visually identified low, medium, and high sections of the regression line. The only exceptions are chromosomes 22 and Y, which are placed in other groups.
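The grouping step can be illustrated with a small agglomerative clustering sketch. The linkage rule (average linkage on absolute differences of a single averaged value) is an assumption for this example, since the text does not specify it; the labels and values are invented and do not correspond to real chromosomes.

```python
def agglomerate(values, n_groups):
    """Average-linkage agglomerative clustering of labelled 1-D values.

    `values` maps a label to a number; clusters are merged pairwise until
    `n_groups` remain. The linkage rule is an assumption for illustration.
    """
    clusters = [[label] for label in values]

    def distance(a, b):
        diffs = [abs(values[i] - values[j]) for i in a for j in b]
        return sum(diffs) / len(diffs)

    while len(clusters) > n_groups:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical averaged multifractality values, invented for illustration.
amd = {"a": 0.98, "b": 1.00, "c": 1.05, "d": 1.06, "e": 1.20, "f": 1.22}
groups = agglomerate(amd, 3)
print(sorted(sorted(g) for g in groups))
```

With real data, the resulting groups can be compared against the visually identified low, medium, and high sections of the regression line.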
3) Multifractal analysis by average of chromosome regions
Analysis by chromosome region proved to be a valid approach to study the genetic information content in the C. elegans genome [24]. Here, we applied the same approach to analyze several characteristics of the human genome. It is known that chromosome 21, involved in Down syndrome, shows a degree of asymmetric regionalization in the distribution of the Alu elements (Figure 3) [1, 2]. We hypothesized that one part of chromosome 21 should have low multifractality and the other one high multifractality. Indeed, the data show that the first 50% of chromosome 21 is of low multifractality (< 1.0), whereas the other 50% has a higher multifractality (> 1.08) (Figure 7A, Additional file 15).
Other relevant characteristics are observed in some chromosome bands and arms (Figure 7A). For example, the X chromosome, involved in X chromosome inactivation (XCI), which is rich in LINE1 elements and poor in Alu sequences, showed 0.95 ≤ ΔDq ≤ 1.027 (Additional file 16). The Y chromosome has two particular regions at the Yp and Yq ends, the pseudoautosomal region and the palindromic region, respectively [1]. We thought that the palindromic regions should have low multifractality because of their symmetric structure. We found, in fact, that this region has lower non-linearity. Moreover, recombination rates in chromosome 8 tend to be much higher in distal regions (around 20 Mb) [1], and the analysis showed medium non-linearity in this region, as expected (Additional file 17). Regarding chromosome 1, which is rich in Alu sequences in one of its arms [27], we found significantly high multifractality (~1.13) in this region; in contrast, the other three regions have ΔDq ≤ 1.058 (Additional file 18). Similar situations can be analyzed for other chromosomes. As two opposing references, we use chromosomes 4 and 19 for comparison (Additional file 19).
Antibodies to histone modifications previously linked to active transcription showed close correspondence to regions rich in genes and CGI in the human metaphase epigenome [28]. We analyzed chr. 1 and found that CGI profiles correspond well to multifractality (Figure 7B, Additional file 20).
Discussion
We discovered a strong relationship between the multifractal parameters and part of the genetic information coded by the human genome.
Initially, the multifractality in human genome was found strongly dependent on the Alu contents
Herein, thousands of chromosome fragments with multifractality ranging from low to high values were analyzed (Figure 1A-C). For all chromosome fragments, τ(q) is a nonlinear function (Figure 1C), indicating that the molecular structure of the chromosome fragments has a multifractal behavior. However, in many chromosome fragments, τ(q) tends to be close to linear, especially for q ≥ 2, indicating partial loss of multifractality. These results suggest that nucleotide fluctuations are less anti-correlated in many chromosome fragments. In fact, the fragment distribution is biased toward low and medium multifractal values (Figure 1D), suggesting that the human genome has a large number of regularly arranged elements that are highly periodic and not very polymorphic. This is not surprising because about 98.9% of the human genome consists of non-coding sequences with a complex composition given by introns and intergenic regions. That is, at least 55% of this information is poorly polymorphic, given that these regions mainly consist of introns, LINEs (especially L1), LTRs and DNA transposons [1, 2]. In contrast, the human genome also has a significant number of chromosome fragments with high multifractality (Figure 1D). This means that these regions should be rich in specific types of sequences that are highly polymorphic and organized in a large number of possible combinations. When the information dimension was analyzed, a dual informational behavior confirmed this assumption (Figure 1E). Indeed, the multifractality was found to be strongly correlated with the Alu content (Figure 2A), which became visible when plotted against the information dimension (Figure 2F). This result is very significant given that the Alu family is highly polymorphic [29, 30] and in a 300 kb chromosome fragment one can find Alu elements in many combinations in up to 50% of its length.
The Alu elements are not identical and can be classified into three major families: Alu-J, Alu-S and Alu-Y representing the oldest, intermediate, and youngest Alus, respectively and each family is divided into one or more levels of subfamilies [31]. In total, ~45 subfamilies encompass the complete Alu family. We found that multifractality was mainly dependent on the Alu-S contents (Figure 2B), especially the Alu-Sx, an expected result since these sequences are the most abundant Alu members in the human genome [1]. Analysis via RM confirmed that the Alu sequences tend to be located toward medium and high ranges of multifractality (Figure 2C) because of the high Alu content in the human genome.
The CGI showed a moderate relationship with the multifractality (Figure 2C, D), which might be because more than 95% of CGI are less than 1,800 bp long [1]. Genes, exons, introns, LINEs, MIR, MER and LTR contents did not show any significant relationship with the multifractality because most of these sequences have a low number of members, are large and have few polymorphisms. For example, LINE elements are ~6 kb long, more numerous than Alus and consist of four families, with LINE-1 being the most abundant family (~17%) in the genome [32], and their density pattern is quite uniform for most chromosomes [1]. Thus, the combination of number of members, size and polymorphism seems to be the determining characteristic for multifractality changes. The earlier mentioned abundant number of polymorphic Alu sequences confirms the relation between these characteristics and multifractality. In fact, an in silico comparative genomics study between the public and Celera versions of the human genome sequence identified several hundred new Alu insertion polymorphisms, showing that these elements are highly polymorphic [31]. A similar behavior is found in C. elegans, where the TTAGGC repeat is abundant in number and combinations within the flanking sequences [24].
Subsequently, we elaborated a multifractal map of the human genome (Figure 3), which shows MD and Alu density along the human chromosomes. The map reveals that the human chromosomes contain many significant correlation structures for Alu-rich regions. Thus, the high contents of Alu account for the high aperiodicity and genetic variability of many chromosome sections. A similar result in C. elegans reported changes in multifractality related to a specific type of repetitive DNA [24]. Additionally, the correlations for CGI are lower but significant. However, no significant correspondence was found in regions poor in Alu sequences and rich in LINE, MIR, MER and LTR sequences. Not all multifractality is due to the Alu contents; many VSRTs can also contribute to increasing local multifractality (Figure 3, asterisks). We found very poor correspondence to the number of genes, perhaps due to their low frequency. These results, taken together, indicate that the observed multifractality is primarily related to nonlinear distributions for those chromosome fragments that are rich in Alu sequences, next for those with high CGI content and, in a few instances, for those with high VSRT contents.
Hundreds of highly multifractal chromosome fragments mapped in chromosomes rich in genetic information
There were a large number of chromosome fragments with very high multifractality (Figure 4A), mainly located on chromosomes 19, 17, 22, and 16 (Figure 4B, above). We suggest that all of these chromosome sections generate a mosaic of regions that places the genetic information far from equilibrium [17, 24], which could be interpreted both as a protective "shield" for the human genome against environmental fluctuations and as "genomic attractors" that maintain many components, functions and processes under a "deterministic" genomic control. In contrast, the same analysis also identified thousands of LMM chromosome fragments (Figure 2C) with low Alu content (Figure 4B, below) that are perhaps prone to being affected by the environment. This result might be interpreted as indicating that some genome sections with low nonlinearity have high genetic instability associated with some particular (structural or functional) gene property.
Several gene characteristics are related to multifractality
This is not striking since three-fourths of all genes in the genome are associated with Alus (Figure 5A) [30]. Therefore, some gene families tend to be located preferentially within a multifractal genomic context (Figure 5B). For example, the hOR gene family lies mainly in a low multifractal genomic context. This is because this family has a very periodic and repetitive structure. It is known that the OR gene family has about 390 active members, which were propagated through the genome by gene duplication. Hence they share a high homology due to their high structural homogeneity and possess many clusters of regular characteristics; nonetheless, their functional expression depends on a complex interplay between regulatory sequences and the environment [33]. A similar behavior is observed in the KCN gene family, responsible for building potassium channels for cell communication. In contrast, the ZNF gene family, which codes for regulatory proteins and is, therefore, involved in many cellular functions, is located in a medium and high multifractal genomic context. For example, the ZEB2 protein, involved in a chemical signaling pathway, regulates early growth and development and obeys a predetermined genetic program. In addition, these genes have a high structural inhomogeneity and many irregular characteristics. Similar inferences might apply to the RPS gene family, which codes for highly conserved ribosomal proteins, to the SNORA machinery involved in nuclear splicing and to USPs, which help to control the levels of many proteins in the cell [26]. This seems to suggest that the low multifractal genetic context might be related to information inputs from environmental processes, and the high one to inputs from deterministic processes.
Thus, a few gene families in the human genome might be subjected to two types of information (or stimulus) inputs, while most gene families seem to be subjected to a complex regulatory interplay between epigenetic and genetic controls.
On the other hand, the degree of gene fragmentation by RM (Figure 5C) behaves according to the multifractal theory: multifractality increases when the length of exons and introns in the human genome decreases and the number of IUs per interrupted gene increases with multifractality, as expected.
The multifractal approach per chromosome permitted classifying the human chromosomes. This analysis validated the strong relationship to the Alu elements (Figure 6), which we found especially for chromosomes 19, 17, 22, and 16, which are rich in genetic information content [1, 2]. In particular, chromosome 19 is by far the most multifractal chromosome and has the highest gene density of the whole genome. It is also unusual with respect to its density of repeat sequences. In fact, nearly 55% of this chromosome consists of repetitive elements, whereas chromosomes 6, 7, 14, 20, 21 and 22 all have repeat contents ranging from 40% to 46% (the genome average is 44.8%). This difference is due mainly to an unusually high content of SINEs in chromosome 19 [1]. In contrast, chromosomes 13, X, and 4 have the lowest multifractality because their Alu content is lower than the autosomal average and they have low gene density. Some of these chromosomes have very large "gene deserts" and the CGI and LINE contents are the highest percentage among all autosomes [1, 30]. A similar behavior can be observed for chromosomes 19, 17, and 4, as reported in a recent multifractal analysis [23].
Our analysis permits classifying human chromosomes into three multifractality groups, suggesting that the chromosome molecular structure might be organized as a system operating far from equilibrium [24] (Figure 6B). Thus, those chromosomes with low multifractality might be closer to equilibrium and have greater genetic instability. If so, this would explain why some chromosomes are involved in some genomic disorders (structural and numerical chromosome alterations) [34]. For example, some microdeletion syndromes have been reported for chr. 4: Wolf-Hirschhorn syndrome, chr. 5: Cri du chat syndrome and chr. 15: Angelman and Prader-Willi syndromes. Some aneuploidies can occur in chr. 8: Warkany syndrome, chr. 13: Patau syndrome, chr. 18: Edwards syndrome, chr. 21: Down syndrome, chr. X: Turner syndrome (XO), Klinefelter syndrome (XXY), triple X syndrome and other tetrasomies and pentasomies of chr. X, and chr. Y: XYY syndrome and Turner syndrome. With the exception of chromosomes 21 and Y, all were classified as chromosomes with low multifractality and are more susceptible to genetic damage or incorrect meiotic segregation.
The multifractal approach by chromosome region reveals different genomic scenarios (Figure 7A)
For instance, chromosome 21 regions with low multifractality might promote genetic instability during meiotic segregation in Down syndrome. Similar behaviors might arise for chromosomes X and Y to explain XCI and sex determination. For example, the most remarkable enrichment of repetitive sequences was obtained for L1, which accounts for 29% of the X chromosome sequence compared to the average of only 17% [1]. Some studies have reported a significant association between L1 coverage and inactivation, and others have refuted this result [35]. However, the low multifractality, especially in the third region (AMD ~0.96), may make it prone to XCI. With regard to chromosome Y, the pseudoautosomal region is more stable, while the palindromic (more periodic) region is unstable and more prone to producing genetic disorders such as mixed gonadal dysgenesis and infertility [1, 34]. On the contrary, the 8p region, in which a vast section of ~15 Mb has a strikingly high mutation rate, lies in a medium multifractality region [36]. Similar behavior can be inferred in the C. elegans chromosome arms, which have high mutation rates [24].
A similar approach showed that the CGI and Alus correspond well to multifractality (Figure 7B). This result is significant because of the role that CGI play in heritability of epigenetic states during the active transcription or modifications associated with active chromatin [28].
Finally, we propose a descriptive, non linear model for the function and organization of the human genome (Figure 8)
Firstly, several studies have suggested that multifractal systems might be organized as systems operating far from equilibrium [16, 17, 24]. Thus, the detection of a multifractal scaling in the human genome structure suggests that its molecular structure might be organized as a system operating far from equilibrium, meaning that no variable describing the state of the system shows a regular repetition of values. The high multifractality, which strongly depends on Alu contents (and upon CGI to a lesser degree) and is located mainly in highly aperiodic regions, takes the chromosome away from equilibrium, giving greater genetic stability, protection, and attraction of mutations (Figures 2A-C, 3F, 3, and 8C). Thus, hundreds of regions in the human genome might have a high genetic stability (Figures 1B-E, and 8A, B) and the most important genetic information of the human genome (genes) would be safeguarded from environmental fluctuations. This is because Alu elements (and CGI) are biased toward gene-rich regions [5]. Furthermore, it is well known that the Alu elements are highly polymorphic [29] or highly aperiodic and that a marked reduction of Alus is found within the interrupted genes, especially in exons [6]. Hence, a great number of mutations fall into the flanking regions of the coding sequences [37] and Alu elements become effectors of gene transcription by providing new enhancers, promoters and polyadenylation signals to many genes [38]. Based on these findings and those found in C. elegans [24], it seems that the non-linearity might be located on highly polymorphic genetic units that are distributed in many combinations through the genome. If so, we asked how these sequences have come to exist. Possibly, this might be explained by the fact that the multifractal scaling in the human genome appears to be located on fractal structures, which are mathematically created (in a deterministic way) by superposition of seed sequences [23].
So these seeds may have been the Alu sequences, which might have increased in number by retrotransposition, a process involving the insertion of reverse transcribed DNAs of Alu-derived transcripts back into the genome, apparently by hijacking the LINE-1 retrotransposition machinery [31]. Thus, multifractality may have occurred extensively in the past by the apparent "over-transposition" of different functional units (Alus, CGI) carried by each DNA sequence. Nowadays, it is hypothesized that the majority of transposable elements have been silenced perhaps by some repressive mechanism [39] to protect the genome. However, our results suggest that the Alu elements may themselves be responsible for genetic stability and protection to the genome. Thus, the human multifractal map developed here provides a tool to identify regions that are rich in genetic information and genome stability.
Secondly, there is a strong tendency to increase genetic information content when multifractality increases and to increase gene fragmentation when multifractality increases. These results are consistent with what the multifractal theory predicts (Figures 5A, C, and 8E). Thus, the human genome seems to be made up of many information units (interrupted genes, Alus and CGIs) with different degrees of fragmentation (or size) that account for the aperiodic scaling of short and long range correlations found by other authors [14].
Thirdly, a multifractal genomic context seems to be a significant requirement for the functional and structural organization of thousands of genes and many gene families, i.e., a low multifractal context seems to be necessary for many sequences (generated by gene duplication and periodicity) to interact with environmental signals, while a high multifractal context (aperiodic) seems to be prone (or a "genomic attractor") to many genes; and some (very aperiodic) gene families are involved in deterministic and genetic processes (Figures 5A, B, and 8E, F). Thus, the highly multifractal regions would be a guarantee of maintaining a deterministic regulation control in the genome [24], although most of the human genome sequences can be subjected to a complex epigenetic and genetic control, as observed when the human epigenome, through its CGI contents, is related to multifractality [28].
Fourthly, the human chromosome classification and some chromosomal region assays may have some medical implications. That is, the structure of low non-linearity exhibited by some chromosomes (or chr. regions) might imply an environmental predisposition to be susceptible targets for structural and numerical chromosomal alterations (Figures 6, 7, and 8G). In fact, the loss of non-linearity is associated with failure or alterations of many vital systems close to equilibrium [17, 40, 41]. Additionally, the sex chromosomes must have low multifractality to maintain the sexual dimorphism and likely the XCI.
All these fractal and biological arguments might explain why the Alu elements are shaping the human genome in a nonlinear manner. We believe that applying comparative multifractal genomics among many human genomes and other model organisms can help to answer how the genome came to exist.