We have previously suggested that alphoid arrays made of highly homogeneous higher-order repeats (HORs) evolve by homogenisation/amplification runs which differentiate them into a series of domains that bear almost identical haplotypes, as defined by diagnostic variant nucleotides (DVNs). Moreover, we showed that exchanges between homologues are essentially absent, with each homologue evolving within its particular lineage through the accumulation of unequal crossovers during germ line mitosis. Conversion was viewed as primarily introducing divergence between the repeats.
In the present paper, we have revisited and extended these observations through the analysis of additional chromosomes (1, 3, 5, 19, and 21). We have also examined how the repeats corresponding to the CENP-A nucleosomes of the centromere behave with respect to these evolutionary mechanisms.
The D1Z5 locus is archetypical of the evolution of highly homogeneous alphoid arrays
We first examined the long stretch of alpha satellite DNA from locus D1Z5 (1q), which was available in databases. Knowing the map position of each repeat allowed us to confirm that the same observations could be made along its 55 homogeneous 1866 bp-long tandem repeats, albeit with greater complexity than was observed and predicted in our previous paper. The array is composed of two superimposed domains with relatively short subdomains, showing that the process of homogenisation/amplification acts at a high frequency and provides a somewhat constant flux through the generations. Exchanges between the two domains were almost absent because of an impassable barrier separating them. This barrier was generated by the duplication of a relatively divergent repeat by unequal crossing over, which increased the distance between the most proximal repeats.
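The idea of reading domains off a position map can be illustrated with a minimal sketch. It assumes that the repeats are already aligned, listed in map order, and that the DVN positions are known; the function names and the toy sequences are hypothetical and are not taken from the analysis reported here. As a simplification, a domain is treated as a run of consecutive repeats with strictly identical haplotypes, whereas the domains described above bear almost identical haplotypes and can be superimposed.

```python
from itertools import groupby


def haplotype(repeat, dvn_positions):
    """Return the bases of one aligned repeat at the given DVN positions."""
    return "".join(repeat[p] for p in dvn_positions)


def domains(mapped_repeats, dvn_positions):
    """Split repeats, given in map order, into runs sharing one haplotype.

    Returns a list of (haplotype, first_index, last_index) tuples.
    Exact haplotype identity is an assumption made for this sketch.
    """
    runs = []
    start = 0
    for hap, group in groupby(
        mapped_repeats, key=lambda r: haplotype(r, dvn_positions)
    ):
        size = len(list(group))
        runs.append((hap, start, start + size - 1))
        start += size
    return runs


# Toy data: six 4-bp "repeats" in map order, with DVNs at positions 1 and 3.
toy_repeats = ["ACGT", "ACGT", "AGGT", "AGGT", "AGGA", "ACGT"]
toy_domains = domains(toy_repeats, [1, 3])
```

In this toy example the first haplotype reappears at the end of the array as a separate run, which is the kind of pattern that only a position map can reveal.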
When six chromosome 21 homologues were examined, we were able to confirm that the number of DVNs was quite small. In contrast to our previous report, however, only a fraction of them was shared between the homologues, indicating that each chromosome 21 lineage "chooses" its DVNs to be homogenised/amplified independently from the others. We cannot, however, conclude that this "choice" is totally random (see below). A large minority of the copies shared the same haplotype, indicating that the D21Z1 alpha satellite repeats have been relatively stable over time, or, alternatively and more likely, that the formation and fixation of this locus occurred relatively recently.
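The comparison of DVNs between homologues can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in this study: a DVN is taken here to be a non-consensus base carried by at least `min_copies` aligned repeats of one homologue (a variant seen in a single repeat is treated as sporadic), the consensus is computed per homologue, and sharing is summarised as the fraction of the combined DVN set found in both homologues. The threshold, the function names, and the toy sequences are all hypothetical.

```python
from collections import Counter


def call_dvns(aligned_repeats, min_copies=2):
    """Return the set of (position, base) variants shared by several repeats.

    The min_copies threshold is an assumption of this sketch; gaps ("-")
    are never reported as variants.
    """
    dvns = set()
    for pos in range(len(aligned_repeats[0])):
        counts = Counter(repeat[pos] for repeat in aligned_repeats)
        consensus_base = counts.most_common(1)[0][0]
        for base, n in counts.items():
            if base != consensus_base and base != "-" and n >= min_copies:
                dvns.add((pos, base))
    return dvns


def shared_fraction(dvns_a, dvns_b):
    """Fraction of the combined DVN set that is present in both homologues."""
    union = dvns_a | dvns_b
    if not union:
        return 0.0
    return len(dvns_a & dvns_b) / len(union)


# Toy data: two "homologues" made of short aligned repeats.
homologue_a = ["ACGT", "ACGT", "ACGT", "AGGT", "AGGT", "ACGA"]
homologue_b = ["ACGT", "ACGT", "ACGT", "AGGT", "AGGT", "ACTT", "ACTT"]

dvns_a = call_dvns(homologue_a)
dvns_b = call_dvns(homologue_b)
sharing = shared_fraction(dvns_a, dvns_b)
```

The same two functions could be applied to any pair of repeat sets, for instance repeats recovered with CENP-A versus bulk repeats, provided that both sets are aligned to the same coordinates.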
Loci D1Z7 and D19Z3 from chromosomes 1 and 19 exhibited ClustalW alignment patterns that were similar to that of BAC BX248407 (D1Z5), with an even larger number of DVNs. They also comprise domains and subdomains superimposed on one another and exhibit obvious conversion events. In the absence of a position map for the analysed repeats, however, it was difficult to determine whether the observed pairs of relatively diverged copies constitute, as with D1Z5, barriers between different domains. When the two chromosome 5 homologues were compared, the number of DVNs was much lower, although it was still larger than that of the D21Z1 locus. Their DVN distributions were largely different, again showing independent homogenisation/amplification runs in the two corresponding lineages. The chromosome 5 homologues thus represent intermediate states of nucleotide variation and exchange between chromosome 21 and chromosomes 1 and 19.
An important property related to the molecular evolution of highly homogeneous alphoid arrays emerges from these analyses: all the chromosomes analysed to date are subjected to a constant flux of exchanges occurring during the series of mitoses in the germ line. This phenomenon probably takes place in each generation and is apparently an intrinsic property of the tandemly arranged highly homogeneous alphoid HORs. Given the differences in the extent of the phenomenon on different chromosomes, it is difficult to say if it depends on the particular chromosome involved or, more likely, on the amount of time that has elapsed since the formation of the homogeneous alphoid array.
The existence of several alphoid arrays coexisting within the centromeric regions of a number of chromosomes might be a consequence of this continuous process: with time, the divergence between the repeats has become so high in certain arrays that they are no longer capable of forming a centromere. Beyond a certain level of divergence, the process of accumulation of unequal crossovers stops and they drift, ultimately becoming monomeric. This model fits well with the observation made by Schueler et al. that the monomeric alphoid arrays present on Xp are ancestral to the highly homogeneous block where the centromere is formed. The same is true of chromosome 17.
What is the status of the repeats associated with CENP-A?
We wanted to investigate the evolutionary behaviour of the minority of repeats that are engaged in the actual centromere. Alphoid homogeneous arrays can be very small, as on chromosome 21 where the array can be less than 100 kb long. It was not surprising, therefore, that on chromosomes 1, 5, and 19, almost no repeats representative of those associated with CENP-A were detected in the bulk set of repeats. This confirmed that the proportion of alphoid repeats from a homogeneous array that is engaged in the real centromere can be very low. The overall features of these repeats were shown, however, to be similar to those exhibited by the bulk repeats. They are therefore evolving in the same way.
The sizes of the domains and subdomains they exhibit cannot be estimated at present, but if they are similar to those that are supposed to exist within the pericentromeric alphoid repeats such as within BX248407, they would be compatible with the interspersed structure of human CENP-A and histone H3 nucleosomes [16, 17]. The most striking feature of this analysis is that the DVNs that the CENP-A associated repeats have "chosen" for homogenisation/amplification are quite distinct from those of the other repeats. These small sequence differences might reflect a certain degree of sequence dependence for the recruitment of the proteins that constitute the CENP-A centromeric nucleosome-associated complex. At the same time, when several homologues were examined (here chromosomes 21 and 5), the DVNs exhibited by these repeats were largely different, consistent with the absence of a strict sequence dependence for CENP-A to bind directly to alpha satellites, as reported by Conde e Silva et al.
A more plausible explanation for this difference in DVNs could be that during the constant process of change that supposedly leads to the loss of the capacity of the alphoid repeats to form an active centromere, certain nucleotide changes do not spread at the same rate within the CENP-A associated repeats. Alternatively, during the proposed centromere meiotic drive [25, 26], some haplotypes could be actively selected against to preserve the centromere integrity of the unique remaining cell that is available for fertilization during female meiosis II.
The comparison carried out between chromosomes 1, 5, and 19, which share almost identical consensus sequences at their respective centromeric loci, supports this hypothesis. The DVNs of repeats originating from the three chromosomes were shared in higher proportions when associated with CENP-A than when recovered from the bulk. This was shown by simultaneous ClustalW alignments of the repeats of the four chromosomes tested (1, 19, and two chromosome 5 homologues). We cannot, however, conclude from this analysis that the DNA sequence of the centromere-associated repeats is an important factor in its formation, even though it is possible to suggest that there are constraints upon the nucleotide variations that occur in this portion of an alphoid array.
The CENP-A associated alphoid repeats may be found in unrelated alphoid arrays of the same chromosome
Another unexpected observation of this study was that repeats associated with CENP-A were detected on both chromosomes 1 and 19 on two unrelated but contiguous homogeneous arrays of alpha satellite DNA. This was not the case, however, for the two chromosome 5 homologues. This observation raises the possibility that centromeres can be formed by repeats originating from different alphoid arrays, provided that they are homogeneous enough. Another possibility is that there is an alternative centromere location on chromosomes 1 and 19, as has been shown in one Robertsonian fusion; most fusions of this kind contain dicentric chromosomes with one of the two centromeres being inactivated. Interestingly, Sullivan and Willard have described stable dicentric human X chromosomes in which the distance between the two functional centromeres is relatively small (as is apparently the case in the two chromosomes described here), thereby preventing anaphase bridge formation, chromosome breakage, and chromosome loss. It is noteworthy that in the case of the D1Z7 locus of chromosome 1, one of the two series of potential CENP-B boxes has been almost totally destroyed by mutation, whilst D1Z5 exhibits intact CENP-B boxes, which could help this locus recruit CENP-A proteins.
A model for the formation and maintenance of active human centromeres
With the above observations in mind, it is possible to make some suggestions and predictions concerning the formation and evolution of human centromeres at alpha satellite loci, where they are mostly found (neocentromeres are estimated to occur in approximately 0.0005%-0.0014% of live births).
It has been previously pointed out that the alphoid repeats that are capable of contributing to an active centromere must be part of an extremely homogeneous higher-order multimeric repeat unit array that is uninterrupted by retrotransposons [31, 6]. They are subjected to continuous nucleotide changes which spread at high rates to adjacent repeats. This constitutes a progressive process that probably depends on the amount of time that has elapsed since the homogeneous array was formed. This fits well with the differences found between chromosome 21 on the one hand and chromosomes 1 and 19 on the other, with chromosome 5 being intermediate between them with respect to both the number of detected DVNs and the proportion of sporadic mutations.
When a highly homogeneous array has been created, a functional centromere can be formed. This is clearly possible with a large variety of alpha satellite DNA sequences, since most chromosomes exhibit largely divergent ones. The intrinsic ability of highly homogeneous multimeric tandem repeats to homogenise/amplify by accumulating unequal crossovers continues to act upon repeats that are almost identical. This identity is slowly undermined by the accumulation of random mutations, but as long as domains compatible with the formation of an active centromere exist, the array continues to play its functional role. In this study, this is the case with D21Z1 and D5Z2, which have not yet accumulated enough divergence to affect this compatibility, in contrast to chromosomes 1 and 19, in which CENP-A associated higher-order alphoid repeat units have been recovered in a second homogeneous alphoid array.
We do not know, however, if these repeats are part of the active centromere or if they are part of a potential alternative centromere that is in the process of being formed. This might represent a general way of ensuring the stability of human chromosomes over time, as an alternative to the exceptional possibility of being rescued through neocentromere formation. Significantly, five chromosomes with neocentromeres have been described in which the alphoid array within which the centromere is normally formed is still present: three on chromosome Y, one on chromosome 3, and one on chromosome 4. It is interesting to note that there is apparently only one alphoid array in each of these three chromosomes, meaning that there is no possibility for a centromere to form within another array if the unique one loses its capacity to bind the CENP-A centromeric nucleosome associated complex. The number of neocentromere-containing chromosomes reported to date could be largely underestimated because they are not associated with clinical defects, in contrast to those in which the alphoid sequences have been lost. The defects of the old inactivated centromeres have not been characterized. It has been suggested that there might have been a partial deletion of the alphoid DNA, but this seems unlikely given the extreme variations of alpha satellite DNA found in normal chromosomes. Rather, we think that the normal destiny of a centromere is to be lost over time and to be replaced by a new one, most often within the same alphoid array or in a second one, with neocentromeres of the above type representing in this case a possible transient way to rescue a chromosome with an impaired centromere.