Anchoring the physical map
The INRA bovine BAC library was screened by PCR using published microsatellite primer pairs (see the BovMap database), as previously described.
In collaboration with the Genoscope, we produced 53,550 BAC end sequences (BES) from 26,935 INRA BAC clones, and we collected another 28,468 BES from the CHORI-240 library via GenBank. Similarities with the human sequence (NCBI build 35) were searched for by BLASTN after repeat masking. We used an expectation value E of 10⁻⁵ as the significance threshold for comparisons, since this value was shown to provide 95% accuracy in identifying orthologs. Coordinates in the human sequence were then obtained for each retained hit using GoldenPath. After localizing microsatellites in the bovine scaffolds by BLASTN analysis, the positions of the bovine scaffolds on the human sequence were determined using BLASTN with parameters set to optimize detection of distant homologies (Ensembl BLASTView: W9, M1, N-1, Q2, R1).
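The E-value filtering and coordinate retrieval described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the hits are stored in the 12-column tabular format written by NCBI BLAST+ with `-outfmt 6`, and it keeps only the best-scoring significant hit per BES, which is an assumption of the sketch rather than a stated step. The names `best_significant_hits` and `human_interval` are hypothetical.

```python
# Hypothetical sketch (not the original pipeline): filter BLASTN hits of BAC end
# sequences (BES) against the human genome at E <= 1e-5.
# Assumes NCBI BLAST+ tabular output (-outfmt 6): qseqid sseqid pident length
# mismatch gapopen qstart qend sstart send evalue bitscore.
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

E_VALUE_THRESHOLD = 1e-5  # significance threshold given in the text


class Hit(NamedTuple):
    query: str       # BES identifier
    subject: str     # human chromosome or contig
    s_start: int     # subject start (greater than s_end for minus-strand hits)
    s_end: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per non-empty, non-comment line of -outfmt 6 output."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], int(f[8]), int(f[9]), float(f[10]), float(f[11]))


def best_significant_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep, per BES, the highest-scoring hit passing the E-value threshold.

    Keeping a single best hit is an assumption of this sketch.
    """
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        if hit.evalue > E_VALUE_THRESHOLD:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def human_interval(hit: Hit) -> Tuple[str, int, int]:
    """Return (chromosome, start, end) on the human sequence, strand-independent."""
    return hit.subject, min(hit.s_start, hit.s_end), max(hit.s_start, hit.s_end)
```

Normalizing the interval with `min`/`max` matters because BLAST reports minus-strand hits with the subject start greater than the subject end.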
Searching for particular sequence features in breakpoint regions
Evolutionary breakpoint regions between two species were defined as the interval between two segments of conserved gene order. As illustrated in the left part of Figure 1, breakpoint regions were classified into four categories, based on the human alignment:
- BTA breakpoints, which probably occurred in the bovine lineage: the human and mouse maps are perfectly co-linear, but the bovine map is not (Figure 1a: the breakpoint region is identified only in cattle).
- MMU breakpoints, which probably occurred in the mouse lineage: the human and bovine maps are perfectly co-linear, but the mouse map is not (Figure 1b: the breakpoint region is detected only in mouse).
- HSA breakpoints, which probably occurred in the human lineage: the human map is co-linear with neither the mouse nor the bovine map (Figure 1c: the breakpoint region is identified both in cattle and mouse).
- TeloCentro breakpoints, in which telomeres or centromeres are involved (Figure 1d).
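With three species, these categories amount to a parsimony rule: if only one non-reference map breaks co-linearity with the reference, the rearrangement is assigned to that lineage; if both do, it is assigned to the reference lineage. The hypothetical helper below sketches that rule, assuming co-linearity breaks have already been detected for each region; it is not code from the study.

```python
# Hypothetical helper illustrating the classification rule; not code from the study.
from typing import FrozenSet

SPECIES = frozenset({"HSA", "MMU", "BTA"})


def classify_breakpoint(reference: str, broken_in: FrozenSet[str],
                        telo_centro: bool) -> str:
    """Return "BTA", "MMU", "HSA" or "TeloCentro" for one breakpoint region.

    reference   -- species whose sequence is the alignment reference ("HSA" or "MMU")
    broken_in   -- non-reference species whose map is not co-linear with the
                   reference map in this region
    telo_centro -- True if a telomere or centromere is involved
    """
    others = SPECIES - {reference}
    if reference not in SPECIES or not broken_in or not broken_in <= others:
        raise ValueError("broken_in must be a non-empty subset of the non-reference species")
    if telo_centro:
        return "TeloCentro"
    if len(broken_in) == 1:
        # Only one map differs: the rearrangement is assigned to that lineage.
        (species,) = broken_in
        return species
    # Both non-reference maps differ: assign to the reference lineage.
    return reference
```

Calling it with `reference="MMU"` applies the same rule to the mouse-based classification, where a break seen only in the human map yields "HSA" and a break seen in both the human and bovine maps yields "MMU".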
The same classification was also performed based on the mouse alignment (right part of Figure 1). When the same breakpoint region fell into different categories according to the human and mouse alignments (Figure 1e), it was discarded from the global analysis because it could represent a region of breakpoint reuse. Breakpoints involving centromeres or telomeres were likewise discarded, because these regions may be poorly represented in the genome assemblies and the corresponding rearrangements may have been mediated by centromeres or telomeres independently of the repeats we wanted to analyze.
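The two exclusion rules can be expressed as a simple filter. In this hypothetical sketch, region IDs and category labels are illustrative, and every region is assumed to have been classified under both the human and the mouse alignment.

```python
# Hypothetical filter; region IDs and category labels are illustrative.
from typing import Dict, List, Tuple


def regions_for_analysis(categories: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the sorted IDs of breakpoint regions kept for the global analysis.

    categories maps a region ID to (category from the human alignment,
    category from the mouse alignment). Assumes every region was classified
    under both alignments.
    """
    kept = []
    for region_id, (from_human, from_mouse) in categories.items():
        if from_human != from_mouse:
            continue  # inconsistent classification: possible breakpoint reuse
        if from_human == "TeloCentro":
            continue  # telomere/centromere breakpoints are excluded
        kept.append(region_id)
    return sorted(kept)
```

A region classified as TeloCentro under only one alignment is already dropped by the consistency check.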
Human and mouse chromosome masked sequences (chromFaMasked.zip) and repeat annotations (chromOut.zip) were downloaded from UCSC.
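The repeat annotations in chromOut.zip are RepeatMasker ".out" files, one line per repeat after a few header lines. A minimal parser might look like the sketch below. It is hypothetical, and treating the text before "/" in the class/family column (e.g. "LINE" in "LINE/L1") as the repeat class is an assumption about how classes were defined here.

```python
# Hypothetical parser for RepeatMasker ".out" files such as those in UCSC's
# chromOut.zip; treating the text before "/" in the class/family column as the
# repeat class is an assumption of this sketch.
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Repeat(NamedTuple):
    chrom: str
    start: int          # 1-based, inclusive (RepeatMasker convention)
    end: int
    repeat_class: str   # e.g. "LINE" from "LINE/L1"


def parse_rmsk_out(lines: Iterable[str]) -> Iterator[Repeat]:
    """Yield one Repeat per annotation line, skipping the column headers."""
    for line in lines:
        fields = line.split()
        # Annotation lines start with an integer Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield Repeat(fields[4], int(fields[5]), int(fields[6]),
                     fields[10].split("/")[0])


def count_by_class(repeats: Iterable[Repeat]) -> Counter:
    """Number of annotated repeats in each class."""
    return Counter(r.repeat_class for r in repeats)
```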
The density (number of repeats per Mb of DNA) of each class of repeats was computed for each breakpoint region in each category. Densities were also computed along each chromosome in sliding windows of different sizes (0.5, 1, 2, 3, 4 and 5 Mb). For each class of repeats, we then computed the proportion of sliding windows whose density was greater than the density in each breakpoint region. Each breakpoint region was compared with sliding windows of a size similar to its own.
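The density comparison for one repeat class and one breakpoint region can be sketched as below. Several details are assumptions not stated in the text: a repeat is counted in an interval if its start coordinate falls in it, windows slide by a fixed step, and each region is compared with the window size closest to its own length. All function names are hypothetical.

```python
# Hypothetical sketch of the density comparison. Assumptions not stated in the
# text: a repeat is counted in an interval if its start coordinate falls in it,
# windows slide by a fixed step, and each region is compared with the window
# size closest to its own length.
from bisect import bisect_left
from typing import List, Sequence

WINDOW_SIZES_BP = [500_000, 1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000]


def density_per_mb(starts: Sequence[int], begin: int, end: int) -> float:
    """Repeats per Mb whose (sorted) start lies in the half-open interval [begin, end)."""
    count = bisect_left(starts, end) - bisect_left(starts, begin)
    return count / ((end - begin) / 1e6)


def window_densities(starts: Sequence[int], chrom_length: int,
                     window: int, step: int) -> List[float]:
    """Densities in all complete windows of `window` bp, moved by `step` bp."""
    densities = []
    begin = 0
    while begin + window <= chrom_length:
        densities.append(density_per_mb(starts, begin, begin + window))
        begin += step
    return densities


def closest_window_size(region_length: int) -> int:
    """Window size (bp) nearest to the breakpoint region length."""
    return min(WINDOW_SIZES_BP, key=lambda w: abs(w - region_length))


def proportion_greater(window_dens: Sequence[float], region_density: float) -> float:
    """Fraction of windows whose density exceeds the breakpoint-region density."""
    if not window_dens:
        raise ValueError("no complete window fits on this chromosome")
    return sum(d > region_density for d in window_dens) / len(window_dens)
```

If breakpoint regions are no richer in a repeat class than the rest of the chromosome, these proportions should be roughly uniform on [0,1]; proportions piling up near 0 indicate enrichment.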
Enrichment was tested in R by applying the one-sample Kolmogorov-Smirnov test to these proportions (under the null hypothesis, they are uniformly distributed on [0,1]).
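The test itself was run in R (`ks.test(x, "punif")`); `scipy.stats.kstest(x, "uniform")` is the SciPy equivalent. For illustration only, the standard-library sketch below computes the same D statistic against Uniform(0, 1), with a p-value from the asymptotic Kolmogorov distribution plus a common small-sample correction. For small samples R may report an exact p-value that differs from this approximation. The function names are hypothetical.

```python
# Hypothetical stdlib-only version of the one-sample Kolmogorov-Smirnov test
# against Uniform(0, 1); the study itself used R. The p-value uses the
# asymptotic Kolmogorov distribution with a common small-sample correction,
# so it is approximate.
import math
from typing import Sequence, Tuple


def kolmogorov_sf(lam: float) -> float:
    """P(K > lam) for the Kolmogorov distribution: 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the true value is ~1
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def ks_uniform(sample: Sequence[float]) -> Tuple[float, float]:
    """Return (D, approximate two-sided p-value) for H0: sample ~ Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    sqrt_n = math.sqrt(n)
    return d, kolmogorov_sf((sqrt_n + 0.12 + 0.11 / sqrt_n) * d)
```

The two-sided test flags departures from uniformity in either direction, so the sign of the departure (proportions concentrated near 0 for enrichment) has to be read from the data.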
Classification results, positions on the human and mouse sequences, repeat densities and proportions of windows with higher densities are available as supplementary data (break_supp).