Correlation of microsynteny conservation and disease gene distribution in mammalian genomes

Background: With the completion of the whole genome sequence for many organisms, investigations into genomic structure have revealed that gene distribution is variable, and that genes with similar function or expression are located within clusters. This clustering suggests that there are evolutionary constraints that determine genome architecture. However, as most of the evidence for constraints on genome evolution comes from studies on yeast, it is unclear how much of this prior work can be extrapolated to mammalian genomes. Therefore, in this work we wished to examine the constraints on regions of the mammalian genome containing conserved gene clusters. Results: We first identified regions of the mouse genome with microsynteny conservation by comparing gene arrangement in the mouse genome to the human, rat, and dog genomes. We then asked if any particular gene types were found preferentially in conserved regions. We found a significant correlation between conserved microsynteny and the density of mouse orthologs of human disease genes, suggesting that disease genes are clustered in genomic regions of increased microsynteny conservation. Conclusion: The correlation between microsynteny conservation and disease gene locations indicates that regions of the mouse genome with microsynteny conservation may contain undiscovered human disease genes. This study not only demonstrates that gene function constrains mammalian genome organization, but also identifies regions of the mouse genome that can be experimentally examined to produce mouse models of human disease.


Background
The availability of several mammalian genome sequences has enabled comparative genomic studies to identify regions of conserved linkage among different organisms (reviewed in [1][2][3][4]). These studies have been used to predict the genome architecture of the common mammalian ancestor [5,6], as well as to assess recombination [7,8] and genome evolution [9]. Additionally, regulatory elements have been identified by the characterization of conserved regions of non-coding DNA [10][11][12]. Non-coding functional RNAs and microRNA targets have also been identified through comparative genomic approaches [3,13,14]. Gene function may be inferred from conserved proteins in other species. Likewise, comparative genomics among mammalian species is useful for predicting the functional consequences of mutations in human disease loci [15][16][17][18]. Additionally, mapping genes responsible for quantitative traits in rodents allows the prediction of locations of human quantitative traits underlying disease, based on conserved genomic structure between rodents and humans [19][20][21][22].
Although it is becoming increasingly apparent that genomes display a large degree of structural plasticity, there are nevertheless significant evolutionary constraints on genome structure. Previous studies have provided evidence for functional constraints on genome organization in prokaryotic and eukaryotic genomes [23]. Studies in yeast demonstrate that essential genes are found in genomic clusters [24]. The clustering of essential genes in yeast is likely driven by selection for reduced noise in gene expression levels, as essential gene clusters are localized in regions of open chromatin [25]. Additionally, in the nematode C. elegans, essential genes are located in clusters in regions with low recombination [26].
Clustering of genes with similar functions has also been observed in mammalian genomes. In the human genome, genes that are in the same pathway are in closer proximity than would be expected by chance [27]. Similarly, in the mouse genome, genes with common GO annotations are found in clusters [28]. This is not due to tandem duplications, as most genes in the same pathway that are adjacent in the genome do not arise from duplication events [27]. It is possible that functionally related genes are located in clusters to facilitate coordinated transcription, as many genes in clusters are co-expressed [27].
Many of the previous studies to detect gene clustering were based on bioinformatic analysis of genome annotation. However, there is also support for functional constraints on mammalian genome organization from experimental data. Analysis of saturation levels of mouse mutagenesis screens for lethal phenotypes directed at specific genomic regions demonstrated that mouse essential genes are disproportionately found in regions of conserved microsynteny [29], at least for the small number of genomic regions evaluated. To build on this prior work, we assessed microsynteny conservation on all mouse autosomes. By examining gene content in conserved and divergent genomic regions we found a significant correlation between microsynteny conservation and the density of mouse genes that are orthologous to human disease genes. As the mouse is widely used to model human disease [1,30], the identification of this correlation will facilitate the creation of new mouse models of human disease by identifying regions of the mouse genome that contain a high density of disease gene orthologs.

Microsynteny conservation of mouse autosomes
We evaluated the level of microsynteny conservation between the mouse genome and those of human, dog and rat. First, we obtained all protein-coding genes and their genomic locations on all mouse autosomes as annotated in the Ensembl mouse genome browser [3,31] (release 50). We also obtained protein-coding genes from the human [32,33], rat [2], and dog [34] genomes. These were chosen because they had a sufficient level of assembly and annotation to allow comparison. The use of the dog genome as an outgroup to human, rat, and mouse improves the stringency of the study [35]. To identify orthologs of mouse genes in the other genomes, along with their genomic locations, Ensembl BioMart homology filters were used to compile a list of orthologous genes.
Although the four mammalian genomes chosen are those with the best available annotation, the extent and quality of annotation may vary somewhat between species. To control for this, we took additional steps to find the human, rat, and dog orthologs of mouse genes. The protein sequences of all mouse genes that did not have an annotated ortholog in another species were searched with BLAT [36] against the other genomes to identify orthologous sequences. To allow a moderately strict search with a limited number of false positives, all hits with E-value < 10^-5 were retrieved. The genomic location of the best BLAT match in the other genome was used to evaluate microsynteny conservation. In total, we searched 6,173 genes against at least one other genome and found an ortholog for 1,210 of them. A sensitivity analysis showed that the number of genes retrieved from the BLAT searches is relatively insensitive to the choice of E-value cut-off: changing the cut-off from 10^-3 to 10^-7 yields 1,266 and 1,169 ortholog annotations, respectively. Therefore, alternate E-value cut-offs in this range would have changed the annotation of only 0.48% of the total mouse genes analyzed in our study.
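The best-hit selection and the E-value sensitivity check described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes BLAT hits have already been parsed into (mouse gene, target locus, E-value, score) tuples, treats the highest-scoring hit below the cut-off as the best match, and uses invented toy hits; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed hit layout: (mouse gene, target locus, E-value, alignment score)
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit], e_cutoff: float) -> Dict[str, str]:
    """Map each gene to the locus of its highest-scoring hit with E-value < e_cutoff."""
    best: Dict[str, Tuple[float, str]] = {}
    for gene, locus, e_value, score in hits:
        if e_value >= e_cutoff:
            continue  # hit is not significant at this cut-off
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, locus)
    return {gene: locus for gene, (_, locus) in best.items()}


def sensitivity(hits: List[Hit], cutoffs: List[float]) -> Dict[float, int]:
    """Number of genes assigned an ortholog location at each E-value cut-off."""
    return {cutoff: len(best_hits(hits, cutoff)) for cutoff in cutoffs}


# Invented toy hits for illustration only.
toy_hits: List[Hit] = [
    ("GeneA", "chr1:1000", 1e-30, 250.0),
    ("GeneA", "chr7:5000", 1e-8, 60.0),
    ("GeneB", "chr2:2000", 1e-6, 45.0),
    ("GeneC", "chr3:3000", 1e-4, 38.0),
]
print(sensitivity(toy_hits, [1e-3, 1e-5, 1e-7]))  # {0.001: 3, 1e-05: 2, 1e-07: 1}
```

A gene whose only hits lie above the cut-off simply drops out, which is why the count of annotated orthologs shrinks as the cut-off becomes stricter.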
Genes were defined as having conserved microsynteny if their orthologs had the same two orthologous neighboring genes in all four species examined. Each mouse gene was queried to determine whether it met these criteria for conserved microsynteny. We then assessed the level of conservation for segments of the mouse genome, determining the percentage of conserved genes in each segment. We examined 20 Mb regions of the mouse genome, as genomic regions of this size have been analyzed experimentally through region-specific mutagenesis [37][38][39][40]. Thus, the identification of additional genomic regions with conserved microsynteny will be useful for further experimental functional genomic annotation.
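The neighbor-based definition of conserved microsynteny can be expressed as a short function. The sketch below is illustrative only and rests on assumptions not stated in the text: each genome is a dict from chromosome name to an ordered list of gene IDs, orthology is a one-to-one dict from mouse gene ID to the other species' gene ID, a gene counts as conserved when its ortholog lies directly between the orthologs of its two mouse neighbors (in either orientation) in every other genome, and genes at chromosome ends are skipped. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Genome = Dict[str, List[str]]  # chromosome -> gene IDs in positional order
Position = Tuple[str, int]     # (chromosome, rank along the chromosome)


def index_positions(genome: Genome) -> Dict[str, Position]:
    """Look-up table from gene ID to (chromosome, rank)."""
    return {g: (chrom, i) for chrom, genes in genome.items() for i, g in enumerate(genes)}


def adjacent_in_species(left: str, gene: str, right: str,
                        orth: Dict[str, str], pos: Dict[str, Position]) -> bool:
    """True if the orthologs of left and right directly flank the ortholog of gene."""
    try:
        chrom_l, rank_l = pos[orth[left]]
        chrom_g, rank_g = pos[orth[gene]]
        chrom_r, rank_r = pos[orth[right]]
    except KeyError:
        return False  # a missing ortholog means conservation cannot be shown
    if not (chrom_l == chrom_g == chrom_r):
        return False
    # Either orientation is accepted, so an inverted block still counts as conserved.
    return {rank_l, rank_r} == {rank_g - 1, rank_g + 1}


def conserved_genes(mouse: Genome,
                    others: List[Tuple[Dict[str, str], Genome]]) -> List[str]:
    """Mouse genes whose two neighbors are conserved in every other genome."""
    indexed = [(orth, index_positions(genome)) for orth, genome in others]
    result = []
    for genes in mouse.values():
        for i in range(1, len(genes) - 1):  # end genes lack two neighbors
            left, gene, right = genes[i - 1], genes[i], genes[i + 1]
            if all(adjacent_in_species(left, gene, right, orth, pos)
                   for orth, pos in indexed):
                result.append(gene)
    return result


# Toy example: the human block is inverted (still conserved); the rat block is rearranged.
mouse = {"1": ["a", "b", "c", "d"]}
human = ({g: "h" + g for g in "abcd"}, {"X": ["hd", "hc", "hb", "ha"]})
rat = ({g: "r" + g for g in "abcd"}, {"2": ["ra", "rb", "rd", "rc"]})
print(conserved_genes(mouse, [human]))       # ['b', 'c']
print(conserved_genes(mouse, [human, rat]))  # []
```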
We found that the conservation of microsynteny varied throughout the mouse genome. The average percentage of genes with conserved microsynteny in a 20 Mb interval was 38.29%, with a standard deviation of 11.81%. The Shapiro-Wilk test was applied to non-overlapping 20 Mb windows to determine whether the percentages of genes with conserved microsynteny were normally distributed. The resulting P-value of 0.16 indicated that the null hypothesis of a normal distribution should not be rejected. However, the use of non-overlapping 20 Mb windows limited the resolution of the study; for example, chromosome 19 contributes only two observations from non-overlapping windows. To improve resolution, we next examined 20 Mb intervals staggered by 5 Mb. This sliding window analysis provided more observations on each chromosome.
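The staggered window calculation can be sketched as below. This is a minimal illustration under assumptions that the text does not specify: genes on one chromosome are reduced to (start coordinate in bp, conserved flag) pairs, windows start at position 0, windows that would run past the chromosome end are dropped, and windows without genes are reported as None. With the step set equal to the window size, the same function yields non-overlapping windows, whose values could then be passed to a normality test such as scipy.stats.shapiro.

```python
from typing import List, Optional, Tuple

MB = 1_000_000


def window_percentages(genes: List[Tuple[int, bool]], chrom_length: int,
                       size: int = 20 * MB, step: int = 5 * MB
                       ) -> List[Tuple[int, Optional[float]]]:
    """(window start, % of genes flagged conserved) for each full window on one chromosome."""
    results = []
    start = 0
    while start + size <= chrom_length:
        flags = [flag for pos, flag in genes if start <= pos < start + size]
        pct = 100.0 * sum(flags) / len(flags) if flags else None  # None: empty window
        results.append((start, pct))
        start += step
    return results


# Invented 30 Mb chromosome with five genes: (start position, conserved?).
toy = [(1 * MB, True), (6 * MB, False), (12 * MB, False), (22 * MB, True), (26 * MB, True)]
for start, pct in window_percentages(toy, 30 * MB):
    print(f"{start // MB}-{(start + 20 * MB) // MB} Mb: {pct:.1f}% conserved")
```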
As the data conformed to a normal distribution, we calculated Z-scores (the number of standard deviations above or below the mean) for each 20 Mb sliding window. Windows with Z > 1 were considered to have increased conservation, those with Z < -1 to have decreased conservation, and those with -1 < Z < 1 to have intermediate conservation. We found 51 sliding windows with Z > 1, indicative of higher microsynteny conservation, and 91 with Z < -1, indicative of lower microsynteny conservation. Three hundred twenty-two windows showed intermediate microsynteny conservation (-1 < Z < 1). Conservation of microsynteny also varies along individual chromosomes, with most chromosomes containing both windows of increased and windows of decreased microsynteny conservation (Figure 1, black lines). However, mouse chromosomes 5, 7, 8, 10, 13, 14, 16 and 18 do not contain any regions of increased (Z > 1) microsynteny conservation. The percentage of genes with conserved microsynteny per chromosome also varied, from 25% on chromosome 13 (the lowest) to 48% on chromosome 15 (the highest) (Table 1). Previous work has shown that syntenic conservation is not simply related to gene density [29].
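The Z-score thresholds above can be illustrated with a small helper. It assumes the sample standard deviation (Python's statistics.stdev) over all windows, since the text does not state whether a sample or population estimate was used, and it places windows with Z exactly equal to 1 or -1 in the intermediate class; the function names and input values are hypothetical.

```python
import statistics
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Z = (x - mean) / s over all windows, using the sample standard deviation."""
    mu = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mu) / s for x in values]


def conservation_class(z: float) -> str:
    """Increased if Z > 1, decreased if Z < -1, otherwise intermediate."""
    if z > 1:
        return "increased"
    if z < -1:
        return "decreased"
    return "intermediate"


# Invented window percentages for illustration.
pcts = [20.0, 40.0, 40.0, 60.0]
print([conservation_class(z) for z in z_scores(pcts)])
# ['decreased', 'intermediate', 'intermediate', 'increased']
```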

Comparison with sequence-based synteny blocks
We compared our results to the sequence-based synteny blocks from pairwise genome alignments presented on the Ensembl genome browser. For each region of the mouse genome with increased microsynteny conservation, we identified the syntenic regions of the dog, rat, and human genomes (Table 2, see Additional file 1). Most of the conserved mouse regions identified on the basis of microsynteny also show sequence-based conservation with a single region of the rat genome. For the intervals on mouse chromosome 2 from 115 to 140 Mb, mouse chromosome 3 from 45 to 65 Mb, mouse chromosome 4 from 30 to 65 Mb, mouse chromosome 9 from 40 to 70 Mb, and mouse chromosome 15 from 65 to 103 Mb, the breakpoints of synteny in the mouse genome relative to the dog and human genomes are the same, showing evolutionary conservation of genome rearrangements. In a separate study directly comparing dog and human synteny blocks, all of these regions were found to be syntenic between dog and human [41]. Although the region from 0 to 20 Mb on mouse chromosome 6 is the only region entirely conserved as a sequence-based synteny block in all three other genomes, it is not the most highly conserved region of the mouse genome based on microsynteny.

Conserved genes are located next to other conserved genes
Although there is variation in the density of genes with conserved microsynteny across the genome, it is possible that this variation merely represents random variation within a normal distribution. To determine whether the genomic arrangement of genes with conserved microsynteny is random, we calculated the likelihood that a gene with syntenic conservation is found next to another gene with syntenic conservation. We then compared this to the frequency of conserved-synteny neighbors in a set of 10,000 randomized genomes. In each of the randomized genomes the number and position of genes is maintained, but the annotation of conservation is randomly shuffled.
We find that the frequency of co-occurrence of genes with conserved microsynteny is significantly non-random (Figure 2A, P < 0.003). By our definition, for a gene to have conserved microsynteny it must have two orthologous neighbors in all genomes examined. Thus, a pair of genes with conserved microsynteny represents a larger block of conserved synteny. The finding that there are conserved microsynteny blocks in the genome that extend beyond groups of several genes suggests that there are constraints on genome evolution that influence gene arrangement, as the placement of conserved genes significantly differs from a random distribution.

Distribution of mouse orthologs of human disease-related genes
As we found that conserved genes were more likely to have conserved genes as neighbors, we investigated whether any other groups of genes were found preferentially in regions of the genome with conserved microsynteny. Disease genes are of particular interest because of their relevance to human health. We therefore performed a genome-wide analysis of the mouse orthologs of human disease-related genes to assess whether they were found at a greater density in conserved regions of the mouse genome. We identified human genes with a disease-associated mutant allele from the OMIM Morbid Map database, and cross-referenced them to the mouse genome using Ensembl BioMart to identify mouse orthologs. Using the genomic locations of these mouse orthologs, together with the gene annotation from our microsynteny analysis, we determined the proportion of human disease gene orthologs in each 20 Mb sliding window of the mouse genome.
Figure 1. The correlation between microsynteny and density of orthologs of disease-related genes on mouse autosomes. The relationship between conserved microsynteny (black) and density of mouse genes orthologous to human disease-related genes (blue) is shown for all mouse autosomes. The percentages of genes with conserved microsynteny and of genes with disease-related orthologs are calculated for a 20 Mb sliding window, offset by steps of 5 Mb. At each position the Z-score is plotted at the center of the sliding window. Pearson's correlation coefficient and P-values for the analysis of co-localization of conserved microsynteny and disease gene orthologs are given for each chromosome. The results of a randomization of the disease genes (red) demonstrate that there is no correlation between microsynteny (black) and random assignment of gene status.
The mean percentage of disease-related gene orthologs among all genes in a 20 Mb interval is 7.68%, with a standard deviation of 2.72%. The distribution of disease gene orthologs varied across the mouse genome, with the highest percentage on chromosome 18 and the lowest on chromosome 7 (Table 1). On all other autosomes, 7-8% of the genes are orthologs of human disease genes.

Disease gene orthologs are located next to other disease gene orthologs
As we had found that the distribution of genes with conserved microsynteny is non-random, we examined whether the same was true for genes with disease-related orthologs. Using a similar approach, we calculated the number of mouse orthologs of human disease genes with at least one disease gene ortholog as a neighbor. As a control, we randomized which genes were annotated as disease orthologs, keeping the total number of disease gene orthologs the same. From 10,000 random trials, we found that the mouse orthologs of human disease genes were significantly more likely to have other disease gene orthologs as neighbors (Figure 2B, P < 0.07). This finding demonstrates that the distribution of orthologs of human disease genes in the mouse genome is not random.
Table 2. Sequence-based synteny blocks were identified from the Ensembl genome browser. The intervals are listed according to positions in the mouse genome, with the positions of synteny in the other genomes listed following the "=" sign. The specific region of synteny in the other species is listed following the chromosome number and a ":". Forward-to-forward alignments are shown in normal text; forward-to-reverse alignments are shown in italics.

Correlation between microsynteny conservation and disease gene distribution
We next assessed whether the orthologs of disease genes were located in regions of the mouse genome with increased microsynteny conservation. Over the whole genome, we detect a correlation between the number of genes with conserved microsynteny and the number of disease gene orthologs per window (Figure 3A, Pearson's R = 0.90, P < 1 × 10^-6). This representation overestimates the true correlation between the two sets, because gene density varies considerably among windows. Regions with high gene density would be expected to contain both many disease orthologs and many genes with conserved microsynteny, regardless of whether there is an additional association between microsynteny conservation and the presence of disease gene orthologs. When corrected for gene density by using the proportion of genes in each class per window, a significant correlation between microsynteny conservation and disease gene ortholog density is still observed (Figure 3B, Pearson's R = 0.40, P < 4.0 × 10^-4).
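The gene-density confound can be seen in a toy example. The window values below are invented: the proportions of conserved genes and of disease orthologs are constructed to be perfectly anti-correlated, yet the raw counts correlate positively because gene-rich windows contain more of both. The pearson helper is a plain implementation of Pearson's correlation coefficient written for this sketch.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented windows with rising gene density.
totals = [20, 40, 80, 160]     # genes per window
conserved = [10, 12, 40, 48]   # 50%, 30%, 50%, 30% of genes conserved
disease = [1, 4, 4, 16]        # 5%, 10%, 5%, 10% of genes are disease orthologs

r_counts = pearson(conserved, disease)
r_props = pearson([c / t for c, t in zip(conserved, totals)],
                  [d / t for d, t in zip(disease, totals)])
print(f"counts: r = {r_counts:.2f}; proportions: r = {r_props:.2f}")
# counts: r = 0.77; proportions: r = -1.00
```

Because a Z-score computed with a single genome-wide mean and standard deviation is a linear rescaling of the proportion, applying the same coefficient to Z-scores instead of proportions gives the same value.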
The density of disease gene orthologs in each genomic region is shown in Figure 1 (blue lines). Z-scores are displayed to allow direct comparison between microsynteny conservation and disease ortholog density. Converting the proportions to Z-scores does not change the overall correlation, because each Z-score is a linear rescaling of the corresponding proportion and Pearson's correlation is unaffected by such rescaling. For 12 of the 19 mouse autosomes, the correlation between microsynteny conservation and the density of disease gene orthologs is significant at the P = 0.05 level. Thus, genomic regions with a high percentage of genes with conserved microsynteny also tend to have a high percentage of disease gene orthologs. Mouse chromosome 13 shows the strongest correlation between conserved microsynteny and density of disease gene orthologs, and mouse chromosome 10 the weakest.
To test whether this correlation was an artifact of our analysis, we randomized the annotation of disease genes. We assigned alternate genes as orthologs of human disease genes, keeping the total number of disease genes per chromosome the same as in the first analysis. We then recalculated the percentage of alternate disease genes among the total genes in each sliding window throughout the genome, along with the mean and standard deviation across windows. We plotted the Z-scores for each window containing these alternate disease orthologs and compared them to the Z-scores for microsynteny conservation (Figure 1, red lines). When the orthologs of disease genes are moved to random chromosomal positions, the correlation with microsynteny disappears (Pearson's R = 0.02, P < 0.58). As an additional control, we also randomized disease genes while retaining the observed number of disease gene pairs on each chromosome. Again, we found no correlation (Pearson's R = 0.004, P < 0.93). If the correlation between the observed distribution of disease gene orthologs and microsynteny conservation were an artifact of our methodology, we would expect the randomized annotations to be correlated as well. This is not the case, demonstrating that the link between microsynteny conservation and density of disease gene orthologs does not arise from an artifact of the methodology.

Robustness to changes in window size
To determine whether the correlation we observed between microsynteny conservation and disease gene ortholog density was affected by the window size used in our analysis, we repeated our assays using additional window sizes. We analyzed window sizes of 10 Mb, 5 Mb, 2 Mb, and 1 Mb, each staggered by one-quarter of the window size. We found a significant correlation between regions with conserved microsynteny and a high density of disease gene orthologs for window sizes of 10 Mb, 5 Mb, 2 Mb, and 1 Mb (all P < 1 × 10^-10; Table 3, see Additional file 2). A repeat of our randomization test shows that this correlation is not significant when genes are randomly annotated for window sizes of 10 Mb, 5 Mb, and 2 Mb. However, at the small window sizes of 2 Mb and 1 Mb, many genomic windows do not contain any annotated genes (Table 4). These windows artificially show a correlation between microsynteny conservation and disease gene density, because they contain zero genes of either class.
Figure 2. Conserved genes and disease genes are not randomly distributed throughout the mouse genome.
Figure 3. There is a significant correlation between microsynteny and density of disease-gene orthologs over the mouse genome as a whole. Panel A: The number of conserved genes plotted against the number of disease genes for 20 Mb sliding windows of the mouse genome. Note the fit with the regression line (Pearson's R = 0.90, P < 1 × 10^-6). Panel B: The relationship between the proportion of genes with conserved microsynteny (number of conserved genes per window/total genes per window) and the proportion of genes with disease orthologs (number of disease-related genes/total genes per window). The correlation for the whole mouse genome is significant (Pearson's R = 0.40, P < 4.0 × 10^-4).
When we remove all windows with no genes from our analysis, the correlation between microsynteny conservation and disease gene density at 2 Mb and 1 Mb improves, while the randomization trial correlation loses significance (Table 3).
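Why empty windows inflate the correlation at small window sizes can be shown with another invented example. It assumes that a window with no annotated genes is scored as 0 for both proportions, which is one plausible way the artifact described above arises; the helper names and values are hypothetical.

```python
import math
from typing import List, Tuple

Window = Tuple[int, float, float]  # (total genes, proportion conserved, proportion disease)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def window_r(windows: List[Window]) -> float:
    """Correlation between the two per-window proportions."""
    return pearson([w[1] for w in windows], [w[2] for w in windows])


def drop_empty(windows: List[Window]) -> List[Window]:
    """Keep only windows that contain at least one annotated gene."""
    return [w for w in windows if w[0] > 0]


# Three gene-containing windows with no association, plus five empty windows scored (0, 0).
windows = [(5, 0.2, 0.1), (5, 0.4, 0.0), (5, 0.6, 0.1)] + [(0, 0.0, 0.0)] * 5
print(f"with empty windows: r = {window_r(windows):.2f}")                      # r = 0.66
print(f"empty windows removed: r = {abs(window_r(drop_empty(windows))):.2f}")  # r = 0.00
```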

Discussion
We have examined the relationship between microsynteny conservation and the density of orthologs of human disease genes in the mouse genome. We found a correlation between regions of conserved microsynteny and the location of mouse orthologs of human disease genes, which is consistent across the window sizes used in our analyses. This correlation suggests that regions of the mouse genome with a high density of disease gene orthologs undergo less rearrangement than regions of the genome with fewer disease gene orthologs. Genes associated with human disease are often orthologous to essential genes in other organisms [42]. Previous studies in both mammals [29] and other eukaryotes [24,26] have shown that essential genes are located in highly conserved genomic regions. Thus, disease-related genes, many of which perform essential functions, are more likely to be found in conserved regions of the genome.
Several studies have found that at the sequence level, human disease genes are more conserved than non-disease genes [26,43,44]. The sequence conservation of human disease genes with essential C. elegans orthologs is higher than those disease genes whose orthologs are not lethal when mutated [44]. Interestingly, genes with high polymorphism among humans, but no divergence between humans and chimpanzees, are highly associated with Mendelian disease [45]. Similarly, human disease genes with weak negative selection, where mutant alleles persist in the population, are more likely to cause diseases with Mendelian inheritance [46]. Mendelian disease genes are more constrained evolutionarily than disease genes with non-Mendelian inheritance patterns [45]. Together, these observations support our finding that the mouse orthologs of human disease genes are preferentially found in genomic regions with high microsynteny conservation.
Recombination may be mutagenic due to the possibility of unequal crossing-over. Thus faulty recombination events in regions with essential genes are likely to be deleterious to the survival of the organism and may thus be selected against during mammalian evolution. Studies of the human genome support this link between low recombination and essential genes. Regions of the human genome with high linkage disequilibrium, and thus low recombination, are enriched for genes associated with essential cellular functions such as response to DNA damage, cell cycle progression, or DNA and RNA metabolism [47]. Genes that show variation in populations, such as immune response genes, are often found in regions with low linkage disequilibrium, suggesting that recombination in these regions is not deleterious to the organism [47]. Likewise, human genes found in mutation cold spots tend to be genes involved in essential cellular processes, while those in mutation hot spots include immune response genes [48]. These findings extend to non-coding sequences as well, as human genomic regions that are highly conserved with the pufferfish have been found to contain enhancers for developmental genes [49].
The correlation between disease gene density and microsynteny conservation, although significant, is not perfect. Discrepancies may come from several sources. For example, the annotation of human disease genes is incomplete. Many housekeeping genes, which are likely to be essential for mammalian development, are not annotated as human disease genes, probably because mutations in these genes are lethal early in development, so that humans carrying such mutations are not viable [50]. The genomic region between 55 and 75 Mb on mouse chromosome 3 shows high conservation but a low density of disease gene orthologs. However, the genes Wwtr1 and Shox2 are located in this genomic region. A mouse knock-out of Wwtr1 displays a phenotype resembling human polycystic kidney disease [51], and the mouse knock-out of Shox2 is lethal with cleft palate [52], strongly suggesting that these genes are linked to human disease, although neither is annotated as a disease gene in OMIM.
Table 4. The number of windows with no annotations of genes, conserved genes, or disease orthologs is shown. Note that the bottom row shows the number of windows lacking both conserved and disease genes, which are a subset of the windows with either no disease genes or no conserved genes.
Likewise, many genes that are annotated as human disease genes may not be strictly essential for survival, and these genes are therefore not expected to have conserved microsynteny. The genomic region between 85 and 105 Mb on mouse chromosome 12 has a high density of disease gene orthologs but low conservation. Mutations in the human gene SERPINA10, whose ortholog is located in this region, are associated with susceptibility to deep vein thrombosis [53]. Although SERPINA10 is annotated in OMIM as a disease gene, inherited mutations in SERPINA10 are unlikely to threaten the survival of the individual, suggesting that SERPINA10 is not an essential gene. Finally, many diseases, especially cancers, are caused by translocation events that produce chimeric proteins. Although a genomic region may have a high density of disease loci because of such translocations, it would not be expected to show microsynteny conservation, as it is prone to rearrangement.
Discrepancies between microsynteny conservation and the density of disease-related gene orthologs may also arise because other factors contribute to selective pressure on genome evolution. For example, previous studies have suggested that mammalian genes are clustered into groups based on co-expression [54,55]. Gene expression has therefore been proposed as an evolutionary constraint on genome organization, although the effect is weak, as such gene clusters are found only slightly more often than expected by chance [55]. There is also evidence that many overlapping gene pairs exist in mammalian genomes and that these pairs are conserved in multiple species, probably because recombination or mutation in these regions of the genome would disrupt both genes [56]. Alternative mechanisms by which essential genes could constrain genome structure have also been proposed [24].

Conclusion
We have demonstrated the non-random distribution of both genes with conserved microsynteny and genes with disease orthologs. This observation suggests that there are constraints on genome organization in the mouse. Moreover, we have demonstrated that there is a correlation between mammalian genome architecture and gene function. This correlation likely arises from gene function constraining genome organization, resulting in essential disease genes being located in highly conserved regions of the mammalian genome. The correlation between microsynteny conservation and density of disease gene orthologs suggests that further experimental analysis of mouse genes in highly conserved genomic regions, by creating mutations in the orthologs of human disease genes, will produce new mammalian disease models.

Methods

Microsynteny conservation
Once conserved microsynteny had been defined for each gene, the percentage of genes with conserved microsynteny was determined for 20 Mb intervals of the mouse genome in a sliding window staggered by 5 Mb. To define the percentage of microsynteny for a given window, the number of genes with conserved synteny was divided by the total number of genes in that window. The position of each gene was taken as the start site listed in Ensembl; for genes with multiple transcripts, the start site of the longest transcript was used. The Shapiro-Wilk test was applied to non-overlapping 20 Mb regions of the mouse genome to determine whether the per-window percentages of genes with conserved microsynteny were normally distributed. Z-scores were calculated for each window of the mouse genome using the equation:

$$Z = \frac{x - \mu}{s}$$

where x is the proportion of genes with conserved synteny within each window, μ is the mean proportion of genes with conserved synteny across all windows, and s is the standard deviation of that proportion across all windows. Thus, the Z-score represents the number of standard deviations above or below the mean.

Comparison with sequence-based synteny blocks
Each highly conserved mouse genomic region was compared to synteny maps based on sequence alignment. Maps of the mouse genome compared to the dog, rat, and human genomes were retrieved from the Ensembl genome browser (Comparative Genomics - Synteny, http://www.ensembl.org/Mus_musculus/Location/Synteny). Genomic positions of synteny blocks were retrieved from the database and tabulated.

Statistical analysis of conserved and disease gene pairs
Genes annotated as having conserved synteny or as disease orthologs were examined to determine whether the following gene on the chromosome was similarly annotated. The number of such gene pairs was determined per chromosome. By this definition a run of three genes with similar annotation would be counted as two pairs.
To determine the significance of the frequency of such gene pairs, we compared the observed value to a random distribution. The random distribution was calculated by keeping both the total number of genes and the number of genes with a given annotation constant, but randomizing the assignment of those annotations over the total set of genes. The annotations were randomized 10,000 times, and the number of neighbors with similar annotations was calculated for each randomization. The significance of the observed value was determined from this simulation; i.e., the P-value is the proportion of random values that are greater than or equal to the observed value.
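The pair-counting permutation test can be sketched as follows for a single chromosome. This is an illustration under assumptions: annotations are an ordered list of booleans (one per gene), shuffling uses Python's random module, and the trial count and seed are arbitrary; how per-chromosome counts were combined genome-wide is not reproduced here. The function names are hypothetical.

```python
import random
from typing import List, Optional


def count_pairs(annotated: List[bool]) -> int:
    """Adjacent pairs in which both genes are annotated; a run of three counts as two."""
    return sum(1 for a, b in zip(annotated, annotated[1:]) if a and b)


def permutation_p(annotated: List[bool], trials: int = 10_000,
                  seed: Optional[int] = None) -> float:
    """Proportion of shuffled layouts with at least as many annotated pairs as observed."""
    rng = random.Random(seed)
    observed = count_pairs(annotated)
    shuffled = list(annotated)
    at_least_observed = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # keeps the number of genes and of annotated genes fixed
        if count_pairs(shuffled) >= observed:
            at_least_observed += 1
    return at_least_observed / trials


# Invented example: ten annotated genes clustered at one end of a 100-gene chromosome.
clustered = [True] * 10 + [False] * 90
print(count_pairs(clustered), permutation_p(clustered, trials=2_000, seed=1))
```

A simulated P-value of 0 only means that P is below 1/trials, so the number of randomizations sets the smallest P-value that can be reported.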

Disease-related ortholog distribution
Human genes with a disease-associated mutant allele were identified from the OMIM Morbid Map database (http://www.ncbi.nlm.nih.gov/Omim/getmorbid.cgi). These were cross-referenced to the mouse genome using Ensembl BioMart homology filters to identify mouse orthologs. The distribution of disease-related gene orthologs was analyzed by sliding window analysis, in the same manner as for conserved synteny. For each 20 Mb window, the number of disease gene orthologs was divided by the total number of genes in the window to determine the proportion of disease gene orthologs in the window. Z-scores were then calculated for each window using the equation above.

Correlation analysis
The Pearson's correlation between the microsynteny conservation Z-scores and the disease-related ortholog Z-scores was calculated, and its significance was assessed using the chi-squared test. As a control, we assigned new disease genes at random on each chromosome, keeping the total number of disease genes per chromosome the same. We then recalculated the density of disease genes in each sliding window throughout the genome, plotted the Z-scores for each window containing these alternate disease orthologs, and compared them to the Z-scores for microsynteny conservation. The recalculated Pearson's correlation showed no significance for the alternate set of disease genes. Correlation analysis was also performed on window sizes of 10 Mb (staggered by 2.5 Mb), 5 Mb (staggered by 1.25 Mb), 2 Mb (staggered by 0.5 Mb), and 1 Mb (staggered by 0.25 Mb). As a control, randomization trials were performed on 10,000 random annotations for each alternate window size. To account for windows with no genes at window sizes of 2 Mb and 1 Mb, the correlation analysis was repeated with all windows containing 0 genes removed from the actual and randomized datasets.
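The per-chromosome randomization control can be sketched as below. It is a minimal illustration, not the original code: genes are assumed to be (start coordinate, disease-ortholog flag) pairs grouped by chromosome, windows default to 20 Mb staggered by 5 Mb, windows without genes are returned as None and skipped when correlating, and only Pearson's R is computed (the chi-squared significance test mentioned above is not reproduced). The conservation values passed in must come from the same windows in the same order, for example by calling window_proportions with conserved-synteny flags. Names and data are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

MB = 1_000_000
Genes = Dict[str, List[Tuple[int, bool]]]  # chromosome -> [(start bp, is disease ortholog)]


def shuffle_within_chromosomes(genes: Genes, rng: random.Random) -> Genes:
    """Reassign disease flags among genes of the same chromosome; counts stay fixed."""
    shuffled: Genes = {}
    for chrom, items in genes.items():
        flags = [flag for _, flag in items]
        rng.shuffle(flags)
        shuffled[chrom] = [(pos, flag) for (pos, _), flag in zip(items, flags)]
    return shuffled


def window_proportions(genes: Genes, lengths: Dict[str, int],
                       size: int = 20 * MB, step: int = 5 * MB) -> List[Optional[float]]:
    """Proportion of flagged genes per sliding window, chromosome by chromosome."""
    props: List[Optional[float]] = []
    for chrom in sorted(genes):
        start = 0
        while start + size <= lengths[chrom]:
            flags = [f for pos, f in genes[chrom] if start <= pos < start + size]
            props.append(sum(flags) / len(flags) if flags else None)
            start += step
    return props


def pearson(x: List[Optional[float]], y: List[Optional[float]]) -> float:
    """Pearson's R over windows where both values are defined (NaN if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else math.nan


def randomized_rs(genes: Genes, lengths: Dict[str, int],
                  conserved_props: List[Optional[float]],
                  trials: int = 10_000, seed: int = 0) -> List[float]:
    """Pearson's R between conservation and disease density for each randomization."""
    rng = random.Random(seed)
    return [pearson(conserved_props,
                    window_proportions(shuffle_within_chromosomes(genes, rng), lengths))
            for _ in range(trials)]
```

Correlating proportions rather than Z-scores gives the same R here, since the Z-score transformation is linear.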