DNA damage repair system in C57BL/6 J mice is evolutionarily stable

Background The DNA damage repair (DDR) system is vital for maintaining genome stability and survival. It comprises over 160 genes in 7 different pathways, each repairing specific types of DNA damage caused by external and internal damaging factors. The functional importance of the DDR system implies that evolution could play an important role in keeping it functionally intact. Indeed, positive selection has been observed in BRCA1 and BRCA2 (BRCA), key genes in the homologous recombination pathway of the DDR system, in humans and their close relatives, chimpanzees and bonobos. Efforts to find the same selection on BRCA in other mammals have so far found no evidence. However, as most studies in non-human mammals analyzed only one or a few individuals per species, their observations may not reflect the true status in the given species. Furthermore, few studies have examined evolutionary selection in DDR genes other than BRCA. In the current study, we used the laboratory mouse C57BL/6 J as a model to address evolutionary selection on DDR genes in non-primate mammals by dynamically monitoring genetic variation across 30 generations of C57BL/6 J.
Results Using exome sequencing, we collected the coding sequences of 169 DDR genes from 44 individual C57BL/6 J genomes in 2018. We compared these coding sequences with the mouse reference genome sequences derived from 1998 C57BL/6 J DNA and with the mouse B6Eve reference genome sequences derived from 2003 C57BL/6 J DNA, covering 30 generations of C57BL/6 J from 1998 to 2018. We did not identify meaningful coding variation in Brca1 or Brca2, or in the 167 other DDR genes, across the 30 generations. During the same period, however, we did identify 812 coding variants in 116 non-DDR genes, which served as a quality control validating the reliability of our analytic pipeline and of the negative results in DDR genes.
Conclusions DDR genes in the laboratory mouse strain C57BL/6 J were not under positive selection across the 30-generation period, highlighting the possibility that the DDR system in rodents could be evolutionarily stable.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07983-7.


Background
A genome is constantly damaged by internal metabolic factors and external environmental factors. To maintain genome stability, living organisms are equipped with a highly sophisticated DNA damage repair (DDR) system that effectively repairs this damage. The DDR system is composed of multiple pathways, including homologous recombination (HR), non-homologous end joining (NHEJ), the Fanconi anemia pathway (FA), base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), and single-strand annealing (SSA). Each pathway consists of a group of genes that act together to repair a specific type of DNA damage.
As DNA damage repair is vital for survival, evolutionary selection would be expected to help maintain a highly functional DNA damage repair machinery for survival and better fitness. BRCA1 and BRCA2 (BRCA) are two important DDR genes that repair DNA double-strand breaks through the homologous recombination (HR) pathway, and mutations in BRCA substantially increase cancer risk [1,2]. Studies have indeed revealed that BRCA in humans and their close relatives, chimpanzees and bonobos, is under positive selection [3]. However, the same type of selection was not observed in other mammals [4][5][6][7][8][9]. This raises the possibility that the same DNA damage repair genes could be under different evolutionary selection in different species [10]. With few exceptions, however, nearly all BRCA variation data reported from non-human mammals were derived from a single individual of the tested species. From a population genetics point of view, it is questionable whether observations made in a single individual represent the situation in the species as a whole. Further, few DDR genes other than BRCA have ever been analyzed for evolutionary selection (Table 1). Therefore, the relationship between the DDR system and evolutionary selection, a fundamental question about the mechanisms that maintain genome stability, remains unclear.
Dynamic monitoring of genetic variation is a powerful approach to studying evolutionary selection. This is best exemplified by variation studies in E. coli, which followed its continuous growth under laboratory culture conditions for over three decades and more than 60,000 generations [15], and in the laboratory rat, which followed genetic variation in genes involved in learning, circadian rhythm, and metabolism [16]. C57BL/6 J is one of the most widely used laboratory mouse models in biological and cancer research. C57BL/6 J descends from a cryopreserved embryo stock with a well-defined genetic background (Fig. 1). Its DNA extracted in 1998 was used by the Mouse Genome Project to generate the mouse genome reference sequences [18], and its DNA extracted in 2003 was sequenced again to generate the B6Eve mouse genome reference sequences [17]. Between 1998 and 2018, C57BL/6 J passed through 30 generations. We hypothesized that this period is long enough to make C57BL/6 J an excellent model for testing evolutionary selection in the DDR system, and that the information could help in understanding evolutionary selection on the DDR system in rodents, as represented by C57BL/6 J.
In the present study, we sequenced the coding regions of the C57BL/6 J genome using DNA collected from 44 C57BL/6 J individuals in 2018. We searched for variants that arose after 1998 by comparison with the mouse genome reference sequences derived from 1998 C57BL/6 J DNA and with the B6Eve reference sequences derived from 2003 C57BL/6 J DNA. We found no evidence of genetic variation arising in the 169 DDR genes, including Brca1 and Brca2, during this period, whereas we did identify genetic variation in 116 non-DDR genes belonging to other functional categories. From these data, we conclude that the DDR system in C57BL/6 J was evolutionarily stable during this 30-generation period.

Results
Identifying genetic variants
The C57BL/6 J genome was sequenced in 1998 by the Mouse Genome Project to generate the mouse genome reference sequences. By 2018, C57BL/6 J mice had been inbred for 30 generations since then (24 at the Jackson Laboratory and 4 at the University of Macau Animal Facility) [14] (Fig. 1). We collected genomic DNA from 44 C57BL/6 J mice in 2018, performed exome sequencing, and called coding variants. To ensure the accuracy of the variants called from the exome sequences, we applied the following procedures: 1) only variants present in at least 50% of the mice (22 of 44 individuals) were kept for further analysis; 2) both the mm7 and mm10 assemblies of the mouse genome reference sequences were used as references for variant calling; 3) B6Eve variants were used as a third reference; 4) selected variants were validated by Sanger sequencing. From the exome sequences of the 2018 C57BL/6 J DNA, we identified a total of 3024 variants (Supplementary Table 1), of which 883 (29.2%) were singletons, 1329 (43.9%) were present in 2 to 21 mice, and 812 (26.9%) were present in at least 22 mice and were used for further analysis (Supplementary Table 2). We reasoned that this stringent threshold would better capture variation at the population level rather than in individual mice.
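The frequency filter and the multi-reference comparison in steps 1) to 3) can be sketched as below. This is a minimal illustration, not the study's pipeline: it assumes each variant is keyed by a hypothetical (gene symbol, protein-level change) tuple and that calls made against mm7 and mm10 have already been mapped to that common key, since genomic coordinates differ between assemblies. How the B6Eve data were weighed is not specified in the text, so the sketch only flags whether a variant also appears there. The final lines reproduce the reported class percentages from the counts given above.

```python
from collections import Counter
from typing import Dict, Set, Tuple

N_MICE = 44
MIN_CARRIERS = 22  # at least 50% of the 44 mice, as in step 1

# Hypothetical variant key: (gene symbol, protein-level change). Assumes mm7 and
# mm10 calls were already mapped to this common key (coordinates differ by assembly).
Variant = Tuple[str, str]


def carrier_class(n_carriers: int) -> str:
    """Assign a variant to one of the three frequency classes reported in the text."""
    if not 1 <= n_carriers <= N_MICE:
        raise ValueError(f"carrier count out of range: {n_carriers}")
    if n_carriers == 1:
        return "singleton"
    if n_carriers < MIN_CARRIERS:
        return "2-21"
    return ">=22"


def class_percentages(counts: Counter) -> Dict[str, float]:
    """Percentage of all variants falling in each frequency class, to one decimal."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


def common_variants(carriers: Dict[Variant, int]) -> Set[Variant]:
    """Step 1: keep variants carried by at least MIN_CARRIERS mice."""
    return {v for v, n in carriers.items() if n >= MIN_CARRIERS}


def cross_reference(mm7: Set[Variant], mm10: Set[Variant],
                    b6eve: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Steps 2-3 (one possible reading): compare calls made against mm7, mm10 and B6Eve."""
    both = mm7 & mm10
    return {
        "both_assemblies": both,            # supported by both reference assemblies
        "single_assembly_only": mm7 ^ mm10,  # called against only one assembly
        "also_in_b6eve": both & b6eve,       # flagged if also present in B6Eve data
    }


# Counts reported in the text: 883 singletons, 1329 in 2-21 mice, 812 in >= 22 mice.
reported = Counter({"singleton": 883, "2-21": 1329, ">=22": 812})
print(class_percentages(reported))  # {'singleton': 29.2, '2-21': 43.9, '>=22': 26.9}
```

Keeping the three outcomes of the cross-reference separate, rather than discarding single-assembly calls outright, mirrors the concern raised in the Discussion that a single reference version could miss real variants.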

Variants in DDR genes
Among the 812 variants, we found none in Brca1 or Brca2. We further searched for variants in the remaining 167 DDR genes across the 7 DNA damage repair pathways but did not identify any variants in these genes either (Supplementary Table 3A, B).
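The DDR screen amounts to intersecting the set of genes that carry retained variants with the DDR gene list, pathway by pathway. A minimal sketch, assuming the variant-bearing genes are available as mouse gene symbols; the pathway dictionary is a hypothetical, heavily abbreviated stand-in for the 169-gene list in Supplementary Table 3, not the study's actual list:

```python
from typing import Dict, Iterable, Set

# Hypothetical, abbreviated pathway membership for illustration only; the study's
# full list of 169 DDR genes is given in its Supplementary Table 3.
DDR_PATHWAYS: Dict[str, Set[str]] = {
    "HR": {"Brca1", "Brca2", "Rad51", "Palb2"},
    "NHEJ": {"Xrcc4", "Lig4", "Prkdc"},
    "MMR": {"Mlh1", "Msh2", "Msh6"},
}


def ddr_hits(variant_genes: Iterable[str],
             pathways: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each pathway, the member genes that carry at least one retained variant."""
    genes = set(variant_genes)
    return {name: members & genes for name, members in pathways.items()}


def any_ddr_hit(hits: Dict[str, Set[str]]) -> bool:
    """True if any pathway has at least one gene carrying a variant."""
    return any(hits.values())


# Two of the variant-bearing non-DDR genes named in the text.
hits = ddr_hits(["Mroh2a", "C4b"], DDR_PATHWAYS)
print(any_ddr_hit(hits))  # False
```

Run against the full 116-gene list and the full DDR gene list, an empty intersection in every pathway corresponds to the negative finding described here; reporting per pathway rather than a single yes/no keeps the result traceable to Supplementary Table 3A, B.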

Variants in non-DDR genes
We then annotated the 812 variants and identified 116 non-DDR genes carrying them. Mroh2a, which encodes a HEAT-domain-containing protein of unknown function, had the highest number of variants (85), and C4b, a component of the complement system, had the second highest (53) (Table 3, Supplementary Table 4). We used Sanger sequencing to validate a set of the variants in the original 2018 DNA samples used for exome sequencing; of the 15 variants tested, 10 (67%) were validated (Supplementary Table 5). The variants identified in non-DDR genes served as an internal control: the presence of new variants in over a hundred non-DDR genes during the same period supports the reliability of the observed lack of variation in DDR genes and argues against the possibility that the absence of variation across the full set of DDR genes was due to technical failure.
The data from our study thus indicate the absence of positive selection in DDR genes in C57BL/6 J during the 30-generation period. This lack of positive selection is unlikely to be due to the short period of C57BL/6 J history under investigation. The 30 generations that C57BL/6 J passed through in 20 years correspond to roughly 800 years of human history when each mouse generation is equated with one human generation [19]. Studies have shown that many BRCA variants in humans arose in recent human history. For example, 185delAG in BRCA1, a founder variant in the Ashkenazi Jewish population, arose around 750-1500 years ago [20]; 1499insA in BRCA1, a founder variant in Tuscany, Italy, originated 750 years ago [21]; and BRCA1 c.5266dupC, another founder variant in the Ashkenazi Jewish population, originated 1800 years ago [22].
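The timescale comparison rests on simple arithmetic: if each mouse generation is equated with one human generation, the human-equivalent span is the number of generations multiplied by a human generation time. The sketch below assumes a human generation time of 25 to 30 years (an assumption for illustration, not a value given in the text), which brackets the roughly 800 years quoted above, and checks which of the cited BRCA1 founder variants have a youngest age estimate inside that span:

```python
from typing import Dict, List, Tuple

MOUSE_GENERATIONS = 30  # C57BL/6 J generations from 1998 to 2018 (from the text)


def human_equivalent_years(generations: int, years_per_generation: float) -> float:
    """Equate one mouse generation with one human generation and convert to years."""
    return generations * years_per_generation


# Estimated ages (years before present) of the BRCA1 founder variants cited in the text,
# stored as (youngest, oldest) estimates.
FOUNDER_AGES: Dict[str, Tuple[int, int]] = {
    "BRCA1 185delAG": (750, 1500),
    "BRCA1 1499insA": (750, 750),
    "BRCA1 c.5266dupC": (1800, 1800),
}


def within_window(max_years: float, ages: Dict[str, Tuple[int, int]]) -> List[str]:
    """Founder variants whose youngest age estimate does not exceed max_years."""
    return sorted(name for name, (youngest, _oldest) in ages.items()
                  if youngest <= max_years)


# Assumed human generation times of 25 and 30 years (not stated in the text).
low = human_equivalent_years(MOUSE_GENERATIONS, 25)
high = human_equivalent_years(MOUSE_GENERATIONS, 30)
print(low, high)  # 750 900
print(within_window(high, FOUNDER_AGES))
```

Under these assumptions two of the three cited founder variants arose within a human-equivalent span comparable to the 30 mouse generations studied, which is the point the argument above relies on; a different assumed generation time shifts the window proportionally.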
It is possible that animals kept in a protected laboratory environment over the long term experience relaxed selection pressure, leading to altered genetic variation [23]. If a sufficiently long time period could be covered and starting genome sequences were available, testing genetic variation in wild mice would determine whether this possibility applies to the observation made in C57BL/6 J in our study.
The choice of reference genome sequences can affect variant identification. After the mouse genome project was accomplished in 2001, 10 different versions of the C57BL/6 J genome reference sequences were generated, from the first version, mm1, released in 2001, to mm10, released in 2011, before mm39 was released in 2020 (https://genome.ucsc.edu/FAQ/FAQreleases.html). The different versions were based on essentially the same raw sequence data generated by the mouse genome project, yet the variation data differed substantially between versions; such differences more likely reflect assembly and annotation artifacts than true variation. Using all versions as references for variant identification could therefore introduce considerable complexity and data inconsistency and reduce the reliability of the resulting variation data. On the other hand, using a single version could miss variants that are not identifiable in that version. To address these concerns, we used two later versions, mm7 and mm10, as references for variant identification; we also used the variation data from the B6Eve genome sequences derived from 2003 C57BL/6 J DNA as another reference; and we used Sanger sequencing to validate selected variants. The combined use of these approaches ensured the reliability and sensitivity of the variants identified in our study for addressing evolutionary selection in the DDR system of C57BL/6 J.
The evidence for positive selection in DDR genes comes mainly from BRCA in humans, chimpanzees and bonobos [3]. We propose explanations for why positive selection in BRCA exists in humans and their close relatives but not in other mammals, as represented by the laboratory mouse C57BL/6 J. The basic function of BRCA is to repair DNA double-strand breaks in order to maintain genome stability in mammals. Like many genes involved in essential biological functions, BRCA must remain stable to perform this essential work [24]. During evolution, however, BRCA in humans, chimpanzees and bonobos acquired new functions, such as roles in intellectual development [25], gene expression regulation [26], and reproduction [27]. Positive selection on these functions would be beneficial for fitness, whereas BRCA in other mammals retains its classical function of DNA damage repair and is therefore kept highly stable to maintain genome stability. These explanations may also apply to other DDR genes. It will be interesting to seek further evidence for them in different mouse strains and other species.

Conclusion
DDR genes in laboratory mouse strain C57BL/6 J were not under positive selection across its 30-generation period, highlighting the possibility that DDR system in rodents could be evolutionarily stable.