The Ifi200 gene cluster developed as a consequence of gene duplications and rearrangements, which resulted in a divergent number of genes between inbred mouse strains and in repetitive sequences, even within coding regions, between the different gene members. To clarify the genomic alteration responsible for the Ifi202b deficiency in the B6 mouse, we used the PacBio single-molecule real-time (SMRT) sequencing approach for de novo assembly of the critical region in the NZO strain.
To screen for NZO BAC clones containing the relevant Ifi202b upstream sequence, a probe matching a unique Ifi202b sequence was used. Additionally, a probe specific for the Olfr432 gene was chosen to define the distal border of the region of interest; in contrast to the genomic Ifi200 region, the Olfr432 gene represents a unique sequence within the mouse genome. In total, sequencing of the NZO BAC clones produced 17,802 mapped PacBio RS reads with a mean read length of 14,357 bp (maximal read length 30,378 bp) and a mean read quality of 0.865. De novo assembly of the reads resulted in 4 contigs. However, two of them were not considered for further analysis (unitig2: 35 kb, mean coverage 24 and unitig3: 38 kb, mean coverage 26) due to poor sequence quality. With the two remaining contigs (unitig1: 36.5 kb, mean coverage 365 and unitig0: 300 kb, mean coverage 603; Fig. 1a and b) it was possible to assemble a region covering 6 genes that belong to the Ifi200 gene family and the olfactory receptor Olfr433 as the distal boundary (Fig. 2b, upper panel). As described earlier, the NZO strain carries two copies of the Ifi202b gene, which differ in only 8 bp within the coding region, corresponding to 7 amino acid differences [6]. In addition, sequence analysis of the BAC identified two copies of other family members, Ifi205 and Ifi203. Interestingly, by comparing the assembled NZO sequence with the B6 reference genome we identified a 261,797 bp deletion in B6 that affects the duplicated genes of the Ifi200 locus.
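The contig selection described above can be illustrated with a minimal sketch. The unitig lengths and coverages are the values reported in the text; the coverage cutoff `MIN_MEAN_COVERAGE` and the helper `retained_unitigs` are hypothetical and only illustrate one possible way to separate the two low-coverage unitigs from the two that were kept. The text itself states only that they were excluded due to poor sequence quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unitig:
    name: str
    length_kb: float
    mean_coverage: float


# Lengths and mean coverages as reported in the text.
UNITIGS = [
    Unitig("unitig0", 300.0, 603),
    Unitig("unitig1", 36.5, 365),
    Unitig("unitig2", 35.0, 24),
    Unitig("unitig3", 38.0, 26),
]

# Hypothetical cutoff chosen for illustration; not a value from the study.
MIN_MEAN_COVERAGE = 100


def retained_unitigs(unitigs, min_coverage=MIN_MEAN_COVERAGE):
    """Return the unitigs whose mean coverage reaches the cutoff."""
    return [u for u in unitigs if u.mean_coverage >= min_coverage]


kept = retained_unitigs(UNITIGS)
kept_names = [u.name for u in kept]
total_kb = sum(u.length_kb for u in kept)
print(kept_names, total_kb)
```

Under this assumed cutoff, only unitig0 and unitig1 remain, matching the two contigs used for the assembly of the region.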
With a second-generation sequencing (SGS) approach it would have been impossible to resolve the organization of the Ifi200 cluster in NZO, as sequences are mapped to the B6 reference genome and gaps within the reference genome result in incorrect alignments [8]. While the SGS approach is efficient for accurately identifying SNPs in the genome, it does not enable a thorough characterization of structural variations such as insertions and deletions [9,10,11]. Short-read data complicate the assembly of repetitive structures, which results in gaps, missing data, and less complete assemblies [12,13,14]. In contrast, the main advantage of third-generation sequencing (TGS) is its long read length, which was reported to be about 3,000 bp on average, with some reads reaching 20,000 bp or more. The long read length provides an important benefit for de novo assemblies: it allows the discovery of large structural variants, and it provides accurate microsatellite lengths, sensitive detection of SNPs, and haplotype blocks [8, 15–18]. TGS has successfully been used for de novo assembly of hundreds of microbial genomes and for the reconstruction of plant and animal genomes [18,19,20,21,22,23]. It has also been applied to resequencing analysis, to create detailed maps of structural variations, and to phase variants across large regions of human chromosomes [23,24,25].
Evolutionary analysis revealed a remarkable plasticity in the mammalian Ifi200 genes, suggesting the existence of strong evolutionary pressures that have shaped the Ifi200 sequences and functions throughout the mammalian lineage [26]. Here, we report the identification of structural variations within the Ifi200 (PYHIN) gene cluster in the obese NZO strain. Cridland and colleagues presented a map comparing the human, C57BL/6 mouse, and rat Ifi200 gene loci. The mouse genome contains at least 14 Ifi200 genes, whereas the human and rat genomes harbor only 4 and 5, respectively [5]. It was already published that the Ifi200 gene locus is divergent between various mouse strains, as both the number of genes present at the locus and their sequences differ [5, 6]. The number of predicted mouse genes has increased with each new update of the mouse genome database, and in the current study, with de novo assembly of the PacBio sequencing reads, we can strengthen this assumption and extend it to the obese NZO strain [5]. The NZO strain carries two copies of Ifi202b (Ifi202a and b), which were also found in the 129X1/SvJ mouse genome in addition to a pseudogene (Ifi202c), whereas only one truncated copy is present in C57BL/6, and this copy is not expressed in metabolically relevant tissues [6, 27, 28]. Another family member, Ifi203, showed two extra copies in NZO in comparison to B6. The Ifi205 gene was also duplicated, as two regions spanning the coding sequence of the gene could be mapped in the NZO BAC clones (Fig. 2b). To further verify the sequencing results we performed a comparative genomic hybridization (CGH) assay of genomic DNA obtained from the B6 and the NZO strain to detect copy number variations (CNVs) within the cluster. This analysis further supports that the NZO strain carries at least two copies of the genes Ifi202b, Ifi203, and Ifi205 (Fig. 3). Other studies also show the presence of gene duplications.
She and colleagues (2008) assessed CNVs between the B6 strain and 15 mouse strains (including NZO) which were used for genetic association studies, sequencing, and the Mouse Phenome Project [29]. Their analysis also showed a duplication of the Ifi203 gene. Similar results were obtained for Ifi205 in the study by Cahan et al. (2009), in which CNVs in 17 mouse strains were analyzed [30]. In conclusion, de novo assembly of the NZO BAC clone reads and the analysis of CNVs revealed structural variations between different inbred strains of mice within a complex region on chr. 1 caused by duplications and genomic alterations.
It is also documented that the corresponding region in humans is affected by genomic alterations. According to the 1000 Genomes Project, several deletions, CNVs, and duplications can be mapped within this locus [31]. Cagliani and colleagues performed an evolutionary analysis of the human family members (MNDA, PYHIN1, IFI16, and AIM2) by analyzing inter- and intraspecies diversity and revealed that the genes have been repeatedly targeted by natural selection. In particular, the IFI16 gene region shows a high nucleotide diversity in human populations, which indicates that the region has been a target of long-standing balancing selection [32].
The main goal of the current study was to analyze the chromosomal alterations leading to the Ifi202b deficiency in the B6 strain. With the BAC sequencing we identified a deletion spanning approximately 261.8 kb within the B6 genome, a sequence that is present in NZO. The deletion includes different copies of the Ifi200 family members Ifi203 and Ifi205 as well as exon 1 of Ifi202b (Fig. 2b). In our previous study we identified an alternative first exon in the B6 reference genome (Vogel et al., 2012). With the current study we are finally able to define the exact chromosomal region deleted in B6, and we can explain how this alternative exon 1, which is an intronic sequence in NZO, is spliced to exon 2 of Ifi202b in the B6 genome (Fig. 2b, lower panel). The fact that B6 mice do not express Ifi202b in the same tissues (e.g. adipose tissue, liver, and skeletal muscle) as NZO mice indicates that, in addition to the first exon, the promoter or at least part of it was deleted as well.
It is also reasonable to assume that the deleted region in B6 contains enhancer motifs or long-range control elements that drive and regulate the expression of other genes. In a previous study we reported that the genes Lefty1, Pcp4l1, and Apoa2, located in the same diabesity susceptibility locus as Ifi202b (Nob3), are exclusively present in islets of the diabetes-resistant B6 strain in contrast to the diabetes-prone NZO mouse. The identified genes are furthermore involved in adaptive islet hyperplasia and the prevention of severe diabetes in B6-ob/ob mice [33]. Based on the data reported here, we hypothesize that the genomic alterations within the cluster may also include enhancer elements that carry the potential to regulate the expression of Lefty1, Pcp4l1, and Apoa2. Using the Nsite program, a computer tool to search for regulatory elements (REs), we found 5 predicted enhancer motifs located within the sequence deleted in the B6 genome, which could potentially be responsible for the described expression differences. A number of long-range regulatory disruptions affecting the expression of genes have already been described [34, 35]. One of the oldest examples of a human gene in which long-range regulation has been implicated and studied is SOX9, a gene responsible for autosomal sex reversal and campomelic dysplasia (CD). All rearrangements, including deletions, are found from 50 kb to 950 kb upstream of SOX9, suggesting that a similar mechanism could also account for the expression differences of genes located within the Nob3 locus between the diabetes-prone NZO and the diabetes-resistant B6 strain [34, 35].
Finally, to elucidate whether the genomic alteration on chr. 1 is also associated with metabolic alterations, we generated and characterized congenic mice carrying 14.2 Mbp (163.5-177.7 Mbp) of the NZO genome (Nob3.14N/N), including the Ifi200 gene cluster, on the B6 background. On a high-fat diet (HFD), homozygous NZO allele carriers developed a higher body weight and fat mass (Fig. 4a and b), in particular of gonadal white adipose tissue (gonWAT, Fig. 4c), than the corresponding controls (Nob3.14B/B). Histological analysis of the gonWAT demonstrated that the adipocytes were larger in the Nob3.14N/N group than in Nob3.14B/B mice (Fig. 4d). As these data point towards a role of the cluster in adipose tissue biology, we tested the expression of proteins involved in adipocyte differentiation and lipolysis. Western blot analysis indicated an increased expression of the adipogenic marker PPARγ (peroxisome proliferator-activated receptor gamma) and a decreased activation of the lipolytic enzyme HSL (hormone-sensitive lipase) in gonWAT of NZO allele carriers in comparison to controls (Fig. 4e and f). As obesity and hypertrophy of adipose tissue are also known to impair insulin sensitivity and glucose tolerance, we measured the glucose levels of the congenic lines. Random blood glucose levels started to differ between the two groups at the age of 20 weeks, with higher concentrations in NZO allele carriers (Fig. 5a). Glucose clearance during oral glucose tolerance tests did not differ between the two genotypes (Fig. 5b). However, the Nob3.14N/N mice required higher levels of insulin than Nob3.14B/B mice to clear blood glucose, pointing towards insulin resistance (Fig. 5c), which is also indicated by the calculated HOMA-IR (Fig. 5d). In conclusion, introducing the genomic region of the Ifi200 gene cluster of the NZO genome into the B6 genome results in the development of obesity and is associated with insulin resistance, which demonstrates the functional consequences of the alteration on chr. 1.
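For readers unfamiliar with the HOMA-IR index mentioned above, the following minimal sketch shows the commonly used formulation (fasting glucose in mmol/L multiplied by fasting insulin in µU/mL, divided by 22.5). The text does not state which formula or units were used for the mice, so the function name `homa_ir`, the units, and the example values are assumptions for illustration only.

```python
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uU_ml: float) -> float:
    """Homeostatic model assessment of insulin resistance (standard formula).

    Assumes glucose in mmol/L and insulin in microunits per mL; the
    normalisation constant 22.5 belongs to this unit convention.
    """
    if fasting_glucose_mmol_l < 0 or fasting_insulin_uU_ml < 0:
        raise ValueError("concentrations must be non-negative")
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


# Illustrative values only, not measurements from the study:
# at equal glucose, a higher insulin level gives a higher index.
low_insulin = homa_ir(5.0, 10.0)
high_insulin = homa_ir(5.0, 20.0)
print(round(low_insulin, 2), round(high_insulin, 2))
```

The index rises in proportion to the insulin needed to hold a given fasting glucose level, which is why similar glucose clearance combined with higher insulin levels is read as insulin resistance.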
Several reports have shown that rare genomic structural variants (GSVs) are associated with obesity [36]. A rare (0.7%), 593 kb deletion on chromosome 16p11.2 (at 29.5–30.1 Mbp) was shown to be significantly (p = 6.4 × 10^−8) enriched in obese patients compared to controls, whereas a duplication of the same locus has the opposite effect, being associated with underweight [1, 37, 38]. Another study by Wang et al. [39] also identified large and rare CNVs that are associated with a higher risk of developing obesity. They reported several CNVs that affect known candidate genes for obesity, such as a 3.3-Mbp deletion disrupting NAP1L5 and a 2.1-Mbp deletion disrupting UCP1 and IL15. One prominent example of a chromosomal syndrome with obesity is Prader-Willi syndrome (PWS), in which a 5–7 Mb deletion of the paternally inherited chromosomal region 15q11.2-q13 is responsible for a neurobehavioral disorder manifested by hypotonia and feeding difficulties in infancy, followed by morbid obesity secondary to hyperphagia [40].