Phylogenetic and DNA methylation analysis reveal novel regions of variable methylation in the mouse IAP class of transposons

Background Select retrotransposons in the long terminal repeat (LTR) class exhibit interindividual variation in DNA methylation that is altered by developmental environmental exposures. Yet, neither the full extent of variability at these “metastable epialleles,” nor the phylogenetic relationship underlying variable elements is well understood. The murine metastable epialleles, Avy and CabpIAP, result from independent insertions of an intracisternal A particle (IAP) mobile element, and exhibit remarkably similar sequence identity (98.5%). Results Utilizing the C57BL/6 genome we identified 10802 IAP LTRs overall and a subset of 1388 in a family that includes Avy and CabpIAP. Phylogenetic analysis revealed two duplication and divergence events subdividing this family into three clades. To characterize interindividual variation across clades, liver DNA from 17 isogenic mice was subjected to combined bisulfite and restriction analysis (CoBRA) for 21 separate LTR transposons (7 per clade). The lowest and highest mean methylation values were 59% and 88% respectively, while methylation levels at individual LTRs varied widely, ranging from 9% to 34%. The clade with the most conserved elements had significantly higher mean methylation across LTRs than either of the two diverged clades (p = 0.040 and p = 0.017). Within each mouse, average methylation across all LTRs was not significantly different (71%-74%, p > 0.99). Conclusions Combined phylogenetic and DNA methylation analysis allows for the identification of novel regions of variable methylation. This approach increases the number of known metastable epialleles in the mouse, which can serve as biomarkers for environmental modifications to the epigenome.


Background
Across mammalian genomes, the retrotransposon class of repeat elements make up nearly half the nuclear DNA, with long terminal repeats (LTRs) alone accounting for 8-10% [1]. Many retrotransposons retain the ability to move (transpose) to new locations in the genome, a potentially deleterious phenomenon that can result in direct disruption to coding regions [2,3]. Therefore, the suppression of retrotransposons is vital to organismal survival, particularly through sensitive life stages, and is primarily accomplished via epigenetic mechanisms, including DNA methylation [4,5]. Epigenetic suppression is typically established early in development at the blastocyst stage and maintained during differentiation of the primordial germ cells [6]. Importantly, in certain cases suppression is incomplete, leading to adverse consequences [7]. Alternatively, this incomplete suppression may be an evolutionarily selected method of fine-tuning gene expression [8]. Genome-wide, little is known about the extent to which interindividual variation of DNA methylation at these elements is affected by environmental factors and/or their phylogenetic lineage [9].
Intracisternal A particles (IAP) are a class of murine retrotransposons named for the appearance of budding doughnut shaped particles in the cisternae of the endoplasmic reticulum [10]. IAPs are members of the endogenous retrovirus family (ERV) class II [11]. Structurally, a full length IAP consists of gag, pro, and pol genes capped at each end by a direct pair of LTRs, each approximately 350 bp in length [12]. However, the largest majority of IAP transcripts issue from truncated 5.4 kb copies of IAP of subtype IΔ1 rather than full length 7.2 kb copies [13]. Previous work has identified only a subset of elements likely to be active, due to genomic position, sequence mutation, or methylation status [13]. The number of paralogous IAP LTRs in the genome has been uncertain with estimates ranging from less than 1000 to over 9000 with significant polymorphism among strains [10,14,15]. This family of ERVs is highly active and rapidly transposing, with estimates of 60% of elements being strain specific [15]. It is advantageous for these mobile elements to be kept under tight control by heavy DNA methylation and tight chromatin structure.
A small number of "metastable epialleles" have been identified in mice (e.g. A vy , Cabp IAP , and Axin Fu ) in which variable methylation of the 50 LTR of an antisense IAP insertion results in dysregulation of gene expression concomitant with the level of methylation [16][17][18][19][20]. Metastable epialleles show variable expression among individuals with epigenetic profiles that are consistent across tissues and are thus likely to have been set prior to germ line differentiation [21][22][23]. Further, the distribution of variable expressivity has been shifted at these metastable epialleles following maternal exposure to nutritional and environmental factors [21,22,24,25]. To date, no studies have determined the genetic relationship underlying the ability of these IAP LTRs to be variably methylated. Two of the epialleles, A vy and Axin Fu , were identified due to dramatic phenotypes induced by their insertion into nearby proteincoding genes [18,26]. Conversely, Cabp IAP was identified by a search of Genbank cDNA databases for chimeric sequence containing IAP LTR and genic sequence [20].
To systematically investigate the relationship of genetics and metastability we applied phylogenetic methods to identify candidate IAP retrotransposons and validate their variable methylation status. Through this approach, we find the phylogeny of metastable loci sheds light on the evolutionary basis of metastability. Additionally, the validated analysis of epigenetic state across individual isogenic mice now increases the number of known metastable epialleles, which can serve as a test panel for environmental modifications to the epigenome.

IAP distribution in the genome
Initial pairwise alignment of the two most well-studied metastable epialleles, A vy and Cabp IAP revealed that the LTRs of these two insertions exhibit 98.5% sequence identity over the length of shared sequence ( Figure 1). RepeatMasker was used to scan the C57BL/6 (mm9) mouse genome and identified 10802 IAP LTR elements. The majority of these LTRs were pairs found on the 5' and 3' end of the same IAP insertion, however a fraction were present as solo LTRs. Since the IAPs underlying both A vy and Cabp IAP are of the class IΔ1, type IAPLTR1_Mm as identified in RepBase, this subtype was used to filter the complete list of transposons, resulting in a total list of 1388 IAPLTR1_Mm elements after removing 65 unmapped, short, or non-alignable elements. There was no bias in the orientation of insertion as 691 sense and 707 anti-sense IAPLTR1_Mm elements were identified throughout the genome.

Phylogenetic similarity of known metastable IAPs
A neighbor-joining tree of these 1388 elements revealed three distinct clades with A vy and Cabp IAP clustering within clade 1 (red), more closely than 99% of all other elements ( Figure 2). The largest cluster, clade 3 (black), contains the most conserved elements, and consists of 1130 sequences with an average sequence divergence of only 1.8% from the consensus. Clade 1 (red) and clade 2 (green) contain 147 and 113 elements each, and are more divergent, with 9.1% and 12.9% average differences from the consensus respectively. Bias in location of these insertions was not identified, with members of each clade represented across numerous chromosomes and locations ( Figure 3). Unique identifiers were assigned to each element in our list, in the format IAP27-IAP1414, used throughout the study.

Characterization of interindividual metastability
From these candidates 7 from each clade were randomly selected to test for variable methylation using combined bisulfite and restriction analysis (CoBRA). Primers for CoBRA were designed with the forward primer internal to the IAP sequence and the reverse primer located in the flanking sequence ( Figure 4A), providing an amplicon specific to an insertion and containing homologous IAP element sequence (Table 1). BceAI enzyme was chosen since each element contains two CpG sites as part of a BceAI restriction site, ACGGCG. Loss of methylation at either or both would prevent BceAI activity. Consequently, the relative intensity of cut vs. uncut bands gives   a semi-quantitative measurement of methylation at these sites ( Figure 4A). Uncut bands correspond to unmethylated DNA while cut bands correspond to methylated DNA. The two CpG sites examined here exhibited interindividual variation in previous studies from our group [24,27], and are identified as sites 2 and 3 in this study. A quantitated CoBRA gel image of IAP236 is depicted in Figure 4B, revealing an 18% range of methylation at the cut site.
A group of 17 isogenic A vy /a genotype mice of both genders (N = 7 males and N = 10 females) from 7 litters was sacrificed at weaning and DNA was extracted from the liver. Following bisulfite conversion, candidate regions were amplified via PCR and digested with BceAI ( Figure 5). A total of 21 representative loci, 7 from the red clade including Cabp IAP , and 7 each from the green and black clades were evaluated. The 0% control (lane 1) was uncut in all loci, while the high percent methylated control was nearly completely digested in all loci (lane 2) indicating sufficient enzyme activity and specificity.
The mean methylation across all mice for all IAP insertions was 72.5% and no mouse deviated by more than 3% (range 71-74%), showing no statistical difference (p > 0.99) ( Table 2). Specific loci however, exhibited much greater variability. We found that mean methylation of individual IAPs fluctuated from a low of 59% to a high of 88% and  standard deviations varied from 1.9 to 7.3, indicating that individual transposons can differ dramatically in both average value and standard deviation among mice. Thus, methylation variability ranged from 9% for the least variable element to greater than 34% for the most variable element. There were no significant differences in methylation between sexes. Importantly, the average methylation was significantly lower for both red clade elements, 70% (p = 0.017), and green clade elements, 68% (p = 0.040) than for black clade elements, 79%. Interestingly, the range through which a particular element can be methylated is also correlated with its position within the phylogenetic tree ( Figure 6). The average range for red clade elements is 21%, 17% for green clade elements, and just 11% for black clade elements. The element with the greatest range is in the red clade (Cabp IAP ) while the element with the greatest dispersion as measured by interquartile range is found in the black clade (IAP506).

Candidate loci with interindividual variation
Epigenomic studies are increasingly cataloging variation across tissues, disease states, ages, and environmental exposures. It is therefore important to understand the baseline naturally extant variation in a population of isogenic, unexposed, age-matched animals and to provide assays to detect shifts in variation mediated by other conditions. Determination of interindividual variation in DNA methylation has followed a number of methods, including coat color shifts in A vy mice [27], bisulfite sequencing, northern blots for transcript quantity [31], methylation specific PCR, and pyrosequencing [32]. The method we use here, combined bisulfite and restriction analysis (CoBRA), exploits site-specific enzyme digestion to determine methylation status. Advantages of CoBRA include semi-quantitative analysis, and relatively inexpensive costs, thereby facilitating the high throughput necessary for epidemiological testing. Indeed, CoBRA has been used to assess the epigenotype of colorectal tumors, providing direct health implications [33]. Additionally, automation platforms such as the QIAxel system (Qiagen Inc., Valencia, CA) would increase the speed of sample quantitation after enzyme digestion. Our candidate loci were also mined from the C57BL/6 genome, and as such, they are more likely to be useful as epigenetic markers in both mutant and exposed mice than previous epialleles (A vy and Axin Fu ), which are restricted to specific lines.
Previous studies utilizing non-phylogenetic methods of variably methylated candidate identification have identified retroelements co-located with active histone modification (H3K4me3) [34]. Further, IAPs, unlike other Green clade  Black clade   Overall  Mean   CabpIAP  IAP31  IAP44  IAP51  IAP77  IAP90  IAP110  IAP176  IAP182  IAP186  IAP195  IAP236  IAP281Y  IAP268  IAP506  IAP655  IAP1112  IAP1248  IAP1252  IAP1259 Figure 6 Boxplot of methylation values. Interindividual variation in each IAP correlates with phylogenetic clade. Average methylation values are lower for red and green clades (70% and 68%) than for the black clade (79%) suggesting higher methylation for older insertions. The average range per clade, in parentheses, is also lower, however the standard deviation is similar across clades.

Red clade
transposons, can induce changes in nearby chromatin marks, but no phylogenetic basis underlying epigenetic states has been explored [2]. The close sequence identity of A vy and Cabp IAP insertions may be associated structurally with directed epigenetic modifications by targeting enhanced recruitment of trans factors governing their metastability. Alternatively, high sequence similarity in the divergent clades may have resulted from very recent mobilization and high activity of the parent element and its rapid proliferation. We speculate that variable DNA methylation, being a general property of elements we tested from these clades, explains both the success of this lineage in duplication, and their interindividual variation. This hypothesis is borne out by the average methylation being higher in the black clade, containing the largest number, most conserved, and likely oldest, elements. Though in general, new insertions are considered deleterious, some have been co-opted by the genome as a mechanism of driving gene expression, and epigenetically labile insertions may be under selection to "fine tune" that expression [35]. Our analysis supports the notion that mutation to escape genomic control is correlated with both increased retrotranspositional activity and decreased methylation. Consequently, these data are a strong indication that interindividual variation in methylation at repetitive elements is the default condition, rather than a rare occurrence, and that shifts in methylation profiles could affect numerous transcripts in the genome. Thus, environmental exposures capable of altering the epigenome could potentially have widespread effects for gene regulation.

Conclusions
Combined phylogenetic and DNA methylation analysis resulted in the identification of novel regions of variable methylation. The phylogenetic analysis revealed two duplication and divergence events subdividing this family into three clades. Importantly, average methylation was significantly lower for both the red and green clade elements, compared to black clade elements, supporting the notion that phylogenetically older and more uniform elements tend to be more highly methylated than younger and diverged elements. Thus, through this endeavor the number of known metastable epialleles in the mouse has dramatically increased. These sites can now serve as biomarkers for environmental modifications to the epigenome. Parallel approaches in humans are underway.

Computational analysis
For the A vy allele, the LTR studied was obtained from Genbank accession number  [36]. The output was filtered for elements annotated as IAPLTR and counted to 10802. Since each IAP element contains two LTRs, this represents approximately 5000 full-length IAP elements of all subtypes. We next filtered for elements of type IΔ1, subtype IAPLTR1_Mm and removed non-chromosomal elements and 40 short elements below 330 bp. The resulting dataset contained 707 elements on the sense strand and 691 on the complementary strand (Additional file 1: Table S1). A further 10 elements were removed for lack of alignment to the IAPLTR1_Mm consensus. Alignment was performed by Geneious software version 5.6.5 [http://www.geneious. com/] using the Geneious alignment algorithm with default parameters (cost matrix = 65% similarity, gap open penalty = 12, gap extension penalty = 3). The phylogenetic tree was built using neighbor-joining, no outgroup, Jukes-Cantor genetic distance model. Bootstrap of 1000 iterations showed no substantial differences in the divergence of the major clades (data not shown).
Genomic sequence for each element (for alignment and tree building) and 400 bp of flanking sequence (for primer design) was obtained by submission of the RepeatMasker generated coordinates to the Galaxy Project [https://main. g2.bx.psu.edu/] [37]. Unique numbers were assigned to each IAP here to aid in reference and used throughout the study (Additional file 1: Table S1). A second alignment was performed with flanking sequence to select all sites with 5' LTR sequence, complete IAP structure, and 3' LTR sequence. Removal of duplicate entries that matched to the 3' LTR copy and other LTRs with repetitive flanking sequence resulted in a total of 366 candidate elements for CoBRA [38,39]. All primers were designed to amplify the antisense strand of the 5' LTR and proximal flanking sequence (Table 1).

DNA extraction and conversion
Mice were obtained from a forced heterozygous colony carrying the A vy allele that has been maintained with sibling mating for over 220 generations, resulting in a genetically invariant background initially based upon the C3H strain [21]. Virgin wild-type dams, 6 weeks of age, were fed phytoestrogen free AIN-93 G diets (diet 95092 with 7% corn oil substituted for 7% soybean oil; Harlan Teklad, Madison, WI) and were mated. A vy /a offspring were sacrificed at d21 and livers flash frozen for DNA extraction. A total of 17 animals (N = 7 males and N = 10 females) and some sets of littermates were used for this study. DNA extraction was performed using a standard Phenol-Chloroform protocol. Using the Qiagen Epitect kit automated on the Qiagen QIAcube W purification system, approximately 1 μg of genomic DNA was treated with sodium bisulfite. The treatment converts unmethylated cytosines to uracil, read as thymine during polymerase chain reaction (PCR), whereas the methylated cytosines are protected and remain unconverted [40].
Animals used in this study were maintained in accordance with the Guidelines for the Care and Use of Laboratory Animals (Institute of Laboratory Animal Resources, 1996) and were treated humanely and with regard for alleviation of suffering. The study protocol was approved by the University of Michigan Committee on Use and Care of Animals.

Combined bisulfite restriction analysis
CoBRA [39] was performed with enzyme BceAI (New England Biolabs W ) using the following conditions for PCR: 50 ng DNA, 0.5 μl each forward and reverse primer at 10pM/μl concentration, 15 μl HotStarTaq master mix (Qiagen Inc., Valencia, CA), 3 μl Rediload™ (Invitrogen Inc., Grand Island, NY), 10 μl water for a total of 30 μl reaction volume. The reaction was run at 52°C annealing temperature for 45 cycles on a C1000 Bio-Rad thermal cycler pcr machine (Hercules, CA). Some reactions were run with altered conditions (Table 1). Enzyme digestion was performed according to manufacturer protocol, briefly, 10 μl of PCR product, 3 μl BSA (10×), 3 μl Rediload™, 3 μl NEB buffer 3 (10×), 1 μl BceAI enzyme (1 Unit), and 10 μl of water were combined for a 30 μl reaction and incubated at 37°C for 8 hours, followed by 65°C for 20 minutes and run out on a 2% agarose gel. Gels were visualized on a Gel Doc XR™ by Bio-Rad. Image quantitation was performed with Quantity One version 4.6.7 using default options (all lanes, auto detect = yes, normalize = yes, sensitivity = 10, size scale = 5) and reporting relative quantity of each band per lane. Lanes with top bands too faint for detection were considered below the limit of detection (LOD) and removed from further analysis (marked as blank in Table 2). Statistical analysis to determine significance of clade methylation differences was performed using SPSS (IBM, New York, NY) by ANOVA with multiple comparisons Tukey adjustment.
Control DNA was generated in-house by whole genome amplification using the GenomePlex W kit (Sigma-Aldrich, St. Louis, MO) for 0% methylated DNA and by CpG methylase, M.SssI treatment (Zymo Research, Irvine CA) for highly methylated control, both according to manufacturer protocols.