Phenotypes, genotypes and genome-wide association study
Friesian horses (n = 29) were diagnosed as being dwarfs by local equine veterinarians in consultation with the Equine Clinic of Utrecht University, usually via a digital in vivo picture for confirmation of the phenotype. Thirteen of the horses were male, 11 female and the sex of 5 dwarfs was unknown. The group of unaffected controls (n = 65) were Friesian horses without the characteristic appearance of dwarfism [1]. In addition, we obtained blood samples for DNA isolation from 8 parents of dwarfs and DNA samples from 177 Friesian horses that were part of a DNA bank maintained at the Dr. Van Haeringen Laboratory B.V..
Blood samples were taken and DNA was isolated as described by Orr et al. [4]. Genotypes of 19 dwarfs and 65 controls were obtained using the Illumina® EquineSNP50 Genotyping BeadChip containing 54,602 SNPs. Quality control was performed using the check.marker function in the GenABEL package in R [23]. SNPs with MAF <5 % and call-rate <90 % were discarded, leaving 29,840 SNPs (54.7 % of all SNPs) for the analysis.
The ccfast function in GenABEL package in R [23] was used to determine the significance of allelic differences between dwarfs and unaffected horses with a χ2-test (1df). The Bonferroni corrected significance level applied was 1.68 × 10−6. Homozygosity mapping in the significantly associated region was performed by eye to identify overlapping regions of homozygosity between dwarfs.
Genome sequencing
Four dwarf cases and three unrelated control were paired-end sequenced with 150 nucleotide reads for the full genome on Illumina NextSeq500 to an average coverage of 4-9x according to the manufacturers protocols. To increase the power to detect causal candidate variants as a fully homozygous variant, we merged the data for the four cases yielding an mean coverage of 36x.
The Illumina data was processed with our inhouse developed pipeline v 1.2.1 [24] including GATK v3.2.2 [25] according to the best practices guidelines [26]. Briefly, we mapped the pairs with BWA-MEM v0.7.5a [27], marked duplicates, merged lanes, realigned indels. Base recalibration did not improve our results, so this step was skipped. Next, GATK Haplotypecaller was used to call SNPs and indels. Variants are flagged as PASS only if they do not meet the following criteria: QD < 2.0, MQ < 40.0,FS > 60.0, HaplotypeScore > 13.0, MQRankSum < -12.5, ReadPosRankSum < -8.0, snpclusters > =3 in 35 bp. For indels: QD < 2.0, FS >200.0, ReadPosRankSum < -20.0.
Detecting recessive candidate variants was done with snpSift [28] fitting the model of reference or potential carrier status in the control and homozygous state in the cases. Moreover, coverage > 10, a genotypequality of >30 and effect impact ‘HIGH’ or ‘MODERATE’ was required. Additional evaluation of the variant of interest was performed with PolyPhen-2 [9].
The observed possibly detrimental DNA variant of B4GALT7 was confirmed and evaluated in the complete cohort by Sanger sequencing of PCR fragments. The PCR primer sequences were 5’- AGTTTCTCGGAGTGTAGAG-3’ (UP1F) and 5’-AGAGACATAGACCCTCAGAG-3’ (IN1R). The PCR was performed with 50 ng genomic DNA, 3 U Platinum Taq DNA polymerase (Thermo Fisher Scientific, Waltham, MA), 2 mM MgCl2, 0.2 mM each dNTP, 0.5 μM each primer, 1 M betaine and 1× Platinum buffer. Temperature cycling conditions were 5 min at 95 °C, 35 cycles of 30 s at 95 °C, 30 s at 55 °C, 30 s at 72 °C, and a final elongation step at 72 °C for 10 min. All amplifications were performed on an ABI 9700 Thermal Cycler (Applied Biosystems, Foster City, CA). The PRC primers were degraded by addition of 1 U Exonuclease I (Thermo Fisher Scientific, Waltham, MA) and incubation for 15 min. at 37 oC and 15 min. at 85 oC. DNA sequencing tercycle reactions were performed using BigDye v3.1 (Thermo Fisher Scientific, Waltham, MA) according to the manufacturer’s protocol. The products were analysed on a 3130XL Genetic Analyzer (Applied Biosystems, Foster City, CA) and the data was analysed with Lasergene (version 11 DNASTAR).
Homologous DNA and protein sequences from different species were retrieved from GenBank and aligned one by one by eye. Identities and differences were indicated by using a word processor. The species were selected arbitrary to represent close and distant members of the animal kingdom.
RNA analysis
Fibroblasts were grown from 6 mm punch biobsies from the skin of a dwarf, a carrier of the mutation of interest and a Friesian horse homozygous for the reference allele. The biopsies were washed in Euroflush (IMV technologies, L’Aigle, France) containing 5000 IU/ml heparin, cut with scissors and incubated in petri dishes with DMEM/M199 (1:1) medium containing pen/strep (10,000 U/ml (all from Thermo Fisher Scientific, Waltham, MA), 2.5 ng/ml basic-FGF5 (Peprotech, Rocky Hill, NJ) and 20 % FCS at 38.5 oC with 5 % CO2 and 5 % O2. Proliferating fibroblasts were harvested and passaged in culture flasks using standard procedures. RNA was isolated from cultured fibroblasts with the RNeasy kit with an on-column DNase digestion according to the instructions of the manufacturer (Qiagen, Hilden, Germany). The RIN value of the RNA was meausured with an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA) and was found to be 9.5 or higher. cDNA was synthesized with the iScript kit (Bio-Rad Laboratories, Hercules, CA) using 500 ng of RNA in 20 μl reactions. Splicing products were PCR amplified from 0.5 μl cDNA product with B4GALT7 exonic primers 5’-CTGGGAGCTCGAGCTCCATG-3’ (EX1F) and 5’-CTCAGGAAGCGGTGCATGTG-3’ (EX2R) as described above. Unspliced products were amplified with primer EX1F and the intronic primer IN1R described above. In a semi quantitative experiment, the 3 primers were combined in a single PCR using the same component concentrations and cycling conditions as above. The fragments were visualized by electrophoresis on a 1 % agarose gel in 0.5x TBE with 0.5 μg/ml ethidium bromide followed by UV irradiation of the gel.
For DNA sequence analysis and confirmation of the origin of the products, the bands were cut from the gel and the DNA was isolated with QIAquick gel extraction kit (Qiagen, Hilden, Germany). The DNA sequencing procedure was as described above.
Quantitative PCR for measurement of B4GALT7 transcripts was performed with forward primer 5’- GACGGCAGGTCCAGGTTG-3’ and reverse primer 5’- ACAGGCAACGAAGAGGTGG-3’ at an annealing temperature of 55 oC. The forward primer bridges exons 1 and 2, while the reverse primer is situated in exon 2. The qPCR reactions contained 1x IQ SYBRGreen SuperMix (Bio-Rad, Laboratories, Hercules, CA), 400 nM of each primer and 1 μl cDNA, obtained as described above, in a total volume of 15 μl. The reference genes were RPS19 and RPL13A, which were analyzed as described [29]. The reactions were performed in a MyiQ2 thermal cycler and the data was analyzed with IQ5 software (both from Bio-Rad, Laboratories, Hercules, CA).