Whole genome resequencing reveals an association of ABCC4 variants with preaxial polydactyly in pigs

Background Polydactyly is one of the most common congenital limb dysplasia in many animal species. Although preaxial polydactyly (PPD) has been comprehensively studied in humans as a common abnormality, the genetic variations in other animal species have not been fully understood. Herein, we focused on the pig, as an even-toed ungulate mammal model with its unique advantages in medical and genetic researches, two PPD families consisting of four affected and 20 normal individuals were sequenced. Results Our results showed that the PPD in the sampled pigs were not related to previously reported variants. A strong association was identified at ABCC4 and it encodes a transmembrane protein involved in ciliogenesis. We found that the affected and normal individuals were highly differentiated at ABCC4, and all the PPD individuals shared long haplotype stretches as compared with the unaffected individuals. A highly differentiated missense mutation (I85T) in ABCC4 was observed at a residue from a transmembrane domain highly conserved among a variety of organisms. Conclusions This study reports ABCC4 as a new candidate gene and identifies a missense mutation for PPD in pigs. Our results illustrate a putative role of ciliogenesis process in PPD, coinciding with an earlier observation of ciliogenesis abnormality resulting in pseudo-thumb development in pandas. These results expand our knowledge on the genetic variations underlying PPD in animals.

growth and polarity of vertebrate limb buds. In the process of limb development, the Zone of Polarizing Activity (ZPA) signaling center determines the formation of anteriorposterior axis of limb buds [7]. The ectopic expression of SHH gene in the anterior part of limb bud is the main causative agent of PPD. Normally, SHH signal is only expressed at the posterior portion of ZPA region of limb bud [8,9]. The disruption or mis-regulation of SHH pathway often results in congenital birth defects, such as holoprosencephaly and polydactyly [10]. Lettice et al. [11][12][13][14] have found that the disruption of Zone of Polarizing Activity Regulatory Sequence (ZRS), a long-range cis-regulator for SHH located in the fifth intron of LMBR1 gene, results in the ectopic expression of SHH responsible for PPD. Many single nucleotide polymorphisms (SNPs) in ZRS have been reported to be in association with PPD [15].
In the early stages of embryonic development, Gli3 polarizes the limb into anterior-posterior axis through the antagonism with HAND2 [16,17]. SHH mediates the limb patterning by regulating Gli2 and Gli3, which act as fulllength transcriptional activators (GliA) in the presence of SHH and are cleaved into a short form as a truncated repressor (GliR) in its absence [18,19]. Gli3 mutant limbs are characterized by severe polydactyly and associated with ectopic anterior expression of Hoxd gene [20,21].
Additionally, as an important signal transduction and sensory center in eukaryotic cells, cilia play a crucial role in SHH signal transduction and regulation of its downstream target genes [22][23][24], such as Gli gene family [25]. The disruption or mis-regulation of ciliogenesis process results in serious ciliopathies, including polydactyly [23,26]. Abnormalities in cilia due to the defects of DYNC2H1 in retrograde intraflagellar transport (IFT) have caused the short-rib polydactyly syndrome in human [27]. Moreover, defects of DYNC2H1 and PCNT proteins in IFT and ciliogenesis process can cause pseudo-thumb (PPD) both in red and giant pandas [28]. CYLD (cylindromatosis) mediates ciliogenesis by deubiquitinating Cep70 and inactivating HDAC6, and CYLD knockout mice exhibit polydactyly [29]. In addition, previously reported candidate genes associated with polydactyly traits include: EN2 [30], MIPOL1 [31], TWIST1 [32], PITX1 [33] etc. and reviewed in Malik et al. [2] and Deng et al. [4]. Among all of these identified candidate genes, majority of them are involved in the SHH signaling pathway or ciliogenesis process.
In this study, four pigs from two pedigrees of Large White pig breed were identified with PPD deformity. We aimed at screening for the genomic variants of PPD phenotype through whole genome association studies based on high quality resequencing data. Genetic differentiations between PPD affected and normal groups were calculated and identification of ATP Binding Cassette Subfamily C Member 4 (ABCC4) (NCBI gene access ID: 100152536) strong association with PPD phenotype. Furthermore, a missense mutation in ABCC4 was detected but not in large normal samples. These findings highly prompt us in hypothesizing that ABCC4 is probably a new candidate gene for PPD through the regulation of ciliogenesis.

Pedigree and phenotypic analysis
In this study, four PPD affected individuals were collected from two separate families and their genealogical information are shown in Fig. 1a. All of the four PPD pigs were affected at one side of the forelimb, the F1-a1 male (Fig. 1b) and the F0-4 female (Fig. 1c) were affected on the left side of the forelimb, the F1-a2 female (Fig. 1d) and F1-a3 male (Fig. 1e) were affected on the right side of the forelimb.
To show the detail of the additional digit and for further classification, two affected pigs (F1-a1 and F1-a3) were euthanized to get the hooves for radiograph analysis. The detailed phenotypes and radiograph information of F1-a1 and F1-a3 are shown in Fig. 1f and Fig. 1g, respectively. According to the classification method proposed by Wassel [3], the PPD phenotypes in this study are classified into PPD type VI (duplicated metacarpal). Except for an extra toe on anterior side of the forelimb, there were no additional abnormal phenotypes observed in appearance and behavior.

Characterization of variants
To improve the reliability of data quality and called variants, the genome was sequenced with a higher depth varying from 29.06× (F1-b9) to 38.34× (F0-4) and on average 34.10×, and the average coverage with respect to the pig reference genome sequence (Sus scrofa 10.2) is 88.79% (Additional file 1: Table S1). After applying stringent quality control criteria, we identified a total of 13,624,224 SNPs and 2,903,785 Insertion/Deletion (INDELs) in the whole genome and most of them were located in noncoding sequences (Additional file 2: Table S2).

Reported candidate genes analysis
To investigate whether this PPD phenotype was caused by candidate genes mutations previously identified in human or other species. We scanned the homologue genes of all these candidate regions, but there were no mutations in the protein coding regions, untranslated regions (UTR) sequences, transcription factor binding sites and other highly conserved sequences surrounding the PPD affected individual's candidate genes. Variants in intergenic region (IGR), intron and other regions were not significantly differentiated between PPD affected and normal pigs, so we ruled out the possibility of the previously reported candidate genes as the causing loci for this PPD phenotype in our study.

Screening of variants associated with PPD phenotype
For screening of some haplotypes which were inherited by the PPD affected individuals (but not exist in unaffected groups). We calculated the Cross Population Extended Haplotype Homozygosity (XP-EHH) value [34] between PPD affected and unaffected groups based on whole-genome SNPs. Our results showed that a haplotype including ABCC4 gene on SSC11 (Sus scrofa chromosome 11) was highly associated with PPD phenotype (Fig. 2a). We further located and annotated the highly remarkable regions with mean XP-EHH value larger than 2 (XP-EHH > 2) (Additional file 3: Table S3). Among these regions, SNPs in ABCC4 region had the most significant value and counted for the biggest proportion of 70.49% in all the highly remarkable SNPs.
Moreover, the fixation index (F ST ) [35] analyses results showed that the region of ABCC4 gene on SSC11 was highly differentiated between the affected and normal groups based on both SNPs (Fig. 2b) and INDELs (Fig.  2c). All highly differentiated windows (Weighted F ST > 0.6 and number of SNPs≥10 in each slid window) based on whole-gnome SNPs (Fig. 2b) are listed in Additional file 4: Table S4. All of these regions were located in IGR except ABS2 on SSC7 and ABCC4 on SSC11 (Fig. 2b). In addition, we checked these highly differentiated signal regions and further analysis showed that the genotypes did Pedigree and phenotypes information of the four PPD affected pigs. a Pedigree of the three PPD affected pigs and one random case. Squares represent males and circles represent females. Shaded symbols denote polydactyly pigs. b Phenotypes of the affected male (F1-a1) which expressing a preaxial polydactyly on left forelimb. c Affected female (F1-a2) expressing a preaxial polydactyly phenotype on right side of fore limb. d Affected male (F1-a3) appears a preaxial polydactyly phenotype on right side of fore limb. e Affected female (F0-4) appears a preaxial polydactyly phenotype on left forelimb. f Radiograph and phenotype of the affected male (F1-a1). g Radiograph and phenotype of the affected male (F1-a3). The four hooves were placed based on the position of alive pigs, at the front were forelimbs and on the right were right limbs not fully correspond with phenotypes of these regions, and there is no literature showing that these regions are associated with polydactyly. We combined results from haplotype stretches (XP-EHH), SNP differentiation, INDEL differentiation and association analysis to identify the ABCC4 as a candidate gene. For some other regions, there were some signatures in SNP differentiation, but they were not replicated in other paralleling analysis.
Within these highly differentiated window regions, SNPs with F ST > 0.6 were annotated and listed in detail (Additional file 5: Table S5). Among these highly differentiated SNPs, 61.65% of them were located in intergenic regions, 20.62% in introns and only one missense mutation (NC_ 010453.4:g.70439379A > G) in the third exon of ABCC4 (Additional file 6: Figure S1). Meanwhile, the highly differentiated regions (Weighted F ST > 0.6 and number of  Table S6. All of these regions were located in ABCC4 gene. Furthermore, whole genome association study was performed to further analyze the effective SNPs of this deformity based on the basic case-control association test model of PLINK [36] (Fig. 2d). We also identified some highly associated SNPs which were located in ABCC4 on SSC11, including the missense mutation (NC_010453.4:g.70439379A > G, P = 1.19 × 10 − 5 ). The top ten highly associated SNPs (P ≥ 1.19 × 10 − 5 ) are listed in Table 1 and eight of them were also included in the list of highly differentiated SNPs identified through genome scanning (Additional file 5: Table S5).
To exclude the possibility of PPD in two families with different genetic causes, the affected individual F0-4 was removed in all repeated analyses and similar results were obtained (Additional file 8: Figure S2). These results highly indicate the possibility of ABCC4 gene as a new candidate gene for PPD abnormal. Moreover, previous findings revealed that Iguana/DZIP1 (DAZ Interacting Zinc Finger Protein 1) is an important protein coding gene located in around 800Kb upstream of ABCC4 (Additional file 9: Figure S3), and was also related to the Hedgehog signaling pathways [37,38] with an important role in cilium formation [39]. But in this study, we did not detect any variant of significant differentiation between PPD affected and normal groups in DZIP1. From these evidences, we concluded that ABCC4 gene on SSC11 is probably associated with an important role in the early limb formation stage and may affect the formation of PPD.
We further conducted copy number variation (CNV) and long-range structural variants (SV) analysis to explore putative association between PPD and CNVs/SVs. The genome coverage, sequencing depth and mapping quality at the identified putative loci were analyzed in all individuals using the Integrative Genomics Viewer (IGV) [40]. The result showed that no large INDELs, CNVs and SVs were located around these candidate regions.

Variants analysis
In order to further compare the genetic differentiation between PPD affected and normal groups in the two major candidate genes (SHH and LMBR1 (ZRS region was covered)) and ABCC4, pairwise F ST analysis was carried out based on SNPs and INDELs (Fig. 3a-f and Additional file 10: Figure S4). The results showed that the genetic differentiation in ABCC4 region was significantly higher than LMBR1 and SHH between affected and normal groups, and there was no significant differentiation in LMBR1 and SHH gene between the two groups ( Fig. 3a-f). The results of SNPs ( Fig. 3a-c) and INDELs ( Fig. 3d-f) were consistent. In addition, the density of variants in ABCC4 was higher than LMBR1 and SHH (Additional file 10: Figure S4).
For further identification of the causing mutation of PPD, the top ten highly associated SNPs identified through whole genome association study (Table 1) were selected to calculate the derived allele frequency (DAF) (Fig. 3g). The results showed that these SNPs had significant difference between PPD affected and normal groups, which were in concordance with the results obtained in whole genome screening. In addition, phased genotyping results showed that all of these SNPs were homozygous mutations in all affected individuals except the most significant SNP rs791053563 and F1-a3 in rs709805150 (0/0) ( Table 1 and Additional file 11: Table S7).
Furthermore, to investigate the genotypes of the highly differentiated INDELs between affected and normal pigs, 14 INDELs from the highly differentiated window regions (Additional file 7: Table S6) with which F ST > 0.2 were selected for further genotyping (Additional file 12: Table S8). Among these highly differentiated INDELs, eight of them were insertions and six were deletions, all of these were located in intron and far from the nearest exon. The results showed that there was a high allele frequency of mutant in this population and most of which were homozygous mutations. The derived allele frequency of these INDELs showed that there was a remarkable difference between PPD affected and normal groups (Fig. 3h). However, the genotypes of these INDELs were not in well concordance with the phenotype (Additional file 12: Table S8). So, we are not certain whether these INDELs contribute to the mutation in PPD. We detected the noticeable signal of these INDELs possibly due to the linkage disequilibrium.

Analysis of ABCC4 mutations
In order to further investigate the potential effect of highly associated mutations in ABCC4 on PPD abnormal. We focused on the missense mutation (NC_ 010453.4:g.70439379A > G) on the third exon of ABCC4, which resulting in the change of isoleucine to threonine  (Table 2), both regions were annotated at the third exon of ABCC4 and the variant ID of this SNP is rs1110129849 on Sus scrofa 11.1. Cross-species alignment of this protein region showed that this locus was highly conserved in vertebrates (Fig. 4a), indicating that this locus likely has an important biological function under strong evolutionary constraint. Moreover, our result showed that the allele frequency of this mutation is 0.3529 in the two families, but we didn't detect this homozygous mutation (G/G) in the large unpublished samples of normal individuals (559 individuals covering 66 domestic pig breeds and wild boars) from our lab (Additional file 13: Table S9). Only seven of them were heterozygous (A/G) (Additional file 14: Table S10) but four of them were filtered by quality control. The statistical results of the derived allele frequency showed that there was a great difference between population in this study and unpublished data. The potential association between this nonsynonymous mutation and PPD phenotype by genotyping illustrated that all of the four affected pigs were mutant homozygous (G/G). There were two kinds of genotypes in normal individuals, nine of them were wild type homozygous (A/A) and four individuals were heterozygous (A/G) ( Table 1 and Additional file 11: Table S7). In addition, the most significant SNP (rs791053563, P = 5.422 × 10 − 7 ) identified by association study was located in the intron region, and interestingly, all the affected individuals were homozygous wild-type at this locus, while the normal individuals were mutant genotypes at this locus. Besides, we have verified this SNP in the large normal population (n = 559; 66 breeds; 441 no-missing) and found that the majority (82.97%) of these nomissing normal individuals were homologous for ancestral allele (0/0), and some of them were heterozygous (0/ 1; 9.73%) or homozygous (1/1; 7.30%) for the derived allele (Additional file 14: Table S10). So, the possibility of this SNP as a potential pathogenic mutation was excluded, and other variants were in similar situation, either the individuals with the homozygous mutation were not affected, or the individuals affected were not all homozygous mutation. The haplotype patterns of ABCC4 region between PPD affected and normal groups are shown in Fig. 5a and all of the ten highly associated and significant SNPs are shown in Fig. 5b.
Furthermore, we used I-Mutant2.0 [41] to predict the protein stability changes for this mutation (NC_010453.4: Table 2 Blast results of different reference genomes The bold and enlarged positions represent the missense mutation SNP (NC_010453.4:g.70439379A > G) Fig. 4 Conservation analysis and structure prediction of the missense mutation site. a Cross-species alignment of amino acids sequences of the SNP in ABCC4. b Changes in the primary structure of this amino acid residues. c The full 3D structure of the protein encoded by ABCC4. d The 3D structure of isoleucine residues domain (wild-type). e The 3D structure of threonine residues domain (mutant). f Merged structure of wild-type (green) and mutant (red) g.70439379A > G), and the results showed that the protein stability is "decrease". In addition, we predicted whether this amino acid substitution has an impact on the biological function of the protein encoded by ABCC4 through PRO-VEAN [42]. Results showed that the mutation of isoleucine to threonine at position 85 is "deleterious" with the PRO-VEAN score of − 3.084. The HOPE [43] web site reported that the mutated residue is located in a domain that is important for binding of other molecules, the mutant residue is more hydrophilic than the wild-type residue and might disturb this function. Furthermore, to compare the structural changes before and after mutation of NC_010453.4: g.70439379A > G, we constructed the 3D structure of this protein encoded by ABCC4 with HOPE [43] and STRUM [44] web server (Fig. 4b-f). The results showed that the mutated residue is located in the alpha helices and is smaller than the wild-type residue. Changes in this amino acid residue might have probably affected the protein's ability to bind to the membrane or other molecules.

Discussion
In this research, we utilized pig as animal model in studying polydactyly and identified ABCC4 as a new candidate gene for PPD abnormal possibly through the regulation of ciliogenesis. A missense mutation was detected in ABCC4 which might have disrupted ciliogenesis. Our results further confirmed that the pseudo-thumb development in pandas by disrupting ciliogenesis and pointed out the fundamental role of ciliogenesis underlying PPD in different species.
Animal models are crucial in understanding both genetic and non-genetic diseases. Our study focused on pig based on its unique advantages of pig in medical and genetic researches, including more offspring in shorter generation interval, the anatomical similarities to humans (body size, cardiovascular system), functional similarities (gastrointestinal system, immune system) and the availability of disease models (diabetes, atherosclerosis) [45]. Pig has been considered as an ideal animal model to study human diseases.
SHH signal of anterior-posterior axial plays an important role in the early limb development and patterning. The downstream transduction of SHH signal is received and transported by cilia through a complex network [46]. As an important signal center and sensory organelle, cilia are commonly known as cell's antenna. Cilia Fig. 5 Haplotype pattern around ABCC4 and the genotype of the top ten highly associated and significant SNPs. a Haplotype pattern comparation between PPD affected and normal groups in ABCC4. b The genotype of the top ten highly associated and significant SNPs defects can result in a wide range of diseases known as ciliopathies, such as polydactyly, hydrocephalus, obesity and Marden-Walker syndrome [47]. Combined with previous researches [28,29,48,49], we conclude that different genes in different species control the same phenotype through the regulation of ciliogenesis. This finding also emphasized the important role of cilia in limb development.
ABCC4, a member of the ATP-binding cassette transporter family is also known as multidrug resistanceassociated protein 4 (MRP4) [50]. Most importantly, ABCC4 encodes an important transmembrane transporter known to transport PGE2 and other molecules across cellular membranes. Further, it is also involved in ciliogenesis by mediating PGE2 signaling acts through a ciliary G-protein-coupled receptor, EP4, to upregulate cAMP synthesis and increase anterograde IFT [51][52][53].
As an important transmembrane protein, the mutation or mis-regulation of ABCC4 results in the disorder of PGE2 transmembrane transport and further results in the misfunction of the stimulation of anterograde IFT through EP4 and protein kinase A (PKA). Jin et al. [51] reported that the T804M mutation in ABCC4 showed cilium loss and cilium-associated phenotypes in zebrafish, including ventrally curved body axis, hydrocephalus, abnormal otolith number and laterality defects of the brain and other organs.
In our study, based on two uncorrelated families' association analysis showed that some SNPs and INDELs in ABCC4 were highly associated with PPD phenotype. Derived allele frequency analyses around ABCC4 revealed that there were significant differences between affected and normal groups. Furthermore, a homozygous missense mutation was identified in all affected individuals but not in normal groups. More interestingly, we did not detect this mutation in large unpublished samples. Based on the important role of cilia in SHH single transduction and the function of ABCC4 in cell ciliogenesis, we firmly suggest ABCC4 as a new candidate gene for polydactyly in pigs. As the sample size in this study is relatively small and there are no additional expression data to further validate our results, we therefore suggest additional experimental verification in future studies.

Conclusions
In this study, we identified ABCC4 as a new candidate gene involved in PPD regulation possibly through ciliogenesis process. Our analysis detected a highly associated missense mutation in all affected individuals but not in normal groups. Prediction of protein structure and function with different methods showed that the mutated residue is located in an important domain for binding of other molecules. Mutation of the residue might have disturbed this function, resulting in the inactivation of ABCC4 and further into the disorder of ciliogenesis.
To the best of our knowledge, this study is the first to report on the genetic variation identification of PPD in artiodactyls, and these results expand our understanding of PPD in further studies of limb malformation and enrich our knowledge on cilium as an important signaling center during vertebrate development. Finally, this study serves an example to study human diseases through the whole genome sequencing of pig as an animal model.

Samples
Three sibling Large White pigs from one litter were detected having preaxial polydactyly in a farm in Hebei province of China (Fig. 1), and all the available samples of this family were collected form the farm, including three PPD affected (F1-a1, F1-a2 and F1-a3) and two normal individuals (F0-1and F1-a4) from one litter, 11 normal individuals (F1-b1 to b11) from another litter. The remaining individuals (F0-2, F0-3 and F1-a5 to a9) were recorded but without tissue samples. Meanwhile, another Large White female case of PPD (F0-4) was identified in another farm in Yunnan province, China. Both the parents of this affected individual are normal, but there was no further information on this individual's offspring or sibling. Two affected pigs (F1-a1 and F1-a3) were euthanized with pentobarbital sodium solution (100 mg/kg) to get the hooves for radiograph analysis.

DNA extraction and sequencing
Ear tissue samples were collected from all the 17 available individuals and stored at − 80°C. For each sample, 30 mg ear tissue was used to extract genomic DNA with the standard phenol-chloroform method. Quality and quantity were assessed by Nanodrop spectrophotometer 2000 and gel electrophoresis experiment. Library construction for re-sequencing was performed according to the Illumina library prepping protocols (Illumina Inc., San Diego, CA, USA) with the insert size of 380 bp. Pair-end (PE) reads length of 150 bp were generated from the resequencing libraries on the Illumina Hiseq X Ten platform (Illumina Inc., San Diego, CA, USA), and all individuals were resequenced above 30× depth of coverage.

Quality control and mapping
Clean reads were trimmed from raw reads that were pre-processed to remove index adaptors and low-quality reads. Quality control for removing the low-quality reads was done based on the following criteria: Up to 10% of the read bases include "N" content in each sequenced read, up to 50% of the read bases include low quality (Q < = 5) bases content in any sequenced reads. After trimming, clean reads of each sample were aligned to the pig reference genome Sus scrofa 10.2 [54] using BWA program ver.0.7.10-r789 [55] with default parameters. SAMtools software [56] were used to convert the SAM files from different libraries belonging to the same individual to BAM files and sort and merge them. And then, PCR duplicated reads were marked based on Picard ver. 2.12.1 tools (http://broadinstitute.github.io/picard/). Finally, read depth and coverage of each individual were estimated based on the results of SAMtools ver. 1.9 [56].

Variants identification and filtration
RealignerTargetCreator, IndelRealigner, and BaseRecalibrator tools in the Genome Analysis Toolkit (GATK) (ver. 3.7-0-gcfedb67) [57] were used for local realignment and base quality recalibration. SNPs were called using Unified-Genotyper (default parameters) and filtered SNPs and small size INDELs were identified using the UnifiedGenotyper algorithm of GATK [57]. In order to reduce the error rate of calling variations, SNPs were filtered by Var-

Annotation and genotype phasing
All called variants were annotated with GTF file downloaded from the Ensembl website (ftp://ftp.ensembl.org/pub) with a custom Perl script. According to the genomic position, SNPs were classified into protein coding (synonymous or nonsynonymous) regions, introns, UTR and IGR. Moreover, to further annotate the SNPs in putative regulatory regions, we downloaded human genome annotation data from the ENCODE (Encyclopedia of DNA elements) [58] project and identified putative regulatory sequences in pig genome orthologous to human counterpart with an ENCODE annotation, such as transcription factor DNA-binding motif, transcription factor binding site and histone binding site. The annotation method was referred to Lü et al. [59]. Haplotype phasing was implemented with Beagle 4.1 [60].

Reported candidate genes analysis
Based on the identified SNPs and INDELs in this study, the candidate loci which reported previously were graphically showed with Integrative Genomics Viewer (IGV) [40] to scan if there is any variation that was only occurred in affected individuals.
Genome-wide screening of candidate variants XP-EHH [34] scores were calculated with a local script to detect alleles frequencies fixed or nearly fixed and to compare the affected and normal groups. Moreover, F ST [35] was estimated to provide insights into the genetic differentiation between the affected and normal groups based on wholegenome SNPs and INDELs with 10 kb window and 2 kb overlapping slides using VCFtools 0.1.14 [61]. In addition, F ST of the three candidate genes (ABCC4, LMBR1 and SHH) were calculated based on every SNP and INDEL loci.

Case-control association studies
For analyzing of SNP effects related to the deformity of this two families, four affected and 13 available normal pigs from two families were compared using the basic case-control association analysis and family-based association test of PLINK v1.07 [36]. Meanwhile, based on the identified highly associated variations, genotyping and derived allele frequency were performed. Besides, we performed the familybased association test on all 16 individuals from the first family (the fourth affected individual F0-4 was excluded) to repeat the analyses for further support our results.

Structure and function prediction
In order to further confirm the amino acid substitution (I85T) impact on protein structure and function, we used I-Mutant2.0 [41] (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi) to predict the protein stability changes upon the mutation of rs791053563, the parameter of temperature was set to 38°Cand pH was 7.0. In addition, we used PROVERAN web server [42] (http:// provean.jcvi.org/index.php) and HOPE [43] (http://www. cmbi.ru.nl/hope/) to predict this amino acid substitution's impact on the biological function and structural effect of the protein. PROVERAN [42] will give a score of the variant and the default threshold is − 2.5, that is: variants with a score equal to or below − 2.5 are considered "deleterious" and variants with a score above − 2.5 are considered "neutral." HOPE [43] is a web service that will produce a point mutation report based on the available information that collected and combined from a series of web services and databases. Moreover, in order to further compare the changes in protein structure caused by the missense mutation, we constructed the 3D structure of protein encoded by ABCC4 with STRUM [44] (https://zhanglab.ccmb.med. umich.edu/STRUM/). All the prediction was based on the protein sequence of UniPortKB (ID: A0A287A6F6_PIG) or Ensembl (Transcript ID: ENSSSCT00000037963.1).
Additional file 1: Table S1. Information of all samples and their genome resequencing data characteristics.
Additional file 2: Table S2. Distribution and annotation of SNPs identified in this study. ENCODE indicate that the SNPs which were located in the homologous sequences of human counterparts have ENCODE annotations; Motif indicates that the SNPs which were located