Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss

Background The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case–control cohort. Results Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 × 10-7). Conclusions We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1155) contains supplementary material, which is available to authorized users.


Background
Hereditary sensorineural hearing loss (SNHL) is a highly prevalent disorder in humans, affecting 1 in 500 newborns [1]. There is considerable genetic heterogeneity underlying SNHL. Approximately 133 autosomal non-syndromic loci (55 dominant and 78 recessive) have been mapped, and within these, 78 genes are causally implicated in nonsyndromic hearing loss: 30 for dominant and 48 for recessive hearing loss. In addition, there are three nonsyndromic X-linked genes known to date (http://hereditaryhearingloss.org, accessed February 24, 2014). Despite the large number of implicated loci only one region has been shown to be a major etiological contributor to bilateral autosomal recessive non-syndromic hearing loss (ARNSHL). The DFNB1A/B locus contains the GJB2 and GJB6 genes, which encode the connexin 26 and connexin 30 proteins, respectively, and GJB2 has been shown to be frequently mutated in individuals with severe ARNSHL [2,3]. Because the region responsible for many of the remaining cases has not been identified there are likely to be other yet-to-be-discovered genetic contributors that may underlie a significant proportion of cases.
Most SNHL loci have been discovered using homozygosity mapping and other forms of linkage analysis in large consanguineous families [4]. There have been few genome-wide association studies on SNHL [5,6], and most mutations associated with SNHL have been SNVs (http://deafnessvariationdatabase.com). Only one study has investigated the effects of copy number variants (CNVs) on SNHL in a comprehensive, though low-resolution fashion. This study found only one CNV, a deletion of the stereocilin gene STRC, that was associated with SNHL [6]. However, CNVs are enriched in genes involved in sensory perception of the environment [7] including smell and taste receptors [8]. Thus, there is a need to investigate the effects of CNV on SNHL in a high-resolution, unbiased, genomewide manner, and further to investigate the integrated effects of multiple types of variants on this phenotype [9].
In this study, we explored the utility of diverse and complementary high-resolution approaches to detect genetic variants associated with SNHL. We used multiple whole genome variant-mapping technologies, including exome sequencing and high-resolution array comparative genomic hybridization (aCGH), as well as familial and association strategies, to determine individually rare and frequent genetic contributors to SNHL in three families and in over 150 individual probands, for whom no conclusive genetic etiology had previously been established. We report the discovery of rare compound heterozygous mutations in the myosin heavy chain 7B gene, MYH7B, as a novel likely cause of SNHL, by exome sequencing a family of five individuals. We also report several individually rare, novel candidate mutations for SNHL, revealed by exome sequencing of two additional families (of four and five individuals, respectively) and 13 unrelated probands. Finally, we conducted the first high-resolution, genome-wide CNV investigation for hearing loss. We report several novel CNV associations found in a cohort of 150 affected individuals and 157 controls, including a deletion on chromosome 16 encompassing the PDXDC1 gene, the has-mir-1972 micro RNA, and part of the NPIP. These results support our hypothesis that SNHL may manifest due to either underlying shared or individually rare genetic etiologies in different cases, and arise by multiple mechanisms.

Two strategies to investigate novel genetic contributors to SNHL
We used two strategies to investigate the genetic variations underlying SNHL in individuals for whom no previous genetic etiology had been established. The first strategy involved analyzing novel SNV and indel associations with SNHL using exome sequencing in three affected families and 13 additional isolated probands ( Table 1). The second approach involved analyzing genome-wide copy number changes using high-resolution aCGH to discover CNVs associated with SNHL in a cohort of 150 probands and 157 controls ( Table 2). This cohort includes the 13 isolated probands from the exome sequencing study. This multiple strategy approach was designed to provide a detailed, yet comprehensive investigation of the type and nature of mutations affecting hearing.

Exome sequencing of individuals with familial and sporadic hearing loss
We performed exome sequencing on three families with different levels of sensorineural hearing loss. The severity of the hearing loss was determined by behavioral pure tone audiometry. Family 1 is of middle-eastern descent and afflicted with severe-profound bilateral hearing loss (>90 dB) and megalocornea with secondary glaucoma. Family 2 is of European-Caucasian descent and afflicted with moderate hearing loss (~50 dB). Family 3 of European-Caucasian descent has mild hearing loss (~40 dB) ( Figure 1). We also performed exome sequencing on 13 additional probands and searched for rare, highly penetrant SNVs and indels that may explain the phenotype. In each case we aligned 100 bp paired-end sequencing reads and called SNVs and indels using the Nucleotide-level Variation tool from DNAnexus. For the three families we also independently aligned reads and called SNVs and indels using the Variant 1.0 algorithm from Real Time Genomics (RTG) (Figure 2a). We note that the total number of variants called per genome differ significantly between the two algorithms due to technical differences. DNAnexus calls variants individually in each genome and then we used family structure to apply various segregation models. RTG uses the familial structure a priori to call variants segregating in the family under various inheritance models (see Methods). We then used Ingenuity Variant Analysis (IVA) to filter the variants based on quality, frequency in known populations, predicted deleteriousness, genetic analysis (families only) and biological context (Figure 2b). We discovered multiple potential genetic etiologies in the studied families and in the individual probands.
Compound heterozygous missense mutations in MYH7B are the likely cause of hearing loss in family 1 Family 1 has a severe hearing defect that segregates in a recessive manner (Figure 1a). The exomes of the two unaffected parents and three affected children were analyzed as described above. Of the 88,975 and 346,430 variants called by RTG and DNAnexus, respectively,    Relaxing the rarity filter to encompass variants that occurred at a frequency ≤ 15% in the public genomes did not yield any additional candidates. One of the two MYH7B variants (v1: p.Arg1693Gln) is heterozygous in the father and the other (v2: p. Asp557Asn) is heterozygous in the mother (Figure 3a). Each parent carries only one mutant allele but all three affected children are compound heterozygous for both mutations (Figure 3b). The maternal variant was present in dbSNP (Build 137) and present in the 1000 Genomes Project and Exome Sequencing Project samples at a frequency of 0.05% and 0.01%, respectively. The paternal variant was not present in dbSNP, the 1000 Genomes Project, or the Exome Sequencing Project samples. Neither variant was present in any of the other exomes of probands in our cohort. Thus, both variants appear to be rare and the paternal variant may be private to this family. We further analyzed these mutations using the Combined Annotation Dependent Depletion algorithm (CADD), a general framework for estimating the relative pathogenicity of genetic variants in humans [21]. They received scaled C scores of 27 and 33 for v1 and v2, respectively, indicating that these variants are in the 0.2 and 0.05 percentile of most deleterious substitutions in the human genome, respectively ( Figure 3c). These variants were verified by Sanger sequencing in all family members ( Figure 3d).
MYH7B encodes a heavy chain of myosin II, a member of the motor-domain superfamily. The myosin II molecule is a multi-subunit complex made up of two heavy chains and four light chains. The heavy chain comprises a catalytic globular motor domain, which carries out ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. The maternal variant lies in the relay loop of the catalytic motor domain and the paternal variant is located in the tail domain, which is responsible for dimerization ( Figure 3a). The MYH7B gene has not been previously implicated in hearing loss but has been linked to differentiation of inner ear hair cells [22] and shown to control actin networks within neurons [23].

MYH7B is expressed in the inner ear
We next examined expression of the MYH7B gene in the literature and in the Allen Brain Atlas. MYH7B was found to be concordantly expressed in embryonic mouse inner ear tissue but not in non-inner ear tissue, with atonal homolog 1a, ATOH1, a gene required for hair cell differentiation [22]. This concordant expression likely indicates a role for MYH7B in development of hair cells in the inner ear. In addition, microarray data from the Allen Brain Atlas indicates high MYH7B expression in the primary auditory cortex and regions of the auditory pathway such as the cochlear nuclei and inferior colliculus, in adult humans [24]. Furthermore, a reduction in MYH7B expression in cultured mature rat hippocampal neurons can cause profound alterations to dendritic spine morphology, excitatory synaptic strength, and the actin cytoskeleton [23]. It is possible that these effects may extend to other neuronal tissues including the auditory complex.
Variant filtering results in a shortlist of potential causative mutations underlying the hearing loss in families 2 and 3 The hearing loss in family 2 is moderate and bilateral and appears to segregate in either a recessive or dominant de novo fashion in female twin offspring ( Figure 1b). An additional male sibling has multiple congenital abnormalities, including hearing loss, which can be explained by chromosomal abnormalities that are absent in the twins. Accordingly, his hearing loss is different from that in the twins as confirmed by audiogram (not shown) and he was excluded from the analysis of this family. The exomes of the other four family members were analyzed as above. In this family, 469,864 variants were called in total, either by RTG (94,974 total), DNAnexus (438,031 total), or both (63,141). Filtering was performed using IVA on the union set of variants called by both algorithms to maximize findings. After removing common variants and low quality calls, two genetic models were applied; dominant de novo and recessive inheritance. We searched for dominant de novo mutations that were called by both RTG and DNAnexus that occurred in both twins. No such mutations were found. However, two potential candidates were found by DNAnexus only. These were a heterozygous in-frame deletion, (p.Ala23-Leu25del), in the CTBS gene (encoding Di-N-acetylchitobiase), and a heterozygous missense mutation (p.Thr26Ala) in the RBMXL1 gene (encoding RNA binding motif protein, X-linked-like 1) and in an intron of the gene CCBL2 (encoding Cysteine Conjugate-Beta Lyase 2). The search for underlying variants following a recessive pattern of inheritance did not generate any robust candidates. Family 3 is characterized by mild to moderate bilateral hearing loss appearing to segregate in an autosomal dominant fashion ( Figure 1c). However, the level of hearing loss in the mother is milder than that of the children. This indicates either incomplete penetrance of the causative allele that is segregating in an autosomal dominant manner, or compounding of the phenotype due to the additive effects of some putative paternal variant along with the maternal variant. It is also possible that the hearing loss in the mother is different from that in the children. However, this scenario is less likely given the seemingly Mendelian inheritance. The exomes of all four family members were analyzed as above. In total, 327,843 variants were called either by RTG (91,397 total), DNAnexus (293,696 total), or both (57,250). Filtering was performed in IVA on the union set of variants called by both algorithms. After removing common variants and low quality calls, only 14 variants were called by both RTG and DNAnexus, were predicted to be deleterious by IVA, and segregated in an autosomal dominant manner (Table 3). All but one of these has been reported in dbSNP. In this set, seven missense mutations were found, of which four were predicted to be damaging and three were predicted to be activating by SIFT. Five of these were heterozygous in all affected individuals while two were homozygous in the affected mother and heterozygous in the children. Of these seven variants, six occur in biological pathways at most two nodes away from some gene known to be involved in autosomal dominant non-syndromic hearing loss. These six missense variants located in networks containing known hearing loss genes (missense variants in Table 3 except the GAL3ST2 mutation) are the present leading causative candidate mutations in this family.
Exome sequencing of individual probands reveals rare deleterious mutations in genes known to be associated with hearing loss It is unknown how often sequencing of individuals with hearing loss will identify likely underlying causes of the disease. We therefore also sequenced the exomes of 13 individual probands for which additional family members were not available. We used DNAnexus Nucleotide-Level Variation Analysis to detect SNPs and indels in each proband. Between 114,135 and 228,298 variants were found in each exome (mean = 188,659 variants per exome). Using the IVA variant filtering scheme described in the methods and published at https://variants.ingenuity. com/Haraksingh-etal-2013-HHLa, 21,554 potentially deleterious variants in 9,715 genes were revealed in this set of probands. Of these 133 variants occurred in 46 genes that have previously been associated with hearing loss. Between 12-23 predicted deleterious variants (mean = 17) located in 10-21 genes (mean = 13) known to be associated with hearing loss were found in each proband ( Figure 4). In each proband, between one and six known hearing loss genes (mean = 3) with more than one predicted deleterious mutation were found.
In five of the 13 probands likely causative mutations can be identified. These are rare homozygous variants that are predicted to be damaging in known hearing loss genes. Proband 2 carries a stop loss in the MYO7A gene (p.*1179Gly) and a missense mutation (p.Pro426Leu) in the MYO1A gene. Proband 3 has a missense mutation (p.Leu2886Phe) in the USH2A gene that has been previously associated with Usher Syndrome in a Spanish family. There is a rare in-frame variant (p.398delGln) in the TRIOBP gene in probands 7 and 13, and proband 13 carries a missense mutation (p.Met6159Val) in the GPR98 gene. Proband 9 carries a missense mutation (p.Lys130Glu) in the USH1G gene. These damaging homozygous mutations are the strongest candidates for causing hearing loss in the five associated probands. Retinal abnormalities had not been observed in the two probands with mutations in Usher syndrome genes, but both were young children.
Seven of the remaining eight probands carry at least one rare homozygous variant in genes one node away from a known hearing loss gene in Ingenuity-curated biological pathways. These variants were not found in the unaffected family members from our cohort. These rare, homozygous, deleterious variants represent the most likely causative alleles in these probands. The variants can be viewed at https://variants.ingenuity.com/ Haraksingh-etal-2013-HHLb by invoking the 'Homozygous' filter followed by the 'Hearing-relevant' filter after the 'Rarity' filter.
Additionally, many genes were found containing recurrent predicted deleterious variants in at least two probands of our cohort. For example, under a dominant model, 398 genes contained recurrent variants in more than two probands and not in the unaffected members of families 1, 2, or 3. These genes may represent additional recurrent candidates underlying the hearing loss in our set of probands. The results of this analysis and other adjustments to the genetic model can be explored at https://variants.ingenuity.com/Haraksingh-etal-2013-HHLb. The dominant analysis results are obtained by moving the 'Homozygous' and 'Hearing-relevant' filters to the bottom of the cascade.
Overall, these results indicate that strong candidates can often be found by exome sequencing of genomic DNA of hearing loss patients. In other cases, larger numbers of candidates can be identified, the meaning of which is more difficult to distil.

Genome-wide CNV mapping reveals several CNVs associated with SNHL
In our second approach, we carried out a high-resolution, genome-wide CNV association study of SNHL using 150 affected individuals, including the 13 isolated probands whose exomes were sequenced, and 157 controls. We mapped CNVs by aCGH on the NimbleGen 2.1 M CNV array, the most sensitive array-based CNV detection platform available at the time [13]. We then called CNVs using two algorithms, Nexus Copy Number 6 (Biodiscovery) and NimbleScan 2.6 (NimbleGen). Association testing was performed for genomic regions affected by CNVs including single loci, genes, and pathways, as well as for overall CNV load. We found an associated deletion Between two and 6,172 CNVs were called by Nexus (median = 89), and between 394 and 1,731 were called by NimbleScan (median = 1,036.5) in the individual genomes (Additional file 1: Figure S1). It was found that the individual cases and controls have similar genomewide CNV loads. (However, note that a handful of outliers in the control group showed hundreds more Nexus CNV calls than the rest of the cohort). The Nexus CNV calls tend to be much larger for the cases than the controls. There is an enrichment of CNV calls between 30-80 kb in the cases, as well as a higher relative frequency of Nexus CNV calls that are greater than 100 kb. The NimbleScan case and control CNV calls are generally similar in size (Additional file 2: Figure S2).
CNVs were tested for association with the phenotype using the Classic calculation option of the Comparisons function in Nexus 6.0. Smallest regions of overlap of the individual CNVs were used. The most significant association is an approximately 72.5 kb (smallest region of overlap) deletion on chromosome 16 (hg18; chr16:14,956,245-15,028,783) encompassing the first 15 exons of the PDXDC1 gene, the has-mir-1972 micro RNA, and the intergenic region between PDXDC1 and the upstream NPIP gene (Figure 5a). Some of the individual deletions extend far enough upstream to include the NPIP gene as well as the five 3' most exons of the NOMO1 gene which is further upstream. This region was previously reported in the Database of Genomic Variants (DGV) and is thought to be the result of a duplication expansion in the human genome. The smallest region of overlap of the deletions is called in 23 cases and 7 controls by both algorithms independently producing an odds ratio of 3.91 (95% CI: 1.62-9.40, p = 1.45 × 10 −7 ). An additional 17 subjects and eight controls carry the deletion as called by a Figure 4 Predicted deleterious variant load in known SNHL genes derived from exome sequencing of SNHL probands. Black lines demarcate families. Purple bars indicate unaffected parents of probands in families. All included variants had a call quality greater than 20, and a frequency less than or equal to 15% in the 1000 genomes project, ESP, and Complete Genomics. The data represents a total of 134 variants in 46 genes (exons, splice sites, and miRNAs only). single algorithm. Counting CNVs called by at least one algorithm, indicates that the deletion is present in 40 affected individuals and 15 controls producing an odds ratio of 3.47 (95% CI: 1.82-6.60). The deletion is present in the same number of Mexican cases and controls but significantly more East Asian (12 versus two) and Caucasian (five versus two) cases than controls when considering cases where both algorithms called the CNV (Figure 5b).
We observed deletions encompassing the entire STRC gene in seven cases and two controls as called by both algorithms, and in an additional control called by just the NimbleScan algorithm. The deletions ranged in size from 70-239 kb with the smallest region of overlap being hg18; chr15:41,639,153-41,709,787. This is the only CNV that has previously been reported to be associated with mild to moderate hearing impairment in GJB2 mutation negative probands [6].
Each gene in the NCBI Reference Sequence Database (RefSeq), including 10 kb up and downstream of the gene, was tested for association with the phenotype under the premise that different mutations in the same gene can lead to the phenotype. The frequency at which each gene overlapped a CNV by at least one base pair was calculated for the cases and controls. A Fisher's Exact Test was performed to determine whether the frequency differences between the cases and controls were significant. Associations that either contained the lowest p-values (before Bonferroni correction, which is frequently too stringent for GWAS studies) with the case frequency being higher than the control frequency, the lowest control frequency, or the most consistent trend from both CNV calling algorithms were the most functionally promising ( Table 4). As expected, the three genes in the deletion on chromosome 16 (NOMO1, NPIP and PDXDC1) found to be associated were among the top candidates in this second set of association tests. Additionally, the OTOA gene, known to be associated with SNHL [1], was found to be significantly associated in this cohort. The NBPF4 gene was found to be associated as well. This gene has no known function, but it is one of five genes that lies within the region of overlap of two previously discovered deafness associated genomic regions, DFNB82 and DFNB32 [15].
In order to test the hypothesis that different individuals may carry distinct mutations in a particular pathway which all result in the same phenotype, we carried out pathway association tests. Each pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) containing a gene previously known to be associated with hearing loss was tested for association in this cohort. A Fisher's Exact Test was used to determine if particular pathways were significantly enriched for CNVs in the cases versus the controls. No such pathway was found.
Finally, the CNV load in the cases versus controls of the set of 46 genes known to be associated with SNHL was tested. There was no significant difference in CNV load in the cases versus the controls for this set of genes.

Combined effects of CNVs and point mutations in the DFNB1 locus may explain the hearing loss in several probands
Deletions of the DFNB1 locus at chromosome 13q11-q12 have been described previously but are uncommon in most populations. This locus includes GJB2 and GJB6 encoding connexin 26 and connexin 30, respectively, the two main connexins expressed in the cochlea. To date, four recessive GJB6 mutations have been reported [25][26][27][28][29][30][31][32]. The two most common are del(GJB6-D13S1830) and del (GJB6-D13S1854), which truncate GJB6 and affect expression levels of the GJB2 gene [33,34]. The other two are private. One deletes both GJB2 and GJB6 [31] and the other (del(chr13:19,837,343-19,968,698) lies upstream from the GJB6 gene and does not affect either gene directly [30,32].
In our study, we discovered two probands with hetero-zygous~232 kb del(GJB6-D13S1854) deletions in the DFNB1 locus encompassing parts of the GJB6 and CRYL1 genes and the putative regulatory region of the GJB2 gene. These probands carry additional heterozygous deleterious point mutations in the GJB2 gene as discovered by the APEX array and Sanger sequencing; the missense p.Gln80Pro and frameshift g.35delG mutations respectively. Although we cannot determine from our data whether the deletion and point mutations occur in cis-or trans-configurations, it is likely that the compounded effects of a point mutation in GJB2 and a deletion of its putative regulatory elements explain the hearing loss in these probands. Our cohort also contained one proband and one control who were carriers of a previously identified heterozygous~309 kb del (GJB6-D13S1830) deletion, which was confirmed by our CNV analysis. Additionally, we discovered a novel smaller heterozygous deletion (~2-4 kb) in the DFNB1 locus that was called by at least one algorithm in 19 affected individuals and 40 controls. This deletion is~60 kb upstream of the GJB6 gene and does not overlap any other genes. Of the 19 cases with this deletion, nine carried an additional known heterozygous deleterious mutation in GJB2 that was determined by the arrayed primer extension (APEX) array and Sanger sequencing. It is possible that this 2 kb deletion may overlap regulatory elements of the GJB2 or GJB6 genes. The combined effects of the deletion and deleterious point mutations may explain the hearing loss in these nine probands. The remaining cases may contain unidentified deleterious mutations on the nondeleted allele while the controls do not. Alternatively, the 2-4 kb deletion may simply be a benign common CNV. With the current data set, we cannot resolve these possibilities.

Discussion
Using exome sequencing we have identified defects in a myosin II gene, MYH7B, as the likely contributors to hearing loss in one family. Although several other myosin heavy chain genes have been previously implicated in hearing loss, the MYH7B gene has not. However, there is indirect support for MYH7B involvement in hearing including expression in embryonic mouse inner ear tissue [22], expression in the primary auditory complex in humans, and the control of dendritic spine morphology, excitatory synaptic strength, and the actin cytoskeleton in rat neurons [23]. This, along with the segregation pattern, rarity, high quality, and location of the variants in functional domains strongly suggest that the predicted deleterious compound heterozygous mutations in MYH7B cause the hearing loss in family 1. The same sequence changes may also be responsible for the megalocornea phenotype in this family as it has been shown that MYH7B transcripts are present in extraocular muscles from human, rat, and mouse, and in developing mouse eye skeletal muscle [35,36]. Changes in extraocular muscle tension can produce significant changes in corneal topography [37]. Interestingly, a single proband in our CNV cohort was found to harbor a deletion of the MYH7B gene, as called by Nexus. Our results extend the role of cytoskeletal proteins in hearing and offer the possibility that mutations in the MYH7B gene may constitute a rare cause of hearing loss.
Exome sequencing of isolated probands revealed likely causative variants for hearing loss in five cases. Interestingly, two of these probands had two homozygous rare mutations in known hearing loss genes. The rarity of such an occurrence suggests that it is plausible that both mutations may be required for hearing loss. For the remaining probands, multiple homozygous mutations in genes in hearing-relevant pathways and multiple heterozygous deleterious mutations were present. It is possible that hearing loss in these patients is due to rare deleterious homozygous mutations in novel hearing-associated genes, or to codominant, compound heterozygous, or nonallelic non-complementation of heterozygous mutations in distinct genes previously not known to affect hearing. Despite extensive variant filtering and prioritization we are still left with unmanageable numbers of potentially causative mutations in many probands, which we were unable to further refine. Complete distillation of the extensive findings of potentially causative mutations will require expression database analysis (e.g. http://hereditaryhearingloss.org/main.aspx?c=.HHH&n=86597), functional assays in cell and animal models, meta-association analyses of integrated data from multiple genomic studies, and development of novel methods for discerning combinatorial effects of variants.
To our knowledge, this is the first high-resolution genome-wide CNV association study of hearing loss. We discovered novel CNV associations in both known hearing loss-associated genes and in novel candidates. Of note, we found a strong association between a deletion encompassing part of the PDXDC1 gene and hearing loss. The function of this gene is unknown but it is widely and highly expressed in the cerebral cortex, including the primary auditory cortex, in newborn and adult mice [38] (http://mouse.brain-map.org/experiment/ show?id=77869146). This work suggests a need to extend the types of variants typically analyzed in diagnostic hearing loss testing. Furthermore, we have shown the importance of testing for multiple types of variants occurring in combination in individual probands, such as known deletions and point mutations in the DFNB1 locus. While these heterozygous mutations do not individually explain the phenotype, their compounded effects may well be pathogenic.
Although we were unable to definitively identify the causative SNHL variants for many probands in our cohort, we have found novel mutations that have credible potential to cause or contribute to hearing loss. Maintaining accurate and comprehensive databases will be paramount in driving progress in molecular hearing loss diagnoses.

Conclusions
Our studies have revealed three important aspects of identifying mutations associated with SNHL. First, exome sequencing of families can reveal novel mutations segregating with SNHL, although not in every instance. Second, exome sequencing of a small number of isolated probands can reveal strong candidate hearing loss mutations, although in some cases it remains challenging to ascertain disease-causing mutations. Third, analysis of CNVs can reveal novel mutations and loci associated with hearing loss. By employing both familial and association studies we have successfully identified rare and potentially private as well as more frequent variants in both novel and previously known candidate genes and loci. Our results indicate that multiple strategies and study designs will be necessary to fully resolve the entire collection of mutations that underlie complex human disorders such as hearing loss. We anticipate that future advances in methods to determine the combinatorial effects of mutations will enable effective assessment of factors including long-range genetic interactions, and will facilitate integrated association analyses of panels of variants and specific phenotypes. At present however, studies like this continue to reveal novel aspects of the multifaceted and expansive genetic architecture underlying hearing loss.

Ethics statement
Informed consent, including consent to publish, was obtained from all enrolled study subjects or their guardians under Internal Review Board approved protocols from Stanford University Medical Center. Controls were recruited under informed consent, including consent to publish, as part of Internal Review Board protocols at Stanford University, Mount Sinai University, and Yale University.

Sample selection Exome sequencing study samples
The study included 13 probands who were diagnosed with bilateral non-syndromic SNHL, ranging in severity from mild to profound. In addition, the study encompassed parents and siblings of another three probands with SNHL, for a total number of 27 study participants ( Table 1). The average age of the probands was four years. Study subjects were enrolled at Stanford University under IRB approval. Prior to inclusion, the probands were, at a minimum, tested for mutations in the GJB2 gene by DNA sequencing, as part of their routine clinical care. Probands were eligible for this study if this or additional testing had identified no conclusive genetic etiology for their hearing loss. Genomic DNA was isolated from peripheral blood by standard methods. Mutation analysis by APEX microarray identified or confirmed sequence variants in 16 of the 18 probands (data not shown) [39]. Individuals with environmental causes for the hearing loss, which may include a history of trauma, exposure to noise or ototoxic medications, intra-uterine infection, and tumors or other conditions that can affect hearing, were excluded. Individuals with a recognized genetic syndrome were also excluded from this study.

CNV study samples
The 150 participating individuals were mostly children; the average age was 10 years. These probands had bilateral non-syndromic sensorineural hearing loss ranging from mild to profound. They were recruited at Stanford University under IRB approval. All probands were tested for mutations in GJB2 prior to enrollment. Identical selection criteria applied to the different study groups. The set of 13 probands analyzed by whole exome sequencing were included in the CNV analysis.
Genomic DNA was isolated from peripheral blood by standard methods. Mutation analysis by an APEX microarray identified or confirmed sequence variants in 117 of the probands (data not shown) and 44 of these were additionally tested for mutations in the promoter and in exon 1 of the GJB2 gene (data not shown). Controls for these participants were matched unaffected individuals of the same sex, age range (or older), and in the same ethnic group to the extent possible. Controls were recruited under informed consent as part of IRB protocols at Stanford University (n = 31), Mount Sinai University (n = 88), and Yale University (n = 38) ( Table 2).

Exome sequencing and SNP/indel calling
Exome capture and library preparation was performed using the Agilent SureSelectXT HumanAllExon V4 (50 Mb, product No. 5190-4631). Briefly, 3 μg of gDNA was sheared to a peak size of 150-200 bp using Covaris. Fragmented DNA was cleaned with AmpPure XP beads to remove fragments < 100 bp. The purified DNA fragments were then end-repaired, A-tailed and ligated to indexing-specific paired-end adaptor using the Agilent SureSelect Library Prep Kit, ILM, according to the manufacturer's instructions.
The adaptor-ligated libraries were amplified for five cycles with the SureSelect Primer and the SureSelect Indexing Pre-Capture reverse primer. PCR reactions were cleaned using the Agencourt AMPure XP. To capture exonic regions, 500 ng of each prepared library was hybridized to biotinylated cRNA oligonucleotides for 24 hours at 65°C. The captured libraries were pulled down using Dynabeads MyOne Streptavidin T1. A post capture PCR was then performed to amplify the captured libraries and to add the barcode sequences for multiplex sequencing for 14 cycles. Amplified libraries were purified with AmpPure XP beads. Qubit fluorometer and Bioanalyzer high sensitivity chips were used to determine the final concentration of each captured library. One library was prepared per sample. Libraries were pooled in pairs, and each pair of libraries was paired-end sequenced on a single Illumina HiSeq lane at the Stanford Center for Genomics and Personalized Medicine according to standard protocols.
Raw fastq files were aligned to hg19, and SNPs and indels were called using two separate pipelines. Fastq files were aligned to the hg19 using DNAnexus mapper with default settings and variants were called using the DNAnexus Nucleotide-level Variation tool. In addition, sequence data from the family pedigrees were aligned to the human reference (hg19 with decoys) and variant identification was performed with the RTG Variant 1.0 software (commercially available from Real Time Genomics, San Bruno, CA). This software includes a read hash-table based alignment step with base recalibration, and a Bayesian variant caller that performs simultaneous multi-sample scoring for pedigrees and uses priors for Mendelian variant segregation ( [40]; see Additional file 3 for more details). Sex chromosomes are handled as special cases, and offspring genotypes are phased by transmission.

Variant filtering
Variants called in the individual probands by DNAnexus were filtered using Ingenuity Variant Analysis as follows. Variants with a call quality of at least 20.0 were kept. Then variants that were observed with an allele frequency ≥ 15.0% of the genomes in the 1000 genomes project (v3), or ≥ 15.0% of the public Complete Genomics genomes (11/2011), or ≥ 15.0% of the NHLBI ESP exomes (All) were excluded. Then variants that were experimentally observed to be associated with a phenotype: We also removed variants lying in genes that have emerged as hyper-variable in published exome-sequencing studies in some analyses. The variant filtering scheme for each family and the isolated cases were slightly different. These differences are discussed with the results for each sample set. Of note, for the family analysis a threshold of 3.0% rather than 15.0% frequency in the public genomes in order to study the rarest deleterious mutations segregating in the families.

Sanger sequencing validation of MYH7B variants
Sequences surrounding the two missense mutations (v1 and v2) were amplified by PCR using the Finzymes Phusion High-Fidelity PCR Master Mix (Thermo Scientific) and the following forward (F) and reverse (R) primers: v1F -5' CGG CTC AAG AAG AAG ATG GA v1R -5' CCT GCT CGT GGA GCT CAG v2F -5' GCA GTT CTT CAA CCA GCA CA v2R -5' ACA CCC TCC CTT CCT CAA AG PCR cycling was carried out using an optimized version of the manufacturer's protocol involving 35 cycles with a 30 s annealing step at 65°C and a 10 s elongation step at 72°C. The PCR products were purified using gel electrophoresis followed by extraction using a Qiagen MinElute Gel Extraction kit. The purified products were sent to Elim Biopharmaceuticals (Hayward, CA, U.S.A.) for Sanger sequencing using the following sequencing primers.

Genome-wide copy number analysis
CNVs were mapped genome-wide to hg18 in all samples by aCGH using the NimbleGen 2.1 M CNV array (Roche NimbleGen) followed by analysis in Nexus Copy Number 6 (Biodiscovery) and NimbleScan 2.6 (Roche NimbleGen).
Genomic DNA from each sample was labeled with cy3 dye and genomic DNA from a control pool of seven female individuals (Promega) was labeled using cy5 dye according to the NimbleGen CGH protocol. 34 μg of test and control DNA were mixed together and hybridized to an array for 60-72 hrs. The arrays were washed using the NimbleGen Wash kit and scanned using the MS 200 scanner (Roche NimbleGen) in two channels: 532 nm and 635 nm. Images were normalized using NimbleScan 2.6 (NS). Normalized data were used to derive LRRs using two algorithms: NimbleScan 2.6 segMNT algorithm (default parameters) and Nexus Copy Number 6.0 Rank Segmentation algorithm (significance threshold = 1.0 −9 ). Data were loaded into Nexus 6.0 and copy number calls were generated genome-wide for each sample based on fixed thresholds for deletions and duplications specified in the settings.

Quality control
Samples were only included in the subsequent analysis if their hybridization passed two quality control filters. The first quality control metric is the mad1.dr score calculated by the segMNT algorithm in NimbleScan 2.6. This score is the median absolute deviation of the LRR difference between consecutive probes along the chromosome and is a proxy for the overall noisiness of the hybridization. Hybridizations obtaining a mad1.dr score of more than 0.23 are considered by the manufacturer too noisy to be able to discern true differential hybridization from background noise. The second quality control filter, the Robust Variance Sample QC score calculated by Nexus 6.0, is also a measure of probe noise. The probe-toprobe variance is calculated but the quality control score takes into account that a certain percentage of variance outliers are expected due to CNV breakpoints. The score is calculated by ordering the magnitudes of the variance between adjacent probes, and then removing the top and bottom 3% of values. The Nexus recommendation for an acceptable Robust Variance Sample QC score is less than 0.15-0.2.

CNV association analysis
Genomic region association CNV association analysis was carried out using the Comparisons function of Nexus 6 with the classic option. A Fisher's Exact Test was performed to determine if the difference between the frequencies of a CNV region in the cases and in the controls is significant. The output of the Comparisons function is a list of regions meeting a maximum p-value (max p-value) and frequency difference (differential threshold) between the case and control groups. These regions are reported in a table such that each region has constant frequency. That is, if a contiguous genomic segment for a given event has different frequencies, the region is split into multiple regions. The Q-bound value corrects for multiple testing by performing a False Discovery Rate correction. Regions containing CNVs that were present at a much larger frequency in the cases versus the controls and incorporating functionally interesting elements were considered top candidates for association. Regions containing CNVs at significant frequencies in the cases and at very low frequencies in the controls were selected for manual examination. Odds ratio (OR) and confidence interval (CI) calculations were carried out using MedCalc for Windows, version 12.7.2 (MedCalc Software, Ostend, Belgium).
Gene association Single genes were tested for association using a custom built Perl algorithm. The HG18 coordinates of each RefSeq gene were obtained from UCSC genome browser tables. An overlap algorithm was applied to determine which RefSeq IDs including 10 kb up-and downstream overlapped a CNV call from the cohort by at least one base pair. Those RefSeq IDs that did contain overlapping CNVs were subjected to a Fisher's Exact Test to determine whether it was significantly enriched for overlapping CNVs in the cases versus the controls.
Pathway association 46 genes known to be associated with hearing loss [1] were found to be located within 36 biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. These 36 pathways contain 4,548 RefSeq genes in total. Of these genes, 2,729 were affected by a CNV called in our sample set (i.e. each of these genes had a minimum of 1 bp overlap with a CNV call). Each of these 2,729 genes was tested for association with hearing loss as above. In addition, each of the 36 pathways was also tested for association. In each sample a pathway was counted as being affected by CNVs if at least one of its genes was affected by a CNV. A Fisher's Exact Test was used to determine whether any of the 36 pathways were more significantly affected by CNVs in the cases than in the controls.

CNV validation
All cases and 28 controls were genotyped on the Illumina Omni-Quad at Centrillion Biosciences. The data were analyzed for CNVs using the cnvPartition algorithm of the Illumina GenomeStudio software suite and CNVision [41]. These data were visualized in Nexus 6. CNVs of interest were validated by comparison to CNV calls from the SNP genotyping data with acceptable overlap.

Sample ethnicity determination
Illumina Omni-Quad SNP data were used to determine the ethnicities of the cases and to confirm a subset of the self-reported ethnicities in the medical records of the controls. Specifically, we used the markers on the Illumina Human Omni1Quad array that belonged to the Human Genome Diversity Project SNP collection as input. Sample data were formatted using PLINK (http:// pngu.mgh.harvard.edu/purcell/plink/) [42]. Principle component analysis was performed to determine the ethnicities of the samples using EIGENSTRAT [43].

Integrated analysis of CNV and SNV data
Genome-wide CNV and SNV data were overlaid using custom algorithm and IVA in order to detect genomic loci harboring multiple types of deleterious variants in 13 probands. The CNV data were mapped from hg18 to hg19 using the UCSC liftover tool in order to match the SNV data.

Real time genomics analysis
Reads were aligned with the RTG map algorithm to the hg19 reference with decoys used by the 1000 Genomes Project 1 . RTG map creates a hash table that indexes the reads and streams the reference sequence to identify mapping locations. Mapping of paired-end reads is performed concurrently in a collection window that is much larger than the library insert size (in this case the window was 1,000 bp). RTG maps also calculate base QV recalibration tables, which are needed for variant calling, and outputs standard BAM format files. The RTG variant caller uses a Bayesian framework (originally proposed by Marth et al. 2 ) that estimates diploid genotype posterior probabilities per and uses priors for polymorphism rates based on the data of the 1000 Genomes Project 1 . Platform-specific error rates are modelled as priors and mapping quality values from the mapper are incorporated as part of the data. Depth of coverage is also considered during scoring penalizing variants with higher-than expected coverage. For this, depth of coverage needs to be estimated before variant calling; in the case of exomes a BED file with the target regions is used to estimate target depth appropriately. Complex regions are identified by various criteria, mainly including regions with apparent indels, MNPs, or clusters of SNVs. A specialized Bayesian caller is used for these regions ("complex caller") which iteratively selects pre-existing single-read alignments in the region as hypothesis, aligns the rest of the reads to the hypothesis by a probabilistic Goth algorithm and estimates the posterior probability of each hypothesis considering diploid indels and MNP variants. The final call is the hypothesis with the highest posterior probability and accounts for about 10% of the total variant calls 3 . In the case of data from pedigrees, alignments are evaluated simultaneously across pedigree members at every position using a scoring method that assumes Mendelian variant segregation. Sex chromosomes are handed as special cases. This dramatically reduces Mendelian inconsistencies without filtering of variants, and improves the genotype qualities (GQ) of true positives, while decreasing the GQ of probably false positives (unpublished). In order to evaluate the possibility of de novo mutations, a small prior is allowed for such type of events and a specific score is calculated for the de novo mutation hypothesis and it is included in the output VCF. In the case of nuclear families, offspring genotypes are phased by transmission. The output is a multi-sample VCF conforming to v 4.1 specifications and includes all variants through the score range (i.e. no filtering is performed by default).
screening of samples for known hearing loss mutations and contributed to editing the manuscript. JG collected and prepared control samples for the microarray experiments and contributed to editing the manuscript. KCN collected and prepared control samples for the microarray experiments and contributed to editing the manuscript. JSO collected and prepared patient samples for the exome sequencing experiments and contributed to editing the manuscript. IS collected and screened all patient samples for this study, wrote sections and edited the manuscript, advised and contributed to the conception and design of the study. MPS contributed to the conception, design, and coordination of the study, wrote sections of and edited the manuscript. All authors read and approved the final manuscript.