Chromosome microarray testing for patients with congenital heart defects reveals novel disease causing loci and high diagnostic yield

Background Congenital heart defects (CHD), as the most common congenital anomaly, have been reported to be frequently associated with pathogenic copy number variants (CNVs). Currently, patients with CHD are routinely offered chromosomal microarray (CMA) testing, but the diagnostic yield of CMA on CHD patients has not been extensively evaluated based on a large patient cohort. In this study, we retrospectively assessed the detected CNVs in a total of 514 CHD cases (a 422-case clinical cohort from Boston Children's Hospital (BCH) and a 92-case research cohort from Shanghai Children’s Medical Center (SCMC)) and conducted a genotype-phenotype analysis. Furthermore, genes encompassed in pathogenic/likely pathogenic CNVs were prioritized by integrating several tools and public data sources for novel CHD candidate gene identification. Results Based on the BCH cohort, the overall diagnostic yield of CMA testing for CHD patients was 12.8(pathogenic CNVs)-18.5% (pathogenic and likely pathogenic CNVs). The diagnostic yield of CMA for syndromic CHD was 14.1-20.6% (excluding aneuploidy cases), whereas the diagnostic yield for isolated CHD was 4.3-9.3%. Four recurrent genomic loci (4q terminal region, 15q11.2, 16p12.2 and Yp11.2) were more significantly enriched in cases than in controls. These regions are considered as novel CHD loci. We further identified 20 genes as the most likely novel CHD candidate genes through gene prioritization analysis. Conclusion The high clinical diagnostic yield of CMA in this study provides supportive evidence for CMA as the first-line genetic diagnostic tool for CHD patients. The CNVs detected in our study suggest a number of CHD candidate genes that warrant further investigation. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1127) contains supplementary material, which is available to authorized users.


Background
Chromosomal microarray (CMA) analysis, which can better define the size of microdeletions/microduplications and their gene content, enables novel disease gene discoveries and genotype-phenotype correlation studies [1,2]. The diagnostic yield of CMA testing ranges from approximately 5% to 20% for patients with developmental delay/intellectual disability (DD/ID), autism spectrum disorder (ASD), or multiple congenital anomalies (MCAs), significantly higher than that of G-banded karyotyping (3%) [3]. The American College of Medical Genetics and Genomics (ACMG)recommends the use of CMA as thefirst-tier diagnostic test for these patients [4].
Congenital heart defect (CHD) is among the most common birth defects and is a leading cause of infant mortality around the world. It affects approximately 0.8-1% of newborns [5,6]. Recent studies have shown that pathogenic CNVs are identified in a substantial proportion of CHD patients [7,8]. Multiple recurrent CNV loci such as 22q11.2 (the DiGeorge syndrome region), 7q11.23, 8p23.1, 9q34.3, and 1q21.1 were found to confer significant risk for syndromic or isolated CHD [9][10][11]. These loci only explain a fraction of the genetic underpinnings of CHD [7]. In recent years, CMA has been routinely offered to patients with CHD. Several studies have evaluated and reported the clinical diagnostic yields of such practice but largely based on small patient cohorts [12][13][14][15][16][17][18][19]. Clinical diagnostic CMA data have proven to be an invaluable source for genetic discoveries and genotype-phenotype correlation studies. Here, we retrospectively reviewed the CNV detection in unselected clinical CHD cases at the Genetic Diagnostic Laboratory of Boston Children's Hospital (BCH) and selected research CHD cases from Shanghai Children's Medical Center (SCMC). We assessed the clinical significance of each CNV and evaluated the overall diagnostic yield. We further uncovered novel CHD-associated CNVs and potential CHD candidate genes through gene prioritization and pathway analysis.

Methods
Study subjects and phenotype classification 422 patients (56% male and 44% female, median age = 7 years) with at least one congenital heart defect who underwent clinical CMA testing at BCH between December 2006 and April 2013 were included in this study. The relevant medical records, including clinical notes and echocardiography reports, were reviewed. In addition, 92 CHD patients (61 male and 31 female, median age = 3 years) from SCMC were included in this study. This group of patients was evaluated by echocardiography, magnetic resonance imaging, cardiac catheterization or surgical reports to determine the type of CHD. SCMC patients with gross chromosomal aberrations (e.g., trisomy 21 and trisomy 18) were excluded from CMA analysis. Both studies were approved by respective IRBs of Boston Children's Hospital and Shanghai Children's Medical Center. Informed consent for patients from SCMC was obtained from parents. No identifiable information was used in the manuscript. Cases ascertained at BCH included all CHD phenotypes that were further subcategorized using the classification system established by National Birth Defects Prevention Study (NBDPS) [20], whereas cases ascertained at SCMC primarily had conotruncal defects (CTD). Patients who only had mild CHD abnormalities (i.e. isolated patent ductus arteriosus and patent foramen ovale) or were affected only by arrhythmia or cardiomyopathy were excluded from this study.
For comparison of CNV detection rate, a control cohort was assembled from previously published studies [8,21,22], which used high-density microarray platforms comparable to the ones used in this study.

Chromosomal Microarray testing and CNV evaluation
DNA samples from all cases were extracted from peripheral blood with standard procedures. CHD patients at BCH were tested on the Agilent 244 K comparative genomic hybridization (CGH) array platform or a 4 × 180 K SNP + CGH microarray in a clinical diagnostic setting. CNVs were identified and evaluated as previously described [23].
CHD cases at SCMC were tested on the Affymetrix Cytoscan™ HD microarray platform in a research setting. Data was visualized and analyzed by Chromosome Analysis Suite (ChAS) software package (Affymetrix, USA) with a minimal cutoff of 20 consecutive markers for CNV calling. All CNVs reported are based on NCBI human genome build 37 (hg 19).
Detected CNVs were evaluated through a filtering procedure and classified into five categories based on the ACMG guideline [24] (for details see Additional file 1).

Statistical analysis
Two-sided Fisher's exact test was used to compare the frequencies of recurrent (n ≥ 3) CNVs between the case and the control cohorts, the CNV detection rates between isolated CHD and syndromic CHD, and the CNV burden for each subcategory of CHD. A p value < 0.05 was considered significant throughout this study.

Gene prioritization for novel CHD candidate gene identification
We developed an analytic process by integrating various tools and data sources to prioritize the genes involved in detected CNVs. RefSeq genes encompassed in the pathogenic CNVs and likely pathogenic CNVs were assembled as the starting gene list. The genes in deletions and duplications were analyzed separately (Additional file 1). Independently, we also used the same prioritization process to evaluate the novel CHD candidate genes involved in the pathogenic CNV(s) of each patient.

Diagnostic yield of CMA testing for patients with CHD
Among 422 CHD patients from BCH, 12 individuals were found to have gross chromosomal aberrations including five trisomy 21, five monosomy X, one trisomy 18, and one 18q partial trisomy. In the remaining patients, we detected 50 pathogenic CNVs in 42 patients (10.2%) and 28 likely pathogenic CNVs in 24 patients (5.8%) (Additional file 1: Table S1, S2). The overall diagnostic yield of CMA testing for patients with CHD was 18.5% when considering pathogenic, likely pathogenic CNVs and aneuploidies as positive finding. The minimal diagnostic yield was 12.8% if only the cases with pathogenic genomic imbalances (including aneuploidies) were included. The majority of pathogenic CNVs (~74%) detected in patients were smaller than 10 Mb in size, which would presumably be missed by karyotyping, again demonstrating the superior technical validity of microarray in detecting clinically relevant CNVs over karyotyping.
Diagnostic yield of CMA in syndromic vs. non-syndromic CHD All 12 patients carrying aneuploidy exhibited a syndromic CHD phenotype. The remaining 410 individuals were divided into two groups: isolated CHD or syndromic CHD based on medical records. The former consisted of 162 patients and the latter consisted of 248 individuals exhibiting extracardiac phenotypes in addition to heart defects. The most common extracardiac phenotypes were ID/DD, ASD, behavioral features, hypotonia and craniofacial dysmorphism. Even after excluding aneuploidy cases, there were significantly more non-polymorphic CNVs (CNVs not recurrent in general population) in syndromic CHD than in isolated CHD (Table 1, p = 0.0078). The p value for pathogenic CNVs only (p = 0.0013) and for pathogenic and likely pathogenic combined category (p = 0.0024) also reached statistical significance. Based on this analysis, the diagnostic yield of CMA for isolated CHD was 4.3% (pathogenic CNVs)-9.3% (pathogenic and likely pathogenic CNVs), whereas the diagnostic yield for all syndromic CHD (excluding aneuploidy cases) is 14.1 (pathogenic CNVs)-20.6% (pathogenic and likely pathogenic CNVs) ( Table 1).

Diagnostic yield of CMA related to CHD sub-types
We further compared the CNV detection rates between syndromic CHD with co-occurring neurodevelopmental disorders (NDD) including DD, ID and ASD with those without NDD. Twice as many pathogenic CNVs were detected in CHD patients with NDD than those without NDD ( Table 1), indicating that patients with co-morbid features of CHD and NDD were more likely to harbor pathogenic CNVs. This finding also suggested that CNVs detected in syndromic CHD patients were not solely contributing to NDD which are known to be associated with CNVs.
To further delineate the association of CHD subcategories with CMA detection rates, we classified the BCH cases into nine categories (Additional file 1: Table S3). Among patients with isolated CHD, those with compound CTD (category F), hypoplastic left heart syndrome (category G) and obstruction of left ventricular outflow tract (category D) were more likely to harbor pathogenic CNVs (Table 2). When all CHD cases were considered, patients with isolated CTD (category E) exhibited the highest CMA diagnostic rate (14.8%). In addition, CHD patients with compound CTD (category F) and septal defects (category A) also reached a >10% diagnostic rate. In contrast, CHD patients with heterotaxy (category H) or valve defects (categories C) were less likely to have a pathogenic CNV.
Among 92 SCMC patients mainly affected with CTD, a total of 26 non-polymorphic CNVs were detected, 11 of them were classified as pathogenic or likely pathogenic (Additional file 1: Table S1, S2). The CNV detection rate for this cohort was about 12%, which is similar to that of BCH patients with the same sub-phenotype, thus we independently confirmed the significant involvement of CNV in CTD.

Identification of known and novel recurrent CNVs associated with CHD
A genome-wide CNV analysis for a total of 502 CHD cases (410 from BCH cohort and 92 from SCMC cohort; Additional file 1: Figure S1) led to the detection of 209 (183 in BCH cohort and 26 in SCMC cohort) nonpolymorphic CNVs. As a result, a total of 89 CNVs at 57 unique chromosome loci were considered to be of known or possible clinical relevance in this study. They were widely distributed on different chromosomes ( Figure 1). We observed 32 recurrent (n ≥ 3) CNVs distributed at six chromosomal loci (Additional file 1: Table S1, S2) which include 12 imbalances (nine deletions and three duplications) at 22q11.2 and five aberrations (three deletions and two duplications) at the 8p23.1 involving the GATA4 gene, both loci are known to be associated with syndromic or isolated CHD. In this study, we also identified five patients with 4q terminal deletions which range from 4, 600 kb to 19, 300 kb in size ( Figure 2A). Similar deletions were not detected in 9170 control cases (Table 3), and are not reported in DGV (http://dgv.tcag.ca/ accessed March, 2014). 4q terminal deletion is known to cause 4q-syndrome where 50% of affected individuals have CHD, and a cardiovascular critical region has been narrowed down to 4q32.2-q34.3 [25]. The smallest overlapping region (SOR) among our 4q terminal deletion cases was about 4.6 Mb in size at 4q35.1-qter. This SOR didn't overlap with the previously defined critical region ( Figure 2A). Thus our study potentially maps a novel CHD critical region at the 4q terminus. There were 24 Refseq genes at this interval, although no known CHD genes existed, we propose several possible candidate genes in discussion.
In addition, we identified three other genomic loci with significantly higher frequencies in cases than in controls. These three loci were 15q11.2 (p = 0.0289), 16p12.2 (p = 0.0025) and Yp11.2 (p < 0.0001) ( Table 3) respectively, which were also considered as possible novel loci associated with cardiac development.

Identification of novel CHD candidate genes
Among 57 CNV regions of interest, ten CNVs contained genes known to be causal for CHD (Additional file 1: Figure S1 and Table S4). In order to identify novel CHD candidate genes, we examined the genes within the remaining 47 loci (Additional file 1: Table S5). Starting from 647 genes in deletion CNVs and 517 genes in duplication CNVs (Additional file 1: Figure S2), we performed a gene prioritization process using Endeavour and ToppGene. 18  genes in deletion CNVs and 18 genes (Additional file 1: Table S6 and Figure S2) in duplication CNVs in the category of "Cardiovascular System Development and Function" were identified as novel CHD candidate genes through mouse embryonic expression pattern analysis and Ingenuity Pathway Analysis (IPA) analysis (for details see Additional file 1). Furthermore, the same gene prioritization process was performed for individual cases carrying pathogenic CNVs of unknown CHD significance. A total of 39 genes were identified in 19 cases (Additional file 1: Table S7).
Of note, 20 of these genes were also contained in the global prioritization list (bold genes in Additional file 1: Table S7). These shared genes are considered to be the most likely dosage sensitive novel CHD candidate genes.

Discussion
Diagnostic yields of CMA testing CMA has been recommended as the first-line test in the initial postnatal genetic evaluation of individuals with MCAs, DD/ID and ASD [4]. CHD is known to be frequently associated with CNVs (Additional file 1: Table S8).  Currently, patients with CHD are routinely offered CMA testing. In contrast to previous publications (Table 4), the present study documents the diagnostic yields for several sub-categories of syndromic and non-syndromic CHD using the largest cohort. Previous clinical studies have demonstrated a higher CNV diagnostic yield in syndromic CHD than that in isolated CHD, but these studies were all done separately for either syndromic CHD or isolated CHD by different array platforms (Table 4). Using the largest cohort of CHD from one clinical setting, we were able to assess the CMA diagnostic yields for both syndromic and isolated CHD patients by the same CMA platform. Our study convincingly demonstrated a significantly higher CNV detection rate in syndromic CHD (18.1%) than isolated CHD (4.3%).
Importantly, our data also revealed that the CNV diagnostic yields differ among different CHD subcategories, indicating different CHD sub-phenotypes may have different pathogenic mechanisms. The findings that isolated CTD, compound CTD, and septal defects were more likely to be associated with CNVs than heterotaxy or valve defects provided practical guideline for referring CHD patients for CMA testing. However, the number of cases in each sub-category was still small in this study.
Research involving a larger sample size is warranted to further delineate the correlation between CNV rate and CHD sub-phenotypes.
In the clinical setting, the pathogenicity of a CNV is assessed based on gene content, CNV size, and literatures. In many instances, the causal relationship between a particular CNV and a particular phenotype cannot be easily established. It is likely that not all of the pathogenic CNVs detected are directly causative of CHD. Thus, the exact diagnostic yield of causal CNVs for CHD may be less than the overall pathogenic CNV detection rate (12.8%). We identified 19 CNVs with known CHD genes (Additional file 1: Table S2). In addition, we also detected 12 aneuploidies that are known to be causally associated with CHD phenotypes. Therefore, a total of 31 cases out of 422 (7.3%) have chromosomal imbalances that are known to cause CHD. Additionally, we believe that a significant fraction of the remaining CNVs currently with unproven causal relationship with CHD may turn out to be novel CHD loci (such as 4q terminal deletion and other novel candidate CHD loci).Thus, although the exact diagnostic yield of causative CNVs is difficult to assess, it is reasonable to believe that the actual diagnostic yields is higher than 7.3%, and somewhat smaller than 12.8%. This level of diagnostic yield is similar to that for ASD (7%) [26], DD/ID or MCAs (10-12%) [3,27]. Thus our findings provide strong evidence for CMA to be used as the firstline genetic diagnostic test for patients with CHD as well.
Many CHD patients in our study exhibited comorbid features of DD, MCAs or ASD, which are known to have a significant association with CNV. The fact that syndromic CHD patients have a higher pathogenic CNV detection rate than cases with only DD/ID/ASD suggests that not all CNVs detected in our syndromic CHD patients can be attributed to the DD/ID or ASD phenotype. Our data demonstrated an additive effect on CNV burden when these phenotypes co-occur with CHD. The fact that Diagnostic yield was defined as the number of patients with abnormal aberrations divided by the total number of cases tested. In patients with syndromic CHD, pathogenic chromosomal imbalances were detected in about 16%-25% of cases. But the diagnostic yield of CMA in isolated CHD cohort was poorly studied. patients with isolated CHD exhibited a CNV diagnostic rate of 4.3% further supports the significant contribution of CNV to the pathogenesis of CHD. High diagnostic yields provided strong supporting evidence to justify the routine use of CMA test in clinical evaluation of patients with CHD. While diagnostic yield is important, we are also interested to assess the clinical utility of CMA test for patients with CHD. A follow-up study will be focusing on how the CMA test results impact patient care and management.

Discovering novel CHD loci and candidate genes through CNV detection
In this study, the top five most frequently detected genomic imbalance events in CHD cases were 22q11.2 deletion/duplication, 8p23.1 deletion/duplication, trisomy 21, monosomy X and 4q terminal deletion. The first four genomic imbalances are known to be causally related to CHD. TBX1 and GATA4 are the known key causative genes for CHD phenotypes for 22q11.2 and 8p23.1, respectively. The fact that all five 4q terminal deletion cases were detected in CHD patients and none in the control strongly supports the notion that 4q terminal deletion is a novel CHD-causing locus.
4q terminal deletion is a subgroup of 4q-syndrome, which has CHD in about 50% of the cases. Our five patients presented different CHD phenotypes, including CTD, hypoplastic left ventricle, septal defects and obstruction of left ventricular outflow tract. Additionally, three of them have comorbid features of extracardiac presentations (Additional file 1). A cardiovascular critical region (4q32.2-q34.3) has been mapped for the 4q-syndrome, and three genes (TLL1, HPGD, and HAND2) were proposed to be the key genes responsible for the cardiovascular phenotypes [25]. Interestingly, the SOR region of our five 4q terminal deletion cases does not overlap with this cardiovascular critical region (Figure 2A), suggesting that our 4.6 Mb region represents a novel CHD critical locus.
4q terminal deletions often co-occur with terminal duplications of other chromosome as a consequence of imbalanced segregation of a balanced parental translocation. In our study, the three largest 4q terminal deletion cases also carried terminal duplications on another chromosome (Additional file 1). The fact that the other involved chromosomes were different in each case and that the remaining two cases only carried the pathogenic 4q terminal deletion makes a strong argument that it is the genes within the 4q terminal deletion region, not on the other involved chromosomes, that are causal for CHD phenotype. Through literature review, we identified two other CHD cases with small 4q terminal deletions that overlapped with our SOR region ( Figure 2B). All three 4q terminal deletions involved the SORBS2 gene, encoding a signal transducer that is highly and nearly exclusively expressed in epithelia and cardiac muscle tissue in the mouse embryo (Additional file 1: Figure S3A). Strong expression in cardiac tissue suggests that this gene may play a significant role in heart development. Several previous studies also support the SORBS2 gene as a critical gene for CHD [28][29][30].Our SOR and case 2 ( Figure 2B) also contained the PDLIM3 gene. The functional disruption of Pdlim3 in mice results in right ventricular dysmorphogenesis, trabeculation failure, and chamber dilatation [31,32], supporting the involvement of this gene in heart development. Maurin et al. suggested both PDLIM3 and SORBS2 were involved in cardiac and muscle development, and could be responsible for cardiac defects observed in terminal 4q35.1 deletions [29]. Additionally, the SLC25A4 gene (MIM 103220), which encodes a member of the mitochondrial carrier subfamily of solute carrier protein, was previously associated with familial hypertrophic cardiomyopathy [33] and showed high expression in mouse embryonic heart (Additional file 1: Figure S3B). In fact, one case with 4q terminal deletion in our study presented with cardiomyopathy. It is currently unknown if any single gene at 4q terminal is sufficient to cause CHD, or if CHD occurs due to multiple gene deletion. Based on the above analysis, we propose that SORBS2, PDLIM3 and SLC25A4 are the critical genes associated with 4q terminal deletion for the CHD phenotype.
Recurrent deletions at locus 15q11.2 were statistically enriched in our CHD cohort. The region between BP1 and BP2 at 15q11.2 has been previously implicated as a contributory genetic cause of susceptibility to schizophrenia, behavioral disturbances, and intellectual disability [34,35]. It is well known that the 15q11.2 deletion has low penetrance (for example only 2% for schizophrenia) [36]. Soemedi et al. was the first one to report the strong association of this variant with the risk of multiple heart defects, especially left-sided malformations [8]. However, no additional study followed. Our study provides independent support for the contributory role of 15q11.2 in CHD pathogenesis. We detected a total of 33 cases with 15q11.2 deletion. Three of them (9.1%) exhibited CHD phenotypes. Thus the penetrance of 15q11.2 deletion for CHD is also low. Additional genetic factors may be required for the manifestation of CHD.
We also identified three 16p12.1(hg 18) microdeletions involving the EEF2K and CDR2 genes, which have been previously linked to intellectual disability and neuropsychiatric phenotypes [37]. Other features including cardiac anomalies are frequently observed in individuals with 16p12.1 deletion. Girirajan et al. identified seven individuals with CHD phenotype out of 21cases carrying this imbalance [38], suggesting its significant predisposing role to heart malformations. In our study, three out of five patients with 16p12.1 deletion exhibited CHD phenotype, demonstrating a relative high CHD penetrance of this imbalance.
CNVs detected in CHD patients provide a unique source for identifying novel CHD candidate genes. In this study, using gene prioritization approaches, we identified 20 novel candidate genes (11 genes in deletion CNVs and nine genes in duplication CNVs, Additional file 1: Table S9). We gathered additional supporting evidence including gene expression and mouse phenotype (Additional file 1: Table S9). We found that all of them had positive expression in mouse embryonic or adult heart. Some genes such as Ets1, Nfatc1, Cnn1 and Rps6ka2 exhibited a high expression level. The knock-out mice of all genes in deletion CNVs (except Ptch1) exhibited abnormal cardiovascular development. Of note, mice homozygotes for the targeted null allele of Crk, Efnb2, Hey1, Nfatc1 and Shh display defects in heart morphogenesis. Although the function of these genes on human heart development is still poorly studied, two heterozygous mutations in NFATC1 were recently reported in a patient with tricuspid atresia [39], and another recent study supported that NFATC1 plays an important role in cardiac development [40]. For genes in duplication CNVs, Dll1 knock-in mice and mice with mutations of the Qki gene displayed CHD involving impaired blood vessel morphology and abnormal heart looping. Taken together, these 20 genes are considered to be the most likely candidate CHD genes. Mutation screening in human CHD patients and functional studies will provide further evidence to demonstrate their causal relevance with CHD.

Conclusion
In summary, the high clinical diagnostic yield of CMA for patients with CHD justify CMA to be used as a first-tier genetic test. Syndromic CHD cases are expected to have a much higher pathogenic CNV detection rate. CMA also provides diagnostic value for isolated CHD patients. The CNVs detected in CHD patients represent a wealth of CHD candidate genes that warrant further investigation.