Skip to main content

Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach



The incidence of congenital heart disease (CHD) is continuously increasing among infants born alive nowadays, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and disease-markers has been one of the central topics in CHD research. By using the high-throughput genomic data of CHD which are available recently, network-based methods provide powerful alternatives of systematic analysis of complex diseases and identification of dysfunctional modules and candidate disease genes.


In this paper, by modeling the information flow from source disease genes to targets of differentially expressed genes via a context-specific protein-protein interaction network, we extracted dysfunctional modules which were then validated by various types of measurements and independent datasets. Network topology analysis of these modules revealed major and auxiliary pathways and cellular processes in CHD, demonstrating the biological usefulness of the identified modules. We also prioritized a list of candidate CHD genes from these modules using a guilt-by-association approach, which are well supported by various kinds of literature and experimental evidence.


We provided a network-based analysis to detect dysfunctional modules and disease genes of CHD by modeling the information transmission from source disease genes to targets of differentially expressed genes. Our method resulted in 12 modules from the constructed CHD subnetwork. We further identified and prioritized candidate disease genes of CHD from these dysfunctional modules. In conclusion, module analysis not only revealed several important findings with regard to the underlying molecular mechanisms of CHD, but also suggested the distinct network properties of causal disease genes which lead to identification of candidate CHD genes.


Congenital heart disease (CHD) is among the most common human congenital defects, and is the leading cause of infant morbidity in the world [1, 2]. Although CHD is known to arise from abnormal heart development during embryogenesis [3, 4], its molecular mechanism remains far from clear. Currently, about 30 different genes have been known to cause CHD. Understanding the molecular functions, molecular interactions and represented pathways implicated in these CHD genes contribute to our knowledge of CHD pathogenesis, and therefore help improve clinical diagnosis and medical care of this disease. Network-based methods are powerful tools of systematic analysis of complex diseases, and identification of major pathways, responsive modules and candidate genes [5]. Previous works used those approaches to analyze heart development and cardiovascular disorders [6, 7], however, there is no such study on CHD due to lack of its genome-wide expression data, which is publicly available only until very recently [8, 9]. Therefore, in this paper by exploring high-throughput genomic data to elucidate essential roles of local network structures in CHD progression, we aim to provide such a network-based study on CHD as well.

To discover molecular pathogenesis in complex disease, considerable efforts have been made to elucidate the relations between variability in gene expression and genotype [1012], and putative disease genes curated from literature research can be regarded as the source of molecular perturbations while differentially expressed genes identified from mRNA profiling can represent the responsive components of source perturbations. It is also noted that disease genes are not necessarily differentially expressed [13]. Differential gene expression level represents the changed phenotype that is potentially associated with the causal disease genes. Hence, linking causal disease genes with responsive differentially expressed genes by modeling the information flow in protein interactome can better reveal dysfunctional subnetworks and help the identification of disease modules. Previous research has established the analogy between random walks and electric networks [10, 14]. Doyle and Snell [14] showed that when a unit current flow enters a source node and leaves a sink in the circuit network, the amount of current passing through any intermediary node or edge is proportional to the expected number of times the random walker visits that node or edge. Then, the amount of current passing though each node can be computed by solving a system of linear equations based on Kirchhoff's and Ohm's laws. Several recent studies also used such circuit flow networks to discover causal genes and associated pathways or to analyze gene network centrality [10, 11, 15, 16].

In this paper, we construct a CHD subnetwork and identify dysfunctional modules by developing a novel network-based computational approach which integrates protein-protein interactions, gene coexpression profiles and causal paths from putative CHD genes to target genes. We evaluate the functional implications of our modules for phenotype classification, and further reveal their higher order topological relationships by exploring their represented biological processes and crosstalk. Results show that our modules are better disease-markers than documented pathways, and have the discriminative power stably across several independent microarray datasets. In particular, correlation analysis reveals that each module is also a group of significantly coexpressed genes; module interaction analysis characterizes the higher-order topology of these identified modules; functional enrichment and module-pathway crosstalk analysis suggests that the topology of a module is highly related to its roles in CHD. While the modules in central place of the CHD subnetwork are enriched in core CHD-related dysfunctional processes, such as anatomical structure morphogenesis, cell differentiation and cytoskeleton organization, and regulate key pathways of CHD such as cardiac muscle contraction, Notch signaling pathway and ECM (extracellular matrix)-receptor interaction, the modules in peripheral place are enriched in auxiliary processes, such as cell communication and various metabolic processes, and regulate less disease-related pathways. In addition, we find that CHD casual genes exhibit different network features, i.e. disease genes tend to have lower current flow and participate in fewer dysfunctional subnetworks than expected. Moreover, we provide a list of candidate CHD genes by module analysis, where the top ranked genes in the candidate list are all well supported by literature and experiment evidence. The results not only elucidate the functional roles of the modules on CHD, but also provide some insights into the underlying molecular mechanisms of CHD which lead to identification of candidate CHD genes.


Identified dysfunctional modules

The work flow of our method of identifying dysfunctional modules is shown in Figure 1. To capture the information flow from causal genes to target genes and to identify dysfunctional modules from these causal paths, we first identified 85 target genes which are defined as those differentially expressed (DE) in sufficient proportion of patients (Additional File 1), and then connected each known causal genes of CHD with these target DE genes via shortest paths shown in Figure 1A and 1B. We model the protein interaction network as an electrical circuit where correlation coexpression of two end nodes of an interaction is used as the conductance of a resistor, and biological signals from disease genes propagate through PPI edges to responsive genes just like electrical current flows through resistors [11, 15]. Information from source genes will propagate its effect via protein-protein interactions, and DE genes which cover the majority of patients represent common dysregulated pathways in CHD. In the third step, we merged the paths from one causal gene to all target genes into a subnetwork, and computed the current flow for each gene to evaluate its importance in this local subnetwork as shown in Figure 1C. To measure the importance of one node in conducting electrical current, we computed the current flows through the node using the electronic laws [14], and defined the information flow score of the node as the sum of current through the node among all pair-wise combinations of the source node (the causal disease) and all target nodes (all responsive differentially expressed genes). Since the causal subnetwork can overlap, i.e. a gene can have several current flow scores, we assigned the gene to the subnetwork in which its current flow is maximum to derive mutually exclusive modules. The modules can also be thought of as an information-processing unit [5, 17], therefore we put the gene into the module where the largest amount of information passes through it. At last, reasoning that highly connected genes are more likely to coordinate and perform the same biological functions, we pruned the modules and iterated this process until each of them was connected as shown is Figure 1D (further details can be referred to Methods).

Figure 1
figure 1

Method Overview. Triangles represent disease genes, rectangles represent target genes and grey circles represent other genes on the PPI network. (A) We overlaid expression profiles to PPI network, and mapped putative disease genes and differentially expressed target genes onto this weighted PPI network. (B) For each disease gene, we found its shortest paths to all target genes. (C) We computed the information flow for each node in a given subnetwork, and genes participating in several subnetworks can have several information scores. (D) We identified modules by assigning a gene to the module in which its information score is maximum.

Totally, our method resulted in a connected CHD subnetwork consisting of 498 nodes and 2413 edges (Figure 2A). We identified 12 major disjoint modules by assigning genes to the group which maximizes the information flow scores while keeping each module inter-connected the CHD subnetwork (Figure 2B-E). The size of each module is shown in Table 1 and the list of all genes in each module is shown in Additional File 2.

Figure 2
figure 2

CHD subnetwork and representative modules. (A) CHD subnetwork extracted from the weighted PPI network. Different colors represent different modules. All the modules with size smaller than 10 are in the same green color. (B) Pie chart of module size, where modules with size smaller than 10 are grouped into category "others". (C-E) Three representative modules. Dark red diamonds represent putative disease genes, and red rectangles represent final candidate genes identified by our method.

Table 1 Module information.

Modules as disease-markers

To quantitatively investigate the relationships of the identified modules with disease phenotype, we computed mutual information (MI) of each module with sample phenotype. The results are shown in Table 1. We used the microarray data, which represents mRNA extracted from right ventricles (RV) of 16 children with Tetralogy of Fallot (TOF, the most common type of CHD) at the time of reconstructive surgery with 5 controls (NCBI GEO Accession ID GSE26125), to compute the activity vector of each module across all 21 samples. We used MI as the discriminative score to assess whether each module activity vector has significant relationship with phenotypes. Among 12 modules, 10 have significant mutual information (empirical p-value<0.05) compared with 1000 random gene sets of the same size (shown in Figure 3). Most modules are significantly correlated with disease phenotype, reflecting the synergistic differential expression within them.

Figure 3
figure 3

Synergistic differential expression of modules compared with random gene sets of the same size.

To further evaluate the quality of our modules in terms of distinguishing disease from control samples, we computed the activity matrix of 12 modules (modules versus samples) and build logistic regression models [18] in a five-fold cross-validation approach. For comparison, we computed activity matrix of 12 KEGG pathways that are most enriched in the CHD subnetwork (Additional File 3). We also used 12 random gene sets of the same size as each module for control, and repeated this process 100 times. For experiments within the gene expression dataset of GSE26125, modules were first ranked by MI, and features were sequentially added to the classifier. Pathways were ranked by enrichment p-value, and were added sequentially. The result shows that modules are consistently better in the classification accuracy than pathways and random sets, while pathways are better than random sets only for the first five features (Figure 4A). To test the robustness of our module classifier, we also performed an independent cross-dataset experiment, in which classifiers were trained on GSE26125 and validated on GSE14970 (Figure 4B). In cross-dataset validation, both modules and pathways achieved similar high performance of classification accuracy, and clearly are much better than the random sets. Therefore, our modules are better classifiers of phenotype than pathways, and that their high classification performances are reproducible across independent microarray dataset. This provides evidence for the dysfunctional implication and importance of these identified modules.

Figure 4
figure 4

Within- and cross-validation performance comparison of modules and pathways. (A) Within-dataset classification evaluation. X-axis corresponds to the number of features added to the logistic classifier. Assuming that there are K features (modules/pathways/random gene sets), then for each kK, select the first k th modules to train the classifier. The final classification performance was reported as the AUC on the testing set using the classifier optimized from the validation set in a five-fold cross-validation. Modules were ordered in decreasing significance of MI, pathways were ordered in decreasing significance of enrichment, and random gene sets were in the order as their compared modules. (B) In cross-dataset classification evaluation, all 12 features were trained on GSE26125 and validated on 12 disease samples from GSE14970 combined with 5 controls from GSE26125.

Within module coexpression and cross module coordination

Besides comparing module expression profiles under disease and control status to evaluate its synergistic differential expression, another way to utilize microarray data is to exploit gene coexpression relationships within modules and across modules. Under the well-accepted hypothesis that genes exhibiting similar expression patterns across sample status are likely to have functional relevance [19], we reasoned whether genes within modules are significantly coexpressed, and whether modules have intense interactions in terms of interacting partners that are highly coexpressed. To analyze coexpression within modules, we computed the average Pearson's correlation coefficient (PCC) of all interacting proteins in each module, and compared it with that of 1000 random selected protein pairs of the same size (Figure 5A, Table 1). Result shows that 9 modules have significant coexpression (Mann-Whitney test, p-value<0.05), and all of them also have significant MI at the same time. If all 12 modules are regarded as a whole and compared with 12 random gene sets, the result is also significant (Mann-Whitney test, p-value = 2.96e-06).

Figure 5
figure 5

Inter-module coexpression and intra-module coordination. (A) To analyze coexpression within modules, we computed the average Pearson's correlation coefficient (PCC) of all edges in each module, and compared it with that of 1000 random control edge sets of the same size. (B) Module interaction identification was based on the weighted protein-protein interactions between modules. Edge width corresponds to the absolute value of PCC of two end nodes. Color edges represent cross-module interactions which will be used to compute module interaction, while grey edges are interactions within modules. (C) Intra-module coordination. Node colors range from green to red correspond to significance -log(p-value) of inter-module coexpression, node size corresponds to module size, and edge width corresponds to strength of interactions.

To analyze intra-modular connectivity, we detected the interaction among disjoint modules based on weighted protein interactions (Figure 5B). Module interaction network is visualized in Figure 5C, where only the 22 significant interactions (corrected p-value<0.05) are shown. For each module, we defined its intra-modular connectivity as the sum of the absolute value of correlation for all its outgoing edges, and used this score to evaluate its network centrality (Table 1). Module interaction network not only reveals connectivity patterns between modules and their positions in CHD subnetwork, but also sheds light on the implications of various biological processes and pathways represented by each module. Therefore, besides topological meanings of this score, we would identify its biological underpinnings after examining the compositions of modules using annotated gene ontologies in the next subsection.

Functional enrichment analysis of modules reveals crucial biological processes of CHD

Utilizing our previously developed NOA (Network Ontology Analysis) method of functional enrichment analysis [20], we identified enriched GO terms in each module. The representative enriched GO terms with FDR<0.05 in each module are presented in Table 2, and the list of all enriched GO biological processes are in Additional File 4. The top-scored Module 10 (Figure 2D) is enriched in anatomical structure morphogenesis, cytoskeleton organization and cellular component assembly. Since CHD manifests itself through various aberrations in heart morphogenesis, and more than half of known causal disease genes are transcriptional regulators of heart morphogenesis, the fact that anatomical structure morphogenesis ranks 1st in the most important module is expected. The other two categories, i.e. cytoskeleton organization and cellular component assembly, can contain downstream effector genes involved in muscle development, such as sarcomerec gene like ACTC1, DYNLL1 and DMD. Particularly, ACTC1 is a known causal gene in Module 10 [21, 22]. It encodes the cardiac actin protein that is an essential structural component of the thin filaments of sarcomeres. Mutations in ACTC1 seem to reduce affinity of actin for myosin, and cause various CHD phenotypes such as ASD (atrial septal defect) and VSD (ventricular septal defect) [22]. ACTC1 contains all GO terms significantly enriched (p-value<0.05) in Module 10, demonstrating the functional relevance of Module 10 to CHD. Module 12 (Figure 2E) ranks 2nd, and is enriched in negative regulation of apoptosis, programmed cell death, cell differentiation and developmental process. Module 12 is also highly enriched in Notch signaling pathway, which plays roles in various developmental processes by controlling cell fate decisions, and is known to be very important in cardiac development. Module 12 contains 2 causal genes, NOTCH1 [23] and JAG1 [24], again, both of which participate in all of the 40 top enriched GO processes (p-value<1.6E-4). Table 2 also suggests that in each module, CHD genes participate in most of its top ranked GO terms, and the functional of the module is very similar to its contained CHD genes. This can also prove the functional coherence within each module, and functional correlation between these modules.

Table 2 Enriched GO terms in each module.

We have shown that modules with highest scores, being in the central place of the subnetwork, are enriched in core processes related to heart development, and that CHD genes participate in all the top ranked GO processes in a module. We then investigated the modules with lowest scores in the marginal place of subnetwork and identified their relations with CHD. Module 4 (Additional File 5) is ranked as the last, and is enriched in phosphorus metabolic process. Module 9 (Additional File 5) is enriched in cell communication, response to steroid hormone stimulus, lipid metabolic process and positive regulation of transcription. Module 5 (Additional File 5) is enriched in regulation of macromolecule biosynthetic process and nitrogen compound metabolic process. Although these GO processes seem to have less relationship with CHD, these auxiliary roles are important to facilitate key processes of heart development. Considering the fact that fetal heart developmental program involves intense transcription regulation, ligand-receptor interactions and signaling transduction, various macromolecules including hormones, cytokines and growth factors in the circulation or in the extracellular space of the heart, acting as ligands, can stimulate receptors in the cell membrane of cardiac cell [2], it is reasonable to assume that cell communication, transcription regulation, steroid hormone stimulus and macromolecules metabolism are more generally associated with CHD and can actually facilitate all related processes. Since they are not specific to CHD, they appear on the marginal place of the subnetwork. The fact that these modules can also contain causal disease genes demonstrates their relevance to CHD. For example, GATA4 and TBX5 are in Module 9, and both of them are important transcription factors in developmental processes. TBX5 protein associates with cardiac transcription factors including GATA4 and NKX2-5, and they activate many downstream cardiac effector targets such as sarcomeric proteins and vasoactive proteins. Various kinds of mutations in GATA4 and TBX5 can lead to various subtypes of CHD [25, 26].

Module-pathway crosstalk analysis reveals hub modules regulating key pathways of CHD

To analyze pathways represented by modules, and how they are coordinated by modules to account for the observed phenotypes, we went beyond identifying lists of pathways significantly enriched in each module, or simply counting the number of overlapping genes shared by the module and the pathway. As previously shown, there is a distinction between a pathway and a module. A pathway is a specific information-flow conduit, consisting of a series of molecular interactions while a module is an information-processing unit with self-contained cellular functions [17]. Therefore, modules can contain multiple pathways while pathways can be coordinated by various modules to allow inter-module connections. With this in mind, we implemented an analysis of module-pathway crosstalk similar to previous module interaction procedure. For a given module-pathway pair, we considered both common genes and weighted protein-protein interactions between them (See Methods for details of constructing module-pathway crosstalk network). Network view of module-pathway crosstalk is shown in Figure 6A, where circles represent modules, rectangles represent pathways, and edge width corresponds to strength of interaction between module and pathway. Heatmap of pathway-module crosstalk is shown in Figure 6B, where rows represent 85 pathways that are significantly (empirical p-value<0.01) influenced by at least one module, and columns represent modules and their influence to the 85 pathways. Module 10 is the most influential and many of its influenced pathways are closely related to heart progression, such as cardiac muscle contraction, Dorso-ventral axis formation, gap junction and regulation of actin cytoskeleton. Specifically, the gap junction is very important in cardiac muscle, through which the signal to contract is efficiently passed, allowing the heart muscle cells to contract in tandem. For example, GJA1 encodes a gap junction protein, and gene conversion in GJA1 has been found in patients with CHD [27]. In brief, these influenced pathways are consistent with major roles of Module 10 in biological processes like anatomical structure morphogenesis and cytoskeleton organization. Module 12 is the only one that can influence Notch signaling pathway, which is also consistent with its GO enrichment. Module 12 and module 2 are the only two modules that regulate ECM (extracellular matrix)-receptor interaction, a very important pathway in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function. The extracellular matrix protein, FLNA, for example, can cross-links actin filaments and participates in anchoring actin cytoskeleton to membrane proteins. Loss-of-functions mutations in FLNA are lethal to male and cause various CHD-related phenotypes in female [28, 29]. Although Module 12 and Module 2 do not affect as many pathways as Module 10 does, their dominant roles in regulating two essential pathways demonstrate their substantial importance during heart development. From Figure 6B, Module 7 also seems important because it interacts with many pathways, while we find that it has much less influence to pathways highly related to CHD. Compared with Module 10, it shows significant drop for many essential pathways related to CHD such as cardiac muscle contraction, Dorso-ventral axis formation, gap junction and regulation of actin cytoskeleton, while some pathways without direct relations to CHD such as Type I diabetes mellitus and Olfactory transduction, exhibit significant increase. Therefore, we reasoned that the importance of a module should be evaluated not by the number of pathway that it affected, but by its influence to important pathways implicated in CHD.

Figure 6
figure 6

Module-pathway crosstalk. In both subfigures, only pathways which contain at least one gene on the CHD subnetwork are shown. (A) Network view of module-pathway crosstalk. Blue circles represent modules and green rectangles represent pathways. Edge width corresponds to strength of interaction. (B) Heatmap of module-pathway crosstalk. For clarity, module-pathway activity matrix is x-scaled, which means, for each pathway, colors ranging from blue to red represent the influential power of one particular module compared with all 12 modules.

Prioritization of candidate disease genes by module analysis

We have proved the robustness of our modules for disease phenotype classification, investigated their functional roles during heart development, their implications in CHD, and their interactions with each other as well as with those documented pathways. We also provided an application of identified dysfunctional modules to detect disease genes. We brought out a list of prioritized candidate disease genes by integrating module memberships, network centrality analysis and GO semantic similarities (See Additional File 6 for detailed methods). We first identified candidate genes that are either on the shortest paths connecting 14 causal disease genes or the first interacting neighbors of them in the CHD subnetwork, then assigned a score to each of them with investigation of its module membership and its GO semantic similarity with the putative disease genes in the same module, and at last filtered genes with significant different network locations (Additional File 6), which resulted in a final list of 60 candidate disease genes. The top ten candidate genes are shown in Table 3 and the full list of candidate disease genes are shown in Additional File 6.

Table 3 Top 10 candidate disease genes and supporting evidence.

HAND2 has significantly higher score than other candidate genes and ranks 1st in our list. HAND2, and causal disease gene HAND1, whose somatic mutations have been reported to contribute to CHD [30], both belong to basic helix-loop-helix (bHLH) transcription factors, and are expressed in cardiac mesoderm during embryogenesis. HAND1 expression is limited in future left ventricle while HAND2 expression is limited in future right ventricle [31]. Targeted gene deletion of HAND2 in mouse embryos resulted in embryonic lethality from heart failure [32]. In a recent research, Shen et al. [33] screened for mutations in the HAND2 genes in 131 patients with various forms of CHD. Seven mutations in HAND2 were identified in 12 out of these patients. It is very likely that HAND2 is indeed a CHD disease gene. FOS is ranked 2nd, and can form transcription factor complex AP-1 with proteins from the JUN family. FOS proteins are implicated as regulators of cell proliferation, differentiation and transformation. Literature text-mining using GeneCards Version 3 [34] reveals that FOS is involved in cardiac hypertrophy. NOTCH2 ranked 3rd, and together with NOTCH1, whose autosomal dominant mutations can cause CHD [23], belong to single-pass transmembrane receptors that regulate many developmental pathways. Mutations in gene encoding NOTCH2 leads to Alagille syndrome [35], with cardiac phenotypes of peripheral pulmonary artery stenosis and septal defects. Besides genes with high score and well supported by literature search, our candidate gene list also contains genes with low scores. Currently, these genes have fewer, if any, favorable GO evidence similar to disease genes in the module, but some of them would be potential novel disease genes and worth further experimental research.


It has long been noted that genetic, physical interaction and transcriptional expression data are complementary to each other in response to molecular perturbations [3638]. We also found that not all CHD genes are differentially expressed and that none of them can be the target gene. Therefore, using a source-sink relationship is better than simply using disease genes as seeds for module identification in that the former regards the whole transcritome as phenotype data while the later merely utilizes disease/control status as phenotype. Actually, it was pointed out previously that a gene-disease phenotype association provides little insight into the molecular mechanism for the association [10, 39]. Bridging the discordance between CHD genes and target genes through a source-sink approach can not only identify intermediary genes not discovered in the transcriptional data itself, but also reveal dysfunctional or even causal pathways and modules for CHD. In our approach, information from the source gene will propagate its effect via protein-protein interactions, and DE genes which cover the majority of patients represent common dysregulated pathways in CHD. It is reasoned the common pathways, regardless of various kinds of source mutations, are responsible for the final set of similar CHD phenotypes. One of the applications of methods like ours is that by identifying intermediary genes involved in signal transduction, it can suggest potential intervention nodes for drug targeting.

Our method simulated the causality-response relationship between the known disease genes and differentially expressed genes. The dysfunctional information transferring between them in the network is utilized to identify the disease modules, which have proved to be highly related to the phenotypes. The robust performance has to be evaluated for our approach in computing information score -- we identify the shortest paths from a source gene to all target genes, and compute information score in this subnetwork, which means that the information score for each gene in this subnetwork is only related to a particular source gene. The choice of subsets of source genes changes available subnetworks, and can influence the result during the module identification procedure, in which a gene is reassigned to the modules which maximize its information score while maintaining connectivity. To evaluate the robustness and reproducibility of modules when some known disease genes are withheld (Additional File 6), we generated 100 alternative sets of testing modules and computed the overlap with the reference modules. The modules have significantly consistent overlap with testing modules (Additional File 6). This indicates that our methods are robust to identify these modules when we eliminate some source genes. The details of the tests and results are available in Additional File 6. From these dysfunctional modules, we also identified some candidate CHD genes which are based on the score of associating with the causal disease genes.


Using a network-based approach, we identify dysfunctional modules and disease genes in CHD by modeling the information flow from source disease genes to targets of differentially expressed genes. Several other studies also inferred causal genes, dysregulated pathways and central nodes of a network using flow-based approach [11, 15, 38]. However, we went a step further and brought a group of dysfunctional modules. By considering only one source node and multiple target nodes in each circuit flow instance, and iterating over all source nodes, we identified 12 dysfunctional modules which are well validated by various types of statistical measures, an independent dataset and biological annotations. Although not required by our method, each module contains CHD genes, making it possible to prioritize candidate CHD genes using a "guilt-by-association" approach. The identified dysfunctional modules show their biomarker properties for CHD phenotypes and these candidate disease genes will benefit further research.



GSE26125 consists of RNA extracted from right ventricle of 16 idiopathic TOF (Tetralogy of Fallot) patients and 5 controls obtained at the time of reconstructive surgery. The microarrays were CodeLink Human Whole Genome Bioarray, which contain > 54,000 probes. Raw data was preprocessed as described in the original paper [8, 9]. The microarrays of GSE14970 were Affymetrix Human Genome U133 Plus 2.0 Array. Raw data was preprocessed using RMA algorithm in R bioconductor [40]. Probe sets were mapped to NCBI entrez genes, and for genes with multiple probe sets, the average expression value of all corresponding probe sets was used. Together 8547 genes were expressed in the right ventricle tissues on GSE26125, which will be used for our study. 6818 genes were expressed in both GSE26125 and GSE14790, whose expression values will be used for cross-data validation. GSE14970 consists of 12 diseased samples (5 pre-operation acyanotic TOF and 7 pre-operation cyanotic TOF in the RV tissue) without controls but is the only alternative CHD-related dataset at this time, we added 5 control samples from GSE26125 and z-transformed the combined dataset.

Conditional-specific interaction network

We utilized protein-protein interaction data from several curated interaction databases in human, i.e., HPRD, BIND, BioGrid, IntAct, and MINT [4145]. Those interactions contained in at least three of the five databases were selected. Only genes both in the PPI network and in the microarray were used in the following study, and we extracted the maximum connected component of the network, which consists of 4761 nodes and 18084 edges. To construct a context-specific protein interaction network, the weight of each interaction between protein u and v, as well as the conductance of each edge e(u,v), was defined as their correlation value, i.e. w(u,v) = |corr(u,v)|

Module identification

Source CHD genes were extracted from a recent literature review [2], and 14 of them were mapped to the conditional-specific protein interaction network. Similar to the identification of Differentially Expressed (DE) genes and target genes in previous work [11], we have scaled and centered gene expression values into a Z-score. A gene is defined to be a differentially expressed gene in a disease case if its normalized gene expression value has a p-value <0.01 in the given disease case versus all controls using t-test, and a gene is defined as a target gene, if it has p-value<0.01 in more than 75% disease cases. Based on the definition of information flow score in previous work [15], we computed the information flow score of node k for a given source gene s as the sum of current through node k among all pair-wise combinations of the source s and all target DE genes Tar, detailed rationales for this information flow score can be referred to the original paper [15]. There are together 14 source genes, and the information flow through node k for a given source s is

I k = 1 2 t T a r N e i I k N e i ,

where I kNei is the current between node k and its neighbor, ∑ Nei is the sum of all neighbors connected to node k.

For a given pair of source node s and a target node t(tTar), we have the following equation for node k(kt) according to Kirchhoff's current law:

N e i v N e i - v k R k N e i + I k = X ,

where v k is the voltage at node k, and the sum ∑ Nei is for all direct neighbors of node k. X is a unit value of current when k = s and 0 otherwise. Let G=(N, E) represent the protein interaction network where N is the set of nodes and E is the set of edges, node voltages V can be computed by the following linear system of equations according to Ohm's law:

P × V = I ,

where P is a symmetric (N-1)×(N-1) conductance matrix, V is a (N-1)×1 vector of node voltages and I is a (N-1)×1 vector of currents passing through the nodes, the row and column of ground node is removed since its voltage is zero.

Such a linear system considered all interactions in the global network and required that every interaction can have a regulatory role for the expression of the target gene. We therefore implemented a straightforward heuristic approach: we identified the nodes that are on the maximum weighted shortest paths from a source gene to all target genes and performed the calculations of information flow score on this subnetwork. The modules were identified using the following algorithms.

Pesudocode for identifying modules is shown below:

  1. 1.

    For each causal disease gene si,

    1. a.

      For each target DE gene t j ,

      1. i.

        identify all shortest paths from si to t j ,

      2. ii.

        choose the path with maximum sum of weight,

    2. b.

      merge all nodes on the chosen paths into a subnetwork Sub i , for si,

    3. c.

      compute the information flow score I k for each node in Sub i .

  2. 2.

    For each gene on the subnetworks, put it into the module M m where its information flow score is maximum,

  3. 3.

    For each module M m , identify the unconnected genes. After extracting the unconnected genes from all modules, simultaneously put each of them into another module M n where the information score is the second largest.

  4. 4.

    Repeat step 3 until all modules are interconnected.

  5. 5.

    For genes that cannot connect to any module in which it has an information flow score, regard each of their connected components as a new module.

The source code for module identification and related data in this study can be downloaded at

Mutual information

For each of the identified dysfunctional modules, we quantified its discriminative power by an information-theoretic scheme [13]. Various methods have been developed to infer expression activity from a group of functionally related genes. Just as reasoned in previous research [46] that not all genes within a pathway are highly discriminative and that using a subset of "condition-responsive genes" achieves better discriminative power than conventional gene and pathway based approach, we defined the differentially expressed (t-test, corrected p-value<0.05) genes as those that can represent the synergistic activity of a module or a pathway. For these differentially expressed genes, let a(M) denote the preprocessed and z-transformed m-dimensional microarray expression vector across 16 disease and 5 control samples on GSE26125. Let c denote an m dimensional binary vector representing the phenotype class of each sample, and c i = 1 if the ith sample is disease, 0 if it is control. The aggregated expression activity vector a(M) for the module induced by its differentially expressed genes is

a M = g M k a g k .

We defined the discriminative score S(M) as MI (a'(M),c), and the mutual information MI between a', a discretized form of a, and c as

S M = M I a M , c = x a y c p x , y log p ( x , y ) p ( x ) p ( y ) .

Here, a' is obtained by discretizing a into [log2(# of samples)+1] = 6 equally spaced bins [13], and the midpoint of each bin is used as the value for a'. × and y enumerate values of a and c, p(x,y) is the joint probability density function (pdf) of a' and c, and p(x) and p(y) are the marginal pdf's of a and c.

To get empirical p-value for the discriminative score, we also generated N random gene sets of the same size as each module R, aggregated their genes' expression vectors a(R) and computed their MI with phenotype S(R):

empirical  p  - value =  the number of  S R > S M N ,

where N is set as N = 1000.

Phenotype classification comparison

To evaluate the ability of our modules for discriminating phenotypes, we performed both within-dataset and cross-dataset classification validation. To allow for comparison with pathways and random controls, we aggregated the expression values of DE genes in each module and in each pathway from the CHD subnetwork. We also aggregated the expression of random gene sets with the same size as the module. The activity vector aggregation method is the same as in the former Mutual Information section. Modules were ordered in decreasing significance of MI, pathways were ordered in decreasing significance of enrichment and random gene sets were ordered the same sequence as modules. Then, logistic regression models were trained on the module activity matrix (modules versus samples), pathway activity matrix (pathways versus samples) and random gene sets (gene sets versus samples). For within-dataset experiments, each time 4/5 of the samples were implemented as the training set to build the classifier and the remaining 1/5 samples were used as testing set to evaluate the performance (five-fold cross validation). Assuming that there are K modules, then for each kK, select the first kth modules to train the classifier. Pathways and random gene sets were also used in similar manner. The final classification performance was reported as the Area Under ROC Curve (AUC [47]) on the testing set using the classifier optimized from the validation set. For cross-dataset experiment, all 12 modules/pathways/random gene sets were trained on GSE26125 and validated on 12 disease samples from GSE14970 combined with 5 controls from GSE26125. All preprocessed microarray expressions were z-transformed before activity vector is aggregated. For both within- and cross-dataset experiments, random controls were repeated 100 times.

Correlation and crosstalk

Previous work [48] has identified crosstalk of pathways based on overlapping genes and weighted interactions, we used similar metrics to measure module coordination and module-pathway crosstalk in the present study. For each module-module pair X and Y, we firstly identified the edges with two end nodes, each of which belongs to one module, and then we summed up the weights of all these connecting edges using

s c o r e X , Y = x X y Y w e i g h t x , y .

For module-pathway crosstalk, we only considered the pathway which contains at least one gene on the CHD subnetwork in order to filter out unspecific pathways. For each module-pathway pair M and P, we firstly identified the edges with two end nodes, each belonging to the module and the pathway, and detected the nodes in both the module and the pathway. Then we summed up the weights edges and number of nodes using

s c o r e M , P = x M , y P w e i g h t e x , y + x M P 1 M P .

where ∑ x M ,yP|weight(e(x,y))| corresponds to the sum of the weight of all edges, and {e(x,y):xM,yP} are those edges that connect a pathway P and a module M. x M P 1 corresponds to the total number of common nodes shared by M and P, where each node is given the weight equals to 1. M P corresponds to the square root product of the number of nodes in module M and pathway P. The summation is such divided in order to normalize the size of module and pathway.

To get empirical p-value, we generated 1000 gene sets pairs containing the same number of nodes as the module-module or module-pathway pair, and computed their interaction scores. Random controls were generated from genes on the whole weighted PPI network, and genes in the KEGG. The significance of interaction is defined as

empirical  p  - value =  the number of  S R > S M N 2 ,

where N2 is set as N2 = 1000.


  1. Pierpont ME, Basson CT, Benson DW, Gelb BD, Giglia TM, Goldmuntz E, McGee G, Sable CA, Srivastava D, Webb CL, et al: Genetic basis for congenital heart defects: current knowledge: a scientific statement from the American Heart Association Congenital Cardiac Defects Committee, Council on Cardiovascular Disease in the Young: endorsed by the American Academy of Pediatrics. Circulation Research. 2007, 115 (23): 3015-3038. 10.1161/CIRCULATIONAHA.106.183056.

    Article  Google Scholar 

  2. Wessels MW, Willems PJ: Genetic factors in non-sunbdormic congenital heart malformations. Clinical Genetics. 2010, 78: 103-123. 10.1111/j.1399-0004.2010.01435.x.

    Article  CAS  PubMed  Google Scholar 

  3. Bruneau BG: The developmental genetics of congential heart disease. Nature Insight Review. 2008, 451 (7181): 943-948. 10.1038/nature06801.

    CAS  Google Scholar 

  4. Cecchetto A, Rampazzo A, Angelini A, Bianco LD, Padalino M, Stellin G, Daliento L: From molecular mechanisms of cardiac development to genetic substrate of congenital heart diseases. Future Cardiol. 2010, 6 (3): 373-393. 10.2217/fca.10.10.

    Article  CAS  PubMed  Google Scholar 

  5. Chen L, Wang RS, Zhang XS: Biomelecular Networks. 2009, John Wiley & Sons

    Book  Google Scholar 

  6. Dewey FE, Perez MV, Wheeler MT, Watt C, Spin J, Langfelder P, Horvath S, Hannenhalli S, Cappola TP, Ashley EA: Gene coexpression network topology of cardiac development, hypertrophy, and failure. Circulation: Cardiovascular Genetics. 2011, 4 (1): 26-35.

    CAS  Google Scholar 

  7. Lage K, Møllgård K, Greenway S, Wakimoto H, Gorham JM, Workman CT, Bendsen E, Hansen NT, Rigina O, Roque FS: Dissecting spatio-temporal protein networks driving human heart development and related disorders. Mol Syst Biol. 2010, 6 (381):

  8. Bittel DC, Butler MG, Kibiryeva N, Marshall JA, Chen J, Lofland GK, O'Brien JE: Gene expression in cardiac tissues from infants with idiopathic conotruncal defects. BMC Med Genomics. 2011, 5 (4): 1-

    Article  Google Scholar 

  9. Ghorbel MT, Cherif M, Jenkins E, Mokhtari A, Kenny D, Angelini GD, Caputo M: Transcriptomic analysis of patients with tetralogy of Fallot reveals the effect of chronic hypoxia on myocardial gene expression. J Thorac Cardiovasc Surg. 2010, 140 (2): 337-345. 10.1016/j.jtcvs.2009.12.055.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Suthram S, Beyer A, Karp RM, Eldar Y, Ideker T: eQED: an efficient method for interpreting eQTL associations using protein networks. Molecular Systems Biology. 2008, 4 (162):

  11. Kim YA, Wuchty S, Przytycka TM: Identifying causal genes and dysregulated pathways in complex diseases. PLoS Comput Biol. 2011, 7 (3): e1001095-10.1371/journal.pcbi.1001095.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ, Zhang CLJ, Edwards S, Sieberts SKLA, Castellini LW, Wang S, Champy MF, Zhang B, Emilsson V, Doss S, Ghazalpour A, Horvath S, Drake TA, Lusis AJ, Schadt EE: Variations in DNA elucidate molecular networks that cause disease. Nature. 2008, 452 (7186): 429-435. 10.1038/nature06757.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007, 3 (140):

  14. Doyle PG, Snell J: Random walks and electric networks. Am Math Mon. 2000, 94: 202-

    Google Scholar 

  15. Missiuro PV, Liu K, Zou L, Ross BC, Zhao G, Liu JS, Ge H: Information Flow Analysis of Interactome Networks. PLoS Comput Biol. 2009, 5 (4): e1000350-10.1371/journal.pcbi.1000350.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Huang SS, Fraenkel E: Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks. Sci Signal. 2009, 2 (81): ra40-10.1126/scisignal.2000350.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Carter GW, Rush CG, Uygun F, Sakhanenko NA, Galas DJ, Galitski T: A systems-biology approach to modular genetic complexity. Chaos. 2010, 20 (2): 026102-10.1063/1.3455183.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Agresti A: Categorical data analysis. 1990, John Wiley & Sons

    Google Scholar 

  19. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression Analysis of Human Genes Across Many Microarray Data Sets. Genome Res. 2004, 14: 1085-1094. 10.1101/gr.1910904.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Wang J, Huang Q, Liu ZP, Wang Y, Wu LY, Chen L, Zhang XS: NOA: a novel Network Ontology Analysis method. Nucl Acids Res. 2011, 39 (13): e87-10.1093/nar/gkr251.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Monserrat L, Hermida-Prieto M, Fernandez X, Rodríguez I, Dumont C, Cazón L, Cuesta MG, Gonzalez-Juanatey C, Peteiro J, Alvarez N, et al: Mutation in the alpha-cardiac actin gene associated with apical hypertrophic cardiomyopathy, left ventricular non-compaction, and septal defects. Eur Heart J. 2007, 21 (16): 1953-1961.

    Article  Google Scholar 

  22. Matsson H, Eason J, Bookwalter CS, Klar J, Gustavsson P, Sunnegårdh J, Enell H, Jonzon A, Vikkula M, Gutierrez I, et al: Alpha-cardiac actin mutations produce atrial septal defects. Hum Mol Genet. 2008, 17: 256-265.

    Article  CAS  PubMed  Google Scholar 

  23. McBride K, Riley MF, Zender GA: NOTCH1 mutations in individuals with left ventricular outflow tract malformations reduce ligand-induced signaling. Hum Mol Genet. 2008, 17: 2886-2893. 10.1093/hmg/ddn187.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Krantz ID, Smith R, Colliton RP, Tinkel H, Zackai EH, Piccoli DA, Goldmuntz E, Spinner NB: Jagged1 mutations in patients ascertained with isolated congenital heart defects. Am J Med Genet. 1999, 84 (1): 56-60. 10.1002/(SICI)1096-8628(19990507)84:1<56::AID-AJMG11>3.0.CO;2-W.

    Article  CAS  PubMed  Google Scholar 

  25. Basson CT, Bachinsky DR, Lin RC, Levi T, Elkins JA, Soults J, Grayzel D, Kroumpouzou E, Traill TA, Leblanc-Straceski J, et al: Mutations in human TBX5 cause limb and cardiac malformation in Holt-Oram syndrome. Nat Genet. 1997, 15 (1): 30-35. 10.1038/ng0197-30.

    Article  CAS  PubMed  Google Scholar 

  26. Rajagopal SK, Ma Q, Obler D, Shen J, Manichaikul A, Tomita-Mitchell A, Boardman K, Briggs C, Garg V, Srivastava D, et al: Spectrum of heart disease associated with murine and human GATA4 mutation. J Mol Cell Cardio. 2007, 43 (6): 677-685. 10.1016/j.yjmcc.2007.06.004.

    Article  CAS  Google Scholar 

  27. Dasgupta C, Martinez AM, Zuppan CW, Shah MM, Bailey LL, Fletcher WH: Identification of connexin 43 (alpha1) gap junction gene mutations in patients with hypoplastic left heart syndrome by denaturing gradient gel electrophoresis (DGGE). Mutat Res. 2001, 479 (1-2): 173-186.

    Article  CAS  PubMed  Google Scholar 

  28. Kyndt F, Gueffet JP, Probst V, Jaafar P, Legendre A, Le Bouffant F, Toquet C, Roy E, McGregor L, Lynch SA, et al: Mutations in the gene encoding filamin A as a cause for familial cardiac valvular dystrophy. Circulation Research. 2007, 115 (1): 40-49.

    Article  CAS  Google Scholar 

  29. Sheen VL, Dixon PH, Fox JW, Hong SE, Kinton L, Sisodiya SM, Duncan JS, Dubeau F, Scheffer IE, Schachter SC, et al: Mutations in the X-linked filamin 1 gene cause periventricular nodular heterotopia in males as well as in females. Hum Mol Genet. 2001, 10 (17): 1775-1783. 10.1093/hmg/10.17.1775.

    Article  CAS  PubMed  Google Scholar 

  30. Reamon-Buettner SM, Ciribilli Y, Inga A, Borlak J: A loss-of-function mutation in the binding domain of HAND1 predicts hypoplasia of the human hearts. Hum Mol Genet. 2008, 17 (10): 1397-1405. 10.1093/hmg/ddn027.

    Article  CAS  PubMed  Google Scholar 

  31. Firulli AB: A HANDful of questions: the molecular biology of the heart and neural crest derivatives (HAND)-subclass of basic helix-loop-helix transcription factors. Gene. 2003, 17 (312): 27-40.

    Article  Google Scholar 

  32. Srivastava D, Thomas T, Lin Q, Kirby ML, Brown D, Olson EN: Regulation of cardiac mesodermal and neural crest development by the bHLH transcription factor, dHAND. Nat Genet. 1997, 16 (2): 154-160. 10.1038/ng0697-154.

    Article  CAS  PubMed  Google Scholar 

  33. Shen L, Li XF, Shen AD, Wang Q, Liu CX, Guo YJ, Song ZJ, Li ZZ: Transcription factor HAND2 mutations in sporadic Chinese patients with congenital heart disease. Chin Med J (Engl). 2010, 123 (13): 1623-1627.

    CAS  Google Scholar 

  34. Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, Nativ N, Bahir I, Doniger T, Krug H, et al: GeneCards Version 3: the human gene integrator. Database (Oxford). 2010, baq020-

    Google Scholar 

  35. McDaniell R, Warthen DM, Sanchez-Lara PA, Pai AKI, Piccoli DA, Spinner NB: NOTCH2 mutations cause Alagille syndrome, a heterogeneous disorder of the notch signaling pathway. Am J Hum Genet. 2006, 79 (1): 169-173. 10.1086/505332.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Haugen AC, Kelley R, Collins JB, Tucker CJ, Deng C, Afshari CA, Brown JM, Ideker T, Van Houten B: Integrating phenotypic and expression profiles to map arsenic-response networks. Genome Biol. 2004, 5 (12): R95-10.1186/gb-2004-5-12-r95.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Birrell GW, Brown JA, Wu HI, Giaever G, Chu AM, Davis RW, Brown JM: Transcriptional response of Saccharomyces cerevisiae to DNA-damaging agents does not identify the genes that protect against these agents. Proc Natl Acad Sci USA. 2002, 99 (13): 8778-8783. 10.1073/pnas.132275199.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Yeger-Lotem E, Riva L, Su LJ, Gitler AD, Cashikar AG, King OD, Auluck PK, Geddie ML, Valastyan JS, Karger DR, et al: Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nature Genetics. 2009, 41 (3): 316-323. 10.1038/ng.337.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, Guhathakurta D, Sieberts SK MS, Reitman M, Zhang C, Lum PY, Leonardson A, Thieringer R, Metzger JM, Yang L, Castle J, Zhu H, Kash SF, Drake TA, Sachs A, Lusis AJ: An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet. 2005, 37 (7): 710-717. 10.1038/ng1589.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80.-

    Article  PubMed  PubMed Central  Google Scholar 

  41. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13 (10): 2363-2371. 10.1101/gr.1680803.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E: The Biomolecular Interaction Network Database and related tools - 2005 update. Nucleic Acids Res. 2005, D418-D424. 33 Database

  43. Stark C, Breitkreutz B, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34: D535-D539. 10.1093/nar/gkj109.

    Article  CAS  PubMed  Google Scholar 

  44. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A: IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004, D452-D455. 32 Database

  45. Ceol A, Chatr Aryamontri A, Licata L, Peluso D, Briganti L, Perfetto L, Castagnoli L, Cesareni G: MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 2010, D532-539. 38 Database

  46. Lee E, Chuang H-Y, Kim J-W, Ideker T, Lee D: Inferring Pathway Activity toward Precise Disease Classification. PLoS Comput Biol. 2008, 4 (11): e1000217-10.1371/journal.pcbi.1000217.

    Article  PubMed  PubMed Central  Google Scholar 

  47. Swets JA, Dawes R, Monahan J: Psychological Science Can Improve Diagnostic Decisions. Psychological Science in the Public Interest 1. 2000

    Google Scholar 

  48. Liu ZP, Wang Y, Zhang XS, Chen L: Identifying dysfunctional crosstalk of pathways in various regions of Alzheimer's disease brains. BMC Syst Biol. 2010, 4 (Suppl 2): S11-10.1186/1752-0509-4-S2-S11.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 31100949, No. 61072149 and No. 91029301; by Shanghai NSF under Grant No. 11ZR1443100; and by the Chief Scientist Program of Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) with Grant No. 2009CSP002; by the Knowledge Innovation Program of SIBS of CAS with Grant No. 2011KIP203; by SA-SIBS Scholarship Program; by Shanghai Pujiang Program; This research was also partially supported by National Center for Mathematics and Interdisciplinary Sciences, CAS; and by the Aihara Project, the FIRST program from JSPS, initiated by CSTP.

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Zhi-Ping Liu or Luonan Chen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

D.H., Z.P.L. and L.C. conceived the research, Z.P.L. and D.H. designed the framework and experiments, D.H. and Z.P.L. performed the experiments, D.H. and Z.P.L. drafted the manuscript, L.C. revised the manuscript and supervised the project. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1:List of selected target genes and sample distance of 21 expression profiles. (DOC 174 KB)

Additional file 2:Genes in each module. (DOC 438 KB)


Additional file 3:Pathways enriched in CHD subnetwork using hypergeometric test. Top 12 pathways are used for classification evaluation. (DOC 94 KB)

Additional file 4:List of enriched GO terms in each module. (XLS 130 KB)

Additional file 5:Network view of Modules 4, 5 and 9. (PDF 14 KB)


Additional file 6:Details of prioritizing of candidate disease genes by module analysis and robustness analysis of identified modules. (DOC 168 KB)

Authors’ original submitted files for images

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and Permissions

About this article

Cite this article

He, D., Liu, ZP. & Chen, L. Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach. BMC Genomics 12, 592 (2011).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: