
A novel algorithm for alignment of multiple PPI networks based on simulated annealing

Abstract

Proteins play essential roles in almost all life processes. The prediction of protein function is significant for the understanding of molecular function and evolution. Network alignment provides a fast and effective framework to identify functionally conserved proteins automatically and systematically. However, with the rapid growth of genomic, interaction and annotation data, there is an increasing demand for more accurate and efficient tools to deal with multiple PPI networks. Here, we present a novel global alignment algorithm, NetCoffee2, based on graph feature vectors to discover functionally conserved proteins and predict the function of unknown proteins. To test the algorithm performance, NetCoffee2 and three other notable algorithms were applied to eight real biological datasets. Functional analyses were performed to evaluate the biological quality of these alignments. Results show that NetCoffee2 is superior to the existing algorithms IsoRankN, NetCoffee and multiMAGNA++ in terms of both coverage and consistency. The binary and source code are freely available under the GNU GPL v3 license at https://github.com/screamer/NetCoffee2.

Introduction

Protein function is a fundamental problem that attracts many researchers in the fields of both molecular function and evolution. Proteins are involved in almost all life processes and pathways. Although many researchers have put a great deal of effort into developing public protein annotation databases, such as Uniprot [1], NCBI protein, RCSB PDB [2] and HPRD [3], the task of protein characterization is far from complete. Thanks to the development of next-generation sequencing [4], computational methods have become a major force in discovering molecular function and phylogenetic relationships [5–17].

Global network alignment provides an effective computational framework to systematically identify functionally conserved proteins from a global node map between two or more protein-protein interaction (PPI) networks [18–20]. An alignment of two networks is called a pairwise network alignment [21, 22]. An alignment of more than two networks is termed a multiple network alignment [23–25]. The node map of a network alignment is a set of matchsets, each of which consists of a group of nodes (proteins) from the PPI networks [24]. There are two types of node maps: one-to-one and multiple-to-multiple. In a one-to-one node map, one node can match at most one node in another network [26]. In a multiple-to-multiple map, each matchset can contain more than one node from the same network. With a global network alignment, one can predict the function of unknown proteins by "transferring annotation" from annotated proteins in the same matchset.

IsoRank was the first algorithm proposed to solve global network alignment; it takes advantage of a method analogous to Google's PageRank [27]. An updated version, IsoRankN, was proposed to perform multiple network alignment based on spectral clustering on the induced graph of pairwise alignment scores [28]. Guided by T-Coffee [29], a fast and accurate program, NetCoffee [30], was developed to search for a global alignment using a triplet approach. However, it cannot perform pairwise network alignment. There are four major steps in the program: 1) the construction of PPI networks and bipartite graphs; 2) the weight assignment based on a triplet approach; 3) the selection of candidate match edges; 4) optimization with simulated annealing. To improve edge conservation, a genetic algorithm, MAGNA, was proposed, which mimics the evolutionary process [26]. It starts with an initial population of members. Each member is an alignment. Two members can produce a new member with a crossover function. A fitness function was designed to evaluate the quality of alignments in each generation. MAGNA++ speeds up the MAGNA algorithm by parallelizing it to automatically use all available resources [31]. A more advanced version, multiMAGNA++, was applied to find alignments for multiple PPI networks [32]. However, there still exists a gap between network alignment and the systematic prediction of unknown protein function, due to the large number of molecular interactions and the limitation of computational resources.

Here, we present a novel network alignment algorithm NetCoffee2 based on graph feature vectors to identify functionally conserved proteins. A target scoring function was used to evaluate the quality of network alignment, which integrates both topology and sequence information. Unlike NetCoffee, NetCoffee2 can perform tasks of both pairwise and multiple network alignments. Furthermore, it outperforms existing alignment tools in both coverage and consistency. It includes three major steps: 1) calculation of sequence similarities for pairs of nodes; 2) calculation of topological similarities; 3) maximizing a target function using simulated annealing.

Definition and notation

Network alignment is the problem of searching for a global node mapping between two or more networks. Suppose there is a set of PPI networks {G1,G2,...,Gk}, k≥2. Each network can be modeled as a graph Gi={Vi,Ei}, where Vi and Ei represent the proteins and interactions appearing in the network. A matchset consists of a subset of proteins from \(\bigcup _{i=1}^{k} V_{i}\). A global network alignment is a set of mutually disjoint matchsets from a set of PPI networks. Note that each protein can appear in only one matchset in a global alignment solution. Each matchset represents a functionally conserved group of proteins. Pairwise network alignment aims to find an alignment for two PPI networks, whereas multiple network alignment aims to find an alignment for more than two PPI networks. Unlike the previous algorithm NetCoffee, our updated version NetCoffee2 can be applied to both pairwise and multiple network alignment.
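The disjointness requirement in this definition can be illustrated with a minimal sketch. The encoding of a protein as a (network, protein) tuple, the function name `is_valid_alignment` and the toy identifiers are assumptions made for illustration; they are not taken from the NetCoffee2 implementation.

```python
from itertools import chain

def is_valid_alignment(matchsets):
    """Return True if no protein appears in more than one matchset.

    Each matchset is a set of (network_id, protein_id) tuples; this
    encoding is an assumption made for illustration only.
    """
    all_nodes = list(chain.from_iterable(matchsets))
    return len(all_nodes) == len(set(all_nodes))

# Toy alignment of two hypothetical networks G1 and G2.
alignment = [
    {("G1", "a1"), ("G2", "b1")},                # one protein per network
    {("G1", "a2"), ("G2", "b2"), ("G2", "b3")},  # multiple-to-multiple matchset
]
print(is_valid_alignment(alignment))
```

An alignment in which the same protein occurs in two matchsets would be rejected by this check.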

Method

An integrated model

Sequence information is one of the important factors in characterizing the biological function of genes, RNA and proteins [33]. For example, proteins of a typical family not only share common sequence regions, but also play similar roles in biological processes, molecular function and cellular component. As only a small fraction of a protein sequence is in the functional region, a sequence-based similarity measure is insufficient for the annotation of protein function [34]. PPI network topology can provide complementary information for the prediction of protein function. As in many other network aligners such as IsoRank, Fuse [35] and MAGNA, both topology and sequence information are integrated in one similarity measure to search for functionally conserved proteins across species. There are two basic assumptions underlying this methodology: 1) sequence similarity implies functional conservation; 2) functions are encoded in the topological structure of PPI networks.

Sequence-based similarity

Guided by the assumption that structure determines function, most existing network aligners use both amino acid sequences and network topology to predict protein functions. Here, we performed an all-against-all sequence comparison using BLASTP [36] on all protein sequences. Protein pairs with significantly conserved regions are retained for further filtering. Note that the e-value is an input parameter that controls the coverage of the network alignment. Let Ω denote the set of candidate homologous protein pairs. Given a protein pair u and v, the sequence similarity \(s_{h}(u,v)\) is calculated by the formula \(s_{h}(u, v)=\frac {\varepsilon (u, v)-\varepsilon _{min}(u, v)}{\bigtriangleup \varepsilon }\). Here, ε(u,v) can be the log(evalue) or the bitscore of the protein pair u and v, and \(\bigtriangleup \varepsilon \) is the largest difference between any two pairs of homologs in Ω, \(\bigtriangleup \varepsilon =\varepsilon _{max}(u,v)-\varepsilon _{min}(u,v)\), which serves as a normalization factor. The most similar pair scores 1 and the least similar pair scores 0.
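The normalization above amounts to min-max scaling over the candidate set Ω. A minimal sketch follows, assuming ε(u,v) is a BLAST bit score (so that higher means more similar); the function name, the dictionary layout and the example scores are hypothetical.

```python
def normalized_sequence_similarity(scores):
    """Min-max normalize the raw scores of the candidate pairs in Omega.

    `scores` maps a protein pair (u, v) to its raw score epsilon(u, v),
    assumed here to be a bit score (an illustrative choice).
    """
    eps_min = min(scores.values())
    eps_max = max(scores.values())
    delta = eps_max - eps_min  # the normalization factor
    if delta == 0:
        # Assumption: if all candidates score the same, treat them as equally similar.
        return {pair: 1.0 for pair in scores}
    return {pair: (eps - eps_min) / delta for pair, eps in scores.items()}

# Hypothetical bit scores for three candidate pairs.
omega = {("a1", "b1"): 250.0, ("a2", "b2"): 100.0, ("a3", "b3"): 50.0}
s_h = normalized_sequence_similarity(omega)
print(s_h)
```

With these toy scores the best pair maps to 1, the worst to 0, and the middle pair to (100 - 50) / 200 = 0.25.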

Topology-based similarity

As protein functions are also encoded in the topology of PPI networks, topological structure can guide us to find functionally conserved proteins. To find topologically similar protein pairs, a measure is needed that evaluates the topological similarity of each pair of nodes. The mathematical question is how to calculate the similarity of a pair of nodes that come from two different networks [37]. In IsoRank, it is calculated based on the principle that if two nodes are aligned, then their neighbors should be aligned as well. Our method works on the principle that if two nodes are aligned, then their local induced subgraphs should be similar.

Given a network G=(V,E), V={v1,v2,...,vn}, we design a 5-tuple feature vector (γ,σ,τ,η,θ) for each node in V to represent its local connections. Without loss of generality, we denote the adjacency matrix of G as \(M_{n\times n}\). Since M is real and symmetric, it has a normalized principal eigenvector \(K=(k_{1},k_{2},\ldots ,k_{n})\). In other words, K is the normalized eigenvector of the largest eigenvalue. Then \(k_{i}\), \(1\leq i\leq n\), represents the reputation of the node \(v_{i}\). The greater the reputation, the more important the node. Therefore, we use \(k_{i}\) as the first element of the 5-tuple feature vector (i.e. γ) to characterize the node \(v_{i}\). Let \(N_{v}\) denote the set of neighbors of v. We use \(|N_{v}|\) as the second element (i.e. σ) and the sum of the reputations of these nodes, \(\sum _{x\in N_{v}}k_{x}\), as the third element (i.e. τ). Let \(N_{v}^{2}\) denote the set of nodes that are two steps away from v. Note that no node in \(N_{v}^{2}\) is directly connected to v. We use \(|N_{v}^{2}|\) as the fourth element (i.e. η). The last element θ is calculated by the formula \(\frac {1}{2}\sum _{x\in N_{v}^{2}}k_{x}p_{xv}\), where \(p_{xv}\) denotes the number of shortest paths from x to v. As shown in Fig. 1a, there are two networks G1 and G2. Based on the definition stated above, the 5-tuple feature vectors of a1,a2,a3,a4,a5 in G1 are (1,3,2.63,1,0.16), (0.88,3,2.33,1,0.75), (0.33,1,0.88,2,1), (0.75,2,2,1,0.88), (1,3,2.63,1,0.16), respectively. They are the same for b1,b2,b3,b4,b5 in G2. Each element is then normalized across all nodes, as shown in Fig. 1b. With the normalized 5-tuple feature vectors, the similarity of any two nodes \(s_{t}(u,v)\) can be calculated with the Gaussian function \(s_{t}(u, v)= exp(-\frac {1}{2}x^{2})\), where x represents the Euclidean distance between the feature vectors of nodes u and v. For instance, as shown in Fig. 1a, the vectors of ai and bi are the same. Therefore, the diagonal of the similarity matrix is (1,1,1,1,1).

Fig. 1

The calculation of the similarity matrix between two networks G1 and G2. a A 5-tuple feature vector (γ,σ,τ,η,θ) was calculated for each node. Here, the vector of γ, (1,0.88,0.33,0.75,1) T, is the normalized principal eigenvector of the adjacency matrix of the graph. Vectors of σ and η are the numbers of 1-step neighbors and 2-step neighbors of each node. Vectors of τ and θ describe the influence of each node on its 1-step neighbors and 2-step neighbors. b Vectors of σ,τ,η,θ were each normalized by their maximal element. c The similarity matrix was calculated by a Gaussian-based similarity measure \(s_{t}(u,v)=exp\left (-\frac {1}{2}x^{2}\right)\). Here, u and v are a pair of nodes, and x is the Euclidean distance between the two feature vectors of u and v
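The feature vector and the Gaussian similarity can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' C++/igraph code. It makes several assumptions: the principal eigenvector is obtained by power iteration on A + I (which has the same eigenvectors as A and avoids oscillation on bipartite graphs), the eigenvector is scaled so that its largest entry is 1, every component is divided by its maximum over all nodes, the graph is connected, and the two 3-node path graphs are toy data.

```python
import math

def principal_eigenvector(adj, iterations=200):
    """Power iteration on A + I, scaled so that the largest entry is 1."""
    k = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: k[v] + sum(k[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        k = {v: x / top for v, x in nxt.items()}
    return k

def feature_vectors(adj):
    """Compute the normalized (gamma, sigma, tau, eta, theta) vector of every node."""
    k = principal_eigenvector(adj)
    raw = {}
    for v in adj:
        n1 = adj[v]
        # Nodes exactly two steps away: neighbors of neighbors, minus v and N_v.
        n2 = {x for u in n1 for x in adj[u]} - n1 - {v}
        tau = sum(k[x] for x in n1)
        # For a 2-step node x, the number of shortest x-v paths equals the
        # number of common neighbors of x and v.
        theta = 0.5 * sum(k[x] * len(adj[x] & n1) for x in n2)
        raw[v] = [k[v], len(n1), tau, len(n2), theta]
    # Divide each component by its maximum (fall back to 1.0 if the maximum is 0).
    maxima = [max(vec[i] for vec in raw.values()) or 1.0 for i in range(5)]
    return {v: [x / m for x, m in zip(vec, maxima)] for v, vec in raw.items()}

def topological_similarity(fu, fv):
    """Gaussian similarity exp(-x^2 / 2), x being the Euclidean distance."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(fu, fv)))

# Two toy path graphs with identical topology (hypothetical data).
g1 = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
g2 = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
f1, f2 = feature_vectors(g1), feature_vectors(g2)
print(topological_similarity(f1["a1"], f2["b1"]))  # identical vectors give similarity 1
print(topological_similarity(f1["a1"], f2["b2"]))  # end node vs. middle node: below 1
```

Because the two toy graphs are isomorphic, corresponding nodes have identical feature vectors, which mirrors the all-ones diagonal described for Fig. 1.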

Simulated annealing

To find an optimal network alignment, we applied a linear model to integrate both sequence and topology information. The alignment score can be formulated as \(f(\mathbbm {A})=\sum _{m\in \mathbbm {A}}s_{m}\), where \(\mathbbm {A}\) and m refer to a global alignment and a matchset, respectively. Suppose m={m1,m2,...,mv}; the alignment score of the matchset is \(s_{m}= \sum _{i=m_{1}}^{m_{v-1}} \sum _{j=i}^{m_{v}} \left [\alpha s_{h}(i,j) + (1-\alpha)s_{t}(i,j)\right ]\), that is, a weighted sum of sequence and topological similarity over pairs of proteins in the matchset. By default, α=0.5. Users can increase α to give more weight to sequence similarity and decrease α to give more weight to topological similarity. Therefore, the problem of global network alignment can be modeled as an optimization problem, which is to search for an optimal alignment \(\mathbbm {A}^{*}\), such that \(\mathbbm {A}^{*}=arg \max \limits _{\mathbbm {A}}f(\mathbbm {A})=arg \max \limits _{\mathbbm {A}}\sum _{m\in \mathbbm {A}}s_{m}\).
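The scoring model can be sketched as below. The sketch assumes that the double sum runs over unordered pairs of distinct proteins in a matchset, that similarities are stored in dictionaries keyed by sorted pairs, and that missing pairs contribute 0; these conventions and the function names are illustrative and not taken from the NetCoffee2 source.

```python
from itertools import combinations

def matchset_score(matchset, s_h, s_t, alpha=0.5):
    """Sum alpha * s_h + (1 - alpha) * s_t over all protein pairs of a matchset."""
    total = 0.0
    for pair in combinations(sorted(matchset), 2):
        total += alpha * s_h.get(pair, 0.0) + (1 - alpha) * s_t.get(pair, 0.0)
    return total

def alignment_score(alignment, s_h, s_t, alpha=0.5):
    """f(A): the sum of matchset scores over the whole alignment."""
    return sum(matchset_score(m, s_h, s_t, alpha) for m in alignment)

# Hypothetical similarities for a single aligned pair.
seq_sim = {("a1", "b1"): 1.0}
topo_sim = {("a1", "b1"): 0.5}
print(alignment_score([{"a1", "b1"}], seq_sim, topo_sim))  # 0.5 * 1.0 + 0.5 * 0.5 = 0.75
```

Raising `alpha` towards 1 makes the score depend mostly on sequence similarity; lowering it towards 0 favors topology.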

To solve this problem, we used a simulated annealing algorithm [38] to search for an approximately optimal solution. Simulated annealing is a commonly used approach for finding network alignment solutions, as it converges rapidly with a favorable time complexity [39]. As shown in the pseudocode of simulated annealing, the alignment A is first initialized to the empty set \(\varnothing \). Then we repeatedly perturb the current alignment A with a Metropolis scheme \(P(\Delta f)=e^{\frac {\Delta f}{(Ti*s)}}\) as the equilibrium distribution until the alignment score converges.
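A generic simulated annealing loop with Metropolis acceptance is sketched below for the pairwise, one-to-one case. It is not the authors' procedure: the perturbation move (add or remove one candidate pair), the geometric cooling schedule, the parameter values and the omission of the scaling factor s from the acceptance probability are all assumptions made for illustration.

```python
import math
import random

def anneal(candidates, t_start=1.0, t_end=0.01, cooling=0.95,
           steps_per_temp=100, seed=0):
    """Search for a high-scoring set of disjoint pairs.

    `candidates` maps a pair (u, v) to its integrated similarity score.
    """
    rng = random.Random(seed)
    pairs = list(candidates)
    alignment, used = set(), set()  # start from the empty alignment
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            u, v = rng.choice(pairs)
            if (u, v) in alignment:
                delta = -candidates[(u, v)]  # proposal: remove the pair
            elif u in used or v in used:
                continue                     # would violate disjointness
            else:
                delta = candidates[(u, v)]   # proposal: add the pair
            # Metropolis rule: always accept improvements, otherwise
            # accept with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                if (u, v) in alignment:
                    alignment.remove((u, v))
                    used -= {u, v}
                else:
                    alignment.add((u, v))
                    used |= {u, v}
        t *= cooling  # geometric cooling (assumed schedule)
    return alignment

# Hypothetical candidate pairs with integrated scores.
cands = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4, ("a2", "b2"): 0.8}
result = anneal(cands)
print(sorted(result))
```

At high temperature the loop frequently accepts removals, which lets it escape poor early choices such as ("a1", "b2"); as the temperature drops, removals become increasingly unlikely and the alignment settles.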

Result and discussion

Test datasets and experimental setup

To test our method on real biological data, PPI networks of five species were downloaded from the public database IntAct [40] (https://www.ebi.ac.uk/intact/). The five species are Mus musculus (MM), Saccharomyces cerevisiae (SC), Drosophila melanogaster (DM), Arabidopsis thaliana (AT) and Homo sapiens (HS). Interactions can be detected by different methods, such as ubiquitinase assay and anti tag/bait coimmunoprecipitation. However, some experimental methods such as Tandem Affinity Purification generate molecular interactions that can involve more than two molecules. An expansion algorithm was applied to transform these n-ary interactions into a set of binary interactions. To improve the data quality, the interactions of the spoke expanded co-complexes were filtered out. As shown in Table 1, 41,043 proteins and 193,576 interactions were collected as test datasets. In order to measure the biological quality of alignment results, we analyzed the functional similarity based on Gene Ontology terms [41], which include molecular function (MF), biological process (BP) and cellular component (CC). The functional annotation data were downloaded from the gene ontology annotation database (GOA) [42]. All of our test datasets are freely accessible at http://www.nwpu-bioinformatics.com/netcoffee2/dataset.tar.gz.

Table 1 Statistics of PPI networks of five species: Mus musculus (MM), Saccharomyces cerevisiae (SC), Drosophila melanogaster (DM), Arabidopsis thaliana (AT) and Homo sapiens (HS)
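For readers unfamiliar with the expansion step mentioned in the setup, the spoke model turns an n-ary co-complex into binary interactions by pairing the bait with each prey. The sketch below illustrates that general idea only; the function name and protein identifiers are hypothetical, and it does not reproduce the expansion or filtering applied to the IntAct data.

```python
def spoke_expand(bait, preys):
    """Expand one n-ary interaction into bait-prey binary interactions."""
    return {tuple(sorted((bait, prey))) for prey in preys if prey != bait}

# A hypothetical purification with bait P1 and three co-purified preys.
binary = spoke_expand("P1", ["P2", "P3", "P4"])
print(sorted(binary))
```

Such expanded pairs are inferred rather than directly observed, which is one reason a study may choose to filter them out.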

We implemented NetCoffee2 in C++ using the igraph library (version 0.7.1) [43]. The source code and binary code are freely available on the GitHub repository under the GNU GPL v3 license at https://github.com/screamer/NetCoffee2. To compare algorithm performance, we ran our algorithm and three other algorithms, NetCoffee, IsoRankN and multiMAGNA++, on a set of real biological datasets. The suggested parameters were used for running all alignment tools. As seen in Table 2, eight datasets were generated as benchmark datasets. The number of PPI networks in the eight benchmark datasets ranges from two to five. The biggest PPI network is HS, so we generated the datasets according to whether or not they include HS. dataset1 and dataset2 each include two PPI networks; one includes HS and the other does not. dataset3 to dataset6 each include three PPI networks; two of them include HS and the other two do not. To reduce the running time of the algorithms, we generated dataset7 without HS. All four algorithms were run on the same machine with an Intel Xeon E5-2630v4 CPU.

Table 2 Algorithm performance was tested on eight datasets, denoted D1, D2,..., D8

Performance and comparison

Our goal is to identify a set of matchsets that are biologically meaningful. To verify the biological quality of alignment results, we take two aspects into consideration: 1) each matchset is functionally conserved; 2) the alignment node map covers as many proteins as possible. Therefore, we use coverage and consistency to evaluate the biological quality of alignment results. Coverage serves as a proxy for sensitivity, indicating the number of proteins the alignment can explain. Consistency serves as a proxy for specificity, measuring the functional similarity of proteins in each matchset. There is a trade-off between coverage and consistency.

Given an alignment solution, we used the percentage of aligned proteins as coverage. As the number of nodes varies between networks, some proteins might be lost in a one-to-one node mapping. This can be explained by gene loss events in evolution. Likewise, homologous proteins within one species can be attributed to gene duplication in evolution. In our test, multiMAGNA++ is the only algorithm that uses one-to-one node mapping. All other algorithms allow multiple-to-multiple node mapping. As NetCoffee is not applicable to pairwise network alignment, there is no NetCoffee result for D1 and D2. From Fig. 2, we can see that NetCoffee2 stably achieved a coverage of 76.7% on average over the eight datasets. It is followed by multiMAGNA++, which covered 70.4% of proteins on average. Although the coverage of multiMAGNA++ can be more than 80% on D3, D4 and D7, it fell to 50% on D1, D2 and D5. NetCoffee identifies about 35% of proteins on average, which is less than the coverage of NetCoffee2 and multiMAGNA++. IsoRankN covered only 9.6% of proteins on average over the eight datasets, which is clearly smaller than the coverage of its competitors. Overall, the results show that NetCoffee2 is superior to multiMAGNA++, NetCoffee and IsoRankN in terms of coverage and that it is more stable than all of its competitors.

Fig. 2

Coverage of NetCoffee, IsoRankN, multiMAGNA++, and NetCoffee2 on eight test datasets. Coverage was measured by the percentage of aligned proteins in alignments
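As a small illustration of the coverage measure, the following sketch computes the fraction of aligned proteins. It assumes that proteins are encoded as (network, protein) tuples and that the denominator is the total number of proteins over all input networks; the function name and the toy data are hypothetical.

```python
def coverage(alignment, network_sizes):
    """Fraction of all proteins that appear in some matchset."""
    aligned = set().union(*alignment) if alignment else set()
    return len(aligned) / sum(network_sizes.values())

# Toy alignment covering 5 of the 10 proteins in two hypothetical networks.
toy_alignment = [
    {("G1", "a1"), ("G2", "b1")},
    {("G1", "a2"), ("G2", "b2"), ("G2", "b3")},
]
print(coverage(toy_alignment, {"G1": 4, "G2": 6}))  # 5 / 10 = 0.5
```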

Consistency is used to measure the biological quality of matchsets in alignment results. We employed two concepts to evaluate global alignment algorithms based on Gene Ontology (GO) terms: mean entropy (ME) and mean normalized entropy (MNE) [28, 30]. Given a matchset m={v1,v2,...,vn}, the entropy of m was calculated by the formula \(E(m)=-\sum _{i=1}^{d}p_{i}\times log(p_{i})\). Here, d represents the number of different GO terms and pi the proportion of the ith GO term in all annotations of m. The mean entropy (ME) is the arithmetic mean of the entropy over all matchsets. The normalized entropy of m is defined as \(NE(m)=-\frac {1}{log(d)}\sum _{i=1}^{d}p_{i}\times log(p_{i})\). The mean normalized entropy (MNE) is the arithmetic mean of the normalized entropy over all matchsets in a global alignment. It should be noted that alignments with lower ME and MNE values are more functionally coherent. As can be seen in Table 3, NetCoffee2 has the best performance on D2, D7 and D8 in terms of ME, with values of 0.73, 1.01 and 1.10, respectively. multiMAGNA++ obtains the best ME on D1 (0.94), D3 (0.91), D5 (0.98) and D6 (1.00). NetCoffee gets the best ME on D4 (0.85) and D6 (1.00). Overall, NetCoffee2 achieved the best ME (0.973) on average, followed by multiMAGNA++ (1.005), NetCoffee (1.022) and IsoRankN (1.144). Furthermore, NetCoffee2 obtains an average of 0.53 in terms of MNE, followed by NetCoffee (0.55), multiMAGNA++ (0.56) and IsoRankN (0.58). It outperforms its competitors on all eight datasets in terms of MNE. Therefore, we can conclude that NetCoffee2 is superior to the existing algorithms multiMAGNA++, NetCoffee and IsoRankN in terms of both ME and MNE.

Table 3 Consistency was measured by mean entropy (ME) and mean normalized entropy (MNE)
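The two consistency measures can be sketched as follows, using the usual negative-sum form of entropy so that E(m) agrees with the numerator of NE(m). The natural logarithm, the handling of matchsets with a single GO term (normalized entropy set to 0) and the placeholder GO labels are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(go_terms):
    """E(m) = -sum(p_i * log(p_i)) over the distinct GO terms of a matchset."""
    counts = Counter(go_terms)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def normalized_entropy(go_terms):
    """NE(m) = E(m) / log(d), with d the number of distinct GO terms."""
    d = len(set(go_terms))
    if d <= 1:
        return 0.0  # assumption: one GO term means perfect coherence; avoids dividing by log(1)
    return entropy(go_terms) / math.log(d)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# GO annotations pooled per matchset (placeholder labels, not real GO identifiers).
matchset_annotations = [
    ["GO:A", "GO:A", "GO:A"],          # fully coherent matchset
    ["GO:A", "GO:A", "GO:B", "GO:B"],  # two terms, evenly split
]
me = mean(entropy(m) for m in matchset_annotations)
mne = mean(normalized_entropy(m) for m in matchset_annotations)
print(me, mne)
```

The coherent matchset contributes an entropy of 0, while the evenly split one contributes log(2) and a normalized entropy of 1, so lower values indicate more functionally coherent matchsets.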

Conclusion

Network alignment is an important computational framework for understanding molecular function and phylogenetic relationships. However, there is still room for improving existing algorithms in terms of coverage and consistency. Here, we developed an efficient algorithm, NetCoffee2, based on graph feature vectors to globally align multiple PPI networks. NetCoffee2 is a fast, accurate and scalable program for both pairwise and multiple network alignment problems. It can be applied to detect functionally conserved proteins across different PPI networks. To evaluate the algorithm performance, NetCoffee2 and three existing algorithms were run on eight real biological datasets. Gene ontology annotation data were used to test the functional coherence of all alignments. Results show that NetCoffee2 is superior to multiMAGNA++, NetCoffee and IsoRankN in terms of both coverage and consistency. It can be concluded that NetCoffee2 is a versatile and efficient computational tool that can be applied to both pairwise and multiple network alignments. Hopefully, its application in the analysis of PPI networks can benefit the research community in the fields of molecular function and evolution.

Availability of data and materials

Not applicable.

Abbreviations

GNA: Global network alignment

PPI: Protein-protein interactions

SA: Simulated annealing

References

  1. Consortium UP. Uniprot: a hub for protein information. Nucleic Acids Res. 2015; 43(Database issue):204–12.
  2. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, Green RK, Goodsell DS, Prlić A, Quesada M. The rcsb protein data bank: new resources for research and education. Nucleic Acids Res. 2013; 41(Database issue):475.
  3. Goel R, Muthusamy B, Pandey A, Prasad TSK. Human protein reference database and human proteinpedia as discovery resources for molecular biotechnology. Mol Biotechnol. 2011; 48(1):87–95.
  4. Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: From basic research to diagnostics. Clin Chem. 2009; 55(4):641–58.
  5. Marcotte EM, Pellegrini M, Ng HL, Rice DW. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999; 285(5428):751–3.
  6. Hu J, Shang X. Detection of network motif based on a novel graph canonization algorithm from transcriptional regulation networks. Molecules. 2017; 22(12):2194.
  7. Hu J, Gao Y, Zheng Y, Shang X. Kf-finder: Identification of key factors from host-microbial networks in cervical cancer. BMC Syst Biol. 2018; 12(S4):54.
  8. Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics. 2017; 18(16):573.
  9. Peng J, Wang Y, Chen J, Shang X, Shao Y, Xue H. A novel method to measure the semantic similarity of hpo terms. Int J Data Min Bioinform. 2017; 17(2):173.
  10. Zeng X, Zhang X, Zou Q. Integrative approaches for predicting microrna function and prioritizing disease-related microrna using biological interaction networks. Brief Bioinform. 2016; 17(2):193.
  11. Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microrna-disease network: a survey. Brief Funct Genomics. 2016; 15(1):55.
  12. Liu Y, Zeng X, He Z, Quan Z. Inferring microrna-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2016; PP(99):1.
  13. Zhu L, Su F, Xu Y, Zou Q. Network-based method for mining novel hpv infection related genes using random walk with restart algorithm. Biochim Biophys Acta. 2017. https://doi.org/10.1016/j.bbadis.2017.11.021.
  14. Zhu L, Deng SP, You ZH, Huang DS. Identifying spurious interactions in the protein-protein interaction networks using local similarity preserving embedding. IEEE/ACM Transactions on Computational Biology Bioinformatics. 2017; 14(2):345–352.
  15. You ZH, Lei YK, Gui J, Huang DS, Zhou X. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics. 2010; 26(21):2744–51.
  16. Hu J, Zheng Y, Shang X. Mitefinderii: a novel tool to identify miniature inverted-repeat transposable elements hidden in eukaryotic genomes. BMC Med Genom. 2018; 11(5):101.
  17. Hu J, Wang J, Lin J, Liu T, Zhong Y, Liu J, Zheng Y, Gao Y, He J, Shang X. Md-svm: a novel svm-based algorithm for the motif discovery of transcription factor binding sites. BMC Bioinformatics. 2019; 20(7):200. https://doi.org/10.1186/s12859-019-2735-3.
  18. Flannick J, Novak A, Do CB, Srinivasan BS, Batzoglou S. Automatic parameter learning for multiple network alignment. In: International Conference on Research in Computational Molecular Biology: 2008. p. 214–31. https://doi.org/10.1007/978-3-540-78839-3_19.
  19. Klau GW. A new graph-based method for pairwise global network alignment. BMC Bioinformatics. 2009; 10(Suppl 1):1–9.
  20. Hu J, Gao Y, He J, Zheng Y, Shang X. Webnetcoffee: a web-based application to identify functionally conserved proteins from multiple ppi networks. BMC Bioinformatics. 2018; 19(1):422.
  21. Kalaev M, Smoot M, Ideker T, Sharan R. Networkblast: comparative analysis of protein networks. Bioinformatics. 2008; 24(4):594–6.
  22. Narad P, Chaurasia A, Wadhwab G, Upadhyayaa KC. Net2align: An algorithm for pairwise global alignment of biological networks. Bioinformation. 2016; 12(12):408.
  23. Sahraeian SME, Yoon BJ. Smetana: Accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PLoS ONE. 2013; 8(7):67995.
  24. Kalaev M, Bafna V, Sharan R. Fast and accurate alignment of multiple protein networks. J Comput Biol J Comput Mol Cell Biol. 2009; 16(8):989–99.
  25. Hu J, Reinert K. Localali: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks. Bioinformatics. 2015; 31(3). https://doi.org/10.1093/bioinformatics/btu652.
  26. Saraph V, Milenković T. Magna: Maximizing accuracy in global network alignment. Bioinformatics. 2013; 30(20):2931.
  27. Mongiovì M, Sharan R. Global Alignment of Protein–Protein Interaction Networks. Methods Mol Biol (Clifton, N.J.). 2013; 939:21–34.
  28. Liao CS, Lu K, Baym M, Singh R, Berger B. Isorankn: spectral methods for global alignment of multiple protein networks. Bioinformatics. 2009; 25(12):253–8.
  29. Notredame C, Higgins DG, Heringa J. T-coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000; 302(1):205–17.
  30. Hu J, Kehr B, Reinert K. Netcoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks. Bioinformatics. 2015; 30(4):540.
  31. Vijayan V, Saraph V, Milenković T. Magna++: Maximizing accuracy in global network alignment via both node and edge conservation. Bioinformatics. 2015; 31(14):2409–11.
  32. Vijayan V, Milenković T. Multiple network alignment via multimagna++. IEEE/ACM Trans Comput Biol Bioinform. 2017; PP(99):1.
  33. Deng S, Yuan J, Huang D, Zhen W. Sfaps: An r package for structure/function analysis of protein sequences based on informational spectrum method. In: IEEE International Conference on Bioinformatics Biomedicine: 2014. https://doi.org/10.1109/bibm.2013.6732455.
  34. Brutlag DL. Inferring Protein Function from Sequence. In: Bioinformatics-From Genomes to Therapies, chapter 30. Wiley. p. 1087–119. https://doi.org/10.1002/9783527619368.ch30.
  35. Gligorijević V, Maloddognin N, Pržulj N. Fuse: Multiple network alignment via data fusion. Bioinformatics. 2015; 32(8):860–70.
  36. Lobo I. Basic local alignment search tool (blast). J Mol Biol. 2012; 215(3):403–10.
  37. Hu J, He J, Gao Y, Zheng Y, Shang X. Netcoffee2: A novel global alignment algorithm for multiple ppi networks based on graph feature vectors. In: Huang D-S, Jo K-H, Zhang X-L, editors. Intelligent Computing Theories and Application. Cham: Springer: 2018. p. 241–6.
  38. Kirkpatrick S. Optimization by simulated annealing: Quantitative studies. J Stat Phys. 1984; 34(5-6):975–86.
  39. Laarhoven PJM, Aarts EHL. Simulated annealing: theory and applications. Acta Applicandae Math. 1988; 12(1):108–11.
  40. Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackescarter F, Campbell NH, Chavali G, Chen C, Deltoro N. The mintact project-intact as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014; 42:358–63.
  41. Consortium TGO. Gene ontology consortium: going forward. Nucleic Acids Res. 2015; 43(Database issue):1049–56.
  42. Huntley RP, Sawford T, Mutowomeullenet P, Shypitsyna A, Bonilla C, Martin MJ, O’Donovan C. The goa database: Gene ontology annotation updates for 2015. Nucleic Acids Res. 2015; 43(Database issue):1057–63.
  43. Csardi G. The igraph software package for complex network research. Interjournal Compl Syst. 2006; 1695:1–9. http://igraph.sf.net.


Acknowledgements

Not applicable.

About this supplement

This article has been published as part of BMC Genomics Volume 20 Supplement 13, 2019: Proceedings of the 2018 International Conference on Intelligent Computing (ICIC 2018) and Intelligent Computing and Biomedical Informatics (ICBI) 2018 conference: genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-13.

Funding

Publication costs were funded by the National Natural Science Foundation of China (Grant No. 61702420); This project has been funded by the National Natural Science Foundation of China (Grant No. 61332014, 61702420 and 61772426); the China Postdoctoral Science Foundation (Grant No. 2017M613203); the Natural Science Foundation of Shaanxi Province (Grant No. 2017JQ6037); the Fundamental Research Funds for the Central Universities (Grant No. 3102018zy032); the Top International University Visiting Program for Outstanding Young Scholars of Northwestern Polytechnical University.

Author information

Contributions

JH designed the computational framework and implemented the algorithm, NetCoffee2. JH implemented the NetCoffee2 algorithm jointly with JH. JH performed all the analyses of the data. JH, JH, JL, YZ and YG jointly wrote the manuscript. XS is the major coordinator, who contributed considerable time and effort to the discussion of this project. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xuequn Shang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Hu, J., He, J., Li, J. et al. A novel algorithm for alignment of multiple PPI networks based on simulated annealing. BMC Genomics 20, 932 (2019). https://doi.org/10.1186/s12864-019-6302-0


Keywords

  • Network alignment
  • PPI networks
  • Simulated annealing
  • Optimization
  • Functional conserved proteins