Table 2 Criteria used to filter true fusions from false positives

From: Systematic identification and analysis of frequent gene fusion events in metabolic pathways



| Criterion | Biological meaning |
| --- | --- |
| Protein length must exceed 600 amino acid residues | Fusion proteins should be longer than single-domain proteins |
| All non-overlapping CDDs together must align to at least 40% of the gene length | Fused domains should cover the full length of the fused gene |
| A minimum alignment length of 50 for all non-overlapping CDDs | Fused domains should represent entire genes and should not be overly short |
| Gap between fused domains must be at least 60 residues and 10% of gene length from end of gene | Point of fusion should be fairly centrally located in fused gene |
| At least two distinct CDD sets represented in the gene | Fused domains should not belong to the same CDD |
| Less than half of the CDD alignments for the gene should cross the gap between fused domains | A fused gene should be characterized more as a fusion of multiple domains than as a match to a single domain |
| All non-overlapping CDDs must co-occur with fewer than 1500 different CDD sets | Fused domains should not be overly promiscuous |
| Fewer than 1000 matches among the non-overlapping CDDs | Fused domains should be different from one another |
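
The criteria above could be applied as a simple filter over candidate genes. The following is a minimal sketch, not the authors' implementation: the `DomainHit` and `Candidate` classes, their field names, and the criterion labels are hypothetical, and the per-gene statistics (fraction of alignments crossing the gap, co-occurrence counts, match count) are assumed to have been computed beforehand. The gap criterion is worded ambiguously in the table; here it is read as requiring the gap between two adjacent domains to lie at least 60 residues and at least 10% of the protein length from either end.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainHit:
    """One non-overlapping CDD alignment on the protein (hypothetical layout)."""
    cdd_id: str
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    cooccurring_cdd_sets: int  # number of different CDD sets this CDD co-occurs with

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class Candidate:
    """A candidate fusion gene with precomputed statistics (hypothetical layout)."""
    protein_length: int
    domains: List[DomainHit]
    crossing_fraction: float   # fraction of CDD alignments that cross the gap
    match_count: int           # matches among the non-overlapping CDDs


def failed_criteria(c: Candidate) -> List[str]:
    """Return the labels of the criteria the candidate fails, in table order."""
    failed = []
    n = c.protein_length
    hits = sorted(c.domains, key=lambda d: d.start)

    if n <= 600:
        failed.append("length")
    if sum(d.length for d in hits) < 0.40 * n:
        failed.append("coverage")
    if any(d.length < 50 for d in hits):
        failed.append("min_alignment")

    # Assumed reading: some gap between adjacent domains must sit at least
    # 60 residues and 10% of the protein length away from both ends.
    margin = max(60, 0.10 * n)
    central = any(
        a.end >= margin and (n - b.start + 1) >= margin
        for a, b in zip(hits, hits[1:])
    )
    if not central:
        failed.append("fusion_point")

    if len({d.cdd_id for d in hits}) < 2:
        failed.append("distinct_cdds")
    if c.crossing_fraction >= 0.5:
        failed.append("gap_crossing")
    if any(d.cooccurring_cdd_sets >= 1500 for d in hits):
        failed.append("promiscuity")
    if c.match_count >= 1000:
        failed.append("match_count")
    return failed


def is_true_fusion(c: Candidate) -> bool:
    """A candidate is kept only if it passes every criterion."""
    return not failed_criteria(c)
```

Returning the list of failed labels rather than a single boolean makes it easy to see which criterion rejects a given candidate, which helps when tuning thresholds such as the 600-residue length cutoff.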