Table 2 Criteria used to filter true fusions from false positives

From: Systematic identification and analysis of frequent gene fusion events in metabolic pathways

| ID | Criteria | Biological meaning |
| --- | --- | --- |
| 1 | Protein length must exceed 600 amino acid residues | Fusion proteins should be longer than single-domain proteins |
| 2 | All non-overlapping CDDs together must align to at least 40% of the gene length | Fused domains should cover the full length of the fused gene |
| 3 | A minimum alignment length of 50 for all non-overlapping CDDs | Fused domains should represent entire genes and should not be overly short |
| 4 | Gap between fused domains must be at least 60 residues and 10% of gene length from end of gene | Point of fusion should be fairly centrally located in fused gene |
| 5 | At least two distinct CDD sets represented in the gene | Fused domains should not belong to the same CDD |
| 6 | Less than half of the CDD alignments for the gene should cross the gap between fused domains | A fused gene should be characterized more as a fusion of multiple domains than as a match to a single domain |
| 7 | All non-overlapping CDDs must co-occur with fewer than 1500 different CDD sets | Fused domains should not be overly promiscuous |
| 8 | Fewer than 1000 matches among the non-overlapping CDDs | Fused domains should be different from one another |
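
The thresholds in criteria 1, 2, 3 and 5 can be checked from the protein length and the coordinates of its non-overlapping CDD hits alone, so they lend themselves to a short illustration. The following is a minimal sketch in Python, not the authors' implementation: the `DomainHit` class, the `passes_basic_criteria` function, and the 1-based inclusive coordinates are all hypothetical, and criterion 3 is assumed to apply to each hit individually. Criteria 4 and 6 to 8 are left out because they depend on gap positions and database-wide co-occurrence counts that the table does not define in enough detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainHit:
    """One non-overlapping CDD alignment on a protein (hypothetical structure)."""
    cdd_id: str  # identifier of the CDD set the hit belongs to
    start: int   # 1-based inclusive start position (assumed convention)
    end: int     # 1-based inclusive end position (assumed convention)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes_basic_criteria(protein_length: int, hits: List[DomainHit]) -> bool:
    """Apply criteria 1, 2, 3 and 5 from Table 2 to a candidate fusion protein."""
    # Criterion 1: protein length must exceed 600 residues.
    if protein_length <= 600:
        return False
    # Criterion 2: hits together must cover at least 40% of the length.
    # Summing is valid only because the hits are assumed non-overlapping.
    covered = sum(h.length for h in hits)
    if covered < 0.40 * protein_length:
        return False
    # Criterion 3: every hit must be at least 50 long
    # (assumption: the minimum applies per hit, not to the total).
    if any(h.length < 50 for h in hits):
        return False
    # Criterion 5: at least two distinct CDD sets must be represented.
    if len({h.cdd_id for h in hits}) < 2:
        return False
    return True


# Usage with made-up coordinates: two different domains on an 800-residue protein.
hits = [DomainHit("cddA", 20, 300), DomainHit("cddB", 400, 700)]
print(passes_basic_criteria(800, hits))  # True
```

Returning as soon as one criterion fails keeps the checks independent of one another, so further criteria can be added as extra conditions once their inputs are available.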