Fig. 2 | BMC Genomics

From: UMGAP: the Unipept MetaGenomics Analysis Pipeline

Sample DNA fragment extracted from the Acinetobacter baumannii 118362 genome (NCBI Assembly ASM58051v1, positions 37,700-39,530). The fragment contains three RefSeq-annotated coding regions, marked with yellow lines (top): a major facilitator superfamily protein (EXA88265), a TetR family protein (EXA88191) and a translocator family protein (EXA88255). Blue lines indicate coding regions predicted by FGS. Green dots mark the starting positions of 9-mers whose LCA* lies on the A. baumannii lineage (true positive identifications). Red dots mark the starting positions of 9-mers whose LCA* lies outside the A. baumannii lineage (false positive identifications). The opacity of the dots indicates depth in the taxonomic tree: opaque dots indicate a highly specific LCA* (species level) and translucent dots indicate a less specific LCA*. This example illustrates the following general observations: (1) the frameshift-correcting topology of the FGS hidden Markov model often misinterprets genes that are very close together or overlapping as a single gene with a frameshift, and glues their coding regions together; (2) the missing dots at the end of each coding region are merely an artefact of the visualization, because the last 8 codons (24 bases) can never be the starting position of a 9-mer; (3) FGS may predict false coding regions or (4) false frameshifts, but the peptides extracted from these, as well as (5) translations of non-coding regions in a six-frame translation, are mostly filtered out automatically because they have no exact match with any UniProt protein, or can be filtered out with additional heuristics.
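The two rules behind the dots in this figure can be sketched in a few lines. This is a simplified illustration, not UMGAP's implementation. The helper names (`kmer_starts`, `classify`) and the abridged `LINEAGE` list are hypothetical. The sketch assumes that a 9-mer counts as a true positive when its LCA* is any taxon on the target lineage, and that specificity is simply the taxon's depth in that list. It also shows why observation (2) holds: a peptide of n residues has only n - 8 starting positions for 9-mers, so its last 8 residues (24 bases) never carry a dot.

```python
# Hypothetical, abridged lineage of A. baumannii (root first); assumption for illustration.
LINEAGE = [
    "root", "Bacteria", "Proteobacteria", "Gammaproteobacteria",
    "Pseudomonadales", "Moraxellaceae", "Acinetobacter", "Acinetobacter baumannii",
]


def kmer_starts(peptide: str, k: int = 9) -> list[int]:
    """Return the 0-based start positions of all k-mers in a peptide."""
    # A peptide of length n has n - k + 1 k-mers; the last k - 1 residues start none.
    return list(range(len(peptide) - k + 1)) if len(peptide) >= k else []


def classify(lca: str, lineage: list[str]) -> str:
    """Label a k-mer 'TP' if its LCA* lies on the target lineage, else 'FP'."""
    return "TP" if lca in lineage else "FP"


def depth(lca: str, lineage: list[str]) -> int:
    """Depth of an on-lineage LCA* (higher means more specific, i.e. more opaque)."""
    return lineage.index(lca)


if __name__ == "__main__":
    peptide = "M" * 20  # toy 20-residue peptide
    starts = kmer_starts(peptide)
    print(len(starts), starts[-1])  # 12 starts, last one at position 11
    print(3 * (9 - 1), "bases at the end never start a 9-mer")  # 24
    print(classify("Acinetobacter", LINEAGE), depth("Acinetobacter", LINEAGE))  # TP 6
    print(classify("Escherichia", LINEAGE))  # FP
```

A real pipeline would look up lineages in a full taxonomy (for example NCBI Taxonomy) rather than use a hard-coded list, but the membership test above captures the green/red distinction drawn in the figure.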
