Fig. 1 | BMC Genomics

Fig. 1

From: The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains

Illustration of the k-mer analysis procedure. For space reasons, the k-mer analysis is illustrated with 11-mers and a 3-base step size (49-mers with a one-base step size were actually used). The central line represents the genome sequence, with each circle representing one base. The black arrow indicates one k-mer length (11 bases) and the grey arrow a half k-mer ((k-1)/2, thus 5 bases). The central base (red) marks a divergent base pair, which in our case represents a sequencing error. Below are k-mers extracted from Illumina reads, which have the correct base (green) at the position of the sequencing error. K-mers that do not contain the central base (grey) are present in the genome and thus contribute to genome coverage. In our algorithm, the coverage count is increased at each of the bases covered by the k-mer. In contrast, k-mers that contain the central base (blue) are not present in the genome and thus do not contribute to genome coverage. Genome coverage is illustrated schematically in the top graphic (light grey shading). Starting from the left, genome coverage stays high as long as k-mers do not reach the central base and then drops continuously up to the central, incorrect base. Thereafter, coverage increases continuously and reaches full coverage as soon as the first k-mer base is beyond the central base. For computation of coverage drops, bases adjacent to the query base are classified as inner bases (black) and outer bases (dark grey): inner bases are the half k-mer on each side of the query base, and outer bases are the adjacent half k-mer beyond them. The sum of coverage is computed over inner bases (black arrow) and over outer bases (dark grey arrow). The coverage ratio of outer bases over inner bases is the coverage drop. High coverage drop values indicate potential genome sequencing errors. Frequently occurring read k-mers that are not found in the genome (blue) may also hint at genome sequence problems.
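The procedure described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`count_kmers`, `kmer_coverage`, `coverage_drop`) are hypothetical. The sketch ignores reverse-complement k-mers, weights coverage by how often a k-mer occurs in the reads, and counts the query base together with the inner bases. The caption does not specify any of these details.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (one-base step) found in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_coverage(genome, read_kmers, k):
    """Per-base genome coverage by read k-mers.

    A genome k-mer that also occurs in the reads raises the coverage
    of each base it spans (assumption: by its read count).
    """
    coverage = [0] * len(genome)
    for i in range(len(genome) - k + 1):
        n = read_kmers.get(genome[i:i + k], 0)
        if n:
            for j in range(i, i + k):
                coverage[j] += n
    return coverage


def coverage_drop(coverage, k):
    """Ratio of summed outer coverage to summed inner coverage.

    Inner bases: the query base plus a half k-mer on each side
    (including the query base is an assumption of this sketch).
    Outer bases: the adjacent half k-mer on each side.
    Positions without full windows are left as None.
    """
    half = (k - 1) // 2
    drops = [None] * len(coverage)
    for q in range(2 * half, len(coverage) - 2 * half):
        inner = sum(coverage[q - half:q + half + 1])
        outer = (sum(coverage[q - 2 * half:q - half])
                 + sum(coverage[q + half + 1:q + 2 * half + 1]))
        drops[q] = outer / inner if inner else float("inf")
    return drops
```

Because the inner window in this sketch holds one base more than the two outer windows combined, an error-free region gives a ratio slightly below 1. At a single wrong genome base, inner coverage collapses and the ratio rises well above 1, which is the signal the caption describes.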
