From: SNP hot-spots in the clam parasite QPX

Differences among four QPX strains and the reference genome. The principal component analysis is based on 100 annotated functional sequences selected after MG-RAST quality-control checks of the annotation process and a sequence similarity search on five QPX libraries (four QPX strains and the reference genome). Feature selection uses four criteria: an identity score (maximum of 1 mismatch) for annotating the QPX contigs against protein domains; an alignment length score (similarity ≥ 80%) that represents the coverage length and similarity between estimated alignments; an e-value score (10⁻¹⁰) for predicted functional similarities; and the number of times an annotated contig is identified. The arrows are eigenvectors; they represent linear trends in the data.
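As a rough illustration of how sample coordinates and eigenvector arrows of this kind can be obtained, the sketch below runs a principal component analysis through a singular value decomposition. It is not the authors' pipeline: the function name `pca_biplot_parts` and the randomly generated 5 × 100 abundance matrix are hypothetical stand-ins for the five libraries and 100 selected features.

```python
import numpy as np

def pca_biplot_parts(abundance, n_components=2):
    """Return sample scores, feature eigenvectors (arrows), and explained variance ratios."""
    X = np.asarray(abundance, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # library coordinates on the PCs
    loadings = Vt[:n_components].T                # covariance eigenvectors, one row per feature
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Hypothetical data: 5 libraries (rows) x 100 annotated features (columns).
rng = np.random.default_rng(0)
abundance = rng.poisson(lam=20, size=(5, 100))

scores, loadings, explained = pca_biplot_parts(abundance)
print(scores.shape, loadings.shape)
```

In a biplot, each row of `scores` would be drawn as a point (one per library) and each row of `loadings` as an arrow from the origin (one per feature).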

