Figure 2 | BMC Genomics

Figure 2

From: A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

A. PCA plot based on oligonucleotide frequencies (340 features), projected onto three uncorrelated axes (principal components). Each dot represents a psbA sequence, colored according to its GOS classification [11]: Synechococcus (light blue), Synechococcus-like Myovirus (light sky blue), Synechococcus-like Podovirus (turquoise), HL-Prochlorococcus (rosy brown), LL-Prochlorococcus (deep pink), Prochlorococcus-like Myovirus (gold) and Prochlorococcus-like Podovirus (light salmon). B. PCA plot showing the distribution of DNA sequences extracted from the deep sea at station M in March 2006. The Mediterranean data are presented on the background of the GOS data (GOS sequences are colored as in A). The Mediterranean data are shown in darker colors: Synechococcus (royal blue), Synechococcus-like Myovirus (blue), Synechococcus-like Podovirus (dark green), HL-Prochlorococcus (dark purple), LL-Prochlorococcus (dark red), Prochlorococcus-like Myovirus (dark orange) and Prochlorococcus-like Podovirus (dark brown). Black dots represent sequences for which there was no agreement between our independent classifiers. Manual examination suggests that these sequences form a new subclass not represented in the GOS database. C and D. PCA plots showing the distribution of DNA and RNA sequences, respectively. The Mediterranean Sea sequences, shown in dark colors (as in B) on the background of the GOS data (colored as in A), were extracted from surface water sampled in March 2006. As demonstrated, the bacterial sequences are distributed similarly among subclasses in both the DNA and RNA data, while the viral psbA sequences are mostly found at the DNA level.
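The kind of projection described in panel A can be illustrated with a minimal sketch. The caption does not say how the 340 features are composed; the code below assumes they are the frequencies of all oligonucleotides of length 1 to 4 (4 + 16 + 64 + 256 = 340), which may differ from the authors' feature set. The function names (`kmer_vocabulary`, `oligo_frequencies`, `pca_project`) are hypothetical, and the input sequences are random placeholders, not psbA data.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"


def kmer_vocabulary(max_k=4):
    """All oligonucleotides of length 1..max_k, grouped by length."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(ALPHABET, repeat=k)]


def oligo_frequencies(seq, max_k=4):
    """Relative k-mer frequencies, normalised separately for each k.

    Assumption: this 340-feature layout (k = 1..4) is one plausible reading
    of the caption, not a confirmed description of the original method.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(max_k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    freqs = np.zeros(len(vocab))
    for k in range(1, max_k + 1):
        start = index["A" * k]  # first k-mer of this length in the vocabulary
        total = 0
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])  # k-mers with ambiguous bases are skipped
            if j is not None:
                freqs[j] += 1
                total += 1
        if total:
            freqs[start:start + 4 ** k] /= total
    return freqs


def pca_project(X, n_components=3):
    """Project rows of X onto the first principal components via SVD."""
    Xc = X - X.mean(axis=0)  # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Placeholder input: random sequences standing in for psbA reads.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(ALPHABET), size=300)) for _ in range(20)]
X = np.array([oligo_frequencies(s) for s in seqs])
coords = pca_project(X)
print(X.shape, coords.shape)  # (20, 340) (20, 3)
```

Each row of `coords` corresponds to one dot in a three-axis plot; coloring the dots by class label would be a separate plotting step that is not shown here.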
