Characteristics of genomes used in this study. (A) The phylogenetic tree was inferred from 16S rRNA gene sequences using a Bayesian approach, as described in the Methods section. Posterior probabilities are indicated at the nodes when equal to or greater than 80%. The length of the thick line at the bottom represents 0.1 mutations per position. The tree shown is substantially the same as that derived from other methods (Additional file 1). The shading highlights the well-isolated clade of small marine Prochlorococcus and Synechococcus (Group 2). At the end of each leaf is the nickname of the organism used in this study. Three-letter nicknames are those used by KEGG. (B) Other characteristics of the genomes. HIP1 frequency is given as the number of GCGATCGC sequences per 1 million nucleotides of genome sequence. The transposase (Tn) frequency is given as the number of annotated transposase genes per 1 million nucleotides of genome sequence. The source of each genome sequence is NCBI, with the given accession number, unless otherwise specified. The other sources are the Kazusa DNA Research Institute, the Joint Genome Institute (JGI), and the J. Craig Venter Institute. Published sources, when available, are given in references [27–40]. n.d. = not determined.
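
As an illustration of the normalization used in panel B, the following is a minimal sketch of how a per-million-nucleotide frequency might be computed. It is not the procedure used in this study: the function names (`count_motif`, `per_million`, `hip1_frequency`) are hypothetical, and the choice to count overlapping occurrences is an assumption, since the caption does not say how overlapping sites were handled. Because GCGATCGC is its own reverse complement, scanning one strand is sufficient.

```python
HIP1 = "GCGATCGC"  # palindromic: identical to its reverse complement


def count_motif(sequence: str, motif: str = HIP1) -> int:
    """Count occurrences of motif in sequence, including overlapping ones.

    Assumption: overlapping sites are counted separately. The motif can
    overlap itself by two bases (GCGATCGCGATCGC contains two sites).
    """
    seq = sequence.upper()
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        # Advance by one base so that overlapping sites are not skipped.
        pos = seq.find(motif, pos + 1)
    return count


def per_million(count: int, genome_length: int) -> float:
    """Normalize a raw count to occurrences per 1 million nucleotides."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return count * 1_000_000 / genome_length


def hip1_frequency(sequence: str) -> float:
    """HIP1 sites per 1 million nucleotides of the given sequence."""
    return per_million(count_motif(sequence), len(sequence))
```

The same `per_million` helper would apply to the transposase frequency, with the number of annotated transposase genes passed as the count and the total genome length as the denominator.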