Correlation of gene order conservation with sequence identity and GC content (Salmonella vs. E. coli K12). (A) Sequence identity frequency distributions of proteins encoded by GCO and nGCO genes for the datasets used in this study. Box-and-whisker plots illustrate the differences between the medians and the dispersion of the respective datasets. Orange and light blue denote the GCO and nGCO datasets, respectively. Average values are displayed above the box plots. The two leftmost box plots (GCO, nGCO) depict the differences between those two gene classes within the overall protein sequence dataset. The next two box plots depict the differences between duplicated GCO genes (DGCO) and duplicated nGCO genes (DnGCO). The last two box plots represent H-NS-repressed genes (HNSGCO: H-NS-repressed GCO genes; HNSnGCO: H-NS-repressed nGCO genes). (B) GC content of GCO genes, nGCO genes, and genes with no homolog in E. coli K12 (NH) for the datasets used in this study. In addition to the coloring scheme of Fig. 1A, light grey is used for sequences that have no homolog in E. coli K12. The dashed horizontal line corresponds to the overall GC content of the S. Typhimurium genome (52.2%). (For a more detailed description, including statistical analysis, see Additional files 8 and 9.)