Vibrionaceae core and shell genes are non-randomly distributed into spatially distinct intracellular domains

Background: The genome of Vibrionaceae bacteria, which consists of two circular chromosomes, is replicated in a highly ordered fashion. In fast-growing bacteria, multifork replication results in higher gene copy numbers and increased expression of genes located close to the origin of replication of Chr 1 (ori1). This is believed to be a growth optimization strategy to satisfy the high demand for essential growth factors during fast growth. The relationship between ori1-proximate growth-related genes and gene expression during fast growth has been investigated by many researchers. However, it remains unclear which other gene categories are present close to ori1, and whether expression of all ori1-proximate genes is increased during fast growth or is selectively elevated for certain gene categories. Results: We calculated the pangenome of all complete genomes from the Vibrionaceae family and mapped the four pangene categories (core, softcore, shell and cloud) to their chromosomal positions. This revealed that core and softcore genes were heavily biased towards ori1, while shell genes were overrepresented at the opposite part of Chr 1 (i.e., close to ter1). RNA-seq of Aliivibrio salmonicida and Vibrio natriegens showed global gene expression patterns that consistently correlated with chromosomal distance to ori1. Despite a biased gene distribution pattern, all pangene categories contributed to a skewed expression pattern at fast-growing conditions, whereas at slow-growing conditions, softcore, shell and cloud genes were responsible for elevated expression. Conclusion: The pangene categories were non-randomly organized on the two chromosomes, with an overrepresentation of core and softcore genes around ori1. We mapped our gene distribution data onto the intracellular positioning of chromatin described for V. cholerae, and suggest that core/softcore and shell/cloud genes are enriched at two spatially separated intracellular regions in the cell. The concurrence of the spatial distribution of core genes and the level of gene expression in one intracellular region implies that there is a link between the structural organization of core genes and their cellular function.

Fig. 2 (caption, partially recovered). Circular [...] phylogenetic relationship of the investigated isolates. Each plot is centered at a gene assumed to be close to the replication origin: mioC on Chr 1 and rctB on Chr 2. As shown, a majority of core genes on Chr 1 are located closer to ori1 than to ter. Shell genes show the opposite distribution pattern on Chr 1, where a majority of shell genes accumulate closer to ter. On Chr 2, both core and shell genes are randomly distributed. The dashed line "i" indicates a region on Chr 1 surrounding ori1 that contains very few core genes. The dashed line "ii" shows a region on Chr 1 of approximately 500 kb surrounding ter that is more sparsely populated with core genes than the rest of the chromosome.


Background
Bacteria that belong to the family Vibrionaceae are abundant in most aqueous habitats, from the deep seas to fresh and brackish waters, and in temperature zones ranging from polar to tropical areas [1][2][3]. They exist as free-swimming cells or associated with other organisms, either in a symbiotic relationship or as pathogens of, e.g., fish, corals and even humans [3,4]. Despite the notorious reputation of some Vibrionaceae species (e.g., Vibrio cholerae and Vibrio vulnificus), it is the diversity of non-pathogenic Vibrionaceae species that makes these bacteria so successful and ecologically important. The facultative anaerobic bacterium Vibrio natriegens, for example, fixes atmospheric nitrogen (N2) into ammonia (NH3), and thus provides its surroundings with a critical nutrient [5].
As of April 2020, the RefSeq database contains 306 complete Vibrionaceae genomes (representing 57 species), with genomes from new species being added on a regular basis. One characteristic feature shared by all Vibrionaceae genomes is a highly unusual bipartite structure consisting of a large (Chr 1) and a smaller (Chr 2) chromosome [6,7]. It has been proposed that bacteria with bipartite genomes have a selective advantage when adapting to very different environmental conditions (Val et al. 2008) [8], and that division into multiple smaller replicons may reduce replication time, thus allowing for shorter generation times and a competitive advantage [9]. This unconventional genome arrangement is expected to require tightly regulated and synchronized replication to ensure proliferation and control of gene expression during changes in the surrounding environment.
In V. cholerae, replication of Chr 1 and Chr 2 is highly coordinated [10]. When the replication fork reaches crtS (the Chr 2 replication triggering site) on Chr 1, a hitherto unknown mechanism triggers replication of Chr 2 [11,12]. Interestingly, there is a short pause (corresponding to replication of approx. 200 kbp) between replication of crtS and initiation of Chr 2 replication. The exact function of this pause is not yet known, but it is hypothesized to be needed for activation of the rctB (Chr 2's own replication initiator) and ori2 initiation system [12]. In other words, the chromosomal position of crtS and the pause contribute to synchronizing termination of Chr 1 and Chr 2 replication. Furthermore, the synchronized termination is likely linked to coordination of chromosome segregation and cell division [12].
Another intriguing phenomenon regarding replication of vibrio genomes is that genes surrounding ori1 can be found in multiple copies during the replication process due to successive initiations of replication from ori1 (i.e., multifork replication) [13,14]. This phenomenon is a hallmark of fast-growing bacteria, such as Vibrio cholerae and Vibrio natriegens, and is believed to be a growth optimization strategy to satisfy the high demand for essential growth factors during fast growth [15][16][17]. Using an elegant genetic approach, Soler-Bistué et al. (2015) showed that relocating the major ribosomal protein gene locus (s10-spec-α) of V. cholerae further away from ori1 reduced the growth rate, as well as the gene copy number and mRNA abundance of this cluster [18]. The authors concluded that there is a strong correlation between chromosomal gene position and effects on bacterial physiology. Later, the same model system (i.e., V. cholerae with a relocated s10-spec-α locus) was used to study effects on bacterial fitness under slow growth conditions (i.e., no multifork replication) [19]. One conclusion from this study was that bacterial fitness was reduced when the s10-spec-α locus was located distal to ori1, which demonstrates that genomic positioning of ribosomal protein genes affects not only growth, but also cell fitness across the whole life cycle. In a recent study, Soler-Bistué et al. (2020) showed that relocation of the s10-spec-α locus led to higher cytoplasm fluidity, and the authors suggested that changes in the macromolecular crowding of the cytoplasm impact the cellular physiology of V. cholerae. Interestingly, the protein production capacity in V. cholerae was independent of the position of the s10-spec-α locus [20].
In an interesting approach, Dryselius et al. (2008) used qPCR and microarrays to study how copy numbers of genes vary across the entire genome of several Vibrio species (V. parahaemolyticus, V. cholerae and V. vulnificus) under different growth conditions, and then monitored how the data correlated with gene expression levels (also using microarrays) [21]. The authors found greater differences in gene copy numbers across the large chromosome (Chr 1) than across the small chromosome (Chr 2) when the bacteria were grown in a rich medium. In general, the trend is that gene copy numbers increase from the terminus towards the origin of replication, and that this increase is reflected by increasing gene expression levels. The same trend was not found for slow-growing bacteria (i.e., when grown in minimal medium). Also, for Chr 2, gene expression levels were low and apparently independent of gene copy number. Similar findings were later described in V. splendidus [22]. Here, genes located on Chr 1 were expressed 3.6× more than those located on Chr 2, and the highest expression values were typically associated with genes surrounding the origin of replication on Chr 1.
In summary, the genome of Vibrionaceae bacteria, which consists of two circular chromosomes, is replicated in a highly ordered fashion. In fast-growing bacteria, replication results in higher gene copy numbers and increased expression of genes located close to the origin of replication of Chr 1. It is known that the expression of growth-related genes located close to ori1 is elevated during fast growth, but a general picture of which gene types are found close to ori1, and how expression of each gene type is affected, is lacking. To address this knowledge gap we revisited the intriguing topic of genome architecture in Vibrionaceae. In a pangenome approach we used available genomes to calculate and divide clusters of orthologous genes into the main categories "core", "softcore", "shell" (accessory) and "cloud" (unique), and used this information to determine how the corresponding genes are distributed on Chr 1 and Chr 2 of selected Vibrionaceae genomes. Data from publicly available gene expression experiments were mapped back to the pangenes to determine gene expression profiles under different environmental conditions: expression data from the fast-growing bacterium Vibrio natriegens grown under optimal or minimal growth conditions, and data from the fish pathogen Aliivibrio salmonicida grown at a salt concentration and temperature that mimic the physiological conditions during infection. Our results show a non-random distribution of genes on the two chromosomes of Vibrionaceae. The gene distribution was then compared with global gene expression trends, and we find a strong correlation between expression levels and distance from ori1. Surprisingly, despite a biased gene distribution pattern, all pangene categories contribute to a skewed expression pattern at fast-growing conditions.
Finally, based on our data we propose a model that describes how pangenes are spatially distributed inside Vibrionaceae bacterial cells, and we discuss possible implications of the proposed model.

Results
Pangenome calculations based on 124 complete Vibrionaceae genomes identify 710 clusters of orthologous core genes
To categorize all genes associated with Vibrionaceae genomes into distinct classes, we downloaded all complete genomes from the NCBI RefSeq database (124 as of May 2018, see Additional file 1: Table S1), and then used GET_HOMOLOGUES v3.1.0 [23] to cluster orthologous protein sequences based on the OrthoMCL algorithm. The pangenome calculations identified a total of 61,512 clusters, of which 710 were encoded by genes found in all 124 genomes (i.e., core genes). The remaining clusters are distributed among the softcore (encoded by ≥ 95% of genomes), shell (encoded by ≥ 2 genomes) and cloud (encoded by single genomes) categories, which contain 1,796, 14,642 and 45,074 clusters and represent 3%, 23% and 73% of the total clusters, respectively. In individual genomes, core gene clusters represent 1.2% of the pangenome and comprise 10-17% of the total genes. Similarly, softcore genes constitute 24-34% (1,489-1,796 genes per genome) of the total genes.

Core and softcore genes densely populate the upper half of Chr 1
The four gene categories (core, softcore, shell and cloud) were next mapped to their chromosomal locations to investigate whether they are randomly or non-randomly distributed on each chromosome. First, genes of eleven selected Vibrionaceae representatives were classified as either upper or lower (i.e., upper or lower half of the chromosome) based on the distance of their chromosomal location on Chr 1 and Chr 2 from the origin of replication. As presented in Fig. 1 (the complete table is available as Additional file 2: Table S2), core and softcore genes are significantly overrepresented (adjusted chi-square P-value ≤ 0.05) in the upper half of Chr 1 in all investigated genomes. Similarly, shell and cloud genes on Chr 1 are significantly overrepresented (adjusted chi-square P-value ≤ 0.05) in the lower half of Chr 1 in 8 and 7 genomes, respectively, supporting a non-random distribution of genes on Chr 1. In contrast to Chr 1, genes of all categories are much more evenly distributed on Chr 2. Although shell, cloud and softcore genes show non-random distribution on Chr 2 in some of the investigated genomes (softcore 3/11, shell 2/11, cloud 3/11), the majority of genomes show no significant bias (adjusted chi-square P-value > 0.05). Furthermore, core genes were not significantly overrepresented in either the lower or upper half of Chr 2 in any of the genomes.
To provide a more fine-grained picture of the core (710-721) and shell (749-2753) gene distributions, we plotted the distribution of core and shell genes on Chr 1 and Chr 2 of eleven Vibrionaceae taxa using the genome comparison tool Circos [24] (Fig. 2). Each plot was centered on mioC (Chr 1) and rctB (Chr 2). Our results show that although the exact distribution pattern varies between species, the biased distributions of core and shell genes, as described above, are striking and readily visible to the naked eye. Interestingly, although core genes densely populate the upper half of Chr 1, the region immediately surrounding ori1 contains very few core genes. This region (denoted "i" in Fig. 2) is, in contrast, densely populated by softcore genes (at least in V. natriegens and A. salmonicida, see section below). Also, a region (denoted "ii" in Fig. 2) of approximately 500 kb surrounding ter1 is more sparsely populated with core genes than the rest of the chromosome. Figure 2b shows that the shell genes are evenly distributed on both chromosomes, without any large gaps. However, genera represented by one or few genomes in the dataset have fewer shell genes and hence more gaps (e.g., G. hollisae ATCC 33564, Photobacterium damselae KC-Na-1 and P. profundum SS9).
In summary, the results presented here reveal that core, softcore, shell and cloud genes are non-randomly distributed on Chr 1. Core and softcore genes are more likely to be located on the upper half of Chr 1, whereas shell and cloud genes tend to be located closer to the replication terminus. On Chr 2, the four pangene categories are in general randomly distributed, showing locational bias in only a few genomes.
Expression levels of genes located on Chr 1 of V. natriegens and A. salmonicida generally correlate with distance to ori1
Figure 3 shows how core, softcore, shell and cloud pangenes are distributed on Chr 1 and Chr 2 of V. natriegens and A. salmonicida. The pattern is consistent with the biased gene distribution pattern described above, with core and softcore genes being overrepresented in the upper half of Chr 1, and shell and cloud genes being overrepresented in the lower half. The two species were chosen as models for comparison of gene expression data with pangene distribution patterns. Specifically, we were curious to examine whether regions that are densely populated by core/softcore pangenes are expressed at higher levels than regions more sparsely populated by core/softcore pangenes. This expectation is based on previous data from V. parahaemolyticus and V. cholerae, which showed that growth rates of these bacteria have large impacts on the copy number (gene dosage) of genes located on Chr 1, as well as on gene expression levels [10,21,25]. Fast- and slow-growing bacterial representatives were therefore chosen for this particular comparative analysis. V. natriegens is a fast-growing bacterium commonly found in estuarine mud, with doubling times below 10 minutes under favourable conditions [26]. A. salmonicida is, in contrast, a slow-growing Vibrionaceae bacterium, and the causative agent of cold-water vibriosis in, e.g., Atlantic salmon and cod [27,28]. To correlate gene distribution with gene expression data, publicly available RNA-seq data of V. natriegens and A. salmonicida were downloaded from the Sequence Read Archive [29] at NCBI. For V. natriegens, datasets from growth in minimal (BioSample no. SAMN10926309, SAMN10926310 and SAMN10926313) and optimal (rich) medium (sample no. SAMN10926311, SAMN10926312 and SAMN109329) at 37 °C to OD 600nm 0.3-0.5 were chosen [30].
These conditions were selected because they represent slow as well as fast growth conditions. For A. salmonicida, a dataset (sample no. SAMEA4548122, SAMEA4548133, SAMEA4548134) originating from growth in LB medium containing 1% NaCl at 8 °C to mid log phase (OD 600nm ~ 0.5) was used [31]. The salt concentration is expected to be similar to the concentration the bacterium would experience inside its natural host (Atlantic salmon), where the bacterium is known to cause cold-water vibriosis at temperatures below 10 °C [27,28].
[...] The expression pattern for Chr 1 is similar in all three datasets, i.e., RPKM values are typically above the median value in the upper half (i.e., the region closest to the origin of replication), but lower in the region surrounding the terminus, independent of growth conditions. This is somewhat surprising, since the observed pattern was expected for fast-growing cultures (i.e., V. natriegens in rich medium), but not for slow-growing cultures (i.e., V. natriegens in minimal medium (see Additional file 3: Fig. S1) and A. salmonicida in LB with 1% NaCl), because such patterns are expected to be correlated with growth rates/multifork replication [21]. For Chr 2, the results are more ambiguous, although overall similar between minimal and rich growth. For A. salmonicida, expression around the terminus is, on average, higher than that of regions adjacent to ori2. For V. natriegens, expression is generally higher than the median in regions surrounding the terminus, but varies across the remaining parts of Chr 2. As for Chr 1, little difference could be detected between the slow- and the fast-growing datasets for Chr 2.
In summary, we found that global expression levels for Chr 1 consistently correlate with the distance to the origin of replication. The log2 ratio RPKM CDS:RPKM median decreases as the distance from the origin of replication increases.
All pangene categories contribute to higher expression levels around ori1 at fast-growth conditions, but not at slow-growth conditions
The global trend described above can be explained either by generally higher expression levels of all pangene categories located close to ori1, or by generally higher expression of three or fewer of the four pangene categories. To discriminate between the two alternatives, we calculated the RPKM median value for each pangene category, and compared the median values for genes located on the upper or lower halves of Chr 1 (Table 1). The Wilcoxon rank-sum test strongly supports (P-adj ≤ 0.05) that median values for all four pangene categories are significantly higher for genes located on the upper half when V. natriegens is cultured at fast-growth ("optimal") conditions. Notably, when grown under slow-growing conditions, median values for softcore, shell and cloud genes located on the upper half are significantly higher. Core genes are, in contrast, expressed at equal levels on both halves. This applies to both V. natriegens in minimal medium (RPKM median = 370 and 360, P-adj = 0.321) and A. salmonicida at suboptimal conditions (RPKM median = 301 and 309, P-adj = 0.717). To summarize, we conclude that gene expression levels correlate with distance to ori1 (Fig. 4), and that genes from all four pangene categories contribute to this trend under fast-growing conditions, whereas only softcore, shell and cloud genes contribute under slow-growing conditions.

Discussion
Inspired by the discovery of multifork replication and increased copy numbers of genes surrounding the origin of replication, researchers have for decades studied how different categories of genes are distributed on chromosomes and at which level these genes are expressed. Here, we revisited this topic and describe hitherto unrecognized global gene distribution and expression patterns in Vibrionaceae. First, we mapped pangenes to their chromosomal positions and revealed that core and softcore genes are heavily biased towards ori1 of Chr 1. Shell genes are, in contrast, overrepresented at the opposite part of Chr 1 (i.e., close to ter). We next found that gene expression strongly correlates with chromosomal distance to ori1. This trend is caused by higher expression of all pangene categories at fast-growing conditions, whereas softcore, shell and cloud genes are responsible for the biased (higher) expression on the upper half of Chr 1 at slow-growing conditions.
Pangene categories are non-randomly distributed on Chr 1
In this work we report a clear pattern where core/softcore genes are overrepresented on the upper half of Chr 1 of Vibrionaceae, particularly at regions corresponding to 10-11 and 1-2 o'clock on Chr 1, and shell/cloud genes are overrepresented in the ter1 region (Fig. 2). In comparison, no clear pattern was recorded for Chr 2, i.e., the distribution of pangenes appears generally independent of location. For Chr 1, the core/softcore gene distribution pattern resembles that described for genes involved in translation and transcription in E. coli [16,17,33] and in several Vibrio species [16,17,21]. More precisely, Couturier and Rocha (2006) showed that genes involved in translation and transcription in four Vibrio species are typically found close to ori1 of Chr 1. Chr 2 contained, in contrast, fewer genes related to translation and transcription than would be expected. Iida and coworkers [21] later found that genes related to growth (both essential and contributing) are located in close proximity to ori1 in V. cholerae. Overrepresentation of core/softcore genes, many of which are important for growth, in the region proximate to ori1 of Vibrionaceae Chr 1 can be explained by an increase in demand for ori1-proximate gene products during fast growth (i.e., multifork replication results in elevated gene copy numbers and increased transcription levels). For example, genes that encode ribosomal RNA and ribosomal proteins are found clustered in the upper half of Chr 1 and are expressed at extremely high levels, which supports this hypothesis.
Moreover, we found that during fast growth of V. natriegens, core, softcore, shell and cloud genes are all expressed at higher levels on the upper half of the chromosome compared to the lower half. In slow-growing V. natriegens and A. salmonicida, only softcore, shell and cloud genes followed the same trend, which suggests that regulatory mechanisms other than "gene dosage" are at play to ensure a relatively low and uniform expression of core genes independent of chromosomal position during slow growth.
Why are core and softcore genes clustered at the old pole area of cells?
It is well documented in the literature that the intracellular space of bacteria is highly organized, with defined structures at specific locations (reviewed by Surovtsev and Jacobs-Wagner 2019) [34]. For example, Chr 1 and Chr 2 of V. cholerae are spatially organised in a longitudinal orientation inside the cell [35,36]. ori1 and ter1 of Chr 1 are located at the old and new poles, respectively, so that Chr 1 stretches from one pole to the other, whereas Chr 2 stretches from the cell's center (ori2) towards the new pole (ter2) (Fig. 5). In the light of this knowledge, our data suggest that core/softcore and shell/cloud genes are enriched at two spatially separated intracellular regions, i.e., at the two extreme poles of Vibrionaceae cells, given that the spatial positioning of chromatin described for V. cholerae applies to all representatives within the family.
So, why are core and softcore genes clustered at the old (flagellated) pole area? The non-random structural organization of the genes suggests to us that there is a strong link between gene placement and function, and that the underlying reasons for the strong distribution pattern could be very complex. The range of factors that affect gene expression includes, e.g., chromatin packing [37][38][39][40][41], nucleoid-associated proteins (NAPs) [42][43][44], the Structural Maintenance of Chromosome complex (SMC) [45], RNA polymerase (RNAP) [46][47][48][49][50], transcription factors and promoter strength/chromosomal position [43,51] and macromolecular crowding [20]. Perhaps the most fundamental factor is chromatin packing and organization. The density of chromatin is determined by a number of circumstances, including differential abundance/availability of macromolecular machineries [38, 41, 46-50, 52, 53]. In this respect the bipartite DNA organization of Vibrionaceae represents a special case because Chr 1 stretches from pole to pole, whereas Chr 2 extends from the new pole towards the cell center, which suggests that the chromatin density varies between the two halves of the cell. Higher chromatin density will presumably reduce the diffusion of macromolecular particles, such as proteins and ribosomes, in the nucleoid/DNA meshwork. Given that the DNA density is lower in the old pole area, the extra cytoplasmic space will presumably result in increased diffusion and transport of gene products, which provides a plausible explanation for the high abundance of core genes (many of which are growth related), and also of the RP and rRNA clusters, in this subcellular region. Production of core gene products will therefore coincide and co-localize with the greatest number of growth/survival-related reactions and processes in the cell. A number of such cases can be mentioned; we highlight two potential cases below.
The insertion of peptidoglycan (PG) in the cell wall happens in a dispersed manner, with the active growth zones along the axis [54]. To form the inner curvature of Vibrio cells, PG insertion is biased along the outer curve. Genes involved in cell wall synthesis are located in close proximity to ori1 on V. cholerae Chr 1, with the main gene cluster related to nascent PG synthesis positioned approximately 0.38 Mb from ori1. This suggests that the first step of PG synthesis preferentially takes place in the old pole area. Similarly, motility-related genes are found clustered 0.6 Mb from ori1, which is spatially close to the flagellum at the old pole. To summarize, the spatial organization of Chr 1 and Chr 2 and the biased organization of pangenes suggest that there is a strong link between gene placement and function.

Conclusions
Our results show a non-random organization of pangene categories on the two chromosomes of Vibrionaceae, with an overrepresentation of core and softcore genes around ori1. Gene distribution was compared with global gene expression trends and showed that during fast growth, all pangene categories contribute to a skewed expression pattern with respect to ori1. From our data and previous literature, we can deduce that core and softcore genes are overrepresented at the old pole area of V. cholerae. We hypothesize that this pattern can be beneficial due to spatial links between the structural organization of core genes and their cellular function, and that differences in intracellular DNA densities might further contribute to the biased gene distribution. These findings add to the growing list of examples of spatial order in bacteria, and scientists will surely continue to study the interplay between genome organization, gene activity and cellular function. We plan to explore how different pangene categories are distributed on chromosomes of other bacterial orders, and to search for similar spatial links to gene functions, in order to investigate whether our current findings are part of a general trend in Bacteria or specific to Vibrionaceae.

Genome retrieval and gene annotation
As of May 2018, a total of 124 complete Vibrionaceae genomes were publicly available at the National Center for Biotechnology Information (NCBI). These were downloaded from the RefSeq database at NCBI [55] (see Additional file 1: Table S1 for a complete list). All genome sequences were re-annotated using RAST (Rapid Annotation using Subsystem Technology) version 2.0 [56] with default settings. The annotation of the 124 genome sequences resulted in a total of 555,513 annotated protein sequences.
Pangenome approach to extract core, softcore, shell and cloud genes from large genome dataset
To categorize the annotated Vibrionaceae protein sequences into four categories (core, softcore, shell and cloud genes) we performed pangenome analysis using the software package GET_HOMOLOGUES (v3.1.0 (20180103)) [23]. The clustering algorithm OrthoMCL was used to cluster homologous protein sequences.
The parameter "minimum percent sequence identity" was set to 50 and "minimum percent coverage in BLAST query/subj pairs" was set to 75 (default).
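The category thresholds reported in the Results (core: all genomes; softcore: ≥ 95% of genomes; shell: ≥ 2 genomes; cloud: a single genome) can be illustrated with a minimal sketch. This is not the GET_HOMOLOGUES implementation; the function name `classify_cluster` is hypothetical, and for simplicity the categories are treated here as mutually exclusive, whereas the reported cluster counts suggest that the softcore set may also include the core clusters.

```python
def classify_cluster(n_genomes_with_cluster: int, n_genomes_total: int,
                     softcore_percent: int = 95) -> str:
    """Assign a pangene category from cluster occupancy (illustrative only)."""
    if not 1 <= n_genomes_with_cluster <= n_genomes_total:
        raise ValueError("occupancy must be between 1 and the number of genomes")
    if n_genomes_with_cluster == n_genomes_total:
        return "core"        # present in every genome
    # Integer arithmetic avoids floating-point rounding at the threshold.
    if n_genomes_with_cluster * 100 >= softcore_percent * n_genomes_total:
        return "softcore"    # present in >= 95% of genomes
    if n_genomes_with_cluster >= 2:
        return "shell"       # present in at least two genomes
    return "cloud"           # present in a single genome


# With 124 genomes, the 95% threshold corresponds to at least 118 genomes.
for occupancy in (124, 118, 117, 2, 1):
    print(occupancy, classify_cluster(occupancy, 124))
```

With 124 genomes, a cluster found in 118 genomes falls into softcore under these assumptions, while one found in 117 genomes falls into shell.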
Comparison of core, softcore, shell and cloud genes from 11 species
We chose 11 representative species (based on phylogeny and scientific interest, i.e., number of papers published in PubMed) to study the distribution of core, softcore, shell and cloud genes on Chr 1 and Chr 2. Chr 1 and Chr 2 were divided into an "upper half" (close to ori) and a "lower half" (close to ter), and the number of core, softcore, shell and cloud genes in each half was counted. The same 11 species were used to study the exact chromosomal positions of core and shell genes on Chr 1 and Chr 2. The DoriC database [57] was used to locate ori1 and ori2 on Chr 1 and Chr 2, and the plotted chromosomes were subsequently centered at the origin of replication, at mioC on Chr 1 and rctB on Chr 2, respectively. The software package Circos [24] was used to visualize the gene distributions on the chromosomes.
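The division into halves can be sketched as follows. This is a minimal illustration, assuming that the "upper half" means the half of a circular chromosome centered on the origin (i.e., all positions within one quarter of the chromosome length from ori in either direction); the exact rule used in the study is not stated, and the function names and coordinates below are hypothetical.

```python
def circular_distance(position: int, origin: int, chromosome_length: int) -> int:
    """Shortest distance along a circular chromosome between a gene and the origin."""
    d = abs(position - origin) % chromosome_length
    return min(d, chromosome_length - d)


def chromosome_half(position: int, origin: int, chromosome_length: int) -> str:
    """Return 'upper' (ori-proximal half) or 'lower' (ter-proximal half).

    Assumption: the upper half spans one quarter of the chromosome on each
    side of the origin, so that both halves have equal length.
    """
    distance = circular_distance(position, origin, chromosome_length)
    return "upper" if distance <= chromosome_length / 4 else "lower"


# Hypothetical 4 Mb chromosome with the origin at position 100,000.
length, ori = 4_000_000, 100_000
for gene_start in (3_900_000, 1_100_000, 2_100_000):
    print(gene_start, chromosome_half(gene_start, ori, length))
```

Taking the shortest path around the circle matters for genes on the far side of the coordinate origin: a gene at 3,900,000 is only 200 kb from an origin at 100,000, not 3.8 Mb.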

Analysing gene expression: Mapping of read files on reference genomes
To study gene expression of core, softcore, shell and cloud genes in A. salmonicida LFI1238 and V. natriegens ATCC 14048 (NBRC 15636, DSM 759), the following datasets were downloaded from the Sequence Read Archive [29] at NCBI: for V. natriegens ATCC 14048, datasets from growth in minimal (BioSample accession no. SAMN10926309, SAMN10926310 and SAMN10926313) and optimal (rich) medium (sample no. SAMN10926311, SAMN10926312 and SAMN109329) at 37 °C to OD 600nm 0.3-0.5 [30]; for A. salmonicida LFI1238, one dataset (sample no. SAMEA4548122, SAMEA4548133, SAMEA4548134) originating from growth in LB medium containing 1% NaCl at 8 °C to mid log phase (OD 600nm ~ 0.5) [31]. The quality of the reads was checked using FastQC [58]. EDGE-pro v1.0.1 (Estimated Degree of Gene Expression in Prokaryotes) [32] in Galaxy was used to align cDNA reads to V. natriegens ATCC 14048 (assembly no. GCA_001456255.1) and A. salmonicida LFI1238 (assembly no. GCF_000196495.1) and to estimate gene expression as reads per kilobase per million mapped reads (RPKM) for all protein coding sequences (CDS). The RPKM values were then used to calculate the log2 ratio RPKM CDS:RPKM median to make global expression maps for each of the three datasets.
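The normalization described above can be sketched in a few lines. The `rpkm` function uses the standard textbook definition of RPKM and is not necessarily identical to the internal EDGE-pro computation; the handling of genes with zero RPKM (skipped here, since the logarithm is undefined) is an assumption, and all names and values are hypothetical.

```python
import math
from statistics import median


def rpkm(read_count: int, cds_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of CDS per million mapped reads."""
    return read_count * 1e9 / (cds_length_bp * total_mapped_reads)


def log2_ratio_to_median(rpkm_by_cds: dict) -> dict:
    """Return log2(RPKM of each CDS / median RPKM over all CDS).

    Assumption: CDS with RPKM == 0 are omitted from the output because
    log2(0) is undefined; they still count towards the median.
    """
    med = median(rpkm_by_cds.values())
    if med <= 0:
        raise ValueError("median RPKM must be positive")
    return {cds: math.log2(value / med)
            for cds, value in rpkm_by_cds.items() if value > 0}


# Hypothetical RPKM values for three CDS; the median is 200.
example = {"cds_a": 100.0, "cds_b": 200.0, "cds_c": 400.0}
print(log2_ratio_to_median(example))
```

A positive value marks a CDS expressed above the genome-wide median and a negative value one expressed below it, which is what allows expression to be compared along the chromosome on a common scale.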

Statistical analysis
Statistical analysis was performed using R in RStudio. Significance of gene distribution on either the upper or lower half of the chromosomes was assessed using R's chisq.test() function for the nonparametric chi-squared test (see Additional file 4: Table S3 for data). Significance of differences in gene expression between gene classes located on the upper or lower half of the chromosomes was assessed using R's wilcox.test() function for unpaired Wilcoxon rank-sum tests (see Additional file 4: Table S3 for data).
For both analyses P-values were Bonferroni corrected for multiple comparisons using R's p.adjust() function.
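As a worked illustration of the distribution test, the sketch below computes a chi-squared goodness-of-fit statistic for upper versus lower gene counts and applies a Bonferroni correction. It is written in Python rather than R and is not the authors' script. It assumes that the null hypothesis is an equal split of genes between the two halves, which is the default behaviour of a goodness-of-fit test on two counts but is not confirmed by the text; the counts and P-values are hypothetical.

```python
import math


def chi_square_equal_halves(upper: int, lower: int):
    """Goodness-of-fit test of an equal split between two halves (1 d.f.)."""
    total = upper + lower
    if total == 0:
        raise ValueError("at least one gene is required")
    expected = total / 2
    statistic = ((upper - expected) ** 2 + (lower - expected) ** 2) / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical counts: 60 genes in the upper half, 40 in the lower half.
stat, p = chi_square_equal_halves(60, 40)
print(round(stat, 3), round(p, 4))
print(bonferroni([0.01, 0.04, 0.5]))
```

The Bonferroni step shows why a raw P-value just below 0.05 may no longer be significant once several genomes and gene categories are tested together.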

Declarations
Availability of data and materials. All data analysed during this study are included in this published article, its additional files and publicly available repositories. The RNA-seq datasets used in this study are available at the Sequence Read Archive under BioProject accessions PRJNA522293 [30] and PRJEB17700 [31].
Ethics approval and consent to participate. Not applicable.
Consent for publication. Not applicable.
Competing interests. The authors declare that they have no competing interests.