Genomic features separating ten strains of Neorhizobium galegae with different symbiotic phenotypes

Background The symbiotic phenotype of Neorhizobium galegae, with strains specifically fixing nitrogen with either Galega orientalis or G. officinalis, has made it a target in research on determinants of host specificity in nitrogen fixation. The genomic differences between representative strains of the two symbiovars are, however, relatively small. This introduced a need for a dataset representing a larger bacterial population in order to make better conclusions on characteristics typical for a subset of the species. In this study, we produced draft genomes of eight strains of N. galegae having different symbiotic phenotypes, both with regard to host specificity and nitrogen fixation efficiency. These genomes were analysed together with the previously published complete genomes of N. galegae strains HAMBI 540T and HAMBI 1141. Results The results showed that the presence of an additional rpoN sigma factor gene in the symbiosis gene region is a characteristic specific to symbiovar orientalis, required for nitrogen fixation. Also the nifQ gene was shown to be crucial for functional symbiosis in both symbiovars. Genome-wide analyses identified additional genes characteristic of strains of the same symbiovar and of strains having similar plant growth promoting properties on Galega orientalis. Many of these genes are involved in transcriptional regulation or in metabolic functions. Conclusions The results of this study confirm that the only symbiosis-related gene that is present in one symbiovar of N. galegae but not in the other is an rpoN gene. The specific function of this gene remains to be determined, however. New genes that were identified as specific for strains of one symbiovar may be involved in determining host specificity, while others are defined as potential determinant genes for differences in efficiency of nitrogen fixation. 
Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1576-3) contains supplementary material, which is available to authorized users.


Background
The nitrogen-fixing soil bacterium Neorhizobium galegae has an easily distinguishable phenotype on the host plant species Galega orientalis Lam. and G. officinalis L. It is the only rhizobial species known so far to induce root nodules on Galega plants, making studies of its genomics an attractive area in the field of research on determinants of host specificity and nitrogen fixation efficiency. Although research on nitrogen fixation with N. galegae has been conducted since the description of the new species in 1989 [1], the mechanisms behind the specific interactions between Galega plants and their microsymbiont are still not well understood. The division of N. galegae strains into two symbiovars [2] with different phenotypes on the two host plant species adds a further challenge to the study of this bacterium. Work has been done on the rhizobial signalling molecules, the Nod factors, of N. galegae [3,4], and the function of the rare acetyl substitution has been investigated. However, a clear explanation for the host specificity observed on Galega has not been found within the Nod factors. More information is clearly needed on characteristics that distinguish N. galegae from other rhizobial species, on characteristics that distinguish strains within the species having different symbiotic phenotypes, and on characteristics of the host plant that may act to discriminate between strains of the same bacterial species.
The complete genomes of two strains of N. galegae were recently sequenced to shed some light on the basic genomic features separating N. galegae from other rhizobia and its symbiovars from each other [4]. The complete genome sequences of these strains, the type strain HAMBI 540T representing symbiovar (sv.) orientalis and strain HAMBI 1141 representing sv. officinalis, are invaluable to this research but not enough to observe genomic patterns related to the bacterial species or its symbiovars. Therefore, we have produced draft genomes of eight additional strains of N. galegae, four strains each of the symbiovars orientalis and officinalis, which combined with the previously sequenced complete genomes make a good representation of the N. galegae population. These data also enable a deeper study of the genomic patterns separating the two symbiovars, as well as strains showing different nitrogen-fixing capacities, than was previously possible. In this study, data from the eight newly sequenced strains were combined with the whole-genome data of strains HAMBI 540T and HAMBI 1141, and analysed to produce information on genomic characteristics of the species N. galegae. Analysis of subgroups within the species, defined by the symbiotic phenotypes observed on Galega plants, together with experimental evidence revealed that the sv. orientalis-specific rpoN2 gene as well as the nifQ gene are necessary for nitrogen fixation. Genes possibly related to enhanced plant growth promoting capabilities are also discussed.

Symbiosis gene regions are well conserved within the symbiovars
Draft genomes of eight strains of N. galegae were produced, generating genomes consisting of between 54 and 148 contigs. The total size of the sequenced genome is between 6 and 7 Mbp for all strains (Table 1), comprising two to four replicons per strain, as predicted by the number of repABC operons and preliminary assembly of contigs. The eight new genomes were analysed together with the previously sequenced strains HAMBI 540 T and HAMBI 1141 [4], to produce new information on the genomic differences separating strains of the two symbiovars.
Upon analysis of the nod, noe, nif and fix genes found in the gene regions corresponding to the symbiosis gene regions of HAMBI 540T and HAMBI 1141 [4], the gene content was found consistent with regard to the symbiovar (Figure 1). The same nod, noe, nif and fix genes can be found in all strains. The four sv. orientalis strains all share the nod, nif and fix gene structure found in strain HAMBI 540T, with one exception: a predicted transposase gene located between the gene for the T1SS HlyD family protein and nodN in strain HAMBI 2605. Besides this minor difference, the symbiosis gene region only differs between strains in the genes separating the nodE-nodJ cluster from the fixU-nifH gene cluster, as well as the genes separating nifH from nodU and nodU from the nodD2-noeT gene cluster.
The analysed sv. officinalis strains have two main symbiosis gene region structures. Based on preliminary assembly of the contigs included in this region, strains HAMBI 490 and HAMBI 1146 had an almost identical gene content and gene arrangement (although there is no information on the exact sequence between nodJ and nifH in HAMBI 490), the only difference between the two being a predicted hypothetical protein located in between fdxB and fixA in HAMBI 490. The symbiosis gene region of these two strains is also very similar to that found in strain HAMBI 1141 (Figure 1). On the other hand, strains HAMBI 1145 and HAMBI 1189 seem to have identical symbiosis gene regions, while differing from strains HAMBI 490 and HAMBI 1146 to some extent. The structure of the main gene clusters is the same, but the genes separating these nod, nif and fix gene clusters differ from those in the latter two strains both in the number of genes present and the predicted function of the same.
The rpoN2 gene is specific to sv. orientalis strains
In addition to the nod, noe, nif and fix genes, all sv. orientalis strains have a hypothetical protein gene followed by an rpoN gene copy included in the symbiosis gene region, similar to the situation in strain HAMBI 540T. None of the analysed sv. officinalis strains have these [...] green, indicating that nitrogen fixation was impaired and that this copy of rpoN is required for functional symbiosis on G. orientalis.
Figure 1 Schematic representation of gene regions containing known symbiosis genes of strains sequenced in this study. Strains HAMBI 540T and HAMBI 1141 are reference strains for which the complete genomes were sequenced previously [4]. A) Symbiovar orientalis strains. The genes PA10320 and PA10330 are T1SS genes. B) Symbiovar officinalis strains. The genes PB00900 and PB00910 are T1SS genes.
The nifQ gene is divergent but functional
Another gene located in the symbiosis gene region that makes a clear distinction between strains of the two symbiovars is the nifQ gene. Alignment of the protein sequences deduced from this gene showed that the NifQ protein is well conserved within the symbiovars, while there was a remarkable number of substitutions when sequences were compared between symbiovars (Figure 2). In addition, the NifQ sequences in N. galegae were highly divergent from those of the model species Azotobacter vinelandii and Klebsiella pneumoniae (Figure 2). The NifQ sequences of all five sv. orientalis strains were 100% identical, while some minor differences in predicted start codon and location of the stop codon could be observed between strains of sv. officinalis. Gene replacement deletion mutants were also constructed for nifQ in both HAMBI 540T and HAMBI 1141, and the mutant strains (HAMBI 3479 and HAMBI 3481 respectively) were verified by genome sequencing and mapping to the respective reference strains. The mutant strains were tested on their respective host plants, generating results similar to those observed with the rpoN2 mutant. Plants were short and pale, clearly suffering from nitrogen deprivation, comparable to the outcome when G. orientalis is inoculated with sv. officinalis strain HAMBI 1141 and G. officinalis is inoculated with sv. orientalis strain HAMBI 540T. The nodules formed were small and white on G. officinalis, while nodules on G. orientalis were more pale pink or greenish, but still much paler and smaller than effective nodules formed by wild-type HAMBI 540T. Unfortunately, attempts to express the nifQ gene of HAMBI 1141 in HAMBI 540T and vice versa were not successful. However, these results indicate that even if the nifQ genes found in N. galegae have diverged from the corresponding genes in related rhizobial species, these are required for nitrogen fixation.
Figure 2 Alignment of NifQ sequences. The NifQ protein sequences of the ten sequenced N. galegae strains were aligned to those of the model system species A. vinelandii and K. pneumoniae, and the rhizobial relatives Sinorhizobium fredii and Rhizobium tropici. The last two amino acid residues of S. fredii are not visible in the figure. Bottom line: Amino acid residues that are conserved in all N. galegae strains are indicated with "^", residues conserved in all included rhizobial strains with "!" and residues conserved in all strains in the alignment are indicated with "#". The molybdenum-binding motif region (Cx4Cx2Cx5C [38]), indicated with a yellow box, has not been preserved in N. galegae.
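A check for the molybdenum-binding motif Cx4Cx2Cx5C in a protein sequence can be sketched as a regular-expression search. This is a minimal illustration, assuming the motif is read as four cysteines separated by four, two and five arbitrary residues; the function name and the toy sequences are hypothetical and are not NifQ sequences from this study.

```python
import re

# Cx4Cx2Cx5C: four cysteines separated by 4, 2 and 5 arbitrary residues
# (assumed reading of the motif notation).
MO_MOTIF = re.compile(r"C.{4}C.{2}C.{5}C")


def find_mo_motif(protein_seq):
    """Return (start, matched_text) for the first motif hit, or None."""
    match = MO_MOTIF.search(protein_seq.upper())
    if match is None:
        return None
    return match.start(), match.group()


# Toy sequences for illustration only (hypothetical, not real NifQ proteins).
with_motif = "MKACAAAACGGCLLLLLCTT"      # second cysteine present
without_motif = "MKACAAAASGGCLLLLLCTT"   # second cysteine replaced by serine

print(find_mo_motif(with_motif))
print(find_mo_motif(without_motif))
```

A sequence in which any one of the four cysteines is substituted, or in which the spacing has changed, returns no hit, which is how loss of the motif would show up in such a scan.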
Secretion systems of type IV and VI are common in N. galegae
Because of their interesting ecological functions, the presence of type IV and type VI secretion systems (T4SSs and T6SSs) similar to those previously found in HAMBI 540T and HAMBI 1141 was investigated in the eight newly sequenced strains (Table 2). A T6SS was previously identified in HAMBI 540T, while strain HAMBI 1141 has two different types of T4SSs [4]. The two T4SSs in HAMBI 1141 belong to the quorum sensing (QS)-regulated conjugation system (type I) and a type IV conjugation system, based on homology to systems assigned to these types defined by both structural and phylogenetic analyses by Ding and co-workers [5]. Among the three different kinds of secretion systems analysed in N. galegae, the T4SS of type I found on the symbiosis plasmid of HAMBI 1141 was the least well represented in the new strains. Although the T4SSs could be identified as being homologs of one of the two systems present in HAMBI 1141, the gene content was not always entirely the same as is found in HAMBI 1141. In HAMBI 1141 there is a traI/traR/traM QS regulation system present on the plasmid pHAMBI1141b together with the T4SS genes of type I. In HAMBI 490, there is what seems to be an incomplete set of QS-related genes located 16 kb from the T4SS of type IV, with two traR-like genes and one traI homolog but no traM gene. In HAMBI 2605, the traMR genes are present associated with the T4SS of type I, but no traI gene could be found. In addition, the traACDG genes are missing. However, the identified type I conjugation system genes in HAMBI 2605 are split onto two different contigs, and thus it is possible that the traIACDG genes could not be identified due to a gap in between these contigs. In the same strain there is also a T4SS of type IV with a virB8-like gene that is interrupted by a stop codon in the middle of the gene.
T6SS genes were found in strains HAMBI 540 T , HAMBI 2427, HAMBI 2566, HAMBI 2605, HAMBI 1145 and HAMBI 1189. These strains all have the same T6SS genes arranged in the same gene organisation, the whole gene region showing 83.1-85.0% nucleotide identity compared to HAMBI 540 T , with the exception of strain HAMBI 2427 which has a nucleotide identity of 99.8%. However, despite the relative abundance of these secretion systems observed in the sequenced N. galegae strains, the presence of a certain kind of secretion system could not be linked to strains of either symbiovar.
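A percentage nucleotide identity of the kind reported above can be illustrated with a small function that compares two pre-aligned sequences column by column. This is only a sketch: the text does not state how identity was calculated, so the choice to ignore gapped columns, the function name and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy example (hypothetical sequences): one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Counting gapped columns as mismatches instead would lower the value for regions with insertions or deletions, so the convention should be stated whenever such figures are compared.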

Ortholog groups define genes common in specified subgroups
Analysis of ortholog groups was performed on the proteomes of the ten sequenced N. galegae strains, to find genes typical for the species and those typical of subgroups of the species. Based on the 4255 ortholog groups shared by all ten strains, the core genome of N. galegae comprises between 4323 (HAMBI 1189) and 4346 (HAMBI 540T) proteins per strain. The number of strain-specific genes (i.e. singletons in the OrthoMCL analysis) ranges from 111 in HAMBI 2610 to 456 in HAMBI 2605. To investigate whether there are symbiovar-specific genes that could be revealed through analysis of the CDSs of the ten sequenced strains, ortholog groups containing genes from all five strains of one symbiovar, but not a single gene of strains representing the other symbiovar, were targeted. This analysis revealed 40 orientalis-specific ortholog groups and 28 officinalis-specific ortholog groups (Table 3). The officinalis-specific genes are interesting in that all of the 23 genes which are not located on the chromosome of the reference genome are located either within the symbiosis gene region defined by the nod, nif and fix genes, or within 38 genes downstream of nodE. Also among the orientalis-specific genes, 20% are located in the corresponding downstream region of nodE in HAMBI 540T. In order to investigate a possible connection between the rpoN2 gene and orientalis-specific genes, the putative promoter regions of the orientalis-specific genes were scanned for possible RpoN binding sites in strain HAMBI 540T. However, the only possible (although not perfect) motif found was located 238-223 bp upstream of the hypothetical protein gene PA10540. Another OrthoMCL analysis was performed comparing the proteomes of HAMBI 540T, HAMBI 1141 and eight strains representing closely related rhizobial species (Rhizobium leguminosarum sv. viciae, R. leguminosarum sv. trifolii, R. etli, R. tropici, Sinorhizobium fredii, S. medicae, S. meliloti and Mesorhizobium ciceri). When the N. galegae-specific ortholog groups of this analysis were compared to the core genome of N. galegae revealed by the analysis of the ten N. galegae strains, a final set of 441 ortholog groups was found to be common to and specific for all N. galegae strains (Additional file 1). Among these, 139 groups consisted of hypothetical proteins, and based on current knowledge none of the remaining ones seem to be directly related to known symbiotic functions.
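The selection rule for symbiovar-specific ortholog groups (genes from all five strains of one symbiovar, none from the other) can be expressed with set operations. The sketch below assumes that ortholog groups have already been parsed into a mapping from a group identifier to the set of strains contributing at least one gene; that data layout, the function names and the toy groups are hypothetical, and only the strain assignments are taken from the text.

```python
# Strain-to-symbiovar assignment as described in the text.
ORIENTALIS = {"HAMBI 540T", "HAMBI 2427", "HAMBI 2566", "HAMBI 2605", "HAMBI 2610"}
OFFICINALIS = {"HAMBI 1141", "HAMBI 490", "HAMBI 1146", "HAMBI 1145", "HAMBI 1189"}


def core_groups(groups, all_strains):
    """IDs of groups that contain a gene from every strain."""
    return sorted(gid for gid, strains in groups.items()
                  if all_strains <= set(strains))


def specific_groups(groups, target, other):
    """IDs of groups with a gene from every target strain and none from 'other'."""
    return sorted(gid for gid, strains in groups.items()
                  if target <= set(strains) and not (other & set(strains)))


# Toy input (hypothetical group IDs): group -> strains with a member gene.
toy_groups = {
    "og1": ORIENTALIS | OFFICINALIS,              # core
    "og2": set(ORIENTALIS),                       # orientalis-specific
    "og3": set(OFFICINALIS),                      # officinalis-specific
    "og4": {"HAMBI 540T", "HAMBI 2427"},          # incomplete, ignored
    "og5": ORIENTALIS | {"HAMBI 490"},            # shared, not specific
}

print(core_groups(toy_groups, ORIENTALIS | OFFICINALIS))
print(specific_groups(toy_groups, ORIENTALIS, OFFICINALIS))
print(specific_groups(toy_groups, OFFICINALIS, ORIENTALIS))
```

Because the rule works on presence or absence within groups, a symbiovar-specific gene that clusters with a close paralog present in all strains would not be reported by this kind of filter.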
Based on results from a greenhouse experiment testing the plant growth promoting capacity of N. galegae strains, the sv. orientalis strains HAMBI 540T, HAMBI 2427 and HAMBI 2566 are very good nitrogen fixers (fix++), while strains HAMBI 2605 and HAMBI 2610 show a lower level of nitrogen fixation (fix+) (Figure 3). To test whether there is a genomics-based pattern that could explain the differences in plant growth promoting efficiency, the OrthoMCL results were analysed from a point of view focusing on the nitrogen fixation properties of the strains. When ortholog groups containing genes shared by the fix++ sv. orientalis strains but not present in the fix+ strains were analysed, 54 such groups were found (Table 4). Among these groups, 11 were unique to the fix++ sv. orientalis strains, i.e. genes not found in any other of the analysed strains but HAMBI 540T, HAMBI 2427 and HAMBI 2566.
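The two-level selection described here (groups shared by the efficient strains and absent from the less efficient ones, and within those the groups found in no other strain) can be sketched as follows. The input layout, a mapping from group identifier to the set of strains with a member gene, is an assumption, as are the function name and the toy groups; the strain lists come from the text.

```python
# Phenotype classes of the sv. orientalis strains as given in the text.
FIX_PP = {"HAMBI 540T", "HAMBI 2427", "HAMBI 2566"}   # efficient fixers
FIX_P = {"HAMBI 2605", "HAMBI 2610"}                  # less efficient fixers


def fix_pp_groups(groups):
    """Return (shared, unique) lists of group IDs.

    shared: groups with a gene in every efficient strain and in no less
            efficient strain (sv. officinalis strains may be present).
    unique: the subset of 'shared' found in no other strain at all.
    """
    shared, unique = [], []
    for gid in sorted(groups):
        strains = set(groups[gid])
        if FIX_PP <= strains and not (FIX_P & strains):
            shared.append(gid)
            if strains == FIX_PP:
                unique.append(gid)
    return shared, unique


# Toy input with hypothetical group IDs.
toy_groups = {
    "a": set(FIX_PP),                    # shared and unique
    "b": FIX_PP | {"HAMBI 1141"},        # shared, also in an officinalis strain
    "c": FIX_PP | {"HAMBI 2605"},        # present in a less efficient strain
    "d": {"HAMBI 540T"},                 # singleton
}

shared, unique = fix_pp_groups(toy_groups)
print(shared, unique)
```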

Discussion
Eight draft genomes of N. galegae were produced to enable a more profound study of the bacterial genomics contributing to the host specificity observed in the nitrogen-fixing symbiosis between this bacterium and its host plants.
Results of this study show that a major part of each genome consists of genes common to all analysed N. galegae strains, while the two symbiovars can be separated based on a fairly limited set of genes only. The number of strain-specific genes varies considerably between strains, as does the number of replicons present, even if none of the sequenced strains has more than four replicons. Similar conclusions have been drawn for e.g. S. meliloti [6] and R. etli [7].
T4SSs and T6SSs are probably not related to symbiosis in N. galegae
Type IV and type VI secretion systems in bacteria are important machineries contributing to ecological functions. T4SSs contribute to horizontal gene transfer and might be responsible for transfer of symbiosis genes between strains. In Österman et al. 2014 [4] we speculated that the T6SS could play a role in the host specificity of N. galegae, being present in HAMBI 540T but not in HAMBI 1141. However, the present study shows that the presence of a certain secretion system cannot be attributed to one symbiovar or the other; these systems can be found in strains of both symbiovars of N. galegae. There are some indications that the T4SSs found in N. galegae strains are undergoing changes. The atypical set of QS-related genes in strain HAMBI 490, which has a T4SS classified as type IV (which is not associated with QS regulation [5]), is not necessarily related to regulation of T4S. Moreover, it remains unknown whether the interrupted virB8-type gene in HAMBI 2605 renders this T4SS non-functional. The identified T6SS, on the other hand, seems well conserved in the strains of N. galegae where such a system is found.
The specific functions of rpoN2 and nifQ are important for nitrogen fixation
The presence of a gene coding for a hypothetical protein followed by a second rpoN gene copy (i.e. another one in addition to the chromosomally located rpoN) in the symbiosis gene region appears to be a symbiovar orientalis-specific trait. The absence of amino acid substitutions in the RpoN2 proteins of the analysed strains indicates that the product of this second rpoN might perform a very specific function. When compared to the RpoN1 proteins, there is an accumulation of substitutions in the N-terminal region (region I). These differences might affect interactions with target DNA and activator proteins, resulting in transcription of different types of genes as observed in e.g. Rhodobacter sphaeroides [8] or transcription under different conditions. Two rpoN genes have been found also in other rhizobia, but the reported involvement in symbiosis differs. In Bradyrhizobium japonicum, both rpoN genes could replace each other functionally, although one of them was regulated in response to oxygen [9]. In R. etli, symbiotic nitrogen fixation was drastically reduced when the rpoN2 gene was mutated, while mutation of the rpoN1 gene did not affect nitrogen fixation levels [10]. The rpoN2 gene was not expressed aerobically, but was strongly induced in bacteroids. Also in Mesorhizobium loti, the rpoN2 gene located on the symbiosis island has been shown to be essential for nitrogen fixation, whereas the chromosomally located rpoN1 gene is dispensable for nitrogen fixation [11]. The results of this study clearly show that the rpoN2 gene of N. galegae is required for nitrogen fixation, but a possible function in determining host specificity should be further investigated. The hypothetical protein gene preceding the rpoN gene varies in size between strains, but is nonetheless always part of the same ortholog group. There are, however, no clues to the possible function of this gene.
The nifQ gene of N. galegae was previously the target of speculations that this gene is non-functional in this species, because of the apparent diversification of this gene from the corresponding gene in related species, as well as the lack of a molybdenum-binding motif [4]. However, this study showed that NifQ is in fact very well conserved within symbiovars, indicating that this could be an important protein after all. Gene replacement deletion studies performed in this work showed that the nifQ gene in both symbiovars is required for nitrogen fixation, although the importance of the observed differences in its protein sequence would deserve attention in future studies. Experiments with different levels of molybdenum could have been done to study the effect of the concentration of available molybdenum on the mutant phenotype. However, the results obtained with the conditions used provided enough evidence of the involvement of NifQ in symbiosis, leaving studies on the effect of the molybdenum level on functional symbiosis to the future.
Annotations according to function assigned to strain HAMBI 540. Genes unique to symbiovar orientalis fix++ strains (and thereby not present in any of the sv. officinalis strains) indicated in boldface.

Analysis of ortholog groups revealed potential future target genes in research on the effectiveness of nitrogen fixation
OrthoMCL analyses allowed the definition of the N. galegae core genome as well as a set of genes from the core genome that could not be found in related rhizobial species. Information about the core genome is useful when characteristics of all strains of N. galegae are studied, while information on the N. galegae-specific portion of the core genome might be useful when studying differences between N. galegae and other nitrogen fixers. More specifically, the OrthoMCL analysis of ten N. galegae strains revealed symbiovar-specific gene sets, as well as genes present in strains of sv. orientalis known to be good nitrogen fixers while missing in strains known to be less efficient plant growth promoters.
The genes found to be symbiovar-specific are mostly genes involved in transcriptional regulation and metabolic functions. These are not genes that have previously been directly associated with nitrogen fixation, but their possible involvement needs to be investigated in future experiments. In addition, as seen with the rpoN2 gene located in the symbiosis gene region of sv. orientalis strains, symbiovar-specific gene variants that have high sequence homology with other genes within the genome may not be detected as symbiovar-specific by the OrthoMCL analysis even if these are functionally different and obviously contribute to the pool of genes separating the two groups of strains. The possibility of the orientalis-specific genes being regulated by the orientalis-specific RpoN2 was investigated by searching for known RpoN binding motifs in upstream intergenic regions of these genes. The absence of probable RpoN binding motifs could mean that RpoN2 is either not connected to these genes or it is so specific that it recognises a modified RpoN binding motif compared to the known −24/−12 promoter [12].
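A scan of upstream regions for an RpoN binding motif can be sketched with a regular expression applied to both strands. The exact motif and tolerance used in the study are not given in this section, so the pattern below is an assumption: a simplified form of the −24/−12 consensus (TGGCAC, five arbitrary bases, TTGC followed by A or T). The function names and the toy sequence are likewise hypothetical.

```python
import re

# Assumed simplified -24/-12 consensus: TGGCAC, 5 arbitrary bases, TTGC[A/T].
RPON_MOTIF = re.compile(r"TGGCAC[ACGT]{5}TTGC[AT]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_upstream(upstream_seq):
    """Return (strand, start, site) tuples for exact motif matches.

    'start' is 0-based and counted on the strand that was searched.
    """
    seq = upstream_seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in RPON_MOTIF.finditer(strand_seq):
            hits.append((strand, match.start(), match.group()))
    return hits


# Toy upstream region (hypothetical) with one site on the forward strand.
toy_upstream = "AAAA" + "TGGCACGATCGTTGCA" + "CCCC"
print(scan_upstream(toy_upstream))
```

An exact-match scan like this one would miss imperfect sites such as the one reported upstream of PA10540; allowing mismatches or scoring against a position weight matrix would be needed to recover those.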
Among the genes typical for good N. galegae nitrogen fixers in sv. orientalis were the norEFCBQD genes, which are all part of the nitric oxide (NO) reductase [13], and the nnrSR genes. The nnrS gene codes for a haem- and copper-containing membrane protein that is regulated by the product of nnrR [14]. These genes were also found as members of the accessory genome relevant for symbiotic interactions in S. meliloti [6]. NO production has been observed in functional nodules in bacteroid-containing cells during Medicago truncatula–S. meliloti symbiosis [15] and has been found to have an important role in stress adaptation and the early stage of Lotus japonicus–M. loti symbiosis [16]. The fixLJ genes are positive regulators of symbiotic expression of nif and fix genes [17], but also expression of the nor genes is dependent on the FixLJ-FixK2 regulatory cascade in concert with NO-activated nnrR under microaerobic conditions [18,19]. In the light of this information, the presence of NO reduction genes in the efficient nitrogen fixers of N. galegae indicates that the possibility to reduce NO might be an advantage for nitrogen fixation.
B. japonicum USDA 110 has been found to express genes involved in organic sulphur utilisation in root nodules [20]. The transporter genes for aliphatic sulfonates also found in the sv. orientalis fix ++ -specific gene set are involved in transport of alternative sources of sulphur [21] that can be used for amino acid synthesis. This possibility might be an advantage for strains under stressful conditions.
Another gene found to be present in all superior sv. orientalis nitrogen fixers but none of the less efficient ones, was an ntrP gene. The ntrP gene is an antitoxin gene forming a toxin-antitoxin (TA) module with ntrR in S. meliloti [22], found to regulate metabolic processes under stressful conditions such as those encountered when entering symbiosis. The presence of ntrP, binding to ntrR, lowers the negative effect of ntrR, thereby allowing a higher level of expression of genes favourable for nitrogen fixation [22].

Conclusions
Based on the genomic comparisons performed in this study, differences in genes known to be directly symbiosis-related are small between strains of different symbiovars of N. galegae. Nevertheless, the observed symbiovar orientalis-specific rpoN2 gene as well as the nifQ gene, found in both symbiovars, were shown to be important for functional nitrogen fixation. The specific impact of these genes on host specificity should be further investigated. Secretion systems of type IV and type VI are common among strains of N. galegae, but do not seem to be involved in symbiotic functions. Based on the functional annotations of genes present in strains known to be good plant growth promoters but not in less efficient ones, an improved ability of nitrogen fixation seems to be correlated with an improved ability to use different metabolic substrates and an optimised regulation of metabolic functions under stressful conditions.

Bacterial strains and growth conditions
N. galegae strains used in this study are listed in Table 5. All strains were obtained from the HAMBI culture collection (University of Helsinki, Department of Food and Environmental Sciences, Division of Microbiology and Biotechnology). Strains were grown on TY or YEM agar plates and in TY broth at +28°C.

DNA isolation
Total DNA of strains used for genome sequencing (Table 5) was isolated using a modified CTAB (hexadecyltrimethylammonium bromide) procedure as described in Österman et al. 2014 [4]. DNA for PCR screening was isolated from 6 additional strains of sv. orientalis (Table 5) using one of two different techniques. Most samples were prepared using an UltraClean Microbial DNA Isolation Kit (MO BIO Laboratories, Inc.), but DNA of strains HAMBI 2423 and HAMBI 2433 was prepared using the PrepMan Ultra Sample Preparation Reagent (Life Technologies), applying the protocol for preparation of samples for bacterial and fungal testing from culture broths.

Screening for rpoN
PCR screening for the second rpoN gene originally observed in the symbiosis gene region of HAMBI 540T [4] was performed using primers rpoN-25F (5′-CCGAGTCACACCCAATGTGC-3′) and rpoN-1551R (5′-CGGACGGCCCGGCTATCC-3′) internal to the HAMBI 540T gene. Amplification was done with Phusion High-Fidelity DNA Polymerase (Thermo Scientific) and the HF buffer, using a PCR program with initial denaturation at 98°C for 30 s, 35 cycles of denaturation at 98°C for 10 s, annealing at 69°C for 30 s and elongation at 72°C for 50 s, and final elongation at 72°C for 10 min. PCR products were verified on a 1% agarose gel. Strains used for screening are listed in Table 5.
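The cycling program above can be written down as a small data structure, which makes it easy to check the total programmed hold time. This is only an illustrative sketch: the variable and function names are hypothetical, and ramping between temperatures is not included because it depends on the thermocycler.

```python
# Cycling program as stated in the text: (temperature in °C, time in seconds).
INITIAL_DENATURATION = (98, 30)
CYCLES = 35
CYCLE_STEPS = [
    ("denaturation", 98, 10),
    ("annealing", 69, 30),
    ("elongation", 72, 50),
]
FINAL_ELONGATION = (72, 10 * 60)


def programmed_hold_seconds():
    """Sum of all programmed hold times, excluding temperature ramping."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + CYCLES * per_cycle + FINAL_ELONGATION[1]


total = programmed_hold_seconds()
print(f"{total} s = {total / 60:.0f} min")
```

With these settings the holds add up to 30 + 35 × 90 + 600 = 3780 s, i.e. 63 min before ramping time is added.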

Genome sequencing, assembly and annotation
Genomic DNA (1 μg) was fragmented in a microTube (100 μL) using Covaris S2 (LGC Genomics). Half of the fragmented DNA (50 μL) was purified using a MinElute Reaction Cleanup kit (Qiagen) and eluted in 25 μL EB buffer. End repair and A-tailing was done on the purified DNA (25 μL) using DNA T4 Polymerase ( . The PCR reaction was purified using AMPure XP and eluted in a volume of 20 μL. The library was checked using Bioanalyzer on a DNA High Sensitive chip (Agilent Technologies). The concentration was measured using a High Sensitive kit on Qubit (Invitrogen). The libraries were pooled and sequenced in two partial paired-end runs on an Illumina MiSeq sequencer using the v2 and v3 sequencing kit. The obtained MiSeq raw sequences from both sequencing rounds were subjected to quality filtering and overlapping sequences with a Phred quality score Q30 or above were extended using FLASH [23] and assembled using Newbler (Roche). Gene prediction was done with Prodigal ver. 2.50 [24] followed by functional annotation with the PANNZER tool [25]. The tRNA genes were annotated using tRNAscan-SE 1.3.1 [26] and rRNA genes identified with RNAmmer 1.2 [27] and alignment to corresponding genes of HAMBI 540T and HAMBI 1141. The gene predictions of known symbiosis-related genes were manually checked. The draft genomes were submitted to the European Nucleotide Archive in the form of contigs [EMBL: ERS526350-ERS526357]. The sequences can be accessed through the link http://www.ebi.ac.uk/ena/data/view/PRJEB6976.
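A quality threshold such as Q30 can be illustrated on a FASTQ quality string. The text does not specify the filtering tool or whether the threshold was applied per base or per read, so the sketch below makes two explicit assumptions: qualities are Phred+33 encoded, and a read passes if its mean quality is at least the threshold. Function names and toy quality strings are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def passes_quality(quality_string, threshold=30, offset=33):
    """True if the mean Phred score of the read is at least 'threshold'.

    Mean-quality filtering is an assumption made for illustration; the
    criterion actually used in the study is not stated.
    """
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold


# Toy quality strings: 'I' = Q40, '?' = Q30, '5' = Q20 under Phred+33.
print(phred_scores("I?5"))
print(passes_quality("IIII"), passes_quality("????"), passes_quality("5555"))
```

A Phred score Q corresponds to an error probability of 10^(−Q/10), so Q30 means an expected one error in 1000 base calls.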

Mutant construction and verification
Gene replacement deletion mutants of rpoN2 (ΔrpoN2::Ω-Km) and nifQ (ΔnifQ::Ω-Km) were constructed by marker exchange where the target gene was replaced with the Ω-Km interposon [28] containing the nptII gene. The rpoN2 gene was mutated in strain HAMBI 540 T (mutant strain HAMBI 3480) and the nifQ gene in both HAMBI 540 T (mutant strain HAMBI 3479) and HAMBI 1141 (mutant strain HAMBI 3481). Upstream flanking regions, the left arms, of the genes to be mutated were amplified (1167 bp for rpoN2, primers RpoNLLSpeI and RpoNLRBamHI; 1161 bp for HAMBI 540 T nifQ, primers nifQLLSpeI-ori and nifQLRBamHI-ori; 1092 bp for HAMBI 1141 nifQ, primers nifQLLSpeI-2 and nifQLRBamHI-2) as well as downstream flanking fragments, the right arms (1073 bp for rpoN2, primers RpoNRLBglII and RpoNRRXhoI; 1092 bp for nifQ in both HAMBI 540 T and HAMBI 1141, primers nifQRLBamHI and nifQRRXhoI), using Phusion or DyNAzyme II polymerase (Thermo Scientific). Primer sequences are listed in Table 6. The amplified fragments contained short regions of the 5′ and 3′ ends respectively, of the genes. The primers contained restriction endonuclease sites (BamHI and SpeI for the left arm and BglII or BamHI and XhoI for the right arm) to facilitate directional cloning. The Ω-Km interposon was released from pHP45Ω-Km [27] by BamHI digestion, purified and ligated along with the PCR products (digested with BamHI + SpeI and BglII/BamHI + XhoI respectively and purified) into pJQ200SK [29] that had been digested with SpeI and XhoI and dephosphorylated. 
The resulting constructs, in which the Ω-Km interposon was inserted between the two PCR products, were transferred into Escherichia coli S17-1 λpir by electroporation (ca 30 ng of plasmid construct into 40 μL of electrocompetent cells, electroporation at 2.5 kV, 25 μF and 200 Ω in 0.2 cm spaced cuvettes) and confirmed by restriction analysis and sequencing (sequencing primers T3 as well as gene-specific primers rpoNL-646, oriNifQL-570 or offNifQL-661 for the left arm; M13 UP as well as gene-specific primers rpoNR-413, oriNifQR-583 or offNifQR-560 for the right arm). The verified constructs were then transferred into E. coli ST18 [30], the donor strain used to transfer each construct into N. galegae HAMBI 540T and HAMBI 1141 (nifQ only) by biparental spot mating. Mating was conducted by mixing stationary-phase recipient with late log-phase or stationary-phase donor, pelleting the cells, followed by resuspension in 50 μL of MilliQ water and spotting on a TY plate with 5-aminolevulinic acid (ALA; 50 μg/mL). Exconjugants were plated onto def8 agar [31] containing 5% sucrose and neomycin (25 μg/mL), or TY agar containing 5% sucrose and neomycin (50 μg/mL), to select for cells in which the suicide plasmid had been inserted and pJQ200SK removed via recombination events. Mutant candidate clones were colony purified on TY (Nm 50 μg/mL) plates and tested for sensitivity to gentamicin. The final neomycin resistant, gentamicin sensitive gene replacement mutants were further confirmed by PCR analysis and sequencing.
The insert-flanking regions were amplified with two sets of primers: rpoNmutLL (rpoN), 540nifQL-1234F (HAMBI 540T nifQ) or 1141nifQL-1184F (HAMBI 1141 nifQ) together with hsnTmutLR, amplifying from within the Rhizobium DNA upstream of the left arm to the 5′ end of the interposon; and primers hsnTmutRL and rpoNmutRR (rpoN) or nifQR-1209R (HAMBI 540T and 1141 nifQ), amplifying a fragment from within the 3′ end of the interposon to the Rhizobium DNA downstream of the right arm. These PCR fragments were sequenced over the junctions to confirm that homologous recombination had occurred as intended.
Finally, the mutant strains were whole-genome sequenced to verify that the insert was present only in the intended location, replacing the deleted gene, and that no other deviations from the reference strain were present. Total DNA of the strains was isolated using the CTAB method as described in section "DNA isolation". Sequencing was done on an Illumina MiSeq sequencer as described above for the eight genome-sequenced strains, using the v3 sequencing kit, to coverages of 26× (HAMBI 3480), 23× (HAMBI 3479) and 24× (HAMBI 3481). Raw sequence reads were quality filtered, and the sequences with a Phred quality score of Q25 or above were extended with FLASH [23]. Both extended and non-extended fragments were assembled using Newbler (Roche). The resulting contigs were then mapped to the reference genome (HAMBI 540T or HAMBI 1141) using the BWA-SW algorithm of the BWA software package [32]. The mapping results were then manually checked for consistency.
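To illustrate what a Q25 threshold means in practice, the following minimal sketch converts Phred scores to error probabilities and applies a per-read filter. It is not the filtering tool used in this study. The function names are hypothetical, the Phred+33 encoding is assumed (as is usual for MiSeq FASTQ output), and the use of the mean read quality as the criterion is an assumption, since the text does not state whether the threshold was applied per base or per read.

```python
def phred_to_error_prob(q: float) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filter(quality_string: str, threshold: int = 25) -> bool:
    """Keep a read if its mean Phred score is at least `threshold`.

    Assumption: mean quality is used as the criterion; the actual
    criterion used in the study is not specified in the text.
    """
    scores = decode_phred33(quality_string)
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold


if __name__ == "__main__":
    # Q25 corresponds to roughly 3 expected errors per 1000 base calls.
    print(f"P(error) at Q25: {phred_to_error_prob(25):.5f}")
    # ':' encodes Q25 and '5' encodes Q20 under Phred+33.
    for quals in ("::::::::", "55555555"):
        print(quals, passes_filter(quals))
```

A per-base or sliding-window criterion would be stricter than the mean used here, so the sketch should be read only as an illustration of the threshold itself.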

Plant tests of mutants
Nodulation tests of the mutant strains were performed on their respective original hosts, Galega orientalis or Galega officinalis. G. orientalis seeds were surface sterilised by washing them in 96% ethanol for 1 minute and in 3% sodium hypochlorite for 3-5 minutes, followed by 5-6 washes with sterile water for 1-2 minutes each. The sterilised seeds were germinated on TY agar plates at room temperature in darkness. G. officinalis seeds were surface sterilised by the following procedure: washed with concentrated sulphuric acid for 15 minutes, rinsed with sterile water 8 times for 2 minutes, kept in 96% ethanol for 1 minute and finally washed with sterile water 6 times for 2-5 minutes. Germinated seeds were transferred to glass jars containing a mixture of Leca gravel (4-10 mm), sand (0.5-1.2 mm) and vermiculite. The components were washed and mixed at a ratio of 3:5:5, and the jars were filled with the mixture and sterilised at 160°C for 24 h. Each jar (containing about 700 mL of soil mixture) was watered with 125 mL of quarter-strength nitrogen-free Jensen medium [33] before the seedlings were planted, three plants per jar. Inoculant strains were grown in TY medium to an OD600 of about 1.0, pelleted and resuspended in water, and each seedling was inoculated with 1 mL of bacterial suspension. Negative control seedlings were inoculated with 1 mL of sterile MilliQ water. Inoculated seedlings were covered and each jar was watered with 50 mL of sterile water. Plants were then grown in a growth chamber