Characterisation of multiple heavy metal resistance loci in the genome of the novel species Cupriavidus neocaledonicus STM 6070, a nickel- and zinc-tolerant Mimosa pudica microsymbiont isolated from mining site soil CURRENT STATUS: ACCEPTED

Background
Cupriavidus strain STM 6070 was isolated from nickel-rich mine roadside soil near Koniambo massif, New Caledonia, using the invasive legume trap host Mimosa pudica. STM 6070 is a heavy metal-tolerant strain that is highly effective at fixing nitrogen with M. pudica. Here we have provided an updated taxonomy for STM 6070 and described salient features of the annotated genome, focusing on heavy metal resistance (HMR) loci and heavy metal efflux (HME) systems.

Results
The 6,771,773 bp high-quality-draft genome consists of 107 scaffolds containing 6,118 protein-coding genes. ANI values show that STM 6070 is a new species of Cupriavidus. The STM 6070 symbiotic region was syntenic with that of the M. pudica-nodulating Cupriavidus taiwanensis LMG 19424T. In contrast to the nickel and zinc sensitivity of C. taiwanensis strains, STM 6070 grew at high Ni2+ and Zn2+ concentrations. The STM 6070 genome contains 55 genes, located in 12 clusters, that encode HMR structural proteins belonging to the RND, MFS, CHR, ACR3, CDF and P-ATPase protein superfamilies. These HMR molecular determinants are putatively involved in arsenic (ars), chromium (chr), cobalt-zinc-cadmium (czc), copper (cop, cup), nickel (nie and nre), and silver and/or copper (sil) resistance. Seven of these HMR clusters were common to five symbiotic and three non-symbiotic Cupriavidus species, while four clusters were specific to STM 6070, with three of these being associated with insertion sequences. Within the specific STM 6070 HMR clusters, three novel HME-RND systems (nieIC cep nieBA, czcC2B2A2, and hmxB zneAC zneR hmxS) were identified, which constitute new candidate genes for nickel and zinc resistance.

Conclusions
STM 6070 belongs to a new Cupriavidus species, for which we have proposed the name Cupriavidus neocaledonicus.

organism for heavy metal resistance [21]) and its heavy metal-sensitive derivative AE104 (CH34T devoid of the plasmids pMOL28 and pMOL30 that confer heavy metal resistance [38]) at various concentrations of Ni2+ (Figure 1). Of the tested strains, STM 6070 had the highest tolerance to Ni2+ and was the only strain capable of growth at 15 mM NiSO4.
C. metallidurans CH34T grew in the presence of 10 mM NiSO4, while AE104 was unable to grow at 3 mM NiSO4. Previous studies had established that the symbiotic C. taiwanensis strains LMG 19424T from Taiwan [13] and STM 6018 from French Guiana [6] were also unable to grow at 3 mM NiSO4 (data not shown).
In light of the observed Ni2+ tolerance of STM 6070, we examined the tolerance of the Cupriavidus symbionts to other metal ions. STM 6070, STM 6018 and LMG 19424T were all able to grow in media containing 1.0 mM Cu2+, although growth of STM 6070 was inhibited from 0.6 mM Cu2+ (supplementary Figure S2). In addition, STM 6070 was able to grow in media containing 15 mM Zn2+. In contrast, STM 6018 and LMG 19424T were far more sensitive and could not grow at this concentration of Zn2+. Since STM 6070 was highly tolerant to Ni2+ and Zn2+, the genome of this strain was examined, in particular for putative HMR determinants.

STM 6070 Minimum Information for the Genome Sequence (MIGS) and genome properties
The classification, general features and genome sequencing project information for Cupriavidus strain STM 6070 are provided in Table S1, in accordance with the minimum information about a genome sequence (MIGS) recommendations [39] published by the Genomic Standards Consortium [40]. The genome sequence consisted of 6,771,773 nucleotides with 67.21% G+C content and 107 scaffolds (Table 1) and contained a total of 6,182 genes, of which 6,118 were protein encoding and 64 were RNA only encoding genes. The majority of protein encoding genes (81.69%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 2.
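The gene totals reported above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original analysis; the constant and function names are illustrative assumptions, and the count of genes with a putative function is back-calculated from the reported percentage rather than taken from the annotation itself.

```python
# Figures reported in the text for the STM 6070 draft genome.
TOTAL_GENES = 6182
PROTEIN_CODING = 6118
RNA_ONLY = 64
FUNCTION_ASSIGNED_PCT = 81.69  # percent of protein-coding genes with a putative function


def summarise_genome():
    """Recompute derived gene counts from the reported totals (illustrative only)."""
    # The protein-coding and RNA-only genes should add up to the total gene count.
    if PROTEIN_CODING + RNA_ONLY != TOTAL_GENES:
        raise ValueError("gene categories do not sum to the reported total")
    # Back-calculate the approximate number of genes with a putative function.
    with_function = round(PROTEIN_CODING * FUNCTION_ASSIGNED_PCT / 100)
    return {
        "with_function": with_function,
        "hypothetical": PROTEIN_CODING - with_function,
        "protein_coding_pct": round(100 * PROTEIN_CODING / TOTAL_GENES, 2),
    }


if __name__ == "__main__":
    print(summarise_genome())
```

Under these assumptions the sketch simply confirms that the two gene categories sum to the reported total and gives an approximate split between annotated and hypothetical protein-coding genes.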

Phylogenetic placement of STM 6070 within the Cupriavidus genus
Previous studies have shown that STM 6070 is most closely related to C. taiwanensis LMG 19424T [11] and C. alkaliphilus ASC-732T [34], according to recA phylogenies [13]. This was confirmed by a phylogenetic analysis based on an intragenic fragment of the 16S rRNA gene (Figure S3). To determine the taxonomic placement of STM 6070 at the species level, the average nucleotide identity (ANI) of the STM 6070 whole genome was established in pairwise comparisons with the genomes of other sequenced strains and type strains of six non-symbiotic and four symbiotic Cupriavidus species (Table 3; Table S2).
ANIb [41] and ANIg [42] comparisons showed that the STM 6070 genome displayed the highest ANI values with the C. taiwanensis strains STM 6018 and LMG 19424T. The ANIb and ANIg values were lower than the species affiliation cut-offs of at least 95% (over 69% of conserved DNA [43]) and 96.5% [42], respectively. This indicates that STM 6070 (and isolates of the same rep-PCR group isolated from New Caledonian soils [13]) represents a new Cupriavidus species, for which we propose the name Cupriavidus neocaledonicus sp. nov. (i.e. from New Caledonia). The ANIb and ANIg values also suggest that the UYPR2.512 and AMP6 isolates may each represent a new Cupriavidus species.
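The species-delineation rule described above can be expressed as a simple threshold test. The sketch below is a hedged illustration in Python: the function names are assumptions, the ANI values in PLACEHOLDER_COMPARISONS are invented placeholders (the measured values are in Table 3 and Table S2), and the additional requirement of over 69% conserved DNA for ANIb is not modelled.

```python
# Species affiliation cut-offs quoted in the text (percent identity).
ANIB_CUTOFF = 95.0
ANIG_CUTOFF = 96.5


def same_species(anib, anig):
    """Return True only if both ANI measures reach their species cut-offs."""
    return anib >= ANIB_CUTOFF and anig >= ANIG_CUTOFF


def is_novel_species(comparisons):
    """Treat a query genome as a candidate new species if no reference passes both cut-offs.

    `comparisons` maps a reference genome label to an (ANIb, ANIg) pair.
    """
    return not any(same_species(anib, anig) for anib, anig in comparisons.values())


# Placeholder values for illustration only; these are NOT the measured ANI values.
PLACEHOLDER_COMPARISONS = {
    "reference_1": (93.0, 94.1),
    "reference_2": (88.5, 89.0),
}

if __name__ == "__main__":
    print(is_novel_species(PLACEHOLDER_COMPARISONS))
```

Requiring both measures to pass is a conservative choice made for this sketch; the text reports that both ANIb and ANIg fell below their cut-offs for STM 6070, so either reading gives the same conclusion here.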

Synteny between genomes
To assess how the observed differences in genome size (6.48-7.86 Mb) affected the distribution of specific genes within the five symbiotic strains of Cupriavidus, we used progressiveMauve [44] to align the draft genomes of STM 6070, STM 6018, UYPR2.512 and AMP6 to the finished genome of C. taiwanensis LMG 19424T (Figure 2). The alignments of the STM 6018 and STM 6070 genomes against that of C. taiwanensis LMG 19424T showed a high similarity of collinear blocks within the two largest replicons (Figure 2A), with the sequence of LMG 19424T chromosome 1 (CHR1) being more conserved than that of chromosome 2 (CHR2 or chromid). We identified eight scaffolds specific to STM 6070 (A3AGDRAFT_scaffold_31.32_C, _43.44_C, _54.55_C, _39.40_C, _104.105_C, _101.102_C, _99.100_C, and _89.90_C) that could not be aligned to the LMG 19424T genome sequence, as well as two STM 6070 scaffolds (A3AGDRAFT_scaffold_84.85_C and _75.76_C) that were absent from LMG 19424T but present in STM 6018. A putative genomic rearrangement was also detected within one scaffold of STM 6070 (A3ADRAFT_scaffold_0.1), in which one part of the scaffold mapped to CHR1 and another part mapped to the chromid CHR2 of LMG 19424T (see red lines on Figure 2A).
In contrast, the genome alignment of UYPR2.512 and AMP6 with LMG19424 T showed important differences in replicon conservation ( Figure 2B). Earlier studies on comparative genomics of Cupriavidus species have suggested that the largest CHR1 replicon probably constitutes the ancestral one, while the smaller CHR2 replicon was acquired as a plasmid during the evolution of Cupriavidus and gradually evolved to a large-sized replicon following either gene transfer from CHR1 or horizontal gene transfer [35]. Large secondary replicons, or "chromids" [45], such as CHR2, have been detected in many bacterial species and carry plasmid-like partitioning systems [35,25] and some essential genes, as well as many genes that are conserved within a genus, and the vast majority of genes conserved among strains within a species. This may well explain the greater degree of sequence divergence observed in CHR2 as compared with CHR1 in the symbiotic Cupriavidus genomes.
Finally, we observed that whereas most of the LMG 19424 T pSym sequence was well conserved in the STM 6018 and STM 6070 genomes (Figure 2A), only a few LMG19424 T pSym genes (including the nod, nif, fix and fdx genes) were conserved across all five genomes. The M. pudica microsymbionts (LMG 19424 T , STM 6018 and STM 6070) had almost identical pSyms (conserved pSym synteny with nod genes characterized by 100% protein identity). In contrast, the Parapiptadenia rigida (UYPR2.512) and Mimosa asperata (AMP6) nodulating strains harboured divergent pSyms (low synteny, with nod genes characterized by 80-94 and 95-98.4% protein identity to those of LMG 19424 T , respectively). Based on phylogenetic analyses of symbiotic and housekeeping loci, our results support the hypothesis that symbiotic Cupriavidus populations have arisen via horizontal gene transfer [46].

Comparisons of Cupriavidus neocaledonicus STM 6070 with other sequenced genomes of symbiotic Cupriavidus
The comparison of gene orthologues of STM 6070 with those of the symbiotic Cupriavidus strains LMG 19424T, STM 6018, UYPR2.512 and AMP6, performed using the "Gene Phyloprofile" tool in the Microscope MaGe platform [47] (Figure 3A), showed that these strains have a large core set of 4673 genes, representing from 55.5 to 78.1% of the total number of genes in these organisms (70.2% for STM 6070). Each species harbours a set of unique genes, which range from 226 for LMG 19424T to 1993 for UYPR2.512; larger genomes had a greater number of unique genes (Figure 3). STM 6070 harbours 483 unique genes, which represent 7.2% of the total number of genes in the genome. The majority of these unique genes (376) encode hypothetical proteins. Only 22.2% of the 483 STM 6070 unique genes could be ascribed to functional COG categories (Figure 3B). Within the functional COG category "Cellular processes and signaling", the largest number of genes were found in Cell wall/membrane/envelope biogenesis (M), Signal Transduction (T), Defense mechanisms (V) and Intracellular trafficking, secretion, and vesicular transport (U) (Figure 3B). This may be related to processes required for plant host relationships and bacterial adaptation to the host environment. For example, within functional category M we detected several genes encoding glycosyl transferases, which are putatively involved in biosynthesis of exopolysaccharides and/or polysaccharides, products that have been shown to play a major role in rhizobial infection [48].
Unique STM 6070 genes within functional category T included four genes encoding putative universal stress proteins (UspA family), additional response regulators and a sensor protein (RcsC), while category V includes genes encoding type I and III restriction modification systems, as well as genes encoding multidrug resistance efflux pumps, which could reflect adaptation to ultramafic soils. A high number of specific genes was assigned to "Information storage and processing". For example, 38 genes encoded putative transcriptional regulators (COG category K, Transcription) of various families (AraC, CopG, GntR, LacI, LysR, LuxR, MerR, NagC, TetR and XRE), suggesting a requirement for supplementary regulatory mechanisms of cellular and metabolic processes. Finally, a high number of specific genes was assigned to metabolic functions, represented mainly by Amino Acid (E), Carbohydrate (G) and Inorganic ion transport and metabolism (P), Energy production and conversion (C), Lipid metabolism (I) and Secondary metabolites biosynthesis, transport and catabolism (Q).
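The unique-gene figures reported in this section can be related to one another with simple arithmetic. The following Python sketch is illustrative only: the names are assumptions, and the "implied total" is back-calculated from the reported core-genome percentage, so it is an approximation rather than a count taken from the MaGe platform.

```python
# Figures reported in the text for the five-strain comparison.
CORE_GENES = 4673
UNIQUE_STM6070 = 483
UNIQUE_HYPOTHETICAL = 376


def unique_gene_breakdown():
    """Split the STM 6070 unique genes into hypothetical and non-hypothetical fractions."""
    non_hypothetical = UNIQUE_STM6070 - UNIQUE_HYPOTHETICAL
    return {
        "non_hypothetical": non_hypothetical,
        "non_hypothetical_pct": round(100 * non_hypothetical / UNIQUE_STM6070, 1),
        "hypothetical_pct": round(100 * UNIQUE_HYPOTHETICAL / UNIQUE_STM6070, 1),
    }


def implied_total_genes(core_pct):
    """Approximate total gene count implied by the core set being `core_pct` percent of it."""
    return round(CORE_GENES / (core_pct / 100))


if __name__ == "__main__":
    print(unique_gene_breakdown())
    print(implied_total_genes(70.2))  # approximate total for STM 6070
```

The non-hypothetical remainder works out to the same proportion as the 22.2% of unique genes reported as assigned to COG categories, which suggests, but does not prove, that the two sets coincide.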

Metal resistance determinants in the STM 6070 genome
To understand the genetic basis of STM 6070 metal tolerance, we then searched for the presence of common and specific heavy metal resistance (HMR) markers within the genomes of STM 6070 and the other symbiotic Cupriavidus species, using the TransAAP tool on the TransportDB website (http://www.membranetransport.org/) [49] to find genes encoding predicted transporter proteins.
Given that STM 6070 is nickel- and zinc-tolerant, we were particularly interested in identifying HMR proteins belonging to the MFS, CDF, RND, CHR, ACR3 and P-ATPase transporter superfamilies (Table 4, Table S3). The STM 6070 genome contained higher numbers of MFS, RND, CHR and P-ATPase genes than the genome of the comparatively heavy metal-sensitive LMG 19424T strain, and also had higher numbers of MFS, RND and ACR3 genes than C. metallidurans CH34T, a bacterium known to have high and diverse metal resistance capacities.
Of the 156 TransportDB predicted transporters, 23 HME transporter genes were identified in the STM 6070 genome. Based on gene arrangements and homology with characterised HMR loci, a total of 55 structural HMR genes (TransportDB predicted HME genes plus associated genes) were located in 12 clusters (clusters A-L, Figure 4). These genes were compared with those described for C. metallidurans CH34T, C. necator H16, and the symbiotic species C. taiwanensis LMG 19424T [35], UYPR2.512 and AMP6 (Table S3).

MFS proteins
The Major Facilitator Superfamily (MFS) is one of the two largest families of membrane transporters found in living organisms. Within the MFS permeases, 29 distinct families have been described, each transporting a single class of compounds [53]. It is thus not surprising that of the 106 STM 6070 TransAAP identified genes encoding putative MFS proteins, only two genes, nreB and arsP, were associated with an HME function. The nreB gene is located in cluster I in an nreAB operon, which constitutes a putative nickel efflux system [52], while the arsP gene, located in cluster K, is within the operon arsR1IC1C2B1C3H1P, which encodes a putative arsenic efflux system (Figure 4).

CDF proteins
The Cation Diffusion Facilitator (CDF) proteins are single-subunit systems located in the cytoplasmic membrane [35]. The CDF family transporters act as chemiosmotic ion-proton exchangers and include HMR proteins such as CzcD, which provides resistance to cobalt, zinc and cadmium [52]. Four genes encoding CDF proteins were detected in the STM 6070 genome (Table S3), but only one, czcD, is located in an HME cluster (czcDI2C3B3A3, cluster K) (Figure 4). This locus encodes a CDF efflux protein with 67.2% identity to CH34T CzcD, which mediates the efflux of Co2+, Zn2+ and Cd2+ ions [54]. The proteins encoded by the three remaining STM 6070 CDF genes showed very low homology to CzcD. The second CDF gene (dmeF, Table S3) encodes an efflux protein with highest identity (76.1%) to the CH34T DmeF protein, which has a role in cobalt homeostasis and resistance [54], while the other two CDF genes (fieF1 and fieF2, Table S3) encode efflux proteins with homology to CH34T FieF (70.8 and 69.8% identity, respectively). FieF has a role in ferrous iron detoxification but was also shown to mediate low-level resistance to other divalent metal cations such as Zn2+ and Cd2+ [55,56].

RND-HME systems
The RND-HME transporters are transmembrane proteins that form a tripartite protein complex consisting of the RND transmembrane transporter protein (component A), a membrane fusion protein (MFP) (component B), and an outer membrane factor protein (OMF) (component C). These components have been designated as CBA efflux systems, or CBA transporters [52], to differentiate them from ABC transporters, and they export toxic heavy metals from the cytoplasm, or the periplasm, to the outside of the cell. Within a CBA system, the RND transmembrane protein [52] and, as reported recently, the MFP protein [57], mediate the active part of the transport process, determine the substrate specificity, and are involved in the assembly of the HME-RND protein complex.
Phylogenetic analysis of the eight TransAAP predicted STM 6070 RND transmembrane proteins [52,59,60], together with the analysis of the conserved motifs within the proteins, suggests that three of these proteins belong to the HME1 class, two belong to the HME3a class and the remaining three proteins belong to the HME3b, HME4 and HME5 classes, respectively ( Figure 5 and Table S3). The STM 6070 T genome lacks genes encoding the HME2-type transmembrane proteins, such as the C. metallidurans CH34 T CnrA and NccA, which are involved in heavy metal resistance and have predicted substrate specificity for cobalt and nickel [52] ( Figure 5).
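The class assignments above can be tallied to confirm that they account for all eight RND transmembrane proteins. This is a minimal Python sketch under stated assumptions: the gene labels (for example czcA3, hmvA, hmyA, silA) are inferred here from the operon names used in the subsections that follow, and the function names are illustrative.

```python
from collections import Counter

# Assumed mapping of the eight STM 6070 RND transmembrane (A) components to HME classes,
# with gene labels inferred from the operon names given in the text.
RND_CLASS = {
    "czcA1": "HME1",
    "czcA2": "HME1",
    "czcA3": "HME1",
    "hmvA": "HME3a",
    "zneA": "HME3a",
    "hmyA": "HME3b",
    "silA": "HME4",
    "nieA": "HME5",
}

EXPECTED_CLASSES = ("HME1", "HME2", "HME3a", "HME3b", "HME4", "HME5")


def class_counts():
    """Count how many RND transmembrane proteins fall into each HME class."""
    return Counter(RND_CLASS.values())


def missing_classes():
    """List HME classes with no representative in the mapping above."""
    counts = class_counts()
    return [name for name in EXPECTED_CLASSES if counts[name] == 0]


if __name__ == "__main__":
    print(class_counts())
    print(missing_classes())
```

The only class without a representative in this tally is HME2, consistent with the absence of CnrA- and NccA-type proteins noted above.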
The essential amino acid residues that form the proximal and distal heavy-metal-binding sites have been identified for the CH34 T zinc-specific RND HME3a transmembrane transporter ZneA [61]. Using the ZneA protein as a backbone, we aligned the eight STM 6070 T RND transmembrane proteins with those used for phylogenetic analysis ( Figure 5), to compare and identify the corresponding essential amino acid residues that form the putative proximal and distal heavy-metal-binding sites in these transporters (Table S3). In addition, we used the MaGe Microscope annotation platform [47] to analyse the syntenic arrangements of the eight HME-RND efflux systems present in STM 6070 and compare them with the HME-RND efflux systems found within six other Cupriavidus strains, as outlined below.
The czcA1 gene was within an operon located in cluster F ( Figure 4) and annotated as czcJ1I1C1B1A1.
In addition to the czcCBA genes, the cluster contained a czcI1 gene encoding a transcriptional regulator that probably controls the expression of czcC1B1A1 [62,52] and the czcJ1 gene (of unknown function), which was reported to be strongly induced by Cd 2+ , Cu 2+ , Ni 2+ , and Zn 2+ in CH34 T [35,63]. This operon was located in a genomic region showing high synteny with corresponding regions in the other symbiotic Cupriavidus strains and in C. necator N-1, and the STM 6070 CzcA1 protein showed high identity (80-96%) with the other Cupriavidus CzcA orthologues (Table S4). In C. metallidurans CH34 T , the corresponding czc cluster (czcMNICBADRSEJ, locus tag Rmet_5985-74) is located on the plasmid pMOL30 and contains additional genes that are not found in STM 6070 [35].
The second STM 6070 RND-HME1 efflux system was annotated as czcC2B2A2 and, together with several other HMR operons, formed part of a large group of HMR loci within cluster I (Figure 4). The nreB gene, encoding a putative nickel resistance MFS protein, is located immediately upstream of czcC2B2A2. A similar arrangement has been observed for the CH34T nccCBA nreB cluster (locus tag Rmet_6145) found on plasmid pMOL30 [35]. However, STM 6070 CzcA2 and CzcB2 showed higher identity to CH34T CzcA and CzcB (64% and 78.6%, respectively) than to CH34T NccA and NccB (49.1% and >30%, respectively). It is interesting to note that cluster I was delimited by transposases (Figure 4) and no conserved syntenic arrangement with the six other Cupriavidus genomes was observed (Table S4).
The third RND-HME1 efflux system, located in cluster K, was annotated as czcD czcI2C3B3A3 (Figure 4). The czcI2 and czcD components encode putative regulator and CDF proteins (see the CDF proteins section), respectively. Cluster K, delimited by two Tn3 transposases, had conserved synteny to corresponding regions in the genomes of LMG 19424T and STM 6018, but not in the two other symbiotic Cupriavidus strains, AMP6 and UYPR2.512.

RND-HME3a
STM 6070 contained two putative RND-HME3a efflux systems, one located in cluster G and one in cluster I. Cluster G contained an hmv operon, located in a region that was syntenic to corresponding regions in symbiotic Cupriavidus and C. necator N-1, but not in CH34 T . Despite this lack of synteny, the CH34 T genome contained an orthologous hmvCBA operon (locus tag Rmet_3836-38), which encoded proteins with high identity with STM 6070 HmvCBA (76 %, 75.6 % and 90.8 % identity, respectively) [35,64]. However, the function of the encoded proteins in this operon was not determined, since the CH34 T hmvA gene is truncated [35].
Cluster I contained a putative zinc efflux RND-HME3a system annotated as hmxB zneAC, with the upstream genes zneR hmxS encoding a two-component sensor regulatory system. The BAC protein components of this RND-HME system showed low identity (38-44%) to the corresponding proteins in other Cupriavidus genomes (Table S4). The BAC gene arrangement is atypical, compared to the characterised RND-HME CBA transporter gene arrangement, but is the same as that described in the CH34T HME3a zinc efflux system zneSRBAC (locus tag Rmet_5325-5330), with the zneBAC genes encoding efflux system proteins and the zneRS genes encoding two-component regulatory proteins [35,61,57]. The HME3a STM 6070 protein contained the highly conserved amino acids identified in the active and proximal heavy metal-binding sites of the characterised CH34T ZneA protein [61] (Table S3). Based on conservation of the essential amino acid residues, these proteins would be divalent cation transporters, putatively involved in zinc efflux. Thus, despite the low sequence identity (38.87, 45.6 and 40.5% protein identity, respectively), we decided to annotate the genes encoding proteins A, C and R as zneA, zneC and zneR. The genes in this cluster that corresponded to the MFP protein B and the histidine kinase sensor S showed less than 30% identity with the CH34T ZneB and ZneS proteins. For this reason, it is proposed that these genes retain their hmxB and hmxS names. Interestingly, the orthologues with the highest sequence similarity to STM 6070T HmxB ZneAC (70, 86.5 and 69.5%, respectively) were not found in other Cupriavidus genomes, but in the genome of the marine betaproteobacterial species Minibacterium massiliensis, within an operon of similar architecture but of unknown function and substrate specificity [65].

RND-HME3b
An RND-HME3b efflux system was identified in cluster A and annotated as hmyFCBA (Figure 4). This operon showed high similarity (90% identity) to a corresponding CH34 T hmyFCBA cluster (Rmet_4119-4123) located on the chromid. The role of the Hmy efflux system is currently unknown and this system is likely to be inactive in CH34 T since hmyA in this strain is insertionally inactivated by IS1088 [66]. The STM 6070 hmyFCBA cluster was also highly conserved in the four symbiotic Cupriavidus strains and C. necator N-1 (> 80% identity). In CH34 T , the gene immediately upstream of hmyCBA has been annotated as hmyF, and is predicted to encode a small auxiliary protein that is a component of a metal cation-transporting efflux system, as in the characterised CusCFBA HME efflux system of Escherichia coli [67]. However, hmyF of both CH34 T and STM 6070 have very low identity (< 30%) with hmyF of E. coli.

RND-HME4
An RND-HME4 efflux system was identified in cluster J and annotated as a putative silDCBAF operon, which has been suggested to be important for the efflux of monovalent cations in CH34T [68]. It is located in a region that showed no synteny to the other Cupriavidus genomes. However, this operon is similar to the CH34T silDCBA operon (Rmet_5030-5034) located on pMOL30, which encodes a putative silver efflux system (59, 71, 63 and 87% identity, respectively), and to the CH34T cusDCBAF operon (Rmet_6133-6136) located on the chromid, which encodes a putative copper efflux system (50, 56, 54, 86 and 65% identity, respectively) [68]. Similar operons were also identified in the STM 6018, AMP6, N-1 and H16 genomes.

RND-HME5
An RND-HME5 efflux system was identified in cluster B and annotated as nieIC cep nieBA and was located 28 kb downstream of cluster A. This operon possessed an atypical structure, with a gene encoding a conserved exported protein (cep) situated between the nieC and nieB structural genes.
Among the Cupriavidus strains, a similar operon structure was found only in the AMP6 genome, with the structural proteins displaying high identity (86 to 92%) to the corresponding STM 6070 proteins.
This operon structure was also found in the genome of M. massiliensis (with the encoded proteins having 41 to 79 % protein identity with those of STM6070) [65].
As there are no RND-HME5 efflux systems present in CH34T, the protein encoded by STM 6070 nieA was compared with the characterized RND-HME5 proteins NrsA (involved in nickel resistance) and CopA (involved in copper resistance) of the cyanobacterium Synechocystis sp. PCC 6803 [69,70] (Figure 5). The phylogenetic analysis (Figure 5) shows that although these proteins possess a common ancestor, they form two well-separated clades, one comprising the HME5 proteins of STM 6070, AMP6 and M. massiliensis, and the second containing the NrsA and CopA of PCC 6803 together with RND-HME5 proteins from the cyanobacterium Anabaena sp. PCC 7120 [71]. The betaproteobacterial and cyanobacterial RND-HME5 proteins share less than 41% identity, resulting in totally different amino acids involved in putative proximal and distal metal-binding sites, as well as differences in the consensus sequence of the TMHIV α-helix (Table S3). Of particular interest was the finding that the three histidines, which are present in the proximal site of NieA and in the proteins of this clade (Table S3), form part of conserved HAEGVH and HRLDH motifs, and match the putative nickel-binding motifs H-X4-H and H-X3-H that are predominant in Ni-binding proteins, as described for the Ni-binding proteins of Streptococcus pneumoniae [72]. Based on these findings, we suggest this operon encodes a new RND-HME system (class 6) putatively involved in nickel efflux, which we have annotated as nieIC cep nieBA. This operon represents an interesting candidate for knockout mutation to determine if it is a major determinant of nickel tolerance in STM 6070T.
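The correspondence between the HAEGVH and HRLDH motifs and the H-X4-H and H-X3-H patterns can be checked with a simple pattern scan. The following Python sketch is illustrative and is not the method used in the study; the function name is an assumption, and TOY_SEQUENCE is an invented peptide that merely embeds the two motifs (it is not the NieA sequence).

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
MOTIFS = {
    "H-X4-H": re.compile(r"(?=(H[A-Z]{4}H))"),
    "H-X3-H": re.compile(r"(?=(H[A-Z]{3}H))"),
}


def find_his_motifs(protein_seq):
    """Return (motif name, 0-based start, matched residues) for each histidine motif found."""
    seq = protein_seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    # Report hits in order of their position along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


# Invented toy peptide for illustration only; NOT a real protein sequence.
TOY_SEQUENCE = "MKHAEGVHLLTAGSHRLDHAA"

if __name__ == "__main__":
    for hit in find_his_motifs(TOY_SEQUENCE):
        print(hit)
```

Applied to real sequences, a scan of this kind would only flag candidate sites; whether a given H-X4-H or H-X3-H occurrence binds nickel would still need the structural and experimental evidence discussed above.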

CHR proteins
The Chromate Ion Transporter (CHR) Family proteins efflux chromate from the cytoplasm through an indirect active transport process [73]. Three STM 6070 genes were identified as encoding putative CHR proteins, and two of these (chrA1 and chrA2) encoded HMR determinants. The STM 6070T ChrA1 and ChrA2 proteins showed higher identity to each other (90.6%) than to the CH34T ChrA proteins encoded by genes harboured on pMOL28 and the chromid (86 and 84%, respectively). Both chrA1 and chrA2 were present in operons that encoded putative chromate transporter systems. The first operon, chrB1A1 (cluster B), was located upstream of the putative RND-HME5 efflux system nieICcep-nieBA (Figure 4). This chr operon was conserved in the genomes of the symbiotic Cupriavidus strains LMG 19424T and STM 6018, forming part of a large synteny block. The second chr operon, annotated as chrB2A2CF-cep-chrL (chrY), was located in cluster I, along with the RND-HME efflux systems czcC2B2A2 and hmxB-zneAC and the nreAB operon (Figure 4). In addition to chrB2A2, the cluster I chr operon contained four other genes: chrC, encoding a putative superoxide dismutase that may reduce chromate and thereby decrease chromate toxicity [74]; chrF, encoding a putative transcriptional repressor [74]; cep, encoding a conserved exported protein containing a Concanavalin A-like lectins/glucanases domain; and finally, chrL, encoding a lipoprotein (protein family LppY/LpqO [75]) with 71.1% identity to CH34T ChrL (also annotated as CH34T ChrY). In CH34T, the corresponding chrL (chrY) gene (locus tag Rmet_6195) is induced by chromate [35]. Deletion of chrL in the Gram-positive Arthrobacter sp. strain FB24 resulted in a noticeable decrease in chromate resistance [75].
Corresponding gene clusters were identified in the UYPR2.512 and CH34 T genomes. Interestingly, the UYPR2.512 cluster contains chr and nre genes (chrB2A2CFcepnreB) but lacks chrL(Y) and nreA genes.
In contrast, this STM 6070 chr operon lacks the chrE, chrO, chrN, chrP and chrZ orthologues found in the corresponding CH34 T chr operon. The different chromate resistance genes might affect tolerance to chromate, or to another metal-oxyanion [35]. The STM 6070 chrB2 gene appears to be inactivated by an insertion that changes the reading frame after 214 amino acids, and shortens the protein to only 293 amino acids, instead of the full length 324 amino acid protein encoded by CH34 T chrB. Since ChrB seems to be important for chromate resistance in CH34 T [76], the tolerance of STM 6070 T to chromate might be compromised. Indeed, in our experimental conditions STM 6070 T only showed slight tolerance to Cr 6+ (0.1 mM) [13].

ACR3 proteins
The Arsenical Resistance-3 (ACR3) family includes permeases involved in arsenate resistance. The two STM 6070 ACR3-type genes (annotated as arsB1 and arsB2) are arsB orthologues located in two ars operons encoding putative arsenate detoxification systems. The first operon is located downstream of the czc operon in cluster K. Genes in this ars operon had high identity (50 to 91%) with genes of the CH34T arsMRIC2BC1HP operon encoding an arsenite and arsenate detoxifying system [63,77]. We therefore annotated these genes as arsR1IC1C2B1C3H1P in STM 6070. This ars cluster encoded a putative arsenite/arsenate transcriptional regulator/repressor (ArsR), a glyoxalase family protein (ArsI), three arsenate reductases (ArsC1, ArsC2, ArsC3), an arsenite efflux pump belonging to the ACR3 class of permeases (ArsB1), a NADPH-dependent FMN reductase (ArsH1), and a putative permease from the major facilitator superfamily (MFS) (ArsP) [77]. The operon was highly conserved in the genomes of the Cupriavidus symbionts LMG 19424T and STM 6018 and formed a large syntenic region.
The second ars operon, in cluster L, was annotated as arsR2C4B2H2. This operon was present in other Cupriavidus genomes (with the exception of UYPR2.512, where the ars operon is absent), but in STM 6070 is missing several genes, including arsI, arsC and arsP.

P-ATPase proteins
P-type ATPases directly utilise ATP to export metal ions from the cell cytoplasm. Among the 10 STM 6070 genes assigned to the P-type ATPase protein family (Table 4), five genes (Figure 4 and Table S4) encoded P-type ATPases putatively involved in HME. Four of these five genes were found adjacent to other genes encoding HMR proteins (Figure 4). The copF P-type ATPase gene in cluster J was located upstream of the silDCBAF operon and could encode an essential copper efflux component, as shown for CH34T [38]. However, the STM 6070 CopF appears to be truncated in its C-terminus and thus may not be functional.
Two other P-type ATPase-encoding genes were identified in cluster D and annotated as silP and copP.
The encoded proteins had very low identity with proteins of the Cupriavidus genomes (Table S4), except for one P-type ATPase protein from AMP6 with 86% identity with the CopP protein. The proteins had higher identity with P-type ATPases encoded by C. necator H16, annotated as SilP (86%) and CopP (94.7%), and putatively involved in silver and copper ion transport, respectively [25].
Within cluster H, a P-type ATPase-encoding gene, annotated as cupA, was located next to a regulatory gene, cupR (Figure 4), in a conserved large syntenic block common to all compared Cupriavidus isolates, with high identity between corresponding genes. The cupA and cupR genes are putatively involved in copper ion transport. Finally, zntA was located within cluster C in a group of genes annotated as czcJ2-hns-czcLRS-ubiGI-zntA. Genes in this cluster had high identity with gene clusters in CH34T that have been annotated as zntA czcICΔB (Rmet_4594-4597) and czcBA ubiG czcSRL IS hns mmmQ (Rmet_4469-4461). This CH34T region encodes an RND system (czcICBA), the ZntA ATPase, a two-component regulatory system CzcRS and a 3-demethylubiquinone-9 3-methyltransferase (UbiG) [35,64]. UbiG participates in the biosynthesis of ubiquinone and its activity could be related to the sensor kinase activity of the two-component system CzcRS [78,79]. The czcL, hns and mmmQ genes encode an unknown protein, an H-NS-like protein and a small stress-responsive protein, respectively.
In CH34 T , this cluster seems to be inactivated by an insertion sequence located between czcL and hns. The STM 6070 cluster C is perfectly conserved in the genomes of the four analyzed symbiotic Cupriavidus, suggesting that it is functional, but in comparison to the corresponding CH34 T cluster it is devoid of the RND system czcCBA. The role of the regulatory loci czcLRS-ubiGI, with regard to zntA expression, would thus be interesting to determine.

Other mechanisms of cation detoxification (not included in TransAAP)
The search for further heavy metal resistance determinants in STM 6070 that were orthologous to those described in CH34 T led to identification of a copper-resistance operon copRSABCD (cluster E).
This had a similar structure to the CH34 T cop cluster (copS2R2A2B2C2D2) located on the chromid, which encodes a copper-resistance mechanism that is thought to sequester copper outside the cytoplasm [80,81]. CopSR is a two-component sensor-regulator system and CopA is a putative multicopper oxidase thought to oxidize Cu 1+ to Cu 2+. CopA proteins contain several motif variants of MGGM/MAGM/MGAM/MSGM, possibly involved in binding numerous Cu 1+ ions, as determined for Pseudomonas syringae CopA [82]. CopA is exported to the periplasm by the twin-arginine translocation pathway [81], where it may interact with an outer-membrane protein CopB, providing the minimum system required for low level copper resistance. CopD is a membrane protein involved in transfer of Cu 1+ from the periplasm to the cytoplasm for CopA binding [80,81].

Location of HMR determinants
The detected STM 6070 HMR determinants in the 12 clusters (A to L, Figure 4, Table S4) were assigned to putative replicons of the STM 6070 genome, following alignment of contigs to the finished LMG 19424 T genome. Two clusters (D and H) could be assigned to chromosome 1 (CHR1), nine clusters (A, B, C, E, F, G, I, J and L) to CHR2 (chromid), and one cluster (K) to the pSym (Figure 6). As the STM 6070 CHR1 carries only three P-type ATPases, STM 6070 appears to carry the great majority of its HMR clusters on CHR2. In contrast, the C. metallidurans CH34 T CHR2 (chromid) harbours 8 out of 24 HMR clusters [35,52,64]. The genome synteny comparison (Table S4)  Cluster I, located on the chromid, is the largest of these clusters (approximately 25 kb), is flanked by genes encoding transposases of the Tn3 and IS66 type, and carries four different HMR determinants, including czcC2B2A2 and hmxB zneAC. Cluster K is also flanked by two Tn3 transposases; however, unlike Cluster I, it shows high conservation of architecture and gene identity with the closely related C. taiwanensis strains (LMG 19424 T and STM 6018). This may indicate that Cluster I contains HME determinants that are important for survival in the New Caledonian ultramafic soils. In C. metallidurans, the acquisition of mobile genetic elements that contain metal resistance genes appears to be an important strategy for its adaptation to environments that contain elevated levels of heavy metals [63,85].
In contrast, no transposases or insertion sequences could be found around cluster B or, more particularly, around the operon nieIC cep nieBA. This operon, which is absent from LMG 19424 T and STM 6018 genomes, is located in a large highly conserved region, suggesting a gene loss from C. taiwanensis genomes. Interestingly, nieIC cep nieBA (cluster B) and hmxB zneAC (cluster I), two unique RND-HME systems in terms of operon structure and protein sequences, showed significant structure and protein sequence similarity with two operons from the genome of M. massiliensis [65].
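The replicon assignment described in this section can be tallied with a minimal sketch such as the one below. The mapping simply restates the cluster-to-replicon assignments given in the text; the variable and function names are illustrative and are not part of the original analysis. The CH34 T figure (8 of 24 clusters on the chromid) is the one quoted above.

```python
from collections import Counter

# Cluster-to-replicon assignments as stated in the text (clusters A to L).
CLUSTER_REPLICON = {
    "A": "CHR2", "B": "CHR2", "C": "CHR2", "D": "CHR1",
    "E": "CHR2", "F": "CHR2", "G": "CHR2", "H": "CHR1",
    "I": "CHR2", "J": "CHR2", "K": "pSym", "L": "CHR2",
}


def tally_by_replicon(assignment):
    """Count how many HMR clusters were assigned to each replicon."""
    return Counter(assignment.values())


counts = tally_by_replicon(CLUSTER_REPLICON)

# Fraction of HMR clusters on the chromid (CHR2) in STM 6070,
# compared with the 8 of 24 clusters quoted for C. metallidurans CH34 T.
stm6070_chr2_fraction = counts["CHR2"] / len(CLUSTER_REPLICON)
ch34_chr2_fraction = 8 / 24

print(dict(counts))
print(f"STM 6070 chromid fraction: {stm6070_chr2_fraction:.2f}")
print(f"CH34 chromid fraction: {ch34_chr2_fraction:.2f}")
```

Keeping the assignment as a plain mapping makes it straightforward to revise if a cluster is later reassigned once a finished genome becomes available.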

Conclusions
New Caledonian Cupriavidus microsymbionts isolated from Mimosa pudica nodules belong to one of five REP-PCR genotypes, which all harbour identical symbiotic nodA and nifH genes [13] but display different metal tolerance phenotypes. Fifteen strains belonging to the REP-PCR genotype III were found to be the most nickel-tolerant. The current study presents an analysis of the genome of strain STM 6070, a representative of the REP-PCR genotype III. STM 6070 was originally placed within C. taiwanensis on the basis of 16S and recA phylogenies [13]; however, our analysis, combined with the genetic and phenotypic data described by Klonowska and colleagues [13], has revealed that STM 6070 represents a new species of Cupriavidus, for which we propose the name Cupriavidus neocaledonicus sp. nov.
The major aim of this study was to gain insights into the molecular basis of the tolerance of C. neocaledonicus to high levels of nickel and zinc. The genome of C. neocaledonicus STM 6070 contains a very large number of diverse putative HMR determinants belonging to the RND, MFS, CHR, ARC3, CDF and P-ATPase protein superfamilies (Table 4). These constitute putative efflux systems or ion pumps involved in arsenic (2 ars operons), chromium (2 chr operons), cobalt-zinc-cadmium (2 czc operons), copper and/or silver (copA, copP, and silA genes), and nickel (1 nre operon and 1 nie operon) tolerance. The HMR determinants are clustered in 12 loci (cluster A to cluster L), of which two clusters seem to be localised on CHR1, nine on CHR2 (chromid) and one on the pSym.
Among these clusters, six (A, C, E, F, G and H) are common to both symbiotic and non-symbiotic genomes, with the different levels of sequence similarity suggesting their presence in a bacterial ancestor and possible evolution under different evolutionary pressures. Conversely, cluster K, on the pSym, was present only in STM 6070 and the C. taiwanensis strains. The 100 % identity of cluster K encoded proteins among the STM 6070, LMG 19424 T and STM 6018 genomes could be explained by the "recent" transfer of pSym between the M. pudica microsymbionts, in accordance with the findings of Parker [46].
Four of the HMR clusters (B, D, I, and J) are specific to the STM 6070 genome and we propose that these clusters contain genes that are determinants for the adaptation of C. neocaledonicus to high concentrations of nickel and zinc in Koniambo soil in New Caledonia. Indeed, within clusters B and I, the identified nie, czc2 and zne operons (encoding RND-HME5, -HME1 and -HME3a efflux systems, respectively) constitute good candidates for nickel and zinc tolerance molecular determinants.
Moreover, the finding that at least four HMR clusters (D, I, J and K) are directly associated with insertion elements suggests that mobile genetic elements play an important role in adaptation of the STM 6070 genome to the New Caledonian environment. Insertion elements have previously been found to play a role in enabling the host to adapt to new environmental challenges, and to contribute to the genetic adaptation of C. metallidurans to toxic zinc concentrations [85,86]. Future work involving a targeted mutagenesis study should allow us to determine the precise role of the newly identified HMR operons in STM 6070 and will provide an understanding of the specific molecular determinants required for the evolution and adaptation of these bacterial symbionts to the heavy metal-rich New Caledonian soils.

Methods
Bacterial strains and growth conditions
C. neocaledonicus STM 6070 was isolated using M. pudica as a trap-host, as previously described, from a soil characterized by high total nickel concentrations (1.56 g kg -1) collected at the bottom of the Koniambo Massif, where active nickel mines are located [13]. Bacterial isolates were sub-cultured on yeast mannitol agar plates (YMA, Vincent, 1970) and incubated at 28°C for 48 h. For long-term maintenance, bacterial strains were grown in YM broth and preserved in 20 % glycerol at -80°C. For the comparison of metal tolerance, bacteria were grown in 30 mL liquid 284 Tris-culture medium [19] amended with NiSO 4 (0, 3, 5, 10 and 15 mM), Cu(NO 3 ) 2 (0, 0.3, 0.6 and 1.0 mM) and ZnSO 4 (0, 3, 5, measuring the OD 600nm in a spectrophotometer.

Genomic DNA preparation
C. neocaledonicus STM 6070 was streaked onto TY solid medium [87] and grown at 28°C for three days to obtain well grown, well separated colonies, then a single colony was selected and used to inoculate 5 ml TY broth medium. The culture was grown for 48 h on a gyratory shaker (200 rpm) at 28°C. Subsequently, 1 ml was used to inoculate 60 ml TY broth medium that was incubated on a gyratory shaker (200 rpm) at 28°C until an OD 600nm of 0.6 was reached. DNA was isolated from 60 ml of cells using a CTAB bacterial genomic DNA isolation method [87]. Final concentration of the DNA was 0.5 mg ml -1.

Genome sequencing and assembly
The genome of C. neocaledonicus STM 6070 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [88]. An Illumina standard shotgun library was constructed and sequenced using

Genome annotation
For the general genome content description, genes were identified using Prodigal [91] as part of the DOE-JGI annotation pipeline [92,93]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAScanSE tool [94] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [95]. Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform (http://img.jgi.doe.gov/er) [93]. The expert annotation of HMR genes was performed within the MaGe platform (https://www.genoscope.cns.fr/agc/microscope/mage) and therefore the gene numbers (CT6070v1_XXXXXX-XX) are those from the MaGe platform. The corresponding locus tags of genes annotated in the MaGe and JGI platforms are indicated in Table S4.

Phylogenetic analyses
Gene fragment sequences were corrected with Chromas Pro v1.33 software (Technelysium) and aligned using either ClustalX [96] or MUSCLE as implemented in MEGA, version 6 [97]. Alignments were manually edited using GeneDoc software [98]. Phylogenetic analyses were performed in MEGA6 [97] using the Neighbor-Joining method [99]. Bootstrap analysis [100] with 1000 replicates was performed to assess the support of the clusters.
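The bootstrap step can be illustrated with a minimal sketch of column resampling, shown below. This is not the MEGA6 implementation: the p-distance (proportion of differing sites) is an assumed distance measure chosen for simplicity, the toy alignment and all function names are hypothetical, and tree building itself is left out. Each replicate distance matrix would be passed to a Neighbor-Joining routine, and the support for a cluster would be the fraction of replicate trees that recover it.

```python
import random
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are ignored.
    The p-distance is an assumption made for this sketch.
    """
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped sites shared by the two sequences")
    return sum(a != b for a, b in compared) / len(compared)


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build one pseudo-replicate."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_distance_matrices(alignment, replicates=1000, seed=1):
    """Return one pairwise distance matrix per bootstrap replicate."""
    rng = random.Random(seed)
    matrices = []
    for _ in range(replicates):
        pseudo = resample_columns(alignment, rng)
        matrices.append({
            (a, b): p_distance(pseudo[a], pseudo[b])
            for a, b in combinations(sorted(pseudo), 2)
        })
    return matrices


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "strain_1": "ATGCCGTTAGCA",
    "strain_2": "ATGCCGTTGGCA",
    "strain_3": "ATGACGTAAGCT",
}
matrices = bootstrap_distance_matrices(toy_alignment)
print(len(matrices), "replicate distance matrices")
```

Fixing the random seed keeps the replicates reproducible, which is convenient when comparing support values between runs.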

Genome analyses
The comparison of specific and common genes of symbiotic Cupriavidus species, presented in a Venn diagram (Figure 2), was performed using the "Gene Phyloprofile" tool in the Microscope MaGe platform (https://www.genoscope.cns.fr/agc/microscope/mage). The orthologous counterparts in the genomes were detected by applying parameters of a minimum of 30% protein sequence identity over a minimum of 80% of the protein length (>30% protein MinLrap 0.8). The homologous genes were then removed from the resulting list. Transport systems were identified using the TransAAP tool [49] on the TransportDB website (http://www.membranetransport.org/) for prediction of efflux systems and transporter families.
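The identity and coverage thresholds above can be expressed as a simple filter, sketched below. This is an illustration only and not the MaGe "Gene Phyloprofile" code: the `ProteinHit` record, the function names and the example hits are hypothetical, and coverage is assumed here to be the aligned length divided by the longer of the two proteins, which may differ from the platform's exact definition of MinLrap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    """One pairwise protein alignment (hypothetical record layout)."""
    query: str
    subject: str
    identity_pct: float   # percent identity over the aligned region
    aligned_len: int      # number of aligned residues
    query_len: int        # full length of the query protein
    subject_len: int      # full length of the subject protein


def coverage(hit):
    """Aligned fraction relative to the longer protein (an assumption)."""
    return hit.aligned_len / max(hit.query_len, hit.subject_len)


def is_putative_ortholog(hit, min_identity=30.0, min_coverage=0.8):
    """Apply the 30 % identity and 80 % length thresholds from the text."""
    return hit.identity_pct >= min_identity and coverage(hit) >= min_coverage


# Hypothetical example hits, for illustration only.
hits = [
    ProteinHit("gene_a", "ref_1", 45.0, 90, 100, 100),  # passes both thresholds
    ProteinHit("gene_b", "ref_2", 25.0, 95, 100, 100),  # identity too low
    ProteinHit("gene_c", "ref_3", 60.0, 50, 100, 100),  # coverage too low
]
kept = [h.query for h in hits if is_putative_ortholog(h)]
print(kept)
```

Normalising by the longer protein is the more conservative choice, since it prevents a short protein that matches only one domain of a long one from being counted as an ortholog.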
Two methods were used for the comparison of average nucleotide identities (ANI): ANIg [42] and ANIb [43]. In order to perform the alignments using progressiveMauve software [51], the scaffolds of each draft genome (STM 6070, STM 6018, UYPR2.512 and AMP6) were first reordered using Mauve software on the basis of the C. taiwanensis LMG 19424 T concatenated genome. The reordered genomes were then used to perform the alignment with progressiveMauve. Circular views by BlastAtlas were performed using the CGview server hosted at Stothard Research Group

Competing interests
The authors declare that they have no competing interests.

ANIb values were calculated with JSpecies (based on whole genome BLAST alignment) [42]. ANIg values (in brackets) were calculated using the ANI tool in IMG [43].

a Subclass of transporters as defined in [51,52]; see also the text. b Cmet, C. metallidurans; Cnec, C. necator; Cneo, C. neocaledonicus; Cpin, C. pinatubonensis; and Ctai, C. taiwanensis. Data includes information from Janssen et al. [35]. c Genes identified in this study as encoding proteins putatively involved in metal tolerance.

Figure 1 Bacterial growth in 284 Tris-medium [21], in absence ( ) and in presence of NiSO4 ( : 5 mM,

The HME class of the protein is designated according to the current classification scheme [52,59,60]. HME1 to 5 represent five classes of HME systems; HAE represents here an RND protein group involved in export of hydrophobic and amphiphilic compounds. The evolutionary history was inferred by the Neighbor-Joining method with a bootstrap consensus tree inferred from 500 replicates. The evolutionary distances were computed
