Complete genome sequence and comparative analysis of Acetobacter pasteurianus 386B, a strain well-adapted to the cocoa bean fermentation ecosystem

Background: Acetobacter pasteurianus 386B, an acetic acid bacterium originating from a spontaneous cocoa bean heap fermentation, proved to be an ideal functional starter culture for cocoa bean fermentations. It is able to dominate the fermentation process, thereby resisting high acetic acid concentrations and temperatures. However, the molecular mechanisms underlying its metabolic capabilities and niche adaptations are unknown. In this study, whole-genome sequencing and comparative genome analysis were used to investigate this strain’s mechanisms to dominate the cocoa bean fermentation process.
Results: The genome sequence of A. pasteurianus 386B is composed of a 2.8-Mb chromosome and seven plasmids. The annotation of 2875 protein-coding sequences revealed important characteristics, including several metabolic pathways, the occurrence of strain-specific genes such as an endopolygalacturonase, and the presence of mechanisms involved in tolerance towards various stress conditions. Furthermore, the low number of transposases in the genome and the absence of complete phage genomes indicate that this strain might be more genetically stable compared with other A. pasteurianus strains, which is an important advantage for the use of this strain as a functional starter culture. Comparative genome analysis with other members of the Acetobacteraceae confirmed the functional properties of A. pasteurianus 386B, such as its thermotolerant nature and unique genetic composition.
Conclusions: Genome analysis of A. pasteurianus 386B provided detailed insights into the underlying mechanisms of its metabolic features, niche adaptations, and tolerance towards stress conditions. Combination of these data with previous experimental knowledge enabled an integrated, global overview of the functional characteristics of this strain. This knowledge will enable improved fermentation strategies and selection of appropriate acetic acid bacteria strains as functional starter culture for cocoa bean fermentation processes.


Background
Acetic acid bacteria (AAB) are a group of microorganisms that belong to the family of the Acetobacteraceae of the Alpha-proteobacteria [1]. AAB can be found on (tropical) fruits and flowers [2][3][4], in fermented foods [1,3], and as members of the Drosophila gut [5]. Overall, AAB are of industrial interest because of their physiology, which is exploited for the production of acetic acid from ethanol during vinegar, kombucha, or cocoa bean fermentation [6][7][8] as well as for the production of fine chemicals such as ascorbic acid and cellulose [9,10]. Furthermore, AAB can occur as spoilage bacteria, as can be the case in beer, wine, and cider fermentations [1,3]. One of the key metabolic features of AAB is the conversion of ethanol into acetic acid by two sequential reactions catalyzed by membrane-bound alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH) enzymes [11].
Acetobacter pasteurianus strains are used for vinegar fermentations worldwide [28][29][30] and also occur in beer as spoilers [3]. Further, it has been shown that this species plays an essential role in the fermentation of cocoa pulp-bean mass, the first step in chocolate production [31][32][33]. Spontaneous cocoa bean fermentation is characterized by a succession of microbial activities carried out by yeasts (in particular Hanseniaspora opuntiae/uvarum and Saccharomyces cerevisiae), involved in depectinization and ethanol formation; lactic acid bacteria (LAB, in particular Lactobacillus fermentum), involved in citric acid and fructose conversion and lactic acid production; and AAB (in particular A. pasteurianus), involved in the oxidation of ethanol produced by the yeasts into acetic acid and further overoxidation of acetic acid and lactic acid produced by LAB into carbon dioxide and water [6,34].
Acetobacter pasteurianus 386B originates from a spontaneous cocoa bean heap fermentation carried out in Ghana and has been characterized as an ethanol-oxidizing, lactic acid-oxidizing, and acetic acid-producing strain [18,35]. Furthermore, A. pasteurianus 386B is a thermotolerant strain with high resistance to ethanol and acetic acid [36,37]. These functional properties make it an ideal starter culture strain for cocoa bean fermentations [38]. In this study, we present the complete genome sequence and analysis of A. pasteurianus 386B to obtain insights into the genomic features of this interesting starter culture strain [37,38]. A better understanding of the molecular mechanisms underlying its metabolic capabilities will lead to detailed insights into the mechanisms of niche adaptation of this strain. Furthermore, comparison of A. pasteurianus 386B with other sequenced members of the Acetobacteraceae will address the unique functional properties of this strain as well as the common characteristics of the Acetobacteraceae.

Pyrosequencing and sequence annotation
The 454-pyrosequencing run of the A. pasteurianus 386B genomic DNA yielded 217,169 reads with a total number of 72,387,894 bp that were assembled into 10 scaffolds, consisting of 118 large (> 500 nucleotides) contigs and 25 small (100-500 nucleotides) contigs. Computational analysis of the sequence assembly indicated the presence of seven plasmids. The gaps in the chromosome and plasmids were closed by PCR assays followed by sequencing of the corresponding amplicons, resulting in a final assembly of the circular chromosome with a size of 2,818,679 bp and the seven plasmids ranging in size from 3,851 bp to 194,780 bp (Table 1; Figure 1A). Gene finding and annotation of the A. pasteurianus 386B genome with the GenDB software resulted in 2,595 and 280 protein-coding sequences (CDS) for the chromosome and plasmids, respectively. Furthermore, five ribosomal RNA (rrn) operons were detected and 57 tRNA genes were predicted. Clustered regularly interspaced short palindromic repeats (CRISPRs) were not found. Relevant features deduced from the genome sequence of A. pasteurianus 386B are summarized in Table 2.
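The read statistics above allow a quick sanity check of read length and sequencing depth. The following is a minimal sketch, not part of the original analysis; it uses only the figures quoted in the text and assumes, for simplicity, that all bases map to the chromosome. Because the plasmids are ignored, the resulting depth is an upper bound on the genome-wide average.

```python
# Figures quoted in the text above.
n_reads = 217_169
total_bases = 72_387_894
chromosome_bp = 2_818_679

# Average length of a single pyrosequencing read.
mean_read_length = total_bases / n_reads

# Assumption: plasmid sequence is ignored, so this overestimates
# the average depth across chromosome plus plasmids.
chromosome_only_coverage = total_bases / chromosome_bp

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"coverage (chromosome only): {chromosome_only_coverage:.1f}x")
```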

General architecture of the A. pasteurianus 386B genome
A plot of the calculated G/C skew [(G-C)/(G + C)] indicated a bidirectional replication mechanism of the chromosome (Figure 1A), which was confirmed by the biased distribution of architecture imparting sequences (AIMS) on the leading and lagging strands [39], dividing the chromosome of A. pasteurianus 386B into two replichores of similar sizes (Figure 1B). This enabled the prediction of the origin of chromosomal replication (oriC), located near the dnaA-coding region (APA386B_66), as well as a replication termination (dif) region at position 1,590,515 on the chromosomal map [40]. The sequence of the 32-bp dif region was aligned with the consensus sequence of Gamma-proteobacterial dif sites [41]. The dif region, positioned opposite oriC, is involved in replication termination and, together with oriC, defines the leading and lagging strands during replication [39]. The occurrence of the G/C skew, replichores, and biased distribution of AIMS supports the accuracy of the sequence assembly, as they represent a common general architecture of a genome.
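The G/C skew formula (G-C)/(G + C) can be illustrated with a short sketch. This is not the tool used in the study; the window size, function names, and the toy sequence are assumptions for illustration. In many bacterial chromosomes the cumulative skew reaches its minimum near the origin and its maximum near the terminus, which is the pattern the toy sequence mimics.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Return (G-C)/(G+C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C get a skew of zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_extrema(skews):
    """Return the window indices of the cumulative-skew minimum and maximum."""
    cum = list(accumulate(skews))
    return cum.index(min(cum)), cum.index(max(cum))


# Hypothetical toy sequence: C-rich, then G-rich, then C-rich again.
toy = "GCCA" * 250 + "GGCA" * 500 + "GCCA" * 250
skews = gc_skew(toy)
min_window, max_window = cumulative_extrema(skews)
print(skews, min_window, max_window)
```

On a real chromosome the sequence would be read from a FASTA file and the two extrema compared with the predicted oriC and dif positions.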

Phylogenetic analysis and comparative genomics
Synteny analysis revealed a highly conserved order of orthologous genes between the genome sequences of A. pasteurianus 386B and A. pasteurianus IFO 3283 ( Figure 2A). The chromosomal synteny was interrupted due to a few transposases/integrases and the presence of genes related to a prophage ( Figure 2A). This prophage genomic segment had a size of about 28.8 kb and comprised 61 genes (APA386B_370 -APA386B_430). Thirteen, six, and six of these genes had homologues in the genomes of A. pasteurianus IFO 3283, Gluconacetobacter diazotrophicus Pal5, and Gluconobacter oxydans 621H, respectively (Additional file 1). The prophage-related region did not contain virulence-associated genes or genes coding for known proteins. However, the region contained an integrase (APA386B_430), a phage terminase (APA386B_403), and a portal protein (APA386B_401), amongst other prophage-related proteins. No head maturation protease, coat protein, or tail measure protein were retrieved, indicating that the prophage is defective [42]. Surprisingly, the genome sequence of A. pasteurianus 386B contained only 50 transposases (including plasmids; Additional file 1), none of which was of the IS1380-type, an insertion sequence abundant in multiple A. pasteurianus strains [20,43], indicating that the 386B strain is genetically more stable than currently    No only 44 and 16 CDS with APA386B_1P, the largest plasmid of A. pasteurianus 386B, respectively ( Figure 3). Furthermore, APA386B_1P contained 165 unique genes, not shared with any of the two largest plasmids of A. pasteurianus IFO 3283.
Comparative analysis of the five available genome sequences of strains of the species A. pasteurianus revealed that there were 2,019 shared orthologous proteins, representing 68% of the predicted proteins from A. pasteurianus 386B. This may correspond to the core genome of the species Acetobacter pasteurianus. Furthermore, A. pasteurianus 386B contained 122 strain-specific genes, which may be related to niche adaptations; 95 of these had no known function. The 27 unique genes with an assigned function (Additional file 3) may contribute to the performance of this strain as a starter culture in the cocoa bean fermentation process. For instance, the presence of an endopolygalacturonase in the genome sequence of A. pasteurianus 386B (APA386B_1663; Additional file 3) indicates a possible role of this strain in pectin breakdown, an important metabolic process in the beginning of cocoa bean fermentations [44,45]. This is the first report of an endopolygalacturonase gene in an A. pasteurianus strain. Indeed, the closest relative possessing such a gene is A. tropicalis NBRC 101654 [2]; A. tropicalis has been isolated from spontaneous cocoa bean fermentation processes as well [32]. Furthermore, a PCR assay indicated that this polygalacturonase gene was not widespread amongst A. pasteurianus strains isolated from spontaneous cocoa bean fermentations (Additional file 4). This suggests that expression of this gene might contribute to the capability of A. pasteurianus 386B to dominate cocoa bean fermentations.
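The notions of a core genome and of strain-specific genes can be expressed as set operations on ortholog clusters. The sketch below is an illustration only and does not reproduce how the comparative software assigns orthologs; the cluster identifiers and their distribution over strains are hypothetical.

```python
# Hypothetical ortholog-cluster identifiers per strain (illustration only).
strain_clusters = {
    "386B": {"c1", "c2", "c3", "c4", "c7"},
    "IFO_3283": {"c1", "c2", "c3", "c5"},
    "NBRC_101655": {"c1", "c2", "c4", "c6"},
}


def core_genome(clusters_by_strain):
    """Clusters present in every strain."""
    return set.intersection(*clusters_by_strain.values())


def strain_specific(clusters_by_strain, strain):
    """Clusters found in one strain and in none of the others."""
    others = set().union(
        *(s for name, s in clusters_by_strain.items() if name != strain)
    )
    return clusters_by_strain[strain] - others


core = core_genome(strain_clusters)
unique = strain_specific(strain_clusters, "386B")
# Share of the reference strain's clusters that belong to the core.
core_fraction = len(core) / len(strain_clusters["386B"])
print(sorted(core), sorted(unique), core_fraction)
```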
Phylogenetic analysis of the available genomes of the genus Acetobacter showed that the different Acetobacter pasteurianus strains were clustered together ( Figure 2B). Furthermore, A. pasteurianus 386B was most related to A. pasteurianus NBRC 101655, a thermotolerant strain [46,47]. Comparative analysis of A. pasteurianus 386B in relation to other members of the Acetobacteraceae family revealed that this strain had approximately 20% shared genes with members of the genus Acidiphilium (Table 3). Furthermore, 27.1% of the genes are in common with the human pathogen Granulibacter bethesdensis CGDNIH1, whereas members of the genera Gluconobacter and Gluconacetobacter were more closely related to A. pasteurianus 386B, namely 35.6 and 34.5-39.1% shared genes, respectively. The finding that A. pasteurianus 386B had more genes in common with Gluconacetobacter species than with G. oxydans 621H was not in accordance with the phylogenetic relationship based on their complete 16S rRNA gene sequence [48]. This indicates that Gluconacetobacter species are more closely related to Acetobacter species than Gluconobacter species are.

Intracellular metabolism of sugars and sugar derivatives
Acetobacter pasteurianus 386B possessed all genes encoding the enzymes of the Embden-Meyerhof-Parnas (EMP) pathway, except for the phosphofructokinase-coding gene, indicating incomplete glycolysis. The absence of this gene has been reported before in A. pasteurianus IFO 3283, G. diazotrophicus Pal5, and G. oxydans 621H [20,25,26]. However, all genes encoding the enzymes of the pentose-phosphate pathway (PPP) were found (Figure 4), suggesting that glucose is degraded via the PPP, as described previously for A. pasteurianus IFO 3283 and A. aceti NBRC 14818 [20,23]. Glucose, an important substrate in the cocoa bean fermentation process, could be taken up by sugar permeases (APA386B_1532 and APA386B_2419) or a sugar symporter (APA386B_1333). Fructose-6-phosphate could be formed from N-acetyl-glucosamine-6-phosphate as well as from mannitol via fructose by a polyol oxidoreductase (APA386B_2545; Figure 4, reaction 52). Fructose-6-phosphate could be further metabolized by the EMP pathway (Figure 4). Furthermore, the gene coding for glycerol kinase (glpK; APA386B_92; Figure 4, reaction 24) was found, allowing the formation of dihydroxyacetone (DHA)-phosphate from glycerol via glycerol 3-phosphate, which could be further metabolized by the EMP pathway. DHA-phosphate could be formed by both FAD- and NAD-dependent glycerol 3-phosphate dehydrogenases (APA386B_1931 and APA386B_94; Figure 4, reaction 25). Next to this, the A. pasteurianus 386B genome contained a gene coding for a glycerol uptake facilitator protein (APA386B_93), suggesting that this strain is able to take up glycerol from the environment and use it as an energy source; glycerol might be present as a substrate during the cocoa bean fermentation process owing to yeast metabolism. DHA-phosphate may be channeled into the lower part of the EMP pathway. The A. pasteurianus 386B genome sequence provided further evidence that acetate is formed out of ethanol by soluble, NAD(P)+-dependent ADH (adh; APA386B_1507; Figure 4, reaction 34) and ALDH (APA386B_909; Figure 4, reaction 35) intracellularly, or out of lactate by means of a lactate dehydrogenase (ldh; APA386B_910; Figure 4, reaction 29) and pyruvate decarboxylase (pdc; APA386B_1186; Figure 4, reaction 33), as proposed before ([49]; F. Moens, T. Lefeber, and L. De Vuyst, unpublished observations). Acetate, formed intracellularly or available extracellularly, could be further (over)oxidized via acetyl-CoA into carbon dioxide and water by a modified tricarboxylic acid (TCA) cycle, as explained below. Genes encoding enzymes of the glyoxylate pathway were not present. During the course of all aforementioned reactions, NAD(P)H + H+ is produced by several dehydrogenases. Indeed, next to the annotated dehydrogenases, the A. pasteurianus 386B genome revealed several putative dehydrogenases/oxidoreductases with a currently unknown function (Additional file 5). These uncharacterized oxidoreductases included four aldehyde dehydrogenases, 15 short-chain dehydrogenases/reductases (involved in oxidation of alcohols to aldehydes), a zinc-binding dehydrogenase, and an oxidoreductase containing a NAD+-binding Rossmann-fold domain. Intracellular dehydrogenase activity is indispensable for the intermediary metabolism of AAB [25].
Genome analysis showed that A. pasteurianus 386B possessed genes encoding metabolic pathways involved in the de novo synthesis of all nucleotides, amino acids, phospholipids and many vitamins, such as biotin, folic acid, pantothenate, pyridoxine, riboflavin, and thiamine. Ammonia, involved in the activity of glutamate synthase (APA386B_893 - APA386B_894) and glutamine synthetase (APA386B_2129), could be taken up by a specific ammonia transporter (APA386B_239). Furthermore, the genome of A. pasteurianus 386B contained genes to synthesize and use trehalose, which can protect the cell from high osmolarity and/or can be used as an energy source in bacteria and yeast [50]. The pathway consisted of trehalose-6-phosphate synthase (otsA; APA386B_1724), trehalose-6-phosphate phosphatase (otsB; APA386B_1723), and trehalase (treA; APA386B_104). In addition, genes coding for the mechanosensitive channels MscL (APA386B_2572) and MscS (APA386B_1440) were present in the genome sequence of A. pasteurianus 386B, which generally play an important role in osmotolerance [51].

Membrane-bound dehydrogenases and respiratory chain
Acetobacter pasteurianus 386B possessed several membrane-bound dehydrogenases that channel electrons into the respiratory chain (Figure 5A). A first group of dehydrogenases depends on the cofactor pyrroloquinoline quinone (PQQ). It includes PQQ-dependent alcohol dehydrogenase (PQQ-ADH) and PQQ-dependent glucose dehydrogenase, allowing the conversion of ethanol into acetaldehyde and glucose into gluconate, respectively, both substrates being available during cocoa bean fermentations. The genes coding for the three subunits of PQQ-ADH were not clustered together in the genome, as the gene encoding the smallest subunit (adhS; APA386B_2212) was separated from the other two genes (adhAB; APA386B_1574 - APA386B_1575). This gene organization in A. pasteurianus has been suggested before [52]. Further, the genome of A. pasteurianus 386B contained two uncharacterized, membrane-bound, PQQ-dependent oxidoreductases with five transmembrane helices (Additional file 5). Proteins for the synthesis of the cofactor PQQ are encoded by the pqqABCDE operon (APA386B_983 - APA386B_987). In contrast to other AAB, the genome of A. pasteurianus 386B did not possess the major polyol dehydrogenase (SldAB), an enzyme of industrial importance [53]. Indeed, SldAB is for instance able to oxidize D-sorbitol into L-sorbose (as part of the production of vitamin C that is used as food supplement and antioxidant), gluconate into 5-ketogluconate (as part of the production of tartaric acid that is used as antioxidant in the food industry), and glycerol into dihydroxyacetone (used in self-tanning creams) [8,25].
Table 3. Number of shared protein-coding sequences between A. pasteurianus 386B and members of the Acetobacteraceae family for which a complete genome sequence is available. The percentage of shared genes depicts the number of shared genes between two members in relation to the total number of genes of both members.
A second group of membrane-bound dehydrogenases contains flavins as cofactor. The genes coding for flavin adenine dinucleotide (FAD)-dependent sorbitol dehydrogenase (APA386B_1096 - APA386B_1098) were present in the A. pasteurianus 386B genome, which points to the ability of this strain to produce fructose from sorbitol. However, it has been shown previously that this dehydrogenase is also responsible for the conversion of mannitol, an important intermediate of the cocoa bean fermentation process, into fructose [54]. As the major polyol dehydrogenase was not present in this strain, it is likely that the FAD-dependent sorbitol dehydrogenase was responsible for the oxidation of mannitol into fructose, as experimentally shown in A. pasteurianus 386B (F. Moens, T. Lefeber, and L. De Vuyst, unpublished observations). In addition, A. pasteurianus 386B possessed six membrane-bound oxidoreductases with unknown function (Figure 5B; Additional file 5). These oxidoreductases are also present in the genomes of A. pasteurianus IFO 3283, Ga. diazotrophicus Pal5, and G. oxydans 621H (Additional file 6), and could be involved in the oxidation of a broad range of substrates, such as carbohydrates and polyols [25]. Genome analysis revealed that ubiquinol, generated by the aforementioned membrane-bound dehydrogenases, could be reoxidized by a cytochrome bo3-type ubiquinol oxidase (APA386B_1578 - APA386B_1581) and a cytochrome bd-type ubiquinol oxidase (cyanide-insensitive terminal oxidase), whereby the encoding genes of the latter were present twice in the genome sequence of A. pasteurianus 386B (cydAB; APA386B_472 - APA386B_473 and APA386B_1010 - APA386B_1011). Both terminal oxidases reduce oxygen to water when reoxidizing ubiquinol into ubiquinone (Figure 5B).

In silico analysis of mechanisms involved in acid tolerance
A first strategy of A. pasteurianus 386B to tolerate high levels of acetic acid may consist of a cytosolic acetate-assimilating detoxification pathway. This involves a conversion of acetate to acetyl-CoA, which is performed either by acetyl-CoA synthetase (acn; APA386B_2214 and APA386B_1843; Figure 4, reaction 36) or by acetate kinase (ackA; APA386B_335; Figure 4, reaction 37) and phosphate acetyltransferase (pta; APA386B_336; Figure 4, reaction 38). Both pathways were present in the A. pasteurianus 386B genome and are known to be upregulated when citrate oxidation takes place [58]. This suggests that the presence of two copies of the acn gene in this strain provides an advantage for efficient acetate assimilation. Alternatively, acetate can be converted into acetyl-CoA via a modified TCA cycle [59]. Indeed, all genes encoding the enzymes of the TCA cycle were retrieved, except for succinyl-CoA synthetase. This function is bypassed by succinyl-CoA: acetate CoA transferase (SCACT, aarC; APA386B_2589; Figure 4, reaction 50). Similarly, the gene for malate dehydrogenase was not found, but oxidation of malate into oxaloacetate can be catalyzed by malate:quinone oxidoreductase (mqo; APA386B_2675; Figure 4, reaction 48) [59]. A second mechanism in acid tolerance probably involves the presence of an acetic acid resistance ABC transporter (aatA; APA386B_103), an efflux pump in the cytoplasmic membrane capable of exporting acetic acid [60]. Thirdly, A. pasteurianus 386B contained the gene cluster involved in pellicle polysaccharide formation (polABCDE; APA386B_1394 -APA386B_1398), preventing the diffusion of acetic acid into the cytoplasm [46,61,62]. 
Fourthly, the genes coding for urease (ureDABCEFG; APA386B_1179 - APA386B_1184), a urea transporter (urtABCDE; APA386B_1640 - APA386B_1644), an allophanate hydrolase (APA386B_936 - APA386B_937), and a urea carboxylase (APA386B_218) were present, indicating the ability to transport urea and convert it into ammonia, which may contribute to the survival of A. pasteurianus 386B in acidic environments, such as the cocoa pulp-bean mass (pH 3.5 - 4.5). The human pathogenic Gr. bethesdensis CGDNIH1 contains this mechanism too, although it may not be widespread among AAB strains, as it is absent in G. oxydans 621H and Ga. diazotrophicus Pal5 [25,26,63]. Lastly, genome analysis of A. pasteurianus 386B revealed the presence of genes coding for cytoplasmic components that are adapted to intracellular acidification. This is the case for N5-carboxyaminoimidazole ribonucleotide (N5-CAIR) mutase (purE; APA386B_2565), a protein involved in purine biosynthesis. Indeed, N5-CAIR mutase of A. pasteurianus 386B is 99% identical to its orthologue in A. aceti 1023, the latter strain being adapted to an acid cytosol [64]. Similarly, alanine racemase (alr; APA386B_1310), a protein involved in peptidoglycan biosynthesis, is 92% identical to the A. aceti 1023 orthologue, a protein known to function at low pH [65]. In addition, the sequence similarity of both N5-CAIR mutase and alanine racemase between A. pasteurianus 386B and A. aceti 1023 was higher than between A. pasteurianus 386B and any other sequenced AAB strain (data not shown), indicating that the presence of acid-stable proteins is not widespread.

In silico analysis of mechanisms involved in thermotolerance
As described above, genome-wide phylogenetic analysis of A. pasteurianus 386B revealed that this strain is most related to the thermotolerant strain A. pasteurianus NBRC 101655. In addition, adaptive mutation resulted in 14 validated mutations, involved in improved thermotolerance of this strain [66]. Five of these regions were also modified in A. pasteurianus 386B and not in A. pasteurianus NBRC 101655 (Additional file 7). For example, one of the two genes coding for cytochrome c (APA386B_906) contained three synonymous mutations and one nonsynonymous mutation. Furthermore, the genes necessary for growth at high temperatures (42°C) of the thermotolerant strain A. tropicalis NBRC 101654 have been identified recently [4]. Although this strain belongs to a different species than A. pasteurianus 386B, all genes of A. tropicalis NBRC 101654 necessary for growth at high temperatures were found in the genome sequence of A. pasteurianus 386B as well (Additional file 7). This indicates that the latter strain, when thriving in the high-temperature cocoa pulp-bean mass, may use the same mechanisms towards heat stress as A. tropicalis NBRC 101654.
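The distinction between synonymous and nonsynonymous mutations made above can be sketched in a few lines. The example assumes the standard codon assignments (which bacterial translation table 11 shares for amino acids) and uses hypothetical codon pairs; it does not reproduce the actual cytochrome c variants.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change by whether the encoded amino acid changes."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


# Hypothetical codon pairs, chosen only to show both outcomes.
silent = classify_substitution("GCT", "GCC")    # Ala -> Ala
changing = classify_substitution("GAT", "GAA")  # Asp -> Glu
print(silent, changing)
```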

Conclusions
The complete genome sequence of A. pasteurianus 386B, a strain originating from a spontaneous cocoa bean heap fermentation in Ghana, was determined, annotated, and described in this study. The global overview of all genes and pathways obtained provided comprehensive insights into the metabolic features regarding important substrates (such as ethanol, glucose, acetate, lactate, and glycerol) and stresses (such as acidic and heat stress) during the cocoa bean fermentation process.
Comparative genome analysis provided information regarding niche adaptations of this strain. For example, the presence of a gene coding for an endopolygalacturonase was discovered. This enzyme is involved in the breakdown of pectin, a compound responsible for the viscosity of the cocoa pulp-bean mass. Although depectinization is mainly important in the beginning (anaerobic yeast activity phase) of the cocoa bean fermentation process, the activity of the pectinolytic enzymes allows air to enter the cocoa pulp-bean mass, which promotes the growth of obligate aerobic AAB. Therefore, the presence of this gene could be an important prerequisite for survival and performance of AAB during cocoa bean fermentations. Furthermore, the comparative genome analysis revealed that the genome of A. pasteurianus 386B contained a low number of transposases, resulting in the absence of truncated genes, which might be important for expression under cocoa bean fermentation conditions. Genome analysis unraveled various mechanisms of A. pasteurianus 386B to tolerate stress conditions occurring in a cocoa bean fermentation ecosystem. As active prophages were also absent in the genome sequence, these findings indicate that A. pasteurianus 386B is genetically more stable compared with other fully characterized AAB, contributing to the prerequisites of a starter culture strain.
All these findings support that this strain is a suitable functional starter culture for controlled cocoa bean fermentation processes. In addition, the results presented in this study will enable analysis of the transcriptome of A. pasteurianus 386B, which will provide insight into its metabolic activity. Finally, the characteristics of A. pasteurianus 386B revealed in this study are essential to generate further insights into the functional role of AAB in general, and A. pasteurianus in particular, during the cocoa bean fermentation process, which is of great importance to select an appropriate starter culture for homogeneous, fast, and successfully controlled fermentation processes.

DNA extraction and 454 pyrosequencing
Total genomic DNA was extracted from cell pellets using the High Pure PCR Template Preparation Kit (Roche Applied Science, Mannheim, Germany), followed by RNase treatment and purification using the High Pure PCR Product Purification Kit (Roche Applied Science), always according to the manufacturer's instructions. To confirm the identity of the bacterial strain grown, a 16S rRNA gene-specific region was amplified based on the genomic DNA extracted, as described previously [67]. Amplicons were purified using the Wizard SV Gel and PCR Clean up system (Promega, Madison, WI, USA) and sequenced at a commercial facility using Sanger sequencing (VIB Genetic Service Facility, Antwerp, Belgium). The quality of the genomic DNA was assessed by gel electrophoresis; its quantity was estimated by a fluorescence-based method using the Quant-iT dsDNA Assay kit (Invitrogen, Carlsbad, CA, USA) and the DTX800 multimode detector (Beckman Coulter, Pasadena, CA, USA).
For genome sequencing, a total amount of 5 μg of genomic DNA was used for the construction of an 8-kb paired-end library with the GS FLX Titanium Library Paired-End Adaptors Kit and the GS FLX Titanium Rapid Library Prep Kit (Roche Applied Science) according to the manufacturer's instructions. The optimal DNA copy per bead ratio was determined by an emulsion PCR titration using a GS FLX Titanium SV emPCR kit (Lib-L) (Roche Applied Science). The final emulsion PCR was performed using the GS FLX Titanium LV emPCR kit (Lib-L; Roche Applied Science). Pyrosequencing was performed on a Genome Sequencer GS FLX instrument using Titanium chemistry (Roche Applied Science) with the sample occupying one region of a four-region gasket. Library preparation and pyrosequencing were performed by the VIB Nucleomics Core Facility (Leuven, Belgium). Reads were assembled using the GS De Novo Assembler version 2.5.3 with default parameters.

PCR-based gap closure
To close remaining gaps in the assembled genome sequence, PCR primers were designed based on contig ends using the Consed program [68] and synthesized in a commercial facility (Integrated DNA Technologies, Leuven, Belgium). PCR assays were performed using a DNA T3000 thermocycler (Biometra, Goettingen, Germany) in reaction mixtures containing 50 ng of genomic DNA, 100 μM of each dNTP (Sigma-Aldrich, St. Louis, MO, USA), 5 pmol of each primer, 5 μL of 10 × PCR reaction buffer (Fermentas, St. Leon-Rot, Germany), 1.875 U of Pfu DNA Polymerase (Fermentas), and sterile ultrapure water in a final volume of 50 μL. Following amplification, PCR product sizes were verified using a 1.0% (w/v) agarose gel and the remaining reaction mixture was purified using the Wizard SV Gel and PCR Clean up system (Promega). Amplicons were sequenced in a commercial facility using Sanger sequencing technology (Macrogen Europe, Amsterdam, The Netherlands). All DNA sequences obtained were uploaded into the Consed program, manually inspected, and integrated into the genome assembly to generate the complete genome sequence of A. pasteurianus 386B. To facilitate gap closure and assembly validation, contigs were mapped to the A. pasteurianus IFO 3283 genome by means of the r2cat tool [20,69].
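The amounts listed for the gap-closure PCR can be converted into final concentrations with simple arithmetic. The sketch below uses only the quantities stated above; the variable names are illustrative.

```python
# Quantities per reaction as stated in the text.
FINAL_VOLUME_UL = 50.0
primer_pmol = 5.0
buffer_stock_x = 10.0
buffer_volume_ul = 5.0
polymerase_units = 1.875
template_ng = 50.0

# 1 pmol per microliter equals 1 micromolar, so pmol/uL gives uM directly.
primer_um = primer_pmol / FINAL_VOLUME_UL

# Dilution of the 10x buffer stock into the final volume.
buffer_final_x = buffer_stock_x * buffer_volume_ul / FINAL_VOLUME_UL

polymerase_u_per_ul = polymerase_units / FINAL_VOLUME_UL
template_ng_per_ul = template_ng / FINAL_VOLUME_UL

print(primer_um, buffer_final_x, polymerase_u_per_ul, template_ng_per_ul)
```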

Genome analysis and annotation
Automatic gene prediction and annotation of the assembled genome sequence were carried out using a local installation of the bacterial genome annotation system GenDB v2.2 [70]. A combined gene prediction strategy was applied by using GLIMMER 2.1 and the CRITICA program suite [71,72]. Putative ribosomal binding sites and tRNA genes were identified with the RBSfinder tool [73] and tRNAscan-SE [74]. The deduced proteins were functionally characterized by REGANOR [75] using automated searches in public databases, including SWISS-PROT and TrEMBL [76], Pfam [77], KEGG [78], and TIGRFAM [79]. Additionally, SignalP (detection of signal peptides) [80], helix-turn-helix (identification of helix-turn-helix DNA binding motifs) [81], and TMHMM (detection of transmembrane regions) [82] were applied. Each gene was functionally classified by assigning a Cluster of Orthologous groups (COG) number [83] and a Gene Ontology (GO) number [84]. The automated gene prediction and annotation was followed by manual curation of the data. To correct for over-annotation, short CDS without functional annotation, with low confidence scores inferred by the GenDB platform, and with overlaps with other CDS were eliminated from the final annotation. A genome plot of A. pasteurianus 386B was generated with the DNAPlotter tool [85]. The origin of chromosomal replication of A. pasteurianus 386B was predicted with the Ori-Finder tool [40]. CRISPRs were searched for with the CRISPRFinder tool [86].
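The over-annotation correction described above combines three criteria: a short, functionally unannotated CDS with a low confidence score that also overlaps another CDS. A minimal sketch of such a filter follows. The length and confidence thresholds, the record layout, and the example genes are hypothetical, strand is ignored, and the code is not the GenDB procedure itself.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    locus: str
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive
    annotated: bool    # True if a function was assigned
    confidence: float  # hypothetical score between 0 and 1


MIN_LENGTH_BP = 150   # hypothetical threshold for a "short" CDS
MIN_CONFIDENCE = 0.5  # hypothetical threshold for "low confidence"


def overlaps(a, b):
    """True if the two coordinate ranges share at least one base."""
    return a.start <= b.end and b.start <= a.end


def is_over_annotated(cds, all_cds):
    """Apply all criteria jointly, as in the description above."""
    short = (cds.end - cds.start + 1) < MIN_LENGTH_BP
    low_conf = cds.confidence < MIN_CONFIDENCE
    overlapping = any(overlaps(cds, o) for o in all_cds if o is not cds)
    return short and not cds.annotated and low_conf and overlapping


def curate(all_cds):
    """Return the CDS list without over-annotated entries."""
    return [c for c in all_cds if not is_over_annotated(c, all_cds)]


# Hypothetical example: g2 meets every criterion, g3 does not overlap.
genes = [
    CDS("g1", 1, 900, True, 0.9),
    CDS("g2", 850, 940, False, 0.2),
    CDS("g3", 1000, 1090, False, 0.2),
]
kept = [c.locus for c in curate(genes)]
print(kept)
```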

Phylogenetic analysis and comparative genomics
A phylogenetic analysis was performed using complete and draft genome sequences of members of the family Acetobacteraceae. To this end, the annotated genome sequences of the finished genomes of A. pasteurianus IFO 3283 (including plasmids), Acidiphilium multivorum AIU301, Acidiphilium cryptum JF-5, Ga. diazotrophicus Pal5, Ga. medellinensis NBRC 3288 (formerly Gluconacetobacter xylinus NBRC 3288), and G. oxydans 621H were used [25][26][27]63,87]. Furthermore, the draft genome sequences (contigs and scaffolds) of A. pasteurianus subsp. pasteurianus LMG 1262T, A. pasteurianus NBRC 101655, A. pomorum DM001, A. tropicalis NBRC 101654, and A. aceti NBRC 14818 were included [2,5,20,22,23]. As no annotation of the draft genome sequence of A. pasteurianus 3P3 [21] was available, the draft genome was annotated using the GenDB platform as described above. The manually curated genome sequence of A. pasteurianus 386B, together with the plasmids identified, was incorporated as well. Comparative analysis of these genome sequences, including synteny analyses, identification and classification of orthologous genes, and phylogenetic analysis, was accomplished by the EDGAR software framework using default parameters [88]. In addition, the Artemis Comparison Tool (ACT) was applied to identify similarity between the different plasmids of A. pasteurianus 386B and A. pasteurianus IFO 3283 [89], using the BLASTN algorithm with default parameters [90].