Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi

Background Borrelia (B.) burgdorferi sensu lato, including the tick-transmitted agents of human Lyme borreliosis, have particularly complex genomes, consisting of a linear main chromosome and numerous linear and circular plasmids. The number and structure of plasmids is variable even in strains within a single genospecies. Genes on these plasmids are known to play essential roles in virulence and pathogenicity as well as host and vector associations. For this reason, it is essential to explore methods for rapid and reliable characterisation of molecular level changes on plasmids. In this study we used three strains: a low passage isolate of B. burgdorferi sensu stricto strain B31(−NRZ) and two closely related strains (PAli and PAbe) that were isolated from human patients. Sequences of these strains were compared to the previously sequenced reference strain B31 (available in GenBank) to obtain proof-of-principle information on the suitability of next generation sequencing (NGS) library construction and sequencing methods on the assembly of bacterial plasmids. We tested the effectiveness of different short read assemblers on Illumina sequences, and of long read generation methods on sequence data from Pacific Bioscience single-molecule real-time (SMRT) and nanopore (Oxford Nanopore Technologies) sequencing technology. Results Inclusion of mate pair library reads improved the assembly in some plasmids as did prior enrichment of plasmids. While cp32 plasmids remained refractory to assembly using only short reads they were effectively assembled by long read sequencing methods. The long read SMRT and nanopore sequences came, however, at the cost of indels (insertions or deletions) appearing in an unpredictable manner. Using long and short read technologies together allowed us to show that the three B. burgdorferi s.s. strains investigated here, whilst having similar plasmid structures to each other (apart from fusion of cp32 plasmids), differed significantly from the reference strain B31-GB, especially in the case of cp32 plasmids. Conclusion Short read methods are sufficient to assemble the main chromosome and many of the plasmids in B. burgdorferi. However, a combination of short and long read sequencing methods is essential for proper assembly of all plasmids including cp32 and thus, for gaining an understanding of host- or vector adaptations. An important conclusion from our work is that the evolution of Borrelia plasmids appears to be dynamic. This has important implications for the development of useful research strategies to monitor the risk of Lyme disease occurrence and how to medically manage it. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3804-5) contains supplementary material, which is available to authorized users.


Background
Comparative analyses of whole genome sequences have been shown to be vital for understanding the correlation between infecting pathogen and disease manifestation (e.g. [1][2][3]). A pre-requisite for comparative genomics is the identification of strains of differing pathogenicity and also the ability to assemble the whole of the genome, including accessory regions in genomes of pathogenic bacteria. Bacterial genomes can be divided into conserved core and less conserved accessory (or noncore) regions [4]. Importantly, accessory regions often undergo horizontal gene transfer, probably promoted by mobile genome elements (phages, transposons, plasmids) [5]. This may change pathogenicity as well as host, or vector specificity (e.g. [6]).
Next generation sequencing (NGS) technologies have allowed a big step to be taken towards the goal of whole genome analysis. Different sequencing methods are currently on the market with Illumina being notable for short sequencing read technology [7][8][9], but assembly of accessory regions of the genome may be particularly challenging from short read NGS data [10,11]. On the other hand Pacific Bioscience Systems' Single Molecule Real-Time (SMRT) DNA sequencing and Oxford Nanopore Technologies' nanopore sequencing methods generate long sequence reads (maximum length 20 kb and >100 kb, respectively) with enormous advantages for genome assembly and in particular for plasmids (e.g. [10]). Although long read methods such as nanopore or SMRT sequencing may have less precision with respect to single nucleotide polymorphism accuracy [9], high sequence coverage with the latter method (>100×) has been reported to provide more accuracy at the single nucleotide [12]. Costs of the sequencing methods vary considerably and thus, precision and economics are important factors for deciding which sequencing technology is best suited to investigate the pathogen in question.
In the present work we compared different NGS methodologies for their utility in analyzing the very complex genome of Borrelia burgdorferi sensu stricto (s.s.). B. burgdorferi s.s. belongs to a bacterial species complex that consists of more than 20 genospecies with differing propensity to cause Lyme borreliosis in humans. The bacteria are maintained in natural transmission cycles between vertebrate reservoir hosts and tick vectors. The Borrelia genome is highly fragmented consisting of a linear main chromosome and numerous linear and circular plasmids [13,14]. Figure 1 shows a schematic drawing of the published genome of strain B31, the first strain of B. burgdorferi s.s. ever sequenced. The size of the whole genome amounts to 1.5 Mb with plasmids constituting appr. 40% of it. The GC content is about 28%. It is known that plasmids encode the majority of outer surface proteins [14] that are located at the vector-host-pathogen interface (reviewed by [15]) playing a critical role in the ecology of B. burgdorferi sensu lato (s.l.) and in their ability to cause disease. Therefore, information on plasmid content and structure is an absolute requirement for understanding host or vector adaptation of vector-borne disease agents such as Borrelia. The number and structure of plasmids harbored by Borrelia strains can vary considerably [13,16] and for many European Borrelia species only limited information on plasmid content and structure is available (see http://borreliabase.org/). Furthermore, Fig. 1 Schematic drawing of the genome organization of B. burgdorferi s.s. strain B31-GB. Strain B31-GB has been described to possess >20 linear and circular plasmids (Casjens et al. 2012). Similar color indicates plasmids with high sequence similarity. Note that lp56 has an insertion of a cp32 plasmid difficulties in assembling plasmid sequences currently hamper efforts to understand the interaction of this organism with human and animal reservoir hosts as well as its arthropod (tick) vector [17].
Molecular level changes in the genomes of bacteria with significant consequences for virulence and pathogenicity, together with host and vector associations can be accelerated through horizontal gene transfer and recombination instigated by mobile genetic elements (e.g. transposons, phages). To explore molecular level changes in the genome of vector-borne bacterial pathogens and to understand how best to identify them, we investigated the genomes of closely related strains of the tick-transmitted bacterium B. burgdorferi s. s. using a variety of library construction methods, enrichment of plasmids and different sequencing technologies (Illumina, Pacific Bioscience Systems SMRT technology, Oxford Nanopore technology). Previous work has shown that linear and circular plasmids can have a mosaic structure and differ even between strains of a given species [13], thus we especially focused on the plasmid fraction, which is known to be important in influencing host specificity and virulence in Borrelia [18,19]. It is well recognized in other bacteria that gain or loss of genetic elements (plasmid acquisition/loss, phage transduction) can lead to adaptation to a new environment including host specialization and may change pathogenicity of bacterial pathogens with consequences for human or animal health (e.g. [6,10]).
We sequenced a low passage strain B31 of B. burgdorferi s. s., available at the German National Reference Centre for Borrelia (here termed B31-NRZ), and two additional closely related strains PAli and PAbe [20,21], comparing these with the previously sequenced reference strain B31, available in GenBank (accession number AE000783.1) (here termed B31-GB). These strains were chosen because their close relatedness to the B31 reference strain allowed addressing questions of performance of library construction methods, of assembly software and sequencing technology. One of the main aims was to address the question of what are appropriate requirements for future reference genome assembly in other Borrelia species.
Our data show that short read methods are sufficient to assemble the main chromosome and many of the plasmids. However, both long and short read methods are required to obtain reliable data on cp32 plasmids. The B. burgdorferi s.s. strains investigated here showed in general a high degree of similarity to the reference strain B31-GB but, intriguingly, both long read technologies revealed remarkable differences between strains in some cp32 plasmids. These data together with previous work indicate that B. burgdorferi s.s. strains that show high similarity at the core genome may show structural differences in some of their plasmids suggesting a dynamic system for rapid evolution.

Strains
We used a B31 low passage strain of B. burgdorferi s.s. (4th passage after receiving the strain in 1983), provided as a kind gift of W. Burgdorfer to the German National Reference Centre (NRZ). The strain was originally isolated in 1981 from an Ixodes scapularis tick on Shelter Island [22]. Throughout the paper, this B31 strain is termed B31-NRZ and will be compared to B31-GB, the strain that had been sequenced by Fraser and co-authors [14] and deposited to GenBank under accession number AE000783.1. The strains sequenced by Fraser and co-authors originated from the same tick isolate as B31-NRZ but was cloned and passed through a mouse before sequencing using a whole-genome random approach utilizing small insert plasmid libraries [14,23].
PAli and PAbe are strains isolated from patients in Germany in the 1990s (5th and 6th in vitro passage, respectively). Based on Multilocus Sequence Typing (MLST) and chromosomal single nucleotide polymorphisms (SNPs), they have been shown to belong to the same MLST sequence type (ST) as B31 (ST1) and both these strains differ in approximately 60 SNPs from B31-GB on the main chromosome, thus they are very closely related to B31 [20,21].

PCR analyses
To probe which plasmids are present in the strains B31-NRZ, PAbe and PAli, primers were designed to sequence regions that were unique for individual plasmids in B31-GB (Additional file 1: Tables S5 and S6). PCR primer and conditions used to verify the presence/absence of gaps in Illumina alignments are given in Additional file 1: Tables S7.

DNA purification, plasmid enrichment and library construction
To generate genomic DNA for Illumina sequencing, Borrelia strains were cultured in MKP medium using conditions as described previously [24]. Genomic DNA was extracted via a Maxwell® 16 using a Maxwell LEV Blood DNA kit (Promega, Germany) and libraries of whole genomic DNA (total DNA) were prepared according to the manufacturer's recommendation.
Following A quality check on Illumina sequencing reads was performed either in Galaxy [25] or in the CLC Genomics Workbench and reads that did not pass the filter were not included in the analyses. Sequence reads were mapped on the B31 reference genome (GenBank accession no. NC_001318.1) in the CLC Genomics Workbench using default settings: Mismatch cost = 2; Cost of insertions and deletions = Linear gap cost; Length fraction = 0,5; Similarity fraction = 0,8; Auto-detect paired distances = yes; Non-specific match handling = Map randomly. Variant calls were generated using the fixed ploidy variant detection option with ploidy = 1; required variant probability (%) = 90; minimum coverage = 10; minimum count = 2; minimum frequency (%) = 80; base quality filter = yes; neighborhood radium = 5; minimum central quality = 20; minimum neighborhood quality = 15. Read tracks and reports were generated and saved as well as un-mapped reads.
For de novo assembly of reads in Galaxy [26], SGA [27] and Velvet [28] were used with the following settings in SGA: minimum overlap for the final assembly = 85; correction k-mer = 31; min read length = 40; coverage filter = 2; overlap parameter used for FM-merge = 55; number of pairs required to link two contigs = 5; minimum length of contigs to include into scaffold = 200. Settings for Velvetoptimiser were: start hash length = 35, end hash length = 129, Kmer optimisation metric = N50 size; coverage optimisation metric = length of single longest contig.
For B31-NRZ de novo assembly was also conducted using SPAdes [29] on a virtual BioLinux v.1.8. machine [30] using the following command line: −k 21, 33, 55, 77, For SMRT sequencing 10 μg of DNA were purified via Maxwell® 16 using a Maxwell LEV Blood DNA kit (Promega, Germany). The quality and quantity of DNA was validated using a Qubit 3.0 fluorimeter (Life Technologies, Germany) and a genomic DNA screen tape (Agilent). DNA was sent to the Norwegian Sequencing Centre, University of Oslo, Norway for library construction, sequencing and assembly. Libraries were constructed using Pacific Biosciences 20 kb library preparation protocol. Size selection of the final library was performed using BluePippin with a 7 kb cut-off. Libraries were sequenced on Pacific Biosciences RS II instrument using P6-C4 chemistry with 360 min movie time.
Assembled sequences and consensus sequences of the read mapping conducted in CLC were aligned using MEGA5 [31]. Genetic distance analyses between sequences and multiple alignments of sequences were created in MEGA5.

Sequencing and sequence analysis
Short read sequencing was performed on an Illumina MiSeq platform (Illumina, San Diego CA, USA). Reads were demultiplexed and tags removed as part of the MiSeq run. SMRT sequences were assembled and polished using the Hierarchical Genome Assembly Process (HGAP). Overlapping regions due to circularization of plasmids were removed. Nanopore sequences were processed employing the long read assembler Canu [32] available through a Galaxy platform [25,33].
The software package Mauve [34] was used to order pre-assembled contigs and to align them to a reference genome.
BLAST ring image generator (BRIG) was used to display similarities between genomes [35]. BLAST [36] was performed with default settings to search for similarities between contigs and sequences available in GenBank.

Impact of Library construction and assembler on Illumina MiSeq sequence reads
A summary of the sequencing statistics, i.e. total bp, average read length, GC content, revealed that for all strains sequencing results were of comparative, sufficiently high quantity and quality to obtain a representative view of the genome ( Table 1). Comparison of the different methods using de novo assembly revealed little differences in assembly success when either NX or TS were used on their own. MP libraries have been developed to provide scaffolding for short reads especially for regions carrying repetitive sequences [37]. In our study, when combining NX and TS reads with MP reads the number of contigs (counts) was only slightly reduced from 65 to 62 with NX or NX_MP, respectively, and 63 to 62 for TS and TS_MP for strain PAbe (not PAli or B31-NRZ). Similarly, N50 values changed only slightly when libraries were constructed using TS only or including MP libraries in the assembly (B31-NRZ-TS, N50 value 433.943 and B31-NRZ-TS_MP, N50 value 434.762; PAbe-TS, N50 405.884 and PAbe-TS_MP, N50 value 434.591, Additional file 1: Table  S1). For B31-NRZ de novo assembly of lp28-1 was improved using a combination of TS and MP library reads compared to TS reads alone (Fig. 2).
In some cases, when de novo assemblies of plasmids of B31-NRZ, PAli and PAbe were aligned to the B31-GB reference, gaps in the alignment were observed e.g. lp17 (B31-NRZ) (Fig. 3), lp28-1 (Fig. 2), lp36 (PAbe) and lp38 (PAli). To examine whether these gaps represented true differences between strains or whether these gaps may have appeared as a result of problems in library construction or assembly, PCR was employed. PCR with primers (Additional file 1: Table S8) designed to bridge the border between neighboring contigs and the gapped regions resulted in PCR products, suggesting that these gaps are artefacts likely to be results of library construction methods or assembly of short reads (data not shown). Sequences of PCR products were identical to B31-GB sequences further confirming the artefactual nature of these gaps (e.g. lp38, Additional file 1: Figure S8). The artefactual nature of these gaps was further confirmed by long read assemblies (see below). Comparison of different assemblers revealed that using SPAdes for assembly gave the longest contigs. A single contig covered the whole of the main chromosome (910,137 bp), while several others covered whole plasmids such as lp36, lp38, lp54 or cp26 ( Table 2, Additional file 1: Table S2). However, as shown by BLAST [36] searches, cp32 plasmids were difficult to assemble de novo using any assembler including SPAdes and only short contigs that matched several B31-GB cp32 sequences were generated (Additional file 1: Table S2).
As expected, read mapping to the B31-GB reference genome (including plasmids) provided evidence that the different library preparation methods gave equally good results with numbers of SNPs being very similar regardless of the library construction method (Additional file 1: Table S3).

Plasmids of NRZ strains B31-NRZ, PAli and PAbe
Using commercially available plasmid mini kits, it was possible to enrich Borrelia linear and circular plasmids of all sizes, and even the large linear plasmids lp54 and lp56 were found by PCR in the preparation. The yield of DNA from the main linear chromosome was drastically reduced by this procedure (Additional file 1: Table S1, Figures S1 and S2,) permitting a high read coverage of plasmids and a slightly better de novo assembly of some . Also the use of enriched plasmid DNA for NX library construction improved the assembly (red color, plasmid enrichment). Pacific Bioscience SMRT sequences also provided a good assembly (turquois color, SMRT Bell library). It is noteworthy that in all cases (TS-MP, plasmid enrichment, SMRT Bell) the plasmid of B31-NRZ was shorter than that of B31-GB. Different shades of coloration in one panel indicate different identities between aligned sequences. Regions with lighter shades correspond to less sequence similarity (see region app. 7 kbp to 7.8 kbp in plasmid enrichment) plasmids of B. burgdorferi strains B31, PAli und PAbe. Graphical representation generated using BRIG [35] and comparison of the different methods underlined the suitability of plasmid enrichment for plasmid assembly in Borrelia (Figs. 2, 3 and 4). These images show that gaps that were present with de novo assemblies of total genomic DNA were reduced when enriched plasmid DNA was used for MiSeq sequencing (Figs. 3 and 4). Pacific Bioscience sequences performed even better resulting in fewer gaps in plasmids assemblies (Figs. 2, 3 and 4).

Pacific Biosciences SMRT sequences
De novo assemblies obtained from Pacific Biosciences SMRT sequences resulted in contigs which indicated the presence of one main chromosome and12 plasmids in B31-NRZ, 12 plasmids in PAli and ten plasmids in PAbe. Fig. 3 Visualization of assemblies of lp17 in PAbe aligned to the reference B31-GB using BRIG. Total genomic DNA was used for TS library preparation (purple color, labelled TS library). A spurious gap in the region from 0 to 2.5 kb is visible. Using enriched plasmid DNA for NX library construction improved the assembly and a smaller gap at 16-17 kb is visible (red color, labelled plasmid (de novo)). Not surprinsingly, read mapping of Illumina reads on B31-GB using enriched plasmid DNA for library construction showed complete coverage suggesting that reads for the complete plasmid existed but were not assembled de novo (blue color, labelled plasmid readmapping). De novo assemblies of Pacific Bioscience SMRT sequences showed complete coverage of lp17 confirming that the complete plasmid was present in PAbe and that the gaps presenting in de novo assemblies of short reads were artefacts either of library construction or short read assembly (torquois color, labelled SMRT Bell library) SMRT sequence assemblies correctly displayed the duplicated 5S-23S locus on the main chromosome and repetitive sequences, for example on plasmid lp17 (Additional file 1: Figure S3). In all three strains, differences in plasmid content compared to B31-GB were found; especially some of the cp32 plasmids appeared to have merged (e.g. cp32-1 and cp32-5 in B31-NRZ, cp32-1 and cp32-5 as well as cp-32-4 and cp32-9 in PAbe). The 12 plasmids found in strain B31-NRZ were six linear plasmids lp17, lp28-1, lp36, lp38, lp54, lp56, and six circular plasmids, cp26, cp32-5 + 1, cp32-3, cp32-4, cp32-9 and cp32-2 (Table 3, Additional file 1: Table S4). Differences in B31-NRZ compared to B31-GB included (i) that contig4 matching plasmid cp32-1 in this strain seemed to be fusion of cp32-5 and cp32-1 as its PFam32 sequences were 100% identical to that of cp32-5 and cp32-1. Furthermore, it differed to B31-GB's cp32-1 in sequence stretches of one or several kb that showed higher similarity to cp32-5 and cp32-6 of strains 64b and 156a, respectively (Table 3 and below, see also Additional file 1: Figure S4); (ii) that plasmid lp28-1 was shorter (21,885 bp) than in B31-GB (28,115 bp) and (iii) that plasmid cp32-2 was present and had >3000 SNPs compared to cp32-7. The PFam32 sequence of B31-NRZ contig6 showed 100% identity to PFam32 of plasmid cp32-2 and 99% similarity to PFam32 of plasmids cp32-7 of several B. burgdorferi s. s. strains. Plasmid cp32-2 was described originally in B31 [23] and in some B31 cultures it appears to have been replaced by cp32-7 [13].
PAli contained seven circular plasmids and five out of the six linear plasmids as identified in B31-NRZ, lp28-1 Fig. 4 Visualization of assemblies of cp32-1 of B31-NRZ aligned to the reference B31-GB. De novo assembly of total genomic DNA sequenced from a TS library preparation (purple color, labelled TS library). The alignment shows several gaps covering only three quarter of the plasmid. Using enriched plasmid DNA for NX library construction followed by de novo assembly in the CLC Genomics workbench (red color, plasmid (de novo)), the size of gaps was reduced. Read mapping of Illumina reads on B31-GB using enriched plasmid DNA for NX library construction (blue color, plasmid (read mapping)) produced a gapless alignment of reads to the reference. De novo assemblies of Pacific Bioscience SMRT sequences (torquois color, SMRT Bell library) showed complete coverage of the cp32-1 plasmid. The image shows the larger size of the SMRT assembly with a gap appearing around 17 kb which likely reflect the differences observed between B31-NRZ and B31-GB. Different shades of coloration in one panel indicate different identities between aligned sequences. Regions with lighter shades correspond to less sequence similarity (see region app. 3.5 kbp to 4.0 kbp in de novo assembly using enriched plasmids) was missing completely. Three contigs showed similarities to plasmid cp32-1, and to varying degrees similarities to other cp32 plasmids (Table 3, and discussed below in further detail).
Contig4 of B31-NRZ seemed to be a mosaic plasmid of cp32-1 and other plasmids. It contained in the region from 21 kb -26 kb sequences with high similarity to Bbu64b_V0030, and Bbu64b_V0032 to _V0037 that correspond to region 20,971-21,744 and 23,338-26,211 of cp32-5 of B. burgdorferi s.s. strain 64b. Bbu64b_V0030 to Bbu64b_V0037 encode hypothetical proteins, putative plasmid partition genes or KID repeat protein. The region between 47 and 49 kb in contig4 of B31-NRZ showed the highest similarity to BbU156a_M27 and _M28 (corresponding to regions 16,558 -17,133 and 17,239-17,676 of cp32-6 of B. burgdorferi s.s. strain 156a). M27 of strain 156a codes for a repeat motifcontaining protein whose nucleotide sequences was 91% similar to the Borrelia direct repeat (BdrN) gene of cp32-7 in B31.
In PAli, there were three different contigs that had similarity to cp32-1 of B31-GB. Contig14 had highest similarity to cp32-1 but was truncated (20,004 bp) compared to cp32-1 of B31-GB (30,681 bp) and no PFam32 sequence was found. Contig13 and 15 were partially highly similar to cp32-1 but had sequence stretches of higher similarity to cp32-5 or cp32-6, respectively (Table 3 and Additional file 1: Tables S3 and S4), similar to what was found in contig4 of B31-NRZ. Similarity searches with PFam32 sequences revealed that contig13 In B31-NRZ contig4 (con4) of the SMRT sequences was likely a hybrid plasmid as the first app. 33,000 bp showed 1138 SNPs to cp32-1 while the sequence starting from 33,000 had 773 SNPs to cp32-1. The regions showing variation to cp32-1 of B31 showed similarity to cp32-5 of strain 64b (21-26 kb) and to cp32-6 of strain 156a (47-48.5 kb) (see text for details). Contig4 had 6631 SNPs compared to cp32-1 + 5 of JD1. Its PFam32 sequences showed 100% similarity to cp32-5 and cp32-1 c In PAli, three contigs were found that matched cp32-1 in BLAST searches. Contig14 showed 20 SNPs to cp32-1 but was short (20,004 bp) and did not have a PFam32 locus. Contig13 matched cp32-1 from 1 to 18,740 bp, 18.7-23 kb the similarity was higher to cp32-5 of B. burgdorferi s.s. strain 64b , while the remaining sequence to 32,285 matched cp32-1 again. Its PFam32 sequences showed 100% similarity to cp32-5 of several B. burgdorferi s. s. strains. Contig15 between 16.5 and 18 kb showed a closer match to cp32-6 of strain 156a while the remaining sequence matched cp32-1. PFam32 sequence of contig15 was identical to cp32-1 of B31-GB d In PAbe one contig was present (contig2) that partially matched cp32-1 but part of the sequence also matched cp32-5 (appr. 14.5-19 kb) and cp32-6 (appr. 40-42 kb). In contig2, the PFam32 of cp32-1 showed 100% identity e A plasmid was found in B31-NRZ that was not present in the B31-GB reference. Its PFam32 sequence was identical to cp32-2 and showed high similarity (99%) to cp32-7 of several B. burgdorferi s.s. strains. An almost identical plasmid was found in PAbe with a PFam32 matching that of cp32-2 f In BLAST searches contig3 of PAbe showed similarity to cp32-9 . However, the first part of the sequence was highly similar to cp32-9 while the second part was more similar to cp32-4 of B31 suggesting that this plasmid represents a hybrid plasmid. The PFam32 sequences were identical to PFam32 of cp32-9 and cp32-4 g In B31 plasmid lp28-1 is short (21,885 bp). In PAbe the presence of lp28-1 is questionable as there were only two contigs of approximately 4800 bp that showed high similarity to lp28-1 was likely a member of cp32-5, its PFam32 was identical to PFam32 of cp32-5 plasmids of other B. burgdorferi s. s. strains. The PFam32 sequence of contig15 had 100% similarity to cp32-1 of B31-GB, in spite of this plasmid containing a sequence stretch that was identical to cp32-6 of B. burgdorferi s.s. strain 156a. The cp32-6 sequences mapped to the same region as those in B31-NRZ (BbU156a_M27 and _M28), indicating that amongst the strains investigated here, the plasmids and plasmid parts showed a high degree of similarity to each other (Additional file 1: Figure S4).
In PAbe, plasmid cp32-2 was found being 99% identical to cp32-2 of B31-NRZ. The size of contig3 in PAbe was 62 kb. It showed sequence similarities to cp32-4 and cp32-9 suggesting that this plasmid represents a hybrid plasmid in this strain. To this point, its two PFam32 sequences showed 100% identity to cp32-9 and cp32-4.
In PAbe, several contigs were obtained that partially matched three plasmids (lp28-1, lp36 and lp56). Lp36 was represented by 4 short contigs (2.3-10 kb, Table 3) that aligned in a gapped fashion to lp36 of B31-GB which may suggest that the plasmid was in the process of degradation. For lp56 two overlapping contigs were obtained. It is conceivable that different cells of the population contained the different fragments of the plasmid which was perhaps also in the process of degradation. Discrepancies concerning plasmid lp28-1 existed between different assemblies. De novo assemblies of Illumina reads using total genomic DNA suggested the presence of an lp28-1 in PAbe. A protein BLAST search in the Illumina de novo assembly of PAbe with the lp28-1 partitioning protein of B31-GB revealed a 100% identical hit supporting the presence of lp28-1. However, read mapping of plasmid enriched Illumina reads showed a surprisingly low coverage of lp28-1 of only 14× compared to >100× coverage for other plasmids. In SMRT assemblies two short contigs (total appr. 4800 bp) were recovered one of which showed high similarity (99%) to the VlsE-locus that is located on lp28-1 in B31-GB. Taken together, the data suggest the presence of lp28-1 in PAbe albeit perhaps in low abundance and perhaps in the process of decay.
Analysis of coverage of SMRT contigs suggested it was likely that not all the plasmids were present in each individual cell of the population, but that individual plasmids were present in only a subset of cells. For example in PAbe, coverage of the main chromosome was nearly 1,000fold while plasmids cp32-2 and lp17 showed a coverage of 500fold and 250fold, respectively suggesting a presence of the respective plasmids in either one half or one quarter of the cells.
A drawback of Pacific Biosciences SMRT and nanopore sequences was the presence of (presumably artefactual) insertions and deletions (indels) (Additional file 1: Figure S5) that were neither present in the B31-GB reference sequence nor in consensus sequences of read mapping of the respective Illumina sequences (Additional file 2, Alignments_B31_PAli_PAbe).

Nanopore sequencing
Sequences obtained using the MinION device of Oxford Nanopore Technology were trimmed and assembled using the single read assembler Canu [32]. Both, assembled and unassembled trimmed reads had an average coverage of 40fold. BLAST [36] searches of nanopore contigs and trimmed reads against sequences available in GenBank returned matches to the main chromosome and plasmids as detailed in Table 4. The main chromosome as well as five plasmids (lp54, lp56, lp38, cp26, and cp32-1) returned from Canu assemblies while plasmids lp17, lp28-1, lp36, cp32-3, cp32-4, cp32-9 and cp32-2 were recovered from trimmed read sequences indicating that the (almost) complete length of these plasmids was sequenced as a single length of DNA. Contig 241245 was longer in size than the B31-GB cp26. However, the first 13,431 bases matched to position 12,626-26,496 of cp26 and bases 39,932-46,144 matched to the first 7358 bp of cp26 suggesting an overlap of sequences. While most plasmids were comparable in size to the GenBank reference, some showed only partial matches, e.g. plasmid lp28-1 (15,787 bp) and cp32-4 (22,112 bp) were shorter than plasmids found in the GenBank reference of B31. Interestingly, shorter sequences for plasmid lp28-1 were also found in Illumina and SMRT sequences (Table 3). Intriguingly, nanopore sequences contained a single plasmid of approximately 60 kb in size that matched (in large parts) to contig4 of SMRT sequences (Additional file 1: Figure S6). In addition, unassembled trimmed reads of 18,058 bp showed highest similarity to contig6 of B31-NRZ, a plasmid resembling cp32-2 that was originally described in B31 [23]. The parA gene present in contig6 and the nanopore reads showed highest similarity in BLAST searches to the parA gene of cp32-2.
Determination of single nucleotide differences (SNPs) in MEGA between nanopore sequences and B31-GB sequences resulted in a number of SNPs some of which were due to alignment errors resulting from the fact that gap extension is less costly in alignments than opening of new gaps (see Additional file 1: Figure S5). In comparison, fewer SNPs were observed in SMRT sequences (Table 4) which can be attributed to higher accuracy of SMRT sequences and fewer sequencing gaps in homopolymer regions.

Discussion
We chose B. burgdorferi s.s. strain B31 that was available at the German NRZ for Borrelia and two closely related strains (PAli and PAbe) that were isolated from patients with different clinical manifestations, i.e. erythema migrans and Lyme neuroborreliosis, respectively, to compare various next generation sequencing methods. These strains were chosen to obtain proof-of-principle data on the suitability of methods and to understand how similar or different the plasmid content and structure of very closely related strains in B. burgdorferi s.s. are. Previous studies of genomic differences between Borrelia strains investigated markedly divergent strains of B. burgdorferi s.s., i.e. representing different MLST STs. These studies revealed a substantial rearrangement of plasmid located sequences likely due to homologous and non-homologous recombination [13].
Short read methods provide a good representation of the core genome but not of all plasmids TS libraries have been described to give a more even coverage of genomes due to the mechanical fragmentation of DNA by ultrasound [9]. The NX library construction method, based on enzymatic fragmentation and tagmentation of DNA in a single step, represents a fast and very convenient way of constructing sequencing libraries and requires very little DNA input (1-50 ng). MP libraries give approximate distance and orientation of reads and can serve as a scaffold for reads of other library preparation methods [37,38]. A combination of NX or TS with MP libraries may improve de novo assembly of genomes, especially in regions with sequence repeats [37]. NX libraries have been reported to be especially suited for genomes with a balanced GC content. However, this method performed very well for the AT-rich (70%) [14] Borrelia genome in our study. We did not observe major differences in de novo assemblies whether TS or NX libraries were used, many plasmids being well represented in alignments. For B31-NRZ the assembly of plasmid lp28-1 was considerably improved using a combination of TS and MP library reads compared to TS reads alone (Fig. 2), while in PAbe no difference was observed in lp28-1 assembly. Enrichment of plasmids using commercially available plasmid purification kits also resulted in improvement of de novo assemblies of some plasmids (e.g. reduction of gapped regions in lp17or cp32-1, Figs. 3 and 4) suggesting that higher depth of coverage may have improved assembly of these regions.
Of the different de novo assemblers used in this study, SPAdes produced the longest contigs, at least for the main chromosome and several of the plasmids but not for cp32 plasmids. Plasmid assembly of cp32 plasmids proved very difficult with short sequence reads, regardless of library construction, plasmid enrichment or assembly software used. Borrelia cp32 plasmids are of particular interest, as they harbor erp loci which encode gene products that confer protection of the bacteria against the mammalian innate immunity [39]. Unfortunately, however, many Borrelia strains contain multiple versions of highly similar cp32s, thereby impeding sequence assembly. Obtaining accurate sequences for such genetic elements is of crucial importance and might provide new insights in to ecology and pathogenicity of these spirochetes in the future. To assemble cp32 plasmids properly, long read sequencing was required and revealed subtle differences that were not detected using combinations of library preparations and Illumina sequencing only.

Plasmids in B. burgdorferi appear dynamic
The strains included in our study showed differences in plasmid numbers compared to B31-GB which may have reflected the loss of B. burgdorferi s.s. plasmids during cultivation or freeze/thaw cycles of strains [18,40,41]. The plasmids reported to be lost most frequently in cultures of B31 include cp9, lp5, lp21, lp25, lp28-1, lp28-4 and lp56 [18,41]. Of these cp9, lp5, lp21, lp25, and lp28-4 were not found in any of our strains, either with plasmid specific PCR, or in Illumina sequences of whole genomic DNA, enriched plasmid preparations or in either of the long read sequence assemblies. In addition, plasmids cp32-6, −7, −8, lp28-2, lp28-3, lp28-5, lp28-6 and lp28-7 were also absent. For the remaining plasmids, our data show that not all cells in a given population may harbor every plasmid that was present in the population. There were obvious differences in the frequency of plasmids and some plasmids (e.g. lp36 and lp28-1 in PAbe) may have been present in the population in such low frequencies that they were not captured by all methods. For example, whilst Illumina MiSeq sequences suggested the presence of lp28-1 in PAbe, neither plasmid-enriched preparations nor Pacific Bioscience sequencing captured the full lp28-1 plasmid. These differences in sequencing success point to methodological weaknesses or drawbacks. For example, Pacific Bioscience sequencing requires large amounts of DNA and plasmids that are present in a subset of the population may not be represented well in sequence assemblies. One has to consider that the strains investigated here were very closely related and we assumed that their plasmid content was identical or highly similar. The advantages and disadvantages of sequencing methods described here combined with economics and sample handling (Additional file 1: Table S9) need to be considered when sequencing unknown strains or species for which no reference sequence is available.
Apart from quantitative information, we aimed to explore methods to qualitatively investigate the plasmid structure of closely related Borrelia strains to obtain information on requirements for future reference genomes. It had been shown in Borrelia that plasmids may be inserted into the main linear chromosome or other plasmids, and merging of plasmids has also been observed [13,42]. Linear plasmids have been shown to contain a large number of pseudogenes that may have emerged through gene duplications, accumulation of deleterious mutations and decay [42]. As the insertion of a gene into a different genomic environment may change its expression pattern, and such changes may influence host-or vector-interaction of bacterial pathogens, it is important to investigate the structural changes that may occur in the plasmids of closely related bacterial strains.
The close relationship of the strains investigated here is already shown by identical MLST types [21] and is further demonstrated by the very low number of SNPs that are found in the main chromosome [20] and linear extrachromosomal elements. Of the circular plasmids, cp26 and three cp32s, i.e. cp32-3, cp32-4 and cp32-9, showed also very few SNPs or rearrangements among the strains studied here (although cp32-4 and cp32-9 presented as fusion plasmid in PAbe), further emphasizing the close relationship of the strains. The remaining cp32s showed similarity amongst the three NRZ strains, but revealed differences to B31-GB. Variant segments in the cp32-1 or cp32-5 plasmids had higher BLAST similarity to regions in plasmids cp32-6 of B. burgdorferi s.s. strain156a or cp32-5 of B. burgdorferi s.s. strain 64b. Interestingly, the variable region in contig4 (cp32-1) of B31-NRZ, in contigs15 (cp32-1) and 13 (cp32-5) of PAli and contig2 of PAbe corresponded to regions 1 and 2 described by Casjens and co-workers in 2012. Although homologous recombination may contribute to re-shuffling of sequences in cp32 plasmids within strains [13] and although cp32 plasmids have been shown to contain prophage sequences [43,44] and may therefore be particularly prone to horizontal gene transfer, the mechanisms behind the rearrangements observed here are difficult to understand because the inserted variable sequences had high similarity to B. burgdorferi s.s. strains 156a and 64b, but not B31. It can only be speculated that: (i) the original strain B31 may not have been a clonal isolate and actually consisted of several different Borrelia clones; each clone maintaining different genetic elements during sub-passage of cultures; (ii) the plasmid content of a single Borrelia strain differs between different individual cells; (iii) some plasmids may have been present in our strain population in such a low frequency that no data were obtained. Support for the latter hypothesis comes from our Pacific Bioscience data which showed that some plasmids occur in the population with low frequency, i.e. the data suggest that only every tenth cell may contain these plasmids as the average coverage of plasmids tended to be variable ranging from >300fold for the chromosome and some plasmids to 30fold for other plasmids (data not shown).
Contig6 in B31-NRZ and PAbe corresponded most likely to the elusive cp32-2 plasmid in B31-GB [13]. Casjens et al. [13] suggested the use of PFam32 sequences for plasmid classification as they are homologous to plasmid partitioning proteins (ParA) in other systems and were shown to allow categorization of plasmids in Borrelia. We used this system to classify plasmids in the closely related strains investigated here. Similarity searches with the PFam32 gene of cp32-2 that had been initially described by [23] revealed that it was identical to sequences in both contig6 and in reads obtained by nanopore sequencing. Cp32-2 also possessed a gene region containing an erpCD locus [23,45] and sequences in the contigs6 of B31-NRZ and PAbe had 99% similarity with it.
Despite the fact that Pacific Biosciences SMRT assemblies showed generally >100fold and nanopore sequences a 40fold coverage, a drawback of these assemblies was the occurrence of indels that may provoke frameshifts in coding sequences. This tendency was more pronounced in nanopore than in SMRT sequences. This problem has been noticed for Pacific Bioscience sequence assemblies before and different strategies have been proposed to address the issue including hybrid approaches [46,47] or long read correction tools [48]. It demonstrates that although long read sequencing has hugely improved the assembly of bacterial genomes and in particular plasmids, there are still hurdles to overcome. In view of the fact that in B. burgdorferi s.s. linear plasmids may contain a large number of non-functional pseudogenes [42], in the case of Borrelia the use of a hybrid strategy employing short and long reads may be prudent to generate accurate genome sequences.
The demonstrated merging of cp32 plasmids in the B. burgdorferi s. s. strains investigated here shows that some elements in the genome are dynamic even between strains that are almost identical on the main chromosome [20]. It suggests that even for such closely related strains, the use of reference genomes for read mapping may give only a distorted view of reality. B. burgdorferi s. s. belongs to a bacterial species complex (B. burgdorferi s. l.) which currently comprises >20 named and proposed species. Individual genospecies within the complex differ in their ecological niches in terms of reservoir host and vector associations [49][50][51][52] and in their ability to cause human disease [53,54]. The dynamic nature of the B. burgdorferi s.s. genome (that has been demonstrated previously and here) may be important in terms of understanding the ecology and pathogenicity of B. burgdorferi s.l. in general and for individual strains in particular. PAli and PAbe were isolated from patients with different clinical manifestations and further investigations including a larger number of strains may reveal whether or not a common pattern in plasmid content or structure can be found in strains isolated from patients with different clinical manifestations.

Conclusion
The data accumulated in the present study are in agreement with previous studies that showed that the plasmid