Extensively duplicated and transcriptionally active recent lateral gene transfer from a bacterial Wolbachia endosymbiont to its host filarial nematode Brugia malayi

Background Lymphatic filariasis is a neglected tropical disease afflicting more than 120 million people, while another 1.3 billion people are at risk of infection. The nematode worm Brugia malayi is one of the causative agents of the disease and exists in a mutualistic symbiosis with Wolbachia bacteria. Since extensive lateral gene transfer occurs frequently between Wolbachia and its hosts, we sought to measure the extent of such LGT in B. malayi by whole genome sequencing of Wolbachia-depleted worms. Results A considerable fraction (at least 115.4-kbp, or 10.6%) of the 1.08-Mbp Wolbachia wBm genome has been transferred to its nematode host and retains high levels of similarity, including 227 wBm genes and gene fragments. Complete open reading frames were transferred for 32 of these genes, meaning they have the potential to produce functional proteins. Moreover, four transfers have evidence of life stage-specific regulation of transcription at levels similar to other nematode transcripts, strengthening the possibility that they are functional. Conclusions There is extensive and ongoing transfer of Wolbachia DNA to the worm genome and some transfers are transcribed in a stage-specific manner at biologically relevant levels.


Background
Brugia malayi (filarial nematode) is a causative agent of human lymphatic filariasis, a neglected tropical disease that results in elephantiasis and thus disability, handicap, and stigma. Over 120 million people have lymphatic filariasis, with another 1.3 billion people at risk of infection [1]. Transmission of the disease requires a mosquito vector, which ingests microfilariae from an infected human blood meal. The parasites develop into infective 3rd stage larvae (L3) inside the mosquito and are subsequently transmitted to another human during the next blood meal [1]. Efforts at combating the disease include mass drug administration to reduce the blood levels of microfilariae, the transmissible form. This scheme aims only at interrupting further transmission of the disease, as these drugs do not affect adult worms. Antibiotics kill all life stages by targeting the obligate mutualistic Wolbachia symbiosis, and thus can be used to treat lymphatic filariasis [2]. These Wolbachia endosymbionts are α-Proteobacteria found in all three of the causative agents of lymphatic filariasis, namely Wuchereria bancrofti, B. malayi, and Brugia timori [3].
During the original whole genome sequencing of B. malayi extensive levels of lateral gene transfer (LGT) were identified from its Wolbachia endosymbiont, wBm [4].
LGT is the process whereby organisms acquire DNA from other organisms in the absence of sex.
Most of the nuwts detected previously in B. malayi are degenerate [4], suggesting that there is no selective pressure to maintain their functionality. However, the methods used to assemble the B. malayi genome would favor the discovery of degenerate sequences. Since the endosymbiont is an obligate symbiont, the nematode genome and bacterial genome were sequenced simultaneously. Therefore, to assemble the B. malayi genome, reads that were >98% similar to Wolbachia wBm over >90% of their length were removed from the assembly [4]. This leads to the removal of the most conserved sequences. In addition, regions adjacent to nuwts that were removed in the screen, as well as duplicated regions, are unlikely to be well resolved in the assembly. Therefore, we sought to quantify the size and number of nuwts in the filarial nematode B. malayi genome that arise from its bacterial endosymbiont, Wolbachia sp. wBm. Using genome sequencing of Wolbachia-depleted worms, we obtained the full list of nuwts in B. malayi. Intriguingly, this list includes several fulllength Wolbachia genes with the potential to be functional that are also shown to generate stage-specific transcripts.

Results
Wolbachia depletion of Brugia malayi worms and DNA sequencing Since B. malayi has a mutualistic symbiotic relationship with Wolbachia strain wBm, such that neither symbiotic partner can survive without the other, the B. malayi genome cannot be sequenced without also sequencing the Wolbachia genome. This complicates identification of nuwts when compared to detecting Wolbachia-host LGT in naturally Wolbachia-free nematodes [11,16] or insects that can be cured of their Wolbachia infection with antibiotics [4,12]. To overcome this, a Wolbachiadepletion approach was undertaken in order to examine worms with low, but not immediately lethal, Wolbachia levels.
Worms used for sequencing were treated with tetracycline to deplete the Wolbachia endosymbionts. A pool of DNA was sequenced with a ratio of Brugia:Wolbachia DNA >10. Therefore, nuwts should be identified based on a >10-fold difference in coverage relative to coverage of the same sequences in the bacterial genome. More than 49 million 54-bp reads were generated from a 3-kbp mate pair library, and over 138 million 99-bp reads were generated from a 300-bp paired-end library. Given the experimental differences between a 3-kbp mate pair library and a 300-bp paired end library and a significant correlation between the results obtained ( Figure 1A; p-value <2e-16), the second sequencing strategy provides independent validation of the first. Previously, in D. ananassae we observed differences in the coverage for the paired end libraries and the mate pair libraries, with the mate pair libraries having more smooth coverage and the paired end libraries have a great deal of local variance (data not shown). This was not specifically observed here ( Figure 1B) although when compared to the paired end reads (Figure 2A), the coverage distributions are better delineated for the mate pair reads ( Figure 2B). Taken together, all subsequent analyses used the paired end data, except where noted.

BWA analysis
The sequencing reads generated were aligned against the B. malayi genome [GenBank:AAQA00000000] [17] and the Wolbachia strain wBm [GenBank:AE017321] [18] using BWA [19], a short read aligner that is fast and finds only near-identical matches. As such, this version of BWA is best suited for finding evolutionarily recent nuwts with <67 SNPs/kbp [19]. It also natively allows for analysis of mate pairs that can facilitate mapping nuwts to gaps in the B. malayi genome when one read in the pair is of Wolbachia ancestry and the other read is of nematode ancestry, facilitating some downstream analyses. As expected given the Wolbachia depletion strategy, the majority of the wBm genome was shown to have relatively low coverage with nuwts having relatively high coverage, comparable to or greater than the mean coverage of B. malayi (Figures 2 & 3, Tables 1 & 2). More specifically, the mean coverage across wBm was 1.6× and 13.8× with the mate pair and paired end reads, respectively. In contrast, and as expected for Wolbachiadepleted worms, the corresponding numbers for B. malayi were 16.0× (Table 1) and 130.6× (Table 2). This is consistent with the 10-fold depletion anticipated. Of note, the mean for the wBm mapping is skewed by the numerous high coverage values obtained for nuwts.

Detection of nuwts
The critical coverage for distinguishing between Wolbachia and nuwts was determined to be 3× for the mate pair reads (Figures 2A & 3B, D) and 16× for the paired end reads ( Figures 2B & 3A, C), as described in the methods. Using these thresholds, the boundaries of nuwts >100-bp were defined relative to the wBm genome using a sliding window approach to smooth coverage variance.
Candidate nuwts were covered by 28,060 mate pair reads and 115,812 paired end reads (Table 3). Between 28.6-kbp and 48.8-kbp of the wBm genome was transferred to the B. malayi genome as determined using the mate pair and paired end reads, respectively. Given that the wBm genome is 1.08-Mbp, >4.5% of it was present in B. malayi as recent nuwts that were detected by BWA mapping of the paired end reads (Figure 4). Furthermore, the genes detected were distributed throughout the wBm genome ( Figure 4).  Figure 1 Correlation between 300-bp paired end reads and 3-kbp mate pair reads. (A) The coverage measured using the 300-bp paired end reads was correlated with the 3-kbp mate pair reads. While the scatterplot illustrates 100,000 randomly selected points in the dataset, the linear regression was calculated on the entire data set revealing a significant correlation with a p-value of <2e-16, an intercept of 0.61, a slope of 8.2, and an R-squared value of 0.77. (B) Overlaying the coverage calculated with the paired end reads (blue) and the mate pair reads (orange) across 20-kbp of the Brugia genome shows how the peaks and troughs in both are nearly identical.

Experimental verification of nuwt copy number using qPCR
Coverage from the paired end data set was deep enough to estimate nuwt copy number. Considering that average coverage in B. malayi was 131× (Table 2), there appear to be multiple copies of specific nuwts in each haploid genome based on coverage (Table 4, Additional file 1). Duplication may suggest that these nuwts are playing a fundamental role in the biology of B. malayi, provided that they are functional in B. malayi. Copy number variation is linked to phenotypic diversity and evolutionary adaptation [20]. To this end, copy number variation for 10 nuwts was verified using qPCR (Table 4). For most of the nuwts, the copy number estimate based on read coverage was very close to the qPCR estimate. When the average coverage was compared to qPCR results, a positive correlation was observed ( Figure 5, p-value = 0.060) providing a second independent validation of the coverage estimates. The most notable exception was Wbm0241, for which only one copy was seen by qPCR but four copies were estimated by coverage on the genome sequence. One explanation may be that multiple fragmented copies add up to four copies, but only one copy is spanned by the qPCR primers used.

Size and potential role of transfers
Examining the paired end data set in more detail showed that BWA-detected nuwts in B. malayi originated from 116 wBm genes (Additional file 1). Coverage for 21 genes was high across their entire length (Additional file 2), but the remaining genes were only partially covered, suggesting that only part of the respective wBm gene was detected as a nuwt. Frequently, only 100-200 bp of a gene was observed as a nuwt ( Figure 6A). Furthermore, a pattern is seen where only a minor fraction of a gene is present (e.g. 20% of the gene) or the entire gene is present ( Figure 6B). This suggests that some transfers include full-length genes. However, a substantial number of gene fragments are transferred or many of the full-length genes transferred have decayed yielding gene fragments.
No significant difference (Pearson's Chi-squared test, Yate's continuity correction, p > 0.53) was observed between the frequency of COG functional categories in all nuwts or full-length nuwts when compared to those for wBm. However, genes not classified in any COG category were significantly under-represented in nuwts (26/ 104) relative to the Wolbachia wBm genome (297/805) (Pearson's Chi-squared test, Yate's continuity correction, p = 0.02).

Nuwt location in the B. malayi genome
To examine the insertion sites of nuwts in the nematode genome, we identified read pairs spanning the junction between the nuwt and the portion of the B. malayi genome of nematode ancestry. Such pairs representing junctions have one read mapped to the wBm genome and one to the Brugia genome. As expected, significantly more junctions were present in intergenic regions and significantly fewer junctions in coding sequences and introns/UTRs (Table 5; Pearson's Chi-squared test, Yate's continuity correction, p < 0.01), although junctions near genes could be identified.
The number of junctions near the ends of contigs differed significantly from a uniform distribution (Table 5; Pearson's Chi-squared test, Yate's continuity correction, p < 0.01). During assembly of the reference genome, sequence reads were removed that had >98% identity to  wBm >90% of the length of the read. Since the sequences examined here were mapped with BWA, they lack polymorphisms and they likely represent the sequences removed with such criteria as was used for removing wBm sequences prior to assembly. Given an overabundance of nuwts near contigs ends, it is likely that some of the gaps in the Brugia genome resulted from removal of sequences with near identity to wBm, and that these gaps may be filled with nuwt sequence.

Fixed polymorphisms
BWA alignment of HiSeq reads against the wBm genome showed that most of the 125 nuwts contained at least one single nucleotide polymorphism (SNP). A subset of these SNPs had only one polymorphism and as such, are likely fixed. This suggests that nuwts have been accumulating over recent time in B. malayi.

BLASTN analysis
We also searched for nuwts of different evolutionary ages using BLASTN [21]. BLASTN was selected because it is still relatively fast, but it detects matches with higher levels of polymorphisms when compared to BWA. This makes it more sensitive and suitable for detecting older nuwts. However, BLASTN does not perform well on short reads and therefore only the 99-bp paired end reads were analyzed in this manner. BLASTN was used to detect hits with >80% similarity. While a lower stringency is possible in a BLASTN analysis, we have found that lowering the stringency in this case yields matches to genes annotated as arising from the mitochondria, or numts. This is a peculiar result given the ancient ancestry of mitochondria and is under further, separate investigation. Regardless, this observation also precluded the use of a TBLASTXbased analysis.
With a coverage cut-off of 16×, 115.4-kbp (or 10.6%) of the wBm genome is identified using BLASTN, including fragments of 227 wBm genes (Additional file 3). Thirty-two of these genes had their entire length covered by nuwts, making them excellent candidates for downstream functional analyses (Additional file 2, Figure 6C, D). Like with the BWA-based analysis, no COG category was found significantly different in genes found in nuwts, compared to the complete set of wBm genes (Pearson's Chi-squared test, Yate's continuity correction, p > 0.29).
The BLASTN analysis greatly increased the fraction of the genome of wBm implicated in LGT events. More specifically, when BWA-detected and BLASTN-detected LGTs are compared to each other, there is an additional 65.9-kbp of the wBm genome present as nuwts. Nuwts in this additional genome portion included new fragments of 162 genes. This demonstrates that nuwts have a continuous distribution of nucleotide divergence suggesting that the transfer of Wolbachia DNA to the genome has occurred over a long time span. It is possible that transfers have occurred since the origin of the symbiosis.

Transcriptional activity of nuwts
Transcription of nuwts containing full-length wBm genes was examined using publicly available RNA-Seq data [22]. The B. malayi transcriptome was sequenced in 13 samples corresponding to 7 different life stages [22]. When sequences from each of these samples were aligned against the wBm genome it was found that up to 0.23% of the reads were mapped. This finding was unexpected since poly-A selection took place before sequencing [22], and that step should exclude bacterial transcripts. Such RNA-Seq reads may arise from nuwt transcription. Supporting this hypothesis, a comparison of the mean coverage of the nuwt regions to that of the non-nuwt regions showed a small but highly significant difference between them (Wilcoxon rank sum test with continuity correction, p < 2.2×10 -16 for all 13 samples). The ratio of the average coverage for nuwts compared to non-nuwt regions of the wBm genome varied among the thirteen samples from 1.8 to 7.0. This statistically significant difference means that transcripts with similarity to (See figure on previous page.) Figure 3 Coverage across the genome. The mean coverage along the wBm genome (Panels A and B) or a similarly sized Brugia contig (Panels C and D) are plotted using the 300 bp paired end reads (Panels A and C) and the 3 kbp mate pair reads (Panels B and D). The mean coverage was calculated using 10-bp windows and is presented on a log scale. The dashed red line indicates the cut-off used for detecting nuwts with BWA, which was 16× for the paired end reads and 3× for the mate pair reads. Wolbachia wBm regions having a mean coverage above this cutoff are identified as nuwts located in the genome of B. malayi. Wolbachia are more likely to arise from nuwts than from the bacterial genome.
To further validate whether transcripts are arising from the nuwts, the RNA-Seq data were examined for nuwt-specific polymorphisms (Additional file 4). The 21 BWA-detected, full-length nuwts were particularly interesting since they were more likely to have retained some function. Fourteen of them had strong evidence for transcription in at least one life stage, when examining nuwt-specific SNPs (Additional file 5). More specifically, for some Wolbachia wBm genes all reads found contained only the nuwt-specific SNP, which means that transcription comes only from the nuwt copy of that gene (Additional file 4).
Subsequently, transcription levels were calculated as RPKM values [23] for each of these 21 genes. Stagespecific expression is seen with Wbm0693 transcribed in L3 larvae and Wbm0081 and Wbm0783 transcribed in eggs/embryos ( Figure 7A). All three genes are annotated as hypothetical proteins.
The nuwt originating from the Wbm0693 gene was further examined using quantitative, reverse transcription PCR (qRT-PCR) on RNA from microfilaria, L3, L4, adult males, and adult females. The qRT-PCR product for Wbm0693 was 16-64× more abundant across all five stages than two hypothetical proteins (Wbm0149 and Wbm0783) that are present as nuwts but only expected to be transcribed by the bacteria based on SNPs in the transcriptomics sequence. Wbm0693 is 1-32× less abundant than groEL, which was not identified as a nuwt, but is an abundant transcript in most intracellular bacteria and was amongst the most abundant Wolbachia proteins identified in a proteomic analysis of B. malayi [24]. Wbm0693 is 2-16× less abundant than the average transcript level for 4 constitutively expressed genes ( Figure 7B, C), but is of similar abundance to two of these constitutively expressed genes of nematode ancestry (Bm1_03910 and Bm1_03960).
The analysis of transcription is complicated since the qRT-PCR product could originate from RNA from either the bacteria or the nuwt. Not only do Wolbachia numbers change throughout the nematode life cycle, but transcripts from both origins will have differential transcription through the different life stages. Therefore, the Wbm0693 amplicons were cloned and sequenced, and quantification of the nuwt-specific SNPs was used to identify the relative contributions of the nuwt and bacterial transcripts. While the transcript abundance is lowest in the L3 as measured by the ΔCt, 100% of the amplicons arise from the nuwt ( Figure 7C). In contrast, transcription is high in microfilaria, but most of the amplicons arise from the bacteria ( Figure 7C). Surprisingly, the transcription was high in the L4, males, and females in the qRT-PCR and was predominated by the nuwt-specific alleles. This is contrary to the transcriptomics data, which had higher transcription in the L3s ( Figure 7A). This could reflect biological or technical differences in the RNA obtained for the RNAseq and the qRT-PCR experiments.
The region of the genome that includes Wbm0693 was properly assembled in the original genome sequence, enabling further examination of the transcriptional profile in this region using the RNA-Seq data. The region between Bm1_46245 (hypothetical protein) and Bm1_46250 (apacd-prov protein) contains two adjacent nuwts that arise from different portions of the Wolbachia genome. While the flanking genes of nematode ancestry (Bm1_46245 and Bm1_46250) have clear transcriptional profiles indicating the intron/exon boundaries and stage-specific transcription ( Figure 7D, E), the nuwt containing Wbm0033 (Figure 7E, pink) is transcriptionally silent. Wbm0033 is a small hypothetical protein with homology to DnaJ heat shock proteins. The other nuwt ( Figure 7E, lavender) is the one transcribed in the L3 transcriptomic experiment and contains Wbm0693 and Wbm9002, which are predicted to encode a hypothetical protein and the 5S rRNA, respectively. Different regions of this latter nuwt show different transcriptional profiles.  Counts also include reads that may not be located in a nuwt themselves as long as their mate is located in one.
On the right side of this nuwt is a region encoding the bacterial 5S rRNA, and it is detected in several stages. Since rRNA is quite abundant, this level could reflect endosymbiont rRNA that co-purified with the polyadenylated RNA. The nuwt 5S rRNA has a 14-bp insertion relative to the bacterial-encoded 5S rRNA. This insertion prevents mapping of sequence reads. In all but the L3, transcription levels drop at this 14-bp insertion, supporting that these reads arise from the bacteria-encoded 5S rRNA in all stages except L3. However, the reads from L3 contain this 14-bp sequence, supporting that the transcription in the L3 is from the nuwt.
On the left side of this nuwt is a region encoding Wbm0693. The 5′-portion of Wbm0693 is transcribed in numerous stages, but the 3′-portion is transcribed only in L3. The transcription in L3 is evident across the entire nuwt and into the adjacent gene, Bm1_46250. Since the directionality of the transcripts was not assessed in the RNA-Seq experiments, it is not possible to determine if this transcription results from a promoter activating   transcription of the nuwt or if there is alternative splicing of Bm1_46250 that leads to transcription of this region. The latter would result in anti-sense transcription of Wbm0693 and a chimeric mRNA. The former could result in an mRNA that codes for Wbm0693 or alternatively could result in transcriptional interference [25,26]. The resulting protein would be full length but would have a 7-aa insertion.

Discussion
Lateral gene transfer in eukaryotes is a rare phenomenon, likely because the eukaryotic germline is segregated from the other tissues. This makes the numerous interdomain LGT events found between Wolbachia and its eukaryotic hosts intriguing [4,[7][8][9][10][11][12][13][14][15]]. An advantage Wolbachia has in donating DNA is that it is found in the reproductive tissues and embryos of its hosts. This means that it is ideally positioned for creating heritable LGT in its eukaryotic hosts. The sizes of known nuwts range from a few hundred bp to the entire Wolbachia genome [4,[7][8][9][10][11][12][13][14][15]. In this study, we undertook deep sequencing of B. malayi nematode worms and compiled a more complete list of nuwts in B. malayi. Such detailed cataloguing of the B. malayi nuwts enabled the study of their potential functionality as well as their frequency.
No particular COG class could be found that was overrepresented in the nuwts. However, genes without a function were under-represented. This former result may suggest that there is no preference for the genes that get transferred and that the entire Wolbachia chromosome is  LGT in Wolbachia-nematode systems is RNA-mediated. Previously, proteomics studies have established that ≥99% of the genes with a function are expressed in the closely related bacteria, Ehrlichia chaffeensis and Anaplasma phagocytophilum, while only~80% of hypothetical proteins are expressed [27]. If the same is true in Wolbachia, this bias in genes with and without a function may reflect that LGT occurs through transcripts, and is RNA-mediated, possibly through retrotransposition. This is also consistent with the size of the transfers observed that are similar to the size and composition of bacterial transcripts from operons. This is in contrast to Wolbachia-insect LGT, where large chromosomal fragments are frequently found that must be DNA-mediated. Recently, evidence has been presented to demonstrate LGT from bacteria to the human somatic genome, possibly through an RNA-intermediate [28]. This observation in humans correlates well with what is known about the recognition of RNA molecules by the human immune system [28]. If such preference for an RNA-intermediate in nematodes and a DNA-intermediate in insects exists, it would be interesting if it relates to fundamental differences in the nematode and insect immune systems.

Potential functionality of nuwts
If nuwts are simply decaying after their integration into the eukaryotic genome, then they will not be functionally    Figure 7 Transcriptional activity of nuwts. (A) Transcription levels of full-length protein-coding nuwts are illustrated with a heatmap of RPKMs on a blue-white-red gradient using publicly available data [22]. Blue represents no expression and red represents the most expression with six stages examined: EE, eggs and embryos; IM, immature microfilariae; MM, mature microfilariae; L3, 3rd stage larvae; L4, 4th stage larvae; AM, adult males; AF, adult females. Wbm0693 (hypothetical protein) is highly transcribed in L3 stage infective larvae, and Wbm0081 and Wbm0783 are highly transcribed in eggs/embryos. (B) Four genes were shown to be constitutively expressed and were subsequently used in experiments to validate the results with Wbm0693: green, Bm1_03960, ribosomal protein L16; purple, Bm1_03910, 40S ribosomal protein S27; blue, actin; red, tpH-1; black, average. (C) Relative abundance of the nuwt originating from Wbm0693 is illustrated relative to the average of the four control genes from panel B (black line). The amplified region of the nuwt contains two SNPs that differentiate the nuwt copy of the gene from the Wolbachia copy allowing the percentage of the clones containing high-quality, nuwt-specific SNPs (blue bars) and the percentage of positions containing high-quality, nuwt-specific SNPs (red bars) to be calculated. The vast majority of the transcripts in the L3, L4, male, and female stages originate from the nuwt copy. (D) The paired end reads generated in this study (gray) and the publicly available RNASeq data (EE, green; IM, light blue, MM, red; L3, magenta; L4, orange; AM, purple; AF, pink) were mapped to the region in the B. malayi genome containing Wbm0693. The region amplified in the qRT-PCR experiments presented in Panels B and C is also illustrated. (E) Genomic features are illustrated for the same region as in Panel D. significant. We established transcription for several of the nuwts examined, however transcription does not necessarily imply function [5] and it appears that low-level transcription is common among nuwts [4,[11][12][13]. Using publicly available RNA-Seq data [22], it was found that at least three of the full-length nuwts are transcribed in a life stage-specific manner and at levels that could be biologically meaningful. Life stage-specific transcription, as opposed to constitutive transcription, can be an additional indicator of potential functionality [5].
Analyses like gene silencing are needed to conclusively establish if the nuwts are functional. There are several examples of functional nuwts. In the first case, genes of ancestry that may include Wolbachia are found in the genome of the pea aphid Acyrthosiphon pisum, which is a Wolbachia-free insect [15]. Some of these genes are related to murein metabolism, have acquired spliceosomal introns, and have tissue-specific transcription.
The second case of a functional putative nuwt is that of salivary gland specific (SGS) genes of the mosquitoes Aedes aegypti and Anopheles gambiae, which are associated with Plasmodium invasion of the salivary glands of female mosquitoes [8,10,29]. SGS genes do not have similarity with any other eukaryotic genes in the database, and the only related database sequences with homology are from Wolbachia endosymbionts [8]. Nuwts in these two systems feature traits that are characteristic of functional nuwts [5,30]: (a) longevity after the LGT event, (b) integration into host genome (for A. pisum nuwts) and (c) an associated phenotype (for Ae. aegypti nuwts).

Multiple-copy nuwts
Copy number variation has been suggested to be of great evolutionary importance. More specifically, gene copy number facilitates evolution of new variants of a gene and can also affect transcription levels [31]. In this respect, it is interesting that a considerable number of B. malayi nuwts appear to have multiple copies. These copies could result from: (a) repeated transfers of the same genome fragment, (b) duplication of nuwts following the initial LGT event, or (c) some combination of the two. Unfortunately, we were not able to reliably deduce the sequence of each copy, which would provide better insight on the underlying mechanism of this copy number variation. It is worth mentioning, however, that in another case of LGT, an increase in copy number of the transferred genes was detected [32]. These extra copies were interpreted as being part of the adaptation process of the host organism to the newly acquired genes. Hence, studying the mechanism by which the multiple-copy B. malayi nuwts arose would further elucidate their evolutionary significance and may become possible when new sequencing technologies become available.

Potential drug targets
Treatment of lymphatic filariasis has recently included drugs targeting Wolbachia rather than the nematode itself [3]. However, there is still the need to develop antifilarial drugs that will offer alternative treatment routes. Certain nuwts found in the framework of this study contained fulllength wBm genes and, thus, could represent potentially functional transfers. More specifically, seven of the genes are interesting because of their putative functions. These genes include Wbm0078 (phosphopantetheinyl transferase), Wbm0079 (prolipoprotein diacylglyceryl transferase), Wbm 0080 (SsrA-binding protein), Wbm0147 (thiol-disulfide isomerase), Wbm0148 (thymidilate synthase), Wbm0240 (HIT family hydrolase), and Wbm0275 (glutamine synthetase). Intriguingly, the lipoprotein biosynthesis pathway, in which Wbm0079 is involved, has been previously shown to be a valid drug target [33]. In addition, genes Wbm0081, Wbm0693 and Wbm0783 are of special interest, because transcripts for all three have been detected with differential expression in eggs and larvae (Figure 7). Further functional studies using gene silencing are underway to determine if nuwts can be validated as potential drug targets and to further unravel the complexity of Wolbachia-filarial symbiosis.

Conclusions
Our results suggest that >4.5% of the Wolbachia wBm genome has been transferred to the genome of its nematode host, B. malayi. A considerable number of Wolbachia genome fragments are present in multiple copies in B. malayi. At least 21 full-length genes have been laterally transferred. Analysis of existing transcriptomics data suggests that three of the nuwts are highly transcribed in specific life stages. Taken together, these data suggest that some of the nuwts identified could be functional and may be exploited as potential targets for drug discovery.

Methods
Generation of Wolbachia-depleted Brugia malayi B. malayi worms were obtained as described previously [34]. Briefly, adult B. malayi were maintained in the peritoneal cavities of jirds (Meriones unguiculatus). Worms were depleted of Wolbachia in vivo by treating infected jirds with 2.5 mg/mL tetracycline hydrochloride (Sigma) in drinking water for a period of six weeks. Adult B. malayi were recovered by dissection two weeks following the end of treatment (eight weeks post-treatment) and maintained until processing in RPMI-1640 supplemented with 2 mM L-glutamine, 25 mM HEPES, 100 U/mL penicillin, 100 μg/mL streptomycin and 2.5 μg/mL amphotericin B. Worms were then separated by sex, rinsed in PBS, and added individually to RNAlater solution (Ambion, Austin, TX, USA) for storage at 4°C prior to DNA preparation.

Preparation of DNA and assessment of Wolbachia-depletion
Genomic DNA was isolated from individual tetracyclinetreated B. malayi adult worms using the QIAamp DNA Microkit (Qiagen, Valencia, CA, USA) with overnight lysis and elution in 50 μL of buffer AE. DNA quality was assessed by agarose gel electrophoresis, and quantification was conducted using the Quant-iT PicoGreen dsDNA kit (Invitrogen, Grand Island, NY, USA). Although the tetracycline treatment regimen can yield a 99% reduction in Wolbachia over the population [34], the degree of reduction varies between individual worms. Therefore, quantitative PCR targeting the single-copy genes wsp of Wolbachia and gst of B. malayi [35] was conducted to determine those individual worms with the lowest wsp:gst ratios. DNA from individuals with wsp:gst ratios less than 1:10 were pooled according to sex and used for sequencing.
Sequencing of Wolbachia-depleted genomic DNA from B. malayi Both a 300-bp paired-end and an~3-kbp mate-pair library were constructed for sequencing on the Illumina platform. The 300-bp paired-end library was constructed using the NEBNext® DNA Sample Prep Master Mix Set 1 (New England Biolabs, Ipswich, MA), while the matepair library followed the Illumina Mate Pair Library v2 Sample Preparation Guide protocol. In both cases, DNA was fragmented with the Covaris E210 and libraries were prepared using a modified version of the manufacturer's protocol. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XT beads (Beckman Coulter Genomics, Danvers, MA). The PCR amplification step was performed with primers containing 6-bp index sequences. Since short reads are required for mate pair libraries, the mate pair library was sequenced on an Illumina Genome Analyzer IIx while the paired end library was sequenced on an Illumina HiSeq2000. Base calling and quality scoring was performed using Illumina software followed by in-house quality assessment and control pipelines to truncate and eliminate poor-quality reads. All of the sequencing data is available in SRA051817.

BWA analysis
Both the mate pair (24,864,076 × 2) and paired end (69,289,764 × 2) reads were used for finding evolutionarily recent LGTs from Wolbachia wBm to B. malayi. Mate pair reads were first reverse-complemented using the FASTXtoolkit [36]. Then both mate pair and paired end datasets were mapped on Wolbachia wBm [GenBank:NC_006833] as well as on Brugia malayi [GenBank:AAQA00000000] genomes using BWA 0.5.9-r16 [19] with the default parameters. The resulting SAM files were then processed using SAMtools [37] for discarding non-mapping reads and sorting. MarkDuplicates from Picard-tools [38] was used for removing read duplicates. The output BAM files were used to generate a file of coverage per base in VCF format [37] with Wolbachia wBm or B. malayi as the reference genome. Subsequently, we extracted read coverage at each genomic position and calculated, using R [39], the mean coverage of our reads on the genomes of wBm or B. malayi (Table 1, Table 2).
To select an appropriate coverage cutoff for distinguishing between nuwts and Wolbachia sequences, coverage at each wBm position was extracted using the sorted BAM file as input to BEDtools [40]. Subsequent examination of the coverage distribution showed that there were a large number of positions with coverage below 3× for mate pair reads ( Figure 2A) and below 16× for paired end reads ( Figure 2B). Such low coverage positions originated from the Wolbachia endosymbiont. In contrast, positions that were covered highly did so because they were actually located in the chromosomes of B. malayi where coverage is high. Hence, 3× and 16× were selected as coverage cut-offs for detecting nuwts, for mate pair and paired end reads, respectively.
In order to determine the wBm regions that were transferred to B. malayi, a sliding window approach was followed, using 10-bp windows with a 5-bp step. Only high coverage windows were extracted using the 3× and 16× cutoffs (as explained above) as a measure of nuwts. Based on paired end data, some nuwts were found to be present in multiple copies in B. malayi. The copy number, per haploid B. malayi genome, was estimated by dividing the mean coverage of the respective nuwt by the mean coverage of B. malayi (i.e. 131 for paired end reads). Lastly, recent nuwts were plotted on the circular chromosome of wBm using Circos [41].

Establishing criteria for defining nuwts
The statistical tests that can be used to distinguish nuwts from the bacterial genome are limited because the data is neither unimodal nor normal, as assessed by the differences between the mean, mode, and median for the data (Tables 1  and 2). Theoretically, statistics used to predict the modes (peaks) and antimodes (troughs) in a histogram of coverage ( Figure 2) may prove suitable, if all nuwts were single copy. However, the coverage measurements suggest many multicopy nuwts and as such a multi-modal method is required.
Mode-based analyses quickly become cumbersome and difficult to derive for multi-modal distributions. A multimodal distribution would be expected if numerous nuwts have different copy numbers. The major mode will reflect the coverage across the bacterial genome while there will be numerous minor modes. Theoretically, one of these minor modes will reflect nuwts at a single copy in the B. malayi genome and should be similar to the distribution of coverage found for single copy B. malayi genes. In addition, other minor modes occur that represent the various copy numbers represented by the nuwts.
Assuming one could derive the statistics for the number of modes observed, the data has several other problems as it relates to an analysis of modes. The major mode is of a significantly higher magnitude than the minor modes while the minor modes have a higher standard deviation, assuming they have the same standard deviation as the mappings to the Brugia genome. This results in a histogram where the minor modes appear as a shoulder to the major mode and to one another.
Given these observations about the data, an appropriate statistical test to dissect the two coverage distributions could not be identified. Therefore, the antimode was established empirically by visually inspecting the coverage and the histogram of the coverage across both genomes (Figures 2 and 3). With this examination one can arrive at a suitable value for the antimode between the major mode in the coverage distribution using the wBm genome (Figure 2, blue) and an estimate of the nearest minor mode using the position and breadth of the coverage distribution across the B. malayi genome (Figure 2, pink). The ideal position would maximize the number of observations made while minimizing the number of false observations. Since the coverage on the wBm genome drops precipitously at the same point that the coverage on the Brugia genome is slowly increasing, the ideal cut-off was determined to be immediately to the right of the precipitous decline. In addition, this cutoff is approximately one standard deviation below the mean for the Brugia coverage. Of note, the use of standard deviation is not ideal since the Brugia coverage is also not normal or unimodal ( Figure 2) likely owing to the fact that the reference genome is not complete. For instance, regions of no coverage are observed in the histogram that likely reflects gaps in the contigs.

Over-or under-representation of COG classes in nuwts
To examine if certain functions were preferentially involved in LGT to the host, all wBm genes were assigned COG (Cluster of Orthologous Group) classes [42]. The frequency of each COG class was counted, as well as the frequency of genes having no COG classes. The same frequencies were counted only for the subset of genes appearing either in BWA-or BLASTN-detected fulllength nuwts. Significant differences in COG frequencies were identified using Pearson's Chi-squared test with Yate's continuity correction and a p-value threshold of p < 0.01.

Nuwt location in the B. malayi genome
Detection of the nuwt insertion sites was based on BWA mappings because BWA keeps track of mate pairs. Only paired end reads were used because they were derived from a 300-bp library and, hence, defined more precisely the junction boundaries.
As a first step, reads not mapped in the Wolbachia wBm genome were extracted if their mates were mapped. Next, these reads were mapped against the B. malayi scaffolds [GenBank: DS236884-DS264093] using BWA with the default parameters. Finally, using the GFF file for B. malayi scaffolds, we categorized mapped reads based on their location in the genome. More specifically, reads were placed in one of the following categories; (a) coding sequences, (b) introns/UTRs, (c) intergenic space, and (d) contig ends. Only one category was assigned to each read.
A statistical analysis was undertaken to compare the distribution in each category to an even distribution across the genome. We counted the number of base pairs found in each of the four categories for the entire genome. We subsequently dealt with each category separately, comparing the actual number of junctions found in that category to the expected number of junctions if it were a purely random process, using Pearson's Chi-squared test with Yate's continuity correction with a p-value threshold of p < 0.01.

Experimental verification of nuwt copy number
A nuwt copy number estimate was determined by dividing coverage of each nuwt by 131, or the single copy coverage of the haploid B. malayi genome. To check the accuracy of the coverage-based estimates, 10 nuwts were selected and their copy numbers were experimentally determined using qPCR. An additional 12 genes including 6 for wBm and another 6 genes for B. malayi were selected as single-copy control genes (Additional file 6). Primers for amplifying a 100-150 bp product were designed using Primer3 [43] and synthesized by Sigma-Aldrich (St. Louis, MO, USA).
B. malayi genomic DNA (70 ng) was used as template in a qPCR reaction containing 2X QuantiTect SYBR Green (Qiagen), RNase-free water, and coverage primers using the standard Qiagen SYBR Green PCR protocol (Qiagen, Germantown, MD, USA). The assays were conducted using an ABI 7900 instrument (Applied Biosystems, Foster City, CA, USA). The reactions were denatured at 95°C for 15 min followed by amplification with 45 cycles of 94°C for 15 s, 55°C for 30 s and 72°C for 30 s. Reactions were followed by a melt curve analysis that starts at 55°C, with a dissociation step at 95°C for 1 min plus 0.5°C/cycle for 80 cycles. Copy number was determined by relating the average abundance of the 12 single-copy amplicons to the average abundance of the 10 nuwts as 2^( ΔCt (single copy -nuwt)). Three technical replicates were performed.