Generation of Wolbachia-depleted Brugia malayi
B. malayi worms were obtained as described previously
. Briefly, adult B. malayi were maintained in the peritoneal cavities of jirds (Meriones unguiculatus). Worms were depleted of Wolbachia in vivo by treating infected jirds with 2.5 mg/mL tetracycline hydrochloride (Sigma) in drinking water for a period of six weeks. Adult B. malayi were recovered by dissection two weeks following the end of treatment (eight weeks post-treatment) and maintained until processing in RPMI-1640 supplemented with 2 mM L-glutamine, 25 mM HEPES, 100 U/mL penicillin, 100 μg/mL streptomycin and 2.5 μg/mL amphotericin B. Worms were then separated by sex, rinsed in PBS, and added individually to RNAlater solution (Ambion, Austin, TX, USA) for storage at 4 °C prior to DNA preparation.
Preparation of DNA and assessment of Wolbachia-depletion
Genomic DNA was isolated from individual tetracycline-treated B. malayi adult worms using the QIAamp DNA Microkit (Qiagen, Valencia, CA, USA) with overnight lysis and elution in 50 μL of buffer AE. DNA quality was assessed by agarose gel electrophoresis, and quantification was conducted using the Quant-iT PicoGreen dsDNA kit (Invitrogen, Grand Island, NY, USA). Although the tetracycline treatment regimen can yield a 99% reduction in Wolbachia over the population
, the degree of reduction varies between individual worms. Therefore, quantitative PCR targeting the single-copy genes wsp of Wolbachia and gst of B. malayi was conducted to determine those individual worms with the lowest wsp:gst ratios. DNA from individuals with wsp:gst ratios less than 1:10 were pooled according to sex and used for sequencing.
Sequencing of Wolbachia-depleted genomic DNA from B. malayi
Both a 300-bp paired-end and an ~3-kbp mate-pair library were constructed for sequencing on the Illumina platform. The 300-bp paired-end library was constructed using the NEBNext® DNA Sample Prep Master Mix Set 1 (New England Biolabs, Ipswich, MA), while the mate-pair library followed the Illumina Mate Pair Library v2 Sample Preparation Guide protocol. In both cases, DNA was fragmented with the Covaris E210 and libraries were prepared using a modified version of the manufacturer’s protocol. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XT beads (Beckman Coulter Genomics, Danvers, MA). The PCR amplification step was performed with primers containing 6-bp index sequences. Since short reads are required for mate pair libraries, the mate pair library was sequenced on an Illumina Genome Analyzer IIx while the paired end library was sequenced on an Illumina HiSeq2000. Base calling and quality scoring was performed using Illumina software followed by in-house quality assessment and control pipelines to truncate and eliminate poor-quality reads. All of the sequencing data is available in SRA051817.
Both the mate pair (24,864,076 × 2) and paired end (69,289,764 × 2) reads were used for finding evolutionarily recent LGTs from Wolbachia wBm to B. malayi. Mate pair reads were first reverse-complemented using the FASTX-toolkit
. Then both mate pair and paired end datasets were mapped on Wolbachia wBm [GenBank:NC_006833] as well as on Brugia malayi [GenBank:AAQA00000000] genomes using BWA 0.5.9-r16
 with the default parameters. The resulting SAM files were then processed using SAMtools
 for discarding non-mapping reads and sorting. MarkDuplicates from Picard-tools
 was used for removing read duplicates. The output BAM files were used to generate a file of coverage per base in VCF format
 with Wolbachia wBm or B. malayi as the reference genome. Subsequently, we extracted read coverage at each genomic position and calculated, using R
, the mean coverage of our reads on the genomes of wBm or B. malayi (Table
To select an appropriate coverage cutoff for distinguishing between nuwts and Wolbachia sequences, coverage at each wBm position was extracted using the sorted BAM file as input to BEDtools
. Subsequent examination of the coverage distribution showed that there were a large number of positions with coverage below 3× for mate pair reads (Figure
2A) and below 16× for paired end reads (Figure
2B). Such low coverage positions originated from the Wolbachia endosymbiont. In contrast, positions that were covered highly did so because they were actually located in the chromosomes of B. malayi where coverage is high. Hence, 3× and 16× were selected as coverage cut-offs for detecting nuwts, for mate pair and paired end reads, respectively.
In order to determine the wBm regions that were transferred to B. malayi, a sliding window approach was followed, using 10-bp windows with a 5-bp step. Only high coverage windows were extracted using the 3× and 16× cutoffs (as explained above) as a measure of nuwts. Based on paired end data, some nuwts were found to be present in multiple copies in B. malayi. The copy number, per haploid B. malayi genome, was estimated by dividing the mean coverage of the respective nuwt by the mean coverage of B. malayi (i.e. 131 for paired end reads). Lastly, recent nuwts were plotted on the circular chromosome of wBm using Circos
Establishing criteria for defining nuwts
The statistical tests that can be used to distinguish nuwts from the bacterial genome are limited because the data is neither unimodal nor normal, as assessed by the differences between the mean, mode, and median for the data (Tables
2). Theoretically, statistics used to predict the modes (peaks) and antimodes (troughs) in a histogram of coverage (Figure
2) may prove suitable, if all nuwts were single copy. However, the coverage measurements suggest many multi-copy nuwts and as such a multi-modal method is required.
Mode-based analyses quickly become cumbersome and difficult to derive for multi-modal distributions. A multi-modal distribution would be expected if numerous nuwts have different copy numbers. The major mode will reflect the coverage across the bacterial genome while there will be numerous minor modes. Theoretically, one of these minor modes will reflect nuwts at a single copy in the B. malayi genome and should be similar to the distribution of coverage found for single copy B. malayi genes. In addition, other minor modes occur that represent the various copy numbers represented by the nuwts.
Assuming one could derive the statistics for the number of modes observed, the data has several other problems as it relates to an analysis of modes. The major mode is of a significantly higher magnitude than the minor modes while the minor modes have a higher standard deviation, assuming they have the same standard deviation as the mappings to the Brugia genome. This results in a histogram where the minor modes appear as a shoulder to the major mode and to one another.
Given these observations about the data, an appropriate statistical test to dissect the two coverage distributions could not be identified. Therefore, the antimode was established empirically by visually inspecting the coverage and the histogram of the coverage across both genomes (Figures
3). With this examination one can arrive at a suitable value for the antimode between the major mode in the coverage distribution using the wBm genome (Figure
2, blue) and an estimate of the nearest minor mode using the position and breadth of the coverage distribution across the B. malayi genome (Figure
2, pink). The ideal position would maximize the number of observations made while minimizing the number of false observations. Since the coverage on the wBm genome drops precipitously at the same point that the coverage on the Brugia genome is slowly increasing, the ideal cut-off was determined to be immediately to the right of the precipitous decline. In addition, this cutoff is approximately one standard deviation below the mean for the Brugia coverage. Of note, the use of standard deviation is not ideal since the Brugia coverage is also not normal or unimodal (Figure
2) likely owing to the fact that the reference genome is not complete. For instance, regions of no coverage are observed in the histogram that likely reflects gaps in the contigs.
Over- or under-representation of COG classes in nuwts
To examine if certain functions were preferentially involved in LGT to the host, all wBm genes were assigned COG (Cluster of Orthologous Group) classes
. The frequency of each COG class was counted, as well as the frequency of genes having no COG classes. The same frequencies were counted only for the subset of genes appearing either in BWA- or BLASTN-detected full-length nuwts. Significant differences in COG frequencies were identified using Pearson's Chi-squared test with Yate's continuity correction and a p-value threshold of p < 0.01.
Nuwt location in the B. malayi genome
Detection of the nuwt insertion sites was based on BWA mappings because BWA keeps track of mate pairs. Only paired end reads were used because they were derived from a 300-bp library and, hence, defined more precisely the junction boundaries.
As a first step, reads not mapped in the Wolbachia wBm genome were extracted if their mates were mapped. Next, these reads were mapped against the B. malayi scaffolds [GenBank: DS236884-DS264093] using BWA with the default parameters. Finally, using the GFF file for B. malayi scaffolds, we categorized mapped reads based on their location in the genome. More specifically, reads were placed in one of the following categories; (a) coding sequences, (b) introns/UTRs, (c) intergenic space, and (d) contig ends. Only one category was assigned to each read.
A statistical analysis was undertaken to compare the distribution in each category to an even distribution across the genome. We counted the number of base pairs found in each of the four categories for the entire genome. We subsequently dealt with each category separately, comparing the actual number of junctions found in that category to the expected number of junctions if it were a purely random process, using Pearson's Chi-squared test with Yate's continuity correction with a p-value threshold of p < 0.01.
Experimental verification of nuwt copy number
A nuwt copy number estimate was determined by dividing coverage of each nuwt by 131, or the single copy coverage of the haploid B. malayi genome. To check the accuracy of the coverage-based estimates, 10 nuwts were selected and their copy numbers were experimentally determined using qPCR. An additional 12 genes including 6 for wBm and another 6 genes for B. malayi were selected as single-copy control genes (Additional file
6). Primers for amplifying a 100–150 bp product were designed using Primer3
 and synthesized by Sigma-Aldrich (St. Louis, MO, USA).
B. malayi genomic DNA (70 ng) was used as template in a qPCR reaction containing 2X QuantiTect SYBR Green (Qiagen), RNase-free water, and coverage primers using the standard Qiagen SYBR Green PCR protocol (Qiagen, Germantown, MD, USA). The assays were conducted using an ABI 7900 instrument (Applied Biosystems, Foster City, CA, USA). The reactions were denatured at 95 °C for 15 min followed by amplification with 45 cycles of 94 °C for 15 s, 55 °C for 30 s and 72 °C for 30 s. Reactions were followed by a melt curve analysis that starts at 55 °C, with a dissociation step at 95 °C for 1 min plus 0.5 °C/cycle for 80 cycles. Copy number was determined by relating the average abundance of the 12 single-copy amplicons to the average abundance of the 10 nuwts as 2^( ΔCt (single copy - nuwt)). Three technical replicates were performed.
Detecting fixed polymorphisms
There were 106 of the 125 BWA-detected nuwts that had single nucleotide polymorphisms (SNPs), compared to the reference wBm genome. A subset of these nuwts had fixed SNPs meaning the difference compared to the reference was the same in all reads. Fixed polymorphisms were examined with SAMtools mpileup
 when supported by at least 20 reads. Positions containing gaps were excluded. From the remaining nuwt positions we extracted those in which the number of reads differing from the reference was greater than 80% of the total number of reads. The remaining 20% were assumed to arise from the bacterial sequences.
Only the Illumina paired end read set (138,579,528 99-bp reads in total) was mapped on Wolbachia wBm using BLASTN
 with an e-value cutoff of 10-5. All reads were treated as single end reads. Hits shorter than 45 bp were filtered out and only the best e-value hit of each query was kept. Using an in-house Perl script, the BLAST output was converted to a SAM file that was sorted with SAMtools
. Coverage was then calculated using the VCF output of SAMtools mpileup
A coverage cutoff of 16× for detecting older nuwts was selected using a method identical to the one used in the BWA analysis and plotted on the circular chromosome of wBm using Circos.
Transcription of nuwts
Transcription of nuwts was examined using RNA-Seq data for B. malayi found at Array Express (http://www.ebi.ac.uk/arrayexpress/) under accession number E-MTAB-811
. The 13 different fastq files available correspond to 7 different life cycle stages of the worm. All files were searched against the Wolbachia sp. wBm genome using BWA with the default parameters. The resulting SAM file was processed in the same manner as the BWA analysis of DNA sequences in order to produce a VCF file using SAMtools mpileup
. Using custom Perl scripts, wBm genomic positions were tagged as belonging either to nuwts or non-nuwts and the coverage was also extracted. Significance was assessed using the R statistical package (v. 2.14.1) for transcript coverage of nuwts relative to non-nuwts.
Transcription levels for full-length nuwts were calculated as RPKM values
. Briefly, all reads found inside each such nuwt were counted and subsequently normalized by the gene length and also by the number of reads mapping against the scaffolds of the B. malayi genome. Subsequently, heatmaps were drawn using the R package (v 2.14.1), to illustrate highly transcribed genes.
Quantitative RT-PCR (qRT-PCR) was used to validate the publicly available RNA-Seq data using RNA provided by the NIAID FR3 Resource Center. Reverse transcription was carried out using the QuantiTect Reverse Transcription Kit (Qiagen, Valencia, CA, USA) in accordance with the manufacturer’s instructions. Briefly, ~100 ng of total RNA was incubated at 42 °C for 2 min in genomic DNA wipeout buffer and RNase free water. The cDNA was synthesized from the RNA using Quantiscript reverse transcriptase, Quantiscript RT buffer and a primer mix consisting of long random primers and oligo-dT. The reaction was incubated at 42 °C for 15 min and then at 95 °C for 3 min to inactivate the Quantiscript RT. Real-time detection was carried out on 9.1 ng of the resulting cDNA in a reaction containing QuantiTect SYBR green mix (Qiagen, Valencia, CA, USA), RNase free water, and gene-specific primers (Additional file
7) on a ABI7900HT machine (Applied Biosystems, Beverly, MA, USA). The reactions were denatured at 95 °C for 15 min followed by amplification with 45 cycles of 94 °C for 15 s, 55 °C for 30 s and 72 °C for 30 s. The qRT-PCR data was analyzed using a comparative cycle threshold (∆Ct) method. The Ct value of the Wolbachia and nuwt reactions were compared to that of four constitutively expressed B. malayi controls. To determine the relative contributions of the nuwt RNA and the bacterial RNA, reactions without SYBRGreen were set-up in parallel that were cloned using the TOPO-TA cloning kit for sequencing (Invitrogen, Carlsbad, CA, USA) and transformed into TOP10 cells (Invitrogen, USA). Colonies were picked, grown in LB with kanamycin, and plasmids sequenced at the University of Maryland School of Medicine Genome Resource Center (Baltimore, MD, USA) with the M13F and M13R primers.