Samples
Clinical samples were obtained from vesicular fluid swabs and conserved in UTM viral transport medium (Copan, CA, USA). Ten samples that tested positive by qRT-PCR at the Microbiology Service of the A Coruña University Hospital (HUAC) were selected for this study. The remaining fractions of samples were stored at -20\(^{\circ }\)C.
DNA extraction
Viral DNA was extracted following two different protocols. In the first protocol (non-enrichment method), DNA extraction was performed using MagNA Pure Compact Nucleic Acid Isolation Kit I (Roche, Switzerland) following the manufacturer’s instructions and using 500 \(\upmu\)L of UTM viral transport medium as input. The second protocol (MPXV DNA enrichment) was designed to enrich samples in MPXV particles, modified from a saponin-based differential lysis method [18] followed by high g-force centrifugations in a Z 36 HK (Hermle Labortecnik, Germany) centrifuge. The addition of saponin followed by a high NaCl concentration selectively lyses human cells without affecting viral capsids. A subsequent DNase treatment of previously lysed eukaryotic cells remarkably decreases the host DNA present in the samples. Briefly, 400 \(\upmu\)L of samples were centrifuged at 10,000 g for 5 mins at 4\(^{\circ }\)C. Supernatant was transferred to a tube for high g-force (Labcon, CA, USA) and centrifuged at 35,000 g for 30 min at 4\(^{\circ }\)C. Pellet was resuspended in 250 \(\upmu\)L of PBS supplemented with saponin 2.5% and incubated at room temperature for 10 min. After the incubation, 350 \(\upmu\)L of water were added and incubated for 30 s, and 12 \(\upmu\)L of NaCl 5 M were also added. Samples were centrifuged at 35,000 g for 30 min at 4\(^{\circ }\)C, pellets were resuspended in 100 \(\upmu\)L of PBS and then 100 \(\upmu\)L of NaCl 1 M, MgCl2 100 mM and 10 \(\upmu\)L of HL-SAN DNase (ArticZymes, Norway Technologies) were added. Samples were incubated at 37\(^{\circ }\)C for 15 min with shaking at 600 rpm. Following the incubation, samples were washed twice with 800 \(\upmu\)L and 1 mL of PBS and centrifuged at 35,000 g for 30 min at 4\(^{\circ }\)C after each wash. Final pellet was resuspended in 100 \(\upmu\)L of nuclease-free water and nucleic acids were extracted using the QIAamp MinElute Virus Spin Kit (Qiagen, Germany) following the manufacturer’s instructions. DNA quantification was performed using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, MA, USA).
Library generation and sequencing
DNA prep paired-end libraries (Illumina, CA, USA) were prepared using 1-5 ng of DNA extracted following the manufacturer's protocol, except for the number of PCR cycles (15). Libraries concentration was quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, MA, USA). DNA quality and fragment size of the libraries were evaluated using the High Sensitivity D1000 Kit for TapeStation 4150 (Agilent, CA, USA). Libraries were sequenced using a MiSeq platform (Illumina, CA, USA) with paired-end sequencing, a read length of 150 nucleotides and utilizing a V2 micro cartridge (Illumina, CA, USA).
Bioinformatic analysis
Illumina reads were first processed using BBDuk (v. 38.96) [19] to remove PhiX contamination. Clumpify (v. 38.96) [19] was used to remove duplicates and to losslessly compress the files to minify space on disk. Finally, reads were trimmed with Trimmomatic (v. 0.39) [20] for adapter removal and quality control. Human contamination was removed using BMTagger [21]. Kraken2 (v. 2.1.2) [22] was used to classify reads using the full standard database (human, bacteria, plasmid, archaea, virus, fungi and UniVec_Core), extracting read count statistics and eliminating those that did not belong to the Orthopoxvirus genus. Visualization of these steps was facilitated by Pavian (v. 1.2.0) [23] and KrakenTools [22]. Other measures, such as read count at each quality control step were calculated with seqkit (v. 2.1.0) [24] Read quality was assessed before and after the entire cleaning process with FastQC (v. 0.11.9) [25] and MultiQC (v. 1.11) [26].
Reads that passed all the filters were aligned to the reference genome “MPXV_USA_2022_MA001” (ON563414.3) using BWA (v. 0.7.17-r1188) [27]. Duplicates were then marked with Picard (v. 2.27.4) [28] and alignment statistics (coverage, depth, aligned reads...) were calculated with BBMap’s pileup module (v. 38.96) [19]. A consensus sequence was then produced using iVar (v. 1.3.1) [29] (parameters: “-q 20 -t 0.5 -m 10”).
The alignment based consensus method was also compared to a de novo assembly approach, mainly due to the possible large indels that occur in MPXV. However, because of the large inverted terminal repeats (ITRs) in MPXV, de novo assemblies using a short read strategy can fail to represent both of these repetitive areas at the same time. In order to solve this, random subsets of N reads are made, which create different assemblies that when merged together can create a good scaffold, which is then polished using reads. This method can also utilize different assemblers and is inspired by Tricycler’s approach [30]. Possible drawbacks include: requiring high coverage and large repetitive insertions not being resolved with only short reads.
Various assemblies were created for each sample with Unicycler (v. 0.5.0) [31] (parameters: “–linear 1”) and SPAdes (v. 3.15.4) (parameters: “–trusted-contigs $ref -k 31,51,71”) [32] using subsets of reads chosen randomly by seqtk (v. 1.3-r106) [33]. The different assemblies were then aligned to the reference MPXV genome ON563414.3 with minimap2 (v. 2.24-r1122) [34], and a consensus was made using samtools (v. 1.15) [35]. Finally, the consensus was polished with Polypolish (v. 0.5.0) [36], the reads were aligned back to the polished genome and another consensus was made with iVar (v. 1.3.1).
Nextclade (v. 2.3.0) [37] was used to denote the lineages and to quickly visualize the quality of the sequences and their mutations. An SNV comparison was also created with snipit [38]. A more in depth phylogenomic analysis was made to see the relatedness of the study’s samples to all 275 complete MPXV genomes from taxid 10244 in GenBank up to 2022-07-18 (Table S2). Sequences were aligned with Mafft (v. 7.453) [39] (parameters: “–auto”) to create a FASTA alignment, which was then transformed into PHYLIP format and used as input to RAxML (v. 8.2.12) [40] (parameters: “-m GTRCAT -T 60 -n tree -p 1 -N 1000 -p 12345 -x 12345 -f a”). The resulting tree was visualized with the Interactive Tree of Life (iTOL, v.6.5.8) [41].