We are evaluating the capability of HTS genomic sequencing on the Illumina MiSeq platform, combined with different computational tools, to detect and characterize infectious viruses in human host cells. In this study, we used the Illumina MiSeq platform to sequence 2 human B lymphocyte cell lines that had undergone spontaneous immortalization promoted by mycoplasma infection of the cell culture. We used the HTS data analysis program CLC Genomics Workbench to classify the massive number of short sequence reads (49.5 and 45.3 million reads, ~175 bp each) generated from genomic sequencing of the 2 cell lines by mapping these reads against the human, bacterial, fungal, and viral genomic databases from NCBI (Figure 1A-D). The majority of reads (97.2%, or 8.1 Gb, for the K4413-Mi cell line and 98.6%, or 8.8 Gb, for the K4123-Mi cell line) were classified as human sequences, corresponding to approximately 1.22-fold and 1.34-fold coverage, respectively, of a diploid human genome (6.6 Gb).
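The fold-coverage values above are the number of bases classified as human divided by the size of the diploid human genome. A minimal Python sketch of that arithmetic, assuming the rounded inputs reported above (8.1 and 8.8 Gb of human sequence, 6.6 Gb diploid genome); `fold_coverage` is a hypothetical helper name, not part of CLC Genomics Workbench:

```python
def fold_coverage(mapped_bases: float, genome_size: float) -> float:
    """Average depth: total mapped bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return mapped_bases / genome_size


DIPLOID_HUMAN_GENOME = 6.6e9  # bases, as used in the text

# Rounded amounts of sequence classified as human (from the text).
human_bases = {"K4413-Mi": 8.1e9, "K4123-Mi": 8.8e9}

for cell_line, bases in human_bases.items():
    cov = fold_coverage(bases, DIPLOID_HUMAN_GENOME)
    print(f"{cell_line}: {cov:.2f}-fold coverage of the diploid genome")
```

With these rounded inputs the sketch prints about 1.23 and 1.33, which agree with the reported 1.22- and 1.34-fold coverages within the rounding of the Gb values.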
Consensus EBV genomes were constructed in CLC Genomics Workbench by mapping the raw reads to the WT-EBV reference genome. The K4413-Mi EBV consensus genome was assembled from 37,757 EBV-related reads (0.075% of all raw sequencing reads); it is 171,843 bp long, with ~35.22-fold coverage of the EBV genome. Dividing the EBV coverage by the coverage of the diploid human genome, we estimate that there are ~29 copies (35.22 vs. 1.22-fold coverage) of the EBV genome in a single cell of the K4413-Mi lymphocyte cell line. Similarly, the K4123-Mi EBV genome was assembled from 28,178 EBV-related reads; it is 171,793 bp long, with ~31.06-fold coverage of the EBV genome, giving an estimated ~23 copies (31.06 vs. 1.34-fold coverage) of the EBV genome in a single cell of the K4123-Mi lymphocyte cell line. The EBV genome copy numbers found in these resting human B lymphocytes that underwent spontaneous immortalization are apparently higher than those previously reported in undifferentiated NPC tumours: for GD2, ~6 copies of the EBV genome were found in a single NPC tumour cell. In comparison, the NA12878 EBV genome constructed from NCBI data revealed an average of 102 copies of the EBV genome in each human lymphocyte immortalized by infection with B95-8 EBV in culture. It therefore appears that each human B lymphocyte transformed by acute B95-8 EBV infection carries many more EBV genome copies than each EBV-positive B cell spontaneously transformed with the help of mycoplasma infection. The EBV genome copy number in an NPC tumour cell is evidently the lowest.
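The copy-number estimates above rest on the ratio of viral to host fold coverage: if each host genome copy in a cell is sampled about 1.22 times on average, N EBV genomes in the same cell should yield about 1.22 × N-fold EBV coverage. A minimal sketch, assuming viral and host DNA are sequenced with equal efficiency and each cell carries one diploid human genome; `copies_per_cell` is a hypothetical helper:

```python
def copies_per_cell(viral_coverage: float, host_coverage: float) -> float:
    """Viral genome copies per cell, estimated as viral / host fold coverage.

    Assumes (hypothetically) equal sequencing efficiency for viral and host
    DNA and one diploid host genome per cell; host_coverage is the fold
    coverage of that diploid genome.
    """
    if host_coverage <= 0:
        raise ValueError("host_coverage must be positive")
    return viral_coverage / host_coverage


# (EBV fold coverage, diploid human fold coverage) reported in the text.
samples = {"K4413-Mi": (35.22, 1.22), "K4123-Mi": (31.06, 1.34)}

for cell_line, (ebv_cov, human_cov) in samples.items():
    n = copies_per_cell(ebv_cov, human_cov)
    print(f"{cell_line}: ~{n:.0f} EBV genome copies per cell")
```

Rounded to whole numbers, this reproduces the ~29 and ~23 copies per cell given in the text. Departures from the stated assumptions, such as aneuploidy of the host cell line or unequal recovery of viral and host DNA, would shift the estimate, so it is best read as a population average.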
Our analysis shows that some sequences, albeit very few, obtained from genomic sequencing of the K4413-Mi and K4123-Mi human lymphocyte cell lines mapped to bacteria, fungi, and non-EBV viruses (Figure 1A-D). Unbiased parallel HTS is capable of picking up trace amounts of DNA molecules present in the culture medium and serum. Most of the bacteria-related sequences mapped to Enterobacteria. When matched against the NCBI viral database in CLC Genomics Workbench, 281 reads from the K4413-Mi cell line were identified as Enterobacteria phage phiX174. It is possible that bacteria in the medium also carried Enterobacteria phage phiX174. Consistent with the findings of the original study, no mycoplasma sequences were identified in these cell lines. A further 593 reads from the K4413-Mi cell line and 548 reads from the K4123-Mi cell line were classified as sequences related to herpesviruses other than EBV (Figure 1B, D). We ran BlastN individually on these 1,141 reads: most of them also matched EBV, albeit with low homology, and some matched human sequences. In addition, 2.69% and 1.28% of reads returned “no hits” when mapped against the 4 genomic databases (human, bacteria, fungi, and virus) for the K4413-Mi and K4123-Mi cell lines, respectively. When these sequences were searched with BlastN against the NCBI non-redundant databases, most of them were found to be human sequences. CLC Genomics Workbench is a powerful tool for NGS data analysis: it allowed us to quickly determine the read composition of a massive dataset and to detect and assemble the target EBV genome effectively. Nevertheless, every software program has its advantages and drawbacks. For bioinformatics tools, the parameter settings and databases used affect the outcome of the analysis, and different tools are likely to produce different results.
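The text does not specify how the individual BlastN results were summarized. As one possible approach, the sketch below parses BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeps the highest-bitscore hit per read, and tallies reads by organism group; the subject-to-group table and the toy hits are hypothetical placeholders, not data from this study:

```python
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Keep the highest-bitscore hit for each query read."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        row["bitscore"] = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or row["bitscore"] > best[query]["bitscore"]:
            best[query] = row
    return best


def tally(best, subject_to_group, query_ids):
    """Count reads per organism group; reads without any hit are 'no hits'."""
    counts = Counter()
    for query in query_ids:
        if query in best:
            counts[subject_to_group.get(best[query]["sseqid"], "other")] += 1
        else:
            counts["no hits"] += 1
    return counts


# Toy example: the subject IDs and their group labels are hypothetical.
blast_out = [
    "read1\tsubjA\t82.7\t150\t26\t0\t1\t150\t100\t249\t1e-30\t120",
    "read1\tsubjB\t99.3\t150\t1\t0\t1\t150\t500\t649\t1e-70\t270",
    "read2\tsubjA\t85.0\t140\t21\t0\t1\t140\t10\t149\t1e-35\t130",
]
groups = {"subjA": "EBV", "subjB": "human"}
counts = tally(best_hits(blast_out), groups, ["read1", "read2", "read3"])
print(dict(counts))
```

In practice the subject-to-group table would be derived from the taxonomy of the database searched, and reads absent from the BLAST output are counted as “no hits”.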
We also analyzed the data with DNASTAR, the SOAP package, BWA, and other bioinformatics tools. The results differed slightly from those obtained with CLC Genomics Workbench (data not shown). In this study, we present only the analysis performed with the CLC program.
Whole-genome sequencing of EBV in the infected cells enabled us to determine and compare EBV variation at the genome level. The constructed K4413-Mi EBV and K4123-Mi EBV genomes are highly similar to each other and to the 8 other EBV genomes reported in GenBank, although there is appreciable variation among the EBV genomes studied. Phylogenetic comparison of these EBV genomes revealed that K4413-Mi EBV and K4123-Mi EBV are more closely related to B95-8, an EBV isolated from a patient with infectious mononucleosis, and evidently more distant from GD1, GD2, and HKNPC1, EBVs associated with NPC tumours. Specific comparison of two EBV genes, LMP1 and EBNA-1, which have been considered NPC risk loci and are commonly used for classification, also revealed that K4413-Mi EBV and K4123-Mi EBV are closer to B95-8 and more different from NPC-related EBVs (Figure 3B, C). Furthermore, the LMP1 of both K4413-Mi EBV and K4123-Mi EBV lacked the 30 bp deletion at the carboxyl terminus and the Gly-to-Asp substitution at codon 335 (relative to B95-8 LMP1), features reportedly present in over 90% of EBVs found in NPC biopsies. Moreover, K4413-Mi EBV and K4123-Mi EBV were evidently more distant from AG876, an EBV isolated from African Burkitt’s lymphoma (Figures 2, 3A). Comparison of sequences in the EBNA3 region, which has been used to classify EBV subtypes, similarly revealed that K4413-Mi EBV and K4123-Mi EBV are more closely related to B95-8, a subtype 1 EBV (Figure 2), and most distant from AG876, a subtype 2 EBV from a West African case of Burkitt’s lymphoma.
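Checks such as the LMP1 30 bp carboxyl-terminal deletion and the residue at codon 335 can be read directly off a protein alignment against B95-8 LMP1. A minimal sketch, assuming a pairwise alignment has already been produced (gaps written as `-`) and using toy sequences rather than real LMP1; both function names are hypothetical:

```python
def residue_at_ref_pos(ref_aln: str, qry_aln: str, ref_pos: int) -> str:
    """Return the query residue aligned to 1-based position ref_pos of the
    ungapped reference ('-' means the query has a gap there)."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            pos += 1
            if pos == ref_pos:
                return q
    raise IndexError("ref_pos lies beyond the end of the reference")


def query_deletions(ref_aln: str, qry_aln: str, min_len: int):
    """List (start, length) of query deletions spanning at least min_len
    reference residues, with start in 1-based reference coordinates."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    runs, pos, start, length = [], 0, None, 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            continue  # column gapped in both sequences
        if r != "-":
            pos += 1
        if r != "-" and q == "-":
            if start is None:
                start, length = pos, 0
            length += 1
        else:
            if start is not None and length >= min_len:
                runs.append((start, length))
            start = None
    if start is not None and length >= min_len:
        runs.append((start, length))
    return runs


# Toy alignment (not real LMP1): the query lacks reference residues 4-13.
ref = "ACDEFGHIKLMNPQRS"
qry = "ACD" + "-" * 10 + "QRS"
print(residue_at_ref_pos(ref, qry, 2), query_deletions(ref, qry, 10))
```

On a real alignment against B95-8 LMP1, one would call `residue_at_ref_pos(ref, qry, 335)` to read the codon-335 residue (Gly in B95-8) and `query_deletions(ref, qry, 10)` to look for the 30 bp (10-codon) deletion.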
Including the 2 most recently reported EBV genome sequences, Akata and Mutu, in our analysis of genomic sequence variation reveals a substantial geographic component in addition to disease or tissue association. K4413-Mi EBV and K4123-Mi EBV are clearly more closely related to the B95-8 strain and WT-EBV, and more different from GD1, GD2, and HKNPC1, EBVs associated with NPC in East Asia, and from the Akata strain, isolated from a Japanese case of Burkitt’s lymphoma. They are most different from the EBV of a West African case of Burkitt’s lymphoma. Interestingly, however, Mutu, an EBV strain from a Kenyan case of Burkitt’s lymphoma in East Africa, is more closely related to K4413-Mi EBV, K4123-Mi EBV, B95-8, and WT-EBV, which were found in North America (Figure 3A). More EBV genome sequences from different geographic regions of the world could in future provide important information on the history and routes of EBV dissemination, as well as on its evolution. In this study, the same geographic pattern of sequence variation can also be observed in the alignments of the translated amino acid sequences of the LMP1 and EBNA-1 genes from EBVs of different origins (Figures 3B, C).
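The text does not state which distance measure or tree-building method underlies Figure 3A. As a simple stand-in, the sketch below computes uncorrected p-distances (the proportion of differing sites, skipping gapped columns) between aligned sequences and reports each sequence's nearest neighbour; the sequence names and bases are toy placeholders, not EBV data:

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(aln: dict) -> dict:
    """Symmetric {(name1, name2): p-distance} for every pair of sequences."""
    d = {}
    for n1, n2 in combinations(aln, 2):
        d[(n1, n2)] = d[(n2, n1)] = p_distance(aln[n1], aln[n2])
    return d


def nearest(name: str, aln: dict, d: dict) -> str:
    """Name of the sequence with the smallest p-distance to `name`."""
    return min((n for n in aln if n != name), key=lambda n: d[(name, n)])


# Toy alignment with placeholder names (not real EBV data).
aln = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TCGAACTTAA"}
d = distance_matrix(aln)
for name in aln:
    print(name, "->", nearest(name, aln, d))
```

A genome-scale analysis would normally use a substitution-model-corrected distance or maximum-likelihood tree inference on the full alignment; the p-distance only illustrates ranking strains by sequence similarity.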
It may also be important to note that 10 heterogeneous SNPs were found in the K4413-Mi EBV genome and 1 heterogeneous SNP in the K4123-Mi EBV genome. This finding suggests that two or more EBV variants, or quasispecies, may be present within the K4413-Mi and K4123-Mi cells.
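The text does not describe how heterogeneous SNPs were called. One common idea is to flag positions where a second base is supported by an appreciable fraction of reads at adequate depth. A minimal sketch, assuming per-position base counts are already available; the input format and both thresholds are hypothetical, not the settings used in this study:

```python
def heterogeneous_sites(pileup: dict, min_depth: int = 20,
                        min_minor_freq: float = 0.1) -> list:
    """Positions where the second most common base reaches min_minor_freq.

    pileup maps position -> {base: read count} (hypothetical input format);
    positions with fewer than min_depth reads are skipped.
    """
    sites = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ranked = sorted(counts.values(), reverse=True)
        if len(ranked) > 1 and ranked[1] / depth >= min_minor_freq:
            sites.append(pos)
    return sites


# Toy base counts (not study data).
pileup = {
    100: {"A": 30, "G": 1},   # minor allele at ~3%: not flagged
    250: {"C": 18, "T": 14},  # minor allele at ~44%: flagged
    400: {"G": 5, "A": 4},    # depth 9, below min_depth: skipped
}
print(heterogeneous_sites(pileup))
```

With EBV coverage of only ~31- to 35-fold here, the choice of depth and frequency thresholds strongly affects how many sites are flagged, and sequencing errors must be ruled out before a site is taken as evidence of mixed variants.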