Fig. 1 | BMC Genomics

Fig. 1

From: Virus expression detection reveals RNA-sequencing contamination in TCGA


VirDetect workflow and performance. a & b VirDetect workflow diagram: a VirDetect alignment steps; b virus genome preparation steps. c Number of reads mapping to the viral genomes for both human (left) and low-complexity (right) simulated reads (100 simulated samples, each with 1,000,000 human reads and 1000 low-complexity reads). From left to right on the x-axis: (1) Unmasked, directly to the virus: all reads were mapped directly to the unmodified viral genomes, without filtering out human reads. (2) Unmasked: reads that did not align to the human genome were aligned to the unmodified viral genomes. (3) Low complexity masking only: reads that did not align to the human genome were aligned to viral genomes masked in regions of low complexity. (4) Human masking only: reads that did not align to the human genome were aligned to viral genomes masked in regions of human homology. (5) Masked, mapping directly to the virus: all reads were mapped directly to the masked viral genomes, without first filtering out reads that map to the human genome. (6) Masked: reads that did not align to the human genome were aligned to the masked viral genomes. d & e Viral simulated reads (100 simulated samples with 1000 reads each) with 0–10 mutations in the first read pair. d Sensitivity, measured as the percentage of reads that mapped to the viral genomes. e Positive predictive value (PPV), measured as the number of true positives (simulated viral reads that mapped to the correct viral genomes) divided by the sum of true positives and false positives.
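The two metrics in panels d and e can be illustrated with a short sketch. It assumes each simulated viral read is labeled with the genome it was drawn from and with the viral genome it aligned to (or `None` if it did not align). It also assumes that a false positive is a read aligned to a different viral genome than its source, because the caption does not define false positives explicitly. The function name `sensitivity_and_ppv` and the genome labels are hypothetical and are not part of VirDetect.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_and_ppv(true_genomes: Sequence[str],
                        mapped_genomes: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Return (sensitivity in percent, PPV) for simulated viral reads.

    Hypothetical helper, not VirDetect code.
    true_genomes[i]   -- viral genome that read i was simulated from
    mapped_genomes[i] -- viral genome that read i aligned to, or None if unmapped
    """
    if len(true_genomes) != len(mapped_genomes):
        raise ValueError("inputs must have the same length")

    total = len(true_genomes)
    # Reads that aligned to any viral genome (numerator of sensitivity).
    mapped = sum(1 for m in mapped_genomes if m is not None)
    # True positive: the read aligned to the genome it was simulated from.
    tp = sum(1 for t, m in zip(true_genomes, mapped_genomes)
             if m is not None and m == t)
    # False positive (assumed definition): the read aligned to a different viral genome.
    fp = mapped - tp

    sensitivity = 100.0 * mapped / total if total else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv
```

For example, four reads simulated from a genome labeled `"HPV16"` that align to `"HPV16"`, `"HPV16"`, `"HPV18"` and nowhere give a sensitivity of 75% (3 of 4 reads mapped) and a PPV of 2/3 (2 correct out of 3 mapped). Sensitivity counts any viral alignment, while PPV penalizes alignments to the wrong genome, so the two metrics can move independently as the number of mutations per read increases.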
