Fig. 1 | BMC Genomics

Fig. 1

From: An analytical pipeline for identifying and mapping the integration sites of HIV and other retroviruses

Overview of the linker-mediated amplification of HIV integration sites and the bioinformatic pipeline used to distinguish real integration sites from artifacts. A schematic drawing of an integrated HIV provirus is shown at the top of the figure. As described in the text and in the Supplementary material, the host DNA is fragmented, the ends are made blunt, and a single dA is added to the 3′ ends of the fragmented host DNA. The T-linker is composed of a long and a short oligonucleotide. The 3′ end of the short oligonucleotide is blocked (marked by an asterisk, see text) to prevent it from being extended in the PCR steps (see Figs. 2 and 3 for the design and sequences of the linker and primers). The UMI, which is shown in yellow, is a random sequence that is used to help determine whether two similar amplified segments originate from the same or from two different starting pieces of host DNA. In the example shown in the middle of the drawing, next to the T-linkers, there are two fragments that contain a host-virus DNA junction and that come from a clone of expanded cells. The two fragments have exactly the same integration site, but different breakpoints in the host DNA. As described in the text, the PCR must be initiated from the LTR primer because the segment that the first T-linker primer anneals to is not present in the short oligonucleotide of the T-linker. The bottom of the drawing shows a schematic diagram of the amplified DNA and the location of the Illumina Read 1 and Read 2 primers. The Illumina data are processed using the pipeline, as indicated in the diagram, and the output is saved as an Excel file (the screenshot shows a small portion of an output file).
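
The role of the breakpoint and the UMI described in the caption can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Read` record, its field names, and the `count_fragments` helper are hypothetical. The sketch assumes that reads sharing an integration site, a breakpoint, and a UMI are PCR copies of one starting fragment, whereas a different breakpoint or UMI at the same site indicates a separate starting piece of host DNA.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical record for one mapped read pair (field names are assumptions).
    chrom: str       # chromosome of the host-virus junction
    site: int        # coordinate of the integration site
    strand: str      # orientation of the provirus relative to the host
    breakpoint: int  # coordinate where the host DNA was sheared (linker end)
    umi: str         # random UMI sequence read from the linker


def count_fragments(reads):
    """Count distinct (breakpoint, UMI) pairs observed at each integration site."""
    fragments = defaultdict(set)
    for r in reads:
        # Identical (breakpoint, UMI) pairs collapse into one set entry,
        # so PCR duplicates are counted only once.
        fragments[(r.chrom, r.site, r.strand)].add((r.breakpoint, r.umi))
    return {site: len(pairs) for site, pairs in fragments.items()}


# Made-up example data, for illustration only.
reads = [
    Read("chr1", 1000, "+", 1250, "ACGTACGT"),
    Read("chr1", 1000, "+", 1250, "ACGTACGT"),  # PCR duplicate of the read above
    Read("chr1", 1000, "+", 1410, "TTGCAAGC"),  # same site, different breakpoint
    Read("chr7", 5000, "-", 4800, "GGATCCAA"),
]
counts = count_fragments(reads)
```

Under these assumptions, a site supported by more than one distinct fragment, such as the `chr1` site in the example data, is a candidate for having come from a clone of expanded cells, as in the two fragments drawn in the middle of the figure.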
