Figure 2 | BMC Genomics

Figure 2

From: SHEAR: sample heterogeneity estimation and assembly by reference

SHEAR workflow diagram. (1) SHEAR first uses CREST to predict the locations of SVs from the original SAM/BAM alignment file. (2) Reads neighboring the breakpoints of the predicted SVs, along with all unmapped reads, are extracted from the original alignment. (3) The extracted reads are realigned with a local alignment algorithm (BWA-SW) to improve soft-clipping accuracy near the breakpoints. Reads extracted near breakpoints are aligned within their original neighborhoods, and extracted unmapped reads are aligned against the whole reference sequence. (4) SVs are predicted again from the new alignment, which contains only the reads realigned near the original candidate breakpoints and the reads that were initially unmapped but have now been aligned by the local alignment algorithm. The new predictions may include new SVs as well as refined breakpoints for previously predicted SVs. Steps 2–4 may be repeated as necessary to pick up new SV events; usually 2–3 repetitions are enough before the SV predictions stop changing. (5) Using the refined SV breakpoints, the heterogeneity percentage of each SV is estimated by comparing the soft-clipped and spanning reads at the breakpoints. This calculation varies depending on the SV type (see Figure 4). (6) Finally, a new personal genomic sequence is created by applying the predicted SVs directly to the original reference sequence.
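
The repetition of steps 2–4 described in the caption can be read as a loop that stops once the set of predicted SVs no longer changes. The Python sketch below illustrates only that control flow and is not SHEAR's actual code: the `SV` tuple layout, the function `refine_svs`, the callable `realign_and_predict` (standing in for read extraction, BWA-SW realignment, and CREST prediction), and the `max_rounds` cap are all hypothetical names introduced here.

```python
from typing import Callable, FrozenSet, Tuple

# Assumed representation of one SV call: (type, chromosome, start, end).
SV = Tuple[str, str, int, int]


def refine_svs(
    initial_svs: FrozenSet[SV],
    realign_and_predict: Callable[[FrozenSet[SV]], FrozenSet[SV]],
    max_rounds: int = 5,
) -> Tuple[FrozenSet[SV], int]:
    """Repeat steps 2-4 until the predicted SV set stops changing.

    `realign_and_predict` is a hypothetical stand-in for one pass of
    extraction, local realignment, and SV prediction. Returns the final
    SV set and the number of rounds that were run.
    """
    current = initial_svs
    for round_number in range(1, max_rounds + 1):
        updated = realign_and_predict(current)
        if updated == current:
            # Predictions are unchanged, so further rounds add nothing.
            return current, round_number
        current = updated
    # Safety cap reached without the predictions settling.
    return current, max_rounds
```

Comparing whole sets means that both a newly detected SV and a shifted breakpoint count as a change, which matches the caption's remark that a round can yield new SVs or refined breakpoints. The cap of five rounds is an arbitrary assumption chosen to sit above the 2–3 repetitions the caption reports as typical.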
