Table 3 Analysis of 8 clusters from hierarchical cluster analysis, including the numbers of sites from each call set and a description of the predominant types of sites in each cluster

From: svclassify: a method to establish benchmark structural variant calls

| Cluster | 4000 Random | Personalis Random | Random LINEs | Random LTRs | Random SINEs | Personalis deletions | 1000 Genomes deletions | Total | Proportion that are deletions | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 371 | 284 | 655 | 1.000 | Mostly large, true homozygous deletions |
| 2 | 0 | 0 | 0 | 0 | 2 | 432 | 237 | 671 | 0.997 | Heterozygous Alu deletions |
| 3 | 1 | 1 | 1 | 0 | 0 | 705 | 402 | 1110 | 0.997 | Homozygous Alu deletions |
| 4 | 2397 | 455 | 38 | 28 | 16 | 9 | 28 | 2971 | 0.012 | Large, likely non-SVs. Generally in easy-to-sequence regions |
| 5 | 1073 | 1351 | 352 | 378 | 279 | 1 | 33 | 3467 | 0.010 | Smaller, likely non-SVs. Generally in easy-to-sequence regions |
| 6 | 17 | 2 | 1 | 0 | 0 | 3 | 138 | 161 | 0.876 | Likely true large homozygous deletions with inaccurate breakpoints, so that the true deletion is larger than the called region |
| 7 | 14 | 16 | 2 | 2 | 4 | 624 | 811 | 1473 | 0.974 | Mostly true heterozygous deletions in easier-to-sequence regions |
| 8 | 498 | 481 | 103 | 90 | 195 | 161 | 752 | 2280 | 0.400 | Mix of non-SVs and SVs in more difficult regions, with coverage between half the normal coverage and the normal coverage |
| Total | 4000 | 2306 | 497 | 498 | 496 | 2306 | 2685 | 12788 | 0.390 | |
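The "Proportion that are deletions" column appears to be (Personalis deletions + 1000 Genomes deletions) / Total for each cluster, and the Total row appears to be the column sums. The short Python sketch below recomputes both from the counts in Table 3 as a consistency check. It is an illustration only, assuming that interpretation of the columns; the names `COUNTS`, `deletion_proportion`, and `summarize` are hypothetical and are not part of svclassify.

```python
# Site counts per cluster, copied from Table 3.
# Column order: 4000 Random, Personalis Random, Random LINEs, Random LTRs,
# Random SINEs, Personalis deletions, 1000 Genomes deletions.
COUNTS = {
    1: (0, 0, 0, 0, 0, 371, 284),
    2: (0, 0, 0, 0, 2, 432, 237),
    3: (1, 1, 1, 0, 0, 705, 402),
    4: (2397, 455, 38, 28, 16, 9, 28),
    5: (1073, 1351, 352, 378, 279, 1, 33),
    6: (17, 2, 1, 0, 0, 3, 138),
    7: (14, 16, 2, 2, 4, 624, 811),
    8: (498, 481, 103, 90, 195, 161, 752),
}


def deletion_proportion(row):
    """Fraction of sites that come from a deletion call set (assumed definition)."""
    total = sum(row)
    deletions = row[5] + row[6]  # Personalis deletions + 1000 Genomes deletions
    return deletions / total


def summarize(counts):
    """Return per-cluster (total, proportion), column totals, and overall proportion."""
    per_cluster = {
        cluster: (sum(row), round(deletion_proportion(row), 3))
        for cluster, row in counts.items()
    }
    # Transpose the rows to sum each call-set column across all clusters.
    column_totals = [sum(col) for col in zip(*counts.values())]
    overall = round(deletion_proportion(column_totals), 3)
    return per_cluster, column_totals, overall


if __name__ == "__main__":
    per_cluster, column_totals, overall = summarize(COUNTS)
    for cluster, (total, prop) in per_cluster.items():
        print(f"cluster {cluster}: total={total} proportion={prop:.3f}")
    print("column totals:", column_totals, "grand total:", sum(column_totals))
    print(f"overall proportion: {overall:.3f}")
```

Under this reading, the recomputed totals and proportions match the values printed in the table, which supports interpreting the proportion column as the share of sites drawn from the two deletion call sets.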