Copy number variation in human genomes from three major ethno-linguistic groups in Africa

BMC Genomics

Table 2 CNV statistics using GenomeSTRiP and cn.MOPS algorithms

Parameter	GenomeSTRiP	cn.MOPS	GenomeSTRiP CNVRs that overlap cn.MOPS
Raw CNV regions (CNVR)	16,149	9213
CNVR after QC	11,275	2115	7608
Total CNV scored	127,699	37,679	106,922
Deletion CNV	65,588	26,008	61,025
Gain CNV	62,111	11,671	45,897
Mean CNV count per CNVR	11.3	17.8	14.0
Mean CNVR per individual	654	193	548
Count of overlapping CNVRs ^a	7608	1691	7608
Mean length of CNVR (kb)	9.5	541.7	10.7
SD length of CNVR (kb)	13.2	1287.6	14.1
Median length of CNVR (kb)	5.3	32.4	6
Total length of CNVR (Mb)	108.1	1145.8	81.2
Observed length of CNV present in both methods (Mb) (simulated ± SD)^b			81.2 (43.4 ± 1.0)

Descriptive statistics of CNVRs found using GenomeSTRiP and cn.MOPS. Note that GenomeSTRiP found about 5.3 times as many CNVRs after QC as cn.MOPS (11,275 cf. 2115); GenomeSTRiP CNVRs were shorter (median length 5.3 kb) than cn.MOPS CNVRs (median length 32.4 kb); and the total length of cn.MOPS CNVRs was about 10.6 times greater than that of GenomeSTRiP CNVRs (1146 Mb cf. 108 Mb). CNVR = CNV region: a genomic location, defined by chromosome and start and end base pair positions, that has overlapping CNVs. CNVR after QC = the CNVRs left after dropping those found only in samples that were outliers in principal component analysis (PCA) plots of raw data. Mean CNV count per CNVR = number of samples with a CNV at each CNV region = total CNV count / total CNVRs. Mean CNVR per individual = count of CNVs divided by number of samples. Mean, standard deviation, median, total length and observed length: calculated per CNV, not CNVR.
^a Count of any overlap (minimum 1 bp) between GenomeSTRiP and cn.MOPS CNVRs.
^b The expected length of CNVs that would be found by both methods was obtained by 100 simulations using all the observed lengths of CNVs allocated to random places in the genome.
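
The random-placement simulation described in footnote b can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the source does not say how chromosomes, gaps, or the choice of which call set to randomise were handled, so the sketch treats the genome as a single linear coordinate range and re-places both sets uniformly at random while preserving each interval's length. The function names and the toy intervals are hypothetical.

```python
import random
import statistics


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bp(set_a, set_b):
    """Number of base pairs covered by both interval sets."""
    a, b = merge_intervals(set_a), merge_intervals(set_b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def randomise(intervals, genome_length, rng):
    """Give each interval a uniformly random start, keeping its observed length."""
    placed = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        placed.append((new_start, new_start + length))
    return placed


def simulate_expected_overlap(set_a, set_b, genome_length, n_sim=100, seed=0):
    """Mean and SD of the shared length over n_sim random placements (assumed scheme)."""
    rng = random.Random(seed)
    sims = [
        overlap_bp(randomise(set_a, genome_length, rng),
                   randomise(set_b, genome_length, rng))
        for _ in range(n_sim)
    ]
    return statistics.mean(sims), statistics.stdev(sims)


# Toy intervals (hypothetical, not data from the study).
toy_a = [(100, 600), (2_000, 2_500), (9_000, 9_400)]
toy_b = [(0, 1_000), (2_200, 3_000), (50_000, 60_000)]

observed = overlap_bp(toy_a, toy_b)
mean_sim, sd_sim = simulate_expected_overlap(toy_a, toy_b, genome_length=100_000)
print(f"observed {observed} bp; simulated {mean_sim:.1f} ± {sd_sim:.1f} bp")
```

In Table 2 the same comparison is reported in Mb: 81.2 Mb observed against 43.4 ± 1.0 Mb simulated, so the two methods share considerably more sequence than random placement of the same lengths would produce.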

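The derived rows and the ratios quoted in the Table 2 legend can be re-derived from the counts in the table. The sketch below is only an arithmetic check on values transcribed from Table 2; the dictionary layout and variable names are illustrative.

```python
# Values transcribed from Table 2; "overlap" is the third column
# (GenomeSTRiP CNVRs that overlap cn.MOPS).
table = {
    "GenomeSTRiP": {"cnvr_after_qc": 11_275, "total_cnv": 127_699,
                    "deletions": 65_588, "gains": 62_111,
                    "total_length_mb": 108.1, "reported_mean_cnv_per_cnvr": 11.3},
    "cn.MOPS": {"cnvr_after_qc": 2_115, "total_cnv": 37_679,
                "deletions": 26_008, "gains": 11_671,
                "total_length_mb": 1145.8, "reported_mean_cnv_per_cnvr": 17.8},
    "overlap": {"cnvr_after_qc": 7_608, "total_cnv": 106_922,
                "deletions": 61_025, "gains": 45_897,
                "total_length_mb": 81.2, "reported_mean_cnv_per_cnvr": 14.0},
}

# Mean CNV count per CNVR = total CNV count / total CNVRs (legend definition).
derived = {name: row["total_cnv"] / row["cnvr_after_qc"] for name, row in table.items()}

# Deletions and gains should add up to the total CNVs scored in each column.
totals_consistent = all(
    row["deletions"] + row["gains"] == row["total_cnv"] for row in table.values()
)

# Ratios quoted in the legend.
cnvr_ratio = table["GenomeSTRiP"]["cnvr_after_qc"] / table["cn.MOPS"]["cnvr_after_qc"]
length_ratio = table["cn.MOPS"]["total_length_mb"] / table["GenomeSTRiP"]["total_length_mb"]

for name, value in derived.items():
    reported = table[name]["reported_mean_cnv_per_cnvr"]
    print(f"{name}: derived {value:.2f}, reported {reported}")
print(f"CNVR ratio {cnvr_ratio:.2f}, total length ratio {length_ratio:.2f}")
```

The derived means agree with the reported values to within rounding of the last printed digit.
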
ISSN: 1471-2164