Decreasing the number of false positives in sequence classification

BMC Genomics

Table 1 Specificity of the different null models

Null model	WAM_low	WAM_med	WAM_high	CM_low	CM_med	CM_high
5% GC	2431 (70%)	2377 (68%)	2173 (63%)	2439 (77%)	3775 (75%)	3567 (71%)
25% GC	1118 (32%)	1517 (44%)	1500 (43%)	1681 (53%)	2509 (50%)	2243 (44%)
50% GC	863 (25%)	79 (2%)	700 (20%)	674 (21%)	58 (1%)	0 (0%)
75% GC	1443 (42%)	1534 (44%)	738 (21%)	1644 (52%)	2521 (50%)	2197 (44%)
95% GC	2114 (61%)	2332 (68%)	2529 (73%)	2297 (73%)	3642 (72%)	3625 (72%)
target	45 (1%)	18 (0%)	25 (0%)	3 (0%)	0 (0%)	0 (0%)

Number (and percentage) of positively scored sequences for each null model. WAM_low, WAM_med and WAM_high denote the WAM models trained on the training sets with low (36%), medium (48%) and high (65%) GC content, respectively. CM_low, CM_med and CM_high denote the CM models trained on the training sets with low (5.6%), medium (49.2%) and high (71.4%) GC content, respectively.
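Each cell reports how many sequences from a null model were scored positively and what fraction of that null set they represent. Because a null sequence is not a true member of the class, every positive call on it is a false positive, and specificity on that set is the complementary fraction of negatively scored sequences. The Python sketch below illustrates only this bookkeeping. It assumes (an assumption, not stated in the table) that each "x% GC" null model draws bases independently with G+C probability x. The names `null_sequences` and `toy_score` and the zero threshold are hypothetical. In particular, `toy_score` is a placeholder and is not the WAM or CM scoring function.

```python
import random


def null_sequences(gc: float, n: int, length: int, seed: int = 0) -> list[str]:
    """Hypothetical i.i.d. null model: G and C each occur with probability gc/2."""
    rng = random.Random(seed)
    at, cg = (1.0 - gc) / 2, gc / 2
    return ["".join(rng.choices("ACGT", weights=[at, cg, cg, at], k=length))
            for _ in range(n)]


def toy_score(seq: str) -> float:
    """Placeholder score (NOT the WAM/CM score): GC fraction minus 0.5."""
    return (seq.count("G") + seq.count("C")) / len(seq) - 0.5


def positive_rate(scores: list[float], threshold: float = 0.0) -> tuple[int, float]:
    """Return (number of scores above threshold, that number as a percentage)."""
    positives = sum(1 for s in scores if s > threshold)
    return positives, 100.0 * positives / len(scores)


def specificity(positives: int, total: int) -> float:
    """On a null set all positives are false positives: TN / (TN + FP)."""
    return (total - positives) / total


if __name__ == "__main__":
    # One row per hypothetical null model, in the layout of Table 1.
    for gc in (0.05, 0.25, 0.50, 0.75, 0.95):
        seqs = null_sequences(gc, n=1000, length=100)
        n_pos, pct = positive_rate([toy_score(s) for s in seqs])
        spec = specificity(n_pos, len(seqs))
        print(f"{gc:.0%} GC\t{n_pos} ({pct:.0f}%)\tspecificity={spec:.2f}")
```

With this GC-only placeholder score, sequences from GC-rich null models are almost all called positive and those from AT-rich null models almost none. A real class model would instead be judged by how few positives it produces across all null models.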

ISSN: 1471-2164