A singular value decomposition approach for improved taxonomic classification of biological sequences

BMC Genomics

Table 2 Inferring quality from clustering methods

Algorithm/software	Rank	N	Min cLtlf	Max cLtlf	Mean cLtlf	cLtlf clusters sum (∑cLtlf)	cLtlf standard deviation (σ)	Linnaean clusters quality (∑cLtlf/σ)	Linnaean clusters quality gain (K09/K60)%	cLtlf median	Median clusters quality gain (K09/K60)%
AQBC-javaml	K09	8	32	180	71.25	570	52.27	10.90	49.58%	42.50	26.87%
	K60	8	0	220	64.38	515	70.64	7.29		33.50
EM-weka	K09	8	40	120	70.12	561	31.53	17.79	48.99%	57.00	1.79%
	K60	8	16	160	70.25	562	47.06	11.94		56.00
Kmeans-weka	K09	8	30	180	69.38	555	46.70	11.88	9.26%	61.50	-2.38%
	K60	8	16	180	69.88	559	51.39	10.88		63.00
Kmeans-R	K09	8	40	140	71.62	573	34.48	16.62	9.21%	62.00	6.90%
	K60	8	26	140	71.75	574	37.72	15.22		58.00
K-Medoids-R	K09	8	24	160	70.12	561	44.37	12.64	15.92%	60.00	13.21%
	K60	8	26	180	68.50	548	50.24	10.91		53.00
MDBC-weka	K09	8	30	180	69.38	555	46.70	11.88	9.26%	61.50	-2.38%
	K60	8	16	180	69.88	559	51.39	10.88		63.00
ASAP-in house	K09	8	13	225	70.25	562	67.68	8.30	27.51%	52.00	197.14%
	K60	8	13	243	69.12	553	84.92	6.51		17.50

All evaluated partitioning algorithms showed improved Linnaean cluster quality when they used the optimized distance matrix created with the better of the kdc parameters tested.
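The derived columns of Table 2 appear to follow from simple arithmetic on the reported values: the Linnaean clusters quality as ∑cLtlf divided by σ, and each gain as the percentage by which the K09 value exceeds the K60 value. The following is a minimal Python sketch under that reading of the column headers. The function names are illustrative and do not come from the paper, and because the table reports rounded inputs, recomputed values may differ from the printed ones in the last digit.

```python
def cluster_quality(cltlf_sum: float, cltlf_sd: float) -> float:
    """Linnaean clusters quality, read from the header as sum(cLtlf) / sigma."""
    return cltlf_sum / cltlf_sd


def gain_percent(k09_value: float, k60_value: float) -> float:
    """Relative gain of K09 over K60, in percent (assumed reading of 'K09/K60 %')."""
    return (k09_value / k60_value - 1.0) * 100.0


# Values copied from the AQBC-javaml rows of Table 2:
# (sum of cLtlf, standard deviation, median cLtlf)
aqbc = {
    "K09": (570, 52.27, 42.50),
    "K60": (515, 70.64, 33.50),
}

quality = {rank: cluster_quality(s, sd) for rank, (s, sd, _) in aqbc.items()}
quality_gain = gain_percent(quality["K09"], quality["K60"])
median_gain = gain_percent(aqbc["K09"][2], aqbc["K60"][2])

print(f"quality K09 = {quality['K09']:.2f}, K60 = {quality['K60']:.2f}")
print(f"quality gain = {quality_gain:.2f}%, median gain = {median_gain:.2f}%")
```

Under this reading, a negative gain (as in the median column for Kmeans-weka) simply means the K60 value was the larger of the two.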

ISSN: 1471-2164