Table 2 Results of validation on FAMeS Data sets

From: INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

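Columns marked "complete" give results with the complete database; columns marked "modified" give results with the modified database.

| FAMeS data set | Taxonomic assignment category | INDUS (complete) | TACOA (complete) | SOrT-ITEMS (complete) | MEGAN (complete) | SPHINX (complete) | INDUS (modified) | TACOA (modified) | SOrT-ITEMS (modified) | MEGAN (modified) | SPHINX (modified) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SimLC (96732) | Correct | 89.6 | 79.3 | 94.9 | 95 | 93.1 | 81.6 | 74.4 | 78.1 | 72.6 | 83.7 |
| | Wrong | 2.6 | 9 | 2 | 2.7 | 2.8 | 7.1 | 13.9 | 9.2 | 23.6 | 11.5 |
| | Specific | 81.3 | 24.4 | 94.9 | 93.9 | 82.8 | 66.7 | 21 | 78.1 | 65 | 71 |
| | Non-specific | 8.3 | 54.9 | 0 | 1.1 | 10.3 | 14.9 | 53.5 | 0 | 7.6 | 12.7 |
| | Unassigned | 7.9 | 11.6 | 3.1 | 2.3 | 4.1 | 11.4 | 11.6 | 12.6 | 3.7 | 4.8 |
| SimMC (113373) | Correct | 92.8 | 82.5 | 93.7 | 94 | 92.6 | 86 | 79 | 79.1 | 69.9 | 81.8 |
| | Wrong | 1 | 7.3 | 3.1 | 3.5 | 3.6 | 4.6 | 10.8 | 9.1 | 26.6 | 13.5 |
| | Specific | 84.1 | 23.7 | 93.7 | 93.1 | 81.1 | 71.9 | 23 | 79.1 | 63.2 | 70.1 |
| | Non-specific | 8.7 | 58.8 | 0 | 1 | 11.5 | 14.1 | 56 | 0 | 6.7 | 11.6 |
| | Unassigned | 6.2 | 10.2 | 3.2 | 2.4 | 3.8 | 9.4 | 10.2 | 11.8 | 3.5 | 4.7 |
| SimHC (115592) | Correct | 89.6 | 72.1 | 92 | 91.9 | 86 | 78.9 | 67 | 76.6 | 77.9 | 77.6 |
| | Wrong | 1.9 | 11.4 | 3.6 | 4.9 | 7.1 | 5.7 | 16 | 9.5 | 17.6 | 10.1 |
| | Specific | 76.4 | 18.7 | 92 | 90.3 | 79.6 | 63.4 | 18.2 | 76.6 | 69.5 | 71.2 |
| | Non-specific | 13.2 | 53.4 | 0 | 1.5 | 6.4 | 15.5 | 48.8 | 0 | 8.4 | 6.4 |
| | Unassigned | 8.6 | 16.5 | 4.3 | 3.2 | 6.9 | 15.5 | 17 | 13.9 | 4.5 | 12.3 |
| Average for FAMeS data sets | Correct | 90.6 | 78 | 93.5 | 93.6 | 90.6 | 82.2 | 73.5 | 77.9 | 73.5 | 81 |
| | Wrong | 1.8 | 9.3 | 2.9 | 3.7 | 4.5 | 5.8 | 13.6 | 9.3 | 22.6 | 11.7 |
| | Specific | 80.6 | 22.3 | 93.5 | 92.4 | 81.2 | 67.3 | 20.7 | 77.9 | 65.9 | 70.8 |
| | Non-specific | 10.1 | 55.7 | 0 | 1.2 | 9.4 | 14.8 | 52.8 | 0 | 7.6 | 10.2 |
| | Unassigned | 7.6 | 12.8 | 3.5 | 2.6 | 4.9 | 12.1 | 12.9 | 12.8 | 3.9 | 7.3 |
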
  1. Summary of the results obtained with the FAMeS metagenomic data sets. The complete and modified reference databases contained genome fragments from 952 and 652 prokaryotic genomes, respectively. The number of sequences in each data set is given in parentheses.