Attribute | Explanation | Before | After | Annotated |
---|---|---|---|---|
Protein families | CD-HIT clusters [22] | 21,146 | 17,560 | 9,819 |
Functions | Level-3 subsystems [20] | 4,260 | 3,105 | 1,828 |
SNPs | Marker SNPs [3] | 7,880 | 2,545 | 659 |
Subsystems | Level-3 subsystems [20] | 706 | 444 | 398 |
Phages | Phages [21] | 6 | 4 | 4 |
Clusters | Remove redundancy [6] | 0 | 1,647 | 0 |
Total | | 33,998 | 25,305 | 12,708 |
- The number of variables is shown before and after the clustering procedure used to remove redundancy [6], together with the number of variables annotated with level-1 subsystems [9]. The full matrix of 25,305 variables used in the manuscript is provided in Additional file 3 and Additional file 4. See text for details.
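
As a quick consistency check, the column totals in the table can be recomputed from the per-attribute counts. The sketch below is illustrative only: the dictionary `counts` and the helper `column_totals` are hypothetical names introduced here, not part of the original analysis, and the numbers are copied directly from the table rows.

```python
# Per-attribute counts copied from the table: (before, after, annotated).
counts = {
    "Protein families": (21146, 17560, 9819),
    "Functions": (4260, 3105, 1828),
    "SNPs": (7880, 2545, 659),
    "Subsystems": (706, 444, 398),
    "Phages": (6, 4, 4),
    "Clusters": (0, 1647, 0),
}


def column_totals(table):
    """Sum each column (before, after, annotated) across all attributes."""
    return tuple(sum(column) for column in zip(*table.values()))


totals = column_totals(counts)
print(totals)  # (33998, 25305, 12708)
```

The recomputed sums match the "Total" row, including the 25,305 variables retained after clustering.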