Fig. 3From: VariantSpark: population scale clustering of genotype informationSchematic overview of VariantSpark. The image shows the flow from the input VCF file to the machine learning library and onto the visualization. It highlights the differences between the Hadoop and Spark implementations for converting data in VCF format to a data structure readable by Mahout and MLlib, respectivelyBack to article page