Fig. 3 | BMC Genomics

Fig. 3

From: The performance of coalescent-based species tree estimation methods under models of missing data

Impact of missing data on AD and GTEE values. Average distance (AD, defined as the normalized RF distance between the true species tree and the true gene trees, averaged across all 1000 genes) and gene tree estimation error (GTEE, defined as the normalized RF distance between the true and the estimated gene trees, averaged across all 1000 genes) are shown for increasing amounts of missing data in panels (a-c) and (d-f), respectively. Each column represents a different level of incomplete lineage sorting (ILS): panels (a) and (d) show datasets with low ILS, panels (b) and (e) show datasets with high ILS, and panels (c) and (f) show datasets with very high ILS. Lines represent the average over 20 replicate datasets, and filled regions indicate the standard error. Solid lines indicate the Miid model of missing data, and dashed lines indicate the Mclade model of missing data. Note that datasets with 55% and 95% of genes with clade-based missing data had 34% and 59% total missing data, respectively. Datasets shown here have deep speciation events and 1000 genes; results for datasets with recent speciation are shown in Additional file 1.
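Both AD and GTEE are averages of a normalized Robinson-Foulds (RF) distance over pairs of trees. The following is a minimal Python sketch of that calculation, not the authors' pipeline. It assumes trees are given as nested tuples of taxon names, that both trees in a pair have the same leaf set (restricting trees to a shared taxon set under missing data is not handled), and that the RF distance is normalized by the total number of non-trivial bipartitions in the two trees. All function names are hypothetical.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_taxa = leaf_set(tree)
    ref = min(all_taxa)  # arbitrary but fixed reference taxon
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if ref in below else below
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Normalized RF distance (assumption: divide by total bipartition count)."""
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0


def mean_normalized_rf(trees_a, trees_b):
    """Average normalized RF distance over paired lists of trees."""
    pairs = list(zip(trees_a, trees_b))
    return sum(normalized_rf(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, an AD-style value pairs the true species tree with each true gene tree, for example `mean_normalized_rf([species_tree] * len(true_gene_trees), true_gene_trees)`, while a GTEE-style value pairs each true gene tree with its estimate, `mean_normalized_rf(true_gene_trees, estimated_gene_trees)`.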
