# Table 1. High-level description of the tested causal orientation methods.

| Method | Reference | Key principles | Sufficient assumptions for causally orienting X → Y | Sound |
|---|---|---|---|---|
| ANM | [14] | Assuming X → Y with Y = f(X) + e1, where X and e1 are independent, there will be no such additive noise model in the opposite direction X ← Y, X = g(Y) + e2, with Y and e2 independent. | • Y = f(X) + e1;<br>• X and e1 are independent;<br>• f is non-linear, or one of X and e is non-Gaussian;<br>• Probability densities are strictly positive;<br>• All functions (including densities) are 3 times differentiable. | Yes |
| PNL | [15] | Assuming X → Y with Y = f2(f1(X) + e1), there will be no such model in the opposite direction X ← Y, X = g2(g1(Y) + e2) with Y and e2 independent. | • Y = f2(f1(X) + e1);<br>• X and e1 are independent;<br>• Either f1 or e1 is Gaussian;<br>• Both f1 and f2 are continuous and invertible. | Yes |
| IGCI | [16, 17] | Assuming X → Y with Y = f(X), one can show that the KL-divergence (a measure of the difference between two probability distributions) between P(Y) and a reference distribution (e.g., Gaussian or uniform) is greater than the KL-divergence between P(X) and the same reference distribution. | • Y = f(X) (i.e., there is no noise in the model);<br>• f is continuous and invertible;<br>• Logarithm of the derivative of f and P(X) are not correlated. | Yes |
| GPI-MML | [18] | Assuming X → Y, the least complex description of P(X, Y) is given by separate descriptions of P(X) and P(Y \| X). By estimating the latter two quantities using methods that favor functions and distributions of low complexity, the likelihood of the observed data given X → Y is inversely related to the complexity of P(X) and P(Y \| X). | • Y = f(X, e);<br>• X and e are independent;<br>• e is Gaussian;<br>• The prior on f and P(X) factorizes. | No |
| ANM-MML | [18] | Same as for GPI-MML, except for a different way of estimating P(Y \| X) and P(X \| Y). | • Y = f(X) + e;<br>• X and e are independent;<br>• e is Gaussian;<br>• The prior on f and P(X) factorizes. | No |
| GPI | [18] | Assuming X → Y with Y = f(X, e1), where X and e1 are independent and f is "sufficiently simple", there will be no such model in the opposite direction X ← Y, X = g(Y, e2) with Y and e2 independent and g "sufficiently simple". | Same as for GPI-MML. | No |
| ANM-GAUSS | [18] | Same as for ANM-MML, except for the different way of estimating P(X) and P(Y). | Same as for ANM-MML. | No |
| LINGAM | [13] | Assuming X → Y, if we fit linear models Y = b2X + e1 and X = b1Y + e2 with e1 and e2 independent, then we will have b1 < b2. | • Y = b2X + e1;<br>• X and e1 are independent;<br>• e1 is non-Gaussian. | Yes |
1. The last column indicates whether a method is sound, i.e., whether it can provably orient a causal structure under its sufficient assumptions. Because causal orientation methods are fairly new and not yet completely characterized, proofs of correctness may still become available for GPI-MML, ANM-MML, GPI, and ANM-GAUSS. All methods implicitly assume that there are no feedback loops. The noise term in the models is denoted by a lowercase "e".
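
To make the IGCI row of the table more concrete, the following is a minimal sketch of a slope-based IGCI score with a uniform reference distribution. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names `igci_slope_score` and `igci_orient` are hypothetical, the data are a noise-free toy example Y = X³ on an evenly spaced grid, and the slope-based score is only one of several possible estimators of the quantity described in the table. Under the IGCI assumptions, the score in the causal direction is expected to be the smaller of the two.

```python
import numpy as np


def igci_slope_score(x, y):
    """Slope-based estimate of the IGCI score C_{X->Y} (uniform reference).

    Hypothetical helper for illustration; assumes a (nearly) noise-free,
    invertible relation between x and y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rescale both variables to [0, 1], matching a uniform reference distribution.
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    # Sort the pairs by x and take finite differences between neighbours.
    order = np.argsort(x)
    dx = np.diff(x[order])
    dy = np.diff(y[order])
    # Skip ties, where the log of the finite-difference slope is undefined.
    keep = (dx != 0) & (dy != 0)
    return float(np.mean(np.log(np.abs(dy[keep] / dx[keep]))))


def igci_orient(x, y):
    """Return the direction with the smaller slope-based score."""
    c_xy = igci_slope_score(x, y)
    c_yx = igci_slope_score(y, x)
    if c_xy < c_yx:
        return "X -> Y"
    if c_yx < c_xy:
        return "Y -> X"
    return "undecided"


# Toy data (an assumption of this sketch): X on a uniform grid, Y = X^3, no noise.
x = np.linspace(0.0, 1.0, 1000)
y = x ** 3
direction = igci_orient(x, y)
print(direction)
```

In this toy case the forward score approximates the average of log f'(X) under a uniform P(X), which is negative for f(x) = x³ on [0, 1], so the sketch orients the pair as X → Y. With noisy or tied data the finite-difference slopes become unreliable, which is consistent with the noise-free assumption listed for IGCI in the table.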