Book by Judea Pearl, one of the leaders of causal inference, who received a Turing Award for inventing Bayesian networks. It has some equations, with a level of technicality somewhere between a typical popular-science book and a textbook.

Causal inference is required because it's impossible to distinguish causation from correlation using data alone, even with sophisticated deep learning techniques. For example, if birds chirp every day before sunrise, a statistical method cannot tell you whether the birds cause the sun to rise or merely precede it. Pearl gives three levels of causation, where questions at each level cannot be answered with tools from the levels below.

- Level 1 — Association: this is where most machine learning and statistics methods stand today. They can find correlations but cannot distinguish correlation from causation.
- Level 2 — Intervention: using causal diagrams and do-notation, you can tell whether X causes Y. The first step is to use this machinery to determine whether the causal effect is identifiable from the data, then apply level 1 methods to estimate the strength of the effect.
- Level 3 — Counterfactuals: given that you did X and Y happened, determine what would have happened if you had done X' instead.
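The gap between level 1 and level 2 can be shown with a small simulation (not from the book; the setup and variable names are my own). A hidden common cause makes two variables strongly correlated, yet intervening on one — Pearl's do-operator — has no effect on the other:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical common cause: "dawn approaching" drives both variables
dawn = rng.normal(size=n)
chirp = dawn + rng.normal(scale=0.5, size=n)    # birds chirp near dawn
sunrise = dawn + rng.normal(scale=0.5, size=n)  # sun rises at dawn

# Level 1 (association): chirping and sunrise are strongly correlated
r = np.corrcoef(chirp, sunrise)[0, 1]
print(f"correlation(chirp, sunrise) = {r:.2f}")  # ~0.8

# Level 2 (intervention): do(chirp) — set chirping by fiat,
# independently of dawn; the sunrise is unaffected
chirp_do = rng.normal(size=n)
sunrise_do = dawn + rng.normal(scale=0.5, size=n)
r_do = np.corrcoef(chirp_do, sunrise_do)[0, 1]
print(f"correlation under do(chirp) = {r_do:.2f}")  # ~0
```

Observational data alone shows the first number; only a model of the intervention (or an experiment) reveals the second.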

The most reliable way to determine causality is a randomized trial, but this is often impractical, so we usually only have observational data. Many scientists simply control for as many variables as possible, but this is a mistake: if you control for a mediator (a variable on the causal path), the measured effect disappears. Collider bias is when A -> B <- C; then A and C are independent, but if you control for B they are no longer independent. A causal diagram encodes the model's assumptions and yields a quick algorithm for determining which variables should be controlled.
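Collider bias is easy to demonstrate numerically (my own sketch, not the book's): generate two independent causes of a collider, then "control for" the collider by selecting a narrow slice of it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# A -> B <- C: A and C are independent causes of the collider B
a = rng.normal(size=n)
c = rng.normal(size=n)
b = a + c + rng.normal(scale=0.5, size=n)

# Unconditionally, A and C are uncorrelated
r_ac = np.corrcoef(a, c)[0, 1]
print(f"corr(A, C)          = {r_ac:.2f}")  # ~0

# Conditioning on B (here: keeping only samples with B near 0)
# induces a spurious negative correlation between A and C
mask = np.abs(b) < 0.1
r_ac_given_b = np.corrcoef(a[mask], c[mask])[0, 1]
print(f"corr(A, C | B ~ 0)  = {r_ac_given_b:.2f}")  # strongly negative
```

Intuitively: if B is fixed near 0, a large A forces a small C, so the two become anticorrelated even though neither causes the other.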

A back-door path is an unintended path between X (cause) and Y (effect) that goes through confounders. You can try to block the back-door by controlling for the confounders, but this is difficult because they may be unknown or unmeasurable. Instead, two easier ways:

- Front-door path: find a mediator Z with X -> Z -> Y where Z is not affected by the confounders, then multiply the effect X -> Z by the effect Z -> Y.
- Instrumental variable: find a variable W with W -> X -> Y where W is not affected by the confounders and influences Y only through X, so you can treat it as a coin flip that randomizes X.
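Both tricks can be checked in a linear simulation (my own illustration; in the linear-Gaussian case the causal effects are just regression coefficients, and the front-door estimate of Z -> Y adjusts for X to block the path Z <- X <- U -> Y):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
u = rng.normal(size=n)  # unmeasured confounder

# --- Front-door: U -> X -> Z -> Y, U -> Y, but U does not touch Z
x = u + rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)      # X -> Z effect: 0.8
y = 0.5 * z + u + rng.normal(size=n)  # Z -> Y effect: 0.5; true X -> Y = 0.4

# Naive regression of Y on X is biased by the confounder U
naive = np.cov(x, y)[0, 1] / np.var(x)          # ~0.9, not 0.4

# Front-door: (X -> Z) times (Z -> Y, adjusting for X)
xz = np.cov(x, z)[0, 1] / np.var(x)             # ~0.8
coef = np.linalg.lstsq(np.column_stack([z, x, np.ones(n)]), y, rcond=None)[0]
front_door = xz * coef[0]                       # ~0.8 * 0.5 = 0.4

# --- Instrumental variable: W -> X -> Y, W independent of U
w = rng.normal(size=n)
x2 = w + u + rng.normal(size=n)
y2 = 0.4 * x2 + u + rng.normal(size=n)          # true X -> Y effect: 0.4
iv = np.cov(w, y2)[0, 1] / np.cov(w, x2)[0, 1]  # ~0.4

print(f"naive: {naive:.2f}, front-door: {front_door:.2f}, IV: {iv:.2f}")
```

The naive estimate is badly biased, while both the front-door product and the IV ratio recover the true effect without ever measuring the confounder U.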

The author thinks a better representation of causality is one of the key requirements for strong AI.