Getting Started
March 9, 2026
Messerli (2012)
“Spurious Correlations” (n.d.)
“Spurious Scholar” (n.d.)
Shmueli (2010)
Editorial (2021)
Two treatments
Which one is better?
Charig et al. (1986)
| Stone size | Treatment A | Treatment B |
|---|---|---|
| Small | 81 / 87 (93%) | 234 / 270 (87%) |
| Large | 192 / 263 (73%) | 55 / 80 (69%) |
| Total | 273 / 350 (78%) | 289 / 350 (83%) |
Stone size influences both which treatment you get and how well you recover.
✓ Stratify!
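The reversal in the table can be checked with a few lines of arithmetic. This is a minimal sketch using only the counts shown above; the variable names are illustrative.

```python
# (successes, patients) per stone size and treatment, copied from the table.
counts = {
    "Small": {"A": (81, 87), "B": (234, 270)},
    "Large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

# Within each stone-size stratum, compare the two treatments.
for size, arms in counts.items():
    print(size, {t: round(rate(*arms[t]), 2) for t in arms})

# Pool over stone size: add up successes and patients per treatment.
pooled = {
    t: rate(
        sum(counts[s][t][0] for s in counts),
        sum(counts[s][t][1] for s in counts),
    )
    for t in ("A", "B")
}
print("Total", {t: round(r, 2) for t, r in pooled.items()})
```

Treatment A has the higher rate in both strata, yet Treatment B has the higher pooled rate, because A was given mostly to patients with large stones.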
| Blood pressure (measured after) | Drug | No drug |
|---|---|---|
| Low BP | 81 / 87 (93%) | 234 / 270 (87%) |
| High BP | 192 / 263 (73%) | 55 / 80 (69%) |
| Total | 273 / 350 (78%) | 289 / 350 (83%) |
✗ Don’t stratify!
The statistics are identical.
Only the causal story changed.


Pearl and Mackenzie (2018, fig. 1.2)
You got a dog. You’re happy.
Y_i(\text{dog}) = 1 \quad \checkmark
Would you have been happy without the dog?
Y_i(\text{no dog}) = \; ?
Rubin (1974)
\tau_i = Y_i(1) - Y_i(0)
We can never observe both for the same person.
| i | T | Y | Y(1) | Y(0) | \tau_i |
|---|---|---|---|---|---|
| 1 | 0 | 0 | ? | 0 | ? |
| 2 | 1 | 1 | 1 | ? | ? |
| 3 | 1 | 0 | 0 | ? | ? |
| 4 | 0 | 0 | ? | 0 | ? |
| 5 | 0 | 1 | ? | 1 | ? |
| 6 | 1 | 1 | 1 | ? | ? |
Holland (1986)
We can’t know \tau_i for any single person.
But we can ask about the average across many people:
\begin{aligned} ATE &= \mathbb{E}[Y(1) - Y(0)] \\ &= \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] \end{aligned}
Rung 3 → Rung 2
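A small simulation can make the step from individual effects to the average concrete. This is a sketch under made-up assumptions: the outcome probabilities 0.7 and 0.5 are hypothetical, and treatment is assigned by a fair coin so that it is independent of both potential outcomes.

```python
import random

random.seed(0)
n = 100_000

# Hypothetical population: every person carries both potential outcomes.
# P(Y(1)=1) = 0.7 and P(Y(0)=1) = 0.5 are invented, so the true ATE is about 0.2.
y1 = [1 if random.random() < 0.7 else 0 for _ in range(n)]
y0 = [1 if random.random() < 0.5 else 0 for _ in range(n)]

# Only the simulation can see both columns and average the individual effects.
true_ate = sum(a - b for a, b in zip(y1, y0)) / n

# Randomized treatment: T does not depend on (Y(1), Y(0)).
t = [random.random() < 0.5 for _ in range(n)]

# We observe only the potential outcome that matches the treatment received.
y = [a if ti else b for a, b, ti in zip(y1, y0, t)]

treated = [yi for yi, ti in zip(y, t) if ti]
control = [yi for yi, ti in zip(y, t) if not ti]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true ATE      = {true_ate:.3f}")
print(f"diff in means = {diff_in_means:.3f}")
```

No single \tau_i is recovered, but under randomization the observed difference in means lands close to the average effect.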
Observational
\mathbb{E}[Y \mid X = x]
“What Y do we see when X = x?”
Interventional
\mathbb{E}[Y \mid \text{do}(X = x)]
“What Y do we get when we set X = x?”
When are these equal?
Neal (n.d., fig. 4.2)
Pearl et al. (2016)
\mathcal{M} = \langle U, V, F, P(U) \rangle
| V | endogenous variables (what we model) |
| U | exogenous variables (background factors, error terms) |
| F | structural functions (how each V is determined by its parents + U) |
| P(U) | probability distribution over the exogenous variables |
\begin{aligned} Z &\leftarrow f_Z(U_Z) \\ X &\leftarrow f_X(Z, U_X) \\ Y &\leftarrow f_Y(Z, X, U_Y) \end{aligned}
U_Z, U_X, U_Y \overset{\text{ind.}}{\sim} P(U)
\begin{aligned} Z &\leftarrow f_Z(U_Z) \\ X &\leftarrow \textcolor{#107895}{x} \\ Y &\leftarrow f_Y(Z, \textcolor{#107895}{x}, U_Y) \end{aligned}
U_Z, \textcolor{lightgray}{U_X}, U_Y \overset{\text{ind.}}{\sim} P(U)
\begin{aligned} Z &= U_Z \\ X &= \beta^{XZ} Z + U_X \\ Y &= \beta^{YX} X + \beta^{YZ} Z + U_Y \end{aligned}
\begin{pmatrix} U_Z \\ U_X \\ U_Y \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_Z & 0 & 0 \\ 0 & \sigma^2_X & 0 \\ 0 & 0 & \sigma^2_Y \end{pmatrix}\right)
\begin{aligned} Z &= U_Z \\ X &= \textcolor{#107895}{x} \\ Y &= \beta^{YX} \textcolor{#107895}{x} + \beta^{YZ} Z + U_Y \end{aligned}
\begin{pmatrix} U_Z \\ \textcolor{lightgray}{U_X} \\ U_Y \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ \textcolor{lightgray}{0} \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_Z & 0 & 0 \\ 0 & \textcolor{lightgray}{\sigma^2_X} & 0 \\ 0 & 0 & \sigma^2_Y \end{pmatrix}\right)
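The difference between seeing and setting X in the linear SCM above can be simulated directly. This is a sketch with hypothetical coefficients (\beta^{XZ} = 1, \beta^{YX} = 0.5, \beta^{YZ} = 2) and unit variances, none of which come from the slides. The intervention is implemented by replacing the equation for X with a constant.

```python
import random

random.seed(1)
n = 100_000

# Hypothetical coefficients, chosen only for illustration.
b_xz, b_yx, b_yz = 1.0, 0.5, 2.0

def simulate(do_x=None):
    """Draw n samples of (Z, X, Y); if do_x is given, X is set to do_x."""
    rows = []
    for _ in range(n):
        z = random.gauss(0, 1)                    # Z = U_Z
        if do_x is None:
            x = b_xz * z + random.gauss(0, 1)     # X = b_xz * Z + U_X
        else:
            x = do_x                              # surgery: X no longer listens to Z
        y = b_yx * x + b_yz * z + random.gauss(0, 1)
        rows.append((z, x, y))
    return rows

def mean(values):
    return sum(values) / len(values)

# Observational slope of Y on X: Cov(X, Y) / Var(X), which absorbs the path through Z.
obs = simulate()
xs = [r[1] for r in obs]
ys = [r[2] for r in obs]
mx, my = mean(xs), mean(ys)
obs_slope = (
    sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    / sum((a - mx) ** 2 for a in xs)
)

# Interventional contrast: E[Y | do(X=1)] - E[Y | do(X=0)].
y_do1 = mean([r[2] for r in simulate(do_x=1.0)])
y_do0 = mean([r[2] for r in simulate(do_x=0.0)])
ate = y_do1 - y_do0

print(f"observational slope = {obs_slope:.2f}")
print(f"interventional ATE  = {ate:.2f}")
```

With these values the observational slope should sit near 0.5 + 2 · 1 / (1 + 1) = 1.5, while the interventional contrast should sit near \beta^{YX} = 0.5.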
ATE = \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)]
ATE = \mathbb{E}[Y \mid do(X=x')] - \mathbb{E}[Y \mid do(X=x)]
ATE = \mathbb{E}[Y_i(1) - Y_i(0)]
ATE = \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)]
\begin{aligned} ATE &= \mathbb{E}[Y_i(1) - Y_i(0)] \\ &= \mathbb{E}[f_Y(1, \mathbf{U}_i) - f_Y(0, \mathbf{U}_i)] \\ &= \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)] \end{aligned}
Wright (1920, fig. 1)
Wright (1934)
Elwert and Winship (2014)
Pearl (2009)
A path between X and Y is blocked given a conditioning set Z if it contains:
| a non-collider (fork or chain) that is in Z |
| a collider that is not in Z (and has no descendant in Z) |
X and Y are d-separated given Z if every path between them is blocked.
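The two blocking rules can be turned into a small checker. This is a sketch for tiny graphs only: it enumerates every simple path, which does not scale, and the function names and edge-list representation are assumptions made for illustration.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, x, y):
    """All simple paths between x and y, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == y:
            yield path
            return
        for nxt in sorted(nbrs.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([x])

def is_blocked(edges, path, z):
    """Apply the two blocking rules to every interior node of the path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            # A fork or chain node blocks when it is conditioned on.
            return True
    return False

def d_separated(edges, x, y, z=()):
    z = set(z)
    return all(is_blocked(edges, p, z) for p in simple_paths(edges, x, y))

fork = [("Z", "X"), ("Z", "Y")]
collider = [("X", "C"), ("Y", "C"), ("C", "D")]
print(d_separated(fork, "X", "Y"), d_separated(fork, "X", "Y", {"Z"}))
print(d_separated(collider, "X", "Y"), d_separated(collider, "X", "Y", {"D"}))
```

Conditioning on the fork's common cause closes the path, while conditioning on the collider or its descendant opens one.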
Graph surgery — physically cut arrows into X
Backdoor adjustment — statistically neutralize them
P(Y \mid do(X = x)) = \sum_z P(Y \mid X = x, Z = z) \; P(Z = z)
Left side: causal (rung 2). Right side: observable (rung 1).
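Applied to the kidney stone counts from Charig et al. (1986), the adjustment formula reduces to a weighted average. This is a sketch that assumes stone size is the only backdoor variable, as in the DAG from the first example; the variable names are illustrative.

```python
# (successes, patients) per stone size z and treatment x, from the kidney stone table.
counts = {
    "Small": {"A": (81, 87), "B": (234, 270)},
    "Large": {"A": (192, 263), "B": (55, 80)},
}

n_total = sum(n for arms in counts.values() for _, n in arms.values())

# P(Z = z): share of all patients with each stone size, regardless of treatment.
p_z = {z: sum(n for _, n in arms.values()) / n_total for z, arms in counts.items()}

def p_success_do(treatment):
    """Sum over z of P(success | X = treatment, Z = z) * P(Z = z)."""
    total = 0.0
    for z, arms in counts.items():
        successes, n = arms[treatment]
        total += (successes / n) * p_z[z]
    return total

adjusted = {t: p_success_do(t) for t in ("A", "B")}
print({t: round(p, 3) for t, p in adjusted.items()})
```

Weighting both treatments by the same stone-size distribution gives roughly 83% for A and 78% for B, which reverses the pooled comparison and agrees with the stratified one.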
D’Agostino McGowan et al. (2024)
moritzketzer.github.io/iqb-workshop
Open causal-quartet.R
Breakout rooms in pairs (~30 min), then 5 min break before debrief.
Adapted from Bareinboim et al. (2022, fig. 27.4)
Kline (2015)