Introduction to Graphical Causal Inference

Getting Started

Moritz Ketzer

Humboldt-Universität zu Berlin

March 9, 2026

Welcome

Workshop Website and RStudio Server Setup

moritzketzer.github.io/iqb-workshop

Why bother with Causal Inference?

Messerli (2012)

Descriptive
Predictive
Explanatory

“Spurious Correlations” (n.d.)

“Spurious Scholar” (n.d.)

Shmueli (2010)

Editorial (2021)

Kidney Stones

Two treatments


Which one is better?

Charig et al. (1986)

Stone size   Treatment A        Treatment B
Small        81 / 87   (93%)    234 / 270  (87%)
Large        192 / 263 (73%)    55 / 80    (69%)
Total        273 / 350 (78%)    289 / 350  (83%)

Pearl (2013); Kievit et al. (2013)
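The reversal is easy to verify from the counts themselves. A minimal base-R check, using only the numbers from the Charig et al. table above:

```r
# Recoveries and totals from Charig et al. (1986), by stone size
recov <- matrix(c(81, 234, 192, 55), nrow = 2, byrow = TRUE,
                dimnames = list(c("small", "large"), c("A", "B")))
total <- matrix(c(87, 270, 263, 80), nrow = 2, byrow = TRUE,
                dimnames = list(c("small", "large"), c("A", "B")))

# Within each stratum, Treatment A has the higher recovery rate ...
recov / total

# ... yet pooled over strata, Treatment B appears better
colSums(recov) / colSums(total)
```

Treatment A wins in every row but loses in the column totals: Simpson's paradox in four numbers.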

Why?

Stone size influences both which treatment you get and how well you recover.

✓ Stratify!

Blood pressure
(measured after)   Drug               No drug
Low BP             81 / 87   (93%)    234 / 270  (87%)
High BP            192 / 263 (73%)    55 / 80    (69%)
Total              273 / 350 (78%)    289 / 350  (83%)

✗ Don’t stratify!

Same data. Opposite answers.

The statistics are identical.

Only the causal story changed.

Pearl (2009); Pearl and Mackenzie (2018)

Pearl and Mackenzie (2018, fig. 1.2)

Rung 3: Imagining

You got a dog. You’re happy.

Y_i(\text{dog}) = 1 \quad \checkmark

Would you have been happy without the dog?

Y_i(\text{no dog}) = \; ?

The Individual Treatment Effect

Rubin (1974)

\tau_i = Y_i(1) - Y_i(0)

We can never observe both for the same person.

The Fundamental Problem of Causal Inference

i   T   Y   Y(1)   Y(0)   \tau_i
1   0   0   ?      0      ?
2   1   1   1      ?      ?
3   1   0   0      ?      ?
4   0   0   ?      0      ?
5   0   1   ?      1      ?
6   1   1   1      ?      ?

Holland (1986)

From Individual to Average

We can’t know \tau_i for any single person.

But we can ask about the average across many people:

\begin{aligned} ATE &= \mathbb{E}[Y(1) - Y(0)] \\ &= \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] \end{aligned}
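In a simulation, unlike in real data, we can generate both potential outcomes for every unit and evaluate the expectation above directly. A sketch with illustrative (assumed) values, where the individual effects vary around 2:

```r
set.seed(1)
n  <- 1e5
U  <- rnorm(n)                     # background factors
Y0 <- U                            # potential outcome without treatment
Y1 <- U + 2 + rnorm(n, sd = 0.5)   # individual effects scatter around 2
tau_i <- Y1 - Y0                   # observable only because we simulated
mean(tau_i)                        # sample analogue of the ATE, ~ 2
```

In real data one of `Y1` and `Y0` is always missing for each unit; the simulation only works because we play omniscient.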

Rung 3 → Rung 2

Observational

\mathbb{E}[Y \mid X = x]

“What Y do we see when X = x?”

Interventional

\mathbb{E}[Y \mid \text{do}(X = x)]

“What Y do we get when we set X = x?”

When are these equal?

Structural Causal Models

Pearl et al. (2016)

\mathcal{M} = \langle U, V, F, P(U) \rangle

V endogenous variables (what we model)
U exogenous variables (background factors, error terms)
F structural functions (how each V is determined by its parents + U)
P(U) probability distribution over the exogenous variables

An SCM Example

Structural Equations

\begin{aligned} Z &\leftarrow f_Z(U_Z) \\ X &\leftarrow f_X(Z, U_X) \\ Y &\leftarrow f_Y(Z, X, U_Y) \end{aligned}

Distributional Assumptions

U_Z, U_X, U_Y \overset{\text{ind.}}{\sim} P(U)

Graph Surgery

Structural Equations

\begin{aligned} Z &\leftarrow f_Z(U_Z) \\ X &\leftarrow \color{#107895}{x} \\ Y &\leftarrow f_Y(Z, \textcolor{#107895}{x}, U_Y) \end{aligned}

Distributional Assumptions

U_Z, \textcolor{lightgray}{U_X}, U_Y \overset{\text{ind.}}{\sim} P(U)

A Linear SCM

Structural Equations

\begin{aligned} Z &= U_Z \\ X &= \beta^{XZ} Z + U_X \\ Y &= \beta^{YX} X + \beta^{YZ} Z + U_Y \end{aligned}

Distributional Assumptions

\begin{pmatrix} U_Z \\ U_X \\ U_Y \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_Z & 0 & 0 \\ 0 & \sigma^2_X & 0 \\ 0 & 0 & \sigma^2_Y \end{pmatrix}\right)
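This linear SCM is straightforward to simulate; the coefficient values below are illustrative assumptions, not from the slides. With the backdoor path X ← Z → Y open, a naive regression of Y on X is biased, while adjusting for Z recovers \beta^{YX}:

```r
set.seed(42)
n <- 1e5
b_XZ <- 0.5; b_YX <- 1.0; b_YZ <- 2.0   # illustrative coefficients

Z <- rnorm(n)
X <- b_XZ * Z + rnorm(n)
Y <- b_YX * X + b_YZ * Z + rnorm(n)

naive <- coef(lm(Y ~ X))["X"]       # biased: absorbs the X <- Z -> Y path
adj   <- coef(lm(Y ~ X + Z))["X"]   # recovers b_YX
c(naive = naive, adjusted = adj)
```

With these values the naive slope converges to b_YX + b_YZ · Cov(X, Z)/Var(X) = 1 + 2 · 0.5/1.25 = 1.8, not 1.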

Graph Surgery: Linear Case

Structural Equations

\begin{aligned} Z &= U_Z \\ X &= \color{#107895}{x} \\ Y &= \beta^{YX} \textcolor{#107895}{x} + \beta^{YZ} Z + U_Y \end{aligned}

Distributional Assumptions

\begin{pmatrix} U_Z \\ \textcolor{lightgray}{U_X} \\ U_Y \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ \textcolor{lightgray}{0} \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_Z & 0 & 0 \\ 0 & \textcolor{lightgray}{\sigma^2_X} & 0 \\ 0 & 0 & \sigma^2_Y \end{pmatrix}\right)
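Graph surgery can be mimicked in simulation by overwriting the structural equation for X with a constant, exactly as in the mutilated model above (illustrative coefficients assumed):

```r
set.seed(7)
n <- 1e5
b_YX <- 1.0; b_YZ <- 2.0   # illustrative coefficients

sim_do <- function(x) {    # surgery: X is set to x, not generated
  Z <- rnorm(n)
  Y <- b_YX * x + b_YZ * Z + rnorm(n)
  mean(Y)                  # estimate of E[Y | do(X = x)]
}

ate_hat <- sim_do(1) - sim_do(0)   # ~ b_YX
ate_hat
```

Because the arrow Z → X has been cut, the contrast between the two interventional means is b_YX itself, free of confounding.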

Estimand in do notation

ATE = \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)]

Estimand in do notation

ATE = \mathbb{E}[Y \mid do(X=x')] - \mathbb{E}[Y \mid do(X=x)]

Two Frameworks, One Estimand


Potential Outcomes (Rubin)

ATE = \mathbb{E}[Y_i(1) - Y_i(0)]


Structural Causal Model (Pearl)

ATE = \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)]

One Estimand, Three Expressions

\begin{aligned} ATE &= \mathbb{E}[Y_i(1) - Y_i(0)] \\ &= \mathbb{E}[f_Y(1, \mathbf{U}_i) - f_Y(0, \mathbf{U}_i)] \\ &= \mathbb{E}[Y \mid do(X=1)] - \mathbb{E}[Y \mid do(X=0)] \end{aligned}

Directed Acyclic Graphs (DAGs)

Wright (1920, fig. 1)

Wright (1934)

The Three Elemental Structures

Fork (Common Cause)

n <- 1000   # sample size, used in all snippets
Z <- runif(n, 0, 10)
X <- 0.6 * Z + rnorm(n, mean = 0, sd = 2)
Y <- 1.0 * X - 2.0 * Z +
  rnorm(n, mean = 0, sd = 1.5)

Chain (Mediator)

tutoring <- rnorm(n, mean = 5, sd = 2)
homework <- rbinom(n, 1,
    plogis(tutoring - 5))
scores <- 20 * homework +
    rnorm(n, mean = 50, sd = 8)


Collider

Elwert and Winship (2014)

talent   <- rnorm(n, mean = 0, sd = 1)
beauty   <- rnorm(n, mean = 0, sd = 1)
celebrity <- talent + beauty

talent    <- rnorm(n, mean = 0, sd = 1)
beauty    <- rnorm(n, mean = 0, sd = 1)
celebrity <- talent + beauty
# Top 15% become famous
celeb <- celebrity > quantile(celebrity, 0.85)


Collider (Descendant)

talent    <- rnorm(n, mean = 0, sd = 1)
beauty    <- rnorm(n, mean = 0, sd = 1)
celebrity <- talent + beauty
followers <- 0.8 * celebrity +
    rnorm(n, mean = 0, sd = 1.2)

Putting It Together

D-Separation

Pearl (2009)

Directed Separation

A path between X and Y is blocked given a conditioning set Z if it contains:

a non-collider (fork or chain) that is in Z
a collider that is not in Z (and has no descendant in Z)

X and Y are d-separated given Z if every path between them is blocked.
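The first rule can be checked numerically with a pure fork (assumed coefficients of 1; no arrow between X and Y). Marginally the path X ← Z → Y is open; conditioning on Z, here via residualization, blocks it:

```r
set.seed(11)
n <- 1e5
Z <- rnorm(n)
X <- Z + rnorm(n)   # fork: Z is a common cause of X and Y
Y <- Z + rnorm(n)   # no direct arrow between X and Y

r_marg <- cor(X, Y)                                  # ~ 0.5: path open
r_part <- cor(resid(lm(X ~ Z)), resid(lm(Y ~ Z)))    # ~ 0:   path blocked by Z
c(marginal = r_marg, given_Z = r_part)
```

So X and Y are dependent, but d-separated (hence independent) given Z, exactly as the rules predict.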

From Surgery to Statistics

Graph surgery — physically cut arrows into X

Backdoor adjustment — statistically neutralize them

P(Y \mid do(X = x)) = \sum_z P(Y \mid X = x, Z = z) \; P(Z = z)

Left side: causal (rung 2). Right side: observable (rung 1).
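Applied to the kidney-stone counts from earlier, the backdoor formula reduces to simple arithmetic, weighting each stratum-specific recovery rate by the marginal stone-size distribution:

```r
# P(z): stone-size distribution over all 700 patients
p_small <- (87 + 270) / 700
p_large <- 1 - p_small

# Backdoor adjustment: sum_z P(recovery | treatment, z) * P(z)
adj_A <- (81 / 87)   * p_small + (192 / 263) * p_large
adj_B <- (234 / 270) * p_small + (55 / 80)   * p_large
c(A = adj_A, B = adj_B)   # A > B once stone size is adjusted for
```

The adjusted rates (about 83% for A vs 78% for B) reverse the naive pooled comparison, matching the stratified table.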

Exercise: The Causal Quartet

D’Agostino McGowan et al. (2024)

moritzketzer.github.io/iqb-workshop

Open causal-quartet.R

Breakout rooms in pairs (~30 min), then 5 min break before debrief.

Adapted from Bareinboim et al. (2022, fig. 27.4)

Can you prove mediation?

Kline (2015)

Today

Tomorrow

Optional: DAG your Research

References

Bareinboim, Elias, Juan D. Correa, Duligur Ibeling, and Thomas Icard. 2022. “On Pearl’s Hierarchy and the Foundations of Causal Inference.” In Probabilistic and Causal Inference, 1st ed., edited by Hector Geffner, Rina Dechter, and Joseph Y. Halpern. ACM. https://doi.org/10.1145/3501714.3501743.
Charig, C. R., D. R. Webb, S. R. Payne, and J. E. Wickham. 1986. “Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy.” Br Med J (Clin Res Ed) 292 (6524): 879–82. https://doi.org/10.1136/bmj.292.6524.879.
D’Agostino McGowan, Lucy, Travis Gerke, and Malcolm Barrett. 2024. “Causal Inference Is Not Just a Statistics Problem.” Journal of Statistics and Data Science Education 32 (2): 150–55. https://doi.org/10.1080/26939169.2023.2276446.
Editorial. 2021. “Description, Prediction, Explanation.” Nature Human Behaviour 5 (10): 1261. https://doi.org/10.1038/s41562-021-01230-5.
Elwert, Felix, and Christopher Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40 (July): 31–53. https://doi.org/10.1146/annurev-soc-071913-043455.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60. https://doi.org/10.1080/01621459.1986.10478354.
Kievit, Rogier A., Willem E. Frankenhuis, Lourens J. Waldorp, and Denny Borsboom. 2013. “Simpson’s Paradox in Psychological Science: A Practical Guide.” Frontiers in Psychology 4: 513. https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00513/full.
Kline, Rex B. 2015. “The Mediation Myth.” Basic and Applied Social Psychology 37 (4): 202–13. https://doi.org/10.1080/01973533.2015.1049349.
Messerli, Franz H. 2012. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” New England Journal of Medicine 367 (16): 1562–64. https://doi.org/10.1056/NEJMon1211064.
Neal, Brady. n.d. “Introduction to Causal Inference.” Accessed March 7, 2026. https://www.bradyneal.com/causal-inference-course.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. Second edition, reprinted with corrections. Cambridge University Press.
Pearl, Judea. 2013. “Understanding Simpson’s Paradox.” SSRN Electronic Journal, ahead of print. https://doi.org/10.2139/ssrn.2343788.
Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. 2016. Causal Inference in Statistics: A Primer. 1st ed. Wiley.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. First edition. Basic Books.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688. https://psycnet.apa.org/record/1975-06502-001.
Shmueli, Galit. 2010. “To Explain or to Predict?” Statistical Science 25 (3): 289–310. https://www.jstor.org/stable/41058949.
“Spurious Correlations.” n.d. Accessed February 27, 2026. https://tylervigen.com/spurious-correlations.
“Spurious Scholar.” n.d. Accessed February 27, 2026. https://tylervigen.com/spurious-scholar.
Wright, Sewall. 1920. “The Relative Importance of Heredity and Environment in Determining the Piebald Pattern of Guinea-Pigs.” Proceedings of the National Academy of Sciences 6 (6): 320–32. https://doi.org/10.1073/pnas.6.6.320.
Wright, Sewall. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics 5 (3): 161–215. https://www.jstor.org/stable/2957502.