Heterogeneous Treatment Effects with Causal Trees
Takeaway
Causal trees partition the covariate space to estimate conditional average treatment effects (CATE), using honest sample splitting to avoid overfitting.
The problem (before → after)
- Before: Average treatment effects hide who benefits and who is harmed; naive trees fit noise in the estimated effects.
- After: Honest splitting and cross-fitting produce valid CATE estimates with quantified uncertainty.
Mental model first
Think of gardening: you divide a field into plots (leaves) where the same fertilizer (treatment) has a similar effect. Using one sample to choose the plot boundaries and a separate sample to measure the effect keeps you from fooling yourself.
Just-in-time concepts
- CATE τ(x) = E[Y(1) − Y(0) | X=x].
- Honesty: Use one sample to choose splits, a disjoint sample to estimate effects.
- Splitting criteria: Maximize treatment-effect heterogeneity while controlling variance (a concrete criterion follows this list).
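For concreteness, the honest splitting criterion in Athey & Imbens (2016) makes this trade-off explicit. Paraphrasing their notation, with splitting sample S^tr of size N^tr, estimation sample size N^est, partition Π, treated share p, and within-leaf sample variances S², the quantity to maximize is:

```latex
-\widehat{\mathrm{EMSE}}_{\tau}(\Pi)
  = \frac{1}{N^{\mathrm{tr}}} \sum_{i \in S^{\mathrm{tr}}} \hat{\tau}^{2}(X_i;\, S^{\mathrm{tr}}, \Pi)
  - \left(\frac{1}{N^{\mathrm{tr}}} + \frac{1}{N^{\mathrm{est}}}\right)
    \sum_{\ell \in \Pi} \left( \frac{S^{2}_{\mathrm{treat}}(\ell)}{p} + \frac{S^{2}_{\mathrm{control}}(\ell)}{1 - p} \right)
```

The first term rewards heterogeneity in the estimated effects across leaves; the second penalizes leaves whose effect estimates will be noisy on the estimation sample.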
First-pass solution
Grow a tree on a splitting sample to find the partition; estimate τ̂ within each leaf on a disjoint estimation sample; prune via cross-validation; report uncertainty via leaf-level variance (see the sketch below).
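A minimal sketch of this recipe, assuming a randomized treatment with known propensity p = 0.5 and using the transformed-outcome trick (a regression tree fit to Y* = Y(T − p)/(p(1 − p)), whose conditional mean is τ(x)) as a stand-in for the dedicated causal-tree splitting criterion; this is not Athey & Imbens' exact algorithm:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic randomized experiment: the true effect depends on X[:, 0].
n = 4000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, size=n)           # known propensity p = 0.5
tau = np.where(X[:, 0] > 0, 2.0, 0.0)      # true heterogeneous effect
Y = X[:, 1] + tau * T + rng.normal(size=n)

# Honesty: one half chooses the partition, the other estimates effects.
X_sp, X_est, T_sp, T_est, Y_sp, Y_est = train_test_split(
    X, T, Y, test_size=0.5, random_state=0)

# Transformed outcome Y* = Y * (T - p) / (p * (1 - p)) satisfies
# E[Y* | X] = tau(X), so a regression tree on Y* targets heterogeneity.
p = 0.5
Y_star = Y_sp * (T_sp - p) / (p * (1 - p))
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=100, random_state=0)
tree.fit(X_sp, Y_star)

# Re-estimate tau in each leaf on the held-out half: difference in means,
# with a standard error from the two-sample variance formula.
leaves = tree.apply(X_est)
for leaf in np.unique(leaves):
    m = leaves == leaf
    y1, y0 = Y_est[m & (T_est == 1)], Y_est[m & (T_est == 0)]
    if len(y1) < 2 or len(y0) < 2:
        continue  # skip degenerate leaves; variance needs >= 2 per arm
    tau_hat = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    print(f"leaf {leaf}: tau_hat = {tau_hat:+.2f} (se {se:.2f}, n = {m.sum()})")
```

The honest step is the last loop: the tree only supplies leaf membership, and every number reported comes from data it never saw during splitting.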
Iterative refinement
- Causal forests average many honest trees for stability (see the sketch after this list).
- Doubly robust estimation improves efficiency.
- Policy learning selects treatments to maximize outcomes subject to constraints.
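As a sketch of the forest refinement, assuming the econml package is available (its CausalForestDML combines honest forests with cross-fitted nuisance models in the spirit of double ML):

```python
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, size=n)
Y = X[:, 1] + np.where(X[:, 0] > 0, 2.0, 0.0) * T + rng.normal(size=n)

# Honest causal forest; model_y and model_t are cross-fitted nuisances.
est = CausalForestDML(
    model_y=RandomForestRegressor(min_samples_leaf=20, random_state=0),
    model_t=RandomForestClassifier(min_samples_leaf=20, random_state=0),
    discrete_treatment=True,
    n_estimators=500,
    random_state=0,
)
est.fit(Y, T, X=X)

tau_hat = est.effect(X)                       # pointwise CATE estimates
lb, ub = est.effect_interval(X, alpha=0.05)   # 95% intervals
print(tau_hat[:5], lb[:5], ub[:5])
```

The interval output is what averaging buys you: single honest trees give noisy leaf estimates, while the forest supports pointwise confidence intervals.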
Principles, not prescriptions
- Separate model selection from estimation to maintain validity.
- Prefer simple, interpretable partitions when stakes are high.
Common pitfalls
- Data leakage between split and estimate sets.
- Sparse leaves inflate variance; enforce minimum leaf sizes and prune aggressively (see the formula below).
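To see why, the standard two-sample variance estimate for a leaf ℓ with n₁ treated and n₀ control units is:

```latex
\widehat{\mathrm{Var}}\big(\hat{\tau}(\ell)\big)
  = \frac{S^{2}_{\mathrm{treat}}(\ell)}{n_{1}(\ell)} + \frac{S^{2}_{\mathrm{control}}(\ell)}{n_{0}(\ell)}
```

A leaf with only a handful of treated or control units can have a standard error larger than any plausible effect.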
Connections and contrasts
- See also: [/blog/double-ml], [/blog/multi-armed-bandits], [/blog/simpsons-paradox].
Quick checks
- Why honesty? — Prevents adaptive overfitting of effects.
- What to split on? — Criteria targeting heterogeneity with variance control.
- Why forests? — Reduce variance by averaging many honest trees.
Further reading
- Athey & Imbens, 2016 (source above)
- Wager & Athey, causal forests