Takeaway

Double ML combines Neyman-orthogonal moment conditions with cross-fitting to estimate a low-dimensional causal or structural parameter while controlling the bias that flows from high-dimensional nuisance functions learned by ML.

The problem (before → after)

  • Before: Plugging ML estimates of nuisance functions straight into a standard moment condition transmits their regularization bias and slow convergence to the target parameter.
  • After: Orthogonalize the moment condition so first-order errors in the nuisance estimates cancel, and cross-fit so each observation is scored with nuisances fit on other folds, avoiding overfitting bias.

Mental model first

Imagine balancing a scale with noisy weights; by arranging the pans (moments) so that small errors cancel, you still read the true mass (parameter) accurately.

Just-in-time concepts

  • Neyman orthogonality: the score is first-order insensitive to the nuisance, ∂η E[ψ(W; θ₀, η)] |η=η₀ = 0, a Gateaux (directional) derivative evaluated at the true (θ₀, η₀); the partially linear case is written out after this list.
  • Cross-fitting: split the data into K folds; fit the nuisances on K−1 folds; evaluate the moments on the held-out fold; rotate through all folds.
  • Asymptotic normality: θ̂ is √n-consistent and asymptotically normal under regularity conditions (roughly, nuisance estimators converging faster than n^(−1/4)), so standard confidence intervals are valid.
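To make orthogonality concrete, here is the standard partially-linear-model score from Chernozhukov et al. (2018), with a one-line check of the condition; the (ℓ, m) notation for the two nuisance regressions is a common convention, and the derivation is a sketch rather than anything stated above.

```latex
% Partially linear model: Y = \theta_0 D + g_0(X) + U,  D = m_0(X) + V,
% with E[U | X, D] = 0 and E[V | X] = 0.
% Nuisance \eta = (\ell, m); at the truth, \ell_0(X) = E[Y|X] and m_0(X) = E[D|X].
\psi(W; \theta, \eta) = \bigl( Y - \ell(X) - \theta\,(D - m(X)) \bigr)\,\bigl( D - m(X) \bigr)

% Orthogonality check for \ell: perturb \ell \to \ell_0 + t\,\delta, differentiate at t = 0:
\frac{d}{dt}\, \mathbb{E}\bigl[ \psi(W; \theta_0, (\ell_0 + t\delta,\, m_0)) \bigr] \Big|_{t=0}
  = -\,\mathbb{E}\bigl[ \delta(X)\,(D - m_0(X)) \bigr]
  = -\,\mathbb{E}\bigl[ \delta(X)\, \mathbb{E}[V \mid X] \bigr] = 0.
% The symmetric calculation for perturbations of m also gives 0, using
% Y - \ell_0(X) - \theta_0 (D - m_0(X)) = U and E[U | X] = 0.
```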

First-pass solution

Estimate the nuisance functions (the outcome regression E[Y | X] and the treatment/propensity model E[D | X]) with flexible ML; form the Neyman-orthogonal score; solve the empirical moment condition for θ̂ using cross-fitted scores; compute standard errors from the influence function.
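A minimal sketch of that recipe for the partially linear model, assuming scikit-learn is available; `dml_plm`, the toy data, and the default random-forest learners are illustrative choices, not something prescribed by the source:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

def dml_plm(Y, D, X, learner_y=None, learner_d=None, n_folds=5, seed=0):
    """Cross-fitted DML for the partially linear model; returns (theta_hat, se)."""
    learner_y = learner_y or RandomForestRegressor(random_state=seed)
    learner_d = learner_d or RandomForestRegressor(random_state=seed)
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Cross-fitting: every prediction comes from a model fit on the other folds.
    ell_hat = cross_val_predict(learner_y, X, Y, cv=cv)  # estimate of E[Y | X]
    m_hat = cross_val_predict(learner_d, X, D, cv=cv)    # estimate of E[D | X]
    U, V = Y - ell_hat, D - m_hat                        # residualize both
    # Solve the empirical orthogonal moment E[(U - theta*V) * V] = 0 for theta.
    theta = np.mean(U * V) / np.mean(V * V)
    # Influence-function standard error: sqrt(E[psi^2] / J^2 / n) with J = E[V^2].
    psi = (U - theta * V) * V
    J = np.mean(V * V)
    se = np.sqrt(np.mean(psi**2) / (J**2 * len(Y)))
    return theta, se

# Toy check: true effect 0.5, confounding through sin(X[:, 0]).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
D = np.sin(X[:, 0]) + rng.normal(size=n)
Y = 0.5 * D + np.sin(X[:, 0]) + rng.normal(size=n)
theta_hat, se = dml_plm(Y, D, X)  # theta_hat should land near 0.5
```

The load-bearing choice is `cross_val_predict`, which only ever predicts on the fold a model was not trained on, exactly the fit/score separation the recipe calls for.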

Iterative refinement

  1. High-dimensional controls: Lasso, gradient boosting, or random forests for the nuisance regressions (learner swaps are sketched after this list).
  2. Heterogeneous effects via orthogonalization within strata or subgroups of interest.
  3. Debiased ML for other targets (ATE with binary treatment, IV/LATE, policy value).
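Reusing the `dml_plm` sketch above, swapping nuisance learners is one argument per model; the particular estimators here are illustrative assumptions:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV

# Sparse, high-dimensional controls: Lasso for both nuisances.
theta_l, se_l = dml_plm(Y, D, X, learner_y=LassoCV(), learner_d=LassoCV())

# Smooth nonlinear nuisances: gradient boosting.
theta_b, se_b = dml_plm(Y, D, X,
                        learner_y=GradientBoostingRegressor(random_state=0),
                        learner_d=GradientBoostingRegressor(random_state=0))
```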

Principles, not prescriptions

  • Build estimators robust to small nuisance errors.
  • Separate fitting and estimating stages to prevent overfitting bias.

Common pitfalls

  • Violating overlap/positivity: propensities near 0 or 1 blow up the variance of the orthogonal score (a quick diagnostic follows this list).
  • Using the same data fold to fit nuisances and to evaluate scores, which reintroduces the overfitting bias cross-fitting exists to remove.
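One way to eyeball overlap, continuing the toy example above with an illustrative binarized treatment; the 0.01/0.99 trimming thresholds are a common ad hoc choice, not from the source:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

D_bin = (D > 0).astype(int)  # hypothetical binary treatment for illustration
# Cross-fitted propensity scores e(X) = P(D_bin = 1 | X).
e_hat = cross_val_predict(RandomForestClassifier(random_state=0),
                          X, D_bin, cv=5, method="predict_proba")[:, 1]
print(f"propensity range: [{e_hat.min():.3f}, {e_hat.max():.3f}]")
# Mass piled near 0 or 1 signals weak overlap; trimming (and reporting the
# trimmed share) is a common response.
keep = (e_hat > 0.01) & (e_hat < 0.99)
```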

Connections and contrasts

  • See also: [/blog/causal-trees], [/blog/multi-armed-bandits], [/blog/pac-bayes].

Quick checks

  1. Why orthogonal moments? — So first-order errors in the estimated nuisances do not bias θ̂.
  2. Why cross-fitting? — Scoring on held-out folds removes the adaptive-overfitting bias of in-sample nuisance fits.
  3. What assumptions? — Overlap/positivity, sufficiently fast nuisance convergence (smoothness or sparsity), bounded moments.

Further reading

  • Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018), "Double/Debiased Machine Learning for Treatment and Structural Parameters," The Econometrics Journal 21(1) (the source above)
  • Semiparametric efficiency literature