Causal Discovery from Observational Data
Takeaway
Under assumptions (causal sufficiency, faithfulness, acyclicity), algorithms can infer aspects of causal structure from observational data via conditional independencies or scores.
The problem (before → after)
- Before: Correlation alone doesn’t reveal direction or confounding.
- After: Conditional independence patterns restrict DAGs; scores and interventions refine orientation.
Mental model first
It’s detective work: alibis (independencies) rule out suspects (edges); remaining orientations follow from logic plus minimality assumptions.
Just-in-time concepts
- PC/FCI (constraint-based); GES (score-based); LiNGAM (non-Gaussian).
- Markov equivalence and CPDAGs; PAGs when latent confounders are allowed (a small equivalence demo follows this list).
- Intervention and invariance strengthen identification.
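To see why only an equivalence class is identifiable, here is a toy simulation sketch (synthetic data; the helper partial_corr_xz_given_y is illustrative, not a library call): a chain and a fork imply the same conditional independence, X independent of Z given Y, while a collider does not.

```python
# Toy demonstration: the chain X -> Y -> Z and the fork X <- Y -> Z imply the
# same conditional independence (X _||_ Z | Y), while the collider
# X -> Y <- Z does not, so CI tests alone cannot tell chain from fork.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

def partial_corr_xz_given_y(x, y, z):
    """Correlation between the residuals of x ~ y and z ~ y (a simple CI proxy)."""
    rx = x - np.polyval(np.polyfit(y, x, 1), y)
    rz = z - np.polyval(np.polyfit(y, z, 1), y)
    return np.corrcoef(rx, rz)[0, 1]

# Chain: X -> Y -> Z
x = rng.normal(size=n); y = x + rng.normal(size=n); z = y + rng.normal(size=n)
print("chain    rho_XZ|Y ~ 0 :", round(partial_corr_xz_given_y(x, y, z), 3))

# Fork: X <- Y -> Z
y = rng.normal(size=n); x = y + rng.normal(size=n); z = y + rng.normal(size=n)
print("fork     rho_XZ|Y ~ 0 :", round(partial_corr_xz_given_y(x, y, z), 3))

# Collider: X -> Y <- Z (conditioning on Y induces dependence)
x = rng.normal(size=n); z = rng.normal(size=n); y = x + z + rng.normal(size=n)
print("collider rho_XZ|Y != 0:", round(partial_corr_xz_given_y(x, y, z), 3))
```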
First-pass solution
Test conditional independencies to build an undirected skeleton; orient edges via v-structures and propagation rules; or, in the score-based approach, search DAG space for a penalized-likelihood optimum (e.g., BIC).
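A minimal constraint-based sketch under the linear-Gaussian, no-hidden-confounder assumptions above (function names are illustrative, not from any library): Fisher-z tests on partial correlations build the skeleton, then unshielded colliders are oriented as v-structures.

```python
# Minimal constraint-based sketch (not a full PC implementation): Fisher-z CI
# tests on partial correlations build the skeleton, then unshielded colliders
# are oriented as v-structures.
from itertools import combinations
import numpy as np
from scipy import stats

def fisher_z_independent(data, i, j, cond, alpha=0.01):
    """Test X_i _||_ X_j | X_cond via partial correlation and Fisher's z."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = np.clip(-prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1]), -0.999999, 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat)) > alpha   # True -> treat as independent

def skeleton_and_v_structures(data, max_cond=2, alpha=0.01):
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    # Skeleton: drop edge i-j if some small conditioning set separates them.
    for size in range(max_cond + 1):
        for i in range(p):
            for j in list(adj[i]):
                if j < i:
                    continue
                others = sorted((adj[i] | adj[j]) - {i, j})
                for cond in combinations(others, size):
                    if fisher_z_independent(data, i, j, cond, alpha):
                        adj[i].discard(j); adj[j].discard(i)
                        sepset[(i, j)] = set(cond)
                        break
    # V-structures: orient i -> k <- j when i-k and j-k are edges, i and j are
    # non-adjacent, and k is NOT in the set that separated i from j.
    arrows = set()
    for k in range(p):
        for i, j in combinations(sorted(adj[k]), 2):
            if j not in adj[i] and k not in sepset.get((i, j), set()):
                arrows.add((i, k)); arrows.add((j, k))
    return adj, arrows

# Toy data from the collider 0 -> 2 <- 1.
rng = np.random.default_rng(1)
n = 20_000
x0, x1 = rng.normal(size=n), rng.normal(size=n)
x2 = x0 + x1 + rng.normal(size=n)
adj, arrows = skeleton_and_v_structures(np.column_stack([x0, x1, x2]))
print("skeleton:", adj)                     # expect edges 0-2 and 1-2, but not 0-1
print("arrowheads (tail, head):", arrows)   # expect {(0, 2), (1, 2)}
```

A full PC implementation restricts conditioning sets to the current neighbors of each endpoint and adds Meek's rules to propagate orientations beyond v-structures.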
Iterative refinement
- Latent confounders: FCI and related algorithms, which output partial ancestral graphs (PAGs).
- Nonlinear or non-Gaussian models can identify direction (ANM, LiNGAM); a toy sketch follows this list.
- Invariant causal prediction exploits mechanisms that stay stable across environments.
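As a toy illustration of the direction asymmetry (linear mechanism, non-Gaussian noise, both synthetic), the sketch below fits a regression in each direction and compares how dependent the residual is on the candidate cause; the absolute-value correlation used here is a crude stand-in for the kernel or mutual-information measures that real LiNGAM/ANM implementations use.

```python
# Toy sketch of the LiNGAM/ANM intuition for a single pair: in the causal
# direction the regression residual is independent of the cause, in the
# anti-causal direction it is not (detectable only with non-Gaussian data).
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(-1, 1, n)                 # non-Gaussian cause
y = 1.5 * x + rng.uniform(-1, 1, n)       # linear mechanism, non-Gaussian noise

def residual_dependence(cause, effect):
    """Fit effect ~ cause by OLS, return a crude dependence score between
    the candidate cause and the regression residual (0 ~ independent)."""
    slope, intercept = np.polyfit(cause, effect, 1)
    resid = effect - (slope * cause + intercept)
    return abs(np.corrcoef(np.abs(cause), np.abs(resid))[0, 1])

print("x -> y score:", round(residual_dependence(x, y), 3))   # near 0: residual independent of x
print("y -> x score:", round(residual_dependence(y, x), 3))   # clearly larger: wrong direction
# The direction with the more independent residual is preferred: here x -> y.
```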
Principles, not prescriptions
- Combine multiple sources of evidence: independencies, distributional asymmetries, interventions, and invariance (an invariance check is sketched below).
- Beware finite-sample errors in CI tests; early mistakes propagate into the skeleton and every later orientation.
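The invariance check can be sketched with a hypothetical two-environment setup: the environment shifts the cause, the target's mechanism stays fixed, and regressing the target on a true parent leaves residuals whose distribution does not depend on the environment. Real invariant causal prediction searches over predictor subsets and uses more careful tests than the single t-test on residual means below.

```python
# Invariance-check sketch in the spirit of invariant causal prediction,
# using synthetic two-environment data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 10_000

def simulate(env_shift):
    x = rng.normal(env_shift, 1, n)       # environment shifts the cause X
    y = x + rng.normal(0, 1, n)           # stable mechanism: Y = X + noise
    z = y + rng.normal(0, 1, n)           # Z is a child of Y, not a cause
    return x, y, z

x1, y1, z1 = simulate(0.0)                # environment 1
x2, y2, z2 = simulate(2.0)                # environment 2

def residual_shift_pvalue(pred1, tgt1, pred2, tgt2):
    """Pool both environments, regress target on predictor, then test whether
    residual means differ across environments (low p-value = not invariant)."""
    pred = np.concatenate([pred1, pred2]); tgt = np.concatenate([tgt1, tgt2])
    slope, intercept = np.polyfit(pred, tgt, 1)
    resid = tgt - (slope * pred + intercept)
    return stats.ttest_ind(resid[:n], resid[n:]).pvalue

print("Y ~ X (true parent) p =", residual_shift_pvalue(x1, y1, x2, y2))  # large: invariant
print("Y ~ Z (child of Y)  p =", residual_shift_pvalue(z1, y1, z2, y2))  # tiny: not invariant
```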
Common pitfalls
- Violating assumptions (e.g., hidden confounding or an unfaithful distribution) can silently mislead discovery.
- Overconfidence: the output is an equivalence class (a CPDAG or PAG), not a single DAG.
Connections and contrasts
- See also: [/blog/causal-inference-do-calculus], [/blog/causal-trees], [/blog/double-ml].
Quick checks
- What’s a CPDAG? — A completed partially directed acyclic graph representing all DAGs in a Markov equivalence class (same skeleton and v-structures, hence the same CI relations).
- Why does non-Gaussianity help? — It breaks the symmetry between the two causal directions in linear models, making orientation identifiable.
- How to validate? — Interventional tests or invariance across environments.
Further reading
- Spirtes, Glymour, and Scheines, Causation, Prediction, and Search; Peters, Janzing, and Schölkopf, Elements of Causal Inference.