Dirichlet Processes — Bayesian Nonparametrics
Takeaway
The Dirichlet process defines a prior over discrete distributions with countably infinite support, enabling models to grow complexity with data (e.g., infinite mixture models).
The problem (before → after)
- Before: Fixed-K mixture models force a choice of K up front and risk under- or overfitting.
- After: DP mixtures infer the number of clusters from data with principled uncertainty.
Mental model first
Chinese restaurant process: customers (data points) arrive one at a time and either join an occupied table with probability proportional to the number of customers already seated there, or start a new table with probability proportional to the concentration parameter α; tables correspond to clusters.
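A minimal simulation of this seating process, assuming NumPy; the function name `crp_sample` is an illustrative choice, not part of any library:

```python
# Simulate CRP(alpha) seating for a stream of customers.
import numpy as np

def crp_sample(n_customers, alpha, rng=None):
    """Return table assignments and table sizes after seating n_customers."""
    rng = np.random.default_rng() if rng is None else rng
    assignments = [0]                     # the first customer opens table 0
    table_counts = [1]
    for _ in range(1, n_customers):
        # Occupied tables attract new customers in proportion to their size;
        # a new table is opened with weight alpha.
        weights = np.array(table_counts + [alpha], dtype=float)
        table = int(rng.choice(len(weights), p=weights / weights.sum()))
        if table == len(table_counts):
            table_counts.append(1)        # new table
        else:
            table_counts[table] += 1
        assignments.append(table)
    return assignments, table_counts
```

Running this with larger α produces more, smaller tables; with small α most customers crowd onto a few tables.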
Just-in-time concepts
- DP(α, G₀): stick-breaking β_k ∼ Beta(1, α), π_k = β_k ∏_{j<k} (1−β_j), atoms θ_k ∼ G₀, giving G = ∑_k π_k δ_{θ_k} (sketched in code after this list).
- Exchangeability and Pólya urn representation.
- Gibbs sampling and collapsed inference.
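A short sketch of the stick-breaking construction above, with an illustrative truncation level and a standard-normal base measure standing in for G₀:

```python
# Truncated stick-breaking draw from (approximately) DP(alpha, G0).
import numpy as np

def stick_breaking_weights(alpha, truncation, rng=None):
    """pi_k = beta_k * prod_{j<k} (1 - beta_j), with beta_k ~ Beta(1, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=100, rng=rng)
atoms = rng.standard_normal(100)   # theta_k ~ G0 = N(0, 1), an illustrative base measure
# G ≈ sum_k weights[k] * delta(atoms[k]) — an (almost surely) discrete draw
```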
First-pass solution
Define a DP mixture: assign data points to clusters via the CRP, draw each cluster's parameters from the base measure G₀, alternate sampling of assignments and parameters (Gibbs), and compute predictive densities by averaging over posterior samples.
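A hedged sketch of that loop for 1-D Gaussian data with known variance and a conjugate normal base measure, in the spirit of collapsed Gibbs sampling (Neal 2000); the hyperparameters `sigma2`, `mu0`, `tau0_2` and all names are illustrative assumptions:

```python
# Collapsed Gibbs for a DP mixture of 1-D Gaussians (known variance).
import numpy as np

def predictive_logpdf(x, total, count, sigma2=1.0, mu0=0.0, tau0_2=10.0):
    """Log predictive density of x under a cluster with sufficient stats (total, count)."""
    post_var = 1.0 / (1.0 / tau0_2 + count / sigma2)
    post_mean = post_var * (mu0 / tau0_2 + total / sigma2)
    var = post_var + sigma2
    return -0.5 * (np.log(2 * np.pi * var) + (x - post_mean) ** 2 / var)

def dp_mixture_gibbs(x, alpha=1.0, n_iters=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    z = np.zeros(n, dtype=int)                    # start with everyone in one cluster
    sums, counts = {0: float(x.sum())}, {0: n}
    for _ in range(n_iters):
        for i in range(n):
            k = z[i]                              # unseat point i
            sums[k] -= x[i]
            counts[k] -= 1
            if counts[k] == 0:
                del sums[k], counts[k]
            labels = list(counts)
            # Existing clusters weighted by size, a new cluster weighted by alpha,
            # each times its predictive density for x[i].
            log_w = np.array(
                [np.log(counts[c]) + predictive_logpdf(x[i], sums[c], counts[c]) for c in labels]
                + [np.log(alpha) + predictive_logpdf(x[i], 0.0, 0)]
            )
            w = np.exp(log_w - log_w.max())
            choice = int(rng.choice(len(w), p=w / w.sum()))
            k = max(counts, default=-1) + 1 if choice == len(labels) else labels[choice]
            z[i] = k
            sums[k] = sums.get(k, 0.0) + x[i]
            counts[k] = counts.get(k, 0) + 1
    return z
```

Predictive densities for new points can then be averaged over the sampled assignments and cluster statistics.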
Iterative refinement
- Hierarchical DPs share components across groups.
- Truncation and variational inference for scalability.
- Pitman–Yor processes favor power-law cluster sizes.
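As a sketch of the Pitman–Yor point, only the seating rule changes: a discount d is subtracted from each table's weight and added, once per occupied table, to the new-table weight (names and defaults are illustrative; α > −d, 0 ≤ d < 1):

```python
# Two-parameter (Pitman-Yor) seating rule; d > 0 yields power-law table sizes.
import numpy as np

def pitman_yor_sample(n_customers, alpha, d, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    table_counts = [1]                                  # first customer opens a table
    for _ in range(1, n_customers):
        k = len(table_counts)
        weights = np.array([c - d for c in table_counts] + [alpha + k * d])
        table = int(rng.choice(k + 1, p=weights / weights.sum()))
        if table == k:
            table_counts.append(1)                      # open a new table
        else:
            table_counts[table] += 1
    return table_counts
```

With d = 0 this reduces to the ordinary CRP.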
Principles, not prescriptions
- Let data determine complexity; avoid rigid K.
- Use conjugacy for efficient inference when possible.
Common pitfalls
- Label switching and mixing issues in MCMC.
- Sensitivity to α and base measure choices.
Connections and contrasts
- See also: [/blog/variational-inference], [/blog/black-box-vi].
Quick checks
- Why DP mixtures? — Flexible clustering with uncertainty over K.
- What does α control? — The tendency to create new clusters; larger α yields more clusters (see the short check after this list).
- Why discrete draws? — Stick-breaking shows a DP draw is a countable sum of weighted atoms, so it is almost surely discrete, which is what induces clustering.
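A short check of the α question: under the CRP, the expected number of occupied tables after n customers is ∑_{i=1}^{n} α/(α + i − 1), which grows roughly like α log n.

```python
# Expected number of CRP clusters as a function of alpha (illustrative check).
import numpy as np

def expected_clusters(n, alpha):
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))

for alpha in (0.5, 2.0, 10.0):
    print(alpha, round(expected_clusters(1000, alpha), 1))   # grows ~ alpha * log(n)
```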
Further reading
- Ferguson (1973); Neal (2000); Blei et al. tutorials