Takeaway

The Dirichlet process (DP) defines a prior over discrete probability distributions with countably infinite support, letting model complexity grow with the data (e.g., infinite mixture models).

The problem (before → after)

  • Before: mixture models with a fixed number of components K risk underfitting or overfitting when K is misspecified.
  • After: DP mixtures infer the effective number of clusters from the data, with principled uncertainty over that number.

Mental model first

Chinese restaurant process (CRP): customers (data points) arrive one at a time and either join an occupied table with probability proportional to the number of customers already seated there, or start a new table with probability proportional to α. Tables correspond to clusters.
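A minimal NumPy simulation of this seating process (a sketch; the function name and defaults are illustrative, not from any library):

```python
import numpy as np

def sample_crp(n_customers, alpha, seed=None):
    """Simulate CRP seating: join an occupied table with probability proportional
    to its size, or open a new table with probability proportional to alpha."""
    rng = np.random.default_rng(seed)
    assignments, table_sizes = [], []
    for _ in range(n_customers):
        # Unnormalized probabilities: one entry per occupied table, plus a new table.
        weights = np.array(table_sizes + [alpha], dtype=float)
        table = rng.choice(len(weights), p=weights / weights.sum())
        if table == len(table_sizes):
            table_sizes.append(0)          # open a new table
        table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes

# Larger alpha tends to produce more occupied tables.
_, sizes = sample_crp(200, alpha=1.0, seed=0)
print(len(sizes), sizes)
```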

Just-in-time concepts

  • A draw G ∼ DP(α, G₀) has the stick-breaking form G = ∑_k π_k δ_{θ_k}, where θ_k ∼ G₀, β_k ∼ Beta(1, α), and π_k = β_k ∏_{j<k} (1−β_j) (see the sketch after this list).
  • Exchangeability and Pólya urn representation.
  • Gibbs sampling and collapsed inference.
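A truncated sketch of the stick-breaking construction in the first bullet (the function name and truncation level are illustrative assumptions):

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, seed=None):
    """First n_atoms stick-breaking weights of a DP(alpha, G0) draw:
    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # Stick length remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

pi = stick_breaking_weights(alpha=2.0, n_atoms=50, seed=0)
print(pi[:5], pi.sum())  # weights decay; the sum approaches 1 as n_atoms grows
```

Pairing each weight π_k with an atom θ_k ∼ G₀ gives the (truncated) discrete draw G ≈ ∑_k π_k δ_{θ_k}.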

First-pass solution

Define a DP mixture: draw G ∼ DP(α, G₀), per-datum parameters θ_i ∼ G, and observations x_i ∼ F(θ_i); equivalently, assign data to clusters via the CRP and draw each new cluster's parameters from G₀. For inference, alternate between sampling cluster assignments and cluster parameters (or integrate the parameters out under conjugacy), then form predictive densities by averaging over posterior samples. A sketch of one collapsed Gibbs sweep follows.
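A sketch of that sweep for a 1-D Gaussian DP mixture with known observation scale and a conjugate normal base measure, in the spirit of Neal (2000), Algorithm 3; the function name, hyperparameters, and defaults are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def collapsed_gibbs_sweep(x, z, alpha, mu0=0.0, tau0=3.0, sigma=0.5, rng=None):
    """One collapsed Gibbs sweep over assignments z for a 1-D Gaussian DP mixture
    with known observation std sigma and Normal(mu0, tau0^2) base measure."""
    rng = np.random.default_rng(rng)
    z = z.copy()
    for i in range(len(x)):
        z[i] = -1                                   # hold x_i out of its cluster
        labels, counts = np.unique(z[z >= 0], return_counts=True)
        log_p = []
        for k, n_k in zip(labels, counts):
            xs = x[z == k]
            # Posterior predictive of x_i given the other members of cluster k.
            prec = 1.0 / tau0**2 + len(xs) / sigma**2
            mean = (mu0 / tau0**2 + xs.sum() / sigma**2) / prec
            log_p.append(np.log(n_k) + norm.logpdf(x[i], mean, np.sqrt(1.0 / prec + sigma**2)))
        # Prior predictive under G0 for opening a new cluster.
        log_p.append(np.log(alpha) + norm.logpdf(x[i], mu0, np.sqrt(tau0**2 + sigma**2)))
        log_p = np.array(log_p)
        p = np.exp(log_p - log_p.max())
        choice = rng.choice(len(p), p=p / p.sum())
        z[i] = labels[choice] if choice < len(labels) else (labels.max() + 1 if len(labels) else 0)
    return z

# Usage: start with all points in one cluster and iterate sweeps.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])
z = np.zeros(len(x), dtype=int)
for _ in range(20):
    z = collapsed_gibbs_sweep(x, z, alpha=1.0, rng=rng)
print(len(np.unique(z)), "clusters after 20 sweeps")
```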

Iterative refinement

  1. Hierarchical DPs share components across groups.
  2. Truncation and variational inference for scalability (see the scikit-learn sketch after this list).
  3. Pitman–Yor processes favor power-law cluster sizes.
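Refinement 2 is available off the shelf: scikit-learn's BayesianGaussianMixture fits a truncated stick-breaking approximation by variational inference when weight_concentration_prior_type="dirichlet_process". A minimal usage sketch (data and hyperparameter values are arbitrary):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(250, 2)),
               rng.normal(2.0, 0.5, size=(250, 2))])

dpgmm = BayesianGaussianMixture(
    n_components=20,                                   # truncation level, not a fixed K
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                    # plays the role of alpha
    max_iter=500,
    random_state=0,
)
dpgmm.fit(X)
# Components with negligible posterior weight are effectively switched off.
print(np.sort(dpgmm.weights_)[::-1][:5])
```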

Principles, not prescriptions

  • Let data determine complexity; avoid rigid K.
  • Use conjugacy for efficient inference when possible.

Common pitfalls

  • Label switching and mixing issues in MCMC.
  • Sensitivity to the concentration parameter α and the choice of base measure G₀.

Connections and contrasts

  • See also: [/blog/variational-inference], [/blog/black-box-vi].

Quick checks

  1. Why DP mixtures? — Flexible clustering with uncertainty over K.
  2. What does α control? — Tendency to create new clusters.
  3. Why does clustering emerge? — DP draws are almost surely discrete, so parameter values repeat across data points, and those ties are the clusters.

Further reading

  • Ferguson (1973); Neal (2000); Blei et al. tutorials