Takeaway

Black-box variational inference (BBVI) forms noisy but unbiased Monte Carlo estimates of the ELBO gradient with a generic score-function estimator, so variational inference can proceed without model-specific algebra.

The problem (before → after)

  • Before: Deriving model-specific updates is tedious and error-prone.
  • After: Treat the ELBO as an expectation and differentiate under the integral to get general-purpose stochastic gradients.
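
Written out, the "differentiate under the integral" step looks roughly as follows. This is a sketch in the notation used throughout these notes (the symbol \mathcal{L} for the ELBO is introduced here for convenience), and it assumes the gradient and the integral can be exchanged.

```latex
\begin{aligned}
\mathcal{L}(\phi)
  &= \mathbb{E}_{q_\phi(z)}\big[\log p(x,z) - \log q_\phi(z)\big] \\
\nabla_\phi \mathcal{L}(\phi)
  &= \int \nabla_\phi q_\phi(z)\,\big[\log p(x,z) - \log q_\phi(z)\big]\,dz
     \;-\; \int q_\phi(z)\,\nabla_\phi \log q_\phi(z)\,dz \\
  &= \mathbb{E}_{q_\phi(z)}\big[\big(\log p(x,z) - \log q_\phi(z)\big)\,
       \nabla_\phi \log q_\phi(z)\big]
     \;-\; \underbrace{\mathbb{E}_{q_\phi(z)}\big[\nabla_\phi \log q_\phi(z)\big]}_{=\,0}
\end{aligned}
```

The second line uses the product rule; the third uses \nabla_\phi q_\phi = q_\phi \nabla_\phi \log q_\phi, and the last expectation vanishes because \int \nabla_\phi q_\phi(z)\,dz = \nabla_\phi 1 = 0.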

Mental model first

Like steering a boat in fog with noisy wind readings: each push is imperfect, but on average it points toward higher ELBO; variance reduction keeps the course steady.

Just-in-time concepts

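  • Score-function estimator: ∇_ϕ E_q[f(z)] = E_q[f(z) ∇_ϕ log q_ϕ(z)] when f does not depend on ϕ. For the ELBO, f = log p(x,z) − log q_ϕ(z) does depend on ϕ, but the extra term E_q[∇_ϕ log q_ϕ(z)] is zero, so the same form holds.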
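  • Variance reduction: Control variates (e.g., baselines) and Rao–Blackwellization lower the variance of the estimator without changing its expectation.
  • Reparameterization when possible: If z can be written as a differentiable transform of parameter-free noise, the resulting pathwise gradients typically have lower variance.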
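
To make the variance gap between the two estimators concrete, here is a small NumPy comparison. It is only an illustration under assumed choices that are not from the source: q = N(mu, 1), a test function f(z) = z², and the helper names `score_function_grads` and `reparam_grads`. The exact gradient in this case is 2·mu.

```python
import numpy as np

def score_function_grads(mu, f, rng, n):
    # q = N(mu, 1), so the score is d/dmu log q(z) = (z - mu).
    z = rng.normal(mu, 1.0, size=n)
    return f(z) * (z - mu)

def reparam_grads(mu, df, rng, n):
    # Reparameterize z = mu + eps with eps ~ N(0, 1); then d f(z)/d mu = f'(z).
    eps = rng.normal(0.0, 1.0, size=n)
    return df(mu + eps)

rng = np.random.default_rng(0)
mu = 1.5
f = lambda z: z**2       # hypothetical test function
df = lambda z: 2.0 * z   # its derivative

sf = score_function_grads(mu, f, rng, 200_000)
rp = reparam_grads(mu, df, rng, 200_000)

# Both sample means approximate the exact gradient 2 * mu,
# but the per-sample variance of the score-function version is much larger.
print("score-function: mean", sf.mean(), "var", sf.var())
print("reparameterized: mean", rp.mean(), "var", rp.var())
```

Both estimators are unbiased here; the difference is only in how many samples it takes to get a usable gradient.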

First-pass solution

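Sample z_1, …, z_S ∼ q_ϕ; for each sample compute g_s = (log p(x, z_s) − log q_ϕ(z_s)) ∇_ϕ log q_ϕ(z_s); average the g_s to estimate ∇_ϕ ELBO; take an ascent step on ϕ with a stochastic optimizer such as Adam.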
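
The recipe above can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: the model (z ∼ N(0, 1), x | z ∼ N(z, 1)), a Gaussian q parameterized by (mu, log_sigma), the function names, and the hand-written Adam update with its common default constants. With x = 2 the exact posterior is N(1, 0.5), which gives something to compare against.

```python
import numpy as np

def log_joint(z, x):
    # Toy model (assumed): z ~ N(0, 1), x | z ~ N(z, 1).
    return -0.5 * z**2 - 0.5 * (x - z)**2 - np.log(2 * np.pi)

def log_q_and_score(z, mu, log_sigma):
    # q = N(mu, sigma^2) with sigma = exp(log_sigma).
    sigma = np.exp(log_sigma)
    u = (z - mu) / sigma
    log_q = -0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * u**2
    # Score components: d/d mu and d/d log_sigma of log q(z).
    score = np.stack([u / sigma, u**2 - 1.0], axis=-1)
    return log_q, score

def fit_bbvi(x, steps=3000, n_samples=100, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.zeros(2)                  # phi = (mu, log_sigma)
    m, v = np.zeros(2), np.zeros(2)    # Adam moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        mu, log_sigma = phi
        z = rng.normal(mu, np.exp(log_sigma), size=n_samples)
        log_q, score = log_q_and_score(z, mu, log_sigma)
        # Score-function estimate of the ELBO gradient, averaged over samples.
        g = np.mean((log_joint(z, x) - log_q)[:, None] * score, axis=0)
        # Adam update, applied as gradient ascent on the ELBO.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)
        v_hat = v / (1 - b2**t)
        phi = phi + lr * m_hat / (np.sqrt(v_hat) + eps)
    return phi[0], np.exp(phi[1])

mu_hat, sigma_hat = fit_bbvi(x=2.0)
# Exact posterior for this toy model: mean 1.0, standard deviation sqrt(0.5).
print(mu_hat, sigma_hat)
```

Note that this version uses no baseline, so the final estimates jitter around the exact values; the step size and sample count were picked by hand for this toy case and are not recommendations.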

Iterative refinement

  1. Natural gradients in variational families (e.g., Gaussians).
  2. Adaptive baselines learned alongside ϕ.
  3. Hybrid estimators mixing reparameterization and score functions.
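
As a starting point for the baseline refinement, the following sketch shows the simplest case: a constant baseline subtracted from f before multiplying by the score. The setup is assumed for illustration (q = N(mu, 1), a stand-in f with a large constant offset, and the helper name `score_grads`); a learned or adaptive baseline would update `b` during optimization instead of estimating it once.

```python
import numpy as np

def score_grads(z, mu, f_vals, baseline=0.0):
    # q = N(mu, 1), so the score is (z - mu).
    # A baseline that is independent of z leaves the expectation unchanged,
    # because E_q[(z - mu)] = 0.
    return (f_vals - baseline) * (z - mu)

rng = np.random.default_rng(1)
mu = 0.5
# Hypothetical stand-in for log p(x,z) - log q(z); the -10 offset inflates variance.
f = lambda z: -(z - 2.0)**2 - 10.0

# Estimate the baseline from a separate batch so it is independent of the
# samples used for the gradient.
b = f(rng.normal(mu, 1.0, size=1000)).mean()

z = rng.normal(mu, 1.0, size=200_000)
plain = score_grads(z, mu, f(z))
centered = score_grads(z, mu, f(z), baseline=b)

# Exact gradient of E_q[f] with respect to mu is -2 * (mu - 2).
print("no baseline:   mean", plain.mean(), "var", plain.var())
print("with baseline: mean", centered.mean(), "var", centered.var())
```

Both versions estimate the same gradient; the centered one does so with a much smaller per-sample variance.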

Code as a byproduct (score estimator)

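def bbvi_grad(logp, logq, score):
    # Per-sample score-function estimate of the ELBO gradient at one draw z ~ q_ϕ.
    # logp: log p(x,z), logq: log q_ϕ(z), score: ∇_ϕ log q_ϕ(z)
    # Average the returned values over several draws before taking an optimizer step.
    return (logp - logq) * score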

Principles, not prescriptions

  • Prefer reparameterization; fall back to score estimators when needed.
  • Attack variance aggressively with baselines and control variates.

Common pitfalls

  • High-variance gradients stall learning; tune sample sizes and baselines.
  • A misspecified q (a variational family that cannot represent the true posterior) yields a biased posterior approximation no matter how good the gradient estimator is.
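
The misspecification pitfall can be seen without running any optimizer. The sketch below assumes a toy target, a 2-D Gaussian "posterior" with correlation 0.9, and a fully factorized Gaussian q. For a Gaussian target, the factorized Gaussian that minimizes KL(q ‖ p) is known in closed form: it matches the mean and has variances 1 / Λ_ii, where Λ is the precision matrix.

```python
import numpy as np

rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])      # covariance of the toy target (assumed)
Lambda = np.linalg.inv(Sigma)       # precision matrix

true_marginal_var = np.diag(Sigma)
# Closed-form optimum of KL(q || p) over factorized Gaussians q.
mean_field_var = 1.0 / np.diag(Lambda)

print("true marginal variances:", true_marginal_var)
print("mean-field variances:   ", mean_field_var)
```

The factorized optimum has variance 1 − ρ² in each coordinate, well below the true marginal variance of 1, and a perfect gradient estimator would converge to exactly this underestimate.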

Connections and contrasts

  • See also: [/blog/variational-inference], [/blog/normalizing-flows].

Quick checks

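  1. When should you use BBVI? When latents are non-reparameterizable (e.g., discrete) or the model is too complex for hand-derived updates.
  2. Why do baselines help? They reduce the variance of the gradient estimate without changing its expectation.
  3. What if q is misspecified? Even the ELBO optimum is then a biased approximation of the true posterior.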

Further reading

  • Ranganath et al., 2014 (source above)
  • Variance reduction techniques in VI