Black-Box Variational Inference

Takeaway

BBVI estimates noisy but unbiased gradients of the ELBO with Monte Carlo samples and the generic score-function estimator. Variational inference then only requires sampling from q and evaluating log p(x, z), log q(z), and ∇_ϕ log q(z), with no model-specific algebra.

The problem (before → after)

Before: Deriving model-specific updates is tedious and error-prone.
After: Treat the ELBO as an expectation and differentiate under the integral to get general-purpose stochastic gradients.
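Written out for a variational density q_ϕ(z), the step from expectation to stochastic gradient is short. The derivation below assumes gradient and integral can be swapped (q_ϕ smooth in ϕ with suitably bounded derivatives) and uses the identity ∇_ϕ q_ϕ = q_ϕ ∇_ϕ log q_ϕ:

```latex
\begin{aligned}
\mathcal{L}(\phi) &= \mathbb{E}_{q_\phi(z)}\big[\log p(x,z) - \log q_\phi(z)\big] \\
\nabla_\phi \mathcal{L}(\phi)
  &= \int \nabla_\phi q_\phi(z)\,\big[\log p(x,z) - \log q_\phi(z)\big]\,dz
     - \int q_\phi(z)\,\nabla_\phi \log q_\phi(z)\,dz \\
  &= \mathbb{E}_{q_\phi}\!\big[\nabla_\phi \log q_\phi(z)\,\big(\log p(x,z) - \log q_\phi(z)\big)\big]
     - \underbrace{\mathbb{E}_{q_\phi}\!\big[\nabla_\phi \log q_\phi(z)\big]}_{=\,\nabla_\phi \int q_\phi\,dz \,=\, 0} \\
  &\approx \frac{1}{S}\sum_{s=1}^{S} \nabla_\phi \log q_\phi(z_s)\,\big(\log p(x,z_s) - \log q_\phi(z_s)\big),
  \qquad z_s \sim q_\phi .
\end{aligned}
```

The zero-mean score on the third line is the same fact that lets a baseline be subtracted without biasing the estimate.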

Mental model first

Like steering a boat in fog with noisy wind readings: each push is imperfect, but on average it points toward higher ELBO; variance reduction keeps the course steady.

Just-in-time concepts

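Score-function (log-derivative, REINFORCE) estimator: ∇_ϕ E_{q_ϕ}[f(z)] = E_{q_ϕ}[f(z) ∇_ϕ log q_ϕ(z)] for an f that does not depend on ϕ. For the ELBO, f = log p(x,z) − log q_ϕ(z) does depend on ϕ, but the extra term −E_{q_ϕ}[∇_ϕ log q_ϕ(z)] is zero, so the same form holds.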
Control variates: baselines and Rao–Blackwellization (integrating out part of the randomness analytically) reduce variance while keeping the estimator unbiased.
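Reparameterization when possible: if z = g_ϕ(ε) with ε drawn from a fixed distribution, differentiate through the sample; these gradients usually have much lower variance, so prefer them.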
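A quick numerical check of the items above, as a sketch in Python with NumPy. The toy objective E_{z∼N(μ,1)}[z²] (exact gradient 2μ) and the constant baseline b = μ² + 1 (its true mean, used here only for illustration) are our choices, not from the text. All three estimators are unbiased; they differ only in variance.

```python
import numpy as np

def score_estimate(mu, n, rng):
    """Score-function samples of d/dmu E[z^2], z ~ N(mu, 1)."""
    z = rng.normal(mu, 1.0, size=n)
    score = z - mu                 # d/dmu log N(z; mu, 1)
    return z**2 * score            # f(z) * score

def score_baseline_estimate(mu, n, rng, b):
    """Same estimator with a constant baseline b subtracted from f."""
    z = rng.normal(mu, 1.0, size=n)
    return (z**2 - b) * (z - mu)

def reparam_estimate(mu, n, rng):
    """Reparameterized samples: z = mu + eps, so d/dmu z^2 = 2 z."""
    eps = rng.normal(0.0, 1.0, size=n)
    return 2.0 * (mu + eps)

rng = np.random.default_rng(0)
mu, n = 1.5, 100_000
estimates = {
    "score": score_estimate(mu, n, rng),
    "score + baseline": score_baseline_estimate(mu, n, rng, b=mu**2 + 1.0),
    "reparameterization": reparam_estimate(mu, n, rng),
}
for name, g in estimates.items():
    print(f"{name:>20}: mean={g.mean():.3f}  var={g.var():.2f}  (exact: {2 * mu})")
```

With μ = 1.5 the per-sample variances work out analytically to about 42.6 (score), 28 (score with this baseline), and 4 (reparameterized), so the printed variances should land near those values while all three means sit near 3.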

First-pass solution

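Draw S samples z_s ∼ q_ϕ; compute g = (1/S) Σ_s (log p(x, z_s) − log q_ϕ(z_s)) ∇_ϕ log q_ϕ(z_s); for large datasets, also subsample data minibatches and rescale the log-likelihood term accordingly; then take an ascent step on the ELBO, e.g., with Adam.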
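A minimal end-to-end sketch of this recipe in Python with NumPy. It assumes a toy conjugate model (z ∼ N(0, 1), x_i | z ∼ N(z, 1), chosen only because the exact posterior is known), a Gaussian q parameterized by its mean and log standard deviation, and Adam written out by hand as an ascent step. The model, data, and hyperparameters are illustrative, and the estimator is the plain one from the recipe, with no baseline.

```python
import numpy as np

# Toy conjugate model (assumption): z ~ N(0, 1), x_i | z ~ N(z, 1)
# => exact posterior N(sum(x) / (n + 1), 1 / (n + 1)).
x = np.array([0.8, 1.2, 1.0, 0.6, 1.4])

def log_joint(z):
    """log p(x, z) for a vector of samples z, shape (S,)."""
    log_prior = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    resid = x[None, :] - z[:, None]                       # shape (S, n)
    log_lik = (-0.5 * resid**2 - 0.5 * np.log(2 * np.pi)).sum(axis=1)
    return log_prior + log_lik

def log_q_and_score(z, mu, rho):
    """log q(z) and its gradient w.r.t. phi = (mu, rho), where sigma = exp(rho)."""
    sigma = np.exp(rho)
    eps = (z - mu) / sigma
    log_q = -0.5 * eps**2 - rho - 0.5 * np.log(2 * np.pi)
    score = np.stack([eps / sigma, eps**2 - 1.0], axis=1)  # shape (S, 2)
    return log_q, score

def bbvi(num_steps=3000, num_samples=200, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.array([0.0, 0.0])            # (mu, rho): start at q = N(0, 1)
    m, v = np.zeros(2), np.zeros(2)       # Adam moment estimates
    b1, b2, adam_eps = 0.9, 0.999, 1e-8
    for t in range(1, num_steps + 1):
        mu, rho = phi
        z = mu + np.exp(rho) * rng.standard_normal(num_samples)  # z_s ~ q_phi
        log_q, score = log_q_and_score(z, mu, rho)
        f = log_joint(z) - log_q                                 # log p(x, z) - log q(z)
        g = (f[:, None] * score).mean(axis=0)                    # score-function ELBO gradient
        # Adam, used for gradient *ascent* because we maximize the ELBO.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
        phi = phi + lr * m_hat / (np.sqrt(v_hat) + adam_eps)
    return phi[0], np.exp(phi[1])

mu_q, sigma_q = bbvi()
n = len(x)
print(f"q: mean={mu_q:.3f}, sd={sigma_q:.3f}")
print(f"exact posterior: mean={x.sum() / (n + 1):.3f}, sd={np.sqrt(1 / (n + 1)):.3f}")
```

Without a baseline the estimator needs a fairly large number of samples per step (200 here) to settle; the fitted q should end up close to the exact posterior, with some residual jitter from the noisy gradients.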

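Refinements beyond the first pass:

Natural gradients for variational families such as Gaussians, which precondition the gradient by the inverse Fisher information of q_ϕ.
Adaptive baselines learned alongside ϕ.
Hybrid estimators that reparameterize the latents that allow it and use score functions for the rest (e.g., discrete latents).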
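For the natural-gradient refinement, a concrete case: for a univariate Gaussian q = N(μ, σ²) parameterized by (μ, ρ) with σ = e^ρ (our choice of parameterization), the Fisher information is diagonal, diag(1/σ², 2). The natural-gradient step ϕ ← ϕ + η F(ϕ)⁻¹ g therefore just rescales the ordinary gradient coordinate-wise. The NumPy sketch below computes it and checks the Fisher matrix against a Monte Carlo estimate of E[score scoreᵀ].

```python
import numpy as np

def gaussian_fisher(rho):
    """Exact Fisher information of N(mu, exp(rho)^2) with respect to (mu, rho)."""
    sigma2 = np.exp(2.0 * rho)
    return np.diag([1.0 / sigma2, 2.0])

def natural_gradient(grad, rho):
    """Precondition a gradient (d/dmu, d/drho) by the inverse Fisher information."""
    return np.linalg.solve(gaussian_fisher(rho), grad)

def mc_fisher(rho, n, rng):
    """Monte Carlo estimate of E[score score^T]; the score does not depend on mu."""
    sigma = np.exp(rho)
    eps = rng.standard_normal(n)
    score = np.stack([eps / sigma, eps**2 - 1.0], axis=1)   # shape (n, 2)
    return score.T @ score / n

rho = np.log(2.0)                                      # sigma = 2
F_exact = gaussian_fisher(rho)                         # diag(0.25, 2.0)
F_mc = mc_fisher(rho, 200_000, np.random.default_rng(1))
nat = natural_gradient(np.array([1.0, 1.0]), rho)      # approx (4.0, 0.5)
print(F_exact, F_mc, nat, sep="\n")
```

In a BBVI loop this would replace the raw gradient g before the update; the mean then moves in steps scaled by σ², which keeps step sizes sensible as q narrows.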

Code as a byproduct (score estimator)

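def bbvi_grad(logp, logq, score):
    # One-sample score-function estimate of the ELBO gradient w.r.t. phi.
    # logp: log p(x, z); logq: log q_phi(z); score: grad_phi log q_phi(z),
    # all evaluated at the same draw z ~ q_phi. Average over draws to reduce noise.
    return (logp - logq) * score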
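One way to use bbvi_grad is to call it on a batch of draws and average. The sketch below assumes NumPy, a hypothetical target with log p(x, z) replaced by log N(z; 0, 1) (so the ELBO is −KL(q ‖ N(0, 1)) and its gradient is known in closed form), and a Gaussian q with parameters (μ, ρ = log σ); the helper log_normal is ours.

```python
import numpy as np

def bbvi_grad(logp, logq, score):
    # Same one-sample estimator as in the text; broadcasting lets it handle a batch.
    return (logp - logq) * score

def log_normal(z, mean, sd):
    """Log density of N(mean, sd^2) evaluated at z."""
    return -0.5 * ((z - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

mu, rho = 1.0, np.log(0.5)                        # q = N(1, 0.5^2)
sigma = np.exp(rho)
rng = np.random.default_rng(0)
z = mu + sigma * rng.standard_normal(200_000)     # z_s ~ q

logp = log_normal(z, 0.0, 1.0)                    # hypothetical target: p(z) = N(0, 1)
logq = log_normal(z, mu, sigma)
eps = (z - mu) / sigma
score = np.stack([eps / sigma, eps**2 - 1.0], axis=1)   # d log q / d(mu, rho), shape (S, 2)

grad = bbvi_grad(logp[:, None], logq[:, None], score).mean(axis=0)
exact = np.array([-mu, 1.0 - sigma**2])           # gradient of -KL(q || N(0, 1))
print(grad, exact)
```

Passing logp and logq as column vectors lets the scalar-looking function broadcast against the (S, 2) score matrix without any change to its body.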

Principles, not prescriptions

Prefer reparameterization; fall back to score estimators when needed.
Attack variance aggressively with baselines and control variates.
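One inexpensive baseline that keeps the estimator unbiased is a leave-one-out average (sometimes called REINFORCE leave-one-out): with S draws, draw s uses the mean of f over the other S − 1 draws as its baseline, which is independent of z_s. The NumPy sketch below assumes a toy target log N(z; 0, 1) shifted by a hypothetical constant LOG_EVIDENCE = −10, standing in for the log p(x) offset that real joints carry, and q = N(1, 0.5²); it compares the spread of the plain and leave-one-out estimators over many independent trials.

```python
import numpy as np

LOG_EVIDENCE = -10.0   # hypothetical log p(x); shifts log p(x, z) by a constant

def log_joint(z):
    """Unnormalized toy target: log N(z; 0, 1) + LOG_EVIDENCE."""
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi) + LOG_EVIDENCE

def grad_estimates(mu, rho, num_samples, num_trials, rng, leave_one_out):
    """Score-function ELBO gradients for (mu, rho), one row per independent trial."""
    sigma = np.exp(rho)
    eps = rng.standard_normal((num_trials, num_samples))
    z = mu + sigma * eps
    log_q = -0.5 * eps**2 - rho - 0.5 * np.log(2 * np.pi)
    f = log_joint(z) - log_q                          # shape (trials, samples)
    if leave_one_out:
        # Baseline for draw s: mean of f over the other S - 1 draws in the same trial.
        b = (f.sum(axis=1, keepdims=True) - f) / (num_samples - 1)
        f = f - b
    g_mu = (f * eps / sigma).mean(axis=1)
    g_rho = (f * (eps**2 - 1.0)).mean(axis=1)
    return np.stack([g_mu, g_rho], axis=1)            # shape (trials, 2)

mu, rho = 1.0, np.log(0.5)
plain = grad_estimates(mu, rho, 10, 20_000, np.random.default_rng(0), leave_one_out=False)
loo = grad_estimates(mu, rho, 10, 20_000, np.random.default_rng(1), leave_one_out=True)
exact = np.array([-mu, 1.0 - np.exp(2 * rho)])       # gradient of -KL(q || N(0, 1))
print("plain mean", plain.mean(axis=0), "var", plain.var(axis=0))
print("LOO   mean", loo.mean(axis=0), "var", loo.var(axis=0))
print("exact", exact)
```

Both estimators center on the same gradient, but the constant offset inflates the plain estimator's variance, while the leave-one-out baseline cancels it almost entirely.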

Common pitfalls

High-variance gradients stall learning; tune sample sizes and baselines.
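Mis-specified q biases the posterior regardless of estimator quality: if the variational family cannot represent the true posterior, the ELBO optimum is only the closest member of the family in KL(q ‖ p). Mean-field Gaussians, for instance, tend to underestimate posterior variance.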
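A standard closed-form illustration of the misspecification pitfall: if the true posterior is a zero-mean bivariate Gaussian with unit variances and correlation r, and q is restricted to a factorized Gaussian, the KL(q ‖ p)-optimal variances are 1/Λ_ii = 1 − r², where Λ is the precision matrix. The NumPy sketch below (r = 0.9 is an arbitrary choice) minimizes the exact KL over a grid of shared variances, so no gradient noise is involved at all.

```python
import numpy as np

def kl_diag_gaussian_to_gaussian(var_q, cov_p):
    """KL(N(0, diag(var_q)) || N(0, cov_p)) for zero-mean Gaussians."""
    k = len(var_q)
    prec_p = np.linalg.inv(cov_p)
    trace_term = np.sum(np.diag(prec_p) * var_q)     # tr(cov_p^{-1} diag(var_q))
    logdet_p = np.linalg.slogdet(cov_p)[1]
    logdet_q = np.sum(np.log(var_q))
    return 0.5 * (trace_term - k + logdet_p - logdet_q)

r = 0.9
cov_p = np.array([[1.0, r], [r, 1.0]])              # true posterior: unit variances, correlation r
grid = np.linspace(0.05, 1.5, 2901)                 # candidate variance shared by both factors
kls = [kl_diag_gaussian_to_gaussian(np.array([s, s]), cov_p) for s in grid]
best = grid[int(np.argmin(kls))]
print(f"KL-optimal mean-field variance: {best:.3f}")        # 1 - r^2, not the true 1.0
print(f"closed form 1/Lambda_ii: {1 / np.linalg.inv(cov_p)[0, 0]:.3f}")
```

Even with exact gradients, the best factorized q reports a marginal variance of 0.19 where the true posterior has 1.0.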

Connections and contrasts

See also: [/blog/variational-inference], [/blog/normalizing-flows].

Quick checks

When to use BBVI? — Non-reparameterizable latents or complex models.
Why baselines help? — Reduce variance without changing expectation.
What if q is misspecified? — ELBO optimum is biased.