Normalizing Flows — Exact Likelihood via Invertible Networks
Takeaway
Flows transform a simple base distribution into a complex target using a sequence of invertible maps; log-likelihoods are exact via the change-of-variables formula.
The problem (before → after)
- Before: Powerful generative models often lack tractable likelihoods or exact sampling.
- After: Invertible, differentiable layers yield both exact densities and efficient sampling.
Mental model first
Like kneading dough: each fold and stretch reshapes the density while preserving total mass; track how local volumes expand or shrink with the Jacobian determinant.
Just-in-time concepts
- Change of variables: log p_X(x) = log p_Z(f(x)) + log |det J_f(x)| (numerical check after this list).
- Coupling and autoregressive layers: triangular Jacobians make the determinant cheap to compute.
- Expressivity: depth and permutations mix dimensions; architectural constraints keep each map invertible.
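To make the change-of-variables formula concrete, here is a minimal numerical check (a sketch assuming PyTorch; the 1-D map f(x) = (x - mu) / sigma and its parameters are illustrative):

import torch
from torch.distributions import Normal

# For f(x) = (x - mu) / sigma, log p_X(x) = log p_Z(f(x)) + log |f'(x)|,
# and the result should match the analytic N(mu, sigma^2) density.
mu, sigma = 2.0, 0.5
x = torch.linspace(-1.0, 5.0, 7)
z = (x - mu) / sigma                         # f(x)
log_det = -torch.log(torch.tensor(sigma))    # log |f'(x)| = -log sigma
log_px = Normal(0.0, 1.0).log_prob(z) + log_det
assert torch.allclose(log_px, Normal(mu, sigma).log_prob(x), atol=1e-6)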
First-pass solution
Stack affine coupling layers with permutations; train by maximizing exact log-likelihood; sample by applying inverses to base noise.
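A minimal training-objective sketch (PyTorch; the `flow` object and its (z, logdet) interface are assumptions for illustration, not a fixed API):

import torch
from torch.distributions import Normal

def nll(flow, x):
    # Hypothetical flow module: forward(x) -> (z, summed log-det per sample).
    z, logdet = flow(x)
    base = Normal(0.0, 1.0).log_prob(z).sum(dim=1)  # log p_Z(f(x)), x is (batch, dim)
    return -(base + logdet).mean()                  # exact NLL, no variational bound

# Typical step: opt.zero_grad(); nll(flow, batch).backward(); opt.step()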
Iterative refinement
- Continuous-time flows (Neural ODEs) trade log-determinants for an ODE solve that integrates the Jacobian trace.
- Dequantization for discrete data (sketch below); multiscale architectures for images.
- Hybrid models: flows as expressive posteriors in variational inference or as decoders in VAEs.
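For the dequantization item, a common uniform-noise sketch (assuming 8-bit inputs in [0, 255]; a simplification of what RealNVP-style models do):

import torch

def dequantize(x_uint8):
    # Add uniform [0, 1) noise so discrete pixel values define a continuous
    # density, then rescale to [0, 1); the flow models this continuous variable.
    x = x_uint8.float() + torch.rand(x_uint8.shape)
    return x / 256.0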
Code as a byproduct (affine coupling log-det)
import torch

def affine_coupling(x, s, t, mask):
    # Pass x1 through unchanged; transform x2 elementwise, conditioned on x1.
    x1, x2 = x * mask, x * (1 - mask)
    # tanh bounds the log-scales for numerical stability; masking the outputs
    # of s and t keeps the identity half untouched.
    scale = torch.tanh(s(x1)) * (1 - mask)
    shift = t(x1) * (1 - mask)
    y = x1 + x2 * torch.exp(scale) + shift
    # Triangular Jacobian: the log-det is just the sum of the (masked) log-scales.
    logdet = scale.sum(dim=1)
    return y, logdet
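Sampling runs the inverse; a matching sketch for the coupling above (same s, t, mask):

def affine_coupling_inverse(y, s, t, mask):
    # y1 equals x1, so the same scale/shift can be recomputed and undone.
    y1 = y * mask
    scale = torch.tanh(s(y1)) * (1 - mask)
    shift = t(y1) * (1 - mask)
    x2 = (y * (1 - mask) - shift) * torch.exp(-scale)
    return y1 + x2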
Principles, not prescriptions
- Design layers with cheap Jacobians and stable inverses.
- Mix dimensions aggressively to avoid factorized bottlenecks.
Common pitfalls
- Numerical issues computing log-dets; bound the scales (e.g., with tanh, as in the snippet above).
- Limited expressivity if masks/permutations are fixed and the stack is shallow.
Connections and contrasts
- See also: [/blog/variational-inference], [/blog/diffusion-models], [/blog/gans].
Quick checks
- Why triangular Jacobians? — The determinant reduces to the product of diagonal entries → cheap (numerical check after this list).
- How to sample? — Draw z ∼ base and invert the flow.
- Why flows in VI? — To make posteriors more expressive.
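A quick numerical check of the triangular-Jacobian claim (PyTorch):

import torch

# Lower-triangular matrix with positive diagonal: log|det| equals the sum
# of the log-diagonal, so no O(n^3) determinant computation is needed.
A = torch.tril(torch.rand(4, 4)) + torch.eye(4)
assert torch.allclose(torch.logdet(A), torch.log(torch.diagonal(A)).sum())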
Further reading
- RealNVP, Glow, Neural ODEs
- Original flow paper (source above)