Normalizing Flows — Exact Likelihood via Invertible Networks
Takeaway
Flows transform a simple base distribution into a complex target using a sequence of invertible maps; log-likelihoods are exact via the change-of-variables formula.
The problem (before → after)
- Before: Powerful generative models often lack tractable likelihoods or exact sampling.
- After: Invertible, differentiable layers yield both exact densities and efficient sampling.
Mental model first
Like kneading dough: each fold and stretch reshapes the density while preserving total mass; track how local volumes expand or shrink with the Jacobian determinant.
Just-in-time concepts
- Change of variables: log p_X(x) = log p_Z(f(x)) + log |det J_f(x)| (numerical check after this list).
- Coupling and autoregressive layers: triangular Jacobians make the determinant cheap to compute.
- Expressivity: depth and permutations mix dimensions; architectural constraints keep each map invertible.
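To make the change-of-variables formula concrete, here is a minimal numerical check (a sketch assuming PyTorch; the 1-D map f(x) = (x - mu) / sigma and its parameters are illustrative):

import torch
from torch.distributions import Normal

# For f(x) = (x - mu) / sigma, log p_X(x) = log p_Z(f(x)) + log |f'(x)|,
# and the result should match the analytic N(mu, sigma^2) density.
mu, sigma = 2.0, 0.5
x = torch.linspace(-1.0, 5.0, 7)
z = (x - mu) / sigma                         # f(x)
log_det = -torch.log(torch.tensor(sigma))    # log |f'(x)| = -log sigma
log_px = Normal(0.0, 1.0).log_prob(z) + log_det
assert torch.allclose(log_px, Normal(mu, sigma).log_prob(x), atol=1e-6)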
First-pass solution
Stack affine coupling layers with permutations; train by maximizing exact log-likelihood; sample by applying inverses to base noise.
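A minimal training-objective sketch (PyTorch; the `flow` object and its (z, logdet) interface are assumptions for illustration, not a fixed API):

import torch
from torch.distributions import Normal

def nll(flow, x):
    # Hypothetical flow module: forward(x) -> (z, summed log-det per sample).
    z, logdet = flow(x)
    base = Normal(0.0, 1.0).log_prob(z).sum(dim=1)  # log p_Z(f(x)), x is (batch, dim)
    return -(base + logdet).mean()                  # exact NLL, no variational bound

# Typical step: opt.zero_grad(); nll(flow, batch).backward(); opt.step()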
Iterative refinement
- Continuous-time flows (Neural ODEs) trade log-determinants for an ODE solve that integrates the Jacobian trace.
- Dequantization for discrete data (sketch below); multiscale architectures for images.
- Hybrid models: flows as expressive posteriors in variational inference or as decoders in VAEs.
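For the dequantization item, a common uniform-noise sketch (assuming 8-bit inputs in [0, 255]; a simplification of what RealNVP-style models do):

import torch

def dequantize(x_uint8):
    # Add uniform [0, 1) noise so discrete pixel values define a continuous
    # density, then rescale to [0, 1); the flow models this continuous variable.
    x = x_uint8.float() + torch.rand(x_uint8.shape)
    return x / 256.0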
Code as a byproduct (affine coupling log-det)
import torch

def affine_coupling(x, s, t, mask):
    # Pass x1 through unchanged; transform x2 elementwise, conditioned on x1.
    x1, x2 = x * mask, x * (1 - mask)
    # tanh bounds the log-scales for numerical stability; masking the outputs
    # of s and t keeps the identity half untouched.
    scale = torch.tanh(s(x1)) * (1 - mask)
    shift = t(x1) * (1 - mask)
    y = x1 + x2 * torch.exp(scale) + shift
    # Triangular Jacobian: the log-det is just the sum of the (masked) log-scales.
    logdet = scale.sum(dim=1)
    return y, logdet
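Sampling runs the inverse; a matching sketch for the coupling above (same s, t, mask):

def affine_coupling_inverse(y, s, t, mask):
    # y1 equals x1, so the same scale/shift can be recomputed and undone.
    y1 = y * mask
    scale = torch.tanh(s(y1)) * (1 - mask)
    shift = t(y1) * (1 - mask)
    x2 = (y * (1 - mask) - shift) * torch.exp(-scale)
    return y1 + x2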
Principles, not prescriptions
- Design layers with cheap Jacobians and stable inverses.
- Mix dimensions aggressively to avoid factorized bottlenecks.
Common pitfalls
- Numerical issues computing log-dets; bound the scales (e.g., with tanh, as in the snippet above).
- Limited expressivity if masks/permutations are fixed and the stack is shallow.
Connections and contrasts
- See also: [/blog/variational-inference], [/blog/diffusion-models], [/blog/gans].
Quick checks
- Why triangular Jacobians? — The determinant reduces to the product of diagonal entries → cheap (numerical check after this list).
- How to sample? — Draw z ∼ base and invert the flow.
- Why flows in VI? — To make posteriors more expressive.
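A quick numerical check of the triangular-Jacobian claim (PyTorch):

import torch

# Lower-triangular matrix with positive diagonal: log|det| equals the sum
# of the log-diagonal, so no O(n^3) determinant computation is needed.
A = torch.tril(torch.rand(4, 4)) + torch.eye(4)
assert torch.allclose(torch.logdet(A), torch.log(torch.diagonal(A)).sum())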
Further reading
- RealNVP, Glow, Neural ODEs
- Original flow paper (source above)