Takeaway

Train a network to reverse a gradual noising process; sample by iteratively denoising from pure noise to data.

The problem (before → after)

  • Before: Likelihood-based models struggle with complex, multi-scale data without heavy inductive bias.
  • After: Use a simple forward diffusion and learn its reverse; training reduces to denoising or noise prediction.

Mental model first

Picture blurring an image repeatedly until it becomes static; a skilled restorer learns to remove just the right amount of blur at each step to recover the picture.

Just-in-time concepts

  • Forward process: q(x_t|x_{t−1}) adds Gaussian noise according to a variance schedule β_t; composing steps gives the closed form q(x_t|x_0) = N(√ᾱ_t x_0, (1−ᾱ_t)I), where ᾱ_t = ∏_{s≤t}(1−β_s). A sketch follows this list.
  • Reverse process: p_θ(x_{t−1}|x_t) is Gaussian with a learned mean (variance fixed by the schedule or learned).
  • Noise prediction: predict the added noise ε rather than the clean data x_0; this gives a well-conditioned regression target and stabilizes training.
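
A minimal sketch of sampling from that closed form, assuming a precomputed 1-D tensor alpha_bar of cumulative products (the function and buffer names here are hypothetical, not a fixed API):

import torch

def forward_sample(x0, t, alpha_bar):
    # Sample x_t ~ q(x_t | x_0) in one shot; alpha_bar[t] = prod_{s<=t} (1 - beta_s).
    eps = torch.randn_like(x0)  # the noise the model will learn to predict
    x_t = torch.sqrt(alpha_bar[t]) * x0 + torch.sqrt(1 - alpha_bar[t]) * eps
    return x_t, eps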

First-pass solution

Optimize an MSE between predicted and true noise ε under a linear or cosine β schedule; sample with ancestral steps from t = T down to t = 1, emitting x_0 at the end. A minimal training-step sketch follows.
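
This sketch reuses the hypothetical alpha_bar buffer from above; the per-example timestep sampling and broadcast shapes assume image batches of shape (B, C, H, W):

import torch
import torch.nn.functional as F

def training_step(model, x0, T, alpha_bar):
    # Draw one random timestep per example, noise the batch, regress on the noise.
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast over (C, H, W)
    x_t = torch.sqrt(ab) * x0 + torch.sqrt(1 - ab) * eps
    eps_hat = model(x_t, t)              # model predicts the added noise
    return F.mse_loss(eps_hat, eps)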

Iterative refinement

  1. Classifier-free guidance trades sample diversity for fidelity and adherence to the conditioning signal (a sketch follows this list).
  2. Fewer steps: distillation and improved samplers (DDIM, DPM-Solver) accelerate sampling.
  3. Continuous-time SDE formulations unify diffusion with score-based modeling.
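
A minimal guidance sketch, assuming the model accepts an optional conditioning argument and treats cond=None as unconditional (both conventions are assumptions, not a fixed API):

def guided_eps(model, x_t, t, cond, guidance_scale):
    # Extrapolate from the unconditional prediction toward the conditional one;
    # guidance_scale > 1 strengthens adherence at the cost of diversity.
    eps_uncond = model(x_t, t, cond=None)
    eps_cond = model(x_t, t, cond=cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)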

Code as a byproduct (one reverse step)

import torch

def reverse_step(x_t, t, model, alpha_bar_t, beta_t):
    # One ancestral (DDPM) reverse step from x_t to x_{t-1}.
    # alpha_bar_t: cumulative product of (1 - beta_s) up to t; beta_t: schedule variance at t.
    eps_hat = model(x_t, t)  # predicted noise
    mean = (x_t - beta_t / torch.sqrt(1 - alpha_bar_t) * eps_hat) / torch.sqrt(1 - beta_t)
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)  # no noise at the final step
    return mean + torch.sqrt(beta_t) * z  # sigma_t = sqrt(beta_t) variance choice
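
Usage is a loop from t = T − 1 down to 0, starting from pure noise; betas and alpha_bars are assumed precomputed 1-D schedule tensors, and the model is assumed to accept an integer timestep:

T = 1000
x = torch.randn(1, 3, 32, 32)  # start from pure Gaussian noise
for t in reversed(range(T)):
    x = reverse_step(x, t, model, alpha_bars[t], betas[t])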

Principles, not prescriptions

  • Choose a parameterization (ε-, x_0-, or v-prediction) consistent with the noise process; mismatches destabilize training.
  • Schedules control the trade-off between training signal and sampling cost (two common choices are sketched below).
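
Minimal sketches of the linear schedule from the DDPM paper and the cosine schedule from improved DDPM; the exact clipping constant is a conventional choice, not a requirement:

import math
import torch

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    # Linear schedule used in the original DDPM paper.
    return torch.linspace(beta_start, beta_end, T)

def cosine_betas(T, s=0.008):
    # Cosine schedule: define alpha_bar directly, then recover per-step betas.
    steps = torch.arange(T + 1, dtype=torch.float64)
    f = torch.cos(((steps / T + s) / (1 + s)) * math.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999).float()  # clip to keep late steps well-behaved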

Common pitfalls

  • Mismatch between training and sampling parameterizations (e.g., training with ε-prediction but sampling code written for x_0-prediction).
  • Under-conditioning in conditional tasks; mitigate with guidance or stronger conditioning encoders.

Connections and contrasts

  • See also: [/blog/score-based-modeling], [/blog/normalizing-flows], [/blog/gans].

Quick checks

  1. Why predict noise? — It yields a well-conditioned regression loss tied directly to the diffusion process.
  2. How to speed up sampling? — Use DDIM or solver-based samplers (a deterministic DDIM step is sketched below).
  3. Relation to scores? — Reverse dynamics depend on the score ∇_x log p_t(x), which the noise predictor estimates up to scale: ∇_x log p_t(x) ≈ −ε_θ(x, t)/√(1−ᾱ_t).
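
A deterministic (η = 0) DDIM step, reusing the hypothetical alpha_bar buffer from above; t_prev is the previous timestep in a possibly strided schedule:

def ddim_step(x_t, t, t_prev, model, alpha_bar):
    # Estimate x_0 from the noise prediction, then jump directly to t_prev.
    eps_hat = model(x_t, t)
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_hat = (x_t - torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(ab_t)
    return torch.sqrt(ab_prev) * x0_hat + torch.sqrt(1 - ab_prev) * eps_hat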

Further reading

  • Ho, Jain, Abbeel. "Denoising Diffusion Probabilistic Models" (DDPM), NeurIPS 2020.
  • Song et al. "Score-Based Generative Modeling through Stochastic Differential Equations", ICLR 2021.
  • Song, Meng, Ermon. "Denoising Diffusion Implicit Models" (DDIM), ICLR 2021.