Takeaway

Probably Approximately Correct (PAC) learning quantifies how many samples suffice to learn a hypothesis with small error and high confidence, governed by capacity measures like VC dimension.

The problem (before → after)

  • Before: Learning performance felt empirical and dataset-specific.
  • After: Generalization bounds link training error, hypothesis class capacity, and sample size.

Mental model first

Learning is like polling voters: to estimate the population preference (the true error), you need enough independent samples, and richer hypothesis classes need more samples to avoid overfitting.
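To make the polling picture concrete, here is a minimal simulation sketch (the classifier and its 30% true error rate are hypothetical): the empirical error of a fixed hypothesis concentrates around the true error as the number of i.i.d. samples grows.

```python
import numpy as np

# Polling analogy: a hypothetical classifier with true error 0.30.
# Each i.i.d. test point is a "voter"; the empirical error is the poll estimate.
rng = np.random.default_rng(0)
true_error = 0.30

for n in [30, 300, 3000, 30000]:
    # 1 = mistake, 0 = correct, drawn i.i.d. with probability true_error
    mistakes = rng.random(n) < true_error
    estimate = mistakes.mean()
    print(f"n={n:6d}  empirical error={estimate:.3f}  gap={abs(estimate - true_error):.3f}")
```

The gap shrinks roughly like 1/√n for a single fixed hypothesis; uniform convergence is what extends that guarantee to every hypothesis in the class at once.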

Just-in-time concepts

  • ε, δ guarantees; realizable vs agnostic settings.
  • VC dimension: shattering and growth functions (see the shattering sketch after this list).
  • Uniform convergence and structural risk minimization.
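
To make shattering concrete, here is a small brute-force sketch. The class of one-sided thresholds on the line, h_t(x) = +1 if x ≥ t else −1, is an illustrative choice (not something the post specifies); its VC dimension is 1, so single points are shattered but pairs are not.

```python
from itertools import product

def threshold_labels(points, t):
    """Labels assigned by the hypothesis h_t(x) = +1 if x >= t else -1."""
    return tuple(1 if x >= t else -1 for x in points)

def shattered_by_thresholds(points):
    """Brute-force check: can one-sided thresholds realize every +/-1
    labeling of `points`? Only thresholds at the points themselves (or
    above the maximum) can change the labeling, so those candidates suffice."""
    candidates = sorted(points) + [max(points) + 1.0]
    achievable = {threshold_labels(points, t) for t in candidates}
    return all(labels in achievable
               for labels in product([-1, 1], repeat=len(points)))

print(shattered_by_thresholds([0.5]))       # True: any single point is shattered
print(shattered_by_thresholds([0.2, 0.8]))  # False: the labeling (+1, -1) is unreachable
```

The growth function counts how many distinct labelings a class can realize on n points; for thresholds it is n + 1, far below the 2^n required for shattering once n ≥ 2.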

First-pass solution

Choose a hypothesis class with finite VC dimension d. In the realizable setting, ERM achieves error ≤ ε with probability ≥ 1 − δ from n = O((d log(1/ε) + log(1/δ))/ε) samples; in the agnostic setting the dependence on ε worsens to 1/ε².
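
As a rough sketch, the bound can be read as a sample-size calculator; the constant C below is an illustrative placeholder, since the O(·) notation hides it.

```python
import math

def pac_sample_size(vc_dim, eps, delta, C=8.0):
    """Realizable-case PAC sample size, up to constants:
    n ~ C * (d * log(1/eps) + log(1/delta)) / eps.
    C is a placeholder; the theorem only pins this down up to a constant."""
    return math.ceil(C * (vc_dim * math.log(1 / eps) + math.log(1 / delta)) / eps)

# Example: VC dimension 10, target error 5%, confidence 95%.
print(pac_sample_size(vc_dim=10, eps=0.05, delta=0.05))
```

The useful reading is the shape, not the number: sample size grows linearly in the VC dimension, logarithmically in 1/δ, and (in the realizable case) roughly as (1/ε) log(1/ε).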

Iterative refinement

  1. Rademacher complexity tightens the bounds (see the sketch after this list).
  2. Margin bounds for large-margin classifiers.
  3. Compression and stability-based generalization.
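
For step 1, here is a hedged Monte Carlo sketch of the empirical Rademacher complexity of a finite hypothesis class, represented by its ±1 predictions on a fixed sample; the threshold-classifier example at the bottom is illustrative, not from the post.

```python
import numpy as np

def empirical_rademacher(pred_matrix, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma[ sup_h (1/n) sum_i sigma_i * h(x_i) ] for a finite class.

    pred_matrix: array of shape (num_hypotheses, n) with entries in {-1, +1},
    each row being one hypothesis evaluated on the fixed sample."""
    rng = np.random.default_rng(seed)
    _, n = pred_matrix.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # random Rademacher signs
        correlations = pred_matrix @ sigma / n    # (1/n) * sum_i sigma_i h(x_i) per hypothesis
        total += correlations.max()               # sup over the class
    return total / n_draws

# Example: one-sided thresholds evaluated on 50 sample points (a low-capacity class).
xs = np.sort(np.random.default_rng(1).random(50))
preds = np.array([[1.0 if x >= t else -1.0 for x in xs] for t in xs])
print(empirical_rademacher(preds))
```

Because it is measured on the actual sample, the empirical Rademacher complexity can be much smaller than what a worst-case VC argument allows, which is where the tighter bounds come from.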

Principles, not prescriptions

  • Control capacity relative to data to avoid overfitting.
  • Prefer simpler classes when data are scarce.

Common pitfalls

  • Misusing bounds as precise predictors of test error.
  • Ignoring distribution shift; the bounds assume training and test data are drawn i.i.d. from the same distribution.

Connections and contrasts

  • See also: [/blog/pac-bayes], [/blog/multi-armed-bandits].

Quick checks

  1. What is VC dimension? — The size of the largest set the class can shatter.
  2. Difference between ε and δ? — ε bounds the error (accuracy); δ bounds the failure probability (confidence).
  3. Agnostic vs realizable? — Realizable assumes the true labeling function lies in the class; agnostic drops that assumption.

Further reading

  • Vapnik, Statistical Learning Theory
  • Shalev-Shwartz & Ben-David, Understanding Machine Learning: From Theory to Algorithms