PAC Learning — Sample Complexity and Generalization
Takeaway
Probably Approximately Correct (PAC) learning quantifies how many samples suffice to learn a hypothesis with small error and high confidence, governed by capacity measures like VC dimension.
The problem (before → after)
- Before: Learning performance felt empirical and dataset-specific.
- After: Generalization bounds link training error, hypothesis class capacity, and sample size.
Mental model first
Learning is like polling voters: to estimate the population preference (the true error), you need enough independent samples; richer hypothesis classes need more samples to avoid overfitting.
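To make the polling analogy concrete, here is a minimal Python sketch (the hypothesis and its error rate are made up): the empirical error of one fixed hypothesis concentrates around its true error at the rate Hoeffding's inequality predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# True error of a single fixed hypothesis: it misclassifies 30% of the population.
true_error = 0.3
delta = 0.05  # allowed failure probability

for n in [10, 100, 1_000, 10_000]:
    # Each sample is 1 if the hypothesis errs on that point, 0 otherwise (i.i.d. draws).
    mistakes = rng.random(n) < true_error
    empirical_error = mistakes.mean()
    # Hoeffding: |empirical - true| <= sqrt(log(2/delta) / (2n)) with prob >= 1 - delta.
    hoeffding_radius = np.sqrt(np.log(2 / delta) / (2 * n))
    print(f"n={n:6d}  empirical={empirical_error:.3f}  bound=±{hoeffding_radius:.3f}")
```

This is the one-hypothesis picture; the PAC bounds below are what you pay to make the same guarantee hold uniformly over a whole class.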
Just-in-time concepts
- ε, δ guarantees; realizable vs agnostic settings.
- VC dimension: shattering and growth functions (see the shattering sketch after this list).
- Uniform convergence and structural risk minimization.
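As referenced above, a brute-force shattering check for a toy class, 1-D thresholds h_t(x) = 1[x ≥ t] (the class, the threshold grid, and the helper names are illustrative): one point can be shattered, two cannot, so the VC dimension of this class is 1.

```python
import itertools
import numpy as np

def threshold_predict(t, xs):
    """Toy class: h_t(x) = 1 if x >= t else 0."""
    return tuple(int(x >= t) for x in xs)

def is_shattered(points, thresholds):
    """Check whether the threshold class realizes all 2^k labelings of `points`."""
    achievable = {threshold_predict(t, points) for t in thresholds}
    return all(labels in achievable
               for labels in itertools.product([0, 1], repeat=len(points)))

# A fine grid of candidate thresholds is enough for this toy class.
grid = np.linspace(-2, 2, 401)

print(is_shattered((0.0,), grid))        # True: a single point can be shattered
print(is_shattered((-1.0, 1.0), grid))   # False: the labeling (1, 0) is unreachable
# Hence the VC dimension of 1-D thresholds is 1.
```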
First-pass solution
Choose a hypothesis class with finite VC dimension d; in the realizable setting, ERM achieves error ≤ ε with probability at least 1 − δ using n = O((d log(1/ε) + log(1/δ))/ε) samples; in the agnostic setting the rate becomes n = O((d + log(1/δ))/ε²).
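A small calculator for these bounds; the constant factor is illustrative, since the O(·) hides unspecified constants.

```python
import math

def pac_sample_size(vc_dim, eps, delta, realizable=True, c=1.0):
    """Rough sample-size estimate from the VC bounds; the constant c is illustrative.

    Realizable:  n = O((d * log(1/eps) + log(1/delta)) / eps)
    Agnostic:    n = O((d + log(1/delta)) / eps**2)
    """
    if realizable:
        n = c * (vc_dim * math.log(1 / eps) + math.log(1 / delta)) / eps
    else:
        n = c * (vc_dim + math.log(1 / delta)) / eps**2
    return math.ceil(n)

# Example: linear classifiers in R^10 (VC dimension 11), 5% error, 95% confidence.
print(pac_sample_size(vc_dim=11, eps=0.05, delta=0.05))                    # realizable
print(pac_sample_size(vc_dim=11, eps=0.05, delta=0.05, realizable=False))  # agnostic
```

Note the ε vs ε² gap: dropping the realizability assumption roughly squares the accuracy cost.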
Iterative refinement
- Rademacher complexity tightens bounds (estimated numerically in the sketch after this list).
- Margin bounds for large-margin classifiers.
- Compression and stability-based generalization.
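As a rough illustration of the Rademacher idea (the finite toy classes here are random and purely hypothetical), this sketch estimates empirical Rademacher complexity by Monte Carlo: richer classes correlate better with random ±1 signs, which is exactly the capacity the tighter bounds charge for.

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_rademacher(predictions, n_trials=2000):
    """Estimate E_sigma[ sup_h (1/n) * sum_i sigma_i * h(x_i) ].

    `predictions` has shape (num_hypotheses, n): each row is h(x_i) in {-1, +1}
    for one hypothesis evaluated on a fixed sample of size n.
    """
    num_h, n = predictions.shape
    sup_values = []
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        correlations = predictions @ sigma / n    # (1/n) sum_i sigma_i h(x_i), per h
        sup_values.append(correlations.max())     # best hypothesis for these signs
    return float(np.mean(sup_values))

n = 200
# Toy finite classes on the same sample: the richer class fits noise better.
small_class = rng.choice([-1.0, 1.0], size=(5, n))     # 5 hypotheses
large_class = rng.choice([-1.0, 1.0], size=(500, n))   # 500 hypotheses

print(empirical_rademacher(small_class))   # smaller
print(empirical_rademacher(large_class))   # larger: more capacity to fit noise
```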
Principles, not prescriptions
- Control capacity relative to data to avoid overfitting.
- Prefer simpler classes when data are scarce.
Common pitfalls
- Treating the bounds as precise predictors of test error; they are worst-case and often loose.
- Ignoring distribution shift; the bounds assume training and test samples are drawn i.i.d. from the same distribution.
Connections and contrasts
- See also: [/blog/pac-bayes], [/blog/multi-armed-bandits].
Quick checks
- What is VC dimension? — The size of the largest set the class can shatter.
- Difference between ε and δ? — Accuracy vs confidence.
- Agnostic vs realizable? — Realizable assumes the true labeling function lies in the class; agnostic makes no such assumption.
Further reading
- Vapnik, Statistical Learning Theory; Shalev-Shwartz & Ben-David, Understanding Machine Learning: From Theory to Algorithms.