PAC Learning — Sample Complexity and Generalization
Takeaway
Probably Approximately Correct (PAC) learning quantifies how many samples suffice to learn a hypothesis with small error and high confidence, governed by capacity measures like VC dimension.
The problem (before → after)
- Before: Learning performance felt empirical and dataset-specific.
- After: Generalization bounds link training error, hypothesis class capacity, and sample size.
Mental model first
Learning is like polling voters: to estimate the population preference (the true error), you need enough independent samples; richer hypothesis classes need more samples to avoid overfitting.
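To make the polling analogy concrete, here is a minimal Python sketch (the hypothesis and its error rate are made up): the empirical error of one fixed hypothesis concentrates around its true error at the rate Hoeffding's inequality predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# True error of a single fixed hypothesis: it misclassifies 30% of the population.
true_error = 0.3
delta = 0.05  # allowed failure probability

for n in [10, 100, 1_000, 10_000]:
    # Each sample is 1 if the hypothesis errs on that point, 0 otherwise (i.i.d. draws).
    mistakes = rng.random(n) < true_error
    empirical_error = mistakes.mean()
    # Hoeffding: |empirical - true| <= sqrt(log(2/delta) / (2n)) with prob >= 1 - delta.
    hoeffding_radius = np.sqrt(np.log(2 / delta) / (2 * n))
    print(f"n={n:6d}  empirical={empirical_error:.3f}  bound=±{hoeffding_radius:.3f}")
```

This is the one-hypothesis picture; the PAC bounds below are what you pay to make the same guarantee hold uniformly over a whole class.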
Just-in-time concepts
- ε, δ guarantees; realizable vs agnostic settings.
- VC dimension: shattering and growth functions (see the shattering sketch after this list).
- Uniform convergence and structural risk minimization.
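As referenced above, a brute-force shattering check for a toy class, 1-D thresholds h_t(x) = 1[x ≥ t] (the class, the threshold grid, and the helper names are illustrative): one point can be shattered, two cannot, so the VC dimension of this class is 1.

```python
import itertools
import numpy as np

def threshold_predict(t, xs):
    """Toy class: h_t(x) = 1 if x >= t else 0."""
    return tuple(int(x >= t) for x in xs)

def is_shattered(points, thresholds):
    """Check whether the threshold class realizes all 2^k labelings of `points`."""
    achievable = {threshold_predict(t, points) for t in thresholds}
    return all(labels in achievable
               for labels in itertools.product([0, 1], repeat=len(points)))

# A fine grid of candidate thresholds is enough for this toy class.
grid = np.linspace(-2, 2, 401)

print(is_shattered((0.0,), grid))        # True: a single point can be shattered
print(is_shattered((-1.0, 1.0), grid))   # False: the labeling (1, 0) is unreachable
# Hence the VC dimension of 1-D thresholds is 1.
```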
First-pass solution
Choose a hypothesis class with finite VC dimension d; in the realizable setting, ERM achieves error ≤ ε with probability at least 1 − δ using n = O((d log(1/ε) + log(1/δ))/ε) samples; in the agnostic setting the rate becomes n = O((d + log(1/δ))/ε²).
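A small calculator for these bounds; the constant factor is illustrative, since the O(·) hides unspecified constants.

```python
import math

def pac_sample_size(vc_dim, eps, delta, realizable=True, c=1.0):
    """Rough sample-size estimate from the VC bounds; the constant c is illustrative.

    Realizable:  n = O((d * log(1/eps) + log(1/delta)) / eps)
    Agnostic:    n = O((d + log(1/delta)) / eps**2)
    """
    if realizable:
        n = c * (vc_dim * math.log(1 / eps) + math.log(1 / delta)) / eps
    else:
        n = c * (vc_dim + math.log(1 / delta)) / eps**2
    return math.ceil(n)

# Example: linear classifiers in R^10 (VC dimension 11), 5% error, 95% confidence.
print(pac_sample_size(vc_dim=11, eps=0.05, delta=0.05))                    # realizable
print(pac_sample_size(vc_dim=11, eps=0.05, delta=0.05, realizable=False))  # agnostic
```

Note the ε vs ε² gap: dropping the realizability assumption roughly squares the accuracy cost.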
Iterative refinement
- Rademacher complexity tightens bounds (estimated numerically in the sketch after this list).
- Margin bounds for large-margin classifiers.
- Compression and stability-based generalization.
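As a rough illustration of the Rademacher idea (the finite toy classes here are random and purely hypothetical), this sketch estimates empirical Rademacher complexity by Monte Carlo: richer classes correlate better with random ±1 signs, which is exactly the capacity the tighter bounds charge for.

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_rademacher(predictions, n_trials=2000):
    """Estimate E_sigma[ sup_h (1/n) * sum_i sigma_i * h(x_i) ].

    `predictions` has shape (num_hypotheses, n): each row is h(x_i) in {-1, +1}
    for one hypothesis evaluated on a fixed sample of size n.
    """
    num_h, n = predictions.shape
    sup_values = []
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        correlations = predictions @ sigma / n    # (1/n) sum_i sigma_i h(x_i), per h
        sup_values.append(correlations.max())     # best hypothesis for these signs
    return float(np.mean(sup_values))

n = 200
# Toy finite classes on the same sample: the richer class fits noise better.
small_class = rng.choice([-1.0, 1.0], size=(5, n))     # 5 hypotheses
large_class = rng.choice([-1.0, 1.0], size=(500, n))   # 500 hypotheses

print(empirical_rademacher(small_class))   # smaller
print(empirical_rademacher(large_class))   # larger: more capacity to fit noise
```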
Principles, not prescriptions
- Control capacity relative to data to avoid overfitting.
- Prefer simpler classes when data are scarce.
Common pitfalls
- Treating the bounds as precise predictors of test error; they are worst-case and often loose.
- Ignoring distribution shift; the bounds assume training and test samples are drawn i.i.d. from the same distribution.
Connections and contrasts
- See also: [/blog/pac-bayes], [/blog/multi-armed-bandits].
Quick checks
- What is VC dimension? — The size of the largest set the class can shatter.
- Difference between ε and δ? — Accuracy vs confidence.
- Agnostic vs realizable? — Realizable assumes the true labeling function lies in the class; agnostic makes no such assumption.
Further reading
- Vapnik, Statistical Learning Theory; Shalev-Shwartz & Ben-David, Understanding Machine Learning: From Theory to Algorithms.