Takeaway

Differential privacy (DP) limits how much any single individual can affect an output by adding calibrated noise; guarantees compose gracefully across analyses.

The problem (before → after)

  • Before: Anonymization fails under linkage attacks, and repeated releases of the same data leak information.
  • After: DP gives provable (ε, δ) privacy guarantees by perturbing outputs in proportion to the query's sensitivity.

Mental model first

Imagine whispering answers in a noisy room: each person’s voice is masked by controlled static so you can hear the crowd, not any one person.

Just-in-time concepts

  • (ε, δ)-DP: Neighboring datasets (differing in one record) produce nearly indistinguishable output distributions; the formal statement follows this list.
  • Sensitivity: The maximum change in a query's output when one record is added or removed.
  • Mechanisms: Laplace and Gaussian noise; advanced composition and privacy accounting track cumulative loss.
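
For reference, the standard formal statement behind the first bullet: a randomized mechanism M is (ε, δ)-differentially private if, for every pair of neighboring datasets D, D′ and every set of outputs S,

  Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

and the (L1) sensitivity of a query f is Δf = max over neighboring D, D′ of ‖f(D) − f(D′)‖₁; for a simple count it is 1.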

First-pass solution

Choose a query f; compute its sensitivity Δ; add Laplace noise with scale Δ/ε; track cumulative privacy loss across queries.
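
A minimal sketch of that recipe in Python, assuming a scalar counting query (sensitivity 1) and simple additive ("basic composition") budget tracking; the class and variable names are illustrative, not from any particular library.

    import numpy as np

    class LaplaceAccountant:
        """Answer queries with the Laplace mechanism and track spent epsilon."""

        def __init__(self, total_epsilon):
            self.total_epsilon = total_epsilon   # overall privacy budget
            self.spent = 0.0                     # cumulative loss (basic composition)
            self.rng = np.random.default_rng(0)

        def release(self, true_value, sensitivity, epsilon):
            # Stop answering once the budget would be exceeded.
            if self.spent + epsilon > self.total_epsilon:
                raise RuntimeError("privacy budget exhausted")
            self.spent += epsilon
            # Laplace noise with scale sensitivity/epsilon yields epsilon-DP.
            return true_value + self.rng.laplace(0.0, sensitivity / epsilon)

    # Toy usage: a private count of records with age over 40.
    ages = np.array([23, 35, 41, 52, 29, 60])
    acct = LaplaceAccountant(total_epsilon=1.0)
    print(acct.release(float((ages > 40).sum()), sensitivity=1.0, epsilon=0.5))
    print("epsilon spent so far:", acct.spent)

Under basic composition the spent budget simply adds across queries; the tighter accountants in the next section improve on this.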

Iterative refinement

  1. Rényi DP (RDP) / zero-concentrated DP (zCDP) for tighter composition accounting.
  2. Local DP (e.g., randomized response) so each client perturbs its own data before it is collected.
  3. Private training: DP-SGD for deep learning, combining per-example gradient clipping with Gaussian noise (sketched below).
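
A minimal sketch of one DP-SGD update, assuming per-example gradients are already materialized as a NumPy array of shape (batch_size, num_params); the clipping norm and noise multiplier here are illustrative hyperparameters, not values from the post.

    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                    noise_multiplier=1.0, rng=None):
        """One DP-SGD step: clip each example's gradient, add Gaussian noise, average."""
        rng = rng or np.random.default_rng()
        batch_size = per_example_grads.shape[0]

        # 1. Clip each per-example gradient to L2 norm at most clip_norm,
        #    bounding any single example's influence on the update.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

        # 2. Sum the clipped gradients, add Gaussian noise calibrated to clip_norm, average.
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
        noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

        # 3. Ordinary SGD step; the per-step privacy cost is tallied separately
        #    by an accountant (e.g., RDP / moments accounting).
        return params - lr * noisy_grad

The noise multiplier, sampling rate, and number of steps together determine the final (ε, δ) reported by the accountant.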

Principles, not prescriptions

  • Budget privacy across all analyses; report ε (and δ) alongside results.
  • Match the mechanism to the query type and its sensitivity (see the note after this list).
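
One concrete instance of matching mechanism to sensitivity (standard results, not specific to this post): the Laplace mechanism calibrates to L1 sensitivity and gives pure ε-DP, while the classical Gaussian mechanism calibrates to L2 sensitivity and gives (ε, δ)-DP for ε < 1:

  b = Δ₁ / ε                      (Laplace noise scale)
  σ = Δ₂ · √(2 ln(1.25/δ)) / ε    (Gaussian noise standard deviation)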

Common pitfalls

  • Underestimating sensitivity and composition effects (see the worked numbers after this list).
  • Reporting results without the privacy budget and utility trade-offs.
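
To make the composition pitfall concrete (illustrative numbers, not from the post): answering k = 100 queries at ε = 0.1 each costs ε = 10 under basic composition, while advanced composition with δ′ = 10⁻⁵ gives roughly ε ≈ 0.1·√(2·100·ln(10⁵)) + 100·0.1·(e^0.1 − 1) ≈ 5.9; RDP/zCDP accounting can tighten this further, but the total is still far larger than any single query's 0.1.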

Connections and contrasts

  • See also: [/blog/secure-multiparty-computation], [/blog/zero-knowledge-proofs], [/blog/information-theory].

Quick checks

  1. Why sensitivity? — Scales noise to bound individual influence.
  2. What is ε? — Privacy loss parameter; smaller is more private.
  3. How to train DP models? — DP-SGD: clip gradients and add Gaussian noise.

Further reading

  • Dwork, McSherry, Nissim, and Smith (2006), "Calibrating Noise to Sensitivity in Private Data Analysis."
  • Abadi et al. (2016), "Deep Learning with Differential Privacy" (DP-SGD).