Differential Privacy — Privacy by Adding Noise
Takeaway
Differential privacy (DP) limits how much any single individual can affect an output by adding calibrated noise; guarantees compose gracefully across analyses.
The problem (before → after)
- Before: Anonymization fails under linkage attacks; even repeated aggregate queries can leak information about individuals.
- After: DP provides provable privacy guarantees, parameterized by (ε, δ), by adding noise calibrated to each query's sensitivity.
Mental model first
Imagine whispering answers in a noisy room: each person’s voice is masked by controlled static so you can hear the crowd, not any one person.
Just-in-time concepts
- (ε, δ)-DP: A mechanism's output distributions on neighboring datasets (differing in one record) are close, up to a multiplicative factor e^ε and an additive slack δ (formalized below).
- Sensitivity: The maximum change in a query's output when one record is added or removed.
- Mechanisms: Laplace and Gaussian noise; advanced composition and privacy accounting for repeated queries.
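For reference, the standard definitions behind these bullets: a randomized mechanism M satisfies (ε, δ)-DP if, for every pair of neighboring datasets D, D′ and every set of outputs S,

```latex
\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

and the Laplace mechanism achieves pure ε-DP (δ = 0) by adding noise scaled to the L1 sensitivity:

```latex
\Delta f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1,
\qquad
M(D) = f(D) + \mathrm{Lap}\!\left(\tfrac{\Delta f}{\varepsilon}\right)
```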
First-pass solution
Choose a query f; compute its sensitivity Δ; add Laplace noise with scale Δ/ε; track cumulative privacy loss over multiple queries against a total budget.
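A minimal sketch of this loop in Python, assuming a counting query (sensitivity 1) and basic sequential composition; laplace_mechanism and PrivacyAccountant are illustrative names, not the API of any particular DP library.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release true_value with Laplace noise of scale sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

class PrivacyAccountant:
    """Tracks cumulative epsilon under basic (sequential) composition."""
    def __init__(self, total_budget: float):
        self.total_budget = total_budget
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total_budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

# Example: privately release a count (one person changes a count by at most 1, so Δ = 1).
rng = np.random.default_rng(0)
accountant = PrivacyAccountant(total_budget=1.0)

ages = [23, 45, 31, 67, 52]                      # hypothetical records
true_count = sum(1 for age in ages if age > 40)

epsilon = 0.5
accountant.spend(epsilon)
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon, rng=rng)
print(f"noisy count: {noisy_count:.2f}, epsilon spent so far: {accountant.spent}")
```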
Iterative refinement
- Rényi DP (RDP) and zero-concentrated DP (zCDP) for tighter composition accounting.
- Local DP: noise is added on the client before data leaves the device, so no trusted curator is required.
- Private training: DP-SGD for deep learning, combining per-example gradient clipping with Gaussian noise (sketched below).
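A schematic of a single DP-SGD step in NumPy, assuming per-example gradients are already computed; the shapes, hyperparameters, and dp_sgd_step helper are illustrative, and real training would rely on a DP library with a proper RDP/moments accountant.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average.

    per_example_grads has shape (batch_size, num_params): one gradient row per example.
    The noise standard deviation is noise_multiplier * clip_norm, per the DP-SGD recipe.
    """
    # 1. Clip each per-example gradient to L2 norm <= clip_norm (this bounds sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # 2. Sum the clipped gradients, add Gaussian noise calibrated to the clip norm, average.
    batch_size = per_example_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # 3. Take an ordinary gradient step on the noisy, clipped gradient.
    return params - lr * noisy_grad

# Toy usage with random numbers standing in for a real model's per-example gradients.
rng = np.random.default_rng(0)
params = np.zeros(4)
fake_grads = rng.normal(size=(8, 4))             # batch of 8 per-example gradients
params = dp_sgd_step(params, fake_grads, clip_norm=1.0,
                     noise_multiplier=1.1, lr=0.1, rng=rng)
```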
Principles, not prescriptions
- Set a total privacy budget across analyses; report ε (and δ) alongside results.
- Match the mechanism to the query type and its sensitivity: Laplace for L1-sensitive queries under pure ε-DP, Gaussian for L2-sensitive queries under (ε, δ)-DP (see the sketch below).
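For the Gaussian case, one common calibration (valid for ε < 1) sets σ = Δ₂ · √(2 ln(1.25/δ)) / ε; a small sketch, where gaussian_mechanism is an illustrative helper, not a library function:

```python
import numpy as np

def gaussian_mechanism(true_value: float, l2_sensitivity: float,
                       epsilon: float, delta: float,
                       rng: np.random.Generator) -> float:
    """Release true_value under (epsilon, delta)-DP with the classic Gaussian mechanism.

    Uses sigma = l2_sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon (valid for epsilon < 1).
    """
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_value + rng.normal(loc=0.0, scale=sigma)

# Example: a bounded mean with known L2 sensitivity 0.1, released under (0.5, 1e-5)-DP.
rng = np.random.default_rng(0)
print(gaussian_mechanism(42.0, l2_sensitivity=0.1, epsilon=0.5, delta=1e-5, rng=rng))
```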
Common pitfalls
- Underestimating sensitivity and composition effects.
- Reporting results without the privacy budget and utility trade-offs.
Connections and contrasts
- See also: [/blog/secure-multiparty-computation], [/blog/zero-knowledge-proofs], [/blog/information-theory].
Quick checks
- Why sensitivity? — Scales noise to bound individual influence.
- What is ε? — Privacy loss parameter; smaller is more private.
- How to train DP models? — DP-SGD: clip gradients and add Gaussian noise.
Further reading
- Dwork, McSherry, Nissim, Smith, "Calibrating Noise to Sensitivity in Private Data Analysis" (2006).
- Abadi et al., "Deep Learning with Differential Privacy" (DP-SGD, 2016).