Takeaway

Entropy quantifies uncertainty; channel capacity and coding theorems show how reliably we can communicate over noisy channels.

The problem (before → after)

  • Before: No principled way to measure information or to bound what compression and communication can achieve.
  • After: Entropy H, mutual information I, and capacity C give tight limits on compression and transmission, together with strategies that achieve them.

Mental model first

Information is surprise: messages that are harder to predict carry more bits. Compression removes predictability; error-correcting codes add redundancy to fight noise.
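
A minimal sketch (illustrative only, not from the original text): self-information −log₂ p turns "surprise" into bits, so rarer events carry more information.

    import math

    def surprise_bits(p):
        # Self-information of an event with probability p, measured in bits.
        return -math.log2(p)

    print(surprise_bits(0.5))    # fair coin flip: 1.0 bit
    print(surprise_bits(0.01))   # rare event: ~6.64 bits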

Just-in-time concepts

  • Entropy H(X) = −∑_x p(x) log₂ p(x); mutual information I(X;Y) = H(X) − H(X|Y); KL divergence D(p‖q) = ∑_x p(x) log₂(p(x)/q(x)) (see the sketch after this list).
  • Source coding: a prefix-free code's optimal average length per symbol lies between H and H + 1, so ≈ H.
  • Channel coding: any rate below capacity C admits codes whose error probability vanishes with block length; above C, reliable communication is impossible.
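
A minimal sketch of these three quantities (illustrative, assuming a small discrete joint distribution given as a table):

    import numpy as np

    def entropy(p):
        # H(X) = -sum p log2 p, ignoring zero-probability outcomes.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def kl_divergence(p, q):
        # D(p || q) = sum p log2(p/q); assumes q > 0 wherever p > 0.
        p, q = np.asarray(p, float), np.asarray(q, float)
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    def mutual_information(joint):
        # I(X;Y) = D( p(x,y) || p(x)p(y) ), computed from a joint table.
        joint = np.asarray(joint, float)
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        return kl_divergence(joint.ravel(), (px * py).ravel())

    # Example joint distribution of a noisy binary input/output pair.
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
    print(entropy(joint.sum(axis=1)))   # H(X) = 1.0 bit
    print(mutual_information(joint))    # I(X;Y) ≈ 0.278 bits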

First-pass solution

Design prefix-free codes (e.g., Huffman) whose average length approaches H; use block codes with suitable decoding to approach capacity; measure performance with mutual information and error rates.
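
For the source-coding half, a minimal Huffman sketch (hypothetical example distribution; only codeword lengths are tracked, which is enough to compare against H):

    import heapq
    import math

    def huffman_lengths(probs):
        # Build a Huffman tree and return the codeword length per symbol.
        # Heap items: (probability, tie-breaker, symbols in that subtree).
        heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        lengths = {s: 0 for s in probs}
        tie = len(heap)
        while len(heap) > 1:
            p1, _, syms1 = heapq.heappop(heap)
            p2, _, syms2 = heapq.heappop(heap)
            for s in syms1 + syms2:
                lengths[s] += 1      # each merge adds one bit to these symbols
            heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
            tie += 1
        return lengths

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    lengths = huffman_lengths(probs)
    avg_len = sum(probs[s] * lengths[s] for s in probs)
    H = -sum(p * math.log2(p) for p in probs.values())
    print(avg_len, H)   # 1.75 1.75

With dyadic probabilities the average length equals H exactly; in general Huffman's average length stays within one bit of H.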

Iterative refinement

  1. Modern codes: LDPC, Turbo, and Polar codes approach capacity efficiently (the repetition-code sketch below shows the crude trade-off they improve on).
  2. Information theory in ML: regularization, representation learning, privacy.
  3. Rate–distortion theory trades fidelity for bitrate: R(D) is the minimum rate achievable at distortion D.
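
A toy simulation (illustrative only; parameters are arbitrary) of rate-1/n repetition codes over a binary symmetric channel shows the basic redundancy-for-reliability trade that capacity-approaching codes make far more efficiently:

    import numpy as np

    rng = np.random.default_rng(0)

    def bsc(bits, p):
        # Binary symmetric channel: each bit flips independently with probability p.
        flips = rng.random(bits.shape) < p
        return bits ^ flips

    def repetition_decode(received, n):
        # Majority vote over each block of n repeated bits.
        blocks = received.reshape(-1, n)
        return (blocks.sum(axis=1) > n // 2).astype(int)

    p = 0.1                       # crossover probability
    msg = rng.integers(0, 2, 100_000)
    for n in (1, 3, 5, 7):        # rate 1/n repetition codes
        coded = np.repeat(msg, n)
        decoded = repetition_decode(bsc(coded, p), n)
        print(n, (decoded != msg).mean())
    # Error rate drops (≈0.1, 0.028, 0.0086, 0.0027) but the rate falls to 1/n;
    # capacity-approaching codes get low error at rates near C = 1 - H2(p) ≈ 0.53.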

Principles, not prescriptions

  • Bits measure uncertainty, not meaning.
  • Trade redundancy against rate to meet reliability goals.

Common pitfalls

  • Confusing mutual information (a measure of statistical dependence) with causation.
  • Applying capacity results to channels that don't match the assumed model.

Connections and contrasts

  • See also: [/blog/differential-privacy], [/blog/kelly-criterion], [/blog/black-box-vi].

Quick checks

  1. Why the minus sign in H? — Since log p(x) ≤ 0 for probabilities, the minus sign makes entropy non-negative, so more uncertain distributions score higher.
  2. What is capacity? — The maximum mutual information over input distributions: C = max_{p(x)} I(X;Y) (verified numerically in the sketch below).
  3. How does the compression limit relate to H? — No lossless code's average length per symbol can beat H.
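
To make the capacity definition concrete, a short sketch (assuming a binary symmetric channel with crossover 0.1, an illustrative choice) maximizes I(X;Y) over input distributions and recovers the closed form 1 − H₂(p):

    import numpy as np

    def h2(x):
        # Binary entropy in bits.
        return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

    def mi_bsc(q, p):
        # I(X;Y) for a binary symmetric channel with P(X=1)=q, crossover p.
        py1 = q * (1 - p) + (1 - q) * p   # P(Y=1)
        return h2(py1) - h2(p)            # H(Y) - H(Y|X)

    p = 0.1
    qs = np.linspace(0.01, 0.99, 999)
    C = max(mi_bsc(q, p) for q in qs)
    print(C, 1 - h2(p))   # both ≈ 0.531; the maximum is at the uniform input q = 0.5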

Further reading

  • Shannon, "A Mathematical Theory of Communication" (1948); Cover & Thomas, Elements of Information Theory.