Notes

Causal Inference

Causal Inference and the Do-Calculus
The do-calculus uses graphical rules to identify causal effects from observational data when interventions are not available.
Heterogeneous Treatment Effects with Causal Trees
Causal trees partition covariate space to estimate conditional average treatment effects (CATE) with honesty to avoid overfitting.

Distributed Systems

Byzantine Fault Tolerance (PBFT)
PBFT tolerates arbitrary (including malicious) faults by requiring 3f+1 replicas and 2f+1 quorums with authenticated messages across pre-prepare, prepare, and commit phases.
CAP Theorem — Consistency, Availability, Partition Tolerance
Under network partitions, a system must choose between immediate consistency and availability; designs position along this trade-off with specific guarantees.
Conflict-Free Replicated Data Types (CRDTs)
CRDTs converge automatically under concurrent, out-of-order updates by designing operations that commute (or using causal metadata to resolve order).
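A minimal sketch of the idea (not from any particular library): a grow-only counter (G-Counter), where each replica increments only its own slot and merge takes the element-wise maximum, so merges commute, associate, and are idempotent.

    # Illustrative state-based G-Counter CRDT sketch.
    class GCounter:
        def __init__(self, replica_id, n_replicas):
            self.replica_id = replica_id
            self.counts = [0] * n_replicas      # one slot per replica

        def increment(self):
            self.counts[self.replica_id] += 1   # a replica only touches its own slot

        def value(self):
            return sum(self.counts)

        def merge(self, other):
            # Element-wise max is commutative, associative, and idempotent,
            # so replicas converge regardless of merge order or duplication.
            self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    a, b = GCounter(0, 2), GCounter(1, 2)
    a.increment(); b.increment(); b.increment()
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 3
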
Paxos — Reaching Consensus in an Async World
Paxos gets a cluster to agree on a value despite crashes and message delays by using quorums and epochs (proposal numbers) to ensure safety.
Raft — Understandable Consensus
Raft decomposes consensus into leader election, log replication, and safety; its design emphasizes understandability without sacrificing performance.

Graphics & Vision

Metropolis Light Transport (MLT)
MLT uses Markov Chain Monte Carlo to explore important light paths by mutating existing paths, dramatically reducing variance for difficult lighting (caustics, small lights).
Neural Style Transfer
Optimize an image so its deep features match the content of one image and the style (Gram statistics) of another, producing artworks that blend both.
Perceptual Losses for Image Synthesis
Perceptual losses compare deep feature activations rather than pixels, aligning optimization with human perception for tasks like style transfer and super-resolution.
Retinex and Color Constancy
Retinex explains how we perceive stable surface colors under changing illumination by comparing reflectance ratios across spatial scales.
The Rendering Equation and Path Tracing
The rendering equation expresses outgoing radiance as emitted light plus reflected incident light integrated over directions; path tracing solves it via Monte Carlo integration.
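For reference, the standard statement of the equation (surface point x, outgoing direction ω_o, BRDF f_r, surface normal n):

    L_o(x, \omega_o) = L_e(x, \omega_o)
      + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i
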

Machine Learning

Attention Is All You Need
Transformers replace recurrence and convolution with self-attention, letting models focus on the most relevant tokens anywhere in a sequence and enabling parallel training that scales.
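A rough NumPy sketch of scaled dot-product attention (shapes and names are illustrative, not the paper's code):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q, K: (seq_len, d_k); V: (seq_len, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                               # weighted sum of values

    Q = K = V = np.random.randn(5, 8)
    out = scaled_dot_product_attention(Q, K, V)          # shape (5, 8)
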
Automatic Differentiation in Machine Learning
Autodiff computes exact derivatives of programs by chaining local derivatives through a computational graph; reverse mode powers deep learning by yielding the full gradient at roughly the cost of one extra function evaluation, independent of the number of parameters.
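A toy reverse-mode sketch (illustrative only): record the graph forward, then sweep it backward in reverse topological order applying the chain rule.

    # Toy reverse-mode autodiff for + and *.
    class Var:
        def __init__(self, value, parents=()):
            self.value, self.parents, self.grad = value, list(parents), 0.0

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def backward(output):
        order, seen = [], set()
        def topo(node):                        # post-order DFS gives a topological order
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    topo(parent)
                order.append(node)
        topo(output)
        output.grad = 1.0
        for node in reversed(order):           # reverse sweep: propagate adjoints
            for parent, local in node.parents:
                parent.grad += local * node.grad

    x, y = Var(2.0), Var(3.0)
    z = x * y + x                              # dz/dx = y + 1 = 4, dz/dy = x = 2
    backward(z)
    print(x.grad, y.grad)                      # 4.0 2.0
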
Black-Box Variational Inference
BBVI estimates noisy but unbiased gradients of the ELBO using Monte Carlo and generic score-function estimators, enabling variational inference without model-specific algebra.
Causal Discovery from Observational Data
Under assumptions (causal sufficiency, faithfulness, acyclicity), algorithms can infer aspects of causal structure from observational data via conditional independencies or scores.
Contextual Bandits with Linear Payoffs
Contextual bandits choose actions based on features; algorithms like LinUCB and Thompson sampling with linear models achieve regret scaling with feature dimension.
Diffusion Probabilistic Models (DDPM)
Train a network to reverse a gradual noising process; sample by iteratively denoising from pure noise to data.
Dirichlet Processes — Bayesian Nonparametrics
The Dirichlet process defines a prior over discrete distributions with countably infinite support, enabling models to grow complexity with data (e.g., infinite mixture models).
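A truncated stick-breaking sketch of a DP draw (base measure, truncation, and names are illustrative):

    import numpy as np

    def stick_breaking(alpha, n_atoms=1000, rng=np.random.default_rng(0)):
        """Truncated stick-breaking weights for a DP with concentration alpha."""
        betas = rng.beta(1.0, alpha, size=n_atoms)          # fraction broken off the stick
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
        weights = betas * remaining                          # length of each piece
        atoms = rng.standard_normal(n_atoms)                 # atoms from a N(0,1) base measure
        return weights, atoms

    w, atoms = stick_breaking(alpha=2.0)
    print(w[:5], w.sum())    # weights decay; the sum approaches 1 as truncation grows
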
Double Machine Learning
Double ML uses orthogonal moments and cross-fitting to estimate low-dimensional causal/structural parameters while controlling bias from high-dimensional nuisance components learned by ML.
Expectation–Maximization (EM) Algorithm
EM maximizes likelihoods with latent variables by alternating between inferring hidden structure (E-step) and optimizing parameters given that structure (M-step).
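A compact EM sketch for a two-component 1D Gaussian mixture (initialization and names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])   # synthetic data

    mu = np.array([-1.0, 1.0])
    sigma = np.array([1.0, 1.0])
    weights = np.array([0.5, 0.5])
    for _ in range(100):
        # E-step: responsibilities (posterior probability of each component per point)
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters given the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weights = nk / len(x)

    print(mu, sigma, weights)   # roughly (-2, 3), (1, 1), (0.5, 0.5)
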
Generative Adversarial Networks
GANs pit a generator against a discriminator in a minimax game so the generator learns to produce samples indistinguishable from real data.
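The minimax value function from the original formulation, for reference:

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
      + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
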
How Powerful Are Graph Neural Networks?
Message-passing GNNs are at most as powerful as the 1-WL isomorphism test; architecture choices affect expressivity and oversmoothing.
Normalizing Flows — Exact Likelihood via Invertible Networks
Flows transform a simple base distribution into a complex target using a sequence of invertible maps; log-likelihoods are exact via the change-of-variables formula.
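The change-of-variables identity behind the exact likelihood, for a flow x = f(z) with invertible f:

    \log p_X(x) = \log p_Z\bigl(f^{-1}(x)\bigr)
      + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
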
PAC Learning — Sample Complexity and Generalization
Probably Approximately Correct (PAC) learning quantifies how many samples suffice to learn a hypothesis with small error and high confidence, governed by capacity measures like VC dimension.
PAC-Bayes Bounds — Data-Dependent Generalization
PAC-Bayes bounds provide tight, data-dependent generalization guarantees for randomized predictors via a trade-off between empirical loss and KL divergence to a prior.
Regret Bounds in Multi-Armed Bandits
Bandit algorithms balance exploration and exploitation to minimize cumulative regret; UCB and Thompson sampling achieve logarithmic regret under standard assumptions.
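A minimal UCB1 sketch on Bernoulli arms (arm means and constants are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    true_means = [0.3, 0.5, 0.7]          # illustrative Bernoulli arms
    counts, sums = np.zeros(3), np.zeros(3)

    for t in range(1, 5001):
        if t <= 3:
            arm = t - 1                                            # play each arm once first
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)  # mean + optimism bonus
            arm = int(np.argmax(ucb))
        reward = rng.random() < true_means[arm]
        counts[arm] += 1
        sums[arm] += reward

    print(counts)   # pulls concentrate on the best arm (index 2)
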
Score-Based Generative Modeling
Learn the score (the gradient of log density) at multiple noise levels and sample with Langevin dynamics or SDE solvers.
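The unadjusted Langevin update at one noise level, with learned score s_θ(x, σ) standing in for ∇_x log p_σ(x):

    x_{k+1} = x_k + \tfrac{\epsilon}{2}\, s_\theta(x_k, \sigma) + \sqrt{\epsilon}\, z_k,
      \qquad z_k \sim \mathcal{N}(0, I)
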
SHAP — Shapley Additive Explanations
SHAP attributes a model’s prediction to features using Shapley values from cooperative game theory, satisfying axioms like efficiency, symmetry, and additivity.
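The underlying Shapley value for feature i, given a feature set N and value function v:

    \phi_i = \sum_{S \subseteq N \setminus \{i\}}
      \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
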
Variational Inference and the ELBO
Variational inference turns Bayesian inference into optimization: choose a tractable family qϕ(z|x) and fit it to p(z|x) by maximizing the ELBO (equivalently, minimizing KL(q||p)).
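The decomposition that makes this work, written out:

    \log p(x) = \underbrace{\mathbb{E}_{q_\phi(z|x)}\bigl[\log p(x,z) - \log q_\phi(z|x)\bigr]}_{\text{ELBO}}
      + \mathrm{KL}\bigl(q_\phi(z|x)\,\|\,p(z|x)\bigr)

Since log p(x) is fixed, maximizing the ELBO over ϕ minimizes the KL term.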
Vector-Quantized VAEs (VQ-VAE)
VQ-VAE replaces continuous latents with a learned discrete codebook, improving fidelity and avoiding posterior collapse via commitment and codebook losses.

Management & UX

Aesthetic–Usability Effect
People perceive attractive products as easier to use; aesthetics can mask usability issues—or buy you goodwill to fix them.
Conway’s Law — Organizations Mirror Architecture
System designs reflect the communication structures of the organizations that build them; change the org if you want to change the architecture.
Mere Exposure Effect
People tend to prefer things they’ve seen before; repeated, non-annoying exposure increases liking up to saturation.
Psychological Safety and Team Learning
Teams learn and innovate when members feel safe to take interpersonal risks—asking questions, admitting mistakes, and offering ideas without fear of ridicule.
The Paradox of Choice
Beyond a point, more options increase anxiety, delay decisions, and reduce satisfaction; good design curates and guides rather than dumping choices.

Math and CS

Banach–Tarski Paradox
With the axiom of choice, a 3D ball can be partitioned into finitely many non-measurable pieces and reassembled into two balls identical to the original.

Gödel’s Incompleteness Theorems
Any consistent, effectively axiomatized system strong enough to express arithmetic leaves some true arithmetic statements unprovable (first theorem) and cannot prove its own consistency (second theorem).
Homotopy Type Theory and Univalence
HoTT interprets types as spaces and equalities as paths; the univalence axiom identifies equivalent types, enabling structural reasoning and machine-checked proofs.
P vs NP
Is every efficiently checkable problem also efficiently solvable? Most believe P ≠ NP, implying inherent limits on algorithmic efficiency.

Misc

Benders Decomposition for Large-Scale Optimization
Benders splits a hard problem into a master over complicating variables and subproblems over the rest; iteratively add feasibility/optimality cuts from subproblem duals.
Branch and Bound for Integer Programming
Branch-and-bound solves discrete optimization by recursively partitioning the search space and pruning subproblems using bounds from relaxations.
Convex Optimization — Duality and KKT Conditions
Convex problems have no bad local minima; duality and KKT conditions provide certificates of optimality and efficient algorithms.
Differential Privacy — Privacy by Adding Noise
Differential privacy (DP) limits how much any single individual can affect an output by adding calibrated noise; guarantees compose gracefully across analyses.
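A minimal Laplace-mechanism sketch for a counting query, which has sensitivity 1 (names are illustrative):

    import numpy as np

    def dp_count(records, epsilon, rng=np.random.default_rng(0)):
        """Release a count with epsilon-DP by adding Laplace(sensitivity/epsilon) noise."""
        sensitivity = 1.0                       # one person changes a count by at most 1
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
        return len(records) + noise

    print(dp_count(range(1000), epsilon=0.5))   # noisy count near 1000
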
Fully Homomorphic Encryption (FHE)
FHE lets you compute on encrypted data without decrypting; the result decrypts to the same output as computing on plaintext, enabling private cloud computation.
Information Theory and Entropy
Entropy quantifies uncertainty; channel capacity and coding theorems show how reliably we can communicate over noisy channels.
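Shannon entropy of a discrete distribution, as a small sketch:

    import numpy as np

    def entropy_bits(p):
        """H(p) = -sum_i p_i log2 p_i; zero-probability entries contribute 0."""
        p = np.asarray(p, dtype=float)
        nz = p[p > 0]
        return float(-(nz * np.log2(nz)).sum())

    print(entropy_bits([0.5, 0.5]))   # 1.0 bit
    print(entropy_bits([0.9, 0.1]))   # ~0.469 bits
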
Interior-Point Methods for Convex Programming
Interior-point methods follow a central path defined by barrier functions to solve large LP/QP/SDP problems efficiently with polynomial-time guarantees.
Optimal Transport for Machine Learning
Optimal transport measures the cost to move one distribution into another; Wasserstein distances provide geometry-aware losses for ML and statistics.
Quantum Decoherence and the Classical World
Decoherence explains the emergence of classical behavior: environmental entanglement suppresses interference between pointer states, making superpositions effectively classical mixtures.
Secure Multiparty Computation (MPC)
MPC lets parties compute a function on their private inputs without revealing them, learning only the output.
Simpson’s Paradox
Aggregated data can reverse trends present within groups; stratification and causal reasoning resolve the apparent contradiction.
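An illustrative set of counts showing the reversal (in the spirit of the classic kidney-stone example): A has the higher success rate in each stratum, yet B looks better in the pooled data because stratum sizes differ.

    groups = {
        "mild":   {"A": (81, 87),   "B": (234, 270)},   # (successes, trials)
        "severe": {"A": (192, 263), "B": (55, 80)},
    }
    for name, g in groups.items():
        for arm in ("A", "B"):
            s, n = g[arm]
            print(f"{name:6s} {arm}: {s/n:.2f}")
    for arm in ("A", "B"):
        s = sum(g[arm][0] for g in groups.values())
        n = sum(g[arm][1] for g in groups.values())
        print(f"pooled {arm}: {s/n:.2f}")    # A wins in each group, B wins pooled
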
The Halting Problem
There is no algorithm that decides for every program–input pair whether it halts; undecidability arises from self-reference and diagonalization.
Zero-Knowledge Proofs
Zero-knowledge proofs let a prover convince a verifier a statement is true without revealing anything beyond its truth.

Physics

Bell’s Theorem and Quantum Nonlocality
No local hidden-variable theory can reproduce all quantum predictions: certain correlations violate Bell inequalities, revealing nonlocality or the failure of local realism.
Fractional Quantum Hall Effect
At low temperatures and high magnetic fields, electrons form correlated states with quantized Hall conductance at fractional values and quasiparticles with anyonic statistics.
Holography (AdS/CFT Duality)
AdS/CFT equates a gravity theory in (d+1)-dimensional Anti–de Sitter space with a conformal field theory on its d-dimensional boundary; bulk geometry ↔ boundary quantum dynamics.
Many-Body Localization (MBL)
In disordered interacting systems, eigenstates can fail to thermalize; local memory persists indefinitely, violating the eigenstate thermalization hypothesis.
Nonabelian Gauge Theory (Yang–Mills)
Yang–Mills theory generalizes electromagnetism to noncommuting (nonabelian) gauge symmetries; its self-interacting gauge fields explain phenomena like asymptotic freedom and confinement in QCD.
Path Integral — Sum Over Histories
Quantum amplitudes arise by summing contributions from all possible paths, each weighted by e^{i S[path]/ħ}; classical physics emerges from stationary action via interference.
Quantum Error Correction and Surface Codes
Quantum error correction encodes logical qubits into entangled physical qubits so local noise can be detected and corrected without measuring the quantum information itself.
Renormalization Group
Renormalization group (RG) explains how a system’s description changes as you zoom in or out; fixed points of that zooming flow determine universal behavior like critical exponents near phase transitions.
Spontaneous Symmetry Breaking and the Higgs
When the lowest-energy state does not share the symmetry of the laws, massless Nambu–Goldstone modes appear; with gauge fields, the Higgs mechanism gives masses to gauge bosons while preserving gauge invariance.
The Black Hole Information Paradox
Hawking radiation seems thermal and featureless, suggesting information loss; modern holographic ideas indicate unitarity is preserved, with the Page curve recovering late-time purity.

Probability & Economics

Black–Scholes — Pricing and Hedging
Under no-arbitrage with geometric Brownian motion and continuous hedging, option prices satisfy the Black–Scholes PDE; the delta hedge replicates payoffs and eliminates risk.
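The closed-form European call price that solves the PDE, as a sketch (the usual parameter names):

    from math import log, sqrt, exp
    from statistics import NormalDist

    def bs_call(S, K, T, r, sigma):
        """Black-Scholes call: spot S, strike K, maturity T (years), rate r, volatility sigma."""
        d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        N = NormalDist().cdf
        return S * N(d1) - K * exp(-r * T) * N(d2)

    print(bs_call(S=100, K=100, T=1.0, r=0.05, sigma=0.2))   # roughly 10.45
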
Diffusion of Innovation — The Bass Model
The Bass model explains adoption curves via innovators (external influence) and imitators (word-of-mouth), producing an S-shaped cumulative adoption over time.
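The Bass adoption dynamics, with innovation coefficient p, imitation coefficient q, and cumulative adoption fraction F(t):

    \frac{dF}{dt} = \bigl(p + q\,F(t)\bigr)\bigl(1 - F(t)\bigr)
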
Kelly Criterion — Bet Sizing for Long-Run Growth
The Kelly criterion prescribes the fraction of capital to bet that maximizes long-run exponential growth, not short-run win rate.
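For a binary bet won with probability p at net odds b (win b per unit staked), the growth-optimal fraction is f* = p - (1-p)/b; a quick sketch:

    def kelly_fraction(p, b):
        """Growth-optimal fraction of capital for a bet won with prob p paying b-to-1."""
        return p - (1 - p) / b

    print(kelly_fraction(p=0.6, b=1.0))   # 0.2: stake 20% on an even-money 60% bet
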
Network Externalities and Compatibility
When a product’s value grows with the number of users, early advantages can snowball; compatibility and standards shape market structure and welfare.
Rough Volatility Models
Empirical evidence suggests volatility is “rough” (Hurst H≈0.1), better captured by fractional stochastic processes than classical diffusions, improving fits to implied volatility surfaces.
The Monty Hall Problem — Why Switching Wins
If the host always opens a goat door and offers a switch, switching doubles your chance of winning: from 1/3 to 2/3.
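A quick Monte Carlo check of the 1/3 vs 2/3 claim:

    import random

    def simulate(trials=100_000, rng=random.Random(0)):
        stay = switch = 0
        for _ in range(trials):
            car, pick = rng.randrange(3), rng.randrange(3)
            # Host opens a goat door that is neither the pick nor the car.
            opened = next(d for d in range(3) if d != pick and d != car)
            other = next(d for d in range(3) if d != pick and d != opened)
            stay += (pick == car)
            switch += (other == car)
        return stay / trials, switch / trials

    print(simulate())   # roughly (0.333, 0.667)
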
Two-Sided Markets and Platform Pricing
Platforms must balance cross-side externalities; optimal pricing often subsidizes one side to attract the other, affecting adoption and competition.

Security

Adversarial Examples and Robustness
Small, structured perturbations can reliably fool neural networks; robustness requires training objectives and architectures aligned with worst-case perturbations.
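One well-known construction, the fast gradient sign method (mentioned here only as an illustration of such perturbations):

    x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\bigl(\nabla_x\, \mathcal{L}(\theta, x, y)\bigr)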