Byzantine Fault Tolerance (PBFT)
Takeaway
PBFT tolerates arbitrary (including malicious) faults by requiring 3f+1 replicas and 2f+1 quorums with authenticated messages across pre-prepare, prepare, and commit phases.
The problem (before → after)
- Before: Crash-fault protocols fail under equivocation or lies.
- After: Quorum certificates and message authentication detect and outvote Byzantine behavior.
Mental model first
Imagine traitors in an army: orders are accepted only when confirmed by enough independent messengers carrying signed copies.
Just-in-time concepts
- Quorum sizes: 3f+1 nodes tolerate f faulty; 2f+1 signatures commit.
- View changes for leader replacement.
- Deterministic state machine replication.
First-pass solution
Leader proposes; replicas exchange prepares and commits; once 2f+1 commits, execute and reply; on faulty leader evidence or timeout, trigger view change.
Iterative refinement
- Optimizations: batching, speculative execution.
- Cryptographic choices: MACs vs signatures.
- BFT in WANs and permissionless settings (tendermint, hotstuff variants).
Principles, not prescriptions
- Authenticate messages; require overlapping quorums.
- Separate normal-case fast path from view change.
Common pitfalls
- Inadequate view-change safety checks.
- Non-determinism in application logic breaks replication.
Connections and contrasts
- See also: [/blog/raft-consensus], crash fault-tolerant protocols, blockchain consensus.
Quick checks
- Why 3f+1? — To ensure 2f+1 honest nodes remain for quorums.
- Why three phases? — To prevent equivocation and ensure agreement.
- When view change? — Suspected faulty or slow leader.
Further reading
- Castro & Liskov (source above); HotStuff, Tendermint