Byzantine Fault Tolerance (PBFT)

Takeaway

PBFT tolerates arbitrary (including malicious) faults by requiring 3f+1 replicas and 2f+1 quorums with authenticated messages across pre-prepare, prepare, and commit phases.

The problem (before → after)

Before: Crash-fault protocols fail under equivocation or lies.
After: Quorum certificates and message authentication detect and outvote Byzantine behavior.

Mental model first

Imagine traitors in an army: orders are accepted only when confirmed by enough independent messengers carrying signed copies.

Just-in-time concepts

Quorum sizes: 3f+1 nodes tolerate f faulty; 2f+1 signatures commit.
View changes for leader replacement.
Deterministic state machine replication.

First-pass solution

Leader proposes; replicas exchange prepares and commits; once 2f+1 commits, execute and reply; on faulty leader evidence or timeout, trigger view change.

Optimizations: batching, speculative execution.
Cryptographic choices: MACs vs signatures.
BFT in WANs and permissionless settings (tendermint, hotstuff variants).

Principles, not prescriptions

Authenticate messages; require overlapping quorums.
Separate normal-case fast path from view change.

Common pitfalls

Inadequate view-change safety checks.
Non-determinism in application logic breaks replication.

Connections and contrasts

See also: [/blog/raft-consensus], crash fault-tolerant protocols, blockchain consensus.

Quick checks

Why 3f+1? — To ensure 2f+1 honest nodes remain for quorums.
Why three phases? — To prevent equivocation and ensure agreement.
When view change? — Suspected faulty or slow leader.