Paxos and Raft
Paxos and Raft are the two most influential practical consensus protocols, providing crash-fault-tolerant agreement on a replicated log that underpins real-world coordination systems.
Definition
Paxos and Raft are quorum-based protocols that allow a set of replicas to agree on an ordered sequence of commands (a replicated log) despite crash failures. Committed entries are never lost or reordered, and the protocols keep making progress as long as a majority of replicas remain available and can communicate with one another.
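The majority-quorum arithmetic behind this definition can be checked with a small sketch. The function names below are illustrative, not taken from either protocol's specification, and the brute-force check is only practical for small cluster sizes.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums of n replicas overlap."""
    return n // 2 + 1


def tolerated_crashes(n: int) -> int:
    """Number of replicas that may crash while a majority can still form."""
    return n - majority(n)


def all_majorities_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a replica."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a, b in combinations(quorums, 2))
```

Under these definitions a 5-replica cluster needs 3 votes and tolerates 2 crashes, while going from 3 to 4 replicas does not raise the number of tolerated crashes, which is one reason odd cluster sizes are commonly chosen.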
Scope
This topic covers the Paxos family (single-decree Paxos, Multi-Paxos, and their engineering refinements) and the Raft protocol, which provides the same guarantees but is organized around an explicit leader, log replication, and membership change to make it easier to understand. It covers the roles of proposers and acceptors in Paxos and of leaders and followers in Raft, quorum intersection, leader election and terms, log matching, and the practical concerns of snapshots and reconfiguration.
Core questions
- How do quorum intersection and proposal numbering keep Paxos safe across rounds and leader changes?
- How does Raft decompose consensus into leader election, log replication, and safety?
- What engineering challenges arise when turning these protocols into production systems?
Key theories
- Single-decree and Multi-Paxos
- Paxos reaches agreement on one value through prepare and accept phases governed by monotonic proposal numbers and majority quorums; Multi-Paxos amortizes the prepare phase across a stream of decisions led by a stable leader to build a replicated log.
- Raft's decomposition
- Raft attains the same safety as Paxos by electing at most one leader per term, having the leader append entries that followers replicate, and enforcing a log-matching property; it deliberately trades minimality for understandability and ease of implementation.
- From specification to running system
- Deploying Paxos in practice requires handling disk failures, leader leases, log compaction, and reconfiguration, details often glossed over in the original algorithm but essential to correctness and performance.
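The prepare and accept phases of single-decree Paxos can be illustrated with a minimal in-memory sketch. It assumes all acceptors are reachable through synchronous method calls and omits networking, persistence, and retries; the class and function names (`Acceptor`, `propose`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the accepted proposal, -1 if none
    accepted_value: Optional[str] = None  # value of the accepted proposal, if any

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals below n and report any accepted value."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None  # reject: a higher or equal number was already promised

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None if the round fails."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None
    # Adopt the value of the highest-numbered proposal already accepted, if any.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None
```

With three fresh acceptors, `propose(acceptors, 1, "x")` chooses `"x"`; a later `propose(acceptors, 2, "y")` must also return `"x"`, because the second proposer learns of the earlier accepted value during its prepare phase. Reusing the stale number 1 afterwards fails outright. This is the mechanism by which proposal numbering and quorum intersection preserve a chosen value across rounds.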
Practical relevance
Paxos and Raft run inside widely used coordination services, distributed databases, and configuration stores; understanding them is essential for building or operating any system that must keep replicas strongly consistent through failures.
History
Lamport described Paxos in 'The Part-Time Parliament' (1998) and restated it in plainer terms in 'Paxos Made Simple' (2001); Chandra and colleagues documented the engineering challenges of building a production Paxos system in 'Paxos Made Live' (2007); and Ongaro and Ousterhout introduced Raft in 2014 with the stated goal of making equivalent guarantees easier to teach and implement.
Debates
- Understandability versus minimality in consensus protocols
- Raft was explicitly designed to be easier to understand than Paxos, prompting debate over whether its additional structure (a strong leader) sacrifices flexibility; proponents argue understandability reduces implementation bugs, while others note Paxos variants can be more general.
Key figures
- Leslie Lamport
- Diego Ongaro
- John Ousterhout
- Tushar Chandra
Related topics
Seminal works
- lamport1998
- ongaro2014
- chandra2007
Frequently asked questions
- Are Paxos and Raft fundamentally different algorithms?
- No. They solve the same problem with the same majority-quorum core and provide equivalent safety guarantees. Raft mainly reorganizes the approach around a strong leader and an explicit log to make the protocol easier to understand and implement.
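To make Raft's "explicit log" concrete, the follower-side consistency check that enforces log matching can be sketched as follows. This is a simplified illustration, not a complete Raft implementation: it leaves out commit indices, persistence, elections, and RPC plumbing, and the names `Entry`, `Follower`, and `append_entries` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Follower:
    current_term: int = 0
    log: List[Entry] = field(default_factory=list)  # log[0] holds Raft index 1

    def append_entries(self, term: int, prev_index: int, prev_term: int,
                       entries: List[Entry]) -> bool:
        """Handle an append request; indices are 1-based, prev_index 0 means 'log start'."""
        if term < self.current_term:
            return False  # request from a stale leader
        self.current_term = term
        # Consistency check: the follower must hold the entry preceding the new ones.
        if prev_index > len(self.log):
            return False
        if prev_index > 0 and self.log[prev_index - 1].term != prev_term:
            return False
        # Append new entries, truncating only where an existing entry conflicts.
        for offset, entry in enumerate(entries):
            pos = prev_index + offset  # 0-based position of this entry
            if pos < len(self.log):
                if self.log[pos].term != entry.term:
                    del self.log[pos:]
                    self.log.append(entry)
            else:
                self.log.append(entry)
        return True
```

A follower that rejects a request because of a gap or a term mismatch signals the leader to retry with an earlier `prev_index`, so the two logs converge from the last point at which they agree.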