Data Replication and Consistency
Data replication keeps multiple copies of data for availability and performance, and consistency protocols govern how reads and writes across those copies are reconciled.
Definition
Data replication maintains copies of a data item on several nodes; a consistency model specifies the guarantees about the values that reads may return given the history of writes, ranging from strong (every read sees the latest write) to eventual (replicas converge if updates cease).
Scope
This topic covers replication strategies (primary-backup, multi-master, quorum), quorum-based read/write protocols and their intersection requirements, anti-entropy and gossip for eventual convergence, conflict detection with version vectors, conflict-free replicated data types (CRDTs), and the spectrum of consistency from linearizable to eventual. It is the data-level counterpart of state-machine replication.
Core questions
- How do quorum sizes for reads and writes guarantee that reads observe the latest write?
- How do replicas converge under eventual consistency, and how are conflicts resolved?
- What consistency level should an application choose given its latency and availability needs?
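The conflict question above usually starts with detection: deciding whether two replica versions are ordered or concurrent. The following is a minimal sketch of version-vector comparison, assuming a vector is a plain mapping from replica id to an update counter; the names `compare` and `merge` are illustrative and not taken from any particular store.

```python
from typing import Dict

# Assumption: a version vector maps a replica id to the number of
# updates from that replica that this version has incorporated.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify a relative to b as 'equal', 'before', 'after', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has, and more
    if b_le_a:
        return "after"       # a has seen everything b has, and more
    return "concurrent"      # neither dominates: a genuine conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Entry-wise maximum: the vector of a version that reflects both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

Only the "concurrent" case needs application-level or CRDT-style resolution; in the other cases the dominated version can simply be discarded.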
Key theories
- Quorum consensus for replicated data
- Each replica is assigned votes, and read and write quorums are required to hold vote counts that sum to more than the total number of votes. Every read quorum therefore intersects the most recent write quorum, and the version numbers stored with each copy let the reader pick out the up-to-date value.
- Eventual consistency and anti-entropy
- Highly available stores accept writes at any replica and reconcile asynchronously via gossip and version vectors, guaranteeing only that replicas converge when updates stop, as exemplified by the Dynamo design.
- Conflict-free replicated data types
- CRDTs are data types whose operations are designed to commute or whose states form a join-semilattice, so concurrent updates merge deterministically without coordination, providing strong eventual consistency.
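As an illustration of the join-semilattice idea, here is a sketch of a grow-only counter (G-Counter), one of the simplest state-based CRDTs. The class and method names are hypothetical; the merge is the entry-wise maximum, which is commutative, associative, and idempotent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """State-based grow-only counter; one slot per replica."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Join of the semilattice: entry-wise maximum of the two states.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update concurrently, then exchange state in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging the same state again changes nothing
```

After the exchange both replicas report the same value without any coordination, which is the strong eventual consistency property described above.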
Practical relevance
These techniques define the guarantees of real storage systems: quorum protocols underlie strongly consistent key-value stores, while eventual consistency and CRDTs power highly available stores, shopping carts, and collaborative editors where availability outranks immediate agreement.
History
Gifford's 1979 weighted-voting scheme established quorum replication; Amazon's 2007 Dynamo paper popularized highly available eventual consistency; and the 2011 formalization of CRDTs gave a principled basis for coordination-free convergence, shaping modern replicated-data design.
Debates
- How much consistency should replicated data provide by default?
- Strong consistency eases application development but limits availability and adds latency, while eventual consistency maximizes availability at the cost of exposing temporary divergence. Tunable quorums let applications choose the trade-off per operation, and CRDTs make the divergence that weaker settings allow safe to merge.
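The trade-off behind tunable quorums can be made concrete with simple arithmetic: with N replicas, a read that needs R replies still succeeds with up to N - R replicas unreachable, and a write that needs W acknowledgements tolerates N - W. The sketch below tabulates this for a few settings; `QuorumProfile` and `quorum_profile` are illustrative names, and one vote per replica is assumed.

```python
from typing import NamedTuple


class QuorumProfile(NamedTuple):
    intersecting: bool          # True when R + W > N, so reads overlap the latest write
    read_fault_tolerance: int   # replicas that may be down while reads still succeed
    write_fault_tolerance: int  # replicas that may be down while writes still succeed


def quorum_profile(n: int, r: int, w: int) -> QuorumProfile:
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each lie between 1 and N")
    return QuorumProfile(r + w > n, n - r, n - w)


# A few settings for N = 3 (assumed example values).
for r, w in [(1, 3), (2, 2), (3, 1), (1, 1)]:
    print(f"R={r} W={w} ->", quorum_profile(3, r, w))
```

Smaller R or W means fewer replies to wait for and more failures tolerated for that operation, but once R + W drops to N or below the intersection guarantee is lost.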
Key figures
- David Gifford
- Werner Vogels
- Marc Shapiro
- Andrew S. Tanenbaum
Related topics
Seminal works
- gifford1979
- decandia2007
- shapiro2011
Frequently asked questions
- How do read and write quorums guarantee fresh reads?
- If a write must reach W replicas and a read must consult R replicas, and R plus W exceeds the total number of replicas N, then any read quorum overlaps the most recent write quorum in at least one replica. The read therefore sees at least one copy of the latest value and can identify it by its version number.
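The overlap argument can be exercised with a small simulation. This is a sketch under stated assumptions, not a production protocol: one vote per replica, operations issued one at a time (no concurrent writers), quorums chosen at random, and a hypothetical `QuorumRegister` class holding a single value.

```python
import random
from typing import List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version number, value)


class QuorumRegister:
    """Single replicated value with random R-sized read and W-sized write quorums."""

    def __init__(self, n: int, r: int, w: int, seed: int = 0) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("R and W must each lie between 1 and N")
        if r + w <= n:
            raise ValueError("R + W must exceed N so read and write quorums intersect")
        self.n, self.r, self.w = n, r, w
        self.replicas: List[Versioned] = [(0, None)] * n
        self.rng = random.Random(seed)

    def _quorum(self, size: int) -> List[int]:
        # Any `size` distinct replicas may answer; pick them at random.
        return self.rng.sample(range(self.n), size)

    def read(self) -> Versioned:
        # Consult R replicas and keep the copy with the highest version.
        copies = (self.replicas[i] for i in self._quorum(self.r))
        return max(copies, key=lambda copy: copy[0])

    def write(self, value: str) -> int:
        # Learn the current version from a read quorum, then install
        # version + 1 on W replicas; the rest stay stale.
        version = self.read()[0] + 1
        for i in self._quorum(self.w):
            self.replicas[i] = (version, value)
        return version


reg = QuorumRegister(n=5, r=2, w=4)
reg.write("x=1")
reg.write("x=2")
latest = reg.read()
```

Even though one replica is left stale by each write, every read returns the most recent value because R + W = 6 exceeds N = 5. Handling concurrent writers would additionally require write quorums to intersect each other, which this sketch does not model.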