ScholarGate
Msaidizi

State Machine Replication

State machine replication makes a service fault-tolerant by running deterministic replicas that execute the same commands in the same order, so survivors mask the failure of others.

Tafuta mada kwa PaperMindHivi karibuniFind papers & topics
Tools & resources
Pakua slaidi
Learn & explore
VideoHivi karibuni

Definition

State machine replication is a technique in which a service is modeled as a deterministic state machine and replicated across several servers that all apply the same ordered sequence of client commands, ensuring that non-faulty replicas remain in identical states and can substitute for failed ones.

Scope

This topic covers the state-machine model of a service, the requirement that replicas be deterministic, the use of total-order (atomic) broadcast or consensus to agree on the command sequence, and the contrast with primary-backup (passive) replication. It also covers handling non-determinism, replica reconfiguration, and the extension to Byzantine replicas.

Core questions

  • Why must replicas be deterministic and process commands in an identical order?
  • How is the agreed command order produced from consensus or atomic broadcast?
  • How does active (state-machine) replication differ from passive primary-backup replication?

Key theories

The state-machine approach
Any deterministic service can be made fault-tolerant by replicating it and feeding every replica the same ordered command stream; provided fewer than the tolerated number of replicas fail, the service remains correct and available.
Ordering via consensus
The required total order on commands is exactly an atomic broadcast, which is equivalent to consensus, so replicated state machines are typically built on a consensus protocol such as Paxos or Raft.
Byzantine state-machine replication
Extending the approach to tolerate arbitrary faults requires agreement among more than two-thirds of replicas and result voting by clients, as realized in practical Byzantine fault-tolerant replication.

Clinical relevance

State machine replication is the standard way to build a highly available, strongly consistent service; replicated logs, coordination services, and many distributed databases are state machines driven by a consensus-ordered command stream.

History

Lamport's logical-clock paper sketched the replicated-state-machine idea in 1978; Schneider's 1990 tutorial made it the canonical framework for fault-tolerant services; and Castro and Liskov's 1999 PBFT extended it to the Byzantine setting, a line that later influenced blockchain designs.

Debates

Active versus passive replication
Active (state-machine) replication executes commands at all replicas for fast failover but wastes work and requires determinism; passive primary-backup replication executes only at a primary and ships state, saving work but incurring failover delay and a single execution point.

Key figures

  • Fred Schneider
  • Leslie Lamport
  • Miguel Castro
  • Barbara Liskov

Related topics

Seminal works

  • schneider1990
  • lamport1998
  • castro1999

Frequently asked questions

Why must the replicated service be deterministic?
If replicas could make different choices on the same input—because of randomness, timing, or thread scheduling—their states would diverge despite seeing identical commands. Determinism guarantees that the same ordered command stream produces the same state everywhere.

Methods for this concept

Related concepts