Global Snapshots and State

A global snapshot captures a consistent view of a distributed computation's state—each process's local state plus the messages in transit—without freezing the system.

Definition

A consistent global state is a collection of local process states and channel contents corresponding to a consistent cut—one in which, for every recorded message receipt, the corresponding send is also recorded—so that the state could have arisen during the computation even though no global instant was observed.

Scope

This topic covers the notion of a consistent global state and the cut that defines it, the Chandy-Lamport marker-based snapshot algorithm and its assumptions (FIFO channels, reliable delivery), and the application of snapshots to stable-property detection such as termination and deadlock detection and to distributed checkpointing and recovery.

Core questions

  • What makes a recorded global state consistent rather than impossible?
  • How can such a state be recorded while the computation continues to run?
  • How are stable properties like termination and deadlock detected from snapshots?

Key theories

Consistent cuts
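A global state corresponds to a cut across the processes' event sequences, that is, a prefix of each process's events. The cut is consistent exactly when it is downward closed under the happened-before relation: if an event is in the cut, so is every event that happened before it. In particular, no message is recorded as received unless its send is also recorded.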
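The consistency condition can be checked mechanically. The following is a minimal sketch, assuming a cut is given as a per-process count of included events and each message is described by the positions of its send and receive events; the names `Message`, `is_consistent_cut`, and `frontier` are illustrative and not taken from any library. Because the cut is a prefix on every process, only the message edges need checking.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_index: int      # 1-based position of the send event on the sender
    receiver: str
    receive_index: int   # 1-based position of the receive event on the receiver


def is_consistent_cut(frontier: Dict[str, int], messages: List[Message]) -> bool:
    """Return True if no message is received inside the cut but sent outside it.

    frontier[p] is the number of events of process p that the cut includes.
    """
    for m in messages:
        received = m.receive_index <= frontier[m.receiver]
        sent = m.send_index <= frontier[m.sender]
        if received and not sent:
            return False  # the receipt is recorded but its send is not
    return True


# One message: p's 1st event sends it, q's 2nd event receives it.
messages = [Message("p", 1, "q", 2)]

both_recorded = is_consistent_cut({"p": 1, "q": 2}, messages)
in_transit = is_consistent_cut({"p": 1, "q": 1}, messages)
orphan_receive = is_consistent_cut({"p": 0, "q": 2}, messages)
```

In this sketch the second cut is still consistent: the message is sent but not yet received, so it belongs to the recorded channel contents. Only the third cut, which records the receipt without the send, is rejected.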
Chandy-Lamport snapshot algorithm
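An initiator records its own state and sends a marker on each outgoing channel before sending any further messages. A process that receives a marker for the first time records its state, records the channel the marker arrived on as empty, and sends markers on all its outgoing channels; it then records the messages arriving on each other incoming channel until that channel's marker arrives, and those recorded messages are the channel's contents. The algorithm relies on FIFO channels, so that a marker cleanly separates the messages sent before the sender recorded its state from those sent after.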
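The marker rules can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not a reference implementation: processes hold an integer balance, messages are money transfers, every ordered pair of processes has a reliable FIFO channel, and the classes `Process` and `Network` and the `drain` delivery order are hypothetical names chosen for the example.

```python
from collections import deque

MARKER = object()  # sentinel separating pre-snapshot from post-snapshot messages


class Process:
    def __init__(self, balance):
        self.balance = balance        # live application state
        self.recorded_state = None    # local state captured by the snapshot
        self.channel_log = {}         # sender name -> messages recorded in transit
        self.recording = set()        # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {name: Process(b) for name, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.channels = {(a, b): deque()
                         for a in balances for b in balances if a != b}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def record(self, name):
        """Record local state, start recording all incoming channels, send markers."""
        p = self.procs[name]
        p.recorded_state = p.balance
        senders = [src for (src, dst) in self.channels if dst == name]
        p.channel_log = {src: [] for src in senders}
        p.recording = set(senders)
        for (src, _dst), queue in self.channels.items():
            if src == name:
                queue.append(MARKER)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg is MARKER:
            if p.recorded_state is None:   # first marker: take the local snapshot
                self.record(dst)
            p.recording.discard(src)       # this channel's contents are now final
        else:
            p.balance += msg
            if src in p.recording:         # sent before the sender's marker
                p.channel_log[src].append(msg)

    def drain(self):
        """Deliver all pending messages, always from the first non-empty channel."""
        while True:
            pending = [key for key, queue in self.channels.items() if queue]
            if not pending:
                return
            self.deliver(*pending[0])


net = Network({"p": 100, "q": 50, "r": 0})
net.send("p", "q", 10)
net.record("p")           # p initiates the snapshot
net.send("q", "p", 20)    # q sends before it has seen any marker
net.drain()

recorded_states = {n: p.recorded_state for n, p in net.procs.items()}
recorded_channels = {(src, n): msgs
                     for n, p in net.procs.items()
                     for src, msgs in p.channel_log.items() if msgs}
recorded_total = (sum(recorded_states.values())
                  + sum(sum(msgs) for msgs in recorded_channels.values()))
```

In this run the transfer of 20 from q to p is captured as channel contents rather than in either process's state, so the recorded total equals the 150 units the system started with even though the processes kept exchanging messages while the snapshot was taken.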
Stable-property detection
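Because the recorded state is reachable from the state in which the snapshot began, and the state in which the snapshot ends is reachable from the recorded state, any stable property (one that stays true once it holds, such as termination or deadlock) found in a snapshot still holds when the snapshot completes. Conversely, a stable property that already held when the snapshot began will appear in the snapshot. This makes snapshots a general detection tool.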
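As an illustration, termination can be evaluated directly on a recorded snapshot. The sketch below assumes each recorded local state is simply the string "active" or "passive" and that channel contents are lists of recorded messages; the function name `is_terminated` and this representation are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def is_terminated(local_states: Dict[str, str],
                  channel_contents: Dict[Tuple[str, str], List[object]]) -> bool:
    """Termination holds in a snapshot when every process is passive
    and no recorded channel holds a message that could reactivate one."""
    all_passive = all(state == "passive" for state in local_states.values())
    channels_empty = all(len(msgs) == 0 for msgs in channel_contents.values())
    return all_passive and channels_empty


# Every process is passive, but one message is still in transit: not terminated.
pending = is_terminated(
    {"p": "passive", "q": "passive"},
    {("p", "q"): ["work"], ("q", "p"): []},
)

# Every process is passive and all channels are empty: terminated.
done = is_terminated(
    {"p": "passive", "q": "passive"},
    {("p", "q"): [], ("q", "p"): []},
)
```

A detector built this way would typically take snapshots repeatedly until the predicate holds; since termination is stable, a positive answer never needs to be retracted.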

Practical relevance

Snapshot algorithms power distributed checkpoint/restart for fault recovery, including the asynchronous snapshotting used by modern stream-processing engines to provide exactly-once guarantees, as well as deadlock and termination detection in long-running computations.

History

Chandy and Lamport's 1985 algorithm gave the first practical method to record a consistent global state without stopping the system; Mattern and others generalized the underlying cut theory, and the technique later became foundational to fault-tolerant stream processing.

Key figures

  • K. Mani Chandy
  • Leslie Lamport
  • Friedemann Mattern

Seminal works

  • chandy1985
  • mattern1989
  • lynch1996

Frequently asked questions

Does taking a snapshot require pausing the system?
No. The Chandy-Lamport algorithm records a consistent global state while computation continues, by propagating markers along channels; the recorded state is one the system could have been in, even though it was never globally halted.
