Transaction Management and Concurrency
Transaction management and concurrency control let a database execute many concurrent transactions correctly and survive failures, guaranteeing the atomicity, consistency, isolation, and durability (ACID) properties.
Definition
A transaction is a sequence of database operations executed as a single logical unit that is atomic, consistency-preserving, isolated from concurrent transactions, and durable once committed; transaction management is the set of mechanisms that enforce these properties under concurrency and failure.
Scope
This area covers the transaction as the unit of work and the mechanisms that make concurrent, failure-prone execution correct: the ACID properties; serializability as the correctness criterion for concurrency; locking, timestamp, and multiversion protocols that enforce it; the weaker isolation levels that trade consistency for performance; and the logging and recovery techniques that ensure atomicity and durability across crashes. It excludes distributed commit across multiple sites, which is treated in the distributed and parallel databases area.
Sub-topics
Core questions
- What do the ACID properties guarantee and why are they needed?
- What makes a concurrent schedule correct, i.e. serializable?
- How do locking, timestamp, and multiversion protocols enforce serializability?
- How do logging and recovery restore a consistent state after a crash?
- How do weaker isolation levels trade consistency for concurrency?
Key concepts
- transaction and commit/abort
- ACID properties
- serializability and the precedence graph
- two-phase locking
- timestamp ordering
- multiversion concurrency control
- deadlock detection and prevention
- write-ahead logging
- isolation levels
Key theories
- ACID properties
- Atomicity (all-or-nothing execution), consistency (preservation of integrity constraints), isolation (concurrent transactions appear to run alone), and durability (committed effects survive failures) together define correct transactional behavior.
- Serializability
- A concurrent schedule is correct if it is equivalent to some serial execution; conflict serializability, tested via the precedence (serialization) graph, is the practical criterion concurrency-control protocols enforce.
- Concurrency control and recovery
- Locking, timestamp ordering, and multiversion methods prevent non-serializable interleavings, while write-ahead logging and recovery algorithms guarantee atomicity and durability, the two halves of correct transaction processing.
Clinical relevance
Transaction management is what makes databases trustworthy for money and records: it ensures that a bank transfer never debits one account without crediting another, that an airline never double-books a seat under concurrent requests, and that committed data survives crashes, underpinning all reliable transactional systems.
History
The transaction concept and the ACID properties were formalized at IBM's System R in the 1970s; Eswaran and colleagues (1976) established consistency and locking notions, Jim Gray defined transactions and recovery, and Bernstein, Hadzilacos, and Goodman (1987) systematized serializability theory. Gray's contributions to transaction processing earned him the Turing Award.
Key figures
- Jim Gray
- Philip Bernstein
- Andreas Reuter
Related topics
Seminal works
- gray1992
- bernstein1987
- eswaran1976
Frequently asked questions
- What does it mean for a schedule of transactions to be serializable?
- A concurrent schedule is serializable if its overall effect is identical to running the same transactions one after another in some order. Serializability is the gold-standard correctness criterion: even though transactions interleave for performance, the result is as if they had executed serially, so no transaction sees an inconsistent intermediate state.
- Why are weaker isolation levels allowed if they can produce anomalies?
- Enforcing full serializability has a performance cost in lock contention and aborts. Many applications can tolerate certain anomalies in exchange for higher concurrency, so the SQL standard defines weaker isolation levels (read committed, repeatable read, and so on) that let developers deliberately trade some isolation for throughput.