ScholarGate
ผู้ช่วย

Cost-Based Query Optimization

Cost-based query optimization searches the space of equivalent execution plans for a query and selects the one with the lowest estimated cost, using statistics about the data and a model of execution cost.

ค้นหาหัวข้อด้วย PaperMindเร็ว ๆ นี้Find papers & topics
Tools & resources
ดาวน์โหลดสไลด์
Learn & explore
วิดีโอเร็ว ๆ นี้

Definition

Cost-based query optimization is the process of estimating, from data statistics and a cost model, the execution cost of alternative equivalent plans for a query and selecting the plan of least estimated cost, typically with join ordering chosen by dynamic programming.

Scope

This topic covers how an optimizer chooses a plan: the plan search space generated by relational-algebra equivalences and alternative physical operators, the cost model that scores plans (chiefly in I/O and CPU), cardinality and selectivity estimation from statistics and histograms, and the dynamic-programming approach to join-order enumeration introduced by System R. It excludes the implementation of the operators themselves and rule-driven logical rewriting beyond what feeds plan generation.

Core questions

  • How is the space of equivalent execution plans generated and pruned?
  • How does the optimizer estimate the cardinality and selectivity of operations?
  • How does dynamic programming enumerate join orders efficiently?
  • What does the cost model account for, and how are I/O and CPU costs combined?
  • Why do cardinality-estimation errors cause poor plan choices?

Key concepts

  • plan search space
  • cost model (I/O and CPU)
  • selectivity and cardinality estimation
  • histograms and statistics
  • join-order enumeration
  • dynamic programming
  • interesting orders
  • transformation-based optimization

Key theories

System R dynamic-programming optimization
Selinger's optimizer enumerates join orders bottom-up with dynamic programming, keeping the cheapest plan for each subset of relations (and each interesting sort order), which makes searching the large join-order space tractable.
Cardinality and selectivity estimation
The optimizer estimates the number of tuples each operation produces using table statistics, histograms, and selectivity formulas; these estimates drive the cost model and are the main source of optimization error.
Cost models and search strategies
Plans are scored by a cost model combining I/O, CPU, and memory effects; beyond System R's dynamic programming, transformation-based optimizers explore plans via rewrite rules to extend optimization to richer operators and distributed settings.

Clinical relevance

The optimizer is the component that lets users write declarative SQL without specifying how to execute it; the quality of its cost estimates and search directly governs whether queries on large databases are fast, so cost-based optimization is among the most studied and commercially important parts of database systems.

History

The 1979 System R optimizer by Selinger and colleagues introduced cost-based optimization with dynamic-programming join ordering and selectivity-based cost estimation, defining the field. Later transformation-based optimizers such as Volcano and Cascades generalized the search, and ongoing work on cardinality estimation, including learned models, addresses the framework's central weakness.

Debates

Cardinality estimation versus adaptive execution
Because cost-based optimization is only as good as its size estimates, which are often wrong for correlated data, researchers debate whether to invest in better statistics and machine-learned estimators or to rely more on adaptive, runtime re-optimization that corrects bad plans during execution.

Key figures

  • Patricia Selinger
  • Goetz Graefe
  • Surajit Chaudhuri

Related topics

Seminal works

  • selinger1979
  • graefe1993

Frequently asked questions

Why does an optimizer sometimes choose a bad plan?
Cost-based optimizers rely on estimates of how many rows each operation will produce. When data is skewed or columns are correlated, these estimates can be far off, and the error compounds across joins. A wrong estimate can lead the optimizer to pick a poor join order or access path, even though the plan looked cheap on paper.
Why use dynamic programming for join ordering?
The number of possible join orders grows combinatorially with the number of tables, so exhaustive search is infeasible. Dynamic programming builds optimal plans for larger sets of relations from optimal plans for smaller subsets, dramatically reducing the work while still finding a good (often optimal) join order for typical query sizes.

Methods for this concept

Related concepts