ScholarGate
助手

Join Algorithms

Join algorithms are the physical methods — nested-loop, sort-merge, and hash join — that combine tuples from two or more relations on a join condition, and they are usually the most performance-critical operators in a query plan.

用 PaperMind 寻找选题即将推出Find papers & topics
Tools & resources
下载幻灯片
Learn & explore
视频即将推出

Definition

A join algorithm is a physical operator that computes the join of two relations on a predicate by systematically pairing tuples that satisfy the condition, using nested iteration, sorted merging, or hashing to find matching tuples efficiently.

Scope

This topic covers the principal algorithms for evaluating joins: simple, block, and index nested-loop joins; sort-merge join and its synergy with already-sorted inputs; and hash join, including the grace and hybrid variants that handle inputs larger than memory. It analyzes their I/O cost and memory requirements and the conditions under which each is preferred. It excludes the optimizer's enumeration of join orders, which is treated under cost-based query optimization.

Core questions

  • How do nested-loop, sort-merge, and hash join differ in approach and cost?
  • When does an index nested-loop join outperform the alternatives?
  • How do grace and hybrid hash joins handle inputs larger than memory?
  • How is the I/O cost of each join method analyzed in terms of pages and passes?
  • What join condition (equality versus inequality) does each algorithm require?

Key concepts

  • nested-loop join
  • block nested-loop join
  • index nested-loop join
  • sort-merge join
  • hash join
  • grace and hybrid hash join
  • equijoin versus theta join
  • I/O cost analysis

Key theories

Nested-loop joins
For each tuple of one relation the algorithm scans the other for matches; block nested-loop reduces I/O by buffering pages, and index nested-loop replaces the inner scan with an index lookup when one is available, making it efficient for selective joins.
Sort-merge join
Both inputs are sorted on the join attribute and then merged in a single coordinated pass; it is especially attractive when inputs are already sorted or when the output must be sorted, and it handles equality joins efficiently.
Hash join
An equality join is computed by building an in-memory hash table on the smaller relation and probing it with the larger; the grace and hybrid variants partition both inputs to disk when they exceed memory, giving strong performance for large equijoins.

Clinical relevance

Joins dominate the cost of analytical and reporting queries that combine multiple tables, so the choice of join algorithm — often the single most important decision in a plan — determines whether such queries are interactive or take hours, making these algorithms central to database performance.

History

Sort-merge and nested-loop joins date to the earliest relational systems. Hash join and its grace and hybrid variants were developed in the 1980s, notably in parallel-database research, and were shown to outperform sort-merge for many large equijoins. Graefe's 1993 survey consolidated the analysis of these algorithms that database texts still follow.

Key figures

  • Goetz Graefe
  • David DeWitt

Related topics

Seminal works

  • graefe1993
  • garciamolina2008

Frequently asked questions

Which join algorithm is fastest?
It depends on the inputs. Hash join is usually best for large equijoins when neither input is sorted; sort-merge wins when inputs are already sorted or the output must be ordered; and index nested-loop is best when one input is small and the other has a selective index on the join column. The optimizer picks based on estimated cost.
Why can hash join not handle inequality conditions?
Hash join groups tuples by the hash of the join key so that only tuples with equal keys land in the same bucket. That works for equality (equijoin) conditions but not for inequalities like less-than, which require comparing tuples across buckets — those are handled by nested-loop or sort-merge style methods instead.

Methods for this concept

Related concepts