Why are B+-trees used instead of binary search trees in databases?

Databases store data on disk, where the cost is dominated by the number of page reads. B+-trees have a high fanout, so each node fills a disk page and the tree is very shallow, requiring only a few I/Os to reach any record. A binary tree would be far deeper and incur many more disk accesses.

When should I use a hash index rather than a B+-tree?

Use a hash index when you only need equality lookups (e.g. find the row with a given id) and want expected constant-time access. Use a B+-tree when you also need range queries, ordered scans, or prefix matching, since hash indexes do not preserve key order and cannot answer range conditions efficiently.

Indexing and Access Methods

Indexes and access methods are the auxiliary data structures — chiefly B+-trees and hash indexes — that let a database locate matching tuples without scanning an entire table, providing the fast access paths the optimizer relies on.

Znajdź temat z PaperMindWkrótceFind papers & topics

Tools & resources

Pobierz slajdy

Learn & explore

WideoWkrótce

Definition

An index is an auxiliary data structure on one or more attributes of a relation that maps key values to the locations of matching tuples, enabling retrieval in time sublinear in the table size; an access method is the mechanism a query uses to read data, such as a full scan or an index scan.

Scope

This topic covers the structures and concepts behind data access paths: the distinction between clustered and unclustered, primary and secondary indexes; the B+-tree as the workhorse ordered index supporting equality and range search; hash-based indexes for equality search; and the role of indexes in selection, join, and sort avoidance. It treats how index choice affects query cost and update overhead. It excludes the cost-based optimizer's overall plan selection, which is a separate topic.

Core questions

How does a B+-tree support both equality and range queries efficiently?
What is the difference between clustered and unclustered, primary and secondary indexes?
When is a hash index preferable to a tree index?
How do indexes speed up selections, joins, and ordered retrieval?
What is the cost of maintaining indexes under inserts, updates, and deletes?

Key concepts

B+-tree index
hash index
clustered versus unclustered index
primary and secondary indexes
dense and sparse indexes
range and equality search
extendible and linear hashing
index maintenance cost

Key theories

B+-tree indexes: The B+-tree is a balanced, high-fanout search tree that stores keys in sorted order with all data references in the leaves; it supports equality and range queries and ordered scans in logarithmic I/O and stays balanced under insertions and deletions.
Clustered versus unclustered indexes: A clustered index keeps the table's rows physically ordered by the index key, making range scans very efficient, whereas an unclustered (secondary) index points into an unordered file, so retrieving many matching rows may cost one I/O per row.
Hash indexes: Hash-based indexes map keys to buckets for expected constant-time equality lookups but do not support range queries; dynamic schemes such as extendible and linear hashing let them grow gracefully with the data.

Clinical relevance

Indexing is the most common practical lever for database performance: choosing the right indexes can turn full-table scans into millisecond lookups for transactional and analytical queries, while over-indexing slows updates, so index design is a recurring decision in operating real systems.

History

Bayer and McCreight introduced the B-tree in 1972 to maintain large ordered indices on disk; the B+-tree variant, which keeps all data in the leaves, became the standard database index, as surveyed in Comer's 1979 'Ubiquitous B-tree.' Hash-based access methods and dynamic hashing schemes developed in parallel, and both families remain core to every relational engine.

Key figures

Rudolf Bayer
Edward McCreight
Douglas Comer

Seminal works

bayer1972
comer1979
ramakrishnan2003

Frequently asked questions

Why are B+-trees used instead of binary search trees in databases?: Databases store data on disk, where the cost is dominated by the number of page reads. B+-trees have a high fanout, so each node fills a disk page and the tree is very shallow, requiring only a few I/Os to reach any record. A binary tree would be far deeper and incur many more disk accesses.
When should I use a hash index rather than a B+-tree?: Use a hash index when you only need equality lookups (e.g. find the row with a given id) and want expected constant-time access. Use a B+-tree when you also need range queries, ordered scans, or prefix matching, since hash indexes do not preserve key order and cannot answer range conditions efficiently.