Hash Tables

A hash table implements a dictionary by using a hash function to map keys to array positions, supporting expected constant-time insertion, deletion and lookup when collisions are well managed.

Definition

A hash table is a data structure that stores key-value pairs in an array, using a hash function to compute from each key an index into the array, with a collision-resolution scheme to handle distinct keys that hash to the same index.

Scope

This topic covers hashing-based dictionaries: hash functions and their desirable properties, collision-resolution strategies (separate chaining and open addressing), load factor and resizing, the universal and perfect hashing frameworks that give provable guarantees, and related probabilistic structures such as Bloom filters. It excludes ordered dictionary structures, which are covered under search trees.

Core questions

  • What makes a hash function good, and how is it chosen to spread keys uniformly?
  • How are collisions resolved by chaining or open addressing, and how do they affect cost?
  • How does the load factor govern expected operation time and trigger resizing?
  • How do universal and perfect hashing provide provable performance guarantees?
  • When is a space-efficient probabilistic structure like a Bloom filter preferable to an exact table?

Key concepts

  • hash function
  • separate chaining
  • open addressing
  • load factor
  • rehashing and resizing
  • universal hashing
  • perfect hashing
  • Bloom filter

Key theories

Universal hashing
By choosing the hash function at random from a carefully designed (universal) family, independently of the keys, one can guarantee a low expected number of collisions for any fixed set of keys. No single input, adversarial or not, is then likely to trigger worst-case behavior.
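A minimal sketch of drawing a function from one well-known universal family, h(x) = ((a·x + b) mod p) mod m, may make this concrete. It assumes keys are non-negative integers smaller than the prime p; the names `P`, `make_universal_hash` and `count_colliding_pairs` are illustrative and do not come from any library.

```python
import random

# Assumption: every key is a non-negative integer below this Mersenne prime.
P = (1 << 61) - 1


def make_universal_hash(m, rng=random):
    """Draw h(x) = ((a*x + b) mod P) mod m at random from the family."""
    a = rng.randrange(1, P)  # multiplier in [1, P-1]
    b = rng.randrange(0, P)  # offset in [0, P-1]

    def h(x):
        if not 0 <= x < P:
            raise ValueError("key out of range for this family")
        return ((a * x + b) % P) % m

    return h


def count_colliding_pairs(keys, h):
    """Count unordered pairs of distinct keys that land in the same slot."""
    slot_counts = {}
    for k in keys:
        slot = h(k)
        slot_counts[slot] = slot_counts.get(slot, 0) + 1
    return sum(c * (c - 1) // 2 for c in slot_counts.values())


if __name__ == "__main__":
    m = 64
    keys = [i * m for i in range(200)]  # every key collides under x mod m
    print("x mod m:", count_colliding_pairs(keys, lambda x: x % m))
    print("random draw:", count_colliding_pairs(keys, make_universal_hash(m)))
```

For this family any two distinct keys collide with probability at most 1/m over the random choice of a and b, so the expected number of colliding pairs among n keys is at most n(n-1)/(2m). The guarantee holds only if the keys are fixed without knowledge of the drawn function.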
Collision resolution and load factor
Separate chaining stores colliding keys in lists per slot, while open addressing probes alternative slots; expected operation time is governed by the load factor (entries per slot), and tables are resized to keep it bounded.
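The interaction between chaining, load factor and resizing can be sketched as follows. This is an illustrative toy, not how any particular standard library implements its dictionary: the class name `ChainedHashTable`, the initial capacity of 8, the 0.75 threshold and the doubling policy are all assumptions, and Python's built-in `hash` stands in for the hash function.

```python
class ChainedHashTable:
    """Toy dictionary using separate chaining; grows when the load factor is too high."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]  # one list (chain) per slot
        self._size = 0
        self._max_load = max_load

    def __len__(self):
        return self._size

    def load_factor(self):
        # Stored entries divided by number of slots.
        return self._size / len(self._buckets)

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self.load_factor() > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self, new_capacity):
        # Rehash every entry, because slot indices depend on the table size.
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self._bucket(k).append((k, v))
```

Each operation scans a single chain, so its cost is proportional to that chain's length. Doubling the capacity keeps the average chain length below the threshold, and because a resize of n entries happens only after about n/2 insertions since the previous one, the rehashing cost averages out to a constant per insertion.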

Practical relevance

Hash tables are among the most-used data structures in computing: they implement dictionaries and sets in standard libraries, power database indexing and in-memory caches, support symbol tables in compilers, and underlie deduplication and membership tests. Bloom filters scale membership queries in databases and networking where exact storage is infeasible.

History

Hashing originated in the 1950s with work attributed to Hans Peter Luhn at IBM. Burton Bloom introduced the space-efficient Bloom filter in 1970. Carter and Wegman formalized universal and later strongly universal hashing in the late 1970s and early 1980s, giving hashing its rigorous theoretical foundation.

Key figures

  • Hans Peter Luhn
  • J. Lawrence Carter
  • Mark Wegman
  • Burton H. Bloom

Seminal works

  • bloom1970
  • carter1981
  • cormen2009

Frequently asked questions

Why are hash table operations described as expected O(1) rather than guaranteed O(1)?
If many keys collide, operations can degrade toward O(n). Constant time holds in expectation under a good hash function and a bounded load factor. Universal hashing makes a bad case improbable, but a worst-case O(1) lookup guarantee requires perfect hashing or other techniques.
What is a Bloom filter and how does it differ from a hash table?
A Bloom filter is a compact probabilistic structure that tests set membership using several hash functions over a bit array. It can yield false positives but never false negatives, and it stores no keys, trading exactness for large space savings compared with a hash table.
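A small sketch may help illustrate the behavior described above. It assumes string items, and it derives the k bit positions from two halves of a single SHA-256 digest (a double-hashing shortcut) instead of using k independent hash functions. The class name `BloomFilter` and its parameters are illustrative.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings; supports add and membership test only."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Assumption: positions come from double hashing, h1 + i*h2, as a
        # substitute for num_hashes independent hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1024, num_hashes=4)
    for word in ["alpha", "beta", "gamma"]:
        seen.add(word)
    print("alpha" in seen)  # True: added items are always reported present
    print("delta" in seen)  # usually False, but a false positive is possible
```

Only bits are stored, so the added strings cannot be listed or recovered, and this basic form does not support deletion. An added item always finds all of its bits set, which is why false negatives cannot occur.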
