Why are caches effective if they are so small compared to main memory?

Because programs exhibit locality: at any moment they touch only a small working set of data repeatedly, and accesses cluster in nearby addresses. A small cache holding that working set therefore satisfies the large majority of requests.

What is the difference between a cache and virtual memory?

Both move data between a faster, smaller level and a slower, larger one. A cache (managed in hardware) holds blocks of main memory; virtual memory (managed with the operating system) maps program addresses to physical memory and pages data to and from disk, also providing protection and an address space larger than physical RAM.

Memory Hierarchy and Caches

The memory hierarchy organizes storage into levels — registers, caches, main memory, and backing store — so that the small, fast levels hold the data most likely to be used, giving programs the illusion of memory that is both large and fast.

Definition

A memory hierarchy is a layered arrangement of storage technologies of differing speed, cost, and capacity, managed so that frequently and recently accessed data resides in the faster, smaller levels — most importantly the cache — to minimize average access time.

Scope

This area covers why and how memory is organized as a hierarchy exploiting locality of reference. It includes cache structure (placement, replacement, and write policies), multi-level caches, cache coherence across processors, virtual memory and address translation, and the underlying DRAM and emerging memory technologies. It excludes processor execution itself (processor microarchitecture) and large-scale file and device storage (storage and I/O systems), although it borders both.

Sub-topics

Core questions

Why does locality of reference make a layered memory hierarchy effective?
How are caches organized in terms of placement, associativity, replacement, and write policy?
How is coherence maintained when multiple caches hold copies of the same memory?
How does virtual memory translate addresses and provide protection and the illusion of large memory?
How do the characteristics of DRAM and emerging memories shape system performance?

Key concepts

temporal and spatial locality
cache hit and miss
associativity and replacement policy
write-through and write-back
multi-level caches
cache coherence
virtual memory and paging
translation lookaside buffer
DRAM and memory bandwidth
average memory access time

Key theories

Locality of reference: Programs tend to reuse recently accessed data (temporal locality) and access nearby addresses (spatial locality); the memory hierarchy exploits both by caching recently used blocks and fetching neighbors together.
Cache design trade-offs: Cache performance is governed by miss rate, miss penalty, and hit time, and tuned through size, block size, associativity, and replacement and write policies; classic analyses identify compulsory, capacity, and conflict misses as the targets of these choices.

Mechanisms

When the processor requests data, the cache is checked first; a hit returns data quickly, while a miss fetches a block from a lower level and may evict another by a replacement policy. Writes are propagated by write-through or write-back schemes, and coherence protocols keep multiple caches consistent. Virtual memory adds a translation step — via page tables and a translation lookaside buffer — mapping program addresses to physical DRAM while enforcing protection.

Clinical relevance

The memory hierarchy frequently dominates real performance: because processors are far faster than main memory, cache behavior and locality often matter more than raw instruction speed. Cache-aware data layouts, blocking, and prefetching are central to high-performance computing, databases, and machine-learning kernels, and cache timing has become a source of security side channels.

History

Maurice Wilkes proposed the cache (a 'slave memory') in 1965, and caches entered commercial machines such as the IBM System/360 Model 85. Virtual memory originated with the Atlas computer in the early 1960s, formalized by Denning's working-set model. Alan Jay Smith's 1982 survey consolidated cache design knowledge, and multi-level caches and sophisticated coherence protocols became standard as processor-memory speed gaps widened.

Debates

Hardware versus software management of locality: There is ongoing tension between transparent hardware caching and explicitly managed memories (scratchpads, software-controlled prefetch): hardware caching is general and easy to program against, while explicit management can be more predictable and efficient for specialized workloads.

Key figures

Maurice Wilkes
Alan Jay Smith
Peter J. Denning
John L. Hennessy
David A. Patterson

Seminal works

hennessy2019
smith1982cache
patterson2020

Frequently asked questions

Why are caches effective if they are so small compared to main memory?: Because programs exhibit locality: at any moment they touch only a small working set of data repeatedly, and accesses cluster in nearby addresses. A small cache holding that working set therefore satisfies the large majority of requests.
What is the difference between a cache and virtual memory?: Both move data between a faster, smaller level and a slower, larger one. A cache (managed in hardware) holds blocks of main memory; virtual memory (managed with the operating system) maps program addresses to physical memory and pages data to and from disk, also providing protection and an address space larger than physical RAM.