Processor Microarchitecture

Processor microarchitecture is the internal hardware organization that implements an instruction set architecture, encompassing the pipeline, execution units, register renaming, and control logic that turn a stream of instructions into computed results as fast as possible.

Definition

Microarchitecture is the concrete logical organization of a processor — its pipeline stages, functional units, buffers, and control — that realizes the behavior specified by an instruction set architecture while seeking high performance and efficiency.

Scope

This area covers how a processor is built beneath the ISA interface: the datapath and control, pipelining and the hazards that limit it, techniques for extracting instruction-level parallelism, branch prediction, speculative and out-of-order execution, and the scheduling structures that keep execution units busy. It excludes the visible instruction set itself (instruction set architecture) and the memory subsystem beyond the first levels of cache (memory hierarchy and caches), as well as multi-core organization (parallel and multicore architecture).

Core questions

How does pipelining overlap instruction execution, and what structural, data, and control hazards limit it?
How much instruction-level parallelism exists in a program, and how can hardware extract it?
How does branch prediction reduce the cost of control hazards in deep pipelines?
How do out-of-order execution and register renaming expose parallelism while preserving program semantics?
How are performance, power, and complexity traded off in microarchitectural design?
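
The second question above can be made concrete with a small dataflow-limit estimate. The following is a minimal sketch, assuming perfect register renaming, perfect branch prediction, and unlimited execution units, so that only true (read-after-write) dependences constrain the schedule. The `Instr` type, the register names, and the latencies are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str          # register written by the instruction
    srcs: tuple        # registers read by the instruction
    latency: int = 1   # assumed execution latency in cycles


def dataflow_ilp(program):
    """Return (critical_path_cycles, ilp) when only true dependences matter."""
    ready_at = {}      # register -> cycle at which its newest value is available
    finish_times = []
    for ins in program:
        # An instruction may start once all of its source values exist.
        start = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        finish = start + ins.latency
        ready_at[ins.dest] = finish
        finish_times.append(finish)
    critical_path = max(finish_times, default=0)
    ilp = len(program) / critical_path if critical_path else 0.0
    return critical_path, ilp


program = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r0",)),
    Instr("r3", ("r1", "r2")),   # needs r1 and r2
    Instr("r4", ("r0",)),
    Instr("r5", ("r3", "r4")),   # needs r3 and r4
    Instr("r6", ("r0",)),
]
print(dataflow_ilp(program))     # (3, 2.0)
```

Six instructions with a three-cycle critical path give an upper bound of two instructions per cycle for this fragment. Real hardware usually achieves less, because window size, issue width, and mispredicted branches all limit how much of this parallelism can be found.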

Key concepts

datapath and control
instruction pipeline
structural, data, and control hazards
forwarding and bypassing
instruction-level parallelism
branch prediction
speculative execution
out-of-order execution
register renaming
superscalar issue

Key theories

Pipelining: Overlapping the execution of multiple instructions in stages increases instruction throughput; the achievable speedup is bounded by pipeline depth and the stalls introduced by hazards and dependencies.
Dynamic scheduling and register renaming: Tomasulo's algorithm dynamically schedules instructions onto execution units and renames registers via reservation stations and a common data bus, allowing instructions to execute out of program order while respecting true data dependencies — the foundation of modern out-of-order processors.
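
The effect of renaming can be illustrated with a short sketch. Note that it uses a map table with a pool of physical registers, a style found in many later designs, rather than the reservation-station tags of Tomasulo's algorithm; the underlying idea of giving every write a fresh name is the same. The function name, the `(dest, src1, src2)` encoding, and the unbounded register pool are assumptions made for illustration.

```python
from itertools import count


def rename(program, num_arch_regs=8):
    """Map (dest, src1, src2) architectural register numbers to physical ones."""
    fresh = count(num_arch_regs)   # hypothetical unbounded pool of physical registers
    mapping = {r: r for r in range(num_arch_regs)}  # arch reg -> current physical reg
    renamed = []
    for dest, *srcs in program:
        # Sources are looked up first, so they see the most recent earlier writer.
        phys_srcs = tuple(mapping[s] for s in srcs)
        # Every write gets a new physical register, which removes WAR and WAW hazards.
        phys_dest = next(fresh)
        mapping[dest] = phys_dest
        renamed.append((phys_dest, *phys_srcs))
    return renamed


program = [
    (1, 2, 3),   # r1 = r2 op r3
    (4, 1, 5),   # r4 = r1 op r5   (true dependence on r1)
    (1, 6, 7),   # r1 = r6 op r7   (WAW with the first, WAR with the second)
    (2, 1, 4),   # r2 = r1 op r4
]
for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read the old value of r1. Only the true dependences remain.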

Mechanisms

A pipelined processor splits instruction processing into stages (fetch, decode, execute, memory, writeback in the classic five-stage design) so that several instructions are in flight at once. Each kind of hazard has its own remedy: data hazards, where a needed result is not yet available, are handled by forwarding or stalling; structural hazards, where two instructions contend for one resource, by stalling or duplicating the resource; and control hazards, where a branch is unresolved, by prediction and speculation. Superscalar out-of-order cores add reservation stations, reorder buffers, and renaming so that independent instructions execute as soon as their operands are ready while results are still committed in program order.
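
The benefit of executing independent instructions as soon as their operands are ready can be seen in a toy timing model. This is a rough sketch under strong assumptions: at most one instruction issues per cycle, the out-of-order window is unlimited, registers are perfectly renamed so only true dependences matter, and commit is not modelled. The instruction encoding `(dest, srcs, latency)` and the latencies are hypothetical.

```python
def producers(program):
    """For each instruction, indices of the earlier instructions producing its sources."""
    last_writer = {}
    deps = []
    for i, (dest, srcs, _latency) in enumerate(program):
        deps.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i
    return deps


def in_order_cycles(program):
    """Issue strictly in program order, stalling until operands are ready."""
    deps = producers(program)
    done = []          # cycle at which each result becomes available
    next_issue = 0
    for i, (_dest, _srcs, latency) in enumerate(program):
        issue = max([next_issue] + [done[p] for p in deps[i]])
        done.append(issue + latency)
        next_issue = issue + 1   # a stalled instruction blocks everything behind it
    return max(done, default=0)


def out_of_order_cycles(program):
    """Each cycle, issue the oldest waiting instruction whose operands are ready."""
    deps = producers(program)
    done = {}
    waiting = list(range(len(program)))
    cycle = 0
    while waiting:
        for i in waiting:        # oldest first
            if all(p in done and done[p] <= cycle for p in deps[i]):
                done[i] = cycle + program[i][2]
                waiting.remove(i)
                break            # at most one issue per cycle
        cycle += 1
    return max(done.values(), default=0)


program = [
    ("r1", ("r0",), 3),        # long-latency operation, e.g. a load
    ("r2", ("r1",), 1),        # depends on the long-latency result
    ("r3", ("r0",), 1),        # independent
    ("r4", ("r0",), 1),        # independent
    ("r5", ("r3", "r4"), 1),
]
print(in_order_cycles(program), out_of_order_cycles(program))   # 7 5
```

In the in-order model the dependent second instruction stalls and holds up the independent work behind it; in the out-of-order model that work fills the stall cycles, so the same fragment finishes in five cycles instead of seven.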

Practical relevance

Microarchitecture determines the real-world speed and energy efficiency of processors: pipelining, superscalar issue, and out-of-order execution underlie the performance of nearly every modern general-purpose CPU. Microarchitectural design also has security consequences: transient-execution attacks such as Spectre and Meltdown exploit the traces that speculative and out-of-order execution leave in microarchitectural state such as caches.

History

Multiple functional units and dynamic scheduling appeared in the 1960s: the CDC 6600 tracked dependencies with a scoreboard, and the IBM System/360 Model 91 used the algorithm Tomasulo devised for its floating-point unit. RISC microarchitectures of the 1980s made pipelined designs mainstream, and superscalar out-of-order designs became dominant in high-performance CPUs through the 1990s and 2000s. Aggressive speculation later exposed the microarchitectural side channels publicized in 2018.

Debates

Aggressive speculation versus security and efficiency: Deep speculative out-of-order execution boosts single-thread performance but increases power and has enabled transient-execution security attacks, prompting debate over how much speculation is worthwhile relative to simpler, more efficient or more predictable designs.

Key figures

Robert Tomasulo
John L. Hennessy
David A. Patterson
Yale Patt
James E. Smith

Seminal works

hennessy2019
tomasulo1967
patterson2020

Frequently asked questions

What is the difference between in-order and out-of-order execution?

An in-order processor executes instructions strictly in program order, stalling when an instruction's operands are not ready. An out-of-order processor executes any instruction whose operands are available, using buffers and renaming to reorder execution while still committing results in program order.

Why does branch prediction matter?

Deep pipelines fetch and begin executing instructions before a branch's direction is known. Accurate branch prediction lets the processor keep the pipeline full along the likely path; a misprediction wastes the speculative work and incurs a multi-cycle penalty.
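
To give a feel for the numbers, here is a minimal sketch of one classic scheme, a table of 2-bit saturating counters indexed by the low bits of the branch address. The table size, the initial counter value, the branch address `0x40`, and the 15-cycle penalty are all assumptions picked for illustration, not figures from any particular processor.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # 0..3; start at weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2  # 2 or 3 means predict taken

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, trace):
    """Run (pc, taken) pairs through the predictor and count wrong guesses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then not taken, for 3 runs of the loop.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 3
misses = count_mispredictions(TwoBitPredictor(), trace)
penalty_cycles = 15                                # assumed misprediction penalty
print(misses, len(trace), misses * penalty_cycles)  # 4 30 60
```

The predictor misses once while warming up and once at each loop exit, for 4 mispredictions in 30 branches. Because the counter needs two wrong outcomes to flip its prediction, a single loop exit does not cause a second miss when the loop is entered again.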