GPU and Accelerator Computing

GPU and accelerator computing harnesses massively parallel many-core processors to accelerate data-parallel workloads far beyond what general-purpose CPUs achieve.

Definition

GPU and accelerator computing is the use of specialized many-core processors, optimized for high-throughput data-parallel execution, to offload and speed up the parallelizable portions of a computation under a host-device programming model.
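Because only the parallelizable portion is offloaded, the overall gain is bounded by Amdahl's law. Here is a minimal Python sketch. It assumes that the parallel fraction `p` of the original runtime and the accelerator's speedup `s` on that fraction are known. It also ignores host-device transfer and kernel-launch costs, which is a simplifying assumption. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is accelerated by a factor s.

    Assumes the remaining (1 - p) stays on the host at its original speed and
    ignores data-transfer and launch overheads (simplifying assumptions).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallelizable, accelerated 100x on the device.
    print(f"{amdahl_speedup(0.9, 100):.2f}")  # prints 9.17
    # As s grows without limit, the speedup approaches 1 / (1 - p) = 10.
    print(f"{amdahl_speedup(0.9, 1e12):.2f}")
```

Even a very fast accelerator leaves the serial host portion as the ceiling. For this reason, offloading pays off most when `p` is close to 1.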

Scope

This topic covers the architecture of graphics processing units and other accelerators as throughput-oriented, many-core SIMD/SIMT machines; the programming models that target them (CUDA, OpenCL, and directive-based offloading); the thread-hierarchy and memory-hierarchy abstractions (threads, warps, blocks, grids; global, shared, and register memory); and the performance considerations—occupancy, memory coalescing, and divergence—that govern achievable throughput.
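Of the performance considerations listed above, coalescing is the easiest to quantify. The idea is to count how many aligned memory segments one warp's loads touch. This Python sketch assumes a 32-thread warp, 4-byte elements, and 128-byte aligned segments. These are illustrative parameters only. Real memory systems add caches and finer-grained sectors, and this simple model ignores both. The function name `segments_touched` is hypothetical.

```python
def segments_touched(stride: int, warp_size: int = 32, elem_bytes: int = 4,
                     segment_bytes: int = 128, base: int = 0) -> int:
    """Count distinct aligned segments touched when lane k of a warp reads
    element index base + k * stride (indices in elements, sizes in bytes)."""
    segments = set()
    for lane in range(warp_size):
        byte_addr = (base + lane * stride) * elem_bytes
        segments.add(byte_addr // segment_bytes)  # which aligned segment holds this address
    return len(segments)


if __name__ == "__main__":
    # Unit stride: all 32 lanes fall in one 128-byte segment (fully coalesced).
    # Stride 32: each lane lands in its own segment, so 32 separate transactions.
    for stride in (1, 2, 8, 32):
        print(stride, segments_touched(stride))
```

Under this model, the cost of a warp-wide load grows with the number of segments. A strided or misaligned pattern can therefore multiply memory traffic even when every lane reads only 4 bytes.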

Core questions

How does the throughput-oriented, many-core accelerator model differ from a general-purpose CPU?
How are computations expressed as massively parallel kernels over a thread hierarchy?
What memory and execution behaviors—coalescing, divergence, occupancy—limit achievable performance?
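Expressing a computation as a kernel over a thread hierarchy can be made concrete with a sequential emulation. This Python sketch maps each (block, thread) pair of a one-dimensional launch to a global element index, following the CUDA convention `blockIdx.x * blockDim.x + threadIdx.x`. The helpers `launch_1d` and `make_saxpy_kernel` are hypothetical and serve only as illustration. On a real GPU the threads run concurrently, and the indices come from CUDA's built-in variables.

```python
from typing import Callable, List


def launch_1d(kernel: Callable[[int, int, int], None], n: int, block_dim: int) -> int:
    """Sequentially emulate a 1-D grid launch covering n elements.

    Hypothetical helper: a real GPU runs these threads in parallel.
    Returns the number of blocks in the grid.
    """
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division so every element is covered
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx)
    return grid_dim


def make_saxpy_kernel(a: float, x: List[float], y: List[float]):
    """Build a kernel computing y[i] = a * x[i] + y[i] (SAXPY) in place."""
    n = len(x)

    def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int) -> None:
        i = block_idx * block_dim + thread_idx  # global thread index
        if i < n:  # guard: the last block may contain threads past the end of the data
            y[i] = a * x[i] + y[i]

    return saxpy_kernel


if __name__ == "__main__":
    x = [1.0] * 1000
    y = [2.0] * 1000
    blocks = launch_1d(make_saxpy_kernel(3.0, x, y), n=len(x), block_dim=256)
    print(blocks, y[0], y[-1])  # prints 4 5.0 5.0
```

The bounds check is needed because the grid is rounded up to a whole number of blocks. Here, 4 blocks of 256 threads cover 1024 indices for only 1000 elements.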

Key theories

SIMT execution model: GPUs run thousands of lightweight threads grouped into warps that execute in lockstep (single-instruction, multiple-thread); performance depends on keeping warps busy and avoiding control-flow divergence within a warp.
Hierarchical thread and memory model: CUDA organizes threads into blocks and grids and exposes a memory hierarchy of registers, fast shared memory, and large global memory; mapping data and computation onto this hierarchy is the central performance task.
General-purpose GPU computing: The evolution from fixed-function graphics pipelines to programmable, general-purpose accelerators turned GPUs into a mainstream platform for scientific and data-intensive computing.
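The divergence penalty in the SIMT model can be illustrated with a simplified cost model. When the lanes of a warp disagree on a branch, the warp executes each taken path in turn, and lanes not on the current path are masked off. The sketch assumes a fixed issue cost per path. It ignores reconvergence overhead and newer independent-thread-scheduling behavior. The function name and costs are hypothetical.

```python
from typing import Sequence


def warp_branch_cost(predicates: Sequence[bool], then_cost: int, else_cost: int) -> int:
    """Issue slots a warp spends on an if/else under a simple SIMT model.

    Each path is executed once if at least one lane takes it; lanes that are
    not on the current path are masked off but still wait for it to finish.
    """
    cost = 0
    if any(predicates):       # some lane takes the "then" path
        cost += then_cost
    if not all(predicates):   # some lane takes the "else" path
        cost += else_cost
    return cost


if __name__ == "__main__":
    uniform = [True] * 32                              # all lanes agree
    divergent = [lane % 2 == 0 for lane in range(32)]  # even and odd lanes split
    print(warp_branch_cost(uniform, 10, 10))    # prints 10
    print(warp_branch_cost(divergent, 10, 10))  # prints 20
```

The divergent warp pays for both paths even though each lane does only one path's work. Arranging data so that lanes of the same warp branch alike avoids this cost.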
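The payoff from mapping computation onto the memory hierarchy can be estimated by counting global-memory reads in a tiled matrix multiply. In this sequential Python model, each output tile stages T×T tiles of A and B into local buffers that stand in for shared memory. It assumes square n×n matrices with n divisible by the tile size, and it counts one global read per element staged. It is an illustration, not CUDA code, and the name `tiled_matmul` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tiled_matmul(A: Matrix, B: Matrix, tile: int) -> Tuple[Matrix, int]:
    """Compute C = A @ B tile by tile; return (C, number of global reads).

    Assumes square n x n inputs with n divisible by tile.
    """
    n = len(A)
    if n % tile != 0:
        raise ValueError("n must be a multiple of tile")
    C = [[0.0] * n for _ in range(n)]
    global_reads = 0
    for bi in range(0, n, tile):          # one "block" per output tile (bi, bj)
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):  # march along the shared dimension
                # Stage one tile of A and one of B (stand-in for shared memory).
                a_tile = [row[bk:bk + tile] for row in A[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in B[bk:bk + tile]]
                global_reads += 2 * tile * tile
                # Every output element in the tile reuses the staged data.
                for i in range(tile):
                    for j in range(tile):
                        C[bi + i][bj + j] += sum(a_tile[i][k] * b_tile[k][j]
                                                 for k in range(tile))
    return C, global_reads


if __name__ == "__main__":
    n, tile = 8, 4
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    C, reads = tiled_matmul(A, B, tile)
    # Tiled reads are 2n^3/T; a naive kernel reading A and B per product needs 2n^3.
    print(C == A, reads, 2 * n ** 3)  # prints True 256 1024
```

Each staged element is reused T times, so global traffic drops by roughly the tile size. The practical limit on T is how much data fits in the fast on-chip memory.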

Practical relevance

Accelerators are the workhorses of many compute-intensive applications. Deep-learning training and inference, scientific simulation, image and signal processing, and cryptography commonly run on GPUs, which can deliver order-of-magnitude speedups over CPUs when the workload is sufficiently data-parallel.

History

GPUs evolved from fixed-function graphics hardware into programmable parallel processors; the 2007 release of CUDA, described by Nickolls and colleagues in 2008, made general-purpose GPU computing accessible, and accelerators subsequently became central to high-performance and machine-learning computing.

Key figures

John Nickolls
Wen-mei Hwu
David Kirk
John Owens

Seminal works

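Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008). Scalable Parallel Programming with CUDA. ACM Queue, 6(2).
Kirk, D. B., and Hwu, W. W. (2016). Programming Massively Parallel Processors: A Hands-on Approach (3rd ed.). Morgan Kaufmann.
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. (2008). GPU Computing. Proceedings of the IEEE, 96(5).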

Frequently asked questions

Why are GPUs so much faster than CPUs for some workloads?
GPUs devote far more of their transistors to arithmetic units and run thousands of threads to hide memory latency, trading single-thread speed for aggregate throughput. This makes them excellent for regular, highly data-parallel work but poorly suited to branchy, latency-sensitive code.
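The latency-hiding argument can be quantified with Little's law. To sustain a bandwidth B in the face of a latency L, roughly B × L bytes must be in flight at once. This Python sketch uses hypothetical round numbers, not the specification of any particular GPU, to estimate how many outstanding warp-wide loads that requires. The function name `loads_in_flight` is hypothetical.

```python
import math


def loads_in_flight(bandwidth_gb_s: float, latency_ns: float, bytes_per_load: int) -> int:
    """Little's law: outstanding requests = throughput x latency.

    Bandwidth is in GB/s (decimal, so 1 GB/s = 1 byte/ns); latency is in ns.
    """
    bytes_in_flight = bandwidth_gb_s * latency_ns  # (bytes/ns) * ns = bytes
    return math.ceil(bytes_in_flight / bytes_per_load)


if __name__ == "__main__":
    # Hypothetical: 1000 GB/s bandwidth, 500 ns memory latency,
    # each warp load fetching 32 lanes x 4 bytes = 128 bytes.
    print(loads_in_flight(1000, 500, 128))  # prints 3907
```

Thousands of concurrent requests are needed to keep memory busy. This is why GPUs keep many warps resident, so that some can issue instructions while others wait on memory.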