GPU and Accelerator Computing
GPU and accelerator computing harnesses massively parallel many-core processors to run data-parallel workloads at much higher throughput than general-purpose CPUs typically achieve.
Definition
GPU and accelerator computing is the use of specialized many-core processors, optimized for high-throughput data-parallel execution, to offload and speed up the parallelizable portions of a computation under a host-device programming model.
Scope
This topic covers the architecture of graphics processing units and other accelerators as throughput-oriented, many-core SIMD/SIMT machines; the programming models that target them (CUDA, OpenCL, and directive-based offloading); the thread-hierarchy and memory-hierarchy abstractions (threads, warps, blocks, grids; global, shared, and register memory); and the performance considerations—occupancy, memory coalescing, and divergence—that govern achievable throughput.
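Of the performance considerations listed in the scope, memory coalescing lends itself to a small numerical illustration. The sketch below is a CPU-side Python model, not GPU code: it counts how many aligned memory segments one warp touches for a given access stride. The 32-thread warp, 4-byte elements, and 128-byte segments are assumptions chosen to resemble common NVIDIA configurations; actual transaction sizes vary by architecture. The name `segments_touched` is hypothetical.

```python
WARP_SIZE = 32        # assumed threads per warp
ELEM_BYTES = 4        # assumed element size (e.g. a 32-bit float)
SEGMENT_BYTES = 128   # assumed size of one aligned memory segment


def segments_touched(stride, base_addr=0):
    """Count distinct aligned segments accessed by one warp.

    Thread t reads the element at index t * stride, so stride=1 is a
    contiguous (coalesced) access and larger strides spread the warp
    across more segments.
    """
    segments = set()
    for t in range(WARP_SIZE):
        addr = base_addr + t * stride * ELEM_BYTES
        segments.add(addr // SEGMENT_BYTES)
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 8, 32):
        print(f"stride {stride:2d}: {segments_touched(stride)} segment(s)")
```

Under these assumptions a unit-stride access fits in a single segment, while a stride of 32 elements touches a separate segment for every thread, which is roughly why strided or scattered access patterns waste memory bandwidth.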
Core questions
- How does the throughput-oriented, many-core accelerator model differ from a general-purpose CPU?
- How are computations expressed as massively parallel kernels over a thread hierarchy?
- What memory and execution behaviors—coalescing, divergence, occupancy—limit achievable performance?
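The second question can be made concrete with the index arithmetic that CUDA-style kernels commonly use: each thread derives a global index from its block index, the block size, and its index within the block (`blockIdx.x * blockDim.x + threadIdx.x` in CUDA). The following is a sequential Python model of a one-dimensional launch, not GPU code. The names `launch_1d`, `saxpy_kernel`, and `saxpy` are hypothetical, and the nested loops stand in for threads that real hardware would run in parallel.

```python
def launch_1d(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D grid of blocks of threads."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """One logical thread: compute out[i] = a * x[i] + y[i] for its own i."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # guard: the grid may contain more threads than elements
        out[i] = a * x[i] + y[i]


def saxpy(a, x, y, block_dim=4):
    """Host-side wrapper: size the grid, 'launch' the kernel, return result."""
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    out = [0.0] * n
    launch_1d(saxpy_kernel, grid_dim, block_dim, n, a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5))
```

With five elements and a block size of four, the model launches two blocks (eight logical threads), and the bounds check lets the three surplus threads do nothing. The same guard is the usual idiom in real kernels when the problem size is not a multiple of the block size.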
Key theories
- SIMT execution model
  - GPUs run thousands of lightweight threads grouped into warps (32 threads on current NVIDIA GPUs) that execute the same instruction in lockstep (single-instruction, multiple-thread). Performance depends on keeping enough warps ready to issue and on avoiding control-flow divergence within a warp, because divergent paths are typically executed one after another.
- Hierarchical thread and memory model
  - CUDA organizes threads into blocks and blocks into grids, and exposes a memory hierarchy of per-thread registers, fast per-block shared memory, and large but slower global memory; mapping data and computation onto this hierarchy is the central performance task.
- General-purpose GPU computing
  - The evolution from fixed-function graphics pipelines to programmable, general-purpose accelerators turned GPUs into a mainstream platform for scientific and data-intensive computing.
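The cost of divergence under the SIMT model can be illustrated with a simplified Python model (a sketch, not GPU code). It assumes a 32-thread warp and a single if/else branch, and it assumes that a warp whose threads disagree on the condition executes both paths serially. The function name `branch_passes` is hypothetical.

```python
WARP_SIZE = 32  # assumed warp width


def branch_passes(predicates):
    """Return how many branch paths each warp must execute.

    predicates[t] is the if-condition evaluated by thread t. A warp whose
    threads all agree runs one path; a warp containing both outcomes runs
    the 'then' and 'else' paths one after the other (2 passes).
    """
    passes = []
    for start in range(0, len(predicates), WARP_SIZE):
        warp = predicates[start:start + WARP_SIZE]
        passes.append(len({bool(p) for p in warp}))
    return passes


if __name__ == "__main__":
    n = 128
    # Condition varies within each warp: every warp diverges.
    by_parity = [t % 2 == 0 for t in range(n)]
    # Condition is uniform within each warp: no warp diverges.
    by_warp = [(t // WARP_SIZE) % 2 == 0 for t in range(n)]
    print(branch_passes(by_parity))
    print(branch_passes(by_warp))
```

Both predicates send half of the 128 threads down each path, but only the first forces every warp to run both paths. Arranging data so that a branch condition is uniform within a warp is one common way to limit this cost.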
Practical relevance
Accelerators are the workhorses of modern compute-intensive applications: deep-learning training and inference, scientific simulation, image and signal processing, and cryptography all use GPUs, which can deliver order-of-magnitude speedups over CPUs on sufficiently parallel workloads.
History
GPUs evolved from fixed-function graphics hardware into programmable parallel processors. The 2007 release of CUDA, described by Nickolls and colleagues in 2008, made general-purpose GPU computing broadly accessible, and accelerators have since become central to high-performance and machine-learning computing.
Key figures
- John Nickolls
- Wen-mei Hwu
- David Kirk
- John Owens
Related topics
Seminal works
- nickolls2008
- kirk2016
- owens2008
Frequently asked questions
- Why are GPUs so much faster than CPUs for some workloads?
  - GPUs devote far more of their transistors to arithmetic units and run thousands of threads to hide memory latency, trading single-thread speed for aggregate throughput. This makes them excellent for regular, highly data-parallel work, though poorly suited to branchy, latency-sensitive code.
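The latency-hiding argument in this answer can be put in rough numbers with Little's law: the amount of work that must be in flight equals latency multiplied by throughput. The Python sketch below applies it to estimate how many warps a scheduler would need; the 400-cycle latency and 8-cycle issue interval are illustrative assumptions, not measurements of any particular GPU, and `warps_to_hide_latency` is a hypothetical name.

```python
import math


def warps_to_hide_latency(latency_cycles, cycles_between_issues,
                          independent_ops_per_warp=1):
    """Estimate the warps needed to keep a long-latency pipeline full.

    Little's law: operations in flight = latency / issue interval.
    Each warp contributes independent_ops_per_warp outstanding operations.
    """
    in_flight = latency_cycles / cycles_between_issues
    return math.ceil(in_flight / independent_ops_per_warp)


if __name__ == "__main__":
    # Assumed figures: 400-cycle memory latency, one memory instruction
    # issued every 8 cycles.
    print(warps_to_hide_latency(400, 8))     # one outstanding load per warp
    print(warps_to_hide_latency(400, 8, 2))  # two independent loads per warp
```

Under these assumptions, about 50 warps with one outstanding load each would be needed, or about 25 if each warp keeps two independent loads in flight. This is the sense in which a GPU trades single-thread speed for aggregate throughput: no individual thread runs faster, but enough of them are resident that some warp is usually ready to issue.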