Krylov Subspace Methods
Krylov subspace methods solve large sparse linear systems by extracting the best approximation from the subspace generated by repeated matrix-vector products, needing only the action of the matrix rather than its entries.
Definition
A Krylov subspace method is an iterative solver that, at the m-th step, seeks an approximate solution in the m-dimensional Krylov subspace spanned by the residual and its successive images under the matrix, choosing the iterate by a projection or minimization condition.
Scope
This topic covers the Krylov subspace and its construction by the Arnoldi and Lanczos processes, the conjugate gradient method for symmetric positive-definite systems, MINRES and GMRES for symmetric indefinite and general matrices, biorthogonalization methods such as BiCGSTAB, and the convergence theory linking iteration count to the spectrum and condition number.
Core questions
- What is the Krylov subspace, and how is an orthonormal basis for it built stably?
- Why is the conjugate gradient method optimal and short-recurrence for symmetric positive-definite systems?
- How does GMRES handle general nonsymmetric systems, and why does it require increasing storage?
- How do the matrix's spectral properties determine the rate of convergence?
Key theories
- Conjugate gradient method
- For symmetric positive-definite systems, the conjugate gradient method minimizes the energy norm of the error over the Krylov subspace using a short three-term recurrence, converging in at most n steps in exact arithmetic and far faster when eigenvalues are clustered.
- GMRES and the Arnoldi process
- For general matrices, GMRES uses the Arnoldi process to build an orthonormal Krylov basis and minimizes the residual norm over the subspace, but because no short recurrence exists in general it must store all basis vectors, motivating restarted variants.
Mechanisms
Each method repeatedly multiplies a vector by the matrix to extend the Krylov subspace and orthogonalizes the new direction against previous ones. For symmetric matrices the Lanczos process yields a tridiagonal projection and a short recurrence, so the conjugate gradient and MINRES methods need only a few vectors of storage. For nonsymmetric matrices the Arnoldi process produces a full upper-Hessenberg projection, and GMRES minimizes the residual by solving a small least-squares problem each step, at the cost of storing the whole basis; restarting bounds this cost. Convergence is dictated by how favourably the eigenvalues are distributed, which preconditioning is designed to improve.
Clinical relevance
Krylov methods, especially preconditioned conjugate gradient and GMRES, are the standard solvers inside finite-element and finite-volume simulation codes, large-scale optimization, and scientific computing generally; their matrix-free nature lets them solve systems with millions or billions of unknowns that no direct method could factorize.
History
The conjugate gradient method was introduced by Hestenes and Stiefel in 1952 and the underlying Lanczos process in 1950; their full power as iterative solvers was recognized in the 1970s, and the development of GMRES by Saad and Schultz in 1986 and of stabilized biorthogonal methods extended the approach to general nonsymmetric systems.
Key figures
- Magnus Hestenes
- Eduard Stiefel
- Cornelius Lanczos
- Yousef Saad
- Henk van der Vorst
Related topics
Seminal works
- saad2003
- greenbaum1997
Frequently asked questions
- Why does the conjugate gradient method only work for symmetric positive-definite matrices?
- Its short, efficient recurrence and its optimality in the energy norm rely on the matrix being symmetric positive-definite. For symmetric indefinite or nonsymmetric matrices, different methods such as MINRES or GMRES are required, generally with more storage or work per step.
- Why does GMRES need so much memory?
- For a general nonsymmetric matrix there is no short recurrence that keeps the Krylov basis orthogonal, so GMRES must store every basis vector to minimize the residual. Restarted GMRES caps the memory by periodically discarding the basis, at the cost of slower convergence.