Static Program Analysis
Static program analysis examines source or intermediate code, without executing it, to infer properties such as which values a variable may hold or whether an error can occur.
Definition
Static program analysis is the computation of approximate but sound information about a program's possible behaviors directly from its code, without running it, typically by solving dataflow equations over a lattice of abstract states.
Scope
This topic covers dataflow analysis over control-flow graphs, lattice-based fixpoint computation, interprocedural analysis, pointer and alias analysis, and call-graph and control-flow analysis. It addresses the precision dimensions (flow-, context-, and path-sensitivity), the soundness of analyses, and their use in optimization and bug detection.
Core questions
- How are program properties computed by propagating facts over a control-flow graph?
- How do flow-, context-, and path-sensitivity affect precision and cost?
- How is sound analysis carried across procedure calls?
- How does pointer analysis cope with aliasing and indirection?
Key theories
- Dataflow analysis as fixpoint over a lattice
- Kildall unified global program optimizations as the computation of a fixpoint of dataflow equations over a lattice, providing the general framework for static analyses.
- Interprocedural analysis via graph reachability
- Reps, Horwitz, and Sagiv reduced a large class of precise interprocedural dataflow problems to graph reachability, enabling efficient context-sensitive analysis across procedure boundaries.
- Principles and dimensions of analysis
- Nielson, Nielson, and Hankin systematize dataflow, constraint-based, and type-based analyses and the precision-versus-cost trade-offs that govern their design.
Clinical relevance
Static analysis drives compiler optimizations, bug- and vulnerability-finding tools, linters, and IDE features. Sound analyses can certify the absence of certain errors, while scalable heuristic analyses are widely deployed to catch defects early in development.
History
Dataflow analysis grew out of optimizing compilers, with Kildall's 1973 lattice framework and the Allen-Cocke analyses providing the theoretical base. Interprocedural and pointer analyses advanced through the 1980s and 1990s, including the Reps-Horwitz-Sagiv reachability formulation, and static analysis later scaled to industrial bug-finding on very large codebases.
Debates
- Precision versus scalability
- Analysis designers continually trade precision dimensions such as context- and path-sensitivity, which reduce false positives, against the computational cost that limits how large a program can be analyzed.
Key figures
- Gary Kildall
- Thomas Reps
- Susan Horwitz
- Flemming Nielson
- Hanne Riis Nielson
Related topics
Seminal works
- kildall1973
- reps1995
- nielson1999
Frequently asked questions
- What does it mean for a static analysis to be sound?
- A sound analysis never misses a real occurrence of the property it checks: if it reports no error of a given kind, that kind of error genuinely cannot happen, though it may also raise false alarms.
- Why is pointer analysis difficult?
- Pointers and references introduce aliasing, where multiple names refer to the same memory, so the analysis must approximate which locations each pointer may target, a problem that trades precision against scalability.