Parsing and Grammar Formalisms
Recovering the grammatical structure of sentences by machine: the grammar formalisms that describe legal structures and the algorithms that compute them, from constituency trees to dependency graphs.
Definition
Parsing is the computational assignment of grammatical structure to an input string according to a grammar; grammar formalisms are the systems used to specify which structures are legal.
Scope
Covers syntactic analysis in computational linguistics — context-free constituency parsing and its probabilistic and chart-based algorithms, dependency parsing, the major grammar formalisms beyond plain context-free grammars, and the sequence-labeling tasks (such as part-of-speech tagging) that feed parsing. It excludes semantic interpretation, which is handled in computational semantics, and the underlying automata theory, covered in foundations.
Sub-topics
Core questions
- How can a sentence be assigned a syntactic tree or dependency graph efficiently?
- What grammar formalisms capture natural-language syntax adequately?
- How do probabilities help disambiguate among many possible parses?
- How do tagging and chunking support full parsing?
Key concepts
- constituency parse
- dependency parse
- context-free grammar
- chart parsing
- probabilistic grammar
- part-of-speech tagging
- treebank
- structural ambiguity
Key theories
- Chart parsing
- Dynamic-programming algorithms such as CKY and Earley that compute all possible analyses of a sentence in polynomial time by reusing shared subparses.
- Probabilistic context-free grammars
- Attaching probabilities to grammar rules so that the most likely parse can be selected, addressing the pervasive structural ambiguity of natural language.
History
Early parsing relied on hand-built grammars and exhaustive search; the CKY and Earley algorithms made context-free parsing efficient. The release of treebanks in the 1990s enabled data-driven probabilistic parsing, and the 2000s saw dependency parsing rise to prominence for its cross-linguistic robustness, later subsumed by neural parsers.
Debates
- Constituency versus dependency representation
- Whether syntax is best represented as nested phrases or as labeled head-dependent relations; both are widely used, with dependency favored for free-word-order languages and downstream tasks.
Key figures
- Jay Earley
- Joakim Nivre
- Christopher Manning
- Mitchell Marcus
Related topics
Seminal works
- manning1999
- kubler2009
- jurafsky2025
Frequently asked questions
- Why is parsing hard if grammar rules are known?
- Natural sentences are massively ambiguous: a single string can have many legal structures. Parsing must therefore not only find structures but rank them, which is why probabilistic and learned models are essential.