Why is parsing hard if grammar rules are known?

Natural sentences are massively ambiguous: a single string can have many legal structures. Parsing must therefore not only find structures but rank them, which is why probabilistic and learned models are essential.

Parsing and Grammar Formalisms

Recovering the grammatical structure of sentences by machine: the grammar formalisms that describe legal structures and the algorithms that compute them, from constituency trees to dependency graphs.

Find emne med PaperMindSnartFind papers & topics

Tools & resources

Hent slides

Learn & explore

VideoSnart

Definition

Parsing is the computational assignment of grammatical structure to an input string according to a grammar; grammar formalisms are the systems used to specify which structures are legal.

Scope

Covers syntactic analysis in computational linguistics — context-free constituency parsing and its probabilistic and chart-based algorithms, dependency parsing, the major grammar formalisms beyond plain context-free grammars, and the sequence-labeling tasks (such as part-of-speech tagging) that feed parsing. It excludes semantic interpretation, which is handled in computational semantics, and the underlying automata theory, covered in foundations.

Sub-topics

Core questions

How can a sentence be assigned a syntactic tree or dependency graph efficiently?
What grammar formalisms capture natural-language syntax adequately?
How do probabilities help disambiguate among many possible parses?
How do tagging and chunking support full parsing?

Key concepts

constituency parse
dependency parse
context-free grammar
chart parsing
probabilistic grammar
part-of-speech tagging
treebank
structural ambiguity

Key theories

Chart parsing: Dynamic-programming algorithms such as CKY and Earley that compute all possible analyses of a sentence in polynomial time by reusing shared subparses.
Probabilistic context-free grammars: Attaching probabilities to grammar rules so that the most likely parse can be selected, addressing the pervasive structural ambiguity of natural language.

History

Early parsing relied on hand-built grammars and exhaustive search; the CKY and Earley algorithms made context-free parsing efficient. The release of treebanks in the 1990s enabled data-driven probabilistic parsing, and the 2000s saw dependency parsing rise to prominence for its cross-linguistic robustness, later subsumed by neural parsers.

Debates

Constituency versus dependency representation: Whether syntax is best represented as nested phrases or as labeled head-dependent relations; both are widely used, with dependency favored for free-word-order languages and downstream tasks.

Key figures

Jay Earley
Joakim Nivre
Christopher Manning
Mitchell Marcus

Seminal works

manning1999
kubler2009
jurafsky2025

Frequently asked questions

Why is parsing hard if grammar rules are known?: Natural sentences are massively ambiguous: a single string can have many legal structures. Parsing must therefore not only find structures but rank them, which is why probabilistic and learned models are essential.