Syntactic Parsing

Syntactic parsing is the task of recovering the grammatical structure of a sentence, assigning it a constituency tree or a dependency structure that shows how words combine and relate.

Definition

Syntactic parsing maps a sentence to a representation of its grammatical structure—typically a constituency (phrase-structure) tree or a dependency graph—according to a grammar or a model learned from annotated data.
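To make the two representations concrete, here is a minimal Python sketch for the sentence "She saw the man". The constituency tree is stored as nested tuples (label first, then children), and the dependency analysis is stored as a list of head indices plus relation labels. The tuple encoding, the 1-based head convention with 0 marking the root, and the relation names (borrowed from Universal Dependencies) are illustrative assumptions, not a standard interchange format.

```python
# Constituency (phrase-structure) tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. This encoding is an illustrative assumption.
CONSTITUENCY = (
    "S",
    ("NP", ("PRP", "She")),
    ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "man"))),
)

# Dependency structure: HEADS[i] is the 1-based index of the head of word i + 1;
# 0 marks the root. Relation names follow Universal Dependencies (an assumption).
WORDS = ["She", "saw", "the", "man"]
HEADS = [2, 0, 4, 2]
LABELS = ["nsubj", "root", "det", "obj"]


def leaves(tree) -> list[str]:
    """Return the words at the frontier of a constituency tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def is_dependency_tree(heads: list[int]) -> bool:
    """Check for exactly one root and that every word reaches it without a cycle."""
    n = len(heads)
    if heads.count(0) != 1 or any(h < 0 or h > n for h in heads):
        return False
    for i in range(1, n + 1):
        visited = set()
        node = i
        while node != 0:  # follow head links upward until the root
            if node in visited:
                return False  # a cycle means this is not a tree
            visited.add(node)
            node = heads[node - 1]
    return True


print(leaves(CONSTITUENCY))       # ['She', 'saw', 'the', 'man']
print(is_dependency_tree(HEADS))  # True
```

Both structures cover the same four words; the tree groups them into phrases, while the head list records which word each one depends on.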

Scope

This topic covers the analysis of sentence structure: context-free and richer grammars, constituency parsing (phrase-structure trees) and dependency parsing (head-dependent relations), classic chart-parsing algorithms such as CKY and Earley, and probabilistic and data-driven parsing trained on treebanks. It addresses how syntactic ambiguity is represented and resolved. The downstream use of syntactic structure to compute meaning is covered under computational semantics.
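Earley's algorithm is a compact example of a chart parser that accepts any context-free grammar, including left-recursive rules such as NP -> NP PP. The sketch below is a recognizer only (it reports whether the sentence is grammatical without building trees), and it assumes a grammar with no empty productions. The toy grammar, the `Item` type, and the function names are hypothetical illustrations. Each chart position holds dotted rules, and the predict, scan, and complete steps fill the chart from left to right.

```python
from typing import NamedTuple, Optional

# Hypothetical toy grammar: keys are nonterminals; any symbol that is not a key is a word.
# It contains no empty (epsilon) productions, which this simple recognizer assumes.
TOY_GRAMMAR: dict[str, list[tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("she",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",)],
    "N": [("man",), ("telescope",)],
    "V": [("saw",)],
    "P": [("with",)],
}


class Item(NamedTuple):
    """Dotted rule lhs -> rhs[:dot] . rhs[dot:] that began at input position `start`."""
    lhs: str
    rhs: tuple[str, ...]
    dot: int
    start: int

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def earley_recognize(words: list[str], grammar, start: str = "S") -> bool:
    n = len(words)
    chart: list[list[Item]] = [[] for _ in range(n + 1)]  # items at each position, in order
    seen: list[set[Item]] = [set() for _ in range(n + 1)]  # fast duplicate check

    def add(i: int, item: Item) -> None:
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    goal = Item("<start>", (start,), 0, 0)  # dummy rule <start> -> . S
    add(0, goal)
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] can grow while it is processed
            item = chart[i][j]
            j += 1
            symbol = item.next_symbol()
            if symbol is None:  # COMPLETE: advance items that were waiting for item.lhs
                for waiting in list(chart[item.start]):
                    if waiting.next_symbol() == item.lhs:
                        add(i, waiting._replace(dot=waiting.dot + 1))
            elif symbol in grammar:  # PREDICT: expand the expected nonterminal
                for rhs in grammar[symbol]:
                    add(i, Item(symbol, rhs, 0, i))
            elif i < n and words[i] == symbol:  # SCAN: match the next input word
                add(i + 1, item._replace(dot=item.dot + 1))
    return goal._replace(dot=1) in seen[n]


print(earley_recognize("she saw the man with the telescope".split(), TOY_GRAMMAR))  # True
print(earley_recognize("saw she the man".split(), TOY_GRAMMAR))                    # False
```

The `seen` sets are what keep left-recursive rules from looping: predicting NP -> . NP PP a second time at the same position adds nothing new.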

Core questions

How is the grammatical structure of a sentence represented, as constituents or as dependencies?
How do chart-parsing algorithms efficiently explore the many possible analyses of a sentence?
How is syntactic ambiguity handled, and how do probabilistic models choose among parses?
How are parsers trained and evaluated using annotated corpora (treebanks)?
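Evaluation against a treebank compares predicted structures with the gold annotation. The following is a minimal sketch of two common scores under simplified inputs: unlabeled and labeled attachment score (UAS/LAS) for dependency parses, and labeled-bracket F1 in the style of PARSEVAL for constituency parses. Standard scoring tools add conventions (for example, how punctuation is handled) that this sketch omits, and it treats brackets as a set rather than a multiset. The example data are invented for illustration.

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head, label) pair per word. Returns (UAS, LAS)."""
    assert len(gold) == len(pred) and gold, "parses must cover the same words"
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)        # head and label
    return uas, las


def bracket_f1(gold_spans, pred_spans):
    """gold_spans, pred_spans: sets of (label, start, end) constituents. Returns F1."""
    matched = len(gold_spans & pred_spans)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_spans)
    recall = matched / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# "She saw the man": the prediction gets every head right but one label wrong.
GOLD_DEPS = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
PRED_DEPS = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "iobj")]

# "I saw the man with the telescope" (word positions 0..7): the gold tree attaches the PP
# to the VP; the prediction attaches it to the NP, adding the extra bracket NP(2, 7).
GOLD_SPANS = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)}
PRED_SPANS = GOLD_SPANS | {("NP", 2, 7)}

print(attachment_scores(GOLD_DEPS, PRED_DEPS))  # (1.0, 0.75)
print(bracket_f1(GOLD_SPANS, PRED_SPANS))
```

In the constituency example precision is 6/7 and recall is 1, so F1 is 12/13 (about 0.92): a single attachment error costs only one bracket even though it changes the meaning.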

Key concepts

constituency (phrase-structure) trees
dependency structures
context-free grammar
CKY and Earley parsing
probabilistic context-free grammar
syntactic ambiguity
treebanks
part-of-speech tags
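Treebanks such as the Penn Treebank store each sentence as a bracketed tree whose preterminal labels are part-of-speech tags. The sketch below reads a simplified form of that bracketed notation into nested tuples and lists the (word, tag) pairs. It is an illustrative reader, not the official format specification: it accepts the unlabeled outer brackets that often wrap each tree, but it does not treat function tags (such as NP-SBJ) or empty elements (-NONE-) specially. The function names are hypothetical.

```python
import re


def parse_bracketed(text: str):
    """Parse a Penn Treebank-style bracketed string into nested (label, *children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)  # "(", ")", or a label/word
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":  # a bare token is a word (leaf)
            word = tokens[pos]
            pos += 1
            return word
        pos += 1  # consume "("
        label = ""  # unlabeled wrapper brackets such as "( (S ...) )" get an empty label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume ")"
        return (label, *children)

    return parse_node()


def tagged_words(tree) -> list[tuple[str, str]]:
    """Collect (word, POS tag) pairs from preterminal nodes, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]  # preterminal: the label is the POS tag
    return [pair for child in children for pair in tagged_words(child)]


tree = parse_bracketed("(S (NP (PRP She)) (VP (VBD saw) (NP (DT the) (NN man))))")
print(tagged_words(tree))  # [('She', 'PRP'), ('saw', 'VBD'), ('the', 'DT'), ('man', 'NN')]
```

Reading trees this way yields both training material for a parser (the phrase structure) and for a part-of-speech tagger (the preterminal tags) from the same annotation.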

Key theories

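Context-free grammars and chart parsing: Context-free grammars model phrase structure, and dynamic-programming chart parsers such as the CKY and Earley algorithms find all valid parses in polynomial (cubic) time by storing and reusing analyses of subspans. The chart represents the possibly exponential set of parses compactly rather than listing them one by one. CKY requires the grammar to be in Chomsky normal form, while Earley's algorithm works with arbitrary context-free grammars.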
Probabilistic parsing: Assigning probabilities to grammar rules (as in probabilistic context-free grammars) lets a parser rank competing analyses and select the most likely structure, addressing the pervasive ambiguity of natural-language syntax.
Treebanks and data-driven parsing: Large annotated corpora such as the Penn Treebank provided the training and evaluation data that turned parsing into a data-driven task, enabling statistical and later neural parsers learned from human-annotated structures.
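Chart parsing and rule probabilities combine in probabilistic CKY. For each span and nonterminal the chart keeps only the probability of the best subtree, and backpointers recover the single most likely parse (a Viterbi search). The sketch assumes a grammar already in Chomsky normal form (binary rules A -> B C and lexical rules A -> word). The toy grammar, its probabilities, and the function names are invented for illustration, chosen so that "I saw the man with the telescope" has two analyses: one attaching the prepositional phrase to the verb phrase and one attaching it to the noun phrase.

```python
# Hypothetical toy PCFG in Chomsky normal form; the probabilities are invented.
# Binary rules A -> B C, indexed by (B, C): list of (A, P(A -> B C)).
BINARY_RULES = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.6)],
    ("VP", "PP"): [("VP", 0.4)],
    ("NP", "PP"): [("NP", 0.2)],
    ("Det", "N"): [("NP", 0.5)],
    ("P", "NP"): [("PP", 1.0)],
}
# Lexical rules A -> word, indexed by the word: list of (A, P(A -> word)).
LEXICAL_RULES = {
    "I": [("NP", 0.3)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "man": [("N", 0.5)],
    "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}


def viterbi_cky(words, start="S"):
    """Return (most probable parse tree, its probability), or (None, 0.0) if none exists."""
    n = len(words)
    # best[i][j][A]: probability of the best A subtree covering words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # back[i][j][A]: the word (span length 1) or the (split, B, C) that achieved it
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    for i, word in enumerate(words):
        for lhs, prob in LEXICAL_RULES.get(word, []):
            if prob > best[i][i + 1].get(lhs, 0.0):
                best[i][i + 1][lhs] = prob
                back[i][i + 1][lhs] = word

    for length in range(2, n + 1):  # fill shorter spans before longer ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # each split point reuses two smaller spans
                for left, p_left in best[i][k].items():
                    for right, p_right in best[k][j].items():
                        for lhs, prob in BINARY_RULES.get((left, right), []):
                            p = prob * p_left * p_right
                            if p > best[i][j].get(lhs, 0.0):
                                best[i][j][lhs] = p
                                back[i][j][lhs] = (k, left, right)

    if start not in best[0][n]:
        return None, 0.0
    return build_tree(back, 0, n, start), best[0][n][start]


def build_tree(back, i, j, label):
    """Follow backpointers to rebuild the best subtree for label over words[i:j]."""
    entry = back[i][j][label]
    if isinstance(entry, str):
        return (label, entry)
    k, left, right = entry
    return (label, build_tree(back, i, k, left), build_tree(back, k, j, right))


tree, prob = viterbi_cky("I saw the man with the telescope".split())
print(prob)
print(tree)
```

With these made-up numbers the verb-phrase attachment wins (probability 0.0045 against 0.00225 for the noun-phrase attachment); changing the rule probabilities can flip the decision, which is exactly the ranking role that probabilities play. The three nested loops over span length, start position, and split point give the cubic dependence on sentence length.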

Applications

Syntactic parsing supports grammar checking, information extraction, question answering, and machine translation by exposing how words group and relate; dependency structures in particular are widely used as input to downstream semantic and extraction systems.
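As an illustration of dependency structure feeding an extraction system, the sketch below pulls subject-verb-object triples out of a parsed sentence and expands each argument to its full subtree. It assumes parser output with 1-based head indices (0 for the root) and Universal Dependencies relation names (nsubj, obj); a label set that uses different names, such as dobj for direct objects, would need the label checks adjusted. The function names and the example sentence are hypothetical.

```python
def phrase(idx: int, words: list[str], heads: list[int]) -> str:
    """Return word idx (1-based) together with all of its descendants, in sentence order."""
    keep = {idx}
    changed = True
    while changed:  # repeatedly add words whose head is already in the subtree
        changed = False
        for i, h in enumerate(heads, start=1):
            if h in keep and i not in keep:
                keep.add(i)
                changed = True
    return " ".join(words[i - 1] for i in sorted(keep))


def extract_svo(words: list[str], heads: list[int], labels: list[str]):
    """Return (subject phrase, verb, object phrase) triples found in one parsed sentence."""
    triples = []
    for verb in range(1, len(words) + 1):
        dependents = [i for i, h in enumerate(heads, start=1) if h == verb]
        subjects = [i for i in dependents if labels[i - 1] == "nsubj"]
        objects = [i for i in dependents if labels[i - 1] == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(s, words, heads), words[verb - 1], phrase(o, words, heads)))
    return triples


words = ["Annotators", "built", "the", "treebank"]
heads = [2, 0, 4, 2]
labels = ["nsubj", "root", "det", "obj"]
print(extract_svo(words, heads, labels))  # [('Annotators', 'built', 'the treebank')]
```

Because each relation points directly from a head to its dependent, the extractor only needs to look one link away from the verb; the constituency equivalent would have to search the tree for the right NP positions.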

History

Computational parsing built on Chomsky's formal grammars of the 1950s. The CKY algorithm (developed independently by Cocke, Kasami, and Younger in the 1960s) and Earley's algorithm (1970) gave efficient context-free parsing. The Penn Treebank (1993) catalyzed statistical parsing, and probabilistic and later neural parsers progressively improved accuracy and robustness on real text.

Key figures

Noam Chomsky
Tadao Kasami
Jay Earley
Mitchell P. Marcus
Christopher D. Manning

Seminal works

marcus1993
jurafsky2023

Frequently asked questions

What is the difference between constituency and dependency parsing?: Constituency parsing groups words into nested phrases (such as noun phrases and verb phrases), producing a tree of constituents. Dependency parsing instead links each word to the word it depends on (its head), producing a graph of grammatical relations. Both capture syntactic structure but emphasize different aspects.
Why is parsing hard despite grammars being well defined?: Natural-language sentences are highly ambiguous: a single sentence can have many grammatically valid structures, and the number of candidate structures can grow exponentially with sentence length. Choosing the intended analysis requires statistical or learned preferences, not just a grammar, which is what makes parsing challenging.
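The growth in candidate structures can be made concrete by counting the distinct unlabeled binary bracketings of an n-word sentence, which bounds the tree shapes a binary-branching grammar could assign. The count follows the Catalan numbers; the sketch computes it with the split-point recursion and checks it against the closed form C(m) = (2m choose m) / (m + 1) with m = n - 1. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def binary_bracketings(n: int) -> int:
    """Number of distinct binary trees whose leaves are n words in a fixed order."""
    if n == 1:
        return 1
    # The root splits the words into a left part of k words and a right part of n - k.
    return sum(binary_bracketings(k) * binary_bracketings(n - k) for k in range(1, n))


def catalan(m: int) -> int:
    """Closed-form Catalan number C(m) = (2m choose m) / (m + 1)."""
    return comb(2 * m, m) // (m + 1)


for n in range(1, 11):
    assert binary_bracketings(n) == catalan(n - 1)
    print(n, binary_bracketings(n))
```

A 10-word sentence already admits 4,862 bracketings and a 20-word sentence 1,767,263,190, which is why parsers share subanalyses in a chart and rank candidates probabilistically instead of enumerating trees.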