Syntactic Parsing
Syntactic parsing is the task of recovering the grammatical structure of a sentence, assigning it a constituency tree or a dependency structure that shows how words combine and relate.
Definition
Syntactic parsing maps a sentence to a representation of its grammatical structure—typically a constituency (phrase-structure) tree or a dependency graph—according to a grammar or a model learned from annotated data.
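As a minimal sketch of the two output representations named above, the snippet below encodes one short sentence both ways. The nested-tuple encoding, the 1-based head indices with 0 as an artificial ROOT, the tag and relation labels, and the helper names `leaves` and `is_tree` are all illustrative assumptions, not a standard format.

```python
# Constituency tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. (Illustrative encoding, not a standard format.)
constituency = (
    "S",
    ("NP", ("DT", "the"), ("NN", "dog")),
    ("VP", ("VBZ", "barks")),
)

# Dependency structure: one (head_index, relation) pair per word.
# Word indices are 1-based; head 0 is an artificial ROOT node.
words = ["the", "dog", "barks"]
dependencies = [(2, "det"), (3, "nsubj"), (0, "root")]


def leaves(tree):
    """Return the words of a constituency tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def is_tree(deps):
    """Check for exactly one root and no cycles among the head links."""
    heads = [head for head, _relation in deps]
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:  # follow head links until ROOT is reached
            if node in seen:
                return False  # revisiting a word means a cycle
            seen.add(node)
            node = heads[node - 1]
    return True
```

Both structures describe the same three words: `leaves(constituency)` recovers the sentence, and `is_tree(dependencies)` confirms that every word reaches ROOT through its chain of heads.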
Scope
This topic covers the analysis of sentence structure: context-free and richer grammars, constituency parsing (phrase-structure trees) and dependency parsing (head-dependent relations), classic chart-parsing algorithms such as CKY and Earley, and probabilistic and data-driven parsing trained on treebanks. It addresses how syntactic ambiguity is represented and resolved. The downstream use of syntactic structure to compute meaning is covered under computational semantics.
Core questions
- How is the grammatical structure of a sentence represented, as constituents or as dependencies?
- How do chart-parsing algorithms efficiently explore the many possible analyses of a sentence?
- How is syntactic ambiguity handled, and how do probabilistic models choose among parses?
- How are parsers trained and evaluated using annotated corpora (treebanks)?
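The evaluation question can be illustrated with attachment scores, a common way to compare a dependency parser's output against treebank annotations. The text above does not name a metric, so the choice of unlabeled and labeled attachment score (UAS and LAS) is an assumption here, as are the function name `attachment_scores` and the (head, relation) encoding.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence.

    Each argument is a list of (head_index, relation) pairs, one per word.
    UAS counts words whose predicted head matches the gold head;
    LAS additionally requires the relation label to match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same words")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    total = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / total, correct_both / total


# Hypothetical gold annotation and parser output for "the dog barks".
gold = [(2, "det"), (3, "nsubj"), (0, "root")]
predicted = [(2, "det"), (3, "obj"), (0, "root")]
uas, las = attachment_scores(gold, predicted)
```

In this made-up example every head is correct but one label is wrong, so the unlabeled score is higher than the labeled one. Real evaluations aggregate over a whole test set rather than a single sentence.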
Key concepts
- constituency (phrase-structure) trees
- dependency structures
- context-free grammar
- CKY and Earley parsing
- probabilistic context-free grammar
- syntactic ambiguity
- treebanks
- part-of-speech tags
Key theories
- Context-free grammars and chart parsing
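- Context-free grammars model phrase structure. Dynamic-programming chart parsers such as the CKY and Earley algorithms reuse analyses of subspans to build, in time at most cubic in sentence length, a chart that compactly represents every valid parse, even when the number of individual parses is exponential.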
- Probabilistic parsing
- Assigning probabilities to grammar rules (as in probabilistic context-free grammars) lets a parser rank competing analyses and select the most likely structure, addressing the pervasive ambiguity of natural-language syntax.
- Treebanks and data-driven parsing
- Large annotated corpora such as the Penn Treebank provided the training and evaluation data that turned parsing into a data-driven task, enabling statistical and later neural parsers learned from human-annotated structures.
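The chart-parsing idea described above can be sketched with a small CKY routine that counts parses instead of enumerating them. The toy grammar (in Chomsky normal form, with "I" treated directly as an NP for brevity), the lexicon, and the function name `cky_count` are hypothetical and chosen only to show a prepositional-phrase attachment ambiguity.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (B, C) -> parents A with A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
# word -> labels that can directly produce it
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def cky_count(words, start="S"):
    """Count derivations of `words` from `start` with CKY dynamic programming."""
    n = len(words)
    # chart[(i, j)][A] = number of ways A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):            # span length
        for i in range(n - width + 1):       # span start
            j = i + width
            for k in range(i + 1, j):        # split point
                for b, count_b in list(chart[(i, k)].items()):
                    for c, count_c in list(chart[(k, j)].items()):
                        for a in BINARY.get((b, c), []):
                            # reuse the counts already stored for both subspans
                            chart[(i, j)][a] += count_b * count_c
    return chart[(0, n)][start]


ambiguous = cky_count("I saw the man with the telescope".split())
unambiguous = cky_count("I saw the man".split())
```

Under this grammar the longer sentence receives two analyses (the telescope modifies either the seeing or the man), while the shorter one receives a single analysis. The three nested loops over span width, start, and split point are what give the cubic dependence on sentence length.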
Practical relevance
By exposing how words group and relate, syntactic parsing supports grammar checking, information extraction, question answering, and machine translation. Dependency structure in particular is widely used as input to downstream semantic and extraction systems.
History
Computational parsing builds on Chomsky's formal grammars of the 1950s. The CKY algorithm (1960s) and Earley's algorithm (1970) made general context-free parsing efficient. The release of the Penn Treebank (1993) catalyzed statistical parsing, and probabilistic and later neural parsers progressively improved accuracy and robustness on real text.
Key figures
- Noam Chomsky
- Tadao Kasami
- Jay Earley
- Mitchell P. Marcus
- Christopher D. Manning
Related topics
Seminal works
- marcus1993
- jurafsky2023
Frequently asked questions
- What is the difference between constituency and dependency parsing?
- Constituency parsing groups words into nested phrases (such as noun phrases and verb phrases), producing a tree of constituents. Dependency parsing instead links each word to the word it depends on (its head), producing a graph of grammatical relations. Both capture syntactic structure but emphasize different aspects.
- Why is parsing hard despite grammars being well defined?
- Natural-language sentences are highly ambiguous: a single sentence can have many grammatically valid structures, and the number of analyses can grow exponentially with sentence length (for example, as prepositional phrases accumulate). A grammar alone only enumerates the candidates; choosing the intended analysis requires statistical or learned preferences.
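To illustrate how learned preferences select one analysis, here is a minimal sketch of Viterbi (most-probable-parse) CKY over a toy probabilistic context-free grammar. The rule probabilities are invented for illustration rather than estimated from any treebank, and the function name `viterbi_cky` is hypothetical.

```python
# Toy PCFG in Chomsky normal form: (B, C) -> [(A, P(A -> B C)), ...].
# Probabilities are made up; for each left-hand side they sum to 1
# together with that symbol's lexical rules below.
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],   # verb attachment
    ("NP", "PP"): [("NP", 0.2)],   # noun attachment
    ("Det", "N"): [("NP", 0.5)],
    ("P", "NP"): [("PP", 1.0)],
}
# word -> [(A, P(A -> word)), ...]
LEXICON = {
    "I": [("NP", 0.3)], "saw": [("V", 1.0)], "the": [("Det", 1.0)],
    "man": [("N", 0.5)], "telescope": [("N", 0.5)], "with": [("P", 1.0)],
}


def viterbi_cky(words, start="S"):
    """Return (probability, tree) of the best parse, or None if there is none."""
    n = len(words)
    best = {}  # best[(i, j, A)] = (probability, tree) for words[i:j]
    for i, word in enumerate(words):
        for label, prob in LEXICON.get(word, []):
            best[(i, i + 1, label)] = (prob, (label, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (b, c), parents in BINARY.items():
                    if (i, k, b) not in best or (k, j, c) not in best:
                        continue
                    prob_b, tree_b = best[(i, k, b)]
                    prob_c, tree_c = best[(k, j, c)]
                    for a, rule_prob in parents:
                        candidate = rule_prob * prob_b * prob_c
                        # keep only the highest-probability analysis per label and span
                        if candidate > best.get((i, j, a), (0.0, None))[0]:
                            best[(i, j, a)] = (candidate, (a, tree_b, tree_c))
    return best.get((0, n, start))


probability, tree = viterbi_cky("I saw the man with the telescope".split())
```

With these invented numbers the parser prefers the verb-attachment reading, in which the top VP expands to VP PP. Different rule probabilities, such as ones estimated from a treebank, could reverse that choice.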