ScholarGate
Assistant

Constituency and Context-Free Parsing

Computing the phrase-structure tree of a sentence using context-free grammars, dynamic-programming algorithms like CKY and Earley, and probabilistic grammars that resolve ambiguity.

Definition

Constituency parsing assigns a nested phrase-structure tree to a sentence according to a context-free grammar, typically selecting the most probable tree under a probabilistic grammar.

Scope

Covers parsing with context-free grammars: the CKY and Earley algorithms, Chomsky normal form, probabilistic context-free grammars and their lexicalized refinements, and treebank-trained statistical parsers. It addresses ambiguity resolution and parser evaluation. Dependency representations and non-context-free formalisms are treated in sibling topics.

Core questions

  • How does the CKY algorithm parse a sentence in cubic time?
  • Why must grammars often be converted to Chomsky normal form first?
  • How do probabilistic and lexicalized grammars improve disambiguation?
  • How is parser accuracy measured against a treebank?

Key concepts

  • context-free grammar
  • CKY algorithm
  • Earley algorithm
  • Chomsky normal form
  • probabilistic context-free grammar
  • lexicalization
  • parse tree
  • treebank

Key theories

Dynamic-programming parsing
The CKY and Earley algorithms compute all parses in polynomial time by filling a chart of subconstituents, avoiding the exponential blowup of naive search.
Lexicalized probabilistic parsing
Conditioning rule probabilities on head words substantially improves parsing accuracy by capturing lexical preferences absent from plain PCFGs.

History

The CKY algorithm (1960s) and Earley's 1970 algorithm gave efficient context-free recognition. With the Penn Treebank, probabilistic and then lexicalized parsers from Collins and Charniak achieved high accuracy in the late 1990s, defining the statistical parsing era before neural models.

Debates

How much lexicalization is needed?
Lexicalized parsers are accurate but sparse; debate concerned whether unlexicalized PCFGs with careful state-splitting could match them, which later work showed was partly possible.

Key figures

  • Jay Earley
  • Michael Collins
  • Eugene Charniak

Related topics

Seminal works

  • earley1970
  • collins2003

Frequently asked questions

What is a chart in parsing?
A chart is a table that stores every partial constituent found over each span of the sentence, so that shared substructures are computed once and reused, giving polynomial-time parsing.

Methods for this concept

Related concepts