What is a chart in parsing?

A chart is a table that stores every partial constituent found over each span of the sentence, so that shared substructures are computed once and reused, giving polynomial-time parsing.

Constituency and Context-Free Parsing

Computing the phrase-structure tree of a sentence using context-free grammars, dynamic-programming algorithms like CKY and Earley, and probabilistic grammars that resolve ambiguity.

Definition

Constituency parsing assigns a nested phrase-structure tree to a sentence according to a context-free grammar, typically selecting the most probable tree under a probabilistic grammar.

Scope

Covers parsing with context-free grammars: the CKY and Earley algorithms, Chomsky normal form, probabilistic context-free grammars and their lexicalized refinements, and treebank-trained statistical parsers. It addresses ambiguity resolution and parser evaluation. Dependency representations and non-context-free formalisms are treated in sibling topics.

Core questions

How does the CKY algorithm parse a sentence in cubic time?
Why must grammars often be converted to Chomsky normal form first?
How do probabilistic and lexicalized grammars improve disambiguation?
How is parser accuracy measured against a treebank?

Key concepts

context-free grammar
CKY algorithm
Earley algorithm
Chomsky normal form
probabilistic context-free grammar
lexicalization
parse tree
treebank

Key theories

Dynamic-programming parsing: The CKY and Earley algorithms compute all parses in polynomial time by filling a chart of subconstituents, avoiding the exponential blowup of naive search.
Lexicalized probabilistic parsing: Conditioning rule probabilities on head words substantially improves parsing accuracy by capturing lexical preferences absent from plain PCFGs.

History

The CKY algorithm (1960s) and Earley's 1970 algorithm gave efficient context-free recognition. With the Penn Treebank, probabilistic and then lexicalized parsers from Collins and Charniak achieved high accuracy in the late 1990s, defining the statistical parsing era before neural models.

Debates

How much lexicalization is needed?: Lexicalized parsers are accurate but sparse; debate concerned whether unlexicalized PCFGs with careful state-splitting could match them, which later work showed was partly possible.

Key figures

Jay Earley
Michael Collins
Eugene Charniak

Seminal works

earley1970
collins2003

Frequently asked questions

What is a chart in parsing?: A chart is a table that stores every partial constituent found over each span of the sentence, so that shared substructures are computed once and reused, giving polynomial-time parsing.