ScholarGate
Assistent

Parsing and Grammar Formalisms

Recovering the grammatical structure of sentences by machine: the grammar formalisms that describe legal structures and the algorithms that compute them, from constituency trees to dependency graphs.

Find emne med PaperMindSnartFind papers & topics
Tools & resources
Hent slides
Learn & explore
VideoSnart

Definition

Parsing is the computational assignment of grammatical structure to an input string according to a grammar; grammar formalisms are the systems used to specify which structures are legal.

Scope

Covers syntactic analysis in computational linguistics — context-free constituency parsing and its probabilistic and chart-based algorithms, dependency parsing, the major grammar formalisms beyond plain context-free grammars, and the sequence-labeling tasks (such as part-of-speech tagging) that feed parsing. It excludes semantic interpretation, which is handled in computational semantics, and the underlying automata theory, covered in foundations.

Sub-topics

Core questions

  • How can a sentence be assigned a syntactic tree or dependency graph efficiently?
  • What grammar formalisms capture natural-language syntax adequately?
  • How do probabilities help disambiguate among many possible parses?
  • How do tagging and chunking support full parsing?

Key concepts

  • constituency parse
  • dependency parse
  • context-free grammar
  • chart parsing
  • probabilistic grammar
  • part-of-speech tagging
  • treebank
  • structural ambiguity

Key theories

Chart parsing
Dynamic-programming algorithms such as CKY and Earley that compute all possible analyses of a sentence in polynomial time by reusing shared subparses.
Probabilistic context-free grammars
Attaching probabilities to grammar rules so that the most likely parse can be selected, addressing the pervasive structural ambiguity of natural language.

History

Early parsing relied on hand-built grammars and exhaustive search; the CKY and Earley algorithms made context-free parsing efficient. The release of treebanks in the 1990s enabled data-driven probabilistic parsing, and the 2000s saw dependency parsing rise to prominence for its cross-linguistic robustness, later subsumed by neural parsers.

Debates

Constituency versus dependency representation
Whether syntax is best represented as nested phrases or as labeled head-dependent relations; both are widely used, with dependency favored for free-word-order languages and downstream tasks.

Key figures

  • Jay Earley
  • Joakim Nivre
  • Christopher Manning
  • Mitchell Marcus

Related topics

Seminal works

  • manning1999
  • kubler2009
  • jurafsky2025

Frequently asked questions

Why is parsing hard if grammar rules are known?
Natural sentences are massively ambiguous: a single string can have many legal structures. Parsing must therefore not only find structures but rank them, which is why probabilistic and learned models are essential.

Methods for this concept

Related concepts