ScholarGate

Computational Morphology

Modeling the internal structure of words by machine — analysis, generation, stemming, lemmatization, and subword segmentation — from finite-state morphology to the byte-pair encoding used by modern neural systems.

Definition

Computational morphology is the algorithmic analysis and generation of word forms in terms of their constituent morphemes and morphological features.
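Here is a minimal sketch of analysis and generation as inverse operations. It pairs a hypothetical toy lexicon of English stems with a hypothetical suffix table, and maps between surface forms and lemma+tag strings such as walk+V+3SG. The tag names and tables are illustrative assumptions. Real analyzers compile such lexicons into finite-state transducers and also handle spelling alternations, which this purely concatenative toy ignores.

```python
# Hypothetical toy lexicon: each lemma lists the parts of speech it can take.
STEMS = {"cat": {"N"}, "walk": {"N", "V"}, "jump": {"V"}}

# Hypothetical suffix table: (part of speech, surface suffix) -> feature tags.
SUFFIXES = {
    ("N", ""): "+N+SG",
    ("N", "s"): "+N+PL",
    ("V", ""): "+V+INF",
    ("V", "s"): "+V+3SG",
    ("V", "ed"): "+V+PAST",
    ("V", "ing"): "+V+PROG",
}


def analyze(word):
    """Return every lemma+tags reading whose stem and suffix concatenate to word."""
    analyses = []
    for (pos, suffix), tags in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)]  # also correct for the empty suffix
        if pos in STEMS.get(stem, set()):
            analyses.append(stem + tags)
    return sorted(analyses)


def generate(analysis):
    """Inverse direction: map 'lemma+TAG+TAG' back to its surface form, or None."""
    lemma, _, tags = analysis.partition("+")
    for (pos, suffix), suffix_tags in SUFFIXES.items():
        if suffix_tags == "+" + tags and pos in STEMS.get(lemma, set()):
            return lemma + suffix
    return None
```

The toy lexicon lists walk as both a noun and a verb, so analyze("walks") returns two readings, walk+N+PL and walk+V+3SG. This is the kind of ambiguity an analyzer must report. The generate function reads the same tables in the opposite direction.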

Scope

Covers the computational treatment of word structure: morphological analysis and generation with finite-state transducers, two-level morphology, stemming and lemmatization, and data-driven subword segmentation such as byte-pair encoding. It addresses inflection, derivation, and compounding across typologically diverse languages. The underlying finite-state machinery is detailed in the foundations area.

Core questions

  • How are morphological alternations modeled with finite-state transducers?
  • What is the difference between stemming and lemmatization?
  • How does subword segmentation handle rare and unseen words in neural models?
  • Why is morphology harder for agglutinative and templatic languages?

Key concepts

  • morpheme
  • inflection and derivation
  • two-level morphology
  • finite-state transducer
  • stemming
  • lemmatization
  • byte-pair encoding
  • agglutination

Key theories

Two-level morphology
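Koskenniemi's model relates lexical and surface word forms through finite-state rules that apply in parallel, each constraining how lexical symbols may be realized on the surface. The rules describe a relation rather than a one-way process, so a single grammar can both analyze and generate forms.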
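The following simplified sketch shows how one rule set can serve both directions. It encodes the English e-insertion rule (fox+s → foxes), written as +:e <=> [s|x|z] _ s. It rests on several assumptions. The rule is a Python predicate over an aligned sequence of lexical:surface pairs rather than a compiled transducer. The empty string stands for the two-level zero symbol. The lexicon, the function names, and the restriction of the left context to s, x, z (ignoring ch and sh) are hypothetical simplifications.

```python
from itertools import product

SIBILANTS = {"s", "x", "z"}  # simplified left context; ch and sh are ignored here


def realizations(symbol):
    """Surface options for one lexical symbol; '' plays the role of the zero symbol."""
    return ["", "e"] if symbol == "+" else [symbol]


def e_insertion(pairs):
    """Check '+:e <=> [s|x|z] _ s' over a list of (lexical, surface) pairs."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "+":
            continue
        in_context = (
            i > 0
            and pairs[i - 1][0] in SIBILANTS
            and i + 1 < len(pairs)
            and pairs[i + 1][0] == "s"
        )
        # '<=>': the context requires +:e, and +:e is allowed only in the context.
        if in_context != (surf == "e"):
            return False
    return True


RULES = [e_insertion]  # further rules would be appended and checked in parallel


def accepted(pairs):
    """A pair sequence is well formed only if every rule accepts it."""
    return all(rule(pairs) for rule in RULES)


def candidate_pairings(lexical):
    """Every alignment of a lexical string with one surface choice per position."""
    for surfaces in product(*(realizations(ch) for ch in lexical)):
        yield list(zip(lexical, surfaces))


# Hypothetical lexicon: noun stems, optionally followed by boundary + plural suffix.
LEXICON = [stem + suffix for stem in ("fox", "cat", "kiss", "buzz") for suffix in ("", "+s")]

# The lexical:surface relation licensed by the lexicon plus the rules, built once.
RELATION = [
    ("".join(lex for lex, _ in pairs), "".join(surf for _, surf in pairs))
    for lexical in LEXICON
    for pairs in candidate_pairings(lexical)
    if accepted(pairs)
]


def generate(lexical):
    """Filter the relation on its lexical side."""
    return sorted({surf for lex, surf in RELATION if lex == lexical})


def analyze(surface):
    """Filter the same relation on its surface side."""
    return sorted({lex for lex, surf in RELATION if surf == surface})
```

Every rule in RULES sees the same pair sequence, and all of them must accept it. This mirrors parallel rather than sequential rule application. Generation and analysis are two filters on one relation, so the rule is stated only once. A full two-level system would compile and intersect rule transducers and combine them with a lexicon transducer instead of enumerating a finite list.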
Data-driven subword segmentation
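Learning a vocabulary of frequent character sequences, as in byte-pair encoding, which repeatedly merges the most frequent adjacent pair of symbols in a training corpus. Neural models can then represent any word, including unseen ones, as a sequence of subword units.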
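The merge-learning loop behind byte-pair encoding can be sketched as follows, in the spirit of Sennrich et al. (2016). Words start as character sequences plus an end-of-word marker. The most frequent adjacent pair, weighted by word frequency, is then merged repeatedly. The toy word counts and the lexicographic tie-breaking are assumptions made for reproducibility, not details of any reference implementation.

```python
from collections import Counter

END = "</w>"  # end-of-word marker appended to every word


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_symbols(symbols, pair):
    """Replace each non-overlapping occurrence of pair with its concatenation."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from a word -> frequency table."""
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken lexicographically (an assumption).
        best = min(counts, key=lambda pair: (-counts[pair], pair))
        merges.append(best)
        vocab = {merge_symbols(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a possibly unseen word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5)
print(merges)
print(segment("lowest", merges))  # a word absent from the toy counts
```

With these counts, the first five merges are e+s, es+t, est+</w>, l+o and lo+w. The word lowest never occurs in the toy counts, yet segment splits it into the learned units low and est</w>. This is how a BPE vocabulary covers rare and unseen words. Symbols that no merge touches simply remain single characters.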

History

Koskenniemi's 1983 two-level morphology established finite-state methods as the standard for morphological processing, an approach later consolidated in Beesley and Karttunen's 2003 handbook. With the rise of neural models, hand-built morphological analyzers were complemented by learned subword segmentation such as byte-pair encoding. This segmentation sidesteps explicit morphological analysis while still handling rare words.

Debates

Explicit morphology versus subword units
Whether neural systems need linguistically informed morphological analysis or whether statistical subword segmentation suffices; the answer appears to depend on language type and data scale.

Key figures

  • Kimmo Koskenniemi
  • Lauri Karttunen
  • Kenneth Beesley
  • Rico Sennrich

Seminal works

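  • Koskenniemi, K. (1983). Two-level Morphology: A General Computational Model for Word-Form Recognition and Production. Publication No. 11, Department of General Linguistics, University of Helsinki.
  • Beesley, K. R., & Karttunen, L. (2003). Finite State Morphology. CSLI Publications.
  • Sennrich, R., Haddow, B., & Birch, A. (2016). Neural Machine Translation of Rare Words with Subword Units. Proceedings of ACL 2016.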

Frequently asked questions

What is the difference between stemming and lemmatization?
Stemming crudely chops affixes to a common stem (e.g., 'studies' to 'studi'), while lemmatization maps a word to its dictionary form using morphological knowledge (e.g., 'studies' to 'study').
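The contrast can be sketched in a few lines of Python. The stemmer implements only step 1a (plural suffixes) of the Porter stemmer, which is where studies → studi comes from. The lemmatizer is a hypothetical toy. It uses an exception table for irregular forms and accepts a suffix rewrite only when the result is in a small, assumed dictionary of lemmas.

```python
# Step 1a of the Porter stemmer: the first matching suffix rule wins.
PORTER_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def stem_step_1a(word):
    """Crude suffix stripping; the output need not be a real word."""
    for suffix, replacement in PORTER_1A:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


# Hypothetical mini-dictionary of lemmas and exception table for irregular forms.
LEMMAS = {"study", "cat", "dress", "go", "mouse"}
IRREGULAR = {"went": "go", "mice": "mouse"}
# Candidate rewrites from inflected ending to lemma ending, tried in order;
# ("", "") lets a word that is already a lemma map to itself.
LEMMA_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("", "")]


def lemmatize(word):
    """Map a word to its dictionary form, accepting only rewrites that yield a known lemma."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in LEMMA_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in LEMMAS:
                return candidate
    return word  # unknown word: leave it unchanged
```

The stemmer never consults a lexicon, so it cannot map mice to mouse and may return non-words such as studi. The lemmatizer handles both cases, but only for words that its dictionary and exception table cover.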
