Computational Morphology

Modeling the internal structure of words by machine — analysis, generation, stemming, lemmatization, and subword segmentation — from finite-state morphology to the byte-pair encoding used by modern neural systems.

Definition

Computational morphology is the algorithmic analysis and generation of word forms in terms of their constituent morphemes and morphological features.

Scope

This topic covers the computational treatment of word structure: morphological analysis and generation with finite-state transducers, two-level morphology, stemming and lemmatization, and data-driven subword segmentation such as byte-pair encoding. It addresses inflection, derivation, and compounding across typologically diverse languages. The underlying finite-state machinery is detailed in the foundations area.
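
To make the pairing of analysis and generation concrete, here is a minimal Python sketch. It is not a real finite-state transducer: it hard-codes a four-word lexicon and one English spelling alternation (e-insertion before plural -s), and it obtains analysis by running generation over the lexicon and keeping the matches. The names STEMS, SIBILANT_ENDINGS, generate, and analyze, as well as the stem+N+Sg/Pl tag format, are assumptions made for illustration.

```python
# Toy lexicon and tag set: illustrative assumptions, not taken from any real analyzer.
STEMS = ("cat", "fox", "bus", "dog")
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def generate(lexical: str) -> str:
    """Map a lexical form such as 'fox+N+Pl' to a surface form such as 'foxes'."""
    stem, _pos, number = lexical.split("+")
    if number == "Sg":
        return stem
    # e-insertion: after a stem-final sibilant the plural suffix surfaces as 'es'.
    suffix = "es" if stem.endswith(SIBILANT_ENDINGS) else "s"
    return stem + suffix


def analyze(surface: str) -> list[str]:
    """Return every lexical form in the toy lexicon that generates the given surface form."""
    candidates = (f"{stem}+N+{num}" for stem in STEMS for num in ("Sg", "Pl"))
    return [lex for lex in candidates if generate(lex) == surface]


print(generate("fox+N+Pl"))
print(analyze("cats"))
```

A compiled transducer would answer both directions without enumerating the lexicon. The sketch only illustrates that a single description of the alternation can serve both analysis and generation.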

Core questions

  • How are morphological alternations modeled with finite-state transducers?
  • What is the difference between stemming and lemmatization?
  • How does subword segmentation handle rare and unseen words in neural models?
  • Why is morphology harder for agglutinative and templatic languages?

Key concepts

  • morpheme
  • inflection and derivation
  • two-level morphology
  • finite-state transducer
  • stemming
  • lemmatization
  • byte-pair encoding
  • agglutination

Key theories

Two-level morphology
Koskenniemi's model relating surface and lexical word forms through parallel finite-state rules, enabling a single grammar to both analyze and generate forms.
Data-driven subword segmentation
Learning a vocabulary of frequent character sequences, as in byte-pair encoding, so neural models can represent any word as a sequence of subword units.
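
The merge-learning idea behind byte-pair encoding can be sketched in a few lines of Python. This is a simplified illustration, assuming a word-frequency dictionary as input and an end-of-word marker `</w>`. The function names learn_bpe, merge_pair, and segment are hypothetical, and ties between equally frequent pairs are broken by first occurrence, which is an arbitrary choice.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word, seen or unseen, by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)
print(segment("lowest", merges))
```

Because segmentation starts from single characters, a word absent from the training data still receives a segmentation. In the worst case it falls back to individual characters, provided those characters were seen.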

History

Koskenniemi's 1983 two-level morphology established finite-state methods as the standard for morphological processing, an approach consolidated in Beesley and Karttunen's 2003 handbook. With the rise of neural models, hand-built morphological analyzers were complemented by learned subword segmentation such as byte-pair encoding, which sidesteps explicit morphological analysis while still handling rare words.

Debates

Explicit morphology versus subword units
Whether neural systems need linguistically informed morphological analysis or whether statistical subword segmentation suffices; the answer appears to depend on language type and data scale.

Key figures

  • Kimmo Koskenniemi
  • Lauri Karttunen
  • Kenneth Beesley
  • Rico Sennrich

Seminal works

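  • Koskenniemi (1983), Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production
  • Beesley and Karttunen (2003), Finite State Morphology
  • Sennrich, Haddow, and Birch (2016), Neural Machine Translation of Rare Words with Subword Units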

Frequently asked questions

What is the difference between stemming and lemmatization?
Stemming crudely chops affixes to a common stem (e.g., 'studies' to 'studi'), while lemmatization maps a word to its dictionary form using morphological knowledge (e.g., 'studies' to 'study').
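
The contrast can be illustrated with a small Python sketch. The stemmer below applies only four suffix rules, modeled on the first plural-stripping step of the Porter stemmer, and the lemmatizer is a hand-written lookup table. Both toy_stem and toy_lemmatize, and the contents of LEMMA_TABLE, are assumptions for illustration rather than any library's API.

```python
def toy_stem(word: str) -> str:
    """Strip plural-like suffixes by rule, without consulting any lexicon."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # studies -> studi
    if word.endswith("ss"):
        return word       # glass -> glass
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


# Hypothetical lemma table; a real lemmatizer would draw on a full lexicon
# and on morphological analysis rather than a few hand-picked entries.
LEMMA_TABLE = {"studies": "study", "mice": "mouse", "went": "go"}


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if the word is listed, else the word unchanged."""
    return LEMMA_TABLE.get(word, word)


for w in ("studies", "mice"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer needs no lexical knowledge but can return strings that are not words, while the lemmatizer returns real dictionary forms only for words it knows about.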
