ScholarGate
Assistant

Regular Expressions and Finite-State Methods

Practical techniques built on regular languages — pattern matching with regular expressions and string-to-string mapping with finite-state transducers — that handle tokenization, normalization, and morphological analysis efficiently.

Definition

Finite-state methods are language-processing techniques in which patterns and mappings are expressed as regular expressions or finite-state automata and transducers, guaranteeing efficient linear-time recognition.

Scope

Covers regular expressions as a pattern language over strings, finite-state automata and transducers as their computational realization, and their application to text normalization, tokenization, spelling, and computational morphology. It includes weighted finite-state methods used in speech and shallow processing. Full phonological theory and deep syntactic parsing are out of scope.

Core questions

  • How can regular expressions specify and extract textual patterns precisely?
  • How do finite-state transducers map surface forms to lexical analyses, as in morphology?
  • Why are finite-state methods preferred for tokenization and normalization?

Key concepts

  • regular expression
  • finite-state transducer
  • tokenization
  • text normalization
  • morphological analysis
  • two-level morphology
  • weighted automata
  • edit distance

Key theories

Regular models of morphology and phonology
The result that phonological rewrite rules and morphological alternations can be compiled into finite-state transducers, making analysis and generation a single efficient framework.
Equivalence of regular expressions and finite automata
Regular expressions, regular grammars, and finite-state automata all describe exactly the regular languages, so a declarative pattern can be compiled into an efficient recognizer.

History

Regular expressions entered computing from Kleene's work and became ubiquitous in text tools. In the 1980s Koskenniemi's two-level morphology and Kaplan and Kay's compilation of phonological rules into transducers established finite-state technology as the workhorse of morphological processing, an approach consolidated in Beesley and Karttunen's handbook.

Debates

How far can finite-state methods scale?
Finite-state techniques are extremely efficient but limited to regular phenomena; the debate concerns which language-processing tasks remain best served by them versus richer statistical or neural models.

Key figures

  • Martin Kay
  • Ronald Kaplan
  • Kimmo Koskenniemi
  • Lauri Karttunen

Related topics

Seminal works

  • kaplan1994
  • beesley2003

Frequently asked questions

Why use a finite-state transducer instead of just a lookup table for morphology?
A transducer compactly encodes systematic alternations and can analyze or generate word forms it has never seen, whereas a table only stores forms explicitly listed in it.

Methods for this concept

Related concepts