Speech and Language Applications

The applied face of computational linguistics: converting between speech and text, extracting structured information from documents, and building systems that answer questions and hold conversations.

Definition

Speech and language applications are end-user systems that perceive, understand, or produce human language, built by composing the methods of computational linguistics.

Scope

Covers the major application areas of speech and language technology: automatic speech recognition, text-to-speech synthesis, information extraction, and question answering and dialogue systems. It treats these as integrative tasks that combine the field's foundations with parsing, semantics, and learning methods. Component techniques are covered in their respective areas.

Core questions

How is spoken language converted to and from text?
How is structured information extracted from unstructured documents?
How do systems answer natural-language questions and sustain dialogue?
How are application systems evaluated for real-world use?
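For speech recognition, the most widely used evaluation measure is word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of reference words. The following is a minimal sketch that computes it with a word-level edit-distance table. The function name and example sentences are illustrative assumptions, not a standard scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # 0.333
```

Because insertions are counted but the denominator is fixed by the reference, WER can exceed 1.0 when a system outputs many spurious words.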

Key concepts

automatic speech recognition
text-to-speech
information extraction
named-entity recognition
question answering
dialogue system
acoustic model
evaluation
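Named-entity recognition, a core information-extraction task in the list above, is commonly framed as sequence labeling: each token receives a tag such as B-TYPE (beginning of an entity), I-TYPE (inside an entity), or O (outside any entity), and the tags are then grouped into structured spans. The following is a minimal sketch of that grouping step only. The tags are supplied by hand rather than predicted by a model, and the entity types PER and ORG are assumed CoNLL-style labels.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity currently being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins; an I- tag that does not continue
            # the current type is treated like a B- tag.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the current entity
        else:  # "O": outside any entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans

tokens = ["Frederick", "Jelinek", "worked", "at", "IBM"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
print(bio_to_spans(tokens, tags))  # [('PER', 'Frederick Jelinek'), ('ORG', 'IBM')]
```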

Key theories

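Noisy-channel speech recognition: Framing recognition as recovering the most probable word sequence W given an acoustic observation sequence O. By Bayes' rule, W* = argmax_W P(O | W) P(W), which combines an acoustic model P(O | W) with a language model P(W); the denominator P(O) is the same for every candidate and can be dropped.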
Pipeline of language understanding: Applications compose tokenization, parsing, semantics, and retrieval into pipelines or end-to-end models that map user input to useful responses.
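The noisy-channel decision rule for speech recognition can be sketched in a few lines of Python. It picks the word sequence W that maximizes P(O | W) P(W), which is computed in log space as a sum. This is a minimal sketch under stated assumptions: the candidate list is given rather than produced by a real search, the scores are invented for illustration, and the `lm_weight` scaling factor is an assumed practical convention (with the default 1.0 it reduces to the plain rule).

```python
import math
from typing import Dict, Tuple

def noisy_channel_decode(
    hypotheses: Dict[str, Tuple[float, float]],
    lm_weight: float = 1.0,
) -> str:
    """Return the W maximizing log P(O|W) + lm_weight * log P(W).

    `hypotheses` maps each candidate word sequence to a pair
    (acoustic log-likelihood log P(O|W), language-model log-probability log P(W)).
    """
    def score(item: Tuple[str, Tuple[float, float]]) -> float:
        acoustic_logprob, lm_logprob = item[1]
        return acoustic_logprob + lm_weight * lm_logprob

    best_words, _ = max(hypotheses.items(), key=score)
    return best_words

# Hypothetical scores: acoustically the two candidates are nearly tied,
# but the language model strongly prefers the first.
candidates = {
    "recognize speech": (math.log(0.30), math.log(1e-4)),
    "wreck a nice beach": (math.log(0.32), math.log(1e-7)),
}
print(noisy_channel_decode(candidates))  # recognize speech
```

Setting `lm_weight=0.0` ignores the language model, and the decoder then prefers "wreck a nice beach" on acoustic evidence alone. This shows why the two models are combined.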

History

Speech recognition drove much of early statistical NLP, with shared corpora such as the Wall Street Journal collection enabling rigorous comparison. Information extraction and question answering grew through evaluation campaigns in the 1990s and 2000s, notably the Message Understanding Conferences (MUC) and the TREC question-answering track. Dialogue systems became consumer products as neural methods and large language models matured.

Debates

Pipelines versus end-to-end systems: Whether to build applications from modular linguistic components or to train end-to-end neural systems; end-to-end approaches dominate where data is plentiful but offer less interpretability.
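A toy pipeline can illustrate the modular side of this debate. Each stage is an ordinary function, and the stages are chained so that each one consumes the previous stage's output. Every component here is a hypothetical stand-in: a regular-expression tokenizer, a stopword filter in place of real parsing and semantic analysis, and word-overlap retrieval over a hypothetical three-document collection.

```python
import re
from functools import reduce
from typing import Callable, List, Set

def compose(*stages: Callable) -> Callable:
    """Chain stages left to right: each stage consumes the previous stage's output."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)

def tokenize(text: str) -> List[str]:
    # Lowercase, then keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical stopword list standing in for a real analysis stage.
STOPWORDS = {"what", "does", "is", "the", "a", "an", "of", "who"}

def content_words(tokens: List[str]) -> Set[str]:
    # Placeholder for parsing/semantics: keep only content-bearing tokens.
    return {t for t in tokens if t not in STOPWORDS}

# Hypothetical document collection.
DOCUMENTS = [
    "Text-to-speech converts written text into audible speech.",
    "Automatic speech recognition converts spoken audio into text.",
    "Named-entity recognition labels spans such as person and location names.",
]

def retrieve(query_terms: Set[str]) -> str:
    # Return the document that shares the most terms with the query.
    return max(DOCUMENTS, key=lambda doc: len(query_terms & set(tokenize(doc))))

answer_question = compose(tokenize, content_words, retrieve)
print(answer_question("What does automatic speech recognition convert?"))
# Automatic speech recognition converts spoken audio into text.
```

Because the stages share only simple interfaces, one component (for example, the stopword filter) could be replaced by a real parser without touching the others. An end-to-end system would instead learn the whole question-to-answer mapping in a single model.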

Key figures

Daniel Jurafsky
James H. Martin
Frederick Jelinek
Janet Baker

Seminal works

paul1992
manning1999
jurafsky2025

Frequently asked questions

Why group speech and text applications together?: They share the same probabilistic and neural foundations — language models, sequence modeling, and evaluation — so techniques developed for one, such as language modeling in speech recognition, transfer readily to the other.