Speech Perception and Intelligibility

Speech perception is the process by which listeners recover linguistic units, words, and meaning from the rapidly varying acoustic speech signal. Intelligibility is the degree to which speech is correctly understood, and it depends on the speech material, the listener, and the listening conditions, especially background noise. This topic covers the acoustic cues that distinguish speech sounds, how listeners categorise them, and how intelligibility is measured and predicted.

Definition

Speech perception is the auditory and cognitive process of mapping the acoustic speech signal onto linguistic categories such as phonemes and words, and intelligibility is a measure of how accurately a listener recovers the intended speech.

Scope

The topic covers the acoustic cues to vowels and consonants, categorical perception of phonemes, the robustness of speech to degradation and noise, and the measurement and prediction of intelligibility. It is reference and educational material on auditory and speech perception, not clinical guidance.

Core questions

  • Which acoustic cues distinguish one speech sound from another?
  • How do listeners map a continuously varying signal onto discrete phonemes?
  • How much of the speech signal can be degraded before intelligibility fails?
  • How is speech intelligibility measured and predicted across listening conditions?

Key concepts

  • Formants and vowel identity
  • Voice onset time and consonant cues
  • Categorical perception
  • Speech reception threshold
  • Speech intelligibility index
  • Envelope versus fine-structure cues
  • Speech in noise and informational masking

Key theories

Categorical perception of speech
Listeners tend to partition continua of speech sounds, such as a series varying in voice onset time, into discrete phoneme categories, discriminating pairs that straddle a category boundary far better than equally spaced pairs within a category.
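The boundary effect can be illustrated with a minimal sketch. It assumes a logistic identification function along a voice onset time continuum and a labelling-based prediction in which discrimination accuracy is 0.5 plus half the squared difference in labelling probabilities. The boundary (25 ms), the slope, and the function names are illustrative assumptions, not values taken from the studies cited in this topic.

```python
import math

def p_voiceless(vot_ms, boundary_ms=25.0, slope=0.5):
    """Probability of labelling a token as voiceless (logistic in VOT).

    boundary_ms and slope are hypothetical values chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def predicted_discrimination(vot_a, vot_b):
    """Predicted proportion correct if listeners discriminate only by label."""
    diff = p_voiceless(vot_a) - p_voiceless(vot_b)
    return 0.5 + 0.5 * diff ** 2

# Equally spaced 10 ms steps along the continuum: 0, 10, ..., 60 ms.
continuum = list(range(0, 61, 10))
pairs = list(zip(continuum[:-1], continuum[1:]))
predictions = {pair: predicted_discrimination(*pair) for pair in pairs}
best_pair = max(predictions, key=predictions.get)

for pair, score in predictions.items():
    print(pair, round(score, 3))
```

Under these assumptions the pair that straddles the boundary (20 and 30 ms) is predicted to be discriminated well above chance, while pairs within a category stay close to 0.5 even though the physical step is the same.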
Distribution of speech information across frequency bands
Intelligibility can be predicted by weighting the audibility of speech across frequency bands, the basis of the articulation index and speech intelligibility index, which quantify how much usable speech information reaches the listener.
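The band-weighting idea can be sketched as a weighted sum of per-band audibility. This is a simplified illustration, not the full ANSI S3.5-1997 procedure: it assumes audibility rises linearly from 0 to 1 over a 30 dB range with speech peaks 15 dB above the long-term level, and the four bands, their importance weights, and the levels are hypothetical values.

```python
def band_audibility(speech_db, noise_db):
    """Map a band's signal-to-noise ratio onto 0..1.

    Assumes a 30 dB speech dynamic range with peaks 15 dB above the
    long-term band level (a simplification for illustration).
    """
    snr = speech_db - noise_db
    return min(1.0, max(0.0, (snr + 15.0) / 30.0))

def weighted_audibility_index(speech_levels, noise_levels, importance):
    """Sum of band audibilities weighted by normalised band importance."""
    total = sum(importance)
    return sum(
        (weight / total) * band_audibility(speech, noise)
        for speech, noise, weight in zip(speech_levels, noise_levels, importance)
    )

# Hypothetical four-band example; values are illustrative only.
importance = [0.15, 0.25, 0.35, 0.25]
speech = [60.0, 58.0, 52.0, 45.0]       # band speech levels in dB
noise_quiet = [20.0, 20.0, 20.0, 20.0]  # low noise in every band
noise_loud = [65.0, 60.0, 50.0, 50.0]   # noise near the speech level

index_quiet = weighted_audibility_index(speech, noise_quiet, importance)
index_noise = weighted_audibility_index(speech, noise_loud, importance)
print(round(index_quiet, 2), round(index_noise, 2))
```

With these made-up levels the index is 1.0 in quiet and falls to about 0.44 in the louder noise, showing how masking in heavily weighted bands reduces the usable speech information.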

Mechanisms

Vowels are largely identified by the frequencies of their formants, the resonances of the vocal tract, while consonants are signalled by rapid spectral transitions, bursts, and timing cues such as voice onset time. The auditory system extracts these spectral and temporal patterns, and higher levels of processing map them onto phoneme and word categories, drawing on context and linguistic knowledge. Speech is highly redundant, so it remains intelligible when substantially degraded. Experiments that replace fine spectral detail with a few bands of amplitude-modulated noise show that the slow temporal envelope alone can support good recognition in quiet, a principle relevant to cochlear implant coding.
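The envelope-on-noise idea can be sketched for a single band. This is a minimal illustration under stated assumptions, not the processing used in the cited experiments. A noise vocoder would first split speech into several frequency bands and band-limit each noise carrier, steps omitted here. The test signal is a synthetic amplitude-modulated tone standing in for one speech band, and the sample rate, the 50 Hz smoothing cutoff, and the function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz, illustrative

def extract_envelope(signal, cutoff_hz=50.0, sample_rate=SAMPLE_RATE):
    """Full-wave rectify the signal, then smooth it with a one-pole low-pass."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for sample in signal:
        state += alpha * (abs(sample) - state)
        envelope.append(state)
    return envelope

def modulate_noise(envelope, seed=0):
    """Impose the envelope on uniform white noise, discarding fine structure."""
    rng = random.Random(seed)
    return [level * rng.uniform(-1.0, 1.0) for level in envelope]

# Stand-in for one speech band: a 1 kHz tone with a slow 4 Hz modulation.
tone = []
for n in range(SAMPLE_RATE):  # one second of signal
    t = n / SAMPLE_RATE
    modulation = 0.5 * (1.0 - math.cos(2.0 * math.pi * 4.0 * t))
    tone.append(modulation * math.sin(2.0 * math.pi * 1000.0 * t))

envelope = extract_envelope(tone)
vocoded = modulate_noise(envelope)
```

The output keeps the slow 4 Hz rise and fall of the original but carries none of its 1 kHz fine structure, which is the sense in which envelope cues are separated from spectral detail.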

Clinical relevance

Difficulty understanding speech, particularly in noise, is among the most common and disabling consequences of hearing loss, and it can exceed what pure-tone thresholds predict because reduced frequency selectivity and temporal coding degrade the cues listeners rely on. Speech-perception measures therefore complement the audiogram in describing functional hearing. This material explains why speech understanding is tested and is not a basis for individual diagnosis or treatment.

Evidence & guidelines

Peterson and Barney (1952) mapped the formant frequencies of vowels, Miller and Nicely (1955) analysed perceptual confusions among consonants under noise and filtering, and Liberman and colleagues (1957) established categorical perception. The prediction of intelligibility from band audibility is standardised as the Speech Intelligibility Index in ANSI S3.5-1997, and Shannon and colleagues (1995) demonstrated that temporal-envelope cues can be sufficient for speech recognition in quiet.

History

Wartime and post-war work at Bell Laboratories on the articulation of telephone speech produced the articulation index and detailed studies of consonant and vowel acoustics. Liberman and colleagues at Haskins Laboratories established categorical perception in the 1950s and developed influential theories of speech perception, notably the motor theory. Later work, including band-vocoder studies by Shannon and colleagues, clarified the relative roles of spectral detail and temporal envelope and informed cochlear-implant signal processing.

Debates

Is speech perceived by specialised mechanisms or by general auditory processes?
Theories differ on whether speech recruits a dedicated perceptual mode tied to articulation or is handled by general-purpose auditory and learning processes; both views account for parts of the evidence and the question remains contested.

Key figures

  • George A. Miller
  • Gordon Peterson
  • Alvin Liberman
  • Robert Shannon
  • Harvey Fletcher

Seminal works

  • Peterson and Barney (1952)
  • Miller and Nicely (1955)
  • Liberman and colleagues (1957)
  • Shannon and colleagues (1995)

Frequently asked questions

Why can hearing loss make speech hard to understand even when sounds are audible?
Making sounds audible restores detection but not the fine frequency and timing resolution that speech relies on. Reduced cochlear frequency selectivity and impaired temporal coding blur the cues that distinguish speech sounds, so understanding, especially in noise, can remain poor even when sounds are loud enough to hear.
How is speech intelligibility measured?
It is commonly measured behaviourally as the percentage of words or sentences correctly identified at a given level or signal-to-noise ratio, sometimes summarised as a speech reception threshold. It can also be predicted from the audibility of speech across frequency bands using indices such as the Speech Intelligibility Index.
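The behavioural measure can be sketched as follows: score each condition as percent correct, then estimate the speech reception threshold as the signal-to-noise ratio at which the score crosses a criterion. The sketch assumes a 50% criterion and linear interpolation between tested conditions. The scores and function names are hypothetical, and real tests typically use adaptive procedures or fitted psychometric functions.

```python
def percent_correct(responses, targets):
    """Percentage of target words the listener repeated correctly."""
    hits = sum(
        1 for response, target in zip(responses, targets)
        if response.lower() == target.lower()
    )
    return 100.0 * hits / len(targets)

def estimate_srt(snrs_db, scores, criterion=50.0):
    """Interpolate the SNR (dB) at which the score crosses the criterion.

    Returns None if the criterion is not bracketed by the tested conditions.
    """
    points = sorted(zip(snrs_db, scores))
    for (snr_lo, score_lo), (snr_hi, score_hi) in zip(points[:-1], points[1:]):
        if score_lo <= criterion <= score_hi and score_hi > score_lo:
            fraction = (criterion - score_lo) / (score_hi - score_lo)
            return snr_lo + fraction * (snr_hi - snr_lo)
    return None

# Hypothetical scores from a sentence test at five signal-to-noise ratios.
snrs_db = [-12.0, -8.0, -4.0, 0.0, 4.0]
scores = [5.0, 20.0, 60.0, 90.0, 100.0]
srt = estimate_srt(snrs_db, scores)

# Scoring one trial: two of three words repeated correctly.
trial_score = percent_correct(["the", "cat", "sat"], ["the", "cat", "mat"])
print(srt, round(trial_score, 1))
```

For these made-up scores the 50% point falls between -8 and -4 dB, giving an estimated threshold of -5 dB. A lower threshold indicates that the listener tolerates more noise.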