ScholarGate

Language Modeling

Assigning probabilities to sequences of words: the foundational task that lets systems predict, score, and generate text, from classical n-gram counters to neural language models.

Definition

A language model is a probability distribution over sequences of words or tokens, typically defined by predicting each token from its preceding context.
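
As a sketch of what "predicting each token from its preceding context" means, the chain rule of probability factors the joint probability of a sequence into per-token conditionals. The symbols w_t (the t-th token) and T (the sequence length) are notation chosen here for illustration, not taken from the source.

```latex
P(w_1, \dots, w_T) \;=\; \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
```

Each factor is a next-token prediction, so any model that supplies these conditionals defines a distribution over whole sequences.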

Scope

Covers the language-modeling task itself: estimating the probability of a word given its context, n-gram models and their smoothing techniques, evaluation by perplexity, and the transition to neural and distributed representations. It situates large language models as the modern incarnation of the same task. Detailed neural architectures are treated in the statistical-and-neural NLP area.

Core questions

  • How can the probability of a sentence be decomposed into conditional word probabilities?
  • How does smoothing handle word sequences never seen in training?
  • How is perplexity used to evaluate and compare language models?
  • What did neural language models change relative to n-gram models?

Key concepts

  • n-gram
  • Markov assumption
  • smoothing
  • perplexity
  • backoff and interpolation
  • distributed word representations
  • cross-entropy
  • next-token prediction

Key theories

N-gram Markov modeling
Approximating the probability of a word by conditioning only on the previous n−1 words, turning language modeling into a tractable counting-and-smoothing problem.
Neural probabilistic language model
Replacing sparse n-gram counts with a neural network that learns distributed word representations, mitigating the curse of dimensionality and enabling generalization to unseen contexts.
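
The counting view of N-gram Markov modeling described above can be sketched for the bigram case (n = 2), where each word is conditioned only on the one before it. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy corpus, and the `<s>` / `</s>` boundary markers are all hypothetical choices made for this example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    bigrams = Counter()
    contexts = Counter()
    for tokens in sentences:
        # Hypothetical boundary markers so the first and last words have context.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def bigram_prob(bigrams, contexts, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts."""
    if contexts[prev] == 0:
        return 0.0  # context never observed in training
    return bigrams[(prev, word)] / contexts[prev]

# Toy corpus, invented purely for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bigrams, contexts = train_bigram(corpus)

p_cat = bigram_prob(bigrams, contexts, "the", "cat")     # "cat" follows "the" once out of two times
p_unseen = bigram_prob(bigrams, contexts, "cat", "dog")  # pair never observed in the corpus
```

The unsmoothed estimate gives the unseen pair a probability of exactly zero, which is the sparsity problem that smoothing and, later, learned distributed representations are meant to address.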

History

Shannon's information theory framed language as a predictable stochastic source, and the speech-recognition community at IBM made n-gram modeling central in the 1980s. Bengio and colleagues introduced neural probabilistic language models in 2003, seeding the distributed-representation approach that, scaled up, produced today's large language models.

Debates

Counting versus learned representations
Whether language is best modeled by smoothed counts over discrete sequences or by neural networks that learn continuous representations; neural methods now dominate but inherit the same probabilistic objective.

Key figures

  • Claude Shannon
  • Frederick Jelinek
  • Yoshua Bengio
  • Daniel Jurafsky

Seminal works

  • shannon1948
  • bengio2003
  • jurafsky2025

Frequently asked questions

What is perplexity?
Perplexity measures how surprised a language model is by held-out text; lower perplexity means the model assigns higher probability to the observed words, indicating a better fit.
Why does language modeling need smoothing?
Any finite corpus omits many valid word sequences, so a naive model would assign them zero probability. Smoothing redistributes a little probability mass to unseen events so the model can handle novel text.
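
The two answers above can be tied together in a short sketch: a bigram model with add-k smoothing, scored by perplexity computed as the exponential of the average negative log-probability per predicted token. Add-k is used here only because it is one of the simplest smoothing schemes; the source does not single out a particular method. All function names, the toy corpus, and the boundary markers are assumptions made for this example, and the sketch assumes every held-out word already appears in the training vocabulary (there is no unknown-word handling).

```python
import math
from collections import Counter

def train_counts(sentences):
    """Collect bigram counts, context counts, and the set of predictable words."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(tokens)
        vocab.add("</s>")  # the end marker is predicted; "<s>" never is
        for prev, word in zip(padded, padded[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def add_k_prob(bigrams, contexts, vocab_size, prev, word, k=1.0):
    """Add-k estimate: every possible next word receives k pseudo-counts."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)

def perplexity(sentences, bigrams, contexts, vocab_size, k=1.0):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_prob += math.log(add_k_prob(bigrams, contexts, vocab_size, prev, word, k))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Toy data, invented purely for illustration.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bigrams, contexts, vocab = train_counts(train)
V = len(vocab)

ppl_familiar = perplexity([["the", "dog", "sat"]], bigrams, contexts, V)  # word order seen in training
ppl_novel = perplexity([["sat", "the"]], bigrams, contexts, V)            # only unseen bigrams
```

With smoothing, the novel sentence still receives a finite perplexity rather than an undefined one, and it comes out higher than the perplexity of the familiar sentence, matching the reading that lower perplexity indicates a better fit.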
