
Natural Language Processing

Natural language processing is the area of artificial intelligence concerned with enabling computers to analyze, understand, and generate human language in text or speech.

Definition

Natural language processing is the study and engineering of methods that let computers map between human language and structured representations of its form and meaning, supporting tasks from parsing and translation to extraction and generation.

Scope

This area covers the computational treatment of human language across its levels of structure: morphology and syntax (parsing), semantics and meaning representation, discourse, and applications such as machine translation and information extraction. It treats the formal models of language (grammars, logical and distributional meaning representations) and the tasks of analyzing and producing language. The general statistical and neural learning methods that train modern language models are part of the machine-learning subfield; this area emphasizes the linguistic structure, tasks, and representations specific to language.

Core questions

  • How is the grammatical structure of a sentence recovered from a sequence of words?
  • How can the meaning of words, sentences, and discourse be represented computationally?
  • How is ambiguity, pervasive at every level of language, resolved using context?
  • How are language-understanding capabilities turned into applications such as translation and extraction?

Key concepts

  • morphology and tokenization
  • syntax and parsing
  • semantics and meaning representation
  • ambiguity and disambiguation
  • discourse and pragmatics
  • language models
  • machine translation
  • information extraction

Key theories

Levels of linguistic analysis
Language is analyzed at distinct but interacting levels—phonology, morphology, syntax, semantics, pragmatics, and discourse—and NLP systems are organized around recovering structure and meaning at these levels.
Grammars and parsing
Formal grammars, especially context-free and richer formalisms, model the syntactic structure of language, and parsing algorithms recover that structure, providing a backbone for meaning analysis.
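As a rough illustration of how a parsing algorithm recovers structure licensed by a context-free grammar, the sketch below implements a small CKY recognizer. The grammar, the lexicon, and the function name `cky_recognize` are hypothetical and chosen only for this example; the grammar is assumed to be in Chomsky normal form, and a practical parser would also store backpointers to rebuild the tree.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Keys are (left child, right child); values are the possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

# Toy lexicon mapping each word to its possible part-of-speech labels.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[(i, j)] holds every nonterminal that can span words[i:j].
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    # Build longer spans from pairs of adjacent shorter spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY_RULES.get((left, right), set())
    return start in chart[(0, n)]


accepted = cky_recognize("the dog chased the cat".split())
rejected = cky_recognize("dog the chased".split())
```

With this toy grammar, `accepted` is true and `rejected` is false. The chart is filled bottom-up, so each span is analyzed once and reused, which is what keeps the algorithm polynomial in sentence length.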
Statistical and distributional language modeling
Treating language probabilistically—modeling the likelihood of word sequences and representing word meaning by distributional context—gave NLP robustness to ambiguity and variation and became the dominant paradigm.
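The idea of modeling the likelihood of word sequences can be sketched with a bigram model, which estimates the probability of each word given the previous one. The three-sentence corpus, the function names, and the choice of add-one smoothing below are assumptions made for illustration; the sketch covers only the sequence-probability half of the paragraph above, not distributional word representations.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def sentence_logprob(sentence, model):
    """Sum of log bigram probabilities, including sentence boundaries."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(bigram_prob(p, w, model)) for p, w in zip(tokens, tokens[1:]))


# Hypothetical toy corpus.
model = train_bigram(["the dog barks", "the cat sleeps", "the dog sleeps"])
fluent = sentence_logprob("the dog sleeps", model)
scrambled = sentence_logprob("sleeps dog the", model)
```

Here `fluent` is larger than `scrambled`, because the word order seen in the corpus receives higher probability. Smoothing is what gives unseen word pairs a small nonzero probability instead of zero, a simple instance of the robustness to variation mentioned above.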

Practical relevance

Natural language processing powers search engines, machine translation, question answering and chat systems, speech recognition and dialogue, sentiment analysis, and the extraction of structured information from text in domains such as biomedicine and law, making it one of the most visibly deployed areas of AI.

History

NLP began with machine translation efforts in the 1950s and the symbolic systems of the 1960s and 1970s, such as Winograd's SHRDLU. Statistical methods rose to prominence from the late 1980s and were consolidated in texts such as Manning and Schütze (1999); neural and large-scale language-model methods later transformed the field. Its tasks and linguistic foundations remain a standard part of AI.

Debates

Symbolic vs. statistical and neural approaches
NLP has long oscillated between hand-built symbolic approaches (grammars and rules) and data-driven statistical or neural models. The statistical turn and later neural methods came to dominate because they are more robust to ambiguity and variation, though questions about interpretability and about how to incorporate linguistic structure persist.

Key figures

  • Daniel Jurafsky
  • James H. Martin
  • Christopher D. Manning
  • Terry Winograd
  • Karen Spärck Jones

Seminal works

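  • Winograd, T. (1972). Understanding Natural Language.
  • Manning, C. D., and Schütze, H. (1999). Foundations of Statistical Natural Language Processing.
  • Jurafsky, D., and Martin, J. H. (2023). Speech and Language Processing (3rd edition draft).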

Frequently asked questions

What is the difference between natural language processing and computational linguistics?
The terms overlap heavily. Computational linguistics emphasizes using computation to understand and model human language as a scientific phenomenon, while natural language processing emphasizes engineering systems that perform useful language tasks. In practice the same models and methods serve both goals.
Why is ambiguity such a central problem in NLP?
Human language is ambiguous at every level: words have multiple senses, sentences have multiple parses, and referring expressions such as pronouns can have several candidate referents. Much of NLP consists of using context, together with probabilistic or learned models, to choose the interpretation a human would choose, and doing this reliably is a large part of what makes the field difficult.
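One simple way to see how context can select among word senses is a simplified Lesk-style heuristic, which picks the sense whose dictionary gloss shares the most words with the surrounding sentence. The two-sense inventory, the stopword list, and the function names below are hypothetical and exist only for this sketch; real systems draw on a full lexical resource or on learned models.

```python
# Hypothetical mini sense inventory: sense label -> gloss.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "the sloping land beside a river or stream of water",
    }
}

# Small illustrative stopword list, not a standard one.
STOPWORDS = {
    "a", "an", "the", "of", "and", "or", "that", "to",
    "in", "on", "at", "by", "he", "she", "we",
}


def content_words(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(word, context):
    """Return the sense of `word` whose gloss overlaps most with `context`."""
    context_set = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


river_reading = simplified_lesk("bank", "She sat on the bank of the river and watched the water")
money_reading = simplified_lesk("bank", "He went to the bank to deposit money")
```

In this toy setup `river_reading` is `"bank/river"` and `money_reading` is `"bank/finance"`. The heuristic is brittle (for example, "deposit" does not match "deposits" without stemming), which is one reason probabilistic and learned models are generally preferred.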