Information Extraction

Turning unstructured text into structured data: detecting named entities, the relations between them, and the events they participate in, so documents can be queried and aggregated.

Definition

Information extraction is the automatic extraction of structured facts (entities, relations, and events) from unstructured natural-language text.

Scope

Covers extracting structured information from text: named-entity recognition, relation extraction, event extraction, temporal extraction, and template filling. It addresses both rule-based and learned approaches, as well as the evaluation traditions established by shared tasks. The underlying sequence-labeling models are covered in the parsing area.

Core questions

  • How are named entities detected and classified in text?
  • How are relations and events between entities extracted?
  • How did shared evaluations shape the task and its metrics?
  • How do rule-based and learned extraction methods compare?

Key concepts

  • named-entity recognition
  • relation extraction
  • event extraction
  • template filling
  • conditional random field
  • distant supervision
  • ontology population
  • evaluation campaign

Key theories

Template-filling information extraction
Framing extraction as filling structured templates with entities and relations found in text, the formulation developed in the Message Understanding Conferences.
Sequence-labeling extraction
Casting entity and span extraction as sequence labeling with models such as conditional random fields and neural taggers over tokens.
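
As an illustration of the sequence-labeling view, the sketch below decodes per-token tags into entity spans. It assumes the common BIO tagging scheme (B- begins a span, I- continues it, O is outside any span), which the text above does not specify; the function name bio_to_spans, the tag labels, and the example sentence are hypothetical. The tagger that would produce the tags (a conditional random field or a neural model) is out of scope here.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, label) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # Open a span on "B-", or on a stray "I-" (a lenient convention,
        # chosen here as an assumption; strict decoders may reject it).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

# Hypothetical example sentence and tagger output.
tokens = ["Alice", "Smith", "joined", "Acme", "Corp", "yesterday"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"]

entities = [(" ".join(tokens[s:e]), lab) for s, e, lab in bio_to_spans(tags)]
print(entities)
```

Keeping spans as token offsets rather than strings makes it straightforward to compare predicted and gold entities exactly, which is how entity-level scoring is usually done.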

History

Information extraction was shaped by the Message Understanding Conferences (MUC) of the late 1980s and 1990s, which defined template-filling and, later, named-entity tasks and their evaluation. The field moved from hand-built patterns to statistical sequence models such as conditional random fields, and then to neural and distantly supervised extraction at scale.

Debates

Supervised versus distantly supervised extraction
Whether to rely on costly hand-labeled data or to bootstrap from knowledge bases via distant supervision, which scales but introduces noisy labels.
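
The trade-off can be made concrete with a minimal sketch of distant-supervision labeling. It assumes the simplest heuristic: any sentence mentioning both entities of a knowledge-base pair is labeled with that pair's relation. The knowledge base, the relation name born_in, the entity names, and the sentences are all hypothetical, and plain substring matching stands in for real entity linking.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Alice Smith", "Springfield"): "born_in",
}

def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = []
    for sent in sentences:
        for (subj, obj), rel in kb.items():
            # Substring matching is a simplification of entity linking.
            if subj in sent and obj in sent:
                labeled.append((sent, subj, obj, rel))
    return labeled

sentences = [
    "Alice Smith was born in Springfield.",        # expresses born_in
    "Alice Smith gave a lecture in Springfield.",  # co-occurrence only: a noisy label
    "Alice Smith joined Acme Corp.",               # object missing: not labeled
]

for sent, subj, obj, rel in distant_label(sentences, KB):
    print(f"{rel}({subj}, {obj}) <- {sent}")
```

The second sentence receives the born_in label although it does not express that relation. No annotation was needed to produce the training examples, but some of them are wrong, which is the noise this debate is about.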

Key figures

  • Ralph Grishman
  • Beth Sundheim
  • Andrew McCallum

Seminal works

  • grishman1996
  • lafferty2001

Frequently asked questions

What is named-entity recognition?
Named-entity recognition finds and classifies proper-name spans in text, such as people, organizations, and locations. It is usually the first step in extracting relations and events from documents.
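
A minimal rule-based sketch of the idea, assuming a hypothetical gazetteer (a list of known names with their types). Learned taggers generalize to unseen names, which a lookup like this cannot; the names, types, and the function find_entities are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer mapping known names to entity types.
GAZETTEER: Dict[str, str] = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Springfield": "LOCATION",
}

def find_entities(
    text: str, gazetteer: Dict[str, str]
) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface, type) for each gazetteer match, in text order."""
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    )
    return [
        (m.start(), m.end(), m.group(), gazetteer[m.group()])
        for m in pattern.finditer(text)
    ]

text = "Alice Smith joined Acme Corp in Springfield."
for start, end, surface, etype in find_entities(text, GAZETTEER):
    print(f"{surface!r} [{start}:{end}] -> {etype}")
```

Each match carries character offsets as well as a type, so later relation or event extraction can refer back to the exact span in the document.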