Information Extraction

Turning unstructured text into structured data: detecting named entities, the relations between them, and the events they participate in, so documents can be queried and aggregated.

Definition

Information extraction is the automatic identification of structured facts (entities, relations, and events) in unstructured natural-language text.

Scope

Covers extracting structured information from text: named-entity recognition, relation extraction, event extraction, temporal information extraction, and template filling. It addresses both rule-based and learned approaches, as well as the evaluation traditions established by shared tasks. The underlying sequence-labeling models are covered in the parsing area.
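To make rule-based template filling concrete, here is a minimal Python sketch in which hand-written regular expressions fill the slots of a small "succession" template (who took up or left which post at which organization). The template, its slot names, and the patterns are hypothetical simplifications, not a reproduction of any MUC scenario definition, and runs of capitalized words stand in for a real named-entity recognizer.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class SuccessionTemplate:
    """Hypothetical, simplified template; slot names are illustrative."""
    person: str
    post: str
    organization: str
    event: str  # "in" = takes up the post, "out" = leaves it

# Runs of capitalized words stand in for a real named-entity recognizer.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hand-written patterns, each paired with the event type it signals.
PATTERNS = [
    # "<Org> named|appointed <Person> [as] <post>."
    (re.compile(rf"(?P<org>{NAME}) (?:named|appointed) (?P<person>{NAME}) "
                r"(?:as )?(?P<post>[a-z ]+?)\."), "in"),
    # "<Person> resigned|stepped down as <post> of <Org>."
    (re.compile(rf"(?P<person>{NAME}) (?:resigned|stepped down) as "
                rf"(?P<post>[a-z ]+?) of (?P<org>{NAME})\."), "out"),
]

def fill_templates(text: str) -> List[SuccessionTemplate]:
    """Run every pattern over the text; each match fills one template."""
    filled = []
    for pattern, event in PATTERNS:
        for m in pattern.finditer(text):
            filled.append(SuccessionTemplate(
                person=m.group("person"),
                post=m.group("post").strip(),
                organization=m.group("org"),
                event=event,
            ))
    return filled

text = ("Acme Corp named Jane Smith chief executive. "
        "John Doe resigned as chief financial officer of Globex.")
for template in fill_templates(text):
    print(template)
```

On the sample text this yields one "in" template for Jane Smith at Acme Corp and one "out" template for John Doe at Globex. It also shows the usual weakness of hand-built patterns: a paraphrase such as "Jane Smith will lead Acme Corp" matches neither pattern.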

Core questions

How are named entities detected and classified in text?
How are relations and events between entities extracted?
How did shared evaluations shape the task and its metrics?
How do rule-based and learned extraction methods compare?
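Many shared evaluations score extraction with precision, recall, and their harmonic mean F1. The minimal Python sketch below implements exact-match span scoring, the convention used for example in the CoNLL-2003 named-entity shared task: a predicted entity counts only if its boundaries and its type both match a gold entity. Representing entities as (start, end, type) tuples is an assumption of this sketch.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)

def span_prf(gold: Iterable[Span],
             predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans.

    A prediction is a true positive only if start, end and type all equal
    some gold span; partial overlaps earn no credit. Spans are compared as
    sets, so duplicate predictions collapse into one.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "LOC")]
predicted = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "LOC"), (10, 11, "ORG")]
# Precision 2/4 = 0.5, recall 2/3, F1 = 4/7 (about 0.571).
print(span_prf(gold, predicted))
```

The (5, 6) span has the right boundaries but the wrong type, so it counts as both a false positive and a false negative; lenient variants that give credit for partial overlap exist but are not shown.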

Key concepts

named-entity recognition
relation extraction
event extraction
template filling
conditional random field
distant supervision
ontology population
evaluation campaign

Key theories

Template-filling information extraction: Framing extraction as filling structured templates with entities and relations found in text, the formulation developed in the Message Understanding Conferences.
Sequence-labeling extraction: Casting entity and span extraction as token-level sequence labeling, using models such as conditional random fields and neural taggers.
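In the sequence-labeling formulation, spans are usually encoded as per-token BIO tags (B-X begins a span of type X, I-X continues it, O is outside any span). A linear-chain CRF scores a tag sequence as the sum of per-token emission scores and tag-to-tag transition scores; because its normalizing constant does not depend on the tag sequence, the most probable sequence is the highest-scoring one, which the Viterbi algorithm finds exactly. The Python sketch below covers only that decoding step and the conversion back to spans. The emission and transition numbers are hand-set toy values standing in for learned weights; training and start-of-sentence constraints are omitted.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the tag sequence maximizing
    sum_t emissions[t][y_t] + sum_{t>0} transitions[(y_{t-1}, y_t)].
    Transition pairs missing from the dict score 0.0."""
    if not emissions:
        return []
    # best[t][tag]: score of the best partial path ending in `tag` at token t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best predecessor at t-1
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0))
            scores[tag] = (best[-1][prev] + transitions.get((prev, tag), 0.0)
                           + emissions[t][tag])
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow back-pointers to token 0.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO labels to (start, end_exclusive, type) spans.
    An I-X that does not continue a span of type X opens a new span."""
    spans, start, kind = [], None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if label.startswith("I-") and kind == label[2:]:
            continue
        if kind is not None:
            spans.append((start, i, kind))
            start, kind = None, None
        if label != "O":
            start, kind = i, label[2:]
    return spans

TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
# Hand-set emission scores for "Jane Smith joined Acme" (stand-ins for learned weights).
EMISSIONS = [
    {"O": 0.2, "B-PER": 2.0, "I-PER": 0.5, "B-ORG": 0.3, "I-ORG": 0.1},  # Jane
    {"O": 0.1, "B-PER": 0.5, "I-PER": 1.0, "B-ORG": 0.1, "I-ORG": 1.2},  # Smith
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.1, "B-ORG": 0.1, "I-ORG": 0.1},  # joined
    {"O": 0.3, "B-PER": 0.2, "I-PER": 0.1, "B-ORG": 1.5, "I-ORG": 0.4},  # Acme
]
# A large penalty on I-X unless preceded by B-X or I-X encodes the BIO constraint.
TRANSITIONS = {(p, c): -10.0 for p in TAGS for c in TAGS
               if c.startswith("I-") and p[2:] != c[2:]}

labels = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print(labels)                # ['B-PER', 'I-PER', 'O', 'B-ORG']
print(bio_to_spans(labels))  # [(0, 2, 'PER'), (3, 4, 'ORG')]
```

A per-token argmax would tag "Smith" as I-ORG, producing the invalid sequence B-PER I-ORG; the transition penalty lets Viterbi recover B-PER I-PER instead. Decoding costs O(n·k²) for n tokens and k tags.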

History

Information extraction was shaped by the Message Understanding Conferences (MUC), held from the late 1980s through the 1990s, which defined named-entity and template-filling tasks and their evaluation. The field moved from hand-built patterns to statistical sequence models such as conditional random fields, and then to neural and distantly supervised extraction at scale.

Debates

Supervised versus distantly supervised extraction: Whether to rely on costly hand-labeled data or to generate training labels automatically by aligning knowledge-base facts with text (distant supervision), which scales but introduces noisy labels.
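The simplest form of distant supervision assumes that any sentence mentioning both arguments of a knowledge-base fact expresses that fact's relation. The Python sketch below applies that heuristic to a toy knowledge base; the data are illustrative, and plain substring matching is an assumption standing in for the entity recognition and linking a real system would use.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def distant_labels(kb: Set[Triple], sentences: List[str]) -> List[Tuple[str, Triple]]:
    """Pair each sentence with every KB triple whose subject and object
    both occur in it. Naive substring matching, for illustration only."""
    labeled = []
    for sentence in sentences:
        for subj, rel, obj in sorted(kb):  # sorted for a deterministic order
            if subj in sentence and obj in sentence:
                labeled.append((sentence, (subj, rel, obj)))
    return labeled

# Toy knowledge base and unlabeled sentences (illustrative data).
KB = {("Barack Obama", "born_in", "Honolulu")}
SENTENCES = [
    "Barack Obama was born in Honolulu.",       # expresses born_in
    "Barack Obama gave a speech in Honolulu.",  # does not, but gets labeled anyway
    "Honolulu is the capital of Hawaii.",       # one argument only: no label
]

for sentence, (subj, rel, obj) in distant_labels(KB, SENTENCES):
    print(f"{rel}({subj}, {obj}) <- {sentence}")
```

The first two sentences both receive the born_in label, although only the first expresses it; the third mentions only one argument and is skipped. That second label is exactly the kind of noise the debate concerns; multi-instance formulations weaken the assumption to "at least one matching sentence expresses the relation."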

Key figures

Ralph Grishman
Beth Sundheim
Andrew McCallum

Seminal works

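Grishman, R. and Sundheim, B. (1996). Message Understanding Conference-6: A Brief History. Proceedings of COLING 1996.
Lafferty, J., McCallum, A. and Pereira, F. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of ICML 2001.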

Frequently asked questions

What is named-entity recognition?: Named-entity recognition finds and classifies proper-name spans in text, such as people, organizations, and locations. It is usually the first step in extracting relations and events from documents.