Part-of-Speech Tagging and Sequence Labeling

Assigning a label to each token in a sentence — its part of speech, named-entity type, or chunk tag — using probabilistic sequence models such as hidden Markov models and conditional random fields.

Definition

Sequence labeling is the task of assigning a categorical label to each element of an input sequence, with part-of-speech tagging as its canonical instance.

Scope

Covers sequence-labeling tasks central to shallow analysis: part-of-speech tagging, named-entity recognition, and chunking. It includes the standard models — hidden Markov models, maximum-entropy Markov models, conditional random fields, and neural sequence taggers — and tagsets such as the Penn Treebank and Universal POS. Full parsing is covered in sibling topics.

Core questions

How do hidden Markov models assign the most likely tag sequence?
Why do conditional random fields outperform locally normalized models?
How are tagsets designed and standardized across languages?
How does sequence labeling support downstream parsing and extraction?

Key concepts

part-of-speech tag
hidden Markov model
Viterbi algorithm
conditional random field
named-entity recognition
chunking
tagset
BIO encoding
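
To make the BIO encoding concrete: each token is tagged B-X if it begins a span of type X, I-X if it continues one, and O if it lies outside any span, which turns chunking and named-entity recognition into per-token labeling. The following is a minimal sketch; the function names `spans_to_bio` and `bio_to_spans` and the example sentence are illustrative assumptions, not part of any particular library.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled spans to one BIO tag per token.

    `spans` is a list of (start, end, label) token indices, end exclusive.
    Spans are assumed not to overlap (an assumption of this sketch).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


def bio_to_spans(tags):
    """Recover (start, end, label) spans from a BIO tag sequence.

    An I- tag that does not continue an open span of the same label
    is ignored in this sketch.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        # Close the open span when the current tag does not continue it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


tokens = ["Ada", "Lovelace", "visited", "London"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Because the two functions are inverses on well-formed input, a tagger's per-token output can be converted back to spans for span-level evaluation.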

Key theories

Hidden Markov model tagging: Modeling a tag sequence as a Markov chain emitting observed words, with the Viterbi algorithm recovering the most probable tag sequence efficiently.
Conditional random fields: Globally normalized discriminative models for sequence labeling that condition on the whole input and avoid the label bias of locally normalized models.
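
The Viterbi decoding described above can be sketched for a bigram HMM as follows. This is a minimal illustration, not a production tagger: the toy probabilities, the three-tag tagset, and the `unk` floor for unseen words are all assumptions made for the example.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Works in log space to avoid underflow. `unk` is a small emission
    probability assigned to word/tag pairs missing from `emit_p`
    (a crude smoothing assumption for this sketch).
    """
    if not words:
        return []

    def log_emit(tag, word):
        return math.log(emit_p[tag].get(word, unk))

    # best[i][t]: log-probability of the best tag path ending in t at position i
    best = [{t: math.log(start_p[t]) + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model (all numbers are made up for illustration).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.5, "NOUN": 0.2, "VERB": 0.3}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.2, "NOUN": 0.3, "VERB": 0.5},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"book": 0.5, "flight": 0.5},
    "VERB": {"book": 0.6, "read": 0.4},
}

print(viterbi(["the", "book"], TAGS, START, TRANS, EMIT))          # ['DET', 'NOUN']
print(viterbi(["book", "a", "flight"], TAGS, START, TRANS, EMIT))  # ['VERB', 'DET', 'NOUN']
```

The ambiguous word 'book' receives a different tag in each sentence because the transition probabilities make a noun likely after a determiner and a verb likely before one. The dynamic program runs in time proportional to the sentence length times the square of the number of tags, rather than enumerating every tag sequence.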

History

POS tagging was an early success of statistical NLP once the Penn Treebank (1993) provided large annotated data. Hidden Markov model taggers gave way to discriminative models, first maximum-entropy taggers in the mid-1990s and then conditional random fields in 2001, which were in turn absorbed into neural sequence labelers in the 2010s.

Debates

Generative versus discriminative sequence models: Whether to model the joint distribution of words and tags (HMMs) or to condition labels directly on the input (CRFs); discriminative models generally win on accuracy when rich features are available.
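
The contrast can be written out in the standard textbook forms below. The notation is an assumption of this sketch rather than something fixed by the text: w_{1:n} are the words, t_{1:n} the tags, t_0 a start symbol, f_k feature functions, and \lambda_k their weights.

```latex
% HMM (generative): joint probability of words and tags
P(w_{1:n}, t_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)

% MEMM (discriminative, locally normalized): one distribution per position
P(t_{1:n} \mid w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}, w_{1:n}, i)

% Linear-chain CRF (discriminative, globally normalized)
P(t_{1:n} \mid w_{1:n}) = \frac{1}{Z(w_{1:n})}
  \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(t_{i-1}, t_i, w_{1:n}, i) \Big),
\qquad
Z(w_{1:n}) = \sum_{t'_{1:n}}
  \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(t'_{i-1}, t'_i, w_{1:n}, i) \Big)
```

The HMM must model how words are generated, which limits it to simple emission features. The MEMM and the CRF both condition on the whole input, but the MEMM normalizes at each position, so every state must distribute all of its probability mass over its successors regardless of the observation; this is the source of label bias. The CRF normalizes once over all tag sequences through Z(w_{1:n}), which removes that constraint.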

Key figures

Mitchell Marcus
John Lafferty
Andrew McCallum
Fernando Pereira

Seminal works

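Marcus, Santorini, and Marcinkiewicz (1993), "Building a Large Annotated Corpus of English: The Penn Treebank" [marcus1993]
Lafferty, McCallum, and Pereira (2001), "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data" [lafferty2001]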

Frequently asked questions

Why is part-of-speech tagging not trivial?: Many words are ambiguous — 'book' can be a noun or a verb — so the correct tag depends on context. Sequence models resolve this by considering surrounding words and tags jointly.