Stylometry and Authorship Attribution

Writers leave statistical fingerprints. The frequencies of small function words such as "the", "of", and "and", which writers choose largely unconsciously, vary little within an author's work but differ between authors. Stylometry exploits this to settle disputed authorship and to study style quantitatively.

Definition

The statistical analysis of measurable features of writing style to characterize authors and to attribute texts of uncertain or disputed authorship.

Scope

Covers the quantitative measurement of literary style and its use in attributing texts to authors: the choice of stylistic features, distance and classification measures such as Burrows's Delta, and the validation of attribution claims. Includes the field's history from the Federalist Papers to modern machine-learning methods, and its forensic applications.

Core questions

  • Which textual features best capture an author's distinctive style?
  • How can attribution claims be tested and validated?
  • Why are function-word frequencies so effective for attribution?
  • What are the limits of stylometry across genres, periods, and translation?

Key concepts

  • Function words
  • Burrows's Delta
  • Feature selection
  • Classification
  • Cross-validation

Key theories

Function-word frequency as authorial signal
Mosteller and Wallace showed that the rates of common function words can discriminate between authors, and used Bayesian inference over those rates to attribute the disputed Federalist Papers to James Madison rather than Alexander Hamilton.
Burrows's Delta
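Burrows introduced Delta, a distance measure computed as the mean absolute difference between the z-scored frequencies of the most frequent words in two texts. It has become a standard method for ranking candidate authors by stylistic closeness to a disputed text.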
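
A minimal sketch of a Delta-style computation may make the measure concrete. It assumes one text per candidate author, a crude regular-expression tokenizer, and the population standard deviation across candidate profiles; the function names (`tokenize`, `relative_freqs`, `burrows_delta`) and the toy texts are hypothetical, and published implementations differ in these details.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Crude lowercase word tokenizer (an assumption, not a standard).
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one text.
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(corpus, disputed, n_words=30):
    """corpus maps author -> text; returns author -> Delta to `disputed`."""
    tokenized = {a: tokenize(t) for a, t in corpus.items()}
    # Vocabulary: the most frequent words across the reference corpus.
    pooled = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in pooled.most_common(n_words)]
    profiles = {a: relative_freqs(t, vocab) for a, t in tokenized.items()}
    # Per-word mean and spread across the candidate profiles.
    columns = list(zip(*profiles.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    # Words with zero spread cannot be z-scored, so they are skipped.
    keep = [i for i, s in enumerate(sigmas) if s > 0]
    if not keep:
        raise ValueError("no word varies across the candidate profiles")

    def z(profile):
        return [(profile[i] - mus[i]) / sigmas[i] for i in keep]

    z_disputed = z(relative_freqs(tokenize(disputed), vocab))
    # Delta: mean absolute difference of z-scores; lower means closer.
    return {
        a: mean(abs(x - y) for x, y in zip(z(p), z_disputed))
        for a, p in profiles.items()
    }


corpus = {
    "A": "the cat and the dog and the bird sat upon the mat",
    "B": "of mice and of men it was a tale of woe",
}
scores = burrows_delta(corpus, "the fox and the hen sat upon the wall")
closest = min(scores, key=scores.get)  # candidate with the smallest Delta
```

The smallest Delta only ranks the supplied candidates; it does not show that the true author is among them. Texts this short are for illustration only.
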
Modern attribution as classification
Stamatatos surveyed how authorship attribution is framed as a text-classification problem, comparing feature sets and machine-learning methods.
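
The classification framing can be sketched with a very simple learner and a leave-one-out check of its accuracy. This is an illustrative toy, not a method taken from the survey: the feature list `FEATURES`, the nearest-centroid rule, the Manhattan distance, and all function names are assumptions chosen to keep the example dependency-free.

```python
import re
from collections import Counter, defaultdict

# Hypothetical feature set; real studies use many more words.
FEATURES = ("the", "of", "and", "to", "a", "in")


def featurize(text):
    # Relative frequency of each feature word in the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FEATURES]


def nearest_centroid(train, vector):
    """train is a list of (author, vector); return the closest author."""
    groups = defaultdict(list)
    for author, vec in train:
        groups[author].append(vec)

    def distance(author):
        # Manhattan distance from the author's mean vector.
        centroid = [sum(col) / len(col) for col in zip(*groups[author])]
        return sum(abs(c - v) for c, v in zip(centroid, vector))

    return min(groups, key=distance)


def leave_one_out_accuracy(samples):
    """samples is a list of (author, text); hold out each one in turn.

    Each author needs at least two samples, otherwise the held-out
    author is missing from the training data for that fold.
    """
    data = [(author, featurize(text)) for author, text in samples]
    hits = 0
    for i, (author, vec) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid(train, vec) == author
    return hits / len(data)
```

Holding out each text before classifying it keeps the accuracy estimate from being inflated by a text matching its own training profile.
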

History

Quantitative authorship study dates to the nineteenth century, but Mosteller and Wallace's 1964 study of the Federalist Papers established the modern statistical approach. Burrows's Delta (2002) gave the field a widely adopted measure, and surveys such as Stamatatos (2009) mapped the shift to machine-learning classification and forensic use.

Debates

Reliability and confidence of attributions
Stylometric methods can be powerful yet sensitive to corpus size, genre, and preprocessing, raising questions about how much confidence attributions deserve, especially in forensic contexts.

Key figures

  • Frederick Mosteller
  • David Wallace
  • John Burrows
  • Efstathios Stamatatos

Seminal works

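  • Mosteller and Wallace (1964), Inference and Disputed Authorship: The Federalist [mosteller1964]
  • Burrows (2002), "'Delta': A Measure of Stylistic Difference and a Guide to Likely Authorship" [burrows2002]
  • Stamatatos (2009), "A Survey of Modern Authorship Attribution Methods" [stamatatos2009]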

Frequently asked questions

Why focus on tiny words like 'the' instead of distinctive vocabulary?
Distinctive vocabulary often reflects a text's topic rather than its author. Common function words are used unconsciously and at stable rates within an author's writing but differ between authors, making them a reliable, topic-independent signal of style.
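
As a rough illustration of the kind of measurement involved, the sketch below computes function-word rates per 1,000 tokens. The word list `FUNCTION_WORDS`, the tokenizer, and the function name `function_word_rates` are assumptions made for the example, not a standard feature set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word list; real studies use
# dozens to hundreds of function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return each word's rate per 1,000 tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in words}


sample = "The cat sat on the mat and the dog slept in the sun."
rates = function_word_rates(sample)
```

Comparing such rate profiles across texts of known and disputed authorship is the starting point for the attribution methods described above.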