ScholarGate

Compare methods

Examine the selected methods side by side; rows that differ are highlighted.

| | Handwritten Text Recognition for Archives | Historical Corpus Text Mining |
|---|---|---|
| Field | Digital History | Digital History |
| Family | Machine learning | Process / pipeline |
| Year of origin | 2019 | 2013 |
| Originator | Transkribus and the READ project | Franco Moretti |
| Type | ml-recognition-pipeline | text-analysis-pipeline |
| Seminal source | Muehlberger, G., Seaward, L., Terras, M., et al. (2019). Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study. Journal of Documentation, 75(5), 954-976. | Moretti, F. (2013). Distant Reading. Verso. ISBN: 9781781680841 |
| Aliases | HTR, Manuscript transcription AI, Automatic handwriting transcription, Neural archival transcription | Distant reading, Computational historical text analysis, Macroanalysis of corpora, Corpus-scale historical NLP |
| Related | 3 | 3 |
Summary

Handwritten Text Recognition for Archives: Handwritten text recognition for archives converts digital images of manuscript pages into searchable, machine-readable text, unlocking the vast holdings of handwritten material that optical character recognition, designed for print, cannot read. Exemplified by platforms such as Transkribus, developed in the READ project, modern HTR uses deep neural networks trained on transcribed examples to recognize the highly variable scripts of letters, registers, charters, and notebooks across centuries and languages. The pipeline first analyzes page layout and segments the image into text regions and lines; a recurrent or transformer-based recognizer then decodes each line into characters, typically using connectionist temporal classification to align pixels with text without needing character-level segmentation. Crucially, recognition models are trained and improved on ground-truth transcriptions supplied by scholars, so accuracy rises as more material is annotated. By making manuscripts machine-readable at scale, HTR is the gateway technology of digital archival history, feeding full-text search, named-entity recognition, and large-corpus text mining of sources that were previously legible only page by page.

Historical Corpus Text Mining: Historical corpus text mining applies computational methods to thousands or millions of historical documents at once, seeking macro-scale patterns that close reading of individual texts could never reveal. Associated above all with Franco Moretti's program of distant reading, the approach treats large bodies of text (newspapers, parliamentary records, novels, correspondence) as data to be measured rather than works to be interpreted one by one. By counting word frequencies, computing weighted term importance, fitting topic models, and tracking how vocabulary shifts across decades, researchers can chart the rise and fall of concepts, the diffusion of ideas, and the changing texture of public discourse over long spans. The method is explicitly quantitative and aggregative: its claims concern populations of documents, not exemplary passages. Adapting modern natural-language processing to historical material, however, requires confronting archaic spelling, OCR noise, and shifting word meanings. Done carefully, corpus text mining turns vast unread archives into evidence about how language, and the thought it carries, evolved historically.
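As a rough illustration of the two summaries, the sketch below shows one core computation from each method in plain Python. It is a minimal sketch under stated assumptions, not how Transkribus or any particular text-mining study is implemented. The first function shows the collapse step of greedy connectionist temporal classification decoding (merge repeated labels, then drop blanks), assuming the recognizer has already produced one best label per image frame. The second shows one common choice of "weighted term importance", TF-IDF, which is an assumption here since the summary does not name a specific weighting. The function names, the blank symbol "-", and the toy inputs are all hypothetical.

```python
import math
from collections import Counter


def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse per-frame labels into text, CTC style.

    Consecutive repeats are merged and blank symbols are dropped, so no
    character-level segmentation of the line image is needed.
    The blank symbol "-" is an illustrative assumption.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs is a list of token lists; the result is one {term: weight}
    dict per document. Uses tf = count / doc length and
    idf = log(N / document frequency), one of several common variants.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Toy inputs, invented for illustration only.
line_text = ctc_greedy_decode("hh-e-ll-lo-")
corpus_weights = tf_idf([["steam", "engine", "steam"], ["engine", "railway"]])
```

In the toy corpus, a term that appears in every document ("engine") receives weight zero, while terms confined to one document are weighted up, which is the sense in which the weighting highlights vocabulary distinctive to a period or source.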
ScholarGate dataset
  1. v1
  2. 2 sources
  3. PUBLISHED

ScholarGate. Compare methods: Handwritten Text Recognition for Archives · Historical Corpus Text Mining. Accessed 2026-06-24 from https://scholargate.app/it/compare