Method comparison

Review the selected methods side by side; rows that differ are highlighted.

	Historical Corpus Text Mining	Handwritten Text Recognition for Archives
Field	Digital History	Digital History
Family ≠	Process / pipeline	Machine learning
Year of origin ≠	2013	2019
Creator ≠	Franco Moretti	Transkribus and the READ project
Type ≠	text-analysis-pipeline	ml-recognition-pipeline
Foundational source ≠	Moretti, F. (2013). Distant Reading. Verso. ISBN: 9781781680841	Muehlberger, G., Seaward, L., Terras, M., et al. (2019). Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study. Journal of Documentation, 75(5), 954-976.
Alternative names	Distant reading, Computational historical text analysis, Macroanalysis of corpora, Corpus-scale historical NLP	HTR, Manuscript transcription AI, Automatic handwriting transcription, Neural archival transcription
Related	3	3
Summary ≠	Historical corpus text mining applies computational methods to thousands or millions of historical documents at once, seeking macro-scale patterns that close reading of individual texts cannot reveal. Associated above all with Franco Moretti's program of distant reading, the approach treats large bodies of text (newspapers, parliamentary records, novels, correspondence) as data to be measured rather than works to be interpreted one by one. By counting word frequencies, computing weighted term importance, fitting topic models, and tracking how vocabulary shifts across decades, researchers can chart the rise and fall of concepts, the diffusion of ideas, and the changing texture of public discourse over long spans. The method is explicitly quantitative and aggregative: its claims concern populations of documents, not exemplary passages. Adapting modern natural-language processing to historical material, however, requires confronting archaic spelling, OCR noise, and shifting word meanings. Done carefully, corpus text mining turns vast unread archives into evidence about how language, and the thought it carries, evolved over time.	Handwritten text recognition for archives converts digital images of manuscript pages into searchable, machine-readable text, unlocking the vast holdings of handwritten material that optical character recognition, designed for print, cannot read. Exemplified by platforms such as Transkribus, developed in the READ project, modern HTR uses deep neural networks trained on transcribed examples to recognize the highly variable scripts of letters, registers, charters, and notebooks across centuries and languages. The pipeline first analyzes page layout and segments the image into text regions and lines; a recurrent or transformer-based recognizer then decodes each line into characters, typically using connectionist temporal classification (CTC) to align positions along the line image with characters without needing character-level segmentation. Crucially, recognition models are trained and improved on ground-truth transcriptions supplied by scholars, so accuracy generally improves as more material is annotated. By making manuscripts machine-readable at scale, HTR serves as a gateway technology for digital archival history, feeding full-text search, named-entity recognition, and large-corpus text mining of sources that were previously legible only page by page.
ScholarGate dataset	v1, 2 sources, PUBLISHED	v1, 2 sources, PUBLISHED
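
The HTR summary above mentions connectionist temporal classification as the way a line recognizer maps image positions to characters without per-character segmentation. As a rough illustration only, the decoding side of that idea can be sketched with greedy (best-path) CTC decoding: take the top label at each frame, merge consecutive repeats, then drop the blank symbol. This is a minimal sketch and not the implementation used by Transkribus or any particular platform; the function name `ctc_greedy_decode`, the toy alphabet, and the hand-made frame scores are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 of the alphabet is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy (best-path) CTC decoding of per-frame label scores.

    frame_scores: one list of scores per frame (a horizontal slice of the
                  line image), one score per alphabet entry.
    alphabet:     list of symbols; alphabet[blank] is the blank placeholder.
    """
    # Step 1: pick the highest-scoring label index for each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _run in groupby(best_path)]
    # Step 3: remove blanks; what remains is the decoded character sequence.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy input (hypothetical): 9 frames over a 4-symbol alphabet.
alphabet = ["<blank>", "a", "c", "t"]
frame_labels = [2, 2, 0, 1, 1, 0, 3, 0, 3]  # c c _ a a _ t _ t
frame_scores = [
    [1.0 if index == label else 0.0 for index in range(len(alphabet))]
    for label in frame_labels
]

decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)
```

The blank between the last two `t` frames is what lets the decoder keep a doubled letter; without it, the repeated frames would merge into a single character. Real systems usually decode from a trained network's output and often use beam search with a language model instead of this greedy rule.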
