ScholarGate

Markup Languages and Standards

Markup is the layer of codes that turns a stream of characters into a structured document. The distinction between descriptive markup, which names what a thing is, and procedural markup, which says how to print it, shaped the standards — SGML, XML, and their successors — on which humanities encoding rests.

Definition

The formal languages and community standards — notably SGML and XML — used to add structured, machine-readable codes to documents, together with the principles that make such markup descriptive, validatable, and interchangeable.

Scope

Covers the languages and standards that underlie text encoding: the history of generic and descriptive markup, SGML and XML and their schema languages, and the principles that distinguish robust, interchangeable markup from presentation-oriented coding. Includes the influence of these standards on humanities computing.

Core questions

  • What distinguishes descriptive markup from procedural and presentational markup?
  • Why did the humanities converge on SGML and then XML?
  • How do schemas constrain and validate marked-up documents?
  • What are the limits of tree-structured markup languages?

Key concepts

  • SGML
  • XML
  • Descriptive vs procedural markup
  • Schema and DTD
  • Well-formedness and validity
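The last two concepts can be told apart in a few lines. The sketch below uses Python's standard-library parser to test well-formedness, and a deliberately tiny, hypothetical content model (`TOY_SCHEMA`) to stand in for a DTD or schema. A real schema language also constrains order, cardinality, and attributes, so this is an illustration of the idea rather than a validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML (tags nest and close properly)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical content model standing in for a DTD or schema:
# each element name maps to the child element names it may contain.
TOY_SCHEMA = {
    "poem": {"title", "stanza"},
    "title": set(),
    "stanza": {"line"},
    "line": set(),
}


def is_valid(text, schema=TOY_SCHEMA):
    """Return True if text is well-formed and obeys the toy content model."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    for parent in root.iter():
        if parent.tag not in schema:
            return False
        if any(child.tag not in schema[parent.tag] for child in parent):
            return False
    return True


good = "<poem><title>Song</title><stanza><line>One</line></stanza></poem>"
unclosed = "<poem><title>Song</stanza></poem>"   # mismatched end tag
misplaced = "<poem><line>One</line></poem>"      # parses, but breaks the model

for sample in (good, unclosed, misplaced):
    print(is_well_formed(sample), is_valid(sample))
```

The `misplaced` sample is the instructive case: it is well-formed, because every tag nests and closes, yet it is not valid against the model, because a `line` may only appear inside a `stanza`.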

Key theories

Descriptive over procedural markup
Coombs, Renear, and DeRose argued that descriptive markup, which names the logical role of a piece of text, serves scholarship better than procedural markup, which instructs a processor how to format it, because it preserves meaning and supports reuse.
Generic coding and separation of concerns
Separating a document's logical structure from its presentation lets a single encoded source drive analysis, search, and multiple renderings, a principle inherited from SGML and carried into XML.
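As a minimal sketch of this principle, the example below encodes a short letter once and derives a plain-text rendering, an HTML rendering, and a simple query from the same source. The element names (`letter`, `salute`, `signed`, `title`) and the `HTML_TAGS` mapping are invented for illustration and are not taken from any particular schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# One descriptively encoded source; nothing here says how it should look.
SOURCE = (
    "<letter>"
    "<salute>Dear Anna,</salute>"
    "<p>I read <title>Middlemarch</title> last week.</p>"
    "<signed>Tom</signed>"
    "</letter>"
)

doc = ET.fromstring(SOURCE)


def to_plain_text(root):
    """Rendering 1: one line of plain text per top-level block."""
    return "\n".join("".join(block.itertext()) for block in root)


# Presentation decisions live here, outside the encoded source.
HTML_TAGS = {"salute": "p", "p": "p", "signed": "p", "title": "cite"}


def to_html(elem):
    """Rendering 2: map each logical role to an HTML element."""
    tag = HTML_TAGS.get(elem.tag, "div")
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    return f'<{tag} class="{elem.tag}">{inner}</{tag}>'


def titles(root):
    """Analysis: list every work title mentioned, however it is displayed."""
    return [t.text for t in root.iter("title")]


print(to_plain_text(doc))
print(to_html(doc))
print(titles(doc))
```

Changing how titles are displayed means editing `HTML_TAGS`, not the letter itself, and the `titles` query keeps working either way.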
Hierarchical document model
XML and its predecessors model documents as ordered trees, which is powerful for nested structure but strained by features that overlap across the hierarchy.
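To make the ordered-tree model concrete, the following sketch parses a small document with Python's standard library and walks it in document order. The sample element names are illustrative only.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<chapter>"
    "<head>Arrival</head>"
    "<p>First paragraph.</p>"
    "<p>Second, with a <name>place</name> in it.</p>"
    "</chapter>"
)


def outline(elem, depth=0):
    """Return (depth, tag) pairs in document order (a pre-order tree walk)."""
    rows = [(depth, elem.tag)]
    for child in elem:  # children are returned in source order
        rows.extend(outline(child, depth + 1))
    return rows


tree = ET.fromstring(SOURCE)
for depth, tag in outline(tree):
    print("  " * depth + tag)
```

Every element has exactly one parent and a fixed position among its siblings, which is what makes nested structure easy to query and what leaves no natural place for a feature that belongs to two branches at once.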

History

Generic coding ideas of the late 1960s led to IBM's GML and then to SGML, standardized as ISO 8879 in 1986. The 1987 paper by Coombs, Renear, and DeRose made the case for descriptive markup in scholarship. XML, a streamlined SGML profile, was published by the W3C in 1998. The TEI Guidelines moved to XML with P4 (2002), and XML is the basis of TEI P5 and of most humanities encoding.

Debates

The adequacy of tree-based markup
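Because an XML document is a single hierarchy, structures that overlap in real texts, such as a sentence that runs across a verse-line or page boundary, cannot both be encoded as nested elements. Encoders fall back on workarounds such as empty milestone elements or stand-off annotation, and the problem has fueled research into alternative or supplementary markup models.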
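The overlap problem and one common workaround can be sketched as follows. The first string tries to nest a sentence (`s`) that crosses a verse-line (`l`) boundary and fails to parse; the second keeps the line hierarchy and marks the sentence with empty milestone elements. The names `sStart` and `sEnd` are hypothetical, chosen for this example rather than drawn from any standard.

```python
import xml.etree.ElementTree as ET

# The <s> and <l> elements overlap, so the tags cannot nest.
OVERLAPPING = (
    "<lg><l><s>The sentence starts here</l>"
    "<l>and ends here.</s></l></lg>"
)

# Milestone workaround: <l> stays the hierarchy, and empty elements
# mark where sentence 1 begins and ends.
MILESTONES = (
    "<lg>"
    "<l><sStart n='1'/>The sentence starts here</l>"
    "<l>and ends here.<sEnd n='1'/></l>"
    "</lg>"
)


def parses(text):
    """Return True if text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def walk(elem):
    """Yield elements and text chunks in document order."""
    yield elem
    if elem.text:
        yield elem.text
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield child.tail


def sentence_text(root, n):
    """Rebuild sentence n from the text between its two milestones."""
    parts, inside = [], False
    for item in walk(root):
        if isinstance(item, str):
            if inside:
                parts.append(item)
        elif item.tag == "sStart" and item.get("n") == n:
            inside = True
        elif item.tag == "sEnd" and item.get("n") == n:
            inside = False
    # Chunks from different lines are joined with a space.
    return " ".join(parts)


print(parses(OVERLAPPING))
print(parses(MILESTONES))
print(sentence_text(ET.fromstring(MILESTONES), "1"))
```

The cost of the workaround is visible in the code: the sentence is no longer an element the parser knows about, so software has to reconstruct it by scanning between markers.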

Key figures

  • James H. Coombs
  • Allen Renear
  • Steven DeRose

Seminal works

  • coombs1987
  • delittle1990

Frequently asked questions

Is XML still relevant given newer formats like JSON?
Yes. For document-centric humanities encoding, XML remains dominant because it expresses rich, validatable structure and underlies the TEI. JSON and other formats are common for data interchange, but the descriptive-markup tradition is still central to scholarly text representation.
