TEI and Document Modeling
The Text Encoding Initiative (TEI) is the most widely used standard for encoding humanities texts. Its Guidelines define several hundred elements for marking up everything from verse lines to manuscript damage, while document modeling decides which of those features a given project will capture and how.
Definition
The use of the Text Encoding Initiative guidelines to create machine-readable representations of texts, together with the analytical work of deciding which document features to model and how to constrain a project's markup.
Scope
Covers the TEI Guidelines and their use in modeling documents: the structure of TEI P5, the TEI header and metadata, customization through schemas, and the practice of deciding what to encode for a given source and purpose. Includes the institutional history of the TEI Consortium and the role of community standards in scholarly encoding.
Core questions
- What does the TEI offer that ad hoc markup does not?
- How does a project customize the TEI to fit its sources without sacrificing interchange?
- Which features of a document are worth modeling, and at what cost?
- How do the TEI header and metadata support discovery and reuse?
Key concepts
- TEI header
- Customization (ODD)
- Element set
- Schema validation
- Standoff annotation
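To make the TEI header concept concrete, the following is a minimal sketch, assuming a hypothetical sample document and using only Python's standard library. It shows how the header's required `fileDesc` parts (`titleStmt`, `publicationStmt`, `sourceDesc`) can be read by a program for discovery purposes. The sample text, the `header_summary` function, and the dictionary keys are illustrative assumptions, not part of the TEI itself.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace, bound to a prefix for path lookups.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: a minimal TEI document with a header and a short body.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Poem: a hypothetical edition</title></titleStmt>
      <publicationStmt><p>Unpublished demonstration file.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Text goes here.</p></body></text>
</TEI>"""


def header_summary(tei_xml: str) -> dict:
    """Collect a few discovery-oriented fields from the file description."""
    root = ET.fromstring(tei_xml)
    base = "tei:teiHeader/tei:fileDesc/"

    def text_of(path: str):
        # Return the flattened text of the first match, or None if absent.
        el = root.find(base + path, NS)
        return "".join(el.itertext()).strip() if el is not None else None

    return {
        "title": text_of("tei:titleStmt/tei:title"),
        "publication": text_of("tei:publicationStmt"),
        "source": text_of("tei:sourceDesc"),
    }


if __name__ == "__main__":
    for field, value in header_summary(SAMPLE).items():
        print(f"{field}: {value}")
```

Because every TEI file carries this information in the same place, a catalogue or harvester could apply a function like this across a whole collection without knowing anything about how each project encoded its text body.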
Key theories
- Community-maintained encoding standard
- The TEI is governed by a consortium that maintains an extensible, documented vocabulary, so that encoding choices are grounded in shared practice rather than reinvented for every project.
- Customization and constraint
- Because the full TEI is very large, projects define a customization (a constrained schema) that selects and adapts elements, balancing expressive coverage against consistency and validation.
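The selection step of a customization can be sketched as follows. This is a simplified illustration, assuming a hypothetical project called `demo_project`: the ODD fragment uses the TEI's `schemaSpec` and `moduleRef` elements with `include` lists, and the Python code merely reads those lists and reports elements in a document that fall outside them. A real ODD is normally compiled into a schema (for example RELAX NG) by dedicated TEI tooling, which this sketch does not attempt to reproduce; the function names and sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment: each moduleRef names a TEI module and, via
# `include`, the elements the project keeps from it.
ODD_FRAGMENT = f"""
<schemaSpec xmlns="{TEI_NS}" ident="demo_project" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"
             include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="core" include="title p"/>
  <moduleRef key="textstructure" include="TEI text body"/>
</schemaSpec>
""".strip()

# Hypothetical document that uses verse elements (lg, l) not selected above.
SAMPLE_DOC = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Demo</title></titleStmt>
      <publicationStmt><p>Demonstration only.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><lg><l>A single verse line</l></lg></body></text>
</TEI>
""".strip()


def allowed_elements(odd_xml: str) -> set:
    """Gather element names listed in moduleRef/@include.

    moduleRefs without an include list are skipped in this sketch.
    """
    root = ET.fromstring(odd_xml)
    allowed = set()
    for ref in root.iter(f"{{{TEI_NS}}}moduleRef"):
        include = ref.get("include")
        if include:
            allowed.update(include.split())
    return allowed


def elements_outside_selection(doc_xml: str, allowed: set) -> list:
    """Return sorted names of TEI elements used but not selected."""
    prefix = f"{{{TEI_NS}}}"
    outside = set()
    for el in ET.fromstring(doc_xml).iter():
        if el.tag.startswith(prefix):
            name = el.tag[len(prefix):]
            if name not in allowed:
                outside.add(name)
    return sorted(outside)


if __name__ == "__main__":
    selection = allowed_elements(ODD_FRAGMENT)
    print(elements_outside_selection(SAMPLE_DOC, selection))
```

The check covers only element names. A compiled schema would also constrain attributes, content models, and permitted values, which is where most of the consistency benefit of a customization comes from.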
History
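The TEI was launched in 1987 at a planning conference sponsored by three scholarly associations (the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing) to standardize humanities text encoding. The TEI Consortium, which now maintains the Guidelines, was established in 2000. Editions P1 through P3 were SGML-based, and P4 (2002) added XML support; TEI P5, released in 2007 and revised in regular releases since, is expressed in XML and supports customization through the ODD (One Document Does it all) framework. The standard now underlies a wide range of editions, corpora, and archives.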
Debates
- Comprehensiveness versus usability
- The breadth of the TEI makes it powerful but daunting; debate continues over how much projects should customize and whether simpler subsets better serve interoperability.
Key figures
- Lou Burnard
- C. M. Sperberg-McQueen
- Nancy Ide
- Allen Renear
Related topics
Seminal works
- tei2024
- ide1995
- burnard2014
Frequently asked questions
- Do I have to use the whole TEI to use the TEI?
- No. Projects normally define a customization that selects the elements they need and constrains how they are used. This keeps encoding manageable and consistent while remaining compatible with the wider standard.