Data Models and Query Languages
Data models are the conceptual frameworks that specify how data is structured, related, and constrained, and query languages are the formal notations used to retrieve and manipulate data within those models.
Definition
A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints; a query language is a notation for requesting information from, and modifying, a database expressed in such a model.
Scope
This area covers the abstractions used to describe data — the relational model, entity-relationship diagrams, and semistructured and document models — together with the declarative and algebraic languages, chiefly relational algebra and SQL, used to express queries and updates against them. It treats how a model defines schemas, instances, keys, and integrity constraints, and how a query language's expressiveness relates to the underlying model. It excludes the physical storage, indexing, and execution of queries (covered in query processing and optimization) and the design discipline of refining schemas (covered in database design and normalization).
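The distinction between schema, instance, keys, and integrity constraints can be made concrete with a small sketch. It uses Python's standard sqlite3 module; the department and employee tables and their columns are hypothetical names chosen for illustration, not part of any particular system.

```python
import sqlite3

# Hypothetical example schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Schema: structure and constraints, independent of any stored rows.
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- key: uniquely identifies a row
        name    TEXT NOT NULL UNIQUE   -- further integrity constraints
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
    );
""")

# Instance: the rows stored under that schema at one point in time.
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Ada', 1)")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Repeats emp_id 10, so the primary-key constraint should reject it.
duplicate_key_ok = try_insert("INSERT INTO employee VALUES (10, 'Bob', 1)")
# Refers to department 99, which does not exist, so the foreign key should reject it.
dangling_ref_ok = try_insert("INSERT INTO employee VALUES (11, 'Cy', 99)")
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Both rejected inserts leave the instance unchanged: the constraints belong to the schema, so every instance must satisfy them.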
Core questions
- How does a data model represent entities, relationships, and constraints?
- What is the relationship between the relational model and the relational algebra and SQL?
- How is a conceptual entity-relationship design mapped to a logical relational schema?
- How do semistructured and document models trade rigid schemas for flexibility?
- What determines the expressive power and limits of a query language?
Key concepts
- relation, tuple, and attribute
- schema and instance
- keys and integrity constraints
- relational algebra
- SQL
- entity-relationship diagram
- semistructured data
- document and JSON models
- data independence
Key theories
- The relational model
  - Codd's relational model represents all data as relations (sets of tuples over named attributes), separating the logical view of data from its physical storage and providing a mathematical foundation based on set theory and predicate logic.
- Relational completeness
  - A query language is relationally complete if it can express every query expressible in the relational algebra; this criterion, introduced by Codd, sets a baseline for the expressive power of practical languages such as SQL.
- The entity-relationship model
  - Chen's entity-relationship model describes the world in terms of entities, their attributes, and the relationships among them, providing a high-level conceptual design notation that can be systematically translated into relational schemas.
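As a minimal sketch of the relational model and its algebra, a relation can be represented in Python as a set of tuples, with each tuple a frozenset of (attribute, value) pairs. The helper names (relation, select, project, natural_join) and the sample data are assumptions made for this illustration; only three of the algebra's operators are shown.

```python
def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def select(rel, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in rel if predicate(dict(t))}


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse in a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join: combine tuples that agree on every shared attribute."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result


# Hypothetical sample relations.
employee = relation(
    {"emp": "Ada", "dept": "R"},
    {"emp": "Bob", "dept": "R"},
    {"emp": "Cy", "dept": "S"},
)
department = relation(
    {"dept": "R", "city": "Oslo"},
    {"dept": "S", "city": "Lima"},
)

# Project onto emp after selecting city = 'Oslo' from the join of the two relations.
joined = natural_join(employee, department)
oslo_staff = project(select(joined, lambda t: t["city"] == "Oslo"), {"emp"})

# Set semantics: projecting three employees onto dept yields two tuples, not three.
depts = project(employee, {"dept"})
```

Because every operator takes relations and returns a relation, expressions compose freely, which is the property that makes the algebra a usable yardstick for relational completeness.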
Practical relevance
Data models and query languages underlie essentially all information systems. The relational model and SQL underpin enterprise databases, financial systems, and web back ends; entity-relationship modeling structures requirements analysis; and document and semistructured models support flexible storage of web, log, and configuration data.
History
Early databases used hierarchical and network (CODASYL) models, in which programs navigated records along paths tied to physical storage. Codd's 1970 relational model introduced data independence and high-level, set-oriented operations on relations in place of record-at-a-time navigation; this led to the System R and Ingres prototypes in the 1970s and to SQL, which was standardized by ANSI in 1986 and by ISO in 1987. Chen's 1976 entity-relationship model added a conceptual design layer, and semistructured and document models emerged with the web and XML in the late 1990s.
Key figures
- Edgar F. Codd
- Peter Chen
- Jeffrey D. Ullman
- Jennifer Widom
Seminal works
- codd1970
- chen1976
- silberschatz2019
Frequently asked questions
- What is the difference between relational algebra and SQL?
  - Relational algebra is a procedural mathematical language of operators (selection, projection, join, union, etc.) over relations, used to reason about and optimize queries. SQL is the practical declarative language used in real systems; it is based on relational algebra and calculus but adds features such as grouping, aggregation, null handling, and multiset (bag) semantics.
- Why model data conceptually before designing tables?
  - Conceptual models such as entity-relationship diagrams capture requirements in terms users understand (things and the relationships between them), independent of implementation. Translating a validated conceptual design into relational tables reduces redundancy and design errors compared with inventing tables directly.
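The SQL features named in the first answer above (bag semantics, grouping and aggregation, and null handling) can be observed directly. The following sketch uses Python's standard sqlite3 module with a hypothetical employee table; the behavior shown is SQLite's, and other systems may differ in details.

```python
import sqlite3

# Hypothetical table and rows, chosen only to make the three features visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "R", 100), ("Bob", "R", 80), ("Cy", "S", None)],  # None is stored as NULL
)

# Bag semantics: SQL keeps duplicate rows unless DISTINCT is requested,
# whereas projection in the set-based relational algebra removes them.
bag = conn.execute("SELECT dept FROM employee ORDER BY dept").fetchall()
as_set = conn.execute("SELECT DISTINCT dept FROM employee ORDER BY dept").fetchall()

# Grouping and aggregation: one output row per department.
per_dept = conn.execute(
    "SELECT dept, COUNT(*), SUM(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

# Null handling: comparing NULL with a value is neither true nor false,
# so Cy's row satisfies neither the predicate nor its negation.
low = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary < 90"
).fetchone()[0]
not_low = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE NOT (salary < 90)"
).fetchone()[0]
```

Here the two complementary predicates together cover only two of the three rows, which is the kind of behavior the two-valued logic of the basic relational algebra does not model.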