ScholarGate

Clinical Data Warehouse Design and Architecture

A clinical data warehouse is an integrated, query-oriented repository that consolidates data from a health system's transactional sources so they can be analyzed without disrupting operational care systems. Its design and architecture determine how source data are extracted, modeled, and exposed for research, quality measurement, and operational reporting.


Definition

Clinical data warehouse design is the architecture and engineering of integrated repositories that consolidate health data from multiple operational sources into a structure optimized for querying, analysis, and reuse rather than for transactional care.

Scope

This topic covers the architectural patterns behind clinical data warehouses: the separation of analytic from transactional systems, extract-transform-load (ETL) pipelines, dimensional versus normalized modeling, and the use of common data models to make queries portable. It treats warehouse design as an informatics and data-engineering topic, not as operational instructions for any specific platform.

Key concepts

  • Separation of analytic and transactional (OLAP vs OLTP) workloads
  • Extract-transform-load (ETL) pipelines
  • Dimensional modeling (star and snowflake schemas)
  • Normalized (third-normal-form) enterprise warehouse design
  • Common data models
  • Data marts
  • Metadata and data lineage
  • Slowly changing dimensions
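
Of the concepts listed above, slowly changing dimensions are perhaps the least self-explanatory. The sketch below illustrates one common variant, often called Type 2, in which a change to a dimension attribute closes the current row and appends a new one so that history is preserved. The ProviderRow class, its field names, and the sample provider are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderRow:
    provider_id: str                 # natural key from the source system (hypothetical)
    department: str                  # attribute whose history is tracked
    valid_from: str                  # ISO date on which this version took effect
    valid_to: Optional[str] = None   # None marks the current version


def apply_change(history: List[ProviderRow], provider_id: str,
                 department: str, effective: str) -> None:
    """Type 2 update: close the current version and append a new one."""
    current = next((r for r in history
                    if r.provider_id == provider_id and r.valid_to is None), None)
    if current is not None:
        if current.department == department:
            return  # nothing changed, so the current version stays open
        current.valid_to = effective
    history.append(ProviderRow(provider_id, department, effective))


def department_on(history: List[ProviderRow], provider_id: str,
                  date: str) -> Optional[str]:
    """Return the attribute value that was in effect on a given ISO date."""
    for r in history:
        if (r.provider_id == provider_id and r.valid_from <= date
                and (r.valid_to is None or date < r.valid_to)):
            return r.department
    return None


history: List[ProviderRow] = []
apply_change(history, "DR7", "Cardiology", "2020-01-01")
apply_change(history, "DR7", "Oncology", "2022-06-01")
```

Because both versions are retained, department_on(history, "DR7", "2021-03-15") resolves to the earlier department, so an analysis of past encounters is attributed to the department that applied at the time.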

Mechanisms

Operational systems such as electronic health records are optimized for fast individual transactions, which makes them poorly suited to large analytic queries. A clinical data warehouse addresses this by periodically extracting data from those sources, transforming and cleaning them, and loading them into a separate repository structured for analysis. Two influential design traditions inform the modeling layer: the normalized enterprise-warehouse approach associated with Inmon, and the dimensional star-schema approach associated with Kimball, which organizes data into fact and dimension tables for efficient aggregation. In research settings, platforms such as i2b2 organize patient data around a star schema and a controlled ontology so that investigators can query cohorts. Mapping the warehouse to a common data model lets the same query run across institutions.
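
The extract, transform, and load steps described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the schema of i2b2 or of any other platform: the source rows, the table names (dim_patient, fact_observation), and the cleaning rules are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might arrive from a transactional source system.
SOURCE_ROWS = [
    {"patient_id": "P1", "sex": "f", "code": "GLU", "value": "5.4", "date": "2023-01-05"},
    {"patient_id": "P1", "sex": "f", "code": "GLU", "value": "6.1", "date": "2023-02-10"},
    {"patient_id": "P2", "sex": "M", "code": "GLU", "value": "",    "date": "2023-01-07"},
    {"patient_id": "P2", "sex": "M", "code": "GLU", "value": "7.2", "date": "2023-03-01"},
]


def transform(rows):
    """Clean source rows: drop missing values, normalize sex, cast numerics."""
    for row in rows:
        if not row["value"]:
            continue
        yield {**row, "sex": row["sex"].upper(), "value": float(row["value"])}


def load(conn, rows):
    """Load cleaned rows into one dimension table and one fact table."""
    conn.executescript("""
        CREATE TABLE dim_patient (
            patient_key INTEGER PRIMARY KEY, patient_id TEXT UNIQUE, sex TEXT);
        CREATE TABLE fact_observation (
            patient_key INTEGER REFERENCES dim_patient(patient_key),
            code TEXT, value REAL, obs_date TEXT);
    """)
    for row in rows:
        conn.execute("INSERT OR IGNORE INTO dim_patient (patient_id, sex) VALUES (?, ?)",
                     (row["patient_id"], row["sex"]))
        (key,) = conn.execute("SELECT patient_key FROM dim_patient WHERE patient_id = ?",
                              (row["patient_id"],)).fetchone()
        conn.execute("INSERT INTO fact_observation VALUES (?, ?, ?, ?)",
                     (key, row["code"], row["value"], row["date"]))


def mean_value_by_sex(conn, code):
    """Aggregate the fact table, grouped by a dimension attribute."""
    return dict(conn.execute("""
        SELECT d.sex, AVG(f.value)
        FROM fact_observation f JOIN dim_patient d USING (patient_key)
        WHERE f.code = ? GROUP BY d.sex
    """, (code,)).fetchall())


conn = sqlite3.connect(":memory:")
load(conn, transform(SOURCE_ROWS))
result = mean_value_by_sex(conn, "GLU")
```

The aggregation touches only the analytic copy, which is the point of the separation: the fact table holds the measurements, the dimension table holds the attributes used for grouping, and the source system is not queried again after extraction.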

Clinical relevance

The architecture of a clinical data warehouse shapes what analyses are feasible and how reliably cohorts can be identified, which in turn affects quality measurement and research that informs care. Understanding warehouse design helps users interpret where analytic data come from and what transformations they have undergone. This is a reference description of infrastructure and does not provide individual clinical guidance.

History

Data warehousing emerged in general information systems in the late twentieth century, with Inmon's normalized enterprise model and Kimball's dimensional model framing the major design debate. Health care adopted these patterns as electronic records accumulated reusable data. Research-oriented platforms such as i2b2, described by Murphy and colleagues in 2010, demonstrated warehouse architectures tailored to clinical cohort discovery, and common data models later standardized cross-institutional querying.

Debates

Normalized enterprise warehouse versus dimensional modeling
Designers differ on whether to build a normalized, integrated enterprise warehouse (the Inmon tradition) from which data marts are derived, or to build dimensional star-schema marts directly (the Kimball tradition); the choice trades integration and flexibility against query simplicity and speed.
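
The trade-off can be made concrete with a small sketch, again using Python's sqlite3 module. It builds a normalized core in which each fact is stored once, then derives a flattened mart from it. The table names and sample rows are hypothetical, and the single wide mart table is a simplification of a full star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized core: each fact is stored once and linked by keys.
    CREATE TABLE patient   (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
    CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY,
                            patient_id INTEGER REFERENCES patient(patient_id),
                            admit_date TEXT);
    CREATE TABLE diagnosis (encounter_id INTEGER REFERENCES encounter(encounter_id),
                            dx_code TEXT);

    INSERT INTO patient VALUES (1, 1950), (2, 1980);
    INSERT INTO encounter VALUES
        (10, 1, '2023-01-02'), (11, 2, '2023-01-09'), (12, 1, '2023-04-20');
    INSERT INTO diagnosis VALUES (10, 'E11'), (11, 'I10'), (12, 'E11'), (12, 'I10');

    -- Data mart derived from the core: one wide row per diagnosis event.
    CREATE TABLE mart_diagnosis AS
        SELECT d.dx_code, e.admit_date, p.patient_id, p.birth_year
        FROM diagnosis d
        JOIN encounter e ON e.encounter_id = d.encounter_id
        JOIN patient p ON p.patient_id = e.patient_id;
""")

# Against the mart, a cohort count needs no joins.
patients_by_dx = dict(conn.execute(
    "SELECT dx_code, COUNT(DISTINCT patient_id) FROM mart_diagnosis GROUP BY dx_code"
).fetchall())
```

The normalized tables avoid redundancy and can feed several marts, at the cost of joins in every analytic query. The mart repeats patient attributes on each row but answers the cohort question with a single-table query.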

Key figures

  • William H. Inmon
  • Ralph Kimball
  • Shawn N. Murphy
  • Isaac Kohane

Seminal works

  • inmon-2005
  • kimball-ross-2013
  • murphy-2010

Frequently asked questions

Why not just run analytics directly on the electronic health record database?
Transactional systems are tuned for many small reads and writes that support live care, so large analytic queries can slow them down and risk affecting clinical operations. A data warehouse separates analysis from care delivery and structures the data for efficient querying.

What is a common data model and why does it matter for warehouse design?
A common data model is a shared schema and vocabulary that multiple institutions adopt for their warehouses. Mapping to it lets the same analytic query run across sites without rewriting, which supports multi-institution research and reproducibility.
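
As a rough illustration of that portability, the sketch below maps two sites' local lab codes onto shared concept names and then runs one cohort function, unchanged, against both. The site maps, field names, codes, and threshold are hypothetical; real common data models ship curated vocabularies rather than hand-written dictionaries.

```python
# Hypothetical local-to-shared vocabulary maps for two institutions.
SITE_A_MAP = {"GLU-SER": "glucose", "HBA1C": "hba1c"}
SITE_B_MAP = {"LAB0042": "glucose", "LAB0107": "hba1c"}


def to_common_model(local_rows, code_map):
    """Rename fields and translate local codes into the shared schema."""
    common = []
    for row in local_rows:
        concept = code_map.get(row["code"])
        if concept is None:
            continue  # unmapped codes are skipped here; a real pipeline would log them
        common.append({"person_id": row["pid"], "concept": concept, "value": row["val"]})
    return common


def cohort_with_high_value(common_rows, concept, threshold):
    """One query definition, written once against the shared schema."""
    return sorted({r["person_id"] for r in common_rows
                   if r["concept"] == concept and r["value"] > threshold})


site_a = to_common_model(
    [{"pid": "A1", "code": "GLU-SER", "val": 9.8},
     {"pid": "A2", "code": "HBA1C", "val": 5.1}],
    SITE_A_MAP)
site_b = to_common_model(
    [{"pid": "B1", "code": "LAB0042", "val": 4.9},
     {"pid": "B2", "code": "LAB0042", "val": 11.3},
     {"pid": "B3", "code": "XYZ", "val": 1.0}],
    SITE_B_MAP)

cohort_a = cohort_with_high_value(site_a, "glucose", 7.0)
cohort_b = cohort_with_high_value(site_b, "glucose", 7.0)
```

Only the mapping step differs between sites. The cohort definition itself is identical, which is what allows results from several institutions to be compared or pooled.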
