ScholarGate
Asystent

Data Warehousing and OLAP

Data warehousing consolidates data from many sources into a query-optimized store for analysis, and online analytical processing (OLAP) provides the multidimensional model and operations that let analysts explore that data interactively.

Znajdź temat z PaperMindWkrótceFind papers & topics
Tools & resources
Pobierz slajdy
Learn & explore
WideoWkrótce

Definition

A data warehouse is a consolidated, query-optimized repository of integrated historical data drawn from multiple operational sources for analysis; OLAP is the technology that organizes such data into a multidimensional model and supports fast aggregate queries and interactive exploration.

Scope

This topic covers the analytical side of data management: the data warehouse as an integrated, subject-oriented store separate from operational systems; the extract-transform-load (ETL) pipeline that populates it; dimensional modeling with star and snowflake schemas of facts and dimensions; the multidimensional data cube and OLAP operations (roll-up, drill-down, slice, dice, pivot); and the contrast between analytical (OLAP) and transactional (OLTP) workloads. It excludes transactional concurrency control and the general NoSQL stores, which are adjacent topics.

Core questions

  • How does a data warehouse differ from an operational (OLTP) database?
  • What is dimensional modeling, and how do star and snowflake schemas organize facts and dimensions?
  • How does the data cube generalize group-by and support multidimensional analysis?
  • What do the OLAP operations roll-up, drill-down, slice, dice, and pivot do?
  • How is the ETL process used to integrate and load warehouse data?

Key concepts

  • data warehouse
  • extract-transform-load (ETL)
  • star and snowflake schemas
  • fact and dimension tables
  • data cube
  • roll-up, drill-down, slice, dice, pivot
  • materialized views
  • OLAP versus OLTP

Key theories

Dimensional modeling
Warehouses are commonly modeled with star and snowflake schemas in which a central fact table of measurements references surrounding dimension tables (time, product, location), optimizing for the aggregate, read-heavy queries analysts run.
The data cube and OLAP operations
The data-cube operator generalizes group-by to compute aggregates over all combinations of dimensions, supporting roll-up, drill-down, slice, dice, and pivot for interactive multidimensional analysis.
Separation of OLAP from OLTP
Analytical workloads scan and aggregate large volumes of historical data, which differs fundamentally from short transactional updates, motivating a separate, integrated, read-optimized warehouse populated by ETL from operational systems.

Clinical relevance

Data warehousing and OLAP are the foundation of business intelligence: organizations consolidate operational data into warehouses and use OLAP to analyze sales, finance, and operations across dimensions such as time, region, and product, making these technologies central to data-driven decision-making.

History

Data warehousing emerged in the early 1990s as organizations separated analytical querying from operational databases; Kimball's dimensional-modeling approach and Inmon's enterprise-warehouse approach shaped the field. The data-cube operator (Gray et al., 1997) formalized multidimensional aggregation, and Chaudhuri and Dayal's 1997 overview consolidated warehousing and OLAP technology that underlies modern analytics platforms.

Key figures

  • Surajit Chaudhuri
  • Umeshwar Dayal
  • Jim Gray
  • Ralph Kimball

Related topics

Seminal works

  • chaudhuri1997
  • gray1997
  • kimball2013

Frequently asked questions

What is the difference between OLAP and OLTP?
OLTP (online transaction processing) handles many short read-write transactions, such as placing an order, with an emphasis on consistency and fast updates. OLAP (online analytical processing) handles complex read-mostly queries that aggregate large volumes of historical data for analysis. Warehouses are designed for OLAP and are kept separate from the OLTP systems that feed them.
Why use a star schema instead of a fully normalized design?
Analytical queries typically join a large fact table to several dimension tables and aggregate. A star schema deliberately denormalizes dimensions to minimize joins and make these aggregate queries fast and intuitive. The redundancy that normalization would remove is acceptable here because the warehouse is loaded in bulk and queried far more than it is updated.

Methods for this concept

Related concepts