NoSQL Data Stores
NoSQL data stores are non-relational databases — key-value, document, wide-column, and graph — that adopt flexible data models and distribution strategies to scale horizontally and stay available at the cost of some relational guarantees.
Definition
A NoSQL data store is a database that departs from the relational model, organizing data as key-value pairs, documents, wide sparse columns, or graphs, and typically distributing it across a cluster with replication and relaxed consistency to achieve scalability and availability.
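As a rough illustration of the four data shapes named in this definition, the sketch below models each one with plain Python structures. All names and sample records (`kv_store`, `documents`, `wide_rows`, `graph`, `find_by_city`, `reachable`) are hypothetical and chosen only for illustration; real systems add persistence, indexing, and distribution on top of these shapes.

```python
from collections import deque

# Key-value: an opaque value looked up by a known key.
kv_store = {"session:42": b"opaque-session-bytes"}

# Document: self-describing nested records, queried by their fields.
documents = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Taipei"}},
]

# Wide-column: row key -> column family -> sparse columns.
# Rows need not share the same columns.
wide_rows = {
    "user#1": {"profile": {"name": "Ada"}, "metrics": {"2024-01-01": 3}},
    "user#2": {"profile": {"name": "Lin", "email": "lin@example.com"}},
}

# Graph: nodes and directed edges stored as an adjacency list.
graph = {"ada": ["lin"], "lin": ["sam"], "sam": []}


def find_by_city(docs, city):
    """Document-style query: filter records on a nested field."""
    return [d["_id"] for d in docs if d.get("address", {}).get("city") == city]


def reachable(adjacency, start):
    """Graph-style query: breadth-first traversal from a start node."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}
```

The key-value entry is only ever fetched whole by key, whereas the document and graph shapes expose structure that queries can inspect, which is the main practical difference among the models.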
Scope
This topic covers the main categories of NoSQL systems and their data models: key-value stores for simple lookups, document stores for nested records, wide-column stores for large, sparse tables, and graph databases for highly connected data. It treats the design choices common to these systems (sharding, replication, and tunable consistency) and the access patterns each model suits. It excludes general consistency theory (the CAP theorem and consistency models) and data-processing frameworks, which are adjacent topics.
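One common way to combine sharding and replication is to place keys on a hash ring and store each key on the next few distinct nodes. The following is a minimal sketch of that idea, not the placement scheme of any particular product; the class name `HashRing`, the `vnodes` parameter, and the use of MD5 as a stable hash are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets several points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, n: int = 3):
        """Return up to n distinct nodes that should hold the key."""
        n = min(n, self._node_count)
        if n <= 0:
            return []
        # Walk clockwise from the key's position, collecting distinct nodes.
        start = bisect.bisect_right(self._points, _hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:1", n=3)  # first entry acts as the primary shard
```

Because placement depends only on the key and the node list, any client can compute where a key lives without consulting a central directory, and adding a node moves only the keys adjacent to its ring points.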
Core questions
- What data model does each NoSQL category (key-value, document, wide-column, graph) provide?
- What access patterns and workloads suit each category?
- How do NoSQL stores shard and replicate data for scale and availability?
- What relational features (joins, transactions, schemas) do they relax, and why?
- How do tunable consistency settings let applications balance latency and freshness?
Key concepts
- key-value store
- document store
- wide-column store
- graph database
- sharding and replication
- tunable consistency
- schema flexibility
- denormalized access patterns
Key theories
- Key-value and wide-column models
- Key-value stores map opaque keys to values for simple, fast lookups, while wide-column stores organize data into rows whose columns, grouped into column families, are flexible and sparse. Dynamo exemplifies the former and Bigtable the latter; both scale across large clusters through sharding and replication.
- Document and graph models
- Document stores hold self-describing nested records (often JSON) and support queries over their structure, while graph databases model entities and relationships as nodes and edges optimized for traversal of highly connected data.
- Relaxed guarantees for scale
- To scale horizontally and remain available, many NoSQL stores relax schemas, drop multi-row transactions and joins, and offer tunable or eventual consistency, shifting some responsibility for integrity to the application.
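Tunable consistency is often expressed with three numbers: N replicas per key, W acknowledgements required for a write, and R replicas consulted for a read. When R + W > N, every read set overlaps the most recent successful write set, so the read can return the latest value; smaller values lower latency but allow stale reads. The sketch below simulates this under simplifying assumptions: the names `Replica` and `QuorumStore` are hypothetical, the caller chooses which replicas respond via `targets`, and a single in-process counter stands in for the timestamps or vector clocks a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)


class QuorumStore:
    """Hypothetical N/W/R quorum simulation; not a real client API."""

    def __init__(self, n=3, w=2, r=2):
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("w and r must be between 1 and n")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self._version = 0  # simplification: one global version counter

    def put(self, key, value, targets):
        """Write to the replicas in targets; fail if fewer than w respond."""
        if len(set(targets)) < self.w:
            raise RuntimeError("write quorum not met")
        self._version += 1
        for i in set(targets):
            self.replicas[i].write(key, self._version, value)

    def get(self, key, targets):
        """Read from the replicas in targets and return the newest value."""
        if len(set(targets)) < self.r:
            raise RuntimeError("read quorum not met")
        answers = [self.replicas[i].read(key) for i in set(targets)]
        answers = [a for a in answers if a is not None]
        return max(answers, key=lambda a: a[0])[1] if answers else None


# R + W > N: any two replicas overlap the last write set {0, 1}.
strict = QuorumStore(n=3, w=2, r=2)
strict.put("k", "v1", targets=[0, 1, 2])
strict.put("k", "v2", targets=[0, 1])
fresh = strict.get("k", targets=[1, 2])  # "v2"

# R + W <= N: a read may miss the replica that took the last write.
loose = QuorumStore(n=3, w=1, r=1)
loose.put("k", "v1", targets=[0, 1, 2])
loose.put("k", "v2", targets=[0])
stale = loose.get("k", targets=[2])  # "v1"
```

The second configuration answers after contacting a single replica, which is the latency benefit, and the stale result is the freshness cost that the application must be prepared to tolerate.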
Practical relevance
NoSQL stores are widely used building blocks of internet services. Key-value and wide-column stores back session state, catalogs, and time-series data at massive scale; document stores fit flexible application data; and graph databases power recommendation and fraud-detection systems. Knowing their data models is therefore essential for data engineering.
History
The NoSQL movement grew from internet companies' need to scale beyond single-node relational databases. Google's Bigtable (presented in 2006, journal version in 2008) introduced the wide-column model, and Amazon's Dynamo (2007) introduced the highly available, eventually consistent key-value model. These influential designs spawned a generation of open-source key-value, document, wide-column, and graph databases in the late 2000s and 2010s.
Key figures
- Werner Vogels
- Jeffrey Dean
- Sanjay Ghemawat
Related topics
Seminal works
- decandia2007
- chang2008
Frequently asked questions
- How do I choose among key-value, document, wide-column, and graph stores?
- Match the model to the access pattern: key-value for simple lookups by a known key; document for self-contained, nested records queried by their fields; wide-column for very large, sparse tables with predictable row-key access; and graph for data dominated by relationships and traversals, such as social networks or recommendations.
- Do NoSQL stores support transactions?
- Historically many NoSQL stores offered only single-key atomic operations and no multi-record transactions, trading them for scalability. That has changed: a number of modern NoSQL and 'NewSQL' systems now provide multi-document or even distributed transactions, so transactional support varies widely and should be checked per system.