ScholarGate

Big Data and NoSQL Systems

Big data and NoSQL systems are data-management technologies built for data whose volume, velocity, and variety strain traditional relational databases. They trade strict relational guarantees for horizontal scalability, flexible schemas, and high availability.


Definition

Big data systems are data-management platforms engineered for data sets too large, fast, or varied for traditional single-node databases; NoSQL systems are non-relational stores that adopt flexible data models and relaxed consistency to achieve horizontal scalability and availability.

Scope

This area covers data systems designed for massive scale: NoSQL stores (key-value, document, wide-column, and graph) and their flexible data models; data-parallel processing frameworks descended from MapReduce; the consistency-availability trade-offs captured by the CAP theorem and the spectrum of consistency models; and data warehousing and OLAP for large-scale analytics. It treats how these systems relax or reorganize relational assumptions for scale. It excludes the internals of distributed commit and parallel query execution, which are covered in the distributed and parallel databases area.

Core questions

  • What scalability and flexibility needs drove the move beyond relational databases?
  • What data models do the main NoSQL categories provide?
  • How do data-parallel frameworks process huge data sets across clusters?
  • What consistency-availability trade-offs does the CAP theorem describe?
  • How do data warehouses and OLAP support large-scale analytical queries?

Key concepts

  • key-value, document, wide-column, graph stores
  • horizontal scalability
  • schema flexibility
  • MapReduce and data-parallel processing
  • CAP theorem
  • eventual consistency
  • BASE versus ACID
  • data warehousing and OLAP
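
To make the four store categories concrete, here is a minimal sketch that writes one made-up user record in each shape using plain Python structures. Every key, field name, and value is a hypothetical chosen for illustration; real stores add their own APIs, indexing, and storage layouts on top of these shapes.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: a nested, self-describing record; fields may differ per document.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "tags": ["admin", "beta"],
}

# Wide-column: row key -> column family -> column -> value; rows can be sparse.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Graph: nodes plus labeled edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Alan"}}
edges = [(42, "FOLLOWS", 7)]


def followed_by(user_id):
    """Return the names of the users that user_id follows."""
    return [nodes[dst]["name"] for src, label, dst in edges
            if src == user_id and label == "FOLLOWS"]


print(json.loads(key_value["user:42"])["city"])
print(document["address"]["city"])
print(wide_column["user:42"]["profile"]["city"])
print(followed_by(42))
```

The key-value form can only be fetched whole by its key, while the document and wide-column forms expose inner fields, and the graph form makes relationships the primary thing to query.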

Key theories

Horizontally scalable NoSQL stores
NoSQL systems abandon the single-node relational model in favor of key-value, document, wide-column, or graph models that shard and replicate across commodity clusters, prioritizing scalability and availability over rich querying and strong consistency.
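
As a rough illustration of sharding and replication, the sketch below assigns each key to a primary node by hashing and copies it to the next nodes in a fixed order. The function names, node names, and the replication factor of 3 are assumptions made for this example, not any particular system's API. Simple modulo placement like this remaps most keys when the cluster size changes, which is why stores such as Dynamo use consistent hashing instead.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to an integer that is identical across runs and machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    """Pick a primary node by hash, then the following nodes as replicas."""
    if not nodes:
        raise ValueError("need at least one node")
    count = min(replication_factor, len(nodes))
    start = stable_hash(key) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
for key in ["user:1", "user:2", "cart:99"]:
    print(key, "->", replicas_for(key, cluster))
```

Because placement depends only on the key and the node list, any client can compute where a key lives without consulting a central directory.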
Data-parallel processing
Frameworks following the MapReduce model express large-scale computations as parallel map and reduce phases over partitioned data, hiding the complexity of distribution, scheduling, and fault tolerance from the programmer.
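
The map and reduce phases can be sketched on a single machine with the classic word-count example. This is a toy model under stated assumptions: the function names are hypothetical, the "splits" are two short strings, and the shuffle step that a real framework performs across the network is simulated with an in-memory dictionary. Distribution, scheduling, and fault tolerance are deliberately left out.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, shuffle by key, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data big clusters", "data parallel data"]
print(word_count(splits))
```

Each call to `map_phase` touches only its own split and each call to `reduce_phase` touches only one key's values, which is what lets a framework run them in parallel on different machines.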
CAP trade-off
The CAP theorem states that a distributed data store cannot simultaneously guarantee all three of consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, designers must choose what to give up while a partition lasts: consistency or availability.
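
The choice can be illustrated with a toy store that keeps two replicas of a single value. The class, the `partitioned` flag, and the "CP" and "AP" mode labels are hypothetical constructs for this sketch. In CP mode the store refuses writes during a partition, and in AP mode it accepts them and lets the replicas diverge.

```python
class Replica:
    """Holds one copy of a single stored value."""

    def __init__(self):
        self.value = None


class TwoReplicaStore:
    """Two replicas of one register; mode is 'CP' or 'AP'."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica: Replica, value) -> bool:
        """Apply a write arriving at `replica`; return True if accepted."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.value = other.value = value  # replicate synchronously
            return True
        if self.mode == "CP":
            return False  # refuse: replicas stay consistent, write is unavailable
        replica.value = value  # accept: store stays available, replicas diverge
        return True


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(store.a, "v1")
    store.partitioned = True  # simulate a network partition
    accepted = store.write(store.a, "v2")
    print(mode, accepted, store.a.value, store.b.value)
```

In the AP run the two replicas hold different values after the partition, so something must reconcile them once the network heals.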

Practical relevance

Big data and NoSQL systems power the data infrastructure of the modern web: key-value and wide-column stores back high-traffic services, data-parallel frameworks process logs and clickstreams at scale, and data warehouses serve business analytics, making these systems central to data engineering and large-scale applications.

History

Internet-scale workloads in the 2000s exceeded what single-node relational databases could handle. Google's MapReduce (described in 2004 and republished in 2008) and the open-source Hadoop ecosystem enabled cluster-scale data processing; Amazon's Dynamo (2007) and Google's Bigtable (2006) inspired a wave of NoSQL stores; and Brewer's CAP theorem framed the consistency-availability trade-offs these systems embody.

Debates

Strong versus eventual consistency
NoSQL systems often choose availability and eventual consistency to stay responsive under partitions, but this pushes conflict handling onto applications; the field debates when eventual consistency is acceptable versus when newer systems should restore stronger guarantees.
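
One simple way an application or store can reconcile divergent replicas is last-writer-wins (LWW), sketched below. The `Versioned` type, its fields, and the shopping-cart values are hypothetical, and the sketch assumes writer timestamps are comparable across nodes, which real clocks only approximate. LWW silently discards the losing update, so it suits data where losing a concurrent write is tolerable. Dynamo, for example, instead tracks versions with vector clocks and hands conflicting versions back to the application to merge.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: float  # writer-supplied clock reading (assumed comparable across nodes)
    node_id: str      # tie-breaker so every replica picks the same winner


def lww_merge(left: Versioned, right: Versioned) -> Versioned:
    """Keep the version with the larger (timestamp, node_id) pair."""
    return max(left, right, key=lambda v: (v.timestamp, v.node_id))


replica_a = Versioned("cart=[book]", timestamp=100.0, node_id="a")
replica_b = Versioned("cart=[book, pen]", timestamp=105.0, node_id="b")

merged = lww_merge(replica_a, replica_b)
print(merged.value)
```

The merge gives the same answer regardless of argument order, so replicas that exchange versions in any sequence converge on one value.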

Key figures

  • Jeffrey Dean
  • Sanjay Ghemawat
  • Eric Brewer
  • Werner Vogels

Seminal works

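  • Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, 2008 (dean2008)
  • DeCandia et al., "Dynamo: Amazon's Highly Available Key-value Store," SOSP 2007 (decandia2007)
  • Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed," IEEE Computer, 2012 (brewer2012)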

Frequently asked questions

Does NoSQL mean no SQL at all?
No. NoSQL is usually read as "not only SQL." It refers to data stores that are not built on the relational model and do not center on SQL, but many NoSQL systems offer SQL-like query interfaces. The term covers a broad family (key-value, document, wide-column, and graph databases) rather than a single technology.
When should I choose a NoSQL system over a relational database?
NoSQL systems are attractive when you need to scale horizontally across many machines, store flexible or rapidly evolving data, or maximize availability for simple access patterns. Relational databases remain preferable when you need rich queries, complex joins, and strong transactional consistency over structured data.
