Modeling and Management of Big Data in Databases: 1.2. NoSQL | Saylor Academy

1. Introduction

1.2. NoSQL

The existing paradigms for dealing with regular data are neither enough nor suitable to deal with Big Data requirements. For that reason, at the data storage level, the introduction of novel approaches, such as the NoSQL databases, is required. NoSQL refers to Not Only SQL, the term used for all the non-relational databases. NoSQL databases are considered schema-less, as they are designed to work without structure; however, in practice, there is a need for a self-sufficient model to define how data will be organized and retrieved from the database. To solve this requirement, some diverse NoSQL data models are proposed.

Data Models

A data model is a representation of the structure of the data for processing and organization. A data model is considered a primary element for storage, analysis and processing in storage systems.
Currently, storage systems are classified into two large groups, relational and non-relational. Within the relational, the well-known models are the Entity–Relationship (ER), Extended Entity Relationship (EER), Key-Cube and Multidimensional, among others. The objective of this article is not to present a deep study of the models considered as classic: they are well-known and do not need to be explained. We only develop a study of the models that are a novelty for Big Data.
For non-relational systems, there are the NoSQL databases; for them, the data models are classified into four main categories:

Column-oriented
Document-oriented
Graph
Key-value

Column-Oriented

In this model, data are represented in tabular form by columns and rows. The columns are identifiable by a partition key that is unique and mandatory and the rows by an optional clustering key. The primary key is the combination of the partition and clustering key. Basically, the schema of the tables consists of a set of columns, a primary key and a data type. For Database Management Systems (DBMS) that use the column-oriented data model, we can mention Accumulo, Amazon SimpleDB, Cassandra, Cloudata, Druid, Elassandra, Flink, HBase, Hortonworks, HPCC, Hypertable, IBM Informix, Kudu, MonetDB, Scylla and Splice Machine, among others.

Document-Oriented

In this model, data are stored in key-value pairs, value documents in XML, JSON or BSON formats. Each of the documents can have nested subdocuments, indexes, fields and attributes. As examples of DBMS that use the document-oriented data model, we can mention ArangoDB, Azure, BagriDB, Cloud Datastore, CouchDB, DocumentDB, Elastic, IBM Cloudant, MongoDB, NosDB, RavenDB, RethinkDB, SequoiaDB, ToroDB and UnQlite, among others.

Graph

This model consists of a graph that contains nodes and edges. A node represents an entity and an edge represents the relationship between entities. There are several graph structures: Undirected/directed, Labeled graphs, Attributed graphs, Bigdata, Multigraphs, Hypergraphs and Nested graphs, among others. Some examples of DBMS that use the graph data model are AllegroGraph, ArangoDB, Infinite Graph, GraphBase, HyperGraphDB, InfoGrid, Meronymy, Neo4j, Onyx Database, Titan, Trinity, Virtuoso OpenLink, Sparksee and WhiteDB.

Key-value

In this model, the data are represented by a key-value tuple. The key represents a unique identifier indexed to a value that represents data of arbitrary type, structure and size. Secondary keys and indexes are not supported. Aerospike, Azure Table Storage, BangDB, Berkeley DB, DynamoDB, GenieDB, KeyDB, Redis, Riak, Scalaris, Voldemort, among others are examples of DBMS that use the key-value data model.
Table 2 summarizes the main characteristics of NoSQL data models, such as its main concept, structure, techniques to create the data model, advantages and disadvantages.

Table 2. NoSQL characteristics.

Characteristic/Data Model	Column-Oriented	Document-Oriented	Graph-Oriented	Key-Value
Concept	A model that allows representing data in columns	A model that allows representing data via structured text	A model that allows representing data and their connections	A model that allows representing the data in a simple format (key and values)
Structure	Data are stored in tables	Nesting of key-value pairs	Set of data objects (nodes)	Tuple of two strings (key and value)
	Data are stored in tables	Each document identified by a unique identifier	Set of links between the objects (edges)	A key represents any entity's attribute
	Values in a column are stored consecutively	Any value can be a structured document		Values can be of any data type
		Key and value are separated by a colon ":"
		Key-value pairs are separated by commas ","
		Data enclosed in curly braces denotes documents
		Data enclosed in square brackets denotes array collection
Techniques	With compression: Lightweight encoding Bit-vector encoding Dictionary encoding Frame of reference encoding Differential encoding	Denormalized flat model	Simple direct graph Undirected multigraph Directed multigraph	NA
		Denormalized model with more structure (metadata)	Weighted graph
	With join algorithm	Shattered, equivalent to normalization (https://pdfs.semanticscholar.org/ea15/945ce9ec0c12b92794b8ace69ce44ebe40cc.pdf)	Hypergraph
	With late materialization		Nested graph
	Tuple at a time		Nested graph
Applications	Consumer data Inventory data	JSON documents XML documents	Social networks Supply-chain Medical records IT operations Transports	User profiles and their attributes
Advantages	High performance in loading and querying operations Efficient data compression and partitioning (both horizontally and vertically) Scalability Support for massive parallel processing Well-suited for Online Analytical Processing and OnLine Transaction Processing workloads	Support for multiple document types Support for atomicity, consistency, isolation and durability transactions Scalability Suitable for complex data, nested documents and arrays	Easy modeling Fast and simple querying Scalability	Easy design and implementation Fault tolerance Redundancy Scalability High speed
Disadvantages	Difficult to use wide-columns Delays in querying specific data	Information duplication across multiple documents Inconsistencies in complex designs	Lack of a standard declarative language Support to limited concurrency and parallelism	Very basic query language Some queries can only depend on the primary key

Course Introduction

Course Syllabus

Unit 1: Business Intelligence Overview

1.1: What is Business Intelligence?

Business Intelligence

Introduction to Business Intelligence

1.1.1: What Business Intelligence is Not

Frontiers of Business Intelligence and Analytics

Business Intelligence Dashboards

1.1.2: Business Intelligence vs. Competitive Intelligence

What is Competitive Intelligence?

1.1.3: From Systems Engineering to Business Engineering

Information Architecture Analysis

Systems Engineering

Business Engineering

1.2.1: Contemporary Applications

Business Intelligence in ERP

Improving Outcomes with Business Intelligence

How Businesses Use Information

1.2.2: BI Approaches for Each Lifecycle Stage

The Business Cycle

Big Data Analytics in Supply Chain Management

1.2.3: BI for Prediction

Goal-Oriented BI

Big Data Analytics

BI System Effectiveness

Data Mining Analytics for BI and Decision Support

1.3: The Future of BI

Future Trends in Information Systems

Internet Trends

Trends in Information Technology

Technology Trends in the COVID-19 Pandemic

The Future of BI

1.3.1: Adapting Business Models to Globalization and Technology

Global Business Strategies for Responding to Cultural Differences

Internationalization and the Need of Business Model Innovation

1.3.2: Maintaining the Firm-Centric Approach

Designing BI Solutions in the Era of Big Data

1.3.3: Incorporating Data from the Internet of Things (IoT)

The Internet of Things

The Cognitive Internet of Things and Big Data

Data Science in Heavy Industry and the Internet of Things

Causality and Variables

The Internet of Things is Revolutionary

Unit 1 Discussion

Unit 1 Study Resources

Unit 1 Review Video

Study Guide: Unit 1

Unit 1 Assessment

Unit 1 Assessment

Unit 2: BI as Business Support

2.1: Defining the Problem

Choice and Happiness

2.1.1: Framing Internal Client Discussions

Overview of Managerial Decision-Making

2.1.2: Drafting the Terms of Reference (TOR)

Defining the Scope of your Project

Developing Terms of Reference

2.1.3: Negotiating the Project Scope

Scope Planning

Negotiation

2.2: The Art and Science of Decision-Making

Decision-Making in Management

Decision-Making Processes in the Workplace

2.2.1: Thinking about Thinking

Experience vs. Memory

Evidence Logs and Metacognitive Logs

2.2.2: Use Analysis, or "Go with Your Gut"?

Problem Solving, Thinking, and Intelligence

Using a Heuristics Checklist

2.2.3: Decision-Making Approaches

Decision-Making Tools

2.2.4: Structuring Decision-Making Effectively

RAPID Decision-Making

2.3: Using Data to Make Decisions

Business Intelligence Dashboards

2.3.1: Everyday Data

2.3.2: Why Expert Judgement is No Better than Yours

Why You Think You're Right Even if You're Wrong

2.3.3: How Forecasting can Help Decision-Making