1. Introduction

1.2. NoSQL

The existing paradigms for handling conventional data are neither sufficient nor suitable for Big Data requirements. For that reason, novel approaches at the data storage level, such as NoSQL databases, are required. NoSQL stands for Not Only SQL, a term used for non-relational databases in general. NoSQL databases are considered schema-less, as they are designed to work without a predefined structure; in practice, however, a self-sufficient model is still needed to define how data will be organized in and retrieved from the database. To meet this need, several NoSQL data models have been proposed.


Data Models

A data model is a representation of the structure of the data for processing and organization, and it is considered a primary element for storage, analysis and processing in storage systems.
Currently, storage systems are classified into two large groups, relational and non-relational. Within the relational group, the well-known models are the Entity–Relationship (ER), Extended Entity Relationship (EER), Key-Cube and Multidimensional models, among others. This article does not present an in-depth study of these classic models, which are well known and need no explanation; we study only the models that are novel for Big Data.
Non-relational systems comprise the NoSQL databases, whose data models are classified into four main categories:

  1. Column-oriented
  2. Document-oriented
  3. Graph
  4. Key-value

Column-Oriented

In this model, data are represented in tabular form by columns and rows. The columns are identifiable by a partition key that is unique and mandatory, and the rows by an optional clustering key. The primary key is the combination of the partition key and the clustering key. The schema of a table thus consists of a set of columns, each with a data type, and a primary key. Database Management Systems (DBMS) that use the column-oriented data model include Accumulo, Amazon SimpleDB, Cassandra, Cloudata, Druid, Elassandra, Flink, HBase, Hortonworks, HPCC, Hypertable, IBM Informix, Kudu, MonetDB, Scylla and Splice Machine, among others.
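
As an illustration of how the partition key, the clustering key and the typed columns fit together, the following minimal Python sketch keeps a table in memory. It assumes a Cassandra-style reading in which the partition key selects a partition and the optional clustering key selects a row inside it; the class `WideColumnTable` and the sample sensor data are hypothetical and do not correspond to the API of any of the DBMS listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WideColumnTable:
    """Toy in-memory table; a hypothetical sketch, not a real DBMS API."""

    # Table schema: column name -> data type.
    schema: Dict[str, type]
    # partition key -> clustering key -> {column name: value}
    partitions: Dict[Any, Dict[Any, Dict[str, Any]]] = field(default_factory=dict)

    def insert(self, partition_key, values, clustering_key=None):
        # The partition key is mandatory; the clustering key is optional.
        if partition_key is None:
            raise ValueError("the partition key is mandatory")
        for name, value in values.items():
            if name not in self.schema:
                raise KeyError(f"unknown column: {name}")
            if not isinstance(value, self.schema[name]):
                raise TypeError(f"column {name} expects {self.schema[name].__name__}")
        # Primary key = (partition key, clustering key).
        self.partitions.setdefault(partition_key, {})[clustering_key] = dict(values)

    def get(self, partition_key, clustering_key=None):
        return self.partitions[partition_key][clustering_key]


readings = WideColumnTable(schema={"temperature": float, "unit": str})
readings.insert("sensor-1", {"temperature": 21.5, "unit": "C"}, clustering_key="2020-01-01")
readings.insert("sensor-1", {"temperature": 22.0, "unit": "C"}, clustering_key="2020-01-02")
row = readings.get("sensor-1", "2020-01-02")
```

Both rows share the partition `sensor-1` and are distinguished only by the clustering key, so the pair of keys acts as the primary key.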


Document-Oriented

In this model, data are stored as key-value pairs in which each value is a document in XML, JSON or BSON format. Each document can have nested subdocuments, indexes, fields and attributes. Examples of DBMS that use the document-oriented data model include ArangoDB, Azure, BagriDB, Cloud Datastore, CouchDB, DocumentDB, Elastic, IBM Cloudant, MongoDB, NosDB, RavenDB, RethinkDB, SequoiaDB, ToroDB and UnQlite, among others.
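
To make the idea of a keyed document with nested subdocuments concrete, the sketch below stores one JSON document under a key and looks documents up by a nested field. It uses only the Python standard library; the `collection` dictionary, the helper `find_by_field` and the sample order are hypothetical and are not taken from any of the DBMS named above.

```python
import json

# Hypothetical collection: document key -> JSON-encoded document.
collection = {}

order = {
    "customer": {"name": "Ada", "city": "Quito"},  # nested subdocument
    "items": [                                      # array of subdocuments
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
    "status": "shipped",
}
collection["order:1001"] = json.dumps(order)


def find_by_field(coll, path, expected):
    """Return the keys of documents whose dotted field path equals `expected`."""
    matches = []
    for key, raw in coll.items():
        node = json.loads(raw)
        # Walk down the nested subdocuments one field at a time.
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
        if node == expected:
            matches.append(key)
    return matches


shipped_to_quito = find_by_field(collection, "customer.city", "Quito")
```

This lookup scans every document; a real document store would normally answer such a query through an index on the field.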


Graph

This model consists of a graph that contains nodes and edges. A node represents an entity and an edge represents a relationship between entities. Several graph structures exist: undirected/directed graphs, labeled graphs, attributed graphs, Bigdata, multigraphs, hypergraphs and nested graphs, among others. Some examples of DBMS that use the graph data model are AllegroGraph, ArangoDB, Infinite Graph, GraphBase, HyperGraphDB, InfoGrid, Meronymy, Neo4j, Onyx Database, Titan, Trinity, Virtuoso OpenLink, Sparksee and WhiteDB.
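
The node and edge structure described above can be sketched with an adjacency list. The example below is a minimal directed, labeled and attributed graph, which is only one of the structures mentioned; the class `PropertyGraph`, the labels and the sample entities are hypothetical and do not reflect the API of any listed DBMS.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy directed, labeled, attributed graph (hypothetical sketch)."""

    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict (the entity)
        self.edges = defaultdict(list)  # source id -> [(label, target id)]

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes

    def add_edge(self, source, label, target):
        # An edge is a relationship between two existing entities.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        """Targets reachable from `source` through edges carrying `label`."""
        return [target for edge_label, target in self.edges[source] if edge_label == label]


g = PropertyGraph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("acme", kind="company")
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("alice", "WORKS_AT", "acme")
known = g.neighbors("alice", "KNOWS")
```

Because the edges are directed, the relationship is recorded only from `alice` to `bob`; an undirected variant would store each edge in both directions.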


Key-Value

In this model, the data are represented by a key-value tuple. The key is a unique identifier that maps to a value of arbitrary type, structure and size. Secondary keys and indexes are not supported. Examples of DBMS that use the key-value data model include Aerospike, Azure Table Storage, BangDB, Berkeley DB, DynamoDB, GenieDB, KeyDB, Redis, Riak, Scalaris and Voldemort, among others.
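
A minimal sketch of this model follows. It assumes an in-memory dictionary as the storage layer; the class `KeyValueStore` and its method names are hypothetical. The `scan` method illustrates the consequence of having no secondary indexes: any search by value has to inspect every entry.

```python
class KeyValueStore:
    """Toy key-value store; values are opaque to the store (hypothetical sketch)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value may be of any type, structure and size.
        self._data[key] = value

    def get(self, key, default=None):
        # The only efficient access path is the key itself.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, predicate):
        # Without secondary indexes, searching by value visits every pair.
        return [key for key, value in self._data.items() if predicate(value)]


store = KeyValueStore()
store.put("user:1", {"name": "Ada", "roles": ["admin"]})
store.put("blob:9", b"\x00\x01\x02")
store.put("counter", 42)
profile = store.get("user:1")
binary_keys = store.scan(lambda value: isinstance(value, bytes))
```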
Table 2 summarizes the main characteristics of the NoSQL data models, namely their main concept, structure, techniques to create the data model, applications, advantages and disadvantages.

Table 2. NoSQL characteristics.

Concept
  Column-Oriented: A model that allows representing data in columns
  Document-Oriented: A model that allows representing data via structured text
  Graph-Oriented: A model that allows representing data and their connections
  Key-Value: A model that allows representing the data in a simple format (key and values)

Structure
  Column-Oriented: Data are stored in tables; Values in a column are stored consecutively
  Document-Oriented: Nesting of key-value pairs; Each document identified by a unique identifier; Any value can be a structured document; Key and value are separated by a colon ":"; Key-value pairs are separated by commas ","; Data enclosed in curly braces denotes documents; Data enclosed in square brackets denotes array collection
  Graph-Oriented: Set of data objects (nodes); Set of links between the objects (edges)
  Key-Value: Tuple of two strings (key and value); A key represents any entity's attribute; Values can be of any data type

Techniques
  Column-Oriented: With compression (Lightweight encoding, Bit-vector encoding, Dictionary encoding, Frame of reference encoding, Differential encoding); With join algorithm; With late materialization; Tuple at a time
  Document-Oriented: Denormalized flat model; Denormalized model with more structure (metadata); Shattered, equivalent to normalization (https://pdfs.semanticscholar.org/ea15/945ce9ec0c12b92794b8ace69ce44ebe40cc.pdf)
  Graph-Oriented: Simple direct graph; Undirected multigraph; Directed multigraph; Weighted graph; Hypergraph; Nested graph
  Key-Value: NA

Applications
  Column-Oriented: Consumer data; Inventory data
  Document-Oriented: JSON documents; XML documents
  Graph-Oriented: Social networks; Supply-chain; Medical records; IT operations; Transports
  Key-Value: User profiles and their attributes

Advantages
  Column-Oriented: High performance in loading and querying operations; Efficient data compression and partitioning (both horizontally and vertically); Scalability; Support for massive parallel processing; Well-suited for Online Analytical Processing and OnLine Transaction Processing workloads
  Document-Oriented: Support for multiple document types; Support for atomicity, consistency, isolation and durability transactions; Scalability; Suitable for complex data, nested documents and arrays
  Graph-Oriented: Easy modeling; Fast and simple querying; Scalability
  Key-Value: Easy design and implementation; Fault tolerance; Redundancy; Scalability; High speed

Disadvantages
  Column-Oriented: Difficult to use wide-columns; Delays in querying specific data
  Document-Oriented: Information duplication across multiple documents; Inconsistencies in complex designs
  Graph-Oriented: Lack of a standard declarative language; Support to limited concurrency and parallelism
  Key-Value: Very basic query language; Some queries can only depend on the primary key
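
Two of the compression techniques that Table 2 lists for the column-oriented model, dictionary encoding and frame of reference encoding, can be sketched as follows. These are the generic textbook forms of both techniques, written under the assumption that a column is a plain Python list; the function names and the sample columns are hypothetical and do not describe how any specific DBMS implements them.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = []  # code -> original value
    index = {}       # original value -> code
    codes = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


def frame_of_reference_encode(column):
    """Store the minimum once and every value as an offset from it."""
    base = min(column)
    return base, [value - base for value in column]


def frame_of_reference_decode(base, offsets):
    return [base + offset for offset in offsets]


cities = ["Quito", "Lima", "Quito", "Quito", "Lima"]
dictionary, codes = dictionary_encode(cities)

years = [2018, 2020, 2019, 2020]
base, offsets = frame_of_reference_encode(years)
```

Both techniques benefit from the fact that values of one column are stored consecutively: repeated strings collapse into small integer codes, and numbers that lie close together become small offsets that need fewer bits.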