Analyzing Big Data can create significant advantages for an organization because it enables the discovery of patterns and correlations in datasets. This paper discusses the state of Big Data management with a particular focus on data modeling and data analytics.
Related Work
As mentioned in Section 1, the main goal of this paper is to present and discuss the concepts surrounding data modeling and data analytics, and their evolution, for three representative approaches: operational databases, decision support databases and Big Data technologies. In our survey, we reviewed related works that also explore and compare these approaches from the data modeling or data analytics point of view.
Table 3. Comparison of the approaches from the Data Analytics perspective
Approach | Class of Application Domains | Common Operations | Operations | Concrete Languages | Abstraction Level | Technology Support
Operational | OLTP | Read/Write | Select, Insert, Update, Delete, Join, OrderBy, GroupBy | SQL-DML | Logical, Physical | Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2
Decision Support | OLAP | Read | Slice, Dice, Drill down, Drill up, Pivot | SQL-DML, MDX, XMLA | Logical, Physical | Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2, Microsoft OLAP Provider, Microsoft Analysis Services
Big Data | Batch-oriented processing | Read/Write | Map-Reduce, Select, Insert, Update, Delete, Load, Import, Export, OrderBy, GroupBy | Hive QL, Pig Latin | Logical, Physical | Hadoop, Hive, Pig
Big Data | Stream processing | Read/Write | Aggregate, Partition, Merge, Join | SQL stream | Logical, Physical | Storm, S4, Spark
Big Data | OLTP | Read/Write | Select, Insert, Update, Delete, Batch, Get, OrderBy, GroupBy | CQL, Java, JavaScript | Logical, Physical | Cassandra, HBase
Big Data | Interactive ad-hoc queries and analysis | Read | Select, Insert, Update, Delete, OrderBy, GroupBy | SQL-DML | Logical, Physical | Drill
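To make the Decision Support row of Table 3 more concrete, the following is a minimal Python sketch of the OLAP operations it lists (slice, dice, drill down, drill up and pivot) over a tiny in-memory fact table. The fact table, its dimensions (year, semester, department, course), the enrollments measure and all function names are hypothetical, loosely inspired by an academic management scenario. Drill down and drill up are approximated here as grouping by more or fewer dimensions rather than navigating a formal dimension hierarchy; OLAP servers would instead express these operations in languages such as MDX.

```python
from collections import defaultdict

# Hypothetical fact table: (year, semester, department, course, enrollments).
FACTS = [
    (2013, "S1", "Informatics", "Databases", 120),
    (2013, "S2", "Informatics", "Data Mining", 80),
    (2013, "S1", "Mathematics", "Statistics", 95),
    (2014, "S1", "Informatics", "Databases", 130),
    (2014, "S2", "Mathematics", "Statistics", 90),
]
DIMS = ("year", "semester", "department", "course")


def _row(fact):
    """Map a fact tuple to a dict keyed by dimension name plus the measure."""
    return dict(zip(DIMS + ("enrollments",), fact))


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value (trailing _ avoids the builtin)."""
    return [f for f in facts if _row(f)[dim] == value]


def dice(facts, **criteria):
    """Dice: keep facts whose values lie in the given set for every listed dimension."""
    return [f for f in facts
            if all(_row(f)[d] in allowed for d, allowed in criteria.items())]


def aggregate(facts, *group_dims):
    """Sum the measure grouped by group_dims.

    Grouping by fewer dimensions approximates drill up (roll-up);
    grouping by more dimensions approximates drill down.
    """
    totals = defaultdict(int)
    for f in facts:
        r = _row(f)
        totals[tuple(r[d] for d in group_dims)] += r["enrollments"]
    return dict(totals)


def pivot(facts, row_dim, col_dim):
    """Pivot: lay out the two-dimensional aggregate as rows x columns."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, row_dim, col_dim).items():
        table[r][c] = total
    return dict(table)


if __name__ == "__main__":
    print(aggregate(FACTS, "year"))                # {(2013,): 295, (2014,): 220}
    print(aggregate(FACTS, "year", "department"))  # finer level: year x department
    print(slice_(FACTS, "year", 2014))             # only the 2014 facts
    print(dice(FACTS, year={2013}, department={"Informatics"}))
    print(pivot(FACTS, "department", "year"))
    # {'Informatics': {2013: 200, 2014: 130}, 'Mathematics': {2013: 95, 2014: 90}}
```

Keeping every operation as a plain function over a list of tuples makes the differences between them easy to see; a relational implementation would map slice and dice to WHERE clauses and drill up/down to GROUP BY over more or fewer columns.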
J.H. ter Bekke provides a comparative study of the Relational, Semantic, ER and Binary data models based on the results of an examination session. In that session, participants had to model a case study similar to the Academic Management System used in this paper. The purpose was to discover relationships between the modeling approach used and the quality of the resulting model. Therefore, this study addresses only the data modeling topic and, more specifically, considers only data models associated with the database design process.
Several works focus on highlighting the differences between operational databases and data warehouses. For example, R. Hou analyzes operational databases and data warehouses, distinguishing them according to their underlying theory and technologies, and also identifies common points where combining both systems can bring benefits. C. Thomsen and T.B. Pedersen compare open source ETL tools, OLAP clients and servers, and DBMSs with the aim of building a Business Intelligence (BI) solution.
P. Vassiliadis and T. Sellis conducted a survey that focuses only on OLAP databases and compares various proposals for their underlying logical models. They group the proposals into just two categories, commercial tools and academic efforts, which are in turn subcategorized into relational model extensions and cube-oriented approaches. However, unlike our survey, they do not cover Big Data technologies.
Several papers discuss the state of the art of the types of data stores, technologies and data analytics used in Big Data scenarios; however, they do not compare them with other approaches. In recent work, P. Chandarana and M. Vijayalakshmi focus on Big Data analytics frameworks and compare them according to their suitability.
In summary, none of the works mentioned above provides an analysis as broad as the one presented in this paper: as far as we know, no paper simultaneously compares operational databases, decision support databases and Big Data technologies. Instead, these works describe one or two of these approaches in more depth.