Analyzing Big Data can create significant advantages for an organization because it enables the discovery of patterns and correlations in datasets. This paper discusses the state of Big Data management with a particular focus on data modeling and data analytics.
Conclusions
In recent years, the term Big Data has
emerged to describe the huge datasets that are continuously being
produced from various sources and that are represented in a variety of
structures. Handling this kind of data poses new challenges,
because traditional RDBMSs and DWs reveal serious limitations in
terms of performance and scalability when dealing with such a volume and
variety of data. Therefore, it is necessary to rethink the ways in which
data is represented and analyzed, in order to be able to extract value
from it.
This paper presents a survey focused on two
perspectives, data modeling and data analytics, which are reviewed in
terms of three currently representative approaches: operational
databases, decision support databases and Big Data technologies. First,
concerning data modeling, this paper discusses the most common data
models, namely: the relational model and the ER model for operational databases;
the star schema model and the OLAP cube model for decision support databases;
and the key-value store, document-oriented database, wide-column store and
graph database for Big Data-based technologies. Second, regarding data
analytics, this paper discusses the common operations used for each
approach. Specifically, it observes that operational databases are more
suitable for OLTP applications, decision support databases are more
suitable for OLAP applications, and Big Data technologies are more
appropriate for scenarios such as batch-oriented processing, stream
processing, OLTP and interactive ad-hoc queries and analysis.
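
To make the contrast between the four Big Data data models more concrete, the following minimal sketch shows how one and the same record might be laid out under each of them. The customer record, the key format, the column family names and the FRIEND_OF edge label are all hypothetical and chosen only for illustration; plain Python dictionaries stand in for the stores, so this is not the API of any particular product.

```python
import json

# Hypothetical record used only for illustration; not taken from the survey.
CUSTOMER = {"id": 42, "name": "Ada", "city": "Lisbon", "friends": [7, 13]}


def to_key_value(rec):
    """Key-value store: one opaque value addressed by a single key."""
    return {f"customer:{rec['id']}": json.dumps(rec, sort_keys=True)}


def to_document(rec):
    """Document-oriented database: a nested, self-describing document."""
    return {
        "_id": rec["id"],
        "name": rec["name"],
        "address": {"city": rec["city"]},
        "friends": list(rec["friends"]),
    }


def to_wide_column(rec):
    """Wide-column store: row key -> column family -> column -> value."""
    return {
        rec["id"]: {
            "profile": {"name": rec["name"], "city": rec["city"]},
            # One column per friend; the column name itself carries the data.
            "friends": {str(f): True for f in rec["friends"]},
        }
    }


def to_graph(rec):
    """Graph database: a node with properties plus labeled edges."""
    nodes = {rec["id"]: {"name": rec["name"], "city": rec["city"]}}
    edges = [(rec["id"], "FRIEND_OF", f) for f in rec["friends"]]
    return nodes, edges


if __name__ == "__main__":
    for convert in (to_key_value, to_document, to_wide_column, to_graph):
        print(convert.__name__, "->", convert(CUSTOMER))
```

In this sketch the key-value layout can only be read back as a whole by its key, whereas the document, wide-column and graph layouts expose progressively more structure (fields, column families, relationships) to the query layer.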
Third,
it compares these approaches from both perspectives, based
on a set of analysis features. From the data modeling perspective, the
features considered include the data model, its abstraction level, its
concepts, the concrete languages used to describe it, as well as the
modeling and database tools that support it. On the other hand, from the
data analytics perspective, the features taken into account include
the class of application domains, the most common operations and the
concrete languages used to specify those operations. From this analysis,
it is possible to observe that there are several data models for Big
Data, but none of them is represented by a modeling language or
supported by a corresponding modeling tool. This gap constitutes an open
research area that can improve the development process of Big Data
targeted applications, namely by applying a Model-Driven Engineering
approach. Finally, this paper also presents some related
work in the data modeling and data analytics areas.
As future
work, we consider that this survey may be extended to capture additional
aspects and comparison features that are not included in our analysis.
It would also be interesting to survey concrete scenarios where Big Data
technologies prove to be an asset. Furthermore, this survey
constitutes a starting point for our ongoing research goals in the
context of the Data Storm and MDD Lingo initiatives. Specifically, we
intend to extend existing domain-specific modeling languages, such as
XIS and XIS-Mobile, and their MDE-based framework to support
both the data modeling and data analytics of data-intensive
applications, such as those researched in the scope of the Data Storm
initiative.