Conclusions

In recent years, the term Big Data has emerged to describe the huge datasets that are continuously produced by various sources and represented in a variety of structures. Handling this kind of data poses new challenges, because traditional RDBMSs and DWs reveal serious limitations in performance and scalability when dealing with such volume and variety. Therefore, it is necessary to rethink the ways in which data is represented and analyzed in order to extract value from it.

This paper presents a survey focused on these two perspectives, data modeling and data analytics, which are reviewed in terms of three currently representative approaches: operational databases, decision support databases and Big Data technologies. First, concerning data modeling, this paper discusses the most common data models, namely: the relational model and the ER model for operational databases; the star schema model and the OLAP cube model for decision support databases; and the key-value store, document-oriented database, wide-column store and graph database for Big Data technologies. Second, regarding data analytics, this paper discusses the common operations used in each approach. Namely, it observes that operational databases are more suitable for OLTP applications, decision support databases are more suitable for OLAP applications, and Big Data technologies are more appropriate for scenarios such as batch-oriented processing, stream processing, OLTP, and interactive ad-hoc queries and analysis.

Third, this paper compares these approaches in terms of the two perspectives, based on a set of analysis features. From the data modeling perspective, the features considered include the data model, its abstraction level, its concepts, the concrete languages used to describe it, and the modeling and database tools that support it. From the data analytics perspective, the features considered include the class of application domains, the most common operations and the concrete languages used to specify those operations. This analysis shows that there are several data models for Big Data, but none of them is represented by a modeling language or supported by a corresponding modeling tool. This issue constitutes an open research area that can improve the development process of Big Data targeted applications, namely by applying a Model-Driven Engineering approach. Finally, this paper also presents some related work in the data modeling and data analytics areas.

As future work, this survey may be extended to capture additional aspects and comparison features that are not included in our analysis. It would also be interesting to survey concrete scenarios where Big Data technologies prove to be an asset. Furthermore, this survey constitutes a starting point for our ongoing research goals in the context of the Data Storm and MDD Lingo initiatives. Specifically, we intend to extend existing domain-specific modeling languages, such as XIS and XIS-Mobile, and their MDE-based framework to support both the data modeling and the data analytics of data-intensive applications, such as those researched in the scope of the Data Storm initiative.