Data Modeling and Data Analytics: Conclusions | Saylor Academy

Conclusions

In recent years, the term Big Data has come to describe the huge datasets that are continuously produced by various sources and represented in a variety of structures. Handling this kind of data poses new challenges, because traditional RDBMSs and DWs show serious limitations in performance and scalability when dealing with such volume and variety. It is therefore necessary to reinvent the ways in which data is represented and analyzed in order to extract value from it.

This paper presents a survey focused on two perspectives, data modeling and data analytics, which are reviewed in terms of three representative approaches: operational databases, decision support databases and Big Data technologies. First, concerning data modeling, this paper discusses the most common data models, namely: the relational model and the ER model for operational databases; the star schema model and the OLAP cube model for decision support databases; and the key-value store, document-oriented database, wide-column store and graph database for Big Data technologies. Second, regarding data analytics, this paper discusses the common operations used in each approach. It observes that operational databases are better suited to OLTP applications, decision support databases are better suited to OLAP applications, and Big Data technologies are more appropriate for scenarios such as batch-oriented processing, stream processing, OLTP, and interactive ad-hoc queries and analysis.
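To make the differences between the data models listed above more concrete, the following minimal Python sketch represents the same small dataset in five shapes: relational rows, key-value pairs, a nested document, a wide-column row and a graph. It is an illustration only and does not come from the survey; the customer and order data, the key naming scheme (`customer:1`, `order:10`) and all function names are hypothetical, and plain Python structures stand in for real database engines.

```python
import json

# Hypothetical data: one customer ("Ana") with two orders, in five shapes.

# Relational model: flat rows in separate tables, linked by a foreign key.
customers = [{"customer_id": 1, "name": "Ana"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]

# Key-value store: each value is an opaque blob addressed by a single key.
key_value = {
    "customer:1": json.dumps({"name": "Ana"}),
    "order:10": json.dumps({"customer_id": 1, "total": 25.0}),
    "order:11": json.dumps({"customer_id": 1, "total": 40.0}),
}

# Document-oriented database: related data nested inside one document.
document = {
    "_id": 1,
    "name": "Ana",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

# Wide-column store: row key -> column family -> column -> value.
wide_column = {
    "customer:1": {
        "profile": {"name": "Ana"},
        "orders": {"10": 25.0, "11": 40.0},
    }
}

# Graph database: nodes with properties, plus labeled edges between them.
nodes = {
    "c1": {"label": "Customer", "name": "Ana"},
    "o10": {"label": "Order", "total": 25.0},
    "o11": {"label": "Order", "total": 40.0},
}
edges = [("c1", "PLACED", "o10"), ("c1", "PLACED", "o11")]


def total_relational(customer_id):
    # Filter the orders table on the foreign key, like a WHERE clause.
    return sum(o["total"] for o in orders if o["customer_id"] == customer_id)


def total_key_value(customer_id):
    # The store cannot look inside values, so the application decodes them.
    total = 0.0
    for key, raw in key_value.items():
        if key.startswith("order:"):
            value = json.loads(raw)
            if value["customer_id"] == customer_id:
                total += value["total"]
    return total


def total_document(doc):
    # The orders are already embedded in the customer document.
    return sum(o["total"] for o in doc["orders"])


def total_wide_column(row_key):
    # All order columns live in one column family of the customer's row.
    return sum(wide_column[row_key]["orders"].values())


def total_graph(node_id):
    # Follow outgoing PLACED edges from the customer node to order nodes.
    return sum(
        nodes[dst]["total"]
        for src, rel, dst in edges
        if src == node_id and rel == "PLACED"
    )
```

Each function answers the same question (the total value of one customer's orders), but the access path differs: a filter on a foreign key, a scan with application-side decoding, a read of embedded data, a read of one column family, or an edge traversal.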

Third, it compares these approaches in terms of the two perspectives, based on a set of analysis features. From the data modeling perspective, the features considered include the data model, its abstraction level, its concepts, the concrete languages used to describe it, and the modeling and database tools that support it. From the data analytics perspective, the features considered include the class of application domains, the most common operations and the concrete languages used to specify those operations. This analysis shows that there are several data models for Big Data, but none of them is represented by a modeling language or supported by a corresponding modeling tool. This issue constitutes an open research area that can improve the development process of Big Data targeted applications, namely by applying a Model-Driven Engineering approach. Finally, this paper also presents some related work in the data modeling and data analytics areas.

As future work, this survey may be extended to capture additional aspects and comparison features that are not included in our analysis. It would also be interesting to survey concrete scenarios where Big Data technologies prove to be an asset. Furthermore, this survey constitutes a starting point for our ongoing research goals in the context of the Data Storm and MDD Lingo initiatives. Specifically, we intend to extend existing domain-specific modeling languages, like XIS and XIS-Mobile, and their MDE-based framework to support both the data modeling and data analytics of data-intensive applications, such as those researched in the scope of the Data Storm initiative.
