Businesses and institutions must collect and store temporal data for accountability and traceability. This paper presents an approach to dealing with transaction lineage that considers how data can be stored at different timestamp granularities, together with methods for refreshing data warehouses with time-varying data via batch cycles.
Literature Review
Data warehouse refreshes have been a research topic for more than a decade. Most of this research relates to storing and maintaining the current state of data, which fails to provide lineage information: periodic complete reloads discard the updates that occur between two refresh points, leading to a loss of transaction lineage. Previous work on data warehousing has focused mostly on design issues, data maintenance strategies in connection with relational view materialization, and implementation aspects. Little research has been done to date on capturing transaction lineage and on the temporal view maintenance problem, and most previous work ignores the temporal aspects of data warehousing. There is a consensus that Information Systems research must respond to theoretical contributions and attempt to solve the current and anticipated problems of practitioners. Mechanisms are therefore needed to store transaction lineage in conventional databases. Current commercial database systems provide few built-in capabilities to capture transaction lineage or to support a query language for temporal data management; to date, only a few vendors have begun providing time-referenced data storage functionality and SQL facilities in their DBMS products. In a data warehouse, data comes from many sources and refreshes happen several times a day. A data warehouse is also a shared environment whose data is typically used by many applications, each of which may need a different time-slice of the data, so data warehouses must cope with the temporal granularities of their data.
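To make the storage idea concrete, the following is a minimal sketch, assuming the common convention of a pair of validity timestamps per row version, of how transaction lineage can be kept in a conventional, non-temporal database and queried by time-slice. The table and column names (account_balance, row_effective_ts, row_expired_ts) are hypothetical illustrations, not taken from any particular vendor's product.

```python
# Minimal sketch: transaction lineage in a conventional (non-temporal) database.
# Each row version carries effective/expired timestamps; a time-slice query
# reconstructs the state of the data as of any point in time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_balance (
        account_id       INTEGER,
        balance          REAL,
        row_effective_ts TEXT,   -- when this version became current
        row_expired_ts   TEXT    -- '9999-12-31' marks the open (current) version
    )
""")

# Two versions of the same account: the first was expired when the second arrived.
conn.executemany(
    "INSERT INTO account_balance VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, "2024-01-01 00:00:00", "2024-01-15 08:00:00"),
        (1, 250.0, "2024-01-15 08:00:00", "9999-12-31 00:00:00"),
    ],
)

def balance_as_of(ts: str):
    """Return the time-slice of account_balance that was current at ts."""
    return conn.execute(
        """
        SELECT account_id, balance FROM account_balance
        WHERE row_effective_ts <= ? AND ? < row_expired_ts
        """,
        (ts, ts),
    ).fetchall()

print(balance_as_of("2024-01-10 00:00:00"))  # [(1, 100.0)]
print(balance_as_of("2024-02-01 00:00:00"))  # [(1, 250.0)]
```

Under this convention, each application retrieves the time-slice it needs by supplying its own reference timestamp, which is precisely the flexibility a shared warehouse environment requires.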
Temporal data warehouses raise many issues, including consistent aggregation in the presence of time-varying data, temporal queries of multidimensional data, storage methods, and temporal view materialization. The temporal aggregation problem has been studied to address the challenges of temporal data, and much research is now being done to improve the efficiency of range aggregate queries in a temporal data warehouse. Kaufmann presents native support for temporal features in in-memory analytics databases. Stonebraker suggests a new SQL for business intelligence (BI) queries, as they are so resource-heavy that they get in the way of timely responses to transactions. New SQL support for temporal data with timestamp-based versioning is also much needed.
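As a concrete illustration of the range aggregate problem mentioned above (our own sketch, not code from the cited work), the following computes an aggregate over all row versions whose validity interval overlaps a query range; the sample data and the half-open [effective, expired) interval convention are assumptions made for the example.

```python
# Temporal range aggregate over versioned rows: aggregate only the versions
# whose [effective, expired) validity interval overlaps the query range.
from datetime import datetime

rows = [
    # (value, effective, expired) -- hypothetical readings
    (10.0, datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (12.0, datetime(2024, 1, 10), datetime(2024, 2, 1)),
    (11.0, datetime(2024, 2, 1), datetime(9999, 12, 31)),
]

def range_max(rows, start, end):
    """MAX over all versions whose validity interval overlaps [start, end)."""
    overlapping = [v for (v, eff, exp) in rows if eff < end and start < exp]
    return max(overlapping) if overlapping else None

print(range_max(rows, datetime(2024, 1, 5), datetime(2024, 1, 20)))  # 12.0
```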
Wang et al. study the "problem of how to maintain temporal consistency of real-time data in distributed real-time systems." Malinowski and Zimanyi provide a conceptual model for temporal data warehouses with "support for levels, attributes, hierarchies, and measures." Chau and Chittayasothorn propose a temporal object-relational SQL language with attribute timestamping, a superset of the OR SQL language. Viqueira and Lorentzos propose an SQL extension for the management of spatio-temporal data. Mkaouar et al. study how to simplify querying and manipulating temporal facts in SQL by integrating time in a native manner. Li et al., Kvet and Matiasko, and Jestes et al. provide insights into ranking large temporal data. Gupta et al. provide an overview of outlier detection for various forms of temporal data.
In this article, we focus on an innovative approach for dealing with transaction lineage and storing it at different timestamp granularities. We present methodologies for refreshing data warehouses with time-varying data via batch cycles. This approach is suitable for large data warehouses with hundreds of subject areas and thousands of tables, where refreshes occur within one- to four-hour windows. We propose the use of conventional extract-transform-load (ETL) tools to extract data from source systems and load the staging subject areas in the data warehouse without performing any transformation tasks. As soon as the staging tables are refreshed, the data warehouse software performs transformations to insert new rows into the actual data warehouse (analytical subject area) tables and to update those tables by applying row-expired timestamps to the preexisting rows that correspond to the newly arrived rows. ETL represents the most important stage of (temporal) data warehouse design, as 70% of the risk and effort is attributed to this stage. We also examine the possibility of using metadata tables to recompile views based on subject area refresh timestamps. Finally, we show that there are opportunities to use performance improvement features of conventional commercial databases, such as indexing, to load and query temporal data in non-temporal databases; these features are very important for handling large volumes of transaction lineage data.
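The batch-cycle transformation step described above can be sketched as follows. This is a minimal illustration under assumed names (stg_customer, dw_customer, and subject_area_refresh are hypothetical), not the paper's production implementation: preexisting row versions are expired with the batch-cycle timestamp, the newly arrived rows are inserted as current versions, and the subject-area refresh timestamp is recorded in a metadata table for view recompilation.

```python
# Minimal sketch of a batch-cycle refresh against assumed tables; names are
# hypothetical. An ETL tool is presumed to have loaded stg_customer already.
import sqlite3

OPEN_TS = "9999-12-31 00:00:00"  # marks the open (current) row version

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (customer_id INTEGER, name TEXT);
    CREATE TABLE dw_customer  (customer_id INTEGER, name TEXT,
                               row_effective_ts TEXT, row_expired_ts TEXT);
    CREATE TABLE subject_area_refresh (subject_area TEXT, refresh_ts TEXT);
""")

def apply_batch_cycle(conn, batch_ts):
    # Expire the current version of every row that has a newer version in staging.
    conn.execute("""
        UPDATE dw_customer SET row_expired_ts = ?
        WHERE row_expired_ts = ?
          AND customer_id IN (SELECT customer_id FROM stg_customer)
    """, (batch_ts, OPEN_TS))
    # Insert the newly arrived rows as the current versions.
    conn.execute("""
        INSERT INTO dw_customer
        SELECT customer_id, name, ?, ? FROM stg_customer
    """, (batch_ts, OPEN_TS))
    # Record the refresh so dependent views can be recompiled against a
    # consistent time-slice of the subject area.
    conn.execute(
        "INSERT INTO subject_area_refresh VALUES ('customer', ?)", (batch_ts,))
    conn.execute("DELETE FROM stg_customer")  # empty staging for the next cycle
    conn.commit()

# Two batch cycles: the second updates customer 1 while preserving its lineage.
conn.execute("INSERT INTO stg_customer VALUES (1, 'Acme Ltd')")
apply_batch_cycle(conn, "2024-01-01 02:00:00")
conn.execute("INSERT INTO stg_customer VALUES (1, 'Acme Ltd (renamed)')")
apply_batch_cycle(conn, "2024-01-02 02:00:00")

for row in conn.execute("SELECT * FROM dw_customer ORDER BY row_effective_ts"):
    print(row)
# (1, 'Acme Ltd', '2024-01-01 02:00:00', '2024-01-02 02:00:00')
# (1, 'Acme Ltd (renamed)', '2024-01-02 02:00:00', '9999-12-31 00:00:00')
```

Because staging is loaded without transformations and the expiry logic runs inside the warehouse, no update between two refresh points is discarded; every superseded row survives as a closed version, which is what preserves the transaction lineage.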