2. Theoretical Background

The accounting profession has been handling large data sets for a long time; however, the integration of various data sources increases the need to change current evaluation and reporting practices. Big Data not only increases the volume of data to be analyzed but also its veracity and velocity. Visualizations, and in particular new, complex type II visualizations, are an essential means of identifying relationships, outliers and patterns within these large structured and unstructured data sets. However, previous studies suggest that newer, interactive visualizations face barriers that inhibit their widespread use.

According to the literature on adoption and resistance in an information systems context, barriers manifest if the costs of a technology outweigh its benefits. In a first step toward ascertaining whether type II visualizations are a vain endeavor for domain experts, we evaluate their "benefits". Hence, we provide evidence for the need to adapt existing and frequently used visualizations to account for changes in task and data characteristics. In a second step, we investigate potential reasons that increase "costs". These seem to arise from factors inherent to the technology (e.g. poor visualization design, lack of access to interactive type II visualizations, the need to purchase software) and from factors associated with human-computer interaction (e.g. poor interaction design, little or no experience with new visualization types).


2.1 Benefits of domain specific interactive visualizations


2.1.1 Information processing and visualizations

Visualizations per se have been recognized as useful for information processing since the 1970s. This is because visualizing information highlights specific features of the data and draws on various abilities of the decision maker. This interplay allows visualizations to be seen as a mode that speaks a unified language, supports the comprehension of large amounts of information, and enhances the human ability to detect patterns, trends and sequences. As a result, visualizations are said to boost information processing by relying on the human perceptual system, which is highly developed and allows multiple processes to be executed simultaneously.

However, for an efficient and effective use of visualizations, they need to be adjusted to the situation and the user. The need for adjustment concerning task characteristics was conceptualized in the 1990s by the theory of cognitive fit; this theory states that efficient and effective decision-making can only be achieved if the external representation (the visualization handed to the decision maker or user) fits the user's internal representation (the mental representation the decision maker associates with the task). Otherwise, additional cognitive effort is needed for processing, which degrades decision-making quality. This theory has been refined multiple times, allowing other important influences such as data and user characteristics to be recognized.

Acknowledging this has led to a broad spectrum of visualization options, ranging from visualizations encountered in everyday life such as pie, line, or bar charts (for examples see Figure 1) to domain-specific, multi-dimensional and interactive visualization types such as parallel coordinates, scatterplot matrices, or force-directed graphs (for examples see Figure 2). Each visualization type has been introduced because it supports specific tasks and data sets particularly well and thereby enhances decision-making.



Figure 1. Visualizations used in everyday life (type I visualizations)


Figure 2. Visualizations designed to cope with large structured and unstructured data sets (type II visualizations)
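
To complement the examples in Figures 1 and 2, the following minimal sketch contrasts the two families in code; the data, column names and chart choices are hypothetical and purely illustrative, assuming plotly and pandas are available.

```python
# A minimal sketch (hypothetical data): one everyday type I chart (bar chart)
# and one type II chart (scatterplot matrix) built with plotly express.
import numpy as np
import pandas as pd
import plotly.express as px

# Type I: actual vs. budgeted costs per department as a grouped bar chart.
costs = pd.DataFrame({
    "department": ["Sales", "Production", "IT", "HR"],
    "actual": [420, 910, 310, 150],
    "budget": [400, 950, 280, 160],
})
px.bar(costs, x="department", y=["actual", "budget"], barmode="group").show()

# Type II: a scatterplot matrix over several numeric measures of 200
# hypothetical postings, useful for spotting multi-dimensional patterns.
rng = np.random.default_rng(0)
postings = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["amount", "quantity", "discount", "lead_time"],
)
px.scatter_matrix(postings, dimensions=list(postings.columns)).show()
```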

2.1.2 Domain specific visualizations

Consequently, when looking at big, complex and unstructured data sets it seems reasonable to adjust visualization practice accordingly. Two competing strategies (and a combination of both) can be found in the context of visualizing these data sets:

  • first, the use of new and interactive type II visualizations; and

  • second, the use of type I visualizations, particularly in an interactive form in combination with computer-supported aggregation techniques to report summary data.

With respect to interactive type II visualizations, new forms such as sunburst diagrams, force-directed graphs, treemaps, heatmaps, parallel coordinates, etc. are being developed on a regular basis. Type II visualizations are designed so that larger amounts of information can be presented to the user, and each type is generated to emphasize particular features of the underlying data set. These visualizations are therefore created to serve a precise purpose and convey a specific message, allowing insights to emerge that would otherwise have remained hidden. For example, in fraud detection, small amounts of continuous money outflow regularly remain unrecognized because they are hardly visible at an aggregated reporting level. Through a Sankey chart or parallel coordinates, however, every single transaction can be plotted, so that unauthorized payments can be detected with a considerably higher probability. These charts show connections and therefore visually indicate problems such as missing or wrong classifications. However, these advantages also have implications for perceptual processing. It has been hypothesized that type II visualizations increase the need for domain knowledge during the sense-making process. When users lack experience with the interaction techniques and the visualization in use, information processing is considered to be impaired, because the complexity and breadth of the displayed data increase the risk of inducing a state of information overload. Additionally, showing the full bandwidth of the data can lead to overlapping data points in the visual representation, which further impairs information processing.
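
To make the fraud-detection example more tangible, the sketch below plots every transaction of a small, entirely hypothetical data set as parallel coordinates with plotly express; column names, amounts and the injected pattern are assumptions made purely for illustration.

```python
# A minimal sketch (hypothetical data and column names): plotting every
# transaction as parallel coordinates so that small, recurring outflows
# remain visible instead of disappearing in aggregated totals.
import numpy as np
import pandas as pd
import plotly.express as px

rng = np.random.default_rng(42)
n = 500
transactions = pd.DataFrame({
    "amount": rng.gamma(shape=2.0, scale=150.0, size=n),  # payment amount
    "vendor_id": rng.integers(1, 40, size=n),             # numeric vendor code
    "day_of_month": rng.integers(1, 29, size=n),
    "approval_level": rng.integers(0, 3, size=n),          # 0 = auto-approved
})

# Inject a small, recurring outflow to the same vendor (the "fraud" pattern).
suspicious = pd.DataFrame({
    "amount": [49.90] * 12,
    "vendor_id": [99] * 12,
    "day_of_month": list(range(1, 13)),
    "approval_level": [0] * 12,
})
transactions = pd.concat([transactions, suspicious], ignore_index=True)

# One polyline per transaction; the recurring low-value, auto-approved
# payments to vendor 99 stand out as a tight bundle of lines.
fig = px.parallel_coordinates(
    transactions,
    dimensions=["amount", "vendor_id", "day_of_month", "approval_level"],
    color="amount",
)
fig.show()
```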

The mentioned negative effects of type II visualizations have led to a second focus, namely, type I visualizations extended to an interactive form. More precisely, the user works with familiar visualization techniques such as column charts but has the possibility to add filters, drill-down options or linking techniques in order to change or reduce the underlying data set. By actively clicking, scrolling and filtering the data, the user gains a deeper understanding of the relations within the data set. In this context, users can rely on already known and broadly used visualization options while still being allowed to interact with the data set. This approach reduces the load imposed on the decision maker because both the visual format and the initially displayed data volume stay the same. Through interaction, however, access to the underlying and bigger data set is granted. Interactive type I visualizations provide the user with a general overview before interaction techniques are used to drill down or filter for further detail. The process of interaction is controlled by the user and only applied if he or she is interested in further details or detects anomalies and outliers in the presented overview. Negative aspects associated with this approach concern the level and the technique of aggregation. Relying on sums and averages can lead to dangerous misinterpretation: it drastically reduces the likelihood of detecting anomalies and outliers and increases the risk of hiding interesting relationships within the data set. In the context of data exploration in particular, aggregated data should be avoided because the relationships within the data are not yet known.
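
The aggregation risk described above can be illustrated with a short pandas sketch; the figures are hypothetical, but they show how a small recurring payment that is obvious at transaction level all but disappears once the data are reduced to monthly sums and averages.

```python
# A minimal sketch (hypothetical figures): aggregation by sums/averages
# masks a small recurring outflow that is visible at transaction level.
import pandas as pd

transactions = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "vendor": ["A", "B", "X", "A", "B", "X"],
    "amount": [12000.0, 8500.0, 49.9, 11800.0, 9100.0, 49.9],
})

# Aggregated report: the 49.90 payments vanish in the monthly totals.
monthly_summary = transactions.groupby("month", sort=False)["amount"].agg(["sum", "mean"])
print(monthly_summary)

# Transaction-level view: the identical low-value payments to vendor "X"
# stand out immediately, e.g. when filtered or plotted individually.
print(transactions.sort_values(["vendor", "month"]))
```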


2.1.3 Interactive visualizations

Both of the above-mentioned approaches have one commonality: interaction. With interaction, only a limited amount of data is visible on the screen at any time, yet the user has the possibility to explore the whole data set. Allowing interactive features means giving the user control to adjust values and properties according to their needs. Being allowed to choose what information to display and how to display it can increase understanding and comprehension. Dilla et al. summarize interactive visualization as an "on demand visualization process that allows decision makers to navigate to selected data and display it at various levels of detail and in various formats". The user or decision maker can individually determine the sequence in which they want to explore the data. Interactive information visualization supports exploratory data analysis to identify patterns and generate hypotheses. Moreover, the evaluation process can follow Shneiderman's mantra, "overview first, zoom and filter, then details-on-demand", and thereby mitigate problems connected with information overload.
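
The mantra can be traced in most modern charting libraries; the following minimal sketch (hypothetical posting data, plotly express assumed) gives the overview in a scatter plot, supports zooming and filtering via a range slider, and offers details-on-demand through hover tooltips.

```python
# A minimal sketch (hypothetical data) of Shneiderman's mantra with plotly:
# the scatter plot gives the overview, built-in zooming and a range slider
# support "zoom and filter", and hover tooltips provide details-on-demand.
import numpy as np
import pandas as pd
import plotly.express as px

rng = np.random.default_rng(7)
n = 1_000
postings = pd.DataFrame({
    "booking_date": pd.date_range("2023-01-01", periods=n, freq="D"),
    "amount": rng.normal(loc=500.0, scale=120.0, size=n).round(2),
    "cost_center": rng.choice(["CC-10", "CC-20", "CC-30"], size=n),
    "document_no": np.arange(1, n + 1),
})

fig = px.scatter(
    postings,
    x="booking_date",
    y="amount",
    color="cost_center",                    # overview: all postings at once
    hover_data=["document_no"],             # details-on-demand via tooltip
)
fig.update_xaxes(rangeslider_visible=True)  # zoom and filter along the time axis
fig.show()
```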


2.2 Cost of domain specific interactive visualizations


2.2.1 Human-related factors

In order to successfully interpret visualizations in a Big Data context and to arrive at comprehensive insights, the user needs the expertise to decode the visualization and to use the interaction techniques provided. As explained before, during the sense-making process an internal representation is created, which at best corresponds to the external representation. This internal representation stands for the optimum solution a user can think of when confronted with a particular problem. Consequently, the internal representation depends on similar tasks the user has performed previously. Visualizations that were helpful in a similar situation are considered as options. Therefore, an internal representation of a specific visualization can only exist if the user has experience with this particular visualization.

The importance of experience can further be explained by cognitive load theory. The theory states that schemas stored in long-term memory are needed to process information effectively and efficiently. In other words, the greater the user's experience with a specific visualization, the better the corresponding schema is constructed in long-term memory and the higher the probability that the appropriate visualization comes to mind as an internal representation. If no schema exists, processing is inhibited and users do not feel sufficiently supported; consequently, they tend to dislike or even oppose the proposed visualization options.

Further, in order to create schemas for future processing, cognitive effort needs to be directed toward learning and rehearsal. On the one hand, this means that the user needs to be confident that directing cognitive resources toward learning is worth the effort (the benefits outweigh the costs); on the other hand, additional investment costs for sufficient training and support are incurred in the initial implementation phases. The visualizations, as well as the possibilities of interacting with them, need to be explained and presented. Strategies for efficient and effective learning from cognitive load theory can be applied in this context (e.g. worked examples).


2.2.2 Technology-related factors

From a more technical perspective, Big Data is the collection of large data sets, which can also show a great diversity of data types. The biggest challenge is the efficient use of semi-structured and unstructured data sources (e.g. text, image and video). In this context, a study conducted by IBM showed that the integration of various data sources is on the rise and already common practice, especially when it comes to geo-location-based and sensor-based data. Unlike structured data, which can be processed rather easily with the traditional relational database management systems underlying ERP systems, semi-structured or unstructured data require specific tools for comprehensive data preparation (e.g. parsing, indexing) and analysis. A further challenge arises from linking multiple data sources that were initially used as stand-alone silos. Consequently, the physical merging of various data sources calls for adequate technical support. Using multiple data sources therefore not only increases the complexity of decision-making processes but also calls for investment in data storage technologies and analytic tools.
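
As a rough illustration of this merging step, the following sketch flattens a semi-structured JSON feed and links it to a structured ERP extract with pandas; the sources, keys and field names are hypothetical.

```python
# A minimal sketch (hypothetical sources and field names): flattening a
# semi-structured JSON feed and linking it to a structured ERP extract,
# i.e. the "merging of silos" step described above.
import pandas as pd

# Structured ERP extract (e.g. exported from a relational system).
erp_orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "customer": ["ACME", "Globex", "Initech"],
    "order_value": [12500.0, 8300.0, 4100.0],
})

# Semi-structured sensor / geo-location feed, e.g. parsed from JSON.
sensor_feed = [
    {"order_id": 1001, "gps": {"lat": 48.21, "lon": 16.37}, "temp_c": 4.2},
    {"order_id": 1003, "gps": {"lat": 47.07, "lon": 15.44}, "temp_c": 6.8},
]
sensor_df = pd.json_normalize(sensor_feed)  # flattens the nested "gps" fields

# Link the two silos on a shared key; unmatched orders are kept for review.
combined = erp_orders.merge(sensor_df, on="order_id", how="left")
print(combined)
```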

When looking at the annual study conducted by Gartner, we can also observe a change in the front-end products offered. New tools for an easy integration of various sources, as well as an adaptation of reporting and planning practices, are offered by almost all major players (e.g. Microsoft, SAP, Oracle). With the increased usage of these tools by practitioners, the market is currently shifting from traditional ERP systems and their related standardized and static reporting practices toward online platforms and self-service. Furthermore, these tools also allow for the integration of interaction techniques and type II visualizations. Easy access and a widespread utilization of such tools could drastically increase familiarity and experience with these new visualization types.


2.3 Hypotheses development

As discussed in the theoretical background, two competing strategies can be associated with the use of Big Data: interactive type I and interactive type II visualizations. From a human-related perspective, it seems reasonable to keep using type I visualizations. By doing so, less cognitive effort needs to be invested during analytical decision-making processes. Problems with visualizations, and especially with type II visualizations, emerge if no schemas for reading them are available in long-term memory. The large amount of data in combination with the unusual layout can lead to information overload and consequently impair processing; as a state of information overload is perceived as stressful and unpleasant, the consequence should be a reduced perceived usability of type II visualizations. This leads to the following hypotheses:

H1.

There is no difference in use between interactive type I and interactive type II visualizations.

H2.

The lower the use of type II visualizations, the lower their perceived ease of use (EoU).

Interaction is an essential part of the sense-making process and enhances the user's processing capabilities. Multiple options have been developed, such as filtering, zooming, distorting, as well as linking and brushing. However, research also suggests that too many and too complex interaction techniques (e.g. zooming, drill-through, and multiple views of the same data) can negatively affect users. This is the case because the process of interaction is still a rather new and challenging concept, and dealing with a large number of options again increases the risk of information overload. Therefore, in the first stages of implementation the focus should be placed on simple interaction techniques which have also been used in previous reporting systems (e.g. filtering, drag and drop). As most companies are still in the early stages of implementing interactive visualization techniques, simple ones should result in higher performance and subsequently in higher preference and usage:

H3.

There is no difference in use between simple and advanced interaction techniques.

H4.

The lower the use of interaction techniques, the lower their perceived EoU.

Being able to interpret visualizations and use interaction techniques in an efficient and effective manner has a significant influence on preferences and subjective assessment. If no schema is available, either information processing is inhibited or additional cognitive resources need to be directed toward processing and the creation of schemas. Directing additional resources, however, demands higher engagement and a motivated user. Confirming this line of argument, Grammel et al. showed in their observational study a strong selection bias of users toward already known visualizations, even if it meant sacrificing performance. They concluded that, when confronted with both type I and type II visualizations, only those familiar with the newer options turned to type II visualizations. The more extensive use of type I visualizations can therefore be explained by the fact that the related schemas have already been created, as these visualizations have been around much longer and are integrated into daily life. We therefore propose the following hypotheses (Figure 3):


Figure 3. Research model human-related barriers

H5a.

The lower the familiarity with type II visualizations, the lower their use.

H5b.

The lower the familiarity with type II visualizations, the lower their perceived EoU.

H6.

There is no difference in familiarity between type I and type II visualizations.

It has been shown that type II visualizations and interaction techniques are helpful in the early stages of data exploration, especially for analyzing semi-structured and unstructured data sets. For conventional data sets from structured and internal ERP systems, type I visualizations, which have been employed for centuries, are still a viable option. Consequently, it seems reasonable that type II visualizations as well as interaction techniques are only integrated if a high proportion of semi-structured and unstructured data is used:

H7.

The lower the number of various data sources, the lower the use of type II visualizations.

H8.

The lower the number of various data sources, the lower the use of interaction techniques.

Current studies suggest that Microsoft Excel is today's most popular generic data analytics tool. Microsoft Excel is widely available at low cost; however, it does not provide sufficient possibilities to integrate interaction and it fails to adequately support the creation and usage of type II visualizations. Although type I visualizations can be created effortlessly in Microsoft Excel, more advanced visualization techniques are either impossible to produce or require considerable expertise. Thus, as the majority of users are neither visualization nor data analytics experts, new visualization types remain largely unknown or their implementation seems too laborious.

Following this line of argument, it seems plausible that with more accessible and usable tools, the community of users could be extended beyond domain experts, enabling novices to actively work and interact with data. Tools for visual analytics such as Microsoft Power BI, Tableau, or Qlik have been introduced to the market, allowing the integration of domain specific and interactive visualizations into current reporting practices. We therefore propose that the use of such tools could reduce barriers, leading to our final hypotheses (Figure 4):


Figure 4. Research model technology-related barriers

H9.

The lower the number of visualization tools used, the lower the use of type II visualizations.

H10.

The lower the number of visualization tools used, the lower the use of interaction techniques.