Big Data and Business Analytics Trends
2. Recent Developments in Big Data Technology
Big data emerged as a business concern with the growth of social media and weblogs. It has extended basic analytics and business intelligence (BI) activity to new data sources and offers deep, real-time analytics with operational integration. The volume of data generated in the digital world grows exponentially and has become difficult to manage using data warehouse technology. A number of recent studies have reported on the massive amounts of raw data, generated from various sources, that require big data technology for analysis. For instance, Wal-Mart processes more than a million customer transactions hourly and stores 2.5 petabytes of customer data. Similarly, the Library of Congress collects 235 terabytes of new data per year and stores 60 petabytes of data. Over 5.5 billion mobile phones were used in 2014; each phone creates one terabyte of call record data yearly. In the mid-2000s, International Data Corporation (IDC), a premier global market intelligence firm, reported that the digital universe, which was 4.4 ZB in 2003, would grow to 44 ZB by 2020. In addition, a recent study by McKinsey reveals that the pieces of content uploaded to Facebook number about 30 billion, while the value of big data for the healthcare industry is about 300 billion. This growth is driven by technological change and by both internal and external activities in electronic commerce (e-commerce), business operations, manufacturing, and healthcare systems. Moreover, recent developments in in-memory databases have increased database performance, while the Internet of things (IoT) and cloud computing facilities, which provide persistent large-scale data storage and transformation, have made large-scale data collection achievable. The surge in data volume is driven by a number of technologies, which include:
i. Distributed computing: Large-scale distributed computing systems based on open-source technology provide direct access and long-term storage for petabytes of data while delivering extreme performance.
ii. Flash memory: Flash memory in solid-state drives allows computers to become universal. It delivers random-access times of less than 0.1 milliseconds, compared with disk access times of 3 to 12 milliseconds. Future big data solutions are therefore likely to rely heavily on flash memory to improve data access time.
iii. Mobile devices: These represent computers everywhere; they create much of the big data and equally receive outputs from big data solutions.
iv. Cloud computing: This has created an entirely new economy of computing by moving storage, databases, and services into the cloud, and it offers ready access for rapidly deploying big data solutions.
v. Data analytics: This is a multistage approach that includes collecting, preparing, processing, analyzing, and visualizing large-scale data to produce actionable insight for business intelligence.
vi. In-memory applications: These are significantly increasing database performance.
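The access times quoted for flash memory and disk in the list above can be turned into a rough speedup estimate. The following is a minimal sketch only: the constant and function names are illustrative assumptions, and the 0.1 ms figure is treated as an upper bound on flash access time, so the computed ratios are lower bounds.

```python
# Figures quoted in the text: flash random access under 0.1 ms,
# disk access between 3 and 12 ms. Names below are illustrative.
FLASH_ACCESS_MS = 0.1
DISK_ACCESS_MS = (3.0, 12.0)


def speedup(disk_ms: float, flash_ms: float = FLASH_ACCESS_MS) -> float:
    """Return how many times faster one flash access is than one disk access."""
    return disk_ms / flash_ms


low, high = (speedup(ms) for ms in DISK_ACCESS_MS)
print(f"Flash is at least {low:.0f}x to {high:.0f}x faster per random access")
```

On these figures a random read from flash is at least 30 to 120 times faster than the same read from disk, which is the basis for expecting flash memory to feature heavily in future big data solutions.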
A large share of the data used for big data analytics is unstructured, derived from various sources and applications such as text files, weblogs, social media posts, emails, photo images, audio, and video. Big data technologies are meant to handle and manage such unstructured data using key-value pairs. The concept of big data has been defined by Will Dailey and by Gartner. Dailey defined big data as "a supercomputing environment engineered to parallel process compute jobs across massive amounts of distributed data for the purpose of analysis". He viewed big data as the Global Data Fabric in action and the centerpiece for the entire biosphere of modern computing. The Global Data Fabric idea shows how big data creates strong connections among institutions and enables them to work as a team. Gartner, on the other hand, defined big data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making. Big data analytics has been actively implemented in various areas to support effective business decision making.
For example, a solution can be developed to tie a customer's or merchant's bank verification number (BVN) and subscriber identification module (SIM) registration details to a unique digital identity. The solution would use the unique digital identification number (ID) to stream mobile payment transaction data from a mobile device into a big data repository. The collected data are continuously monitored, and standard machine learning techniques can be applied to detect a fraudulent or false payment alert sent from a customer to a merchant. Such an event would trigger a warning that could be shared with the mobile operator and the merchant's bank, possibly even before the merchant releases the product. At the mobile operator's end, the SIM registration record and Global Positioning System (GPS) technology can be used to create the customer's crime chart and alert the police so that the offender can be arrested. At the back end, the intelligent agent model running in the bank application would trigger a warning alert telling the merchant to ignore the transaction request.
In the big data repository, all of these data can then be mapped to other data, such as network failure logs, failed payment transactions, technology awareness data and wrong debit records. These can undergo further analysis to understand users' experience and ascertain the root cause of the low acceptance of mobile money by merchants across the country. The information could then be used to develop an intelligent business model and to enable policy that builds merchants' and customers' trust in mobile money payment. In general, this will help to rapidly actualize the government initiative of a cashless society.
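The alert-checking step of the mobile payment scenario can be illustrated with a small sketch. It is not the solution described above: a simple matching rule stands in for the machine learning techniques mentioned in the text, and the record layout, the `PaymentAlert` class, and the `find_false_alerts` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentAlert:
    """A payment notification shown to a merchant (hypothetical schema)."""
    digital_id: str   # unique digital identity tied to BVN and SIM details
    merchant_id: str
    amount: float
    reference: str    # transaction reference quoted in the alert


def find_false_alerts(alerts, confirmed_transactions):
    """Return alerts that have no matching confirmed transaction.

    `confirmed_transactions` maps a transaction reference to the
    (digital_id, merchant_id, amount) tuple held in the repository.
    """
    suspicious = []
    for alert in alerts:
        record = confirmed_transactions.get(alert.reference)
        expected = (alert.digital_id, alert.merchant_id, alert.amount)
        if record != expected:
            suspicious.append(alert)
    return suspicious


# Illustrative data only.
confirmed = {"TX1001": ("ID-001", "M-77", 5000.0)}
alerts = [
    PaymentAlert("ID-001", "M-77", 5000.0, "TX1001"),   # matches the repository
    PaymentAlert("ID-002", "M-77", 12000.0, "TX9999"),  # no repository record
]

for alert in find_false_alerts(alerts, confirmed):
    print(f"WARNING: merchant {alert.merchant_id} should ignore {alert.reference}")
```

In a real deployment the exact-match rule would presumably be replaced by a trained model scoring each streamed transaction, but the flow is the same: compare the incoming alert against repository data and warn the merchant before goods are released.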
Big data are characterized by various vectors as outlined by Gartner and shown in Figure 2 below.
Figure 2. Gartner's vector model.
These vectors include volume, variety, velocity, veracity, and value. Volume concerns the size of the data sets generated through various applications and sources, which are growing from megabytes to petabytes. Variety concerns the heterogeneous nature of the data that constitute big data, which include textual data, social media data, traffic information, health-related data, and other multimodal data. Velocity refers to the speed and dynamic nature of the data collection process and to how these data are generated in real time. Veracity concerns the reliability of data sources, that is, whether the sources of data generation can be trusted. Finally, value refers to the insight and hidden value that can be discovered from large data sets.
These vectors make it challenging for traditional data warehouse technology to handle data volumes of hundreds of terabytes. Furthermore, big data is not quantifiable, is not the same for all companies, and does not imply better data. No fixed amount of data determines whether a data set meets some artificial threshold for being big, and the size of big data varies from organization to organization. Bigger data is not necessarily better data, but some data is usually better than no data. Accordingly, big data analytics provides a host of new tools, including business analytics tools for visualizing and manipulating data insights. This makes it easy to render data as charts, graphs, models, and 3D views. Big data analytics is therefore a collection of tools and techniques aimed at handling large volumes of unstructured data that are beyond the capability of traditional database systems. Big data analytics solutions help organizations see changes in their business and innovate in real time. Different companies have different use cases and, naturally, different data. A solution that works for one company may be ineffective or completely wrong for another. While it is valuable to benchmark against others, it is necessary to understand the motivations that drive their technology choices and the analytics they use to capture the true sensitivity of their businesses. Replicating a solution is therefore appropriate where it makes sense, but most importantly an organization must understand its own business drivers for applying big data.
Recent analyses show that big data giants like Google, Facebook and Twitter have used big data analytics effectively. Google indexes the entire internet for rapid searches and was said to process 24 petabytes of data per day in 2009. It offers cloud storage (Google Drive) and a big data solution in Google BigQuery. Moreover, Google performs machine learning and analytics on massive data sets (for example, reverse image search and voice recognition). With its rapid growth, it continues to be the world's leading search engine. Facebook and Twitter, on the other hand, each store information on over a billion users. Hundreds of millions of shares, likes, tweets, image posts, and so on must be tracked each day. They use machine learning tools and algorithms to recommend friends and display trending topics. Their estimated revenues for 2014 were $12.5 billion for Facebook and $1.4 billion for Twitter.
Other businesses that have successfully implemented a big data analytics framework include Wal-Mart and American Express. Wal-Mart uses big data and machine learning to improve product searches and recommendations; the adoption increased its purchase completion rate by 10-15 percent. American Express analyzes its big data to predict customer churn and identified 24% of Australian accounts that would close within four months. Macy's adjusts product pricing in real time for millions of items. Banca Carige implemented IBM® DB2® Analytics Accelerator on a new IBM zEnterprise® EC12, which enabled rapid query response times and helps over 1000 business users get fast access to vital insights. The positive results that various business organizations have derived from big data analytics have led to the development of various tools to aid organizational big data analysis. In this paper, these tools are discussed in Section 4, with their strengths and weaknesses outlined to aid organizations' choice of tools for their data analysis.
Analytics involves the use of statistical techniques (measures of central tendency, graphs, and so on), information system software (data mining, sorting routines), and operations research methodologies (linear programming) to explore, visualize, discover and communicate patterns or trends in data. For example, weather measurements collected from meteorological agencies can be analyzed and used to predict weather patterns. Furthermore, analysis of business data holds the key to the development of successful new products. The analytics process in a big data world shows how to tap into the power of data analytics to create a strategic advantage and identify new business opportunities. It has wide applications, which include credit risk assessment, marketing, and fraud detection. There are many types of analytics approaches, and these can be categorized as:
i. Descriptive analytics: This applies simple statistical techniques (such as graphs) to describe what is contained in a data set or database. Descriptive statistics include measures of central tendency (mean, median, mode), measures of dispersion (standard deviation), charts, graphs, sorting methods, frequency distributions, probability distributions, and sampling methods. The result of this process can be used to find possible business-related opportunities. For example, an IT firm that wants to determine the market for its mobile payment app based on phone ownership levels can use a smartphone ownership bar chart to show the number of users who own smartphones.
ii. Predictive analytics: This is the application of advanced statistical, information software, or operations research methods to identify predictive variables and build predictive models on top of a descriptive analysis. The results predict opportunities that the firm can take advantage of to improve its products and services. For instance, multiple regression can be used to show the relationship (or lack of relationship) between ease of use, cost, and security and merchants' acceptance of mobile money payment. Knowing that relationships exist helps explain why one set of independent variables influences dependent variables such as business performance.
iii. Diagnostic analytics: This uses the analysis of past data to ascertain the cause of certain events. Diagnostic analytics therefore augments descriptive analytics by asking why certain events occurred, using the patterns in the collected data. The diagnostic analytics process is effectively utilized in machine health monitoring and prognosis, fault detection and maintenance.
iv. Prescriptive analytics: This deploys decision science, management science, and operations research methodologies (applied mathematical techniques) to make the best use of allocated resources. Resources are allocated to take advantage of the predicted opportunities. For example, a department store with a limited advertising budget for targeting customers can use linear programming models and decision theory to allocate the budget optimally across advertising media. Linear programming (a constrained optimization methodology) has also been used to maximize profit in the design of supply chains.
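Two of the approaches listed above, descriptive and prescriptive analytics, can be sketched in a few lines. The data, the return rates, and the function names `describe` and `allocate_budget` are hypothetical. The budget example assumes positive returns, a single budget constraint, and a spending cap per medium; for that special case the linear program is solved exactly by funding the highest-return media first, so no general-purpose solver is needed. More realistic models with several constraints would require a proper linear programming solver.

```python
import statistics

# Hypothetical survey data: smartphones owned per respondent.
phones_owned = [1, 2, 1, 0, 1, 3, 1, 2]


def describe(values):
    """Descriptive analytics: summarize a sample with basic statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def allocate_budget(budget, media):
    """Prescriptive analytics: split a fixed budget across advertising media.

    `media` maps a medium name to (expected return per unit spent, cap).
    Maximizes total expected return subject to the budget and the caps,
    assuming every return rate is positive.
    """
    plan = {}
    remaining = budget
    # Fund media in order of decreasing return per unit spent.
    for name, (rate, cap) in sorted(media.items(), key=lambda kv: -kv[1][0]):
        spend = min(cap, remaining)
        plan[name] = spend
        remaining -= spend
    return plan


print(describe(phones_owned))
print(allocate_budget(100, {"radio": (1.2, 40), "online": (2.0, 50), "print": (0.8, 60)}))
```

With these illustrative numbers the plan spends 50 on online, 40 on radio, and the remaining 10 on print, since each unit moved to a lower-return medium would reduce the total expected return.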
These analytic approaches can be used independently or in combination by an organization to provide information for decision making. For instance, Marist, a school based in the USA, implemented an open source analytics platform from Pentaho to identify students who may be at risk of dropping a class and to intervene in time to help them complete the course successfully. The process works by aggregating basic student data, such as GPAs, SAT scores, students' addresses, and other demographic data, and then combining this information with course-specific data, such as how often students submit assignments and engage with instructors through online forums. The information is analyzed through predictive modeling and data mining, and the outcome presents an accurate picture of who is likely to drop a particular class. To assist the students at risk, prescriptive modeling is applied to give insight into how the instructors may prevent the foreseen occurrences and use an approach that will engage the class as a whole.
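The aggregation step in this example, combining basic student data with course-specific data before scoring, can be sketched as follows. This is not the Marist or Pentaho implementation: the records, field names, and thresholds are hypothetical, and a fixed threshold rule stands in for the predictive model described in the text.

```python
# Hypothetical records; field names and values are illustrative only.
students = {
    "s1": {"gpa": 3.6},
    "s2": {"gpa": 2.1},
}
course_activity = {
    "s1": {"assignments_submitted": 9, "assignments_due": 10},
    "s2": {"assignments_submitted": 4, "assignments_due": 10},
}


def at_risk(student_id, gpa_floor=2.5, submission_floor=0.6):
    """Flag a student whose GPA and submission rate are both below the floors."""
    # Merge the basic profile with the course-specific activity record.
    profile = {**students[student_id], **course_activity[student_id]}
    submission_rate = profile["assignments_submitted"] / profile["assignments_due"]
    return profile["gpa"] < gpa_floor and submission_rate < submission_floor


flagged = [sid for sid in students if at_risk(sid)]
print(flagged)
```

A trained model would replace the two fixed floors with weights learned from past enrolment outcomes, but the merge of the two data sources before scoring is the part the example is meant to show.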
Analytics, business analytics (BA), and business intelligence (BI) are often used interchangeably in the business literature, and all of them convert data into useful information. However, they differ in purpose and in the methodologies they draw on from among descriptive, predictive, diagnostic and prescriptive analytics. Analytics can involve any one of the four types of analytics processes. For clarity, definitions of these terms are presented below:
- Business analytics (BA): According to a recent paper, business analytics goes beyond plain analytics. It sequentially applies a combination of descriptive (what is happening), predictive (why something is happening, what new trends may exist, what will happen next), diagnostic (why it happened) and prescriptive (what is the best course for the future) analytics to generate new, unique and valuable information that creates an improvement in measurable business performance, as shown in Figure 3. The analyzed data can be sourced from business reports, databases, and business data stored in the cloud. Business analytics processes include reporting business intelligence results and, in addition, seek to explain why the results occur based on the analysis.
- Business intelligence (BI): This focuses on querying and reporting and can include reported information from a business analytics (BA) approach. Moreover, business intelligence seeks to answer questions such as what is happening now and where, and also what business actions are needed based on prior experience.

Traditionally, business analytics and business intelligence were applied to structured DBMS-based content to report and understand what had happened in the past. With the growth of big data, they can be used alongside big data analytics techniques to extract actionable insight from data by using analytical processes and tools. Their implementation is seen in structured data analytics, text analytics, web analytics, network analytics, and mobile analytics [29,30]. Moreover, the volume and velocity of big data present an opportunity to use big data and analytical tools to predict the future and make new discoveries.
Business demand for business analytics and business intelligence has been demonstrated by a number of recent studies. Moreover, successful business intelligence and analytics applications have also been reported in a broad range of industries, from health care and airlines to major IT and telecommunication firms.
Most successes recorded by organizations that deploy big data analytics have been in developed countries, and comparable successes have not yet been seen for businesses in developing countries. International Data Corporation (IDC) showed in 2011 that business analytics was the second-highest Information Technology (IT) priority for large enterprises that year. An online survey conducted among 930 businesses across the globe in various industries provides insight into the current state of business analytics in today's organizations. The research findings highlighted the fact that most organizations still rely on traditional technology and depend on spreadsheets for business analytics. There is moderate growth in the use of business analytics within companies. Nonetheless, it is used narrowly within departments or business units and is not integrated across the organization. For some organizations, analytics are used as part of the decision process at varying levels. In addition, organizations are in search of analytics that will primarily help in reducing costs, improving the bottom line, and managing risks. Meanwhile, concerns about data accuracy, consistency, and even access are a challenge in the adoption or use of business analytics. Many organizations lack the skills to implement analytics, and some businesses that have attempted it lack the knowledge to apply the results. Companies that have built an "analytics culture" are reaping the benefits of their analytics investments. Therefore, bridging the knowledge gap so that organizations can apply big data and business analytics is vital for effective decision making and business success. To help bridge this knowledge gap, this paper also discusses the various teams in a big data analytics framework in Section 5. These teams include business experts, big data analysts, big data architects, and Hadoop operators and engineers.