3. Big Data Analytics Platforms

Big data technology comprises a large number of open-source software components, most of them Apache projects, that can be used to construct a big data platform. These components are designed to work in distributed and cloud computing environments. However, computer scientists designing efficient and effective big data computing platforms face several common problems: how to move large volumes of transactional data across the backplane; how to move large volumes of static data across the network; how to process large volumes of data very fast; how to ensure even job scheduling and fair usage of resources; how to handle errors without interrupting other jobs; and how to coordinate and optimize resources. Earlier solutions addressed these problems at the hardware level, which significantly increased cost.

More recently, Hadoop was designed as an open-source framework that handles big data analytics through a batch processing approach. Its design principles include reduced dependence on expensive high-end hardware platforms and infrastructure, parallel processing to reduce computing time, moving computation to the data rather than moving data from disk to a central application, embracing failure, building applications that are less dependent on the underlying infrastructure, and exploiting the flexibility of the framework. These principles help reduce cost, optimize the platform, speed up processing, and achieve efficiency.

This section explains the Hadoop ecosystem that enables the implementation of big data and business analytics, and outlines the structure, components, and tools that provide effective and efficient processing of big data.
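As a concrete illustration of these design principles, the following is a minimal sketch of a Hadoop MapReduce job using the standard word-count example (not drawn from this paper): map tasks run in parallel on the nodes that already hold the input blocks, so computation moves to the data, and the framework transparently re-executes failed tasks on other nodes. The input and output paths are hypothetical HDFS directories supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each mapper processes one input split locally and emits (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sums the per-word counts after the shuffle.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation reduces network traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory (assumed)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (assumed)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

In this sketch, the combiner performs local pre-aggregation on each node before the shuffle, which reflects the principle of minimizing data movement across the network; the same batch model underlies many of the ecosystem tools described in the remainder of this section.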