Big Data gives organizations unprecedented opportunities to mine their data for valuable business intelligence. Read this study to learn how businesses can use a big data stream analytics framework to analyze consumers' product preferences and develop more effective marketing and production strategies.
The Big Data Stream Analytics Framework
An overview of the proposed framework, which leverages Big Data Stream
Analytics for online Sentiment Analysis (BDSASA), is depicted in
Figure 2. The BDSASA framework consists of seven layers: the data stream
layer, data pre-processing layer, data mining layer, prediction layer,
learning and adaptation layer, presentation layer, and storage layer.
For these layers, we apply established, state-of-the-art tools to enable
rapid service prototyping. At the Data Stream Layer, for instance,
Storm, an open-source distributed data stream engine (DDSE) for big
data, processes the streaming data fed by dedicated APIs and crawlers;
the Topsy API, for example, is used to retrieve product-related comments
from Twitter.
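To make the layer boundaries concrete, the sketch below chains a few of the layers as Python generators, loosely mirroring how a Storm topology passes tuples from a spout through a series of bolts. It is a single-process illustration, not Storm code. The stage functions, the cue-word sets, and the scoring rule are hypothetical placeholders rather than BDSASA components, and the storage, learning and adaptation, and presentation layers are omitted.

```python
from typing import Iterable, Iterator


def data_stream_layer(raw_messages: Iterable[str]) -> Iterator[str]:
    """Stand-in for a spout fed by APIs/crawlers: emit raw messages one by one."""
    for msg in raw_messages:
        yield msg


def preprocessing_layer(stream: Iterable[str]) -> Iterator[list[str]]:
    """Hypothetical stand-in for parsing/NER: lowercase and split on whitespace."""
    for msg in stream:
        tokens = msg.lower().split()
        if tokens:  # drop empty messages
            yield tokens


def mining_layer(stream: Iterable[list[str]]) -> Iterator[dict]:
    """Hypothetical stand-in for the miners: count toy positive/negative cue words."""
    positive, negative = {"love", "great"}, {"hate", "broken"}
    for tokens in stream:
        yield {
            "tokens": tokens,
            "pos": sum(t in positive for t in tokens),
            "neg": sum(t in negative for t in tokens),
        }


def prediction_layer(stream: Iterable[dict]) -> Iterator[str]:
    """Hypothetical stand-in for polarity prediction: compare cue counts."""
    for feats in stream:
        if feats["pos"] > feats["neg"]:
            yield "positive"
        elif feats["neg"] > feats["pos"]:
            yield "negative"
        else:
            yield "neutral"


def run_pipeline(raw_messages: Iterable[str]) -> list[str]:
    # Generators are chained lazily, so each message flows through every
    # stage before the next message is read, as in a streaming topology.
    stream = data_stream_layer(raw_messages)
    return list(prediction_layer(mining_layer(preprocessing_layer(stream))))
```

For example, `run_pipeline(["I love this phone", "Screen arrived broken", "", "It is a phone"])` returns `["positive", "negative", "neutral"]`; the empty message is dropped during pre-processing. A distributed engine such as Storm adds what this sketch lacks: parallel bolt instances, fault tolerance, and continuous operation across machines.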
The Storage Layer leverages Apache HBase and HDFS for real-time storage
and retrieval of large volumes of consumer reviews about products and
services. The Data Pre-processing Layer is built with the Stanford
Dependency Parser and the GATE named entity recognition (NER) module.
Our pilot tests show that the multilingual social media data streams
amount to between 0.2 and 0.4 GB per day, and this volume is growing
steadily.
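The pilot figures translate into a rough storage budget. The sketch below assumes a constant daily volume (the text notes that the volume is actually growing, so these are lower bounds), no compression, no HBase overhead, and HDFS's default replication factor of 3. These are assumptions for illustration, not measurements from the study.

```python
def yearly_storage_gb(daily_gb: float, days: int = 365,
                      replication: int = 3) -> tuple[float, float]:
    """Return (logical, raw) storage in GB for a constant daily ingest.

    Assumes no growth, no compression, no HBase overhead, and the HDFS
    default replication factor of 3 (each block stored three times).
    """
    logical = daily_gb * days
    return logical, logical * replication


for daily in (0.2, 0.4):
    logical, raw = yearly_storage_gb(daily)
    print(f"{daily} GB/day -> {logical:.0f} GB/year logical, {raw:.0f} GB/year raw")
```

Running it prints `0.2 GB/day -> 73 GB/year logical, 219 GB/year raw` and `0.4 GB/day -> 146 GB/year logical, 438 GB/year raw`, which suggests that raw cluster capacity, not daily throughput, is the figure to watch as the stream grows.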
For feature extraction in the Data Mining Layer, the Affect Miner uses a
novel community-based affect intensity measure to predict consumers'
moods towards products. Of the six classes commonly used in affect
analysis (anger, fear, happiness, sadness, surprise, and neutral), we
focus on anger, fear, sadness, and happiness, which are the most
relevant to product sentiment analysis. The Affect Miner uses the
WordNet-Affect lexicon, extended by a statistical learning method.
Because social media messages are generally noisy, one novelty of our
framework is that it reduces the noise in the "affect intensity" measure
by processing only those messages that are genuinely related to
consumers' comments about products or services.
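The community-based affect intensity measure is not specified here, so the sketch below is only a simplified stand-in under explicit assumptions. A tiny hypothetical lexicon (`TOY_LEXICON`) replaces the extended WordNet-Affect lexicon, relevance filtering is a plain keyword match against hypothetical `product_terms`, and "community-based" is approximated by averaging each user's scores first and then averaging across users. What it does show is where the noise filter sits: before any affect scoring, for the four classes the framework keeps.

```python
from collections import defaultdict

AFFECT_CLASSES = ("anger", "fear", "sadness", "happiness")

# Hypothetical toy lexicon: word -> (affect class, weight). A real system would
# use WordNet-Affect extended by statistical learning instead.
TOY_LEXICON = {
    "furious": ("anger", 1.0), "annoyed": ("anger", 0.5),
    "worried": ("fear", 0.7), "scared": ("fear", 1.0),
    "disappointed": ("sadness", 0.8),
    "happy": ("happiness", 0.8), "love": ("happiness", 1.0),
}


def is_product_related(text: str, product_terms: set[str]) -> bool:
    """Hypothetical noise filter: keep only messages mentioning the product."""
    return any(tok in product_terms for tok in text.lower().split())


def community_affect_intensity(messages, product_terms):
    """messages: iterable of (user_id, text) pairs.

    Scores each relevant message per class, averages per user, then averages
    across users so that one prolific user cannot dominate the signal.
    """
    per_user = defaultdict(lambda: defaultdict(list))
    for user, text in messages:
        if not is_product_related(text, product_terms):
            continue  # discard off-topic noise before scoring
        scores = dict.fromkeys(AFFECT_CLASSES, 0.0)
        for tok in text.lower().split():
            if tok in TOY_LEXICON:
                cls, weight = TOY_LEXICON[tok]
                scores[cls] += weight
        for cls, s in scores.items():
            per_user[user][cls].append(s)
    if not per_user:
        return dict.fromkeys(AFFECT_CLASSES, 0.0)
    result = {}
    for cls in AFFECT_CLASSES:
        user_means = [sum(v[cls]) / len(v[cls]) for v in per_user.values()]
        result[cls] = sum(user_means) / len(user_means)
    return result
```

For example, with messages `("u1", "love my new phone")`, `("u1", "phone battery makes me annoyed")`, `("u2", "scared this phone will break")`, and `("u3", "furious about traffic today")` and `product_terms={"phone"}`, the off-topic message from `u3` is discarded and the result is `{"anger": 0.125, "fear": 0.5, "sadness": 0.0, "happiness": 0.25}`.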
Previous research employed hidden Markov models (HMMs) to mine the
latent "intents" of actors. We instead exploit a novel, more
sophisticated online generative model and a corresponding distributed
Gibbs sampling algorithm to build our Latent Intent Extractor, which
predicts consumers' intents to acquire products or services. The
Sentiment Extractor uses well-known sentiment lexicons such as
OpinionFinder to extract the sentiment words embedded in consumer
reviews. Finally, the overall sentiment polarity of consumer reviews is
predicted with a novel inferential language modeling method; the
computational details of this method for context-sensitive sentiment
analysis are explained in the next section. The overall sentiment
polarity toward a product or product category is communicated to the
system's users via the Presentation Layer, which supports different
modes of presentation (e.g., text, graphics, and multimedia on desktop
or mobile devices).
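The online generative model and its distributed sampler are not specified in this section. As a hedged illustration of the underlying technique only, the sketch below is a standard single-machine collapsed Gibbs sampler for latent Dirichlet allocation (LDA), a different and simpler generative model. Reading each latent topic as a consumer "intent" is our illustrative assumption, and the hyperparameters `alpha` and `beta` are arbitrary defaults.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (a stand-in, not the BDSASA model).

    docs: list of token lists. Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab
```

Each row of the returned `ndk`, once normalised, can be read as a review's mixture over latent intents. A distributed variant (for example, sampling document shards in parallel and periodically merging the topic-word counts) would be needed at stream scale; that is not shown here.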
Figure 2. An overview of the BDSASA framework.
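The lexicon-based Sentiment Extractor described above can be sketched as follows. The code assumes the whitespace-separated `key=value` format of the MPQA subjectivity lexicon associated with OpinionFinder (fields such as `type`, `word1`, `pos1`, `stemmed1`, and `priorpolarity`). The `SAMPLE` entries are illustrative lines in that format, not quotations from the lexicon. Only exact lowercase matches are handled and `pos1`/`stemmed1` are ignored, so a word listed under several parts of speech keeps its last entry. This is a simplification, not necessarily how BDSASA uses OpinionFinder.

```python
def parse_mpqa_lexicon(lines):
    """Parse key=value lexicon lines into {word: (prior polarity, subjectivity type)}."""
    lexicon = {}
    for line in lines:
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        word, polarity = fields.get("word1"), fields.get("priorpolarity")
        if word and polarity:  # skip malformed or incomplete entries
            lexicon[word] = (polarity, fields.get("type", "unknown"))
    return lexicon


def extract_sentiment_words(review, lexicon):
    """Return (word, polarity, strength) for each lexicon word found in the review."""
    hits = []
    for tok in review.lower().split():
        tok = tok.strip(".,!?;:\"'()")  # drop surrounding punctuation
        if tok in lexicon:
            polarity, strength = lexicon[tok]
            hits.append((tok, polarity, strength))
    return hits


SAMPLE = [  # illustrative entries in the MPQA key=value format
    "type=strongsubj len=1 word1=excellent pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=slow pos1=adj stemmed1=n priorpolarity=negative",
    "type=strongsubj len=1 word1=terrible pos1=adj stemmed1=n priorpolarity=negative",
]
```

For example, `extract_sentiment_words("Excellent camera, but the app is slow.", parse_mpqa_lexicon(SAMPLE))` returns `[("excellent", "positive", "strongsubj"), ("slow", "negative", "weaksubj")]`. These extracted words are inputs to polarity prediction, not the prediction itself.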

In addition, a novel parallel co-evolutionary genetic algorithm (PCGA)
equips the proposed prediction model with a learning and adaptation
mechanism that continuously tunes the whole service to track possible
changes in the features of the problem domain. The PCGA divides a large
search space into several subspaces for a parallel, diversified search,
which improves both the efficiency and the effectiveness of the
heuristic search process. Each subspace (i.e., a sub-population) is
hosted by a separate cluster. Designing a genetic algorithm (GA)
involves three fundamental decisions: the fitness function, the
chromosome encoding, and the procedure that drives the evolution of
chromosomes. First, the fitness function of our PCGA is based on a
performance metric (e.g., the accuracy of sentiment polarity
prediction). Second, because various components of the proposed service
must be continuously refined, multiple sub-populations of chromosomes
are encoded and co-evolved simultaneously. Third, during each evolution
cycle, the best chromosome of each sub-population (encoding, e.g.,
prediction features, social media sources, or system parameters) is
exchanged with the other sub-populations. Combined with the best
chromosomes received from the other sub-populations, each chromosome of
a sub-population represents a feasible prediction, so its fitness can be
assessed accordingly.
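The co-evolution scheme described above can be illustrated with a small serial sketch of cooperative co-evolution. Two hypothetical sub-populations evolve separately: one encodes a binary feature mask, and the other encodes a decision threshold. A chromosome's fitness is computed by pairing it with the current best chromosome of the other sub-population, and the best chromosomes are exchanged after every cycle. The toy `fitness` function, with its hypothetical `TARGET_MASK` and `TARGET_THRESHOLD`, stands in for measured prediction accuracy. In BDSASA each sub-population would run on its own cluster; here both run in one process.

```python
import random

N_FEATURES = 8
# Hypothetical optimum that a real system could only discover through
# measured sentiment-prediction accuracy.
TARGET_MASK = [1, 0, 1, 1, 0, 0, 1, 0]
TARGET_THRESHOLD = 0.6


def fitness(mask, threshold):
    """Toy stand-in for prediction accuracy, in [0, 1]."""
    mask_score = sum(m == t for m, t in zip(mask, TARGET_MASK)) / N_FEATURES
    return 0.5 * mask_score + 0.5 * (1.0 - abs(threshold - TARGET_THRESHOLD))


def evolve(pop, score, mutate, rng, elite=2):
    """One generation: keep the elites, then mutate binary-tournament winners."""
    ranked = sorted(pop, key=score, reverse=True)
    children = ranked[:elite]
    while len(children) < len(pop):
        a, b = rng.sample(ranked, 2)
        children.append(mutate(max(a, b, key=score), rng))
    return children


def mutate_mask(mask, rng):
    i = rng.randrange(N_FEATURES)  # flip one randomly chosen feature bit
    return mask[:i] + [1 - mask[i]] + mask[i + 1:]


def mutate_threshold(t, rng):
    return min(1.0, max(0.0, t + rng.gauss(0, 0.1)))  # clamp to [0, 1]


def coevolve(cycles=40, pop_size=10, seed=0):
    rng = random.Random(seed)
    masks = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    thresholds = [rng.random() for _ in range(pop_size)]
    best_mask, best_t = masks[0], thresholds[0]
    for _ in range(cycles):
        # Each sub-population is scored against the other's previous best,
        # so these two steps could run in parallel on separate nodes.
        masks = evolve(masks, lambda m: fitness(m, best_t), mutate_mask, rng)
        thresholds = evolve(thresholds, lambda t: fitness(best_mask, t),
                            mutate_threshold, rng)
        # Exchange step: publish each sub-population's best chromosome.
        new_mask = max(masks, key=lambda m: fitness(m, best_t))
        new_t = max(thresholds, key=lambda t: fitness(best_mask, t))
        best_mask, best_t = new_mask, new_t
    return best_mask, best_t, fitness(best_mask, best_t)
```

Because the toy fitness is separable and each sub-population keeps its elites, the returned pair never scores worse than the initial one. Real accuracy-based fitness is generally not separable, which is why exchanging best chromosomes every cycle matters: it keeps each sub-population's evaluation context current.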