The Cognitive Internet of Things and Big Data

Site:	Saylor Academy
Course:	BUS610: Advanced Business Intelligence and Analytics
Book:	The Cognitive Internet of Things and Big Data

Description

Read this paper to capture a more contemporary perspective on data architecture. It provides a detailed and in-depth challenge to the existing architecture. Also, it proposes a new architecture for the Cognitive Internet of Things (CIoT), which adds the human brain and big data to the mix.

Abstract
1. Introduction
2. Related works
3. Open research issues
4. Basic concepts of IoT
- 4.1. Generic IoT Architectures
- 4.2. Cognitive IoT
5. Data flow based CIoT
6. A new architecture for CIoT and big data
7. Comparative analysis of the CIoT and big data architecture with others
8. Conclusion

Abstract

Big data and the Internet of Things (IoT) are considered the main paradigms when defining new information architecture projects. Accordingly, the technologies that make up these solutions could play an important role in business information architecture. Solutions that have approached big data and the IoT as unique technology initiatives struggle to find value in such efforts and in the technology itself. A connection to the requirements (volume, velocity, and variety) is mandatory to reach the potential business goals. In this context, we propose a new architecture for the Cognitive Internet of Things (CIoT) and big data. The proposed architecture benefits computing mechanisms by combining the Data Warehouse (DWH) and the Data Lake (DL), and by defining a tool for heterogeneous data collection.

Source: Mohamed Saifeddine Hadj Sassi, Faiza Ghozzi Jedidi, Lamia Chaari Fourati, https://www.sciencedirect.com/science/article/pii/S1877050919313924
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

1. Introduction

Looking around us, technology has an impact on almost every object in the world and reaches every domain. Hence, computing and communication technologies have become among the most promising fields of our age. Every object described as "smart" can perform a high level of interaction not only with people but also with other smart things. Therefore, the IoT has emerged to change our world and to create more opportunities. This new paradigm aims at providing internet connectivity between physical as well as virtual objects anywhere, anytime, and with anything. It refers to a world where network connectivity and computing capabilities, through sensors and other physical objects, provide interactions between devices to reduce human intervention. The increasing volume, variety, and velocity of data produced by the IoT will continue to fuel the explosion of data. A second important implication of technology is that it gives people the power and responsibility to analyze data and make decisions on the basis of quantitative analysis.

These technologies require new architectures. Enterprises will need to deal with the influx of data and analyze it in real time as it grows by the minute. Big data tools are capable of handling the masses of data transmitted from IoT devices. This has led to the appearance of several architectures that seek to address the constraints and requirements of business. We therefore present our architecture, which enables non-technical users to explore data and to choose technologies that fit their requirements. Through this insight, businesses may be able to gain an edge over their rivals and make better business decisions.

The remainder of this paper is organized as follows. Section 2 reviews the related works. Open research issues are highlighted in Section 3. Section 4 introduces the basic concepts of IoT, including the IoT and CIoT architectures. Section 5 defines the data flow based CIoT in terms of data sources, data collection, data ingestion, data storage, and data analysis, and describes the relationship between the different data processes. Section 6 presents a new architecture for CIoT and big data and explains how it deals with constraints. Section 7 provides a comparative analysis of our architecture with other existing architectures. Finally, concluding remarks are presented in Section 8.

2. Related works

Recently, various systems have exhibited similar properties but with different architectures. Therefore, we review architectures that enhance the processing related to data collection from heterogeneous data sources, such as IoT middleware for water management, Dynamic configuration of sensors using mobile sensor hub in internet of things paradigm, and Capturing sensor data from mobile phones using global sensor network middleware. Thus, the tool can serve as a solution to the heterogeneity issue in data collection. Moreover, in Internet of food and farm 2020, even non-technical users are able to configure a dynamic intelligent environment without technical assistance. In Dynamic configuration of sensors using mobile sensor hub in internet of things paradigm, the use of an HDFS data center, many-sided frameworks, and a more flexible set of features makes it easier to deal with larger volumes and different heterogeneous data sources. Other architectures improve the processing related to ETL and business analysis. The choice of the data store depends on the tool selected for data ingestion. Hence, moving the data ingestion tool in Big Data and the Internet of Things: Enterprise Information Architecture for a New helps in processing complex data transformations. In addition, high-speed links connect to enterprise data through a programmable interface to ease seamless resource management and control physical resources, as in Four-Layer Architecture for Product Traceability in Logistic Applications. Furthermore, the architecture of Fog computing: A platform for internet of things and analytics is suitable for projects ranging from small proofs of concept to large applications.
In Big Data and the Internet of Things: Enterprise Information Architecture for a New and Design and implementation of smart environment monitoring and analytics in real-time system framework based on internet of underwater, the analysis process is slow and real-time requirements are hard to meet because of the increasing amount of data and the use of complex algorithms. In addition, several investigations related to cognitive IoT have been published by the research community. While these papers define technologies, tools, and platforms for different IoT applications and architectures, most of them give the reader only a limited review, without analyzing solutions for the technical constraints of each architecture. Knowing which techniques and tools fit well in the data flow process is important. In A survey on Internet of Things architectures, Internet of things: architectures, protocols, and applications, Internet-of-things-based smart environments: state of the art, taxonomy, and open research challenges, and Security and privacy in the Internet of Things: Current status and open issues, the authors discuss IoT architectures and classify them by domain. However, the process of data flow, data control, and technical constraints have not been discussed.

Fig 1. Comparison for IoT architectures classification.

Other papers, such as A survey on internet of things architecture, protocols, possible applications, security, privacy, realworld implementation and future trends, A survey of Internet-of-Things: Future vision, architecture, challenges and services, A survey on internet of things: Architecture, enabling technologies, security, A survey on trust management for Internet of Things, Internet of things in industries: A survey, and Security and privacy challenges in industrial internet of things, illustrate the services, layers, and security requirements of cognitive IoT architectures without offering a deep comparison with other architectures in the several domains that implement different frameworks and IoT projects. However, the authors of On the security and privacy of Internet and Autonomic and cognitive architectures for the Internet of Things explain frameworks and their usage in different IoT solutions. They also present IoT projects for different architectures and a deep investigation of each security requirement as a criterion for each project. Despite the lack of literature on data flow architecture based on cognitive IoT in the survey articles Internet of things: architectures, protocols, and applications, Autonomic and cognitive architectures for the Internet of Things, and A survey of technologies in internet of things, we can reasonably argue that investigating the data flow process is well suited for future developments, in order to select the right tools, technologies, and methodologies against constraints. As shown in figure 1, the architectures in these survey articles were classified according to domains, services, layers, IoT projects, wireless network, framework, and security. Identifying these domains will help in developing IoT solutions.
Moreover, we study and analyze the existing technologies, tools, and techniques from the related works and several survey papers in order to select the appropriate components for our architecture as can be seen in table 1.

Table 1. Mapping between data flow and existing technologies.

Data flow phases	Components and subcomponents of architecture	Examples of existing technologies, tools, and techniques
Data source	Structured	Retail, financial, ERP...
Data source	Semi-structured	Web logs, documents, email...
Data source	Unstructured	Image, video, web pages, audio, social media...
Data collection	Stream processing (data in motion)	Apache Spark, Extrahop
Data collection	Batch processing (data at rest)	MapReduce in Hadoop framework, Apache Sqoop...
Extract Transform Load	Data ingestion	ETL: Apache Kafka, Apache Hive, Apache Spark, Apache Pig...; ELT: Apache NiFi; Middleware: REST, .NET, J2EE, CORBA, web services (SOAP, WSDL, UDDI)...
Extract Transform Load	Data store	Data lake, Data warehouse, Cloud
Extract Transform Load	Data wrangling (refined data, trusted data, data discovery)	Spark, Hadoop, Storm, RapidMiner, Mahout, Orange, Weka, DataMelt, KEEL, SPMF, Rattle...
Business analytics	Data analysis	Teradata, Teradata Aster, Spark SQL, Vertica, Ad-hoc queries (Apache Drill, Hazelcast, SAP Hana), Birst, GoodData, MicroStrategy, SAP Lumira Cloud, Tibco, Spotfire, Cloud, Bime...
Service	Application	Smart devices
Service	Models	Histograms, Conceptual model
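
As a reading aid, the mapping in table 1 can be treated as a lookup structure from which candidate technologies are selected per phase. The sketch below encodes only a few rows; the names `DATA_FLOW_MAP` and `candidates_for` are hypothetical and are not part of the proposed architecture.

```python
# Hypothetical lookup built from a few rows of Table 1; not an exhaustive catalogue.
DATA_FLOW_MAP = {
    "Data collection": {
        "Stream processing": ["Apache Spark", "Extrahop"],
        "Batch processing": ["MapReduce", "Apache Sqoop"],
    },
    "Extract Transform Load": {
        "Data store": ["Data lake", "Data warehouse", "Cloud"],
    },
    "Business analytics": {
        "Data analysis": ["Teradata", "Spark SQL", "Vertica"],
    },
}


def candidates_for(phase: str, component: str) -> list[str]:
    """Return the example technologies listed for a phase/component, or [] if absent."""
    return DATA_FLOW_MAP.get(phase, {}).get(component, [])


print(candidates_for("Data collection", "Batch processing"))
```

Returning an empty list for unknown phases keeps the lookup safe to use while the mapping is still incomplete.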

3. Open research issues

Although the architectures described in the previous section make the cognitive IoT concept feasible, there are still some open research challenges. In particular, a large research effort is needed to address the technical constraints of IoT solutions. Many research challenges remain in the areas of data collection, data ingestion, data storage, and data analysis, such as dynamic heterogeneous resources, data-centric issues, scalability, reliability, interoperability, fault tolerance, context awareness, cost, security, privacy, and data quality. In the following, we discuss some of the technical challenges due to the 3V (volume, variety, velocity) problems:

Volume: Data sources generate large amounts of data every millisecond, so the volume of data keeps growing. For instance, a smart environment features a number of computing and sensing devices that produce a high volume of collected data. Thus, the space of the data store will not be enough if the amount of data gathered from sensors increases. In addition, some data stores can have insufficient storage space, such as a Hadoop cluster that needs space for MapReduce jobs, other workloads, and the data storage requirements. Therefore, data volumes coming from sensors or other data sources, which can be huge and tend to grow, can exceed the storage capacity of a Hadoop cluster, and the system may shut down.
Variety: Under the explosive growth of data, many different forms of data appear from a huge number of sources. Hence, enterprises find it difficult to deal with raw, semi-structured, and structured data using traditional relational tools. Moreover, semantic and syntactic interoperability between various sources of data from different systems is very challenging in IoT because of heterogeneity.
Velocity: It is the frequency at which massive data is generated, shared, and captured within a short period of time, which is very important for real-time use cases in architectures. A big data architecture has to be able to deal with the increasing frequency of data generation and data delivery. The traditional architecture is not capable of analyzing data in motion or of dealing with streaming data. For example, the data ingested from various devices into the database would be continuously moving at a certain speed. Thus, the data analysis process becomes slower because it is influenced by the amount of data and the complexity of the algorithms.
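
The volume problem can be made concrete with a back-of-the-envelope calculation. The following sketch estimates how long a data store lasts at a constant ingest rate; the function name and all numbers are illustrative assumptions, not measurements from this paper.

```python
def hours_until_full(capacity_gb: float, used_gb: float, ingest_gb_per_hour: float) -> float:
    """Hours until the store is full at a constant ingest rate (all inputs are assumed values)."""
    if ingest_gb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    free_gb = max(capacity_gb - used_gb, 0.0)
    return free_gb / ingest_gb_per_hour


# Illustrative numbers only: a 10 TB cluster, 7 TB used, sensors adding 50 GB per hour.
print(hours_until_full(10_000, 7_000, 50))  # 60.0
```

A real cluster would also need to reserve space for MapReduce jobs and other workloads, so the usable capacity is lower than the raw figure used here.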

4. Basic concepts of IoT

In this section, we introduce the IoT paradigm. We focus on the generic IoT architecture as well as on the concept of the cognitive IoT paradigm and its architecture. The IoT bundles many emerging technologies (sensors, actuators, semantics, context-aware computing, big data analytics, communication technologies, data lake, service management) together to build a new solution that revolutionizes our world. In recent years, the IoT has gained a lot of attention from academic researchers and from industrial and enterprise stakeholders due to the capabilities and advantages that it will offer. Accordingly, many IoT definitions and visions (object-oriented, internet-oriented, semantic-oriented) have been identified. The Internet of Things allows people and things to be connected anytime, anyplace, with anything and anyone, ideally using any path/network and any service. We found this definition to be the most suitable one, as it covers and fits the broader vision of the IoT. After this brief introduction, we present in the following subsection the main layers of an IoT system. As mentioned previously, the IoT involves many technologies; accordingly, it can be considered an umbrella that supports these technologies and provides a relationship between them.

4.1. Generic IoT Architectures

The architecture of an IoT system can be structured into three layers, as presented in figure 2:

The perception layer: It handles the ability to perceive and detect objects, gather information, and exchange data with various devices through the internet and communication networks. It senses physical objects and obtains data from different devices such as cameras, sensors, and RFID.
The network layer: This layer forwards the data collected by the perception layer and transports it to the following layer using communication and internet technologies. As mentioned in Internet of things: Objectives and scientific challenges, this layer can be divided into two sub-layers: the data exchange sub-layer, which handles the transparent transmission of data, and the information integration sub-layer, which aggregates, cleans, and fuses the collected data and extracts information from it.
The application layer: It aims to create a smart environment. Hence, it receives the information and processes content to deliver intelligent services to different users. The seamless integration of the IoT environment with intelligence leads to the appearance of the cognitive IoT, based on the evolution of pervasive and ubiquitous computing.

Fig 2. Generic IoT architecture.
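
The three layers can be read as a simple pipeline. The sketch below is a minimal illustration under stated assumptions: the readings are hard-coded instead of coming from real devices, and the function names are hypothetical.

```python
from statistics import mean


def perception_layer() -> list[dict]:
    """Stand-in for sensing: returns hard-coded readings instead of real device data."""
    return [
        {"sensor": "temp-1", "value": 21.5},
        {"sensor": "temp-1", "value": None},  # a failed reading
        {"sensor": "temp-2", "value": 22.5},
    ]


def network_layer(readings: list[dict]) -> list[dict]:
    """Information integration sub-layer: drop unusable readings before forwarding."""
    return [r for r in readings if r["value"] is not None]


def application_layer(readings: list[dict]) -> str:
    """Turn the forwarded data into a simple service-level message."""
    avg = mean(r["value"] for r in readings)
    return f"average temperature: {avg:.1f}"


print(application_layer(network_layer(perception_layer())))
```

Each layer only consumes the output of the layer below it, which mirrors the layered structure described above.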

4.2. Cognitive IoT

CIoT is the merging of cognitive computing technologies with data collected from connected devices. The evolution of ubiquitous computing leads to heterogeneous infrastructure challenges. Through the objective context, IoT handles those challenges, represented in context awareness, by producing a smart system to meet user requirements. Figure 3 presents the cognitive IoT architecture. Cognitive IoT infuses intelligence into the physical world through physical objects. The data produced by IoT devices and web data are sensed through a context management middleware. The objective context is used for the external intelligence of the service, while the subjective context is used to improve services to fit specific spatio-temporal situations. Big data management provides a high level of data quality and accessibility for big data analytics. An intelligent service is required to ensure the monitoring and control of the system.

Fig 3. Cognitive IoT architecture.

5. Data flow based CIoT

The increasing volume, variety, and velocity of data produced by the Internet of Things will continue to fuel the explosion of information. Under the explosive growth of data, many different forms of data appear from a huge number of sources. Because enterprises find it difficult to deal with raw, semi-structured, and structured data, various system architectures are used to handle the data flow from an input source to the desired output that is needed to understand the required data, as presented in figure 4.

Fig 4. Process of data flow.

Figure 5 shows the functional process that illustrates data transformation from the data format of a source data system into the data format of a destination data system. Therefore, in this section, we define the basic components and transformations:

Data sources: Data is scattered over the network without any understandable documentation, so it cannot be directly explored and used to extract the expected result.
Data collection: Also known as data sensing or data acquisition, it deals with the collection of data (actively or passively) from a device or system, or as a result of its interactions. For data collection, critical information needs to be available at the right point, in a timely manner, and in the right form, as mentioned by Karnouskos et al. The data gathered from different sensory devices, through the collection of environmental data or the identification of real-world objects, needs to be incorporated into the system. There are two subcomponents for collecting data: batch processing, which acquires data at rest, and stream processing, which acquires data in motion.

Fig 5. Process of data collection.
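
The difference between the two collection subcomponents can be sketched as follows. This is a minimal illustration in plain Python rather than a real stream or batch engine; `collect_batch` and `collect_stream` are hypothetical names, and the window size is an assumption.

```python
from typing import Iterable, Iterator


def collect_batch(stored_records: list[dict]) -> list[dict]:
    """Batch processing: the data is at rest, so it is read in one pass."""
    return list(stored_records)


def collect_stream(source: Iterable[dict], window: int) -> Iterator[list[dict]]:
    """Stream processing: records arrive one by one and are emitted in small windows."""
    buffer: list[dict] = []
    for record in source:
        buffer.append(record)
        if len(buffer) == window:
            yield buffer
            buffer = []
    if buffer:  # flush the last, possibly shorter, window
        yield buffer


readings = [{"id": i} for i in range(5)]
print(len(collect_batch(readings)))
print([len(w) for w in collect_stream(iter(readings), window=2)])
```

In the streaming case, downstream steps can start working on the first window before the source has finished producing data.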

Extract, Transform and Load (ETL): The extract, transform, and load process is one of the most important components of a BI system. It allows data from various data sources to be manipulated, reformatted, and stored in a repository.
Data ingestion: The collected data needs to be transformed from the input source to the output source.
Data storage: The huge volume and velocity of collected data make it necessary to store data in a data store that can handle all the collected data.
Data wrangling: It is the process of mapping raw data to another format, which helps the required data to be identified, extracted, cleaned, and integrated. Hence, the output data will be a suitable data set for exploration and analysis.

Fig 6. Process of ETL.

Business analytics: Business analytics explores data in preparation for data modeling, such as a service. It includes tools that produce reports, scoreboards, and other desired outputs with high data quality to help managers make decisions and understand the required data. It offers a single version of truth, increases competitive ability and customer satisfaction, and facilitates alignment with business strategy. These data have to be precise, subject-oriented, complete, accessible, and time-dependent, with respect to security and confidentiality constraints.
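
The extract, transform, and load steps described in this section can be sketched with the Python standard library. The example is only an illustration: the CSV content is made up, and an in-memory SQLite database stands in for the data store, which is an assumption rather than the technology used in the architecture.

```python
import csv
import io
import sqlite3

RAW_CSV = "sensor,value\ntemp-1,21.5\ntemp-2,\ntemp-3,19.0\n"  # made-up input


def extract(text: str) -> list[dict]:
    """Extract: parse the raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Wrangling step: drop rows with empty values and cast the rest to float."""
    return [(r["sensor"], float(r["value"])) for r in rows if r["value"]]


def load(rows: list[tuple]) -> sqlite3.Connection:
    """Load: store the cleaned rows in a table (in-memory SQLite as a stand-in store)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = load(transform(extract(RAW_CSV)))
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])
```

The row with the empty value is removed during the transform step, so only clean records reach the store.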

Data analysis: This component prepares information, calculates intermediate results, and creates insight from the output of data wrangling.
Service: It helps to capture the desired data and their relationships, and to evaluate and create intelligent data models that provide a visual representation of the data.
Data distribution: It makes information and analysis results available to the end user through a conceptual model or an application interface.

Fig 7. Process of data flow from business analytics to service.
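
The path from data analysis to service can be illustrated with a histogram, one of the models listed in table 1. The sketch below is a simplified assumption of how intermediate results could be turned into a visual representation; the function names and values are hypothetical.

```python
from collections import Counter


def analyze(values: list[float], bin_width: int) -> Counter:
    """Data analysis: bucket values into bins (an intermediate result)."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def render_histogram(bins: Counter) -> str:
    """Service: a plain-text histogram as a simple visual model of the data."""
    return "\n".join(f"{start:>4}: {'#' * count}" for start, count in sorted(bins.items()))


values = [18.2, 19.9, 21.0, 21.7, 22.4, 25.1]
print(render_histogram(analyze(values, bin_width=2)))
```

In a full system, the rendered model would be distributed to the end user through an application interface instead of being printed.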

During the whole process for data flow, we should take into consideration the following constraints as presented in figure 8:

Fig 8. Architecture of data flow from data sources to services.

Metadata: It describes the provenance of each data item such as the different data processing, transformation phases, and analysis techniques.
Data quality: It handles data quality problems in all phases. For example, quality issues arise when combining data from various sources, and the quality of data sets needs to be enhanced by correcting errors in the data, completing empty attributes, identifying and eliminating noise, and preventing the data store from becoming a data swamp.
Privacy and security: It uses all methods and techniques to ensure the privacy of the stored data and prevent data loss while enabling data protection during all steps.
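
The metadata and data quality constraints can be combined in a small sketch in which every quality step records itself as provenance. The `Dataset` class, the default value, and the valid range are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    rows: list[dict]
    provenance: list[str] = field(default_factory=list)  # metadata: processing history


def fill_missing(ds: Dataset, key: str, default) -> Dataset:
    """Data quality step: complete empty attributes and record the step as metadata."""
    rows = [{**r, key: r[key] if r.get(key) is not None else default} for r in ds.rows]
    return Dataset(rows, ds.provenance + [f"fill_missing({key})"])


def drop_outliers(ds: Dataset, key: str, low: float, high: float) -> Dataset:
    """Data quality step: eliminate noise outside an assumed valid range."""
    rows = [r for r in ds.rows if low <= r[key] <= high]
    return Dataset(rows, ds.provenance + [f"drop_outliers({key})"])


raw = Dataset([{"temp": 21.0}, {"temp": None}, {"temp": 999.0}], ["source: sensors"])
clean = drop_outliers(fill_missing(raw, "temp", 20.0), "temp", -40.0, 60.0)
print(clean.rows)
print(clean.provenance)
```

Because each step returns a new `Dataset`, the raw data stays untouched and the provenance list shows exactly which transformations produced the cleaned version.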

6. A new architecture for CIoT and big data

In this section, we present a new architecture for CIoT and big data, shown in figure 9. To do so, we map the system components described above to more concrete system components.

Fig 9. New architecture for CIoT and big data.

We analyzed the data processing of the existing technologies presented in the related works section, with the aim of enhancing data collection. We must deal with the variety of data and transform the collected data into structured data that can be analyzed. Therefore, in the first and second layers, a tool is required to collect data from various sources using smart device features such as human sensors, user input, documents, environmental sensors, localization, and movement. This tool has to extract and recognize information from unstructured data, which is a big challenge for smart devices, through tasks such as image recognition, text analysis, and speech recognition. Hence, methods integrated into the tool through algorithms can make the device think like a human being and recognize what is in the collected data. In the knowledge and decision making layer, the output data from the tool and other data can be extracted and loaded into a central storage such as the DL. It is then transformed into a dataset, and an ELT process is able to store it in the Data Warehouse (DWH). A data cube can send this data to a model. In addition, data analysis makes information and analysis results available to the end user through an intelligent service.

The architecture defines a solution for heterogeneous data and the 3V requirements while relying on the tool and existing technologies. The tool can handle the variety of data collected from different sources. It is able to transform different types of data into one file with one format: it draws connections between different sources and provides a file that collects all the data. Moreover, using the Data Lake (DL) as a data store offers a solution to many important challenges, such as the 3V requirements. Data can be ingested as they are, so the user can add data to the DL in their native format. This helps ensure the scalability of the architecture.

Thus, the DL can handle very large quantities of data. It can produce data sets to integrate with the traditional DWH, which is well suited to the needs of automated reporting. The DL offers analytic power, such as flexibility in analytics requirements, whereas the DWH is less flexible and lacks the detail required for some models. Therefore, merging the DWH and the DL improves enterprise performance: the DWH is responsible for automated standardized analysis and reporting capabilities, while the DL deals rapidly and efficiently with complex analysis.
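
The division of labour between the DL and the DWH can be sketched in a few lines. This is a toy model under stated assumptions: a Python list stands in for the data lake, an in-memory SQLite database stands in for the warehouse, and the records are invented.

```python
import json
import sqlite3

# "Data lake": raw items kept in their native format, no schema enforced on write.
data_lake = [
    ("json", '{"device": "cam-1", "label": "person"}'),
    ("csv", "temp-1,21.5"),
    ("json", '{"device": "cam-2", "label": "car"}'),
]


def build_dataset(lake: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Schema-on-read: pick the JSON items and shape them into (device, label) rows."""
    rows = []
    for fmt, payload in lake:
        if fmt == "json":
            item = json.loads(payload)
            rows.append((item["device"], item["label"]))
    return rows


# "Data warehouse": a fixed schema used for standardized reporting.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE detections (device TEXT, label TEXT)")
warehouse.executemany("INSERT INTO detections VALUES (?, ?)", build_dataset(data_lake))

report = warehouse.execute(
    "SELECT label, COUNT(*) FROM detections GROUP BY label ORDER BY label"
).fetchall()
print(report)
```

The lake accepts every item as it arrives, while the warehouse only receives the data set that has been shaped for reporting, which reflects the complementary roles described above.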

7. Comparative analysis of the CIoT and big data architecture with others

In this section, we address the open research issues cited in the third section and compare our architecture with others. Our architecture is suitable for all projects that deal with large quantities of data, such as the architecture of Reality mining in sensor-based mobile-driven environments. In addition, users can configure the tool and a dynamic intelligent environment without any technical assistance, as in Security and privacy in the Internet of Things: Current status and open issues. The tool has to offer friendly interfaces and straightforward services so that all users feel comfortable. We also propose automated functions, such as sensors that locate and track user movements, and simple methods, such as disabling unneeded functions of the service, in order to improve energy efficiency. Our mechanism is independent: its components do not depend on each other. For example, if the user does not ingest the file into the platform, the system will show empty results. If an error occurs while the system is working, it will ask to reload the last process and not the whole data flow process. Furthermore, the tool can collect all types of data and adds value and knowledge to the raw sensor data collected. Our architecture addresses the real-time consideration by sending the collected data to be ingested into the platform. Hence, the user can get the required data by sending requests for data feeds. As in Fog computing: A platform for internet of things and analytics, we address the speed of data flow in the architecture by choosing existing technologies such as ELT to ingest, extract, load, and transform data at high speed. In addition, we use a programmable interface to ease seamless resource management.
In addition, it can support a huge number of simultaneous and parallel events, such as generating data from a location sensor while generating data from a record. It is capable of collecting all varieties of data from different sources, as in Fog computing: A platform for internet of things and analytics, Four-Layer Architecture for Product Traceability in Logistic Applications, and Big Data and the Internet of Things: Enterprise Information Architecture for a New. In addition, we tried to find another solution without moving the data ingestion tool, as illustrated in A survey on internet of things architecture, protocols, possible applications, security, privacy, realworld implementation, and future trends. Thus, we combine the DL and the DWH to improve complex data transformations and standardized analysis capabilities. In DL platforms such as Kylo and Zaloni, which are built on Apache Hadoop and Spark, stored data such as a Comma-Separated Values (CSV) file is loaded into a Hive table, which makes data processing on Hadoop easier by providing a database query interface to Hadoop. Furthermore, we use the CIoT and big data architecture as a high-level solution for a computer-aided software engineering tool in order to deal with several constraints such as configuration, dynamicity, context-aware computing, type of data, object capabilities, real-time consideration, fault tolerance, energy management, self-system, and business models interaction.
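
The fault tolerance behaviour described in this section, where only the last process is reloaded after an error rather than the whole data flow, can be sketched with per-step checkpoints. This is one possible way to obtain that behaviour, not the implementation of the proposed tool; all names and the simulated failure are hypothetical.

```python
from typing import Callable


def run_pipeline(steps: list[tuple[str, Callable]], data, checkpoints: dict):
    """Run steps in order, reusing cached outputs so a retry resumes at the failed step."""
    for name, step in steps:
        if name not in checkpoints:
            checkpoints[name] = step(data)  # may raise; earlier checkpoints survive
        data = checkpoints[name]
    return data


calls = {"collect": 0, "ingest": 0}


def collect(_):
    calls["collect"] += 1
    return [1, 2, 3]


def ingest(rows):
    calls["ingest"] += 1
    if calls["ingest"] == 1:
        raise RuntimeError("simulated failure")
    return [r * 10 for r in rows]


steps = [("collect", collect), ("ingest", ingest)]
checkpoints: dict = {}
try:
    run_pipeline(steps, None, checkpoints)
except RuntimeError:
    result = run_pipeline(steps, None, checkpoints)  # only "ingest" runs again

print(result, calls)
```

The call counters show that the collection step runs once even though the pipeline is started twice, which is the property the text asks for.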

8. Conclusion

In this paper, we first studied and analyzed the existing technologies, tools, and techniques from the related works and several survey papers. This allowed us to select the appropriate technologies for our architecture. In addition, we surveyed the most important aspects of the IoT and CIoT. Unlike other IoT papers, this paper focused on the limitations of existing works. Hence, it could help future IoT researchers who want to create an architecture based on the cognitive IoT concept that fits the scalability of business requirements. Moreover, the proposed solution merges the DL and the DWH in order to overcome the limitations due to velocity, variety, and volume. Thus, it can handle the quantity of data (structured, unstructured, semi-structured) collected from several device features. Moreover, the tool addresses the speed of data flow. In the future, we intend to enhance the tool by considering new methods such as deep learning for extracting and recognizing data from data sources, which could improve data collection, and to study real cases of the implementation of the approach.