The Cognitive Internet of Things and Big Data
5. Data flow based CIoT
The increasing volume, variety, and velocity of data produced by the Internet of Things will continue to fuel the information explosion. Under this explosive growth, data appears in many different forms and from a huge number of sources. Because enterprises find it difficult to deal with raw, semi-structured, and structured data, various system architectures are used to handle the data flow from the input sources to the desired outputs, as presented in figure 4.
Fig 4. Process of data flow.
Figure 5 shows the functional process that illustrates the transformation of data from the format of a source data system into the format of a destination data system. Accordingly, in this section we define the basic components and transformations:
- Data sources: Data is scattered over the network, often without any understandable documentation, so it cannot be directly explored and used to extract the expected results.
- Data collection: Also known as data sensing or data acquisition, this component deals with collecting data (actively or passively) from a device, a system, or its interactions. As mentioned by Karnouskos et al., critical information needs to be available at the right point, in a timely manner, and in the right form. Data gathered from different sensory devices, such as environmental measurements or identifications of real-world objects, needs to be incorporated into the system. Collection is therefore split into two subcomponents: batch processing, which acquires data at rest, and stream processing, which acquires data in motion (a minimal sketch of both modes follows Fig 5 below).
Fig 5. Process of data collection.
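As an illustration only, the following Python sketch contrasts the two collection subcomponents; the sensor reading, file layout, and field names are assumptions made for the example and do not come from any specific CIoT platform.

```python
import json
import time
from pathlib import Path

def read_sensor() -> dict:
    """Hypothetical sensor read; a real deployment would query an actual device."""
    return {"device_id": "sensor-01", "temperature": 21.5, "timestamp": time.time()}

def collect_batch(readings_file: Path) -> list[dict]:
    """Batch processing: acquire data at rest, e.g. a file of past readings (one JSON object per line)."""
    with readings_file.open() as f:
        return [json.loads(line) for line in f]

def collect_stream(n_samples: int, interval_s: float = 1.0):
    """Stream processing: acquire data in motion, one reading at a time."""
    for _ in range(n_samples):
        yield read_sensor()
        time.sleep(interval_s)

if __name__ == "__main__":
    for reading in collect_stream(3, interval_s=0.1):
        print(reading)
```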
- Extract, Transform and Load (ETL): The ETL process is one of the most important components of a BI system. It allows data from various sources to be manipulated, reformatted, and stored in a repository (a minimal sketch is given after Fig 6 below).
- Data ingestion: The collected data needs to be transferred from the input sources to the desired output destinations.
- Data storage: The huge volume and velocity of collected data necessitate a data store that can handle all of it.
- Data wrangling: This is the process of mapping raw data to another format so that the required data can be identified, extracted, cleaned, and integrated. The output is a data set suitable for exploration and analysis (a combined wrangling and analysis sketch is given after the last component in this list).
- Business analytics: Business analytics explores the data in preparation for data modeling, for example for the service component. It includes tools that produce reports and the desired outputs to help managers make decisions. It offers a single version of the truth, increases competitiveness and customer satisfaction, and facilitates alignment with the business strategy. This level includes tools that help produce reports and scoreboards with high data quality. These data have to be precise, subject-oriented, complete, accessible, and time-dependent, with respect to security and confidentiality constraints.
Fig 6. Process of ETL.
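To make the ETL step concrete, the sketch below shows a minimal extract-transform-load pass in Python; the CSV column names and the SQLite repository are assumptions chosen for the example, not a prescribed implementation.

```python
import csv
import sqlite3

def extract(csv_path: str) -> list[dict]:
    """Extract: read raw rows from a source file."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: reformat fields and drop rows with missing values."""
    out = []
    for row in rows:
        if not row.get("temperature"):
            continue
        out.append((row["device_id"], float(row["temperature"])))
    return out

def load(rows: list[tuple], db_path: str = "repository.db") -> None:
    """Load: store the reformatted data in a repository (here, SQLite)."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS readings (device_id TEXT, temperature REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("readings.csv")))
```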
- Data analysis: This component prepares information, calculates intermediate results, and creates insights from the output of data wrangling (illustrated in the combined sketch below).
- Service: It helps to capture the desired data and their relationships, and to evaluate and create an intelligent data model that provides a visual representation of the data.
- Data distribution: It distributes information and makes analysis results available to end users through a conceptual model or an application interface (a distribution sketch follows Fig 7 below).
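The following sketch illustrates the data wrangling and data analysis components together; the raw record format, field names, and unit conversion are assumptions made for the example.

```python
import json
from statistics import mean

# Hypothetical raw, semi-structured readings; field names are illustrative.
RAW_RECORDS = [
    '{"id": 1, "temp": "21.5", "unit": "C"}',
    '{"id": 2, "temp": null,   "unit": "C"}',   # incomplete record: dropped
    '{"id": 3, "temp": "70.1", "unit": "F"}',   # different unit: converted
]

def wrangle(raw: list[str]) -> list[dict]:
    """Data wrangling: map raw JSON lines to a clean, tabular data set."""
    clean = []
    for line in raw:
        rec = json.loads(line)
        if rec["temp"] is None:                 # clean: discard incomplete records
            continue
        temp = float(rec["temp"])
        if rec["unit"] == "F":                  # integrate: normalise units to Celsius
            temp = (temp - 32) * 5 / 9
        clean.append({"id": rec["id"], "temperature_c": round(temp, 2)})
    return clean

def analyse(rows: list[dict]) -> dict:
    """Data analysis: derive a simple insight from the wrangled output."""
    return {"mean_temperature_c": round(mean(r["temperature_c"] for r in rows), 2)}

if __name__ == "__main__":
    rows = wrangle(RAW_RECORDS)
    print(rows)
    print(analyse(rows))
```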
Fig 7. Process of data flow from business analytics to service.
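As one possible realization of data distribution through an application interface, the sketch below exposes hypothetical analysis results over HTTP using Python's standard library; the endpoint and result fields are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical analysis results to be distributed to end users.
ANALYSIS_RESULTS = {"avg_temperature_c": 21.4, "device_count": 12}

class ResultsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the latest analysis results as JSON to any client.
        body = json.dumps(ANALYSIS_RESULTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # End-user applications or dashboards can query http://localhost:8000/
    HTTPServer(("localhost", 8000), ResultsHandler).serve_forever()
```

An end-user dashboard could then retrieve the results with a plain HTTP GET request.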
During the whole data flow process, we should take the following constraints into consideration, as presented in figure 8:
Fig 8. Architecture of data flow from data sources to services.
- Metadata: It describes the provenance of each data item, such as the data processing steps, transformation phases, and analysis techniques applied to it (a provenance sketch follows this list).
- Data quality: It handles data quality problems in all phases, for example the quality issues that arise while combining data from various sources. The quality of data sets needs to be enhanced by correcting errors, completing empty attributes, identifying and eliminating noise, and preventing the data store from becoming a data swamp.
- Privacy and security: It uses methods and techniques to ensure the privacy of the stored data, prevent data loss, and enable data protection during all steps.
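A provenance record of this kind could be attached to every data item as it moves through the flow; the sketch below, with assumed field names, shows one minimal way to accumulate such metadata in Python.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Metadata describing how a single data item was produced."""
    item_id: str
    source: str
    processing_steps: list[str] = field(default_factory=list)

    def add_step(self, step: str) -> None:
        # Record each processing or transformation phase with a UTC timestamp.
        stamp = datetime.now(timezone.utc).isoformat()
        self.processing_steps.append(f"{stamp} {step}")

if __name__ == "__main__":
    prov = ProvenanceRecord(item_id="reading-42", source="sensor-01")
    prov.add_step("collected (stream)")
    prov.add_step("transformed: unit normalised to Celsius")
    prov.add_step("loaded into repository")
    print(prov)
```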