This case study provides insight into how a data warehouse was built for a firm in the financial sector using its existing Microsoft technology. It touches on the current form of "static reports" currently used within the company, which we have identified as problematic. This case study showcases a step-by-step method of how this DW is built. After reading, you should understand the theory and practical application of the DW approach. How would you apply a similar framework to a large department store chain's supply chain?
2. Literature Review
2.1. Data Capture/Acquisition
The acquisition component is the back end of the data warehousing system and consists of systems that have interface with the operational systems to load data into the DW. Data is first entered or treated by a daily business process that is based on Online Transaction Processing (OLTP) environment and stored in operational database, which may consist of common relational databases such as Oracle, SQL Server, MySQL, DB2, etc. Before data is loaded from the operational database and external sources into the DW, it needs to be processed through three main functions: extraction, transformation and loading.
In the first phase of extraction, data are extracted from the available internal and external sources. A logical distinction can be made between the initial extraction, where the available data relative to all past periods are fed into the empty DW, and the subsequent incremental extractions that update the DW using new data that become available over time. The selection of data to be imported is based upon the DW design, which in turn depends on the information needed for business intelligence analysis and decision support systems operating in a specific application domain.
The next step is the transformation phase that intends to improve the quality of the data extracted from the different sources, through the correction of inconsistencies, inaccuracies and missing values. Some of the major shortcomings that are removed during the data cleaning stage include inconsistencies between values recorded in different attributes that have the same meaning, data duplications, missing data and existence of inadmissible values. Other common operations inside the transformation phase includes the conversion of character sets (e.g., from EBCDIC to Unicode), normalization/denormalization of the data into the desired dimensional DW schema, the assignment of surrogate keys to the data and maintenance of slowly changing dimensions by detecting changes and taking the actions necessary to contain with these.
In the loading phase the cleaned data is loaded into the tables of DW. Typically, the traditional DW architecture model assumes that new data loading occurs only at certain times, when the warehouse is taken offline, and the data is integrated during a more or less lengthy time interval.