MSBI : BI # 31 : Business Intelligence – Tools & Theory # 23 : Introduction to Data Warehousing #1 : Introduction to Data Warehousing & Data Mart

Hi Folks,

This post is part of Series Business Intelligence – Tools & Theory

Currently running topic for this series is listed as below :

Series Business Intelligence – Tools & Theory

>>Chapter 1 : Business Intelligence an Introduction

>>Chapter 2 : Business Intelligence Essentials

>>Chapter 3 : Business Intelligence Types

>>Chapter 4 : Architecting the Data

>>Chapter 5 : Introduction of Data Mining

>>Chapter 6 : Data Mining Techniques

>>Chapter 7 : Introduction to Data Warehousing<You are here>

Continuing from my previous post on this series, If you have missed any link please visit link below

We are going to Cover the Following Points in this article

  • Introduction to Data Warehousing
  • Data Warehousing
  • Advantages and Disadvantages of Data Warehousing
  • Data Warehouse
  • Data Mart
  • Aspects of Data Mart

Introduction to Data Warehousing

By now, you must be familiar with the introduction of Data Mining and its different techniques. This unit introduces you to Data Warehousing and its various characteristics, differences in Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). You will also learn about the different data models for multidimensional databases, OLAP tools and practical applications of OLAP.

In the 1970s, departments that processed data were unable to tackle excess requests for data analysis. The data were recorded in tapes for specific information manipulation. Application data was hidden behind mainframe files and databases.

In the 1980s, Distributed Database Management Systems (DDBMS) were used. But it did not solve the problem of ‘islands of data’ and accessing data from a transactional database was difficult.

In the 1990s, data warehousing was introduced. It seemed to be the most appropriate solution to manipulate and optimize data. Here, the data that is required is gathered in an optimized database in spite of the difference in application and difference in platform that is used to generate the source of the data.

This unit helps you to analyze the importance of data warehousing in any organization and also the different methods by which data can be represented. We will discuss data mart which is a subset of the collection of data. In data warehousing, the data models used are star and snowflake schema. OLAP and OLTP are also discussed here, which are the tools to access the data.

Data Warehousing

In order to survive the market competition, an organization has to monitor the changes within the organization and outside the organization. An organization that cannot study the current trends within the organization and without its operations such as corporate relations will not be able to survive in the world today. This is where data warehousing and its applications comes into play.

Data warehousing technology is the process by which the historical data of a company (also referred to as corporate memory) is created and utilized. A data warehouse is the database that contains data relevant to corporate information. This includes sales figures, market performance, accounts payables, and leave details of employees. However, the data warehouse is not limited to the above mentioned data. The data available is useful in making decisions based on past performances of employees, expenses and experiences.

To utilize the data warehousing technology, companies can opt for Online

Transaction Processing (OLTP) or Online Analytical Processing (OLAP).

The uses of data warehousing are many. Let us consider a bank scenario to analyze the importance of data warehousing. In a bank, the accounts of several customers have to be maintained. It includes the balance, savings, and deposit details. The particulars of the bank employees and the information regarding their performance have to be maintained. Data warehousing technologies are used for the same.

Data warehousing transforms data to information and enables the organizations to analyze its operations and performances. This task is done by the staging and transformation of data from data sources. The data stores may be stored on disk or memory.

To extract, clean and load data from Online Transaction Processing (OLTP) and the repositories of data, the data warehousing system uses backend tools. Data warehousing consists of the data storage area composed of the data warehouse, the data marts and the data store. It also provides tools like OLAP to organize, partition and summaries data in the data warehouse and data marts. Mining, querying and reporting on data requires front end tools.

Advantages and Disadvantages of Data Warehousing

Data warehousing is a popular method adopted by most of the organizations as a management technique. However, there are advantages and disadvantages of data warehousing. Let us discuss its advantages and disadvantages.

Advantages of Data Warehousing

· It is very important for an organization to carefully analyze information related to business affairs. Data warehousing provides better end-user access.

· Accessing data becomes easier.

· It provides flexibility of use of major corporate data.

· With the use of data warehousing technology, creating reports becomes an easy job. Reports such as financial statements, accounts can be created easily with the data available in the data warehouse.

Disadvantages of Data Warehousing

· The major disadvantage of data warehousing is that it is time consuming. Creating a consistent and compatible system of data collection for storage and retrieval is time consuming.

· Organizations may be asked to modify the existing database system.

This could be a problem considering the cost of the system and the software required as the organizations need to adapt to the given model.

· Data warehousing is a very good concept but it may have security problems. When the database contains sensitive information, it is important to grant access of the database to limited users.

· The most common issue in data warehousing is that it requires regular data cleaning, extraction and loading data in the system. This regular maintenance is necessary but it is time consuming.

Data Warehouse

A Data warehouse is a part of the data warehousing system. It provides consolidated, accessible and flexible collection of data for end user analysis and reporting.

The characteristics of data warehouse as defined by Inmon as ‘a subject- oriented, integrated, non-volatile, time variant collection of data that supports the decision making process of an organization.’

He explains the terms in the above definition as:

· Subject-Oriented: Data Warehouse is subject-oriented as the data gives information about a particular subject instead of about a company’s ongoing operation.

· Integrated: Data Warehouse is integrated as the data is gathered from a variety of sources into the data warehouse and merged into a coherent whole.

· Time Variant: Data warehouse is time-variant as all the data in it is identified with a particular time period.

· Non-Volatile: Data is stable in a data warehouse. More data is added but data is never removed. Thus, the management can gain a constant picture of the business. Hence the data warehouse is non-volatile (long term storage).

Data warehousing is different from data warehouse. It is necessary to distinguish between the two. It can be distinguished as follows:

· Data warehousing comprises a complete architecture whereas the data

warehouse is data stored in the form of fact tables, aggregated fact tables, and lookup fact tables.

· Data warehousing provides tools for the end users to access collection of stored data whereas a data warehouse is a collection of data that support the decision making process of a management.

Data Mart

As mentioned earlier in this unit, data mart is a storage area of the data warehousing system.

Data Mart is a subset of the collection of data of an organization that is applicable to a specific set of users. It is generally oriented towards a specific purpose or a main data subject that may be shared to support business requirements. For example, the sales division data mart may confine its subject to items sold, number of items sold, profits and loss.

Data marts are represented as one dimensional model or a star schema with the fact table and multidimensional table. The concept of data mart can be applied to any kind of data. It is basically designed to meet the immediate requirements and is appropriate.

Data Marts can be classified as Independent data mart or Dependent data mart depending on the source of data.

These can be defined as follows:

· Independent data mart originates from data gathered from one or more operational systems or external information providers, or data gathered locally within a company or a particular area.

· Dependent data mart originates from enterprise data warehouses.

Aspects of Data Mart

We have been introduced to data mart and its concepts. Now let us look into the aspects of data mart which talks about the points to remember while using a data mart.

· External Data: While using external data in a data mart, one may come across two issues.

1. If more than one data mart requires the same external data, then the external data should be placed in the data warehouse and later moved on to the specific data mart. This reduces the data redundancy. It also avoids the other data marts from receiving this data.

2. While collecting external data, the ancestry of the data is expected to be saved as well. It includes the source, size and acquisition of external data; filtering and editing criteria of the data and so on.

· Reference Data: The reference table enables the users to access data faster. It relates data in the data mart to its expanded version. These tables are stored in addition to basic data in data mart. The reference tables are copied over from data warehouses directly. In rare occasions data mart itself manages its reference tables. In DSS environment, the content of reference tables should be managed over time. Time management is a complex task which data warehouse can best manage.

· Performance Issues: The performance factors differ in context with Decision Support System (DSS) and OLAP environment (to be discussed in the next section). The response time issue is entirely different- In DSS environment, the real time or online response is not as important as it is in OLTP systems. The response time can be loosened and can vary from 1 minute to 24 hours. This is applicable especially to data warehousing where there is abundance of data to be dealt with. It is not the same with data mart environment. In data mart, the data is specific and requirements are clear. Therefore, certain performance expectations can be set and achieved. For example, performance can be achieved by using star joins. In the case of multidimensional (MDDB) environment, the performance issues differ again. Better performance can be achieved if the MDDB is not overloaded.

We can say that good performance in a data mart environment can be achieved by limiting volume of data, using star joins in indexes, and creating aggregated data, pre-joining tables and arrays of data.

· Requirements Monitoring for a Data Mart: Monitoring the data mart behavior at specific time intervals is necessary. Monitoring the data mart is indispensable when the data mart is growing significantly. In this context, by monitoring we mean the data usage and data content tracking.

There are certain queries to be dealt with where data usage is concerned. They are:

· What data is being accessed?

· Who are the users who are active?

· What is the quantity of data access?

· What are the usage timings?

· What is the best way to access?

The data content tracking has the following queries to be dealt with:

· What are the contents of the data mart?

· Is there any bad data in the data mart?

· How much is the data mart growing?

· How fast is the data mart growing?

· Security in the Data Mart: When there is confidential information in the data mart; care must be taken to protect it well. Confidential information includes financial information, medical records and human resources information. Such information can be protected using encryption and decryption, log on/off security, application security, DBMS security, and firewall.

Hope you will like Series Business Intelligence – Tools & Theory series !

If you have not yet subscribe this Blog , Please subscribe it from “follow me” tab !

So that you will be updated @ real time and all updated knowledge in your mail daily for free without any RSS subscription OR news reading !!

Happy Learning and Sharing !!

For More information related to BI World visit our all Mentalist networks Blog

SQL Server Mentalist … SQL Learning Blog

Business Intelligence Mentalist … Business Intelligence World

Microsoft Mentalist … MVC,ASP.NET, WCF & LinQ

MSBI Mentalist … MS BI and SQL Server

NMUG Bloggers …Navi Mumbai User Group Blog

Architectural Shack … Architectural implementation and design patterns

DBA Mentalist …Advance SQL Server Blog

MVC Mentalist … MVC Learning Blog

Link Mentalist … Daily Best link @ your email

Infographics Mentalist … Image worth explaining thousand Words

Hadoop Mentalist … Blog on Big Data

BI Tools Analysis … BI Tools

Connect With me on

| Facebook |Twitter | LinkedIn| Google+ | Word Press | RSS | About Me |


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s