MSBI : BI # 34 : Business Intelligence – Tools & Theory # 26 : Introduction to Data Warehousing #4 : Difference between OLAP & OLTP & Multidimensional Data Model

Hi Folks,

This post is part of Series Business Intelligence – Tools & Theory

Currently running topic for this series is listed as below :

Series Business Intelligence – Tools & Theory

>>Chapter 1 : Business Intelligence an Introduction

>>Chapter 2 : Business Intelligence Essentials

>>Chapter 3 : Business Intelligence Types

>>Chapter 4 : Architecting the Data

>>Chapter 5 : Introduction of Data Mining

>>Chapter 6 : Data Mining Techniques

>>Chapter 7 : Introduction to Data Warehousing<You are here>

Continuing from my previous post on this series, If you have missed any link please visit link below

We are going to Cover the Following Points in this article

  • Difference between OLAP and OLTP
  • Multidimensional Data Model

Difference between OLAP and OLTP

We have studied about OLAP in the previous section. Let us now study about OLTP in brief. Online Transaction Processing refers to the category of systems that assist and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. Transaction in this context refers to the business or commercial transactions. OLTP has also been used to refer to processing in which the system responds quickly to user queries. For example, an automatic teller machine (ATM) for a bank is an example of a commercial transaction processing application.

Table given below distinguishes between OLAP and OLTP with the key

features.

Table Differences between OLAP and OLTP

Feature

OLTP

OLAP

Characteristic

Operational processing

Informational processing

Orientation

transaction

analysis

Function

Day-to-day operations

Long term informational requirements, decision support

DB Design

ER based application oriented

Star/snowflake, subject- oriented

Data

Current, guaranteed up-to-date

Historical, accuracy maintained over time

Summarization

Primitive, highly detailed

Summarized, consolidated

View

Detailed, flat relational

Summarized, multidimensional

Unit of work

Short, simple transaction

Complex query

Access

Read/write

Mostly read

Focus

Data in

Information out

Operations

Index/hash on primary key

Lots of scans

Priority

High performance, high availability

High flexibility, end-user autonomy

Metric

Transaction throughput

Query throughput, response time

User

Clerk, DBA, database professional

Knowledge worker( like manager, executive, analyst)

No of Records

Accessed

tens

millions

Multidimensional Data Model

Data warehouses and OLAP tools are based on multidimensional data models where the data is viewed as a data cube. Multidimensional data model is the most popular data model for data warehouses. The multidimensional data model can exist as a Star schema or a Snowflake schema. Let us look at each of these data models in detail.

Star Schema: It is the simplest of dimensional modeling. It consists of few fact tables (or just one) referencing few other dimension tables. The resulting schema resembles a starburst with the central fact table and surrounding dimensional table in radial pattern.

Figure is an illustration of star schema

clip_image002

Star schema

An example of a star schema for a hospital particular is shown in figure. The hospital particular is considered along four dimensions, namely time, location, branch and treatment provided. The schema consists of a central fact table for hospital particulars which contains the keys to each of the four dimensions.

In star schema, each dimension is represented by only one table and each table contains a set of attribute. For example, the location dimension table contains the attribute set {city, state, country}. This constraint may bring in redundancy. For example, Pune and Mumbai are in Maharashtra, India. Entries for such cities in the location dimension table will create redundancy among the attributes state and country; that is (Pune, Maharashtra, India) and (Mumbai, Maharashtra, India).

Snowflake Schema: In snowflake schema, few dimension tables are normalized and hence the data are further broken into additional tables. The resulting schema resembles a snowflake.

Figure is an illustration of snowflake schema.

clip_image002[6]

Figure Snowflake schema

An example for hospital particular is shown in the figure  the hospital particular is the same as that of the star schema in figure the single dimension table for treatment in the star schema is normalized in snowflake schema, resulting in a new treatment, type_of_surgery dimension table. For example, the treatment dimension table now contains the attributes surgery_key, medication and treatment_key. The surgery_key is linked to the type_of_surgery dimension table, containing plastic surgery, neurosurgery and surgery_key.

The main difference between the star schema and the snowflake schema is the normalization factor. The dimensional tables in snowflake schema are in the normalized form. This could be an advantage as the storage space reduces. When the dimensional structure is included as columns, the dimension table may be large. The overall space requirement can be reduced when we use normalized form as much of the data is redundant data. However, the effectiveness of browsing reduces in a snowflake schema as more joins are required to execute a query. This affects the performance of the system. Depending on the performance expectations one can decide on the schema to be used.

Data Modeling Multi-fact Star Schema or Snowflake Schema

Each of the dimension table consists of a single field primary key that has one-to-many relationship with a foreign key in the fact table. Let us look into some facts related to star and snowflake schema.

Model

The fact table consists of the main data and the other smaller dimension tables contain the description for each value in the dimensions. The dimension tables can be connected to the fact table.

Fact tables consist of a set of foreign keys that makes a complex primary key. Dimension tables consist of a primary key.

One of the reasons for using star schema is because it is simple. The queries are not complex as the joins and conditions involve a fact table and few single level dimension tables. In snowflake schema the queries are complex because of multiple levels of dimension tables.

Uses

The star and snowflake schema are used in dimensional data where the speed of retrieval is more important than the efficiency of data management. Therefore, data is not normalized much.

The decision as to which schema should be used depends on two factors: the database platform, the query tool to be used. Star schema is relevant in environment where the queries are much simpler and the query tools expose the users to the fundamental table structures. Snowflake schema would be apt for environments with several queries with complex conditions where the user is detached from the fundamental table structures.

Data Normalization and Storage

The data in the database could be repeated. To reduce redundancy we use normalization. Commonly repeated data are moved into a new table. Therefore, the number of tables to be joined to execute a query increases. However, normalization reduces the space required for the storage of redundant data and other places where it has to be updated.

The dimensional tables are smaller compared to fact table when storage is concerned.

Benefits of Snowflaking

· Some OLAP multidimensional database modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas.

· If a dimension is meagre (that is most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking would be appropriate.

· To assist reporting, a multidimensional view is sometimes added to an existing transactional database. In this case, the tables which describe the dimensions will already exist and will typically be normalized. Hence, a snowflake schema will be easier to implement.

· A snowflake schema at times can reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.

· Some users may wish to submit queries to the database which cannot be expressed within a simple star schema using conventional multidimensional reporting tools. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Snow flaking would usually be required to permit simple query tools to form such a query, especially if condition for these forms of query weren’t expected when the data warehouse was first designed.

 

Hope you will like Series Business Intelligence – Tools & Theory series !

If you have not yet subscribe this Blog , Please subscribe it from “follow me” tab !

So that you will be updated @ real time and all updated knowledge in your mail daily for free without any RSS subscription OR news reading !!

Happy Learning and Sharing !!

For More information related to BI World visit our all Mentalist networks Blog

SQL Server Mentalist … SQL Learning Blog

Business Intelligence Mentalist … Business Intelligence World

Microsoft Mentalist … MVC,ASP.NET, WCF & LinQ

MSBI Mentalist … MS BI and SQL Server

NMUG Bloggers …Navi Mumbai User Group Blog

Architectural Shack … Architectural implementation and design patterns

DBA Mentalist …Advance SQL Server Blog

MVC Mentalist … MVC Learning Blog

Link Mentalist … Daily Best link @ your email

Infographics Mentalist … Image worth explaining thousand Words

Hadoop Mentalist … Blog on Big Data

BI Tools Analysis … BI Tools

Connect With me on

| Facebook |Twitter | LinkedIn| Google+ | Word Press | RSS | About Me |

4 thoughts on “MSBI : BI # 34 : Business Intelligence – Tools & Theory # 26 : Introduction to Data Warehousing #4 : Difference between OLAP & OLTP & Multidimensional Data Model

  1. Relational(OLTP) Data Modeling
    Relational Data Model is a data model that views the real world as entities and relationships. Entities are concepts, real or abstract about which information is collected. Entities are associated with each other by relationship and attributes are properties of entities. Business rules would determine the relationship between each of entities in a data model.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s