MSBI : BI # 28 : Business Intelligence – Tools & Theory # 20 : Data Mining Techniques #2 : Statistics & Similarity Measures

Hi Folks,

This post is part of the series Business Intelligence – Tools & Theory.

The chapters in this series are listed below:

Series Business Intelligence – Tools & Theory

>>Chapter 1 : Business Intelligence an Introduction

>>Chapter 2 : Business Intelligence Essentials

>>Chapter 3 : Business Intelligence Types

>>Chapter 4 : Architecting the Data

>>Chapter 5 : Introduction of Data Mining

>>Chapter 6 : Data Mining Techniques<You are here>

This post continues from my previous post in this series. If you have missed any part, please visit the link below.

We are going to cover the following points in this article:

  • Statistics
  • Similarity Measures


To extract the necessary information from the data, it is very important for the Information Technology (IT) team and the Business Intelligence (BI) team to work together. Business intelligence systems are built on information technology: IT is the core of every business intelligence application that deals with gathering, storing, sorting and analyzing the data of an organization.

The most obvious and important business intelligence application that works directly with information technology is statistics. Statistics helps the managers and executives of an organization understand what has happened, what is happening, and what may happen within their enterprise.

Statistics is the scientific application of mathematical principles to the collection, analysis, and presentation of numerical data. It is a discipline that involves developing and applying methods to collect, analyze and interpret data. Statistics can also be described as the science of learning from data.
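As a small illustration of "analysis and presentation of numerical data", here is a minimal sketch using Python's standard `statistics` module. The `monthly_sales` figures are hypothetical and only serve to show the idea of summarizing collected data.

```python
import statistics

# Hypothetical data: units sold in five consecutive months (assumed values).
monthly_sales = [120, 135, 150, 160, 185]

# Analysis: summarize the collected data with a few descriptive statistics.
summary = {
    "mean": statistics.mean(monthly_sales),      # average level
    "median": statistics.median(monthly_sales),  # middle value
    "stdev": statistics.stdev(monthly_sales),    # sample standard deviation
}

# Presentation: print the summary in a readable form.
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mean and median describe the typical level of sales, while the standard deviation describes how much the months vary around that level.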

A statistician is someone who is well versed in applying statistical analysis successfully. Statisticians contribute to scientific enquiry by applying mathematical and statistical knowledge to the design of surveys and experiments. This includes collecting, processing and analyzing the data and interpreting the results.

Statistical knowledge can be applied to various subject areas, such as biology, economics, engineering, medicine, public health, psychology, marketing, education, and sports. Modern statistical methods involve:

· The design and analysis of experiments and surveys.

· The modeling of biological, social and scientific phenomena.

· The practical application of statistical principles to understand more about the world around us.

Modern statistical methods have become a key factor in decision making in areas such as the medical, biological and social sciences, economics, finance, marketing research, manufacturing and management, government, research institutes and so on.

Need for Statistics

Statistics is essential for decision making under uncertainty. It is concerned with one of the most basic human needs: finding out more about the world and how it operates in the face of variation and uncertainty. According to H.G. Wells, “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

Statistics is required to place knowledge on a systematic evidence base. It provides the means to communicate what we know. Data are crude information and an integral part of most areas of human enterprise, but data by themselves do not form knowledge. Knowledge is formed from data through the following sequence:

1. Data to information: Data becomes information when it becomes relevant to a decision problem.

2. Information to facts: Information becomes fact when the data can support it.

3. Fact to knowledge: A fact becomes knowledge when it is used in the successful completion of the decision process.

The figure below shows the statistical thinking process, which uses data to construct statistical models for decision making under uncertainty.


                                 Statistical Thinking Process Graph

Similarity Measures

Similarity measures determine how similar two objects are. A similarity measure is an important tool in a business intelligence system because it quantifies the similarity between two factors in a business application. This helps the user take suitable steps to improve the business. In pure verification and identification applications, it is very important to determine whether a new template matches a stored one.

Similarity measures determine the match between two essential components of a business application, which helps in taking critical decisions.

A similarity measure expresses how alike two objects are. On the internet, for example, the set of all web pages can be viewed as the whole database. For a given query, these pages fall into two categories: pages that answer the query and pages that do not. The pages that answer the query are more similar to each other than to the pages that do not answer it. In this case, the query determines the similarity between the pages.

The similarity between two objects ti and tj in the database D, written sim(ti, tj), is a mapping from D × D to the range [0, 1]. The objective is to define the similarity mapping so that documents that are more alike have a higher similarity value. The characteristics of a good similarity measure are given below.
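Data mining textbooks commonly state these characteristics in the following form. This is a sketch in standard notation, assumed to match the intended list; ti, tj and tk are any objects in D.

```latex
\begin{align*}
&\forall\, t_i \in D:\quad \mathrm{sim}(t_i, t_i) = 1 \\
&\forall\, t_i, t_j \in D:\quad \mathrm{sim}(t_i, t_j) = 0
  \ \text{if $t_i$ and $t_j$ are not alike at all} \\
&\forall\, t_i, t_j, t_k \in D:\quad \mathrm{sim}(t_i, t_j) < \mathrm{sim}(t_i, t_k)
  \ \text{if $t_i$ is more like $t_k$ than it is like $t_j$}
\end{align*}
```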


Four commonly used measures of the similarity between two objects are:

· Dice

· Jaccard

· Cosine

· Overlap
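The four measures can be sketched in Python as follows. This is a minimal illustration, assuming each object is represented as a numeric vector of equal length (for example, keyword weights of a document) and using the vector forms of the formulas commonly given in data mining textbooks. The function names and the sample vectors `doc1` and `doc2` are hypothetical.

```python
import math


def _dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dice(a, b):
    # 2 * (a . b) / (a . a + b . b)
    return 2 * _dot(a, b) / (_dot(a, a) + _dot(b, b))


def jaccard(a, b):
    # (a . b) / (a . a + b . b - a . b)
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def cosine(a, b):
    # (a . b) / sqrt((a . a) * (b . b))
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def overlap(a, b):
    # (a . b) / min(a . a, b . b)
    return _dot(a, b) / min(_dot(a, a), _dot(b, b))


# Hypothetical keyword-presence vectors for two documents (1 = keyword present).
# Note: all-zero vectors are not handled and would cause a division by zero.
doc1 = [1, 1, 0, 1]
doc2 = [1, 0, 0, 1]

for measure in (dice, jaccard, cosine, overlap):
    print(f"{measure.__name__}: {measure(doc1, doc2):.3f}")
```

All four functions return 1 for identical non-zero vectors and 0 for vectors with no terms in common. With the sample data, the overlap measure also returns 1 because every keyword of `doc2` appears in `doc1`, which shows that the measures can rank the same pair differently.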




Hope you like the Business Intelligence – Tools & Theory series!


Happy Learning and Sharing !!

