MSBI : BI # 27 : Business Intelligence – Tools & Theory # 19 : Data Mining Techniques #1 : Techniques of Data Mining & Statistical Perspective of Data Mining

Hi Folks,

This post is part of the series Business Intelligence – Tools & Theory.

The chapters covered so far in this series are listed below:

Series Business Intelligence – Tools & Theory

>>Chapter 1 : Business Intelligence an Introduction

>>Chapter 2 : Business Intelligence Essentials

>>Chapter 3 : Business Intelligence Types

>>Chapter 4 : Architecting the Data

>>Chapter 5 : Introduction of Data Mining

>>Chapter 6 : Data Mining Techniques<You are here>

This post continues from my previous post in this series. If you have missed any earlier post, please visit the link below.

We are going to cover the following points in this article:

  • Data Mining Techniques
  • Statistical Perspective on Data Mining

Data Mining Techniques

By now you must be familiar with the data mining process and its functionalities. Data mining is the process of extracting hidden predictive information from large databases. It is a powerful technology with great potential for attacking problems such as producing efficient summaries of large amounts of data. It can also identify interesting structures and relationships within a data set. Data mining tools can predict future trends and behaviors, which helps businesses take proactive, knowledge-driven decisions. They can quickly answer business questions that were previously too time-consuming to resolve. Data mining tools search databases for hidden patterns that experts may miss because the patterns lie outside their expectations.

Today, many organizations are developing information systems that establish effective linkages with their suppliers, customers, and other channel partners involved in activities such as transportation, distribution, warehousing, and maintenance. These linkages lead to large data warehouses that integrate operational data with supplier, customer, and market information. Data mining techniques can be used to structure and prioritize the information in these large data warehouses so that it addresses the needs of specific end users. Integrating data mining techniques into an information system brings a range of business benefits.

Data mining is evolving from technology-driven concepts to business-solution-driven concepts. Earlier, information technology consumers were keen to use data mining techniques without much regard for their existing business processes and organizational disciplines. Today, in many corporations, the business divisions are ahead of the information technology divisions in implementing data mining techniques.

Statistical Perspective on Data Mining

Data mining and its tools share many of their aims with classical statistics. Data mining can be regarded as a collection of methods for drawing inferences from data. These inferences include understanding the patterns of correlation and the links among the data, and predicting future data values.

Data mining is the business of answering questions that have not been asked yet. It reaches deep into databases. Data mining can be classified into two categories, namely descriptive and predictive data mining.

Descriptive data mining provides information about what is happening inside the data, without starting from a predetermined idea. Predictive data mining allows the user to submit records with unknown field values, and the system guesses the unknown values based on patterns previously discovered from the database.
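The difference between the two categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `history` records, the field names, and the choice of a nearest-neighbour vote as the "previous pattern" are all hypothetical and are not taken from any particular tool.

```python
from collections import Counter

# Hypothetical historical records: (age, income_in_thousands, bought_product)
history = [
    (25, 30, "no"),
    (32, 45, "no"),
    (41, 80, "yes"),
    (50, 95, "yes"),
    (38, 70, "yes"),
    (23, 28, "no"),
]

def describe(records):
    """Descriptive: summarise what is in the data, with no target in mind."""
    label_counts = Counter(label for _, _, label in records)
    mean_age = sum(age for age, _, _ in records) / len(records)
    return {"label_counts": dict(label_counts), "mean_age": mean_age}

def predict(records, age, income, k=3):
    """Predictive: guess the unknown label from the k most similar known records."""
    by_distance = sorted(
        records,
        key=lambda r: (r[0] - age) ** 2 + (r[1] - income) ** 2,
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(describe(history))
print(predict(history, age=45, income=85))  # label for a record with an unknown field
```

Here `describe` only reports what the data looks like, while `predict` fills in the missing `bought_product` value for a new record by looking at similar records seen earlier.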

Data mining models can be categorized by the tasks they perform: classification and prediction, clustering, and association rules. Classification and prediction fall under the predictive models of data mining, while clustering and association rules fall under the descriptive models. The most commonly performed task in data mining is classification. It identifies the group to which an item belongs by examining existing items that have already been classified and inferring a set of rules from them. Clustering is very similar to classification; the only difference is that no groups have been predefined. Prediction is the construction and use of a model to assess the class, value, or value range of a given unlabeled object. Forecasting differs from prediction in that it estimates the future values of continuous variables based on patterns within the data.
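As a small illustration of the association rules task, the sketch below scores a rule such as "bread implies butter" over a handful of made-up shopping baskets. The `transactions` data is hypothetical, and the two measures used, support and confidence, are the standard ones for association rules rather than something defined in this post.

```python
# Hypothetical shopping baskets, one set of items per transaction
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)

print(support({"bread", "butter"}, transactions))       # 0.6
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```

No target field is chosen in advance here, which is why association rules sit on the descriptive side: the rule is simply a pattern found in the baskets.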

Statisticians have established techniques for identifying patterns in, and summarizing, large data sets. Several statistical models are available for determining the relationships in a data set or for predicting its values. Statistical models such as cluster analysis, discriminant analysis, and nonparametric regression can be used to solve large-scale data problems. Because data mining techniques tackle these same problems effectively, statisticians are being urged to consider data mining as a branch or a part of statistics.

Statistics follows an approach that involves specifying a model for the probability distribution of the data and drawing inferences in the form of probability statements. Data mining follows a different approach from that of classical statistics. Even when data mining is applied to familiar statistical problems such as classification and regression, it retains some distinct features.

It is common to have both continuous and discrete valued variables in a data set. Many multivariate analysis methods in statistics are designed for continuous variables, whereas many data mining methods are designed for discrete variables. In applications where the data is a combination of continuous and discrete values, it is useful to combine data mining and statistical methods to solve the problem efficiently.

Data mining methods often minimize a loss function expressed in terms of prediction error. Cross validation, which estimates the prediction error, is a technique that originated in statistics but is widely used in the data mining process. Minimizing prediction error using cross validation is a powerful technique. It can be used in a nested fashion to optimize several aspects of the application model.
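To make the idea concrete, here is a minimal sketch of k-fold cross validation in plain Python. It assumes a simple straight-line model and mean squared error as the loss; the helper names (`k_fold_indices`, `fit_line`, `cross_val_mse`) and the sample data are made up for this illustration.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.
    (Real data is usually shuffled first; that step is omitted here.)"""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b

def cross_val_mse(xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)  # fit only on the training part
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical data: roughly linear with some noise
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
print(cross_val_mse(xs, ys, k=5))
```

Each record is predicted by a model that never saw it during fitting, so the averaged error estimates how the model would behave on new data. Comparing this score across candidate models or settings is what lets cross validation be used to tune a model.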

The greater complexity of data mining methods is not always justified. Statistical methods are preferable for fairly simple application models. There are situations where data mining techniques bring no improvement to the given task. In such cases it is more suitable to use a statistical method.

Hope you will like the Business Intelligence – Tools & Theory series!


Happy Learning and Sharing !!
