MSBI : BI # 30 : Business Intelligence – Tools & Theory # 22 : Data Mining Techniques #4 : Kohonen’s Self-Organising Maps & Genetic Algorithms

Hi Folks,

This post is part of Series Business Intelligence – Tools & Theory

The currently running topics for this series are listed below:

Series Business Intelligence – Tools & Theory

>>Chapter 1 : Business Intelligence an Introduction

>>Chapter 2 : Business Intelligence Essentials

>>Chapter 3 : Business Intelligence Types

>>Chapter 4 : Architecting the Data

>>Chapter 5 : Introduction of Data Mining

>>Chapter 6 : Data Mining Techniques<You are here>

This continues from my previous post in this series. If you have missed any post, please visit the link below.

We are going to cover the following points in this article:

  • Kohonen’s Self-Organizing Maps
  • Genetic Algorithms

Kohonen’s Self-Organizing Maps

Kohonen’s Self-Organizing Maps (SOMs) have become a promising technique in data mining for cluster analysis. They are based on unsupervised learning. At the beginning, the connection weights are initialized with small random numbers. The input neurons receive the incoming input vectors, which represent the sample data, and each input vector is transmitted to the output neurons through the connections. The output neuron whose weight vector is most similar to the input vector becomes active.

In the learning stage, the weights are updated following Kohonen’s rule. Only the weights of the active output neuron and its topological neighbours are updated. The neighbourhood is large at the start and slowly shrinks over time, while the learning rate is gradually reduced towards zero as the learning process converges.
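The learning stage described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: it assumes a one-dimensional chain of output neurons, Euclidean distance as the similarity measure, a Gaussian neighbourhood function, and linear decay schedules. The names `train_som`, `best_matching_unit`, `n_neurons` and `lr0` are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the output neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_neurons=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM (a chain of neurons) on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Weights start as small random numbers.
    weights = [[rng.uniform(0.0, 0.1) for _ in range(dim)]
               for _ in range(n_neurons)]
    radius0 = n_neurons / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays towards zero
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks over time
        for x in data:
            bmu = best_matching_unit(weights, x)   # the "active" output neuron
            for j in range(n_neurons):
                grid_dist = abs(j - bmu)
                if grid_dist <= radius:            # only the winner and its neighbours learn
                    h = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for k in range(dim):
                        weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights

# Two well-separated groups of 2-D records (made-up sample data).
data = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05],
        [1.0, 1.0], [0.95, 1.0], [1.0, 0.95]]
weights = train_som(data)
for x in data:
    print(x, "-> neuron", best_matching_unit(weights, x))
```

After training, records from the two groups should activate different neurons, which is the clustering behaviour the text describes.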

Once the learning process ends, similar items activate the same neuron. A Self-Organizing Map thus divides the input set into groups of similar records. Hence the SOM is referred to as a method of cluster analysis, and it is often used in vector quantization. In data mining, clustering techniques based on Kohonen’s self-organizing maps have the following advantages over standard statistical methods:

· Data mining usually deals with high-dimensional data. A record in a database normally consists of a large number of items, and the data often do not follow a regular multivariate distribution. Traditional statistical methods therefore have limitations and can be less effective. Self-Organizing Maps work efficiently with high-dimensional data.

· Kohonen’s self-organizing maps provide a means of visualizing multivariate data. This is possible because two clusters with similar members activate output neurons that lie a small distance apart in the output layer. Neurons that are topologically close will be sensitive to similar inputs.

Data mining is human-centered. It is implemented through knowledge discovery loops coupled with human-computer interaction and visual representation. The main aim is to extract novel, plausible, relevant and interesting knowledge from the database. SOMs can be used efficiently for this purpose. A SOM is a dynamic system which learns to abstract structures in a high-dimensional input space, using a low-dimensional space for representation. A well-designed SOM can be used to organize high-dimensional clusters in a low-dimensional map. These low-dimensional cluster maps can be used to assist the human in discovering knowledge because they can be easily visualized. Figure 6.4 shows the self-organizing data mining method.

Figure 6.4: Self-Organizing Data Mining Method

Genetic Algorithms

A genetic algorithm is a global search algorithm based on the principle of evolution; it incorporates the ideas of natural evolution. The term refers to evolutionary systems in general, but in particular to the algorithm that states how a population of organisms should be formed, evaluated, and modified. Genetic algorithms are easily parallelizable. They are used for classification and for optimization problems, and also to evaluate the fitness of other algorithms in data mining.

The genetic algorithm is a predictive tool of business intelligence. It can give a competitive edge in solving business problems. It can learn to solve the prediction, classification and optimization problems common to business needs. Genetic algorithms are well suited to optimization problems such as finding the best schedules, financial indicators, mixes, model variables, locations, parameter settings and portfolios in business applications. A genetic algorithm can be used alone to optimize a trading system, or it can complement a system built with neural nets.

The genetic algorithm is an optimizing technique. It can take a very complex problem and come up with good solutions without a detailed understanding of the problem. It can be applied to a diverse set of problems, and it typically yields solutions far superior to random guessing.

In business, problems may arise that are quite complex and completely new, with no previous well-thought-out solution; in such cases a genetic algorithm can provide a good solution. There are four key attributes of business problems that can benefit from a genetic algorithm. If the attributes given below are present in the problem, a genetic algorithm is a good candidate; otherwise other techniques exist and are preferred over it. The attributes a problem can possess are:

· The problem is too complex for a direct solution to be obtained.

· The problem is comparatively new, and no optimization technique has yet been established for solving it.

· The problem has a large number of variables, which can produce large-scale effects.

· The values of the diverse proposed solutions can be well defined, so that solutions can be compared.

An initial population is formed consisting of randomly generated rules. Each rule is represented by a string of bits. For example, suppose the samples in a given training set are described by two Boolean attributes, A1 and A2, and two classes, C1 and C2. The rule “IF A1 AND NOT A2 THEN C2” can be encoded as the bit string “100”, where the two leftmost bits represent attributes A1 and A2, respectively, and the rightmost bit represents the class. Likewise, the rule “IF NOT A1 AND NOT A2 THEN C1” can be encoded as “001”. If an attribute has k values, where k > 2, then k bits may be used to encode the attribute’s values. Classes can be encoded in a similar fashion.
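The encoding in this example can be illustrated with a short Python sketch. The helper names `encode_rule` and `decode_rule` are hypothetical, and the class-bit convention (C1 as 1, C2 as 0) is inferred from the two example strings above rather than stated explicitly.

```python
def encode_rule(a1, a2, target_class):
    """Encode 'IF [NOT] A1 AND [NOT] A2 THEN class' as a 3-bit string."""
    # Assumed convention, inferred from the examples: C1 -> 1, C2 -> 0.
    class_bit = {"C1": "1", "C2": "0"}[target_class]
    return ("1" if a1 else "0") + ("1" if a2 else "0") + class_bit

def decode_rule(bits):
    """Turn a 3-bit string back into a readable IF-THEN rule."""
    a1, a2, cls = bits
    left = "A1" if a1 == "1" else "NOT A1"
    right = "A2" if a2 == "1" else "NOT A2"
    target = "C1" if cls == "1" else "C2"
    return f"IF {left} AND {right} THEN {target}"

print(encode_rule(True, False, "C2"))   # 100
print(encode_rule(False, False, "C1"))  # 001
print(decode_rule("100"))               # IF A1 AND NOT A2 THEN C2
```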

A new population is formed based on the notion of survival of the fittest: it consists of the fittest rules in the current population, along with offspring of these rules. Genetic operators such as crossover and mutation create the offspring. In crossover, substrings from pairs of rules are swapped to form new pairs of rules. In mutation, randomly selected bits in a rule string are inverted.
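The two genetic operators can be sketched on bit-string rules as follows. This is one common way to write them (single-point crossover and independent per-bit mutation); the function names and the `rate` parameter are assumptions made for illustration.

```python
import random

def crossover(rule_a, rule_b, point):
    """Single-point crossover: swap the substrings after position `point`."""
    return rule_a[:point] + rule_b[point:], rule_b[:point] + rule_a[point:]

def mutate(rule, rate, rng):
    """Invert each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in rule)

rng = random.Random(0)
print(crossover("100", "001", 1))  # ('101', '000')
print(mutate("100", 0.3, rng))     # some bits may be inverted
```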

Application of Genetic Algorithm

The genetic algorithm is an emerging technique that is widely used in business applications. There are three main areas to which it can be applied:

· Optimization – A genetic algorithm can be used to automatically determine the values of the variables that maximize profit in a business.

· Prediction – A genetic algorithm can be used as a meta-level operator that optimizes other data mining techniques. For example, it is used to optimize the weights in a neural network.

· Simulation – A genetic algorithm is used to simulate a large number of entities in a specific business over time.

Genetic algorithms are very useful in finding good solutions to direct marketing problems. Alex Singer was the first person to relate the genetic algorithm to a real-world direct marketing problem. The problem states: “What is the optimal number of coupons that should be put into a coupon mailer in order to maximize profit?” This problem seems simple, but various factors complicate it.

The factors that complicate the problem are:

· The more coupons there are, the more the mailer weighs and the higher the mailing costs, which decreases the profit of the business.

· A coupon that does not appear in the mailer cannot be used by the consumer, which results in a loss of revenue.

· If there are too many coupons in the mailer, the consumer will be overloaded and might not use any of the coupons.

The above problem can be encoded into a simple genetic algorithm, where each simulated organism has a single gene. This gene represents the number of coupons to put into the mailer.

A simple computer program then indicates how many coupons to put into the mailer. The genetic algorithm starts the optimization by creating a population of single-gene organisms at random. It then simulates evolution: modifying the genes, deleting the worst performers, and making copies of the best performers with slight modifications. The simulated organisms reproduce similar copies of themselves into the next generation, and over time the optimal number of coupons is determined. Because this problem is simple, randomly guessing numbers and evaluating the guesses is sufficient to arrive at the solution. When the problem is more complicated, random guessing is not adequate, and a genetic algorithm is more likely to find a good solution.
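The coupon example can be sketched as a tiny single-gene genetic algorithm. Note that the `profit` function below is entirely made up for illustration (a simple curve that rises and then falls as coupons are added); the real cost and revenue figures of the original problem are not given here. The parameter names and defaults are likewise assumptions.

```python
import random

def profit(n_coupons):
    """Hypothetical profit curve: each extra coupon adds revenue, but
    mailing cost and consumer overload make extra coupons worth less."""
    return 8.0 * n_coupons - 0.5 * n_coupons ** 2

def evolve_coupon_count(generations=100, pop_size=20, max_coupons=30, seed=0):
    rng = random.Random(seed)
    # Random initial population; each organism is a single gene (coupon count).
    population = [rng.randint(0, max_coupons) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and delete the worst-performing half.
        population.sort(key=profit, reverse=True)
        survivors = population[: pop_size // 2]
        # Copy each survivor with a slight modification (+/- 1 coupon).
        children = [min(max_coupons, max(0, s + rng.choice((-1, 1))))
                    for s in survivors]
        population = survivors + children
    return max(population, key=profit)

best = evolve_coupon_count()
print(best, profit(best))
```

Because the search space here is so small, simply evaluating every count from 0 to 30 finds the same kind of answer, which matches the remark above that random guessing is enough for a problem this simple.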

Hope you will like the Business Intelligence – Tools & Theory series!


Happy Learning and Sharing !!
