Hi Folks,
This post is part of the series Business Intelligence – Tools & Theory.
The topics currently running in this series are listed below:
>>Chapter 1 : Business Intelligence an Introduction
>>Chapter 2 : Business Intelligence Essentials
>>Chapter 3 : Business Intelligence Types
>>Chapter 4 : Architecting the Data<You are here>
This post continues from my previous post in this series.
We are going to cover the following points in this article:
- Metadata
- Total Data Quality Management (TDQM)
Metadata
Metadata is data about data, or information about information: it describes other data and gives information about the content of an object or item. Metadata is structured data that describes, explains and locates an information resource, which makes the resource easier to retrieve, use and manage. The prefix “meta” comes from Greek and denotes something of a higher order or of a more fundamental kind. For example, the metadata of a text document contains information such as the length of the document, the number of words in it, its author and its abstract.
Metadata is used differently in different applications. In some applications it refers to machine-understandable information, whereas in business applications it refers to records that describe electronic resources. Metadata can describe resources at any level of aggregation: a collection, a single resource, or a component part of a larger resource. There are three types of metadata:
· Descriptive metadata: Describes a resource for purposes such as discovery and identification. This type of metadata includes elements like title, author, abstract and keywords.
· Structural metadata: Indicates how compound objects are put together. For example, how pages are ordered to form a chapter in a book.
· Administrative metadata: Provides information that helps to manage the resource. It is useful in business applications for building business intelligence. It includes information such as when and how the resource was created, the file type and other technical details, and who can access it.
Administrative metadata has several subsets. Two of them are sometimes listed as separate metadata types:
· Rights management metadata contains the information which deals with intellectual property rights.
· Preservation metadata contains information required to archive and preserve the resource.
A metadata record consists of a number of pre-defined elements that represent specific attributes of a resource. Each element in the metadata record has one or more values.
The characteristics of a metadata record are:
· Limited number of elements
· The name of each element
· The meaning of each element
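As a rough illustration of such a record, here is a minimal Python sketch. The class names `Element` and `MetadataRecord`, the element names and the sample values are all made up for this post; they do not come from any metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One pre-defined element of a metadata record."""
    name: str                                        # the name of the element
    meaning: str                                     # the meaning of the element
    values: list[str] = field(default_factory=list)  # one or more values


@dataclass
class MetadataRecord:
    """A record built from a limited set of pre-defined elements."""
    elements: dict[str, Element] = field(default_factory=dict)

    def define(self, name: str, meaning: str) -> None:
        self.elements[name] = Element(name, meaning)

    def add_value(self, name: str, value: str) -> None:
        # Only elements that were defined up front may receive values.
        if name not in self.elements:
            raise KeyError(f"{name!r} is not a pre-defined element")
        self.elements[name].values.append(value)


# Hypothetical record for a text document
record = MetadataRecord()
record.define("author", "Person who wrote the document")
record.define("keywords", "Terms that describe the content")
record.add_value("author", "Example Author")
record.add_value("keywords", "metadata")
record.add_value("keywords", "data quality")

for element in record.elements.values():
    print(f"{element.name}: {', '.join(element.values)}")
```

Rejecting values for undefined elements is one simple way to keep the number of elements limited; a real schema would usually also constrain the allowed values.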
Metadata can be created directly in a database and linked to the resource. This approach is growing in popularity because it makes metadata creation an activity that is independent of the creation of the resources themselves.
There are several ways in which metadata can be deployed:
· It can be embedded in a web page by its creator, using “meta” tags in the HTML coding of the page.
· It can be deployed as a separate HTML document linked to the resource it describes.
Metadata records can be created directly in a database, or they can be extracted from another source such as a web page.
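To make the web page case concrete, here is a small sketch that extracts metadata from “meta” tags using Python's standard `html.parser` module. The sample page, the author name and the class name `MetaTagExtractor` are invented for illustration.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags such as <meta charset="..."> that carry no name/content pair.
        if name and content is not None:
            self.metadata[name] = content


# Hypothetical page with embedded metadata
PAGE = """<html><head>
<title>Architecting the Data</title>
<meta name="author" content="Example Author">
<meta name="keywords" content="metadata, data quality">
</head><body></body></html>"""

extractor = MetaTagExtractor()
extractor.feed(PAGE)
print(extractor.metadata)
```

The extracted dictionary could then be stored as a metadata record in a database and linked back to the page it came from.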
When the syntax of the metadata schema is not strictly followed, the data in the record is unusable unless the encoding scheme understands the semantics of the schema. The encoding scheme is what allows computer programs to process the metadata.
Some important encoding schemes are:
· HTML (Hyper-Text Markup Language)
· SGML (Standard Generalized Markup Language)
· XML (Extensible Markup Language)
· RDF (Resource Description Framework)
· MARC (Machine Readable Cataloging)
· MIME (Multipurpose Internet Mail Extensions)
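As an example of one of these schemes, the sketch below encodes a flat metadata record as XML and decodes it again with Python's standard `xml.etree.ElementTree` module. The tag names (`record`, `title`, `author`) are assumptions made for this example, not part of any published schema.

```python
import xml.etree.ElementTree as ET


def to_xml(record: dict[str, str]) -> str:
    """Encode a flat metadata record as an XML string."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict[str, str]:
    """Decode the XML produced by to_xml back into a dictionary."""
    return {child.tag: child.text or "" for child in ET.fromstring(text)}


# Hypothetical record
record = {"title": "Architecting the Data", "author": "Example Author"}
encoded = to_xml(record)
print(encoded)
print(from_xml(encoded) == record)
```

Because both sides agree on the encoding, a program that has never seen the original record can still read every element from the XML.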
Total Data Quality Management (TDQM)
Data is information stored on a computer. The quality of data can substantially influence three strategic vectors of an organization: customer interaction, asset configuration, and the power of knowledge. Data is therefore an important resource for the competitiveness of any organization. Incorrect data during customer interaction can lead clients to perceive significant quality problems, while high-quality data helps the organization serve its clients better. High-quality data also improves inventory management, production planning and resource planning, which are important for the asset configuration of an organization. Poor-quality data causes errors to accumulate along the value chain, whereas high-quality data can reduce the cost of human coordination. With high-quality data, business analysts can gain proper insight into production and service processes and propose ways of improving them.
There are two relevant views on data quality that should be emphasized:
· Product-based view: Defines quality as the difference between the desired quantity of some ingredient or attribute in a product and the quantity actually present.
· Production-oriented view: Defines data quality as the conformance of a process to requirements.
A total data quality management program integrates both views and implements the idea of fitness for use.
The table below describes the quality dimensions of the total data quality management process.
Quality Dimensions of TDQM
Data quality category | Data quality dimensions
Intrinsic data quality | Believability, accuracy, objectivity, reputation
Contextual data quality | Value-added, relevancy, timeliness, completeness, appropriate amount of data
Representational data quality | Interpretability, representation consistency, concise representation
Accessibility data quality | Accessibility, access security
Total Data Quality Management is a program implemented to achieve a high level of data quality in an organization. Every organization, irrespective of its goals and operating environment, has to develop its own specific total data quality management program. Regardless of these differences, a successful implementation of practical and efficient total data quality management consists of an iterative process of defining, measuring, analyzing, and improving.
Within the framework of total data quality management, an organization must follow two rules. It should:
· Clearly define what quality means in general and data quality in particular.
· Develop a set of metrics that measure the vital dimensions of data quality for the organization and that can be linked to the organization's general goals and objectives.
To implement total data quality management, quality must first be defined. Each company or industry should choose a definition that is appropriate to its goals and internal culture. The company's information should be treated as a product that is delivered to a customer. To understand customer requirements, an organization should follow the customer hierarchy of needs model, which is shown in the figure below.
Customer Hierarchy of Need Model
This model implies that total data quality is a necessary condition that lays the foundation for any customer-supplier relationship. Moving up the hierarchy towards a committed relationship is possible only on a strong foundation. Understanding the requirements of customers is the critical path towards the goal of creating committed relationships with customers and suppliers.
Total Data Quality Management (TDQM) Cycle
Total data quality management is based on a version of the Deming cycle, which has four main steps: Plan, Do, Check and Act. Applied to the quality of data, the cycle has four steps: Defining, Measuring, Analyzing, and Improving.
Define phase of TDQM
The define phase of TDQM has three steps. The first step determines the characteristics of the data products. These characteristics are described at two levels:
the highest level describes the characteristics of the data product as a whole, and the lowest level describes the attributes of each product individually.
The figure below illustrates the difference between data products and data attributes.
Figure: Paths of Data Product and Data Attribute
It is useful to focus on the data products or attributes whose quality problems have the highest impact on the organization. This selection should be made in step one of the define phase.
Step two of the define phase determines the requirements for the data products.
It identifies the important quality dimensions for the data products or attributes by establishing the following for each dimension:
· The perceived level of quality in the dimension.
· The expected level of quality in the dimension.
· The importance of the dimension.
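One possible way to use these three values is a simple gap analysis, sketched below. The post does not say how the values should be combined, so multiplying the gap by the importance is my own assumption, and the dimensions and scores are hypothetical survey results on a 1 to 10 scale.

```python
from dataclasses import dataclass


@dataclass
class DimensionAssessment:
    dimension: str
    perceived: float   # perceived level of quality in the dimension
    expected: float    # expected level of quality in the dimension
    importance: float  # importance of the dimension

    @property
    def gap(self) -> float:
        """Shortfall of perceived quality against expected quality (never negative)."""
        return max(self.expected - self.perceived, 0.0)

    @property
    def priority(self) -> float:
        """Assumed scoring rule: a large gap on an important dimension ranks first."""
        return self.gap * self.importance


def rank(assessments: list[DimensionAssessment]) -> list[DimensionAssessment]:
    """Order the dimensions from highest to lowest priority."""
    return sorted(assessments, key=lambda a: a.priority, reverse=True)


# Hypothetical survey results
survey = [
    DimensionAssessment("timeliness", perceived=7, expected=8, importance=6),
    DimensionAssessment("accuracy", perceived=6, expected=9, importance=10),
    DimensionAssessment("completeness", perceived=5, expected=9, importance=7),
]

for item in rank(survey):
    print(f"{item.dimension}: gap={item.gap}, priority={item.priority}")
```

With these numbers accuracy comes out first, because it combines a sizeable gap with the highest importance.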
Step three of the define phase determines the data manufacturing process. This process covers the flow of data from the supplier to the user of the data, including the processing activities and quality checks along the way. Knowledge of the data manufacturing process gives a better understanding of the importance of the quality dimensions. A clearly defined process also helps data quality management by codifying processes, making them reliable and independent of any one person.
Measurement phase of TDQM
This phase of data quality management measures quality along the dimensions identified in the define phase.
It has two steps:
1. Step one of the measurement phase selects the proper metrics. To determine a metric for a data quality dimension, the team should keep in mind the underlying business rules and laws that make the dimension important. There are three functional forms for measuring a dimension:
· Simple ratio: The ratio of the desired outcomes of a selected variable to the total outcomes of that variable.
· Min or max operation: Handles dimensions that require the aggregation of multiple data quality variables.
· Weighted average: Appropriate when the organization has determined the importance of each variable to the overall evaluation of a dimension.
2. Step two of the measurement phase measures the data and presents the results.
Measurements are carried out using the chosen metrics. For example, if there are 20000 products in the database, a sample of about 400 products is needed to reach a 95% confidence level with a 5% margin of error. Several chart types are available to display the results. To identify the dimensions that cause the most problems, the results can be displayed on a Pareto diagram. In this chart, the x-axis (horizontal axis) shows the different dimensions, while the y-axis (vertical axis) shows the errors in each dimension as a percentage of the total number of errors. A Pareto chart provides an easy way to determine the dimensions that cause most of the problems.
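The two measurement steps can be sketched in a few lines of Python. Everything here is illustrative: the function names are mine, the error counts are hypothetical, and the sample size uses Cochran's formula with a finite population correction and p = 0.5, which is an assumption because the post does not say which formula was used. That formula gives 377 for 20000 products, which is in line with the rounded figure of about 400.

```python
import math
from statistics import NormalDist


def simple_ratio(desired: int, total: int) -> float:
    """Share of desired outcomes among all outcomes of a variable."""
    return desired / total if total else 0.0


def min_aggregate(scores: list[float]) -> float:
    """Conservative aggregate: the dimension is only as good as its weakest variable."""
    return min(scores)


def weighted_average(scores: list[float], weights: list[float]) -> float:
    """Aggregate in which each variable counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def sample_size(population: int, confidence: float = 0.95, margin: float = 0.05) -> int:
    """Cochran's formula with finite population correction, assuming p = 0.5."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * 0.25 / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def pareto_rows(errors: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (dimension, % of all errors, cumulative %) sorted by error count."""
    total = sum(errors.values())
    rows, running = [], 0
    for name, count in sorted(errors.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical error counts found in the sample, per dimension
errors = {"timeliness": 15, "accuracy": 50, "consistency": 5, "completeness": 30}

print(sample_size(20000))
for name, share, cumulative in pareto_rows(errors):
    print(f"{name:<13}{share:>6.1f}%{cumulative:>7.1f}%")
```

The rows returned by `pareto_rows` are exactly what a Pareto chart plots: bars for each dimension's share of the errors, with the cumulative percentage showing how few dimensions account for most of the problems.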
Analysis phase of TDQM
This phase of total data quality management analyzes the problems in the different dimensions and finds their root causes. Three methods are used to analyze the problems:
· Cause effect diagram
· Interrelationship diagram
· Current reality tree
Among the three methods, the cause effect diagram is the most common and the easiest way to identify the root cause of a problem. The main goal of this step is to answer questions such as what the problem is, when it occurred, why it happened, and how it affects the overall goals of the organization.
Improvement phase of TDQM
This phase covers the steps taken by the organization's total data quality management team to improve quality and mitigate the existing problems in a dimension. It is implemented in three steps:
· Solution generation: The purpose of this step is to use the information manufacturing analysis matrix to generate the solutions required to reduce the problems.
· Solution selection: This step determines the impact of each solution and the cost of designing and implementing it. It also considers the resources required to carry out the selected solution. Solutions can be prioritized once the importance of each evaluation criterion is known.
· Action plan: This step determines the actions to be taken to implement the selected solution. An organization should have a separate team working on the action plan to ensure that the actions are executed. The action plan has a project tracker that keeps track of all actions and their status.
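Once the importance of each evaluation criterion is known, prioritizing solutions can be as simple as a weighted score. The sketch below is only one way of doing it: the criteria follow the text (impact, cost, resources), but the weights, the candidate solutions and their 1 to 10 ratings are hypothetical, and every rating is assumed to be scaled so that higher is better (a cheap solution gets a high cost rating).

```python
def score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of a solution's ratings over the evaluation criteria."""
    return sum(weights[c] * ratings[c] for c in weights) / sum(weights.values())


def prioritize(solutions: dict[str, dict[str, float]],
               weights: dict[str, float]) -> list[str]:
    """Return solution names ordered from highest to lowest weighted score."""
    return sorted(solutions, key=lambda name: score(solutions[name], weights), reverse=True)


# Hypothetical importance of each evaluation criterion
weights = {"impact": 5, "cost": 3, "resources": 2}

# Hypothetical candidate solutions, rated 1 to 10 (higher is better)
solutions = {
    "manual review of records": {"impact": 7, "cost": 3, "resources": 2},
    "validation at data entry": {"impact": 8, "cost": 6, "resources": 7},
    "nightly cleansing job": {"impact": 6, "cost": 8, "resources": 8},
}

for name in prioritize(solutions, weights):
    print(f"{score(solutions[name], weights):.1f}  {name}")
```

The ordered list can feed straight into the action plan, with the top entries becoming the first items in the project tracker.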
TDQM Team
The data quality management team should be well trained in data quality assessment and management skills.
This team should consist of:
· Team leader: A senior executive member of the team who is able to implement chosen solutions.
· Team engineer: A person who has knowledge of the methods and techniques used to implement the chosen solution.
· Data suppliers: The people who create or collect data for the data product.
· Data manufacturers: The people who design, develop and maintain the data and systems infrastructure for the data product.
· Data consumers: The people who use the data product in their work.
· Data product managers: The people who are responsible for managing the entire data product production process throughout the data product life cycle.
Hope you like the Business Intelligence – Tools & Theory series!
Happy Learning and Sharing !!