MSBI # 48 – BI # 4 – What is Big Data? What is Hadoop? What is the relation between Hadoop and Big Data?

Hi Friends,

Based on the recent PASS discussions, there is a lot of talk in the market about Big Data and Hadoop, so I thought I would give some focus to this topic!

Because big data is related to all of the technologies, such as SSIS, SSAS and SSRS, I have included this post under the general category BI #.

If you want details about SQL Server 2012 or the PASS announcements, see my posts on those topics.

Let's understand what these terms mean and why they are useful.

Introduction to Big Data and Hadoop

As we know, the majority of our data is in unstructured format and comes from many sources. Companies that can extract facts from this huge volume of data can better control processes and costs, can better predict demand, and can build better products.

Dealing with big data requires two things:

  1. Inexpensive, reliable storage
  2. New tools for analyzing unstructured and structured data.

Apache Hadoop is a powerful open source software platform that addresses both of these problems.

What is Hadoop?

Hadoop includes a fault-tolerant storage system called the Hadoop Distributed File System (HDFS).

HDFS – Hadoop Distributed File System

Hadoop creates clusters of machines and coordinates work among them.
Clusters can be built with inexpensive computers.
If one machine fails, Hadoop continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster.
HDFS manages storage on the cluster by breaking incoming files into pieces, called "blocks," and storing each of the blocks redundantly across the pool of servers.
In the common case, HDFS stores three complete copies of each file by copying each block to three different servers.
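To make the block and replica idea concrete, here is a minimal Python sketch. It is only an illustration and not how HDFS is actually implemented: the function names, the tiny block size and the simple round-robin placement are all my own assumptions (real HDFS uses much larger blocks and its own placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break an incoming file into fixed-size pieces ("blocks")."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` different servers.

    Round-robin placement is an assumption made for this sketch only.
    """
    if len(servers) < replication:
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            servers[(block_id + r) % len(servers)] for r in range(replication)
        ]
    return placement


def available_blocks(placement: dict[int, list[str]],
                     failed: set[str]) -> set[int]:
    """Return the blocks that still have at least one copy on a live server."""
    return {
        block_id
        for block_id, hosts in placement.items()
        if any(host not in failed for host in hosts)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    servers = ["s1", "s2", "s3", "s4", "s5"]
    placement = place_replicas(len(blocks), servers)
    print(placement)
    # Even with one server down, every block still has a surviving copy.
    print(available_blocks(placement, failed={"s3"}))
```

Because every block lives on three different servers, losing any single server still leaves two copies of each of its blocks, which is why the cluster can keep working without losing data.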

How Hadoop works in Big Data

The following diagram shows how HDFS distributes file blocks among servers:

image

Hadoop for Big Data Analysis

  • Hadoop is a different sort of tool.
  • Hadoop is aimed at problems that require examination of all the available data.
  • Hadoop uses a technique called MapReduce to carry out this exhaustive analysis quickly.
  • Hadoop takes advantage of the way HDFS distributes data by pushing the work involved in an analysis out to many different servers.
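The MapReduce technique mentioned in the points above can be sketched in a few lines of plain Python. This is only a single-machine illustration of the idea using the classic word count example. It does not use the Hadoop API, and the function names are my own assumptions.

```python
from collections import defaultdict


def map_phase(block: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(blocks: list[str]) -> dict[str, int]:
    """Run map over every block, then shuffle, then reduce each group."""
    mapped = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big hadoop data"]))
```

The map step only ever looks at one block at a time, which is what allows a real cluster to run it on many servers at once, each working on the blocks it already stores.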

How Hadoop pushes work out to the data

The following diagram shows how Hadoop pushes work out to the data:

image

In the diagram we can see how Hadoop processes all the requests in parallel.
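The idea of pushing work out to the data can be sketched as follows. Please treat it as a rough illustration under my own assumptions: the function names are hypothetical, the "least busy server" rule is a simplification rather than Hadoop's real scheduler, and local threads stand in for the servers of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def schedule_tasks(placement: dict[int, list[str]]) -> dict[int, str]:
    """Give each block's task to a server that already stores that block.

    Among the servers holding a copy, pick the one with the fewest tasks
    so far (a simplification assumed for this sketch).
    """
    load: dict[str, int] = {}
    assignment: dict[int, str] = {}
    for block_id in sorted(placement):
        chosen = min(placement[block_id], key=lambda host: load.get(host, 0))
        assignment[block_id] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


def run_in_parallel(blocks: dict[int, str], assignment: dict[int, str],
                    task: Callable[[str], int]) -> dict[int, int]:
    """Run one task per block at the same time; threads stand in for servers."""
    with ThreadPoolExecutor() as pool:
        futures = {b: pool.submit(task, blocks[b]) for b in assignment}
        return {b: future.result() for b, future in futures.items()}


if __name__ == "__main__":
    # Which servers hold a copy of each block (example data only).
    placement = {0: ["s1", "s2", "s3"], 1: ["s2", "s3", "s4"], 2: ["s3", "s4", "s5"]}
    blocks = {0: "big data", 1: "hadoop", 2: "map reduce hdfs"}
    assignment = schedule_tasks(placement)
    print(assignment)
    print(run_in_parallel(blocks, assignment, lambda text: len(text.split())))
```

The point of the design is that the task travels to a server that already has the block, so the large data does not have to be copied over the network before the analysis can start.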

So we can conclude that Hadoop’s MapReduce and HDFS use simple, robust techniques on inexpensive computer systems to deliver very high data availability and to analyze enormous amounts of information quickly. Hadoop offers enterprises a powerful new tool for managing big data.

Big Data really is big data, but the key to Big Data is Hadoop.

I hope this explanation is useful for you!

Thanks for visiting my blog!

If you really like reading my blog and understood at least a few things, then please don’t forget to subscribe to my blog.

