MSBI #48 – BI #4 – What is Big Data? What is Hadoop? What is the relation between Hadoop and Big Data?
After the recent PASS discussions there is a lot of talk in the market about Big Data and Hadoop, so I thought I would give some focus to this topic!!
As Big Data relates to all of the BI technologies, such as SSIS, SSAS and SSRS, I have included this post under the general category BI #.
If you want details regarding SQL Server 2012 or the PASS announcements, go to my following posts:
- MSBI #45 – SQL Server 2012 #1 – Now SQL Server ‘Denali’ is 2012 !
- MSBI #46 -Data Explorer #1 – Introduction to Microsoft Codename “Data Explorer” i.e Once Done Use forever & Share Everywhere !!
- BISQL # 59 – SQL Server 2012 Developer Training Kit Web Installer Preview
In this article we are covering:
- Introduction to Big Data and Hadoop
- What is Hadoop?
- How Hadoop works in Big Data
- Hadoop as Big Data Analysis
- How Hadoop pushes work out to the data
The talk started when Microsoft announced that it is going to support Hadoop, and through Hadoop it will also support Big Data.
Let's understand what these terms mean and why they are useful.
Introduction to Big Data and Hadoop
As we know, the majority of our data is in an unstructured format and comes from many sources. Companies that can extract facts from this huge volume of data can better control processes and costs, can better predict demand and can build better products.
Dealing with big data requires two things:
- Inexpensive, reliable storage
- New tools for analyzing unstructured and structured data.
Apache Hadoop is a powerful open source software platform that addresses both of these problems.
What is Hadoop ?
Hadoop includes a fault-tolerant storage system called the Hadoop Distributed File System (HDFS).
HDFS – Hadoop Distributed File System
Hadoop creates clusters of machines and coordinates work among them.
Clusters can be built with inexpensive computers.
If one machine fails, Hadoop continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster.
HDFS manages storage on the cluster by breaking incoming files into pieces, called “blocks,” and storing each of the blocks redundantly across the pool of servers.
In the common case, HDFS stores three complete copies of each file by copying each piece to three different servers.
How Hadoop works in Big Data
The following diagram shows how HDFS distributes file blocks among servers:
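To make the block idea more concrete, here is a small Python sketch of a file being split into blocks and each block being copied to three servers. This is only a toy model and not Hadoop code: the function names, the tiny block size, the server names `s1` to `s5` and the simple round-robin placement are all my assumptions for illustration. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Break the incoming file (bytes) into fixed-size pieces called blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, servers, replication=3):
    """Pick `replication` different servers for every block.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replication > len(servers):
        raise ValueError("not enough servers for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            servers[(block_id + r) % len(servers)] for r in range(replication)
        ]
    return placement


def all_blocks_available(placement, failed_servers):
    """True if every block still has at least one copy on a healthy server."""
    return all(
        any(server not in failed_servers for server in replicas)
        for replicas in placement.values()
    )


# Hypothetical example: a 1000-byte "file", 256-byte blocks, five servers.
servers = ["s1", "s2", "s3", "s4", "s5"]
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_blocks(len(blocks), servers)

for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)

# Losing one server does not lose any block, because two copies remain.
print(all_blocks_available(placement, failed_servers={"s2"}))
```

With three copies per block, any single server (and in fact any two servers) can fail and every block is still readable from another machine.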
Hadoop as Big Data Analysis
- Hadoop is a different sort of tool.
- Hadoop is aimed at problems that require examination of all the available data.
- Hadoop uses a technique called MapReduce to carry out this exhaustive analysis quickly.
- Hadoop takes advantage of the way HDFS distributes data by pushing the work involved in an analysis out to many different servers.
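The MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is a single-process word count and not the Hadoop API: the function names and the three sample "blocks" are my assumptions, and in a real cluster the map step would run on many servers at the same time.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(blocks):
    """Run map over every block, then shuffle, then reduce each key."""
    mapped = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands for one block stored on a server.
blocks = [
    "big data needs hadoop",
    "hadoop stores big data",
    "hadoop analyzes data",
]
counts = word_count(blocks)
print(counts)
```

Every block is examined exactly once in the map step, which is what lets this kind of exhaustive analysis be split across machines.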
How Hadoop pushes work out to the data
- The following diagram shows how Hadoop pushes work out to the data.
We can see that Hadoop processes all the requests in parallel.
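"Pushing work out to the data" can be pictured as a scheduling decision: for each block, run the task on a server that already stores a copy of that block, so that the data does not have to travel over the network. Below is a rough Python sketch of that idea. The function name, the block-to-server table and the "least busy server wins" rule are my assumptions for illustration; the real Hadoop scheduler is more involved.

```python
def assign_tasks(block_locations):
    """Assign each block's task to the least busy server holding a copy."""
    # Start every known server with zero assigned tasks.
    load = {}
    for servers in block_locations.values():
        for server in servers:
            load.setdefault(server, 0)

    assignment = {}
    for block_id, servers in sorted(block_locations.items()):
        # Only servers that already store the block are candidates.
        # Ties on load are broken by server name to keep the result stable.
        chosen = min(servers, key=lambda s: (load[s], s))
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical layout: which servers hold a copy of each block.
block_locations = {
    0: ["s1", "s2", "s3"],
    1: ["s2", "s3", "s4"],
    2: ["s3", "s4", "s5"],
    3: ["s4", "s5", "s1"],
}
assignment = assign_tasks(block_locations)
print(assignment)
```

In this small example each of the four blocks ends up on a different server, so all four tasks can run in parallel and each one reads its block from local disk.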
So we can conclude that Hadoop's MapReduce and HDFS use simple, robust techniques on inexpensive computer systems to deliver very high data availability and to analyze enormous amounts of information quickly. Hadoop offers enterprises a powerful new tool for managing big data.
Big Data really is big, but the key to Big Data is Hadoop.
Hope this explanation is useful for you!!
Thanks for visiting my blog!!
If you really like reading my blog and understood at least a few things, then please don't forget to subscribe to my blog.