Hadoop: What Is It?
Hadoop has become a buzzword in today’s tech world. If you are like me, you probably know that
Hadoop deals with Big Data analytics, and that is about all you know. That
broad definition doesn’t help anyone understand the true nature of
Hadoop and what it can do.
A better understanding of Hadoop comes from looking at it at a high level while
going deeper than just saying Big Data.
The best way to view Hadoop, in my opinion, is as a way of taking
normal analytics and virtualizing compute power in order to crunch vast amounts
of data. According to hadoop.apache.org, the official site, Hadoop is a “framework
that allows for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering local computation and
storage.” [1]
Or, in other words, it is a scalable data-crunching monster that will
analyze data that was previously too large or too complex for
conventional tools to handle.
Hadoop does this by leveraging massively parallel processing: it takes
normal servers and pools their resources. This model
is much more cost effective and easier to implement than
buying specialized, extremely expensive high-performance servers.
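To make the idea of pooling ordinary machines a bit more concrete, here is a small Python sketch of the split, process, and merge pattern. This is not Hadoop code: Hadoop itself is written in Java and its real API looks different. Threads stand in for servers here, and the names `map_chunk` and `word_count` are my own, made up purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Count the words in one chunk, as a single worker would."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, workers=4):
    """Split the input into chunks, count each in parallel, then merge."""
    if not lines:
        return Counter()
    # Ceiling division so that every line lands in some chunk.
    size = max(1, -(-len(lines) // workers))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Each thread is a stand-in for one ordinary server in the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the partial counts from every worker into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Calling `word_count(["big data big", "data hadoop"], workers=2)` counts each line on its own worker and then adds the partial counts together. The point of the pattern is that adding more workers lets the same logic handle more data without changing the code that does the counting.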
Another great definition that sums up what Hadoop is comes from Mike Olson, CEO of Cloudera: the “Hadoop
platform was designed to solve problems where you have a lot of data — perhaps
a mixture of complex and structured data — and it doesn’t fit nicely into
tables.”[2]
Now that we know more about what Hadoop is, it is important to understand
where it came from. Hadoop’s underlying technology came from Google, when Google
first started to look at Big Data analytics.
When Google started, there were no tools in place to analyze such vast
quantities of data, so Google built its own platform. As demand and need materialized, an open
source project called Nutch came into being.
Hadoop picked up the torch from Nutch and, with massive help from Yahoo,
grew into a platform for enterprise applications.
In the future, I expect Hadoop to become more standardized, with out-of-the-box applications developed for sale so that application architects aren’t needed to build a productive Hadoop deployment for every single company. [3] The amount of unstructured data that Hadoop’s tools can process is colossal, and right now much of that data isn’t being leveraged. Analyzing it could lead to higher conversion rates and boosts in sales. As such, my prediction is that Hadoop, and what it sets out to do, will become an industry standard in the coming decades. Now is the time to get involved and be an early adopter, not the company lagging behind and looking in from the outside.
[1] What is Apache Hadoop? Retrieved February 17, 2014, from http://hadoop.apache.org/
[2] Hadoop: What it is, how it works, and what it can do. January 12, 2011. Retrieved February 16, 2014, from http://strata.oreilly.com/2011/01/what-is-hadoop.html
[3] Hadoop for Dummies. 2012. Retrieved February 16, 2014, from http://public.dhe.ibm.com/common/ssi/ecm/en/dcm03002usen/DCM03002USEN.PDF