There are many blogs, podcasts, articles, and even books about data analytics for Big Data, sometimes referred to as Big Data Analytics. I wanted to write more on this subject because Big Data Analytics is not a buzzword that shines and disappears in a year or two. I believe it is something we will be hearing about for years to come. There is no question in my mind that it will be a game changer for data and data analytics in every field and industry. Big Data Analytics is no longer a specialized solution for cutting-edge technology companies. It is evolving into a viable, cost-effective way to store and analyze large volumes of data across almost all industries.
What is Web Analytics?
The Digital Analytics Association defines web analytics as the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.[1] You can read more about Web Analytics in my previous blog post, Web Analytics and Data Warehouse.
What is Big Data?
Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, analysis and visualization.[2]
Some examples of Big Data include medical records, photography archives, video archives, large-scale e-commerce, internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological and other complex scientific research, web logs, RFID, military surveillance and other similar data.
Big Data technologies like Apache Hadoop, an open-source software framework, provide large-scale, distributed data storage and processing across clusters of hundreds or even thousands of networked computers. The objective is to provide a scalable solution for Big Data while minimizing processing time.
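To make the idea of distributed processing more concrete, here is a minimal sketch of the map, shuffle and reduce steps that Hadoop's MapReduce model is built around. It runs on a single machine in plain Python and does not use Hadoop itself; the log format ("path status") and the function names are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (status_code, 1) pair for one web-log line.

    The 'path status' line format is a hypothetical example.
    """
    _path, status = line.split()
    yield status, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


logs = ["/home 200", "/cart 200", "/missing 404", "/home 200"]
counts = run_job(logs)
print(counts)
```

On a real cluster, the map and reduce steps would run in parallel on many machines, each working on its own slice of the data, which is where the scalability comes from.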
In 2010 alone, our world produced one zettabyte (1,000,000,000,000 gigabytes) of data, coming from five billion mobile phones, 30 billion posts shared on Facebook per month, and millions of networked sensors connected to mobile phones, energy meters, automobiles, shipping containers, retail packaging and more.[3][4]
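The gigabyte figure in parentheses can be checked with a couple of lines of arithmetic. This small sketch assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant names are my own.

```python
# Decimal (SI) unit sizes, in bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21

# How many gigabytes fit in one zettabyte.
gigabytes_per_zettabyte = BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE
print(f"{gigabytes_per_zettabyte:,}")  # prints 1,000,000,000,000
```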
Companies have faced several challenges in implementing Big Data and Big Data Analysis projects. Some of these are:
Photo Courtesy: [5]
· For many years, companies faced upfront infrastructure costs for Big Data and Big Data Analysis projects. Also, companies were not able to respond to scale-out requirements because of infrastructure limits. This problem has been solved by Big Data cloud services like Amazon’s Elastic MapReduce or Microsoft’s Hadoop distribution for Windows Azure, which enable companies to lease infrastructure for their Big Data projects.
Photo Courtesy: [6]
· For most companies, integrating Big Data with other components of the Data Warehouse environment is critical. Big Data does not replace the Data Warehouse. Hadoop is built for fairly simple workloads, such as sorting, aggregating, converting, and filtering. It is not intended to manage schema structure and database security. Therefore, database management is still important for companies. The challenge has been how to integrate the two. IBM, Informatica, Microsoft, Oracle and SAP have released tools to interface Hadoop with relational database management systems, which addresses this problem.
Photo Courtesy: [7]
· When it comes to Big Data Analysis, getting user-friendly tools has been a challenge. Even though there are tools like Apache Pig and Apache Hive, which provide higher-level scripting and SQL-like frameworks for advanced data analysts to run queries directly against data stored in Hadoop, these tools require technical expertise. Recently, Microsoft announced the Hive ODBC driver and the Hive add-in for Excel, which will allow end users to access data stored in Hadoop through Excel, Power Pivot and Analysis Services. Also, Tableau has released a tool that allows users to drag and drop Hadoop reports. These tools will allow end users to work on Big Data Analysis much more easily.
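To give a feel for the SQL-like queries mentioned in the last point, here is a small sketch of the kind of aggregate an analyst might write. It uses Python's built-in sqlite3 module purely as a stand-in, since running Hive needs a Hadoop cluster and Hive's own dialect differs in places. The page_views table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("/home", "a"), ("/home", "b"), ("/cart", "a"), ("/home", "a")],
)

# Views and distinct visitors per page, busiest page first.
query = """
    SELECT page, COUNT(*) AS views, COUNT(DISTINCT visitor) AS visitors
    FROM page_views
    GROUP BY page
    ORDER BY views DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

A query like this is approachable for someone who knows SQL, but it still has to be written by hand, which is why the Excel and drag-and-drop tools matter for end users.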
Since the above challenges have been resolved, we will see dramatic growth in Big Data Analysis in the coming years. Companies likely to get the most out of Big Data analytics include:[3]
Supply chain, logistics, and manufacturing | RFID sensors, handheld scanners, and on-board GPS vehicle and shipment tracking produce vast quantities of information, offering significant insight into route optimization, cost savings and operational efficiency.
Financial services | Financial markets generate immense quantities of stock market and banking transaction data that can help companies maximize trading opportunities or identify potentially fraudulent charges, among various uses.
Energy and utilities | Smart instruments and electronic sensors attached to machinery, oil pipelines and equipment generate streams of incoming data that must be stored and analyzed to uncover and fix potential problems.
Media and telecommunications | Data from streaming media, smartphones, tablets, browsing behavior and text messages is captured at ever-increasing rates all over the world, representing a potential treasure trove of knowledge about user behavior and tastes.
Health care and life sciences | Electronic medical records systems are some of the most data-intensive systems in the world, and making sense of all this data to provide patient treatment options and analyze data for clinical studies can have a dramatic effect.
Retail and consumer products | Retailers can analyze vast quantities of sales transaction data to uncover patterns in user behavior and monitor brand awareness.
References:
[2] http://www.zdnet.com/blog/virtualization/what-is-big-data/1708
[3] http://allthingsd.com/20120110/big-data-analytics-trends-to-watch-for-in-2012/
[4] http://www.idc.com/
[5] http://claritics.com/
[6] http://www.publicpolicy.telefonica.com/blogs/blog/2012/11/09/big-data-under-analysis-at-the-oecd/
[7] http://www.userfriendlycc.com/rates.html
[9] http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA3#v=onepage&q&f=false
[10] http://mike2.openmethodology.org/wiki/Big_Data_Definition
Nice information, good research!
I enjoyed your breakdown, it was a helpful and efficient portrayal of key aspects of big data. It seems like it has the potential to dramatically impact a variety of large industries.