A simple Google search on Big Data will list a bunch of articles that one can read to quickly learn what exactly Big Data means. This very Digital Analytics blog has several comprehensive posts on it [1] [2] [3]. We all understand (at the least) the definition of Big Data and the different tools (NoSQL database systems, Hadoop, etc.) used to mine it. But where does the use of Big Data start and end? What is the purpose of collecting such huge amounts of data? What end goal are we looking at? My intention in this blog post is to look beyond the buzz around Big Data at its applications in analysis. I hope it offers everyone some food for thought.
I am a big fan of TED talks. As weird as it may sound, I watch them to kill time while learning something at the same time. There are, of course, thousands of talks, but I would like to summarize a few that truly inspired me to think about the different analyses that can be performed using Big Data.
The first is Deb Roy's talk on the birth of a word. I watched this video two years ago, and for some reason it has stayed with me. At first it sounds like a topic for Linguistics and Human Language Development, but applying the model to social media completely changes the perspective.
When people start communicating about an event via different social streams, the new social structures that form can be mapped from data: Big Data. Aggregating data from different channels across the web, such as photos, videos, audio recordings, tweets, blogs, podcasts and emails, is a challenge, but a solvable one. Deb and his colleagues have taken this research further and applied it to Social TV programs and commercials. A social conversation that starts spreading across the web is tied to its stimulus, a TV show. Their company, Bluefin Labs [4], focuses on adding context to social commentary and has given birth to a whole new world of Social TV Analytics. When a human being browses through such real-time social commentary, the brain applies context easily. But since we rely on machines for Big Data analysis, the biggest challenge Bluefin Labs has solved is teaching machines to link language to context through language grounding techniques [5] [6]. Without this context, mining Big Data and performing analysis over it would be useless. This is exactly what we have learnt in our Digital Analytics class: Context is Queen.
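Bluefin Labs' actual grounding methods are not described in this post, so the following is only a naive stand-in for the idea of tying a conversation to its stimulus. It is a minimal Python sketch, assuming each post carries a timestamp and each TV airing has a known broadcast window plus a hand-picked keyword set: a post is attributed to a show when it was written while the show was on air and mentions one of the show's keywords. The `Airing`, `Post` and `attribute` names and the show titles are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Airing:
    """A hypothetical TV broadcast window with hand-picked keywords."""
    show: str
    start: datetime
    end: datetime
    keywords: set[str]


@dataclass
class Post:
    """A hypothetical social media post with its timestamp."""
    text: str
    posted_at: datetime


def attribute(post: Post, airings: list[Airing]) -> list[str]:
    """Naive attribution: link a post to every show that was on air
    when it was posted AND whose keywords appear in its text."""
    # Lowercase each word and strip common punctuation and hashtag/mention symbols.
    words = {w.strip(".,!?#@").lower() for w in post.text.split()}
    return [
        a.show
        for a in airings
        if a.start <= post.posted_at <= a.end and words & a.keywords
    ]


# Usage with made-up shows and posts (illustrative only, not real data).
airings = [
    Airing("Hypothetical Drama", datetime(2013, 5, 1, 20, 0),
           datetime(2013, 5, 1, 21, 0), {"drama", "finale"}),
    Airing("Hypothetical Quiz", datetime(2013, 5, 1, 20, 30),
           datetime(2013, 5, 1, 21, 30), {"quiz", "host"}),
]
post = Post("That finale was unreal! #drama", datetime(2013, 5, 1, 20, 45))
print(attribute(post, airings))
```

Both airings overlap at 20:45, so the time window alone cannot tell them apart; only the keyword check does. Real social commentary is far messier than a keyword list can capture, which is exactly why the grounding problem described above is hard.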
The next talk is by Jean-Baptiste Michel and Erez Lieberman Aiden. It is about digitizing all the books published through time and transforming that data into an understanding of our language, history and culture.
Again, all this data is Big Data. The transformation process is called extracting information, and adding context to it is called Analytics. Google Labs' Ngram Viewer is a treat. Try this out: type in the word happy and observe the trend. Why would people become less happy over time? This trend requires some context. Then try typing in synonyms of happy and observe their trends. See what happened after 1980 for one of the synonyms, gay. We can easily add context here and give an explanation. Also try the American slang words dorky and freak. What I would like to think about is how this concept could be used in the future. If we had charts that added demographic segments to synonym comparisons, we might be able to see word origins, depth of usage and more. One point of debate, though, would be the degree of difference, if any, between spoken and written language, which could give us a measure of how accurately written records reflect the history of a language.
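To make the "observe the trend" step a little more concrete: the Ngram Viewer plots, for each year, how often a word occurs as a share of all words of that length in that year's books. A minimal Python sketch of that per-year relative frequency, assuming a tiny made-up corpus of (year, text) pairs and plain whitespace tokenization (the real viewer's tokenization, corpus and smoothing differ), could look like this. The `yearly_frequency` function and the sample sentences are hypothetical.

```python
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: occurrences of `word` / total tokens that year}.

    `corpus` is an iterable of (year, text) pairs; tokens are simply
    lowercased, whitespace-separated words (a deliberate simplification).
    """
    totals = Counter()  # total tokens per year
    hits = Counter()    # occurrences of `word` per year
    target = word.lower()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up toy corpus, for illustration only (not real Google Books data).
corpus = [
    (1900, "a happy day and a gay crowd"),
    (1900, "happy people"),
    (2000, "a gay rights march"),
    (2000, "people seem happy"),
]

# Compare synonyms side by side, as the Ngram Viewer does with several terms.
for w in ("happy", "gay"):
    print(w, yearly_frequency(corpus, w))
```

Dividing by the yearly total matters because far more text was published in later years; raw counts would make almost every word look like it was rising. Adding demographic segments, as suggested above, would amount to keying the counters on (year, segment) instead of year alone.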
The last talk is by a data scientist named Dan Berkenstock. His perspective on Big Data centers on collecting satellite imagery and the analytics that can be applied to it.
We enjoy the convenience of sitting at home and using Google Maps to see a street view of a country far away from us. But have you realized that these images from Google are not up to date? Try looking at our business building in Street View: it is still under construction. Dan's effort now is to collect satellite images of the earth in real time. Moreover, the little bits of real-time information tied to the image you are looking at are phenomenal. He says that their team wants to use satellite imagery to "apply scalable analytics to find insights".
By now I hope you have realized that, as potential job seekers, we need to understand that collecting data is only the beginning. Big Data is pointless without context, analysis and the insights drawn from them. Those are our end goals. Remember, Big Data is just large amounts of data; what we should care about is which analyses we can apply to it. There is unlimited potential for Big Data Analytics in every field you can think of today. From all the inspirational talks above, I would like to say: you are dealing with Big Data, so think Big!
[1] http://dauofu.blogspot.com/2013/05/big-data-made-easy.html
[2] http://dauofu.blogspot.com/2013/05/big-data-analytics-cutting-through.html
[3] http://dauofu.blogspot.com/2013/05/demystifying-big-data.html
[4] https://bluefinlabs.com/
[5] http://en.wikipedia.org/wiki/Symbol_grounding
[6] http://www.springer.com/computer/ai/book/978-1-4614-3063-6