Tuesday, February 18, 2014

What's the Big Effing Deal about Big Data?

Many of us working in technology have heard the rumblings and the high-and-mighty boasts about big data: how it can transform your business, create new opportunities, increase efficiency and usher in a new epoch of world history that will bring about world peace. (All right, I made up the last one, but you get the point: big data has been prescribed as the cure for many problems. Is it really the answer?) So what is Big Data, what are its challenges, and how can it ultimately help improve your business's bottom line?

What is Big Data, some of you may wonder?

Big data is “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.”[1]  Essentially, improvements in data storage devices, coupled with their falling prices and the explosion of both structured and unstructured data during the Web 2.0 movement, have created new streams of data that can now be analyzed for actionable insights into areas such as customer behavior, allowing businesses to better market to and target their core customers.  Unstructured data has provided the biggest surge in the supply of data and is basically “data [that] is heterogeneous and variable in nature and comes in many formats, including text, document, image, video and more.”[2]  The drive to make sense of all this unstructured data with new tools has completely changed how we store data.

Big data has proven challenging for organizations that want to implement a system to capture and make sense of it all, for several reasons:

1) The three V’s of data: volume, velocity and variety[3].  Although this coinage has been adopted throughout Big Data discussions on the Web, it was first described by Doug Laney in a 2001 paper on how organizations must build information management systems to cope with these three aspects of data.  Volume: the increase in the depth and breadth of the data that is collected.  Velocity: the increasing speed at which data is generated and therefore must be stored.  Variety: the mix of structured data and the rapidly growing amount of unstructured data that does not fit neatly into a typical relational database.

2) How to process the data into information.  Much of the data being collected is now unstructured and does not fit neatly into a relational database, where each field's length and data type must be defined in advance.  NoSQL databases, pioneered by some of the biggest names in technology such as Google and Facebook, have emerged as the preferred way, for now, to store unstructured data.  Although no official standard definition of NoSQL exists, a NoSQL database generally has the following characteristics[4]:

    i.    Not using the relational model
    ii.   Open source
    iii.  Designed to run on large clusters
    iv.   Based on the needs of 21st century web properties
    v.    No schema

This different type of database requires professionals with a different skill set from those who work with standard relational databases, and the cost of hiring or training professionals to implement and maintain a NoSQL database must be considered when deciding whether to pursue a big data strategy for the business.
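To make the "no schema" characteristic a little more concrete, here is a minimal Python sketch contrasting the two models. It uses Python's built-in sqlite3 module for the relational side and a plain list of JSON strings as a toy stand-in for a document store; it is not how any particular NoSQL product works, and the table, field names and helper functions (insert_document, find) are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Relational model: the columns and their types are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
    (1, "Ada", "ada@example.com"),
)

# A record with a field the table does not define is rejected by the database.
try:
    conn.execute(
        "INSERT INTO customers (id, name, twitter) VALUES (?, ?, ?)",
        (2, "Grace", "@grace"),
    )
    relational_accepted_extra_field = True
except sqlite3.OperationalError:  # "table customers has no column named twitter"
    relational_accepted_extra_field = False

# Schemaless "document" model (toy stand-in, hypothetical): each record is a
# JSON document and may carry whatever fields it happens to have.
documents = []


def insert_document(doc):
    """Hypothetical helper: serialize the document and store it; no schema is checked."""
    documents.append(json.dumps(doc))


def find(field):
    """Hypothetical helper: return every stored document that contains `field`."""
    return [d for d in map(json.loads, documents) if field in d]


insert_document({"id": 1, "name": "Ada", "email": "ada@example.com"})
insert_document({"id": 2, "name": "Grace", "twitter": "@grace"})
insert_document({"id": 3, "review": "Great service!", "tags": ["support", "fast"]})

print(relational_accepted_extra_field)   # False
print(len(documents))                    # 3
print([d["id"] for d in find("twitter")])  # [2]
```

The relational insert fails because the table has no twitter column, while the document collection stores records of different shapes without complaint. That flexibility is the appeal for unstructured data; the trade-off is that the application, rather than the database, must cope with records that lack a given field.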

3) What to do with the data.  This is the last and, from my perspective, the most important challenge that must be addressed.

As the cartoon above illustrates, the ultimate goal of Big Data is to drive actionable insights that were not available before and to help remove things like bias from decision making.  Most of the business value of Big Data comes from the analysis and knowledge that is created, not from the stored data itself.  It is not enough to store the data your business generates efficiently and effectively; more importantly, the key decision makers in the business must be ready to listen to what the analysts tell them and prepared to act on that knowledge.

Big Data can truly be a game changer for your business.  I believe it can benefit all industries, but some sectors, such as computers and electronics, finance, insurance and government, stand to see greater benefits.[5]  I think we have seen only the tip of the iceberg of the productivity gains to be realized by using Big Data effectively and acting on the insights gained from it.  This is where the right-brain people will get excited: what new creative insights can we gain from the data, and where can we creatively apply Big Data analytics to solve problems?  I think we are limited only by our own imagination, and that is what excites me so much about this field: the realm of possibilities is endless.