Monday, May 6, 2013

Big Data Made Easy

In the last few years, the internet and the media have been buzzing about big data, and if you haven’t noticed it yet, you will now. For the purpose of this post, let’s assume you are going to have lunch with a few top executives at your company to talk about the last quarter. During lunch, the topic of big data comes up. The executives know it’s gaining popularity and becoming a very powerful resource for companies, but, like most people who aren’t tech savvy, that is about the extent of their knowledge. That is where this post will help. After reading it, you will be able to have an intelligent conversation with the executives, or anyone else, about what big data is and how it can be a valuable resource to any industry.

What is Big Data?
Big data is a collection of data so large and complex that it becomes nearly impossible to process using traditional database management tools and data processing applications. Every step becomes a challenge with traditional methods: capturing, curating, storing, searching, sharing, and analyzing such large data sets. Large data sets also make it possible to find more accurate correlations. These can help spot business and financial trends, determine the quality of research, prevent diseases, link legal citations, combat crime, and determine real-time traffic conditions, to name just a few. Recently, the largest data sets that could be processed in a reasonable amount of time have been on the order of exabytes. [1]

How is it used?
One of the best examples of big data is in our own backyard near Utah Lake: the United States NSA Data Center, also called the National Cybersecurity Initiative Data Center. When the facility is finished, it will be able to handle yottabytes (1 trillion terabytes) of information collected by the NSA over the internet. The data center can allegedly collect data from all forms of communication, including the complete content of private emails, cell phone calls, and search engine results. This data will be analyzed, deciphered, and stored in an effort to spot national security threats.[2]

This example shows how big data is used on one of the biggest scales imaginable, but any industry can apply the same idea on a smaller scale. Retailers, for example, capture vast amounts of data every day, both through e-commerce and in their stores. Big data lets them capture and analyze more of that data than ever before to identify the key factors behind shopping habits and buying trends. Based on these findings, a retailer can make more profitable decisions and find key areas for improvement.

The surge in popularity of YouTube, Twitter, Facebook, and other social media has also created huge potential for big data, and that potential is already being used today. With almost a billion users, Facebook has become a data company. Twitter sees more than 400 million tweets per day, and that data can be collected and analyzed in real time. With so many people speaking their minds about just about every topic, the possible uses for such data are nearly endless.

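
A quick back-of-the-envelope calculation helps put the figures above in perspective. It assumes decimal (SI) units, where 1 TB is 10^12 bytes and 1 YB is 10^24 bytes. It also treats 400 million as the daily tweet count. The tweets-per-second rate is derived here and is not a figure from the source. Because the post says "over" 400 million, the rate is a lower bound.

```latex
\begin{align*}
1\,\text{YB} &= 10^{24}\,\text{bytes} = 10^{12} \times 10^{12}\,\text{bytes} = 10^{12}\,\text{TB} \quad \text{(1 trillion TB)} \\
\frac{400 \times 10^{6}\ \text{tweets/day}}{86\,400\ \text{s/day}} &\approx 4\,630\ \text{tweets/s}
\end{align*}
```
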
Big Data Technologies

As mentioned above, traditional database systems are unable to handle the vast amounts of data being collected. Hadoop, an open-source framework from the Apache Software Foundation, makes it possible to work with big data. Hadoop distributes MapReduce work across a cluster of servers. A map step processes pieces of the data in parallel, and a reduce step combines the intermediate results. One of the key benefits of Hadoop is that servers can be added or removed with relative ease to accommodate demand.
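
To make the map and reduce steps concrete, here is a minimal single-machine sketch of the classic word-count job in plain Python. It is not Hadoop's API. Real Hadoop jobs are typically written against its Java MapReduce API or run through Hadoop Streaming. The function names are hypothetical, and the plain loop over documents stands in for map tasks that Hadoop would spread across servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group every value that shares the same key (word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all the counts for one word into a total.
    return key, sum(values)


def word_count(documents):
    # Hypothetical driver: Hadoop would run map tasks on many servers;
    # this sketch simply runs them one after another on one machine.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data made easy", "Big data is big"]
    print(word_count(docs))  # {'big': 3, 'data': 2, 'made': 1, 'easy': 1, 'is': 1}
```

In a real cluster, the shuffle step sends all the pairs for a given word to the server responsible for reducing that word. Because each map and reduce task works independently, adding servers lets the same job handle more data.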

MongoDB is another technology that makes big data possible. Like Hadoop, it is easy to scale. Unlike Hadoop, however, MongoDB is a data storage system designed for storing and quickly retrieving large amounts of data. MongoDB is a mature system with many useful tools, which allows it to serve as a replacement for traditional database systems. [3]

There you have it. You are by no means an expert on big data, but at least you now have a better understanding of what it is and how endless its possible uses are. If you are curious to learn more about big data, may I suggest this free book, available to download here.