Wednesday, February 13, 2013

Terabytes in seconds!

In the last post, I wrote about how big data is setting the new trend for 2013 and beyond.  Even though the concept of big data is still a buzzword, companies like Facebook, Google, Amazon, etc. are already using and dealing with big data.

Businesses and big data

As technology advances, businesses face growing demand to generate more information in a short period of time to better serve their customers and stay ahead of the competition. Collecting lots of raw data (including emails, blogs, logs, picture files, etc.) is meaningless unless they can make sense of it and understand what it means for their business. Therefore, new technologies and techniques are needed to handle the massive amounts of data collected and make them better serve the business's analytical purposes.

Take social media, for example: the ability to analyze and mine data at scale within social networks is enabling a range of intriguing and useful applications that can plug into social media networks and make use of the knowledge inside them.  For instance, Facebook uses the insights gleaned through its analytics on how people behave to enable personalization and better user experiences.

To help businesses tackle big data, Google offers a web service that lets users do interactive analysis of massive data sets, up to billions of rows, in just seconds; it is scalable and easy to use.  That is where BigQuery (BQ) comes in.

What is BigQuery?

As Google describes it, "BigQuery allows you to run SQL-like queries against very large datasets, with potentially billions of rows."  It is ideal for running queries over vast amounts of data and analyzing them quickly.  In data analysis terms, BigQuery is an OLAP (online analytical processing) system.
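To make "SQL-like queries" and "OLAP" a little more concrete, here is a minimal sketch of the kind of aggregate query such a system answers. It runs against Python's built-in sqlite3 module rather than BigQuery itself, and the events table, its columns, and the sample rows are made up for illustration. BigQuery's own dialect may differ in details.

```python
import sqlite3

# Hypothetical page-view events; a real BigQuery table could hold billions of rows.
EVENTS = [
    ("2013-02-11", "US", 120),
    ("2013-02-11", "DE", 45),
    ("2013-02-12", "US", 150),
    ("2013-02-12", "DE", 60),
    ("2013-02-12", "JP", 30),
]


def views_by_country(rows):
    """Run an OLAP-style aggregate: total views per country, largest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (day TEXT, country TEXT, views INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        query = """
            SELECT country, SUM(views) AS total_views
            FROM events
            GROUP BY country
            ORDER BY total_views DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, total in views_by_country(EVENTS):
        print(country, total)
```

The point is the shape of the question (scan many rows, group, summarize) rather than the engine. An OLAP system is built to answer exactly this kind of query over very large tables.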

Wikipedia gives a little more detail: "[BigQuery] is an Infrastructure as a Service (IaaS) that may be used complementarily with MapReduce."

What role does BigQuery play in analytics?

Let's face it: nowadays, when we talk about big data, it does not simply mean large databases full of marketing details or customer records. It can also refer to emails, office documents, blogs, log files, or picture files: anything that constitutes valuable information to your organization.

An example of how Google BigQuery comes to play a role in data analytics:


Claritics is one of the companies that have been taking advantage of Google BigQuery to serve their analytics needs. The company is a social analytics firm that uses a web-based application to help social and mobile game developers, advertisers, and media companies gain real-time insights into the behavior of game players and app users. Claritics helps developers analyze vast amounts of data so they can make their games or applications more effective and attract more user engagement.

As the company continued to analyze growing amounts of data, the owner decided to move from a Hadoop system, which was more expensive and required continual upgrades to expand its storage capacity, to Google BigQuery. After switching to the Google BigQuery service, Claritics was able to bring new products and services to market nearly four times faster.  It reduced the time to run complex queries on large data sets from 30 minutes to 20 seconds and cut the time spent maintaining its data analysis infrastructure by up to 40%.  As a result, the tool enables the company to react to its clients' needs much faster.

Google BigQuery's features and best uses


BigQuery offers the following features:
  • Speed: it can analyze billions of rows in seconds.
  • Scale: it lets you go through terabytes of data and trillions of records.
  • Simplicity: it uses a SQL-like query language and is hosted on Google infrastructure.
  • Sharing: it offers powerful group- and user-based permissions using Google accounts.
  • Security: it provides secure SSL access.
  • Multiple access methods: users can connect to BigQuery using the BigQuery browser tool, command-line tools, the REST API, or Google Apps Script.

BigQuery is best used for:
  • Ad-hoc analysis
  • Standardized reporting
  • Data exploration
  • Web applications

How does BigQuery work?


BigQuery Data Model:

BigQuery builds on ideas from web search and parallel DBMSs. Its architecture borrows the concept of a serving tree used in distributed search engines: a query is pushed down the tree and rewritten at each step, and the result is assembled by aggregating the replies received from the lower levels of the tree.  With a SQL-like language for expressing ad hoc queries and a column-striped storage representation, it can execute queries natively and read less data from secondary storage, which also reduces CPU cost thanks to cheaper compression.
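The serving tree idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Google's implementation: the shards, the column names, and the functions leaf_query, merge, and serve are all hypothetical. Each shard stores its data column by column, the leaves compute a partial SUM(views) GROUP BY country, and every level above only merges the partial results it receives. The "rewriting" of the query at each step is implicit here: leaves run the partial aggregate, while upper nodes run only the merge.

```python
from collections import Counter

# Hypothetical column-striped shards: each column is stored as its own list,
# so a query touching only "country" and "views" never reads "user_agent".
SHARDS = [
    {"country": ["US", "DE", "US"], "views": [10, 5, 7], "user_agent": ["a", "b", "c"]},
    {"country": ["DE", "JP"], "views": [3, 8], "user_agent": ["d", "e"]},
    {"country": ["US"], "views": [4], "user_agent": ["f"]},
]


def leaf_query(shard):
    """Leaf server: compute partial SUM(views) GROUP BY country over one shard."""
    partial = Counter()
    for country, views in zip(shard["country"], shard["views"]):
        partial[country] += views
    return partial


def merge(partials):
    """Intermediate or root server: add up the partial sums from the level below."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def serve(shards, fanout=2):
    """Push the query down to the leaves, then aggregate replies level by level."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_query(shard) for shard in shards]
    while len(level) > 1:
        # Each parent merges the replies of up to `fanout` children.
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return dict(level[0]) if level else {}


if __name__ == "__main__":
    print(serve(SHARDS))
```

In a real system the leaves would run in parallel on many machines; the sketch runs them one after another to show only how the partial results flow back up the tree.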


Companies that deal with large volumes of structurally complex data face a challenge. Because the traditional relational database is not designed to handle this type of big data, many engineering teams choose highly scalable applications like Google BigQuery over the relational database.  Though this approach is effective in storing and retrieving data, it is still new and evolving, and it poses challenges for interactive data analysis as well as

