Thursday, May 2, 2013

Search engines & privacy

Credit to www.realtrafficproductions.com
The current Big 3

A brief search engine primer

Search engines have changed the computing world, become a mainstay of the Internet, and changed the marketing landscape in ways never thought possible. Initially, a search engine was designed to find keywords in a manually created index of websites or other information on the World Wide Web. In its earliest state, a search engine was a simple query or program. Today, a search engine is much more than that. Search engines have evolved into complex systems that span data centers and are the bread and butter of global corporations.

History

Searchenginehistory.com states that "The first few hundred web sites began in 1993 and most of them were at colleges, but long before most of them existed came Archie. The first search engine created was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal. The original intent of the name was 'archives,' but it was shortened to Archie. Archie helped solve this data scatter problem by combining a script-based data gatherer with a regular expression matcher for retrieving file names matching a user query. Essentially Archie became a database of web filenames which it would match with the users queries." [1]
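
To make the "regular expression matcher" idea concrete, here is a minimal sketch in Python. It is not Archie's actual code: the file listing and the function name match_filenames are made up for illustration, and the gathering step is replaced by a hard-coded list.

```python
import re

# Hypothetical listing, standing in for file names collected by a data gatherer.
GATHERED_FILES = [
    "pub/tools/editor-1.0.tar.Z",
    "pub/docs/readme.txt",
    "pub/games/maze.tar.Z",
    "pub/tools/compiler-2.3.tar.Z",
]

def match_filenames(pattern, filenames):
    """Return the file names that match the user's regular expression."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]

# A user looking for compressed archives under the tools directory.
results = match_filenames(r"tools/.*\.tar\.Z$", GATHERED_FILES)
print(results)
```

The point is only that the "search" here is a pattern match against file names, not against the contents of the files.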

From those humble beginnings, the search engines of today have evolved into a much more robust tool. The contemporary search engine consists of three parts:

  • Spiders, which follow links on the web. The pages they crawl are added to the Index.
  • An Index, which roughly represents the content of the crawled pages and is accessed through the SEI.
  • A Search Engine Interface (SEI), which is the front end through which users access the Index. Google has perfected the interface with the art of "less is more." [1]
Google employee Matt Cutts has created an excellent video in which he discusses how search engines work.
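
The three parts listed above can be sketched in a few lines of Python. This is a toy under loud assumptions: the "web" is a hard-coded dictionary (TOY_WEB), the page names are invented, and real spiders, indexes, and interfaces are vastly more complex.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("an index maps words to pages", ["c.example"]),
    "c.example": ("users query the index", ["a.example"]),
}

def crawl(start):
    """Spider: follow links breadth-first and return pages in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in TOY_WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Index: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in TOY_WEB[url][0].split():
            index[word].add(url)
    return index

def search(index, query):
    """Interface: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

index = build_index(crawl("a.example"))
print(search(index, "the index"))
```

Note that the query never touches the pages themselves; it only consults the index built from the crawl, which is why a page that has not been crawled cannot be found.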


Google really changed the face of search engines by moving into the advertising space. It has been debated whether Google is more of an ad agency than anything else, since advertising is considered to be its main source of revenue. AdWords, Google's "powerful online advertising tool", offers CPC, CPM, and site-targeted advertising for text, banner, and rich-media ads [3].

It is beyond the scope of this post to explore all of the services and advertising options provided by Google. Please follow up on the references listed for more details. Using Google as our example, we see a company that has become a multibillion-dollar global enterprise, and it all started with a search engine. Built on that search engine platform, ads became much more targeted, based on the amount of data a user provided to Google while using its services. The bottom line is that when we search on Google, we are sharing what our current "quest" is, and Google uses that shared information to provide targeted advertisements. Many of us don't realize that we are sharing vast amounts of information with companies like Google. These corporations are creating excellent "freemium" products that are designed to capture increasingly detailed information, which is then used for advertising.

Privacy

Credit to www.searchenginejournal.com
Are your searches truly private and secure?
The services provided are quite impressive (consider Gmail). We gladly use the service, accepting the EULA (End User License Agreement) and the unspoken trade-off of our data for their service. We thereby give the company access to all of the information in our account. Privacy is important because this data provides a unique glimpse into our personalities and private lives. For example, search terms have been used to convict a wireless hacker and lock up a man charged with killing his wife [4]. The searches and products that we use from these companies are not truly free. We pay for them, just not with money.

Policy

Determining the exact privacy policy of each company can be frustrating. Each of the major search engines has a different policy. For example, Google claims that it keeps data for only 18 months, after which the data is partially anonymized. Bing, on the other hand, claims that while it also retains data for only 18 months, it deletes the data permanently at the end of that period. However, Bing uses behavioral targeting [4].

Privacy and security have become a major sticking point (perhaps a pain point?) for these large corporations. Some argue that the question is no longer who has the best search but who does the best job of preserving your privacy [5]. Google instituted new privacy policies in March 2012. However, it was argued that the change was less about user privacy than about what Google can do with the data it has collected about you [6].

Where this line blurs and becomes quite gray is when you have an account with the company. For instance, Google users quite possibly have Gmail (with calendar), Google+, Google Drive, and Google Talk, which can capture a fair amount of information by themselves. If you add in Chrome, Blogger, YouTube, Photos/Picasa, an Android phone, and the user's search data, you will have a very (disturbingly) complete profile of a person. Depending on how actively a user uses Google's products, you could have an idea of where a person is during the day, what they will be doing in the future, with whom they are talking, and what their specific interests and needs are within a fairly current time frame. That is advertising gold and where the money is. With that kind of information, and the use of its services across platforms, Google can get a really good sense of what to advertise to you at the right time.

How some of the data about you is captured


credit to omnipixel.blogspot.com
It is all about the cookies.
First and foremost is the cookie. Some significant progress has been made here. Google used to keep the unique identifiers (tying the cookie to a specific user) for 30 years. That has been dropped to 2 years. However, the 2-year limit is not what it seems: every time you visit Google, the 2-year deadline resets. In theory, the identifier could therefore last a lot longer than 30 years; it could last your entire lifetime [5]. What could be better advertising gold than data about you from the first time you get your hands on a computer as a child until the last day you do so as a senior? The second offender is the free email account, which provides gobs of data. Let there be no doubt that the email in these accounts is scanned and that targeted ads are generated based on the contents.
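
The resetting deadline is easier to see with a small simulation. The Python sketch below is only an illustration of the behavior described above, not Google's implementation: the function name identifier_lifespan, the visit dates, and the rule that a lapsed cookie gets a brand-new identifier are all assumptions.

```python
from datetime import datetime, timedelta

# The two-year lifetime described above, approximated as 730 days.
COOKIE_LIFETIME = timedelta(days=2 * 365)

def identifier_lifespan(visit_times):
    """Return (first_issued, expires) for the identifier live at the last visit."""
    first_issued = None
    expires = None
    for visit in sorted(visit_times):
        if expires is None or visit > expires:
            # Assumption: with no live cookie, a fresh identifier is issued.
            first_issued = visit
        # Every visit pushes the deadline two years into the future.
        expires = visit + COOKIE_LIFETIME
    return first_issued, expires

# Hypothetical user who visits just once a year for ten years.
visits = [datetime(2003 + i, 1, 1) for i in range(10)]
first, expires = identifier_lifespan(visits)
print("identifier issued:", first.date())
print("identifier expires:", expires.date())
```

Under these assumptions, a single visit per year is enough to keep the same identifier alive for the whole decade, even though the nominal lifetime is only two years.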

Suggestions

There are two possible options: 1.) simply accept that this is the state of affairs and that the price of use is your personal data, or 2.) search out ways to keep your web habits private. The World Privacy Forum and the Electronic Frontier Foundation (EFF) have numerous recommendations. A few are listed below:
  • The EFF provides a free tool called "HTTPS Everywhere" which is designed to encrypt web browsing whenever possible [7].
  • The EFF also provides multiple opportunities for persons to become involved in a variety of internet privacy debates and concerns. Complaining about privacy does nothing. Get involved to make a difference [8].
  • The World Privacy Forum lists a number of best practices, tips, and tools to improve privacy [9].

Conclusion and Recommendation

Search engines have really become a part of the "internet life" and provide a valuable service. Running those services is certainly not cheap or easy. To pay for them, advertising has become the backbone of the search engine industry. To grow as a business, new revenue streams need to be found and existing ones improved upon. In the search engine industry this has been done through innovation in targeted advertising, using the data that the user has provided. It could be said that the innovation has come at the cost of user privacy; the user accepts this tradeoff by using the services provided. Payment for use is made in a different currency: personal data and privacy. Companies are regularly providing new services, which are quite impressive, but at an increased cost (i.e., more data gathered and less privacy). It is recommended that readers educate themselves on the topic, rather than remain passive, so they can decide how much of their privacy they are willing to part with.

References