Wednesday, February 20, 2013

Unique Visitor Confusion and the Hotel Problem

One of the common stumbling blocks for newcomers to Web Analytics is understanding exactly what a Unique Visitor is, and what it is not.

The true definition of a Unique Visitor is any single individual who visits a site during the reporting period. If a visitor returns multiple times during the reporting period, they still register as a single Unique Visitor. By this definition, the Unique Visitors metric measures unique people.

So what's the problem?
The honest truth is that no analytics product on today's market is yet capable of accurately measuring actual people. Today's analytics platforms measure the closest approximations of people that we are able to track or extrapolate. Typical methodologies for getting at this measurement include cookie tracking and panel measurement. So in reality, most platforms today are actually measuring unique cookies rather than Unique Visitors. And yet, most analytics platforms still refer to these unique cookies as Unique Visitors.

Cookies typically overestimate Unique Visitors
So what is the difference and why is it an issue? Well, while in a perfect world the cookie methodology could take us very close to a count of actual people, our world of web tracking is far from perfect. But it's data, it must have some clean answer/resolution, right? Wrong. Cookie tracking has a number of accepted shortcomings which must be understood in order to draw the proper inferences from the underlying data. For example, if I visit your site from a single computer but use a different browser on different visits, I will count as a separate Unique Visitor for each browser I used, because each browser has its own cookie. If I visit your site from a variety of devices, I again count as multiple Unique Visitors. And if I use the same computer and the same browser but clear my cookies during the reporting period on which you are running your report, then, you guessed it...I am again reported as multiple Unique Visitors.

BUUUT...Cookies can underestimate Unique Visitors at the same time...
In fact, the exact opposite can be true as well. If multiple users hit your site from the same machine and browser (think shared home computers, internet cafes, etc.), you may be undercounting your Unique Visitors, since they all register as a single Unique Visitor (remember, just one cookie is being registered). While Unique Visitors tend to be overcounted, because the issues described above occur more frequently, you can see how the skew can go both ways and make it very difficult to determine the actual Unique Visitor count.
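To make the two-way skew concrete, here is a minimal sketch in Python. The hit log, the people's names and the cookie IDs are all invented for illustration; a real platform only ever sees the cookie column, which is exactly the problem.

```python
# Hypothetical hit log of (person, cookie_id) pairs.
# An analytics platform can only observe cookie_id, never the person.
hits = [
    ("alice", "c1"),  # Alice on her laptop, first browser
    ("alice", "c2"),  # Alice on the same laptop, second browser
    ("alice", "c3"),  # Alice on her phone
    ("alice", "c1"),  # Alice returning in the first browser
    ("bob", "c4"),    # Bob on a shared family computer
    ("carol", "c4"),  # Carol on that same computer and browser
]

# Deduplicate each column with a set.
actual_people = {person for person, _ in hits}
unique_cookies = {cookie for _, cookie in hits}

print("Actual people: ", len(actual_people))   # 3
print("Unique cookies:", len(unique_cookies))  # 4
```

In this made-up log, Alice alone is overcounted as three "visitors", while Bob and Carol are undercounted as one, so the reported figure of 4 is off from the true 3 in both directions at once.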

So, can I even trust my Unique Visitor count?
Short answer, yes. The truth is that the insights that come from cookie tracking of visitors are still very powerful. In fact, in some cases understanding the variety of browsers and devices interacting with your site can be extremely useful. Certainly a deeper view into who is using each browser and device would be valuable as well, but the cookie tracking methodologies of most platforms still lend great insight into the whos, whats and hows of traffic on your site. Really, the point here is not that the data is bad...the point is that the name is misleading and, if misunderstood, can lead to unnecessary confusion.

There are even some analytics providers pushing for wider adoption of the Terminology Guidelines for Counting Audience Size from the Digital Analytics Association, the IAB and ABCe, which declare that Unique Visitors should be reserved for Audience/Census measurement (cases where actual people can be counted as uniques) and that Unique Browsers should be the industry's preferred term for measuring unique cookies[1]. (Insert shameless internal plug for comScore Digital Analytix here...)

One more thing to consider: The Hotel Problem
There is one additional pitfall that new users tend to run into when counting "Unique Visitors" on their website. It is summarized neatly in what is referred to as the Hotel Problem. Take, for example, a hotel with two rooms. Let's say we wanted to measure the number of visitors to this hotel over the course of a three-day period using the following[2]:
As you can see, we can calculate our Unique Visitors to this hotel in a number of ways. First, by day. On day 1 we had 2 unique visitors. On day 2, 2 unique visitors. On day 3, 2 unique visitors. So 2 unique visitors per day. One would think, then, that we could sum these to get 6 unique visitors for the 3-day period. However, some of those visitors weren't unique across the 3-day period. They were, in fact, return visitors. So simply summing unique visitors from subsections of the reporting period will clearly not cut it.

Similarly, suppose we take a count of unique visitors within each room. Room A had 2 unique visitors. Room B had 2 unique visitors. Sum them together and you get 4, right? Well, no. The hotel only had 3 unique visitors in total, so one of them must have been counted in both rooms.

The truth is that in order to count Unique Visitors, we can't subdivide the data in any way within the reporting period and then add the pieces back together without skewing the result. In this case, we can clearly count a total of 3 Unique Visitors to the hotel during the time period. We can't sum across days or rooms without losing some of the deduplication which makes the Unique Visitor truly unique. We also cannot disassociate the count from the reporting period without possibly reintroducing duplication into the equation[3].
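The arithmetic above can be checked with a short Python sketch. The guest names and the room assignments below are assumptions, chosen only so that they reproduce the counts quoted in the text (2 per day, 2 per room, 3 overall); they are not taken from the original illustration.

```python
from collections import defaultdict

# Hypothetical occupancy records: (day, room, guest).
stays = [
    (1, "A", "John"), (1, "B", "Mark"),
    (2, "A", "John"), (2, "B", "Jane"),
    (3, "A", "Mark"), (3, "B", "Jane"),
]

def uniques_by(key_index):
    """Count distinct guests within each value of the chosen column."""
    groups = defaultdict(set)
    for stay in stays:
        groups[stay[key_index]].add(stay[2])
    return {key: len(guests) for key, guests in groups.items()}

by_day = uniques_by(0)    # uniques per day
by_room = uniques_by(1)   # uniques per room
total = len({guest for _, _, guest in stays})  # uniques for the whole period

print("Per day: ", by_day, "sum =", sum(by_day.values()))    # sum = 6
print("Per room:", by_room, "sum =", sum(by_room.values()))  # sum = 4
print("Whole period:", total)                                # 3
```

Each subdivision deduplicates only within its own slice, so the sums (6 and 4) both overshoot the true period total of 3. The only safe way to get the total is to deduplicate over the whole reporting period at once.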

For this very reason, in many platforms, if you run a report that splits Unique Visitors out by any subdivision within the reporting period (say, Unique Visitors by day over a week's time), the total will be removed to avoid the very confusion evidenced by the hotel problem.

At the end of the day, Unique Visitors, or better put, Unique Browsers, are a basic and extremely useful data point. Still, some care must be taken to ensure that new users understand where the data comes from and how it is collected, so they avoid the beginner assumptions which so often lead to confusion about how to use the data.

