Web Analytics and Data Warehouse
By Michael Beiene
What is Web Analytics?
The Digital Analytics Association defines web analytics as the measurement, collection, analysis and reporting of internet data for the purposes of understanding and optimizing web usage. There are two ways of collecting digital analytics data.
The older method collects digital analytics data from web servers’ log files. Web servers record every file request they receive from browsers. By opening a web server’s log file, it used to be easy to count how many times the site had been accessed and to identify unique users by their IP addresses. However, this method became unreliable with the arrival of search engines and dynamically assigned IP addresses.
· Incorrect hit count because of search engines - Search engines such as AltaVista, Yahoo and Google use automated crawlers that request a site’s pages in order to index them. The web server logs each of these requests, so the site receives hits even though no person has actually opened it.
· Incorrect hit count because of dynamically assigned IP addresses - A web server’s log file uses the IP address to identify unique users. IP addresses used to be static: a user kept the same IP address all the time, which made it easy to count from the log file how many unique users had accessed a site. In the mid-1990s, however, dynamically assigned IP addresses became common. Instead of keeping the same IP address, a user’s address changes after a certain period of time, so one user can appear in the log as several and different users can appear as one. This made it harder to count unique users based on their IP addresses.
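The log-file method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample lines follow the Common Log Format, the IP addresses and paths are invented, and the helper name summarize is hypothetical. It counts total hits and distinct IP addresses, which is exactly the count that crawlers and dynamic IP addresses distort.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; addresses and paths are made up.
SAMPLE_LOG = """\
192.0.2.10 - - [10/Oct/1996:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
192.0.2.10 - - [10/Oct/1996:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1045
198.51.100.7 - - [10/Oct/1996:14:02:12 -0700] "GET /index.html HTTP/1.0" 200 2326
203.0.113.5 - - [10/Oct/1996:14:05:40 -0700] "GET /index.html HTTP/1.0" 200 2326
"""

# One log line: client IP, two ignored fields, timestamp, request, status, size.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
)

def summarize(log_text):
    """Return (total_hits, unique_ip_count) for the given log text."""
    hits_per_ip = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # skip lines that do not look like log entries
            hits_per_ip[match.group("ip")] += 1
    return sum(hits_per_ip.values()), len(hits_per_ip)

total_hits, unique_ips = summarize(SAMPLE_LOG)
print(f"hits={total_hits} unique_ips={unique_ips}")
```

Note that the sketch treats every distinct IP address as one user. A crawler request, or a user whose address changed between visits, would be counted exactly like a new visitor.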
What is a Data Warehouse?
A Data Warehouse, sometimes referred to as an Enterprise Data Warehouse, is a central repository of data that is created by integrating data from different sources. Unlike an operational database, which typically stores only current data, a Data Warehouse stores current as well as historical data for reporting, data analysis and projections based on trends. To populate a Data Warehouse, an ETL process needs to be performed. ETL stands for Extract, Transform and Load. First, the data is extracted from the different sources (marketing database, sales database….). Then, the data is transformed into an appropriate format in a staging database. Finally, the data is loaded into the data warehouse. While a database focuses on transactions, a Data Warehouse focuses on data analysis. The diagram below illustrates the ETL process.
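The three ETL steps can also be sketched in code. The following is a minimal illustration, not a production design: it uses Python's built-in sqlite3 module with in-memory databases standing in for the source systems and the warehouse, a plain Python list standing in for the staging database, and table and column names (campaign_spend, orders, fact_daily_metric) that are invented for the example.

```python
import sqlite3
from datetime import datetime

def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that plays the role of a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two hypothetical sources that store dates in different formats.
marketing_db = make_source(
    "CREATE TABLE campaign_spend (day TEXT, channel TEXT, amount REAL)",
    "INSERT INTO campaign_spend VALUES (?, ?, ?)",
    [("2024-01-05", "email", 120.5), ("2024-01-06", "search", 80.0)],
)
sales_db = make_source(
    "CREATE TABLE orders (order_date TEXT, product TEXT, quantity INTEGER)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("01/05/2024", "widget", 3), ("01/06/2024", "gadget", 1)],
)

def extract(conn, query):
    """Extract: read the raw rows from one source."""
    return conn.execute(query).fetchall()

def transform(marketing_rows, sales_rows):
    """Transform: bring both sources to one (day, source, metric, value) shape."""
    staged = []
    for day, channel, amount in marketing_rows:
        staged.append((day, "marketing", f"spend_{channel}", float(amount)))
    for order_date, product, quantity in sales_rows:
        # Normalize MM/DD/YYYY to ISO dates so both sources agree.
        iso_day = datetime.strptime(order_date, "%m/%d/%Y").date().isoformat()
        staged.append((iso_day, "sales", f"units_{product}", float(quantity)))
    return staged

def load(warehouse, staged_rows):
    """Load: append the staged rows to the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_daily_metric "
        "(day TEXT, source TEXT, metric TEXT, value REAL)"
    )
    warehouse.executemany(
        "INSERT INTO fact_daily_metric VALUES (?, ?, ?, ?)", staged_rows
    )
    warehouse.commit()

warehouse = sqlite3.connect(":memory:")
staged = transform(
    extract(marketing_db, "SELECT day, channel, amount FROM campaign_spend"),
    extract(sales_db, "SELECT order_date, product, quantity FROM orders"),
)
load(warehouse, staged)
rows = warehouse.execute(
    "SELECT day, source, metric, value FROM fact_daily_metric ORDER BY day, source"
).fetchall()
for row in rows:
    print(row)
```

Because the load step only appends rows, repeated runs would accumulate history in the warehouse table, which mirrors the point above that a warehouse keeps historical as well as current data.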
Web Analytics and Data Warehouse
In today’s competitive market, web analytics tools produce terabytes of data. Moving or importing this data into a data warehouse not only makes the data easier to analyze, but also has the following advantages.
· It is easier to extract web analytics data from different sources or different sites and combine it for better analysis. For example, General Motors may want to analyze web analytics data from the web sites of its different authorized car dealers. With a data warehouse, the company can combine that data and analyze it in one place.
· A change in visitor behavior signals an opportunity. It is easier to notice these changes when web analytics data is in a data warehouse, since a data warehouse keeps historical data and is built for analysis.
· A data warehouse provides more data visualization capabilities than web analytics tools.
· A data warehouse provides more advanced data segmentation capabilities than web analytics tools.
· A data warehouse supports more complex, automated distribution of customized reports than web analytics tools.
These are only some of the advantages of moving or importing web analytics data to a data warehouse instead of trying to analyze it within web analytics tools.
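As a small illustration of combining data from several sites and then segmenting it, the sketch below again uses Python's sqlite3 module with an in-memory database as a stand-in for the warehouse. The table page_views, its columns, the dealer site names and all the numbers are invented for the example.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE page_views (site TEXT, segment TEXT, week TEXT, visits INTEGER)"
)
# Hypothetical weekly visit counts already loaded from two dealer sites.
warehouse.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?, ?)",
    [
        ("dealer-a.example", "new", "2024-W01", 400),
        ("dealer-a.example", "returning", "2024-W01", 150),
        ("dealer-b.example", "new", "2024-W01", 300),
        ("dealer-b.example", "returning", "2024-W01", 100),
        ("dealer-a.example", "new", "2024-W02", 380),
        ("dealer-a.example", "returning", "2024-W02", 210),
        ("dealer-b.example", "new", "2024-W02", 310),
        ("dealer-b.example", "returning", "2024-W02", 90),
    ],
)

def visits_by_segment(conn):
    """Combine all sites and total the visits per week and visitor segment."""
    return conn.execute(
        "SELECT week, segment, SUM(visits) FROM page_views "
        "GROUP BY week, segment ORDER BY week, segment"
    ).fetchall()

result = visits_by_segment(warehouse)
for week, segment, visits in result:
    print(week, segment, visits)
```

Once the data from every site sits in one table, a single query covers all of them, and comparing consecutive weeks for a segment is one way a change in visitor behavior could be spotted.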