The Problem
Data quality is something that has been, and will continue to be, of utmost importance to organizations and their analysts, because analyses are only as accurate as the data from which they were created. The problem is that data quality is currently getting worse. Worldwide, the proportion of inaccurate data has risen from 17 to 22 percent, and in the U.S. specifically, organizations believe that up to 25 percent[1] of their data is inaccurate. This means a quarter of the data companies use to make decisions could be misleading them. So why is this happening? Three leading causes:
1. More Streams: Today companies are getting data from an ever-increasing number of sources. In addition to traditional web traffic, we now have access to mobile, video, social media, GPS, and more. Gathering and making sense of all this data has led to quality issues.
2. More People: As Internet access continues to expand globally, all the streams of data mentioned above are now coming from hundreds of different countries. A simple example of this is dates: different countries store and display dates differently, so understanding which country the dates are coming from can make a huge difference when analyzing the data (see the short date-parsing sketch after this list).
3. More Data: As more and more people begin to use more devices for more things, there is inevitably going to be more data. Many companies don’t have the bandwidth or ability to make sure all of this new data is up to the necessary standard.
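To make the date example concrete, here is a minimal sketch (in Python, purely illustrative) of how the same raw string can mean two different dates depending on the country it came from, and how specifying the expected format removes the ambiguity:

    from datetime import datetime

    raw = "03/04/2023"  # hypothetical value pulled from a web form

    # Parsed with a U.S.-style format (month/day/year)...
    us_date = datetime.strptime(raw, "%m/%d/%Y")  # March 4, 2023

    # ...versus a U.K.-style format (day/month/year).
    uk_date = datetime.strptime(raw, "%d/%m/%Y")  # April 3, 2023

    print(us_date.date(), uk_date.date())  # 2023-03-04 2023-04-03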
How are companies addressing this issue?
Here are a few of the many possible solutions for improving data strategy:
Fix the data at the point of capture: The first thing a company should do is make sure the data it has control over is always as clean as possible. The easiest way to get good customer data is to have the customer input it correctly when they create it. To accomplish this, the company needs to know in what format it wants the data (e.g., does it want addresses broken up into individual pieces or in a single entry?). Once the business need is understood, validations can be put into place to ensure the format is correct.
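As a hedged illustration of what such validations might look like, the sketch below (Python, with hypothetical field names) checks that a submitted address is already broken into the individual pieces the business decided it needs, and that the ZIP code matches the expected format:

    import re

    # Hypothetical required pieces, assuming the business chose to store
    # addresses as individual fields rather than a single free-text entry.
    REQUIRED_FIELDS = ["street", "city", "state", "zip_code"]
    ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")  # U.S. ZIP or ZIP+4

    def validate_address(record: dict) -> list:
        """Return a list of problems found in a submitted address record."""
        problems = []
        for field in REQUIRED_FIELDS:
            if not record.get(field, "").strip():
                problems.append(f"missing {field}")
        if record.get("zip_code") and not ZIP_PATTERN.match(record["zip_code"]):
            problems.append("zip_code is not in the expected format")
        return problems

    # Example: reject the submission (or prompt the customer) if anything fails.
    print(validate_address({"street": "1 Main St", "city": "Boston",
                            "state": "MA", "zip_code": "0213"}))
    # -> ['zip_code is not in the expected format']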
Conduct a source system data assessment: Running an analysis of the current database or source system can give a company an idea of the current state of its data. If there are missing or invalid values, there is a good chance that there is an issue with the data at the point of entry or somewhere else in the process.
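A minimal sketch of such an assessment, assuming the source table can be pulled into pandas (the file and column names here are purely illustrative), might simply profile each column for missing or implausible values:

    import pandas as pd

    # Hypothetical extract of a customer table from the source system.
    df = pd.read_csv("customers_extract.csv")

    # Share of missing values per column, worst offenders first.
    missing_rate = df.isna().mean().sort_values(ascending=False)
    print(missing_rate)

    # Simple validity checks on a couple of illustrative columns.
    invalid_age = ((df["age"] < 0) | (df["age"] > 120)).sum()
    blank_email = (df["email"].fillna("").str.strip() == "").sum()
    print(f"rows with implausible age: {invalid_age}")
    print(f"rows with blank email: {blank_email}")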
Text Mining: For data where the input cannot be controlled (e.g., social media data), text mining can be a helpful tool. Text mining can parse through all of a company’s data and replace variations of words with standardized terms. This ensures that more of a company’s data is in the correct format.
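A very small sketch of that idea, assuming a hypothetical mapping of known variations to a standard term, is to normalize free-text mentions before they ever reach the analysis:

    import re

    # Hypothetical variations observed in social media data, mapped to the
    # standardized term the company wants to analyze on.
    STANDARD_TERMS = {
        r"\bn\.?y\.?c\.?\b": "New York City",
        r"\bnew york city\b": "New York City",
        r"\bsf\b": "San Francisco",
        r"\bsan fran(cisco)?\b": "San Francisco",
    }

    def standardize(text: str) -> str:
        """Replace known variations with their standardized term."""
        result = text
        for pattern, standard in STANDARD_TERMS.items():
            result = re.sub(pattern, standard, result, flags=re.IGNORECASE)
        return result

    print(standardize("Loved the store in nyc and the one in San Fran!"))
    # -> "Loved the store in New York City and the one in San Francisco!"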
Conclusion:
There are many data-related challenges for online companies. As tools improve, companies will be able to better handle the diverse streams of data. However, as more and more people begin to access a company's website, it is vital that organizations are smart about how they keep the quality of their data high.
*For more insights on how to improve data quality, go to http://dataqualitypro.com
_________________________________
[1] http://cdn.qas.com/us-marketing/whitepapers/unlock-the-power-of-data-2.pdf