Data quality gives a competitive edge. Everybody agrees on how important good data quality is, and everybody has agonized over erroneous data. We’ve all lost a lot of time working with crappy data, and “Garbage In, Garbage Out” is probably the most commonly cited proverb in IT. So how come it is always so hard to find volunteers to do something about it?
Because the consequences of non-quality data propagate throughout the organization, one seemingly innocent problem upstream can easily cause a dozen problems downstream, and sometimes even more! The accumulated costs of dealing with the resulting errors can become staggering. In a world that increasingly relies on digital information, tackling and resolving the root causes of data quality problems is one of the highest-leverage investments a company can make.
Why do these problems exist, and why do they persist? Often it looks like business misalignment of the worst kind: many “bystanders” realize there are indeed data problems, but nobody “owns” them. This recurring phenomenon lies at the heart of the ever-present challenge of finding resources (both money and time) to overcome such data quality problems.
1. What is data quality?
Data quality is determined not only by the accuracy of data, but also by relevance, timeliness, completeness, trust and accessibility (Olson, 2003). All these “qualities” need to be attended to if a business wants to improve its competitive advantage and make the best possible use of its data. Data quality implies fitness for use, including unanticipated future use. Accuracy holds a special place, because none of the other qualities matter at all if the data is inaccurate to begin with! All other qualities can be compromised, albeit at your peril.
2. Data non-quality is expensive
“Reports from the Data Warehousing Institute on data quality estimate that poor-quality customer data costs US business a staggering $611 billion a year in postage, printing and staff overhead” (Olson, 2003). There are many ways in which non-quality data can cost money, and these costs typically remain largely hidden. Senior management either doesn’t notice them or, even more likely, is grappling with problems that are never traced back to poor-quality data.
3. Quantifying the cost of non-quality is very important
Since the cost of poor data quality has such a strong tendency to go unnoticed, it is all the more important to translate its consequences into the one dimension each and every manager understands so well: dollars. This also gives a perspective on the kinds of investments that are appropriate for resolving such issues. A mechanism for prioritizing improvement programs is also desirable: you want to pick the low-hanging fruit first, but you certainly also want to know where the whoppers are! According to Gartner, Fortune 1000 enterprises may lose more money in operational inefficiency due to data quality issues than they spend on Data Warehouse and CRM initiatives.
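One way to make such a prioritization mechanism concrete is to rank the same list of issues twice: once by payback (yearly cost avoided per dollar spent on the fix) to find the low-hanging fruit, and once by absolute yearly cost to find the whoppers. The Python sketch below is a minimal illustration under that assumption, not a method prescribed here; the `QualityIssue` class, both ranking functions and all dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityIssue:
    """One data quality problem with hypothetical cost estimates (in dollars)."""
    name: str
    annual_cost: float  # estimated yearly cost of living with the problem
    fix_cost: float     # estimated one-off cost of resolving it; must be > 0

    @property
    def payback_ratio(self) -> float:
        # Dollars of yearly cost avoided per dollar spent on the fix.
        return self.annual_cost / self.fix_cost


def low_hanging_fruit(issues: list[QualityIssue]) -> list[QualityIssue]:
    """Order issues by payback ratio, best return on investment first."""
    return sorted(issues, key=lambda issue: issue.payback_ratio, reverse=True)


def whoppers(issues: list[QualityIssue]) -> list[QualityIssue]:
    """Order issues by absolute annual cost, most expensive first."""
    return sorted(issues, key=lambda issue: issue.annual_cost, reverse=True)


# Purely illustrative figures, not taken from any real organization.
issues = [
    QualityIssue("duplicate customer records", annual_cost=50_000, fix_cost=10_000),
    QualityIssue("missing postal codes", annual_cost=8_000, fix_cost=500),
    QualityIssue("inconsistent product codes", annual_cost=200_000, fix_cost=150_000),
]

print([i.name for i in low_hanging_fruit(issues)])
# ['missing postal codes', 'duplicate customer records', 'inconsistent product codes']
print([i.name for i in whoppers(issues)])
# ['inconsistent product codes', 'duplicate customer records', 'missing postal codes']
```

The two orderings disagree on purpose: the fix with the best return is not necessarily the biggest problem, which is why it pays to look at both lists before committing a budget.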
4. Data quality issues typically arise when existing data are used in new ways
In my experience as a data miner, where I am very often looking for new ways of using existing data, this is where many problems originate. The data itself hasn’t changed, but new uses of existing data expose problems that were there all along. So what constitutes “data quality” needs to be considered in relation to its intended use. A change in usage brings new criteria for evaluating quality, and hence may raise new concerns. These problems usually didn’t surface before because the business had adapted to the data as they were: people and processes worked around the consequences of inaccurate entries. Incidentally, this is also why legacy system migrations can be so painful.
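To illustrate how quality depends on intended use, here is a minimal Python sketch under stated assumptions: the customer records, the two uses (`email_invoicing` and `direct_mail_campaign`) and the per-use required fields are all hypothetical. The same, unchanged records pass the completeness check for their original use and fail it for a new one.

```python
# Hypothetical customer records, originally captured for e-mail invoicing only.
customers = [
    {"id": 1, "email": "ann@example.com", "postal_code": "1012AB"},
    {"id": 2, "email": "bob@example.com", "postal_code": ""},
    {"id": 3, "email": "cy@example.com", "postal_code": None},
]

# Hypothetical rules: which fields must be filled in for each intended use.
REQUIRED_FIELDS = {
    "email_invoicing": ["id", "email"],
    "direct_mail_campaign": ["id", "email", "postal_code"],
}


def unfit_records(records: list[dict], use: str) -> list[int]:
    """Return the ids of records missing a required field for the given use."""
    required = REQUIRED_FIELDS[use]
    return [
        record["id"]
        for record in records
        # A field counts as missing if it is absent, None or an empty string.
        if any(record.get(field) in (None, "") for field in required)
    ]


print(unfit_records(customers, "email_invoicing"))       # []
print(unfit_records(customers, "direct_mail_campaign"))  # [2, 3]
```

Nothing in the data changed between the two calls; only the question asked of it did, which is exactly how long-standing gaps become visible when data is put to a new use.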