Standardizing the Definition of Data Quality
If you ask ten people what data quality means, you’ll get ten different answers.
What’s more, you’ll find that most people don’t know where their data comes from or how it’s stitched together. To be fair, it’s all very complex, and that’s just one reason it needs to be standardized. After all, data quality is vitally important to most brand marketers and other business decision makers. They deserve to have standards around it.
There are certain data segments that have been measured and standardized in the past, thanks to companies like Nielsen. Age, gender, household income, and other demographic data have long been considered reliable and can actually be measured for accuracy via a panel. But that’s pretty much the only quality measurement we have today, and we’ve had it for decades.
Now that companies are using data to drive marketing strategies, product development, and other key business decisions, stakeholders need to know more. They need to know whether data represents an intent signal or an interest signal. They have a right to know the true origins of the data they’re using: whether it’s been pulled from the bidstream or it’s truly opt-in data from a reliable publisher. They deserve to know that the data has been collected in a privacy-safe manner and that permission has been ethically obtained. Furthermore, business users should have transparency into which data is modelled and which is declared. They should have visibility into what’s inside each segment.
The questions around modelled data are particularly important because it’s not always labeled as such. Certainly, there is a place and a purpose for modelled and predictive data, particularly to build out broader audience segments. However, it needs to be clear to customers that they’re buying modelled data. Clients have a right to know what type of data they’re investing in and what the methodology is behind it. After all, they are building additional strategies, planning products, and pulling insights using this data as a building block. They deserve to know what’s under the hood. Modelled data may or may not be reliable enough to bet the business on; either way, that choice should be clear and left to the buyer.
That, of course, opens up the can of worms we know as transparency, which has been problematic in data-driven advertising since day one.
How can we arrive at new standards?
The responsibility for standardization falls to buyers, sellers, and the organizations governing our industry. Just as we standardize advertising units, we should have rock-solid criteria for data quality. We need impartial groups like the IAB to oversee this standardization, so we’re not grading our own homework. Factors to consider in grading data quality might include these accepted quality dimensions, along with questions about origin, permission, and storage:
- Accuracy: Is the data correct?
- Completeness: Is the data comprehensive? Does it include what the customer will expect within the context of demographics, psychographics, etc.?
- Consistency: Is the data the same in every context and in every database?
- Timeliness: Is the data fresh and still useful?
- Validity: Is it structured or formatted in the way it needs to be for the user’s purpose?
- Uniqueness: Have the records been checked for duplicates?
- What is the origin of this data?
- Was the data collected in a privacy-compliant manner?
- How was permission obtained, and is there proof of this permission?
- Is this declared or modelled data?
- Where and how is the data stored?
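As a rough illustration, a few of the dimensions above (completeness, uniqueness, timeliness, validity) can be expressed as simple ratios over a segment's records. The sketch below is only that: the field names, the 90-day freshness window, and the sample records are all hypothetical assumptions, not an industry standard or any vendor's method.

```python
from datetime import date

# Hypothetical schema for a segment record; these field names are assumptions.
REQUIRED_FIELDS = ("user_id", "age_band", "source", "collected_on", "data_type")
ALLOWED_DATA_TYPES = {"declared", "modelled"}


def quality_report(records, today, max_age_days=90):
    """Return ratios between 0.0 and 1.0 for four quality dimensions."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "uniqueness": 0.0,
                "timeliness": 0.0, "validity": 0.0}

    # Completeness: every required field is present and non-empty.
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS)
        for r in records
    )
    # Uniqueness: distinct user IDs relative to the number of records.
    unique_ids = len({r["user_id"] for r in records if r.get("user_id")})
    # Timeliness: collected within the assumed freshness window.
    fresh = sum(
        r.get("collected_on") is not None
        and (today - r["collected_on"]).days <= max_age_days
        for r in records
    )
    # Validity: the declared/modelled label uses an allowed value.
    valid = sum(r.get("data_type") in ALLOWED_DATA_TYPES for r in records)

    return {
        "completeness": complete / total,
        "uniqueness": unique_ids / total,
        "timeliness": fresh / total,
        "validity": valid / total,
    }


# Made-up sample data, for illustration only.
sample_records = [
    {"user_id": "a1", "age_band": "25-34", "source": "publisher-x",
     "collected_on": date(2024, 5, 1), "data_type": "declared"},
    {"user_id": "a1", "age_band": "25-34", "source": "publisher-x",
     "collected_on": date(2024, 5, 10), "data_type": "modelled"},
    {"user_id": "b2", "age_band": "", "source": "bidstream",
     "collected_on": date(2023, 1, 15), "data_type": "inferred"},
    {"user_id": "c3", "age_band": "35-44", "source": "publisher-y",
     "collected_on": date(2024, 1, 5), "data_type": "declared"},
]

report = quality_report(sample_records, today=date(2024, 6, 1))
print(report)
```

Accuracy, privacy compliance, and proof of permission are not covered here, since they cannot be checked from the records alone; they depend on an outside reference such as a panel or on documentation from the seller.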
While sellers need to conform to standards, buyers also need to understand what they’re purchasing. The standards we set should be packaged and shared with buyers in a way that they can easily adopt and apply.
Even if we’re not able to adopt formal standards in a timely fashion, the questions above are important ones that buyers should be asking and sellers should be able to answer truthfully. Sellers: if you can’t answer these questions honestly, or you know your answers won’t instill confidence in buyers, the onus is on you to do better.
As business becomes more reliant on third-party data, and as our needs become more sophisticated, guardrails such as industry-adopted standards must be set and maintained. With ten different answers to every question about data quality today, the time to set these standards is now — before things get even more confused.
Kristina Prokop is CEO of Eyeota.