The Real Problem With Duplicate Data | Street Fight

The Real Problem With Duplicate Data

Our industry is obsessed with duplicate data when, in fact, the real problem businesses should worry about is publishing inconsistent data.

It seems that every time I read a blog post, talk with an industry expert, or attend a conference, I run into collective hand-wringing about businesses publishing duplicate listings. You know the story: duplicate listings confuse search engines like Google. When Google gets confused, it penalizes your duplicate listings in search results. You lose, and your customers lose. Therefore, you need to devote time and budget to eliminating duplication, and all will be right with the world.

But duplicate information (information that is exactly like something else) isn’t the problem. The problem is publishing inconsistent data: two different listings that reference the same location.

Location Data Authority
Businesses need to claim their identity with one authoritative, well-defined piece of location data that is composed (for starters) of the business’s name, address, and phone number (NAP), plus a business category, hours of operation, and a location-specific URL.
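To make that list of fields concrete, here is a minimal sketch of what one authoritative record might look like in code. The class name, field names, and sample values are all invented for illustration; this is one possible shape, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class LocationRecord:
    """One authoritative record for a single physical location (hypothetical shape)."""
    name: str
    address: str
    phone: str                 # primary phone number
    category: str
    hours: dict[str, str]      # e.g. {"mon": "09:00-21:00"}
    url: str                   # location-specific page, not the brand homepage
    alt_names: list[str] = field(default_factory=list)
    alt_phones: list[str] = field(default_factory=list)

    def nap(self) -> tuple[str, str, str]:
        """Return the core name/address/phone triple."""
        return (self.name, self.address, self.phone)


# Invented sample data, not a real business.
record = LocationRecord(
    name="Example Sporting Goods",
    address="100 Main St, Springfield, IL 62701",
    phone="217-555-0100",
    category="Sporting Goods",
    hours={"mon": "09:00-21:00"},
    url="https://example.com/locations/springfield",
)
```

The alternate-name and alternate-phone lists start empty, so variants can be attached to the record later instead of spawning a second listing.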

Publishing duplicate listings, or duplicate location data, means your correct critical location data appears in more than one instance in search results, which, in my opinion, is not a problem. In fact, having your location data appear multiple times is good because your business has a larger share of voice in search results.

The problem with duplicate data is complicated by a widespread misunderstanding of what, exactly, constitutes duplicate data. Here is an example of a listing that some would consider a duplicate:

[Image: YP.com search results, including two Sports Authority listings numbered 10 and 11]

  • Notice how listings 10 and 11 share the same name (“Sports Authority”) and both have addresses on Harlem Avenue in Norridge, IL. They show two different phone numbers (708-452-7323 and 708-453-0190) and slightly different addresses (“4250 N Harlem Ave. Norridge, IL 60706” and “4350 N Harlem Ave. Norridge, IL 60706”) for the same location.
  • These are not duplicate listings. These are inconsistent listings resulting from a company sharing inconsistent location data throughout the search ecosystem. If both listings were exactly the same, then a duplication would create an advantage for the Sports Authority in Norridge, as they would hold two spots in the search results.
  • It is clear that the Sports Authority has a data structuring issue, not a duplication issue. Location data that is built on a sound foundation — based on the right tools and structure — allows for alternate phone numbers, with one flagged as primary in the ecosystem.
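The distinction between a duplicate and an inconsistent listing can be sketched as a field-by-field comparison. This is a rough illustration, assuming the caller already knows both listings refer to the same location. The function names and the normalization rule are my own, and the pairing of each phone number with each address is assumed, since the text above does not state it.

```python
import re

NAP_FIELDS = ("name", "address", "phone")


def normalize(value: str) -> str:
    """Lowercase and drop punctuation and extra spaces so formatting noise is ignored."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())


def differing_fields(a: dict, b: dict) -> list[str]:
    """Return the NAP fields whose normalized values differ between two listings."""
    return [f for f in NAP_FIELDS if normalize(a[f]) != normalize(b[f])]


def classify(a: dict, b: dict) -> str:
    """Label two listings for one location as 'duplicate' or 'inconsistent'."""
    return "inconsistent" if differing_fields(a, b) else "duplicate"


# Values taken from the example above; the phone/address pairing is assumed.
listing_10 = {
    "name": "Sports Authority",
    "address": "4250 N Harlem Ave. Norridge, IL 60706",
    "phone": "708-452-7323",
}
listing_11 = {
    "name": "Sports Authority",
    "address": "4350 N Harlem Ave. Norridge, IL 60706",
    "phone": "708-453-0190",
}

print(classify(listing_10, listing_11))    # the two listings disagree
print(differing_fields(listing_10, listing_11))
```

Under this definition, only listings that agree on every field count as duplicates; anything else is an inconsistency to be resolved.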

When a business lacks a strong location data management strategy, this type of issue can occur frequently with business names as well as phone numbers; the variants are sometimes referred to as alternate names or aliases. The phenomenon exists with addresses as well. For example, near my home in Elmhurst, Illinois, is a major road known alternatively as Highway 83 (by out-of-towners) and the Kingery Highway (by locals). If your data is improperly structured, then an inconsistent listing could be “born” with a different address for the same location. Google is smart enough in most cases to deliver a correct answer for someone using either term for the address. But ultimately it’s incumbent on you, the business, to build your location data in a way that supports the various aliases and alternates needed to accurately describe how people look for and find your business.
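One simple way to support street aliases is a lookup table that maps each known variant to a single canonical name. The sketch below uses the Highway 83 example; the table, the function names, and the choice of “Kingery Highway” as the canonical form are assumptions made for illustration.

```python
# Maps a known alias to the canonical street name (both lowercased).
# Which name is treated as canonical is an arbitrary choice here.
STREET_ALIASES = {
    "highway 83": "kingery highway",
}


def canonical_street(street: str) -> str:
    """Return the canonical form of a street name, resolving known aliases."""
    key = " ".join(street.lower().split())
    return STREET_ALIASES.get(key, key)


def same_street(a: str, b: str) -> bool:
    """True when two street names resolve to the same canonical street."""
    return canonical_street(a) == canonical_street(b)


print(same_street("Highway 83", "Kingery Highway"))
print(same_street("Kingery Highway", "Harlem Avenue"))
```

A real address matcher would need far more than this (abbreviations, directionals, unit numbers), but the idea is the same: aliases resolve to one identity instead of producing a second listing.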

Building your data asset and distributing it to the major data amplifiers (the publishers and data aggregators that share your location data where people conduct “near me” searches) is further complicated by the fact that each amplifier has its own nuanced way of storing and communicating your data to the ecosystem. So it is also incumbent on you as a business owner or marketer to format your location data to suit the needs of each amplifier.
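As a rough sketch of what per-amplifier formatting can look like, the example below renders one master record into two output shapes. Both “amplifier A” and “amplifier B” are hypothetical, as are their phone formats and the sample record; no real publisher's requirements are implied.

```python
import re


def digits_only(phone: str) -> str:
    """Strip everything except digits from a phone number."""
    return re.sub(r"\D", "", phone)


def for_amplifier_a(record: dict) -> dict:
    """Hypothetical amplifier A: wants a bare 10-digit phone number."""
    return {
        "name": record["name"],
        "address": record["address"],
        "phone": digits_only(record["phone"]),
    }


def for_amplifier_b(record: dict) -> dict:
    """Hypothetical amplifier B: wants the phone as (NNN) NNN-NNNN."""
    d = digits_only(record["phone"])
    return {
        "name": record["name"],
        "address": record["address"],
        "phone": f"({d[:3]}) {d[3:6]}-{d[6:]}",
    }


# Invented master record, not a real business.
master = {
    "name": "Example Sporting Goods",
    "address": "100 Main St, Springfield, IL 62701",
    "phone": "217-555-0100",
}

feed_a = for_amplifier_a(master)
feed_b = for_amplifier_b(master)
```

The point of the design is that every feed is derived from the same master record, so the outputs differ in format but never in substance.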

It’s About Conflation, Not Suppression
Structuring data so that multiple business identities are conflated into one piece of authoritative location data is the most effective way to address the problem of publishing inconsistent data. When a business manages and distributes a well-structured data asset via the data amplifiers, the important data publishers are able to merge, or conflate, any inconsistent data objects that your business might have built over the years. Through conflation, the inconsistent data is merged into one rich history. That history contains assets like check-ins, social shares, bookmarks, reviews, user-contributed photos or videos, click history, related stores, and so on.

Consequently, your customers will find you through one authoritative piece of location data that offers a semantic understanding of your business to search-and-discovery platforms. Those platforms rely on this understanding to answer search queries by showing your business in search results.

But if you just suppress a listing because you are concerned about publishing a duplicate listing, your history is lost, and your listing becomes less useful to everyone involved. So if someone tells you to get a duplicate listing removed from the ecosystem, they are hurting the value of your location data object to the ecosystem. What you want is to build a location data object that will be authoritative and primary, allowing the ecosystem to conflate learnings about your NAP data (as it has been expressed through data distribution over the years) into one detailed semantic understanding of who you are and how your business is useful to people making queries.
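The difference between conflation and suppression can be sketched in a few lines. This is a simplified illustration under my own assumptions: the record shape, the `conflate` and `suppress` functions, and the sample data (including the history entries) are all invented, and real publishers merge listings with their own logic.

```python
def conflate(primary: dict, others: list[dict]) -> dict:
    """Merge listings for one location into a single authoritative record.

    The primary listing supplies the authoritative values. Conflicting phone
    numbers from other listings are kept as alternates, and history items
    (reviews, check-ins, and so on) are combined rather than discarded.
    """
    merged = {
        "name": primary["name"],
        "address": primary["address"],
        "phone": primary["phone"],
        "alt_phones": [],
        "history": list(primary.get("history", [])),
    }
    for listing in others:
        phone = listing["phone"]
        if phone != merged["phone"] and phone not in merged["alt_phones"]:
            merged["alt_phones"].append(phone)
        merged["history"].extend(listing.get("history", []))
    return merged


def suppress(primary: dict, others: list[dict]) -> dict:
    """Keep only the primary listing; everything on the others is thrown away."""
    return {**primary, "history": list(primary.get("history", []))}


# Invented sample listings for one hypothetical location.
primary = {
    "name": "Example Sporting Goods",
    "address": "100 Main St, Springfield, IL 62701",
    "phone": "217-555-0100",
    "history": ["review-1", "check-in-1"],
}
stray = {
    "name": "Example Sporting Goods",
    "address": "100 Main St, Springfield, IL 62701",
    "phone": "217-555-0199",
    "history": ["review-2"],
}

merged = conflate(primary, [stray])
kept = suppress(primary, [stray])
```

After conflation the merged record carries all three history items and the second phone number as an alternate; after suppression only the primary listing's two history items remain.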

What You Should Do
My advice to businesses with local storefronts is threefold:

  • Pay attention to building your local identity, which is built on location data.
  • Share your identity far and wide, in all the places where people are conducting “near me” searches, and work with data amplifiers to do that.
  • Focus efforts on addressing inconsistent listings by structuring data so that multiple business identities are conflated into one piece of authoritative location data. Don’t let someone talk you into suppressing data as an antidote to inconsistent listings.

Remember, the name of the game is being findable and converting searches into sales by sharing compelling content that drives the next moment. Don’t let concerns about duplicate data distract you from the end game.

Gib Olander is vice president of product at Chicago-based SIM Partners.

This post was edited from its original version for clarity.

3 thoughts on “The Real Problem With Duplicate Data”

  1. Your YP.com example is incorrect. The Sports Authority locations do not share the same address: #10 is 4250 N Harlem Ave vs #11 is 4350 N Harlem Ave.

  2. I understand you are trying to differentiate “duplicate” from “inconsistent”… but in all honesty, splitting them apart makes it more confusing. More often than not, if there is a second listing for a business, it’s going to have inconsistent information – I mean, why else would a duplicate be created if all the information was exactly the same?

    And are you saying the YP example shouldn’t be fixed? That’s what I couldn’t understand… The #2 and #3 negative ranking factors are “listing found at false business address” and “Inconsistent NAP” – this Sports Authority would fall victim to both of these factors. So I’d argue the duplicate needs to be removed and/or “merged” into the correct listing so this inconsistent data gets removed from the equation… but that would “lose the history” of 4350, which is a fake address. So that would be wrong?

    I don’t know, this was all a little confusing

  3. Tom, thank you for commenting. I have amended the article to note that Sports Authority has a problem with not only an inconsistent phone number but also an inconsistent address listed for the same location. The larger problem remains: in the example I cite, the business has allowed inconsistent data to be published, not duplicate data. The issue I cited with Sports Authority resulted because data amplifier Neustar Localeze has two different listings within its index, which they created from organic signals about Sports Authority that Neustar Localeze utilizes in its build. Had Sports Authority distributed its location data properly with Neustar Localeze, then the two inconsistent listings on YP.com would be conflated (or merged) to reflect one, accurate representation of the location.
