How Much Should We Still Care About Duplicate Content?


Ever since Google Panda was unleashed, duplicate content has been a focus for SEOs. In this context, duplicate content refers to large blocks of content within a website that are repeated. The belief has been that duplicate content, although it is not cause for a penalty unless it appears deceptive, hurts the overall quality of the website. Therefore, for years, duplicate content has been an item to fix on website audit checklists. Fast forward to October 2017, when the following conversation shocked the SEO community:

(Source: Google Product Forums)

“A Rainy Day” Fix? What Does That Mean?
First, let’s revisit what Google has to say about duplicate content in its Webmaster Guidelines. Google describes duplicate content as content that might be replicated within your site or on other domains. The examples provided include discussion forums with both “regular and stripped-down pages targeted at mobile devices” (Webmaster Guidelines), items within an online store that are shown via multiple distinct URLs, and printer-only versions of web pages. When duplicate content exists, Google filters the search results so that only pages with distinct information show. While filtering might not be a penalty, it definitely feels like one when your pages aren’t showing up in search results, meaning you are not getting traffic. Google explains that when it encounters duplicate content, a “Google algorithm groups the duplicate URLs into one cluster and selects what the algorithm thinks is the best URL to represent the cluster in search results (for example, Google might select the URL with the most content)” (Learn the impact of duplicate URLs).

Now, here is where things recently became confusing. John Mueller explained that, in terms of priority, duplicate content is often something you save for a rainy day. This statement is what many people ran with, but take a closer look at the question and the response. The question referenced a tool that provided data on duplicate content. That was the area Mueller said not to get caught up in. He even cautioned against “blindly focusing on numbers like that.” In addition, and this is the most important part in my opinion, Mueller recommended focusing on where the underlying issues behind the duplicate content might lie. That is the key. Duplicate content in and of itself might not be a high-priority item, but it is often an indication that something bigger is wrong with the website. In other words, instead of fixating on the symptom (i.e., duplicate content), focus on the diagnosis, which could be any of a myriad of things.

Duplicate Content Culprits
Below are three common causes of duplicate content and how to fix each one.

HTTP vs. HTTPS, WWW vs. non-WWW: One of the most common duplicate content culprits I see has to do with the domain setup. Many sites have migrated to HTTPS but failed to set up 301 redirects from the non-secured URLs to the secured URLs. Failing to do so creates a heck of a lot of duplicate content. Make sure these redirects are in place. Note the use of the 301 redirect, as it is a permanent redirect. It is preferred over 302 redirects, which are temporary and can be problematic for SEO purposes. Here is an extra tip: be sure to update old redirects that are already in place so that you avoid creating a redirect chain.

The same concept applies to WWW versus non-WWW. The site should not be indexed under both versions. One version should be selected as the preferred domain and the other should redirect to it. A quick script can confirm that every variant ends up at the preferred URL; see the sketch below.
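To make that check less tedious, here is a minimal sketch in Python (using the requests library) that fetches the common protocol and host variants of a homepage and confirms each one 301-redirects, without a chain, to the preferred URL. The domain, preferred URL and variant list are placeholders for illustration, not values from any real site.

```python
# A minimal diagnostic sketch (not an official tool): it checks that the
# common protocol/host variants of a site all 301-redirect to one preferred
# URL, and flags 302s and redirect chains. "example.com" and the preferred
# URL below are placeholders; swap in your own domain before running.
import requests

PREFERRED = "https://www.example.com/"
VARIANTS = [
    "http://example.com/",
    "http://www.example.com/",
    "https://example.com/",
]

for url in VARIANTS:
    # allow_redirects=True lets requests follow the whole chain, and each
    # hop is recorded in response.history for inspection.
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = response.history  # one entry per redirect hop

    if not hops:
        print(f"{url}: no redirect at all -> potential duplicate of {PREFERRED}")
        continue

    statuses = [hop.status_code for hop in hops]
    if any(code != 301 for code in statuses):
        print(f"{url}: uses non-301 redirects {statuses} (302s are temporary)")
    if len(hops) > 1:
        print(f"{url}: redirect chain of {len(hops)} hops, consider collapsing to one")
    if response.url != PREFERRED:
        print(f"{url}: ends at {response.url}, expected {PREFERRED}")
```

Any variant that returns a 302, takes more than one hop or ends somewhere other than the preferred URL is worth fixing.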

Page Found Under Multiple URLs: This scenario is common in the ecommerce environment. For example, more than one URL might take you to the same product page. One URL should be selected as the primary location, and any other URLs should 301 redirect to it. The canonical link element is also a good way to let search engines know which URL should be indexed, but it is typically only honored if the page content is identical. This means the body content, as well as the other page elements, such as the page title and heading tags, are exactly the same. (A sketch for spot-checking canonical tags follows after the next paragraph.)

If the duplicate content is caused by session IDs or other URL parameters, consider using Google’s URL Parameters tool within Search Console. Using this tool, you can tell Google how to handle those parameters.
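As a rough illustration of both points, the sketch below (Python again, with placeholder URLs and made-up parameter names) fetches a duplicate product URL, reads its canonical link element, and shows how stripping session-style parameters collapses URL variants onto a single address. It is only a spot check, not a substitute for the Search Console tool.

```python
# A rough sketch with placeholder URLs and invented parameter names: fetch a
# duplicate URL, read its <link rel="canonical"> tag, and normalize away
# parameters that do not change the page content.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
from html.parser import HTMLParser
import requests

PRIMARY = "https://www.example.com/product/blue-widget"
DUPLICATE = "https://www.example.com/product/blue-widget?sessionid=abc123&sort=price"
IGNORABLE_PARAMS = {"sessionid", "sort"}  # parameters that only create URL variants


class CanonicalParser(HTMLParser):
    """Collects the href of any <link rel="canonical"> tag in the page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def strip_params(url, ignorable):
    """Remove parameters that only create duplicate URLs (e.g. session IDs)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignorable]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


html = requests.get(DUPLICATE, timeout=10).text
parser = CanonicalParser()
parser.feed(html)

print("Canonical tag points to:", parser.canonical)
print("Matches primary URL:", parser.canonical == PRIMARY)
print("Parameter-stripped URL:", strip_params(DUPLICATE, IGNORABLE_PARAMS))
```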

Boilerplate Content: Many businesses, such as law firms and insurance companies, are required to include some type of legal disclaimer on their websites. If this boilerplate text appears on every page of the website and it is substantial, it can cause duplicate content issues. It will also dilute the topic of each page and how well it is optimized. Google suggests including a short summary on each page and linking to a separate page that contains the full details.
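If you are unsure whether the boilerplate is substantial enough to matter, a quick comparison of the visible text on two pages can give a rough sense of the overlap. The sketch below uses Python’s difflib for that; the URLs are placeholders and the score is only a coarse signal, not how Google measures duplication.

```python
# A quick-and-dirty sketch for gauging whether boilerplate is "substantial":
# it compares the visible word sequences of two pages and prints a
# similarity score. The URLs are placeholders.
import difflib
import re
import requests

PAGE_A = "https://www.example.com/services/"
PAGE_B = "https://www.example.com/about/"


def visible_words(url):
    """Crude text extraction: drop scripts/styles, strip tags, split into words."""
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return text.lower().split()


words_a = visible_words(PAGE_A)
words_b = visible_words(PAGE_B)

# SequenceMatcher.ratio() returns a similarity score between 0 and 1 over the
# two word sequences; a high score hints that large shared blocks (such as a
# legal disclaimer repeated on every page) dominate both pages.
ratio = difflib.SequenceMatcher(None, words_a, words_b).ratio()
print(f"Shared-content ratio between the two pages: {ratio:.0%}")
```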

What Now?
Circling back to John Mueller’s response to the duplicate content question: yes, when it comes to priorities, fix what is most critical first, such as indexing and crawling roadblocks, but that doesn’t mean duplicate content is not something to address. Remember, it could be a symptom of a bigger problem on your website.

Mindy Weinstein is the founder and president of Market MindShift, as well as a national speaker, trainer and digital marketing strategist. She teaches part-time at Grand Canyon University and has been a search geek since 2007.
