Can a news site with posts from press releases avoid being marked as duplicate?

Can a news site with posts from press releases avoid being marked as duplicate? - Google Search Console is a free application that allows you to identify, troubleshoot, and resolve any issues that Google may encounter as it crawls and attempts to index your website in search results. If you’re not the most technical person in the world, some of the errors you’re likely to encounter there may leave you scratching your head. We wanted to make it a bit easier, so we put together this handy set of tips about seo, google-search, duplicate-content, and search-results to guide you along the way. In the discussion below, we share some tips for fixing this issue.

Problem:


I am building a news site about cultural events.



Most of the posts that are published are about events (theatre, music, arts etc) and are taken from press releases that are distributed by email by news agencies, galleries etc.



Because many sites receive and republish the same press releases, Google may mark some of them as containing duplicate content.



Since my site is the newest of the lot, it is getting penalised (hidden from some search results) because it is considered a source of duplicated content.



Is there any way (apart from the obvious one of not using the press releases as they are and having editors rewrite them) to avoid being treated as a source of duplicated content?


Solution:

Are you adding any of your own content to the pages that feature these syndicated press releases? If you add some of your own content around a press release, almost as if you were quoting it, some people argue this can sometimes get around the issue. But you need to add noticeably more of your own content to the page than the amount you reproduce from the original source, and you should consider adding a link to credit it.
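As a minimal sketch of what that might look like (the URLs and headline here are hypothetical), the press release is clearly quoted and credited, and your own editorial content surrounds it:

<article>
  <h1>Preview: New exhibition opens at the City Gallery</h1>
  <!-- Your own editorial content, which should outweigh the quoted material -->
  <p>Our take on why this exhibition matters, who the artists are, and what to expect...</p>

  <!-- The syndicated press release, marked up as a quotation -->
  <blockquote cite="http://www.example.com/original-source-of-press-release">
    <p>The City Gallery is proud to announce...</p>
  </blockquote>
  <p>Source: <a href="http://www.example.com/original-source-of-press-release">original press release</a></p>

  <!-- More of your own content: practical details, commentary, related events -->
  <p>The exhibition runs until the end of the month; here is what else is on nearby...</p>
</article>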



However, this is still not ideal, and there is no guarantee that your site won't be impacted because of duplicate content. I would try to avoid using syndicated content as-is or, failing that, prevent the pages from being counted as duplicate content.



You can do this in two ways:



The first option is to use a cross-domain canonical tag on the pages in question. This means defining a canonical tag in the page's source code that references the original source of the press release:



<link rel="canonical" href="http://www.example.com/original-source-of-press-release" />


This indicates to Google that you are aware the page contains duplicate content and that it should not be counted during indexing. However, it also means that, over time, the page on your site will be removed from the search results.
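For placement, the tag belongs in the document head of your copy of the press release, pointing at the original publisher's URL. A minimal sketch, with hypothetical URLs:

<!-- On http://www.your-news-site.example/press-release-copy -->
<head>
  <title>Press release: New exhibition at the City Gallery</title>
  <!-- Tells Google the original publisher's page is the canonical version -->
  <link rel="canonical" href="http://www.example.com/original-source-of-press-release" />
</head>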



The second option is to noindex the page so that Google no longer indexes it and won't count it as duplicate content. The best way to do this is to add the following robots meta tag:



<meta name="robots" content="noindex, follow">


This means that Google can still freely crawl the page and follow its links, but will no longer index it, so it will be removed from the search results.
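If editing the page templates is awkward, the same directive can also be sent as an HTTP response header (X-Robots-Tag) instead of a meta tag. As a sketch, assuming an Apache server with mod_headers enabled and a hypothetical filename pattern for the press-release pages:

<FilesMatch "press-release-.*\.html$">
  # Equivalent to the robots meta tag above, sent as an HTTP header
  Header set X-Robots-Tag "noindex, follow"
</FilesMatch>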


Once this issue with duplicate content is resolved, there’s a good chance that your content will get indexed and you’ll start to show up in Google search results. That means a greater chance of driving organic search traffic to your site.
