How long does it take for Google to remove a page from its index? Or, why aren't these pages excluded?

Problem:


A set of pages is disallowed in robots.txt and also marked with an X-Robots-Tag: noindex, nofollow header. When checking with Google Webmaster Tools, the pages are reported as "Denied by robots.txt", which is nice. Also, as mentioned in this answer, disallowed pages may still be indexed even if not technically crawled, because that's how Google rolls.
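As a side note, before waiting on Google it is worth confirming that the X-Robots-Tag header is really being returned for these pages. A minimal sketch using only Python's standard library (the URL is the test page mentioned later in the question; everything else is illustrative and not part of the original post):

```python
# Minimal sketch: verify that the X-Robots-Tag header is actually being served.
import urllib.request

url = "http://www.english-attack.com/profile/scott-s-sober"  # test page from the question

req = urllib.request.Request(url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    print("Status:      ", resp.status)
    # This is the header Googlebot would read once it is allowed to fetch the page.
    print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag"))
```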



However, after adding the X-Robots-Tag header two weeks ago, the pages still appear in Google search results.



For example, this test page, http://www.english-attack.com/profile/scott-s-sober, is found when searching for its h1 title "Scott S. Sober": https://www.google.com/search?q=%22Scott+S.+Sober%22



Why is this?


Solution:

Listing a page in robots.txt will not prevent Google from indexing it. It only prevents Googlebot from re-crawling the page. If the page has previously been crawled, Google may find the version of the content it already knows about compelling enough to keep in its index for months.
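For reference, the kind of robots.txt rule being discussed looks roughly like the sketch below (the /profile/ path is only an assumption based on the example URL above); it tells crawlers not to fetch matching URLs, but it says nothing about whether those URLs may stay in the index:

```
# Illustrative robots.txt sketch -- the /profile/ path is assumed, not taken from the original post
User-agent: *
Disallow: /profile/
```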



If enough external links point to a page, Google occasionally keeps pages blocked by robots.txt indexed indefinitely. Sometimes it even indexes pages it has never crawled. In those cases it uses only the anchor text of inbound links as the keywords for the page, and it does not have a cached version of the page.



If you want Google to remove the page from the index, you should allow Googlebot to crawl the page and see the "noindex" in the X-Robots-Tag header (or in a robots meta tag). If you don't allow Googlebot to crawl the page, it will never learn that you don't want it indexed. So take out the Disallow: line for those pages from robots.txt, as in the sketch below.
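Putting that together, a sketch of the intended end state might look like the following: the pages are no longer disallowed in robots.txt, and each one carries a noindex signal that Googlebot can now actually see, either as an HTTP response header or as the equivalent meta tag (paths and details are illustrative assumptions, not taken from the original post):

```
# robots.txt -- no Disallow rule covering the profile pages any more
User-agent: *
Disallow:

# On each page to be de-indexed, serve this HTTP response header ...
X-Robots-Tag: noindex, nofollow

# ... or include the equivalent meta tag in the page's <head>:
<meta name="robots" content="noindex, nofollow">
```

If you still want to block crawling of those URLs long-term, the Disallow rule can be reinstated later, once the pages have dropped out of the index.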



Alternatively, you could use Google Webmaster Tools to request removal of each URL. That can be painful if you have more than a handful of URLs that you want de-indexed.



The cause of the problem is that Google is not seeing the newly added X-Robots-Tag header, because the robots.txt block means it never re-crawls the page.



Removing the ban from robots.txt and letting Google fetch the pages (and their headers) does remove the pages from the results.


