Stop Google from crawling proxied pages but still allow the proxy itself to be found via search engines

Problem:


I have a proxy at http://rahul2001.com/proxy/



A site can be proxied like this:
http://rahul2001.com/proxy/proxy.php/http://site-to-be-proxied.example.com/
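For context, a path-based proxy of this kind typically treats everything after `proxy.php/` in the request URI as the URL to fetch. A minimal sketch of that idea (an illustration only — `extractTarget` is a hypothetical helper, not miniProxy's actual code):

```php
<?php
// Hypothetical sketch of how a path-based proxy derives its target:
// everything after "proxy.php/" in the request URI is the URL to fetch.
function extractTarget(string $requestUri): ?string {
    $marker = 'proxy.php/';
    $pos = strpos($requestUri, $marker);
    if ($pos === false) {
        return null; // no target supplied
    }
    return substr($requestUri, $pos + strlen($marker));
}

echo extractTarget('/proxy/proxy.php/http://site-to-be-proxied.example.com/');
// -> http://site-to-be-proxied.example.com/
```

This is also why a crawler can drive the proxy anywhere: whatever URL it appends to the path becomes a fetch through your server.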



The problem is that Google seems to be crawling pages through my proxy :/



uh oh..



I do NOT link to proxied versions of Yahoo Food, Google Careers, or Google Search Hindi Help, but these still turn up in the search results...



PROBLEMS:



-I DO NOT want to block my website/proxy from search engines entirely; I still want the proxy service itself to be findable in search.



-I DO NOT want Google to use up my bandwidth by crawling arbitrary third-party sites through the proxy.



-I DO NOT want to use a captcha, since a few of my apps use this proxy.



-I DO NOT want Google to spoil the search results of my website in this manner.



What do I do??



ALSO, why is Google entering random URLs into the form??



EDIT
After adding the meta tag, I get an error :(



proxy.php (first few lines):



<head><meta name="robots" content="noindex, nofollow" />
</head>
<?php
/*
miniProxy - A simple PHP web proxy. <https://github.com/joshdick/miniProxy>
Written and maintained by Joshua Dick <http://joshdick.net>.
miniProxy is licensed under the GNU GPL v3 <http://www.gnu.org/licenses/gpl.html>.
*/


Error:



Warning: Cannot modify header information - headers already sent by (output started at /home/rahulcom/public_html/proxy/proxy.php:3) in /home/rahulcom/public_html/proxy/proxy.php
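The likely cause of this warning: PHP sends the HTTP headers as soon as the script emits its first byte of output, so the `<head>...</head>` HTML printed before `<?php` means any later `header()` call inside proxy.php fails. A hedged alternative fix (a sketch of the intent, not an official miniProxy patch): send the robots directive as an HTTP response header at the very top of proxy.php, before any output at all.

```php
<?php
// Sketch of a fix: emit the robots directive as an HTTP header instead of
// printing a <meta> tag. header() must run before ANY output, so this line
// replaces the <head>...</head> block that was added above the PHP code.
header('X-Robots-Tag: noindex, nofollow');

/*
miniProxy - A simple PHP web proxy. <https://github.com/joshdick/miniProxy>
...rest of proxy.php unchanged...
*/
```

One caveat: if the proxy forwards an upstream `X-Robots-Tag` header for the proxied response, it could override this one, so it is worth verifying the final headers with something like `curl -I`.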


HELL BREAKS LOOSE - GOOGLE IS CRAWLING THE INTERNET USING MY PROXY!!



https://www.google.co.in/search?q=site:rahul2001.com+proxy


Solution:

Add this to robots.txt (just place the file in your site's root folder, usually public_html or wherever your home page sits):



User-agent: *



Disallow: /proxy/*



OK, so you do not want to block the proxy from search engines, but you don't want the proxied results to show up on search engines? Sorry, I don't get it :) Is it that you want the original sites to rank above the proxied copies? Google decides which pages are more important, I'm afraid.


Also be careful with duplicate content. The same information should only live at one URL. For duplicates you should use canonical links (http://speckyboy.com/2012/07/16/what-a-canonical-link-is-and-how-to-use-it-properly/) to tell Google which page is the original one. That is the one most likely to show up on Google.
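A canonical link is a single tag in the page's head; for example (a sketch — the href is a placeholder for whatever original URL the duplicate mirrors):

```html
<!-- On the duplicate page, point search engines at the original URL -->
<link rel="canonical" href="http://site-to-be-proxied.example.com/" />
```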


There is no need for two URLs with the same information to show up on Google.


EDIT


From the comments I was led to believe that you want /proxy/ itself to be indexable, but nothing under it, such as /proxy/subpage. In that case, use robots.txt like this:



User-agent: *


Disallow: /proxy/


Allow: /proxy/$
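Why this works: under Google's longest-match rule, the most specific (longest) matching pattern wins, and `/proxy/$` only matches the landing page exactly, while `/proxy/` matches everything under it. A minimal sketch of that rule (an illustration, not Google's actual implementation — `robotsDecision` and `patternToRegex` are hypothetical helpers):

```php
<?php
// Convert a robots.txt path pattern ('*' wildcard, '$' end anchor) to a regex.
function patternToRegex(string $pattern): string {
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return '#^' . $regex . ($anchored ? '$' : '') . '#';
}

// Decide allow/disallow for $path given [pattern => 'allow'|'disallow'] rules:
// the longest matching pattern wins; on a tie, Allow wins.
function robotsDecision(array $rules, string $path): string {
    $bestLen = -1;
    $decision = 'allow'; // no matching rule means crawling is allowed
    foreach ($rules as $pattern => $verdict) {
        if (preg_match(patternToRegex($pattern), $path)) {
            $len = strlen($pattern);
            if ($len > $bestLen || ($len === $bestLen && $verdict === 'allow')) {
                $bestLen = $len;
                $decision = $verdict;
            }
        }
    }
    return $decision;
}

$rules = [
    '/proxy/'  => 'disallow',
    '/proxy/$' => 'allow',
];

// The longer '/proxy/$' rule wins for the landing page...
echo robotsDecision($rules, '/proxy/'), "\n"; // allow
// ...but only '/proxy/' matches proxied sub-URLs, so they stay blocked.
echo robotsDecision($rules, '/proxy/proxy.php/http://example.com/'), "\n"; // disallow
```

Note that robots.txt only stops crawling; URLs Google has already discovered may linger in the index for a while after the rule takes effect.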



