Is it possible to prevent third-party links to a web page from appearing in Google Search results?

Problem :


We have a situation in which a sensitive website is blocked from being crawled using a robots.txt file. This works well; however, for a period of time the team used semantic URLs whose paths describe the content, along the lines of /sensitive-stuff-are-leaked-through-the-very-url.



Links to these pages were sent via email, and some recipients had antivirus software that automatically uploaded a link to the scanned web page to a public database (one of the many online website checkers that test whether a URL is safe).



Now the problem is that when certain search terms are used, even though the website itself does not appear in Google Search results, these antivirus scan result pages containing the scanned link show up. The pages have been deleted and we've ramped up our security / privacy practices since then. Nevertheless, these search results remain a HUGE problem. The semantic URLs leak project names and customer names, among other things.



It is a serious problem to have this show up when the name of our client's company is searched. 90% of these antivirus website owners have been cooperative, but a couple of others haven't.



Would a noindex meta tag help in this situation? I am not sure if this would prevent third party pages mentioning the link from appearing in search results, since these third-party pages would be legitimately indexed.


Solution :

You should follow best practices by telling Google, Bing and other search engines not to index these sensitive pages.



You can use one or more:




  1. Noindex <meta name="robots" content="noindex" /> in the <head> of these pages.

  2. Robots.txt Disallow: /sensitive-area/

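  3. Both methods together, as a fallback should the robots.txt get deleted or the noindex be removed by mistake. Note that a crawler which obeys the Disallow rule never fetches the page, so it cannot see the noindex tag while the robots.txt block is in place.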



These directives keep your website from appearing directly on the major search engines. Should other platforms ignore your noindex or Disallow, there is little you can do about it other than complain to the platform, and as you know, many don't respond.
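To confirm that the noindex tag and the Disallow rule are really in place, a small check can help. The following is only a sketch using Python's standard library; the robots.txt content, the page markup, the example.com URLs and the helper names `crawl_allowed` and `has_noindex` are all assumptions made for illustration, not part of any existing setup.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, mirroring the Disallow rule above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sensitive-area/
"""

# Hypothetical page markup carrying the noindex meta tag.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex" /></head>'
    "<body></body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noindex(html):
    """Return True if the markup contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/sensitive-area/report"))
    print(has_noindex(PAGE_HTML))
```

This only checks your own pages. It says nothing about third-party pages that mention the URL, which are outside the reach of both directives.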



If you have sensitive information or want to keep crawlers out completely, then you should be using authentication of some sort. Even a simple htpasswd will prevent them from accessing the page. Alternatively, you can look at honeypots and redirects, but even then you are never going to stop everything.
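With htpasswd the web server performs the credential check for you, so no code is needed. Purely to illustrate what that protection amounts to at the HTTP level, here is a rough Python sketch of validating a Basic `Authorization` header. The function names and the credentials are hypothetical, and a real deployment would compare against stored password hashes rather than plain text.

```python
import base64
import hmac


def parse_basic_auth(header):
    """Return (user, password) from an Authorization header, or None if malformed."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def is_authorized(header, expected_user, expected_password):
    """Return True only if the header carries the expected credentials."""
    creds = parse_basic_auth(header or "")
    if creds is None:
        return False
    user, password = creds
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok
```

A request without valid credentials would get a 401 response instead of the page, so a crawler or URL scanner that follows a leaked link sees no content.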



You're using a registered domain. That in itself isn't a private service, so the moment you register it, the world knows about it. The domain will be mentioned in hundreds of places across the globe, and yes, if people are using toolbars or antivirus software that checks for viruses, you can bet it will also check a public domain. In fact, if you block that process, their AV software may flag your site as UNTRUSTED or, worse, block it completely, forcing the user to click through 'advanced', 'allow' or some other steps.



This is why companies use their own internal domains rather than an external domain: it stops people from learning about the site, and from accessing it even if they know the domain name.

