Preventing robots from crawling a specific part of a page

Problem :


As the webmaster of a tiny site that has a forum, I regularly receive complaints from users that both the internal search engine and external searches (such as Google) are totally polluted by my users' signatures. They use long signatures, and that's part of the forum's experience because signatures make a lot of sense in my forum.



So basically I'm seeing two options as of now:




  1. Rendering the signature as a picture; when a user clicks the "signature picture", they are taken to a page that contains the real signature (with the links in the signature etc.), and that page is set as non-crawlable by search engine spiders. This would consume some bandwidth and need some work (because I'd need an HTML renderer producing the picture etc.), but obviously it would solve the issue. There are tiny gotchas in that the signature wouldn't respect the users' font/color scheme, but my users are very creative with their signatures anyway, using custom fonts/colors/sizes etc., so it's not that much of an issue.


  2. Marking every part of the webpage that contains a signature as being non-crawlable.




However, I'm not sure about the latter: is this something that can be done? Can you just mark specific parts of a webpage as being non-crawlable?


Solution :

Here is the same answer I provided to "noindex tag for google" on Stack Overflow:


You can prevent Google from seeing portions of the page by putting those portions in iframes that are blocked by robots.txt.


robots.txt


User-agent: *
Disallow: /nocrawl/

index.html


This text is crawlable, but the following is
text that search engines can't see:
<iframe src="/nocrawl/content.html" width="100%" height="300" scrolling="no"></iframe>

/nocrawl/content.html


Search engines cannot see this text.
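
To see why a well-behaved crawler never reaches the framed file, here is a minimal sketch of the prefix check it performs against robots.txt. The function name `isDisallowed` and the sample rules are illustrative assumptions, and the matching is deliberately simplified: it only looks at `User-agent: *` groups and plain path prefixes, not wildcards or `Allow` precedence as real crawlers do.

```javascript
// Simplified robots.txt check (assumption: only "User-agent: *" groups and
// plain prefix matching; real crawlers also handle wildcards and Allow rules).
function isDisallowed(robotsTxt, path) {
  let appliesToAll = false; // true while inside a "User-agent: *" group
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments and whitespace
    const sep = line.indexOf(":");
    if (sep === -1) continue; // not a "field: value" line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      appliesToAll = value === "*";
    } else if (field === "disallow" && appliesToAll && value !== "") {
      if (path.startsWith(value)) return true;
    }
  }
  return false;
}

const rules = "User-agent: *\nDisallow: /nocrawl/\n";
console.log(isDisallowed(rules, "/nocrawl/content.html")); // true
console.log(isDisallowed(rules, "/index.html"));           // false
```

Under these assumptions, the page itself stays crawlable while anything under /nocrawl/ is skipped, which is what keeps the iframe's content out of the index.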

Instead of using iframes, you could load the contents of the hidden file using AJAX. Here is an example that uses jQuery's AJAX helper to do so:


This text is crawlable, but the following is
text that search engines can't see:
<div id="hidden"></div>
<script>
$.get(
  "/nocrawl/content.html",
  // Insert the fetched markup into the placeholder div
  function (data) { $('#hidden').html(data); }
);
</script>
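
For sites that do not already load jQuery, the same idea can be sketched with the standard `fetch` API. The helper name `loadHiddenContent` and its injectable `fetchImpl` parameter are assumptions made so the sketch can be exercised outside a browser; the URL and the `hidden` element id come from the example above.

```javascript
// Fetch a fragment that robots.txt blocks and inject it into a container.
// `target` is any object with an innerHTML property (a DOM element in a browser).
// `fetchImpl` defaults to the global fetch; it is a parameter only for testing.
async function loadHiddenContent(url, target, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  target.innerHTML = await response.text();
  return target;
}

// Browser usage (assumes <div id="hidden"></div> exists on the page):
// loadHiddenContent("/nocrawl/content.html", document.getElementById("hidden"));
```

As with the jQuery version, this relies on crawlers honoring the robots.txt rule for /nocrawl/ when they render the page.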


Another solution is to wrap the signature in a span or div with its style set to display:none, then use JavaScript to remove that style so the text displays in browsers with JavaScript enabled. Search engines know the text is not going to be displayed, so they shouldn't index it.



This bit of HTML, CSS and JavaScript should do it:



HTML:



<span class="sig">signature goes here</span>


CSS:



.sig {
  display: none;
}



JavaScript:



<script type="text/javascript">
// Once the DOM is ready, reveal the signatures hidden by the CSS rule
$(document).ready(function () {
  $(".sig").show();
});
</script>


You'll need to include the jQuery library.
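
If loading jQuery only for this is not desirable, a dependency-free sketch of the same reveal step might look like the following. The function name `revealSignatures` and its parameters are hypothetical, and it assumes the signatures are inline `span` elements, so it restores `display: inline`.

```javascript
// Reveal every element matching `selector` under `root` by overriding the
// stylesheet's display:none with an inline style.
// Assumption: signatures are inline <span> elements, so "inline" is restored.
function revealSignatures(root, selector = ".sig", display = "inline") {
  const elements = Array.from(root.querySelectorAll(selector));
  for (const el of elements) {
    el.style.display = display;
  }
  return elements.length; // number of elements revealed
}

// Browser usage: run once the DOM has been parsed.
// document.addEventListener("DOMContentLoaded", () => revealSignatures(document));
```

Setting an inline style is needed here because clearing `style.display` alone would leave the stylesheet's `display: none` rule in effect.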



I had a similar problem. I solved it with CSS, but it can be done with JavaScript and jQuery too.



1 - I created a class called "disallowed-for-crawlers" and put that class on everything I did not want Googlebot to see, or wrapped the content in a span with that class.



2 - In the main CSS of the page I have something like:



.disallowed-for-crawlers {
  display: none;
}



3 - Create a CSS file called disallow.css and disallow it in robots.txt so that crawlers won't access that file, but reference it from your page after the main CSS.



4 - In disallow.css I placed the code:



.disallowed-for-crawlers {
  display: block !important;
}



You can do this with JavaScript or CSS. I just took advantage of the robots.txt disallow rule and the CSS classes. :) Hope it helps someone.


One way to do this is to use an image of text rather than plain text.
