Google Please Don’t Crawl This Server
February 16, 2012
For some reason Googlebot has found it necessary to crawl a development server of mine. I suspect that one of the users browses with Google Chrome, which probably snarfs the URLs browsed.
Google tells us that one way to make it stop is to return the 410 (Gone) HTTP status code, and the way to enforce this in httpd.conf is:
# Tell crawlers to go away
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Validator|MJ12bot|Baiduspider)
RewriteRule ^.* - [G]
I have included other crawlers in the list just to make sure.
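To sanity-check which visitors the condition would catch, here is a rough Python sketch that mirrors the RewriteCond pattern. The `status_for` helper and the sample User-Agent strings are my own illustration and not part of the Apache config, and returning 200 for non-matching requests is a simplification (in reality the request is just handled normally). Note that RewriteCond is case-sensitive unless the `[NC]` flag is added, so a lowercase "googlebot" would slip through the rule as written.

```python
import re

# Same alternation as the RewriteCond above. RewriteCond patterns are
# unanchored and case-sensitive by default, which re.search reproduces.
CRAWLER_PATTERN = re.compile(r"(Googlebot|bingbot|Validator|MJ12bot|Baiduspider)")


def status_for(user_agent: str) -> int:
    """Hypothetical helper: 410 if the rule would fire, else 200 (simplified)."""
    return 410 if CRAWLER_PATTERN.search(user_agent) else 200


if __name__ == "__main__":
    # Sample User-Agent strings, chosen for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
        "googlebot",  # lowercase: not matched without [NC]
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

If the case-sensitivity matters to you, appending `[NC]` to the RewriteCond line should make the match case-insensitive.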