28 Jan, 2009 in Apache by admin

How to stop or restrict robots

Ever wondered why so many clients ask for a file called robots.txt, which you don't have and never did have?

These clients are robots (also known as crawlers, spiders and other cute names): automated programs that wander around the web looking for interesting resources.

Most robots are used to generate some kind of web index which is then used by a search engine to help locate information.

A robots.txt file gives you a way to ask robots to limit their activities at your site or, more often than not, to leave the site alone.
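Seen from the robot's side, that request works like this: a well-behaved robot downloads /robots.txt before anything else and checks each URL against it, although nothing actually forces it to. The following is a minimal sketch of that check using Python's standard urllib.robotparser module. The robots.txt content, the robot name ExampleBot and the example.com URLs are hypothetical, and the file is inlined rather than downloaded so the example runs offline.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the example runs without network access.
# A real robot would first download it from http://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def urls_to_fetch(agent: str, robots_txt: str, candidates: List[str]) -> List[str]:
    """Return only the URLs that robots.txt lets `agent` request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in candidates if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    queue = [
        "http://www.example.com/index.html",
        "http://www.example.com/cgi-bin/search?q=robots",
        "http://www.example.com/tmp/cache.html",
        "http://www.example.com/docs/faq.html",
    ]
    # A polite robot only requests what survives the filter.
    for url in urls_to_fetch("ExampleBot", ROBOTS_TXT, queue):
        print("fetch", url)
```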

When the first robots were developed, they earned a bad reputation for sending hundreds or even thousands of requests to a site, often overloading it. Things have improved dramatically since then, thanks to the Guidelines for Robot Writers, but some robots still behave in unfriendly ways that a webmaster isn't willing to tolerate and will want to stop.

Another reason some webmasters block robots is to stop them from indexing dynamic information. Many search engines keep using the data collected from your pages for months, which is not much use if you're serving stock quotes, news, weather reports or anything else that will be stale by the time people find it in a search engine.
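If only part of a site is dynamic, there is no need to turn robots away entirely; robots.txt can list just the fast-changing areas. As a sketch, assuming the dynamic content lives under the hypothetical directories /quotes/, /news/ and /weather/, this Python snippet generates such a file and checks the result with the standard urllib.robotparser module. The helper name build_robots_txt is made up for illustration.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical directories holding fast-changing pages; adjust to your site.
DYNAMIC_PATHS = ["/quotes/", "/news/", "/weather/"]


def build_robots_txt(disallowed: List[str], agent: str = "*") -> str:
    """Return a one-record robots.txt asking `agent` to skip each path."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(DYNAMIC_PATHS)

# Sanity-check the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(robots_txt)  # save this text as robots.txt at the top level of the site
print(parser.can_fetch("AnyBot", "/quotes/ACME"))  # False: dynamic area
print(parser.can_fetch("AnyBot", "/about.html"))   # True: static page
```

Disallow values are prefixes, so /news/ covers everything below that directory (for example /news/today.html) but not a file such as /news.html.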

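If you decide to exclude robots completely, or just to limit the areas in which they can roam, create a robots.txt file at the top level of your site (for Apache, in the directory named by DocumentRoot), since that is the only place robots look for it. For details, refer to the robot information pages provided by Martijn Koster.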
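The two cases above map onto two short files. As a sketch, again checked with Python's urllib.robotparser, EXCLUDE_ALL asks every robot to stay off the whole site, while EXCLUDE_ONE turns away a single misbehaving robot and leaves the site open to everyone else (an empty Disallow value means nothing is disallowed). The robot names BadBot and GoodBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Ask every robot to stay off the whole site.
EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""

# Turn away one (hypothetical) robot; the empty Disallow in the
# catch-all record leaves the site open to all other robots.
EXCLUDE_ONE = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check `url` for `agent` against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, rules in (("exclude all", EXCLUDE_ALL), ("exclude one", EXCLUDE_ONE)):
        for agent in ("BadBot", "GoodBot"):
            ok = allowed(rules, agent, "/index.html")
            print(f"{name}: {agent} may fetch /index.html -> {ok}")
```

Records are separated by a blank line; each starts with one or more User-agent lines followed by its Disallow lines.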
