How to Write Robots.txt

Mar 14, 2013, by admin

What is Robots.txt?

Robots.txt is a plain-text (not HTML) file you put on your site to tell search engine robots which pages you would like them not to visit. By default, search engine bots crawl everything they can reach unless they are told not to. Well-behaved bots (compliance is voluntary) read the robots.txt file before crawling the site.

The location of robots.txt is very important. It must be in the root directory of the site (for example, https://www.example.com/robots.txt), because user agents (search engine crawlers) do not search the whole site for a file named robots.txt. They look only in the root directory. If they don't find the file there, they assume the site has no robots.txt and index everything they find along the way. So if you don't put robots.txt in the right place, search engines will crawl and index your whole site.
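Because crawlers only ever look in one fixed place, the robots.txt URL for any page can be derived from the page URL alone. Keep the scheme and host (and port, if any) and replace the path with /robots.txt. A file at https://www.example.com/folder/robots.txt is therefore never consulted, and each host (for example blog.example.com and www.example.com) needs its own file. A minimal Python sketch follows. The helper name robots_url and the example URLs are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one location a crawler checks for robots.txt:
    the root of the page's scheme + host (+ port)."""
    parts = urlsplit(page_url)
    # Path, query string and fragment are discarded; only the root is used.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2013/post.html?id=7"))
print(robots_url("http://example.com:8080/shop/"))
```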

Format of Robots.txt

Allow indexing of everything

User-agent: *
Disallow:

Disallow indexing of everything

User-agent: *
Disallow: /
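The only difference between the two files above is the slash after "Disallow:". An empty value blocks nothing, while "/" is a prefix of every path, so it blocks everything. As a quick offline sanity check, this sketch uses Python's standard-library urllib.robotparser to evaluate both files. The crawler name "AnyBot" and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Build a parser from robots.txt content held in a string (no network access)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


allow_everything = parse_robots("User-agent: *\nDisallow:\n")
block_everything = parse_robots("User-agent: *\nDisallow: /\n")

# "AnyBot" is a made-up crawler name; the "*" group applies to any agent.
for path in ["/", "/about.html", "/folder/page.html"]:
    print(f"{path:20} empty Disallow: {allow_everything.can_fetch('AnyBot', path)}"
          f"   Disallow /: {block_everything.can_fetch('AnyBot', path)}")
```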

Disallow indexing of a specific folder

User-agent: *
Disallow: /folder/
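Disallow values are matched as path prefixes, so the rule above blocks the folder and everything beneath it. The sketch below checks a few sample paths against it with Python's urllib.robotparser. The paths and the crawler name "AnyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /folder/"])

# Each path is allowed unless it starts with the Disallow value "/folder/".
paths = ["/folder/", "/folder/page.html", "/folder/sub/image.png",
         "/folder", "/folder-old/page.html", "/index.html"]
results = {path: rp.can_fetch("AnyBot", path) for path in paths}

for path, allowed in results.items():
    print(f"{path:24} {'allowed' if allowed else 'blocked'}")
```

Because matching is by prefix, the trailing slash matters. Here /folder itself and /folder-old/ remain crawlable. Writing "Disallow: /folder" (without the slash) would also block /folder-old/ and /folder.html.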

Disallow Googlebot from indexing a folder, except for one file in that folder

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
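Allow was not part of the original robots.txt standard. Google and other major crawlers support it, and the later RFC 9309 specification includes it. When an Allow line and a Disallow line both match a URL, Google uses the most specific rule, meaning the one with the longest path; if both are the same length, Allow wins. This Python sketch implements that longest-match rule for plain path prefixes only. It assumes no "*" or "$" wildcards. The Rule class and is_allowed function are hypothetical helpers for illustration, not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class Rule:
    """One Allow/Disallow line from a robots.txt group (hypothetical helper type)."""
    allow: bool
    path: str


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Longest-match evaluation: the matching rule with the longest path wins,
    and Allow wins a tie. Plain prefix matching only; * and $ are not handled."""
    best: Rule | None = None
    for rule in rules:
        # An empty value ("Disallow:") matches nothing.
        if not rule.path or not path.startswith(rule.path):
            continue
        longer = best is None or len(rule.path) > len(best.path)
        tie_goes_to_allow = (best is not None
                             and len(rule.path) == len(best.path) and rule.allow)
        if longer or tie_goes_to_allow:
            best = rule
    # No matching rule means the URL may be crawled.
    return True if best is None else best.allow


# The Googlebot group from the example above.
googlebot_rules = [
    Rule(allow=False, path="/folder1/"),
    Rule(allow=True, path="/folder1/myfile.html"),
]

for p in ["/folder1/myfile.html", "/folder1/other.html", "/index.html"]:
    print(p, "allowed" if is_allowed(googlebot_rules, p) else "blocked")

# For comparison, urllib.robotparser applies the FIRST matching rule in file order,
# so it reports /folder1/myfile.html as blocked for this group.
rp = RobotFileParser()
rp.parse(["User-agent: Googlebot", "Disallow: /folder1/", "Allow: /folder1/myfile.html"])
first_match = rp.can_fetch("Googlebot", "/folder1/myfile.html")
print("urllib.robotparser:", "allowed" if first_match else "blocked")
```

Because some parsers use first-match order like this, a safe choice is to list the Allow line before the Disallow line. That ordering gives the same answer under both interpretations.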
