Basically, robots.txt lets you regulate access to a site, its sections, or individual files by search engines' robots, according to the Robots Exclusion Protocol. The file must be located in the root folder of the site and must support the standard instructions (directives) User-agent and Disallow. The '#' symbol marks comments: anything after this character, up to the first line break, is ignored (not interpreted) by robots.
In the first example, an asterisk stands in place of a robot's name in User-agent:
we address all the robots that will listen to our instructions.
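The original example is not reproduced here; a minimal sketch of such a record (the folder name /private/ is a placeholder) might look like this:

```
# Rules for all robots: '*' matches any User-agent
User-agent: *
Disallow: /private/
```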
The file path can be given either as an absolute path or relative to the root folder. In the first two cases we deny access to all files in the corresponding folders, and in the third and fourth, to specific files.
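As an illustrative sketch of such a set of rules (all folder and file names below are placeholders), the first two lines block whole folders and the last two block individual files:

```
User-agent: *
Disallow: /cgi-bin/        # all files in the /cgi-bin/ folder
Disallow: /tmp/            # all files in the /tmp/ folder
Disallow: /private.html    # one specific file
Disallow: /docs/draft.pdf  # another specific file
```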
In the second part, access to the entire site is
denied to Googlebot.
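A minimal sketch of such a record (Googlebot is the name of Google's crawler; Disallow: / blocks the whole site):

```
User-agent: Googlebot
Disallow: /
```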
Pay attention to the absence of extra blank lines between the robot's name (User-agent) and the instructions for it: a blank line separates records, so the directives must immediately follow their User-agent line.
Additional directives (not officially part of the standard, so to speak, but supported by major search engines including Google, Yahoo, MSN and Ask):
Allow - useful when you deny access to a whole folder (directory) but want to permit some files within it;
Crawl-delay - sets the interval (in seconds) between successive requests to a server;
Sitemap (auto-discovery) - points robots to the URL of the Sitemap. As an example:
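A sketch combining these directives (www.example.com and the /photos/ paths are placeholders): everything under /photos/ is blocked except the /photos/public/ subfolder, robots are asked to wait 10 seconds between requests, and the Sitemap location is announced.

```
User-agent: *
Disallow: /photos/
Allow: /photos/public/
Crawl-delay: 10

Sitemap: http://www.example.com/sitemap.xml
```

Note that the Sitemap line is independent of any User-agent record, so it can be placed anywhere in the file.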
If you are trying to hide your valuable information from the eyes of outsiders, keep in mind that not all robots will obey your instructions (as stated on http://www.robotstxt.org/, think of Disallow as a "No Entry" sign, not a locked door).
If you've got any questions about robots.txt:
the Web Robot Pages - an information resource dedicated to web robots;
Using robots.txt - a webmaster's manual from Yandex.ru (in Russian);
Robots Exclusion Standard - a page on wikipedia.org.