Warning! Use with caution. Incorrect
use of this feature can result in your blog being ignored by search engines.
A custom robots.txt file lets you
tell search engines that you don’t want them to crawl certain pages of
your blog (“crawl” means that crawlers, like Googlebot, go through your
content and index it so that other people can find it when they search for
it). For example, let’s say there are parts of your blog that have information
you would rather not promote, either for personal reasons or because it doesn’t
represent the general theme of your blog. This is where you can specify those
restrictions.
However, keep in mind that other
sites may have linked to the pages that you’ve decided to restrict, and
Google may still index a page if we discover it by following a link from someone
else's site. To display it in search results, Google will need to show a
title of some kind, and because we won't have access to any of your page
content, we will rely on off-page content such as anchor text from other sites.
(To truly block a URL from being indexed, you can use a noindex robots meta tag.)
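The meta tag referred to here is the robots meta tag, which in its usual form looks like <meta name="robots" content="noindex"> inside the page's head. As a rough illustration only, the sketch below uses Python's standard html.parser to check whether a page's HTML carries that directive. The class name, function name, and sample markup are hypothetical and not part of Blogger.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page markup, used only for illustration.
sample_page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Private notes</body></html>"
)
print(has_noindex(sample_page))
```

Running a check like this against a page's source is one way to confirm that the tag was actually added where you intended.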
To exclude certain content from
being crawled, go to Settings | Search Preferences and click Edit
next to "Custom robots.txt." Enter the robots.txt rules for the content you
would like web robots to ignore. For example:
User-agent: *
Disallow: /about
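Before saving rules like these, it can help to check what they actually block. The following is a minimal sketch, assuming Python's standard urllib.robotparser module; the blog address example.blogspot.com and the helper name is_allowed are made up for illustration. Note that a Disallow rule is matched as a path prefix, so /about also covers paths such as /about-me.html.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
rules = """\
User-agent: *
Disallow: /about
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the rules permit user_agent to crawl url."""
    return parser.can_fetch(user_agent, url)


# Hypothetical blog URLs, used only for illustration.
for url in (
    "https://example.blogspot.com/about",
    "https://example.blogspot.com/about-me.html",
    "https://example.blogspot.com/2012/03/some-post.html",
):
    print(url, "allowed" if is_allowed(url) else "blocked")
```

This only shows how one parser interprets the rules; individual search engines may treat edge cases differently.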
You can also read about robots.txt in
this post on the Google Webmaster’s blog.
