Does Feedbot respect robots.txt?
Yes. However, because Feedbot acts as both an RSS aggregator and a search engine, it treats robots.txt differently depending on which role it is playing when it crawls your site. Feedbot also respects the Feed Access Control standard when indexing your RSS feeds.
HTML and content pages
Feedbot always honors robots.txt for URLs that are not RSS feeds, such as HTML and other content pages.
RSS feeds
When an RSS feed is submitted by a user to Feedbot, Feedbot will fetch the feed and aggregate it regardless of whether the directory containing the XML file is disallowed in robots.txt, unless Feedbot is explicitly disallowed as a user-agent or access is explicitly disabled using the Feed Access Control standard.
To keep your RSS feeds from being aggregated by Feedbot, you must either explicitly disallow Feedbot as a user-agent in your robots.txt file or use the Feed Access Control standard suggested by Bloglines, which Feedbot respects in all cases.
If content referenced by an access-enabled RSS feed lives in a directory disallowed by robots.txt, the feed itself will be indexed, but the referenced content will not.
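The rule for user-submitted feeds can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Feedbot's actual implementation. The function names are hypothetical, the sketch assumes the first matching rule in a Feedbot group wins, and it does not model the Feed Access Control check at all.

```python
def explicit_feedbot_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Collect (directive, path) rules from groups that name Feedbot explicitly.

    Groups addressed to "*" are skipped on purpose: per the text, a submitted
    feed is only blocked when Feedbot itself is named as the user-agent.
    """
    rules = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group begins here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if "feedbot" in agents:
                rules.append((field, value))
    return rules


def may_aggregate_submitted_feed(robots_txt: str, feed_path: str) -> bool:
    """Hypothetical check: may a user-submitted feed at feed_path be fetched?"""
    for directive, path in explicit_feedbot_rules(robots_txt):
        # Assumption: first matching rule wins; an empty path matches nothing.
        if path and feed_path.startswith(path):
            return directive == "allow"
    return True  # no explicit Feedbot rule applies, so the feed is aggregated


generic_block = "User-agent: *\nDisallow: /feeds/\n"
explicit_block = "User-agent: Feedbot\nDisallow: /feeds/\n"

print(may_aggregate_submitted_feed(generic_block, "/feeds/rss.xml"))   # True
print(may_aggregate_submitted_feed(explicit_block, "/feeds/rss.xml"))  # False
```

In this sketch a blanket rule for all robots does not stop a submitted feed from being aggregated, while a rule addressed to Feedbot by name does.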
Example entries for robots.txt files
The examples that follow assume that you know something about robots.txt files and how they work. Visit the Web Robots Pages for more information on the robots.txt standard.
How do I explicitly allow Feedbot while still disallowing other robots?
Explicitly allowing Feedbot to crawl your site or directory means Feedbot can crawl and index your site's content and RSS feeds. To explicitly allow Feedbot, add these two lines as their own record anywhere in your robots.txt:
User-agent: Feedbot
Allow: /path/to/dir
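To check the "allow Feedbot, disallow everyone else" case before publishing a robots.txt, one option is Python's standard urllib.robotparser module. The sketch below is an assumption-laden illustration: the catch-all group for other robots, the helper name can_fetch, and the agent name SomeOtherBot are made up for the example, and Python's parser models generic robots.txt semantics rather than Feedbot's own behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Feedbot gets its own record, all other robots are shut out.
ROBOTS_TXT = """\
User-agent: Feedbot
Allow: /path/to/dir

User-agent: *
Disallow: /
"""


def can_fetch(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(can_fetch("Feedbot", "/path/to/dir/index.html"))       # True
print(can_fetch("SomeOtherBot", "/path/to/dir/index.html"))  # False
```

Keeping the Feedbot record separate from the catch-all record is what lets the two sets of rules differ.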
How do I explicitly disallow Feedbot?
Explicitly disallowing Feedbot from crawling your site or directory stops Feedbot from indexing your content and RSS feeds. To explicitly disallow Feedbot, add these two lines as their own record anywhere in your robots.txt:
User-agent: Feedbot
Disallow: /path/to/dir
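The difference between blocking one directory and blocking the whole site can be checked the same way with Python's standard urllib.robotparser module. This is a minimal sketch: the helper name feedbot_allowed, the sample paths, and the whole-site variant using "Disallow: /" are assumptions added for illustration, and the parser reflects generic robots.txt matching rather than Feedbot's own code.

```python
from urllib.robotparser import RobotFileParser


def feedbot_allowed(robots_txt: str, path: str) -> bool:
    """Report whether a robot named Feedbot may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Feedbot", path)


# The record from the text: blocks a single directory.
DIR_ONLY = "User-agent: Feedbot\nDisallow: /path/to/dir\n"
# Assumed variant: blocks every path on the site.
WHOLE_SITE = "User-agent: Feedbot\nDisallow: /\n"

print(feedbot_allowed(DIR_ONLY, "/path/to/dir/feed.xml"))   # False
print(feedbot_allowed(DIR_ONLY, "/elsewhere/page.html"))    # True
print(feedbot_allowed(WHOLE_SITE, "/elsewhere/page.html"))  # False
```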
Report problems
If you feel that Feedbot is crawling or indexing your site improperly, or crawling it correctly but too often, please use our contact form or email us directly at [at], and we will work with you to find a solution as quickly as possible.

