Thunderstone Search Appliance Manual

Robots.txt Agents

Syntax: one or more user-agent strings, one per line

This is a list of user agents to respect when checking robots.txt on a site. Agents should be listed highest-priority first: the Search Appliance uses the robots.txt group whose User-agent string is a case-insensitive substring of the earliest-listed agent in Robots.txt Agents. If multiple robots.txt groups match that same agent, the group with the longest matching User-agent string is used. If no group matches any listed agent, and a group for agent "*" is present, that group is used. The default value for this setting is "thunderstonesa".

For example, if this setting is changed to MyBot and Googlebot (in that order), and a site has this robots.txt file:

User-agent: Google
Disallow: /some/google/dir

User-agent: MyBot
Disallow: /some/other/dir

then the Search Appliance will not walk /some/other/dir, but will still walk /some/google/dir. Both User-agent strings substring-match a listed agent, and Google is the longer string, but MyBot is listed first in Robots.txt Agents and thus has higher priority.

Given this robots.txt with the same setting:

User-agent: Google
Disallow: /some/google/dir

User-agent: Googlebot
Disallow: /some/bot/dir

then the Search Appliance will not walk /some/bot/dir. No group matches MyBot, so the next listed agent, Googlebot, decides: both User-agent strings are substrings of Googlebot, but Googlebot is the longer match.
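
The selection rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the Search Appliance's actual implementation; the function name select_group and its arguments are hypothetical, and it assumes the robots.txt file has already been reduced to a list of its User-agent strings.

```python
from typing import List, Optional


def select_group(configured_agents: List[str],
                 group_agents: List[str]) -> Optional[str]:
    """Return the User-agent string of the robots.txt group that would be used.

    configured_agents: the Robots.txt Agents setting, highest priority first.
    group_agents: the User-agent strings found in the site's robots.txt.
    Hypothetical helper; a sketch of the documented rules only.
    """
    for agent in configured_agents:
        agent_lower = agent.lower()
        # A group matches if its User-agent is a case-insensitive
        # substring of the configured agent. "*" is handled separately.
        matches = [g for g in group_agents
                   if g and g != "*" and g.lower() in agent_lower]
        if matches:
            # Among groups matching the same agent, the longest wins.
            return max(matches, key=len)
    # No listed agent matched: fall back to the "*" group if one exists.
    return "*" if "*" in group_agents else None


# First example: MyBot is listed first, so its group is chosen.
print(select_group(["MyBot", "Googlebot"], ["Google", "MyBot"]))
# Second example: both groups match Googlebot; the longer string is chosen.
print(select_group(["MyBot", "Googlebot"], ["Google", "Googlebot"]))
```

Under these assumptions the two calls select the MyBot group and the Googlebot group respectively, matching the two examples above. Only the Disallow rules of the selected group would then be applied.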


Copyright © Thunderstone Software     Last updated: Dec 10 2020