Search Appliance SBE

Thunderstone Search Appliance Manual

Keep Tags

Syntax: one or more pairs of strings

All data that is not between a specified begin tag and end tag pair is stripped from the HTML before the text is extracted. Only text extraction is affected; links are unaffected. The tags are simple strings, not patterns or REX expressions, and case is ignored. This is useful for extracting the areas of prime interest from HTML pages without the surrounding boilerplate. Tag pairs should neither nest nor overlap within a document. Documents with no begin tag are left unaffected. Documents with no end tag after the last begin tag still keep the HTML from the last begin tag to the end of the document.
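
The behavior described above can be illustrated with a small Python sketch. This is not the appliance's implementation, only a model of the stated rules. The function name keep_tags, the choice to keep the tag strings themselves in the output, and the handling of multiple pairs (the earliest begin tag wins) are assumptions made for illustration.

```python
import re


def keep_tags(html, pairs):
    """Model of Keep Tags: keep only the spans between begin/end strings.

    Hypothetical helper, not part of the Search Appliance. Assumes the
    begin and end strings themselves are kept along with the enclosed HTML.
    """
    # Tags are plain strings, so escape them; matching ignores case.
    compiled = [
        (re.compile(re.escape(begin), re.IGNORECASE),
         re.compile(re.escape(end), re.IGNORECASE))
        for begin, end in pairs
        if begin and end  # skip empty strings, which would match everywhere
    ]
    kept = []
    pos = 0
    while True:
        # Assumption: with several pairs, the earliest begin tag wins.
        best = None
        for begin_re, end_re in compiled:
            match = begin_re.search(html, pos)
            if match and (best is None or match.start() < best[0].start()):
                best = (match, end_re)
        if best is None:
            break
        begin_match, end_re = best
        end_match = end_re.search(html, begin_match.end())
        if end_match is None:
            # No end tag after the last begin tag: keep to end of document.
            kept.append(html[begin_match.start():])
            break
        kept.append(html[begin_match.start():end_match.end()])
        pos = end_match.end()
    # A document with no begin tag at all is returned unchanged.
    return "".join(kept) if kept else html


pairs = [("<!-- start -->", "<!-- end -->")]
page = "<nav>menu</nav><!-- Start -->Body<!-- END --><footer>f</footer>"
print(keep_tags(page, pairs))
```

In this sketch the navigation and footer markup are dropped and only the span from the begin string through the end string survives, while a page that contains no begin string passes through unchanged.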


Copyright © Thunderstone Software     Last updated: Dec 10 2020