- Confidential information could be displayed (Actually an indication of a misconfigured website, but blame the messenger)
- Inner pages could be displayed, bypassing "guard" pages
- The spiders could consume a lot of (then) expensive bandwidth
To deal with these problems, a special file, "robots.txt", was defined for webmasters to include. The convention was proposed by Martijn Koster in 1994 and came to be honored by early popular search engines such as AltaVista. This file can be used to instruct robots not to index parts of a site. Most people never see this file, but it's often there, and creating it is one of the less interesting parts of building a website.
Tonight is Halloween, and Google has added these lines to the bottom of its robots.txt file:
    User-agent: Kids
    Disallow: /tricks
    Allow: /treats
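
As a rough sketch of how a well-behaved robot might read the lines above, the snippet below feeds them to Python's standard-library `urllib.robotparser`. The names `HALLOWEEN_RULES` and `build_parser` are made up for illustration, and real crawlers (Google's included) may apply their own matching rules, so treat the answers as what this one parser decides rather than how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The three directives quoted above, one per line as robots.txt expects.
HALLOWEEN_RULES = """\
User-agent: Kids
Disallow: /tricks
Allow: /treats
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(HALLOWEEN_RULES)
    # Ask the parser whether each (hypothetical) agent may fetch each path.
    for agent in ("Kids", "Googlebot"):
        for path in ("/tricks", "/treats"):
            verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
            print(f"{agent:>9} -> {path}: {verdict}")
```

Under this parser, an agent calling itself Kids is turned away from /tricks but let in to /treats, while any agent the file does not name falls through to the default and is allowed everywhere.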