Honeytrap Module Guide

WARNING: When set up correctly with a firewall or similar system, this module will cause traffic to your website to be slowed down or blocked. That traffic can potentially include your own, as well as that of site owners, maintainers, and legitimate visitors. If you use this module, you do so at your own risk.

Summary

The Honeytrap module allows site owners and system administrators to monitor web crawlers that ignore the rules set out in the robots.txt file (whether written by hand, generated by the RobotsTxt module, or produced by a similar method) and, as a result, put an unnecessarily high load on servers.

This is especially important for large, high-traffic, or high-profile sites, where the activity of non-compliant crawlers can bring servers to their knees. If such crawlers are not blocked or slowed down quickly enough, they can knock servers completely offline.

Note:

The Honeytrap module does not directly block or slow down offending IP addresses itself; it only logs and reports them, leaving you in full control of how you want to deal with them.

Requirements

Installation

Install as usual; see drupal.org/node/70151 for further information.

List of Terms

To make things as clear as possible, I have included a list of the common terms used in this guide.

Configuration

Customization

Suggested Usage

The Honeytrap module makes use of three lists. For automatic and optimal site performance, these lists should be used in conjunction with a firewall or similar system as follows:

  1. White list: addresses that should never be blocked or throttled (for example, your own IP address).
  2. Naughty list: addresses that the firewall should throttle (slow down).
  3. Black list: addresses that the firewall should block outright.
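A firewall-sync script that consumes the naughty and black lists could be sketched as follows. This is a minimal illustration, assuming the module writes each list as a plain file with one IP address per line; the file names, paths, and the exact list format are assumptions, so check the module's Settings tab for the real ones.

```python
# Sketch only: the file locations and the one-IP-per-line format are
# assumptions; adjust both to match your actual Honeytrap setup.
from pathlib import Path


def build_rules(naughty_file, black_file):
    """Turn Honeytrap list files into example iptables command strings."""
    rules = []
    for ip in Path(black_file).read_text().split():
        # Black-listed addresses are blocked outright.
        rules.append(f"iptables -A INPUT -s {ip} -j DROP")
    for ip in Path(naughty_file).read_text().split():
        # Naughty-listed addresses are throttled, not blocked: accept a
        # limited trickle of packets and drop everything over the limit.
        rules.append(
            f"iptables -A INPUT -s {ip} -m limit --limit 10/minute -j ACCEPT"
        )
        rules.append(f"iptables -A INPUT -s {ip} -j DROP")
    return rules
```

A cron job could run such a script periodically and apply the generated rules. The accept-with-limit-then-drop pattern is what makes naughty-listed addresses slow down rather than disappear entirely.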

Tips

  1. Make sure that you add a "Crawl-delay" entry to your robots.txt file. This can be used to slow down the crawl rates of compliant web crawlers.
  2. Add your IP address to the Honeytrap white list.
  3. Set up a firewall to enforce the Honeytrap's naughty and black lists.
  4. Test your setup first by manually visiting a trap URL, then by using a real, non-compliant web crawler (see the FAQ section).
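For tip 1 above, a robots.txt entry like the following asks compliant crawlers to wait between successive requests. The 10-second value is only an example, and note that not every crawler honours Crawl-delay (Googlebot, for instance, ignores it):

```
User-agent: *
Crawl-delay: 10
```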

Troubleshooting

Below is a list of common problems that you may encounter. Each problem is followed by a list of likely causes which should help you to resolve the problem.

Items on the naughty list never expire

Items on the naughty list are expiring, but later than I expected

Everything seems to be hitting my traps, even things like googlebot

A Yahoo crawler is hitting my traps

Addresses on the black list are not getting blocked

Addresses on the naughty list are not getting throttled

Addresses on the naughty list are getting blocked rather than throttled

FAQs

Q: How can I test that my traps work?

A: The easiest way to test your traps is simply to visit a trap URL in your browser. You will find an example URL for a trap on the Settings tab. Once you are happy with this, you can carry out more "real life" tests using something like "wget" or by downloading a web crawler that will ignore the robots.txt file.
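The distinction between compliant and non-compliant crawlers can be illustrated with Python's standard robots.txt parser. The trap path below is hypothetical; use the example URL shown on the module's Settings tab instead.

```python
# A compliant crawler checks robots.txt before fetching and so never hits
# a disallowed trap; a non-compliant one fetches it anyway. The
# "/honeytrap/" path here is a made-up example.
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /honeytrap/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler asks before fetching:
print(rp.can_fetch("GoodBot", "/honeytrap/some-trap"))  # False: trap avoided
print(rp.can_fetch("GoodBot", "/node/1"))               # True: normal page
```

For a real non-compliant fetch, wget can be told to ignore robots.txt with `wget -e robots=off <trap-url>`, which should land the requesting IP address on the naughty list.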

WARNING: Be careful that you don't block yourself, especially if you have a firewall fully connected to the Honeytrap. This is even more important if you only have access to your site via a single IP address. I suggest that you add your IP address to the white list before carrying out any of these tests. You will still be able to monitor hits to the trap URLs via the watchdog log even if your IP address is on the white list.

Q: I can see IP addresses appearing on the naughty/black lists, but they are not getting blocked. Why?

A: Have you set up your firewall to read in the list files created by the Honeytrap? If so, check that the files exist where you expect them and that they contain the anticipated IP addresses. If they do, then check that your firewall setup is correct and that it is actually reading the files. Also check that the list file format is what your firewall expects.
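The "do the files exist and contain what I expect" checks above can be automated with a short sanity-check script. This is a sketch under the assumption that each list file holds one IP address per line; adjust the format check to whatever your Honeytrap setup actually writes.

```python
# Sanity-check a Honeytrap list file before pointing a firewall at it.
# Assumes one IP address per line; adapt if your list format differs.
import ipaddress
from pathlib import Path


def check_list_file(path):
    """Return the IP addresses found in a list file, raising on problems."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"list file missing: {path}")
    ips = []
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # ip_address() raises ValueError if the line is not a valid address.
        ips.append(str(ipaddress.ip_address(line)))
    return ips
```

If this raises, the firewall is probably choking on the same malformed line or missing file; if it returns an empty list, the module may not be writing entries where you think it is.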

CREDITS

Created by:

Mike Jessop a.k.a. Mikey Bunny (don't ask)

Sponsored by:

Moo Free Chocolates
www.moofreechocolates.com
Manufacturer of scrummy tasting dairy free chocolates.