Perl vs. Joe Job


October 26th, 2008

One of my domains has recently been suffering a spate of backscatter from an email Joe-job. Spammers are sending emails to random recipients with forged sender addresses, so the messages appear to come from, say, abcdef@ry.ca. There is no practical way to prevent or stop these attacks, since the forged emails never come near my server before the recipients see them.

A significant percentage of the forged emails are either caught by the recipient’s spam filter or sent to addresses that don’t exist. Those messages should die right there, but some servers accept the message and then bounce it back to the (apparent) sender, instead of rejecting it immediately at the SMTP level.

It turns out there are enough of these servers to subject my server to over a million bounces per week.

The server runs exim, configured here to deliver to maildir format: one file per email. When I returned from my holidays, the filesystem had nearly run out of inodes. (Many Linux/UNIX filesystems have a fixed limit on the total number of files, or inodes, they can hold; in this case, about 4 million.)

I wrote some filter rules to drop the bounces to forged addresses, but I still had a directory containing a rather inconvenient number of files, to say the least.

Perl to the rescue

My goal was to prune the directory but keep the most recent samples for analysis. Fortunately, exim names each email file starting with the number of seconds since 1970 (the standard Unix timestamp). This eliminates the need to hit the disk again with a stat() call to get each file’s actual timestamp, which is relatively costly across millions of files. The following Perl script takes no parameters and gives no output; it simply deletes every exim email file in the current directory that is more than 3 days old:

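#!/usr/bin/perl

use warnings;
use strict;

my $DAYS = 3;  # Keep this much history

# Cutoff in epoch seconds: anything stamped earlier than this is deleted
my $older = time - 86400 * $DAYS;

opendir my $dh, '.' or die "Cannot open current directory: $!";
while (defined(my $file = readdir $dh)) {
    # exim maildir names begin with the delivery time in epoch seconds, then a dot
    next unless $file =~ /^(\d{10})\./;
    unlink $file if $1 < $older;
}
closedir $dh;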

I hereby release this code to the public domain. Use at your own risk; after all, it’s designed to delete files by the millions, very efficiently. It will delete any file in the current directory whose name starts with exactly ten digits followed by a dot and whose timestamp is older than the cutoff.
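Before pointing the script at a live mail directory, a dry run may be worthwhile. The sketch below is not part of the original script: it applies the same ten-digit pattern and the same cutoff, but only counts candidates instead of unlinking them. The `count_candidates` helper is a hypothetical name introduced here for illustration.

```perl
#!/usr/bin/perl
# Dry-run sketch: count the files the pruning script would delete,
# without deleting anything. Same rule: ten digits, then a dot.
use warnings;
use strict;

my $DAYS  = 3;                        # same history window as the script
my $older = time - 86400 * $DAYS;     # cutoff in epoch seconds

# Hypothetical helper: returns (matching files, files older than cutoff).
sub count_candidates {
    my ($dir, $cutoff) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my ($matched, $doomed) = (0, 0);
    while (defined(my $file = readdir $dh)) {
        next unless $file =~ /^(\d{10})\./;
        $matched++;
        $doomed++ if $1 < $cutoff;
    }
    closedir $dh;
    return ($matched, $doomed);
}

my ($matched, $doomed) = count_candidates('.', $older);
print "$matched timestamped files, $doomed older than $DAYS days\n";
```

If the second number looks reasonable, the real script can be run in the same directory.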

To mitigate the risk of accidental deletions (notably, some mail stores, e.g., Cyrus, name message files with sequential integers instead of timestamps), the script ignores any file whose name begins with nine or fewer digits. The smallest ten-digit timestamp, 1000000000, corresponds to September 9, 2001 at 01:46:40 UTC (the evening of September 8 in North American time zones), which should be fine for most purposes.
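To see which names the ten-digit guard admits, here is a small sketch. It applies the script's pattern to a few sample names and prints the earliest time a ten-digit prefix can represent: 1000000000 seconds, which is September 9, 2001, 01:46:40 UTC, or still September 8 in North American time zones. The `is_candidate` helper and the sample names are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
# Sketch: which filenames does the pruning pattern consider, and what is
# the earliest timestamp a ten-digit prefix can represent?
use warnings;
use strict;

# Hypothetical helper using the script's test: exactly ten digits, then a dot.
sub is_candidate { return $_[0] =~ /^\d{10}\./ ? 1 : 0 }

# 1_000_000_000 is the smallest ten-digit number of seconds since the epoch.
print 'Earliest ten-digit timestamp: ', scalar gmtime(1_000_000_000), " UTC\n";

for my $name ('123.', '999999999.x', '1000000000.H1P1.host', '12345678901.x') {
    printf "%-22s %s\n", $name, is_candidate($name) ? 'candidate' : 'ignored';
}
```

The first line prints `Sun Sep  9 01:46:40 2001 UTC`. Only `1000000000.H1P1.host` is reported as a candidate. The Cyrus-style `123.`, the nine-digit name and the eleven-digit name are all ignored, because the pattern requires the dot immediately after the tenth digit.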

It took just shy of an hour to process ~4 million directory entries, and I was left with a far more manageable directory. Following this, I did some analysis on the remaining messages to help identify patterns and set up additional rules.
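The post doesn’t say what form that analysis took. One possible starting point is to tally the remaining messages per UTC day from the filename timestamp alone, which again avoids a stat() per file. The sketch below does that. `daily_counts` is an illustrative name, and grouping by day is an assumption about what is useful, not the author's method.

```perl
#!/usr/bin/perl
# Sketch: count messages per UTC day using only the epoch-seconds prefix
# of each exim maildir filename (no stat() calls).
use warnings;
use strict;
use POSIX qw(strftime);

# Hypothetical helper: returns a hashref of 'YYYY-MM-DD' => message count.
sub daily_counts {
    my ($dir) = @_;
    my %per_day;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    while (defined(my $file = readdir $dh)) {
        next unless $file =~ /^(\d{10})\./;
        $per_day{ strftime('%Y-%m-%d', gmtime $1) }++;
    }
    closedir $dh;
    return \%per_day;
}

my $counts = daily_counts('.');
printf "%s %d\n", $_, $counts->{$_} for sort keys %$counts;
```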
