Archive for the ‘Perl’ Category

Function return value versus exit code


Wednesday, November 19th, 2008

In projects I’ve worked on over the past few years, I have noticed growing confusion about program exit codes and function return values. In particular, some programmers seem to feel these two separate things should be related somehow. They’re not related, nor should they be. In a nutshell: programs exit with 0 for success; boolean functions return 1 for success. In this article, I’ll discuss some of the motivation behind these separate and seemingly counter-intuitive conventions.

Function Return Values

As far as most computers (and their programming languages) are concerned, the value 0 is almost universally equivalent to “false”, while 1 (or, more generally, any non-zero value) equates to “true”. This is basic Boolean logic. Here is a typical construct in C:

int is_even(int num) {
        /* The low bit is 0 for even numbers, so return 1 (true) in that case */
        return (num & 1) == 0;
}

/* Later... */
if (is_even(some_number))
        printf("some_number is even\n");

Most programming languages have similar constructs. Even in languages that support exceptions or other error mechanisms, the tried and true Boolean is still appropriate in a large number of situations, enough for many languages to define a dedicated “bool” or “Boolean” type. In the above code, for example, it would not be at all appropriate to throw an exception if the number is odd. All of this is well-known and seldom questioned.

Program Exit Codes

Program exit codes, however, are a completely different concept. Unlike return values, which occur within a program, exit codes are essentially part of the program’s output. While the semantics differ slightly between operating systems, the basic theory is the same:

  • An exit code of 0 means success
  • A non-zero exit code means some kind of failure

Hence, to preserve the 0 = false thinking, it may be easier to semantically think of exit codes as “error codes”. 0 means, “no, there was no error”. DOS actually got this right; it calls its exit status variable %errorlevel%.

Unix/Linux/Mac

Under Unix-like operating systems, exit codes are pretty well-defined:

0 Success
1..125 Failure (the program itself will decide what the numbers mean)
126 Command found, but could not be executed
127 Command not found
128..254 The program did not exit normally. (E.g., it crashed, or received a signal; the shell reports 128 + the signal number)
255 Invalid exit code

From a Bourne shell, the variable $? holds the exit code of the last command run in that shell. (For csh, use $status.) For example:

$ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
$ echo $?
0

As you can see, grep found a match, which means it was successful. So, when I check the exit status ($?), the result is 0.

If a command isn’t successful, it goes like this:

$ grep no-such-animal /etc/passwd
$ echo $?
1

If I attempt to run a command which doesn’t exist:

$ grepadoodle foo
-bash: grepadoodle: command not found
$ echo $?
127

Finally, let’s say I ran the following command, and then was forced to kill it from another shell:

$ grep "Jimmy Hoffa" /dev/random
# This command will run forever if I let it, so at this point I go to another shell and kill it
Terminated
$ echo $?
143

Why 143? The TERM signal (the default signal sent with the kill(1) command) has a value of 15. 128 + 15 = 143.
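The same arithmetic can be run in reverse to interpret a status after the fact. Here is a minimal Bourne shell sketch; describe_status is a hypothetical helper name of my own, and it assumes the shell reports signal deaths as 128 + the signal number, as described above:

```shell
# Hypothetical helper: describe an exit status, assuming the
# 128 + signal-number convention for abnormal termination.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ] && [ "$status" -lt 255 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "failed with code $status"
    fi
}

# Live demonstration: a child shell sends itself the TERM signal.
# The "|| status=$?" form captures the status without aborting the script.
status=0
sh -c 'kill -TERM $$' || status=$?
describe_status "$status"
```

Calling describe_status 143 directly prints "killed by signal 15", matching the calculation above.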

Why This Matters

Unix shells have built-in operators like && (logical and) and || (logical or) that branch on the exit status of a command. These operators assume 0 = success (true) and non-zero = failure (false). This allows us to write constructs like the following (Bourne shell):

$ grep fluffy /etc/passwd && echo "Your user list may contain a poodle."

$ [ -f somefile.txt ] || cp somefile.txt.defaults somefile.txt

If a program’s exit code does not follow the 0 = success rule, the logic rapidly gets confusing.
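The same rule applies to shell functions, whose "return value" is really an exit status. As a rough sketch (this is_even is my own shell illustration, not a standard command), a function that reports truth as status 0 plugs straight into if, && and ||:

```shell
# Illustrative shell predicate: exit status 0 (success) means "yes, even".
# The status of the function is the status of its last command, the test.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

if is_even 4; then
    echo "4 is even"
fi

is_even 7 || echo "7 is odd"
```

Inside the shell, then, "true" is spelled 0. That is the opposite of C, and it is exactly why the two conventions should not be mixed.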

Windows/DOS

Here the situation is a little bit different, but not by much. In DOS/Windows batch files (or a command shell) the variable %errorlevel% contains the exit code of the last command executed, similar to Bourne shell’s $?. The semantics are similar:

0 Success
Non-zero Error (again, meaning depends on the individual program; DOS limits codes to 0..255, while Windows allows 32-bit values)

Examples:

C:\> echo Hello
Hello

C:\> echo %errorlevel%
0

C:\> type nonexistent-file.txt
The system cannot find the file specified.

C:\> echo %errorlevel%
1

Where People Tend to Get Mixed Up

While virtually every built-in or well-known third-party command follows the above exit code rules, I have seen my share of proprietary applications that flip this logic around for no good reason. When speaking with the developers, I almost always hear “but in [insert programming language here], 0 is false and 1 is true”. If you find yourself falling into this trap, just remember that the exit code of your program is part of its output, not part of its logic. You wouldn’t think of dumping a raw struct or object to the user’s terminal, would you? Instead, you format the output in a way that makes sense. Exit codes are no different.
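One way to keep the two conventions apart is to translate between them in exactly one place, at the boundary where the program hands its result to the outside world. The following is only a sketch in Bourne shell; status_from_flag and the worked variable are hypothetical names chosen for illustration:

```shell
# Hypothetical helper: translate an internal truth value
# (1 = true, 0 = false) into an exit status (0 = success, 1 = failure).
status_from_flag() {
    if [ "$1" -eq 1 ]; then
        return 0
    else
        return 1
    fi
}

worked=1    # internal logic: 1 means "it worked"

if status_from_flag "$worked"; then
    echo "would exit with status 0"
else
    echo "would exit with status 1"
fi
```

Everything before the translation can use 1 for true; everything after it sees 0 for success.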

Occasionally, I even see people flipping the boolean logic within programs for no good reason (and often without documentation), which leads to all sorts of confusing and error-prone constructs like this:

if (did_it_work())
        fprintf(stderr, "Error\n");  /* Are you sure? */

There are a few reasonable exceptions to this rule, such as strcmp(3), which returns 0 if the strings are equal, < 0 if s1 is less than s2, and > 0 if s1 is greater than s2. However, strcmp(3), fork(2), et al. are not boolean functions; they return a range of values instead of a simple truth value.

Summary

When in doubt, remember the following:

  1. Program exit codes are not function return values
  2. Boolean functions return 1 for true, 0 for false
  3. Programs exit with 0 on success, non-zero on failure

Keep it clean!

Perl vs. Joe Job


Sunday, October 26th, 2008

One of my domains has recently been suffering a spate of backscatter from an email Joe-job. Basically, spammers are sending out emails with forged From: fields that appear to be sent from, say, abcdef@ry.ca to random recipients. There is no practical way to prevent or stop these attacks, since those emails never even come near my server before the recipients see them.

A significant percentage of the forged emails are either caught by the recipient’s spam filter, or are simply sent to addresses that don’t exist. Those messages should die right then and there, but some servers will actually accept the message, and then bounce it back to the (apparent) sender, instead of rejecting it immediately at the SMTP level.

It turns out there are enough of these servers to subject my server to over a million bounces per week.

The server runs exim, which uses a maildir format for its local storage, meaning one file per email. When I returned from my holidays, the filesystem had nearly run out of inodes. (Linux/UNIX filesystems have a limit on the total number of files, or inodes, they may contain; in this case, about 4 million.)

I wrote some filter rules to drop the bounces to forged addresses, but I still had a directory containing a rather inconvenient number of files, to say the least.

Perl to the rescue

My goal was to prune the directory, but keep the most recent samples for analysis. Fortunately, exim stores each email with a filename beginning with the number of seconds since 1970 (a common timestamp method)–this eliminates the need to hit the disk again to get the actual timestamp of the file, which is a relatively costly operation. The following Perl script takes no parameters and gives no output; it simply deletes all exim email files that are more than 3 days old, from the current directory:

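#!/usr/bin/perl

use warnings;
use strict;

my $DAYS = 3;  # Keep this much history

# Files stamped before this epoch time will be deleted
my $older = time() - 86400 * $DAYS;

opendir(my $dir, ".") or die "Cannot open current directory: $!";
while (defined(my $file = readdir $dir)) {
    # Only consider names that start with ten digits followed by a dot
    next unless ($file =~ /^(\d{10})\./);
    unlink $file if ($1 < $older);
}
closedir $dir;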

I hereby release this code to the public domain. Use at your own risk; after all, it’s designed to delete files by the millions, very efficiently. It will potentially delete any file that starts with ten digits followed by a dot.

To mitigate the risk of accidental deletions, the script will not delete any file whose name starts with nine or fewer digits. (Notably, some maildir implementations, e.g., cyrus, use sequential integers instead of timestamps.) Ten-digit timestamps begin at 1,000,000,000 seconds, which corresponds to September 9, 2001 (UTC), so only files stamped before that date are skipped. That should be fine for most purposes.
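Before letting anything delete files by the million, it can be reassuring to try the filename test on its own. The following Bourne shell sketch mirrors the Perl condition without deleting anything; is_older_than is a hypothetical helper, and the filenames and cutoff in the usage lines are made up for illustration:

```shell
# Hypothetical dry-run helper: succeeds (status 0) only if the filename
# starts with exactly ten digits followed by a dot AND that number is
# smaller than the cutoff. Nothing is deleted.
is_older_than() {
    name=$1
    cutoff=$2
    case $name in
        *.*) ;;                 # must contain a dot
        *) return 1 ;;
    esac
    stamp=${name%%.*}           # text before the first dot
    case $stamp in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "${#stamp}" -eq 10 ] && [ "$stamp" -lt "$cutoff" ]
}

# Made-up cutoff and filenames, for illustration only
cutoff=1225000000
for f in 1224000000.H1.example 1225100000.H2.example 123456789.msg notes.txt; do
    if is_older_than "$f" "$cutoff"; then
        echo "would delete: $f"
    else
        echo "would keep:   $f"
    fi
done
```

Only the first of the four sample names passes both checks; the others are too new, too short, or not timestamps at all.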

It took just shy of an hour to process ~4 million directory entries, and I was left with a far more manageable directory. Following this, I did some analysis on the remaining messages to help identify patterns and set up additional rules.