Function return value versus exit code


Wednesday, November 19th, 2008

In projects I’ve worked on over the past few years, I’ve noticed what seems like growing confusion about program exit codes and function return values. In particular, some programmers seem to feel these two separate things should be related somehow. They’re not related, nor should they be. In a nutshell: programs exit with 0 for success; boolean functions return 1 for success. In this article, I’ll discuss some of the motivation behind these separate and seemingly counter-intuitive conventions.

Function Return Values

As far as most computers (and their programming languages) are concerned, the value 0 is almost universally equivalent to “false”, while 1 (or, in C and many other languages, any non-zero value) equates to “true”. This is basic Boolean logic. Here is a typical construct in C:

int is_even(int num) {
        return (num & 1) == 0;  /* the lowest bit is clear for even numbers */
}

/* Later... */
if (is_even(some_number))
        printf("some_number is even\n");

Most programming languages have similar constructs. Even in languages that support exceptions or other error mechanisms, the tried and true Boolean is still appropriate in a large number of situations, enough for many languages to define a dedicated “bool” or “Boolean” type. In the above code, for example, it would not be at all appropriate to throw an exception if the number is odd. All of this is well-known and seldom questioned.

Program Exit Codes

Program exit codes, however, are a completely different concept. Unlike return values, which occur within a program, exit codes are essentially part of the program’s output. While the semantics differ slightly between operating systems, the basic theory is the same:

  • An exit code of 0 means success
  • A non-zero exit code means some kind of failure

Hence, to preserve the 0 = false thinking, it may be easier to think of exit codes as “error codes”: 0 means “no, there was no error”. DOS actually got this right; it calls its exit status variable %errorlevel%.
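To make the “error code” reading concrete, here is a minimal Bourne shell sketch. The helper name report_status is my own invention for illustration, not a standard command. It runs whatever command it is given and describes the exit code the same way the shell's if statement interprets it.

```shell
# report_status is a hypothetical helper, not a standard utility.
# It runs the command passed as its arguments and describes the exit code.
report_status() {
    if "$@"; then
        # Exit code 0: "no, there was no error".
        echo "no error"
    else
        # Any non-zero exit code is treated as an error;
        # $? still holds the exit code of the command that was tested.
        echo "error code $?"
    fi
}

report_status true            # true(1) always exits with 0
report_status false           # false(1) always exits with a non-zero code
report_status sh -c 'exit 3'  # a program choosing its own error code
```

The last call prints "error code 3": the number is whatever the program chose, and the shell only cares that it is not 0.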

Unix/Linux/Mac

Under Unix-like operating systems, exit codes are pretty well-defined:

0 Success
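1..125 Failure (the program itself will decide what the numbers mean)
126 Command found, but could not be executed
127 Command not found
128..254 The program did not exit normally (e.g., it crashed or received a signal; the shell reports 128 + the signal number)
255 Exit code out of range (e.g., a program calling exit(-1) is reported as 255)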

From a Bourne shell, the variable $? holds the exit code of the last command run in that shell. (For csh, use $status). For example, if I run:

$ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
$ echo $?
0

As you can see, grep actually found a match, which means it was successful. So, when I check the exit status ($?), the result is 0.

If a command isn’t successful (here, grep finds no matching line), it goes like this:

$ grep no-such-animal /etc/passwd
$ echo $?
1

If I attempt to run a command which doesn’t exist:

$ grepadoodle foo
-bash: grepadoodle: command not found
$ echo $?
127

Finally, let’s say I ran the following command, and then was forced to kill it from another shell:

$ grep "Jimmy Hoffa" /dev/random
# This command will run forever if I let it, so at this point I go to another shell and kill it
Terminated
$ echo $?
143

Why 143? The TERM signal (the default signal sent with the kill(1) command) has a value of 15. 128 + 15 = 143.
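The same arithmetic can be checked from a script. This is only a sketch under a few assumptions: it uses sleep(1) as a stand-in for the long-running grep, it assumes TERM is signal 15 (true on Linux and most Unix-like systems), and the variable names are mine.

```shell
# Start a long-running command in the background, then terminate it.
sleep 30 &
pid=$!
kill "$pid"                   # kill sends TERM (signal 15) by default

# wait reports the exit status of the terminated child.
status=0
wait "$pid" || status=$?

if [ "$status" -gt 128 ]; then
    # Statuses above 128 encode the signal number as 128 + N.
    signal=$((status - 128))
    # kill -l accepts an exit status and prints the matching signal name.
    echo "exit status $status = 128 + $signal ($(kill -l "$status"))"
fi
```

Writing the wait as "wait ... || status=$?" keeps the script working even when it is run with set -e, which would otherwise abort on the non-zero status.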

Why This Matters

Unix shells have built-in Boolean operators like && (logical and) and || (logical or) that act on the exit status of a command. These operators assume 0 = success (true) and non-zero = failure (false). This allows us to write constructs like the following (Bourne shell):

$ grep fluffy /etc/passwd && echo "Your user list may contain a poodle."

$ [ -f somefile.txt ] || cp somefile.txt.defaults somefile.txt

If a program’s exit code does not follow the 0 = success rule, the logic rapidly gets confusing.
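The same operators work with shell functions, because a function's return status in the shell follows the exit code convention rather than the C convention. Here is a small sketch; is_even is a hypothetical helper that mirrors the C function from earlier, except that it signals "true" with a status of 0.

```shell
# is_even is a hypothetical helper. Its return status follows the
# exit code convention: 0 (success) when the number is even,
# non-zero (failure) when it is odd.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

# && runs the right-hand command only if the left-hand one succeeded (status 0).
is_even 4 && echo "4 is even"

# || runs the right-hand command only if the left-hand one failed (non-zero status).
is_even 7 || echo "7 is odd"
```

Both echo commands run. If is_even returned 1 for "even", as a C programmer might expect, both lines would silently do the opposite of what they say.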

Windows/DOS

Here the situation is a little bit different, but not by much. In DOS/Windows batch files (or a command shell) the variable %errorlevel% contains the exit code of the last command executed, similar to Bourne shell’s $?. The semantics are similar:

0 Success
Non-zero Error (again, meaning depends on the individual program; DOS limits codes to 1..255, while Windows allows any 32-bit value)

Examples:

C:\> echo Hello
Hello

C:\> echo %errorlevel%
0

C:\> type nonexistent-file.txt
The system cannot find the file specified.

C:\> echo %errorlevel%
1

Where People Tend to Get Mixed Up

While virtually every built-in or well-known third-party command follows the above exit code rules, I have seen my share of proprietary applications that flip this logic around for no good reason. When speaking with the developers, I almost universally hear “but in [insert programming language here], 0 is false and 1 is true”. If you find yourself falling into this trap, just remember that the exit code of your program is part of its output, not part of its logic. You wouldn’t think of dumping a raw struct or object to the user’s terminal, would you? Instead, you format the output in a way that makes sense. Exit codes are no different.
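To see what a flipped exit code does to a caller, consider this sketch. Everything in it is made up for illustration: backwards_tool stands in for a hypothetical program that exits with 1 on success, and normalized is a hypothetical wrapper that restores the usual convention.

```shell
# backwards_tool stands in for a hypothetical program that exits with
# 1 on success and 0 on failure (the reverse of the convention).
backwards_tool() {
    if [ "$1" = "good-input" ]; then
        return 1    # "true" in the developer's mind, failure to the shell
    fi
    return 0
}

# The shell reads the result backwards, so this echo never runs:
backwards_tool good-input && echo "never printed: the shell saw a failure"

# normalized is a hypothetical wrapper that restores 0 = success
# by negating the tool's status with the shell's ! operator.
normalized() {
    ! backwards_tool "$@"
}

normalized good-input && echo "good-input processed successfully"
```

Every caller of such a tool needs a wrapper like this (or has to remember the inversion), which is exactly the kind of confusion a conventional exit code avoids.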

Occasionally, I even see people flipping the boolean logic within programs for no good reason (and often without documentation), which leads to all sorts of confusing and error-prone constructs like this:

if (did_it_work())
        fprintf(stderr, "Error\n");  /* Are you sure? */

There are a few reasonable exceptions to this rule, such as strcmp(3), which returns 0 if the strings are equal, < 0 if s1 is less than s2, and > 0 if s1 is greater than s2. However, functions like strcmp(3) and fork(2) are not boolean functions; they return a range of values instead of a simple truth value.

Summary

When in doubt, remember the following:

  1. Program exit codes are not function return values
  2. Boolean functions return 1 for true, 0 for false
  3. Programs exit with 0 on success, non-zero on failure

Keep it clean!