PWC 161 › Abecedarian Words

This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

I know, it’s been a long time!

This week, I’m back. I submitted the challenges for this week, so I thought it’s only fair that I give them a go myself. Unfortunately, I realized too late that this Abecedarian Words is essentially the same as Week 111’s “Ordered Letters” submitted by the venerable E. Choroba. I thought we hadn’t done it. Nuts. I’m sorry.

Challenge Description

An abecedarian word is a word whose letters are arranged in alphabetical order. For example, “knotty” is an abecedarian word, but “knots” is not. Output or return a list of all abecedarian words in the dictionary, sorted in decreasing order of length.

From: PWC Challenge #161

Pure Perl

First off, one assumption: the dictionary contains only ASCII letters a..z. This turns out to be a fair assumption, because I supplied the dictionary. :-)

For this one, it’s tempting to simply sort the letters and compare:

sub is_abcd_sort { $_ eq join '', sort split // }

Since words are, on average, short, that actually performs decently well, and since you’re likely to simply run this once over your list of words at the start of a script, it’s probably fine.

However, sometimes there are lessons to be learned by optimizing an elegant one-liner to within an inch of its life, so here we go. I came up with four pure-Perl solutions, including the sort one from above:

sub is_abcd_reduce { '~' ne reduce { $a gt $b ? '~':$b } split // }

sub is_abcd_regex  { /^a*b*c*d*e*f*g*h*i*j*k*l*m*n*
                       o*p*q*r*s*t*u*v*w*x*y*z*$/x }

sub is_abcd_sort   { $_ eq join '', sort split // }

sub is_abcd_loop   {
    my $last;
    for my $ch (split //) {
        return if $last gt $ch;
        $last = $ch;
    }
    $_;
}

GitHub: perl/ch-1.pl

The reduce version is a little cheeky. The ASCII character “~” will compare higher than any letter, so I use that as a sentinel. If any letter is greater than the previous letter, the tilde will poison the result. It’s inefficient, as it always loops over the entire word, rather than bailing out early, but importantly, there is also a subroutine call for every letter. Perl sub calls are very expensive, as there is a lot going on under the hood.

The regex simply matches any word that contains zero or more of each letter, in order, and nothing else. Easy. Unfortunately, also kind of slow, and this would not extend as easily to non-ASCII alphabets.

The loop is perhaps a bit less Perlish, but I don’t hold that against it. It’s a natural feeling solution to this problem, and as you’ll see, it holds its own.

Inline::C

Since the simple looping version is already a top performer, and almost looks like C code already, let’s convert it to Inline::C.

/* This does the actual checking */
int __is_abcd(unsigned char *s) {
    unsigned char last = 0;
    for (unsigned char *p = s; *p; last = *p, p++ )
        if (last > *p)
            return 0;

    return 1;
}

/* Boolean, works on $_ */
int is_abcd_inline() {
    SV *var = get_sv("_", GV_ADD);
    unsigned char *s = SvPVutf8_nolen(var);

    return __is_abcd(s);
}

I used a wrapper here so we could keep the same calling syntax as the other pure Perl subs and also write another C function that you’ll see in a minute. Here is how I call all of the is_abcd...() subs:

my @abcd = &is_abcd_inline, @words;

Now, since all of this is running in a tight loop over a large list of @words, I decided to write one more C function to operate on the entire list of @words and return all abecedarian words. If you’ve never worked with Perl guts, some of this might look a little strange, but the stack is where we get all of our arguments (words) from, and we shove the results on the stack as well.

/* Process the entire list */
void abcd_words(SV *word, ...) {
    Inline_Stack_Vars;

    Inline_Stack_Reset;
    for (int i = 0; i < Inline_Stack_Items; i++) {
        if (__is_abcd(SvPV(Inline_Stack_Item(i), PL_na)))
            Inline_Stack_Push(Inline_Stack_Item(i));
    }
    Inline_Stack_Done;
}

Benchmarking Results

First, let’s quickly acknowledge that all of the different implementations run in linear time. Keep that in mind.

My ch-1.pl script can be run with --benchmark to benchmark all of the above on your system. On mine, here are the results:

         Rate reduce  regex   sort   loop inline inlAll
reduce 8.70/s     --    -9%   -25%   -40%   -89%   -98%
regex  9.52/s    10%     --   -18%   -34%   -88%   -98%
sort   11.6/s    33%    22%     --   -20%   -85%   -97%
loop   14.5/s    67%    53%    25%     --   -81%   -97%
inline 76.7/s   782%   705%   561%   428%     --   -83%
inlAll  454/s  5117%  4663%  3812%  3023%   492%     --

These results are not surprising, although I had hoped the regex would do a little better than it did. The simple loop did very well compared to more “clever” solutions, but the two inline solutions were the (very) clear winners. Perl is efficient for a lot of tasks, but tight looping is not one of them.

Final Thoughts

Which one would I deploy in the real world? Probably “loop”. It’s clear, reasonably concise, and likely “fast enough” for any application. It processes the entire dictionary in 69ms, and there’s no reason to do that more than once.

If it turned out there was some arcane use case for looping over dictionaries repeatedly, I might be inclined to ship the C version (as XS) with the pure Perl fallback.

Leave a Reply

Your email address will not be published. Required fields are marked *