PWC 161 › Pangrams

This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

For this week’s second challenge, the task is to generate a pangram.

A pangram is simply a collection of words that contain every letter in the alphabet (English, in this case). A classic pangram example is:

the quick brown fox jumped over the lazy dog

Wikipedia

Algorithm

There are a lot of different ways to approach this problem, but I settled on a simple iterative approach in O(n²). It does not find the shortest, nor is it the fastest, but it does a pretty good job of both.

I also tried a bit mask approach on a 26-bit integer, but that was needlessly complicated and slower.

Perl code

 sub pangram {
    my @pangram;  # Pangram gets built here
    my %has;

    # Trade some space for time
    my %words = map { $_ => [ uniq split // ] } @_;

    while (keys %has < 26) {
        my %best = (word => undef, score => -26);

        for my $word (keys %words) {
            my $new = grep { !$has{$_} } @{$words{$word}};
            if ($new == 0) {
                delete $words{$word};
                next;
            }

            my $score = $new * 2 - length;
            %best = (word => $word, score => $score)
                if $score > $best{score};
        }

        # Put the best word in the @pangram
        push @pangram, $best{word};
        $has{$_} = 1 for @{$words{$best{word}}};
    }

    @pangram;
}

GitHub: perl/ch-2.pl

The calling syntax is simply pangram(@words) where @words is the dictionary. I read the dictionary from the file Mohammad checked into the git repository, but you can point it at any dictionary you like by giving the script a --dict=/path/to/dict.txt argument.

The script also takes a --min=length argument to filter out words below the specified minimum length. (Default is 4.)

Algorithm Remarks

We first do some housekeeping: the %has hash keeps track of the letters that we already have in our @pangram, built up over time. If $has{x} is false, we don’t have the letter x in our pangram, yet.

Then we make a hash of %words, and each word is mapped to an array ref of its unique characters. For example, ‘coffee’ would map to [ 'c', 'o', 'f', 'e' ]. Order doesn’t matter.

Each time through the main loop, we look for the %best word to add to the pangram. I have arbitrarily defined a heuristic of “new letters, minus existing letters”. We therefore tend to get words with a lot of new letters, without getting many duplicate letters.

This greedy method does not generate the shortest pangram, but rather a “pretty short” pangram in O(n2) time. Since we’re removing many words from %words on each iteration, the average time is significantly faster than n2.

Leave a Reply

Your email address will not be published.