This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)
For this week’s second challenge, the task is to generate a pangram.
A pangram is simply a collection of words that contain every letter in the alphabet (English, in this case). A classic pangram example is:
the quick brown fox jumped over the lazy dog
Wikipedia
Algorithm
There are a lot of different ways to approach this problem, but I settled on a simple iterative approach in O(n²). It does not find the shortest, nor is it the fastest, but it does a pretty good job of both.
I also tried a bit mask approach on a 26-bit integer, but that was needlessly complicated and slower.
Perl code
sub pangram {
my @pangram; # Pangram gets built here
my %has;
# Trade some space for time
my %words = map { $_ => [ uniq split // ] } @_;
while (keys %has < 26) {
my %best = (word => undef, score => -26);
for my $word (keys %words) {
my $new = grep { !$has{$_} } @{$words{$word}};
if ($new == 0) {
delete $words{$word};
next;
}
my $score = $new * 2 - length;
%best = (word => $word, score => $score)
if $score > $best{score};
}
# Put the best word in the @pangram
push @pangram, $best{word};
$has{$_} = 1 for @{$words{$best{word}}};
}
@pangram;
}
GitHub: perl/ch-2.pl
The calling syntax is simply pangram(@words)
where @words
is the dictionary. I read the dictionary from the file Mohammad checked into the git repository, but you can point it at any dictionary you like by giving the script a --dict=/path/to/dict.txt
argument.
The script also takes a --min=length
argument to filter out words below the specified minimum length. (Default is 4.)
Algorithm Remarks
We first do some housekeeping: the %has
hash keeps track of the letters that we already have in our @pangram
, built up over time. If $has{x}
is false, we don’t have the letter x
in our pangram, yet.
Then we make a hash of %words
, and each word is mapped to an array ref of its unique characters. For example, ‘coffee’ would map to [ 'c', 'o', 'f', 'e' ]
. Order doesn’t matter.
Each time through the main loop, we look for the %best
word to add to the pangram. I have arbitrarily defined a heuristic of “new letters, minus existing letters”. We therefore tend to get words with a lot of new letters, without getting many duplicate letters.
This greedy method does not generate the shortest pangram, but rather a “pretty short” pangram in O(n2) time. Since we’re removing many words from %words
on each iteration, the average time is significantly faster than n2.