This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)
The zip6
function has long been a staple of the List::MoreUtils CPAN module. The Week 40 challenge #1 describes it very well. But even more succinctly: zip6
takes an array of arrays (AoA) and returns another AoA of all the 1st elements, then the 2nd, and so on.
It’s not a difficult algorithm by any stretch. In fact, the description more or less tells you how to do it. Simply iterate through all of the arrays in parallel, building up the resulting array as you go.
With a simple nested map
(and borrowing max
from core module List::Util, though that isn’t strictly necessary), the solution is easy:
sub my_zip6(\@\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@) {
my $max = max map $#$_, @_; # Use the size of the longest array
map { my $i = $_; [ map $_->[$i], @_ ] } 0..$max;
}
That prototype does look a bit silly, but I was trying to match List::MoreUtils’ behaviour of accepting bare arrays, like so: my_zip6 @a1, @a2, @a3;
But if we don’t care about compatibility with List::MoreUtils, we can allow an arbitrary number of arrays (including anonymous arrays) by accepting refs instead.
At minimum, all we need to do is provide a different prototype! Since the \@
in the original prototype coerces the input arrays to refs already, the code is exactly the same, and unsurprisingly, performs exactly the same as well.
However, for a robust implementation, we might be tempted to add type checking, since Perl will no longer ensure all arguments are arrays. In fact, it would actually die with a somewhat useful error message (Not an ARRAY reference
), but the line number of the error will be the line number within the sub, not that of the caller. We can fix that by doing our own check and failing with Carp‘s croak
.
Here is the whole sub:
sub zip6ref_check($$;@) {
croak "Not an ARRAY ref" if any { 'ARRAY' ne ref } @_; # Type check
my $max = max map $#$_, @_;
map { my $i = $_; [ map $_->[$i], @_ ] } 0..$max;
}
Now the line number of the error will point to the caller, which is far more useful.
Unfortunately, the additional validation slows things down significantly on small arrays, cutting the performance almost in half. See below for the benchmark results. As always, there are tradeoffs for performance, convenience, and safety.
As always, my Perl 5 solution is available on GitHub.
How does List::MoreUtils do it?
It turns out the pure Perl List::MoreUtils::PP’ zip6
code is almost identical to my first solution, save for even more \@
s in the prototype. There just aren’t many good ways to iterate through n arrays.
sub zip6 (\@\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@)
{
my $max = -1;
$max < $#$_ && ($max = $#$_) foreach @_;
map {
my $ix = $_;
[map $_->[$ix], @_];
} 0 .. $max;
}
Also, List::MoreUtils chose to roll their own maximum calculation, rather than using List::Utils max. For small arrays, at least, the performance difference is negligible.
Going faster: List::Utils::XS
To go even faster, List::MoreUtils::XS has a concise C solution that runs blazingly fast. Here are the numbers for all the versions, brought to you by Benchmark‘s cmpthese
:
Rate ref w/chk ref my_zip6 zip6(PP) zip6(XS)
ref w/chk 531258/s -- -39% -41% -42% -79%
ref 873274/s 64% -- -2% -5% -66%
my_zip6 893209/s 68% 2% -- -3% -65%
zip6(PP) 921625/s 73% 6% 3% -- -64%
zip6(XS) 2535241/s 377% 190% 184% 175% --
List::MoreUtils::XS is not installed along with List::MoreUtils, so you’re missing out if you aren’t already using it. However, if you do have the XS version installed, don’t be tempted to directly use
it in your code. use List::MoreUtils
instead; it will upgrade to the XS version if it’s installed, or gracefully fall back to pure-Perl when running on systems without the XS version.
Raku Version
Raku has a zip routine that should need no introduction, as it works more or less identically to the zip6 routines we just looked at.
.Str.say for zip @a1, @a2, @a3;
The .Str
is necessary to convert each returned array to a string representation, to get rid of the brackets. Without it, the output doesn’t quite match the challenge specification.
There is also the builtin zip operator (Z), which is both infix and chaining, so it can support as many arrays as you like:
.Str.say for @a1 Z @a2 Z @a3;
It’s just that easy.