PWC 301 › Hamming Distance and Large Numbers

This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

Task 1 › Large Numbers

The first task this week has us taking a list of numbers and finding the permutation whose concatenation gives the largest number. For example: largest(20,3) = 320.

At first, I thought I might get away with a greedy join('', reverse sort @ints) but the greedy approach doesn’t work in this case. For example, join('', reverse sort(5, 50, 51)) would give 51505, but the correct answer is 55150. So, a more deliberate approach is needed. However, see the update below this section!

What I settled on is a relatively simple recursive algorithm that starts with the full list of @ints, sort that alphanumerically (in reverse order), and will then recurse for each integer that starts with the highest remaining digit. Clear as mud? Here’s a worked example:

largest(5 50 51):
  $sub->('' => reverse sort(5 50 51)) # 51 50 5
    $sub->(51 => 50 5)
      $sub->(5150 => 5) = 51505  # new max
      $sub->(515 => 50) = 51550  # new max
    $sub->(50 => 51 5)
      $sub->(5051 => 5) = 50515
      $sub->(505 => 51) = 50551
    $sub->(5 => 51 50)
      $sub->(551 => 50) = 55150  # new max
      $sub->(550 => 51) = 55051

The maximum value is kept in a closure and returned by the parent function.

Here is the complete subroutine:

sub largest(@ints) {
    my $sub = sub {
        my $num = shift;
        return $num.($_[0]//0) if @_ <= 1;
        my $max_val = 0; # Our candidate answer

        # Loop until current number starts with a lower digit than previous
        for my $i (0..$#_) {
            last if $i > 0 and substr($_[$i], 0, 1) ne substr($_[0], 0, 1);

            my @new_ints = @_;
            splice @new_ints, $i, 1; # Remove $i-th element

            my $val = __SUB__->($num.$_[$i], @new_ints);
            $max_val = length($val) > length($max_val) ? $val
                            : $val gt        $max_val  ? $val : $max_val;
        }

        $max_val;
    };

    return $sub->('',reverse sort @ints);
}

2025-01-04: Updated to use a stringwise numerical max calculation, which handles large numbers better.

I also got this great suggestion from long-time contributor wanderdoc in the comments below this post, which bears highlighting. His version is as follows:

sub largest(@ints) { join '', sort { $b.$a <=> $a.$b } @ints }

It seems my Holiday brain didn’t take my initial hunch of join '', sort { ... } quite far enough! This is exactly the kind of concise solution I like, and even handles large results just fine, so long as no pair of numbers in @ints concatenate into a value larger than the maximum integer.

Task 2 › Hamming Distance

Hamming distance calculations are a simple form of error detection. It measures the number of changes needed to turn one value into another. With integers x and y (represented in binary), the Hamming distance is simply the number of bits that differ between the two numbers. Note that x ^ b (XOR or exclusive or) will give us a 1 for any bit that differs, so we simply count the 1s in x ^ b. For example:

x   = 15 = 1111b
y   =  4 = 0100b
----------------
x^y = 11 = 1011b -> 3 1s, so hamming(15,4) = 3

In Perl, the Hamming distance between two numbers can therefore be implemented simply:

sub hamming($x,$y) { ()= sprintf('%b', $x ^ $y) =~ /1/g }

If you are squinting at the ()= in that expression, then let me introduce you to the =()= operator in Perl!

You can see I’m converting $x ^ $y to binary, and then running it through a simple regex, /1/g. The /g makes the regex return all of the matches, but we’re not interested in getting a bunch of 1s; we want to know how many matches there were. So the ()= puts the expression into list context, and then back into scalar context.

If you instead wanted to assign the result to a variable, you could write it as follows:

my $hamming_distance =()= sprintf('%b', $x ^ $y) =~ /1/g

That, of course, only calculates the Hamming distance between two numbers, whereas the task asks us to calculate the Hamming distance between all numbers, and sum the result. Why? I have no idea. I’m not aware of any good reason to do this, except that it’s a fun bit of recreational coding.

sub hamming_multi(@ints) {
    my $dist = 0;

    for my $i (0..$#ints-1) {
        $dist += hamming($ints[$i], $_) for @ints[$i+1..$#ints];
    }

    $dist;
}

As you can see, a simple nested loop does the trick.

Going faster: C implementation

The gcc compiler provides access to the __builtin_popcount() builtin, which can often calculate a single Hamming distance extremely quickly. Most modern CPUs have some kind of “popcount” instruction that counts the number of 1s in an integer, in a single instruction. On x86 for example, the POPCNT instruction does all of the heavy lifting, and __builtin_popcount() will use this platform-specific instruction (or one like it) if available. So our C function is simply:

int hamming(int x, int y) {
    return __builtin_popcount(x ^ y);
}

The hamming_multi() instruction is very similar to the Perl version:

int hamming_multi(int count, int ints[]) {
    int dist = 0; // Total Hamming distance

    for (int i = 0; i < count - 1; i++)
        for (int j = i + 1; j < count; j++)
            dist += hamming(ints[i], ints[j]);

    return dist;
}

5 Replies to “PWC 301 › Hamming Distance and Large Numbers”

dima says:

December 30, 2024 at 04:28

x = 15 = 1111b
y = 8 = 0100b
—————-
x^y = 11 = 1011b -> 3 1s, so hamming(15,8) = 3

=================

except that 0100b is 4, not 8.

x = 15 = 1111b
y = 8 = 1000b
—————-
x^y = 7 = 0111b -> 3 1s, so hamming(15,8) = 3

1. Ryan Thompson says:
  
  December 30, 2024 at 07:11
  
  Indeed. Thanks for the catch!
  
wanderdoc says:

January 4, 2025 at 07:19

return join(”, sort { $b.$a $a.$b } @_); returns 55150 with (5, 50, 51).

1. Ryan Thompson says:
  
  January 4, 2025 at 10:51
  
  Hey wanderdoc! Good to hear from you. That’s a lovely way to solve the problem. And to top it off, while checking it against my test cases, I also found that a couple of my test cases were wrong, and my code doesn’t handle large numbers as well as I thought it did. Dang. I blame it on being hopped up on turkey, sage, and cranberry sauce. 🙂
  
wanderdoc says:

January 4, 2025 at 11:14

Hi, I am happy you found it useful 🙂 I learned a lot from you.

Task 1 › Large Numbers

Task 2 › Hamming Distance

Going faster: C implementation

5 Replies to “PWC 301 › Hamming Distance and Large Numbers”

Leave a Reply to Ryan Thompson Cancel reply