PWC 289 › 3rd Maximum and Jbmueld Wrods

This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

Task 1 – 3rd Maximum

Task 1 this week is to take a list of integers and find the 3rd unique maximum value, or if there are fewer than 3 unique values, return the overall maximum. Examples:

Integers3rd Maximum
5, 6, 4, 14
4, 55
1, 2, 2, 31

To accomplish this, I just sort the input integers and get the uniq list from that (from core module List::Util). The sort is O(n log n), but it’s only done once, so that’s our overall complexity.

It would also be possible to use a sliding window for this, but then every insert would be more expensive, up to O(wn) complexity where w is the window size. For constant w where w << n, this might be a win, but Perl’s slow looping speed might make that constant factor dominate, and the code would have been more complex. Still, it could have been fun to experiment with some profiling, given more time.

The general case solution is extremely simple:

sub nth_max {
    my $n = shift;
    my @uniq_sorted = uniq sort { $b <=> $a } @_;

    $uniq_sorted[ @uniq_sorted >= $n ? $n - 1 : 0 ]
}

And if you want a dedicated wrapper for the 3rd max to match the problem description:

sub third_max { nth_max(3 => @_) }

Task 2 – Jumbled Words

I wrote this task. I’ll include my preamble:

An Internet legend dating back to at least 2001 goes something like this:

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

This supposed Cambridge research is unfortunately an urban legend. However, the effect has been studied. For example—and with a title that probably made the journal’s editor a little nervous—Raeding wrods with jubmled lettres: there is a cost (PubMed, paywalled) by Rayner, White, et. al. looked at reading speed and comprehension of jumbled text.

The task is essentially to create jumbled text like that classic example. The first and last letters stay the same, but everything else is scrambled. Whitespace, punctuation, capitalization, and word order do not change.

This turns out to be a one-liner:

perl -MList::Util=shuffle -ne'print s!\b\w\K(\w\w+?)(?=\w\b)!join "", shuffle split //, $1!egr'

Written instead as a function that takes and returns a string:

sub readable { $_[0] =~ s! \b\w\K (\w\w+?) (?=\w\b)
                         ! join '', shuffle split //,$1 !egrx }

Note that I’m using \w here mainly for brevity in this blog post. Matching only letters requires [[:alpha:]] instead. I’ll still say “letters” here for clarity, but just know that this code will also scramble any \w word character, which includes numbers and a few other characters such as underscore, depending on your Perl version.

The regexp is the same in the one-liner and function; I just added /x to include some whitespace to make it a bit more readable. Here’s what each piece does:

  • \b\w\K – This looks for a word boundary, matches the first letter, and then asserts a zero-width positive lookbehind, which here has the purpose of ignoring that first letter later.
  • (\w\w+?) – Matches at least two letters, but does so non-greedily so we can still match the last letter in the word later. The braces capture the result into \1 or \$1.
  • (?=\w\b) – Matches the last letter in the word, and the word boundary. The (?= ...) makes the group a positive lookaround, meaning it will be matched and consumed, but not replaced.

The (eval’d) substitution simply split()s the middle of the word into letters, shuffle()s them up, and join()s them back together.

Flexible Matching

To atone for my sins with \w, here’s a version with flexible matching:

sub readable_flex {
    my $m = shift;
    $_[0] =~ s! \b$m\K ($m{2,}?) (?=$m\b)
              ! join '', shuffle split //,$1 !egrx
}

That one has a different calling syntax; the first argument ($m) is any regexp that matches the characters to be matched (including beginning and end), so jumbling words with only English letters looks like this:

readable_flex(qr/[[:alpha:]]/ => 'Word 12345.') # 12345 is not jumbled

Why [[:alpha:]] and not qr/[a-z]/i?

You might see [a-z] used for patterns like this. That only works if you know for a fact that your input only contains ASCII (and actually only lower (7-bit) ASCII, at that!). Even if your input is guaranteed to be English, we use many loanwords and proper nouns that use diacritical marks other Unicode letters. Touché!

Leave a Reply

Your email address will not be published. Required fields are marked *