PWC 110 › Phone Number Validation

This post is part of a series on Mohammad Anwar’s excellent Perl Weekly Challenge, where Perl and Raku hackers submit solutions to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

Personal note: I’ve had to take a break from participating in the PWC, but I’m back for this week, at least. Hopefully I’ll be able to contribute more again.

The first task this week is a sort of simple phone number validation, based on provided templates. Numbers must match the following, where n is any decimal digit:

+nn  nnnnnnnnnn
(nn) nnnnnnnnnn
nnnn nnnnnnnnnn

Based on the provided sample output, it seems clear that leading and trailing whitespace are ignored. Internal whitespace is also compressed, as the first provided template has two spaces after +nn, yet the phone number +44 1148820341 is supposed to match.

Let’s try two different methods of matching, with Perl and Raku.

Perl

For Perl, I’m going to take the templates as plain text input. I’ve dumped them directly in the source for convenience, but they could just as easily be read from a file. Each template is trimmed of leading and trailing whitespace, and any internal run of spaces is compressed to a single space:

my %valid =
    map  { y/ / /sr => 1 }  # Squash runs of spaces; template => 1
    grep { length }         # Drop the empty field before the first newline
    split /\s*\n\s*/, q{
        +nn  nnnnnnnnnn
        (nn) nnnnnnnnnn
        nnnn nnnnnnnnnn
    };

Now, $valid{$string} exists (and is true) if and only if $string is exactly one of the normalized templates. All we need now is a sub to turn a given phone number into a template string, and check %valid:

# Check if a number matches any template in %valid
sub check_number {
    local $_ = shift;

    s/^\s+//, s/\s+$//; # Trim leading and trailing whitespace
    y/0-9/n/, y/ / /s;  # Replace digits, squash internal spaces

    return $_ if $valid{$_};
}

Each check is a single hash lookup, no matter how many templates there are, so checking all provided phone numbers takes time linear in the number of input lines:

print for grep { check_number($_) } <>;
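To make the lookup idea concrete, here is a minimal standalone sketch of the same technique. The helper name shape_of and the sample inputs are my own for illustration, and the template table is written out by hand with the spaces already squashed, so treat it as an approximation of the code above rather than a copy of it.

```perl
use strict;
use warnings;

# Normalized templates, written out by hand for this sketch.
my %valid = map { $_ => 1 } (
    '+nn nnnnnnnnnn',
    '(nn) nnnnnnnnnn',
    'nnnn nnnnnnnnnn',
);

# Reduce an input line to its "shape" (hypothetical helper name).
sub shape_of {
    my ($number) = @_;
    $number =~ s/^\s+//;    # Trim leading whitespace
    $number =~ s/\s+$//;    # Trim trailing whitespace
    $number =~ tr/0-9/n/;   # Every ASCII digit becomes "n"
    $number =~ tr/ / /s;    # Squash runs of spaces
    return $number;
}

# Illustrative inputs, assumed for this sketch.
my @samples = (
    '0044 1148820341',
    ' +44 1148820341',
    '  44-11-4882-0341',
    '(44) 1148820341',
    '  00 1148820341',
);

for my $sample (@samples) {
    my $shape = shape_of($sample);
    printf "%-22s %-18s %s\n", "'$sample'", $shape,
        $valid{$shape} ? 'valid' : 'invalid';
}
```

The third and fifth inputs reduce to nn-nn-nnnn-nnnn and nn nnnnnnnnnn, neither of which is a key in %valid, so they are reported as invalid.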

Leading and Trailing Whitespace Performance

When I trim leading and trailing whitespace, I sometimes do it with the following single regex:

s/^\s+|\s+$//g

However, the alternation makes that regex significantly slower than the two separate substitutions I used in check_number above (s/^\s+// followed by s/\s+$//). Somewhat unintuitively, breaking the regex apart into two statements is faster, and it doesn’t really cost anything in terms of readability. In this case, it made the code for this task nearly 50% faster overall, so it was definitely worth it.
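If you want to measure this on your own machine, something like the following sketch using the core Benchmark module should do. The sub names and the padded test string are assumptions of mine, and the numbers will vary with Perl version and input, so I have not included any expected output.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string: padded on both sides, like a sloppy input line.
my $input = (' ' x 8) . '+44 1148820341' . (' ' x 8) . "\n";

# Single regex with alternation
sub trim_alternation {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Two separate anchored substitutions
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Both subs must agree before timing them means anything.
die "trim results differ"
    unless trim_alternation($input) eq trim_two_step($input);

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { trim_alternation($input) },
    two_step    => sub { trim_two_step($input) },
});
```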

Other Approaches

It’s also possible to instead turn the templates into regular expressions (or even one big regex), and match against that, but the above way is significantly cleaner, and performs about as well.
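For comparison, here is a rough sketch of what the regex approach might look like. This is not code from my solution. The names template_to_pattern and $valid_re are made up for the example, and I have chosen to let any run of whitespace stand in for the template's internal spaces.

```perl
use strict;
use warnings;

my @templates = ('+nn  nnnnnnnnnn', '(nn) nnnnnnnnnn', 'nnnn nnnnnnnnnn');

# Turn one template into a regex fragment (hypothetical helper).
sub template_to_pattern {
    my ($template) = @_;
    my $pattern = '';

    # Walk the template one token at a time: a whitespace run or a single character.
    for my $token ($template =~ /\s+|\S/g) {
        $pattern .= $token eq 'n'   ? '[0-9]'   # Digit placeholder
                  : $token =~ /^\s/ ? '\s+'     # Any run of whitespace
                  :                   quotemeta $token;   # Literal punctuation
    }
    return $pattern;
}

# One big regex: optional outer whitespace around any of the templates.
my $alternatives = join '|', map { template_to_pattern($_) } @templates;
my $valid_re     = qr/\A\s*(?:$alternatives)\s*\z/;

for my $line ('0044 1148820341', ' +44 1148820341', '44-11-4882-0341') {
    print "$line\n" if $line =~ $valid_re;
}
```

One difference worth knowing about: because digits are matched with [0-9] here, an input containing a literal n in place of a digit is rejected, whereas the template lookup compares against strings that themselves contain n.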

Raku

In Raku, I opted to use a grammar. It breaks down phone numbers into optional leading whitespace, a country code, internal whitespace, the local (ten-digit) portion of the number, and optional trailing whitespace.

grammar Phone-Number {
  token TOP   { <ows> <cc> <iws> <local> <ows> }
  token cc    { \+ \d\d | \( \d\d \) | \d ** 4 }
  token ows   { \s*      }
  token iws   { \s+      }
  token local { \d ** 10 }
}

I kept it simple, but it should be fairly obvious how this could be extended to parse additional phone number formats. There are easily hundreds of different dialing conventions around the world, and parsing all of them would require the sort of heavy lifting grammars can provide.

Once the grammar is defined, running standard input through it is easy:

.say for lines.grep: { Phone-Number.parse($_) }

That prints the matching phone numbers like the challenge asks. However, all of the tokens in the grammar are now available to you! For example, the match object for the first matching phone number looks like this:

「0044 1148820341」
 ows => 「」
 cc => 「0044」
 iws => 「 」
 local => 「1148820341」
 ows => 「」

Depending on what you need the phone numbers for, you could modify the grammar to tokenize them just the way you like. Of course, regular expressions can do just as well, but grammars provide a powerful, expressive language.
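As a sketch of the regular expression route, here is roughly the same breakdown in Perl using named captures. The capture names cc and local mirror the grammar's tokens, but the regex itself is my own approximation of the grammar, and it matches ASCII digits only, whereas Raku's \d also accepts other Unicode digits.

```perl
use strict;
use warnings;

# Rough Perl counterpart to the grammar's TOP token.
my $phone_re = qr/
    \A \s*
    (?<cc>    \+[0-9]{2} | \([0-9]{2}\) | [0-9]{4} )   # Country code
    \s+                                                # Internal whitespace
    (?<local> [0-9]{10} )                              # Local number
    \s* \z
/x;

# Illustrative inputs, assumed for this sketch.
for my $line ('0044 1148820341', ' +44 1148820341', '44-11-4882-0341') {
    if ($line =~ $phone_re) {
        print "cc=$+{cc} local=$+{local}\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

After a successful match, the pieces are available through the %+ hash, much like the named tokens in the Raku match object.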

Full Code

The complete code for this week’s solutions is available on GitHub.
