This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)
Task #2 this week is another one of mine. The idea is to take any number of directory names on the command line (and we must support at least three), and output a side-by-side listing of the files that differ between them. This is a skin-deep analysis: there is no recursion, and no comparison of file sizes, modification times, or contents, although those would be worthy additions if someone wants to take it further.
I will be implementing a fairly basic version, although I have a more complex version I’ve been using for years that I might be persuaded to release if there is interest.
Data Design
I am going to have one main data structure, %dirs, that will be built up as we traverse the directories. It will hold all directory and file information, as well as a maxlen value for each directory, for ease of formatting columns later. Here’s what %dirs might look like, given a very simple set of input directories:
%dirs = (
    dir_a => {
        files => {
            "Arial.ttf"      => 1,
            "Comic_Sans.ttf" => 1,
        },
        maxlen => 14,
    },
    dir_b => {
        files => {
            "Arial.ttf"       => 1,
            "Comic_Sans.ttf"  => 1,
            "Courier_New.ttf" => 1,
        },
        maxlen => 15,
    },
)
For convenience, I’ll also keep:

@dirs – List of directories, with original order preserved (from @ARGV)
@uniq – Unique filenames across all directories, in sorted order
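One way @uniq might be built is by merging every directory’s files keys; here’s a minimal sketch (not necessarily how the real script does it), using uniq from the core List::Util module (version 1.45 or newer) and a trimmed-down %dirs:

```perl
use strict;
use warnings;
use List::Util qw(uniq);

# Trimmed-down %dirs from the example above
my %dirs = (
    dir_a => { files => { 'Arial.ttf' => 1, 'Comic_Sans.ttf'  => 1 } },
    dir_b => { files => { 'Arial.ttf' => 1, 'Courier_New.ttf' => 1 } },
);
my @dirs = qw(dir_a dir_b);   # @ARGV order in the real script

# Merge every directory's filename keys, de-duplicate, then sort
my @uniq = sort( uniq( map { keys %{ $dirs{$_}{files} } } @dirs ) );

print "@uniq\n";   # Arial.ttf Comic_Sans.ttf Courier_New.ttf
```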
Building %dirs
Building up %dirs is the heart and soul of this script, but it is surprisingly easy:
sub read_all_dirs {
    map {
        my $dir = $_;
        my %hash = map  { $_ => 1 }
                   map  { -d "$dir/$_" ? "${_}/" : $_ }
                   grep { -f "$dir/$_" or -d "$dir/$_" } read_dir($dir);
        $dir => {
            files  => \%hash,
            maxlen => max map length, keys %hash, $dir,
        }
    } @_
}
I did use read_dir() from File::Slurper, but I hope you’d agree it would not be difficult to avoid that non-core module dependency if I wanted to.
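For instance, a drop-in read_dir() needs only opendir/readdir from core Perl; a rough sketch, with error handling kept minimal:

```perl
use strict;
use warnings;

# A core-only stand-in for File::Slurper's read_dir(): returns the
# entries of $dir, minus the "." and ".." pseudo-entries.
sub read_dir {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @entries;
}
```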
The my %hash = map { ... } map { ... } grep { ... } read_dir() section is where the magic happens. Read from the bottom up: we get the list of everything in $dir, then we filter out anything that isn’t a regular file or directory (e.g., symlinks, devices, etc.), then append a / if it’s a directory. The last map simply turns each file/dir name into a hash key/value pair with a true value (1).
The next block ($dir => { ... }) creates the directory hash for $dir, with all of the files, and we also calculate the maxlen (maximum length of the filenames in $dir, or of $dir itself if it is the longest) for later use.
Formatting and printing the results
I want every directory to have its own column, and for that column to be as wide as the longest filename. As I will need headings, a divider, and then several rows, I don’t want to repeat myself, so I’ll define a printf-style format string I can use for all three cases:
my $fmt = join(" | ", map { "%-$dirs{$_}{maxlen}s" } @dirs) . "\n";
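To make that concrete, with the maxlen values from the two-directory sample earlier (14 for dir_a, 15 for dir_b), the resulting template looks like this:

```perl
use strict;
use warnings;

# maxlen values borrowed from the sample %dirs above
my %dirs = (
    dir_a => { maxlen => 14 },
    dir_b => { maxlen => 15 },
);
my @dirs = qw(dir_a dir_b);

my $fmt = join(" | ", map { "%-$dirs{$_}{maxlen}s" } @dirs) . "\n";
print $fmt;   # %-14s | %-15s
```

The %-14s conversion left-justifies its argument in a 14-character field, which is exactly what we need for ragged-right columns.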
We’re most of the way there now. Let’s output the headings and horizontal divider:
printf $fmt, @dirs;
printf $fmt, map { '-' x $dirs{$_}{maxlen} } @dirs; # Divider
Finally, we loop over all @uniq filenames, and output (using $fmt) a line for any filename that does not exist in all @dirs:
for my $file (@uniq) {
    my @files = map { $dirs{$_}{files}{$file} ? $file : '' } @dirs;
    next if all { length } @files;   # Exists in all directories
    printf $fmt, @files;
}
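One detail worth calling out: max and all are not built-ins; they come from the core List::Util module (all requires List::Util 1.33 or newer). A quick demonstration of both, on a made-up row:

```perl
use strict;
use warnings;
use List::Util qw(max all);

my @files = ( 'Arial.ttf', '', 'Comic_Sans.ttf' );   # '' marks a missing file

my $longest = max map { length } @files;   # length of the longest name: 14
my $in_all  = all { length } @files;       # false: one entry is empty

print "longest=$longest\n";
```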
Easy! On the sample input, it looks something like:
$ ./ch-2.pl dir_*
dir_a          | dir_b           | dir_c
-------------- | --------------- | ---------------
Comic_Sans.ttf | Comic_Sans.ttf  |
               | Courier_New.ttf | Courier_New.ttf
Georgia.ttf    |                 |
               |                 | Monaco.ttf
Old_Fonts/     |                 |
               | Tahoma.ttf      |
That’s it! See you next week.