This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)
Task #2 this week is another one of mine. The idea is to take any number of directory names on the command line (and we must support at least three), and output a side-by-side listing of the files that differ between them. This is a skin-deep analysis: there is no recursion, and no comparison of file sizes, modification times, or contents, although those would be worthy additions if someone wants to take it further.
I will be implementing a fairly basic version, although I have a more complex version I’ve been using for years that I might be persuaded to release if there is interest.
Data Design
I am going to have one main data structure, %dirs, that will be built up as we traverse the directories. It will hold all directory and file information, as well as a maxlen value for each directory, for ease of formatting columns later. Here’s what %dirs might look like, given a very simple set of input directories:
%dirs = (
    dir_a => {
        files => {
            "Arial.ttf"      => 1,
            "Comic_Sans.ttf" => 1,
        },
        maxlen => 14,
    },
    dir_b => {
        files => {
            "Arial.ttf"       => 1,
            "Comic_Sans.ttf"  => 1,
            "Courier_New.ttf" => 1,
        },
        maxlen => 15,
    },
)
For convenience, I’ll also keep:

@dirs – List of directories, with original order preserved (from @ARGV)
@uniq – Unique filenames across all directories, in sorted order
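One way @uniq might be built is by merging every directory’s files keys; here’s a minimal sketch (not necessarily how the real script does it), using uniq from the core List::Util module (version 1.45 or newer) and a trimmed-down %dirs:

```perl
use strict;
use warnings;
use List::Util qw(uniq);

# Trimmed-down %dirs from the example above
my %dirs = (
    dir_a => { files => { 'Arial.ttf' => 1, 'Comic_Sans.ttf'  => 1 } },
    dir_b => { files => { 'Arial.ttf' => 1, 'Courier_New.ttf' => 1 } },
);
my @dirs = qw(dir_a dir_b);   # @ARGV order in the real script

# Merge every directory's filename keys, de-duplicate, then sort
my @uniq = sort( uniq( map { keys %{ $dirs{$_}{files} } } @dirs ) );

print "@uniq\n";   # Arial.ttf Comic_Sans.ttf Courier_New.ttf
```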
Building %dirs
Building up %dirs is the heart and soul of this script, but it is surprisingly easy:
sub read_all_dirs {
    map {
        my $dir = $_;
        my %hash = map  { $_ => 1 }
                   map  { -d "$dir/$_" ? "${_}/" : $_ }
                   grep { -f "$dir/$_" or -d "$dir/$_" } read_dir($dir);
        $dir => {
            files  => \%hash,
            maxlen => max map length, keys %hash, $dir,
        }
    } @_
}
I did use read_dir() from File::Slurper, but I hope you’d agree it would not be difficult to avoid that non-core module dependency if I wanted to.
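For instance, a drop-in read_dir() needs only opendir/readdir from core Perl; a rough sketch, with error handling kept minimal:

```perl
use strict;
use warnings;

# A core-only stand-in for File::Slurper's read_dir(): returns the
# entries of $dir, minus the "." and ".." pseudo-entries.
sub read_dir {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @entries;
}
```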
The my %hash = map { ... } map { ... } grep { ... } read_dir() section is where the magic happens. Read from the bottom up: we get the list of everything in $dir, then we filter out anything that isn’t a regular file or directory (e.g., symlinks, devices, etc.), then append a / if it’s a directory. The last map simply turns each file/dir name into a hash key/value pair with a true value (1).
The next block ($dir => { ... }) creates the directory hash for $dir, with all of the files, and we also calculate the maxlen (maximum length of the filenames in $dir, or of $dir itself if it is the longest) for later use.
Formatting and printing the results
I want every directory to have its own column, and for that column to be as wide as the longest filename. As I will need headings, a divider, and then several rows, I don’t want to repeat myself, so I’ll define a printf-style format string I can use for all three cases:
my $fmt = join(" | ", map { "%-$dirs{$_}{maxlen}s" } @dirs) . "\n";
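To make that concrete, with the maxlen values from the two-directory sample earlier (14 for dir_a, 15 for dir_b), the resulting template looks like this:

```perl
use strict;
use warnings;

# maxlen values borrowed from the sample %dirs above
my %dirs = (
    dir_a => { maxlen => 14 },
    dir_b => { maxlen => 15 },
);
my @dirs = qw(dir_a dir_b);

my $fmt = join(" | ", map { "%-$dirs{$_}{maxlen}s" } @dirs) . "\n";
print $fmt;   # %-14s | %-15s
```

The %-14s conversion left-justifies its argument in a 14-character field, which is exactly what we need for ragged-right columns.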
We’re most of the way there now. Let’s output the headings and horizontal divider:
printf $fmt, @dirs;
printf $fmt, map { '-' x $dirs{$_}{maxlen} } @dirs; # Divider
Finally, we loop over all @uniq filenames, and output (using $fmt) a line for any filename that does not exist in all @dirs:
for my $file (@uniq) {
    my @files = map { $dirs{$_}{files}{$file} ? $file : '' } @dirs;
    next if all { length } @files;   # Exists in all directories
    printf $fmt, @files;
}
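One detail worth calling out: max and all are not built-ins; they come from the core List::Util module (all requires List::Util 1.33 or newer). A quick demonstration of both, on a made-up row:

```perl
use strict;
use warnings;
use List::Util qw(max all);

my @files = ( 'Arial.ttf', '', 'Comic_Sans.ttf' );   # '' marks a missing file

my $longest = max map { length } @files;   # length of the longest name: 14
my $in_all  = all { length } @files;       # false: one entry is empty

print "longest=$longest\n";
```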
Easy! On the sample input, it looks something like:
$ ./ch-2.pl dir_*
dir_a          | dir_b           | dir_c
-------------- | --------------- | ---------------
Comic_Sans.ttf | Comic_Sans.ttf  |
               | Courier_New.ttf | Courier_New.ttf
Georgia.ttf    |                 |
               |                 | Monaco.ttf
Old_Fonts/     |                 |
               | Tahoma.ttf      |
That’s it! See you next week.