PWC 165 › Simple SVG generator

This post is part of a series on Mohammad Anwar’s excellent Weekly Challenge, where hackers submit solutions in Perl, Raku, or any other language, to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

The tasks this week are ones I devised. Allow me a moment to explain the motivation behind them.

I often have a need to quickly visualize some random bits of data, and while I’ve gotten great mileage out of terminal output, and things like gnuplot or spreadsheet imports, sometimes better and more convenient results are possible by generating the image myself.

There is ImageMagick (see perlmagick for a Perl interface), but its dependencies are heavy, and getting it to run at all on some systems (particularly the embedded systems I work on) can be challenging. And of course, it only works with raster images, which do not scale well.

Fortunately, it’s very easy to generate vector images with pure Perl (or most any language, for that matter). You can even do it easily without any CPAN modules, but remember, even you can use CPAN!

Enter Scalable Vector Graphics (SVG).

Raster images are composed of a grid of pixels. Vector images use shapes like circles, lines, and curves. Because these are defined mathematically, vector images can be resized, squished, or rotated with no loss in quality.

Quick Introduction to SVG

For this task, I will be using the SVG module by Morgane Oger and recently maintained by our very own Mohammad Anwar. However, SVG files are simply XML documents, so it would not be much harder to generate the XML yourself. Here’s what SVG source looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg xmlns="http://www.w3.org/2000/svg" 
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     height="300" width="400">

	<g fill="#f73" id="points">
		<circle cx="36"  cy="82"  r="3" />
		<circle cx="358" cy="259" r="3" />
		<circle cx="327" cy="256" r="3" />
	</g>
	<g id="lines" stroke="#369" stroke-width="4">
		<line x1="0" x2="378" y1="48.4172" y2="244.3444" />
	</g>
</svg>

The opening lines, up to and including the <svg> tag, are essentially boilerplate. Within the <svg> tag, though, you’ll note that’s where the width and height are specified. If you just insert this image into your web page, that’s how big it will be by default. However, unlike raster image types like JPEG, PNG, and GIF, you can resize an SVG graphic to any size you like, and it will be as smooth and sharp as ever.

Next, you will see a <g id="points" ...> tag. This is a group I have named “points” which allows you to group similar elements together and apply a consistent appearance to them all. The points have all been given a fill of #f73 (orange). You’ll note the radius (r="3") has to be specified for each <circle>, as that is an essential property of the circle, rather than a style.

In the case of our “lines” group, we only have one line, so the group wasn’t strictly necessary, but we might have multiple lines, and it’s a good organizational tactic anyway.
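To make the "generate the XML yourself" point concrete, here is a minimal sketch in pure Perl with no CPAN modules. The sub name simple_svg is hypothetical (it is not part of my submission), the DOCTYPE is left out on the assumption that your viewer does not need it, and the coordinates are assumed to be plain numbers, so nothing is XML-escaped.

```perl
use strict;
use warnings;

# Hypothetical helper (not from ch-1.pl): build a small SVG document by hand.
# Takes the image width and height, followed by a list of [x, y] points.
# Assumes all values are plain numbers, so no XML escaping is done.
sub simple_svg {
    my ($width, $height, @points) = @_;

    # One <circle> element per point, each on its own line
    my $circles = join '', map {
        sprintf qq{    <circle cx="%s" cy="%s" r="3" />\n}, @$_
    } @points;

    return <<"SVG";
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svg xmlns="http://www.w3.org/2000/svg" width="$width" height="$height">
  <g id="points" fill="#f73">
$circles  </g>
</svg>
SVG
}

print simple_svg(400, 300, [36, 82], [358, 259], [327, 256]);
```

Redirect the output to a file ending in .svg and most browsers should be able to open it directly.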

So now you see how easy it is to generate SVG. Having therefore satisfied ourselves that we could easily do it, we can now use a library without any guilt whatsoever. ;-)

Perl SVG CPAN Module

The SVG module automates the generation of all of that XML code with a convenient object interface. Here’s my code for taking an array of points and lines and returning the resulting XML:

# Do the actual SVG plot
sub plot_svg {
    my $svg = SVG->new(width => $o{width}, height => $o{height},
        $o{credits} ? () : (-nocredits => 1));

    # Style the points and lines
    my $lg = $svg->group(           id => 'lines', 
                        'stroke-width' => $o{'stroke-width'},
                                stroke => $o{'line-color'});

    my $pg = $svg->group(           id => 'points',
                                  fill => $o{'point-color'});

    # Plot points and lines
    for (@_) {
      $lg->line(  x1 => $_->[0], y1 => $_->[1], 
                  x2 => $_->[2], y2 => $_->[3])                 if @$_ == 4;
      $pg->circle(cx => $_->[0], cy => $_->[1],r => $o{radius}) if @$_ == 2;
    }

    $svg->xmlify;
}

That’s essentially all you need to create an SVG file, and that’s about 90% of this task complete. I parameterize the colors and the point and line sizes as command line arguments; in a full application, however, I would have used CSS instead.
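For the curious, here is a rough sketch of what the CSS route might look like. It builds a <style> element as a plain string rather than going through the SVG module, and the sub name svg_style and its argument layout are made up for illustration. The selectors assume the same "points" and "lines" group ids used above.

```perl
use strict;
use warnings;

# Hypothetical helper: render an SVG <style> element from a
# selector => { property => value } map. Not part of ch-1.pl.
sub svg_style {
    my (%rules) = @_;
    my $css = '';

    for my $selector (sort keys %rules) {
        my $decl = $rules{$selector};
        my @pairs;
        for my $prop (sort keys %$decl) {
            push @pairs, sprintf '%s: %s;', $prop, $decl->{$prop};
        }
        $css .= sprintf "    %s { %s }\n", $selector, join ' ', @pairs;
    }

    return qq{  <style type="text/css">\n$css  </style>\n};
}

print svg_style(
    '#points circle' => { fill   => '#f73' },
    '#lines line'    => { stroke => '#369', 'stroke-width' => 4 },
);
```

The resulting element would go just inside the <svg> tag, and the fill and stroke attributes on the groups could then be dropped.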

Now, we just need to read the points and lines from STDIN.

Reading Input

Our input is a collection of points and lines, separated by newlines, with coordinates separated by commas or whitespace. Happily, that’s the output format of Task #2. Imagine that. The following block reads the points and lines and passes them to the plot_svg() sub we just defined, above.

say plot_svg(map {
    my @n = split /[\s,]+/;

    my $err = sub { die "ERROR: Line $.: $_[0], input was:\n  $_\n" };
    $err->('Not a number')                 if grep { /[^0-9.-]/ } @n;
    $err->('Expected 2- or 4-number list') if @n != 2 and @n != 4;

    [ @n ];
} <>);

The only slightly sneaky thing I did here was create an anonymous sub ($err) that picks up the current line number ($.) and input line ($_), to keep our errors consistent. It didn’t make the code much shorter, but it was worth it to avoid repeating myself (DRY).
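One thing to be aware of: the character-class check above lets through strings like 1.2.3 or a bare minus sign. If that matters for your data, here is a sketch of a stricter variant using looks_like_number from the core Scalar::Util module. The sub name parse_line is hypothetical, and note that looks_like_number is more permissive in other ways (it accepts exponent notation and values like Inf), so treat this as one possible trade-off rather than a drop-in fix.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: parse one input line into a 2- or 4-element
# array ref, or die with a message that includes the line number.
sub parse_line {
    my ($line, $line_no) = @_;
    chomp $line;
    my @n = split /[\s,]+/, $line;

    # Closure over $line and $line_no keeps the error format in one place
    my $err = sub { die "ERROR: Line $line_no: $_[0], input was:\n  $line\n" };
    $err->('Not a number')                 if grep { !looks_like_number($_) } @n;
    $err->('Expected 2- or 4-number list') if @n != 2 and @n != 4;

    return [ @n ];
}

my $point = parse_line("36, 82\n", 1);
print join(',', @$point), "\n";
```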

ch-1.pl full source

Task 2: Line of Best Fit

We can now move straight on to task 2, which is calculating the line of best fit and passing that (along with the input points) to our task 1 plotter, above.

I’ll be using the very common least squares method. I won’t go through all the math here, as other sites dedicated to math have already done a great job at explaining it, such as this one by Math is Fun.

Translating the math to Perl code is quite straightforward, but first we need some syntactic sugar!

sub X() { $_->[0] }
sub Y() { $_->[1] }

# Without sugar                      # With Sugar
sum map { $_->[0] * $_->[1] } @_;    sum map { X * Y } @_;

As you can see, these subs will make our code a lot cleaner, and better communicate our intent.
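If you want to try the sugar on its own, here is a small self-contained example with made-up points. The empty prototype () is what makes it work: it tells Perl that X and Y take no arguments, so X * Y parses as a multiplication instead of Perl trying to treat * Y as an argument to X.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Accessors that read the x and y coordinates of the point in $_
sub X() { $_->[0] }
sub Y() { $_->[1] }

# Made-up sample points, each an [x, y] pair
my @points = ([1, 2], [3, 4], [5, 6]);

my $sum_x  = sum map X, @points;          # 1 + 3 + 5
my $sum_xy = sum map { X * Y } @points;   # 1*2 + 3*4 + 5*6

print "$sum_x $sum_xy\n";   # prints "9 44"
```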

Line of Best Fit

sub best_fit {
    my $mean_x = (sum map X, @_)  / @_;
    my $mean_y = (sum map Y, @_)  / @_;
    my $sum_sq =  sum map { X * X } @_;
    my $sum_xy =  sum map { X * Y } @_;

    my $m = ($sum_xy / @_ - $mean_x * $mean_y) / 
            ($sum_sq / @_ - $mean_x * $mean_x);
    my $b =  $mean_y - $m * $mean_x;

    ($m, $b, $mean_x, $mean_y);
}

This returns the slope ($m) and y-intercept ($b), followed by the two means. The slope and intercept are all you need to describe the line of best fit, using the standard equation for a line:

\(y = mx + b\)
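As a quick sanity check, here is a worked example on made-up points that lie exactly on \(y = 2x + 1\), so the fit should recover a slope of 2 and an intercept of 1. The sub is repeated so the snippet runs on its own. It computes the slope as the mean of \(xy\) minus the product of the means, divided by the mean of \(x^2\) minus the square of the mean of \(x\).

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub X() { $_->[0] }
sub Y() { $_->[1] }

# Least squares fit: returns slope, intercept, mean x, mean y
sub best_fit {
    my $mean_x = (sum map X, @_)  / @_;
    my $mean_y = (sum map Y, @_)  / @_;
    my $sum_sq =  sum map { X * X } @_;
    my $sum_xy =  sum map { X * Y } @_;

    my $m = ($sum_xy / @_ - $mean_x * $mean_y) /
            ($sum_sq / @_ - $mean_x * $mean_x);
    my $b =  $mean_y - $m * $mean_x;

    ($m, $b, $mean_x, $mean_y);
}

# Made-up points on y = 2x + 1
my ($m, $b) = best_fit([0, 1], [1, 3], [2, 5], [3, 7]);
print "m=$m b=$b\n";   # prints "m=2 b=1"
```

Note that the denominator is zero when every point has the same x coordinate (a vertical line), so real input may need a guard for that case.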

Plotting Everything

The last step is writing all of our points (copied from our input), along with the line of best fit. With all the work we’ve already done, this part is simple:

my @points = map { chomp; [ split /[\s,]+/ ] } <>;
my $width  = max map X, @points;
my ($m,$b) = best_fit(@points);

my $line = [ 0, $b, $width, $m*$width + $b ];

say join ',', @$_ for @points;
printf "%d,%.4f,%d,%.4f\n", @$line;

Plotting the Line

The line that builds $line, above, demonstrates how we use the standard equation for a line \(y = mx + b\) to get the coordinates for a line segment from x = 0 to x = $width. That gives us a line of best fit that stretches across our entire image.
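Here is that calculation pulled out into a tiny standalone example, with made-up values for the slope, intercept, and width. The sub name line_segment is hypothetical and only there to make the snippet easy to run.

```perl
use strict;
use warnings;

# Hypothetical helper: endpoints of y = m*x + b from x = 0 to x = $width,
# returned as [x1, y1, x2, y2] to match the plotter's input format.
sub line_segment {
    my ($m, $b, $width) = @_;
    return [ 0, $b, $width, $m * $width + $b ];
}

# Made-up values: slope 0.5, intercept 50, image width 400
my $line = line_segment(0.5, 50, 400);
printf "%d,%.4f,%d,%.4f\n", @$line;   # prints "0,50.0000,400,250.0000"
```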

I chose to plot the points with y = 0 at the top of the image, as that’s what I use most often. Mathematicians would place the origin (0,0) at the bottom left. Changing that would be as easy as replacing $y with $height - $y in the plot.
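If you do want the origin at the bottom left, a minimal sketch of that flip might look like the following. The sub name flip_y is hypothetical, and it assumes the same input layout as the plotter: either x, y for a point or x1, y1, x2, y2 for a line, so the y values always sit at the odd indices.

```perl
use strict;
use warnings;

# Hypothetical helper: mirror the y coordinates of a point (x, y) or a
# line (x1, y1, x2, y2) so that y = 0 ends up at the bottom of the image.
sub flip_y {
    my ($height, @coords) = @_;

    # y values are at the odd indices: 1 for a point, 1 and 3 for a line
    $coords[$_] = $height - $coords[$_] for grep { $_ % 2 } 0 .. $#coords;

    return [ @coords ];
}

print join(',', @{ flip_y(300, 36, 82) }), "\n";            # prints "36,218"
print join(',', @{ flip_y(300, 0, 50, 400, 250) }), "\n";   # prints "0,250,400,50"
```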

ch-2.pl full source

Bonus: Point Generator

In my submission this week, I’m also including my random point generator. This is the same script I used to generate the sample input for this task.

If you generate truly random points, your line of best fit will not be very interesting. So instead, I generate points along a line, with a random variance from the line. Because I thought it looked more interesting, I also spread the points out more, the farther to the right we are. Here is the main sub:

use Math::Trig;

sub generate_points {
    my ($m, $b) = @_;
    my $angle = atan($m);

    map { 
        my $x = rand($o{width});
        my $y = $m*$x + $b;

        # Perturb the points more towards the right
        my $dev = 100 + $x/3; # hypotenuse
        my $ydev = $dev * sin($angle);
        $y += rand($ydev) - $ydev/2;

        [ map { int } $x, $y ]
    } 1..$o{points};
}

gen_points.pl full source

Pipeline

Of course, it’s no accident that task #1 is a simple SVG plotter! Assuming a UNIX-ish system, you can set up a simple pipeline as follows:

$ ./gen_points.pl --width=400 --height=300 --points=100 --m=0.5 --b=50 \
      | ./ch-2.pl \
      | ./ch-1.pl > regression.svg

If everything went according to plan—as it most certainly always does!—the .svg file will look something like this:

[Figure: Line of best fit]

Final Thoughts

I hoped to introduce some people to how easy it can be to generate SVG graphics. I’d be very happy if at least one person made it through both tasks, said, “I want to play around with this some more,” and went on to extend these tools or use similar ones in other projects. If this is all old hat to you, then I at least hope you were able to have a little fun with it.

I’d love to hear from you, either with a comment here, or by email at rjt@cpan.org.
