Benchmarking Scala with ScalaMeter, Pt. 2 (Scala DCP #004)

awwsmm

Andrew (he/him)

Posted on March 27, 2020

Benchmarking Scala with ScalaMeter, Pt. 2 (Scala DCP #004)

Daily Coding Problem is a website which will send a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Scala, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

Problem

This is a continuation of my earlier post, where I provided three different methods for solving Daily Coding Problem #004 using Scala. In this second section, I'll benchmark these three solutions (in time and memory usage) using ScalaMeter, a Scala-focused benchmarking tool which is preferred by some Scala language developers over JMH and has previously been featured here on DEV.to!

Strategy

ScalaMeter's website has an easy-to-follow example showing how to do a simple benchmark. Essentially what we do is generate some sample data, then pass it to each method we want to benchmark. Of the available predefined test configurations, the one which works best for this example is Bench.OfflineReport. This will run each benchmark in a separate JVM and generate an HTML report for easy visualization.

You can see the other available configurations in ScalaMeter's scaladoc.

There are some bugs you have to work around since the documentation on the website is out of date, and things like running multiple test classes as a suite of tests don't quite work. If you want a nicely-formatted HTML report, the best solution I've found so far is to simple put all of your code into a single benchmark object which extends Bench.OfflineReport:

object Benchmark extends Bench.OfflineReport {

  // benchmarks go here!

}
Enter fullscreen mode Exit fullscreen mode

I put my three methods, as well as the random array generator, into another object called DCP004 and import them at the top of Benchmark. My first working benchmark looked something like:

object Benchmark extends Bench.OfflineReport {
  import DCP004._

  performance of "Example #1: randArray(5, 0, 1)" in {

    val arrayLength = Gen.single("n")(5)
    val arrays = for (n <- arrayLength) yield randArray(n, 0, 1)

    using(arrays) curve "findMissingMutable" in findMissingMutable
    using(arrays) curve "findMissingInPlace" in findMissingInPlace
    using(arrays) curve "findMissingSet" in findMissingSet
  }

  // ...
}
Enter fullscreen mode Exit fullscreen mode

In the above snippet, the methods findMissingMutable(), findMissingInPlace(), findMissingSet(), and randArray() are all defined in the DCP004 object. The sample data are created by Generators, which create warmup data, as well.

If you're unfamiliar with the concept of warming up the JVM, especially during benchmarking, check out this SO post, which gives a short overview.

arrayLength is what you might call a "pure" generator. It's defined by a method built into ScalaMeter. But ScalaMeter provides only a handful of built-in generators:

  • Gen.single() -- which generates a single predefined value over and over
  • Gen.range() -- which generates a range of Ints, given a minimum value, maximum value, and step size
  • Gen.exponential() -- which generates an exponential range of Ints, given a minimum value, maximum value, and multiplicative factor

...and a few others. Note that exponential() and range() only generate Int values and have to be processed to generate, for instance, Doubles. You can build composed generators from these basic generators by using for expressions. For example:

val answers = Gen.single("ASCII *")(42)
val decimals = for (answer <- answers) yield answer / 100.0
Enter fullscreen mode Exit fullscreen mode

...this will generate the decimal value 0.42 over and over each time the decimals generator is used. In my example above, instead of just generating Doubles, I use a composed generator to generate entire random arrays:

val arrayLength = Gen.single("n")(5)
val arrays = for (n <- arrayLength) yield randArray(n, 0, 1)
Enter fullscreen mode Exit fullscreen mode

These arrays are then passed to each of the using statements, which each spawn a new JVM, independent of any other using statements, perform a full warm-up, then benchmark the included code. The curve "name" syntax allows us to generate an HTML report with multiple named curves, so all of our results can be easily compared. Here's an example of what that HTML report might look like:

Simple Benchmarking HTML Report

In the sidebar, we have a legend / menu which tells us what each of the bars represent:

Simple Benchmarking Legend

Selecting and deselecting these curves, which we defined in our code above, will make them disappear and reappear in the plot. Also notice the x-axis selection at the bottom of the plot area. When we run tests with more x-values, we'll have the option of selecting additional points here. In fact, let's do that right now:

  performance of "Example #2: randArray(n, 0, 1) -- linear" in {

    val arrayLength = Gen.range("n")(100, 1000, 100)
    val arrays = for (n <- arrayLength) yield randArray(n, 0, 1)

    using(arrays) curve "findMissingMutable" in { findMissingMutable }
    using(arrays) curve "findMissingInPlace" in { findMissingInPlace }
    using(arrays) curve "findMissingSet" in { findMissingSet }
  }
Enter fullscreen mode Exit fullscreen mode

This is the second benchmark I've included inside object Benchmark. It uses a range generator to generate values on the range [100, 1000] (inclusive on both ends) with an interval of 100. So, using a composed generator, we can generate random arrays of length 100, 200, 300, ..., 1000, and time how long it takes our implementations to find the smallest missing positive integer in the array. This code will generate a different plot in our HTML report, something like:

Benchmark with linearly-spaced array lengths

We can start to see some trends now. It looks like findMissingInPlace (purple) and findMissingSet (brown) are more or less comparable in the time it takes them to find the smallest missing positive integer in their respective arrays. But findMissingMutable (red) seems to blow them out of the water, requiring only 0.02 ms on average for n = 1000, while findMissingInPlace and findMissingSet require 0.30 and 0.20 ms, respectively -- ten times slower!

You can find the values of the points in the plots by inspecting the data.js file which is created when an OfflineReport HTML report is generated. The data are presented in JSON format and all benchmark times are reported in milliseconds. Or, if you don't want to sift through the JSON data, you can screenshot the plot and use Ankit Rohatgi's WebPlotDigitizer to get approximate values -- that's what I did.

Will this trend hold for larger values of n? There's only one way to find out...

  performance of "Example #3: randArray(n, 0, 1) -- exponential" in {

    val arrayLength = Gen.exponential("n")(2, Math.pow(2, 16).toInt, 2)
    val arrays = for (n <- arrayLength) yield randArray(n, 0, 1)

    using(arrays) curve "findMissingMutable" in { findMissingMutable }
    using(arrays) curve "findMissingInPlace" in { findMissingInPlace }
    using(arrays) curve "findMissingSet" in { findMissingSet }
  }
Enter fullscreen mode Exit fullscreen mode

To get even bigger arrays, I decided to use an exponential generator, starting with arrays of length 2 and doubling up to 65536 (2^16). What does this benchmark look like? I'm glad you asked:

Benchmark with exponentially-spaced array lengths

findMissingInPlace (silver) and findMissingSet (mustard) are statistically indistinguishable performance-wise, over this range. If you want a solid Scala-style solution, and you're not extremely concerned with performance, either of these implementations should be fine.

However, if you are concerned about the running time of the algorithm, neither of these implementations can compete with findMissingMutable, our Java-style solution which absolutely destroys the other implementations, requiring an order of magnitude less time to solve this problem. There is not a single point on this graph where findMissingMutable isn't the most performant solution.

So that's it! findMissingMutable wins!


...okay, you're right, I only looked at the length of the arrays and not the other parameters, fracBad and maxScaling. Luckily, (we can easily run multidimensional benchmarks using ScalaMeter). All we have to do is use another composed generator -- this time, with multiple enumerators which are all simple generators themselves:

  performance of "Example #4: randArray(a, b, c)" in {

    val arrayLength = Gen.exponential("n")(2, Math.pow(2, 20).toInt, 2)
    val pctBad = Gen.range("fracBad")(0, 100, 5)
    val pctScaling = Gen.range("maxScaling")(25, 300, 5)

    val arrays = for {
      length <- arrayLength
      bad <- pctBad
      scale <- pctScaling
    } yield randArray(length, bad / 100.0, scale / 100.0)

    using(arrays) curve "findMissingMutable" in { findMissingMutable }
    using(arrays) curve "findMissingInPlace" in { findMissingInPlace }
    using(arrays) curve "findMissingSet" in { findMissingSet }
  }
Enter fullscreen mode Exit fullscreen mode

ScalaMeter is smart enough to understand that we want to benchmark these methods across all of the 3-dimensional points defined by the arrayLength, pctBad, and pctScaling ranges. I let the array length get all the way up to 1 million elements (2^20), and tested fracBad and maxScaling over a huge range of values. This test took a few hours to run on my personal laptop and the results...

Multidimensional benchmark

...are a bit difficult to understand? You could, of course, pull this data out of the JSON file and plot it in a more robust environment (maybe an R shell?), but you inspect it very easily in ScalaMeter, which -- at the moment -- only offers line and bar plots by default.

ScalaMeter's documentation suggests that you should be able to create your own custom chart types, but the source code seems to suggest otherwise. (This functionality seems to be missing entirely.)

We can tidy the plot a bit by only looking at particular ranges of values, though. By dragging the boxes underneath the plot, we can select a different parameter to plot on the x-axis. It looks like fracBad affects the outcome much more drastically than maxScaling...

Impact of  raw `fracBad` endraw  on performance

...at least over the ranges we've provided:

Impact of  raw `fracBad` endraw  on performance

I say this because, in the plot above, the performance of the algorithms as a function of fracBad can be found along the x-axis. For findMissingSet (orange) this is a range of about 275 ms down to about 2 ms, over 2 orders of magnitude. But when looking at any particular value of fracBad, say 45 percent, the performance range relative to the median performance is much smaller -- about 115 ms to about 140 ms. Only about a 15% variation relative to a median of about 130 ms.

So let's pick one value of fracBad and look at the performance as a function of n and maxScaling:

Impact of  raw `maxScaling` endraw  and  raw `n` endraw  on performance

In every case, findMissingMutable (light blue) is the fastest implementation, by about an order of magnitude over findMissingInPlace (dark blue) and findMissingSet (orange). I think it's safe to say that findMissingMutable is far and away the most performant of these three algorithms. And, since it's an in-place algorithm, it's likely the most performant in terms of memory usage, as well.

I say "most likely" because I've tried to benchmark the memory footprint of these algorithms using ScalaMeter but have so far been unable to do so. According to ScalaMeter's documentation, "ScalaMeter can measure memory footprint starting from version 0.4. Measurer.MemoryFootprint is a measurer the measures the memory footprint of the object the return value of the benchmark snippet references."

This is fine, but our algorithms only return Ints. What we really want to measure is not the memory footprint of the return value but the memory required while running the particular algorithms. As far as I can tell, this is not currently possible with ScalaMeter.

Also, because of the way it calculates memory footprints, ScalaMeter can sometimes yield confusing (even negative) results for memory usage.

This tutorial only scratches the surface of what's available in ScalaMeter. You can choose to ignore measurements in which the garbage collector runs, introduce noise into your measurements, remove outliers, specify the minimum and maximum number of warmup runs, run regression tests, and more. ScalaMeter also nicely integrates with sbt, the Scala Build Tool.

All in all, I'm moderately happy with ScalaMeter. It's a nice library for doing small performance comparisons among competing implementations of an algorithm. However, it doesn't seem to be very well-documented. The Scaladoc itself offers almost no help when it comes to trying to implement things which aren't specifically spelled out on the website or in people's responses to ScalaMeter's GitHub issues. It's a library which has the potential to be very beginner-friendly, but has a long way to go to realising that promise, in my opinion.

You can play around with the HTML report I generated at awwsmm.com/resources/DCP/ScalaMeter.


All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

If you enjoyed this post, please consider supporting my work by buying me a coffee!

💖 💪 🙅 🚩
awwsmm
Andrew (he/him)

Posted on March 27, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related