Measuring performance using BenchmarkDotNet - Part 1

Tony Knight

Posted on March 15, 2021

Introduction

We all must build fast software, right? Right? It’s true that microservices tend to introduce latencies - stateless functions mean a whole lot more network calls, and you can wave goodbye to data locality. But a microservice is still dependent on its own code being fast, or at least fast enough.

In the past we’ve relied on profilers, stopwatches, dedicated performance teams, and sometimes plain old complaints from the field. All of these methods require some form of measurement; unfortunately they tend to capture “big picture” performance that lacks detail - and often lacks concrete scenarios. This gets very expensive very quickly.

Very often, you just want to measure the code’s performance without the baggage of dependencies. You might have a critical piece of code that absolutely must meet certain performance criteria. Measuring such code can obviously be done with profilers - dotTrace and ANTS, to name just two. The problem is they bring their own baggage as well, and worse, can’t be easily relied upon in a CI pipeline. So how can you run microbenchmarks in CI? Unit tests are a terrible idea - so what else is there? Step forward BenchmarkDotNet.


TL;DR

Measure your code’s performance with benchmarks at near-zero cost. All you need are:

  • .NET 7 SDK
  • VS/VSCode
  • BenchmarkDotNet from NuGet
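
Adding the package to a console project is a one-liner from the .NET CLI:

dotnet add package BenchmarkDotNet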

We’ll talk about how to write simple benchmarks, how to run them and how to interpret the results.


What is BenchmarkDotNet?

BenchmarkDotNet does what it says on the tin: it benchmarks .NET code. It’s available as a NuGet package for inclusion in your .NET console applications. It is very widely used by the major players in the .NET world, including the .NET runtime project itself.

What does a HelloWorld benchmark look like?

Let’s say you have a very basic Fibonacci implementation - and you want to measure its resource usage growth as more numbers are generated.

By “resource usage” I mean time and memory consumed per method call.

In other words, you'd want to know how it scales. Here's an implementation of "get the first N Fibonacci numbers":



// A deliberately naive extension method: lazily yields the first `count` Fibonacci numbers.
public static IEnumerable<int> GetFibonacci(this int count)
{
    var w = 1;
    var x = 1;

    yield return x;
    foreach (var _ in Enumerable.Range(1, count - 1))
    {
        var y = w + x;
        yield return y;
        w = x;
        x = y;
    }
}



No prizes are sought for best efficiency here. Please do not take this as a reference implementation of Fibonacci!


To answer the scaling question, we would implement a benchmark, run it and analyse the results. Skipping forward, a rendered benchmark report looks something like this:

[Figure: the rendered benchmark report - a summary table with Method, Count, timing statistics and memory columns]


What do all the headers actually mean?

  • Method - The name of the code under test; a single benchmark class may have several methods under test, e.g. one per scenario. This value is lifted directly from your benchmark code.
  • Count - An arbitrary parameter: in this case, the number of Fibonacci numbers generated by the method under test.
  • Mean/Error/StdDev/StdError - Execution time statistics. Note that these can be given down to nanoseconds, depending on how fast your code is. Low is best.
  • Min/Q1/Median/Q3/Max - Quartile execution time statistics: note the time units. Low is best.
  • Ops/sec - The number of operations executed per second for the method/parameter combination. High is best.
  • Rank - The relative speed ranking of each method/parameter combination. Low is best.
  • Gen 0/1/2 - The number of garbage collections per generation, per 1,000 operations.
  • Allocated - The bytes allocated per operation, across all generations.

Note the header information in the report! It gives details of the OS, CPU, .NET version, JIT mode and GC configuration. Always benchmark like-for-like!


OK… what do those numbers really mean?

Let’s look at each value of Count; remember, we’re using it to get the first Count numbers of the Fibonacci sequence.

Where Count is 1 the mean execution time is 103.4 nanoseconds. That’s 0.1 microseconds, or 0.0001 milliseconds. I like that: nice and fast.

Where Count is 13 (yes, the parameters themselves follow Fibonacci!) the mean time is 407.2 ns: roughly four times the Count=1 time, yet Count is 13 times bigger. I’ll take that, for now.

Where Count is 34 the mean time is 1,077.9 ns, or about 1.08 microseconds, or just over 0.001 milliseconds. That’s 2.6 times the Count=13 time. Let’s compare against Count=1: Count is 34 times bigger, yet it takes only around 10 times the time. I’ll take that too.
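
The arithmetic behind those ratios, using the mean times from the report:

407.2 / 103.4 ≈ 3.9 (Count 1 → 13: 13 times the work, ~4 times the time)
1,077.9 / 407.2 ≈ 2.6 (Count 13 → 34)
1,077.9 / 103.4 ≈ 10.4 (Count 1 → 34: 34 times the work, ~10 times the time)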

If we plot Count against the time ratio, we see this:

[Figure: chart of Count plotted against the mean time growth ratio]

In other words, time used does not grow in proportion to Count. If it did, the lines would be parallel.

So the benchmarks are showing that the implementation has reasonably acceptable scaling. It's not constant time, but it’s better than O(n) time: a pleasant surprise.

If you're not satisfied with the performance results, simply make your changes, re-run the benchmarks & re-analyse. That's it.


You haven’t mentioned the memory yet, have you?

Trust me, I’m getting to that.

Pay particular attention to memory usage. Garbage collections and memory allocations are as important as sheer speed!

  • Count=1 used 128 bytes.
  • Count=13 used 312 bytes.
  • Count=34 used 744 bytes.
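
Again, the growth ratios from those figures:

312 / 128 ≈ 2.4 (Count 1 → 13)
744 / 312 ≈ 2.4 (Count 13 → 34)
744 / 128 ≈ 5.8 (Count 1 → 34: 34 times the work, ~6 times the memory)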

If we plot Count against the allocation growth ratios, we see this:

[Figure: chart of Count plotted against the allocation growth ratio]

This means the memory used isn’t constant either: Count=34 allocates more than Count=1, though again the growth is better than O(n). To my mind this is OK, but not great: it needs more investigation. The allocations are probably incurred by the yield return iterator’s state machine - but do we want to sacrifice the readability to avoid them? Probably not. In any case, we’re getting new perspectives on our code. This is a good thing.
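
For a flavour of what avoiding the iterator might look like, here is a hypothetical alternative (not from the demo project) that fills a caller-supplied array instead of using yield return:

// Hypothetical variant: no iterator state machine; the caller owns the buffer.
public static void FillFibonacci(int[] buffer)
{
    if (buffer.Length == 0) return;

    var w = 1;
    var x = 1;
    buffer[0] = x;

    for (var i = 1; i < buffer.Length; i++)
    {
        var y = w + x;
        buffer[i] = y;
        w = x;
        x = y;
    }
}

Only a fresh benchmark run would tell us whether the allocation savings justify losing the lazy, composable IEnumerable<int>.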


What other rendered reports can you get?

You can output a Markdown version of your report, as well as many other formats; the Markdown output is GitHub flavoured.

You can use the following attributes to output the many different types of rendered reports:



[JsonExporterAttribute.Full()]
[CsvMeasurementsExporter()]
[CsvExporter(CsvSeparator.Comma)]
[HtmlExporter()]
[MarkdownExporterAttribute.GitHub()]


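By default, BenchmarkDotNet writes these rendered files to a BenchmarkDotNet.Artifacts/results folder under the working directory.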

An example of the GitHub Markdown report:

[Figure: the GitHub-flavoured Markdown report]

Charting is supported through the R project. As R is a world in itself, I’m going to skip the subject.

If you want charts, consider importing the rendered data into Excel: the CsvExporter attribute will generate a CSV with the data you need.


Full code example

What does the benchmark code look like using BenchmarkDotNet? It might surprise you to see how simple it is.

BenchmarkDotNet relies on declarative code over which it reflects. Leaving aside the class attributes (more on those later), note how the [Params] attribute over Count maps to the Count column in the report above, and how [Benchmark] marks Fibonacci() as the method under test.



using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Exporters.Csv;

namespace benchmarkdotnetdemo
{
    [InProcess()]          // run benchmarks inside the host process, not a generated one
    [MemoryDiagnoser]      // capture GC collection and allocation statistics
    [RankColumn, MinColumn, MaxColumn, Q1Column, Q3Column, AllStatisticsColumn]   // extra report columns
    [JsonExporterAttribute.Full, CsvMeasurementsExporter, CsvExporter(CsvSeparator.Comma), HtmlExporter, MarkdownExporterAttribute.GitHub]   // report formats
    [GcServer(true)]       // benchmark under the server garbage collector
    public class FibonacciBenchmark
    {
        [Params(1, 2, 3, 5, 8, 13, 21, 34)]
        public int Count { get; set; }

        [Benchmark]
        public void Fibonacci()
        {
            var xs = Count.GetFibonacci().ToList();
        }
    }
}



You’ll notice that the benchmarks have a return type of void and do not have any assertions. Remember: we’re not proving functional correctness here, we’re measuring resource usage.
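
One caveat worth knowing: the JIT can eliminate work whose result is never observed. BenchmarkDotNet consumes benchmark return values to defeat exactly that, so a slightly more defensive variant of the benchmark simply returns its result:

[Benchmark]
public List<int> Fibonacci() => Count.GetFibonacci().ToList();

Here ToList() forces real work either way, so void is fine for this demo.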


Show me the code!

I’ve created a simple BenchmarkDotNet implementation here:

NewDayTechnology / benchmarking-performance-part-1 - a simple demonstration of BenchmarkDotNet.

There’s only one C# project in there - benchmarkdotnetdemo.csproj - containing the minimal files.

BenchmarkDotNet will only work if the console project is built with a Release configuration, that is with code optimisations applied. Running in Debug will result in a run-time error.
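
With that in mind, a typical run from the project directory looks like this (the filter glob is just an example; some shells need it quoted):

dotnet run -c Release -- --filter *Fibonacci*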

Setup

This is the Program.cs file, and like all C# console apps you need an entry point:



using System;
using BenchmarkDotNet.Running;

namespace benchmarkdotnetdemo
{
    class Program
    {
        static int Main(string[] args)
        {
            try
            {
                BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly)
                    .Run(args);

                return 0;
            }
            catch (Exception ex)
            {
                Console.ForegroundColor = ConsoleColor.Red;
                Console.WriteLine(ex.Message);
                Console.ResetColor();
                return 1;
            }
        }
    }
}



Aside from the standard method entry point, let’s go over it bit by bit.



using BenchmarkDotNet.Running;



For bootstrapping BenchmarkDotNet, this is the only import you need.




BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);



This one-line-to-rule-them-all will perform all command line parsing, all help, all benchmark execution and all report generation.

One point here is .FromAssembly(typeof(Program).Assembly) - this informs BenchmarkDotNet of its benchmark search scope. Benchmarks are internally discovered by reflection - you’ll see soon enough.
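
Incidentally, if you don’t need assembly-wide discovery, a single benchmark class can be run directly - a minimal sketch using the same BenchmarkDotNet.Running namespace:

// Run just the one benchmark class, bypassing the command-line switcher.
BenchmarkRunner.Run<FibonacciBenchmark>();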

NOTE: if you run the project without any command line arguments, BenchmarkDotNet will drop into an interactive CLI and prompt you to choose which benchmarks to run.

.Run(args) returns a sequence of report objects comprising the same data used for the rendered reports; I’ve ignored them here for simplicity. If you want to run benchmarks and fail CI builds when performance dips, they are the first place to look.
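
As a taste of what’s possible, here is a minimal sketch of the body of Main that fails the build when any benchmark’s mean time blows a budget - the one-millisecond threshold is entirely made up:

var summaries = BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly)
    .Run(args);

foreach (var summary in summaries)
{
    foreach (var report in summary.Reports)
    {
        // ResultStatistics.Mean is in nanoseconds; 1,000,000 ns = 1 ms.
        if (report.ResultStatistics?.Mean > 1_000_000)
        {
            return 1;   // a non-zero exit code fails the CI build
        }
    }
}

return 0;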


Create a new benchmark

There is a file called SimpleBenchmark.cs. Let’s have a look.



using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Exporters.Csv;

namespace benchmarkdotnetdemo
{
    [InProcess()]
    [MemoryDiagnoser]
    [RankColumn, MinColumn, MaxColumn, Q1Column, Q3Column, AllStatisticsColumn]
    [JsonExporterAttribute.Full, CsvMeasurementsExporter, CsvExporter(CsvSeparator.Comma), HtmlExporter, MarkdownExporterAttribute.GitHub]
    [GcServer(true)]
    public class SimpleBenchmark
    {
        [Benchmark]
        public void NoopTest() { }

        [Benchmark]
        public int AddTest() => int.MaxValue + int.MinValue;

        [Benchmark]
        public int MultiplyTest() => 11 * 3;
    }
}


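A quick aside: AddTest and MultiplyTest are constant expressions, so the C# compiler folds them at compile time - those two benchmarks effectively measure the cost of returning a constant.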

FibonacciBenchmark.cs

Just for completeness: note the same declarations as in SimpleBenchmark.cs. In this case, we’re adding a [Params] property to support benchmark permutations.



using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Exporters.Csv;

namespace benchmarkdotnetdemo
{
    [InProcess()]
    [MemoryDiagnoser]
    [RankColumn, MinColumn, MaxColumn, Q1Column, Q3Column, AllStatisticsColumn]
    [JsonExporterAttribute.Full, CsvMeasurementsExporter, CsvExporter(CsvSeparator.Comma), HtmlExporter, MarkdownExporterAttribute.GitHub]
    [GcServer(true)]
    public class FibonacciBenchmark
    {
        [Params(1, 2, 3, 5, 8, 13, 21, 34)]
        public int Count { get; set; }

        [Benchmark]
        public void Fibonacci()
        {
            var xs = Count.GetFibonacci().ToList();
        }
    }
}



How are benchmarks executed?

Without going into too much detail, BenchmarkDotNet will run your benchmarks many, many times over to settle on stable mean and median values.

When you run the benchmarks you may at first be confused by just how many iterations are involved, so let’s give a simplistic explanation. Modern OSs are preemptive multitaskers; CPUs have pipelines, caches and instruction-reordering features; .NET itself has a JIT compiler. This means that no single execution of code can be relied upon to give a canonical result.

This is part of the reason why unit tests are terrible for benchmarking! They only run once and incur their own (unaccounted) overheads.

BenchmarkDotNet runs warm-up iterations before it takes representative measurements. These show up in the console output as various stages: OverheadJitting, WorkloadJitting, WorkloadPilot, OverheadWarmup and OverheadActual.

JIT comes at a cost: the first time any piece of .NET code executes, it must first be JIT compiled. The more complex the code, the higher the JIT cost, usually showing up as CPU and elapsed time. As we’re interested only in steady-state runtime performance, these stages eliminate JIT costs from the measurements.

In the same vein, other warm-up steps are run to eliminate other “once only” costs, for instance cold CPU caches.

[Figure: console output showing the jitting and warm-up stages]


After these steps have completed, BenchmarkDotNet will iterate these operations to yield the final statistics; these are shown as WorkloadActual steps.

[Figure: console output showing the WorkloadActual iterations]


If you want more detail, please refer to BenchmarkDotNet’s own documentation. In these code samples we’re using the default Throughput strategy for microbenchmarking.


How long does it take?

It depends ;) Simple calculations, such as those in the demo project, will run in under a minute. Adding permutations (such as with [Params]) will linearly increase the benchmarking time, as each parameter value is benchmarked in its own right: the eight Count values in FibonacciBenchmark mean eight full benchmark runs.

With that in mind, it’s quite clear that resource hungry algorithms, benchmarked with a large variety of parameters, will take a considerable amount of time.

Don’t expect to parallelise BenchmarkDotNet: it runs benchmarks sequentially. Thread context switching is itself a cost and extremely difficult to compensate for.


What have we learned?

  • We’ve seen how to get BenchmarkDotNet
  • We’ve seen how to integrate it in a simple console application
  • We’ve seen the minimum work needed to build benchmarks
  • We’ve had a taste of the reports and inferences we can gain from BenchmarkDotNet

Next Steps

How do we incorporate benchmarks into a CI pipeline? That’s the subject of the next part.


More Information

  • What is benchmarking? - Wikipedia
  • BenchmarkDotNet
  • BenchmarkDotNet on Github
  • A real world use case of BenchmarkDotNet
  • GitHub repository:

    NewDayTechnology / benchmarking-performance-part-1 - a simple demonstration of BenchmarkDotNet and its integration into GitHub Actions.


