Yawar Amin
Posted on July 1, 2024
There are various techniques and tools for unit testing in OCaml. A small selection:
- Alcotest - a colourful unit testing framework
- OUnit2 - an xUnit-style test framework
- ppx_expect - a snapshot testing framework
- Speed - a new framework announced right here on dev.to, with an emphasis on a fast feedback loop
While these have various benefits, they all involve pulling in a third-party library to write the tests, learning its assertion functions and helpers, and learning how to read and act on its failure output. I have lately been wondering whether we can distill this process to its very essence.
When you run a unit test, you have some expected output and some 'actual' output from the system under test, and you compare the two. If they are the same, the test passes; if they differ, it fails. Ideally, you get the failure report as an easily readable diff so you can see exactly what went wrong. Of course, this is a simplified view of unit testing (some tests require more sophisticated checks), but for many cases this simple approach is 'good enough'.
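Before we bring in any tooling, here is a minimal OCaml sketch of that essence; check and its labelled arguments are illustrative names, not from any library:

(* A minimal sketch of the essence of a unit test: compare expected
   and actual output, and report when they differ. *)
let check ~expected ~actual =
  if String.equal expected actual then ()
  else Printf.printf "FAIL: expected %S, got %S\n" expected actual

let () = check ~expected:"2" ~actual:(string_of_int (1 + 1))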
Enter dune
And here is where dune, OCaml's build system, comes in. It turns out that dune ships out of the box with a 'diff-and-promote' workflow. You can tell it to diff two files, running silently if they have the same content, or failing and printing a diff if they don't. Then you can run a simple dune promote command to update the 'expected' or 'snapshot' file with the 'actual' content.
Let's look at an example.
Example project
Let's set up a tiny example project to test out this workflow. Here are the files:
dune-project
(lang dune 2.7)
This file is needed for dune to recognize a project. You can use any supported version of dune here; I just default to 2.7.
lib/dune
(library
(name lib))
This declares a dune library inside the project.
lib/lib.ml
let add x y = x + y
let sub x y = x - y
This is the implementation source code of the library. Here we are just setting up two dummy functions that we will 'test' for demonstration purposes. Of course in real-world code there will be more complex functions.
test/test.expected.txt
(This file is deliberately left empty.)
test/test.ml
let test msg op x y = Printf.printf "%s: %d\n\n" msg (op x y)
open Lib
let () =
test "add 1 1" add 1 1;
test "sub 1 1" sub 1 1
This file defines a test helper function whose only job is to print a message and the result of the operation, together, to standard output. Then we use the helper repeatedly to cover various scenarios. The net effect is that the test executable just prints a series of labelled results to standard output.
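The same pattern extends to other result types. As a hedged sketch, here are hypothetical variants (not part of the example project) that keep the same labelled, diffable output shape:

(* Hypothetical variants of the helper for boolean- and string-valued
   functions; each prints one labelled, diffable block. *)
let test_bool msg op x y = Printf.printf "%s: %b\n\n" msg (op x y)
let test_str msg op x y = Printf.printf "%s: %s\n\n" msg (op x y)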
test/dune
(test
(name test)
(libraries lib)
(action
(diff test.expected.txt test.actual.txt)))
(rule
(with-stdout-to
test.actual.txt
(run ./test.exe)))
Here is where the magic happens. This file has two stanzas. Let's look at them one by one.
- test - this stanza defines the test 'component' for dune. Now dune will carry out the test when we run the dune test command. It says that this test depends on the lib library (defined earlier), and that for the actual action of the test, it should diff the two given files. The first file, test.expected.txt, is meant to be committed into the codebase. It is initially empty, and we will update it as part of our testing workflow.
- rule - this stanza defines how to generate the second file needed by the diff action of the test stanza. It's somewhat like a makefile rule. The with-stdout-to field tells dune to run the ./test.exe executable, which it knows how to get by compiling test.ml, and redirect the output into test.actual.txt. Once this is done, the test stanza can proceed and diff the two files.
Notice that dune understands the inputs and outputs of both these stanzas, and will recompile and rerun the actions as necessary to update the files.
First test
Now let's run the initial test:
$ dune test
File "test/test.expected.txt", line 1, characters 0-0:
diff --git a/_build/default/test/test.expected.txt b/_build/default/test/test.actual.txt
index e69de29..1522c5b 100644
--- a/_build/default/test/test.expected.txt
+++ b/_build/default/test/test.actual.txt
@@ -0,0 +1,4 @@
+add 1 1: 2
+
+sub 1 1: 0
+
Promotion
The diff says that the actual output content is not what we 'expected'. Of course, we deliberately started with an empty file here, so let's update the 'expected' file to match the 'actual' one:
$ dune promote
Promoting _build/default/test/test.actual.txt to test/test.expected.txt.
Rerun test
After the promotion, let's check that the test passes:
$ dune test
$
No output, meaning the test succeeded.
Add tests
Let's add a new test:
let () =
test "add 1 1" add 1 1;
test "sub 1 1" sub 1 1;
test "sub 1 -1" sub 1 ~-1
And run it:
$ dune test
File "test/test.expected.txt", line 1, characters 0-0:
diff --git a/_build/default/test/test.expected.txt b/_build/default/test/test.actual.txt
index 1522c5b..17ccf8e 100644
--- a/_build/default/test/test.expected.txt
+++ b/_build/default/test/test.actual.txt
@@ -2,3 +2,5 @@ add 1 1: 2
 
 sub 1 1: 0
 
+sub 1 -1: 2
+
OK, we just need to promote it: dune promote. Then the next dune test succeeds.
Fix a bug
Let's say we introduce a bug into our implementation:
let sub x y = x + y
Now let's run the tests:
$ dune test
File "test/test.expected.txt", line 1, characters 0-0:
diff --git a/_build/default/test/test.expected.txt b/_build/default/test/test.actual.txt
index 17ccf8e..29adb0b 100644
--- a/_build/default/test/test.expected.txt
+++ b/_build/default/test/test.actual.txt
@@ -1,6 +1,6 @@
 add 1 1: 2
 
-sub 1 1: 0
+sub 1 1: 2
 
-sub 1 -1: 2
+sub 1 -1: 0
 
It gives us a diff of exactly the failing tests. Obviously, in this case we are not going to run dune promote. Instead we need to fix the implementation (let sub x y = x - y), then rerun the test. After fixing and rerunning, dune test exits silently, meaning the tests are passing again.
Discussion
So...should you actually do this? Let's look at the pros and cons.
Pros
- No need for a third-party testing library. Dune already does the heavy lifting of running tests and diffing outputs.
- No need to learn a set of testing APIs that someone else created. You can just write your own helpers that are custom-made for testing your libraries. All you need to do is make the output understandable and diffable (see the sketch after this list).
- The diff-and-promote workflow is really quite good, even with a bare-bones setup like this. Conventional unit test frameworks struggle to provide diff output this good (Jane Street's ppx_expect is an exception; it takes a hybrid approach built around the same promotion workflow).
- You have all expected test results in a single file for easy inspection.
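As an example of 'understandable and diffable' output, here is a hedged sketch of a hypothetical helper (not from the example project) that prints a list one element per line, so a change to a single element shows up as a one-line diff:

(* Hypothetical helper: print each element of an int list on its own
   line, so a change to one element produces a minimal diff. *)
let test_list msg op xs =
  Printf.printf "%s:\n" msg;
  List.iter (fun x -> Printf.printf "  %d\n" x) (op xs);
  print_newline ()

let () = test_list "double all" (List.map (fun x -> x * 2)) [1; 2; 3]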
Cons
- It's tied to dune. While dune is today and for the foreseeable future clearly the recommended build system for OCaml, not everyone is using it, and there's no guarantee that the ecosystem will stick to it in perpetuity. It's just highly likely.
- You have to define your own output format and helpers. While usually not that big of a deal, it may still take some thought to define printers for complex custom types (a sketch follows this list).
- You can't run only a subset or a single test. You have to run all tests defined in the executable test module. This is not a huge deal if tests usually run fast, but can become problematic when you have slow tests. Of course, many things become problematic when you have slow unit tests.
- It doesn't output results in a structured format that can be processed by other tools, e.g. a junit.xml file that CI pipelines can use to report test failures or coverage.
- It goes against the 'common wisdom'. People expect unit tests to use conventional-style frameworks, and can be taken aback when they don't.
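As promised above, here is a hedged sketch of the kind of printer you might write for a custom type; the shape type and both functions are hypothetical, not from the example project:

(* A hypothetical custom type and the printer boilerplate it needs. *)
type shape = Circle of float | Rect of float * float

let show_shape = function
  | Circle r -> Printf.sprintf "Circle %g" r
  | Rect (w, h) -> Printf.sprintf "Rect (%g, %g)" w h

(* A test helper in the same style as before: one labelled, diffable
   block per test case. *)
let test_shape msg op x = Printf.printf "%s: %s\n\n" msg (show_shape (op x))

let () = test_shape "double circle" (function
  | Circle r -> Circle (2. *. r)
  | Rect (w, h) -> Rect (2. *. w, 2. *. h)) (Circle 1.)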
Overall, in my opinion this approach is fine for simple cases. If you have more complex needs, fortunately there are plenty of options for more powerful test frameworks.