From PHPUnit to Go: Data-Driven Unit Testing for Go Developers

medunes

Posted on November 10, 2024

In this post, we'll explore how to bring the PHP unit testing mindset, particularly the PHPUnit framework's data provider approach, into Go. If you're an experienced PHP developer, you’re likely familiar with the data provider model: gathering test data separately in raw arrays and feeding this data into a test function. This approach keeps unit tests cleaner and more maintainable, and it adheres to principles like Open/Closed.

Why the Data Provider Approach?

Using a data provider approach to structure unit tests in Go provides several advantages, including:

Enhanced Readability and Extensibility: Tests become visually organized, with clearly separated arrays at the top representing each test scenario. Each array's key describes the scenario, while its content holds the data to test that scenario. This structure makes the file pleasant to work on and easy to extend.

Separation of Concerns: The data provider model keeps data and test logic apart, resulting in a lightweight, decoupled function that can remain largely unchanged over time. Adding a new scenario only requires appending more data to the provider, keeping the test function open for extension but closed for modification: a practical application of the Open/Closed Principle in testing.

In some projects, I’ve even seen scenarios dense enough to warrant using a separate JSON file as the data source, manually built and fed to the provider, which in turn supplies data to the test function.

When is it especially encouraged to use data providers?

Using data providers is especially encouraged when you have a large number of test cases with varying data: each test case is conceptually similar but differs only in input and expected output.

Intermingling data and logic in a single test function can reduce Developer Experience (DX). It often leads to:

Verbosity Overload: Redundant code that repeats statements with slight data variations, leading to a codebase that is verbose without added benefit.

Reduced Clarity: Scanning through the test function becomes a chore when trying to isolate the actual test data from the surrounding code, which the data provider approach naturally alleviates.

Nice, so what exactly is a data provider?

In PHPUnit, a data provider is a method that supplies a test method with different sets of data; PHPUnit then runs the test once per data set in an implicit loop. This keeps the tests DRY (Don't Repeat Yourself) and aligns with the Open/Closed Principle as well, by making it easier to add or modify test scenarios without altering the core test function logic.

Solving the problem without a data provider

To illustrate the drawbacks of verbosity, code duplication, and maintenance challenges, here is an example of a unit test for a bubble sort function written without the help of data providers:

<?php

declare(strict_types=1);

use PHPUnit\Framework\TestCase;

final class BubbleSortTest extends TestCase
{
    public function testBubbleSortEmptyArray()
    {
        $this->assertSame([], BubbleSort([]));
    }

    public function testBubbleSortOneElement()
    {
        $this->assertSame([0], BubbleSort([0]));
    }

    public function testBubbleSortTwoElementsSorted()
    {
        $this->assertSame([5, 144], BubbleSort([5, 144]));
    }

    public function testBubbleSortTwoElementsUnsorted()
    {
        $this->assertSame([-7, 10], BubbleSort([10, -7]));
    }

    public function testBubbleSortMultipleElements()
    {
        $this->assertSame([1, 2, 3, 4], BubbleSort([1, 3, 4, 2]));
    }

    // And so on for each test case, could be 30 cases for example.

    public function testBubbleSortDescendingOrder()
    {
        $this->assertSame([1, 2, 3, 4, 5], BubbleSort([5, 4, 3, 2, 1]));
    }

    public function testBubbleSortBoundaryValues()
    {
        $this->assertSame([-2147483648, 2147483647], BubbleSort([2147483647, -2147483648]));
    }
}


Are there issues with the above code? Sure:

Verbosity: Each test case requires a separate method, resulting in a large, repetitive codebase.

Duplication: Test logic is repeated in each method, only varying by input and expected output.

Open/Closed Violation: Adding new test cases requires altering the test class structure by creating more methods.

Solving the problem with a data provider!

Here’s the same test suite refactored to use a data provider:

<?php

declare(strict_types=1);

use PHPUnit\Framework\TestCase;

final class BubbleSortTest extends TestCase
{
    /**
     * Provides test data for bubble sort algorithm.
     *
     * @return array<string, array>
     */
    public static function bubbleSortDataProvider(): array
    {
        return [
            'empty' => [[], []],
            'oneElement' => [[0], [0]],
            'twoElementsSorted' => [[5, 144], [5, 144]],
            'twoElementsUnsorted' => [[10, -7], [-7, 10]],
            'moreThanOneElement' => [[1, 3, 4, 2], [1, 2, 3, 4]],
            'moreThanOneElementWithRepetition' => [[1, 4, 4, 2], [1, 2, 4, 4]],
            'moreThanOneElement2' => [[7, 7, 1, 0, 99, -5, 10], [-5, 0, 1, 7, 7, 10, 99]],
            'sameElement' => [[1, 1, 1, 1], [1, 1, 1, 1]],
            'negativeNumbers' => [[-5, -2, -10, -1, -3], [-10, -5, -3, -2, -1]],
            'descendingOrder' => [[5, 4, 3, 2, 1], [1, 2, 3, 4, 5]],
            'randomOrder' => [[9, 2, 7, 4, 1, 6, 3, 8, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9]],
            'duplicateElements' => [[2, 2, 1, 1, 3, 3, 4, 4], [1, 1, 2, 2, 3, 3, 4, 4]],
            'largeArray' => [[-1, -10000, -12345, -2032, -23, 0, 0, 0, 0, 10, 10000, 1024, 1024354, 155, 174, 1955, 2, 255, 3, 322, 4741, 96524], [-12345, -10000, -2032, -23, -1, 0, 0, 0, 0, 2, 3, 10, 155, 174, 255, 322, 1024, 1955, 4741, 10000, 96524, 1024354]],
            'singleNegativeElement' => [[-7], [-7]],
            'arrayWithZeroes' => [[0, -2, 0, 3, 0], [-2, 0, 0, 0, 3]],
            'ascendingOrder' => [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]],
            'descendingOrderWithDuplicates' => [[5, 5, 4, 3, 3, 2, 1], [1, 2, 3, 3, 4, 5, 5]],
            'boundaryValues' => [[2147483647, -2147483648], [-2147483648, 2147483647]],
            'mixedSignNumbers' => [[-1, 0, 1, -2, 2], [-2, -1, 0, 1, 2]],
        ];
    }

    /**
     * @dataProvider bubbleSortDataProvider
     *
     * @param array<int> $input
     * @param array<int> $expected
     */
    public function testBubbleSort(array $input, array $expected): void
    {
        $this->assertSame($expected, BubbleSort($input));
    }
}


Are there any advantages of using the data provider? Oh yeah:

Conciseness: All test data is centralized in a single method, removing the need for a separate method per scenario.

Enhanced Readability: Each test case is well-organized, with descriptive keys for each scenario.

Open/Closed Principle: New cases can be added to the data provider without altering the core test logic.

Improved DX (Developer Experience): The test structure is clean and easy on the eyes, which motivates even the laziest developers to extend, debug, or update it.

Bringing Data Providers to Go

  • Go doesn't have a native data provider mechanism like PHPUnit's, so we need a different approach. There are many possible implementations with varying levels of complexity; the following is a middle-ground candidate for simulating data providers in Go land:
package sort

import (
    "testing"

    "github.com/stretchr/testify/assert"
)

// TestData plays the role of the data provider: inputs and expected
// outputs live in two maps, paired by the same scenario key.
type TestData struct {
    ArrayList    map[string][]int
    ExpectedList map[string][]int
}

// Boundary values of a signed 32-bit integer.
const (
    maxInt32 = int32(^uint32(0) >> 1)
    minInt32 = -maxInt32 - 1
)

var testData = &TestData{
    ArrayList: map[string][]int{
        "empty":                            {},
        "oneElement":                       {0},
        "twoElementsSorted":                {5, 144},
        "twoElementsUnsorted":              {10, -7},
        "moreThanOneElement":               {1, 3, 4, 2},
        "moreThanOneElementWithRepetition": {1, 4, 4, 2},
        "moreThanOneElement2":              {7, 7, 1, 0, 99, -5, 10},
        "sameElement":                      {1, 1, 1, 1},
        "negativeNumbers":                  {-5, -2, -10, -1, -3},
        "descendingOrder":                  {5, 4, 3, 2, 1},
        "randomOrder":                      {9, 2, 7, 4, 1, 6, 3, 8, 5},
        "duplicateElements":                {2, 2, 1, 1, 3, 3, 4, 4},
        "largeArray":                       {-1, -10000, -12345, -2032, -23, 0, 0, 0, 0, 10, 10000, 1024, 1024354, 155, 174, 1955, 2, 255, 3, 322, 4741, 96524},
        "singleNegativeElement":            {-7},
        "arrayWithZeroes":                  {0, -2, 0, 3, 0},
        "ascendingOrder":                   {1, 2, 3, 4, 5},
        "descendingOrderWithDuplicates":    {5, 5, 4, 3, 3, 2, 1},
        "boundaryValues":                   {int(maxInt32), int(minInt32)},
        "mixedSignNumbers":                 {-1, 0, 1, -2, 2},
    },
    ExpectedList: map[string][]int{
        "empty":                            {},
        "oneElement":                       {0},
        "twoElementsSorted":                {5, 144},
        "twoElementsUnsorted":              {-7, 10},
        "moreThanOneElement":               {1, 2, 3, 4},
        "moreThanOneElementWithRepetition": {1, 2, 4, 4},
        "moreThanOneElement2":              {-5, 0, 1, 7, 7, 10, 99},
        "sameElement":                      {1, 1, 1, 1},
        "negativeNumbers":                  {-10, -5, -3, -2, -1},
        "descendingOrder":                  {1, 2, 3, 4, 5},
        "randomOrder":                      {1, 2, 3, 4, 5, 6, 7, 8, 9},
        "duplicateElements":                {1, 1, 2, 2, 3, 3, 4, 4},
        "largeArray":                       {-12345, -10000, -2032, -23, -1, 0, 0, 0, 0, 2, 3, 10, 155, 174, 255, 322, 1024, 1955, 4741, 10000, 96524, 1024354},
        "singleNegativeElement":            {-7},
        "arrayWithZeroes":                  {-2, 0, 0, 0, 3},
        "ascendingOrder":                   {1, 2, 3, 4, 5},
        "descendingOrderWithDuplicates":    {1, 2, 3, 3, 4, 5, 5},
        "boundaryValues":                   {int(minInt32), int(maxInt32)},
        "mixedSignNumbers":                 {-2, -1, 0, 1, 2},
    },
}

// TestBubble runs one named subtest per scenario. Bubble, the sort function
// under test, is assumed to be defined elsewhere in package sort.
func TestBubble(t *testing.T) {
    for testCase, array := range testData.ArrayList {
        t.Run(testCase, func(t *testing.T) {
            expected, ok := testData.ExpectedList[testCase]
            // Catch a scenario key that is missing from ExpectedList.
            if !assert.True(t, ok, "no expected output for scenario %q", testCase) {
                return
            }
            actual := Bubble(array)
            // assert.Equal checks element order; assert.ElementsMatch ignores
            // order and would pass even if Bubble returned unsorted output.
            assert.Equal(t, expected, actual)
        })
    }
}
  • We define two maps: one for the input data and one for the expected output. Each scenario is referenced by the same key in both maps.
  • Executing the tests is then a matter of a simple loop over the input map: each key becomes a named subtest via t.Run, and its expected output is looked up under the same key.
  • Apart from some one-time type boilerplate, changes to the tests should happen only on the data side, and the function executing the tests should rarely need to change. This achieves the goals discussed above: test work is reduced to preparing raw data.

Bonus: a GitHub repository implementing the logic presented in this blog post can be found here: https://github.com/MedUnes/dsa-go. So far it contains GitHub Actions workflows running these tests, and it even shows that super famous green badge ;)

See you in the next [hopefully] informative post!
