Concurrency in Go is hard


👨‍💻 Dillen Meijboom

Posted on August 29, 2022


I understand that the title might be somewhat confusing as Go is generally known for having good built-in support for concurrency. However, I don’t necessarily think it’s easy to write concurrent software in Go. Let me show you what I mean.

Using global variables

The first example is something we ran into while working on a project. Until recently, the sarama library (a Go library for Apache Kafka) contained the following piece of code (in sarama/version.go):

package sarama

import "runtime/debug"

var v string

func version() string {
    if v == "" {
        bi, ok := debug.ReadBuildInfo()
        if ok {
            v = bi.Main.Version
        } else {
            v = "dev"
        }
    }
    return v
}

At a glance, this looks fine, right? If the global version variable hasn’t been set yet, it’s populated either from the build information or with a static fallback value (dev). Otherwise, the cached version is returned as is. When we run this code, it appears to work as intended.

However, when the version function is called concurrently, the global variable v can be read and written from multiple goroutines simultaneously, resulting in a potential data race. These issues are hard to track down because they only occur at runtime, under precisely the right conditions.

The solution

This issue was fixed in #2171 by using sync.Once, which, according to the docs, is “an object that will perform exactly one action.” This means we can use it to set the version exactly once; subsequent calls to the version function return the cached result. The fix looks like this:

package sarama

import (
    "runtime/debug"
    "sync"
)

var (
    v     string
    vOnce sync.Once
)

func version() string {
    vOnce.Do(func() {
        bi, ok := debug.ReadBuildInfo()
        if ok {
            v = bi.Main.Version
        } else {
           v = "dev"
        }
    })
    return v
}

That said, I think this particular case could also have been fixed without the sync package by setting the variable v once in an init function. Since v never changes after Go runs init, and init always completes before main starts, every later read is safe.
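
A minimal sketch of that init-based alternative (my own variant, not the actual sarama code) could look like this:

package sarama

import "runtime/debug"

var v string

// init runs exactly once, before main starts and before any goroutine
// can call version, so later reads of v need no extra synchronization.
func init() {
    bi, ok := debug.ReadBuildInfo()
    if ok {
        v = bi.Main.Version
    } else {
        v = "dev"
    }
}

func version() string {
    return v
}

The trade-off is that the build information is read even if version is never called, whereas sync.Once defers the work until the first call.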

How to prevent it

You can enable the built-in data race detector (available since Go 1.1) with the -race flag when running tests, or with go run if you just want to start the application. When it detects a potential data race, it prints a warning. To show how this works, I’ve changed the code a little bit to trigger a data race:

package main

import (
    "fmt"
    "runtime/debug"
)

var v string

func version() string {
    if v == "" {
        bi, ok := debug.ReadBuildInfo()
        if ok {
            v = bi.Main.Version
        } else {
            v = "dev"
        }
    }
    return v
}

func main() {
    go func() {
        version()
    }()
    fmt.Println(version())
}

Now we can run it with the -race flag to enable the race detector:

➜ go run -race .
==================
WARNING: DATA RACE
Write at 0x000104a16b90 by main goroutine:
  main.version()
      main.go:14 +0x78
  main.main()
      main.go:27 +0x30

Previous read at 0x000104a16b90 by goroutine 7:
  main.version()
      main.go:11 +0x2c
  main.main.func1()
      main.go:24 +0x24

Goroutine 7 (finished) created at:
  main.main()
      main.go:23 +0x2c
==================
(devel)
Found 1 data race(s)
exit status 66

As you can see, a data race is detected. Analyzing the output, we see that one goroutine writes to the v variable while another reads it at the same time. It’s called a data race because both goroutines are “racing” to access the same data.

A visual representation of the race detector output


Copying structs from the sync package

I did find some real-world examples on GitHub, but none was significant enough to mention here. Instead, I’ll explain this based on a sample I made. So here it goes:

package main

import "sync"

type User struct {
    lock sync.RWMutex
    Name string
}

func doSomething(u User) {
    u.lock.RLock()
    defer u.lock.RUnlock()
    // do something with `u`
}

func main() {
    u := User{Name: "John"}
    doSomething(u)
}

The User struct contains two fields: a read/write lock and a string. When doSomething is called, the variable u is passed by value, so the entire struct, including the lock, is copied. This is an issue because, as the documentation for the sync package states:

Package sync provides basic synchronization primitives such as mutual exclusion locks. Other than the Once and WaitGroup types, most are intended for use by low-level library routines. Higher-level synchronization is better done via channels and communication.

Values containing the types defined in this package should not be copied.

Inside doSomething, the RLock/RUnlock calls only operate on the copied lock, not on the lock in the original User struct, which renders the locking useless.

The solution

Use a pointer to the lock instead. Only the pointer is copied, and it still points to the same underlying lock. The updated version looks like this:

type User struct {
    lock *sync.RWMutex
    Name string
}
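
Another option (a sketch of my own, not taken from a real codebase) is to keep the mutex as a value and pass a pointer to the whole struct instead, so the lock is never copied in the first place:

package main

import "sync"

type User struct {
    lock sync.RWMutex
    Name string
}

// doSomething receives a *User, so u.lock refers to the single lock
// stored in the original struct rather than a copy.
func doSomething(u *User) {
    u.lock.RLock()
    defer u.lock.RUnlock()
    // do something with `u`
}

func main() {
    u := &User{Name: "John"}
    doSomething(u)
}

Which variant fits best depends on whether the struct itself is meant to be shared or copied.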


How to prevent it

Use the copylocks analyzer to get a warning when types from the sync package are copied by value. The easiest way is to run go vet before releasing your code. Running it on the original code results in the following output:

➜ go vet .
# data-synchronization
./main.go:10:20: doSomething passes lock by value: data-synchronization.User contains sync.RWMutex
./main.go:20:14: call of doSomething copies lock value: data-synchronization.User contains sync.RWMutex


Using time.After

While searching on GitHub, I found a pull request in HashiCorp’s implementation of Raft (a distributed consensus algorithm) that we can use to demonstrate the next problem. Let’s start with the code (from api.go):

var timer <-chan time.Time
if timeout > 0 {
    timer = time.After(timeout)
}

// Perform the restore.
restore := &userRestoreFuture{
    meta:   meta,
    reader: reader,
}
restore.init()
select {
case <-timer:
    return ErrEnqueueTimeout
case <-r.shutdownCh:
    return ErrRaftShutdown
case r.userRestoreCh <- restore:
    // If the restore is ingested then wait for it to complete.
    if err := restore.Error(); err != nil {
        return err
    }
}

This piece of code is from the Restore method. The select statement waits for whichever happens first: the timer firing (when a timeout is configured), the shutdown channel closing, or the restore request being enqueued, after which it waits for the restore to complete. It’s pretty straightforward, so what is the issue?

The time.After function looks like this:

func After(d Duration) <-chan Time {
    return NewTimer(d).C
}

So it’s nothing more than a shorthand for time.NewTimer, but it “leaks” the timer: the caller only receives the channel, so there is no way to call Stop on the underlying Timer. The documentation says the following about it:

After waits for the duration to elapse and then sends the current time on the returned channel. It is equivalent to NewTimer(d).C. The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.

I genuinely don’t understand how a function that intentionally “leaks” the timer (resulting in a potential long-lived allocation, depending on the duration) ends up in the standard library…

The solution

Instead of using time.After, we can create the timer manually. This is what it looks like:

var timerCh <-chan time.Time
if timeout > 0 {
    timer := time.NewTimer(timeout)
    defer timer.Stop()
    timerCh = timer.C
}

// Perform the restore.
restore := &userRestoreFuture{
    meta:   meta,
    reader: reader,
}
restore.init()
select {
case <-timerCh:
    return ErrEnqueueTimeout
case <-r.shutdownCh:
    return ErrRaftShutdown
case r.userRestoreCh <- restore:
    // If the restore is ingested then wait for it to complete.
    if err := restore.Error(); err != nil {
        return err
    }
}

When the function returns, the deferred Stop call cleans up the timer even if it never fired.

How to prevent it

I wouldn’t use time.After in any codebase. Its only advantage is saving one or two lines of code, while it can cause real problems, especially when it’s used in the hot paths of your code.
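
To illustrate the hot-path problem, here is a contrived sketch of my own: every iteration of the loop calls time.After, allocating a fresh timer that is only garbage collected once its full duration has elapsed, even if a message arrives immediately.

package main

import (
    "fmt"
    "time"
)

// consume handles messages with a per-message timeout. Each loop
// iteration creates a brand-new timer via time.After; under high
// message rates these unfired timers pile up until their full
// durations elapse.
func consume(messages <-chan string) {
    for {
        select {
        case msg, ok := <-messages:
            if !ok {
                return
            }
            fmt.Println("got", msg)
        case <-time.After(time.Minute):
            return
        }
    }
}

func main() {
    messages := make(chan string, 3)
    messages <- "a"
    messages <- "b"
    close(messages)
    consume(messages)
}

A single timer created with time.NewTimer, stopped and reset where appropriate, avoids the repeated allocations.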

Conclusion

You can quickly write concurrent software with Go’s built-in support for concurrency. However, it’s left up to you to ensure that data is synchronized correctly and that the tools from the standard library are used as intended. This, combined with the simplicity of Go, makes it hard to write stable, bug-free, concurrent software.
