Go - Race Condition: Detection and Prevention
Furkan Aksoy
Posted on May 27, 2022
A race condition occurs when two or more execution units (processes, threads, goroutines, etc.) access the same memory location at the same time, and at least one of the accesses is a write.
Let's analyze a quick and simple example.
package main

import (
	"fmt"
	"sync"
)

const (
	stepCount    = 100000
	routineCount = 2
)

var counter int64

func main() {
	var wg sync.WaitGroup
	for i := 0; i < routineCount; i++ {
		wg.Add(1)
		go incr(&wg)
	}
	wg.Wait() // wait until all goroutines have finished
	fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup) {
	for i := 0; i < stepCount; i++ {
		counter++
	}
	wg.Done()
}
The incr function is responsible for increasing the counter by stepCount. In this example we have 2 goroutines, and each one increments the counter 100000 times, so we expect the counter to end up at 200000 (routineCount x stepCount).
Let’s look at the output of the code 🙃
Step Count: 100000
LastValue : 192801
Expected : 200000
Oops, we got 192801 instead of 200000 😮. But why? What's wrong?
Critical Section
Before going over the problem, we should know what a critical section is. A critical section is simply a segment of code that must not be executed by multiple processes at the same time. Only one process/goroutine can be inside the critical section at a time; the others have to wait their turn. If this rule is broken, the result will be like the one above.
Analyze The Problem
Now that we know what a critical section is, let's go over the problem. We have 2 identical goroutines, each increasing the counter value by repeating these steps:
1. Read value of the counter
2. Add 1 to counter
3. Store increased value in counter
The critical section consists of these three steps. Goroutines must not execute them at the same time.
Let’s imagine the scenario:
Routine 1: Read value of the counter (counter = 12)
Routine 2: Read value of the counter (counter = 12)
Routine 1: Add 1 to counter (12 + 1 = 13)
Routine 1: Store increased value in counter (counter = 13)
Routine 2: Add 1 to counter (12 + 1 = 13)
Routine 2: Store increased value in counter (counter = 13)
Oops, the two routines executed some steps at the same time. Although both routines incremented the counter by 1, which means we expect the counter to increase by 2, it only increased by 1. That's why we got the unexpected output. Routine 2 should have waited until Routine 1 had finished its critical section. (Race condition 🙋)
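The interleaving above is easier to see when counter++ is written out by hand. The sketch below is only illustrative (the compiler emits equivalent machine instructions, not this exact Go code), but it shows the three separate steps a goroutine can be preempted between:

```go
package main

import "fmt"

var counter int64

// incrOnce spells out the three steps hidden inside counter++.
// A goroutine can be interrupted between any two of them, which
// is exactly how two routines end up storing the same value.
func incrOnce() {
	tmp := counter // 1. read the current value
	tmp = tmp + 1  // 2. add 1 to the local copy
	counter = tmp  // 3. store the copy back
}

func main() {
	counter = 12
	incrOnce()
	fmt.Println(counter) // 13
}
```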
How to Detect?
If your code is written in Go, you're lucky. Go has a built-in race detector (nothing to install explicitly), written in C/C++ on top of the ThreadSanitizer runtime library. The tool watches for unsynchronized accesses to shared variables and prints a warning when it detects a race. Be careful with this tool and do not run it in production: it can consume ten times the CPU and memory.
Because of its design, the race detector can detect race conditions only when they are actually triggered by running code, which means it’s important to run race-enabled binaries under realistic workloads. However, race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time. One way out of this dilemma is to run some tests with the race detector enabled. Load tests and integration tests are good candidates, since they tend to exercise concurrent parts of the code. Another approach using production workloads is to deploy a single race-enabled instance within a pool of running servers.
How to Use Race-Detector Tool?
No need to install anything; it's fully integrated with the Go toolchain. Just add the -race flag while compiling/running your application.
$ go test -race mypkg // test the package
$ go run -race mysrc.go // compile and run the program
$ go build -race mycmd // build the command
$ go install -race mypkg // install the package
Let’s run it for our racy code.
$ go run -race main.go
==================
WARNING: DATA RACE
Read at 0x000001279320 by goroutine 8:
main.incr()
main.go:29 +0x47
Previous write at 0x000001279320 by goroutine 7:
main.incr()
main.go:29 +0x64
Goroutine 8 (running) created at:
main.main()
main.go:20 +0xc4
Goroutine 7 (running) created at:
main.main()
main.go:20 +0xc4
==================
Step Count: 100000
LastValue : 192801
Expected : 200000
Found 1 data race(s)
exit status 66
The result shows unsynchronized accesses to the variable counter from different goroutines. We'll go over the solutions in the next section.
Other ways to prevent and detect race conditions include:
- qualified code reviews
- designing and modeling applications that share as few resources as possible
- building up know-how about these situations
- unit tests for concurrent code
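For the last point, even a tiny test exercising the racy code is enough for the detector to catch the problem. A minimal sketch (the test name and step counts are just an example) that reports the data race when run with go test -race:

```go
package main

import (
	"sync"
	"testing"
)

// TestIncrRace starts two goroutines that bump a shared counter
// without synchronization. `go test -race` flags the unsynchronized
// access; without -race the test may even pass by luck.
func TestIncrRace(t *testing.T) {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				counter++ // racy read-modify-write
			}
		}()
	}
	wg.Wait()
	if counter > 2000 {
		t.Fatalf("impossible counter value: %d", counter)
	}
}
```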
How to Handle?
Until now, we understood the problem and detected the bug. Let's fix it!
Using Mutex
A mutex (mutual exclusion) is a lock/unlock mechanism for critical sections. While it's locked, the critical section is reserved for one goroutine; the others have to wait until it's unlocked. In our code, we should lock around the code that increments the counter, so other goroutines cannot touch it while one goroutine is already working on it.
package main

import (
	"fmt"
	"sync"
)

const (
	stepCount    = 100000
	routineCount = 2
)

var counter int64

func main() {
	var wg sync.WaitGroup
	var mx sync.Mutex // initialize mutex
	for i := 0; i < routineCount; i++ {
		wg.Add(1)
		go incr(&wg, &mx) // pass mutex to each routine
	}
	wg.Wait() // wait until all goroutines have finished
	fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup, mx *sync.Mutex) {
	for i := 0; i < stepCount; i++ {
		mx.Lock()   // lock the critical section for this routine
		counter++   // critical section
		mx.Unlock() // unlock so other routines can enter
	}
	wg.Done()
}
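A common variation is to pair Lock with defer Unlock, so the mutex is released on every return path, even if the critical section grows or panics later. A minimal sketch of that pattern (the addOne helper name is illustrative, not part of the original code):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.Mutex
	counter int64
)

// addOne locks around the increment; defer guarantees the mutex
// is released even if the code between Lock and return panics.
func addOne() {
	mu.Lock()
	defer mu.Unlock()
	counter++
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				addOne()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // always 200000
}
```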
Using Channels
According to the Go documentation:
Channels are the pipes that connect concurrent goroutines. You can send values into channels from one goroutine and receive those values into another goroutine.
It's basically a pipe. In this scenario we can use a buffered channel with capacity 1 to synchronize our goroutines: the channel accepts only one value, and does not accept a new one until the current one has been read.
Long story short, we pass the channel to the spawned goroutines, and each goroutine sends a value into the channel to block the others. When it's done, it drains the channel to let the others in. It's a lock/unlock mechanism much like the one a mutex provides.
package main

import (
	"fmt"
	"sync"
)

const (
	stepCount    = 100000
	routineCount = 2
)

var counter int64

func main() {
	var wg sync.WaitGroup
	ch := make(chan struct{}, 1) // define buffered channel with capacity 1
	for i := 0; i < routineCount; i++ {
		wg.Add(1)
		go incr(&wg, ch)
	}
	wg.Wait() // wait until all goroutines have finished
	fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup, ch chan struct{}) {
	ch <- struct{}{} // send an empty struct into the channel to block other routines
	for i := 0; i < stepCount; i++ {
		counter++
	}
	<-ch // drain the channel so other routines can enter
	wg.Done()
}
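The buffered channel above is used purely as a lock. The more idiomatic channel style in Go is "share memory by communicating": a single goroutine owns the counter, and workers send increments to it instead of touching shared state. A sketch of that alternative, with illustrative names (channelCount, incs, done) not taken from the original code:

```go
package main

import (
	"fmt"
	"sync"
)

// channelCount hands ownership of the total to one goroutine;
// workers communicate increments instead of sharing memory.
func channelCount(routines, steps int) int64 {
	incs := make(chan int64)
	done := make(chan int64)

	// Owner goroutine: the only one that ever touches total.
	go func() {
		var total int64
		for v := range incs {
			total += v
		}
		done <- total
	}()

	var wg sync.WaitGroup
	for i := 0; i < routines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < steps; j++ {
				incs <- 1
			}
		}()
	}
	wg.Wait()
	close(incs) // no more increments; owner sends back the total
	return <-done
}

func main() {
	fmt.Println(channelCount(2, 100000)) // 200000
}
```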
Using Atomic Package
Atomic operations do not need any lock; they are implemented at the hardware level. If performance really matters to you, the atomic package can be used to build a lock-free application. But you and your team should know how atomic functions work under the hood. For example, atomic variables should only be accessed through atomic functions; don't read or write them like ordinary variables.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	stepCount    = 100000
	routineCount = 2
)

var counter int64

func main() {
	var wg sync.WaitGroup
	for i := 0; i < routineCount; i++ {
		wg.Add(1)
		go incr(&wg)
	}
	wg.Wait() // wait until all goroutines have finished
	fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup) {
	for i := 0; i < stepCount; i++ {
		atomic.AddInt64(&counter, 1) // use an atomic function to increase the counter
	}
	wg.Done()
}
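Since Go 1.19, sync/atomic also provides typed values such as atomic.Int64, which make the "only touch it through atomic operations" rule hard to break: there is no plain integer to accidentally read or write, only the Add, Load, and Store methods. A sketch of the same program using that type:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var counter atomic.Int64 // no plain reads/writes are possible on this type

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				counter.Add(1) // atomic increment
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter.Load()) // atomic read: always 200000
}
```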