go

Memory arenas in Go

vearutop

Viacheslav Poturaev

Posted on March 9, 2023

Memory arenas in Go

Recently released go1.20 brought a new experimental package named arena (proposal).

The purpose of this package is to isolate multiple related allocations into a single area of memory, so that they can be freed all at once.

Expected effect is improved allocation performance and reduced garbage collector (GC) pressure.

Let's see it in action. As a toy problem, we can use binary-trees.

A program that idiomatically solves this problem has to put a lot of pressure on a memory management system to allocate many small objects, so it is a great candidate to assess new arenas.

Original source code:

binary-trees-ptr.go
package main

import (
   "flag"
   "fmt"
   "strconv"
   "sync"
)

type Tree struct {
   Left  *Tree
   Right *Tree
}

// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
   // Only test the Left node (this binary tree is expected to be complete).
   if t.Left == nil {
      return 1
   }
   return 1 + t.Right.Count() + t.Left.Count()
}

// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(depth int) *Tree {
   if depth > 0 {
      return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
   } else {
      return &Tree{}
   }
}

func Run(maxDepth int) {

   var wg sync.WaitGroup

   // Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
   const minDepth = 4
   if maxDepth < minDepth+2 {
      maxDepth = minDepth + 2
   }

   // Create an indexed string buffer for outputing the result in order.
   outCurr := 0
   outSize := 3 + (maxDepth-minDepth)/2
   outBuff := make([]string, outSize)

   // Create binary tree of depth maxDepth+1, compute its Count and set the
   // first position of the outputBuffer with its statistics.
   wg.Add(1)
   go func() {
      tree := NewTree(maxDepth + 1)
      msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
         maxDepth+1,
         tree.Count())

      outBuff[0] = msg
      wg.Done()
   }()

   // Create a long-lived binary tree of depth maxDepth. Its statistics will be
   // handled later.
   var longLivedTree *Tree
   wg.Add(1)
   go func() {
      longLivedTree = NewTree(maxDepth)
      wg.Done()
   }()

   // Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
   // compute and tally up all their Count and record the statistics.
   for depth := minDepth; depth <= maxDepth; depth += 2 {
      iterations := 1 << (maxDepth - depth + minDepth)
      outCurr++

      wg.Add(1)
      go func(depth, iterations, index int) {
         acc := 0
         for i := 0; i < iterations; i++ {
            // Create a binary tree of depth and accumulate total counter with its
            // node count.
            a := NewTree(depth)
            acc += a.Count()
         }
         msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
            iterations,
            depth,
            acc)

         outBuff[index] = msg
         wg.Done()
      }(depth, iterations, outCurr)
   }

   wg.Wait()

   // Compute the checksum of the long-lived binary tree that we created
   // earlier and store its statistics.
   msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
      maxDepth,
      longLivedTree.Count())
   outBuff[outSize-1] = msg

   // Print the statistics for all of the various tree depths.
   for _, m := range outBuff {
      fmt.Println(m)
   }
}

func main() {
   n := 0
   flag.Parse()
   if flag.NArg() > 0 {
      n, _ = strconv.Atoi(flag.Arg(0))
   }

   Run(n)
}

If I compile and run this program, I can get a baseline performance metric.

time ./binary-trees-ptr 21
Enter fullscreen mode Exit fullscreen mode
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m7.232s
user    1m9.929s
sys     0m0.959s
Enter fullscreen mode Exit fullscreen mode

The actual program output is not so important, but the execution time is.

This result was obtained with a general purpose Macbook Pro (Intel) and in consecutive runs the numbers were slightly different. However, this post does not aim for scientific accuracy, the goal is to have rough understanding what to expect from arenas. Performance impact may vary depending on the code, so please run proper benchmarks on your actual cases.

Most allocations in the app are happening in

func NewTree(depth int) *Tree {
   if depth > 0 {
      return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
   } else {
      return &Tree{}
   }
}
Enter fullscreen mode Exit fullscreen mode

Enter arena

This new package is available in std lib with "arena" import.
The flow is

  • create arena instance with arena.NewArena(),
  • use it to allocate instances of any type with arena.New[T any](a *Arena) *T,
  • use it to allocate slices with arena.MakeSlice[T any](a *Arena, len, cap int) []T,
  • free arena instance once the data is no longer needed with (*Arena).Free().

Because this package is experimental, you'd need to explicitly enable it with GOEXPERIMENT=arenas env var and/or with equivalent build tag //go:build goexperiment.arenas during build.

Upgrade the app

Let's isolate tree constructor to allocate in an arena.

func NewTree(a *arena.Arena, depth int) *Tree {
    if depth > 0 {
        t := arena.New[Tree](a)
        t.Left = NewTree(a, depth-1)
        t.Right = NewTree(a, depth-1)
        return t
    } else {
        return arena.New[Tree](a)
    }
}
Enter fullscreen mode Exit fullscreen mode

Now when we build a tree, we have control of where to put its data and when to free it.

.....
    // Create binary tree of depth maxDepth+1, compute its Count and set the
    // first position of the outputBuffer with its statistics.
    wg.Add(1)
    go func() {
        a := arena.NewArena()
        tree := NewTree(a, maxDepth+1)
        msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
            maxDepth+1,
            tree.Count())

        outBuff[0] = msg
        a.Free()
        wg.Done()
    }()
.....
Enter fullscreen mode Exit fullscreen mode

Upgraded app:

binary-trees-ptr-arena.go
//go:build goexperiment.arenas

package main

import (
    "arena"
    "flag"
    "fmt"
    "strconv"
    "sync"
)

type Tree struct {
    Left  *Tree
    Right *Tree
}

// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
    // Only test the Left node (this binary tree is expected to be complete).
    if t.Left == nil {
        return 1
    }
    return 1 + t.Right.Count() + t.Left.Count()
}

// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(a *arena.Arena, depth int) *Tree {
    if depth > 0 {
        t := arena.New[Tree](a)
        t.Left = NewTree(a, depth-1)
        t.Right = NewTree(a, depth-1)
        return t
    } else {
        return arena.New[Tree](a)
    }
}

func Run(maxDepth int) {

    var wg sync.WaitGroup

    // Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
    const minDepth = 4
    if maxDepth < minDepth+2 {
        maxDepth = minDepth + 2
    }

    // Create an indexed string buffer for outputing the result in order.
    outCurr := 0
    outSize := 3 + (maxDepth-minDepth)/2
    outBuff := make([]string, outSize)

    // Create binary tree of depth maxDepth+1, compute its Count and set the
    // first position of the outputBuffer with its statistics.
    wg.Add(1)
    go func() {
        a := arena.NewArena()
        tree := NewTree(a, maxDepth+1)
        msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
            maxDepth+1,
            tree.Count())

        outBuff[0] = msg
        a.Free()
        wg.Done()
    }()

    // Create a long-lived binary tree of depth maxDepth. Its statistics will be
    // handled later.
    var longLivedTree *Tree
    la := arena.NewArena()
    wg.Add(1)
    go func() {
        longLivedTree = NewTree(la, maxDepth)
        wg.Done()
    }()

    // Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
    // compute and tally up all their Count and record the statistics.
    for depth := minDepth; depth <= maxDepth; depth += 2 {
        iterations := 1 << (maxDepth - depth + minDepth)
        outCurr++

        wg.Add(1)
        go func(depth, iterations, index int) {
            acc := 0
            for i := 0; i < iterations; i++ {
                // Create a binary tree of depth and accumulate total counter with its
                // node count.
                ar := arena.NewArena()
                a := NewTree(ar, depth)
                acc += a.Count()
                ar.Free()
            }
            msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
                iterations,
                depth,
                acc)

            outBuff[index] = msg
            wg.Done()
        }(depth, iterations, outCurr)
    }

    wg.Wait()

    // Compute the checksum of the long-lived binary tree that we created
    // earlier and store its statistics.
    msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
        maxDepth,
        longLivedTree.Count())
    outBuff[outSize-1] = msg
    la.Free()

    // Print the statistics for all of the various tree depths.
    for _, m := range outBuff {
        fmt.Println(m)
    }
}

func main() {
    n := 0
    flag.Parse()
    if flag.NArg() > 0 {
        n, _ = strconv.Atoi(flag.Arg(0))
    }

    Run(n)
}

Now is there any performance difference?

go build -tags goexperiment.arenas ./binary-trees-ptr-arena.go
time ./binary-trees-ptr-arena 21
Enter fullscreen mode Exit fullscreen mode
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m5.777s
user    0m38.207s
sys     0m4.537s
Enter fullscreen mode Exit fullscreen mode

Time difference is around 20% which is very impressive given such a small change to original program.

Keep in mind that a single arena can not be used concurrently (at least as of current implementation), but you can create multiple arenas for multiple goroutines.

💖 💪 🙅 🚩
vearutop
Viacheslav Poturaev

Posted on March 9, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related