How Go Arrays Work and Get Tricky with For-Range
Phuong Le
Posted on August 20, 2024
This is an excerpt of the post; the full post is available here: How Go Arrays Work and Get Tricky with For-Range.
The classic Golang array and slice are pretty straightforward. Arrays are fixed-size, and slices are dynamic. But I've got to tell you, Go might seem simple on the surface, but it's got a lot going on under the hood.
As always, we'll start with the basics and then dig a bit deeper. Don't worry, arrays get pretty interesting when you look at them from different angles.
We'll cover slices in the next part; I'll drop that here once it's ready.
What is an array?
Arrays in Go are a lot like those in other programming languages. They've got a fixed size and store elements of the same type in contiguous memory locations.
This means Go can access each element quickly since their addresses are calculated based on the starting address of the array and the element's index.
func main() {
	arr := [5]byte{0, 1, 2, 3, 4}
	println("arr", &arr)
	for i := range arr {
		println(i, &arr[i])
	}
}
// Output:
// arr 0x1400005072b
// 0 0x1400005072b
// 1 0x1400005072c
// 2 0x1400005072d
// 3 0x1400005072e
// 4 0x1400005072f
There are a couple of things to notice here:
- The address of the array arr is the same as the address of the first element.
- The address of each element is 1 byte apart from each other because our element type is byte.
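To check that the 1-byte spacing comes from the element type rather than from arrays in general, here is a minimal sketch that repeats the measurement with int32 elements. The helper stride is hypothetical (it is not from the original post) and only uses the standard unsafe package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stride is a hypothetical helper for illustration: it returns the distance
// in bytes between element i and element i+1 of a [4]int32 array.
func stride(arr *[4]int32, i int) uintptr {
	return uintptr(unsafe.Pointer(&arr[i+1])) - uintptr(unsafe.Pointer(&arr[i]))
}

func main() {
	arr := [4]int32{10, 20, 30, 40}

	// The array and its first element share the same address.
	fmt.Println(unsafe.Pointer(&arr) == unsafe.Pointer(&arr[0]))

	// Neighboring elements are unsafe.Sizeof(int32(0)) = 4 bytes apart.
	for i := 0; i < len(arr)-1; i++ {
		fmt.Println(i, stride(&arr, i))
	}
}
```

With a 4-byte element type, each reported stride should be 4 instead of 1.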
Look at the image carefully.
Our stack is growing downwards from a higher to a lower address, right? This picture shows exactly how an array looks in the stack, from arr[4] to arr[0].
So, does that mean we can access any element of an array by knowing the address of the first element (or the array) and the size of the element? Let's try this with an int array and the unsafe package:
import (
	"fmt"
	"unsafe"
)

func main() {
	a := [3]int{99, 100, 101}
	p := unsafe.Pointer(&a[0])
	// Offsets assume an 8-byte int (64-bit architecture).
	a1 := unsafe.Pointer(uintptr(p) + 8)
	a2 := unsafe.Pointer(uintptr(p) + 16)
	fmt.Println(*(*int)(p))
	fmt.Println(*(*int)(a1))
	fmt.Println(*(*int)(a2))
}
// Output:
// 99
// 100
// 101
Well, we get the pointer to the first element and then calculate the pointers to the next elements by adding multiples of the size of an int, which is 8 bytes on a 64-bit architecture. Then we convert these pointers back to *int and dereference them to read the values.
The example just plays around with the unsafe package to access memory directly, for educational purposes. Don't do this in production without understanding the consequences.
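If you want the same experiment without hard-coding 8 and 16, one possible variation is to let the compiler supply the element size. This is only a sketch: elemAt is a hypothetical helper, and it relies on unsafe.Add, which is available from Go 1.17 onward.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemAt is a hypothetical helper for illustration: it reads element i of a
// [3]int array through pointer arithmetic instead of normal indexing.
// It does no bounds checking, so the caller must keep i within 0..2.
func elemAt(a *[3]int, i int) int {
	base := unsafe.Pointer(a)
	// unsafe.Sizeof(a[0]) is 8 on 64-bit platforms and 4 on 32-bit ones.
	p := unsafe.Add(base, uintptr(i)*unsafe.Sizeof(a[0]))
	return *(*int)(p)
}

func main() {
	a := [3]int{99, 100, 101}
	for i := range a {
		fmt.Println(elemAt(&a, i))
	}
}
```

The same warning applies here: this is for exploring memory layout, not a replacement for a[i].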
Now, an array of type T is not a type by itself, but an array with a specific size and element type T is considered a type. Here's what I mean:
func main() {
	a := [5]byte{}
	b := [4]byte{}
	fmt.Printf("%T\n", a) // [5]uint8
	fmt.Printf("%T\n", b) // [4]uint8
	// cannot use b (variable of type [4]byte) as [5]byte value in assignment
	a = b
}
Even though both a and b are arrays of bytes, the Go compiler sees them as completely different types. The %T format makes this point clear.
Here is how the Go compiler sees it internally (src/cmd/compile/internal/types2/array.go):
// An Array represents an array type.
type Array struct {
	len  int64
	elem Type
}
// NewArray returns a new array type for the given element type and length.
// A negative length indicates an unknown length.
func NewArray(elem Type, len int64) *Array { return &Array{len: len, elem: elem} }
The length of the array is "encoded" in the type itself, so the compiler knows the length of the array from its type. Trying to assign an array of one size to another, or compare them, will result in a mismatched type error.
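Here is a small sketch of what this means in everyday code. The function sum4 is a hypothetical example, not something from the compiler or the original post: it accepts only [4]int, and the commented-out lines show the kind of code the compiler rejects.

```go
package main

import "fmt"

// sum4 is a hypothetical function that accepts only [4]int values.
func sum4(a [4]int) int {
	total := 0
	for _, v := range a {
		total += v
	}
	return total
}

func main() {
	a := [4]int{1, 2, 3, 4}
	b := [4]int{1, 2, 3, 4}
	c := [5]int{1, 2, 3, 4, 5}

	fmt.Println(a == b)  // true: same type, same elements
	fmt.Println(sum4(a)) // 10: a is a [4]int, so it is accepted
	fmt.Println(len(c))  // 5

	// Both lines below fail to compile with a mismatched types error,
	// because [5]int and [4]int are different types:
	// fmt.Println(a == c)
	// fmt.Println(sum4(c))
}
```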
Array literals
There are many ways to initialize an array in Go, and some of them might be rarely used in real projects:
var arr1 [10]int // [0 0 0 0 0 0 0 0 0 0]
// With value, infer-length
arr2 := [...]int{1, 2, 3, 4, 5} // [1 2 3 4 5]
// With index, infer-length
arr3 := [...]int{11: 3} // [0 0 0 0 0 0 0 0 0 0 0 3]
// Combined index and value
arr4 := [5]int{1, 4: 5} // [1 0 0 0 5]
arr5 := [5]int{2: 3, 4, 4: 5} // [0 0 3 4 5]
What we're doing above (except for the first one) is both defining and initializing their values, which is called a "composite literal." This term is also used for slices, maps, and structs.
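To make the length inference a bit more concrete, here is a short sketch with a hypothetical helper, literalLengths, that reports the lengths the compiler picks for a few [...] literals. The rule it illustrates: an element without an index takes the previous index plus one, and the inferred length is the highest index plus one.

```go
package main

import "fmt"

// literalLengths is a hypothetical helper that returns the lengths the
// compiler infers for three [...] array literals.
func literalLengths() [3]int {
	byValues := [...]int{1, 2, 3, 4, 5} // 5 values, so length 5
	byIndex := [...]int{11: 3}          // highest index is 11, so length 12
	mixed := [...]int{2: 3, 4, 9: 1}    // 4 lands at index 3; highest index is 9, so length 10
	return [3]int{len(byValues), len(byIndex), len(mixed)}
}

func main() {
	fmt.Println(literalLengths()) // [5 12 10]
}
```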
Now, here's an interesting thing: when we create an array literal with 4 or fewer elements, Go generates instructions to put the values into the array one by one.
So when we do arr := [4]int{1, 2, 3, 4}, what's actually happening is:
arr := [4]int{}
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
This strategy is called local-code initialization. This means that the initialization code is generated and executed within the scope of a specific function, rather than being part of the global or static initialization code.
It'll become clearer when you read another initialization strategy below, where the values aren't placed into the array one by one like that.
"What about arrays with more than 4 elements?"
The compiler creates a static representation of the array in the binary, which is known as the 'static initialization' strategy.
This means the values of the array elements are stored in a read-only section of the binary. This static data is created at compile time, so the values are directly embedded into the binary. If you're curious what [5]int{1,2,3,4,5} looks like in Go assembly:
main..stmp_1 SRODATA static size=40
0x0000 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 ................
0x0010 03 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 ................
0x0020 05 00 00 00 00 00 00 00 ........
It's not easy to read the values of the array here, but we can still get some key info from this.
Our data is stored in stmp_1, which is read-only static data with a size of 40 bytes (8 bytes for each element), and the address of this data is hardcoded in the binary.
The compiler generates code to reference this static data. When our application runs, it can directly use this pre-initialized data without needing additional code to set up the array.
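// Conceptual pseudocode only: Go does not allow array constants,
// but this is roughly what the compiler does with the static data.
const readonly = [5]int{1, 2, 3, 4, 5}
arr := readonly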
"What about an array with 5 elements but only 3 of them initialized?"
Good question. This literal, [5]int{1,2,3}, falls into the first category, where Go puts the values into the array one by one.
While talking about defining and initializing arrays, we should mention that not every array is allocated on the stack. If it's too big, it gets moved to the heap.
But how big is "too big," you might ask.
As of Go 1.23, if the size of a variable (any variable, not just an array) exceeds the constant MaxStackVarSize, which is currently 10 MB, it is considered too large for stack allocation and escapes to the heap.
func main() {
	a := [10 * 1024 * 1024]byte{}
	println(&a)
	b := [10*1024*1024 + 1]byte{}
	println(&b)
}
In this scenario, b will move to the heap while a won't.
Array operations
The length of the array is encoded in the type itself. Even though arrays don't have a cap property, we can still get it:
func main() {
	a := [5]int{1, 2, 3}
	println(len(a)) // 5
	println(cap(a)) // 5
}
The capacity equals the length, no doubt, but the most important thing is that we know this at compile time, right?
So len(a) isn't really a runtime operation: the Go compiler already knows the value from the array's type and can treat it as a compile-time constant.
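One way to convince yourself of this is to use len and cap of an array in places where Go only accepts constant expressions. The sketch below assumes nothing beyond the language itself; the names n and b are made up for illustration.

```go
package main

import "fmt"

// a mirrors the array from the example above.
var a = [5]int{1, 2, 3}

// len(a) is a constant expression because a is an array, so it can
// initialize a constant.
const n = len(a)

// cap(a) is constant too, so it can size another array type.
var b [cap(a) * 2]int

func main() {
	fmt.Println(n, len(b)) // 5 10

	// The same would not compile for a slice, whose length is only
	// known at run time:
	// s := []int{1, 2, 3}
	// const m = len(s)
}
```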
...