[V8 Deep Dives] Understanding Array Internals
Andrei Pechkurov
Posted on March 16, 2021
In the previous partof this series, we were discussing Map and Set, standard collections introduced in ES6. This time we will focus on JavaScript arrays.
Arrays, which are essentially list-like objects, are one of the core features of the language and every JavaScript developer has a solid experience in working with them. This blog post does not try to give you an understanding of the public API but instead aims to briefly go through various aspects of V8’s internal implementation of JS arrays that seem worthy to me: memory layout, size restrictions, and other interesting implementation details.
To keep things simpler, the remaining part of the blog post assumes that V8 is running on a 64-bit system.
TL;DR fans may want to jump to the last section of the blog post where you may find a summary.
Disclaimer. What’s written below are implementation details specific to V8 8.9 bundled with a recent dev version of Node.js (commit 49342fe to be more precise). As usual, you should not expect any behavior beyond the spec, as implementation details are subject to change in any V8 version.
Once Upon a Time in a REPL
You probably ask yourself: what may be simpler than a JavaScript array? It must be backed by a fixed-size array, i.e. a contiguous chunk of memory. All operations should be straight-forward manipulations with data stored in the underlying array. But as we will see later, the reality is a bit more complicated than that.
To make things more practical, we will observe internal transformations of an array in a Node.js REPL. Fewer words, more code, so let’s run it:
$ node — allow-natives-syntax
Welcome to Node.js v16.0.0-pre.
Type “.help” for more information.
>
We are using the --allow-natives-syntaxflag to be able to use the %DebugPrint() V8 function. This function prints internal debug information for the given object or primitive value.
Now let’s create an empty array and print its debug information:
> const arr = [];
undefined
> %DebugPrint(arr);
DebugPrint: 0x3db6370d4e51: [JSArray]
- map: 0x3de594a433f9 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
- prototype: 0x3a5538d05849 <JSArray[0]>
- elements: 0x357222481309 <FixedArray[0]> [PACKED_SMI_ELEMENTS]
- length: 0
- properties: 0x357222481309 <FixedArray[0]>
- All own properties (excluding elements): {
0x357222484909: [String] in ReadOnlySpace: #length: 0x0f4cc91c1189 <AccessorInfo> (const accessor descriptor), location: descriptor
}
...
[]
The original output is quite lengthy, so I trimmed it. What we’re interested in is the - elements: ... [PACKED_SMI_ELEMENTS] part of the output. It tells us that our array uses a fixed-size array to store the data (V8 uses the “backing store” term for this), just as we expected. The size of that array is zero.
The debug print also tells us that our JS array has PACKED_SMI_ELEMENTS elements kind. An element kind is a metadata tracked by V8 to optimize array operations. It describes the types of elements stored in the array. If you’re not familiar with the concept, you should read this great blog post from V8 team.
PACKED_SMI_ELEMENTS is the most specific elements kind which means that all items in the array are Smis, small integers from the -2³¹ to 2³¹-1 range. Based on this metadata, V8 can avoid unnecessary checks and value conversions when dealing with the array. Another important aspect for us is the following. When a JS array is modified, its elements kind may transition from a more specific kind to a less specific one, but not the other way around. For instance, if an array’s elements kind changes from PACKED_SMI_ELEMENTS to something else due to insertion, there is no way back to the original (more specific) kind for this particular array instance.
To see how the internal array grows, we’re going to add its first element, a small integer number:
> arr.push(42);
> %DebugPrint(arr);
DebugPrint: 0xe61bd5eb321: [JSArray] in OldSpace
...
- elements: 0x0e61bd5e7501 <FixedArray[17]> [PACKED_SMI_ELEMENTS]
- length: 1
...
- elements: 0x0e61bd5e7501 <FixedArray[17]> {
0: 42
1-16: 0x357222481669 <the_hole>
}
...
[42]
Here we see that the internal array used as the backing store has changed to [PACKED_SMI_ELEMENTS]. The new array has the same elements kind, but a different address, and the internal array size equal to 17. On our 64-bit system, this means that it takes 17 * 8=136 bytes of memory (for the sake of simplicity, we ignore object headers). It also means that the allocated internal array is bigger than what we requested. This allows V8 to achieve constant amortized time for push() and similar operations that grow the array. The following formula is used to determine the new size in situations when the internal array is not enough:
new_capacity = (old_capacity + 50%) + 16
Here, old_capacity stands for the old internal array size plus the number of inserted items, hence in our case it’s equal to 1 and new_capacity is calculated as 1 + 16 = 17.
There is one more interesting detail in the above output. Namely, the 1-16: ... text in the array contents tells us that the unused part of the internal array is filled with “the hole”. The hole is a special value used by V8 to mark unassigned or deleted array items (and not only them). It’s an implementation detail that never “leaks” into JS code. In our example, V8 uses the hole to initialize the unused fraction of the array.
You may wonder if the internal array ever shrinks. It appears that it does shrink on operations that decrease the array length such as pop() or shift(). This happens if more than half the elements (with some padding for small arrays) won’t be used as the result of the operation.
Returning to our REPL session, PACKED_SMI_ELEMENTS kind in our array assumes no holes, but if we change it in a certain way, the kind will transition into a less specific one. Let’s do it:
> arr[2] = 0;
> %DebugPrint(arr);
...
- elements: 0x0e61bd5e7501 <FixedArray[17]> [HOLEY_SMI_ELEMENTS]
- length: 3
...
- elements: 0x0e61bd5e7501 <FixedArray[17]> {
0: 42
1: 0x357222481669 <the_hole>
2: 0
3-16: 0x357222481669 <the_hole>
}
Here we assigned the second item of the array, skipping the first one which contained the hole. As a result, the array’s elements kind transitioned to HOLEY_SMI_ELEMENTS. This kind assumes that the array contains only Smis or holey values. In terms of performance, this elements kind is slightly slower than the packed one as V8 has to perform value checks to skip holes when iterating the array or modifying it.
We’re not going to experiment any further with other element kinds backed by arrays. This is left as an exercise for curious readers. Nevertheless, it makes sense to mention that V8 optimizes for arrays of 64-bit floating-point numbers: PACKED_DOUBLE_ELEMENTS and HOLEY_DOUBLE_ELEMENTS kinds store numbers in the backing array, avoiding on-heap pointers for each number.
What we’re interested in as the next step is knowing whether the backing store used for array items can be different from a fixed-size array. Let’s do one more experiment in our REPL session:
> arr[32 << 20] = 0;
> %DebugPrint(arr);
...
- elements: 0x10f6026db0d9 <NumberDictionary[16]> [DICTIONARY_ELEMENTS]
- length: 33554433
...
- elements: 0x10f6026db0d9 <NumberDictionary[16]> {
- max_number_key: 33554432
2: 0 (data, dict_index: 0, attrs: [WEC])
0: 42 (data, dict_index: 0, attrs: [WEC])
33554432: 0 (data, dict_index: 0, attrs: [WEC])
}
...
What just happened? Our array no longer uses an array-based backing store and instead, it uses a NumberDictionary[16], which is a hash table-based collection specialized for number keys. If you’re interested in additional details, the hash table uses open addressing with quadratic probing.
Elements kind also transitioned to DICTIONARY_ELEMENTS which means “slow” path for JS arrays. With this kind, V8 aims to reduce the memory footprint for sparse arrays with a lot of holes, as the hash table only stores non-hole array elements. On the other hand, hash table operations are slower than an array as we need to pay for the cost of hash code calculation, entry lookup, and rehashing. A bit later we’re going to do some microbenchmarking to understand the cost.
The dictionary kind is used for arrays larger than 32 * 2²⁰ (~33.5M), so that’s why our array transitioned into this kind once we hit the limit. In terms of memory, this means that an array-baked JS array can not grow beyond ~268MB.
As for dictionary-based arrays, the maximum size for them is restricted by the ECMAScript specification and can not exceed the maximum value of a 32-bit unsigned integer (2³² — 1).
Great. Now, when we have a better understanding of how V8 handles JS arrays, let’s do some benchmarking.
Some Silly Benchmarks
Before we go any further I need to warn you that the following microbenchmarks are totally non-scientific, unfair benchmarks, so take them with a grain of salt. Benchmarks were done on my dev machine with i5–8400H CPU, Ubuntu 20.04, and Node.js v15.11.0.
First, let’s try to understand the difference between different element kinds in terms of array iteration. In the first benchmark, we iterate over an array of numbers and simply calculate the total sum of its elements. The results are visualized below.
Here the result for dictionary kind is barely visible as it’s two orders of magnitude smaller than the one for packed kind. As for the holey kind, it’s only 23% slower than the packed one.
Now let’s do some measurements for basic mutation operations, like push() and pop(). In the second benchmark, we push 1K elements into the array, then pop all of them on each iteration. The results are below.
This time the dictionary kind result is not even visible (and, yes, I’m awful at data visualization) since it’s ~200 versus ~238K operations per second for array-based kinds.
Interestingly, if we disable JIT in V8 with the --jitless flag, the result becomes ~200 versus ~16K operations per second. This clearly shows how good V8 JIT is at optimizing loops for array-based kinds.
While the absolute numbers don’t matter, the above results illustrate that your JS application should avoid dealing with dictionary-based arrays, unless you absolutely have to.
It’s time to wrap up and list our today findings.
Summary
- Each JS array is associated with an element kind, metadata tracked by V8 to optimize array operations. These kinds describe types of elements stored in the array.
- Elements of small enough arrays are stored in an internal fixed-size array. V8 allocates some extra space in the internal array to achieve constant amortized time for push() and similar operations that grow the array. When the array length decreases, the internal array may also shrink.
- Once a JS array becomes large (this also includes holey arrays), V8 starts using a hash table to store the array elements. The array is now associated with the “slow” dictionary elements kind.
- For hot loops, the “slow” kind may be multiple orders slower than array-based kinds.
- V8 JIT is good at optimizing loops for array-based kinds.
- In general, when writing code that manipulates large arrays on the hot path, you should let V8 use the most specific elements kind for your arrays.
Thanks for reading this post. Please let me know if you have ideas for the next posts in V8 Deep Dives series. Feedback on inconsistencies or incorrect assumptions is also more than welcome.
Posted on March 16, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.