[V8 Deep Dives] Random Thoughts on Math.random()
Andrei Pechkurov
Posted on April 5, 2021
In previous parts of this series, we were discussing the internals of ES6 collections and arrays in V8. This time we will cover a simpler topic which is Math.random() function.
Every JS developer uses Math.random() once in a while in their applications for various use cases. The general wisdom says that Math.random() is good for anything, but security. That said, this function is not backed by a CSPRNG (cryptographically secure pseudorandom number generator) and shouldn’t be used in security-related tasks, like UUID v4 generation (note: if you dare to use UUIDs for such tasks).
Today we’ll try to understand how exactly V8 implements Math.random() function and then try to match our findings with the general wisdom.
TL;DR fans may want to jump to the last section of the blog post where you may find a summary.
Disclaimer. What’s written below are implementation details specific to V8 9.0 bundled with a recent dev version of Node.js (commit 52f9aaf to be more precise). As usual, you should not expect any behavior beyond the spec, as implementation details are subject to change in any V8 version.
Spec All the Things
Before looking at the code, let’s see what ECMAScript 2020 specification says about Math.random() function:
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.
Each Math.random function created for distinct realms must produce a distinct sequence of values from successive calls.
Ehmm, that’s not much. It appears that the spec leaves a lot of freedom for the implementers, like JS engines, leaving security-related aspects out of scope.
No luck with the spec and now, with a clean conscience, we can dive into V8 source code.
The Nitty-gritty Details
Our journey starts from the Math.random() code written in Torque language:
We can see that Math.random() (MathRandom here) calls the RefillMathRandom macro defined elsewhere (see extern macro). We’ll see what this macro does a bit later.
Next, we see that the value (random number) is not generated directly, but instead returned from a fixed-size array (array variable). Let’s call this array “entropy pool” (or simply “pool”) to make it recognizable through the rest of the text. The index (newSmiIndex integer) is decremented on each call and periodically, when it becomes zero, the RefillMathRandom macro gets called which intuitively should refill the pool, but we’re not sure about that yet.
The MathRandom macro is defined in the CodeStubAssembler C++ class and does not contain anything spectacular. It simply calls the MathRandom::RefillCache method through an external reference. Hence, the code we expect to refill the entropy pool is written in C++ and looks more or less like the following:
The above code is trimmed and simplified for readability purposes. As we expected, its overall logic is to generate and refill the entropy pool (the cache array). But there are a couple of other interesting details here.
First of all, block #1 from the snippet describes the initialization of the seed to be used in the subsequent number generation. This block runs only once and uses the PRNG available in the current V8 isolate to generate the seed. Then it calculates murmur3 hash codes based on the seed and stores it in the initial state.
The PRNG can be supplied by embedders, like Node.js or Chromium browser. If a PRNG is not supplied by the embedder, V8 falls back to a system-dependent source of randomness, like /dev/urandom
in Linux.
Then, block #2 uses the state struct to generate and fill all kCacheSize values in the pool with a xorshift random number generator. The size of the pool is 64, i.e. after each 64 Math.random() calls the pool needs to be refilled.
Our takeaways here are the following. First, despite the fact that the initial seed used by Math.random() function may be generated with a cryptographically secure PRNG (note: that depends on the embedder and/or OS), the subsequent number generation doesn’t involve this PRNG. Instead, it uses xorshift128+ which is a fast random number generator algorithm, but it’s not cryptographically secure. Thus, we have found proof of the general wisdom and, indeed, V8’s implementation of Math.random() is not supposed to be used for security stuff.
Second, it also means that the generated number sequence will be deterministic in the case of the same initial seed value. Luckily, V8 supports the --random_seed flag to override the initial seed, so let’s see if our thinking is correct.
As expected, we used 42 as the seed value in two separate Node.js REPL sessions, and both times Math.random() produced exactly the same sequence of numbers.
Now, when we have a better understanding of the implementation, let’s try to understand the performance aspect of the entropy pool.
Some Silly Benchmarks
Before we go any further, I need to warn you that the following microbenchmarks are totally non-scientific, unfair benchmarks, so take them with a grain of salt. Benchmarks were done on my dev machine with i5–8400H CPU, Ubuntu 20.04, and Node.js v16.0.0-pre (commit 52f9aaf).
Our microbenchmark is terribly simple this time:
When run, it calls Math.random() in a loop and outputs the resulting throughput.
Armed with the benchmark, we’re going to compare kCacheSize=64 (the default) and kCacheSize=1 (no pool) builds of Node.js. Here is the measured result.
The benchmark shows that removing the pool makes Math.random() 22% slower. The difference is relatively small, yet the pool improves the throughput by removing the overhead of JS-to-C++ switches in each Math.random() call. Interestingly, that uuid npm package and, later, crypto.randomUUID() standard function from Node.js also employ a similar approach with the entropy pool (note: the difference is that they use a CSPRNG and the performance boost is much more significant).
It’s time to wrap up and summarize our findings.
Summary
- As every JS developer knows, it’s a bad idea to use Math.random() for security related tasks. In browsers you can use Web Crypto API and Node.js users should go with the crypto module.
- The initial seed used by Math.random() uses the PRNG supplied by the embedder (say, Node.js or browser) or falls back to an OS-dependent source of randomness, not necessarily a secure one.
- Once the initial seed value is generated, later values are generated deterministically with xorshift128+ algorithm and stored in a pool of 64 items which is refilled when necessary. Determinism here means that in the case of the same initial seed value the generated number sequence returned from Math.random() will be the same.
Thanks for reading this post. Let me know if you have ideas for the next posts in V8 Deep Dives series. Feedback on inconsistencies or incorrect assumptions is also more than welcome.
Posted on April 5, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.