Why do we use OTP libraries when we can just do Math.random()

One-time passwords (OTPs) are widely used for authentication and verification purposes in various applications and services. A server usually generates them and sends them to the user via SMS, email, or other channels. The user then enters the OTP to confirm their identity or perform an action.

I got a task where we had to implement OTP-based verification in Node JS. Before integrating something like this, I am sure most of us developers/engineers look on the Internet for best practices, tutorials, recent technical trends, and problems other major software systems face during their implementations. So I did it, and the thing that most attracted my attention was libraries like otp-lib, and otp-generator, whose only function was to generate an OTP. The rest of the tasks like sending it over an SMS or email still need to be done by other means. The first question that comes to mind after knowing that such libraries exist is why we have to go to such lengths to use a library to generate OTP when all we have to do is write a one-liner:

const otp = Math.ceil(Math.random() * 10000)

In this blog post, I will explain what I learned during our small research for OTP generators, why using Math.random() to generate OTPs is a bad idea, what are other ways to generate an OTP, and why a library should be used for such a task?

Types of random numbers

There are mainly two types of random numbers:

Pseudo-Random Numbers (PRN)
Cryptographic Random Numbers (CRN).

Pseudo-Random Numbers

Pseudo-random numbers are generated by an algorithm that takes an initial value, called a seed, and produces a sequence of numbers that appear to be random. However, the algorithm is deterministic, meaning that if you know the seed and the algorithm, you can predict the next number in the sequence. Javascript's Math.random() and Python's random.randInt() are an example of a pseudo-random number generator.

Cryptographic Random Numbers

Cryptographic random numbers are generated by a process that is unpredictable and cannot be reproduced or guessed. They are usually based on some physical phenomenon, such as atmospheric noise, thermal noise, or quantum effects.

How Math.random() works?

Different Javascript engines behave a little differently when generating a random number, but it all essentially comes down to a single algorithm XorShift128+.

XorShift is a deterministic algorithm, which uses addition as a faster non-linear transformation solution. Compared to its peers, which use multiplication, this algorithm is faster. It also has less chance of failure than Mersenne Twister (used by Python's random module)

The algorithm takes in two state variables, applies some XOR and shift on them, and returns the sum of the updated state variables which is an Integer. The states are generally seeded using the system clock because that is a good source for a unique number.

An implementation of XOR shift plus in javascript looks like this:

let state0 = 1;
let state1 = 2;
function xorShiftPlus() {
    let s1 = state0; 
    let s0 = state1; 
    state0 = s0;  
    s1 ^= s1 << 23;
    s1 ^= s1 >> 17;
    s1 ^= s0;
    s1 ^= s0 >> 26;
    state1 = s1;
    return state0 + state1;
}

The returned integer is converted to a double using OR operation with a constant. You can find the detailed implementation on chrome source code.

How to predict a random number generated by Math.random()

Predicting the outcome of Math.random() is hard, however, it is not completely impossible. Knowing the algorithm, you can easily regenerate the same random numbers if you know the values of state0 and state1.

Reverse engineering XorShift128+ Using a Z3 theorem prover you can find the value of state0 and state1 by providing 3 consecutive random numbers generated by a server.

The implementation of the Z3 solver can be found here.

Now the question comes how to get those 3 random numbers from a server. That's the hard part, and can be obtained in some of the following cases:

If an API returns a randomly generated number in its response or headers, it can easily be obtained by sending requests at set intervals.
API documentation like OpenAPI/Swagger in modern applications is generated on the server. Sometimes their responses can contain an example value that uses a random number.
With frameworks like NextJS that use server-side rendering while also being capable of handling backend API integrations, there are high chances of getting randomly generated numbers from the content served by them.

Another approach to exploit a random number is using the fact that Math.random() only returns numbers between 0 and 1 with 16 decimal places. This means that there are only 10^16 possible values that Math.random() can return. This is a very small space compared to the space of possible OTPs. if your OTP has 6 digits, there are 10^6 possible values. This visualizer shows that there is a pattern to the numbers generated. Using it, the possibilities can be reduced by 30%. Therefore, if you can guess or brute-force some of the digits of the OTP, you can reduce the space of possible values and increase your chances of finding the correct OTP.

Generating a Cryptographic Random Number in NodeJS

As mentioned previously, cryptographic random numbers are non-deterministic because they depend on the physical factors of a system. Every programming language can access those factors using low-level OS kernel calls.

NodeJS provides its inbuilt crypto module, which we can use to generate randomBytes and then convert them to a number. These random bytes are cryptographic and purely random in nature. The generated number can easily be truncated to the exact number of digits we want in OTP.

import * as crypto from 'crypto';
const num = parseInt(crypto.randomBytes(3).toString('hex'), 16)
// num.toString().slice(0,4)  // truncate to 4 digits

NodeJS 14.10+ provides another function from crypto to generate a random number in a given min-max range.

crypto.randomInt(1001, 9999)

Even after knowing the vulnerability of Math.random() and finding a more secure way to generate a random number cryptographically, we still remain with the same question from the beginning. Why do we have to go to such lengths to use a library to generate OTP when all we have to do is write a one-liner?

Before answering this questions, let's take a look at what is the inconvenience faced while handling and storing an OTP. The problem with using the above method to generate OTPs is that you have to store them in the database in order to verify them later. Storing the OTP in the database is not a good practice for the following reasons:

Storing OTPs in the database creates a lot of garbage data that has to be cleaned up periodically. OTP means a one-time password that can expire after a single use. It can also expire if not used for a specific duration or a new OTP is requested without using the previous one. This mainly adds unnecessary overhead to the database operations for maintaining valid OTPs while also consuming storage space.
Storing OTPs in the database poses a security risk if the database is compromised. An attacker who gains access to the database can read the OTPs and use them to bypass the authentication or verification process. This can lead to account takeover, identity theft, or fraud.
Storing OTPs in the database makes them vulnerable to replay attacks. A replay attack is when an attacker intercepts an incoming valid OTP and uses it again before it expires. This can allow the attacker to perform unauthorised actions or access sensitive information.

What do the OTP libraries do differently?

The OTP libraries use different algorithms and techniques to generate and verify OTPs that behave similarly to a Cryptographic random OTP, while also removing the overhead to store the OTP in a database.

There are mainly two types of OTP implementation techniques.

HOTP

HOTP stands for HMAC-based One-Time Password. It is an algorithm that generates an OTP based on a secret key and a counter. The secret key is a random string that is shared between the server and the user. The counter is an integer that increments every time an OTP is generated or verified.

The algorithm works as follows:

• The server and the user generate the same OTP by applying a cryptographic hash function, such as SHA-1, to the concatenation of the secret key and the counter.
• The server and the user truncate the hash value to obtain a fixed-length OTP, usually 6 or 8 digits.
• The user sends the OTP to the server for verification.
• The server compares the OTP with its own generated OTP and verifies it if they match.
• The server and the user increment their counters by one.

HOTP is mostly used in hardware token-based authentication like Yubikey. Yubikey is basically a programmed hardware key that you can connect physically to your computer or phone. Instead of receiving a code from SMS, or email, you can just press a button on Yubikey to verify and authenticate yourself.

The advantages of HOTP are:

The disadvantages of HOTP are:

• It requires synchronization between the server and the user's counters. If they are out of sync, due to network delays, transmission errors, or device loss, the verification will fail.
• It remains valid as long as a newly generated HOTP is not used, that can be a vulnerability.
• It requires a secure way to distribute and store the secret keys. If the secret keys are leaked or stolen, the OTPs can be compromised.

TOTP

TOTP stands for Time-based One-Time Password. It is an algorithm that generates an OTP based on a secret key, timestamp, and epoch.

The secret key is a random string that is shared between the server and the user. It can be created uniquely for each user by generating SHA1( "secretvalue" + user_id ) .
The timestamp is an integer that represents the current time in seconds
The epoch is the duration for which the algorithm will generate the same result. generally, it is kept between 30 sec - 1 min.

The algorithm works as follows:
• The server decides a secret key for the user and shares it over a medium like Authenticator apps.
• The server can directly generate an OTP and send it to the user by mail or SMS, or it can ask the user to use an Authenticator to generate an OTP using the shared key.
• The user can directly send the OTP received by mail or SMS or can generate it in the authenticator app in case of 2FA in a fixed time window.
• The server compares the OTP with its own generated OTP and verifies it if they are close enough in the epoch time range.

The advantages of TOTP are:

• It does not require storing the OTP in the database, as it can be generated and verified on the fly.
• It does not rely on pseudo-random numbers, as it uses a cryptographic hash function that is unpredictable and irreversible.
• It is resistant to replay attacks, as each OTP is valid only for a short period of time.
• It does not require synchronisation between the server and the user's timestamps. As long as they have reasonably accurate clocks, they can generate and verify OTPs independently.

The disadvantages of TOTP are:

• It requires a secure way to distribute and store the secret keys. If the secret keys are leaked or stolen, the OTPs can be compromised.
• It requires a reliable source of time for both the server and the user. If their clocks are skewed or tampered with, the verification will fail.
• The server has to consider the time drift or delay in processing the requests, so it should maintain a slightly greater epoch than the client.

Conclusion

Through our little research journey on the OTP, we came to know that Math.random() can be predicted, exploited, and replayed. We also got to know that storing OTPs in the database is not a good practice.

TOTP can generate secure and efficient OTPs, and can also verify them. It can generate an OTP offline as well as online, does not require synchronization or storage, and is resistant to replay attacks. Thus it solves most of our concerns related to best practices, security, and reliability.