The Big Reveal of Node.js Performance Optimization! 🚀 Part One: Profiling Node.js

evle

Max

Posted on November 22, 2024


Have you ever run into this dilemma? You expected Node.js's event-driven, asynchronous I/O to let your service's throughput scale smoothly, yet load tests showed it handling only 5 requests per second. When you run Node.js in production, do its performance bottlenecks still feel like a big question mark? This article works through a case study: we profile a Node.js application, uncover the key bottleneck limiting its performance, and apply a fix that significantly increases throughput.

Appetizer

Libraries are an essential part of developing Node.js applications. Built-in modules such as fs and http reach the underlying operating system through C++ bindings, which is what keeps file operations and network communication efficient.

But is this approach limited to built-in modules? Obviously not. Many third-party libraries are also implemented in C++ for performance. A typical example is the password-hashing library bcrypt, whose work is CPU-intensive.

Such libraries share a telltale characteristic: running npm i has to download or compile a native binary. For example, if you install canvas with the --verbose flag (npm i canvas --verbose), you will see output like this:

npm info run canvas@2.11.2 install node_modules/canvas node-pre-gyp install --fallback-to-build --update-binary
npm info run canvas@2.11.2 install { code: 0, signal: null }

Seeing node-pre-gyp run tells you that the library ships a C++ plugin (a native addon), which has to match the operating system, CPU architecture, and Node.js version you install it on. Before the library can be used in your Node.js application, node-pre-gyp must obtain a binary of that plugin built for your platform.
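To make "a binary built for your platform" concrete, here is a minimal sketch that prints the values a prebuilt native module generally has to match. It uses only Node.js's built-in process object. The describePlatform helper is a hypothetical name, and how a given package actually names its prebuilt binaries is decided by that package's own configuration.

```javascript
// Hypothetical helper: collect the properties a prebuilt native addon
// typically has to match on the machine where it is installed.
function describePlatform() {
  return {
    platform: process.platform,     // e.g. 'darwin', 'linux', 'win32'
    arch: process.arch,             // e.g. 'x64', 'arm64'
    abi: process.versions.modules,  // Node.js native-module ABI version
    node: process.version,          // e.g. 'v14.17.0'
  };
}

console.log(describePlatform());
```

If no published binary matches these values, the installer has nothing to download and must build from source.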

To speed up installation, library authors usually publish pre-compiled binaries for mainstream platforms, and node-pre-gyp simply downloads and installs the matching one. When no pre-compiled binary is found, it falls back to building the plugin from source during installation, as in the following log:

    node-pre-gyp info it worked if it ends with ok
    node-pre-gyp info using node-pre-gyp@0.14.0
    node-pre-gyp info using node@14.17.0 | darwin | x64
    node-gyp info find Python using Python version 3.8.2 found at "/usr/local/bin/python"
    node-gyp info spawn /usr/local/bin/python
    node-gyp info spawn args [
      '/Users/user/.nvm/versions/node/v14.17.0/lib/node_modules/npm/node_modules/node-gyp/gyp/gyp_main.py',
      'binding.gyp',
      '-f',
      'make',
      '-I',
      '/Users/user/projects/my-project/node_modules/my-native-module/build/config.gyp.i',
      '-I',
      '/Users/user/.nvm/versions/node/v14.17.0/lib/node_modules/npm/node_modules/node-gyp/addon.gyp.i',
      '-I',
      '/Users/user/Library/Caches/node-gyp/14.17.0/include/node/common.gyp.i',
      '-Dlibrary=shared_library',
    ]
    node-gyp info spawn make
    node-gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
      CXX(target) Release/obj.target/my-native-module/my-native-module.o
      SOLINK_MODULE(target) Release/binding.node
    node-pre-gyp info ok 

This example illustrates a principle: when a library has to perform CPU-intensive operations, implementing them in a C++ module is common practice.

The Service Suddenly Became as Slow as a PPT 😅

Here's what happened. On the monitoring dashboard, I noticed that the throughput of one service was very low. The historical data showed that it used to be high and had dropped only recently, so I suspected that some recently implemented business requirement had degraded performance. But how to track it down? With hundreds of commits from different teams, it was like looking for a needle in a haystack.

First, I benchmarked the service in the test environment with ab (ApacheBench) to confirm that the monitoring data was not at fault and that the service really did have a performance problem.

ab -n 500 -c 100 https://HOST/endpoint

The results are as follows:

[Image: ab benchmark output]

Requests per second came in at only 272. After all the CPU and memory we had added, it could handle only 272 requests per second? That confirmed the service had a performance problem. To save the company money, I set out to track down the bottleneck.
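For readers without ab at hand, the measurement itself can be sketched in a few lines of Node.js. This is a minimal illustration, not a replacement for ab: runLoad is a hypothetical helper that takes any async task, runs it a fixed number of times with bounded concurrency, and reports requests per second and mean latency.

```javascript
// Minimal load-generator sketch: runs `task` totalRequests times,
// keeping at most `concurrency` calls in flight at once.
async function runLoad(task, totalRequests, concurrency) {
  const latenciesMs = [];
  let issued = 0;
  const startedAt = process.hrtime.bigint();

  async function lane() {
    // The check and the increment run synchronously, so lanes never double-count.
    while (issued < totalRequests) {
      issued++;
      const t0 = process.hrtime.bigint();
      await task();
      latenciesMs.push(Number(process.hrtime.bigint() - t0) / 1e6);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => lane()));

  const elapsedSec = Number(process.hrtime.bigint() - startedAt) / 1e9;
  const totalLatencyMs = latenciesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    completed: latenciesMs.length,
    requestsPerSecond: latenciesMs.length / elapsedSec,
    meanLatencyMs: totalLatencyMs / latenciesMs.length,
  };
}
```

On Node.js 18 or later, where fetch is built in, runLoad(() => fetch('https://HOST/endpoint'), 500, 100) would roughly mirror the ab command above.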

Identifying Performance Bottlenecks Through Profiling Node.js

Server performance is always a balancing act: on one side are the throughput requirements that come with business volume, and on the other is making better use of hardware to keep costs down. Node.js applications are no exception. We start by investigating CPU usage with the profiler built into Node.js.

Profiling is a technique for analyzing program performance. It shows how a program behaves while it runs and where its performance bottlenecks are.
Spoiler: the fix that this profiling session led to, described later in this article, improved throughput by about 50%, to roughly 400 requests per second, and cut response time from over 400 ms to over 200 ms.

Next, we change the command that starts the program, adding the --prof option:

node --prof app.js

This produces a file with a name like isolate-0x7fcef2c4d000-60450-v8.log in the directory the program was started from:

[Image: raw contents of the V8 log file]

This file records performance-related events from the running program, such as shared library usage, time consumption, memory allocation, and code creation. As shown above, though, it is not human-readable: the raw format is designed for storage, not for reading. To analyze it we need another command, which also ships with Node.js:

node --prof-process isolate-0x7fcef2c4d000-60450-v8.log > processed.txt

The generated processed.txt lists how many ticks (samples) were attributed to each function, which lets us see who is dragging down performance.

The analysis results show:

    [Summary]:
       ticks  total  nonlib   name
       19557   95.7%   99.8%  JavaScript
          89    0.4%    0.5%  C++
           0    0.0%    0.0%  GC

Wow! JavaScript accounts for 99.8% of the non-library ticks. Could it be that someone is using Node.js for CPU-intensive work? Note that a tick is not a fixed unit of time: it is one sample taken by the profiler, recording what was executing at that moment. The more ticks a function collects, the more CPU time it used.
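When processed.txt is long, it can help to rank its rows by ticks with a small script. The following is a sketch under the assumption that rows look like the ones in this article (ticks, total percentage, an optional nonlib percentage, then a name). parseProfileRows is a hypothetical helper, not part of Node.js.

```javascript
// Parse rows of the form "ticks  total%  [nonlib%]  name" from processed.txt
// and return them sorted by tick count, highest first.
function parseProfileRows(text) {
  const rowPattern = /^\s*(\d+)\s+([\d.]+)%\s+(?:([\d.]+)%\s+)?(\S.*)$/;
  const rows = [];
  for (const line of text.split(/\r?\n/)) {
    const match = rowPattern.exec(line);
    if (!match) continue; // section titles, column headers, blank lines
    rows.push({
      ticks: Number(match[1]),
      totalPct: Number(match[2]),
      nonlibPct: match[3] === undefined ? null : Number(match[3]),
      name: match[4].trim(),
    });
  }
  return rows.sort((a, b) => b.ticks - a.ticks);
}

const sample = `
 [Summary]:
   ticks  total  nonlib   name
   19557   95.7%   99.8%  JavaScript
      89    0.4%    0.5%  C++
       0    0.0%    0.0%  GC
`;
console.log(parseProfileRows(sample));
```

In practice the input would come from reading processed.txt with fs.readFileSync rather than from an inline string.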

Reading further, the JavaScript section reveals the culprit: gaussianBlur. What the heck, Gaussian blur? This one function consumes most of the CPU time.

 [JavaScript]:
   ticks  total  nonlib   name
   12105  61.6%   61.8%  LazyCompile: *gaussianBlur /app/server.js:15:23
...

 [C++]:
   ticks  total  nonlib   name
     35    0.2%          node::Start(int, char**)
...

 [Bottom up (heavy) profile]:
   ticks parent  name
   12105   61.6%  LazyCompile: *gaussianBlur /app/server.js:15:23
    ├─8473   70.0%    Function: processPixel /app/server.js:35:20
    │  └─4892   57.7%      Function: exp native math.js:178:12
    └─2560   21.1%    LazyCompile: *calculateWeight /app/server.js:45:24

The commit history for this code quickly led me to the developer responsible. They explained that a requirement called for blurring an image, and that they had chosen the simplest implementation that avoided adding dependencies, which caused this performance problem. The task looks simple, but it actually isn't.
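The original server.js is not shown in this article, so the following is only a hypothetical reconstruction of the kind of code the profile points at. It reuses the function names from the profile (gaussianBlur, processPixel, calculateWeight), but the grayscale layout, the edge clamping, and the relation between radius and sigma are all assumptions made for illustration.

```javascript
// Hypothetical reconstruction of a naive pure-JS Gaussian blur over a
// single-channel (grayscale) image stored as one byte per pixel.
function calculateWeight(dx, dy, sigma) {
  return Math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
}

function processPixel(src, width, height, x, y, radius, sigma) {
  let sum = 0;
  let weightSum = 0;
  for (let dy = -radius; dy <= radius; dy++) {
    for (let dx = -radius; dx <= radius; dx++) {
      // Clamp coordinates so edge pixels reuse their nearest neighbor.
      const px = Math.min(width - 1, Math.max(0, x + dx));
      const py = Math.min(height - 1, Math.max(0, y + dy));
      const weight = calculateWeight(dx, dy, sigma); // one Math.exp per tap
      sum += src[py * width + px] * weight;
      weightSum += weight;
    }
  }
  return sum / weightSum;
}

function gaussianBlur(src, width, height, radius) {
  const sigma = Math.max(radius / 2, 1); // assumed relation between radius and sigma
  const out = new Uint8ClampedArray(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      out[y * width + x] = processPixel(src, width, height, x, y, radius, sigma);
    }
  }
  return out;
}
```

Written this way, every pixel costs (2 × radius + 1)² weight calculations. For a single channel of a 1920x1080 image at radius 5, that is roughly 250 million Math.exp calls, all on the one thread that also serves requests.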

After talking it over with the developer, we decided to have a third-party library compute the blur instead. Remember the appetizer? That's right: we used the canvas library. Why is the third-party library faster? Not because its calculation logic is better, but because it is implemented in C++, which is better suited to CPU-intensive operations.
After the developer made the change, we ran the performance test again:

[Image: ab benchmark output after the change]

It's Time to Unleash a Big Move: C++ Plugins Are Coming! 💪

The previous problem happened to have an open-source solution, but what if no ready-made library exists for the problem you face? In that case, we can write a C++ extension ourselves to handle the expensive calculation.
First, write the Gaussian blur function:

// gaussian_blur.cpp
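#include <node.h>
#include <node_buffer.h>  // declares node::Buffer::Data
#include <cmath>

namespace gaussian_blur {

using v8::FunctionCallbackInfo;
using v8::Local;
using v8::Object;
using v8::Number;
using v8::Value;

// Called from JS as gaussianBlur(buffer, width, height, radius).
// Assumes the caller passes a Buffer followed by three numbers.
void GaussianBlur(const FunctionCallbackInfo<Value>& args) {
    Local<Object> buffer = args[0].As<Object>();
    int width = static_cast<int>(args[1].As<Number>()->Value());
    int height = static_cast<int>(args[2].As<Number>()->Value());
    int radius = static_cast<int>(args[3].As<Number>()->Value());

    // Raw pointer to the Buffer's bytes; no copy is made.
    unsigned char* data = reinterpret_cast<unsigned char*>(node::Buffer::Data(buffer));
    //... Implement the code written in JS here with C++...

    // Hand the buffer back to JS (assumes the blur above modifies it in place).
    args.GetReturnValue().Set(buffer);
}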

void Initialize(Local<Object> exports) {
    NODE_SET_METHOD(exports, "gaussianBlur", GaussianBlur);
}

NODE_MODULE(gaussian_blur, Initialize)
}

Node.js cannot call this file directly once it is written. As many of you know, Node.js is built on V8, though many readers may not have paid much attention to what V8 actually is. I will cover V8 in a separate article; the conclusion that matters here is that V8 executes JavaScript, not C++ source. We therefore need a bridge that lets code running on V8 call into C++. That bridge is built with node-gyp, a build tool that relies on the Python-based GYP build system and your platform's C++ toolchain. We describe the build in a binding.gyp file and compile the C++ code above by running node-gyp rebuild:

{
  "targets": [{
    "target_name": "gaussian_blur",
    "sources": [ "gaussian_blur.cpp" ]
  }]
}

Why compile with node-gyp rather than invoking a C++ compiler directly? Because it does more than compile the C++ code into a binary for your operating system: it also packages the result as a Node.js module. The interfaces in this module tell V8 how to initialize the module, create objects, and set properties and methods, which exposes the functions implemented in C++ to JS. For example 🌰: a numeric argument passed from JS can be converted to a C++ numeric type such as int or double, and a value returned from C++ is converted back into a data type that V8 understands, which bridges the differences between the two languages.

For more details, please refer to the later articles in this series; a rough idea is enough for now.
Finally, calling the module that node-gyp just compiled from Node.js is very simple:

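const express = require('express');
const gaussianBlur = require('./build/Release/gaussian_blur');

const app = express();
app.use(express.json({ limit: '50mb' })); // parse JSON request bodies

app.post('/blur', (req, res) => {
    const image = req.body.image;
    // Assumption: image.data arrives base64-encoded; the addon expects a Buffer of raw pixels.
    const pixels = Buffer.from(image.data, 'base64');
    const result = gaussianBlur.gaussianBlur(pixels, image.width, image.height, 5);
    res.send(result);
});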
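A compiled module may be missing on a machine where the build step did not run, so a loader with a fallback is a common precaution. The sketch below is one possible shape, not something the original service is known to use: loadGaussianBlur and passthrough are hypothetical names, and the placeholder fallback simply returns its input.

```javascript
// Try the compiled addon first; fall back to a JS implementation if the
// binary is missing or cannot be loaded on this platform.
function loadGaussianBlur(addonPath, jsFallback) {
  try {
    return { blur: require(addonPath).gaussianBlur, isNative: true };
  } catch (err) {
    const recoverable =
      err.code === 'MODULE_NOT_FOUND' || err.code === 'ERR_DLOPEN_FAILED';
    if (!recoverable) throw err; // unrelated failures should still surface
    return { blur: jsFallback, isNative: false };
  }
}

// Placeholder fallback for illustration only: returns the input unchanged.
const passthrough = (data) => data;

const { blur, isNative } = loadGaussianBlur('./build/Release/gaussian_blur', passthrough);
console.log(isNative ? 'using C++ addon' : 'using JS fallback');
```

Logging which implementation was picked makes it obvious when a deployment is silently running the slow path.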

Preview

This article covered identifying CPU-level bottlenecks through profiling. In real production environments, there are also many cases where throughput drops because of memory issues. Leave a comment if you would like to see more in this series.

Epilogue

The code examples in this article differ considerably from the real business code. Even so, the optimization produced a significant improvement in the real service 🎉.

  • Before optimization: It took 2000 ms to process a 1920x1080 image.
  • After optimization: It only takes 150 ms to process the same image.
  • Throughput improvement: Originally, only 0.5 images could be processed per second; now 6 to 7 can!
  • CPU usage: It has dropped from 100% of a single core to about 30%.

I hope these experiences will be helpful to other developers in the Node.js stack. Here are some takeaways 🌟:

  • Don't put blind faith in JavaScript. It is not omnipotent; consider C++ for CPU-intensive tasks.
  • Performance optimization should be based on data rather than speculation.
  • The Node.js ecosystem provides powerful performance analysis tools, and we should make good use of them.
  • Reasonable use of C++ plugins can significantly improve performance.

Finally, a reminder: choose your optimization according to the actual situation. Sometimes Worker Threads are enough, and there is no need to bring out a "heavy weapon" like C++. Choosing the appropriate solution matters more than blindly pursuing extreme performance!
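As a rough illustration of the Worker Threads option, the sketch below moves a CPU-bound loop off the main thread using Node.js's built-in worker_threads module. The workload (a sum of square roots) is a hypothetical stand-in for real image processing, and runInWorker is an illustrative name.

```javascript
const { Worker } = require('node:worker_threads');

// Code executed inside the worker thread. The loop is a hypothetical
// stand-in for a CPU-intensive job such as blurring an image.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let total = 0;
  for (let i = 0; i < workerData.iterations; i++) total += Math.sqrt(i);
  parentPort.postMessage(total);
`;

// Run the job in a worker and resolve with its result, so the main
// thread's event loop stays free to serve other requests meanwhile.
function runInWorker(iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { iterations },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

runInWorker(1e6).then((total) => console.log('worker result:', total));
```

The calculation itself is no faster than before; the gain is that it no longer blocks request handling. A real service would likely reuse a pool of workers rather than start one per request.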

If the article helps you, please give it a like ❤️
