Auto scaling Node.js applications with PM2 and pm2-autoscale module

vexell

Slava

Posted on December 4, 2023


In this article, I would like to show how you can automatically scale your Node.js applications to handle increased traffic using PM2 while keeping your server resources under control.

PM2 Cluster Mode

PM2 is a process manager for Node.js applications that helps you manage them and keep them running smoothly. In cluster mode, PM2 allows networked Node.js applications to be scaled across all available CPUs without any code modifications.

For example, if you have a server with 8 CPUs, you can run 8 instances of your application, which increases its performance and reliability.

To run an application in cluster mode, create an ecosystem configuration file and start your application with it.

module.exports = {
    apps: [
        {
            name: 'app',
            script: 'build/app.js',
            instances: 8,
            autorestart: true,
            watch: false,
            max_memory_restart: '512M',
            vizion: false,
            exec_mode: 'cluster',
        },
    ],
};

This configuration lets you run 8 workers on your server. In most cases that is enough, especially if you have only one application on the server.

But imagine you want to add one more application to the same server. Of course, you want to split resources between your applications, so you change the configuration files and set 4 workers for each application. This works until one of the applications has to handle an increased traffic load. If that application uses 100% CPU on every worker, it needs more workers. In that case you have to connect to your server and use the pm2 scale command to add workers for the loaded app and remove workers from the other app.

pm2 scale app +3 # Add 3 new additional instances for your app

When the load decreases, you have to connect to the server again and revert your changes. This is neither comfortable nor easy. You also need a way to monitor your applications so you can detect when one of them has to be scaled.
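To illustrate the monitoring part, PM2's programmatic API can report CPU usage per worker. The sketch below assumes the pm2 package is installed next to the script. The helper names cpuByApp and printCpuUsage are hypothetical and not part of PM2.

```javascript
// Group PM2 process descriptions by app name and collect per-worker CPU values.
// `list` has the shape returned by pm2.list(): each item carries `name` and `monit.cpu`.
function cpuByApp(list) {
    const result = {};
    for (const proc of list) {
        const cpu = proc.monit ? proc.monit.cpu : 0;
        if (!result[proc.name]) {
            result[proc.name] = [];
        }
        result[proc.name].push(cpu);
    }
    return result;
}

// Connect to the local PM2 daemon and print CPU usage per worker once.
// Assumption: the `pm2` package is installed locally.
function printCpuUsage() {
    const pm2 = require('pm2');
    pm2.connect((connectErr) => {
        if (connectErr) {
            console.error(connectErr);
            return;
        }
        pm2.list((listErr, list) => {
            if (listErr) {
                console.error(listErr);
            } else {
                for (const [name, cpus] of Object.entries(cpuByApp(list))) {
                    console.log(`App "${name}" has ${cpus.length} worker(s). CPU: ${cpus.join(',')}.`);
                }
            }
            pm2.disconnect();
        });
    });
}
```

Calling printCpuUsage() on a timer gives a basic view of the load, but you still have to run pm2 scale yourself.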

Another case: you have a server with 48 CPUs and want to run multiple apps on it. If you run every application with instances=max, PM2 will start 48 instances per application. Assume every instance uses approximately 100 MB of RAM (~5 GB for all instances of one application). With 10 applications you will use about 50 GB of server memory before the server handles any load. You are using your server resources inefficiently.
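The arithmetic above can be checked with a few lines of code. This is only a rough estimate: the function name is made up for illustration, and 100 MB per instance is the approximation used in this article, not a measured value.

```javascript
// Rough idle-memory estimate for apps running in PM2 cluster mode.
// estimateIdleMemoryMb is an illustrative helper, not part of PM2.
function estimateIdleMemoryMb(apps, instancesPerApp, mbPerInstance) {
    return apps * instancesPerApp * mbPerInstance;
}

const cpus = 48;           // server from the example above
const perInstanceMb = 100; // approximate memory per idle instance

// One app started with instances=max
const oneAppMb = estimateIdleMemoryMb(1, cpus, perInstanceMb);
// Ten apps started with instances=max
const tenAppsMb = estimateIdleMemoryMb(10, cpus, perInstanceMb);
// Ten apps started with only 2 instances each
const tenSmallAppsMb = estimateIdleMemoryMb(10, 2, perInstanceMb);

console.log(`1 app, max instances: ${oneAppMb} MB`);         // 4800 MB
console.log(`10 apps, max instances: ${tenAppsMb} MB`);      // 48000 MB
console.log(`10 apps, 2 instances each: ${tenSmallAppsMb} MB`); // 2000 MB
```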

Unfortunately, PM2 does not have a good, simple built-in way to dynamically add workers and monitor your applications. With the free version you can see CPU utilization and memory usage only in the terminal.

PM2 Autoscale Module

To solve this issue, I wrote a plugin, pm2-autoscale, that helps optimize an application’s performance by automatically adjusting the number of running instances based on the CPU utilization of every application.

To use the pm2-autoscale module, install it with the command:

pm2 install pm2-autoscale

The module supports a few configuration options:

  • scale_cpu_threshold: CPU utilization of any single application instance above which the module will try to add instances (default: 30)
  • release_cpu_threshold: Average CPU utilization across all instances of the application below which the module will remove instances (default: 5)
  • debug: Enable debug mode to show logs from the module (default: false)

To modify the module config values you can use the following commands:

pm2 set pm2-autoscale:debug true
pm2 set pm2-autoscale:scale_cpu_threshold 50

Now you can modify your ecosystem configuration to run every application with the minimum required number of instances. For example, if you have 8 CPUs and 2 applications, you can set instances=2 for each application and keep the rest of your server resources available. When the module detects that CPU utilization is higher than scale_cpu_threshold, it starts adding instances, up to a maximum of CPUs-1, but only if the server has free memory available. When the module detects that CPU utilization is decreasing, it stops the instances that are no longer needed.

Here is an example of how it works. I tested it with Apache Benchmark.

0|pm2-autoscale | App "app" has 1 worker(s). CPU: 0.
0|pm2-autoscale | App "app" has 1 worker(s). CPU: 17.
0|pm2-autoscale | App "app" has 1 worker(s). CPU: 29.
0|pm2-autoscale | INFO: Increase workers
0|pm2-autoscale | App "app" scaled with +1 worker
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 26,34.
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 29,43.
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 28,38.
0|pm2-autoscale | INFO: Increase workers
0|pm2-autoscale | INFO: App "app" is busy
0|pm2-autoscale | App "app" scaled with +1 worker
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 27,35,33.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 25,31,22.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 24,29,23.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 23,25,20.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 15,18,17.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 11,13,13.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 8,9,10.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 6,7,7.
0|pm2-autoscale | App "app" has 3 worker(s). CPU: 5,5,5.
0|pm2-autoscale | INFO: Decrease workers
0|pm2-autoscale | INFO: App "app" is busy
0|pm2-autoscale | App "app" decresed one worker
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 4,4.
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 3,3.
0|pm2-autoscale | App "app" has 2 worker(s). CPU: 2,2.
0|pm2-autoscale | INFO: Decrease workers
0|pm2-autoscale | INFO: App "app" is busy
0|pm2-autoscale | App "app" decresed one worker

Thus, if the CPU usage of the application is high, the module can increase the number of instances running to handle the load. Similarly, if the CPU usage is low, the module can reduce the number of instances running to save resources.
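The rules described in this article can be sketched as a small decision function. This is a simplified illustration and not the module's actual source code. The function name, the option names, the minWorkers and hasFreeMemory parameters, and the strict comparisons are all assumptions.

```javascript
const os = require('os');

// Decide whether an app should get one more worker, one fewer, or stay as is.
// cpuPerWorker: CPU utilization of every worker of one app, e.g. [26, 34].
// Hypothetical sketch: the real pm2-autoscale logic may differ in details.
function decideScaling(cpuPerWorker, options = {}) {
    const {
        scaleCpuThreshold = 30,  // mirrors scale_cpu_threshold
        releaseCpuThreshold = 5, // mirrors release_cpu_threshold
        maxWorkers = Math.max(os.cpus().length - 1, 1), // CPUs-1
        minWorkers = 1,          // assumption: never go below one worker
        hasFreeMemory = true,    // assumption: result of a free-memory check
    } = options;

    const workers = cpuPerWorker.length;
    if (workers === 0) {
        return 'none';
    }

    const maxCpu = Math.max(...cpuPerWorker);
    const avgCpu = cpuPerWorker.reduce((sum, cpu) => sum + cpu, 0) / workers;

    // Scale up when the busiest worker is above the threshold
    if (maxCpu > scaleCpuThreshold && workers < maxWorkers && hasFreeMemory) {
        return 'up';
    }
    // Scale down when the average load is below the release threshold
    if (avgCpu < releaseCpuThreshold && workers > minWorkers) {
        return 'down';
    }
    return 'none';
}

console.log(decideScaling([26, 34], { maxWorkers: 7 }));     // 'up'
console.log(decideScaling([15, 18, 17], { maxWorkers: 7 })); // 'none'
console.log(decideScaling([4, 4], { maxWorkers: 7 }));       // 'down'
```

Using the busiest worker for scaling up and the average for scaling down makes the sketch quick to add workers and slower to remove them.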

The source code of the module is available on GitHub at https://github.com/VeXell/pm2-autoscale. If you want to add or change something, just create a pull request.

Thank you for reading this article. Hope you found this useful.
