Node.js fork is not what you think!
Pooya Parsa
Posted on January 24, 2019
Today I realized that in Node.js, neither cluster.fork
or child_process.fork
act like something you expect in a C environment. Actually, it is shortly mentioned in docs:
Unlike the fork(2) POSIX system call,
child_process.fork()
does not clone the current process.The
child_process.fork()
method is a special case ofchild_process.spawn()
used specifically to spawn new Node.js processes. Likechild_process.spawn()
, aChildProcess
object is returned. The returnedChildProcess
will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.
So what it means?
Taking a simple C code that forks 5 processes:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
printf("Shared Code\n");
for (int i = 0; i < 5; i++)
{
int pid = fork();
if (!pid)
{
return printf("Worker %d\n", i+1);
}
}
return 0;
}
Compiling and running this code gives us a result like this:
Shared
Worker 1
Worker 2
Worker 3
Worker 4
Worker 5
What operating system does under the hoods, is when we call fork(), it copies entire process state to a new one with a new PID. Return value in the worker process is always 0
so we have a way to find-out if rest of the code is running in forked process or master. (Thanks to @littlefox comment🧡)
The important point is that forked process continues from where fork()
was called. Not from the beginning so Shared
is printed once.
Running a similar code in Node.js:
const { fork, isMaster } = require('cluster')
console.log('Shared')
if (isMaster) {
// Fork workers.
for (let i = 0; i < 5; i++) {
fork();
}
} else {
// Worker
console.log(`Worker`);
}
The output is amazingly different:
Shared
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
Shared
Worker
The point is that each time a worker forked, it started with a fresh V8 instance. This is not a behavior that it's name tells. Fork in Node.js is actually doing exec/spawn which causes shared code running each time.
OK. So let's move console.log('Shared')
into if (isMaster)
:P
Well. Yes. You are right. That's the solution. But just for this example case!
In a real-world application that needs a cluster, we don't immediately fork workers. We may want to set up our web framework, parse CLI args and require a couple of libs and files. All of this steps has to be repeated on each worker that may introduce lots of unnecessary overhead.
Final Solution
Now that we know what exactly cluster.fork
does under the hood, we can split our worker logic into a seperate worker.js
file and change default value of exec
which is process.argv[1]
to worker.js
:) This is possible by calling cluster.setupMaster()
on master process.
Posted on January 24, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.