You Don't Know The REAL Theory Behind Node.JS!
Atena Dadkhah
Posted on April 21, 2023
Introduction
Have you ever wondered how Node.js, one of the most popular JavaScript runtimes, works under the hood? 🤔 While it's easy to get caught up in the excitement of building web applications with Node.js, it's essential to understand the theory behind it.
In this post, we'll take a deep dive into the inner workings of Node.js, exploring what's really happening behind the scenes when you write JavaScript code using Node.js. From its event-driven architecture to its non-blocking I/O operations, we'll demystify the theory behind Node.js and shed light on how it was built.
🚀Get ready to uncover the real theory behind Node.js and gain a deeper understanding of this powerful technology!💡
Architecture Behind The Scenes
Let's look at the architecture in terms of Node's dependencies, which are just a couple of libraries that Node relies on in order to work properly.
The Node runtime has several dependencies, and the most important ones are the V8 engine and libuv.
V8 JavaScript Engine | libuv |
---|---|
V8
You might already know that Node is a JavaScript runtime built on Google's V8 engine. If it wasn't for V8, Node would have absolutely no way of understanding the JavaScript code that we write, so V8 is a fundamental part of the Node architecture.
The V8 engine is what converts JavaScript code into machine code that a computer can actually understand.
But that alone is not enough to create a whole server-side runtime like Node, and that is why we also have libuv.
libuv
libuv is an open source library with a strong focus on asynchronous I/O. This layer is what gives Node access to the underlying operating system, file system, networking, and more. Besides that, libuv also implements two extremely important features of Node.js: the Event Loop and the Thread Pool.
In simple terms, the Event Loop is responsible for handling light tasks like executing callbacks and network I/O, while the Thread Pool is for heavier work like file access or compression.
Programming Languages
One important thing to note is that libuv is actually written in C, and not in JavaScript! V8 itself is written mostly in C++. So Node itself is a program written in C++ and JavaScript (with C libraries underneath), and not just in JavaScript as you might expect.
Now the beauty of this is that Node.js ties all these libraries together, no matter if they're written in C, C++ or JavaScript, and then gives us developers access to their functions in pure JavaScript.
So it really provides us with a very nice layer of abstraction that makes our lives a lot easier, instead of us having to mess with C++ code! That would be a terrible experience, right? 😂
This architecture allows us to write 100% pure JavaScript code running in Node.js and still access functions like file reading, which behind the scenes are actually implemented in libuv or other native libraries.
And speaking of other libraries, Node doesn't only rely on V8 and libuv, but also on an HTTP parser for parsing HTTP, c-ares for some DNS requests, OpenSSL for cryptography, and zlib for compression.
So in the end, when all these pieces fit nicely together, we end up with Node.js ready to be used on the server side or for other applications.
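If you're curious which versions of these libraries are bundled into your own Node binary, the global process object exposes them through process.versions. Here's a small sketch; the bundled object and its labels are just names picked for this example.

```javascript
// Versions of the native libraries compiled into this Node.js binary.
const { v8, uv, zlib, openssl, ares } = process.versions;

// "bundled" is an illustrative name; the keys are friendlier labels.
const bundled = { v8, libuv: uv, zlib, openssl, 'c-ares': ares };

for (const [name, version] of Object.entries(bundled)) {
  console.log(`${name}: ${version}`);
}
```

The exact numbers depend on the Node release you have installed.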
Thread Pool
So first off, when we use Node on a computer, it means that there is a Node process running on that computer. A process is just a program in execution, and you already learned that Node.js is basically a C++ program, which will therefore start a process when it's running. This is important because in Node we actually have access to a process variable. Inside that process, Node.js runs in a so-called single thread.
A thread is basically just a sequence of instructions. You may imagine a thread as a box where our code is executed in the computer's processor.
Now, what is important to understand here is that Node runs your JavaScript in just one thread, which makes it easy to block Node applications. This is really important to remember because it is one of the unique features that Node brings to the table. So again, if you run your Node application, it will run in just a single thread, no matter if you have 10 users or 10 million users accessing your application at the same time. And you need to be very careful about not blocking that thread!
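To see what "blocking the thread" looks like in practice, here's a rough sketch. It asks for a timer callback as soon as possible and then keeps the main thread busy with a synchronous loop. BLOCK_MS and timerDelay are names made up for this demo, and the busy loop just stands in for any heavy synchronous work.

```javascript
const start = Date.now();
const BLOCK_MS = 200; // how long we hog the main thread (arbitrary demo value)
let timerDelay = null;

// Ask for a callback "as soon as possible".
setTimeout(() => {
  timerDelay = Date.now() - start;
  console.log(`timer fired after ${timerDelay} ms`);
}, 0);

// Synchronous busy-wait: nothing else can run on the thread until it ends.
while (Date.now() - start < BLOCK_MS) {
  // burning CPU on purpose
}
console.log('busy loop finished, the thread is free again');
```

Even though the timer was set to 0 ms, its callback can't run until the loop is done. In a real server, every user would be waiting during that time.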
What happens in a single thread? 🧐
When the program is initialized, all the top-level code is executed, which means all the code that is not inside any callback function. Also, all the modules that your app needs are required and all the callbacks are registered. Then, after all that, the event loop (the heart of a Node app 🫀) finally starts running.
But some tasks are actually too heavy and expensive to be executed in the event loop because they would block the single thread. That's where the thread pool comes in, which, just like the event loop, is provided to Node.js by the libuv library. The thread pool gives us four additional threads that are completely separate from the main single thread. We can configure its size with the UV_THREADPOOL_SIZE environment variable (older libuv versions allowed up to 128 threads, newer ones up to 1024), but usually these four are enough. These threads together form the thread pool, and the event loop can automatically offload heavy tasks to it. All this happens automatically behind the scenes. It's not us developers who decide what goes to the thread pool and what doesn't. 🤷
The expensive tasks that do get offloaded are all operations dealing with files, everything related to cryptography (like hashing passwords), all compression stuff, and also DNS lookups, which basically match web domains to their corresponding real IP addresses. This is the stuff that would most easily block the main thread, so Node takes care of automatically offloading it to the thread pool, where it doesn't block our event loop.
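Here's a tiny sketch of that startup order. The order array is just a helper for this example: it records when each piece of code runs.

```javascript
const order = [];

order.push('top-level: start');

// The callback is only registered here. It runs later, from the event loop.
setTimeout(() => {
  order.push('callback: timer');
  console.log(order.join('\n'));
}, 0);

order.push('top-level: end');
```

Both top-level lines are recorded before the timer callback, because the event loop only starts once the top-level code has finished.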
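As a rough illustration, the asynchronous version of crypto.pbkdf2 is one of those calls that runs on the thread pool. The sketch below starts four password hashes at once. The password, salt, iteration count and the hashPassword helper are all demo values and names, not a recommendation.

```javascript
const crypto = require('node:crypto');
const { promisify } = require('node:util');

const pbkdf2 = promisify(crypto.pbkdf2);
const start = Date.now();

// Hypothetical helper: hashes a password off the main thread.
async function hashPassword(password, label) {
  const key = await pbkdf2(password, 'demo-salt', 100000, 64, 'sha512');
  console.log(`${label} done after ${Date.now() - start} ms`);
  return key;
}

// Four hashes, started together. With the default pool of four threads
// they can be processed in parallel.
const allHashes = Promise.all(
  [1, 2, 3, 4].map((n) => hashPassword('secret', `hash ${n}`))
);

console.log('main thread is free while the hashing happens');
```

Try running it with UV_THREADPOOL_SIZE=1 set in your environment and compare the timings. The exact numbers depend on your machine.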
Event Loop
The event loop is where all application code inside callback functions is executed. It's the heart of the Node architecture, which uses an event-driven approach. When something like an incoming HTTP request or an expired timer occurs, an event is emitted, and the event loop picks it up and calls the associated callback function. The event loop has multiple phases, each with its own callback queue: expired timers, I/O polling and callbacks, setImmediate callbacks, and close callbacks. There are also two special queues for process.nextTick() callbacks and other microtasks (like resolved promises). Callbacks in these two queues don't wait for the loop to move on: they run as soon as the currently executing callback finishes. After each pass through the phases, Node decides whether to start another tick or exit the application, based on whether there are still pending timers or I/O tasks. Understanding the event loop is crucial for writing performant code in Node.js. For more details, refer to the official Node documentation.
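The phases and queues described above can be seen with a small experiment. This is only a sketch: fs.stat('.') is simply a cheap I/O call picked so that everything is scheduled from inside an I/O callback, and the seen array and finished promise are helper names for this example.

```javascript
const fs = require('node:fs');

const seen = [];

const finished = new Promise((resolve) => {
  // Schedule everything from inside an I/O callback (poll phase).
  fs.stat('.', () => {
    setTimeout(() => {
      seen.push('setTimeout 0');
      resolve(seen); // the timer is the last one to run
    }, 0);
    setImmediate(() => seen.push('setImmediate'));
    Promise.resolve().then(() => seen.push('promise microtask'));
    process.nextTick(() => seen.push('process.nextTick'));
    seen.push('synchronous code');
  });
});

finished.then((order) => console.log(order.join(' -> ')));
```

From inside an I/O callback, the order is: synchronous code, process.nextTick, promise microtask, setImmediate, setTimeout 0. The two special queues run first, then the loop continues to the setImmediate phase, and the timer has to wait for the next pass.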
Event-driven Architecture
Most of Node's core modules, like HTTP, File System, and Timers, are built around an event-driven architecture, and we can of course also use this architecture to our advantage in our own code. The concept is actually quite simple. In Node, there are certain objects called event emitters that emit named events as soon as something important happens in the app, like a request hitting the server, a timer expiring, or a file finishing being read. These events can then be picked up by event listeners that we developers set up, which fire off the callback functions attached to each listener. So again, on one hand we have event emitters, and on the other hand event listeners that react to emitted events by calling a callback function.
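Here's a minimal sketch of an emitter and a listener in our own code. The Shop class and the newSale event are made-up names for this example; only EventEmitter, on and emit come from Node.

```javascript
const EventEmitter = require('node:events');

// Hypothetical emitter for this demo.
class Shop extends EventEmitter {}

const shop = new Shop();
const log = [];

// Listener: reacts whenever a "newSale" event is emitted.
shop.on('newSale', (item) => {
  log.push(`sold: ${item}`);
});

// Emitter side: something important happened, so announce it.
shop.emit('newSale', 'book');
shop.emit('newSale', 'pen');

console.log(log);
```

Any arguments passed to emit after the event name are handed to the listener's callback.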
When we want to create a server, we use the createServer method and save the result to a server variable. The server.on method is how we actually create a listener, in this case for the "request" event. So let's say we have our server running, and a new request is made. The server acts as an emitter, and will automatically emit an event called "request" each time that a request hits the server. Then, since we already have a listener set up for this exact event, the callback function that we attached to this listener will automatically be called. This callback then typically sends some data back to the client. It works this way because behind the scenes the server is actually an instance of the Node.js EventEmitter class, so it inherits all this event emitting and listening logic from that class.
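The flow described above might look like the sketch below. To keep it self-contained, the script also sends one request to itself and then shuts the server down. Port 0 (let the OS pick a free port), the 127.0.0.1 address and the reply promise are choices made for this demo only.

```javascript
const http = require('node:http');

const server = http.createServer();

// Listener for the "request" event the server emits on every request.
server.on('request', (req, res) => {
  res.end('Request received');
});

const reply = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    // Demo client: one request to our own server, without keep-alive.
    http
      .get({ host: '127.0.0.1', port, path: '/', agent: false }, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          server.close();
          resolve(body);
        });
      })
      .on('error', reject);
  });
});

reply.then((body) => console.log(body));
```

In a real app you would of course listen on a fixed port and keep the server running.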
This EventEmitter logic is called the Observer Pattern in JavaScript programming in general, and it's quite a popular pattern with many use cases. The idea is that there is an observer, in this case the event listener, which keeps waiting and observing the subject that will eventually emit the event the listener is waiting for. The opposite of this pattern is simply functions calling other functions, which is something that we're more used to, right? But the observer pattern has been designed to react rather than to call. There is a huge benefit to using this architecture, which is that everything is more decoupled. We don't have, for example, functions from the File System module calling functions from the HTTP module, because that would be a huge mess. Instead, these modules are nicely decoupled and self-contained, each emitting events that other functions, even if they come from other modules, can respond to. Also, using an event-driven architecture makes it way more straightforward to react multiple times to the same event. All we have to do is set up multiple listeners.
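Reacting several times to the same event can be sketched like this. The orders emitter, the placed event and the actions it triggers are invented for the example.

```javascript
const EventEmitter = require('node:events');

const orders = new EventEmitter(); // hypothetical emitter
const actions = [];

// Independent reactions to the same event. None of them knows about the others.
orders.on('placed', (id) => actions.push(`send confirmation for #${id}`));
orders.on('placed', (id) => actions.push(`update stock for #${id}`));

// once() registers a listener that is removed after its first call.
orders.once('placed', (id) => actions.push(`first-order bonus for #${id}`));

orders.emit('placed', 1);
orders.emit('placed', 2);

console.log(actions);
```

Listeners are called in the order they were registered, so the first emit triggers all three reactions and the second one only the two permanent listeners.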
Streams
Streams are used to process (read and write) data piece by piece (in chunks), without completing the whole read or write operation, and therefore without keeping all the data in memory.
1. Readable streams
Readable streams allow reading data piece by piece, making them useful for handling large text files or data coming in through an HTTP request. Readable streams are instances of the EventEmitter class, meaning they can emit and listen to named events. The most important events for readable streams are the data event, which is emitted when there is new data to consume, and the end event, which is emitted when there is no more data to consume. In addition to events, there are important functions like pipe and read that can be used with readable streams.
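A small sketch of consuming a readable stream with the data and end events. To stay self-contained it builds the stream from an in-memory array with Readable.from. In a real app the source would more likely be a file or an incoming HTTP request. The allRead promise is a helper name for this example.

```javascript
const { Readable } = require('node:stream');

// Demo source: three chunks that arrive one after another.
const source = Readable.from(['first chunk, ', 'second chunk, ', 'third chunk']);

const chunks = [];

const allRead = new Promise((resolve, reject) => {
  // "data" fires for every chunk that becomes available.
  source.on('data', (chunk) => chunks.push(chunk));
  // "end" fires once there is nothing left to consume.
  source.on('end', () => resolve(chunks.join('')));
  source.on('error', reject);
});

allRead.then((text) => console.log(text));
```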
2. Writable streams
Writable streams are the opposite of readable streams, and examples include HTTP responses that can be sent back to clients. When sending data, it needs to be written to a writable stream. The most important events for writable streams are the drain and finish events, and the most important functions are the write and end functions. Writable streams are commonly used for streaming large files, such as videos, similar to how Netflix or YouTube stream content to users.
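Here's a rough sketch of write, end, drain and finish working together. It writes a thousand lines to a temporary file, and whenever write() returns false (the internal buffer is full) it waits for the drain event before continuing. The file name, the line count and the written promise are demo choices.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Temporary demo file, removed again at the end.
const target = path.join(os.tmpdir(), `writable-demo-${process.pid}.txt`);
const out = fs.createWriteStream(target);

const LINES = 1000;
let next = 0;

function writeLines() {
  while (next < LINES) {
    const ok = out.write(`line ${next++}\n`);
    if (!ok) {
      // Buffer is full: continue once the stream has drained.
      out.once('drain', writeLines);
      return;
    }
  }
  out.end(); // no more data to write
}

const written = new Promise((resolve, reject) => {
  out.on('finish', () => console.log('all data handed to the file system'));
  out.on('close', () => {
    const bytes = fs.statSync(target).size;
    fs.unlinkSync(target);
    resolve(bytes);
  });
  out.on('error', reject);
});

writeLines();
written.then((bytes) => console.log(`wrote ${bytes} bytes`));
```

Respecting the return value of write() is what keeps memory usage low when the destination is slower than the source.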
3. Duplex streams
They're simply streams that are both readable and writable at the same time. These are a bit less common. But anyway, a good example would be a socket from the net module. A socket is basically just a communication channel between client and server that works in both directions and stays open once the connection has been established.
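A minimal sketch of a duplex stream in action: a tiny echo server built with the net module, plus a client that talks to it. Both ends read from and write to the same socket object. The port (0, so the OS picks one), the messages and the echoed promise are demo choices.

```javascript
const net = require('node:net');

// Server side: the socket is readable (client data) and writable (our reply).
const server = net.createServer((socket) => {
  socket.on('data', (chunk) => socket.write(`echo: ${chunk}`));
});

const echoed = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const client = net.connect(server.address().port, '127.0.0.1');
    client.setEncoding('utf8');

    // Client side: write to the socket, then read the answer from it.
    client.on('connect', () => client.write('ping'));
    client.on('data', (text) => {
      client.end();
      server.close();
      resolve(text);
    });
    client.on('error', reject);
  });
});

echoed.then((text) => console.log(text));
```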
4. Transform streams
Transform streams in Node.js are both readable and writable, and can modify or transform data as it is read or written. An example of a transform stream is the zlib core module used for data compression. Note that the events and functions discussed here are for consuming streams that are already implemented in Node.js, such as HTTP requests and responses. It is also possible to implement custom streams and consume them using the same events and functions, but we won't cover that in detail, because for most applications it is more important to know how to consume streams than how to implement them.
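To see a transform stream at work, this sketch pipes some text through zlib's gzip stream and then straight back through gunzip. The input text and the restored promise are made up for the demo.

```javascript
const { Readable } = require('node:stream');
const zlib = require('node:zlib');

const input = 'Node.js streams '.repeat(1000); // demo data

const gzip = zlib.createGzip();     // transform: raw bytes -> compressed
const gunzip = zlib.createGunzip(); // transform: compressed -> raw bytes
gunzip.setEncoding('utf8');

let compressedBytes = 0;
gzip.on('data', (chunk) => { compressedBytes += chunk.length; });

const restored = new Promise((resolve, reject) => {
  let text = '';
  gunzip.on('data', (chunk) => { text += chunk; });
  gunzip.on('end', () => resolve(text));
  gzip.on('error', reject);
  gunzip.on('error', reject);
});

// Data flows chunk by chunk: source -> gzip -> gunzip.
Readable.from([input]).pipe(gzip).pipe(gunzip);

restored.then((text) => {
  console.log(`original: ${input.length} bytes, compressed: ${compressedBytes} bytes`);
  console.log(`round trip ok: ${text === input}`);
});
```

Each transform reads from the stream before it and writes to the stream after it, which is exactly the "both readable and writable" part.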
And that's it! 🥳
Conclusion
In conclusion, understanding the architecture and inner workings of Node.js is crucial to building efficient and scalable web applications. Node.js relies on several dependencies, including the V8 engine and libuv, which work together to provide JavaScript runtime capabilities on the server side. The V8 engine converts JavaScript code into machine code, while libuv is responsible for handling asynchronous I/O and implementing features such as the event loop and thread pool. Node.js allows developers to write 100% pure JavaScript code while accessing functions implemented in C++ or other languages, providing a layer of abstraction that makes development easier. However, it's important to remember that Node.js runs in a single thread, and heavy tasks that could block the event loop can be offloaded to the thread pool. By understanding the architecture behind Node.js, developers can harness its power to build fast and scalable applications. Happy coding with Node.js! 🚀💻