Yarn Berry: The Life Savior That Will Save Us From node_modules
Sol Lee
Posted on March 18, 2024
This is a translated post from the original article: https://toss.tech/article/node-modules-and-yarn-berry
What's Yarn Berry?
Yarn Berry is a package management system for Node.js, created by Maël Nison, the main developer of Yarn v1. The official version (v2) was released on January 25, 2020, and it is now being adopted by large open source repositories such as Babel. Its source code is maintained in the yarnpkg/berry repository on GitHub.
Yarn Berry revolutionizes the existing "broken" NPM package management system.
Problems with NPM
NPM ships with Node.js by default and is widely used, but some aspects of it are inefficient or broken.
Inefficient dependency search
NPM uses the file system to manage dependencies, most visibly through the familiar node_modules folders. Managing dependencies this way makes dependency lookup inefficient.
For example, suppose the react package is loaded with require() in the /Users/toss/dev/toss-frontend-libraries folder.
You can use the require.resolve.paths() function provided by Node.js to check the list of directories that are searched.
> require.resolve.paths('react')
[
'/Users/toss/dev/toss-frontend-libraries/repl/node_modules',
'/Users/toss/dev/toss-frontend-libraries/node_modules',
'/Users/toss/node_modules',
'/Users/node_modules',
'/node_modules',
'/Users/toss/.node_modules',
'/Users/toss/.node_libraries',
'/Users/toss/.nvm/versions/node/v12.16.3/lib/node',
'/Users/toss/.node_modules',
'/Users/toss/.node_libraries',
'/Users/toss/.nvm/versions/node/v12.16.3/lib/node'
]
To find a package, NPM keeps walking up through the node_modules folder of each parent directory. The longer it takes to find the package, the more slow I/O calls such as readdir and stat are repeated. In some cases, an I/O call fails partway through.
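The upward walk described above can be sketched in a few lines. This is a simplified illustration, not the resolver that Node.js actually ships: the function name candidateNodeModulesDirs is hypothetical, and the sketch ignores NODE_PATH, the global folders, and other details of the real algorithm.

```javascript
const path = require('node:path');

// Hypothetical helper: list every node_modules directory that would be
// probed when starting from `startDir` and walking up to the root.
function candidateNodeModulesDirs(startDir) {
  const dirs = [];
  let current = path.posix.resolve(startDir);
  while (true) {
    dirs.push(path.posix.join(current, 'node_modules'));
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(candidateNodeModulesDirs('/Users/toss/dev/toss-frontend-libraries'));
```

Each entry in the returned list may cost at least one file system call before the package is found, which is why deep directory trees make lookups slower.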
Up to TypeScript 4.0, package discovery through node_modules was so inefficient that TypeScript did not even look up type information inside node_modules until a package was first imported.
Dependent on the environment
If NPM cannot find a package, it continues to search in the parent directories. As a result, which dependencies can be found depends on the environment of the package's parent directories.
For example, depending on which node_modules folders the parent directories contain, a dependency may or may not load, and a different version of the dependency may be picked up by mistake.
Behavior that changes with the environment is a bad sign, because it makes problems hard to reproduce.
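To make the environment dependence concrete, here is a minimal sketch that runs the same lookup against two hypothetical machines. The installed maps stand in for the disk, and the paths, versions, and the resolveVersion name are all made up for illustration.

```javascript
const path = require('node:path');

// `installed` is a hypothetical in-memory stand-in for the disk:
// it maps a package directory to the version found there.
function resolveVersion(startDir, name, installed) {
  let current = startDir;
  while (true) {
    const candidate = path.posix.join(current, 'node_modules', name);
    if (installed.has(candidate)) return installed.get(candidate);
    const parent = path.posix.dirname(current);
    if (parent === current) return null; // nothing found up to the root
    current = parent;
  }
}

// Same project, same code, different surroundings.
const machineA = new Map([['/work/app/node_modules/react', '17.0.1']]);
const machineB = new Map([['/work/node_modules/react', '16.14.0']]);
const machineC = new Map();

console.log(resolveVersion('/work/app', 'react', machineA)); // 17.0.1
console.log(resolveVersion('/work/app', 'react', machineB)); // 16.14.0
console.log(resolveVersion('/work/app', 'react', machineC)); // null
```

On machine B the project never installed react itself, yet the lookup succeeds with a version left behind in a parent directory.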
Inefficient installations
NPM's node_modules directory structure takes up a large amount of space. Even simple CLI projects commonly require hundreds of megabytes of node_modules. Besides the space, creating a large node_modules directory structure also requires a lot of I/O work.
Because the node_modules folder is complex, it is difficult to verify that an installation is valid. For example, in a complex dependency tree where hundreds of packages depend on each other, the node_modules directory structure gets deep.
This deep tree structure requires a large number of I/O calls to verify that dependencies are installed properly, and disk I/O calls are typically much slower than working with data structures in memory. For this reason, Yarn v1 and NPM only validate the basic dependency tree and do not verify that the contents of each package are correct.
Phantom dependency
NPM and Yarn v1 use hoisting to avoid duplicated packages in node_modules.
For example, suppose the dependency tree looks like the one on the left of the figure.
In the left tree, the [A(1.0)] and [B(1.0)] packages are each installed twice, wasting disk space. NPM and Yarn v1 reshape the original tree into the one on the right to save disk space.
As a result, the [B(1.0)] library, which package-1 could not originally load, becomes available to it.
Because the library was hoisted, you can require() a library that you do not depend on directly. This is called a 'phantom dependency'.
When phantom dependencies occur, libraries not specified in package.json quietly become available. Conversely, when another dependency is removed from package.json, they may silently disappear along with it. This makes the dependency management system confusing.
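One way to picture a phantom dependency is as the difference between what a package declares and what ends up reachable after hoisting. The sketch below assumes a hypothetical manifest for package-1 and a hypothetical list of hoisted top-level packages. It is not a real audit tool, and findPhantomDependencies is a name invented for this example.

```javascript
// Return the hoisted packages that `manifest` can load but never declared.
function findPhantomDependencies(manifest, hoistedPackages) {
  const declared = new Set([
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ]);
  return hoistedPackages.filter((name) => !declared.has(name)).sort();
}

// Hypothetical data: package-1 declares only A, but hoisting also
// placed B (a dependency of A) at the top level of node_modules.
const manifest = { name: 'package-1', dependencies: { A: '1.0' } };
const hoisted = ['A', 'B'];

console.log(findPhantomDependencies(manifest, hoisted)); // [ 'B' ]
```

If A later drops its dependency on B, B disappears from the hoisted list and any code in package-1 that required it breaks without a change to package-1's own package.json.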
Plug'n'Play (PnP)
Yarn Berry addresses the above-mentioned issues with a new strategy called Plug'n'Play.
The context
Yarn v1 builds a dependency tree from the package.json file and creates the node_modules directory structure on disk. In other words, it already knows the complete dependency structure of the packages.
Dependency management through the node_modules file system is fragile. Must every package manager rely on the error-prone dependency resolution built into Node.js? What if package managers did not just create node_modules directory structures, but managed dependencies more fundamentally and safely?
Plug'n'Play started with this idea.
Turn on Plug'n'Play
You can start using Yarn Berry by installing the latest version of Yarn from NPM and then setting the version to Berry.
$ npm install -g yarn
$ cd ../path/to/some-package
$ yarn set version berry
How Plug'n'Play works
When you install dependencies with yarn install in Plug'n'Play mode, the result looks different from what you are used to.
Yarn Berry does not create node_modules. Instead, the .yarn/cache folder stores the dependencies, and the .pnp.cjs file records where each dependency can be found. With .pnp.cjs, you can tell right away which packages depend on which libraries and where each library is located, without disk I/O.
For example, the react package appears in the .pnp.cjs file as follows.
["react", [
  ["npm:17.0.1", {
    "packageLocation": "./.yarn/cache/react-npm-17.0.1-98658812fc-a76d86ec97.zip/node_modules/react/",
    "packageDependencies": [
      ["loose-envify", "npm:1.4.0"],
      ["object-assign", "npm:4.1.1"]
    ],
  }]
]],
The entry gives the location and the complete dependency list for version 17.0.1 of the react package, so information about a specific package and its dependencies is available immediately whenever it is needed.
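The lookup this enables can be sketched as plain map access. The registry below is a hypothetical, heavily simplified model of the excerpt above: the key format, the resolveDependency name, and the loose-envify location are assumptions for illustration, and the real .pnp.cjs runtime is considerably more involved.

```javascript
// Hypothetical in-memory registry modelled on the .pnp.cjs excerpt.
const registry = new Map([
  ['react@npm:17.0.1', {
    packageLocation: './.yarn/cache/react-npm-17.0.1-98658812fc-a76d86ec97.zip/node_modules/react/',
    packageDependencies: new Map([
      ['loose-envify', 'npm:1.4.0'],
      ['object-assign', 'npm:4.1.1'],
    ]),
  }],
  ['loose-envify@npm:1.4.0', {
    // Placeholder location; the real archive name is not shown in the article.
    packageLocation: './.yarn/cache/loose-envify-example.zip/node_modules/loose-envify/',
    packageDependencies: new Map(),
  }],
]);

// Resolve `request` as seen from `issuer` using only in-memory lookups.
function resolveDependency(issuer, request) {
  const info = registry.get(issuer);
  if (!info) throw new Error(`Unknown package: ${issuer}`);
  const reference = info.packageDependencies.get(request);
  if (reference === undefined) {
    throw new Error(`${issuer} does not declare a dependency on ${request}`);
  }
  const target = registry.get(`${request}@${reference}`);
  if (!target) throw new Error(`Missing registry entry: ${request}@${reference}`);
  return target.packageLocation;
}

console.log(resolveDependency('react@npm:17.0.1', 'loose-envify'));
```

No directory is scanned at any point, and a request for a package that the issuer never declared fails immediately instead of falling through to a parent folder.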
Yarn overrides the behavior of the require() function provided by Node.js so that packages can be found efficiently. For this reason, when dependencies are managed through the PnP API, you should run code with the yarn node command instead of the node command.
$ yarn node
Usually, when you run a Node.js app, you register the launch script under scripts in package.json and run it through Yarn, just as with Yarn v1. The dependencies are then loaded through PnP automatically.
$ yarn dev
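A quick way to check whether a process was actually started with the PnP runtime is to look at process.versions.pnp, which Yarn's PnP API documentation describes as being set when PnP is active. The helper below is a minimal sketch; isRunningUnderPnp is a hypothetical name, and the versions parameter exists only to make the function easy to test.

```javascript
// Report whether the PnP runtime appears to be loaded in this process.
function isRunningUnderPnp(versions = process.versions) {
  return versions.pnp !== undefined;
}

console.log(
  isRunningUnderPnp()
    ? 'PnP runtime is active'
    : 'Plain Node.js resolution (try `yarn node` instead of `node`)'
);
```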
ZipFS (Zip Filesystem)
The .yarn/cache folder is where libraries packed as Zip archives are stored.
In the Yarn PnP system, each dependency is managed as a Zip archive. For example, Recoil version 0.1.2 is managed as a compressed file such as recoil-npm-0.1-9a0edbd2b9-c69105dd7d.zip.
The contents of the Zip archive are then referenced dynamically, as specified by the .pnp.cjs file.
Managing dependencies with Zip archives provides the following benefits.
- Installation completes quickly because the node_modules directory structure no longer needs to be created.
- Each package has only one Zip archive per version, so nothing is duplicated. Since each Zip archive is compressed, a great deal of storage is saved.
- In fact, the Toss team was able to drastically reduce the size of its dependencies.
- For one service, the node_modules directory took up approximately 400 MB with NPM, while the dependency directory was only 120 MB with Yarn PnP.
- Because few files make up the dependencies, it is quick to detect changes or delete all dependencies.
- It is easy to find dependencies that are missing or no longer needed.
- When the contents of a Zip file change, the change is easily detected by comparing checksums.
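The checksum comparison in the last point can be sketched with Node's built-in crypto module. This only illustrates the idea: the choice of SHA-512 and the helper names are assumptions of this example and do not describe how Yarn computes or stores its checksums.

```javascript
const crypto = require('node:crypto');

// Compute a hex digest for the raw bytes of an archive.
function checksumOf(buffer) {
  return crypto.createHash('sha512').update(buffer).digest('hex');
}

// True when the archive bytes no longer match the recorded checksum.
function hasChanged(buffer, expectedChecksum) {
  return checksumOf(buffer) !== expectedChecksum;
}

// Stand-in bytes; a real check would read the .zip file from disk.
const original = Buffer.from('pretend zip contents');
const recorded = checksumOf(original);

console.log(hasChanged(original, recorded)); // false
console.log(hasChanged(Buffer.from('tampered zip contents'), recorded)); // true
```

With one archive per package, one hash per file is enough, whereas a node_modules tree would need a check for each of thousands of files.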
Results of Plug'n'Play introduction
When searching for dependencies
When searching for dependencies, you no longer have to walk through node_modules directories, because the data structure provided by the .pnp.cjs file locates each dependency directly. This significantly reduces the time spent in require().
Reproducibility
Because all package dependencies are managed through the .pnp.cjs file, they are no longer affected by the external environment. This ensures that require() and import behave the same across different machines and CI environments.
When installing dependencies
You no longer have to create a deep node_modules directory for installation. In addition, packages of the same version are no longer copied multiple times as they are in an NPM install, which can dramatically reduce installation time. On top of that, Zero-Install lets you use most libraries without an installation step.
This can save significant time wherever dependencies are installed repeatedly, such as in CI. By introducing Yarn PnP, the Toss team cut the installation time in CI to 60 seconds.
Strict dependency management
Yarn PnP does not hoist dependencies the way node_modules does. Each package can therefore access only the dependencies it declares in package.json. Code that used to work by accident, depending on the environment, is now managed more strictly. This fundamentally prevents phantom dependencies, which used to cause unexpected bugs easily.
Dependency verification
When you used node_modules to manage dependencies, the dependencies might not be installed correctly, and you might have to delete and reinstall the entire dependency folder. This is because the node_modules folder was difficult to verify. A full reinstallation wasted more than a minute rebuilding the node_modules directory structure.
Yarn PnP manages packages as Zip files, so it is easy to find missing dependencies or dependency files that have changed. This makes it easy to fix dependencies when something goes wrong, and it ensures that dependencies are installed correctly close to 100% of the time.
Zero-Install
So far, we have seen some benefits of Yarn Berry's introduction of PnP. We can take it even further. How about managing the dependencies with Git?
Yarn PnP takes up little disk space because it manages dependencies as compressed files. In addition, each dependency is a single Zip file, so far fewer files make up the dependencies than with NPM. For example, a typical node_modules is 1.2GB in size and consists of 135,000 files, while Yarn PnP's dependencies consist of 2,000 compressed files with a total size of 139MB.
Because of this small size and file count, Yarn Berry lets you manage dependencies with Git, and version-controlling dependencies this way brings even greater advantages.
In Yarn Berry, including dependencies in version management is called Zero-Install.
This practice has some benefits.
- You don't have to run yarn install just because you cloned a new repository or switched branches. Previously, after switching to a branch with different dependencies, you had to install them again, and in some cases the wrong dependency versions were used, causing the service to malfunction for no apparent reason. With Zero-Install, this problem is completely resolved. It also works as an offline cache when the network is down.
- You can save significant time installing dependencies in CI. We typically needed 60 to 90 seconds to install dependencies when no cache existed. With Zero-Install, dependencies are available as soon as the repository is cloned with Git, so there is no need to install them. This saved significant CI time.
By actively adopting Zero-Install in our repositories, we were able to significantly reduce build and deployment time.
Other things about Yarn Berry
- Plug-in System: Yarn Berry boasts a plug-in-friendly environment, and even its core functions are developed as plug-ins. You can extend Yarn's capabilities as much as you need, making it easy to use the CLI. In the Toss front-end chapter, Hyunseop Lee created a plug-in to calculate the changed workspaces in a matter of days. If Yarn Berry lacks a capability you need, you can easily create a plug-in for it.
- Workspaces: Yarn Berry offers far more complete workspace capabilities than Yarn v1. Yarn Berry's own Git repository is a representative use case. It is impressive to see how changes to one package's source code are immediately reflected in another, even with TypeScript. The Toss front-end chapter is also actively using workspace capabilities.
- Built-in support for the patch command: In some cases, you may want to modify only part of a library published to NPM. Yarn Berry provides the yarn patch command, making it easy to modify part of a library and use the result. The resulting patch files are easily applied when dependencies are installed, through the patch: protocol.
With Yarn Berry, the Toss team was able to handle JavaScript dependencies efficiently and safely. We also reduced CI time by more than 60 seconds.