JavaScript Monorepo Implemented by Lerna with Yarn Workspaces and Git Submodules
liyachun
Posted on November 21, 2020
These are my thoughts from implementing a JavaScript monorepo using lerna and yarn workspaces, as well as git submodules.
Disclaimers
- The term monorepo seems to be controversial when it comes to project structuring; some may prefer multi-package (lerna itself was once "A tool for managing javascript monorepos", and is now "A tool for managing JavaScript projects with multiple packages").
- This is not a step-by-step guide to the tools; links to the well-maintained official docs will be provided.
- This is a record of (not a debate about) my own thoughts and implementation details on 'monorepo'. Corrections and guidance are welcome!
Monorepo What and Why
TL; DR
Back in the early days of my web projects as a noob, I would typically create one repository named frontend and another named server, separately maintained and git-versioned. In the real world, two simple repositories may not cover the more complicated scenarios. Think about those lovely UI components you would like to pet and spread, and those clever utils/middlewares you want to extract and share.
frontend # a standalone repo
├── scripts
├── components
│   ├── some-lovely-ui
│   └── ...
├── index.html
└── ...
server # a standalone repo
├── utils
│   ├── some-mighty-util
│   └── ...
├── middlewares
│   ├── some-clever-middleware
│   └── ...
├── router.js
├── app.js
├── package.json
└── ...
The noob structure
Yes, we must protect our innovative ideas, by creating a few more standalone repositories, which should turn the whole project into a booming repo-society.
webapp # standalone
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── some-shell-script
├── some-lint-config
├── some-lang-config
├── some-ci-config
├── some-bundler-config
└── ...
server # standalone as it was
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── same-old-confs
└── ...
whateverapp # say, an electron-app
├── same-old-js # a standalone javascript-domain repo, again
└── ...
some-lovely-ui # needs to be independently bootstrapped and managed
├── same-old-setup
└── ...
some-mighty-util # shares an almost identical structure
├── same-old-structure
└── ...
some-clever-middleware # inherits absolute pain
├── same-old-pain
└── ...
The real world?
With the help of the link command provided by yarn (yarn link) or npm (npm link), you can easily try out development features across projects and packages on the fly. Say you are developing Project A and Package B simultaneously, as separate repos, and want to use B as a dependency of A. Run yarn link under Package B and yarn link B under Project A, and you will find a symlinked folder at path/to/A/node_modules/B (the same place a yarn add would put it) which links to your still-active local Package B. The details are beyond the scope of this post; dig in for yourself.
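To make the symlink part less magical, here is a small Node.js sketch that creates the same kind of link by hand inside a throwaway temp directory. The folder names project-a and package-b are made up, and the real yarn link goes through a global link registry that this sketch skips entirely; it only shows the end state.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical layout: two sibling folders in a throwaway temp directory.
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), "link-demo-"));
const packageB = path.join(sandbox, "package-b");
const projectA = path.join(sandbox, "project-a");

fs.mkdirSync(packageB);
fs.mkdirSync(path.join(projectA, "node_modules"), { recursive: true });
fs.writeFileSync(path.join(packageB, "index.js"), "module.exports = 'hello from B';\n");

// The end state of linking: node_modules/package-b points at the live sources.
const linkPath = path.join(projectA, "node_modules", "package-b");
fs.symlinkSync(packageB, linkPath, "dir");

const isLink = fs.lstatSync(linkPath).isSymbolicLink();
const loaded = require(linkPath);
console.log(isLink, loaded);
```

Because the entry in node_modules is only a link, any edit made in package-b is visible to project-a right away, with no reinstall in between.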
So far so good, until you quickly find yourself annoyed by what everybody wants to get rid of: repository bootstrapping. If you care about maintainability and consistency, almost identical configurations have to be set up for version control, dependency control, bundling, linting, CI, etc. Meanwhile, almost identical solutions have to be found to avoid madness, one of the baddest villains for example: the 'node_modules'.
The Silver Lining
While dirty jobs cannot be avoided, there is still a silver lining here: do the dirty jobs once and for all, at least to get rid of the duplicated pain.
The approach is simple. Step zero: since all the repositories we've built are meant to serve the same big blueprint, joining them into one single repository sounds both modern and intuitive.
the [project] root
├── apps
│   ├── webapp
│   ├── server
│   ├── some-lovely-ui
│   ├── some-mighty-util
│   └── ...
└── ...
The what?
Such an approach looks like rewinding history. As I've not-very-deeply learned, many older corporate projects used to be structured in a monolithic way, but gradually suffered from maintenance and collaboration problems. Wait, still?
What is the confusion? What is our goal in putting things together? Our wishes:
- Being saved from redundant jobs.
- Code consistency promoted.
- Version control made easy.
- Best practices made possible for all sub-projects.
MANAGEABILITY, I think.
Manageability Up
The [project] root
├── apps
│   ├── webapp
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── lint-configs # sub-project-wide lint, can extend or override global confs
│   │   ├── lang-configs # sub-project-wide, can extend or override global confs
│   │   ├── bundler-configs # sub-project-wide
│   │   ├── README.md
│   │   └── ...
│   ├── server
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── sub-project-level-confs
│   │   └── ...
│   ├── some-lovely-ui
│   │   ├── sub-project-level-stuff
│   │   └── ...
│   ├── some-clever-middleware
│   │   └── ...
│   └── ...
├── package.json # global manifests, deps, resolutions, root-only deps (husky for instance)
├── .gitignore # git once for all
├── .git # git once for all
├── dotenvs # dotenvs for all
├── shell-scripts # maintenance for all
├── lint-configs # lint for all
├── lang-configs # helpers for all
├── ci-configs # publish made handy
├── bundler-configs # bundler for all
└── ...
The advanced structure
Here we've introduced several familiar faces into the root of the project directory: manifests and config files that once dwelled only in each sub-project. This makes these configs take effect project-wide, allowing a baseline to be set and shared among all sub-projects, aka code consistency. A sub-project may still hold its private-scope configs to override or extend the global standard if a variation has to be made, as it does in many cases, all thanks to the inheritance-like feature in most dev toolchains.
The model must be flexible. Take code linting for instance: to save lives, almost every well-maintained framework, frontend and backend alike, has its own boilerplates and CLI tools to the rescue, which usually have linting covered. Though many of them use eslint and promote the JavaScript Standard Style, there are ts / js variations, there are still tslint advocates, there is complexity between lint plugins, and for sure there can be strict-or-not conventions in your company. Wise choices have to be made on your own responsibility.
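The "root baseline plus sub-project override" idea can be pictured with a toy merge function. This is only a sketch of the concept: real tools (the extends field in eslint or tsconfig, for example) apply their own, more detailed merging rules, and the rule choices below are made up.

```javascript
// Root-level baseline (hypothetical rule choices).
const rootConfig = {
  extends: ["eslint:recommended"],
  rules: { semi: ["error", "always"], "no-console": "warn" },
};

// Merge a sub-project config over the baseline: sub-project entries win,
// and rules are merged key by key instead of being replaced wholesale.
function extendConfig(base, overrides) {
  return {
    ...base,
    ...overrides,
    rules: { ...base.rules, ...(overrides.rules || {}) },
  };
}

// The server sub-project keeps the baseline but allows console output.
const serverConfig = extendConfig(rootConfig, {
  env: { node: true },
  rules: { "no-console": "off" },
});
console.log(serverConfig.rules);
```

The sub-project only states what differs; everything it does not mention still comes from the root.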
Bravo?
Let's now bravely call our project a monorepo already! From the name we infer (?) that this is basically a project with all its ingredient parts in a single / monophonic repository. Meanwhile, serving a project-wide but extendable development standard is made possible.
Manageability achieved! Now who will be the manager?
Sir, We have a problem!
- The installing process for a JS project is never satisfying. It creates fat and tricky node_modules. Multiple projects in one?
  - Not human-life-saving: I have to cd and perform yarn add per sub-project folder.
  - Not battery-life-saving: a sub-project's deps are installed under its own directory. On the global scale, heavy loads of duplication are produced and will keep expanding.
- Cleverer ideas and methods are needed for handling sub-project versions and cross-dependency relations.
Introducing Lerna
In practice, my path was not the usual one of having headaches resolving package dependencies, going searching and finding a solution. I had heard of lerna first, found it handy for performing semver bumps, and only then got to know the monorepo goodness.
As described on its website, lerna is a tool for managing JavaScript projects with multiple packages.
The lerna init command creates a new lerna project (or upgrades an existing project into one), which is typically structured like:
root
├── lerna.json
├── package.json
├── node_modules
└── packages
    ├── packageA
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    ├── packageB
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    └── ...
This looks pretty much like our previous mono-structure with a lerna.json file introduced. It is the config file for your globally npm-installed or yarn-added lerna command line tool; a project-wide lerna should also be added automatically to devDependencies in the root package.json.
A minimal effective lerna config looks like:
// [project/root]/lerna.json
{
  "packages": ["packages/*"],
  "version": "independent",
  "npmClient": "yarn" // or npm, pnpm?
  // ...
}
The packages entry is a glob list that matches the locations of sub-projects. For instance, ["clients/*", "services/*", "hero"] makes the valid sub-projects (those having a valid package.json) located directly under clients and services, as well as the exact hero project located under the root, recognized as lerna packages.
The version entry, if given a valid semver string, makes all packages always share the same version number. "independent" means packages are versioned separately, in parallel.
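As a rough picture of how such a glob list turns into a list of packages, here is a simplified Node.js sketch. It is not how lerna does it internally: it only understands "dir/*" and exact directory names, and the demo folder names are made up.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Simplified matcher: supports only "dir/*" and exact directory names,
// not the full glob syntax that lerna accepts.
function findPackages(rootDir, patterns) {
  const found = [];
  for (const pattern of patterns) {
    const candidates = pattern.endsWith("/*")
      ? listDirs(path.join(rootDir, pattern.slice(0, -2)))
      : [path.join(rootDir, pattern)];
    for (const dir of candidates) {
      // A directory only counts as a package if it has a package.json.
      if (fs.existsSync(path.join(dir, "package.json"))) {
        found.push(path.relative(rootDir, dir).split(path.sep).join("/"));
      }
    }
  }
  return found.sort();
}

function listDirs(parent) {
  if (!fs.existsSync(parent)) return [];
  return fs
    .readdirSync(parent, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(parent, entry.name));
}

// Throwaway demo tree; clients/docs-only has no package.json on purpose.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "lerna-globs-"));
for (const dir of ["clients/web", "clients/docs-only", "services/api", "hero"]) {
  fs.mkdirSync(path.join(root, dir), { recursive: true });
}
for (const dir of ["clients/web", "services/api", "hero"]) {
  const manifest = JSON.stringify({ name: path.basename(dir), version: "1.0.0" });
  fs.writeFileSync(path.join(root, dir, "package.json"), manifest);
}

const packages = findPackages(root, ["clients/*", "services/*", "hero"]);
console.log(packages);
```

The clients/docs-only folder matches the glob but is skipped, since it has no package.json of its own.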
Useful Commands
- lerna bootstrap (once, from any location, project-wide):
  - Installs dependencies for every single package (sub-projects only, root dependencies not included); no by-hand installs per directory.
  - With the --hoist flag, can resolve duplication of common dependencies.
  - Links cross dependencies, with the same results (see lerna add and lerna link) as performing yarn link per package.
- lerna clean: removes installs (purges the node_modules folder) from every package (root excepted).
- lerna version and lerna publish, lerna's selling point:
  BETTER READ THE DOCS FOR THIS SECTION BY YOURSELF
  It is smart to use conventional commits in your repo at the same time; it gives you many more advantages.
Use Conventional Commits
A repo that follows Conventional Commits has its commit messages structured as follows:
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
We have husky, the git hooks manager, as well as commitizen, the commit util. Together they can bring up interactive prompts as you git commit, simplifying the writing of commit messages. Dig into them for yourself.
The information provided in a conventional commit message correlates very well with the Semantic Versioning spec. Typically, given that a full semver number can be MAJOR.MINOR.PATCH-PRERELEASE:
- As a possible value of the type section, a fix commit stands for a PATCH semver bump.
- A feat commit stands for a MINOR bump.
- The optional BREAKING CHANGE footer stands for a MAJOR bump.
This makes it easier to write automated tools on top of it.
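Such a tool can start very small. The function below is a minimal sketch of the mapping in the list above, plus the "!" marker that Conventional Commits also treats as a breaking change. The name bumpForCommit is my own, and returning null for other types is a simplification; real tooling may treat them differently.

```javascript
// Map one Conventional Commits message to a semver bump level.
// Returns "major", "minor", "patch", or null when no release is implied.
function bumpForCommit(message) {
  const [header, ...rest] = message.split("\n");
  // <type>[optional scope][optional !]: <description>
  const match = /^(\w+)(\([^)]*\))?(!)?: .+/.exec(header);
  if (!match) return null;

  const [, type, , bang] = match;
  const hasBreakingFooter = rest.some((line) => /^BREAKING[ -]CHANGE: /.test(line));
  if (bang || hasBreakingFooter) return "major";
  if (type === "feat") return "minor";
  if (type === "fix") return "patch";
  return null;
}

console.log(bumpForCommit("feat(package-b): add folder draggability"));
console.log(bumpForCommit("fix: handle empty input\n\nBREAKING CHANGE: drops the old API"));
```

A breaking change wins over the type, which is why a fix commit with a BREAKING CHANGE footer still asks for a MAJOR bump.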
Meanwhile, with lerna, an illustrative workflow for a conventional version bump:
- Current package versions (independently versioned).
- Make some updates:
  - A MAJOR level performance update on Package A, with perf(package-a)!: bump electron version as the commit message.
  - A MINOR level feature update on Package B, with a feat(package-b): add folder draggability commit message.
  - A PATCH level fix on Package C, with a fix(package-c/error-interception): fix type defs commit message.
  - No modifications on Package D.
- Perform lerna version with the --conventional-commits flag. The process and the results:
  - Read current versions from the package.json files.
  - Read from git history (and actual code changes) to determine which commit was made in which package.
  - Resolve commit messages and generate the corresponding version bumps.
  - Once confirmed, it will:
    - Modify the version fields in the package.json files.
    - Create a git commit as well as new version tags (the message format can be configured in lerna.json).
    - Push to remote.
- New versions.
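To picture the last step, here is a minimal sketch of the arithmetic behind the new versions. The starting version numbers are made up for illustration, and prerelease handling is left out on purpose; lerna itself does much more than this.

```javascript
// Apply a bump level to a plain MAJOR.MINOR.PATCH version string.
// Prerelease and build metadata are deliberately not handled here.
function bumpVersion(version, level) {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (level) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version; // no release-worthy change
  }
}

// Hypothetical starting versions, and the bump levels from the commits above.
const current = {
  "package-a": "1.4.2",
  "package-b": "1.1.0",
  "package-c": "2.0.7",
  "package-d": "1.0.0",
};
const bumps = { "package-a": "major", "package-b": "minor", "package-c": "patch" };

const next = Object.fromEntries(
  Object.entries(current).map(([name, version]) => [name, bumpVersion(version, bumps[name])])
);
console.log(next);
```

Package D has no entry in bumps, so it keeps its version, which is the whole point of independent versioning.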
You should read the docs for prerelease bumps and the other capabilities lerna offers.
Introducing Yarn Workspaces
Using lerna to handle package installs, though workable, is not a very good idea, especially when you have root-only dependencies and when you are using Yarn (the classic version).
This article focuses on Yarn 1.x only. Having introduced the advanced but currently not very widely supported Plug'n'Play feature, Yarn 2.x has diverged significantly, both conceptually and functionally. It is even incubating its own release workflow, fyi.
Hoist in Lerna
says this official blog from yarn, which also introduced yarn workspaces and its relationship with Lerna
With the above said, and I don't really remember since which version, Lerna does provide a --hoist flag for bootstrap to solve the duplicated installation problem.
root
├── package.json # deps: lerna
├── node_modules
│   ├── typescript @4.0.0 # HOISTED because of being a common dep
│   ├── lodash ^4.17.10 # HOISTED because of being a common dep
│   ├── lerna # root only
│   └── ...
├── package A
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package B
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package C
│   ├── package.json # deps: lodash ^4.17.20, wattf @1.0.0
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── wtfdotsh # got an executable from wattf
│   │   │   └── ...
│   │   ├── lodash ^4.17.20 # only package C asks for this version of lodash
│   │   ├── wattf @1.0.0 # package C's private treasure
│   │   └── ...
│   └── ...
└── ...
This means that common dependencies around the repo get recognized and installed only once, into project/root/node_modules, while the binary executable of each (if it has one) stays accessible under each package/dir/node_modules/.bin, as required by package scripts.
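The picture above can be reproduced with a toy planner. This is a sketch under a loud assumption: it hoists a dependency only when two or more packages ask for the exact same range string, which matches the illustration but is not how Lerna or Yarn actually decide (they work with resolved versions and their own rules).

```javascript
// Toy hoisting pass: a dependency range requested by two or more packages
// moves to the root; everything else stays in the requesting package.
function planHoist(packages) {
  const usage = new Map(); // "name@range" -> number of packages asking for it
  for (const deps of Object.values(packages)) {
    for (const [name, range] of Object.entries(deps)) {
      const key = `${name}@${range}`;
      usage.set(key, (usage.get(key) || 0) + 1);
    }
  }

  const plan = { root: [], local: {} };
  for (const [key, count] of usage) {
    if (count >= 2) plan.root.push(key);
  }
  for (const [pkg, deps] of Object.entries(packages)) {
    plan.local[pkg] = Object.entries(deps)
      .map(([name, range]) => `${name}@${range}`)
      .filter((key) => !plan.root.includes(key));
  }
  return plan;
}

const plan = planHoist({
  "package-a": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package-b": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package-c": { lodash: "^4.17.20", wattf: "1.0.0" },
});
console.log(plan);
```

Packages A and B end up with nothing installed locally, while package C keeps its own lodash range and its private wattf.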
Still, this very positive feature is only available during lerna bootstrap, while in most cases we install new packages during development using a package manager.
Plus, Lerna knows the disadvantages of hoisting, and it doesn't have a way to solve them.
So far with Lerna:
- Good for managing "macro"-scopic packages.
- Bad at resolving microscopic dependencies:
  - Easy-to-break package symlinks.
  - Undesirable overhead control.
Nohoist in Yarn
Finally we welcome Yarn Workspaces on stage. And she comes with such a duty:
- She has hoisting as her key feature.
- She knows the caveats of hoisting as well, and provides a nohoist option (very helpful, PLEASE DO READ THIS).
It's even easier to call her number: just modify your existing repo/root/package.json.
[root]/package.json
{
  "private": true,
  // pretty familiar setup, like Lerna
  "workspaces": ["workspace-a", "workspace-b", "services/*"]
}
This turns a repo into workspaces
Now, instead of running lerna bootstrap, you can call yarn [install/add] anywhere in the repo, at any time during dev, and hoisting will be applied (honestly, it is more time-consuming, but tolerable by all means).
What about nohoisting? Sometimes you don't want a package / workspace to have some of its deps installed globally, even though they share common versions. It's as simple as adding yet another entry with glob patterns.
[root]/package.json
{
  "private": true,
  "workspaces": {
    // this is even more like Lerna
    "packages": ["workspace-a", "workspace-b", "services/*"],
    // exceptions here, globs
    "nohoist": ["**/react-native", "**/react-native/**"]
  }
}
DETAILS? AGAIN, PLEASE DO READ THIS FINE BLOG FROM YARN.
Friendship
It's easy to notice similarities in the way Lerna and Yarn manifest a monorepo. In fact, integrating the two is encouraged by Yarn and programmatically supported in Lerna.
[root]/lerna.json
{
  "npmClient": "yarn",
  "useWorkspaces": true
  // ...
}
This joins their hands together
Once the above useWorkspaces is set to true, Lerna reads the package / workspace globs from package.json instead.
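The lookup described above can be sketched as one small function. The name resolvePackageGlobs is my own, the fallback to ["packages/*"] is my assumption about the default, and the error case is invented for the sketch rather than taken from Lerna's source.

```javascript
// Decide where the package globs come from, following the description above.
// Both arguments are plain objects parsed from lerna.json and package.json.
function resolvePackageGlobs(lernaConfig, rootManifest) {
  if (lernaConfig.useWorkspaces) {
    const workspaces = rootManifest.workspaces;
    // Yarn accepts either an array or an object with a "packages" array.
    if (Array.isArray(workspaces)) return workspaces;
    if (workspaces && Array.isArray(workspaces.packages)) return workspaces.packages;
    throw new Error("useWorkspaces is set but package.json has no workspaces");
  }
  // Assumed default when lerna.json does not list packages itself.
  return lernaConfig.packages || ["packages/*"];
}

const globs = resolvePackageGlobs(
  { npmClient: "yarn", useWorkspaces: true },
  { private: true, workspaces: { packages: ["workspace-a", "services/*"], nohoist: [] } }
);
console.log(globs);
```

Either way there is a single list of globs to maintain, which is what keeps the two tools from drifting apart.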
Our original goal
- [x] A manageable monorepo
- [x] Package / Workspace versioning made easy
- [x] Low level dependency well controlled
Not an Intruder - Git Submodules
In my actual dev experience, I've run into scenarios such as:
- I have to pick some package out, because I want to open-source it.
- I am not satisfied with a certain dependency, so I'd better fork it, keep modifying it and use it in action.
A non-perfect solution
With Git Submodules, we can leverage git as an external dependency management tool as well. In a nutshell, it makes it possible to place a package inside a big repo while keeping its own private-scope git storage. For implementation details, please read the above links and this github blog.
For a quick peek, see this sample project structure:
root
├── apps
│   ├── auth-web # a lerna package / yarn workspace
│   ├── electron-app # a lerna package / yarn workspace
│   └── ...
├── nest-services # a lerna package / yarn workspace
├── submodules
│   ├── awesome-plugin # MUST NOT be a lerna package / yarn workspace
│   │   ├── node_modules # deps manually installed
│   │   ├── package.json # nohoist anything
│   │   └── .git # has its own git history with its own remote origin
│   ├── some-framework-adapter # MUST NOT be a lerna package / yarn workspace
│   │   ├── .tsconfig.json # private configs
│   │   ├── .ci-conf # SHOULD have its own CI config
│   │   ├── .eslintrc # MAY break code consistency.
│   │   ├── .git
│   │   └── ...
│   └── ...
├── package.json
├── lerna.json
├── .gitmodules # the config for submodules
├── .git # project git history
└── ...
And this config:
# [root]/.gitmodules
[submodule "submodules/awesome-plugin"]
    path = submodules/awesome-plugin
    url = https://github.com/awesome-plugin
[submodule "submodules/some-framework-adapter"]
    path = submodules/some-framework-adapter
    url = https://private.gitlab.com/some-framework-adapter
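If a script ever needs to know where the submodules live (to keep them out of the workspace globs, say), the file is simple enough to read by hand. The parser below is a minimal sketch that handles only the layout shown above, not the full git config syntax; parseGitmodules is a made-up name.

```javascript
// Minimal parser for the .gitmodules layout shown above.
// It handles only [submodule "name"] headers and "key = value" lines.
function parseGitmodules(text) {
  const modules = {};
  let current = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#") || line.startsWith(";")) continue;
    const header = /^\[submodule "(.+)"\]$/.exec(line);
    if (header) {
      current = modules[header[1]] = {};
      continue;
    }
    const pair = /^([\w.-]+)\s*=\s*(.*)$/.exec(line);
    if (pair && current) current[pair[1]] = pair[2];
  }
  return modules;
}

const sample = `
# [root]/.gitmodules
[submodule "submodules/awesome-plugin"]
    path = submodules/awesome-plugin
    url = https://github.com/awesome-plugin
`;
const submodules = parseGitmodules(sample);
const submodulePaths = Object.values(submodules).map((entry) => entry.path);
console.log(submodulePaths);
```

The resulting path list can then be compared against the lerna / yarn package globs to make sure no submodule is picked up as a workspace.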
Caveats:
- The implementation is tricky.
- It's recommended that a submodule not be a Lerna package / workspace, meaning we should regard it as a completely standalone project and handle everything for it separately.
- Can possibly break the code consistency.
USE WITH CAUTION.
Conclusion - your own responsibility
As I've been sticking with the Lerna-Yarn-Workspaces scheme for a while, question marks constantly emerge. Here are some notes of mine.
- Git commits must be strictly governed, or they could easily end up a mess. For instance, you should always avoid blending changes in various packages into one commit.
- Handle dependencies carefully. I've made mistakes while dealing with multiple Nestjs projects. Nest, with the help of its CLI tool, has its own monorepo mode. I radically tried to merge the Nest monorepo into the Lerna-Yarn-Workspaces one, so I moved all nest-ly common deps (say: express, typescript, prettier plugins) to the project root and made every nest workspace a yarn workspace. This ended up with warnings everywhere, breaking the overall ecosystem. It turned out I had to leave nest inside its own playground and find my inner peace again.
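For the first note above, a small guard can help keep commits from blending packages. The sketch below assumes packages sit exactly one level below folders named packages or apps (adjust to your own layout); both function names are made up.

```javascript
// Given the files changed in one commit, list the packages they belong to.
// Assumes packages live exactly one level below the listed parent folders.
function touchedPackages(changedFiles, parents = ["packages", "apps"]) {
  const touched = new Set();
  for (const file of changedFiles) {
    const segments = file.split("/");
    // Need at least parent/name/file to be inside a package.
    if (segments.length >= 3 && parents.includes(segments[0])) {
      touched.add(`${segments[0]}/${segments[1]}`);
    }
  }
  return [...touched].sort();
}

// True when the commit stays within a single package (or touches none).
function isSinglePackageCommit(changedFiles, parents) {
  return touchedPackages(changedFiles, parents).length <= 1;
}

console.log(touchedPackages(["apps/webapp/src/a.js", "apps/server/app.js", "README.md"]));
```

The list of changed files could come from git diff --cached --name-only inside a husky pre-commit hook, with the commit rejected when isSinglePackageCommit returns false.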
I've also investigated the Rushstack a bit, another monorepo implementation, from Microsoft. It works best with pnpm and has many conceptual differences from Lerna. For me the most significant are that it doesn't encourage a root package.json, and that they have their own ideas on husky and pre-commit git hooks. Moreover, its configs are somewhat complicated and, I think, better suited to LARGE monorepos, covering things as detailed as file permissions.
I still use Lerna and Yarn for my own convenience and simplicity. And now the final question: should I always PUT EVERYTHING IN, company-wide for example, like some big firms do? Or should I be cool and do it project by project? Or even avoid this approach completely?
The answer? Maintaining monorepos isn't easy, weigh pros and cons on your own responsibility.
References
Monorepos in Git | Atlassian Git Tutorial
Guide to Monorepos for Front-end Code
Misconceptions about Monorepos: Monorepo != Monolith
License Compliance Question ยท Issue #673 ยท microsoft/rushstack
https://www.youtube.com/watch?v=PvabBs_utr8&feature=youtu.be&t=16m24s
[rush] Support Husky for git commit hooks ยท Issue #711 ยท microsoft/rushstack
[rush] Add support for git hooks by nchlswhttkr ยท Pull Request #916 ยท microsoft/rushstack