This is me: 🐣.

And my thoughts while implementing a JavaScript monorepo using lerna and yarn workspaces, as well as git submodules.

Disclaimers

The term monorepo seems to be controversial when it comes to project structuring; some may prefer multi-package (lerna itself was once "A tool for managing JavaScript monorepos"; it's now "A tool for managing JavaScript projects with multiple packages").
This is not a step-by-step guide to the tools; links to well-maintained official docs will be provided.
The aim is to record (not to debate) my own thoughts and implementation details on 'monorepo'. Corrections and guidance are welcome!

Monorepo What and Why

TL; DR

Back in the early days of my web projects as a noob, I would typically create one repository named frontend and another named server, each separately maintained and git-versioned. In the real world, two simple repositories may not cover the more complicated scenarios. Think about those lovely UI components you would like to pet and spread, and those clever utils/middlewares you want to extract and share.

frontend # a standalone repo
├── scripts
├── components
│   ├── some-lovely-ui
│   └── ...
├── index.html
└── ...

server # a standalone repo
├── utils
│   ├── some-mighty-util
│   └── ...
├── middlewares
│   ├── some-clever-middleware
│   └── ...
├── router.js
├── app.js
├── package.json
└── ...

The noob structure

Yes, we must protect our innovative ideas, by creating a few more standalone repositories, which should turn the whole project into a booming repo-society.

webapp # standalone
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── some-shell-script
├── some-lint-config
├── some-lang-config
├── some-ci-config
├── some-bundler-config
└── ...

server # standalone as it was
├── node_modules
├── package.json
├── .gitignore
├── .git
├── dotenvs
├── same-old-confs
└── ...

whateverapp # say, an electron-app
├── same-old-js # a standalone javascript-domain repo, again
└── ...

some-lovely-ui # needs to be independently bootstrapped and managed
├── same-old-setup
└── ...

some-mighty-util # share almost identical structure
├── same-old-structure
└── ...

some-clever-middleware # inherit absolute pain
├── same-old-pain
└── ...

The real world?

With the help of the link command provided by yarn (yarn link) or npm (npm link), you can easily try out features across projects and packages on the fly. Say you are developing Project A and Package B simultaneously but as separate repos, and want to use B as a dependency of A. Run yarn link under Package B and yarn link B under Project A, and you will find a symlinked folder at path/to/A/node_modules/B (the same location a yarn add would use) which points to your still-active local Package B. The details are beyond the scope here; dig in for yourself.
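
To make the symlink idea concrete, here is a minimal sketch that reproduces only the end result of linking: a folder under A/node_modules that points at the live Package B. It is not how yarn or npm implement link internally; linkPackage and the temp directories are hypothetical names used for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: symlink packageDir into projectDir/node_modules/<name>.
function linkPackage(packageDir, projectDir, name) {
  const modulesDir = path.join(projectDir, "node_modules");
  fs.mkdirSync(modulesDir, { recursive: true });
  const linkPath = path.join(modulesDir, name);
  // The "junction" type only matters on Windows; other platforms ignore it.
  fs.symlinkSync(packageDir, linkPath, "junction");
  return linkPath;
}

// Throwaway layout: <tmp>/A is the project, <tmp>/B is the package.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "link-demo-"));
const projA = path.join(tmp, "A");
const pkgB = path.join(tmp, "B");
fs.mkdirSync(projA);
fs.mkdirSync(pkgB);
fs.writeFileSync(path.join(pkgB, "index.js"), "module.exports = 'hello from B';\n");

const linkPath = linkPackage(pkgB, projA, "B");

// The linked folder resolves to the real, still-editable Package B.
console.log(fs.realpathSync(linkPath) === fs.realpathSync(pkgB));
console.log(require(linkPath));
```

Because the folder is a link and not a copy, any edit made in Package B is immediately visible from Project A.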

So far so good, until you quickly find yourself annoyed by what everybody tries to get rid of: Repository Bootstrapping. If you care about maintainability and consistency, almost identical configurations have to be set up for version control, dependency control, bundling, linting, CI, etc., and almost identical solutions have to be made to avoid madness. One of the baddest villains, for example: The 'node_modules' 🕳️.

The Silver Lining

While dirty jobs cannot be avoided, there is still a silver lining here: dirty jobs done once and for all, at least to get rid of the duplicated pain.

The approach is simple. Step zero, since all the repositories we've built are meant to serve the same big blueprint, joining them into one single repository sounds just modern and intuitive.

the [project] root
├── apps
│   ├── webapp
│   ├── server
│   ├── some-lovely-ui
│   ├── some-mighty-util
│   └── ...
└── ...

The what?

Such an approach looks like a rewind of history. As I've not-very-deeply learned, many ancient projects in corporations used to be structured in a monolithic way, but gradually suffered from maintenance and collaboration problems. Wait, still?

What is the confusion? What is our goal by putting things together? Our wish:

Being saved from redundant jobs.
Code consistency promoted.
Version control made easy.
Best practices possible for all sub-projects.

MANAGEABILITY, I think.

Manageability Up

The [project] root
├── apps
│   ├── webapp
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── lint-configs # sub-project-wide lint, can extend or override global confs
│   │   ├── lang-configs # sub-project-wide, can extend or override global confs
│   │   ├── bundler-configs # sub-project-wide
│   │   ├── README.md
│   │   └── ...
│   ├── server
│   │   ├── package.json # sub-project manifests and deps
│   │   ├── sub-project-level-confs
│   │   └── ...
│   ├── some-lovely-ui
│   │   ├── sub-project-level-stuff
│   │   └── ...
│   ├── some-clever-middleware
│   │   └── ...
│   └── ...
├── package.json # global manifests, deps, resolutions, root-only deps (husky for instance)
├── .gitignore # git once for all
├── .git # git once for all
├── dotenvs # dotenvs for all
├── shell-scripts # maintenance for all
├── lint-configs # lint for all
├── lang-configs # helpers for all
├── ci-configs # publish made handy
├── bundler-configs # bundler for all
└── ...

The advanced structure

Here we've introduced several familiar faces into the root of the project directory: manifests and config files that once dwelled only in each sub-project. This makes these configs take effect project-wide, allowing a baseline to be set and shared among all sub-projects, aka code consistency. A sub-project may still hold its private-scope configs to override or extend the global standard when a variation has to be made, all thanks to the inheritance-like feature in most dev toolchains.

The model must be flexible. Take code linting for instance: to save lives, almost every well-maintained framework, frontend and backend alike, has its own boilerplates and CLI tools to the rescue, which usually have linting covered. Though many of them use eslint and promote the JavaScript Standard Style, there are ts / js variations, there are still tslint advocates, there are complexities between lint plugins, and for sure there can be strict-or-not conventions in your company. Wise choices have to be made on your own responsibility.

Bravo?

Let's now bravely call our project a monorepo already! By the name we infer (?) that this is basically a project with all its ingredient parts in a single / monophonic repository. Meanwhile, serving a project-wide but extendable development standard is made possible.

Manageability achieved! Now who be the manager?

Sir, we have a problem!

The installing process for a JS project is never satisfying. It creates fat and tricky node_modules. Multiple projects in one?

🍭 Not human-life-saving: I have to cd and perform yarn add per sub-project folder.

🔋 Not battery-life-saving: A sub-project's deps are installed under its own directory. On the global scale, heavy loads of duplication are produced and will keep expanding.
Cleverer ideas and methods are needed for handling sub-project versions and cross-dependency relations.

Introducing Lerna

In practice, I didn't arrive at lerna by having headaches resolving package dependencies and then searching for a solution. I heard of lerna first, found it handy for performing semver bumps, and only then got to know the monorepo goodness.

As described on its website, lerna is a tool for managing JavaScript projects with multiple packages.

A lerna init command creates a new lerna project (or upgrades an existing project into one), which is typically structured like:

root
├── lerna.json
├── package.json
├── node_modules
└── packages
    ├── packageA
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    ├── packageB
    │   ├── node_modules
    │   ├── package.json
    │   └── ...
    └── ...

This looks pretty much like our previous mono-structure with a lerna.json file introduced. The file is the config file for your globally npm-installed or yarn-added lerna command line tool; a project-wide lerna should also be automatically added to devDependencies in the root package.json.

A minimal effective lerna config looks like:

// [project/root]/lerna.json

{
    "packages": ["packages/*"],
    "version": "independent",
    "npmClient": "yarn" // or npm, pnpm?
    // ...

}

The packages entry is a glob list that matches the locations of sub-projects. For instance, ["clients/*", "services/*", "hero"] makes the valid sub-projects (those having a valid package.json) located directly under clients and services, as well as the exact hero project located under the root, recognized as lerna packages.
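
As a rough illustration of how such a glob list selects package folders, here is a simplified sketch. It only understands the two pattern shapes used above ("dir/*" and an exact folder name); real glob matching, and whatever Lerna does internally, is richer than this. matchPackageDirs and the candidate folder list are hypothetical.

```javascript
// Simplified matcher: supports "parent/*" (direct children only) and exact names.
function matchPackageDirs(patterns, candidateDirs) {
  return candidateDirs.filter((dir) =>
    patterns.some((pattern) => {
      if (pattern.endsWith("/*")) {
        const parent = pattern.slice(0, -2);
        if (!dir.startsWith(parent + "/")) return false;
        const rest = dir.slice(parent.length + 1);
        // Direct children only: no further path separators allowed.
        return rest.length > 0 && !rest.includes("/");
      }
      return dir === pattern;
    })
  );
}

// Hypothetical folders found in a repo, relative to the root.
const dirs = ["clients/web", "clients/web/src", "services/api", "hero", "docs"];
const matched = matchPackageDirs(["clients/*", "services/*", "hero"], dirs);
console.log(matched);
```

Here clients/web, services/api and hero are selected, while the nested clients/web/src and the unrelated docs folder are not. A real tool would additionally require a valid package.json in each matched folder.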

The version entry, if given a valid semver string, makes all packages always share the same version number. "independent" means packages are versioned separately, in parallel.

Useful Commands

lerna bootstrap (once, from any location, project wide):

🍭 Installs dependencies for every single package (sub-projects only, root dependencies not included); no per-directory installs by hand.

🔋 With the --hoist flag, can resolve duplication of common dependencies.

⚔️ Links cross dependencies, with the same results (see lerna add and lerna link) as performing yarn link per package.

lerna clean: Removes installs (purges the node_modules folder) from every package (root excepted).

lerna version and lerna publish as lerna's selling point:

BETTER READ THE DOCS FOR THIS SECTION BY YOURSELF

You are smart if you use conventional commits in your repo at the same time; it gives you many more advantages.

Use Conventional Commits

A repo that follows Conventional Commits has its commit messages structured as follows:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

We have husky, the git hooks manager, as well as commitizen, the commit util. They create interactive prompts as you git commit, simplifying the making of commit messages. Dig into them for yourself.

The information provided in a conventional commit message correlates with the Semantic Versioning spec very well. Typically, given that a full semver number can be MAJOR.MINOR.PATCH-PRERELEASE:

As a possible value of the type section, a fix commit stands for a PATCH semver bump.
A feat commit stands for a MINOR bump.
The optional BREAKING CHANGE footer, or a ! right after the type/scope, stands for a MAJOR bump.

This makes it easier to build automated tools on top of it.
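
To show how little it takes for a tool to read such messages, here is a minimal sketch of a parser plus the type-to-bump mapping described above. It covers only the basic header shape and the BREAKING CHANGE footer; parseCommit and bumpFor are hypothetical names, not part of lerna or any commit tooling.

```javascript
// Parse "<type>[optional scope][!]: <description>" plus optional body/footers.
function parseCommit(message) {
  const [header, ...rest] = message.split("\n");
  const match = /^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$/.exec(header);
  if (!match) return null; // not a conventional commit
  const [, type, scope = null, bang, description] = match;
  const breaking =
    Boolean(bang) || rest.some((line) => /^BREAKING[ -]CHANGE: /.test(line));
  return { type, scope, description, breaking };
}

// Map a parsed commit to the semver level it implies, or null for none.
function bumpFor(commit) {
  if (!commit) return null;
  if (commit.breaking) return "major";
  if (commit.type === "feat") return "minor";
  if (commit.type === "fix") return "patch";
  return null;
}

console.log(bumpFor(parseCommit("feat(package-b): add folder draggability")));
console.log(bumpFor(parseCommit("fix: handle empty input\n\nBREAKING CHANGE: drops v1 API")));
```

Types other than feat and fix (docs, chore, and so on) map to no bump in this sketch unless they are marked as breaking.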

Meanwhile with lerna, here is an illustrative workflow for a conventional version bump:

Current package versions (independently versioned)
- Package A@1.0.5
- Package B@1.5.6
- Package C@2.0.5
- Package D@1.0.0
Make some updates
- A MAJOR-level performance update on Package A, with perf(package-a)!: bump electron version as the commit message.
- A MINOR-level feature update on Package B, with feat(package-b): add folder draggability as the commit message.
- A PATCH-level fix on Package C, with fix(package-c/error-interception): fix type defs as the commit message.
- No modifications on Package D.
Run lerna version with the --conventional-commits flag. The process and the results:
1. Read current versions from the package.jsons.
2. Read from git history (and actual code changes), determine what commit was made in what package.
3. Resolve commit messages, generate corresponding version bumps.
4. Once confirmed, lerna will:
  - Modify the version fields in the package.jsons.
  - Create a git commit as well as new version tags (the message format can be configured in lerna.json).
  - Push to remote.
New versions
- Package A@2.0.0
- Package B@1.6.0
- Package C@2.0.6
- Package D@1.0.0
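
The arithmetic of the bumps can be sketched in a few lines. This is only an illustration of the semver rules for plain MAJOR.MINOR.PATCH versions (no prerelease handling), not lerna's implementation; bumpVersion is a hypothetical helper. Note that per semver a MINOR bump resets PATCH to 0 and a MAJOR bump resets both MINOR and PATCH, so Package B goes from 1.5.6 to 1.6.0.

```javascript
// Apply a semver bump level to a plain "MAJOR.MINOR.PATCH" version string.
function bumpVersion(version, level) {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (level) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version; // no releasable change, version stays as-is
  }
}

const current = { A: "1.0.5", B: "1.5.6", C: "2.0.5", D: "1.0.0" };
const bumps = { A: "major", B: "minor", C: "patch", D: null };

const next = Object.fromEntries(
  Object.entries(current).map(([name, version]) => [
    name,
    bumpVersion(version, bumps[name]),
  ])
);
console.log(next);
```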

You should read the docs for prerelease bumps and more capabilities utilizing lerna.

Introducing Yarn Workspaces

Using lerna to handle package installs, though possible, is not a very good idea, especially when you have root-only dependencies and when you are using Yarn (the classic version).

This article focuses on yarn 1.x only. Having introduced the advanced but currently-not-very-widely-supported Plug'n'Play feature, Yarn 2.x has diverged very significantly, both conceptually and functionally. It's even incubating its own release workflow, fyi.

Hoist in Lerna

says this official blog from yarn, which also introduced yarn workspaces and its relationship with Lerna

With the above said, and I don't really remember since which version, Lerna does provide a --hoist flag for bootstrapping to solve the duplicated installation problem.

root
├── package.json # deps: lerna
├── node_modules
│   ├── typescript @4.0.0 # HOISTED because of being a common dep
│   ├── lodash ^4.17.10 # HOISTED because of being a common dep
│   ├── lerna # root only
│   └── ...
├── package A
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package B
│   ├── package.json # deps: typescript @4.0.0, lodash ^4.17.10
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── tsc # still got a tsc executable in its own scope
│   │   │   └── ...
│   │   └── ... # typescript and lodash are HOISTED, won't be installed here
│   └── ...
├── package C
│   ├── package.json # deps: lodash ^4.17.20, wattf @1.0.0
│   ├── node_modules
│   │   ├── .bin
│   │   │   ├── wtfdotsh # got an executable from wattf
│   │   │   └── ...
│   │   ├── lodash ^4.17.20 # only package C asks for this version of lodash
│   │   ├── wattf @1.0.0 # package C's private treasure
│   │   └── ...
│   └── ...
└── ...

which means that common dependencies around the repo should get recognized and installed only once into the project/root/node_modules, while the binary executable of each (if it has one) should still be accessible per package/dir/node_modules/.bin, as required by package scripts.
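
The layout in the tree above can be reproduced with a small planning function. This is a simplified sketch that follows the tree as drawn (a dependency is hoisted only when the same version range is requested by at least two packages, and the most requested range wins); it is an assumption for illustration, not Lerna's or Yarn's actual hoisting algorithm. planHoist is a hypothetical name.

```javascript
// packages: { packageName: { depName: versionRange } }
function planHoist(packages) {
  // depName -> (versionRange -> number of packages requesting it)
  const counts = new Map();
  for (const deps of Object.values(packages)) {
    for (const [name, range] of Object.entries(deps)) {
      if (!counts.has(name)) counts.set(name, new Map());
      const byRange = counts.get(name);
      byRange.set(range, (byRange.get(range) || 0) + 1);
    }
  }

  // Hoist the most requested range of each dep, if shared by 2+ packages.
  const root = {};
  for (const [name, byRange] of counts) {
    const [range, count] = [...byRange.entries()].sort((a, b) => b[1] - a[1])[0];
    if (count >= 2) root[name] = range;
  }

  // Whatever is not satisfied by the root copy stays in the package itself.
  const local = {};
  for (const [pkg, deps] of Object.entries(packages)) {
    local[pkg] = Object.fromEntries(
      Object.entries(deps).filter(([name, range]) => root[name] !== range)
    );
  }
  return { root, local };
}

const plan = planHoist({
  "package A": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package B": { typescript: "4.0.0", lodash: "^4.17.10" },
  "package C": { lodash: "^4.17.20", wattf: "1.0.0" },
});
console.log(plan);
```

With this input, typescript and lodash ^4.17.10 land in the root plan, while package C keeps its own lodash ^4.17.20 and wattf.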

However, this very positive feature is only available during lerna bootstrap, while in most common cases we install new packages during development using a package manager.

Plus, Lerna knows the disadvantages of hoisting, and it doesn't have a way to solve them.

So far with Lerna:

🔭 Good for managing "macro"-scopic packages.

🔬 Bad at resolving microscopic dependencies.

Easy-to-break package symlinks.
Undesirable overhead control.

Nohoist in Yarn

Finally we welcome Yarn Workspaces on stage. And she comes with such a duty:

She has Hoisting as her key feature.
She knows the caveats of hoisting as well, and provides a nohoist option (very helpful, PLEASE DO READ THIS).

It's even easier to call her number: just modify your existing repo/root/package.json.

[root]/package.json
{
  "private": true,
  // pretty familiar setup like Lerna
  "workspaces": ["workspace-a", "workspace-b", "services/*"]
}

This turns a repo into workspaces

Now, instead of lerna bootstrap, call yarn [install/add] anywhere in the repo and anytime during dev, and hoisting will be applied (honestly, more time consuming, but tolerable by all means).

What about nohoist? Sometimes you don't want some package / workspace to have some of their deps installed globally (at the root) even though they share common versions. It's as simple as adding yet another entry with glob patterns.

[root]/package.json
{
  "private": true,
  "workspaces": {
    // this is even more like Lerna
    "packages": ["workspace-a", "workspace-b", "services/*"],
    // exceptions here, globs
    "nohoist": ["**/react-native", "**/react-native/**"]
  }
}

DETAILS? AGAIN, PLEASE DO READ THIS FINE BLOG FROM YARN.

Friendship

It's easy to notice similarities in the way Lerna and Yarn manifest a monorepo. In fact, the integration of both is encouraged by Yarn and programmatically supported in Lerna.

[root]/lerna.json
{
  "npmClient": "yarn",
  "useWorkspaces": true
  // ...
}

This joins their hands together

Once the above useWorkspaces is set to true, Lerna reads package / workspace globs from package.json instead.
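
Since the workspaces field comes in two shapes (a plain array, or an object with packages and nohoist), any tool reading it has to handle both. A minimal sketch of that lookup, under the assumption that only these two shapes occur; workspaceGlobs is a hypothetical helper and not Lerna's actual code:

```javascript
// Extract the workspace glob list from a parsed root package.json object.
function workspaceGlobs(rootManifest) {
  const { workspaces } = rootManifest;
  // Short form: "workspaces": ["workspace-a", "services/*"]
  if (Array.isArray(workspaces)) return workspaces;
  // Long form: "workspaces": { "packages": [...], "nohoist": [...] }
  if (workspaces && Array.isArray(workspaces.packages)) return workspaces.packages;
  // Not a workspaces root.
  return [];
}

const shortForm = {
  private: true,
  workspaces: ["workspace-a", "workspace-b", "services/*"],
};
const longForm = {
  private: true,
  workspaces: {
    packages: ["workspace-a", "workspace-b", "services/*"],
    nohoist: ["**/react-native", "**/react-native/**"],
  },
};

console.log(workspaceGlobs(shortForm));
console.log(workspaceGlobs(longForm));
```

Both calls yield the same glob list, which is what lets one source of truth in package.json serve both tools.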

Our original goal

[x] A manageable monorepo
- [x] Package / Workspace versioning made easy
- [x] Low level dependency well controlled

Not an Intruder - Git Submodules

In my actual dev experience, I've run into scenarios as follows:

I have to pick some package out, because I want to open-source it.
I am not satisfied with a certain dependency, so I'd better fork it, then constantly modify it and use it in action.

A non-perfect solution

With Git Submodules, we can leverage git as an external dependency management tool as well. In a nutshell, it makes it possible to place a package inside a big repo while the package keeps its own private-scope git storage. For details of implementation, please read the above links and this github blog.

For a quick peek, see this sample project structure:

root
├── apps
│   ├── auth-web # a lerna package / yarn workspace
│   ├── electron-app # a lerna package / yarn workspace
│   └── ...
├── nest-services # a lerna package / yarn workspace
├── submodules
│   ├── awesome-plugin # MUST NOT be a lerna package / yarn workspace
│   │   ├── node_modules # deps manually installed
│   │   ├── package.json # nohoist anything
│   │   ├── .git # has its own git history with its own remote origin
│   ├── some-framework-adapter # MUST NOT be a lerna package / yarn workspace
│   │   ├── .tsconfig.json # private configs
│   │   ├── .ci-conf # SHOULD have its own CI config
│   │   ├── .eslintrc # MAY break code consistency.
│   │   ├── .git
│   │   └── ...
│   └── ...
├── package.json
├── lerna.json
├── .gitmodules # the config for submodules
├── .git # project git history
└── ...

And this config:

# [root]/.gitmodules

[submodule "submodules/awesome-plugin"]
    path = submodules/awesome-plugin
    url = https://github.com/awesome-plugin
[submodule "submodules/some-framework-adapter"]
    path = submodules/some-framework-adapter
    url = https://private.gitlab.com/some-framework-adapter

Caveats:

The implementation is tricky.
It's recommended that a submodule not be a Lerna package / workspace, meaning we should regard it as a completely standalone project and perform everything separately.
Can possibly break the code consistency.

USE WITH CAUTION.

Conclusion - your own responsibility

As I've been sticking with the Lerna-Yarn-Workspaces scheme for a while, question marks constantly emerge. Here are some notes of mine.

Git commits must be strictly governed, or they could easily end up a mess. For instance, you should always avoid blending changes in various packages into one commit.
Handle dependencies carefully. I've made mistakes while dealing with multiple Nestjs projects. Nest, with the help of its CLI tool, has its own monorepo mode. I radically tried to merge the Nest monorepo into the Lerna-Yarn-Workspaces one, so I moved all nest-ly common deps (say: express, typescript, prettier plugins) to the project root and made every nest workspace a yarn workspace. This ended up with warnings everywhere, breaking the overall ecosystem. Turns out I had to leave nest inside its own playground to regain inner peace.

I've also investigated Rush Stack a bit, another monorepo implementation, from Microsoft. It works best with pnpm and has many conceptual differences from Lerna. For me the most significant is that it doesn't encourage a root package.json, and it has its own ideas on husky and pre-commit git hooks. Moreover, its configs are somewhat complicated and, I think, should suit LARGE monorepos, with things like even detailed file permissions.

I still use Lerna and Yarn for my own convenience and simplicity. And now the final question: should I always PUT EVERYTHING IN, company-wide for example, like what some big firms do; or should I be cool and do it project by project; or even completely avoid this approach?

The answer? Maintaining monorepos isn't easy; weigh the pros and cons on your own responsibility.