Up to 20% of your application dependencies may be unmaintained

havocp

Havoc Pennington

Posted on April 10, 2019


We recently added a new feature that Tidelift subscribers can use to discover unmaintained dependencies. When we repurposed this feature to gather some general-interest numbers, we found that about 10-20% of commonly used OSS packages aren't actively maintained: not even one commit in the past year, and most issues and PRs filed in the last year still open.

This number surprised us! We expected a lot of packages to be under-maintained, but we didn't expect so many with no activity at all.

We feel this is a systemic problem. Individual maintainers are not at fault.

We also feel it's a real problem that deserves to be addressed.

Why does maintenance matter, anyway?

For this blog post, we looked first at a sample JavaScript codebase (one of our own repositories that powers Tidelift).

When we deploy an application to production, it'll have code we wrote ourselves and a bunch of code from open source dependencies. In our sample repo, we wrote about 15% of the total lines of code, and open source developers wrote about 85%. My guess is that this is a fairly common ratio.

In most organizations, nobody really knows what's in the 85% (especially in the aggregate across many repos), and nobody knows whether it's maintained, or who's maintaining it, or to what standard.

Will anyone handle security vulnerabilities? Is a package at risk of handoff to a malicious actor? Does the project have sound legal practices? Will the project keep up with an evolving ecosystem, or stick you in dependency hell when it can't be upgraded along with everything else? Will the project see any useful new features or track new specifications relevant to its function?

We know it's important to maintain our 15%, but for the 85% we often don't even track what it is, let alone have a plan to take care of it. In part this is because there has been no clear answer for how to take care of it; what would that plan be?

Teams may rationalize the problem to themselves: "well, we don't really know how to keep these dependencies maintained, but everyone else is using them, so it's probably fine." This is absolutely incorrect.

Our new feature to find unmaintained dependencies provides more evidence; we ran it on the dependencies of our own sample repo, and we have a lot to improve.

How we define "unmaintained"

For a first cut, we defined unmaintained very narrowly:

  • The project has no commits in the last year
  • If there were any issues or PRs in the last year, less than half of them were closed

Using this definition, 259 of 1285 (20.16%) dependencies of our sample repository are unmaintained.
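The two criteria above can be expressed as a small predicate. This is a minimal sketch, not Tidelift's implementation: the `RepoActivity` record and its field names are hypothetical, the issue/PR counts are assumed to cover only items filed in the last year, and "less than half" is read as a strict comparison, so exactly half closed counts as maintained.

```python
from dataclasses import dataclass


@dataclass
class RepoActivity:
    """Hypothetical one-year activity summary for a package's GitHub repository."""
    commits_last_year: int
    # Issues and PRs filed in the last year, and how many of those are now closed.
    issues_and_prs_filed: int
    issues_and_prs_closed: int


def is_unmaintained(activity: RepoActivity) -> bool:
    """No commits in the last year, and fewer than half of last year's issues/PRs closed."""
    # Even a single commit in the last year counts as maintained.
    if activity.commits_last_year > 0:
        return False
    # With nothing filed in the last year, the second criterion is vacuously met.
    if activity.issues_and_prs_filed == 0:
        return True
    # Otherwise, flag only if fewer than half of last year's issues/PRs were closed.
    return activity.issues_and_prs_closed < activity.issues_and_prs_filed / 2


def percent_unmaintained(flags: list[bool]) -> float:
    """Percentage of packages flagged, rounded to two decimal places."""
    return round(100 * sum(flags) / len(flags), 2)


# 259 flagged out of 1285 dependencies, as in the sample repository.
print(percent_unmaintained([True] * 259 + [False] * (1285 - 259)))  # prints 20.16
```

Keeping the check this narrow is deliberate: any sign of life in the repository clears the package, so the predicate errs toward calling things maintained.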

Some of these projects are explicitly annotated as unmaintained in their README, but it's more common for them to be "ghosted" (activity ceased at some point but they haven't been marked unmaintained).

Note that we're talking about widely-used packages in a pretty normal app. In fact, "Hello, World" from any of the major single-page application frameworks looks similar to our sample repo, on this metric.

If we looked at the percentage of all packages (that is, including unpopular ones), it's a fair guess that the share of unmaintained packages would be higher than 20%. Not to be alarmist, but 20% may be a best-case scenario: it's the number you get if you only use currently popular packages. Once you start to add packages from outside the "Hello, World" stack, or once time passes and you're on an older generation of packages, it presumably gets worse.

So far, we only flag packages as unmaintained when they have a GitHub repository we know about (because we're using GitHub's API to collect metrics). If a package isn't on GitHub or we can't determine its repository, we assume it's maintained. This could be another source of conservatism in the numbers, especially for older ecosystems like Maven.

I know what you're going to say!

Whenever I see a thread about unmaintained packages or OSS metrics on the internet, people want to argue about the details. I sympathize, but many of these arguments are rationalizations that don't hold up. Here are three common responses to hearing about unmaintained packages:

  • The apparently-unmaintained packages are actually "done" so don't need maintenance
  • This is only a problem for npm/JavaScript
  • The automated metrics might be missing some key detail or corner case

All three of these are truthy: there's a little bit of reality to each of them, and they move the numbers a little, but they don't change the order of magnitude of the situation.

These packages aren't all done

Here's the thing. There's a big difference between "feature frozen" and "unmaintained." Even if a package is feature frozen, there's baseline work to be responsive if security issues are discovered, move to new versions of the ecosystem, or at least document that the project is feature frozen!

Consider some very different scenarios:

  1. The maintainer isn't responsive anymore—they moved on and never committed anything else, while issues are often ignored.
  2. The maintainer documented in the README that the package is unmaintained or deprecated, and suggests switching to something else.
  3. The maintainer documented "no new features will be accepted, but please let me know if there's a critical problem or a need for a new release and I'll take care of it."

Among the packages we're labeling unmaintained, scenario 1 is the most common by far, scenario 2 happens some, and scenario 3 is the least common—in fact it's hard to find examples. Scenario 3 is the one I'd call "done but still maintained."

Sindre Sorhus maintains over 1100 npm packages and supports 100+ of those on Tidelift; he had this to say:

Q: One comment we've heard frequently is that many JavaScript projects don't need a maintainer because they're "done." Do you have any thoughts on that philosophy?

A: That is only really true for small focused packages. Since they have a narrow scope, they can be done feature-wise. However, this argument misses all the other maintenance tasks required for a package; keeping it up to date with deprecated Node.js APIs, new better Node.js APIs, new syntax (which can improve readability and also sometimes be faster), dependency updates (especially important for reducing vulnerabilities in the dependency-tree), bug fixing, documentation improvements, issue triaging, adding TypeScript type definition, etc. In packages with a larger scope, you also need to handle feature requests and refining existing features. The work required for maintenance is often undervalued. I spend a lot of time on these tasks every day.

These packages aren't all trivial

We also measure how many bytes of .js files each package contains, which gives a rough sense of package size. For our sample repo, the unmaintained packages are smaller than the maintained ones, on average (11K vs. 40K). This means that although 20% of packages are unmaintained, that 20% represents less than 20% of the total code.
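To get a rough sense of how much less, we can weight each group's package count by its average size. This is a back-of-envelope sketch, not a measurement: it assumes the 11K and 40K figures are mean sizes per package in the same unit, and that the 1285 - 259 = 1026 packages not flagged make up the maintained group. Under those assumptions, the unmaintained packages hold roughly 6.5% of the total .js code.

```python
def unmaintained_code_share(
    n_unmaintained: int, avg_unmaintained: float,
    n_maintained: int, avg_maintained: float,
) -> float:
    """Fraction of total .js size held by unmaintained packages, from per-group mean sizes."""
    # Total size of each group = number of packages x mean package size.
    unmaintained_total = n_unmaintained * avg_unmaintained
    maintained_total = n_maintained * avg_maintained
    return unmaintained_total / (unmaintained_total + maintained_total)


# 259 unmaintained packages averaging 11K; the remaining 1026 averaging 40K.
share = unmaintained_code_share(259, 11_000, 1285 - 259, 40_000)
print(f"{share:.1%}")  # prints 6.5%
```

Averages hide the spread, though, which is why the size distribution of the unmaintained packages matters as much as the share.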

However, a lot of these unmaintained packages are big. I really want to link to specific projects but I decided against it because it's not the fault of these maintainers that the packages are languishing, and it seems harsh and a bit beside the point to put specific projects on the spot.

Without naming names, we are talking about parsers and other substantive libraries. Yes, there are also some short just-a-few-lines packages. That by no means changes the overall picture. To give you a sense of the spread, of 259 unmaintained packages, 91 contain under 1K in .js files, while 75 contain over 4K in .js files.

Keep in mind, for some of the risks of under-maintenance (like the event-stream-style risk or the need to keep up with the ecosystem), neither "done" nor "small size" necessarily matters. What matters is that someone's still around keeping an eye on things.

This isn't unique to the JavaScript ecosystem

Hearing these maintenance stats, some people respond with "lol JavaScript." Unless you've actually researched the dependencies you use (and recently, because maintainers ghost all the time), I'd recommend that you don't get too confident. Non-maintenance and under-maintenance are common across all of open source.

JavaScript does seem to have more fine-grained code reuse and more dispersed maintenance; that has both costs and benefits. Among other things, it makes it easier to analyze lack of maintenance (our approach can't detect unmaintained subdirectories of large projects, only unmaintained entire repositories).

Here are some of the things we've found so far:

  • "Hello, World" for React, Angular, and Vue all have the same roughly-20%-unmaintained number we're seeing in our sample Tidelift codebase.
  • In a sample repository of commonly used PyPI dependencies that we created a while ago, 13% of packages are unmaintained.
  • A "Hello, World" Spring app seems to be about 8% unmaintained packages, but we only tag things unmaintained if we have metrics from GitHub, and some Maven packages lack metrics, so this number is likely low.
  • A "Hello, World" Rails 5.1 app has 8 of 70 unmaintained by the criteria we're using, while 5.2 has 10 of 79, so that's roughly 11-13%.

In general, it appears we can identify more like 10% unmaintained in the non-JS ecosystems, vs. the 20% found in JS apps. One difference is that essentially all of that 10% are larger, substantive packages, because these ecosystems don't have the tiny packages that are common on npm.

Also keep in mind that "Hello, World" JS apps are quite batteries-included, while some frameworks have no dependencies at all, by default. Real-world applications will pull in more dependencies (they won't stick to only the initial framework) so they may become more similar to the batteries-included JS apps.

Please don't read too much into the exact numbers above; we aren't pretending this is a scientific comparison. The point is that all ecosystems have maintainers who disappear and none of us should feel this is limited to JavaScript.

The packages are actually abandoned—it isn't a metrics artifact

We kept our definition of "unmaintained" simple and conservative, and still found 20% unmaintained in this sample JavaScript project.

We wondered about corner cases like "well, maybe the commits are on a weird branch," or "maybe none of the issues matter," so we went ahead and looked through these repositories manually. (In the Tidelift subscriber dashboard, there's a handy link to the repo for each unmaintained package we identify that makes it easier to do this.)

Short answer: the automated criteria are basically right. For the most part these repositories haven't been touched in ages. Most have been inactive for a lot longer than a year, and they tend to have numerous ignored issues.

We haven't tried to identify under-maintenance, yet, only full-on abandonment. People have proposed a lot of complex metrics for OSS projects. Since it looks like we have a lot of work cut out for us just to address completely inactive projects, there’s no need to get fancy right away.

20% is a potentially-conservative figure in two ways:

  1. If there's even a single commit in the past year, OR most issues were closed in the past year, we mark the package maintained.
  2. We're looking at the packages in a mainstream "Hello, World," not the entire universe of packages that exist.

For example, I have a seriously-under-maintained project out there with my own name on it. This package doesn't meet our initial criteria for unmaintained because I merge a PR occasionally, but it is 95% unmaintained—I don't have time. Issues and PRs are piling up and plenty of them deserve attention.

It's difficult to accurately flag projects like mine without pulling in more false positives or debatable judgment calls. The bad news from a state-of-open-source standpoint is that a conservative screen still flags a lot of packages and it's usually right to do so.

How can we improve this situation?

We believe the solution includes paying maintainers for value provided. Ghosted and under-maintained packages exist because people lack time and incentive to keep them maintained. It's hard to find a new maintainer for an abandoned package, and it's hard for users to port away from an abandoned package. The right solution is to enable maintainers to stick around, and give maintainers a reason to adopt orphaned packages.

Here's how we're trying to improve the open source maintenance problem at Tidelift:

  • We offer a monthly income to the maintainers of any package used by application development teams who purchase the Tidelift Subscription. The more Tidelift’s subscribers use a particular package, the more the maintainers of that package get paid.
  • Subscribers get a breakdown of packages they use which appear to be unmaintained, as well as a view into follow-on effects of under-maintenance, such as security and licensing issues.
  • For direct dependencies, we help subscribers find alternatives; when there's no alternative, there's also that monthly income we offer if the maintainer reappears—or if an alternative emerges.
  • For transitive dependencies (dependencies-of-dependencies), application development teams often can't do much themselves. So when we show subscribers security, maintenance, and licensing problems pulled in indirectly, we also show the same problems to our network of participating maintainers, and we ask those maintainers to fix problems their package pulls in. We're pushing problems down the stack to get the root cause addressed.
  • In addition to reporting historical problems, the fact that we're paying the maintainers of the stuff you use should cut down on the future "ghost rate" for your packages.
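The transitive-dependency idea above, routing a problem to the maintainer whose package pulls it in, can be sketched as a graph search. This is a hypothetical illustration, not Tidelift's implementation: it assumes the dependency graph is available as a dict mapping each package name to the names it depends on, and all package names in the example are made up. It walks the tree from the application's direct dependencies, then, for each flagged package reached only transitively, lists the packages in the tree that depend on it directly; those are the maintainers who could fix or replace it upstream.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], direct_deps: list[str]) -> set[str]:
    """Every package reachable from the application's direct dependencies (breadth-first)."""
    seen = set(direct_deps)
    queue = deque(direct_deps)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def who_pulls_in(
    graph: dict[str, list[str]], direct_deps: list[str], flagged: set[str]
) -> dict[str, list[str]]:
    """Map each flagged transitive dependency to the in-tree packages that depend on it directly."""
    in_tree = reachable(graph, direct_deps)
    report: dict[str, list[str]] = {}
    for pkg in sorted(flagged & in_tree):
        if pkg in direct_deps:
            continue  # A direct dependency: the application team can act on it itself.
        report[pkg] = sorted(p for p in in_tree if pkg in graph.get(p, []))
    return report


# Hypothetical dependency graph for illustration only.
graph = {
    "web-framework": ["router", "template-lib"],
    "router": ["path-parser"],
    "test-runner": ["path-parser", "color-output"],
}
direct = ["web-framework", "test-runner"]
flagged = {"path-parser", "color-output", "unused-lib"}
print(who_pulls_in(graph, direct, flagged))
# prints {'color-output': ['test-runner'], 'path-parser': ['router', 'test-runner']}
```

Here `path-parser` would be raised with the maintainers of `router` and `test-runner`, while `unused-lib` is ignored because nothing in this application's tree depends on it.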

Every subscriber has a certain amount of "special snowflake," especially over time—quirky packages they're using, or older releases, or the like. By tracking their exact dependencies with Tidelift, each subscriber knows where they stand and that they're covered—because with Tidelift they are paying the maintainers to keep them covered.

A real solution to this problem needs to be comprehensive:

  • Tools so everyone can see where the problems are, and locate root causes.

  • Incentives so that we're paying maintainers to stick around and address problems, rather than generating a bunch of new requests for their unpaid labor.

This isn't hypothetical. We're paying maintainers today and helping subscribers improve their dependencies today.

Let us prove it

Curious which unmaintained packages you're using? Pick a sample repository (something you actually maintain and care about, ideally), send us the package manager files, and we'll email you a report with the 10 most critical unmaintained packages you're relying on.

Submit your files here.
