Poorly managed packages considered harmful

Introduction

Like so many articles entitled "... considered harmful", this one argues that undertaking a certain course of action, or employing a particular technique, might be detrimental to application development. The title pays homage to the seminal 1968 paper by Edsger W. Dijkstra called "Go To Statement Considered Harmful".

The paper was first published in Communications of the ACM (Association for Computing Machinery), Vol. 11, No. 3. It described the primary feature that programming languages of the era used for branching, and its shortcomings. Few nowadays would argue that seeking alternatives to GoTo, such as subroutines, procedures and functions, was a poor steer.

"On the perils of packages"

Free does not necessarily mean without cost. Many packages are provided to the community free of charge, which is a very compelling reason for using them (e.g. React JS). But there are a variety of ways in which adopting 3rd-party code can come with an unexpected price tag.

Why do we use them

A module-based architecture is a proven strategy for the construction of robust systems. An application comprising many decoupled components is easier to test, maintain and evolve. As a consequence many programming languages, including JavaScript, have syntax to define modules and/or components.

But modules are just the "tip of the proverbial iceberg"; sharing modules requires a little more. In order to publish and integrate 3rd-party modules, they often need additional wrapping to enable features such as version control, dependency management and registration/discovery. Packages and modules have become essential building blocks in the construction of modern applications, almost irrespective of the programming language or technology stack. Using them can be considered an extreme form of the DRY (Don't Repeat Yourself) principle.

In Java and Python they are simply known as packages, but in .NET they are often called NuGets, in Rust they are "Crates" and in JavaScript "Node/NPM Packages" (as an extension of the language). But whatever the technology, unless care is taken, using any 3rd-party code can be hazardous, with or without there being any malice or ulterior motive.

Indeed, the idea of writing everything from scratch and not making use of the work and expertise of others is preposterous, and in many cases foolhardy, except in very extreme circumstances.

What could possibly go wrong

I am sure other technology stacks have their own 'dirty laundry', and none is exempt, but JS and Node are the domain I know best. I have chosen the cases cited below not because they were particularly bad but because they were well publicised at the time and represent a variety of motivations and consequences. I will not go into the details here but have provided links to articles that discuss each case more fully.

January 2022: Colors.js and faker.js by Marak Squires (bleepingcomputer #1, revenera #2). Abrupt tool withdrawal and corruption of own repository.
November 2018: flatmap-stream in event-stream (#3). Surreptitious harvesting of authentication information.
March 2016: leftpad by Azer Koçulu (#4). Sudden withdrawal of a low-level package impacting frameworks.

In all three cases there was a significant but largely recoverable impact on the industry, but only in the second case was there a suggestion that the author deliberately set out to cause disruption for personal gain. In the other two cases the issue came about through the withdrawal of software developed by the package owners. It can be argued that in the last example (leftpad) the industry response was ultimately to enhance the ECMAScript specification, a positive outcome.
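
The language feature most often linked to the leftpad episode is String.prototype.padStart, added in ES2017. As a minimal sketch of how the built-in method covers the typical left-padding use case without a 3rd-party package (formatInvoiceNumber is a hypothetical helper invented for illustration):

```javascript
// Hypothetical helper: pad a numeric identifier to a fixed width.
// Uses the built-in String.prototype.padStart (ES2017) rather than
// a third-party padding package.
function formatInvoiceNumber(id, width = 5) {
  return String(id).padStart(width, '0');
}

console.log(formatInvoiceNumber(42));     // "00042"
console.log(formatInvoiceNumber(123456)); // "123456" (already wider than 5, so unchanged)
console.log('abc'.padStart(6, '.'));      // "...abc"
```

The point is not that every small package should be replaced by hand-written code, only that it is worth checking whether the platform already provides the feature before taking on a dependency.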

What the examples demonstrate is that nothing really comes for free, even packages freely given to the community. Incorporating 3rd-party code will always come with a risk. At best, the foreign package might introduce a vulnerability to the application's architecture or longevity. At worst, it could expose the application to a vector for attack.

What are the problems

Interdependency

A common feature of many package discovery systems is their interdependency. Most packages utilise others, which are in turn built on others. "Dependencies all the way down", you could say. The consequence is that incorporating a 3rd-party package is seldom the end of the story. You are also taking on packages that are not listed as direct dependencies of your project. Indirect dependencies can be hijacked, corrupted or, as highlighted above, removed from circulation.
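
To make the "all the way down" point concrete, here is a minimal sketch that walks a dependency graph and separates direct from indirect dependencies. The graph, the package names and the collectDependencies function are all hypothetical; in a real project the data would come from a lockfile or from a tool such as `npm ls --all`.

```javascript
// Hypothetical, hand-written dependency graph: package name -> its direct dependencies.
const graph = {
  'my-app': ['web-framework', 'date-utils'],
  'web-framework': ['router', 'string-pad'],
  'router': ['string-pad', 'path-matcher'],
  'date-utils': [],
  'string-pad': [],
  'path-matcher': [],
};

// Collect every package reachable from `root`, split into direct and indirect.
function collectDependencies(graph, root) {
  const direct = new Set(graph[root] || []);
  const seen = new Set();
  const queue = [...direct];
  while (queue.length > 0) {
    const name = queue.shift();
    if (seen.has(name)) continue; // already visited; also guards against cycles
    seen.add(name);
    for (const dep of graph[name] || []) queue.push(dep);
  }
  const indirect = [...seen]
    .filter((name) => !direct.has(name) && name !== root)
    .sort();
  return { direct: [...direct].sort(), indirect };
}

const result = collectDependencies(graph, 'my-app');
console.log(result.direct);   // [ 'date-utils', 'web-framework' ]
console.log(result.indirect); // [ 'path-matcher', 'router', 'string-pad' ]
```

In this toy graph the application declares two dependencies but ships five packages; the three indirect ones were never explicitly chosen by the project.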

Publication hijacking

In the case of the NPM registry (I am sure there are other examples), anyone can publish a package and get it listed in searches. Worse still, a publisher's account can be hijacked unless the developer protects it, for example with two-factor authentication.

Unchecked adoption

Software Engineers are inherently lazy, and that is a good thing. But continually looking for a quicker/cheaper way to deliver features to tight deadlines means we can be too quick to adopt 3rd-party code. This can be hazardous if insufficient research is conducted, but research costs time and therefore money.

Growing dependency

Third-party packages make up an ever-increasing proportion of the application ecosystem, including software modules built into the source code, libraries and frameworks (dependencies), and tools and plugins (dev dependencies). As our reliance on such packages increases, so does our risk if it is not managed properly.

How can we protect ourselves

Before taking a package into a project, several options should first be sought and each one assessed by asking the following technical and legal/commercial questions.

12 Questions the technical leadership should ask

Does the package offer all the required functionality?
Does the original motivation for the package align with the project requirement?
Does the package meet the project's quality assurance needs (are there unit tests)?
Does the package align with and support the project's accessibility/internationalisation needs?
Does the package respect SemVer (Semantic versioning) and is it mature?
Is the project an original source or is it a fork/clone?
What is the type of custodian of the project repo?
- Sole author - beware
- Company backed - beware but less risk
- Community - best option
Should the project take packages (and updates) directly from the public registry (NPM) or should a private intermediary be employed? Options include Verdaccio, Nexus, etc.
Is the project documentation well maintained and informative, and is there a development roadmap?
Is there a Contributor Code of Conduct and is it appropriate?
Are there long-outstanding issues and Pull Requests, or is the project actively maintained?
Is the project community proactive and supportive?
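
The questions about SemVer and about taking updates directly from the registry can be partly supported by tooling. As a rough sketch only, the following classifies the version ranges declared in a package.json-style manifest, so a team can see which dependencies are pinned and which will float to newer releases. The manifest, the package names and both functions are hypothetical, and the pattern matching is deliberately simplified (pre-release tags, for instance, fall into "other").

```javascript
// Classify a single dependency range string (simplified; not a full SemVer parser).
function classifyRange(range) {
  if (/^\d+\.\d+\.\d+$/.test(range)) return 'exact';  // e.g. "2.1.0": only this version
  if (/^\^\d+\.\d+\.\d+$/.test(range)) return 'caret'; // e.g. "^4.18.2": >=4.18.2 <5.0.0
  if (/^~\d+\.\d+\.\d+$/.test(range)) return 'tilde';  // e.g. "~29.7.0": >=29.7.0 <29.8.0
  return 'other';                                      // "*", "latest", URLs, etc.
}

// Group every declared dependency by the kind of range it uses.
function summariseRanges(manifest) {
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  const summary = { exact: [], caret: [], tilde: [], other: [] };
  for (const [name, range] of Object.entries(all)) {
    summary[classifyRange(range)].push(name);
  }
  return summary;
}

// Hypothetical manifest with invented package names.
const manifest = {
  dependencies: { 'web-framework': '^4.18.2', 'date-utils': '2.1.0' },
  devDependencies: { 'test-runner': '~29.7.0', 'lint-tool': '*' },
};

console.log(summariseRanges(manifest));
```

Anything reported under "other" deserves a closer look, since a range such as "*" accepts whatever version the registry serves next.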

3 Questions the project leadership should ask

Is the type of licence the project employs clearly defined?
Are the terms and conditions of the licence compatible with the project, the company and the end-user/customer?
What obligations does the licence place on the project?
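
The first two of these questions lend themselves to a simple automated check. The sketch below compares each package's declared licence against an allow-list. It assumes the metadata mirrors the `name` and `license` fields of package.json; the allow-list, the package names and findLicenceIssues are hypothetical, and the real list is a decision for the project leadership rather than for the code.

```javascript
// Assumed allow-list of SPDX identifiers; purely illustrative.
const allowedLicences = new Set(['MIT', 'ISC', 'Apache-2.0', 'BSD-3-Clause']);

// Hypothetical package metadata, shaped like the `name` and `license`
// fields found in package.json.
const packages = [
  { name: 'web-framework', license: 'MIT' },
  { name: 'chart-widget', license: 'GPL-3.0-only' },
  { name: 'legacy-helper' }, // no licence declared
];

// Return every package whose licence is missing or not on the allow-list.
function findLicenceIssues(packages, allowed) {
  return packages
    .filter((pkg) => !allowed.has(pkg.license))
    .map((pkg) => ({ name: pkg.name, licence: pkg.license || 'UNDECLARED' }));
}

console.log(findLicenceIssues(packages, allowedLicences));
```

A check like this only flags candidates for review. Compound licence expressions would also be flagged here, and the third question, what obligations a licence imposes, still needs a person to read the terms.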

I am sure there are more questions. If you think of any please let me know via the comments section below.

#1 bleepingcomputer - Dev corrupts NPM libs colors and faker breaking thousands of apps
#2 revenera - The story behind colors and faker JS
#3 NPM JS - Details about the event-stream incident
#4 The Register - NPM leftpad Chaos