Dan
Posted on February 7, 2024
Apple’s newly open-sourced config language Pkl has been making waves on Hacker News, and the responses seem split between “This is a breath of fresh air” and “Why do we need another config language?”.
Kurtosis has forced me to think long and hard about configuration, so in this article I’ll provide a framework for evaluating configuration languages, explain why configuration languages face a difficult tradeoff, and make a prediction on Pkl’s trajectory.
What even is configuration?
This seems like a silly question, since we as an industry generally grok what config is: it’s the thing that controls how a piece of software behaves. It can be expressed in various ways — perhaps as an object in your favorite language, perhaps as flags or environment variables, perhaps as a YAML file — but in all cases config represents the top-level parameters of the piece of software.
However, config is interesting because it sits at the seam between the program being configured and a user (either a human or another program). Configuration is an API.
The constraints on configuration
Because it sits at the seams between two systems, configuration is subject to the following pressures:
Approachability pressure: configuration must be walk-up approachable in a way that the program being configured need not be. It’s fine to write your entire program in Brainfuck if your configuration is understandable and well-documented YAML. Minimalism, readability, and documentation are important here.
Integration pressure: configuration lives at the seams between two systems, and it’s very common for engineers of the outer system to automate configuring the inner pieces. Configuration that can’t easily be automated is problematic.
Maintainability pressure: the users of the configuration need to evolve it over time, especially if they have an automation system built on top. Configuration reuse (DRY) is an important tactic for preventing bugs and increasing maintainability, so higher-order features like conditionals, loops, and packaging systems become necessary somewhere.
NOTE: configuration is also sometimes subjected to validity pressure, under the idea that config shouldn’t even be usable unless it’s valid. More on this later.
Configuration languages today
Let’s take a brief tour of some popular configuration languages and see how they deal with these pressures:
YAML, the industry standard, is quite approachable, with its comments, whitespace-delineated blocks, and insistence on spelling out (and therefore duplicating) key names. However, it has very limited configuration reuse capability and lacks conditionals and loops, so its maintainability is low. Despite this, its ease of integration into a broader system is very high: every general-purpose language has YAML parsers and generators, and templating is relatively easy.
TOML takes the same “it’s just data” approach to configuration as YAML, and shares its high-approachability, easy-to-integrate, low-maintainability profile (with the asterisk that it’s less well-known, and therefore has a steeper learning curve).
Jsonnet is more maintainable than YAML because it provides code reuse features like variables, references, conditionals, and imports. However, its custom syntax on top of the already-verbose JSON syntax reduces its approachability, and the lack of Jsonnet generators reduces its ease-of-integration (which might explain its lack of industry dominance).
Dhall plays in the same “JSON with logic” space as Jsonnet, but adds a type system as well. This means that the language itself enforces configuration validity, which can help shorten feedback loops. Like Jsonnet, though, the extra logic with custom syntax comes at the cost of approachability and ease-of-integration.
Cue also focuses on code reuse features using a custom JSON-like syntax similar to Dhall’s and Jsonnet’s, but goes even further by baking data validation into the value system. This leads to very strong validation, but with the same approachability and integration problems as Dhall and Jsonnet. In particular, Cue’s lattice model, while powerful, is hard to grok quickly.
Starlark is Bazel’s configuration language, and is highly approachable as a minimal subset of Python. However, users are free to write quite complex logic, so its maintainability suffers; and generating Starlark is difficult, so its ease-of-integration is low.
Analysis
The projects above are sorted by descending star count on each project’s repo. Assuming star count is a rough proxy for popularity, this suggests some interesting conclusions:
Approachability and ease-of-integration tend to move in tandem
Approachability and maintainability tend to move opposite each other
The two most-approachable, least-maintainable languages are by far the most popular
In other words, people prefer configuration that’s difficult to maintain, so long as they can easily understand what it’s doing and integrate it with a higher-level system that flexes it as needed.
This makes sense to me, as configuration’s primary purpose is to hold parameters for the lower-level system. Secondary features like code reuse aren’t always worth their complexity weight, because code reuse can be done in a higher-level configuration automation system.
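To make “code reuse in a higher-level automation system” concrete, here is a minimal Python sketch: shared defaults and per-environment overrides live in ordinary code, and what comes out is flat, logic-free data. Every key, environment name, and hostname is a hypothetical placeholder. It emits JSON only because Python’s standard library has no YAML writer; since JSON is essentially a subset of YAML 1.2, a YAML parser can read the result.

```python
import json
from copy import deepcopy

# Hypothetical shared defaults: every key and value here is an illustrative
# assumption, not taken from any real system.
BASE = {
    "log_level": "info",
    "replicas": 1,
    "database": {"host": "localhost", "port": 5432},
}

# Hypothetical per-environment overrides: the only place environments differ.
OVERRIDES = {
    "dev": {"log_level": "debug"},
    "staging": {"replicas": 2, "database": {"host": "db.staging.internal"}},
    "prod": {"replicas": 5, "database": {"host": "db.prod.internal"}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` recursively layered on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render_all() -> dict:
    """Render one flat, logic-free JSON document per environment."""
    return {
        env: json.dumps(deep_merge(BASE, override), indent=2, sort_keys=True)
        for env, override in OVERRIDES.items()
    }


if __name__ == "__main__":
    # A real system would write these to files for the program-to-configure
    # to read; this sketch just prints them.
    for env, text in render_all().items():
        print(f"--- {env} ---")
        print(text)
```

The loops and merging live in a general-purpose language the team already knows, while the files the program actually reads stay as approachable as plain YAML or JSON.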
This is especially true for validation. In theory, it sounds nice to prevent invalid configurations from ever being instantiated, but in practice, doing validation in the configuration language often forces the author of the program-to-configure to duplicate it: once in the program, to validate the flags and environment variables it inevitably accepts, and once more in the configuration file schema.
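Here is a minimal Python sketch of the validation a program ends up owning regardless of its configuration language. Every name in it (`Settings`, `validate`, `APP_PORT`, `APP_LOG_LEVEL`) is hypothetical; the point is only that environment variables and parsed config-file data both funnel through one `validate` function, so a schema in the configuration language would restate these same rules a second time.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical settings for an illustrative program; the field names,
# defaults, and allowed values are assumptions made for this sketch.
VALID_LOG_LEVELS = {"debug", "info", "warning", "error"}


@dataclass(frozen=True)
class Settings:
    port: int
    log_level: str


def validate(raw: Mapping) -> Settings:
    """The program's single source of truth for what 'valid config' means."""
    try:
        port = int(raw.get("port", 8080))
    except (TypeError, ValueError):
        raise ValueError(f"port must be an integer, got {raw.get('port')!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    log_level = str(raw.get("log_level", "info")).lower()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"unknown log_level: {log_level!r}")
    return Settings(port=port, log_level=log_level)


def from_env(environ: Mapping = os.environ) -> Settings:
    """Read the hypothetical APP_PORT / APP_LOG_LEVEL variables, then validate."""
    raw = {}
    if "APP_PORT" in environ:
        raw["port"] = environ["APP_PORT"]
    if "APP_LOG_LEVEL" in environ:
        raw["log_level"] = environ["APP_LOG_LEVEL"]
    return validate(raw)


def from_file_data(data: Mapping) -> Settings:
    """Data already parsed from a config file (YAML, TOML, ...) takes the same path."""
    return validate(data)
```

For example, `from_env({"APP_PORT": "9000"})` and `from_file_data({"port": 9000})` produce the same `Settings`, while an out-of-range port raises `ValueError` from either path.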
There is, however, an environment where the calculus changes: a large company with a powerful internal platform and a standardized configuration language. There, a more powerful top-level configuration language is worth the approachability hit, because it lets engineers reuse pieces of a vast galaxy of existing configuration.
It makes sense to me that Jsonnet, Cue, and Starlark were all born of Google’s configuration needs, and that Pkl may have been born of Apple’s.
The Goldilocks problem, and Pkl
We’ve seen that configuration languages face a Goldilocks problem: with too little complexity, they struggle to keep code DRY and maintainable; with too much, they’re difficult to learn and to integrate outside of corporate environments. The balance has to be just right.
So where does Pkl fit in?
Approachability: Pkl uses a completely custom syntax, which is a big barrier to adoption, and the system itself seems decently complex. I’d rate the approachability as low.
Integration: This is a wildcard for me, as it’s not clear whether I can import third-party schemas, generate bindings in my favorite general-purpose language, and use them to build a configuration automation system on top.
Maintainability: Pkl’s embrace of schemas reflects configuration’s purpose as an API, and much of Pkl’s feature set seems geared towards code reuse and typing. I imagine that Pkl code will be quite maintainable.
Validation: Pkl seems to prioritize validation highly; I’d anticipate Pkl config to be very safe.
This analysis suggests to me that Pkl is trying to fill a spot somewhat similar to Cue: a very safe, highly-reusable language that’s intended to be your top-level configuration language at large scale, but whose approachability and ease-of-integration can be constraining for smaller projects and companies.
I therefore hypothesize that, on its current trajectory, Pkl will see heavy adoption at Apple but remain niche outside it. If the Pkl team is aiming for broader open-source adoption, I’d recommend smoothing the approachability and integration paths with the following tactical suggestions:
The ability for YAML and TOML to be parsed as basic Pkl files (letting me write my config in a language I’m familiar with while still getting Pkl schema validation)
Smaller example repos with clearer identification of the Pkl-specific parts so I can see the full cost of buying in
The ability to generate Pkl from my favorite general-purpose programming language, so I can build a configuration automation platform on top
Perhaps toolchain features that push back against complexity, nudging authors toward simpler (and therefore more approachable) Pkl files
Afterword
Configuration languages fill an important role but face a difficult game of tradeoffs, and Pkl is no exception. The proliferation of tools attempting to be the top-of-stack configuration language reminds me of the famous XKCD comic about standards, and I’m becoming bearish on any non-YAML language as the open-source standard.
In the next article I’ll detail how we wrestled with these same dilemmas at Kurtosis, and the conclusion we came to. Until then, if you’re a microservice developer who doesn’t want to deal with infra, I encourage you to check us out on GitHub; you might find Kurtosis useful.