Scaling Sidecars to Zero in Kubernetes
Matt Butcher
Posted on June 12, 2024
The sidecar pattern in Kubernetes describes a single pod in which a helper container (the sidecar) is deployed alongside a main app container. This pattern allows each container to focus on a single aspect of the overall functionality, improving the maintainability and scalability of apps deployed in Kubernetes environments. From gathering metrics to connecting to data sources (a la Dapr), sidecars have found a notable place in the cloud-native developer’s toolbox. Sidecars are designed to run alongside your apps continuously and do not scale down to zero. Wouldn't it be great if they did? In this article, we introduce scaling sidecars to zero in Kubernetes.
Zero Cost Sidecars in Kubernetes
WebAssembly (Wasm) and containers will peacefully co-exist and be complementary technologies. While containers offer an efficient way to package entire apps with dependencies, Wasm provides a lightweight, secure, and fast-executing environment that can help scale apps, making serverless Wasm workloads the ideal partner for long-running container apps. In fact, Solomon Hykes (founder of Docker) said this five years ago
Before we explore one such example of Wasm and containers, a word about efficiency.
Maximizing Efficiency With the Sidecar Pattern
A common criticism of the sidecar pattern is its inefficiency. The underlying problem with sidecars is that the sidecar containers must remain operational throughout the lifespan of the main app, leading to potential resource wastage. Consider an app with three sidecars (so four total containers). A typical deployment sets the replica count to 3. So deploying a single app results in 12 long-running containers — three replicas of each of the 4 containers. All 12 of those processes consume CPU and memory all the time.
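The arithmetic above can be written down as a small sketch. The function names are hypothetical and exist only for illustration; the second function assumes the sidecars are request-driven (as described later in this article) and ignores the small per-node overhead of whatever schedules them.

```rust
/// Long-running containers created by one deployment:
/// (1 main container + `sidecars`) in each pod, times `replicas` pods.
fn long_running_containers(sidecars: u32, replicas: u32) -> u32 {
    (1 + sidecars) * replicas
}

/// The same deployment when only the main container stays resident in each
/// pod and the sidecars run only while handling a request (an assumption
/// of this sketch).
fn resident_with_scale_to_zero_sidecars(replicas: u32) -> u32 {
    replicas
}

fn main() {
    // Three sidecars, three replicas: the example from the text.
    println!("always-on containers: {}", long_running_containers(3, 3));
    println!(
        "resident with scale-to-zero sidecars: {}",
        resident_with_scale_to_zero_sidecars(3)
    );
}
```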
With SpinKube, there’s a cool way to enjoy all of the benefits of a sidecar without the resource consumption.
Wasm apps written using Spin follow a design pattern called serverless functions, in which an instance of the app is started when a request comes in. The app handles the request and then shuts down. If four requests come in simultaneously, then four copies of the app are started. When zero requests come in, no copies of the app are running. They are, in this sense, “zero cost”.
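To make the lifecycle concrete, here is a toy model of "one instance per request, zero instances when idle". It is not how Spin is implemented; `Host` and `Instance` are hypothetical names, and the counter merely stands in for the resources a real instance would hold.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Toy host that tracks how many handler instances are alive right now.
#[derive(Default)]
struct Host {
    live: AtomicUsize,
}

/// One running instance. Dropping it models the instance shutting down.
struct Instance<'a> {
    host: &'a Host,
}

impl Host {
    /// Start a fresh instance for an incoming request.
    fn start_instance(&self) -> Instance<'_> {
        self.live.fetch_add(1, Ordering::SeqCst);
        Instance { host: self }
    }

    /// Handle a request: start an instance, do the work, shut down.
    fn handle(&self, request: &str) -> String {
        let _instance = self.start_instance();
        format!("handled {request}")
        // `_instance` is dropped here, so the live count goes back down.
    }

    fn live(&self) -> usize {
        self.live.load(Ordering::SeqCst)
    }
}

impl Drop for Instance<'_> {
    fn drop(&mut self) {
        self.host.live.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let host = Host::default();
    // Four simultaneous requests mean four live instances.
    let burst: Vec<Instance<'_>> = (0..4).map(|_| host.start_instance()).collect();
    println!("during burst: {}", host.live());
    drop(burst);
    // No requests in flight means nothing is running.
    println!("after burst: {}", host.live());
}
```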
Spin apps, based on their Wasm underpinnings, are also lightweight. They cold-start in under a millisecond (as opposed to the dozens of seconds it takes a container to start up). They consume fewer resources at runtime, using around 2-5 MB of memory and minuscule amounts of CPU. And because they only run while processing a request, most live no more than a few hundred milliseconds.
Spin apps are a great candidate technology for implementing sidecars. And SpinKube makes it possible.
Spin Apps as Sidecars
First, it helps to look at how an app and its sidecars are connected. We’ll take a trivial scenario from Dapr. In that ecosystem, a main process uses HTTP or gRPC to communicate with its sidecars. You can almost think about it as the microservice architecture applied to a Kubernetes pod. Say we have an example with one app querying a sidecar service for an HTTP response. In this scenario, the main app periodically needs to perform an HTTP request to the other service. Both are long-running servers, but the main app creates an HTTP client that talks to the sidecar’s HTTP server. That sidecar HTTP server does its internal routing and executes the correct business logic to handle the request.
With a Spin app, there is no reason for the sidecar to need a long-running process. After all, it only needs to handle requests from one other client: the main app. This is a perfect candidate for a Spin app.
When this app is deployed, the main app (in a container) is run in the container runtime, and it executes as a server, always on, always listening.
The Spin app sidecar is deployed in the same pod as the container app, but it is scheduled onto a Spin runtime instead of a container runtime.
The Spin app is deployed, but it is not (properly speaking) running.
When a new request comes into the main app, its HTTP server receives and handles the request. That main app contacts the sidecar at some point over a local HTTP request. When the request to open the network connection happens, the containerd Spin shim (the thing in containerd that handles Spin app invocation) starts a new instance of the Spin app to handle the request object it received from the main app. The new instance of the Spin app then runs to completion, returns a response object, and shuts down.
The important thing to note here is that the Spin app only runs when handling the request. After that, all the resources it uses, including CPU and memory, are freed up again.
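From the main app's point of view, the flow described above is just a local HTTP call. The following is a minimal sketch of that client side using only the Rust standard library. The address `127.0.0.1:3000` and the path `/` are assumptions for illustration (the article does not say where the sidecar listens), and the helper names are hypothetical. A real app would more likely use an HTTP client library.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.0 GET request. HTTP/1.0 keeps the sketch simple:
/// the server closes the connection after responding and does not use
/// chunked transfer encoding.
fn build_get_request(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n")
}

/// Return everything after the blank line that ends the response headers.
fn body_of(raw_response: &str) -> Option<&str> {
    raw_response.split_once("\r\n\r\n").map(|(_, body)| body)
}

/// Ask the sidecar at `addr` for `path` and return the response body.
fn ask_sidecar(addr: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_get_request(addr, path).as_bytes())?;
    let mut raw = String::new();
    // Read until the server closes the connection.
    stream.read_to_string(&mut raw)?;
    Ok(body_of(&raw).unwrap_or_default().to_string())
}

fn main() {
    // Hypothetical sidecar address; adjust to wherever the sidecar listens.
    match ask_sidecar("127.0.0.1:3000", "/") {
        Ok(body) => println!("sidecar says: {body}"),
        Err(err) => eprintln!("sidecar not reachable: {err}"),
    }
}
```

The main app never needs to know whether the sidecar was already running: it opens a connection, sends a request, and reads a response.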
Running 4 or 12 of these sidecars per pod can be done efficiently. In fact, it’s preferable to run all of those sidecars in the same Spin instance, meaning they share their resource allocations even more efficiently. In theory, one could run over 1,000 sidecars per main app, but it’s unlikely that there’s a practical use case where this is the best design.
Creating Our Spin App Sidecar
We begin by using a Spin template, in this case the [Rust HTTP template](https://github.com/fermyon/spin-rust-sdk), to get us started:
cd $HOME
spin new -t http-rust --accept-defaults spin-app-sidecar
cd spin-app-sidecar
We then add some business logic to the sidecar. In this case, telling the main-container-server app what the current time is:
use chrono::Local;
use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::http_component;

/// A simple Spin HTTP component that returns the current time as plain text.
#[http_component]
fn handle_spin_app_sidecar(_req: Request) -> anyhow::Result<impl IntoResponse> {
    Ok(Response::builder()
        .status(200)
        .header("content-type", "text/plain")
        .body(Local::now().format("%Y-%m-%d %H:%M:%S").to_string())
        .build())
}
As you can see above, we are using the chrono library to obtain and help with formatting the time. To resolve dependencies, run the following command:
cargo add chrono
Our spin-app-sidecar app is now ready to build and push.
We will now use a GitHub Personal Access Token (via the GH_PAT and GH_USER variables in our CLI) to push the app to a registry. First, we generate a GitHub Personal Access Token with the write:packages scope in the GitHub user interface. We then set the GH_PAT and GH_USER variables in our CLI, log in to the registry, and push the app:
# Store PAT and GitHub username as environment variables
export GH_PAT=YOUR_TOKEN
export GH_USER=YOUR_GITHUB_USERNAME
# Authenticate spin CLI with GHCR
echo $GH_PAT | spin registry login ghcr.io -u $GH_USER --password-stdin
# Build the Spin app sidecar and push it to the registry
spin registry push --build ghcr.io/$GH_USER/spin-app-sidecar:1.0.1
We can now use spin kube scaffold to generate a .yaml file based on the pushed app:
spin kube scaffold --from ghcr.io/$GH_USER/spin-app-sidecar:1.0.1 \
--out spin-app-sidecar.yaml
The above command will create a spin-app-sidecar.yaml file with the following contents (note, we have replaced the static username with the $GH_USER variable here for your convenience):
apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: spin-app-sidecar
spec:
  image: "ghcr.io/$GH_USER/spin-app-sidecar:1.0.1"
  executor: containerd-shim-spin
  replicas: 2
Deploy the app using the .yaml file:
kubectl apply -f spin-app-sidecar.yaml
Scheduler Overhead
So far we’ve seen why Spin apps make excellent sidecars. We’ve stayed at a fairly high level. But we should be aware of what happens under the hood. While idle Spin apps themselves take no CPU or memory, containerd has to do a little more work, and it does this using a low-level Spin shim.
The Spin shim listens for inbound requests for a Spin app and then starts the relevant serverless function to handle the request. Of course, this requires a small amount of memory deep in the Kubernetes stack, but it is still lighter than the work containerd must do to start a container.
The situation is different in Fermyon Platform for Kubernetes, in which Wasm is not scheduled through containerd, and one process per node handles thousands upon thousands of Wasm apps.
But again, even up to a thousand Spin sidecars can be scheduled using fewer resources than one container-based sidecar.
Running More Apps in Your Cluster
Thanks to SpinKube’s ability to run Spin apps side-by-side with containers, Spin apps can be used in exciting ways. Here, we’ve taken a fresh look at the sidecar pattern that is popular with service meshes and distributed API services like Dapr. And what we’ve seen is that Spin apps make an excellent alternative to older containerized sidecars. Spin apps are faster and more efficient, meaning you can run more apps in your cluster. This translates not only to increased density but also to smaller clusters and saved money.
Local Service Chaining
Spin's local service chaining functionality allows developers to write applications as a network of chained microservices, whereby a "self-request" (an internal component request) can be passed in memory without ever leaving the Spin host process. Although it may limit how deployments can be arranged, local service chaining is highly efficient, depending on the nature of the microservices. It's important to highlight this as a viable strategy for enhancing the integration of helper workloads alongside long-running apps.
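As a rough illustration of what a "self-request" looks like from the caller's side: to the best of our reading of the Spin documentation, a chained component is addressed by a URL of the form `http://<component-id>.spin.internal`, and the calling component must allow that host in its `allowed_outbound_hosts`. Treat both details as assumptions to check against the Spin docs for your version. The helper below and the component id `time-sidecar` are hypothetical.

```rust
/// Build the URL for a self-request to another component in the same Spin app.
/// Assumes the `<component-id>.spin.internal` addressing convention.
fn chained_url(component_id: &str, path: &str) -> String {
    format!("http://{component_id}.spin.internal{path}")
}

fn main() {
    // `time-sidecar` is a made-up component id used only for this example.
    println!("{}", chained_url("time-sidecar", "/now"));
}
```

The calling component would pass this URL to its usual outbound HTTP call, and the request would be routed to the other component inside the same Spin host process instead of going over the network.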
Conclusion 🎉
These new approaches to orchestration can minimize CPU and memory usage, ultimately allowing for higher app density per cluster and significant cost savings. In addition, more efficient operations and reduced startup times equate to faster machine-to-machine communication and improved end-user experience.
You can get started building a Spin app over at the QuickStart guide or learn more about the other things you can do with Spin and Kubernetes over at the SpinKube site. Drop a comment if you have suggestions about patterns around sidecars in Kubernetes!