Building a Cost-Effective Valheim Server on Azure with Serverless Discord Bot Integration
Rodolfo Albuquerque
Posted on November 23, 2024
In this blog post, I'll walk you through how I built a cost-effective Valheim game server on Azure, complete with a Discord bot that lets players start and stop the server using slash commands. The setup leverages Azure's serverless capabilities and spot instances to minimize costs while providing flexibility and scalability.
Intro
This GitHub repository contains the infrastructure code and Discord bot code for the Valheim game server. The primary goal is to create a server as cost-efficiently as possible by utilizing:
- Azure Virtual Machine Scale Sets (VMSS) with spot instances for compute.
- Azure File Share for persistent game server data.
- Azure Functions for event-driven automation through Discord slash commands.
Azure Functions were chosen for their cost-effectiveness, offering 1 million free executions per month. However, this choice introduces some complexities, which I'll discuss later.
Architecture
Interactions and Reactions
The system's interaction flow starts with Discord slash commands, which are handled by an HTTP-triggered Azure Function (`interactions`). Discord requires a response within 3 seconds, so the function's only responsibility is to enqueue the command in an events queue and respond quickly.
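The enqueue-then-acknowledge pattern can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `Enqueue`, `respond`, and `interactionsHandler` are hypothetical names, the queue client is abstracted behind a function, and the Ed25519 signature check that Discord requires on every interaction request is left out for brevity.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// Enqueue is a hypothetical hook that pushes the raw interaction payload
// onto the events queue; a real bot would call its queue client here.
type Enqueue func(payload []byte) error

// respond returns the HTTP status and JSON body to send back immediately.
// It does no slow work itself, so it stays well inside the 3-second window.
func respond(body []byte, enqueue Enqueue) (int, []byte) {
	var interaction struct {
		Type int `json:"type"`
	}
	if err := json.Unmarshal(body, &interaction); err != nil {
		return http.StatusBadRequest, []byte(`{"error":"invalid payload"}`)
	}
	// Type 1 is Discord's PING; it is answered with a PONG (type 1) and never queued.
	if interaction.Type == 1 {
		return http.StatusOK, []byte(`{"type":1}`)
	}
	if err := enqueue(body); err != nil {
		return http.StatusInternalServerError, []byte(`{"error":"enqueue failed"}`)
	}
	// Type 5 is a deferred response: Discord shows a loading state
	// until a later request fills in the real message.
	return http.StatusOK, []byte(`{"type":5}`)
}

// interactionsHandler adapts respond to net/http.
func interactionsHandler(enqueue Enqueue) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		status, out := respond(body, enqueue)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		w.Write(out)
	}
}
```

Keeping the decision logic in `respond` separate from the HTTP plumbing makes it easy to exercise without a running server; the handler can be registered with `http.Handle` or wrapped by whatever adapter the Function App uses.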
A second, queue-triggered Azure Function (`reactions`) picks up the command, performs the requested task (e.g., starting the server), and reports back to Discord.
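One way to report back, assuming the first function answered with a deferred response, is to edit that original response through Discord's interaction webhook endpoint. The sketch below only builds the request; `editOriginalRequest` is a hypothetical helper, and the application ID and interaction token are assumed to travel with the queued payload.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// editOriginalRequest builds the PATCH request that replaces the deferred
// placeholder with the final message. The interaction token in the URL
// authorizes the call, so no bot token header is set here.
func editOriginalRequest(appID, token, content string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"content": content})
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf(
		"https://discord.com/api/v10/webhooks/%s/%s/messages/@original",
		appID, token,
	)
	req, err := http.NewRequest(http.MethodPatch, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

The returned request can be sent with `http.DefaultClient.Do(req)` once the task has finished, for example with a message such as "Server is starting".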
Here’s the sequence diagram for the start command:
Game Events
To enhance the experience, the solution monitors Valheim game server logs and reports events such as:
- Server availability for connections
- Player connections
- Player disconnections
This is achieved with a script, configured through `cloud-init.yml`, that runs on the VM. The script follows the container logs, extracts relevant log lines, and enqueues them in the events queue.
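The filtering step can be illustrated with a small classifier. This is a sketch in Go rather than the script from the repository, and both the log substrings and the event names are assumptions for illustration; the actual markers should be checked against the server's real output.

```go
package main

import "strings"

// logPatterns maps a substring of a log line to an event name.
// ASSUMPTION: these substrings and event names are illustrative only and
// must be verified against the game server's actual log output.
var logPatterns = []struct{ marker, event string }{
	{"Game server connected", "server_ready"},
	{"Got connection SteamID", "player_connected"},
	{"Closing socket", "player_disconnected"},
}

// classify returns the event name for a log line, or false if the line
// is not one of the events worth enqueuing.
func classify(line string) (string, bool) {
	for _, p := range logPatterns {
		if strings.Contains(line, p.marker) {
			return p.event, true
		}
	}
	return "", false
}
```

Only lines for which `classify` returns true would be enqueued, which keeps the queue free of ordinary log noise.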
Here's how the flow looks:
Persisting State
To maintain the server state, I chose Azure Table Storage for its simplicity and cost-efficiency. The following attributes are persisted:
- `ip` (server IP address)
- `online_players` (number of players currently online)
- `status` (e.g., running, stopped)
Because Azure Functions can execute in parallel, I implemented optimistic concurrency control using ETags. If multiple events are processed simultaneously, only the first write succeeds; the others fail and are retried. Retries come built in through the queue's message dequeue count (5 attempts max), and a visibility timeout gives the state time to reconcile between attempts.
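The idea behind ETag-based optimistic concurrency can be shown with an in-memory stand-in for the table. Everything here is hypothetical (`memTable`, `applyEvent`, the integer-based ETag), and the in-process retry loop stands in for the queue redelivery that the real setup relies on; it is not the Azure SDK.

```go
package main

import (
	"errors"
	"strconv"
	"sync"
)

// ServerState mirrors the three persisted attributes.
type ServerState struct {
	IP            string
	OnlinePlayers int
	Status        string
}

// ErrETagMismatch signals that the entity changed since it was read.
var ErrETagMismatch = errors.New("etag mismatch")

// memTable is an in-memory stand-in for a single table entity.
type memTable struct {
	mu      sync.Mutex
	state   ServerState
	version int
}

// Get returns the current state together with its ETag.
func (t *memTable) Get() (ServerState, string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.state, strconv.Itoa(t.version)
}

// Update writes the state only if the caller's ETag is still current.
func (t *memTable) Update(s ServerState, etag string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if etag != strconv.Itoa(t.version) {
		return ErrETagMismatch
	}
	t.state = s
	t.version++
	return nil
}

// applyEvent reads, mutates, and writes back, retrying on a stale ETag.
func applyEvent(t *memTable, maxAttempts int, mutate func(*ServerState)) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		s, etag := t.Get()
		mutate(&s)
		err := t.Update(s, etag)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrETagMismatch) {
			return err
		}
	}
	return ErrETagMismatch
}
```

Because every retry re-reads the state before mutating it, two concurrent "player connected" events each increment the counter once instead of overwriting each other.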
Azure Function OS and Language Choice
One of the biggest challenges was ensuring the bot responded within Discord's 3-second timeout, even during cold starts.
- Initial Setup: I started with a Python bot on a Linux Azure Function. However, cold starts frequently caused timeouts.
- Switch to Go: I migrated to Go, known for its faster performance. Surprisingly, deploying the Go bot on a Windows Function App yielded significantly better cold start times compared to Linux.
To quantify this, I tested the following setups using a Postman monitor:
- Python on Linux
- Go on Linux
- Go on Windows
Here are the results:
Setup | Average (s) | P95 (s) | P99 (s) |
---|---|---|---|
Python (Linux) | 10.43 | 12.02 | 13.35 |
Go (Linux) | 5.84 | 7.77 | 7.64 |
Go (Windows) | 1.07 | 1.70 | 1.85 |
This trial revealed that Go on Windows offered the best performance for this use case.
Possible Improvements
While the setup works well, there’s room for improvement:
Spot Instance Risks: Spot instances can be preempted, risking progress loss if the game server isn’t stopped gracefully. A solution involves monitoring Azure's scheduled events endpoint. Upon detecting a preemption event, the VM can:
- Stop the Valheim server (`docker stop valheim-server`) to send a `SIGTERM`, triggering a world save.
- Restart the server in a different zone or instance size.
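A rough sketch of the detection step, assuming a small Go watcher runs on the VM alongside the container. The function names are hypothetical; the endpoint is the instance metadata service's Scheduled Events API, which expects a `Metadata: true` header and reports spot evictions with the event type `Preempt`.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
	"os/exec"
)

// The metadata endpoint is only reachable from inside the VM.
const scheduledEventsURL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

// scheduledEvents mirrors the subset of the response used here.
type scheduledEvents struct {
	Events []struct {
		EventType string `json:"EventType"`
	} `json:"Events"`
}

// preemptPending reports whether the document announces a spot eviction.
func preemptPending(doc []byte) (bool, error) {
	var se scheduledEvents
	if err := json.Unmarshal(doc, &se); err != nil {
		return false, err
	}
	for _, e := range se.Events {
		if e.EventType == "Preempt" {
			return true, nil
		}
	}
	return false, nil
}

// checkAndStop polls the endpoint once and stops the container if an
// eviction is scheduled. It would be called in a loop every few seconds.
func checkAndStop() error {
	req, err := http.NewRequest(http.MethodGet, scheduledEventsURL, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Metadata", "true")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	doc, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	pending, err := preemptPending(doc)
	if err != nil || !pending {
		return err
	}
	return exec.Command("docker", "stop", "valheim-server").Run()
}
```

Eviction notices arrive with little lead time, so the polling interval and the `docker stop` grace period (10 seconds by default, adjustable with `--time`) would both need tuning to give the world save a chance to finish.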
Additional Features:
- Automating backups of the game world.
- Adding more granular state persistence, such as player-specific data.