MISP at scale on Kubernetes
Philippe Bürgisser
Posted on November 17, 2022
Introduction
The MISP platform is an OpenSource projet to collect, store, distribute and share Indicators of Compromise (IoC). The project is mostly written in PHP by about +200 contributors and currently has more than 4000 stars on GitHub.
The MISP’s core is composed of:
- Frontend: PHP based application offering both a UI and an API
- Background jobs: Waits for signals to download IOCs from feeds, index IOCs, etc
- Cron jobs: Triggering recurrent tasks such as IOCs downloads and update of internal information
It also relies on external resources like a SQL database, a Redis and an HTTP frontend proxying requests to PHP.
At KPMG-Egyde, we operate our own MISP platform and enrich its content based on our findings which we share with some of our customers. We wanted to provide a scalable, highly available and performant platform; this is why we decided to move the MISP from a traditional virtualized infrastructure to Kubernetes.
As per writing this article, there is no official guide on how to deploy and operate the MISP on Kubernetes. Containers and few Kubernetes manifests are available on GitHub. Despite the great efforts made to deploy the MISP on containers, the solutions isn’t made to scale -yet-.
State of the art
Traditional deployments
The MISP can be deployed on traditional Linux servers running either on-premises or cloud providers. The installation guide provides scripts to deploy the solution. External services such as the database are also part of the installation script.
Cloud images
The project misp-cloud is providing ready to use AWS AMI containing the MISP platform as well as all other external component on the same image. They may provide images for Azure and DigitalOcean in the future.
Containers
MISP-Docker
The official MISP project is providing a containerized version of the MISP where all elements except the SQL database are included in a single container.
Coolacid’s MISP-Docker
The project MISP-Docker from Coolacid is providing a containerized version of the MISP solution. This all-in-one solution includes the frontend, background jobs, cronjobs and an HTTP Server (Nginx) all orchestrated by process manager tool called supervisor. External services such as the database and Redis aren’t part of the container but are necessary. We decided that this project is very a good starting point to scale the MISP on Kubernetes.
Scaling the MISP
v1. Running on an EC2 instance
In our first version of the platform, we deployed the MISP on a traditional hardened EC2 instance where the MySQL database and redis were running within the instance. Running everything in a VM was causing scaling issues and requires the database to be either outside the solution or synced between the instances. Also, running an EC2 means petting it by applying latest patches, etc.
v2. Running on EKS
In order to simplify the operations ,but also for the sake of curiosity, we decided to migrate the traditional EC2 instance based MISP platform to Kubernetes, but more specifically the AWS service EKS. To fix the problems with scaling and synchronizing, we decided to use AWS managed services like RDS (MySQL) and ElastiCache (Redis).
After few months of operating the MISP on Kubernetes, we struggled with scale the deployment due to the design of the container:
- Entrypoint: The startup script copies files, updates rights and permissions at each start up
- PHP Sessions: by default, PHP is configured to store sessions locally causing the sessions to be lost when landing other replicas.
- Background jobs: Each replica are running the background jobs causing the jobs not being found by the UI and appear to be down.
- Cron jobs: Jobs are scheduled at the same time on every replica causing all pods executing the same task at the same time and loading external components.
- Lack of monitoring.
v3. Running on EKS
When it comes to use containers and Kubernetes, it also comes with new concepts such as micro-services and single process container. We realized that the MISP is actually two programs written in a single base code (frontend/ui and background jobs) sharing the same configuration. In our quest to scale, we decided to split the containers into smaller pieces. As mentioned above, this version of our implantation is based on the Coolacid’s amazing work which we forked.
Frontend
In this version, we kept the implementation of Nginx and PHP-FPM from Coolacid’s container and removed the workers and cron jobs from it. Nginx and PHP-FPM are started by Supervisor and the configuration of the application is mounted as static files (ConfigMap, Secret) to the container instead of dynamically set.
Background Jobs aka workers
Up to version 2.4.150
Up to version 2.4.150, the MISP background jobs (workers) were managed by CakeResque library. When starting a worker, the MISP is writing its PID into Redis. Based on that PID, the frontend shows worker’s health by checking if the /proc/{PID} exists. When running multiple replicas, the last started pod writes its PIDs to Redis and may not be the same as the others. Refreshing the frontend on the browser may show an unavailable worker due to inconsistent PID numbers (for example, latest PID of process prio registered is 33 and the container serving the user’s request know process prio by its PID 55).
Our first idea was to create two different Kubernetes deployments: a frontend with no workers that could scale and a single replica frontend which is also running the workers.
Using the Kubernetes' ingress object, we can split the traffic based on the path given in the URL: /servers/serverSettings/workers forwards the traffic to the single-replica of the frontend pod running the workers and all other requests are forwarded to the highly available frontend.
Background jobs are managed by the platform administrators, it’s acceptable that the UI isn’t available for a short period of time whereas the rest of the platform must remain accessible by our end-users and third party applications.
From version 2.4.151
As stated in the MISP documentation, from version 2.4.151, the library CakeResque became deprecated and is replaced by the SimpleBackgroundJobs feature which rely on the Supervisor.
Supervisor is configured to expose an API over HTTP on port 9001. Hence, the workers can be separated from the frontend which can be easily scaled.
Wait, what about the single process container and micro service mentioned earlier ? As per writing this article, the MISP configuration allows to configure a single supervisor endpoint to be polled and not an endpoint per worker.
Yes but … the frontend/ui is still trying to check the health of each process by checking in /proc/{PID} like in previous and shows that the process maybe start but it couldn’t check if it’s alive or not. An issue was created and we’re waiting for the patch to be integrated in a future version.
CronJobs
The MISP relies on few cron jobs used to trigger tasks like downloading new IOCs from external sources or content indexing. When running multiple replicas of the same pod, the same jobs are all scheduled at the same time. For example, the CacheFeed task happens every day at 2:20 am on every replicas degrading the performance of the whole application when running.
Thanks to the Kubernetes CronJob object, those jobs are running only once following the same schedule and independently from the number of replicas. All CronJobs are using the same image but the command parameter is specific for every jobs.
apiVersion: batch/v1
kind: CronJob
metadata:
name: cache-feeds
...
spec:
...
jobTemplate:
...
spec:
...
spec:
containers:
- command:
- sudo
- -u
- www-data
- /var/www/MISP/app/Console/cake
- Server
- cacheFeed
- "2"
- all
...
Other improvements across all components
- Entrypoint: All files operations are done at the dockerfile level speeding up the startup phase instead of the entrypoint script.
- MISP configuration: Configuration of the software is done dynamically done with Cakephp CLI tool and using sed commands based on environment variables. We replaced this process by mounting static configuration files as Kubernetes object ConfigMap
- PHP Sessions: PHP was configured to store sessions into Redis as it’s already used for the core of the MISP
- Helm chart: All the Kubernetes manifests were packaged into a helm chart
- Blue/Green deployment: Shorten the downtime during update of the underlying infrastructure or application
Next steps
We’re already working on a v4 on which we’ll focus on the security aspects as well as matching with the Kubernetes and Docker best practices.
Conclusion
We read quite often from Kubernetes providers that moving monoliths to containers is easy, but they tend to omit that scaling isn’t an easy task. Also, monitoring, backups and all other known challenges on traditional VMs still exist with containers and can be even more complex at scale.
Through the whole project, we learned a lot about how the MISP works by testing numerous parameters but also by reading a lot of code to better understand what and how to scale.
Links
Posted on November 17, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.