Running Cron Jobs in Container Environments
DevGraph
Posted on March 17, 2021
For decades, engineers have been using the system time-based job scheduler cron to manage processes that run on a periodic basis. As applications move to container-based infrastructures, teams are faced with the decision of how to implement scheduled jobs in a Kubernetes environment. Because containers provide a virtualization of the underlying operating system, the use of system services requires additional consideration. Fortunately, there are some very straightforward solutions for this common scenario.
What is Cron?
Cron automates running shell processes using definitions supplied in a crontab, or cron table. The format for this is shown below. You can use a single crontab, or multiple files typically located in the /etc/cron.d directory.
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │ 7 is also Sunday on some systems)
# * * * * * <command to execute>
This file defines the jobs to be run and their schedule. An asterisk matches every value of that unit of time, so in the example above, the command would run every minute of every hour of every day. Specifying numbers in these slots restricts the job to those specific values of the time unit; for example, "30 2 * * *" runs the command at 2:30 a.m. every day.
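To make the field semantics concrete, here is a minimal Python sketch of how a five-field schedule can be matched against a point in time. It is an illustration, not the code of any particular cron implementation: it supports "*", single values, ranges, steps and comma lists, but not names such as MON or JAN, or shortcuts like @daily. The function names parse_field and cron_matches are hypothetical. The day-of-month/day-of-week handling mirrors the traditional Vixie cron rule, where a job whose two day fields are both restricted runs when either one matches.

```python
from __future__ import annotations

from datetime import datetime

# Allowed (min, max) values for the five crontab fields, in order:
# minute, hour, day of month, month, day of week (0 and 7 both mean Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one field such as "*", "5", "1-5", "*/15" or "1,15" into a set of values."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
            if part != "*" and "-" not in part:
                raise ValueError(f"a step needs '*' or a range: {field!r}")
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start < lo or end > hi or start > end or step < 1:
            raise ValueError(f"value out of range in {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field schedule `expr` fires at the minute `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    weekdays = {d % 7 for d in weekdays}  # fold 7 onto 0 (Sunday)
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    if when.minute not in minutes or when.hour not in hours or when.month not in months:
        return False
    # Traditional cron rule: if BOTH day fields are restricted (not starting with "*"),
    # the job runs when EITHER one matches; otherwise both must match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        return when.day in days or cron_weekday in weekdays
    return when.day in days and cron_weekday in weekdays


if __name__ == "__main__":
    print(cron_matches("* * * * *", datetime(2021, 3, 17, 9, 30)))     # True
    print(cron_matches("30 9 * * 1-5", datetime(2021, 3, 17, 9, 30)))  # True (a Wednesday)
    print(cron_matches("0 9 * * *", datetime(2021, 3, 17, 9, 30)))     # False
```

A real daemon evaluates a check like this once per minute for every entry; the sketch only answers "does this entry fire now?".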
We will cover three options in this article to implement cron in a container-based environment.
- Install and configure cron in your Docker image
- Use the Kubernetes CronJob mechanism
- Use a software process implementation such as Cronenberg
Run Cron Using Docker
The first approach uses Docker to set up cron on the container image. When looking at the individual container, this will look and run almost the same as it would on a bare metal server or virtual machine. However, there are a few important differences to consider.
First, let’s examine how to set this up. The Dockerfile puts the crontab file in place and starts the cron service. As an example, consider the following crontab file, located in the root directory of your project. Because it will be installed in /etc/cron.d, each entry includes a user field (root here) between the schedule and the command.
* * * * * root echo "Hello world" >> /var/log/cron.log 2>&1
Note that cron requires a newline character at the end of each entry, so make sure the file ends with a trailing newline.
The following commands are used in the Dockerfile to configure and run the cron service.
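# Example base image (an assumption): any Debian or Ubuntu image that provides apt-get
FROM debian:bullseye-slim
# Install the cron daemon and clean up the package lists
RUN apt-get update && apt-get -y install cron && rm -rf /var/lib/apt/lists/*
# cron reads /etc/cron.d directly; the file must be owned by root and not group/world-writable.
# Do not also run `crontab` on it: in a user crontab, the "root" user field would be treated as the command.
COPY crontab /etc/cron.d/my-cron-file
RUN chmod 0644 /etc/cron.d/my-cron-file
# Create the log file that the job appends to
RUN touch /var/log/cron.log
# Start the cron daemon in the background, then keep the container running by following the log
CMD cron && tail -f /var/log/cron.log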
At this point, you can modify the crontab file in your repository and redeploy to make changes. Containers have a more ephemeral life cycle than their virtual machine counterparts, so keep this in mind when using this option: your cron jobs may be interrupted more often as container instances start and stop. You may also have multiple container instances running if your platform scales out under load, so make sure your jobs can cope with contention (two runs at once) and are idempotent (safe to run more than once).
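For the contention point, one common safeguard within a single container is a non-blocking file lock, so that a run which starts while the previous one is still going simply skips. The sketch below assumes a Linux container and uses Python's fcntl.flock; the helper name run_exclusively and the lock path are hypothetical. A local lock only coordinates processes that share the same filesystem, so it does not help across multiple container instances. From a crontab, the flock(1) utility from util-linux (flock -n) offers similar behavior without custom code.

```python
import fcntl
from typing import Callable


def run_exclusively(lock_path: str, job: Callable[[], None]) -> bool:
    """Run job() unless another process already holds the lock; return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting for the holder.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous run still in progress: skip this tick
        try:
            job()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return True


if __name__ == "__main__":
    # Hypothetical job and lock path, e.g. invoked from a crontab entry.
    ran = run_exclusively("/tmp/hello-job.lock", lambda: print("Hello world"))
    print("ran" if ran else "skipped: previous run still active")
```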
For these reasons and others, this approach may not be the right fit for your architecture. Fortunately, container orchestration platforms offer their own mechanisms to accomplish the same goal.
Use the Kubernetes CronJob Mechanism
Kubernetes itself offers a CronJob mechanism. Like other Kubernetes objects, a CronJob is defined in a manifest, such as the example below.
apiVersion: batch/v1   # use batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
The format itself is a bit cumbersome, but will look familiar to those dealing with Kubernetes manifest files. Note that you specify the container image on which the scheduled jobs run.
You do not have the same strong guarantee as the native operating system service. The Kubernetes documentation states:
“A cron job creates a job object about once per execution time of its schedule. We say ‘about’ because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.”
Once again, we see a slight difference and additional considerations in terms of designing and running these scheduled processes in a container environment.
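One way to make a job tolerate the occasional duplicate run is to have it claim its scheduled slot atomically before doing any work, and exit if the slot is already claimed. The Python sketch below does this with an exclusive file create (O_CREAT | O_EXCL). It assumes, hypothetically, that every job instance can see the same marker directory (for example, a shared volume) and knows which scheduled minute it belongs to; in practice, a database row with a unique key plays the same role. Claiming before running gives at-most-once behavior: a run that crashes after claiming will not be retried for that slot. The names claim_slot and marker_dir are illustrative.

```python
import os
from datetime import datetime, timezone


def claim_slot(marker_dir: str, job_name: str, scheduled_for: datetime) -> bool:
    """Atomically record that `job_name` ran for this scheduled minute.

    Returns True for the first caller and False for any duplicate.
    """
    slot = scheduled_for.astimezone(timezone.utc).strftime("%Y%m%dT%H%MZ")
    marker = os.path.join(marker_dir, f"{job_name}-{slot}")
    try:
        # O_EXCL makes creation fail if the marker already exists, so only one caller wins.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    # Hypothetical: round the start time down to the minute as the scheduled slot.
    slot_time = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    if claim_slot("/tmp", "hello", slot_time):
        print("Hello from the Kubernetes cluster")
    else:
        print("duplicate run for this slot; exiting")
```

Rounding the start time down to the minute is only a rough way to identify the slot, since two duplicate pods could start on either side of a minute boundary.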
Use a Software Process Solution such as Cronenberg
Cronenberg provides a software implementation of cron that complements your Twelve-Factor application. It runs as another process in your container, not as a system service. It is intended to be simple and portable, so instead of relying on hard-coded locations for the equivalent of crontab files, it takes the location of its configuration file as an argument.
To run Cronenberg, use the following command in your Dockerfile, or in a Procfile on common Platform-as-a-Service (PaaS) offerings.
cronenberg ./config/cron-jobs.yml
The cron-jobs.yml file format is shown in the example below. It is much simpler than the Kubernetes manifest and a bit closer to the crontab format we are used to dealing with.
# This is just a normal job that runs every minute
- name: hello-world
  command: echo "Hello World"
  when: "* * * * *"
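Conceptually, a process-based scheduler replaces the system daemon with an ordinary foreground loop: wake at each minute boundary, check each job's "when" expression, and start the matching commands. The Python sketch below illustrates that idea only; it is not Cronenberg's implementation. It assumes the job list has already been loaded from cron-jobs.yml (parsing YAML would need a library such as PyYAML) and takes the schedule matcher as a parameter; all names are hypothetical.

```python
import subprocess
import time
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# One entry per job, mirroring the name/command/when keys in cron-jobs.yml.
Job = Dict[str, str]
Matcher = Callable[[str, datetime], bool]


def next_minute(now: datetime) -> datetime:
    """Return the start of the minute following `now`."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


def start_due_jobs(jobs: List[Job], at: datetime, matches: Matcher) -> List[subprocess.Popen]:
    """Launch every job whose schedule matches `at` without waiting for it to finish."""
    return [subprocess.Popen(job["command"], shell=True)
            for job in jobs if matches(job["when"], at)]


def run_forever(jobs: List[Job], matches: Matcher) -> None:
    """Foreground loop: sleep until each minute boundary, then start the due jobs."""
    running: List[subprocess.Popen] = []
    while True:
        tick = next_minute(datetime.now())
        time.sleep(max(0.0, (tick - datetime.now()).total_seconds()))
        running = [p for p in running if p.poll() is None]  # reap finished jobs
        running += start_due_jobs(jobs, tick, matches)
```

Because run_forever never returns, it can serve as the container's main process, which is what lets a process-based scheduler run without a system service. Any function that takes a schedule expression and a datetime and returns a bool can be passed as the matcher.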
Additional Design Considerations
In each of these scenarios, your scheduled jobs run on container instances specifically deployed for this purpose. This leads to a few decision points.
• Consider whether you should auto-scale these containers, or set their scale specifically to one. If your jobs are not idempotent or you need to avoid contention, set the scale to one. If you don’t have these issues and can leverage the scalability, then by all means use those containers to their potential.
• In a strict Docker implementation, the Dockerfile will determine what user your cron daemon runs as, so be sure to consider permissions and related setup in your Dockerfile. Cron will also fail silently unless you stream the output to a log file, so minimally during development you will want to set up this logging.
• Many scheduled processes use very few resources, so consider packing related jobs onto specific containers to use your infrastructure resources most efficiently.
• Cron jobs run operating system commands that have their own environment and configuration. If you want to leverage application components and services in your existing projects, consider using an asynchronous job framework such as Sidekiq for Ruby on Rails apps.
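Because cron jobs can fail silently when their output is not redirected, a small wrapper that records each run's exit status alongside its output makes failures visible. The Python sketch below is one hypothetical way to do that; run_logged is an illustrative name. Where the log ends up is your choice: under an in-process scheduler, standard output typically reaches the container's logs, while under the system cron daemon you would pass an open handle to a file such as /var/log/cron.log.

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import TextIO


def run_logged(command: str, log: TextIO = sys.stdout) -> int:
    """Run a shell command and write a timestamped record of its output and exit code."""
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    for line in (result.stdout + result.stderr).splitlines():
        log.write(f"{started} {line}\n")
    log.write(f"{started} command={command!r} exit={result.returncode}\n")
    log.flush()
    return result.returncode


if __name__ == "__main__":
    run_logged('echo "Hello world"')
```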
This post was originally published at devgraph.com.