Analyzing Your On-Call Activity with Kubernetes Jobs
Joseph D. Marhee
Posted on December 11, 2018
In Kubernetes, you can create a Job, a one-off task that schedules a pod to run to completion, or make it recurring with a CronJob resource (much as you would with cron on a non-distributed system).
Let’s take the example of the following script:
import requests
import json
import os
import datetime

API_KEY = os.environ['PAGERDUTY_RO_KEY']
SINCE = os.environ['START_DATE']
UNTIL = os.environ['END_DATE']
STATUSES = []
TIME_ZONE = 'UTC'
LIMIT = 50
RUNDATE = datetime.datetime.today().strftime('%Y%m%d%H%M%S')

def list_incidents(offsetval):
    # Request one page of incidents from the PagerDuty v2 REST API
    url = 'https://api.pagerduty.com/incidents'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    payload = {
        'since': SINCE,
        'until': UNTIL,
        'statuses[]': STATUSES,
        'limit': LIMIT,
        'time_zone': TIME_ZONE,
        'offset': offsetval
    }
    r = requests.get(url, headers=headers, params=payload)
    return r.text

def write_csv(resp, page):
    # Dump each incident's title and creation time to a per-page CSV
    incidents = json.loads(resp)['incidents']
    with open('/mnt/pd-audit/%s-Incidents-%s.csv' % (RUNDATE, page), 'w') as incidents_data:
        for inc in incidents:
            incidents_data.write("%s,%s\n" % (inc['title'], inc['created_at']))

if __name__ == '__main__':
    page = 0
    while True:
        # PagerDuty's classic pagination offsets by record count, not page
        # number, so advance the offset by LIMIT records each iteration
        resp = list_incidents(page * LIMIT)
        print("Writing '/mnt/pd-audit/%s-Incidents-%s.csv'..." % (RUNDATE, page))
        write_csv(resp, page)
        if not json.loads(resp)['more']:
            print("No more pages after current run.")
            break
        page += 1
which dumps a CSV of PagerDuty incident activity between a start and end date to disk, for your team to review later.
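Before containerizing it, you can sanity-check the script locally; a sketch, where the key and date range are placeholder values, and /mnt/pd-audit needs to exist and be writable:

export PAGERDUTY_RO_KEY='your-read-only-api-key'
export START_DATE='2018-12-01'
export END_DATE='2018-12-08'
sudo mkdir -p /mnt/pd-audit && sudo chown $(whoami) /mnt/pd-audit
python pd-alert-csv.py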
So, let’s say you want to run this on Kubernetes as a Job you can kick off as desired. First, build the script into an image:
FROM python:3

# These are supplied at runtime by the Job manifest below
ENV PAGERDUTY_RO_KEY=""
ENV START_DATE=""
ENV END_DATE=""

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY pd-alert-csv.py main.py

CMD ["python", "main.py"]
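Build the image and push it to your registry; for example, using the same hypothetical registry and tag the Job manifest below references:

docker build -t coolregistry.us/pd-alert-csv:latest .
docker push coolregistry.us/pd-alert-csv:latest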
Then you can reference the image in a Job template like:
apiVersion: batch/v1
kind: Job
metadata:
  name: pd-alert-csv
  namespace: jobs-space
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: pd-data
        emptyDir: {}
      containers:
      - name: pd-alert-csv
        image: coolregistry.us/pd-alert-csv:latest
        volumeMounts:
        - mountPath: "/mnt/pd-audit"
          name: pd-data
        env:
        - name: PAGERDUTY_RO_KEY
          valueFrom:
            secretKeyRef:
              name: pd-api-token
              key: PAGERDUTY_RO_KEY
        - name: START_DATE
          value: "SOME_DATE"
        # Ideally, you might program this window calculation into the script if this were recurring
        - name: END_DATE
          value: "SOME_LATER_DATE"
You might back /mnt/pd-audit with an EBS-backed volume, or store the output however you’d like (or even write it to S3 directly from the script). The PagerDuty API key is loaded into the pod as an environment variable from a Kubernetes Secret, and the start and end dates for the job are passed in as plain environment variables, overriding the empty defaults in the image we just built.
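Assuming the pd-api-token Secret doesn’t exist yet, you might create it and then launch the Job with something like this (the manifest filename is just a hypothetical name for the file you saved the template above into):

kubectl create secret generic pd-api-token \
  --namespace jobs-space \
  --from-literal=PAGERDUTY_RO_KEY='your-read-only-api-key'
kubectl apply -f pd-alert-csv-job.yaml
kubectl get jobs --namespace jobs-space --watch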
To make this a recurring job, you wrap your Job spec in a CronJob declaration (much as the Deployment resource wraps Pods) to tell Kubernetes what to run and when:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pd-alert-report-cron
spec:
  schedule: "0 */12 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          ...
and then the pod spec from the Job we defined above goes, indented, under that last spec key. The schedule above fires at minute 0 every 12 hours.
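Put together (a sketch, reusing the Job’s pod spec and the hypothetical image and Secret from above), the full manifest might look like:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pd-alert-report-cron
  namespace: jobs-space
spec:
  schedule: "0 */12 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          volumes:
          - name: pd-data
            emptyDir: {}
          containers:
          - name: pd-alert-csv
            image: coolregistry.us/pd-alert-csv:latest
            volumeMounts:
            - mountPath: "/mnt/pd-audit"
              name: pd-data
            env:
            - name: PAGERDUTY_RO_KEY
              valueFrom:
                secretKeyRef:
                  name: pd-api-token
                  key: PAGERDUTY_RO_KEY
            # For a recurring job, you'd ideally compute this window in the script
            - name: START_DATE
              value: "SOME_DATE"
            - name: END_DATE
              value: "SOME_LATER_DATE"

You can then check on it with kubectl get cronjobs --namespace jobs-space.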
To learn more about what you can do with jobs (work queues, etc.), definitely check out the Kubernetes documentation:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
and the amazing Kubernetes By Example site for more great examples.