Transforming AWS Liabilities into Assets on a Container Foundation
Keisuke FURUYA
Posted on November 29, 2023
Small scripts that run periodically, EC2 instances and Lambda functions playing auxiliary roles - I suspect many of you have plenty of these in your environments. They are, of course, excellent services that support our daily operations, and ideally they would be backed by proper maintenance flows and mechanisms. Unfortunately, in our case that was lacking, and over time they turned into liabilities.
To mitigate these liabilities, we've been steadily containerizing these workloads and migrating them onto the same execution foundation as our applications, improving their operability and maintainability.
Do you have these?
Do any of you have these 'legendary' beings in your AWS accounts?
- The legendary EC2 triggering SQL queries to RDS via cron for metrics (see the sketch just after this list)
- The legendary Lambda periodically merging OpenSearch indices
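For the first of these, picture something like the following quietly living on a long-forgotten instance; the host, query, and metric names here are made up purely for illustration:

# /etc/cron.d/metrics on the legendary instance (illustrative only)
*/5 * * * * root /opt/scripts/collect_metrics.sh

# collect_metrics.sh: query RDS and push the result to CloudWatch
COUNT=$(mysql -h reporting.example.internal -N -e "SELECT COUNT(*) FROM orders")
aws cloudwatch put-metric-data --namespace "Legacy/Metrics" --metric-name OrderCount --value "$COUNT"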
Beings like these handle the small administrative tasks that sit alongside the main infrastructure supporting your services. These days Lambda is the more common choice for this kind of work, the benefit being that once the code is written, there is nothing else to manage thanks to its serverless nature.
Challenges with EC2 or Lambda for Administrative Tasks
We had numerous EC2 instances and Lambda functions serving such purposes in our company. However, when the time came to update them, deployment was often manual even though the source code existed: whether copying things over by hand or running sls (the Serverless Framework CLI) from a local PC, it always required an extra step. Lambda in particular forced our hand, because runtime support ends periodically and the code sometimes had to be modified and the function updated manually to keep up. And because these updates were infrequent, the usual cycle was neglect rather than improvement.
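For illustration, a manual update of the kind described above would look something like this; the stage and function names are placeholders rather than our actual setup:

# Redeploy a Serverless Framework service from a local PC
sls deploy --stage production

# ...or push new code to a single function directly with the AWS CLI
zip function.zip lambda_function.py
aws lambda update-function-code --function-name some-admin-function --zip-file fileb://function.zip

Either way, the procedure lives in someone's head and runs from someone's laptop.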
Will Containerization Solve This?
One might suggest, "Let's containerize them!" But containerization alone won't solve the issue. A portable container certainly makes development, modification, and testing easier than code tied to a specific runtime such as Lambda's. Still, containerizing is not the crucial part in itself; the real benefit comes from running these tasks alongside the main application on the same infrastructure and unifying deployment methods, which keeps maintenance easy even when changes come later. In other words, this isn't about containerizing blindly; it presupposes that the main application already runs on containers and that that foundation is in place.
Example: Replacing a Lambda Python Script with an EKS CronJob (Shell)
Let me share an actual migration story from our company. We had one such being:
- A legendary Lambda periodically merging OpenSearch indices
It was running on Python 3.6 and needed an upgrade because runtime support was reaching its end. Upgrading to 3.7 was an option, but the CD side wasn't well structured, which made verifying the change harder than it should have been. We therefore decided to migrate it to EKS, where our main application runs. The processing itself was simple, and since it didn't have to be Python, we rewrote it as a shell script and scheduled it as a Kubernetes CronJob. Within our company we promote GitOps for applications using helmfile and Argo CD, and by folding this OpenSearch maintenance script into that GitOps flow, deploying it for verification became simple. It also became much clearer where the script lives, where to make changes, and how to deploy them.
The specific structure using helmfile was as follows:
.
├── environments
│   └── values.yaml
├── helmfile.yaml
└── manifests
    ├── merge.sh
    └── manifest.yaml.gotmpl
The 'manifests' directory holds the Kubernetes manifests, written in gotmpl format for extensibility, alongside the actual shell script to be executed. Next up is 'helmfile.yaml'.
environments:
  {{ .Environment.Name }}:
    values:
      - environments/values.yaml
---
releases:
  - name: "merge"
    namespace: "{{ .Namespace | default .Values.merge.namespace }}"
    chart: "./manifests"
    installedTemplate: "{{ .Values.merge.installed }}"
    labels:
      namespace: "{{ .Values.merge.namespace }}"
      default: "false"
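For reference, a layout like this can be rendered, diffed, and applied per environment with the standard helmfile commands (the environment name 'production' here is just an example, and diff requires the helm-diff plugin):

helmfile -e production template   # render the manifests locally for review
helmfile -e production diff       # compare against what is currently in the cluster
helmfile -e production apply      # apply manually; in our case Argo CD does this via GitOps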
Helmfile treats the files under the 'manifests' directory as a Chart and renders them from the following gotmpl. The Pod mounts the shell script from a ConfigMap and runs it on a schedule as a CronJob.
apiVersion: v1
kind: ConfigMap
metadata:
  name: merge
data:
  merge.sh: |
    {{- readFile "merge.sh" | nindent 6 }}
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: merge
spec:
  schedule: "5 01 * * *"
  timeZone: "Asia/Tokyo"
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: {{ .Values.merge.serviceAccountName }}
          containers:
            - name: merge
              image: chatwork/aws:2.8.10
              env:
                - name: ES_DOMAIN_ENDPOINT
                  value: {{ .Values.merge.esDomainEndpoint }}
              command:
                - "/bin/bash"
                - "/tmp/merge.sh"
              volumeMounts:
                - name: merge
                  mountPath: /tmp
          volumes:
            - name: merge
              configMap:
                name: merge
                items:
                  - key: merge.sh
                    path: merge.sh
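The actual merge.sh isn't shown here, but as a minimal sketch of its shape: a script that calls the OpenSearch _forcemerge API using the ES_DOMAIN_ENDPOINT injected above. The index pattern, the max_num_segments value, and the assumption that the domain accepts unsigned requests from within the cluster are all illustrative, not our real script.

#!/bin/bash
set -euo pipefail

# ES_DOMAIN_ENDPOINT is injected by the CronJob manifest above
# (assumed here not to include the scheme).
INDEX="logs-*"

# Force-merge segments to keep the indices lean; options are placeholders.
curl -sS -X POST "https://${ES_DOMAIN_ENDPOINT}/${INDEX}/_forcemerge?max_num_segments=1"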
Benefits of Organizing the Deployment Flow
As mentioned earlier,
Running these tasks alongside the main application on the same infrastructure and unifying deployment methods keeps maintenance easy even when changes come later.
We reaped exactly this benefit. The helmfile manifests mechanism in particular proved powerful: for simple scripts, consolidating them here and putting them under GitOps made it feel as though this approach could handle everything (a somewhat radical thought, admittedly). Moreover, this kind of administrative code used to live in repositories only the infrastructure and SRE teams could access, which shut off visibility. By moving the deployment flow over to the application side, developers gained visibility too, and now everyone can contribute to maintenance.
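As a rough idea of what shifting the deployment flow to the application side looks like on the Argo CD end, a release like this could be registered as an Application pointing at the directory above. The repository URL, paths, namespaces, and the assumption that a helmfile config-management plugin is installed in Argo CD are all placeholders, not our actual configuration:

# Register the helmfile release as an Argo CD Application (illustrative values only)
argocd app create merge \
  --repo https://github.com/example/admin-scripts.git \
  --path merge \
  --revision main \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace ops \
  --config-management-plugin helmfile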
Conclusion
Because we use EKS, we tailored this setup to Kubernetes, but a similar configuration could be built on ECS as well. On EKS, GitOps deployment with Argo CD is exceptionally solid, and I genuinely see no reason not to take advantage of it. What matters most, though, is leveraging the advantages of containerization while reusing the infrastructure and deployment mechanisms you're already accustomed to. If this helps shed light on the resources silently sitting as liabilities in your AWS accounts, I'd be glad.