5 ways for GitLab CI runners to get AWS credentials

Edmund Kwok

Posted on March 15, 2023

Congratulations! You are well on your way to automating pipelines and running them in the cloud. You have settled on using GitLab CI and are now faced with the daunting task of passing AWS credentials to your runners... securely, of course.

As with everything in tech, there is more than one way to do it. But which one will you choose?

John Legend pressing the button on his chain in The Voice

Digging deeper, how will you choose and more importantly, why will you choose it?

And that, my friend, is what we will figure out together in this article (just high level for now, we can dig into the hands-on details for each option in follow-up articles). I'll share my approach to making a choice and hopefully it will be helpful to you too.

But first, what do we mean by "AWS credentials"?

You have a pipeline.

A pack of dogs doing stuff one after another, like a pipeline of sorts

Your doggos (CI runners) are doing something with AWS - maybe through aws-cli, an AWS SDK, or some other library that uses an AWS SDK.

These requests to AWS will need to be signed by a valid AWS credential, more specifically, an AWS access key (authentication). The access key is tied to one or more permissions that allow you to do stuff in AWS (authorization).

For the sake of our discussion here, let's agree that we will be using aws-cli in our pipeline, and the script will be uploading an object to our S3 Bucket, requiring the s3:PutObject permission:

Architecture diagram of how the pipeline will perform the `aws-cli` command

There are three options to pass the access keys to aws-cli:

  1. AWS credentials file
  2. Environment variables
  3. aws-cli arguments

I will leave it to you to decide what's best for your use case.
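
To make that concrete, here is a minimal sketch of the environment-variable flavor inside a GitLab job - the image, region, and bucket name are placeholders, and where those two variables actually come from is exactly what the rest of this article is about:

```yaml
# Minimal sketch (environment-variable flavor). AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY are the standard variables aws-cli looks for;
# the image, region, and bucket are placeholders.
upload:
  image: amazon/aws-cli:latest
  variables:
    AWS_DEFAULT_REGION: us-east-1
  script:
    # aws-cli picks the credentials up from the environment automatically -
    # no credentials file or extra arguments needed
    - aws s3 cp artifact.zip s3://my-example-bucket/artifact.zip
```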

Now how do you get an access key? There are a few ways, but the two most common ones are:

  1. Access key created for an IAM user
  2. Access key given after assuming an IAM role (i.e. AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, but we'll mostly reference AssumeRole here)

Pro Tip to tell the difference

  1. IAM user access key - starts with AKIA, is long-lived, "static"
  2. AssumeRole* access key - starts with ASIA, is temporary, "dynamic"


Which is better though? Security, security, security - I can't recommend 2 enough. Temporary, short-lived credentials are preferred as they expire on their own. If (when?) the credentials are leaked, hopefully they will have expired by then, limiting or even eliminating any damage.

Bart Simpson accepting what looks like a ticket

But with great features comes greater effort to get things done properly. So 1 may still be an option depending on your use case and security hygiene. Fret not, we will discuss options for both 1 and 2 and you can pick your poison.

If you got this far, we will assume that you have an IAM role with the necessary permissions that your CI runners need, which is ready to be AssumeRole-ed (see what I did there? 😉).

Now bring on the options!

The options here are a mix of difficulty, security tradeoffs, and runner type (shared or self-hosted). Some of them also have prerequisites for self-hosted runners - running on AWS EC2 instances or AWS EKS.

Ready or not, presenting to you, the first option.

Option 1: GitLab CI Variables

Access key type - IAM user, AssumeRole

Architecture diagram for using GitLab CI Variables

This is the easiest way to get started without any dependency but probably the least secure.

Variables can be added via the GitLab UI at the project, group, and instance level - you literally copy the access key and paste it into the GitLab CI Variables UI. The access key will then be available to your pipeline script as environment variables or as a file, depending on your use case.

It is also possible to go with an AssumeRole approach by adding a few steps in your script to assume the intended role (see the sketch below). In that case, the access key stored in GitLab CI Variables only needs the permission to call AssumeRole on that role.
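
Here is a rough sketch of that AssumeRole variation, assuming the static keys are already exposed as CI variables - the role ARN and bucket are placeholders:

```yaml
# Sketch only: the role ARN and bucket are placeholders. AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY come from GitLab CI Variables and only need
# permission to call sts:AssumeRole on the target role.
deploy:
  image: amazon/aws-cli:latest
  script:
    # Exchange the long-lived keys for temporary ones from STS
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role
      --role-arn "arn:aws:iam::111111111111:role/ci-deploy"
      --role-session-name "gitlab-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    # From here on, aws-cli uses the temporary ASIA* credentials
    - aws s3 cp artifact.zip s3://my-example-bucket/artifact.zip
```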

⚠️ Important PSA!

Do not put access keys and secrets in the .gitlab-ci.yml file in any shape or form. There have been many reported instances of secrets committed to Git repos being leaked.

Why this

  • Easiest to implement, no additional scripts to manage (if using the IAM user access key approach).
  • Runners can be hosted anywhere, even shared runners - some of the other approaches require them to run on EC2 instances or as Kubernetes Pods.

Why not this

  • The keys are in "plain text" and can be read by anyone with relevant access to the repo.
  • If your or a team member's GitLab account is compromised, the access keys will be as well.
  • GitLab itself could be compromised...

Big caveats before you adopt this

  1. Make sure the privileges granted to the access key adhere to the principle of least privilege. It may be tempting to just assign the AdministratorAccess managed policy for convenience, but think about the major inconvenience if the keys are compromised.
  2. Have a key rotation policy, revoking the old access keys and generating new ones periodically.
  3. Avoid this on production AWS accounts as much as possible - unless you like living on the edge and can be certain that a leak of production credentials won't be the end of the world for you.
  4. Have an up-to-date "compromised credentials" playbook that can be triggered as soon as a compromise is suspected.

Option 2: EC2 Instance Profile

Access key type - AssumeRole

Architecture diagram for using EC2 Instance Profile

This is the second easiest way to get access keys into your pipeline if you are already using EC2 instances for your self-managed runners.

One caveat: only one IAM role can be assigned to an EC2 instance at a time.

Just add the necessary IAM role as an EC2 instance profile, the AssumeRole wizardry will happen in the background, and the access keys will be automagically consumed by aws-cli and all common AWS SDK client libraries. These are temporary access keys and they are automatically rotated in the background as well.
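
The nice side effect is that the job itself needs no credential plumbing at all. A minimal sketch (the bucket name is a placeholder) might look like this:

```yaml
# With an instance profile attached to the runner's EC2 instance, the job
# needs no credential handling - aws-cli discovers the temporary keys via
# the instance metadata service. The bucket is a placeholder.
upload:
  script:
    # Optional sanity check: shows which IAM role the job is running as
    - aws sts get-caller-identity
    - aws s3 cp artifact.zip s3://my-example-bucket/artifact.zip
```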

Why this

  • If you are already using EC2 instances as your runners.
  • You only need one IAM role for your pipeline.

Why not this

  • Your runners are not running on EC2 instances.
  • Only one instance profile can be assigned to an EC2 instance at a time. You could use the aws ec2 associate-iam-instance-profile command to swap the role on a running EC2 instance, but it may be challenging to manage that in a sustainable manner.

Option 3: EKS IAM Roles for service accounts (aka IRSA)

Access key type - AssumeRole (well technically, AssumeRoleWithWebIdentity)

Architecture diagram for using EKS IAM Roles for service accounts

If you are already using AWS EKS >= 1.12 and the Kubernetes executor for GitLab Runner, this should be your preferred approach!

Using the OpenID Connect (OIDC) flow, EKS acts as an OIDC identity provider that is trusted by IAM. The OIDC JSON Web Token (JWT) here is the token of the Pod's ServiceAccount, which is passed to the AWS Security Token Service (STS) via AssumeRoleWithWebIdentity, and the temporary access key for the role is returned. You can configure the IAM role's trust policy to only allow a specific Kubernetes namespace and ServiceAccount to assume it.

There are a few moving pieces (AWS docs for IRSA), but essentially you need to:

  1. Enable IAM OIDC for your cluster.
  2. Associate a relevant K8S Service Account with an IAM role through annotations.
  3. Ensure the runner Pod is using said Service Account.
  4. The access keys will be automagically consumed by aws-cli and all common AWS SDK client libraries.

(If you are interested in a deep dive into what goes on behind the scenes, this is an excellent read https://mjarosie.github.io/dev/2021/09/15/iam-roles-for-kubernetes-service-accounts-deep-dive.html)
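
As a flavor of step 2, here is a minimal sketch of an annotated ServiceAccount - the namespace, name, and role ARN are placeholders for whatever your setup uses:

```yaml
# Sketch of step 2: the eks.amazonaws.com/role-arn annotation tells the EKS
# pod identity webhook which IAM role this ServiceAccount may assume.
# Namespace, name, and role ARN are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-ci
  namespace: ci
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/ci-deploy
```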


Pro Tip

Remember the constraint with EC2 Instance Profile about a single role per EC2 instance? With this approach, you could create multiple SAs with the relevant IAM roles and pick the required SA at the job level with the KUBERNETES_SERVICE_ACCOUNT_OVERWRITE variable in .gitlab-ci.yml 😎 (sketch below)
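
Something along these lines - the ServiceAccount names and buckets are placeholders, and the runner must allow the overwrite via its service_account_overwrite_allowed setting:

```yaml
# Sketch: different jobs pick different ServiceAccounts (and therefore
# different IAM roles). Names and buckets are placeholders; the runner's
# service_account_overwrite_allowed setting must permit these values.
deploy-staging:
  variables:
    KUBERNETES_SERVICE_ACCOUNT_OVERWRITE: ci-staging-deployer
  script:
    - aws s3 cp artifact.zip s3://staging-example-bucket/artifact.zip

deploy-production:
  variables:
    KUBERNETES_SERVICE_ACCOUNT_OVERWRITE: ci-production-deployer
  script:
    - aws s3 cp artifact.zip s3://production-example-bucket/artifact.zip
```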


Technically, this could also be done on non-EKS Kubernetes clusters with a self-deployed amazon-eks-pod-identity-webhook, but I've not tested it myself to know for sure. Maybe in a follow-up article?

Why this

  • If you are already using AWS EKS AND the Kubernetes executor for GitLab Runner, this is the easiest, most secure, and most scalable way.
  • If you are using some form of Infrastructure as Code (IaC) tool like Terraform to manage your Kubernetes and IAM resources, it will be a breeze to manage the various moving pieces.

Why not this

  • You are not using the Kubernetes executor for GitLab Runner.
  • You are using a different flavor of Kubernetes - self-managed, AKS, GKE (to be confirmed whether amazon-eks-pod-identity-webhook can be deployed successfully outside EKS).

Option 4: IAM OIDC identity provider integration with GitLab

Access key type - AssumeRole (similar to above, technically it's AssumeRoleWithWebIdentity)

Architecture diagram for using IAM OIDC identity provider integration with GitLab

This is an alternative option that is similar to Option 3, but for those who are not using the Kubernetes executor.

It also uses the same OIDC flow, but this time GitLab is the OIDC identity provider that is trusted by IAM. Each GitLab job has an OIDC JWT that is accessible through the CI_JOB_JWT_V2 environment variable. In your script, you pass that to AWS STS with AssumeRoleWithWebIdentity, and the temporary access key for the role is returned. You can configure the IAM role's trust policy to only allow a specific GitLab group, project, branch, or tag to assume it.
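
A rough sketch of what the job could look like, assuming the IAM OIDC identity provider for GitLab and the role's trust policy are already in place - the role ARN and bucket are placeholders:

```yaml
# Sketch only: ROLE_ARN and the bucket are placeholders, and the IAM role's
# trust policy must already trust GitLab's OIDC identity provider.
assume-and-upload:
  image: amazon/aws-cli:latest
  variables:
    ROLE_ARN: arn:aws:iam::111111111111:role/gitlab-ci-deploy
  script:
    # Exchange the job's OIDC JWT for temporary AWS credentials
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn ${ROLE_ARN}
      --role-session-name "gitlab-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --web-identity-token ${CI_JOB_JWT_V2}
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    - aws sts get-caller-identity
    - aws s3 cp artifact.zip s3://my-example-bucket/artifact.zip
```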

Why this

  • If your runners are not running on Kubernetes or EKS.

Why not this

  • You are already running the Kubernetes executor on EKS - Option 3 is likely a better fit there.
  • You are not comfortable with any GitLab runners registered in the allowed GitLab group, project, branch or tag to assume the IAM role.

Option 5: HashiCorp Vault

Access key type - IAM user (Plot twist, the IAM user from Vault is not static, but temporary! More below.), AssumeRole

Architecture diagram for using HashiCorp Vault

A person opens a vault that looks like those from banks, welcoming you inside


Welcome to The Holy Grail for secrets management.

For the uninitiated, think of HashiCorp Vault as a broker of secrets. You (human, script, machine, etc.) authenticate to Vault with one of a slew of auth methods (depending on what's enabled and what you have been allowed to auth with), and in exchange receive a lease on a Vault token. That token is tied to a policy that allows you to request one or more configured secrets. When you request a secret, Vault does the heavy lifting of provisioning it in the respective backend, and removing it from that backend when the lease on the token expires.

If you already have HashiCorp Vault in your stack, it supports the AWS Secrets Engine out of the box. Vault can generate IAM access keys dynamically for IAM users and IAM roles that you manage in Vault itself (more accurately, the IAM users and roles are created dynamically, and with them come the access keys).

One nice thing about using Vault is that all IAM access keys generated by Vault are time-bound, and automatically revoked and removed from AWS once they expire. This also includes the access keys generated for IAM users, where the dynamically created IAM user itself is removed from AWS once the time-to-live is reached (end of plot twist).

In your pipeline script, you'd perform the end-to-end flow (notice that you won't even interact with AWS IAM / STS directly) - a rough sketch follows the list:

  1. Authenticate to Vault
  2. Log in with the returned Vault token
  3. Request a new secret for the relevant IAM user or IAM role
  4. Use the generated temporary IAM access key to call AWS
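
Here is a sketch of what that might look like, assuming the job image has the vault CLI, jq, and aws-cli installed, and that a JWT auth method and an AWS Secrets Engine role have already been configured in Vault - the Vault address, auth mount, role names, and bucket are all placeholders:

```yaml
# Rough sketch only. The Vault address, the jwt auth mount, the gitlab-ci
# auth role, the aws/creds/ci-deploy path, and the bucket are assumptions
# about how your Vault and AWS Secrets Engine are configured.
vault-creds-demo:
  variables:
    VAULT_ADDR: "https://vault.example.com:8200"
  script:
    # 1 & 2. Authenticate to Vault (here via the JWT auth method) and keep the token
    - export VAULT_TOKEN=$(vault write -field=token auth/jwt/login role=gitlab-ci jwt=$CI_JOB_JWT_V2)
    # 3. Ask the AWS Secrets Engine for a fresh set of credentials
    - export AWS_CREDS=$(vault read -format=json aws/creds/ci-deploy)
    - export AWS_ACCESS_KEY_ID=$(echo "$AWS_CREDS" | jq -r .data.access_key)
    - export AWS_SECRET_ACCESS_KEY=$(echo "$AWS_CREDS" | jq -r .data.secret_key)
    # security_token is only returned for assumed_role / federation_token credential types
    - export AWS_SESSION_TOKEN=$(echo "$AWS_CREDS" | jq -r '.data.security_token // empty')
    # 4. Use the short-lived credentials
    - aws s3 cp artifact.zip s3://my-example-bucket/artifact.zip
```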

Why this

  • Vault is the most secure option and offers more than one way for your runners to authenticate - TLS certs, Kubernetes Service Account, OIDC and many more.
  • Your runners can run from anywhere as long as they can reach Vault.
  • Once you have Vault in your stack, you can extend the dynamic secrets pattern to other applications, machines, and humans.

Why not this

  • It's harder to set up and integrate Vault into your existing stack if you don't have it already.
  • If you will only be using Vault's AWS Secrets Engine for the runners in the foreseeable future, the other options may offer better ROI for the time and effort spent getting a production-ready Vault cluster up.

Big caveats before you adopt this

  • Make sure you have a secure way for your runners to authenticate with Vault - TLS (mTLS if possible) and strong credentials. Otherwise, any authenticated and authorized Vault token can generate valid AWS access keys.

Which option is the best for me?

Unfortunately, it depends 😅

It depends on your context and your use case. You may start with Option 1 to get started quicker and then migrate to another more secure approach, or even have more than one option, for different workloads.

Hopefully I was able to shed light on some clarifying questions you can ask to decide on the best option for you.

But sometimes the best option is the one you finally choose - the quicker you pick one, the quicker your pipeline can deliver value to your end users (securely, of course).

All the best!

Katniss Everdeen with the respect sign


I hope you enjoyed my first post on DEV 🥹

Either way, I would appreciate it if you let me know below what you think of this article - whether it helped you, anything you agree or disagree with, and what follow-up articles you would like to see. Looking forward to your comments!
