Edmund Kwok
Posted on March 15, 2023
Table of Contents
- What do we mean by "AWS credentials"
- Option 1: GitLab CI Variables
- Option 2: EC2 Instance Profile
- Option 3: EKS IAM Roles for service accounts (aka IRSA)
- Option 4: IAM OIDC identity provider integration with GitLab
- Option 5: HashiCorp Vault
- Which option is the best for me?
Congratulations! You are well on your way to automating pipelines and running them in the cloud. You have settled on using GitLab CI and are now faced with the daunting task of passing AWS credentials to your runners... securely, of course.
As with everything in tech, there is more than one way to do it. But which one will you choose?
Digging deeper, how will you choose and more importantly, why will you choose it?
And that, my friend, is what we will figure out together in this article (just high level for now, we can dig into the hands-on details for each option in follow-up articles). I'll share my approach to making a choice and hopefully it will be helpful to you too.
But first, what do we mean by "AWS credentials"?
You have a pipeline.
Your doggos CI runners are doing something with AWS - maybe through aws-cli, the AWS SDK, or some other library that uses the AWS SDK.
These requests to AWS will need to be signed by a valid AWS credential, more specifically, an AWS access key (authentication). The access key is tied to one or more permissions that allow you to do stuff in AWS (authorization).
For the sake of our discussion here, let's agree that we will be using aws-cli in our pipeline, and the script will be uploading an object to our S3 bucket, requiring the s3:PutObject permission.
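Here is a minimal sketch of what that job could look like (the job name, bucket, and file are placeholders I made up; how the credentials get there is exactly what the rest of this article is about):

```yaml
# .gitlab-ci.yml (sketch) - bucket and file names are placeholders
upload-report:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]   # the aws-cli image's default entrypoint is `aws`, so reset it
  script:
    # this call only succeeds if the job's credentials allow s3:PutObject on the bucket
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html
```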
There are a few ways to pass the access keys to aws-cli - environment variables, the shared credentials file (~/.aws/credentials), or aws configure, for example. I will leave it to you to decide what's best for your use case.
Now how do you get an access key? There are a few ways, but the two most common ones are:
- Access key created for an IAM user
- Access key given after assuming an IAM role (i.e. AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, but we'll mostly reference AssumeRole here)
Pro Tip to tell the difference
1. IAM user access key - starts with AKIA, is long-lived, "static"
2. AssumeRole* access key - starts with ASIA, is temporary, "dynamic"
Which is better though? Security, security, security - I can't recommend 2 enough. Temporary short-lived credentials are preferred as they will expire on their own. If (when?) the credentials are leaked, hopefully they will already have expired, limiting or eliminating any damage.

But with great features comes greater effort to get it done properly. So 1 may still be an option depending on your use case and security hygiene. Fret not, we will discuss options for both 1 and 2, and you can pick your poison.
If you got this far, we will assume that you have an IAM role with the necessary permissions that your CI runners need, ready to be AssumeRole-ed (see what I did there? 😉).
Now bring on the options!
The options here are a mix of difficulty, security tradeoffs, and runner type (shared or self-hosted). For self-hosted runners, there are prerequisites as well - running on AWS EC2 instances or AWS EKS.
Ready or not, presenting to you, the first option.
Option 1: GitLab CI Variables
Access key type - IAM user, AssumeRole
This is the easiest way to get started, with no dependencies, but it is probably the least secure.
Variables can be added via the GitLab UI at the project, group, and instance level - you literally copy the access key and paste it into the GitLab CI Variables UI. The access key will then be available to your pipeline script as environment variables or a file, depending on your use case.
It is also possible to go with an AssumeRole approach by adding a few steps in your script to assume the intended role. The access keys in the GitLab CI Variables should have the necessary permissions to AssumeRole.
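Here's a rough sketch of both flavors, assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY were added via the CI Variables UI (the role ARN, bucket, and job names are placeholders):

```yaml
# Flavor A: use the IAM user keys from the CI Variables UI directly
upload-with-ci-variables:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  script:
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are injected by GitLab as env vars,
    # so aws-cli picks them up without any extra configuration
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html

# Flavor B: the IAM user keys only have sts:AssumeRole; the job assumes the real role
upload-with-assume-role:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  script:
    - >
      CREDS=$(aws sts assume-role
      --role-arn arn:aws:iam::111111111111:role/gitlab-ci-s3-upload
      --role-session-name "gitlab-job-${CI_JOB_ID}"
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text)
    - export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | cut -f1)
    - export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | cut -f2)
    - export AWS_SESSION_TOKEN=$(echo "$CREDS" | cut -f3)
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html
```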
⚠️ Important PSA!
Do not put any access key and secret in the .gitlab-ci.yml file in any shape or form. There have been many reported instances where secrets in Git repos were leaked.
Why this
- Easiest to implement, no additional scripts to manage (if using the IAM user access key approach).
- Runners can be hosted anywhere, even shared runners - the other approaches require the runners to be EC2 instances or Kubernetes Pods.
Why not this
- The keys are in "plain text" and can be read by anyone with relevant access to the repo.
- If your or a team member's GitLab account is compromised, the access keys will be as well.
- GitLab itself could be compromised...
Big caveats before you adopt this
- Make sure the privileges granted to the access key adhere to the least-privilege approach. It may be tempting to just assign the AdministratorAccess managed policy for convenience, but think about the major inconvenience if the keys are compromised.
- Have a key rotation policy, revoking the old access keys and generating new ones periodically.
- Avoid this on production AWS accounts as much as possible - unless you like living on the edge and can be certain that a leak of production credentials won't be the end of the world for you.
- Have an up-to-date "compromised credentials" playbook that can be triggered as soon as a compromise is suspected.
Option 2: EC2 Instance Profile
Access key type - AssumeRole
This is the second easiest way to get access keys into your pipeline if you are already using EC2 instances for your self-managed runners.
One caveat: only one IAM role can be assigned to an EC2 instance at any one time.
Just add the necessary IAM role as an EC2 instance profile, the AssumeRole wizardry will happen in the background, and the access keys will be automagically consumed by aws-cli and all common AWS SDK client libraries. These are temporary access keys and they are automatically rotated in the background as well.
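With the instance profile attached, the job itself needs zero credential configuration - here's a sketch, assuming aws-cli is installed on the runner and the executor can reach the instance metadata endpoint:

```yaml
upload-from-ec2-runner:
  script:
    # no AWS_* variables anywhere - the default credential chain falls back to the
    # EC2 instance metadata service and picks up the instance profile's temporary keys
    - aws sts get-caller-identity   # sanity check: shows the assumed instance profile role
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html
```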
Why this
- If you are already using EC2 instances as your runners.
- You only need one IAM role for your pipeline.
Why not this
- Your runners are not running on EC2 instances.
- Only one instance profile can be assigned to an EC2 instance at one time. You could use the associate-iam-instance-profile command to update the role of a running EC2 instance, but it may be challenging to manage that in a sustainable manner.
Option 3: EKS IAM Roles for service accounts (aka IRSA)
Access key type - AssumeRole (well technically, AssumeRoleWithWebIdentity)
If you are already using AWS EKS >= 1.12 and the Kubernetes executor for GitLab Runner, this should be your preferred approach!
Using the OpenID Connect (OIDC) flow, EKS acts as an OIDC Identity Provider that is trusted by IAM. The OIDC JSON Web Token (JWT) here is the JWT of the Pod's ServiceAccount, which is passed to AWS Security Token Service (STS) with AssumeRoleWithWebIdentity, and the temporary access key for the role is returned. You can configure the IAM role trust policy to only allow a specific Kubernetes namespace and Service Account to assume it.
There are a few moving pieces (AWS docs for IRSA), but essentially you need to:
- Enable IAM OIDC for your cluster.
- Associate a relevant K8S Service Account with an IAM role through annotations.
- Ensure the runner Pod is using said Service Account.
- The access keys will be automagically consumed by aws-cli and all common AWS SDK client libraries.
(If you are interested in a deep dive into what goes on behind the scenes, this is an excellent read https://mjarosie.github.io/dev/2021/09/15/iam-roles-for-kubernetes-service-accounts-deep-dive.html)
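For step 2, the association is just an annotation on the Service Account - a sketch with made-up names and a placeholder role ARN:

```yaml
# ServiceAccount used by the GitLab Runner job Pods (names and ARN are placeholders)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-s3-upload
  namespace: gitlab-runners
  annotations:
    # the EKS pod identity webhook sees this and injects AWS_ROLE_ARN and
    # AWS_WEB_IDENTITY_TOKEN_FILE into Pods using this Service Account, which
    # aws-cli / the SDKs then use for AssumeRoleWithWebIdentity automatically
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/gitlab-ci-s3-upload
```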
Pro Tip
Remember the constraint with EC2 Instance Profile about a single role per EC2 instance? With this approach, you could create multiple SAs with relevant IAM roles and pick the required SA at the job level with the KUBERNETES_SERVICE_ACCOUNT_OVERWRITE variable in .gitlab-ci.yml 😎
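A sketch of that per-job override (the Service Account names are placeholders, and the runner's config must allow the overwrite, e.g. via service_account_overwrite_allowed):

```yaml
upload-artifacts:
  variables:
    KUBERNETES_SERVICE_ACCOUNT_OVERWRITE: gitlab-runner-s3-upload
  script:
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html

deploy-prod:
  variables:
    # a different job picks a different Service Account, hence a different IAM role
    KUBERNETES_SERVICE_ACCOUNT_OVERWRITE: gitlab-runner-prod-deploy
  script:
    - ./deploy.sh   # placeholder deploy step
```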
Technically, this could also be done with non-EKS Kubernetes clusters and a self-deployed amazon-eks-pod-identity-webhook, but I've not tested it myself to know for sure. Maybe in a follow-up article?
Why this
- If you are already using AWS EKS AND the Kubernetes executor for GitLab Runner, this is the easiest, most secure, and most scalable way.
- If you are using some form of Infrastructure as Code (IaC) tool like Terraform to manage your Kubernetes and IAM resources, it will be a breeze to manage the various moving pieces.
Why not this
- You are not using the Kubernetes executor for GitLab Runner.
- You are using a different flavor of Kubernetes - self-managed, AKS, GKE (to be confirmed whether amazon-eks-pod-identity-webhook can be deployed successfully outside EKS).
Option 4: IAM OIDC identity provider integration with GitLab
Access key type - AssumeRole (similar to above, technically it's AssumeRoleWithWebIdentity)
This is an alternative option that is similar to Option 3, but for those who are not using the Kubernetes executor.
It also uses the same OIDC flow, but this time, GitLab is the OIDC Identity Provider that is trusted by IAM. Each GitLab job has an OIDC JWT that is accessible through the CI_JOB_JWT_V2 environment variable. In your script, you pass that to AWS STS with AssumeRoleWithWebIdentity, and the temporary access key for the Role is returned. You can configure the IAM role trust policy to only allow a specific GitLab group, project, branch, or tag to assume it.
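A rough sketch of that script, assuming the IAM OIDC identity provider for your GitLab instance is already registered and using a placeholder role ARN:

```yaml
upload-with-gitlab-oidc:
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]
  script:
    # exchange the job's JWT for temporary credentials tied to the IAM role
    - >
      CREDS=$(aws sts assume-role-with-web-identity
      --role-arn arn:aws:iam::111111111111:role/gitlab-ci-s3-upload
      --role-session-name "gitlab-job-${CI_JOB_ID}"
      --web-identity-token "${CI_JOB_JWT_V2}"
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text)
    - export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | cut -f1)
    - export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | cut -f2)
    - export AWS_SESSION_TOKEN=$(echo "$CREDS" | cut -f3)
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html
```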
Why this
- If your runners are not running on Kubernetes or EKS.
Why not this
- You are already running your runners on EKS with the Kubernetes executor - Option 3 (IRSA) is likely the better fit there.
- You are not comfortable with any GitLab runner registered in the allowed GitLab group, project, branch, or tag being able to assume the IAM role.
Option 5: HashiCorp Vault
Access key type - IAM user (Plot twist, the IAM user from Vault is not static, but temporary! More below.), AssumeRole
Welcome to The Holy Grail for secrets management.
For the uninitiated, think of HashiCorp Vault as a broker of secrets. You (human, script, machine, etc) authenticate to Vault with a slew of auth methods (depending on what's enabled and what you have been allowed to auth with), and in exchange, receive a lease to a Vault token. That token is tied to a policy that allows you to request one or more configured secrets. When you request a secret, Vault does the heavy lifting of provisioning it in the respective backend and also removing it from the respective backend when the lease to the token expires.
If you already have HashiCorp Vault in your stack, it supports the AWS Secrets Engine out of the box. Vault can generate IAM access keys dynamically for IAM users and IAM roles that you manage in Vault itself (more accurately, the IAM users and IAM roles are created dynamically, and with them come the access keys).
One nice thing about using Vault is that all IAM access keys generated by Vault are time-bound, and automatically revoked and removed from AWS once they expire. This also includes the access keys generated for IAM users, where the created IAM user itself is also removed from AWS once the time-to-live is reached (end of plot twist).
In your pipeline script, you'd perform the end-to-end flow (notice that you won't even interact with AWS IAM / STS directly):
- Authenticate to Vault
- Log in with the returned Vault token
- Request a new secret for the relevant IAM user or IAM role
- Use the generated temporary IAM access key to call AWS
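Here's a sketch of that flow using the vault CLI and the JWT auth method - the Vault address, auth mount, role names, and secrets engine path are all assumptions for illustration, and the job image is assumed to have vault, jq, and aws-cli installed:

```yaml
upload-with-vault-creds:
  variables:
    VAULT_ADDR: "https://vault.example.com:8200"   # placeholder address
  script:
    # authenticate to Vault (here via the JWT auth method with the GitLab job JWT)
    # and keep the returned Vault token for the next call
    - export VAULT_TOKEN=$(vault write -field=token auth/jwt/login role=gitlab-ci jwt="${CI_JOB_JWT_V2}")
    # request a fresh credential from the AWS Secrets Engine (read it ONCE -
    # every read generates a brand-new key pair)
    - vault read -format=json aws/creds/gitlab-ci-s3-upload > creds.json
    - export AWS_ACCESS_KEY_ID=$(jq -r .data.access_key creds.json)
    - export AWS_SECRET_ACCESS_KEY=$(jq -r .data.secret_key creds.json)
    # (for assumed_role-type Vault roles, also export .data.security_token as AWS_SESSION_TOKEN)
    # use the short-lived key; Vault revokes it when the lease expires
    - aws s3 cp build/report.html s3://my-artifacts-bucket/report.html
```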
Why this
- Vault is the most secure option and offers more than one way for your runners to authenticate - TLS certs, Kubernetes Service Account, OIDC and many more.
- Your runners can run from anywhere as long as they can reach Vault.
- Once you have Vault in your stack, you can extend the dynamic secrets pattern to other applications, machines, and humans.
Why not this
- It's harder to set up and integrate Vault into your existing stack if you don't have it already.
- If you are only going to use Vault's AWS Secrets Engine for the runners in the foreseeable future, the other options may be a better ROI for the time and effort spent getting a production-ready Vault cluster up.
Big caveats before you adopt this
- Make sure you have a secure way for your runners to authenticate with Vault - TLS (mTLS if possible) and strong credentials. Otherwise, anyone with an authenticated and authorized Vault token can generate valid AWS access keys.
Which option is the best for me?
Unfortunately, it depends 😅
It depends on your context and your use case. You may start with Option 1 to get started quicker and then migrate to another more secure approach, or even have more than one option, for different workloads.
Hopefully I was able to shed light on some clarifying questions you can ask to decide the best option for you.
But the best option sometimes is the one you finally choose - the quicker you pick one, the quicker your pipeline can deliver value to your end users (securely of course).
All the best!
I hope you enjoyed my first post on DEV 🥹
Either way, I would appreciate it if you would let me know below what you think of this article - whether it helps you, anything you agree or disagree with, and what follow-up articles you would like to see. Looking forward to your comments!