Manage Airflow connections with Terraform and AWS SecretsManager
Ivica Kolenkaš
Posted on August 15, 2023
Managing infrastructure as code brings speed and consistency, and it makes the software development lifecycle more efficient and predictable. The infrastructure for your ETL/orchestration tool is managed with code - why not manage the secrets that your tool uses with code as well?
This article shows a proof-of-concept implementation of how to manage Airflow secrets through Terraform and keep them committed to a code repository.
Caveats/assumptions:
- Users (developers) on the AWS account don't have permissions to retrieve secret values from SecretsManager
- Terraform is using a remote state with appropriate security measures in place
- IAM role used by Terraform has relevant permissions to manage a wide range of AWS services and resources
TL;DR
Architecture
[Architecture diagram]
How it works
[Diagram]
Intro to Airflow Connections
Airflow can connect to various systems, such as databases, SFTP servers or S3 buckets. To connect, it needs credentials. Connections are an Airflow concept to store those credentials. They are a great way to configure access to an external system once and use it multiple times.
A Connection can be expressed as a string; for example, a connection to a MySQL database may look like this:
mysql://username:password@hostname:3306/database
Airflow understands this format and can use it to connect to the database for which the connection was configured.
Connections can be configured through environment variables, through an external secrets backend (our use case) or in the internal Airflow database.
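For example, the same MySQL connection can be supplied through an environment variable; Airflow resolves any variable named AIRFLOW_CONN_<CONN_ID> (uppercased) into a connection:

export AIRFLOW_CONN_MY_DB='mysql://username:password@hostname:3306/database'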
A centralized way of managing connections becomes a necessity as soon as your proof-of-concept goes live or you are working in a team.
External secrets backends
Airflow supports multiple external secrets backends, such as AWS SecretsManager, Azure Key Vault and HashiCorp Vault.
Connection details are read from these backends when a connection is used. This keeps the sensitive part of the connection, such as a password, secure and minimizes the attack surface.
AWS SecretsManager backend
Configuring your Airflow deployment to use AWS SecretsManager is well explained on this page.
Creating AWS SecretsManager secrets with Terraform is straightforward:
resource "aws_secretsmanager_secret" "secret" {
name = "my-precious"
}
resource "aws_secretsmanager_secret_version" "string" {
secret_id = aws_secretsmanager_secret.secret.id
secret_string = "your secret here"
}
but committing this to a code repository is a cardinal sin!
So how do you manage Airflow Connections in such a way that:
- sensitive part of a connection is hidden
- users can manage connections through code and commit them to a repository
- Airflow can use these connections when running DAGs
Read on!
Encryption (is the) key
Encryption is a way to conceal information by altering it so that it appears to be random data. (Source)
The AWS Key Management Service (KMS) allows us to create and manage encryption keys. These keys can be used to encrypt the contents of many AWS resources (buckets, disks, clusters...) but they can also be used to encrypt and decrypt user-provided strings.
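Under the hood, encrypting a string is a single KMS call. With the AWS CLI (v2) and a hypothetical key alias alias/airflow-secrets it would look like the snippet below - though in this setup only the Lambda's role has kms:Encrypt, so this is purely to illustrate what happens behind the scenes:

# Returns the base64-encoded ciphertext (CLI v2 expects inline
# binary parameters such as --plaintext to be base64-encoded)
aws kms encrypt \
  --key-id alias/airflow-secrets \
  --plaintext "$(echo -n 'your secret here' | base64)" \
  --query CiphertextBlob \
  --output text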
Your users (developers) need a developer-friendly way of encrypting strings without having access to the KMS key. Developers love APIs. Keep your developers happy and give them an API.
In this case, we have an API built with Powertools for AWS Lambda (Python) and Lambda Function URLs.
A custom Lambda function can be used to encrypt user-provided strings, or to generate and encrypt random strings. This covers two use cases:
- An administrator of an external system has created credentials for us, and we are now using them to create an Airflow connection
- We are creating credentials to a system we manage and will use those credentials to create an Airflow connection
Give a string to this Lambda
POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt
Accept: application/json
{
"encrypt_this": "mysql://username:password@hostname:3306/database"
}
and it returns something like this:
AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ==
Completely unreadable to you and me, and safe to commit to a repository. (If you recognized it: yes, it is base64-encoded. Try decoding it; the result is even less readable!)
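The API's code isn't reproduced in this article; here is a minimal sketch of what the /encrypt route could look like with Powertools for AWS Lambda and a Function URL. The KMS_KEY_ID environment variable and the handler layout are assumptions, not the exact implementation:

import base64
import os

import boto3
from aws_lambda_powertools.event_handler import LambdaFunctionUrlResolver

app = LambdaFunctionUrlResolver()
kms = boto3.client("kms")

# The key to encrypt with, e.g. a key alias such as
# "alias/airflow-secrets" (hypothetical name).
KMS_KEY_ID = os.environ["KMS_KEY_ID"]


@app.post("/encrypt")
def encrypt():
    # Request body: {"encrypt_this": "<plaintext connection string>"}
    plaintext = app.current_event.json_body["encrypt_this"]
    response = kms.encrypt(KeyId=KMS_KEY_ID, Plaintext=plaintext.encode())
    # CiphertextBlob is raw bytes; base64-encode it so it is safe to
    # return as JSON and to paste into Terraform code.
    return {
        "encrypted_value": base64.b64encode(response["CiphertextBlob"]).decode(),
        "message": "Encrypted a user-provided value.",
    }


def lambda_handler(event, context):
    return app.resolve(event, context)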
Create secrets with Terraform
That unreadable "sausage" from before can be used with Terraform, provided that the role Terraform runs under has permission to decrypt using the key that encrypted the original string.
data "aws_kms_secrets" "secret" {
secret {
name = "secret"
payload = "AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ=="
}
}
resource "aws_secretsmanager_secret" "secret" {
name = var.name
}
resource "aws_secretsmanager_secret_version" "string" {
secret_id = aws_secretsmanager_secret.secret.id
secret_string = data.aws_kms_secrets.secret.plaintext["secret"]
}
The code above will happily decrypt the encrypted string using a KMS key from your AWS account and store the decrypted value in SecretsManager.
Warning:
Terraform will store the decrypted secret in its state - take the necessary precautions to secure it. Anyone with access to the KMS key that encrypted the string can decrypt it. Keep your Terraform state secure, and your KMS keys even more so.
Using it in practice
Encrypt a MySQL connection string that Airflow will use:
http -b POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt encrypt_this='mysql://mysql_user:nb_6qaAI8CmkoI-FKxuK@hostname:3306/mysqldb'
{
"encrypted_value": "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw==",
"message": "Encrypted a user-provided value."
}
Use the "encrypted_value"
value with a Terraform module to create a secret
module "db_conn" {
source = "./modules/airflow_secret"
name = "airflow/connections/db"
encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw=="
}
after which you get a nice AWS SecretsManager secret.
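The module's internals aren't shown in this article; based on the resources from the previous section, ./modules/airflow_secret likely boils down to something like the following sketch (the variable declarations are assumptions):

variable "name" {
  type        = string
  description = "Name of the SecretsManager secret, e.g. airflow/connections/db"
}

variable "encrypted_string" {
  type        = string
  description = "Base64-encoded, KMS-encrypted connection string"
}

# Decrypt the ciphertext; the caller needs kms:Decrypt on the key
data "aws_kms_secrets" "secret" {
  secret {
    name    = "secret"
    payload = var.encrypted_string
  }
}

resource "aws_secretsmanager_secret" "secret" {
  name = var.name
}

resource "aws_secretsmanager_secret_version" "string" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = data.aws_kms_secrets.secret.plaintext["secret"]
}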
Warning:
Anyone with the secretsmanager:GetSecretValue permission will be able to read the secret. Keep access to your AWS SecretsManager secrets secured.
Configure Airflow to use the AWS SecretsManager backend
One of the great features of Airflow is the ability to set (and override) configuration parameters through environment variables. We can leverage this to configure MWAA to use a different secrets backend:
resource "aws_mwaa_environment" "this" {
airflow_configuration_options = {
"secrets.backend" = "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
"secrets.backend_kwargs" = "{\"connections_prefix\": \"airflow/connections\",\"variables_prefix\": \"airflow/variables\"}"
}
... rest of the config...
}
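As a side note, the escaped JSON string for secrets.backend_kwargs can also be produced with Terraform's built-in jsonencode(), which avoids escaping quotes by hand:

"secrets.backend_kwargs" = jsonencode({
  connections_prefix = "airflow/connections",
  variables_prefix   = "airflow/variables"
})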
With the secret in SecretsManager and Airflow configured to use SecretsManager as the backend, we can finally use the secret like any other connection. The backend maps a connection id to a secret name by prepending the connections_prefix, so a connection with the id db resolves to the secret airflow/connections/db.
An example DAG shows that fetching the secret works.
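The example DAG itself isn't reproduced here; a minimal sketch of a task that resolves the connection through the secrets backend could look like this (the DAG id and task are illustrative):

from datetime import datetime

from airflow import DAG
from airflow.hooks.base import BaseHook
from airflow.operators.python import PythonOperator


def check_connection():
    # With the SecretsManager backend configured, this looks up the
    # secret "airflow/connections/db" (connections_prefix + conn_id).
    conn = BaseHook.get_connection("db")
    print(f"Fetched connection for host: {conn.host}")


with DAG(
    dag_id="secrets_backend_demo",
    start_date=datetime(2023, 8, 1),
    schedule=None,
    catchup=False,
):
    PythonOperator(task_id="check_connection", python_callable=check_connection)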
With this proof-of-concept solution we were able to achieve the following:
- sensitive part of a connection is hidden
- users can manage connections through code and commit them to a repository
- Airflow can use these connections when running DAGs
One obvious downside of having encrypted strings in Git is that you can't understand what actually changed:
diff --git a/infra/secrets.tf b/infra/secrets.tf
index fe53e5f..7e85d90 100644
--- a/infra/secrets.tf
+++ b/infra/secrets.tf
@@ -9,5 +9,5 @@ module "db_conn" {
source = "./modules/airflow_secret"
name = "airflow/connections/db"
- encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwH8pYZHik8Cx0AZDM+ECML8AAAAnzCBnAYJ...XwF2a8zaxroJDDw=="
+ encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwGhmhBNcePnQmhjrTgozm6rAAAAnTCmgYJK...nLU8TVWkLDUsSfDs="
}
This encrypted approach also doesn't work well for connections that have no secrets in them, for example AWS connections that use IAM roles:
aws://?aws_iam_role=my_role_name&region_name=eu-west-1&aws_account_id=123456789123
If you would like to improve the cost-effectiveness of your MWAA setup, give this article by Vince Beck a read.
The nice things I said in my thoughts on MWAA 18 months ago are still valid to this day, and the speed of development on the aws-mwaa-local-runner has increased.
Until next time, keep those ETLs ET-elling!