Manage Airflow connections with Terraform and AWS SecretsManager

ivicak

Ivica Kolenkaš

Posted on August 15, 2023

Managing infrastructure as code brings speed and consistency, and it makes the software development lifecycle more efficient and predictable. Infrastructure for your ETL/orchestration tool is managed with code, so why not manage the secrets that your tool uses with code as well?

This article shows a proof-of-concept implementation of how to manage Airflow secrets through Terraform and keep them committed to a code repository.

Caveats/assumptions:

  • Users (developers) on the AWS account don't have permissions to retrieve secret values from SecretsManager
  • Terraform uses remote state with appropriate security measures in place
  • The IAM role used by Terraform has the permissions needed to manage a wide range of AWS services and resources

TL;DR

Repository with example code.

Architecture

Solution architecture diagram

How it works

Intro to Airflow Connections

Airflow can connect to various systems, such as databases, SFTP servers or S3 buckets. To connect, it needs credentials. Connections are an Airflow concept to store those credentials. They are a great way to configure access to an external system once and use it multiple times.

More on Connections.

A Connection can be expressed as a string; for example, a connection to a MySQL database may look like this:



mysql://username:password@hostname:3306/database



Airflow understands this format and can use it to connect to the database for which the connection was configured.
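To make the format concrete, here is a minimal sketch that splits such a connection string into its parts using only Python's standard library. It is not how Airflow parses connections internally; the helper name `split_connection_uri` and the field names in the returned dictionary are made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def split_connection_uri(uri: str) -> dict:
    """Hypothetical helper: split a connection URI into its individual fields."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        # Credentials may be percent-encoded inside a URI, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        # The path component carries the database name.
        "schema": parts.path.lstrip("/") or None,
    }


fields = split_connection_uri("mysql://username:password@hostname:3306/database")
print(fields)
```

The password is just one field among several, which is why the whole string has to be treated as sensitive.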

Connections can be configured through environment variables, in an external secrets backend (our use case) and in the internal Airflow database.

A centralized way of managing connections becomes a necessity as soon as your proof-of-concept goes live or you are working in a team.

External secrets backends

Airflow supports multiple external secrets backends, such as AWS SecretsManager, Azure KeyVault and Hashicorp Vault.

Connection details are read from these backends when a connection is used. This keeps the sensitive part of the connection, such as a password, secure and minimizes the attack surface.

AWS SecretsManager backend

Configuring your Airflow deployment to use AWS SecretsManager is well explained on this page.

Creating AWS SecretsManager secrets with Terraform is done in a simple way:



resource "aws_secretsmanager_secret" "secret" {
  name = "my-precious"
}

resource "aws_secretsmanager_secret_version" "string" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = "your secret here"
}



but committing this to a code repository is a cardinal sin: the secret would sit in the repository in plain text!

So how do you manage Airflow Connections in such a way that:

  • sensitive part of a connection is hidden
  • users can manage connections through code and commit them to a repository
  • Airflow can use these connections when running DAGs

Read on!

Encryption (is the) key

Encryption is a way to conceal information by altering it so that it appears to be random data. Source

The AWS Key Management Service (KMS) allows us to create and manage encryption keys. These keys can be used to encrypt the contents of many AWS resources (buckets, disks, clusters...) but they can also be used to encrypt and decrypt user-provided strings.

User encrypts a string

Your users (developers) need a developer-friendly way of encrypting strings without having access to the KMS key. Developers love APIs. Keep your developers happy and give them an API.

In this case, we have an API built with Powertools for AWS Lambda (Python) and Lambda Function URLs.

A custom Lambda function can be used either to encrypt a user-provided string, or to generate a random string and encrypt it. This covers two use cases:

  • Administrator of an external system has created credentials for us and we are now using them to create an Airflow connection
  • We are creating credentials to a system we manage and will use those credentials to create an Airflow connection
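The core of such a Lambda could look roughly like the sketch below. It is not the code from the example repository; the function names are hypothetical, and the KMS client is passed in as a parameter so that the logic stays independent of how the Lambda is wired up. The only KMS call assumed is `encrypt`, which returns the ciphertext as bytes under `CiphertextBlob`.

```python
import base64
import secrets


def encrypt_value(kms_client, key_id: str, plaintext: str) -> str:
    """Encrypt a string with KMS and return it as base64 text."""
    response = kms_client.encrypt(KeyId=key_id, Plaintext=plaintext.encode("utf-8"))
    # The ciphertext is binary; base64 makes it safe to paste into a .tf file.
    return base64.b64encode(response["CiphertextBlob"]).decode("ascii")


def generate_and_encrypt(kms_client, key_id: str, length: int = 32) -> str:
    """Generate a random secret and return only its encrypted form."""
    return encrypt_value(kms_client, key_id, secrets.token_urlsafe(length))


# In a real Lambda the client would come from boto3 (assumed to be installed),
# and the key alias below is a made-up example:
#   import boto3
#   kms_client = boto3.client("kms")
#   encrypt_value(kms_client, "alias/airflow-secrets", "mysql://...")
```

Because only the Lambda's role needs `kms:Encrypt` on the key, developers can produce encrypted strings without ever having access to the key themselves.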

Give a string to this Lambda



POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt
Accept: application/json

{
"encrypt_this": "mysql://username:password@hostname:3306/database"
}



and it returns something like this:



AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ==



Completely unreadable to you and me and safe to commit to a repository. (If you recognized it, yes, it is base64 encoded. Try decoding it; even less readable!)

Create secrets with Terraform

That unreadable "sausage" from before can be used with Terraform, provided that the IAM role Terraform runs with is allowed to decrypt with the key that encrypted the original string.



data "aws_kms_secrets" "secret" {
  secret {
    name    = "secret"
    payload = "AQICAHjTAGlNShkkcAYzHludk...IhvcNAQcGoIGOMIGLAgEAMIGFBg/AluidQ=="
  }
}

resource "aws_secretsmanager_secret" "secret" {
  name = var.name
}

resource "aws_secretsmanager_secret_version" "string" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = data.aws_kms_secrets.secret.plaintext["secret"]
}



The code above decrypts the encrypted string using a KMS key from your AWS account and stores the decrypted value in SecretsManager.

Warning:

The decrypted secret will also be stored in the Terraform state, so take the necessary precautions to secure it. Anyone with permission to decrypt with the KMS key that encrypted the string can decrypt it.

Keep your Terraform state secure and your KMS keys secure-er(?).
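To see why access to the key matters so much, here is a rough Python sketch of the decryption step that the data source performs on Terraform's behalf. It is an illustration under assumptions, not Terraform's actual implementation: the function name is hypothetical and the KMS client is passed in as a parameter. For symmetric keys, the KMS `decrypt` call does not need the key ID, because the ciphertext already identifies the key that produced it.

```python
import base64


def decrypt_payload(kms_client, payload: str) -> str:
    """Turn a committed base64 payload back into the original string."""
    ciphertext = base64.b64decode(payload)
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")


# With boto3 (assumed to be installed) and kms:Decrypt permission on the key:
#   import boto3
#   decrypt_payload(boto3.client("kms"), "<encrypted string from the repository>")
```

A few lines are enough to recover the connection string, so the committed value is only as safe as the permissions on the key.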

Using it in practice

Encrypt a MySQL connection string that Airflow will use:



http -b POST https://xxxxxx.lambda-url.eu-central-1.on.aws/encrypt encrypt_this='mysql://mysql_user:nb_6qaAI8CmkoI-FKxuK@hostname:3306/mysqldb'
{
    "encrypted_value": "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw==",
    "message": "Encrypted a user-provided value."
}



Use the "encrypted_value" value with a Terraform module to create a secret:



module "db_conn" {
  source = "./modules/airflow_secret"

  name             = "airflow/connections/db"
  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7f...zaxroJDDw=="
}



after which you get a nice AWS SecretsManager secret.

AWS SecretsManager secret

Warning:

Anyone with the secretsmanager:GetSecretValue permission will be able to read the secret. Keep access to your AWS SecretsManager secrets secured.

Configure Airflow to use the AWS SecretsManager backend

One of the great features of Airflow is that configuration parameters can be set (and overridden) through environment variables. MWAA exposes the same mechanism through its Airflow configuration options, which we can use to point it at a different secrets backend:



resource "aws_mwaa_environment" "this" {
  airflow_configuration_options = {
    "secrets.backend"               = "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
    "secrets.backend_kwargs"        = "{\"connections_prefix\": \"airflow/connections\",\"variables_prefix\": \"airflow/variables\"}"
  }
  # ... rest of the config ...
}



With the secret in SecretsManager and Airflow configured to use SecretsManager as its secrets backend, we can finally use the connection the usual way, by referring to its connection ID in a DAG.
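The link between a connection ID and a secret name is the prefix from the backend configuration. The sketch below shows that mapping, assuming the backend's default `/` separator; the helper `secret_name_for` is hypothetical and only mirrors the naming convention, it does not call Airflow or AWS.

```python
import json

# Same JSON string as the "secrets.backend_kwargs" option in the MWAA config.
backend_kwargs = json.loads(
    '{"connections_prefix": "airflow/connections", '
    '"variables_prefix": "airflow/variables"}'
)


def secret_name_for(conn_id: str, prefix: str, sep: str = "/") -> str:
    """Hypothetical helper: join the configured prefix and the connection ID."""
    return f"{prefix}{sep}{conn_id}"


name = secret_name_for("db", backend_kwargs["connections_prefix"])
print(name)  # airflow/connections/db

# Inside a DAG task (requires Airflow, so it is only shown as a comment):
#   from airflow.hooks.base import BaseHook
#   conn = BaseHook.get_connection("db")
```

This is why the Terraform module names the secret "airflow/connections/db": a DAG that asks for the connection "db" ends up reading exactly that secret.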

The example DAG shows that fetching the secret works.

Airflow logs showing the secret value


With this proof-of-concept solution we were able to achieve the following:

  • sensitive part of a connection is hidden
  • users can manage connections through code and commit them to a repository
  • Airflow can use these connections when running DAGs

One obvious downside of having encrypted strings in Git is that you can't understand what actually changed:



diff --git a/infra/secrets.tf b/infra/secrets.tf
index fe53e5f..7e85d90 100644
--- a/infra/secrets.tf
+++ b/infra/secrets.tf
@@ -9,5 +9,5 @@ module "db_conn" {
   source = "./modules/airflow_secret"

   name             = "airflow/connections/db"
-  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwH8pYZHik8Cx0AZDM+ECML8AAAAnzCBnAYJ...XwF2a8zaxroJDDw=="
+  encrypted_string = "AQICAHjTAGlNShkkcAYzHl8C2qXs7fs5x9gByXim/PPuwt+TuwGhmhBNcePnQmhjrTgozm6rAAAAnTCmgYJK...nLU8TVWkLDUsSfDs="
 }



This encrypted approach also doesn't work well for connections that have no secrets in them, for example AWS connections that use IAM roles:



aws://?aws_iam_role=my_role_name&region_name=eu-west-1&aws_account_id=123456789123



If you would like to improve the cost-effectiveness of your MWAA setup, give this article by Vince Beck a read.

The nice things I said in my thoughts on MWAA 18 months ago still hold to this day, and the speed of development on the aws-mwaa-local-runner has increased.

Until next time, keep those ETLs ET-elling!
