One staging for each engineer: introducing Layerform

lucasfcosta

Lucas F. da Costa

Posted on August 29, 2023

Layerform allows developers to spin up multiple instances of their applications while reusing core parts of their infrastructure.

That way, companies can give each engineer their own staging environment. Consequently, engineers can prevent bugs from reaching production and show their work to others more easily. If this sounds useful, consider giving us a star on GitHub.

Assume you're running multiple pods in a Kubernetes cluster, for example. In that case, you can use Layerform so that each developer has their own pods and namespaces on top of a shared Kubernetes cluster.

In this post, I'll explain how Layerform allows engineers to do that by manipulating Terraform state.

First, I'll explain how Terraform states work and the problems with how they're currently implemented. Then, I'll explain how you could manipulate states to create resource replicas and reuse infrastructure. Finally, I'll show how Layerform itself uses similar state management techniques to enable development platforms that allow for resources to be reused.

What is a Terraform state?

Every Terraform setup has two parts: .tf files describing the desired state of your infrastructure and .tfstate files describing its current state.

Whenever you call terraform apply, it will compare the desired state described in your .tf files with the current state in your .tfstate files. Then, it will apply the differences.

Assume you want to create an S3 bucket in your AWS account. For that, you'll declare the desired state in your main.tf file.

# main.tf

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-unique-bucket-name-layerform"

  tags = {
    Name        = "foo"
  }
}

Then, when running terraform apply, Terraform will invoke the AWS provider, which uses the AWS SDK to call AWS's APIs and create your bucket.

After creating your bucket, Terraform will store information about the new bucket in a terraform.tfstate file. That file contains the actual state of your infrastructure.

{
  "version": 4,
  "terraform_version": "1.5.5",
  "serial": 1,
  "lineage": "ca8081999999999999999999999999999999",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "my_bucket",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "arn": "arn:aws:s3:::my-unique-bucket-name-layerform",
            "bucket": "my-unique-bucket-name-layerform",
            "tags": {
              "Name": "foo"
            },
            "tags_all": {
              "Name": "foo"
            },
            // ...
          }
        }
      ]
    }
  ],
  "check_results": null
}

The next time you change your main.tf file, Terraform will compare the desired state in main.tf with the actual state in terraform.tfstate to determine which changes it must apply.

Let's say you change your bucket's tags and run terraform apply, for example.

# main.tf

# ...

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-unique-bucket-name-layerform"

  tags = {
    Name        = "Look I changed something"
  }
}

When you do that, Terraform will compare the new desired state with the actual state in terraform.tfstate. Then, it will see that the tags and tags_all properties are different and tell you it plans to update these fields.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_s3_bucket.my_bucket will be updated in-place
  ~ resource "aws_s3_bucket" "my_bucket" {
        id                          = "my-unique-bucket-name-layerform"
      ~ tags                        = {
          ~ "Name" = "foo" -> "Look I changed something"
        }
      ~ tags_all                    = {
          ~ "Name" = "foo" -> "Look I changed something"
        }
        # (10 unchanged attributes hidden)

        # (3 unchanged blocks hidden)
    }

After confirming the apply, the terraform.tfstate file will now contain the updated tags.

{
  "version": 4,
  "terraform_version": "1.5.5",
  // ...
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "my_bucket",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            // ...
            "tags": {
              "Name": "Look I changed something"
            },
            "tags_all": {
              "Name": "Look I changed something"
            },
            // ...
          }
        }
      ]
    }
  ],
  "check_results": null
}

Note that .tfstate files are the only way for Terraform to know which resources exist. If you delete your .tfstate file and run terraform apply, Terraform will try to create your bucket again. If you proceed with the apply, it will throw an error because the bucket already exists.

Error: creating Amazon S3 (Simple Storage) Bucket (my-unique-bucket-name-layerform): bucket already exists

   with aws_s3_bucket.my_bucket,
   on main.tf line 7, in resource "aws_s3_bucket" "my_bucket":
    7: resource "aws_s3_bucket" "my_bucket" {

Manipulating state to create multiple buckets

Let's say a developer called Aaron wants to have his own bucket for testing purposes. For that, Aaron needs to duplicate the aws_s3_bucket resource and run terraform apply again.

# main.tf

# ...

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-unique-bucket-name-layerform"
}

resource "aws_s3_bucket" "aarons_bucket" {
  bucket = "aarons-unique-bucket-layerform"
}

The problem with this approach is that every developer who needs their own instance of a resource must copy and paste its resource block.

This problem compounds as you add resources that depend on one another.

For example, if these S3 buckets must contain an object, Aaron needs to duplicate both the aws_s3_bucket and the aws_s3_object within it, giving each copy a unique resource name.

# main.tf

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-unique-bucket-name-layerform"
}

resource "aws_s3_object" "my_configs" {
  bucket  = "my-unique-bucket-name-layerform"
  key     = "configs/.keep"
  content = "some configs"
}

resource "aws_s3_bucket" "aarons_bucket" {
  bucket = "aarons-unique-bucket-layerform"
}

resource "aws_s3_object" "aarons_configs" {
  bucket  = "aarons-unique-bucket-layerform"
  key     = "configs/.keep"
  content = "some configs"
}

As you can see, duplication quickly becomes unmanageable and error-prone.

Even if you use modules, there's only so much you can do until you're duplicating the module declarations themselves.

You could also try using the count and for_each meta-arguments, but there's only so much you can do until complexity hits you in the head. And let's be honest, if something is called a "meta-argument", it's probably not a good idea.
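For the record, a for_each version is workable. Here's a rough sketch — the developers list is purely illustrative — but note that every developer's bucket still lives in the same state file, so one person's terraform apply can touch everyone else's resources:

```hcl
# main.tf (sketch — the "developers" list is illustrative)

variable "developers" {
  type    = set(string)
  default = ["aaron", "dalton", "pete", "nicolas"]
}

resource "aws_s3_bucket" "dev_bucket" {
  # One bucket per developer, all tracked in a single state file
  for_each = var.developers
  bucket   = "${each.key}-unique-bucket-layerform"
}
```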

To avoid duplication, Aaron could tell Terraform to use his own empty .tfstate file when running terraform apply.

$ terraform apply -state="aaron.tfstate"

The problem with just using a fresh .tfstate file is that Terraform will try to create the bucket again and fail, because a bucket named my-unique-bucket-name-layerform already exists.

To avoid conflicts, Aaron should create a random_string resource and interpolate its value into the bucket's name.

# main.tf

# ...

resource "random_string" "prefix" {
  length  = 8
  upper   = false
  special = false
}

resource "aws_s3_bucket" "my_bucket" {
  bucket = "${random_string.prefix.result}-layerform-example"
}

Now, Aaron and his team can create as many buckets as they want by specifying a different .tfstate file every time. That way, everyone on the team can have their own buckets for testing and development purposes.

$ terraform apply -state=aaron.tfstate
$ terraform apply -state=dalton.tfstate
$ terraform apply -state=pete.tfstate
$ terraform apply -state=nicolas.tfstate

This approach will work because each .tfstate file will be empty, causing Terraform to generate a new ID every time. Consequently, each bucket will have a different name, and there won't be conflicts.
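Since the bucket names are now generated, it also helps to declare an output so each person can discover the name of their own bucket (a small, optional addition):

```hcl
# main.tf

output "bucket_name" {
  value = aws_s3_bucket.my_bucket.bucket
}
```

With that in place, running terraform output with the same state file (for example, terraform output -state=aaron.tfstate bucket_name) should print the bucket's generated name.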

Besides the practical advantage of not having to modify .tf files every time someone needs a new bucket, this approach also completely detaches the concept of a desired state from the concept of an actual state.

In Terraform's design, a declaration of the desired infrastructure is tied to a single actual state. That's a bad abstraction. In the real world, various instances of the desired infrastructure may exist. Consequently, having various actual states should be possible, too — one for each infrastructure instance.

Baking reusability into state files

Buckets are cheap. Consequently, it's okay for each person on Aaron's team to have their own bucket.

Now, let's imagine Aaron's system ran on top of a Kubernetes cluster. At the time of this writing, a managed Kubernetes cluster on AWS costs $0.10 per hour, which amounts to roughly $72 a month.

If Aaron, Dalton, Pete, and Nicolas all use brand new EKS clusters for their testing and development environments, that will cost their company $288 a month, and all four clusters will be underutilized, especially if they're provisioning large EC2 instances.

To save money, Aaron could share the Kubernetes cluster with his team and allow each colleague to deploy their testing pods to a new namespace. That way, they'd have a single cluster and pay only $72. Additionally, all pods could share the same EC2 instance, making it even cheaper to run these development environments.

For that, Aaron's first step would be to create an eks.tf file and declare an EKS cluster (Amazon's managed Kubernetes offering).

# eks.tf

module "shared_cluster" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "dev-eks-base-layerform"

  # I have intentionally skipped the VPC and node groups
  # configurations to keep these examples short and sweet.
}

When Aaron applies this file using terraform apply -state=base.tfstate, Terraform will generate a base.tfstate file containing the cluster's data.

After that, Aaron will create a pods.tf file, which declares a namespace and pods within that namespace. To avoid conflicts, Aaron will use random_string to generate the namespace's name.

# pods.tf

resource "random_string" "namespace_suffix" {
  length  = 8
  upper   = false
  special = false
}

resource "kubernetes_namespace" "user_ns" {
  metadata {
    name = "user-ns-${random_string.namespace_suffix.result}"
  }
}

resource "kubernetes_pod" "service_a" {
  metadata {
    name      = "service-a"
    namespace = kubernetes_namespace.user_ns.metadata[0].name
  }

  # ...
}

resource "kubernetes_pod" "service_b" {
  metadata {
    name      = "service-b"
    namespace = kubernetes_namespace.user_ns.metadata[0].name
  }

  # ...
}
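One detail these snippets gloss over is that the Kubernetes provider in pods.tf must be pointed at the shared EKS cluster. Assuming the EKS module from eks.tf is labeled shared_cluster, the wiring would look roughly like this — a sketch, since the exact output names depend on your version of the terraform-aws-modules/eks module:

```hcl
# providers.tf (sketch)

# Fetch a short-lived token to authenticate against the cluster
data "aws_eks_cluster_auth" "shared_cluster" {
  name = "dev-eks-base-layerform"
}

provider "kubernetes" {
  host                   = module.shared_cluster.cluster_endpoint
  cluster_ca_certificate = base64decode(module.shared_cluster.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.shared_cluster.token
}
```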

Aaron can then take a snapshot of the old base.tfstate by copying it to base-snapshot.tfstate. After that, he can create his development pods using terraform apply -state=base.tfstate.

That way, Terraform will create his pods on top of the existing Kubernetes cluster without modifying base-snapshot.tfstate.

Now, let's say Dalton also needs his own development pods but doesn't want to spend another $72 (and an extra 20 minutes) to create a new cluster for himself.

For that, Dalton could ask Aaron for the base-snapshot.tfstate file. Then, he'd run terraform apply -state=base-snapshot.tfstate to create his own set of pods within a new namespace in the same cluster.

Pete and Nicolas could do the same. All they need to create their development pods in the same cluster is to ask Aaron for the base-snapshot.tfstate and use it when running terraform apply.

The problem with sharing state snapshots

The problem with sharing state snapshots is that they must be kept up-to-date across all machines. What happens if Nicolas wants to change the cluster's name and write it in French, for example?

In that case, he'd have to apply the change and send the new .tfstate file to Aaron, Dalton, and Pete to download. Otherwise, their terraform apply runs would start failing.

Also, managing these files would become increasingly complicated if the team shared a Kafka instance within the cluster. In that case, everyone on Aaron's team would need both a base-snapshot.tfstate and a kafka-snapshot.tfstate and would have to merge them into a single file before running terraform apply.

Layerform solves that problem by encapsulating your Terraform files into what we call layer definitions.

In Aaron's case, if he were to use Layerform, he'd create one layer definition referencing his eks.tf file and another referencing pods.tf. These would be the eks and the pods layers, respectively.

For the pods layer definition, Aaron would say it depends on an instance of the eks layer.

Then, when running layerform spawn eks, Layerform would apply the eks.tf file and associate the resulting .tfstate with the ID default.

After that, if Pete runs layerform spawn pods, Layerform will pull the state for the default instance of eks. Then, it will merge the eks.tf and pods.tf files and apply them using that base state.

Dalton and Nicolas could do the same. When they run layerform spawn pods, Layerform will look for the default state for eks and spin up their layer instances on top of it.

Storing state snapshots

Similarly to Terraform, Layerform can use a remote backend to store the states for the different instances and associate them with their IDs.

The difference between Layerform and Terraform is that Layerform can pick and merge the different states based on their IDs.

Other things you can do with layers

By breaking down your infrastructure into layers, you can have each team care for a particular part of the infrastructure.

For example, the platform team could put together the configurations for a Kubernetes cluster and Kafka instance.

The other teams could then build their layers on top of the platform team's. That way, each team takes care of its own infra and may spin up multiple instances of it without relying on numerous "matching strings" coming from Terraform's data sources.
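To make the "matching strings" problem concrete: without layers, a consuming team typically reaches for data sources like the ones below, where the cluster name is a bare string that must match whatever the platform team chose — nothing enforces that it stays in sync (the names here are illustrative):

```hcl
# The name below must match the platform team's cluster name
# by convention alone — a typo only surfaces at plan time.
data "aws_eks_cluster" "shared" {
  name = "dev-eks-base-layerform"
}

data "aws_eks_cluster_auth" "shared" {
  name = "dev-eks-base-layerform"
}
```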

Besides encapsulation, layers are also helpful for cost control and chargebacks. By breaking down your infrastructure into layers, Layerform can automatically tag the resources in each layer instance so you know which teams' layers cost the most.

Summarising layers

At the end of the day, we can summarise Layerform as an intelligent way to break down desired states and dissociate them from actual states. That way, multiple actual states can exist for a particular desired state definition.

In a way, layers and their instances are like classes and objects. The Terraform files for each layer are a "class," and each instance is an "object" with that class's properties.

Questions?

If you've got questions, feel free to use this repository's issue tracker or book a few minutes with me.

If this post has been helpful, please consider starring the repo or sharing it on HackerNews and Reddit.
