Automate your Terraform using GitOps with Flux
Roberth Strand
Posted on December 29, 2022
GitOps as a workflow is perfect for application delivery, mostly used in Kubernetes environments, but it can also be used for infrastructure. In a typical GitOps scenario you might look at solutions like Crossplane as a Kubernetes-native alternative, while most traditional infrastructure is still managed with CI/CD pipelines. There are several benefits to building your deployment platform with Kubernetes as the base, but it also means that more people would need that particular skill set. One of the benefits of an Infrastructure-as-Code tool like Terraform is that it is easy to learn and doesn’t require much specialized knowledge.
When building our platform services, we wanted everyone to be able to contribute. Most, if not all, of our engineers use Terraform on a daily basis and know how to create Terraform modules that can be reused across scenarios and customers. While there are several ways of automating Terraform, we want to use a proper GitOps workflow as much as possible.
How does the Terraform controller work?
While searching for alternatives for running Terraform using Kubernetes, I found several controllers and operators, but none that I felt had as much potential as the tf-controller from Weaveworks. We are already using Flux as our GitOps tool, and the tf-controller works by utilizing some of the core functionality from Flux, and has a custom resource for Terraform deployments. The source controller takes care of fetching our modules, the kustomize controller applies the Terraform resources, and the tf-controller then spins up pods (called runners) that run your Terraform commands.
The Terraform resource looks something like this:
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  approvePlan: auto
  path: ./terraform/module
  sourceRef:
    kind: GitRepository
    name: helloworld
    namespace: flux-system
There are a few things to note in the spec here. The interval controls how often the controller starts up the runner pods, which then perform terraform plan on your root module, defined by the path parameter.
We also see that this particular resource is set to automatically approve any plans, which means that if the plan shows a difference between the desired configuration and the current state of the target system, a new runner will apply the changes automatically. This makes the process as “GitOps” as possible, but you can disable it. If you do, you have to approve plans manually, either by using the Terraform Controller CLI or by updating your manifests with a reference to the commit whose plan should be applied. For more details, see the documentation on manual approval.
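To give a rough idea of what manual approval looks like, here is a minimal sketch of the same resource with auto-approval turned off. The plan identifier below is only a placeholder; the actual value for a given commit is reported in the resource status, so check the manual approval documentation for the exact format.

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  # Placeholder plan reference - take the real value from the resource status
  approvePlan: plan-main-b8e362c206
  path: ./terraform/module
  sourceRef:
    kind: GitRepository
    name: helloworld
    namespace: flux-system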
As I mentioned earlier, the tf-controller utilizes the source controller from Flux. The sourceRef attribute defines which source resource we want to use, just like in a Flux Kustomization resource.
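For completeness, the helloworld source referenced above could be a plain Flux GitRepository along these lines; the repository URL is a placeholder, and the API version should match your Flux installation.

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: helloworld
  namespace: flux-system
spec:
  interval: 1m
  # Placeholder repository URL - point this at the repo holding your root module
  url: https://github.com/example/helloworld
  ref:
    branch: main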
Advanced deployments
While the example above works, it’s not the type of deployment we would normally do. When no backend is defined, the state is stored in the cluster, which is fine for testing and development, but for production we prefer that the state file is stored somewhere outside the cluster. We don’t want the backend defined in the root module directly, since we want to reuse our root modules across deployments, so we define it in our Terraform resource instead.
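If you are curious about the default in-cluster state, the controller keeps it in a Kubernetes secret next to the Terraform resource. The secret name below is an assumption based on the controller's default naming for the helloworld example, so verify it against your own cluster:

# Assumed default state secret name for the helloworld example above
kubectl -n flux-system get secret tfstate-default-helloworld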
Here is an example of how we set up a custom backend configuration. You can find all available backends in the Terraform docs.
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  backendConfig:
    customConfiguration: |
      backend "azurerm" {
        resource_group_name  = "rg-terraform-mgmt"
        storage_account_name = "stgextfstate"
        container_name       = "tfstate"
        key                  = "helloworld.tfstate"
      }
  ...
For us, storing the state file outside the cluster means that we can redeploy our cluster without any storage dependency. There is no need for backups or state migration. As soon as the new cluster is up, it runs the commands against the same state, and we are back in business.
Another advanced move is dependencies between modules. Sometimes we design deployments like a two-stage rocket, where one deployment sets up resources that the next one uses. In these scenarios, we need to write our Terraform so that the first module outputs any data needed as inputs for the second module, and we need to ensure that the first module has a successful run before the second one starts.
These two examples are from code used while demonstrating dependencies, and all the code can be found on my GitHub. Some of the resources are omitted for brevity’s sake.
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: shared-resources
  namespace: flux-system
spec:
  ...
  writeOutputsToSecret:
    name: shared-resources-output
  ...
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: workload01
  namespace: flux-system
spec:
  ...
  dependsOn:
    - name: shared-resources
  ...
  varsFrom:
    - kind: Secret
      name: shared-resources-output
  ...
In the deployment that I call shared-resources, we see that I defined a secret where the outputs from the deployment should be stored. In this case, the outputs are the following:
output "subnet_id" {
value = azurerm_virtual_network.base.subnet.*.id[0]
}
output "resource_group_name" {
value = azurerm_resource_group.base.name
}
Then, in the workload01 deployment, we first define our dependency with the dependsOn attribute, which makes sure that shared-resources has had a successful run before workload01 is scheduled. The outputs from shared-resources are then used as inputs in workload01, which is why we want it to wait.
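To make the hand-off concrete, here is a sketch of how the root module behind workload01 might declare the matching input variables. The controller reads the keys of the shared-resources-output secret and passes them as Terraform variables, so the variable names here are assumed to mirror the output names above.

# Hypothetical variable declarations in the workload01 root module
variable "subnet_id" {
  type = string
}

variable "resource_group_name" {
  type = string
}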
Why the controller pattern and not pipelines or Terraform Cloud
The most common approach to automating Terraform is either by using CI/CD pipelines or Terraform Cloud. Using pipelines for Terraform works fine, but usually ends up with us copying pipeline definitions over and over again. There are solutions to that, but by using the tf-controller we have a much more declarative approach to defining what we want our deployments to look like, rather than defining the steps in an imperative fashion.
Terraform Cloud has introduced a lot of features that overlap with the GitOps workflow, but using the tf-controller does not exclude you from using Terraform Cloud. You could use Terraform Cloud as the backend for your deployment, and only automate the runs through the tf-controller.
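As a sketch of that combination, the same customConfiguration mechanism shown earlier could point at Terraform Cloud’s remote backend; the organization and workspace names below are placeholders, not part of our setup.

apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: helloworld
  namespace: flux-system
spec:
  backendConfig:
    customConfiguration: |
      backend "remote" {
        organization = "example-org"   # placeholder organization
        workspaces {
          name = "helloworld"          # placeholder workspace
        }
      }
  ...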
The reason we use this approach is that we already deploy applications using GitOps, and we get much more flexibility in how we can offer these capabilities as a service. We can control our implementation through APIs, making self-service more accessible to both our operators and end users. The details of our platform approach are such a big topic that we will have to return to them in their own blog post.
Resources
- Terraform Controller: GitHub, Documentation
- Example deployments
- YouTube, How to achieve (actual) GitOps with Terraform and Kubernetes - Cloud Native and Kubernetes Oslo Meetup