GCP With Terraform: Refactor with Modules

Robert Nemet

Posted on September 13, 2023

This post is the fourth part of the series about using Terraform to manage GCP resources. In the first part, I did the basic setup of the project: the remote state file, state file encryption, and bucket creation. In the second part, I created a VPC and subnets and added some basic firewall rules and VMs. In the third part, I added more VPCs, subnets, firewall rules, and VMs. In this part, I will refactor the project to make it more manageable. It should be more DRY and easier to maintain while supporting multiple environments.

Current State

You can look at the project's current state here. The existing file structure is like this:

├── README.md
├── Taskfile.yml
└── gcp
    ├── base
    │   ├── .terraform-version
    │   ├── .terraform.lock.hcl
    │   ├── README.md
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── provider.tf
    │   ├── terraform.tfvars
    │   └── variables.tf
    ├── network
    │   ├── .terraform-version
    │   ├── .terraform.lock.hcl
    │   ├── README.md
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── peering.tf
    │   ├── provider.tf
    │   ├── terraform.tfvars
    │   ├── vpc_back_office.tf
    │   ├── vpc_datastorages.tf
    │   ├── vpc_services.tf
    │   └── variables.tf
    └── vms
        ├── .terraform-version
        ├── .terraform.lock.hcl
        ├── README.md
        ├── main.tf
        ├── back_office.tf
        ├── imports.tf
        ├── outputs.tf
        ├── provider.tf
        ├── terraform.tfvars
        └── variables.tf

The base directory contains the basic setup of the project: it creates a bucket for the remote state file and enables encryption of the state file. The network directory contains the VPCs, subnets, and firewall rules. The vms directory contains the VMs. The Taskfile.yml is used to run the tasks, and the README.md contains the documentation of the project.

There are several problems with this setup:

  • The network and vms modules handle more than they should. For example, the network module creates the VPCs, subnets, and firewall rules for all VPCs when it should handle only one VPC. As a result, whenever I change one VPC and run terraform plan/apply in the network module, the other VPCs are touched as well. If I have to work with others, this is a possible bottleneck. The vms module likewise creates VMs for all VPCs.
  • There is a lot of code duplication.
  • This structure does not support multiple environments. I can't have a dev and prod environment with this setup.
  • Changing one module often requires running terraform plan/apply in other modules (not always). For example, when adding a service account to the VMs, I had to run terraform plan/apply in both the network and vms modules. It is not a big deal, but it is annoying.

Step one: Split into environments

The first step is to split the project into environments. I will create two environments, dev and prod, as two directories at the project level. Then, I will copy the whole gcp directory into both the dev and prod directories.
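
A quick sketch of that shuffle from the shell; these commands are mine, not from the repo's Taskfile, and assume the current layout described above:

cp -r gcp dev    # dev gets its own copy of base, network, and vms
cp -r gcp prod   # prod gets its own copy as well
# once both environments are verified, the original gcp directory can be retired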

.
├── README.md
├── Taskfile.yml
├── dev
│   ├── base
│   ├── network
│   └── vms
└── prod
    ├── base
    ├── network
    └── vms

Why? With this change, I'm reducing the impact of changes. While in an ideal case these environments would be identical, in practice they are not. To be honest, you do not need the same level of resources in dev as in prod.

Step two: Create a module for each VPC

The network module is doing too much. It creates the VPC, subnets, and firewall rules for all VPCs; it would be much better if it created only one VPC with its subnets and firewall rules. The same applies to the vms module: it creates VMs for all VPCs, and it would be better if it created VMs for only one VPC.

If I put the network resources and the VMs for each VPC into one folder, containing network and vms modules, then each folder creates the VPC, subnets, firewall rules, and VMs for that one VPC. This way, I separate the network resources and VMs per VPC while still keeping them together, which reduces the impact of changes. So, my new file structure looks like this:

.
├── README.md
├── Taskfile.yml
├── dev
│   ├── base
│   ├── back-office
│   │   ├── network
│   │   └── vms
│   ├── services
│   │   ├── network
│   │   └── vms
│   └── storage
│       ├── network
│       └── vms
└── prod
    ├── base
    ├── back-office
    │   ├── network
    │   └── vms
    ├── services
    │   ├── network
    │   └── vms
    └── storage
        ├── network
        └── vms

In addition to this change, each module will keep only the code related to its target VPC. This change simplifies the code, increases readability, and reduces the impact of changes. At the same time, it increases the number of modules and the amount of duplicated code. Before dealing with code duplication, I'll make sure the code is working and that every module has the same structure.

What does that mean? Each module will have the same structure: main.tf, outputs.tf, provider.tf, terraform.tfvars, and variables.tf. The main.tf contains the code for creating resources, outputs.tf the module's outputs, provider.tf the provider configuration, terraform.tfvars the variable values, and variables.tf the variable declarations.
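
As a small sketch of that convention, a provider.tf might look like this (the project_id and region variables are my assumption, not necessarily the exact file from the repo):

provider "google" {
  project = var.project_id
  region  = var.region
}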

Refactoring the base module

This module is the most straightforward to refactor. There is little to change, except where the state file is stored. That is done by changing backend.prefix in:

terraform {
  required_version = ">=1.5.5"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.77.0"
    }
  }

  backend "gcs" {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/base"
  }
}

The state file for the base module will be in the terraform/state/dev/base directory. The previous state file was in the terraform/state/base. How do I move the state file? Manually? It can be done. But there is a better way. If I run the terraform plan in the base module, I will get the following output:

$ terraform plan
╷
│ Error: Backend initialization required: please run "terraform init"
│
│ Reason: Backend configuration block has changed
│
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags to
│ use the current configuration.
│
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.

As the output clearly says, I must run terraform init with either the -reconfigure or -migrate-state flags. I'll use the -migrate-state flag. This flag will migrate the state file to the new location. So, I'll run terraform init -migrate-state in the base module. The output will be:

$ terraform init -migrate-state

Initializing the backend...
Backend configuration changed!

Terraform has detected that the configuration specified for the backend
has changed. Terraform will now check for existing state in the backends.

Acquiring state lock. This may take a few moments...
Acquiring state lock. This may take a few moments...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "gcs" backend to the
  newly configured "gcs" backend. No existing state was found in the newly
  configured "gcs" backend. Do you want to copy this state to the new "gcs"
  backend? Enter "yes" to copy and "no" to start with an empty state.

  Enter a value: yes


Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/google v4.77.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Running terraform plan should now work, reporting no changes.

With this, the base is checked.

Refactoring the network module

The network module is a bit more complicated to refactor. Initially, it created the VPCs, subnets, and firewall rules for all VPCs. Now, it will create the VPC, subnets, and firewall rules for only one VPC. That means I have to split the changes per VPC. This is the easy part: a simple copy-paste will do the trick. But what about the state file? Let's look at the first VPC, back-office, for the dev environment:

...

  backend "gcs" {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/back-office/network"
  }
}

The state file for this network module will be under the terraform/state/dev/back-office/network prefix; the previous state file was under terraform/state/network. I could repeat the same technique as for the base module. However, I plan to remove all code related to the other VPCs, and once that code is gone, Terraform will say it wants to delete the corresponding resources. Why? Simple: the state file for the network module still contains resources belonging to the other VPCs, so when their code disappears, Terraform tries to delete them. That is not what I want; I want to keep those resources. So before I clean up the code, I'll move the state file with the -migrate-state flag by running terraform init -migrate-state in the dev/back-office/network module. Terraform migrates the state file. If you look into the bucket afterwards, you will see that the old state file is still there:

$ gsutil ls gs://terraform-states-network-playground-382512/terraform/state/network
gs://terraform-states-network-playground-382512/terraform/state/network/default.tfstate

So, the -migrate-state flag does not delete the old state file; it copies it to the new location. This makes life easier. After Terraform migrates the state file, I can clean up the code and remove everything related to the other VPCs.
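
Listing the new prefix from the backend configuration above should show the migrated copy as well, roughly:

$ gsutil ls gs://terraform-states-network-playground-382512/terraform/state/dev/back-office/network
gs://terraform-states-network-playground-382512/terraform/state/dev/back-office/network/default.tfstate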

The VPC peering is one thing that relates to all VPCs, so I need to refactor all of them before I can deal with it. I'll create a separate module in the dev environment named peering. This module will contain the code for creating VPC peering between all VPCs. For now, it is a copy of the old network module.

To continue refactoring the back-office VPC, I remove all code related to the other VPCs. When I run terraform plan this time, I get the expected output: Terraform will destroy the resources belonging to the other VPCs, reporting Plan: 0 to add, 0 to change, 12 to destroy. I want to keep those resources but remove them from the state file of the back-office network module. I can do that with the terraform state rm command, run for each resource belonging to another VPC. For example:

$ terraform state rm google_compute_subnetwork.subnet_postgres
Acquiring state lock. This may take a few moments...
Removed google_compute_subnetwork.subnet_postgres
Successfully removed 1 resource instance(s).

That is a lot of typing; luckily, you can pass more than one resource address to a single terraform state rm command (see the sketch below). I repeat this until terraform plan reports nothing to change, and then do the same for the other VPCs in the dev environment.
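
A sketch of removing several addresses in one command; the addresses here are illustrative examples of resources belonging to the other VPCs, not an exact list from the repo:

$ terraform state rm \
    google_compute_subnetwork.subnet_postgres \
    google_compute_network.services \
    google_compute_network.datastorages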

To refactor the dev/peerings module, I'll do the same thing as I did for the dev/back-office/network module. I'll migrate the state file and remove all code related to VPCs, except peerings. I have to import the VPC information for each VPC from their state files:

data "terraform_remote_state" "back_office" {
  backend = "gcs"

  config = {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/back-office/network"
  }
}

data "terraform_remote_state" "services" {
  backend = "gcs"

  config = {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/services/network"
  }
}

data "terraform_remote_state" "storage" {
  backend = "gcs"

  config = {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/storage/network"
  }
}

Of course, those values are exported in the outputs.tf file of each VPC:

output "vpc_services" {
  value = google_compute_network.services.self_link
}

output "vpc_services_subnetwork" {
  value = google_compute_subnetwork.services
}

Refactoring the vms module

The vms module situation resembles the network module. The only difference is that I must import the VPC information from the network modules. So, first, I'll migrate the state file, remove all code related to other VPCs, and then import the VPC information from the network module. I'll do this for each VPC in the dev environment.
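
The backend prefix for each vms module follows the same pattern; for dev/back-office/vms I would expect something like this (the exact prefix is my assumption, mirroring the network module above):

  backend "gcs" {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/back-office/vms"
  }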

See the final code here.

When done, inspect the code. Notice how much of it is duplicated or nearly identical once you ignore resource names and counts. That means the code can be reused.

Step three: Reuse the code with modules

There is a lot of code that is duplicated or almost the same. I want to make my code DRY. I can do that either by using the Google network module or by writing my own module. It is wiser to use the already existing Google network module.

Refactoring the dev/back-office/network module to use the Google network module is straightforward:

module "back-office" {
  source  = "terraform-google-modules/network/google"
  version = "~> 7.3"

  project_id   = var.project_id
  network_name = "back-office"
  routing_mode = "REGIONAL"


  subnets = [
    {
      subnet_name   = "back-office"
      subnet_ip     = "10.1.0.0/24"
      subnet_region = var.region
    },
    {
      subnet_name   = "back-office-private"
      subnet_ip     = "10.2.0.0/24"
      subnet_region = var.region
    }
  ]

  ingress_rules = [
    {
      name = "back-office-icmp"
      allow = [
        {
          protocol = "icmp"
        }
      ]
      source_ranges = [
        "0.0.0.0/0"
      ]
    },
    {
      name = "back-office-iap"
      allow = [
        {
          protocol = "tcp"
        }
      ]
      source_ranges = [
        "35.235.240.0/20"
      ]
      target_service_accounts = [google_service_account.back_office_fw_sa.email]
    }
  ]
}

Notice that now I am not creating resources directly. Instead, I am calling the module and passing variables to it. That is the way to reuse code in Terraform.

This code is more readable and easier to maintain. But, as in the previous step, working with the state file is tricky. This time, I also remove the old resource definitions from the code. Since I added a module, I need to run terraform init and then terraform plan. The plan tells me there are five resources to add and five to destroy. But I do not want to destroy anything; what I need to do is rename the resources in the state file. I do that with the terraform state mv command. For example:

$ terraform state mv google_compute_network.back_office  module.back-office.module.vpc.google_compute_network.network
Acquiring state lock. This may take a few moments...
Move "google_compute_network.back_office" to "module.back-office.module.vpc.google_compute_network.network"
Successfully moved 1 object(s).

$ terraform state mv google_compute_subnetwork.back_office_private 'module.back-office.module.subnets.google_compute_subnetwork.subnetwork["us-central1/back-office-private"]'
Move "google_compute_subnetwork.back_office_private" to "module.back-office.module.subnets.google_compute_subnetwork.subnetwork[\"us-central1/back-office-private\"]"
Successfully moved 1 object(s).

$ terraform state mv google_compute_subnetwork.back_office 'module.back-office.module.subnets.google_compute_subnetwork.subnetwork["us-central1/back-office"]'
Move "google_compute_subnetwork.back_office" to "module.back-office.module.subnets.google_compute_subnetwork.subnetwork[\"us-central1/back-office\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...

$ terraform state mv google_compute_firewall.back_office_icmp 'module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress["back-office-icmp"]'
Move "google_compute_firewall.back_office_icmp" to "module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress[\"back-office-icmp\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...

$ terraform state mv google_compute_firewall.back_office_iap 'module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress["back-office-iap"]'
Move "google_compute_firewall.back_office_iap" to "module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress[\"back-office-iap\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...

I also want to retain the same outputs, so I refactor the outputs.tf file:

output "vpc_back_office" {
  value = module.back-office.network_self_link
}

output "vpc_back_office_id" {
  value = module.back-office.network_id
}

output "vpc_back_office_subnetwork" {
  value = module.back-office.subnets["us-central1/back-office"]
}

output "vpc_back_office_private_subnetwork" {
  value = module.back-office.subnets["us-central1/back-office-private"]
}

output "back_office_fw_sa" {
  value = google_service_account.back_office_fw_sa.email
}

Now, I can run the terraform plan and see no changes. I can do the same for the other VPCs in the dev environment.

If you wonder how to know which resources to rename, run terraform plan and see which resources would be destroyed; those are the ones to rename, and the resources to be added give you their new addresses.
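
A quick way to enumerate the current addresses before moving them is terraform state list; before the moves, it shows roughly the old addresses seen above:

$ terraform state list
google_compute_network.back_office
google_compute_subnetwork.back_office
google_compute_subnetwork.back_office_private
google_compute_firewall.back_office_icmp
google_compute_firewall.back_office_iap
google_service_account.back_office_fw_sa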

Terraform will not allow you to move a resource to an address of a different type. So, you can't rename a subnet to a firewall rule; only a subnet to a subnet and a firewall rule to a firewall rule.

Reuse code in the vms module

In the previous step, I used an already provided module, but I can write my own module, too. For managing VMs, I'll do just that. I'll create the vms module in the <project root>/modules/vms directory. It will have two files, main.tf and variables.tf: main.tf contains the code for creating VMs, and variables.tf declares the module's variables. The main.tf looks like this:

resource "google_compute_instance" "virtual_machine" {
  name         = var.name
  machine_type = var.machine_type
  zone         = var.zone

  scheduling {
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }

  allow_stopping_for_update = var.allow_stopping_for_update

  dynamic "service_account" {

    for_each = var.sa_email != "" ? [var.sa_email] : []

    content {
      email  = service_account.value
      scopes = ["https://www.googleapis.com/auth/cloud-platform"]
    }
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    network    = var.network
    subnetwork = var.subnetwork
  }
}

In simple terms, look at how the code for creating VMs looks in each VPC: it is the same. You extract the common code and put it in the module, and every value that differs becomes a variable. Terraform does not have a classic if-then-else statement, so the optional service account is handled with a dynamic block. So, this is my VM template.
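
For completeness, a minimal variables.tf sketch for this module; the types and defaults are my assumptions rather than the exact file from the repo:

variable "name" {
  type = string
}

variable "machine_type" {
  type = string
}

variable "zone" {
  type = string
}

variable "network" {
  type = string
}

variable "subnetwork" {
  type = string
}

# Optional service account email; an empty string means no service_account block is rendered
variable "sa_email" {
  type    = string
  default = ""
}

variable "allow_stopping_for_update" {
  type    = bool
  default = false
}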

When using it in the dev/services/vms module, it will look like this:

module "services_vm_test" {
  source       = "../../../modules/vms"
  zone         = var.zone
  machine_type = "f1-micro"
  name         = "services-vm-test"
  network      = data.terraform_remote_state.vpc_services.outputs.vpc_services_id
  subnetwork   = data.terraform_remote_state.vpc_services.outputs.vpc_services_subnetwork.id
}

I could even hard-code machine_type in the module, but I want the flexibility to change it, so I leave it as a variable. For the dev/back-office/vms module, it looks like this:

module "vms" {
  for_each = toset(["back-office-vm2", "back-office-private-vm1", "back-office-private-vm2", "back-office-vm1"])

  source       = "../../../modules/vms"
  zone         = var.zone
  machine_type = "f1-micro"
  name         = each.value
  network      = data.terraform_remote_state.back_office.outputs.vpc_back_office_id
  subnetwork   = data.terraform_remote_state.back_office.outputs.vpc_back_office_subnetwork.id

  sa_email                  = "back-office-vm1" == each.value ? data.terraform_remote_state.back_office.outputs.back_office_fw_sa : ""
  allow_stopping_for_update = "back-office-vm1" == each.value ? true : false
}

As the machine name differs for each VM, I'm using a for_each block. The only exception is back-office-vm1: for this VM, I attach the service account I created for the firewall rules. That's why I use a conditional expression to check whether the VM is back-office-vm1; if it is, I pass the service account email, otherwise an empty string.

Reuse code in the peering module

First, let's see the code:

locals {
  peerings = {
    "back-office-services-peering" : {
      "network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
      "peer_network" : data.terraform_remote_state.services.outputs.vpc_services
    },
    "services-back-office-peering" : {
      "network" : data.terraform_remote_state.services.outputs.vpc_services,
      "peer_network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
    },
    "back-office-storage-peering" : {
      "network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
      "peer_network" : data.terraform_remote_state.storage.outputs.vpc_storage,
    },
    "storage-back-office-peering" : {
      "network" : data.terraform_remote_state.storage.outputs.vpc_storage,
      "peer_network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
    }
  }
}

resource "google_compute_network_peering" "peerings" {
  for_each = local.peerings

  name         = each.key
  network      = each.value.network
  peer_network = each.value.peer_network
}

First, I define the locals block. It contains a map of maps, each describing one peering. I cannot use variables here because values assigned in variables.tf or terraform.tfvars cannot reference other variables or data sources, so I have to use locals. In the locals block, I define a map per peering: the key is the peering name, and the value is a map with two keys, network and peer_network. This setup allows me to use for_each in the google_compute_network_peering resource.

The final result is here.

Conclusion

In this part, I refactored the project to make it more manageable: I split the project into environments, created a module per VPC, and reused code with modules. I used an already existing module and wrote one of my own. I used terraform init -migrate-state to move state files to new locations, terraform state mv to rename resource addresses inside the state, and terraform state rm to remove resources from the state file.

In the next part, let's try to use Terragrunt to make our code even more DRY and make running CLI commands easier.

Enjoy...
