Robert Nemet
Posted on September 13, 2023
This post is the fourth part of the series about using Terraform to manage GCP resources. In the first part, I did the basic setup of the project: the remote state file, state file encryption, and bucket creation. In the second part, I created a VPC and subnets and added some basic firewall rules and VMs. In the third part, I added more VPCs, subnets, firewall rules, and VMs. In this part, I will refactor the project to make it more manageable: more DRY and easier to maintain, while supporting multiple environments.
Current State
You can look at the project's current state here. The existing file structure is like this:
├── README.md
├── Taskfile.yml
└── gcp
    ├── base
    │   ├── .terraform-version
    │   ├── .terraform.lock.hcl
    │   ├── README.md
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── provider.tf
    │   ├── terraform.tfvars
    │   └── variables.tf
    ├── network
    │   ├── .terraform-version
    │   ├── .terraform.lock.hcl
    │   ├── README.md
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── peering.tf
    │   ├── provider.tf
    │   ├── terraform.tfvars
    │   ├── vpc_back_office.tf
    │   ├── vpc_datastorages.tf
    │   ├── vpc_services.tf
    │   └── variables.tf
    └── vms
        ├── .terraform-version
        ├── .terraform.lock.hcl
        ├── README.md
        ├── main.tf
        ├── back_office.tf
        ├── imports.tf
        ├── outputs.tf
        ├── provider.tf
        ├── terraform.tfvars
        └── variables.tf
The base directory contains the basic setup of the project: it creates the bucket for the remote state file and enables state file encryption. The network directory contains the VPCs, subnets, and firewall rules. The vms directory contains the VMs. The Taskfile.yml is used to run the tasks, and the README.md contains the project documentation.
There are several problems with this setup:
- Modules network and vms handle more than they should. For example, the network module creates the VPCs, subnets, and firewall rules for all VPCs, when it should handle only one VPC. As it is, whenever I change one VPC and run terraform plan/apply in the network module, the other VPCs are touched as well. If I have to work with others, this is a potential bottleneck. The vms module has the same problem: it creates the VMs for all VPCs.
- There is a lot of code duplication.
- This structure does not support multiple environments. I can't have a dev and a prod environment with this setup.
- Changing one module sometimes requires running terraform plan/apply in other modules as well. For example, when adding a service account to the VMs, I had to run terraform plan/apply in both the network and vms modules. It is not a big deal, but it is annoying.
Step one: Split into environments
The first step is to split the project into environments. I will create two environments, dev and prod, as two directories at the project level. Then, I will copy the whole gcp directory into both the dev and prod directories.
.
├── README.md
├── Taskfile.yml
├── dev
│   ├── base
│   ├── network
│   └── vms
└── prod
    ├── base
    ├── network
    └── vms
Why? With this change, I'm reducing the impact of changes. While these environments should be identical in an ideal case, in practice they are not. To be honest, you do not need the same level of resources in dev as in prod.
Step two: Create a module for each VPC
The network module is doing too much: it creates the VPCs, subnets, and firewall rules for all VPCs. It would be much better if it created only one VPC with its subnets and firewall rules. The same applies to the vms module, which creates the VMs for all VPCs; it would be better if it created the VMs for only one VPC.
If I create one folder per VPC, each containing network and vms modules, I can create the VPC, subnets, firewall rules, and VMs for that VPC in one place. This way, I'm separating the network resources and VMs per VPC while still keeping them together, which reduces the impact of changes. So, my new file structure looks like this:
.
├── README.md
├── Taskfile.yml
├── dev
│   ├── base
│   ├── back-office
│   │   ├── network
│   │   └── vms
│   ├── services
│   │   ├── network
│   │   └── vms
│   └── storage
│       ├── network
│       └── vms
└── prod
    ├── base
    ├── back-office
    │   ├── network
    │   └── vms
    ├── services
    │   ├── network
    │   └── vms
    └── storage
        ├── network
        └── vms
In addition to this change, each module will keep only the code related to its target VPC. This simplifies the code, increases readability, and reduces the impact of changes. At the same time, it increases the number of modules and the amount of code duplication. Before dealing with the duplication, I'll ensure the code works and that every module has the same structure.
What does that mean? Each module will have the same structure: main.tf, outputs.tf, provider.tf, terraform.tfvars, and variables.tf. The main.tf contains the code for creating resources, outputs.tf contains the module's outputs, provider.tf contains the provider configuration, variables.tf declares the module's variables, and terraform.tfvars supplies their values.
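For the base module in dev, for example, the variable declarations and their values might look roughly like this. This is only a sketch: the variable names and the project ID value are my assumptions, inferred from how var.project_id and var.region are used later in this post.

# variables.tf (sketch): declarations only
variable "project_id" {
  type        = string
  description = "GCP project that owns the resources"
}

variable "region" {
  type        = string
  description = "Default region for regional resources"
}

# terraform.tfvars (sketch): the values for this environment
# The project ID here is illustrative, not the real one.
project_id = "my-playground-project"
region     = "us-central1"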
Refactoring the base module
This module is the most straightforward to refactor. There is little to change except where the state file is stored, which is done by changing backend.prefix in:
terraform {
  required_version = ">=1.5.5"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.77.0"
    }
  }

  backend "gcs" {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/base"
  }
}
The state file for the base module will now live under the terraform/state/dev/base prefix, while the previous state file was under terraform/state/base. How do I move the state file? Manually? That could be done, but there is a better way. If I run terraform plan in the base module, I get the following output:
$ terraform plan
╷
│ Error: Backend initialization required: please run "terraform init"
│
│ Reason: Backend configuration block has changed
│
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags to
│ use the current configuration.
│
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.
As the output clearly says, I must run terraform init with either the -reconfigure or the -migrate-state flag. I'll use -migrate-state, which migrates the state file to the new location. So, I'll run terraform init -migrate-state in the base module. The output will be:
$ terraform init -migrate-state
Initializing the backend...
Backend configuration changed!
Terraform has detected that the configuration specified for the backend
has changed. Terraform will now check for existing state in the backends.
Acquiring state lock. This may take a few moments...
Acquiring state lock. This may take a few moments...
Do you want to copy existing state to the new backend?
Pre-existing state was found while migrating the previous "gcs" backend to the
newly configured "gcs" backend. No existing state was found in the newly
configured "gcs" backend. Do you want to copy this state to the new "gcs"
backend? Enter "yes" to copy and "no" to start with an empty state.
Enter a value: yes
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/google v4.77.0
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Running terraform plan should now work, reporting no changes.
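If everything went well, the plan output ends with something like this (trimmed; the exact wording may differ between Terraform versions):

$ terraform plan
...
No changes. Your infrastructure matches the configuration.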
With this, the base is checked.
Refactoring the network module
The network module is a bit more complicated to refactor. Initially, it created the VPCs, subnets, and firewall rules for all VPCs; now each copy will create the VPC, subnets, and firewall rules for only one VPC. That means I have to split the changes per VPC. That is the easy part: a simple copy-paste will do the trick. But what about the state file? Let's look at the first VPC, back-office, in the dev environment:
...
backend "gcs" {
bucket = "terraform-states-network-playground-382512"
prefix = "terraform/state/dev/back-office/network"
}
}
The state file for the network module will now live under the terraform/state/dev/back-office/network prefix, while the previous state file was under terraform/state/network. I could repeat the same technique as for the base module. However, I also plan to remove all code related to the other VPCs, and when I do that, Terraform will say it wants to delete the resources belonging to those VPCs. Why? Simple: the state file for the network module still contains the resources of the other VPCs, so once their code is gone, Terraform sees them only in the state and tries to delete them. That is not what I want; I want to keep those resources. So, before I clean up the code, I'll move the state file with the -migrate-state flag by running terraform init -migrate-state in the dev/back-office/network module. Terraform migrates the state file, and if you look into the bucket afterwards, you will see that the old state file is still in its original location:
$ gsutil ls gs://terraform-states-network-playground-382512/terraform/state/network
gs://terraform-states-network-playground-382512/terraform/state/network/default.tfstate
So, the -migrate-state flag does not delete the old state file; it copies the state to the new location, which makes life easier. After Terraform migrates the state file, I can clean up the code and remove everything related to the other VPCs.
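Before the cleanup, it does not hurt to confirm that the migrated copy landed under the new prefix. Based on the backend block above, the listing should look roughly like this (the path is inferred from the configuration, not taken from the original run):

$ gsutil ls gs://terraform-states-network-playground-382512/terraform/state/dev/back-office/network
gs://terraform-states-network-playground-382512/terraform/state/dev/back-office/network/default.tfstate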
The VPC peering is the one thing that touches all VPCs, so I need to refactor all of them before I can deal with it. I'll create a separate module in the dev environment named peering. This module will contain the code for creating the VPC peerings between all VPCs; for now, it is a copy of the old network module.
To continue refactoring the back-office VPC, I'll remove all code related to the other VPCs. When I run terraform plan this time, I get the expected output: Terraform wants to delete the resources belonging to the other VPCs, ending with the message Plan: 0 to add, 0 to change, 12 to destroy.
I want to keep those resources but remove them from the state file of the back-office network module. I can do that with the terraform state rm command, running it for each resource that belongs to another VPC. For example:
$ terraform state rm google_compute_subnetwork.subnet_postgres
Acquiring state lock. This may take a few moments...
Removed google_compute_subnetwork.subnet_postgres
Successfully removed 1 resource instance(s).
That is a lot of typing, but you can pass more than one resource address to a single terraform state rm command. I repeat this until terraform plan reports that there is nothing to change, and then I do the same for the other VPCs in the dev environment.
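For instance, the network and subnet of the services VPC could be dropped from this state in one go. The addresses below come from the pre-refactor code and are meant as an illustration:

$ terraform state rm google_compute_network.services google_compute_subnetwork.services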
To refactor the dev/peerings module, I'll do the same as for the dev/back-office/network module: migrate the state file and remove all code except the peerings. I also have to import the VPC information for each VPC from its own state file:
data "terraform_remote_state" "back_office" {
backend = "gcs"
config = {
bucket = "terraform-states-network-playground-382512"
prefix = "terraform/state/dev/back-office/network"
}
}
data "terraform_remote_state" "services" {
backend = "gcs"
config = {
bucket = "terraform-states-network-playground-382512"
prefix = "terraform/state/dev/services/network"
}
}
data "terraform_remote_state" "storage" {
backend = "gcs"
config = {
bucket = "terraform-states-network-playground-382512"
prefix = "terraform/state/dev/storage/network"
}
}
Of course, those values are exported in the outputs.tf file of each VPC's network module:
output "vpc_services" {
value = google_compute_network.services.self_link
}
output "vpc_services_subnetwork" {
value = google_compute_subnetwork.services
}
Refactoring the vms module
The vms module situation resembles the network module. The only difference is that I must import the VPC information from the network modules. So, first, I'll migrate the state file, then remove all code related to the other VPCs, and finally import the VPC information from the corresponding network module. I'll do this for each VPC in the dev environment.
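That import is the same kind of terraform_remote_state lookup used in the peering module. For dev/back-office/vms, a sketch of it would point at the back-office network state, mirroring the blocks shown earlier:

data "terraform_remote_state" "back_office" {
  backend = "gcs"
  config = {
    bucket = "terraform-states-network-playground-382512"
    prefix = "terraform/state/dev/back-office/network"
  }
}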
See the final code here.
When done, inspect the code and notice how much of it is duplicated or almost identical, once you ignore resource names and the number of resources. That means the code can be reused.
Step three: Reuse the code with modules
There is a lot of code that is duplicated or almost the same, and I want to make it DRY. I can do that either by using the Google network module or by writing my own module. It is wiser to use the already existing Google network module.
Refactoring the dev/back-office/network module to use the Google network module is straightforward:
module "back-office" {
source = "terraform-google-modules/network/google"
version = "~> 7.3"
project_id = var.project_id
network_name = "back-office"
routing_mode = "REGIONAL"
subnets = [
{
subnet_name = "back-office"
subnet_ip = "10.1.0.0/24"
subnet_region = var.region
},
{
subnet_name = "back-office-private"
subnet_ip = "10.2.0.0/24"
subnet_region = var.region
}
]
ingress_rules = [
{
name = "back-office-icmp"
allow = [
{
protocol = "icmp"
}
]
source_ranges = [
"0.0.0.0/0"
]
},
{
name = "back-office-iap"
allow = [
{
protocol = "tcp"
}
]
source_ranges = [
"35.235.240.0/20"
]
target_service_accounts = [google_service_account.back_office_fw_sa.email]
allow = [
{
protocol = "tcp"
}
]
depends_on = [google_service_account.back_office_fw_sa]
}
]
}
Notice that now I am not creating resources directly. Instead, I am calling the module and passing variables to it. That is the way to reuse code in Terraform.
This code is more readable and easier to maintain. But, as in the previous step, working with the state file is tricky. This time, I'll remove all the old resource definitions from the code. Since I have added a module, I need to run terraform init and then terraform plan. The plan tells me that there are five resources to add and five to destroy, but I do not want to destroy anything. What I need to do is rename the resources in the state file so they match the module's addresses. I'll do that with the terraform state mv command. For example:
$ terraform state mv google_compute_network.back_office module.back-office.module.vpc.google_compute_network.network
Acquiring state lock. This may take a few moments...
Move "google_compute_network.back_office" to "module.back-office.module.vpc.google_compute_network.network"
Successfully moved 1 object(s).
$ terraform state mv google_compute_subnetwork.back_office_private 'module.back-office.module.subnets.google_compute_subnetwork.subnetwork["us-central1/back-office-private"]'
Move "google_compute_subnetwork.back_office_private" to "module.back-office.module.subnets.google_compute_subnetwork.subnetwork[\"us-central1/back-office-private\"]"
Successfully moved 1 object(s).
$ terraform state mv google_compute_subnetwork.back_office 'module.back-office.module.subnets.google_compute_subnetwork.subnetwork["us-central1/back-office"]'
Move "google_compute_subnetwork.back_office" to "module.back-office.module.subnets.google_compute_subnetwork.subnetwork[\"us-central1/back-office\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...
$ terraform state mv google_compute_firewall.back_office_icmp 'module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress["back-office-icmp"]'
Move "google_compute_firewall.back_office_icmp" to "module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress[\"back-office-icmp\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...
$ terraform state mv google_compute_firewall.back_office_iap 'module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress["back-office-iap"]'
Move "google_compute_firewall.back_office_iap" to "module.back-office.module.firewall_rules.google_compute_firewall.rules_ingress_egress[\"back-office-iap\"]"
Successfully moved 1 object(s).
Releasing state lock. This may take a few moments...
I also want to retain the same outputs, so I refactor the outputs.tf file:
output "vpc_back_office" {
value = module.back-office.network_self_link
}
output "vpc_back_office_id" {
value = module.back-office.network_id
}
output "vpc_back_office_subnetwork" {
value = module.back-office.subnets["us-central1/back-office"]
}
output "vpc_back_office_private_subnetwork" {
value = module.back-office.subnets["us-central1/back-office-private"]
}
output "back_office_fw_sa" {
value = google_service_account.back_office_fw_sa.email
}
Now, I can run terraform plan and see no changes. I can do the same for the other VPCs in the dev environment.
If you wonder how to know which resources to rename, run terraform plan and look at the resources that would be destroyed; those are the ones you need to rename, and the resources that would be added are their new addresses.
Terraform will not allow you to move a resource to an address of a different type: you can't rename a subnet to a firewall rule, only a subnet to a subnet and a firewall rule to a firewall rule.
Reuse code in the vms module
In the previous step, I used an already existing module, but I can also write my own. For managing VMs, I'll do exactly that: I'll create a vms module in the <project root>/modules/vms directory. It will have two files, main.tf and variables.tf. The main.tf will contain the code for creating VMs, and the variables.tf will declare the module's variables. The main.tf looks like this:
resource "google_compute_instance" "virtual_machine" {
name = var.name
machine_type = var.machine_type
zone = var.zone
scheduling {
preemptible = true
automatic_restart = false
provisioning_model = "SPOT"
instance_termination_action = "STOP"
}
allow_stopping_for_update = var.allow_stopping_for_update
dynamic "service_account" {
for_each = var.sa_email != "" ? [var.sa_email] : []
content {
email = service_account.value
scopes = ["https://www.googleapis.com/auth/cloud-platform"]
}
}
boot_disk {
initialize_params {
image = "debian-cloud/debian-11"
}
}
network_interface {
network = var.network
subnetwork = var.subnetwork
}
}
In simple terms: look at how the VM-creation code looks in each VPC. It is the same. You extract the common code into a module, and every value that differs becomes a variable. Terraform does not have a classic if-then-else statement, so to make the service_account block optional I use a dynamic block. So, this is my VM template.
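The accompanying variables.tf simply declares everything that main.tf references. A minimal sketch could look like this; the types are implied by how the variables are used, while the defaults for sa_email and allow_stopping_for_update are my assumptions based on the conditional logic above:

variable "name" {
  type = string
}

variable "machine_type" {
  type = string
}

variable "zone" {
  type = string
}

variable "network" {
  type = string
}

variable "subnetwork" {
  type = string
}

# An empty string means "no service account"; see the dynamic block above.
variable "sa_email" {
  type    = string
  default = ""
}

variable "allow_stopping_for_update" {
  type    = bool
  default = false
}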
When using it in the dev/services/vms module, it looks like this:
module "services_vm_test" {
source = "../../../modules/vms"
zone = var.zone
machine_type = "f1-micro"
name = "services-vm-test"
network = data.terraform_remote_state.vpc_services.outputs.vpc_services_id
subnetwork = data.terraform_remote_state.vpc_services.outputs.vpc_services_subnetwork.id
}
I could even hard-code machine_type in the module, but I want the flexibility to change it, so I leave it as a variable. For the dev/back-office/vms module, it looks like this:
module "vms" {
for_each = toset(["back-office-vm2", "back-office-private-vm1", "back-office-private-vm2", "back-office-vm1"])
source = "../../../modules/vms"
zone = var.zone
machine_type = "f1-micro"
name = each.value
network = data.terraform_remote_state.back_office.outputs.vpc_back_office_id
subnetwork = data.terraform_remote_state.back_office.outputs.vpc_back_office_subnetwork.id
sa_email = "back-office-vm1" == each.value ? data.terraform_remote_state.back_office.outputs.back_office_fw_sa : ""
allow_stopping_for_update = "back-office-vm1" == each.value ? true : false
}
As the machine name differs for each VM, I'm using the for_each block. The only exception is back-office-vm1: that VM uses the service account I created for the firewall rules. That's why I'm using a conditional expression to check whether the VM is back-office-vm1; if it is, I pass the service account, otherwise an empty string.
Reuse code in the peering module
First, let's see the code:
locals {
  peerings = {
    "back-office-services-peering" : {
      "network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
      "peer_network" : data.terraform_remote_state.services.outputs.vpc_services
    },
    "services-back-office-peering" : {
      "network" : data.terraform_remote_state.services.outputs.vpc_services,
      "peer_network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
    },
    "back-office-storage-peering" : {
      "network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
      "peer_network" : data.terraform_remote_state.storage.outputs.vpc_storage,
    },
    "storage-back-office-peering" : {
      "network" : data.terraform_remote_state.storage.outputs.vpc_storage,
      "peer_network" : data.terraform_remote_state.back_office.outputs.vpc_back_office,
    }
  }
}

resource "google_compute_network_peering" "peerings" {
  for_each = local.peerings

  name         = each.key
  network      = each.value.network
  peer_network = each.value.peer_network
}
First, I define the locals block. It contains a map of maps, where each inner map holds the information about one peering. I cannot use variables here, because a value assigned in the variables.tf or terraform.tfvars file cannot reference other variables, so I have to use the locals block. In it, I define a map of peerings: the key is the peering name, and the value is a map with two keys, network and peer_network. This setup allows me to use the for_each block in the google_compute_network_peering resource.
The final result is here.
Conclusion
In this part, I refactored the project to make it more manageable: I split the project into environments, created a module for each VPC, and reused code with modules, both an already existing one and one I wrote myself. I used terraform init -migrate-state to move state files to their new locations, the terraform state mv command to rename resources within the state, and the terraform state rm command to remove resources from the state file.
In the next part, let's try to use Terragrunt to make our code even more DRY and make running CLI commands easier.
Enjoy...