Hello, Terraform (on Azure)

Hi Friends,

One of the exciting new (to me) technologies I've started working with recently is a tool called Terraform (by HashiCorp). It's an Infrastructure-as-Code (IaC) platform that greatly simplifies provisioning cloud-based infrastructure in a reliable, predictable, and scalable way. Not only can you run the same code across different environments, but that code can also be committed to source control, which lets you version your infrastructure design!

I will be writing a multi-part series on working with Terraform, specifically in the context of Azure. This post is just an introduction to Terraform. In the next post, we'll make it actually do things! But first things first: this is a fairly lengthy post covering the basics. I apologize for the wall of text, but I wanted a single post to cover all of these basics for easy future reference.

What is Terraform?

Okay, that's great. It automates stuff, but what is it really?

Flavors

So there are a few different flavors of Terraform. Terraform Cloud is a fairly new offering (I believe it was just released last September) that offers a few niceties, but everything I'll be looking at in this series will be focused on the free offering.

Free as in beer?

Nope! Free as in speech.

The open source offering that anybody can use is what we'll be using (licensed under the Mozilla Public License 2.0, a copyleft license). The paid options include managed services that let you centralize things and avoid console windows a bit more, but we'll use Azure Blob Storage for our centralization and just use the CLI to run our code. Don't worry, the CLI is pretty easy to work with.

But really, what is it?

So if you download and install Terraform, well, you've already failed. Terraform doesn't have an installer. The downloads are just simple binaries that you execute. The Windows download (which is what I use) is a 16MB .zip file with a 16MB .exe file inside of it. You put it wherever you want and manage your PATH however you want so you can call it.

This .exe file executes scripts written in HCL (HashiCorp Configuration Language). HCL is a custom configuration language that looks a lot like JSON at first glance, but is a bit different. I believe you can even use JSON, but we'll be using HCL in this series.

HCL scripts (I'm not sure if "scripts" is the proper term here but I'm gonna go with it - remember, I'm new to this tech, so I can make mistakes!) contain configuration information about dependencies, connections, and structure. Terraform uses these to make changes to your infrastructure.

The Basics of Terraform

HashiCorp has a pretty good Getting Started guide. Unfortunately, everything there focuses on AWS and not Azure. Feel free to use that guide as well, but I will additionally address some Azure concerns that you would otherwise have to piece together from other sources.

Providers

These are independently developed and maintained packages that provide functionality. You see, Terraform by itself doesn't know how to build infrastructure. It's just the engine that runs these providers, which know how to actually do stuff (sometimes a provider is a connector to do cloud things, sometimes it's an RNG).

There is a public repository of Providers that you can use, but there are also private third-party providers out there. The AzureRM (Azure Resource Manager) provider will be one of the most important ones we use during this series.
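
To make this a bit more concrete, here's a minimal sketch of what declaring providers can look like. I'm assuming Terraform 0.13 or later (for the required_providers source syntax) and the 3.x line of both the AzureRM and Random providers; the version constraints are just placeholders I picked for illustration.

```hcl
# Tell Terraform which providers this configuration needs.
# The version constraints here are assumptions, not recommendations.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

# The AzureRM provider requires a features block, even an empty one.
provider "azurerm" {
  features {}
}
```

Terraform downloads the providers listed here when you initialize the working directory, which is why the core binary can stay so small.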

Resources

These are the things you're managing. A VM? A VNET? An IP? A password? A Key Vault? These are just a few examples of resources.
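
As a rough sketch (assuming the AzureRM provider is already configured), a couple of resources might look like this. The names and address range are made up for the example.

```hcl
# A resource group to hold everything (hypothetical name and region).
resource "azurerm_resource_group" "example" {
  name     = "rg-hello-terraform"
  location = "East US"
}

# A VNET placed in that resource group. Referencing the resource group's
# attributes lets Terraform work out that the group must be created first.
resource "azurerm_virtual_network" "example" {
  name                = "vnet-hello-terraform"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}
```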

State

Consider a few examples:

If I have HCL that tries to create a VM and I run that twice, it would be nice for it to not create a second VM but instead realize that the VM it's managing already exists and not try again. Idempotency is good for our sanity!
If I have HCL create a VM then I edit that HCL to also put a NIC on that VM, it would be nice for Terraform to realize on the next execution that the VM exists but is missing a NIC that needs to be added.
If I do the above and commit my HCL to git before somebody else (on a different dev workstation) takes that HCL and removes the NIC, it would be nice for Terraform on the next execution to be aware that the HCL previously was managing a NIC that has since been deleted from the HCL, so it knows it needs to also remove the NIC from the VM in Azure.

All of these things require some knowledge of the current state of the infrastructure and of its history, all while coordinating across multiple machines. Terraform's State information allows us to do this. In a best-practice scenario, this state is centrally stored, locked, synced, and versioned. Once you add version control, CI/CD servers, and multiple environments based on your company's topology (Prod/Staging/QA/Test/Dev/etc), state management becomes a critical piece of the puzzle!
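
Since we'll be keeping state in Azure Blob Storage, here's a minimal sketch of what pointing Terraform at a storage container can look like. Every name below is hypothetical, and I'm assuming the storage account and container already exist and that you've already authenticated to Azure.

```hcl
# Store state in an Azure Blob Storage container instead of a local file.
# All four values are placeholders for resources you create beforehand.
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "tfstatehello"
    container_name       = "tfstate"
    key                  = "hello.terraform.tfstate"
  }
}
```

With this in place, everybody running the same HCL reads and writes the same state blob rather than a file on their own workstation.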

Input Variables

If you're a software developer, you can probably guess what these are by the name. These are used throughout the execution of the HCL and only actually affect infrastructure if the HCL chooses to use a variable to do so. They can be used for passing configuration/information around and for abstracting things to make them reusable (just like the parameters of a function or constructor in most coding languages).
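
Here's a small standalone sketch of an input variable being declared and then used. The variable name, default, and resource group name are all made up for the example.

```hcl
# Declare an input variable with a type, a description, and a default.
variable "location" {
  type        = string
  description = "Azure region to deploy into"
  default     = "East US"
}

# The variable only affects infrastructure because a resource reads it.
resource "azurerm_resource_group" "example" {
  name     = "rg-hello-terraform"
  location = var.location
}
```

If you don't want the default, you can override it at run time, for example with -var "location=West US" on the command line.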

Data Sources

These are like "unmanaged Resources". They can represent things in your infrastructure, but your HCL does not manage these things. Instead, a Data Source queries the current state of something that already exists and makes that information available to the rest of your HCL.

For example, if I'm provisioning a VM and putting it on an existing network, my VM and NIC would be examples of resources but I might need to query an existing VNET to get some DNS and subnet information to configure my VM and NIC to use.
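
A sketch of that scenario might look something like the following. I'm assuming a VNET named vnet-existing with a subnet named default already lives in a resource group named rg-network; all of those names are hypothetical.

```hcl
# Look up a VNET that something else created. Terraform reads it
# but will never change or delete it.
data "azurerm_virtual_network" "existing" {
  name                = "vnet-existing"
  resource_group_name = "rg-network"
}

# Look up a subnet on that VNET the same way.
data "azurerm_subnet" "existing" {
  name                 = "default"
  virtual_network_name = data.azurerm_virtual_network.existing.name
  resource_group_name  = data.azurerm_virtual_network.existing.resource_group_name
}

# The NIC is a managed resource that uses the queried information.
resource "azurerm_network_interface" "example" {
  name                = "nic-hello-terraform"
  location            = data.azurerm_virtual_network.existing.location
  resource_group_name = data.azurerm_virtual_network.existing.resource_group_name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = data.azurerm_subnet.existing.id
    private_ip_address_allocation = "Dynamic"
  }
}
```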

Output Variables

If you were able to guess what an input variable is, you can also guess what an output variable is. You put information into these and it gets passed out for other things to use. Sometimes they're very important for functional purposes (e.g. an RNG needs to output the generated number) and other times they're just informational and not terribly important most of the time (e.g. something might output the duration it took to finish provisioning some infrastructure component).
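
Sticking with the RNG example, here's a minimal sketch using the Random provider (assuming it's been declared as a required provider). The range and the output name are just for illustration.

```hcl
# Generate a random integer between 1 and 100 (inclusive).
resource "random_integer" "example" {
  min = 1
  max = 100
}

# Pass the generated number out so other things can use it.
output "generated_number" {
  value = random_integer.example.result
}
```

After a successful apply, the value shows up in the CLI output, and you can read it again later with terraform output generated_number.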

Other stuff

We'll ignore most of the other things for now. We'll introduce the more advanced topics (such as workspaces, modules, and more) once we've mastered these basics.

In the next edition of this series, we'll learn the following things:

How to authenticate and connect to Azure.
How to build up a Key Vault and SQL database, using a random SA password stored securely in the Key Vault.
How to tear down that server, database, and vault.
