Setting Up a Production-Ready Kubernetes Cluster with RKE2 in vSphere Using Terraform

aleskerov

alessskeno

Posted on November 29, 2024

Setting up a robust Kubernetes cluster in a production environment is no small feat. In this article, I’ll walk you through my journey of deploying an RKE2 (Rancher Kubernetes Engine 2) cluster in a vSphere environment using a custom Terraform module. This guide will include configurations, explanations, and a peek into the setup process with screenshots.

Why RKE2 on vSphere?

RKE2 is a Kubernetes distribution with a strong focus on security and compliance, which makes it a good fit for production workloads. Combined with vSphere's virtualization and Terraform's Infrastructure-as-Code workflow, it gives you a deployment process that is automated, repeatable, and easy to scale.

Tools and Technologies

Terraform: Automates the provisioning of infrastructure.
RKE2: Kubernetes distribution optimized for production environments.
vSphere: Virtualization platform for deploying VMs.
Ansible: Used for post-deployment configuration.

Prerequisites:
Python 3.x
Ansible and ansible-core
sshpass and whois (for password management; on Debian and Ubuntu the whois package provides mkpasswd)

Terraform Module Overview

I built a reusable Terraform module to standardize and automate Kubernetes cluster provisioning. The root main.tf file calls this module. The module's core features are listed first, followed by a breakdown of main.tf.

Core Module Features

Multi-AZ Clusters: Enables highly available clusters with master and worker nodes spread across multiple availability zones.
Customizable Resources: Easily configure CPU, memory, and storage for master, worker, and storage nodes.
Built-in RKE2 Installation: Installs RKE2 with a choice of CNI plugin (canal, flannel, calico, or cilium).
Networking Configuration: Define Kubernetes service and cluster CIDRs.
Storage Options: Support for local storage and optional NFS integration.
Secure Communication: TLS certificates for domain and API access.
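
To illustrate the networking inputs above, the cluster and service CIDRs could be declared in the root module with validation, so that a malformed value fails at plan time instead of during RKE2 installation. This is a minimal sketch, not the module's actual variables.tf: the descriptions and validation rules are assumptions, and the ranges quoted in the error messages are simply the RKE2 defaults.

```hcl
# Sketch of root-level variable declarations (assumed, not taken from the module).
variable "cluster_cidr" {
  type        = string
  description = "CIDR block used for pod IPs"

  validation {
    # cidrhost() fails on anything that is not a valid CIDR prefix,
    # and can() turns that failure into false.
    condition     = can(cidrhost(var.cluster_cidr, 0))
    error_message = "cluster_cidr must be a valid CIDR block, for example 10.42.0.0/16."
  }
}

variable "service_cidr" {
  type        = string
  description = "CIDR block used for Kubernetes service IPs"

  validation {
    condition     = can(cidrhost(var.service_cidr, 0))
    error_message = "service_cidr must be a valid CIDR block, for example 10.43.0.0/16."
  }
}
```

Neither range should overlap the VM networks that the nodes themselves live on.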


main.tf Breakdown

Here are the main aspects of the main.tf file:

Module Invocation

module "rke2_prod_cluster" {
  source       = "./modules/rke2-provisioner"
  env          = "prod"           # Environment name
  domain       = var.domain       # Domain name
  multi_az     = true             # Set to true to create a multi-AZ cluster
  install_rke2 = true             # Install RKE2
  lh_storage   = true             # Local storage for worker nodes
  hashed_pass  = var.hashed_pass  # Hashed password for user creation
  cluster_cidr = var.cluster_cidr # Kubernetes cluster (pod) CIDR
  service_cidr = var.service_cidr # Kubernetes service CIDR
  nfs_enabled  = false            # Set to true to enable the NFS server
  update_apt   = false            # Set to true to update apt packages
  rke2_token   = var.rke2_token
  rke2_version = "v1.30.5+rke2r1"
  rke2_cni     = "canal"          # Alternatives: flannel, calico, cilium

  kubevip_range_global = join("-", [cidrhost(var.vm_cidr_az1, 50), cidrhost(var.vm_cidr_az1, 60)]) # Global IP range for LoadBalancer IPs
  kubevip_alb_cidr     = "${cidrhost(var.vm_cidr_az1, 20)}/32" # IP for the Nginx Ingress Controller Service
  rke2_api_endpoint    = cidrhost(var.vm_cidr_az1, 10) # API server IP

  ansible_password  = var.ansible_password # Ansible user password
  domain_crt        = var.domain_crt # Domain certificate
  domain_key        = var.domain_key # Domain key
  domain_root_crt   = var.domain_root_crt # Root certificate
  master_node_count = var.master_node_count_prod
  worker_node_count = var.worker_node_count_prod
  storage_node_count = var.storage_node_count_prod

  # Resources
  worker_node_cpus      = 8
  worker_node_memory    = 8192
  worker_node_disk_size = 100

  master_node_cpus      = 8
  master_node_memory    = 8192
  master_node_disk_size = 50

  storage_node_disk_size = 100

  nfs_node_disk_size = 50

  # AZ1
  master_ip_range_az1       = [for i in range(61, 69) : cidrhost(local.vm_cidr_az1, i)] # Master node IP range
  worker_ip_range_az1       = [for i in range(71, 79) : cidrhost(local.vm_cidr_az1, i)] # Worker node IP range
  vsphere_datacenter_az1    = var.vsphere_datacenter_az1 # vSphere datacenter name
  vsphere_host_az1          = var.vsphere_host_az1 # vSphere host name
  vsphere_resource_pool_az1 = var.vsphere_resource_pool_az1 # vSphere resource pool name
  vsphere_datastore_az1     = var.vsphere_datastore_az1 # vSphere datastore name
  vsphere_network_name_az1  = var.vsphere_network_name_az1 # vSphere network name
  vm_gw_ip_az1              = local.vm_gw_ip_az1 # Gateway IP
  nfs_ip_az1                = cidrhost(local.vm_cidr_az1, 70) # NFS server IP

  # AZ3
  master_ip_range_az3       = [for i in range(81, 89) : cidrhost(local.vm_cidr_az3, i)]
  worker_ip_range_az3       = [for i in range(91, 99) : cidrhost(local.vm_cidr_az3, i)]
  vsphere_datacenter_az3    = var.vsphere_datacenter_az3
  vsphere_host_az3          = var.vsphere_host_az3
  vsphere_resource_pool_az3 = var.vsphere_resource_pool_az3
  vsphere_datastore_az3     = var.vsphere_datastore_az3
  vsphere_network_name_az3  = var.vsphere_network_name_az3
  vm_gw_ip_az3              = local.vm_gw_ip_az3
}
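
The module call derives every address from a base CIDR with cidrhost() and range(). The sketch below works through that arithmetic for AZ1 using a hypothetical network, 192.0.2.0/24, which stands in for the real vm_cidr_az1 value. The local names are illustrative and are not part of the module.

```hcl
locals {
  # Hypothetical VM network, used only to make the arithmetic concrete.
  example_cidr = "192.0.2.0/24"

  api_endpoint = cidrhost(local.example_cidr, 10)         # "192.0.2.10"
  ingress_cidr = "${cidrhost(local.example_cidr, 20)}/32" # "192.0.2.20/32"

  # "192.0.2.50-192.0.2.60"
  lb_range = join("-", [
    cidrhost(local.example_cidr, 50),
    cidrhost(local.example_cidr, 60)
  ])

  # range(61, 69) stops before 69, so this yields 8 addresses: .61 through .68
  master_ips = [for i in range(61, 69) : cidrhost(local.example_cidr, i)]

  nfs_ip = cidrhost(local.example_cidr, 70) # "192.0.2.70"

  # Likewise 8 addresses: .71 through .78
  worker_ips = [for i in range(71, 79) : cidrhost(local.example_cidr, i)]
}

# Expose the plan so it can be reviewed before any VM is created.
output "example_ip_plan" {
  value = {
    api_endpoint = local.api_endpoint
    ingress_cidr = local.ingress_cidr
    lb_range     = local.lb_range
    master_ips   = local.master_ips
    nfs_ip       = local.nfs_ip
    worker_ips   = local.worker_ips
  }
}
```

Evaluating the same expressions in terraform console is a quick way to confirm that the API, ingress, load balancer, node, and NFS addresses do not collide.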

Deployment Walkthrough

Step 1: Initialize Terraform

Run the following commands to initialize Terraform and apply the configuration:

terraform init
terraform plan
terraform apply
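
terraform init resolves providers from the configuration's required_providers block, so pinning versions there helps keep runs reproducible. The block below is a sketch under assumptions: the article does not show which provider source or version constraints the module uses, so the vSphere provider source and both version constraints are placeholders to adjust.

```hcl
terraform {
  # Assumed minimum Terraform version; use the one you actually test with.
  required_version = ">= 1.5.0"

  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere" # assumed provider source
      version = "~> 2.0"            # assumed constraint: any 2.x release
    }
  }
}
```

After terraform init, the selected provider versions are recorded in .terraform.lock.hcl. Committing that file keeps every machine on the same provider build.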

Step 2: Verify Resources in vSphere

Confirm the VMs are provisioned in vSphere.
Ensure the network configurations (IP, gateway) match the Terraform parameters.

Step 3: Validate Kubernetes Cluster

After deployment:
SSH into one of the master nodes.
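Run kubectl get nodes to confirm that every node has registered and reports Ready. On an RKE2 server node, kubectl is installed under /var/lib/rancher/rke2/bin and the admin kubeconfig is /etc/rancher/rke2/rke2.yaml, unless your setup places them elsewhere.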


Screenshots of the Process

[Screenshot: Terraform apply output]

[Screenshots: vSphere dashboard]

[Screenshot: Kubernetes terminal]


Lessons Learned

Key Challenges

Configuring multi-AZ setups required precise IP allocation and resource planning.
Terraform, vSphere, and RKE2 versions had to be kept compatible with one another.
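
One way to catch IP allocation mistakes early is a check block (Terraform 1.5 or later) that compares the requested node count with the number of reserved addresses. This is a sketch under assumptions: the CIDRs and the default count are hypothetical, and it assumes master nodes draw from the combined AZ1 and AZ3 pools, which the article does not confirm.

```hcl
variable "master_node_count_prod" {
  type    = number
  default = 3 # hypothetical default
}

locals {
  # Hypothetical networks standing in for the real AZ1 and AZ3 CIDRs.
  cidr_az1 = "192.0.2.0/24"
  cidr_az3 = "198.51.100.0/24"

  # Same host offsets as the module call: .61-.68 in AZ1, .81-.88 in AZ3.
  master_ips = concat(
    [for i in range(61, 69) : cidrhost(local.cidr_az1, i)],
    [for i in range(81, 89) : cidrhost(local.cidr_az3, i)]
  )
}

check "master_ip_capacity" {
  assert {
    condition     = var.master_node_count_prod <= length(local.master_ips)
    error_message = "master_node_count_prod exceeds the number of reserved master IPs."
  }
}
```

A failed check block is reported as a warning and does not stop the run. If the condition should block an apply, a validation rule or precondition is the stricter option.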

Tips for Success

Automate Certificate Management: Pre-generate and verify certificates for secure communication.
Test Locally: Run initial setups in a test environment to validate module behavior.
Optimize Resource Allocation: Tailor resource parameters to your workload needs.
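
Part of the certificate verification can happen in Terraform itself. The sketch below assumes that domain_crt and domain_key carry PEM text (not file paths or base64), which the article does not state. The descriptions and the validation rule are illustrative.

```hcl
variable "domain_crt" {
  type        = string
  description = "PEM-encoded domain certificate"

  validation {
    # regex() errors when nothing matches; can() converts that into false.
    condition     = can(regex("^-----BEGIN CERTIFICATE-----", trimspace(var.domain_crt)))
    error_message = "domain_crt must be a PEM-encoded certificate."
  }
}

variable "domain_key" {
  type        = string
  description = "PEM-encoded private key for the domain certificate"
  sensitive   = true # redacts the key in plan and apply output
}
```

Marking a variable as sensitive only hides it in CLI output. The value is still written to the state file, so the state backend needs to be protected as well.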


Conclusion

Using Terraform and vSphere to deploy an RKE2 Kubernetes cluster offers a highly customizable and scalable solution for production environments. By modularizing the Terraform configuration, this setup can be reused and extended for other environments with minimal changes.
If you've followed along or have feedback, share your experience in the comments below. Check out the code repository on my GitHub profile. Let's discuss Kubernetes automation at scale!
