Setting Up a Production-Ready Kubernetes Cluster with RKE2 in vSphere Using Terraform
alessskeno
Posted on November 29, 2024
Setting up a robust Kubernetes cluster in a production environment is no small feat. In this article, I’ll walk you through my journey of deploying an RKE2 (Rancher Kubernetes Engine 2) cluster in a vSphere environment using a custom Terraform module. This guide will include configurations, explanations, and a peek into the setup process with screenshots.
Why RKE2 on vSphere?
RKE2 provides a lightweight yet powerful Kubernetes distribution ideal for secure production workloads. When combined with vSphere’s virtualization and Terraform’s Infrastructure-as-Code capabilities, you can achieve a flexible, scalable, and automated deployment process.
Tools and Technologies
Terraform: Automates the provisioning of infrastructure.
RKE2: Kubernetes distribution optimized for production environments.
vSphere: Virtualization platform for deploying VMs.
Ansible: Used for post-deployment configuration.
Prerequisites:
Python 3.x
Ansible, Ansible-Core
sshpass and whois (sshpass for SSH password authentication; the whois package provides mkpasswd for generating the hashed user password)
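On a Debian/Ubuntu control machine these can typically be installed as follows (package names assume the default Ubuntu repositories):

sudo apt-get update
sudo apt-get install -y python3 python3-pip sshpass whois
pip3 install --user ansible ansible-core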
Terraform Module Overview
I built a reusable Terraform module to standardize and automate Kubernetes cluster provisioning. Below is an overview of the module's core features, followed by a breakdown of the main.tf configuration that calls it:
Core Module Features
Multi-AZ Clusters: Enables highly available clusters with master and worker nodes spread across multiple availability zones.
Customizable Resources: Easily configure CPU, memory, and storage for master, worker, and storage nodes.
Built-in RKE2 Installation: Installs RKE2 with a choice of CNI plugins (canal, flannel, etc.).
Networking Configuration: Define Kubernetes service and cluster CIDRs.
Storage Options: Support for local storage and optional NFS integration.
Secure Communication: TLS certificates for domain and API access.
main.tf Breakdown
Here are the main aspects of the main.tf file:
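Before the module invocation itself, main.tf needs the vSphere provider wired up. A minimal sketch, assuming the credentials are passed in as variables (the variable names here are my own placeholders):

terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "~> 2.0"
    }
  }
}

provider "vsphere" {
  user                 = var.vsphere_user     # vCenter username (placeholder variable)
  password             = var.vsphere_password # vCenter password (placeholder variable)
  vsphere_server       = var.vsphere_server   # vCenter address (placeholder variable)
  allow_unverified_ssl = true                 # set to false if vCenter uses a trusted certificate
}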
Module Invocation
module "rke2_prod_cluster" {
source = "./modules/rke2-provisioner"
env = "prod" # Environment name
domain = var.domain # Domain name
multi_az = true # If you want to create multi-az cluster
install_rke2 = true # Install RKE2
lh_storage = true # Local storage for worker nodes
hashed_pass = var.hashed_pass # Hashed password for user creation
cluster_cidr = var.cluster_cidr # Kubernetes cluster CIDR
service_cidr = var.service_cidr # Kubernetes service CIDR
nfs_enabled = false # Change to true if you want to enable nfs server
update_apt = false # Update apt packages by changing to true
rke2_token = var.rke2_token
rke2_version = "v1.30.5+rke2r1"
rke2_cni = "canal" # Alternatives: flannel, calico, cilium
kubevip_range_global = join("-", [cidrhost(var.vm_cidr_az1, 50)], [cidrhost(var.vm_cidr_az1, 60)]) # Global IP range for LoadBalancer IPs
kubevip_alb_cidr = "${cidrhost(var.vm_cidr_az1, 20)}/32" # IP for Nginx Ingress Controller Service
rke2_api_endpoint = cidrhost(var.vm_cidr_az1, 10) # API Server IP
ansible_password = var.ansible_password # Ansible user password
domain_crt = var.domain_crt # Domain certificate
domain_key = var.domain_key # Domain key
domain_root_crt = var.domain_root_crt # Root certificate
master_node_count = var.master_node_count_prod
worker_node_count = var.worker_node_count_prod
storage_node_count = var.storage_node_count_prod
# Resources
worker_node_cpus = 8
worker_node_memory = 8192
worker_node_disk_size = 100
master_node_cpus = 8
master_node_memory = 8192
master_node_disk_size = 50
storage_node_disk_size = 100
nfs_node_disk_size = 50
# AZ1
master_ip_range_az1 = [for i in range(61, 69) : cidrhost(local.vm_cidr_az1, i)] # Master node IP range
worker_ip_range_az1 = [for i in range(71, 79) : cidrhost(local.vm_cidr_az1, i)] # Worker node IP range
vsphere_datacenter_az1 = var.vsphere_datacenter_az1 # vSphere datacenter name
vsphere_host_az1 = var.vsphere_host_az1 # vSphere host name
vsphere_resource_pool_az1 = var.vsphere_resource_pool_az1 # vSphere resource pool name
vsphere_datastore_az1 = var.vsphere_datastore_az1 # vSphere datastore name
vsphere_network_name_az1 = var.vsphere_network_name_az1 # vSphere network name
vm_gw_ip_az1 = local.vm_gw_ip_az1 # Gateway IP
nfs_ip_az1 = cidrhost(local.vm_cidr_az1, 70) # NFS server IP
# AZ3
master_ip_range_az3 = [for i in range(81, 89) : cidrhost(local.vm_cidr_az3, i)]
worker_ip_range_az3 = [for i in range(91, 99) : cidrhost(local.vm_cidr_az3, i)]
vsphere_datacenter_az3 = var.vsphere_datacenter_az3
vsphere_host_az3 = var.vsphere_host_az3
vsphere_resource_pool_az3 = var.vsphere_resource_pool_az3
vsphere_datastore_az3 = var.vsphere_datastore_az3
vsphere_network_name_az3 = var.vsphere_network_name_az3
vm_gw_ip_az3 = local.vm_gw_ip_az3
}
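The invocation above references locals such as local.vm_cidr_az1 and local.vm_gw_ip_az1 that are declared elsewhere in the root module (it also mixes var.vm_cidr_az1 and local.vm_cidr_az1 for the same network). One simple arrangement, sketched here with the locals derived from corresponding variables, could look like this:

locals {
  vm_cidr_az1  = var.vm_cidr_az1                # e.g. "10.10.10.0/24", the node network in AZ1
  vm_cidr_az3  = var.vm_cidr_az3                # e.g. "10.10.30.0/24", the node network in AZ3
  vm_gw_ip_az1 = cidrhost(local.vm_cidr_az1, 1) # first host address used as the gateway
  vm_gw_ip_az3 = cidrhost(local.vm_cidr_az3, 1)
}

Terraform's cidrhost(prefix, n) returns the n-th host address inside a prefix, so cidrhost("10.10.10.0/24", 10) is 10.10.10.10. That is how a single CIDR per AZ drives the API endpoint, the kube-vip ranges, and the per-node IP ranges in the module block.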
Deployment Walkthrough
Step 1: Initialize Terraform
Run the following commands to initialize Terraform and apply the configuration:
terraform init
terraform plan
terraform apply
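These commands expect values for all the variables referenced in main.tf: the domain, RKE2 token, certificates, node counts, and the vSphere inventory names. One common approach is a terraform.tfvars file; every value below is a placeholder to adapt to your environment:

# terraform.tfvars -- placeholder values only
domain                    = "k8s.example.internal"
rke2_token                = "change-me-shared-cluster-token"
ansible_password          = "change-me"
hashed_pass               = "<output of: mkpasswd -m sha-512>"
cluster_cidr              = "10.42.0.0/16" # RKE2 default pod CIDR
service_cidr              = "10.43.0.0/16" # RKE2 default service CIDR
vm_cidr_az1               = "10.10.10.0/24"
master_node_count_prod    = 3
worker_node_count_prod    = 3
storage_node_count_prod   = 3
vsphere_datacenter_az1    = "DC-AZ1"
vsphere_host_az1          = "esxi-az1.example.internal"
vsphere_resource_pool_az1 = "rke2-prod"
vsphere_datastore_az1     = "datastore-az1"
vsphere_network_name_az1  = "VM Network AZ1"
# ...plus the matching *_az3 values and domain_crt / domain_key / domain_root_crt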
Step 2: Verify Resources in vSphere
Confirm the VMs are provisioned in vSphere.
Ensure the network configurations (IP, gateway) match the Terraform parameters.
Step 3: Validate Kubernetes Cluster
After deployment:
SSH into one of the master nodes.
Run kubectl get nodes to ensure all nodes are registered and ready.
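On an RKE2 server node the kubeconfig is written to /etc/rancher/rke2/rke2.yaml and a bundled kubectl lives under /var/lib/rancher/rke2/bin, so the check looks roughly like this:

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin

kubectl get nodes -o wide   # every master and worker should report Ready
kubectl get pods -A         # CNI (canal), kube-vip and ingress pods should be Running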
Screenshots of the Process
The run was captured in three screenshots: the terraform apply output, the vSphere dashboard showing the provisioned VMs, and the Kubernetes terminal with kubectl get nodes.
Lessons Learned
Key Challenges
Configuring multi-AZ setups required precise IP allocation and resource planning.
Ensuring compatibility between Terraform, vSphere, and RKE2 versions.
Tips for Success
Automate Certificate Management: Pre-generate and verify certificates for secure communication; for test environments this can itself be automated with Terraform (see the sketch after this list).
Test Locally: Run initial setups in a test environment to validate module behavior.
Optimize Resource Allocation: Tailor resource parameters to your workload needs.
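For test environments, even the certificates fed into domain_crt and domain_key can be generated from Terraform. A minimal self-signed sketch using the hashicorp/tls provider (a production cluster should of course use certificates issued by your real CA):

resource "tls_private_key" "domain" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "tls_self_signed_cert" "domain" {
  private_key_pem       = tls_private_key.domain.private_key_pem
  validity_period_hours = 8760 # one year
  allowed_uses          = ["key_encipherment", "digital_signature", "server_auth"]
  dns_names             = [var.domain, "*.${var.domain}"]

  subject {
    common_name = var.domain
  }
}

The resulting tls_self_signed_cert.domain.cert_pem and tls_private_key.domain.private_key_pem could then stand in for var.domain_crt and var.domain_key during testing.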
Conclusion
Using Terraform and vSphere to deploy an RKE2 Kubernetes cluster offers a highly customizable and scalable solution for production environments. By modularizing the Terraform configuration, this setup can be reused and extended for other environments with minimal changes.
If you've followed along or have feedback, share your experience in the comments below. Check out the code repository on my GitHub profile. Let's discuss Kubernetes automation at scale!