End-to-End Deployment and Monitoring of EKS and Flask Apps with Terraform, GitHub Actions, Helm, Prometheus and Grafana
Oloruntobi Olurombi
Posted on September 14, 2024
In the realm of cloud-native applications, automation plays a vital role in simplifying deployments and optimising infrastructure management. In this article, we will delve into how to automate the provisioning of an Amazon EKS cluster and deploy a Flask application using Terraform and GitHub Actions. Along the way, we'll cover essential security best practices to safeguard your environment, explore monitoring techniques to ensure system health, and discuss building a resilient CI/CD pipeline for continuous integration and deployment. This comprehensive approach will help you achieve efficient, secure, and scalable cloud-native application management.
Table of Contents
Overview
Prerequisites
Infrastructure Automation: EKS Cluster Setup
Application Deployment: Flask App on EKS
Monitoring with Prometheus and Grafana
Security Best Practices
Conclusion
Overview
The goal of this project is to automate the deployment of a containerised Flask application on an EKS (Elastic Kubernetes Service) cluster. Using Terraform to provision AWS resources and GitHub Actions to automate the CI/CD pipeline, this setup allows for seamless infrastructure management and application deployment.
Why Terraform?
Terraform enables you to write declarative code for infrastructure. Instead of manually creating resources like VPCs, subnets, or an EKS cluster, we automate everything via Infrastructure as Code (IaC).
Why GitHub Actions?
GitHub Actions provides a powerful way to integrate CI/CD, testing, static analysis, and security checks into the code deployment process.
Prerequisites
Before diving into the automation, here are the prerequisites you’ll need to get started:
AWS Account: Create an AWS account if you don’t have one.
IAM Access Keys: Set up access keys with permissions for managing EKS, EC2, and S3.
S3 Bucket: Create an S3 bucket to store your Terraform state files securely.
AWS CLI: Install and configure the AWS CLI.
Terraform: Make sure Terraform is installed on your local machine or use GitHub Actions for automation.
GitHub Secrets: Add AWS credentials (access keys, secret keys) and other sensitive data as GitHub secrets to avoid hardcoding them.
Snyk: Create a Snyk account and get your API token.
SonarCloud: Create a SonarCloud account and get your token, organisation key, and project key.
Infrastructure Automation: EKS Cluster Setup
Automating infrastructure deployment is key to maintaining scalable, consistent, and reliable environments. In this project, Terraform is utilised to automate the provisioning of an EKS cluster, its node groups, and the supporting AWS infrastructure. This includes VPC creation, IAM roles, S3 bucket setup, and cloud resources like CloudWatch and CloudTrail for logging and monitoring.
Terraform Setup
Let’s start by provisioning the necessary infrastructure. Below is the detailed explanation of the key resources defined in the Terraform files.
EKS Cluster and Node Group (main.tf):
This provisions an EKS cluster and node group with IAM roles attached.
The cluster encrypts Kubernetes secrets with a KMS key, and the node group is fixed at two worker nodes (desired, minimum, and maximum are all set to 2). Outputs include the cluster name and endpoint for easy reference.
touch main.tf
terraform {
backend "s3" {
bucket = "regtech-iac"
key = "terraform.tfstate"
region = "us-east-1"
encrypt = true
}
}
# Provides an EKS Cluster
resource "aws_eks_cluster" "eks_cluster" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster_role.arn
version = "1.28"
vpc_config {
subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id, aws_subnet.public_subnet_3.id]
}
encryption_config {
provider {
key_arn = aws_kms_key.eks_encryption_key.arn
}
resources = ["secrets"]
}
# Ensure that IAM Role permissions are created before and deleted after EKS Cluster handling.
# Otherwise, EKS will not be able to properly delete EKS managed EC2 infrastructure such as Security Groups.
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy_attachment,
aws_iam_role_policy_attachment.eks_service_policy_attachment,
]
}
# Provides an EKS Node Group
resource "aws_eks_node_group" "eks_node_group" {
cluster_name = aws_eks_cluster.eks_cluster.name
node_group_name = var.node_group_name
node_role_arn = aws_iam_role.eks_node_group_role.arn
subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id, aws_subnet.public_subnet_3.id]
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
update_config {
max_unavailable = 1
}
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy_attachment,
aws_iam_role_policy_attachment.eks_cni_policy_attachment,
aws_iam_role_policy_attachment.ec2_container_registry_readonly,
]
}
# Extra resources
resource "aws_ebs_volume" "volume_regtech"{
availability_zone = var.az_a
size = 40
encrypted = true
type = "gp2"
kms_key_id = aws_kms_key.ebs_encryption_key.arn
}
resource "aws_s3_bucket" "regtech_iac" {
bucket = var.bucket_name
}
resource "aws_s3_bucket_server_side_encryption_configuration" "regtech_iac_encrypt_config" {
bucket = aws_s3_bucket.regtech_iac.bucket
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.s3_encryption_key.arn
sse_algorithm = "aws:kms"
}
}
}
# OutPut Resources
output "endpoint" {
value = aws_eks_cluster.eks_cluster.endpoint
}
output "eks_cluster_name" {
value = aws_eks_cluster.eks_cluster.name
}
Networking (vpc.tf):
Defines a VPC, public subnets for the EKS cluster, and private subnets for other resources, ensuring flexibility in network architecture.
touch vpc.tf
# Provides a VPC resource
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr_block
instance_tenancy = "default"
tags = {
Name = var.tags_vpc
}
}
# Provides an VPC Public subnet resource
resource "aws_subnet" "public_subnet_1" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_1_cidr_block
availability_zone = var.az_a
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_1
}
}
resource "aws_subnet" "public_subnet_2" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_2_cidr_block
availability_zone = var.az_b
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_2
}
}
resource "aws_subnet" "public_subnet_3" {
vpc_id = aws_vpc.main.id
cidr_block = var.p_s_3_cidr_block
availability_zone = var.az_c
map_public_ip_on_launch = true
tags = {
Name = var.tags_public_subnet_3
}
}
# Provides an VPC Private subnet resource
resource "aws_subnet" "private_subnet_1" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_1_cidr_block
availability_zone = var.az_private_a
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_1
}
}
resource "aws_subnet" "private_subnet_2" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_2_cidr_block
availability_zone = var.az_private_b
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_2
}
}
resource "aws_subnet" "private_subnet_3" {
vpc_id = aws_vpc.main.id
cidr_block = var.private_s_3_cidr_block
availability_zone = var.az_private_c
map_public_ip_on_launch = false
tags = {
Name = var.tags_private_subnet_3
}
}
IAM Roles (iam.tf):
Defines IAM roles and policies for the EKS cluster, node groups, CloudWatch, and CloudTrail, ensuring robust monitoring and logging.
touch iam.tf
# Declare the aws_caller_identity data source
data "aws_caller_identity" "current" {}
# IAM Role for EKS Cluster Plane
resource "aws_iam_role" "eks_cluster_role" {
name = var.eks_cluster_role_name
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster_role.name
}
resource "aws_iam_role_policy_attachment" "eks_service_policy_attachment" {
role = aws_iam_role.eks_cluster_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
}
# IAM Role for Worker node
resource "aws_iam_role" "eks_node_group_role" {
name = var.eks_node_group_role_name
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "eks_worker_node_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_role_policy_attachment" "eks_cni_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_role_policy_attachment" "ec2_container_registry_readonly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.eks_node_group_role.name
}
resource "aws_iam_instance_profile" "eks_node_instance_profile" {
name = var.eks_node_group_profile
role = aws_iam_role.eks_node_group_role.name
}
# Policy For volume creation and attachment
resource "aws_iam_role_policy" "eks_node_group_volume_policy" {
name = var.eks_node_group_volume_policy_name
role = aws_iam_role.eks_node_group_role.name
policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVolumeStatus",
"ec2:CreateVolume",
"ec2:AttachVolume"
],
"Resource": "arn:aws:ec2:${var.region}:${data.aws_caller_identity.current.account_id}:volume/*"
}
]
})
}
# IAM Role for CloudWatch
resource "aws_iam_role" "cloudwatch_role" {
name = "cloudwatch_role_log"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "cloudwatch.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
})
}
resource "aws_iam_role_policy_attachment" "cloudwatch_policy_attachment" {
role = aws_iam_role.cloudwatch_role.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchLogsFullAccess"
}
# IAM Role for CloudTrail
resource "aws_iam_role" "cloudtrail_role" {
name = "cloudtrail_role_log"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "cloudtrail.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
})
}
resource "aws_iam_role_policy_attachment" "cloudtrail_policy_attachment" {
role = aws_iam_role.cloudtrail_role.name
policy_arn = "arn:aws:iam::aws:policy/AWSCloudTrail_FullAccess"
}
# KMS Key Policy for Encryption
resource "aws_kms_key" "ebs_encryption_key" {
description = "KMS key for EBS volume encryption"
}
resource "aws_kms_key" "s3_encryption_key" {
description = "KMS key for S3 bucket encryption"
}
resource "aws_kms_key" "eks_encryption_key" {
description = "KMS key for EKS secret encryption"
}
resource "aws_s3_bucket_policy" "regtech_iac_policy" {
bucket = aws_s3_bucket.regtech_iac.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:GetBucketAcl"
Resource = "arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}"
},
{
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}/AWSLogs/${data.aws_caller_identity.current.account_id}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
CloudWatch and Monitoring (cloudwatch.tf):
This provisions CloudWatch log groups, an SNS topic for alerts, and a CloudWatch alarm to monitor CPU utilisation. CloudTrail logs are configured to monitor S3 and management events.
touch cloudwatch.tf
resource "aws_cloudwatch_log_group" "eks_log_group" {
name = "/aws/eks/cluster-logs-regtech"
retention_in_days = 30
}
resource "aws_cloudtrail" "security_trail" {
name = "security-trail-log"
s3_bucket_name = aws_s3_bucket.regtech_iac.bucket
include_global_service_events = true
is_multi_region_trail = true
enable_log_file_validation = true
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["arn:aws:s3:::${aws_s3_bucket.regtech_iac.bucket}/"]
}
}
}
resource "aws_sns_topic" "alarm_topic" {
name = "high-cpu-alarm-topic"
}
resource "aws_sns_topic_subscription" "alarm_subscription" {
topic_arn = aws_sns_topic.alarm_topic.arn
protocol = "email"
endpoint = "oloruntobiolurombi@gmail.com"
}
resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
alarm_name = "high_cpu_usage"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "70"
alarm_actions = [
aws_sns_topic.alarm_topic.arn
]
}
AutoScaler IAM (iam-autoscaler.tf):
This provisions the IAM role and policy required by the EKS Cluster Autoscaler, which adjusts the number of worker nodes based on resource demand.
data "aws_iam_policy_document" "eks_cluster_autoscaler_assume_role_policy" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
effect = "Allow"
condition {
test = "StringEquals"
variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
values = ["system:serviceaccount:kube-system:cluster-autoscaler"]
}
principals {
identifiers = [aws_iam_openid_connect_provider.eks.arn]
type = "Federated"
}
}
}
resource "aws_iam_role" "eks_cluster_autoscaler" {
assume_role_policy = data.aws_iam_policy_document.eks_cluster_autoscaler_assume_role_policy.json
name = "eks-cluster-autoscaler"
}
resource "aws_iam_policy" "eks_cluster_autoscaler" {
name = "eks-cluster-autoscaler"
policy = jsonencode({
Statement = [{
Action = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeLaunchTemplateVersions"
]
Effect = "Allow"
Resource = "*"
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_autoscaler_attach" {
role = aws_iam_role.eks_cluster_autoscaler.name
policy_arn = aws_iam_policy.eks_cluster_autoscaler.arn
}
output "eks_cluster_autoscaler_arn" {
value = aws_iam_role.eks_cluster_autoscaler.arn
}
Security Groups (security_groups.tf)
This defines the security groups required for your infrastructure. Security groups act as virtual firewalls that control the inbound and outbound traffic to your resources.
touch security_groups.tf
# Provides a security group
resource "aws_security_group" "main_sg" {
name = "main_sg"
description = var.main_sg_description
vpc_id = aws_vpc.main.id
ingress {
description = "ssh access"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "Kubernetes API access"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = var.tags_main_sg_eks
}
}
Variables (variables.tf)
This file holds the variable definitions that are utilized throughout your Terraform configurations. These variables serve as configurable parameters, offering default values that can be modified or overridden based on specific requirements. By centralizing variable management, this approach ensures flexibility and reusability, allowing you to tailor infrastructure settings without altering the core configuration files. This is particularly useful when deploying infrastructure across different environments or making adjustments to scale resources.
touch variables.tf
variable "region" {
type = string
default = "us-east-1"
}
variable "bucket_name" {
type = string
default = "regtech-logs"
}
variable "aws_access_key_id" {
type = string
default = ""
}
variable "aws_secret_access_key" {
type = string
default = ""
}
variable "tags_vpc" {
type = string
default = "main-vpc-eks"
}
variable "tags_public_rt" {
type = string
default = "public-route-table"
}
variable "tags_igw" {
type = string
default = "internet-gateway"
}
variable "tags_public_subnet_1" {
type = string
default = "public-subnet-1"
}
variable "tags_public_subnet_2" {
type = string
default = "public-subnet-2"
}
variable "tags_public_subnet_3" {
type = string
default = "public-subnet-3"
}
variable "tags_private_subnet_1" {
type = string
default = "private-subnet-1"
}
variable "tags_private_subnet_2" {
type = string
default = "private-subnet-2"
}
variable "tags_private_subnet_3" {
type = string
default = "private-subnet-3"
}
variable "tags_main_sg_eks" {
type = string
default = "main-sg-eks"
}
variable "instance_type" {
type = string
default = "t2.micro"
}
variable "cluster_name" {
type = string
default = "EKSCluster"
}
variable "node_group_name" {
type = string
default = "SlaveNode"
}
variable "vpc_cidr_block" {
type = string
default = "10.0.0.0/16"
}
variable "p_s_1_cidr_block" {
type = string
default = "10.0.1.0/24"
}
variable "az_a" {
type = string
default = "us-east-1a"
}
variable "p_s_2_cidr_block" {
type = string
default = "10.0.2.0/24"
}
variable "az_b" {
type = string
default = "us-east-1b"
}
variable "p_s_3_cidr_block" {
type = string
default = "10.0.3.0/24"
}
variable "az_c" {
type = string
default = "us-east-1c"
}
variable "private_s_1_cidr_block" {
type = string
default = "10.0.4.0/24"
}
variable "az_private_a" {
type = string
default = "us-east-1c"
}
variable "private_s_2_cidr_block" {
type = string
default = "10.0.5.0/24"
}
variable "az_private_b" {
type = string
default = "us-east-1c"
}
variable "private_s_3_cidr_block" {
type = string
default = "10.0.6.0/24"
}
variable "az_private_c" {
type = string
default = "us-east-1c"
}
variable "main_sg_description" {
type = string
default = "Allow TLS inbound traffic and all outbound traffic"
}
variable "eks_node_group_profile" {
type = string
default = "eks-node-group-instance-profile_log"
}
variable "eks_cluster_role_name" {
type = string
default = "eksclusterrole_log"
}
variable "eks_node_group_role_name" {
type = string
default = "eks-node-group-role_log"
}
variable "eks_node_group_volume_policy_name" {
type = string
default = "eks-node-group-volume-policy"
}
variable "eks_describe_cluster_policy_name" {
type = string
default = "eks-describe-cluster-policy_log"
}
variable "tags_nat" {
type = string
default = "nat-gateway_eip"
}
variable "tags_k8s-nat" {
type = string
default = "k8s-nat"
}
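Because every setting above is an ordinary Terraform variable, you can override any default without editing the file itself. As a quick illustration (the values below are hypothetical), overrides can be passed on the command line or collected in a tfvars file:
# Override individual defaults at plan time (hypothetical values)
terraform plan -var="cluster_name=RegTechEKS" -var="node_group_name=RegTechNodes"

# Or keep environment-specific overrides in a tfvars file
cat > prod.tfvars <<'EOF'
cluster_name    = "RegTechEKS"
node_group_name = "RegTechNodes"
region          = "us-east-1"
EOF
terraform plan -var-file="prod.tfvars"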
Provider (provider.tf)
This step is essential in any Terraform project, as it defines the provider configuration, specifying which cloud platform or service you'll be interacting with. In this case, the provider is AWS, and this configuration establishes the connection between Terraform and the AWS environment. It ensures that all infrastructure resources are provisioned and managed within the specified cloud platform. Properly setting up the provider is foundational to the entire Terraform workflow, enabling seamless communication with AWS services such as EC2, S3, and EKS. Without it, Terraform wouldn't know where to deploy or manage the infrastructure.
touch provider.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# Configure the AWS Provider
provider "aws" {
region = var.region
access_key = var.aws_access_key_id
secret_key = var.aws_secret_access_key
}
IAM OpenID Connect (oidc.tf)
This component sets up an IAM OpenID Connect (OIDC) provider, which plays a critical role in enabling secure authentication and identity management for your AWS resources. By establishing this OIDC provider, your Kubernetes clusters can seamlessly integrate with AWS IAM, allowing you to manage permissions and roles for applications running within the cluster. This is particularly important for securely granting temporary, limited access to AWS services like S3 or DynamoDB, without the need for hardcoding credentials. The OIDC provider facilitates trust between AWS IAM and external identity providers, enabling scalable, secure access control across your infrastructure.
touch oidc.tf
data "tls_certificate" "eks" {
url = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "eks" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}
GitHub Actions Workflow Setup: eks-setup.yaml
The eks-setup.yaml file is designed to automate the process of deploying an EKS cluster on AWS. This workflow streamlines the entire infrastructure setup, eliminating manual intervention and ensuring consistency across deployments.
Purpose:
This workflow automates the provisioning of AWS infrastructure, focusing on setting up an Amazon EKS cluster. By leveraging Terraform within GitHub Actions, it ensures that your EKS cluster is deployed efficiently and consistently, aligning with infrastructure-as-code best practices.
Steps:
AWS Login: Configures the AWS credentials required for secure authentication, ensuring Terraform has the proper access to interact with AWS services.
Terraform Initialisation: Initialises Terraform by downloading and configuring the necessary provider plugins (such as AWS), setting up the working environment to handle the infrastructure resources.
Terraform Plan: Generates a detailed execution plan, outlining the changes Terraform will make to the infrastructure, without actually applying those changes yet. This step helps verify the proposed updates.
Terraform Apply: Executes the Terraform configuration, applying the planned changes and provisioning the EKS cluster along with any related resources. This fully automates the creation of the Kubernetes control plane, networking, and worker nodes in AWS.
This workflow is essential for ensuring a repeatable, scalable deployment process while maintaining the flexibility to adjust infrastructure configurations based on changing requirements.
name: Set up EKS with Terraform

on: push

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_REGION: ${{ secrets.AWS_REGION }}
  EKS_CLUSTER_NAME: ${{ secrets.EKS_CLUSTER_NAME }}

jobs:
  LogInToAWS:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

  TerraformInit:
    runs-on: ubuntu-latest
    needs: LogInToAWS
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Initialize Terraform
        run: terraform init

  TerraformPlan:
    runs-on: ubuntu-latest
    needs: TerraformInit
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Each job runs on a fresh runner, so the S3 backend and providers
      # must be re-initialised before planning.
      - name: Terraform Plan
        run: |
          terraform init
          terraform plan

  TerraformApply:
    runs-on: ubuntu-latest
    needs: TerraformPlan
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Apply Terraform configuration
        run: |
          terraform init
          terraform apply -auto-approve
Essentially, the workflow carries out several key tasks to automate the deployment process:
First, it checks out the latest version of your code from the repository, ensuring that the most up-to-date changes are available.
Next, it configures the necessary AWS credentials to securely authenticate with your cloud environment, allowing the workflow to interact with AWS services.
Finally, it initialises Terraform, setting up the working environment, and applies the EKS cluster configuration, provisioning the infrastructure based on the defined Terraform scripts. This ensures the EKS cluster is deployed correctly, ready to manage Kubernetes resources efficiently.
Lastly, we need to configure our control machine, which has kubectl installed, to communicate with the EKS cluster. This involves updating the machine with the appropriate cluster name and AWS region. By doing so, kubectl can interact with the cluster, allowing us to manage resources such as pods, services, and deployments. Without this step, the control machine won’t be able to send commands to the cluster, which is crucial for managing and monitoring our Kubernetes environment effectively.
aws eks update-kubeconfig --region region-code --name my-cluster
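For example, with the defaults used in this project (region us-east-1 and the EKSCluster name from variables.tf), the command and a quick sanity check might look like this:
# Point kubectl at the new cluster (values assume the defaults in variables.tf)
aws eks update-kubeconfig --region us-east-1 --name EKSCluster

# Confirm connectivity and that both worker nodes are Ready
kubectl cluster-info
kubectl get nodes -o wide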
With our infrastructure now fully provisioned and operational, the next step is to set up our Flask application and deploy it onto the newly created environment. This involves configuring the necessary application dependencies, setting up environment variables, and ensuring the app is containerised for deployment. Once everything is configured, we can seamlessly deploy the Flask app to our infrastructure, leveraging the scalability and reliability of the EKS cluster to manage the application. This deployment marks the transition from infrastructure setup to delivering a functional, production-ready application.
Flask App and Docker Setup:
This guide will walk you through setting up a basic Flask application, configuring the necessary files, and preparing it for testing. We’ll also cover how to set up a Python virtual environment to manage dependencies and keep your project isolated.
Step 1: Set Up the Flask Application
- Create a new directory for your application
Start by creating a dedicated directory for your Flask project. This keeps everything organised:
mkdir regtech-docker-app
cd regtech-docker-app
- (Optional) Set Up a Python Virtual Environment
It’s highly recommended to use a Python virtual environment to isolate your project dependencies. This ensures that packages installed for this project won’t affect other Python projects:
python3 -m venv venv
source venv/bin/activate
Once the virtual environment is active, you’ll see (venv) before your terminal prompt.
- Install Flask
Next, you need to install Flask. Before doing so, create your main application file:
touch app.py
Then install Flask using pip:
pip install Flask
- Create the Flask Application
Now, let's populate the app.py file with a basic Flask app. This app includes CSRF protection and loads configurations from a separate config.py file:
from flask import Flask
from flask_wtf.csrf import CSRFProtect
from config import Config
app = Flask(__name__)
app.config.from_object(Config)
csrf = CSRFProtect(app)
@app.route('/')
def hello():
    return "Hello, Welcome to Zip Reg Tech!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
This sets up a simple route (/) that returns a greeting message when accessed.
- Create a requirements.txt File
To ensure all dependencies are properly documented, generate a requirements.txt file. This file will list all installed packages, including Flask:
pip freeze > requirements.txt
Now add the following dependencies:
blinker==1.8.2
click==8.1.7
Flask==3.0.3
itsdangerous==2.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
Werkzeug==3.0.4
pytest
requests
Flask-WTF
This step ensures that anyone who clones your repository can easily install all the required dependencies using pip install -r requirements.txt.
- Create the Configuration File
You’ll need a configuration file to manage environment-specific settings like the app’s secret key. Let’s create a config.py file:
touch config.py
Populate it with the following code:
# config.py
import os
import secrets
class Config:
    SECRET_KEY = os.getenv('SECRET_KEY', secrets.token_urlsafe(32))
This file uses the environment variable SECRET_KEY if available; otherwise, it generates a secure token on the fly. You can set the environment variable in your Docker container or deployment environment.
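For instance, during local development you might export a fixed SECRET_KEY before starting the app; the snippet below is just one way to do that and is not part of the original setup:
# Generate a secret once and export it so Flask sessions survive restarts
export SECRET_KEY="$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')"
python3 app.py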
- Write Unit Tests for the Flask App
To ensure the application works as expected, let’s add some unit tests. Create a test_app.py file:
touch test_app.py
Inside the file, add the following code to test the root endpoint (/):
from app import app
def test_hello():
    with app.test_client() as client:
        response = client.get('/')
        assert response.data == b"Hello, Welcome to Zip Reg Tech!"
        assert response.status_code == 200
This test simulates an HTTP request to the Flask app and verifies that the correct response is returned.
- Create Integration Tests
To simulate real-world conditions, we’ll write an integration test that starts the Flask app, makes a request to it, and verifies the response. Create a test_integration.py file:
touch test_integration.py
Now add the following code for integration testing:
import requests
from app import app
import multiprocessing
import time
# Run Flask app in a separate process for integration testing
def run_app():
    app.run(host="0.0.0.0", port=5000)

def test_integration():
    # Start the Flask app in a background process
    p = multiprocessing.Process(target=run_app)
    p.start()

    # Give the app a moment to start up
    time.sleep(2)

    # Make an HTTP request to the running Flask app
    response = requests.get('http://localhost:5000/')

    # Check that the response is as expected
    assert response.status_code == 200
    assert response.text == "Hello, Welcome to Zip Reg Tech!"

    # Terminate the Flask app process
    p.terminate()
This integration test launches the Flask app in a separate process and makes an HTTP request to the / route. It then verifies the status code and response body before terminating the app.
By following these steps, you have a fully functional Flask app with unit and integration tests. The next step will be setting up Docker for containerisation, ensuring your app is ready for deployment in any environment.
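Before moving on to Docker, it is worth running both test files locally from the virtual environment; a typical invocation looks like this:
# Run the unit and integration tests from the project root
source venv/bin/activate
pytest test_app.py test_integration.py -v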
Step 2: Set Up Docker and Push the Image to Amazon Elastic Container Registry (ECR)
In this section, we'll create a Docker image for our Flask app, configure the necessary Docker files, and then push the image to Amazon Elastic Container Registry (ECR) for deployment.
- Create the Dockerfile
A Dockerfile is a script that defines how your Docker image is built. In the same directory as your Flask application, create a new file named Dockerfile and add the following content:
# Use an official Python runtime as a base image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Make port 5000 available to the world outside this container
EXPOSE 5000
# Define environment variable to prevent Python from buffering stdout/stderr
ENV PYTHONUNBUFFERED=1
# Run the application
CMD ["python", "app.py"]
Base Image: We are using python:3.9-slim as our base image. This is a lightweight version of Python that reduces the size of the final Docker image.
Working Directory: The WORKDIR /app command sets the working directory for subsequent instructions.
Copying Files: The COPY . /app command copies the current directory’s contents (including your Flask app and config files) into the container.
Install Dependencies: RUN pip install --no-cache-dir -r requirements.txt installs all the necessary dependencies as specified in the requirements.txt file.
Expose Port: The EXPOSE 5000 command allows the Flask app to communicate over port 5000 from inside the container.
Set Environment Variable: ENV PYTHONUNBUFFERED=1 ensures that Flask logs are output in real time rather than being buffered.
Run the App: The CMD directive specifies that app.py will be executed when the container starts.
- Create the .dockerignore File
To keep your Docker image clean and avoid copying unnecessary files into the container, create a .dockerignore file in your project’s root directory. This file works similarly to .gitignore, telling Docker which files to exclude from the build context.
Create the .dockerignore file with the following content:
venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
venv/: Prevents your local Python virtual environment from being included in the Docker image.
__pycache__/: Excludes Python cache files generated during development.
*.pyc, *.pyo, *.pyd: Ignores compiled Python files.
.Python: Excludes the Python build files.
- Build and Test the Docker Image Locally
To make sure everything works, you can build and run your Docker image locally. In your project directory, run the following commands:
docker build -t regtech-docker-app .
docker run -p 5100:5000 regtech-docker-app
docker build: This command creates the Docker image, tagging it as regtech-docker-app.
docker run: Runs the container, publishing port 5100 on your machine and forwarding it to port 5000 inside the container.
At this point, your Flask app should be running at http://localhost:5100.
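A quick way to confirm the container responds is to hit the mapped port from another terminal:
# Smoke-test the containerised app on the host port
curl http://localhost:5100/
# Expected response: Hello, Welcome to Zip Reg Tech!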
- Create an Elastic Container Registry (ECR) Repository
Log in to the AWS Console:
Open the AWS Management Console and search for ECR (Elastic Container Registry).
Create a New Repository:
On the ECR landing page, click the Create repository button.
Give your repository a name (e.g., regtech-docker-app), then click Create.
Confirm Repository Creation: Once the repository is created, you’ll be redirected to a confirmation screen showing the repository details.
You’ll see the repository listed, confirming it was successfully created.
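If you prefer the command line over the console, the repository can also be created and logged into with the AWS CLI; the commands below assume the repository name used above, the us-east-1 region, and a placeholder account ID:
# Create the repository (equivalent to the console steps above)
aws ecr create-repository --repository-name regtech-docker-app --region us-east-1

# Authenticate the local Docker client against the registry
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com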
Setup Kubernetes Deployment and Service:
In this section, we will configure a Kubernetes Deployment and a NodePort Service for our Dockerized Flask application. This setup will ensure that your Flask app runs smoothly and is accessible from outside the Kubernetes cluster.
- Create a Kubernetes Deployment and Service
A Deployment will handle managing the Flask application, making sure the desired number of Pods are always running. The NodePort Service will expose the Flask application on a specific port that can be accessed externally from outside the cluster. It maps a port on the Kubernetes nodes to the port where the Flask app is running inside the Pods.
Create a file named deploy.yml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: regtech-app-deployment
  labels:
    app: regtech-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: regtech-app
  template:
    metadata:
      labels:
        app: regtech-app
    spec:
      containers:
        - name: regtech-app
          image: REPOSITORY_TAG
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: regtech-app-service
spec:
  type: NodePort
  selector:
    app: regtech-app
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
      nodePort: 30030
Explanation of the Configuration
- Deployment:
Deploys a single replica of the Flask application.
Each Pod runs a container from your Docker image specified in REPOSITORY_TAG.
The container exposes port 5000, which is the port the Flask application listens on.
- Service:
Creates a NodePort Service that makes the Flask application accessible externally.
Maps port 5000 in the cluster to port 5000 on the Flask application container.
Exposes the application on NodePort 30030 (or a random port in the range 30000-32767 if not specified).
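Once the pipeline described next applies this manifest, a few optional kubectl commands can confirm that the Deployment and Service came up as expected:
# Verify the objects defined in deploy.yml
kubectl get deployment regtech-app-deployment
kubectl get pods -l app=regtech-app -o wide
kubectl get service regtech-app-service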
Deploying the Flask Application
The deployment of the Flask application is managed through a CI/CD pipeline, which automates the process of building the Docker image and deploying it to the Kubernetes cluster.
Required GitHub Secrets
You'll need to configure the following secrets in your GitHub repository:
AWS_ACCESS_KEY_ID: Your AWS access key.
AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
AWS_REGION: The AWS Region where your cluster resides.
EKS_CLUSTER_NAME: Your EKS Cluster name.
NEW_GITHUB_TOKEN: Your GitHub access token.
ORGANIZATION_KEY: Your SonarCloud organisation key.
PROJECT_KEY: Your SonarCloud project key.
SONAR_TOKEN: Your SonarCloud token.
SONAR_HOST_URL: The URL of your SonarQube server (for SonarCloud, this is https://sonarcloud.io).
SNYK_TOKEN: The API token from Snyk.
Adding Secrets to GitHub:
Navigate to your GitHub repository.
Go to Settings > Secrets > Actions.
Add each of the secrets listed above (including SONAR_TOKEN, SONAR_HOST_URL, and SNYK_TOKEN).
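If you manage many repositories, the same secrets can also be added from a terminal with the GitHub CLI; the values below are placeholders:
# Add secrets without leaving the terminal (requires an authenticated gh CLI)
gh secret set SONAR_TOKEN --body "<your-sonarcloud-token>"
gh secret set SNYK_TOKEN --body "<your-snyk-api-token>"
gh secret set AWS_REGION --body "us-east-1"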
CI/CD Pipeline Process (regtech-app.yaml)
Here’s a step-by-step breakdown of the CI/CD workflow:
- Linting and Static Analysis (SonarCloud):
- Ensures code quality and identifies potential issues.
- Unit and Integration Tests:
- Validates the functionality of your code before deployment.
- Security Scan (Snyk):
- Detects vulnerabilities in your code dependencies.
- Build Docker Image:
- Packages the Flask application into a Docker image.
- Push to Amazon ECR:
- Publishes the Docker image to Amazon Elastic Container Registry (ECR).
- Deploy to EKS:
- Deploys the Docker image to your EKS cluster using kubectl.
- Rollback:
- Automatically rolls back the deployment if it fails.
Here’s the workflow configuration for regtech-app.yaml:
name: Deploy Flask App to EKS

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  EKS_CLUSTER_NAME: ${{ secrets.EKS_CLUSTER_NAME }}

jobs:
  # Step 1: Source Code Testing (Linting, Static Analysis, Unit Tests, Snyk Scan)
  Lint-and-Static-Analysis:
    name: Linting and Static Analysis (SonarQube)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: SonarCloud Scan
        uses: sonarsource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.NEW_GITHUB_TOKEN }}
          #ORGANIZATION_KEY: ${{ secrets.ORGANIZATION_KEY }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        with:
          args: >
            -Dsonar.organization=${{ secrets.ORGANIZATION_KEY }}
            -Dsonar.projectKey=${{ secrets.PROJECT_KEY }}
            -Dsonar.exclusions=venv/**
            -Dsonar.c.file.suffixes=-
            -Dsonar.cpp.file.suffixes=-
            -Dsonar.objc.file.suffixes=-

      - name: Check SonarCloud Quality Gate
        run: |
          curl -u ${{ secrets.SONAR_TOKEN }} "https://sonarcloud.io/api/qualitygates/project_status?projectKey=${{ secrets.PROJECT_KEY }}" | grep '"status":"OK"' || exit 1
  UnitAndIntegrationTests:
    name: Unit and Integration Tests on Source Code
    runs-on: ubuntu-latest
    needs: Lint-and-Static-Analysis
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3'

      - name: Check Python version
        run: python --version

      - name: Verify venv creation
        run: ls -la venv/bin/

      - name: Clean up and recreate virtual environment
        run: |
          rm -rf venv
          python3 -m venv venv

      - name: Create virtual environment
        run: |
          python3 -m venv venv

      - name: Check Python executable path
        run: |
          which python3

      - name: List directory contents
        run: |
          cd /home/runner/work/regtech_accessment_cicd
          ls -la

      - name: Install dependencies
        run: |
          cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
          source venv/bin/activate
          ls -la venv venv
          pip3 install -r requirements.txt

      - name: Run Unit Tests
        run: |
          cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
          source venv/bin/activate
          pytest test_app.py

      - name: Run Integration tests
        run: |
          cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
          source venv/bin/activate
          pytest test_integration.py
  SNYK-SCAN:
    name: Dependency Scanning (Snyk)
    runs-on: ubuntu-latest
    needs: UnitAndIntegrationTests
    steps:
      - name: Checkout repository
        uses: actions/checkout@master

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3'

      - name: Check Python version
        run: python --version

      - name: Clean up and recreate virtual environment
        run: |
          rm -rf venv
          python3 -m venv venv

      - name: Create virtual environment
        run: |
          python3 -m venv venv

      - name: Check Python executable path
        run: |
          which python3

      - name: List directory contents
        run: |
          cd /home/runner/work/regtech_accessment_cicd
          ls -la

      - name: Install dependencies
        run: |
          cd /home/runner/work/regtech_accessment_cicd/regtech_accessment_cicd
          source venv/bin/activate
          ls -la venv venv
          pip3 install -r requirements.txt

      - name: Set up Snyk
        uses: snyk/actions/python-3.10@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high
  # Step 2: Build Docker Image
  BuildImage-and-Publish-To-ECR:
    name: Build and Push Docker Image
    runs-on: ubuntu-latest
    needs: SNYK-SCAN
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Login to ECR
        uses: docker/login-action@v3
        with:
          registry: 611512058022.dkr.ecr.us-east-1.amazonaws.com
          username: ${{ secrets.AWS_ACCESS_KEY_ID }}
          password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          region: ${{ secrets.AWS_REGION }}

      - name: Build Image
        run: |
          docker build -t regtech-app .
          docker tag regtech-app:latest 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}
          docker push 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}
  # Step 3: Docker Image Testing (Integration Tests Inside Container)
  Integration-Tests:
    name: Integration Tests on Docker Image
    runs-on: ubuntu-latest
    needs: BuildImage-and-Publish-To-ECR
    steps:
      - name: Login to ECR
        uses: docker/login-action@v3
        with:
          registry: 611512058022.dkr.ecr.us-east-1.amazonaws.com
          username: ${{ secrets.AWS_ACCESS_KEY_ID }}
          password: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          region: ${{ secrets.AWS_REGION }}

      - name: Pull Docker Image from ECR
        run: |
          docker pull 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}

      - name: Run Integration Tests inside Docker Container
        run: |
          docker run --rm -v $(pwd):/results 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER} pytest --junitxml=/results/integration-test-results.xml

      - name: List Files in Current Directory
        run: |
          ls -l

      - name: Upload Integration Test Results
        uses: actions/upload-artifact@v3
        with:
          name: integration-test-results
          path: integration-test-results.xml
          if-no-files-found: warn
  # Step 4: Install Kubectl
  Install-kubectl:
    name: Install Kubectl on The Github Actions Runner
    runs-on: ubuntu-latest
    needs: Integration-Tests
    steps:
      - name: Install kubectl
        run: |
          curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
          chmod +x ./kubectl
          sudo mv ./kubectl /usr/local/bin/kubectl
  # Step 5: Deploy To EKS
  Deploy-To-Cluster:
    runs-on: ubuntu-latest
    needs: Install-kubectl
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Download KubeConfig File
        env:
          KUBECONFIG: ${{ runner.temp }}/kubeconfig
        run: |
          aws eks update-kubeconfig --region ${{ secrets.AWS_REGION }} --name ${{ secrets.EKS_CLUSTER_NAME }} --kubeconfig $KUBECONFIG
          echo "KUBECONFIG=$KUBECONFIG" >> $GITHUB_ENV
          echo $KUBECONFIG

      - name: Deploy to EKS
        run: |
          sed -i "s|image: REPOSITORY_TAG|image: 611512058022.dkr.ecr.us-east-1.amazonaws.com/regtech-app:${GITHUB_RUN_NUMBER}|g" ./deploy.yml
          kubectl apply -f ./deploy.yml

      - name: Check Deployment Status
        id: check-status
        run: |
          kubectl rollout status deployment.apps/regtech-app-deployment || exit

      - name: Rollback Deployment
        if: failure()
        run: |
          echo "Deployment failed. Rolling back..."
          kubectl rollout undo deployment.apps/regtech-app-deployment
To determine which node your deployment is running on, follow these steps:
- First, get the details of your running pods, including the node they are scheduled on, by running the following command:
kubectl get po -o wide
This command will provide detailed information about the pods, including their IP addresses, node assignments, and the container status.
- Next, identify the public IP address of the node where your pod is running. To do this, list all the nodes in your cluster:
kubectl get nodes -o wide
You'll see a list of the nodes in your Kubernetes cluster, along with their corresponding external IPs.
- Once you've identified the public IP of the node, copy it. You'll combine this public IP with the NodePort assigned to your Flask application to access it externally.
- Finally, construct the URL by combining the public IP of the node with the NodePort (e.g., 30030 from the example). Enter this into your browser like so:
http://<PUBLIC_IP>:<NODE_PORT>
Hit enter, and if everything is set up correctly, your Flask app will be live and accessible from the browser.
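As a concrete check, you can also request the root route from a terminal; if the request times out, it is usually worth confirming that the worker nodes' security group allows inbound TCP traffic on the NodePort:
# Replace the placeholder with the node's public IP from `kubectl get nodes -o wide`
curl http://<PUBLIC_IP>:30030/
# Expected response: Hello, Welcome to Zip Reg Tech!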
Monitoring with Prometheus and Grafana
After deploying your application, it's crucial to set up monitoring to track its performance, health, and any potential issues in real-time. In this guide, we’ll walk through setting up Prometheus for monitoring, leveraging Grafana for visualization later.
Prometheus
Prometheus is a powerful monitoring and alerting toolkit that collects and stores metrics from both your application and the Kubernetes cluster itself.
We will use Helm, the Kubernetes package manager, to deploy Prometheus. Follow the steps below:
- Create a namespace for Prometheus:
kubectl create namespace prometheus
- Add the Prometheus Helm chart repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
- Deploy Prometheus using Helm:
helm upgrade -i prometheus prometheus-community/prometheus --namespace prometheus --set alertmanager.persistence.storageClass="gp2" --set server.persistentVolume.storageClass="gp2"
This command installs or upgrades the Prometheus instance in the prometheus namespace, setting the storage class for both Alertmanager and the Prometheus server's persistent volumes to gp2 (Amazon EBS General Purpose volumes).
- Verify the deployment:
kubectl get pods -n prometheus
At this point, some of your pods might be in a Pending state due to missing Amazon Elastic Block Store (EBS) volumes. This is because the Amazon EBS CSI (Container Storage Interface) driver is required to manage EBS volumes for Kubernetes. Let’s set up the driver.
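You can confirm that the Pending state is caused by unbound volumes before installing the driver; a quick inspection might look like this:
# Look for "pod has unbound immediate PersistentVolumeClaims" in the events
kubectl get pvc -n prometheus
kubectl describe pod <pending-pod-name> -n prometheus | tail -n 20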
Amazon EBS CSI Driver Setup
- Create an IAM OIDC identity provider for your cluster:
eksctl utils associate-iam-oidc-provider --cluster $cluster_name --approve
Ensure that eksctl is installed on your control machine for this step.
- Create an IAM role for the Amazon EBS CSI plugin:
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster my-cluster \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
- Add the Amazon EBS CSI driver as an EKS add-on:
- To check the required platform version:
aws eks describe-addon-versions --addon-name aws-ebs-csi-driver
- To install the EBS CSI driver add-on using eksctl:
eksctl create addon --name aws-ebs-csi-driver --cluster my-cluster --service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole --force
Replace my-cluster with your actual cluster name and 111122223333 with your AWS account number.
- Verify the installation of the EBS CSI driver:
eksctl get addon --name aws-ebs-csi-driver --cluster my-cluster
- Update the EBS CSI driver if needed.
eksctl update addon --name aws-ebs-csi-driver --version v1.11.4-eksbuild.1 --cluster my-cluster \
--service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole --force
- Re-check the pod status:
kubectl get pods -n prometheus
Your Prometheus pods should now be in the Running state.
- Label the Prometheus server pod:
To connect the Prometheus server pod with a service:
kubectl label pod <pod-name> app=prometheus
- Expose Prometheus via NodePort
Prometheus has a built-in web interface for accessing metrics. To access Prometheus externally, we will expose it via a NodePort service.
- Create a Prometheus service YAML file:
touch prometheus-service.yml
- Define the NodePort service configuration:
apiVersion: v1
kind: Service
metadata:
  name: prometheus-nodeport
  namespace: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090
      targetPort: 9090
      protocol: TCP
      nodePort: 30000 # You can choose any available port on your nodes
  type: NodePort
- Apply the service configuration:
kubectl apply -f prometheus-service.yml
- Access Prometheus:
Now you can access the Prometheus web UI by navigating to:
http://<NODE_PUBLIC_IP>:30000
Replace <NODE_PUBLIC_IP> with the public IP address of any node in your cluster.
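Besides the web UI, Prometheus exposes an HTTP API on the same port, which is handy for quick checks from a terminal; for example, the standard up metric shows which scrape targets are healthy:
# Query the Prometheus HTTP API for the `up` metric (1 = target healthy)
curl -s "http://<NODE_PUBLIC_IP>:30000/api/v1/query?query=up"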
By following these steps, we now have a fully functioning Prometheus setup monitoring both our Kubernetes cluster and application metrics. Next, we will integrate Grafana to create rich, visual dashboards for real-time analysis of the collected metrics.
Grafana: Visualizing and Monitoring with Dashboard
Grafana is a visualisation tool that integrates with Prometheus to create dashboards. It also allows you to set up alerts for events like high CPU usage, memory spikes, or pod failures in your EKS cluster.
Here’s how to deploy Grafana on your EKS cluster using Helm:
- Add the Grafana Helm repository:
First, you'll need to add the Grafana Helm chart repository to Helm:
helm repo add grafana https://grafana.github.io/helm-charts
- Create a Grafana namespace:
Set up a dedicated namespace for Grafana:
kubectl create namespace grafana
- Create a Grafana YAML configuration file:
Grafana requires a configuration file to connect it to Prometheus. Create a file called grafana.yml with the following content:
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.prometheus.svc.cluster.local
        access: proxy
        isDefault: true
Ensure the Prometheus URL is correctly specified to point to the Prometheus service in your cluster.
- Deploy Grafana using Helm:
Use Helm to deploy Grafana into your Kubernetes cluster. You can set up persistent storage for Grafana dashboards, configure the admin password, and provide the path to your grafana.yml file:
helm install grafana grafana/grafana \
--namespace grafana \
--set persistence.storageClassName="gp2" \
--set persistence.enabled=true \
--set adminPassword='EKS!sAWSome' \
--values /home/ec2-user/grafana.yml \
--set service.type=NodePort
Replace the adminPassword with a strong password of your choice, and ensure the path to your grafana.yml file is correct.
- Verify the deployment
Check if the Grafana pods are running successfully in the Grafana namespace:
kubectl get pods -n grafana
- Access Grafana:
Grafana will be exposed using a NodePort. You can access the Grafana UI using the public IP of any node in your cluster and the specified NodePort. For example:
http://<NODE_PUBLIC_IP>:30281
Replace <NODE_PUBLIC_IP> with the actual IP address of one of your nodes. The NodePort shown here is 30281, but it can vary based on your configuration.
- Login to Grafana:
Once you’ve accessed the Grafana web UI, log in using the credentials:
Username: admin
Password: The admin password you set during the Helm installation.
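If the password is ever misplaced, the Grafana Helm chart typically stores it in a Kubernetes secret named grafana in the release namespace, so it can usually be recovered like this:
# Read the admin password back from the chart's secret (assumes the default release name "grafana")
kubectl get secret --namespace grafana grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode ; echo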
- Create a new dashboard:
After logging in, you can create your first dashboard. To make things easier, you can import a pre-built dashboard tailored for Kubernetes monitoring.
- Click "Create" → "Import" on the Grafana console.
- Import a pre-built Kubernetes dashboard:
On the "Find and Import Dashboards for Common Applications" section, input the dashboard ID 17119 and click "Load".
- Configure the data source:
Select Prometheus as the data source for the dashboard.
Click "Import" to load the dashboard.
- View your dashboard:
After importing the dashboard, you can now visualize the performance of your Kubernetes cluster. The dashboard will display real-time metrics such as CPU and memory usage, pod status, and more.
By following these steps, you now have a Grafana instance running on your EKS cluster, integrated with Prometheus to collect and visualise metrics.
Conclusion
By integrating Terraform and GitHub Actions, we fully automate the setup and management of our AWS infrastructure and Kubernetes-based application deployments. This setup ensures:
Scalability: You can easily scale your infrastructure to meet demand.
Efficiency: Automating deployments speeds up the process, reduces errors, and makes your development workflow smoother.
Security: Following security best practices protects your application and data.
☕️ If this article helped you avoid a tech meltdown or gave you a lightbulb moment, feel free to buy me a coffee! It keeps my code clean, my deployments smooth, and my spirit caffeinated. Help fuel the magic here!
Happy deploying!