Getting Started with AWS and Terraform: Setting Up CloudWatch Alarms for CPU Utilization on AWS EC2 Instances with Terraform

chinmay13

Chinmay Tonape

Posted on January 26, 2024

In a previous post, we walked through the process of hosting a Windows IIS web server on an AWS EC2 Windows instance using Terraform.
In this post, we will use AWS CloudWatch alarms to monitor CPU utilization and send email notifications through SNS. To keep the configuration manageable, we will organize the separate components of the infrastructure into Terraform modules.

Architecture Overview

Before diving into the setup, let's briefly revisit the architecture we'll be working with:

CloudWatch alarm with SNS Email Subscription

Step 1: Create VPC, Network Components, and EC2 Instances

Refer to our previous posts for detailed instructions on setting up a VPC and network components along with two Linux EC2 instances in separate Availability Zones (AZs). Make sure detailed monitoring is enabled on the EC2 instances so that metrics are collected at one-minute intervals. Note that detailed monitoring incurs additional charges.
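
As a minimal sketch of what enabling monitoring can look like in Terraform, the `aws_instance` resource accepts a `monitoring` argument that turns on detailed (one-minute) monitoring. The resource name `web` and the variables `ami_id`, `instance_type`, and `subnet_ids` are assumptions made for illustration, not the configuration from the earlier posts.

```hcl
# Hypothetical inputs; the names are placeholders for this sketch.
variable "ami_id" { type = string }
variable "instance_type" { type = string }
variable "subnet_ids" { type = list(string) }

# Hypothetical instance definition: one instance per subnet (one subnet per AZ).
resource "aws_instance" "web" {
  count         = length(var.subnet_ids)
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.subnet_ids[count.index]

  # Enables detailed monitoring (one-minute metrics); this is billed separately.
  monitoring = true
}
```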

Step 2: Set Up SNS Topic for Email Notifications

Create a Simple Notification Service (SNS) topic and add a subscriber with a disposable email ID (numerous free online disposable email services are available).



####################################################
# Create an SNS topic with an email subscription
####################################################
resource "aws_sns_topic" "topic" {
  name = "WebServer-CPU_Utilization_alert"
}

resource "aws_sns_topic_subscription" "topic_email_subscription" {
  count     = length(var.email_address)
  topic_arn = aws_sns_topic.topic.arn
  protocol  = "email"
  endpoint  = var.email_address[count.index]
}
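
The subscription resource above reads `var.email_address` as a list, creating one subscription per entry. A declaration along the following lines would satisfy it; this is a sketch of an assumed declaration, and the description text is illustrative.

```hcl
# Assumed declaration of the variable used by the subscription resource.
variable "email_address" {
  type        = list(string)
  description = "Email addresses to subscribe to the SNS topic"
}
```

Each address in the list receives its own confirmation email from SNS.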



Step 3: Create CloudWatch Metric Alarms

Create a CloudWatch metric alarm for CPU utilization for each EC2 instance, and set the alarm action to the Amazon Resource Name (ARN) of the SNS topic. The metric name, statistic, period, threshold, and comparison operator together define when the alarm fires: with the values below, it triggers when average CPU utilization is at or above 80% for two consecutive 60-second periods.



####################################################
# Create a CloudWatch alarm per EC2 instance, with alarm_actions pointing to the SNS topic
####################################################
resource "aws_cloudwatch_metric_alarm" "ec2_cpu" {
  # One alarm per EC2 instance
  count = length(module.web.instance_ids)

  alarm_name                = "cpu-utilization-${module.web.instance_ids[count.index]}"
  alarm_description         = "This metric monitors ec2 cpu utilization"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 2
  metric_name               = "CPUUtilization"
  namespace                 = "AWS/EC2"
  period                    = 60 # seconds
  statistic                 = "Average"
  threshold                 = 80
  treat_missing_data        = "notBreaching"
  insufficient_data_actions = []
  alarm_actions             = [aws_sns_topic.topic.arn]

  dimensions = {
    InstanceId = module.web.instance_ids[count.index]
  }
}
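
The alarm resource reads `module.web.instance_ids`, so the `web` module has to expose the instance IDs as an output. A minimal sketch of such an output follows, assuming the module creates its instances with `count` under a resource named `aws_instance.web`; the actual resource name in the repository may differ.

```hcl
# Inside the web module; the resource name aws_instance.web is assumed.
output "instance_ids" {
  description = "IDs of the EC2 instances created by this module"
  value       = aws_instance.web[*].id
}
```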




Running Terraform

Execute the following commands to automate the infrastructure setup:



terraform init
terraform plan -var-file=aws.tfvars
terraform apply -var-file=aws.tfvars -auto-approve
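
The `-var-file` flag loads variable values from `aws.tfvars`. As a rough illustration only, such a file could contain an entry like the one below; the address is a placeholder, and the real file in the repository may define additional variables.

```hcl
# aws.tfvars (illustrative values only)
email_address = ["alerts@example.com"]
```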



Once terraform apply completes successfully, it prints output similar to the following:



Apply complete! Resources: 14 added, 0 changed, 0 destroyed.



Testing CloudWatch Alarms with SNS topic

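Check that the SNS topic has been created and that its subscription is confirmed. SNS sends a confirmation email to each subscribed address, and the subscription stays pending until the link in that email is clicked: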
AWS SNS Topic Creation

Confirm SNS Topic Subscription

SNS Topic Subscription Confirmed

Once the Terraform apply process completes successfully, wait for some time to observe both alarms in an "OK" state, indicating that CPU utilization is within the threshold.

Running EC2 instances

CloudWatch alarms with status OK

Metric Graph

An alarm may sometimes enter the "INSUFFICIENT_DATA" state, which can happen for several reasons. Refer to https://repost.aws/knowledge-center/cloudwatch-alarm-insufficient-data-state

To test the alarms, stress one EC2 instance by installing and starting the stress utility.



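# Enable the EPEL repository and install the stress utility
sudo amazon-linux-extras install epel -y
sudo yum install stress -y
# Start one CPU worker (-c 1) and stop after 30 minutes (-t 30m)
stress -c 1 --backoff 300000000 -t 30m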



Once CPU utilization stays at or above the threshold for two consecutive one-minute periods, the CloudWatch alarm changes to the ALARM state, and you will receive an email with detailed information.

Metric Graph shows high CPU utilization

Changed alarm state

SNS Email notification for threshold breach

I also stressed the second EC2 instance:

Stressing 2nd EC2 instance

Changed alarm state for 2nd EC2 instance

SNS Email notification for threshold breach

Both EC2 instances in alarm state:

EC2 instances in alarm

CloudWatch alarm states

Cleanup

To prevent unnecessary costs, remember to destroy the AWS resources by executing the following command:



terraform destroy -auto-approve



Congratulations! We successfully deployed CloudWatch alarms and an SNS topic with email subscriptions to receive alarm details.

In our next module, we will explore application load balancers to enhance resilience and availability. Happy Coding!

Resources

GitHub Link: https://github.com/chinmayto/terraform-aws-linux-webserver-cloudwatch-sns
CloudWatch Documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html
SNS Documentation:
https://docs.aws.amazon.com/sns/latest/dg/welcome.html
