Ana Carmiña Mendoza
Posted on April 23, 2023
How amazing would it be to provision an application that destroys itself automatically once it is no longer in use?! 🤯
If you prefer to read this article in Spanish, click here!
In this post, I will go through some use cases, the architecture diagram and every service needed for a self-destructing infrastructure in AWS. The fun part is that you can add any additional resource you need to the architecture!
Motivation
Sometimes developers provision a lot of services to test their applications and then forget to delete them, incurring extra expenses.
As part of an internal initiative with a great colleague, my managers asked me to find a creative solution to this problem. My peer focused on configuring the application automatically on an EC2 Instance, while I worked on building an infrastructure that could delete itself.
In which scenarios would I need this?
Let's say you want to create a development environment to test new code. With this architecture you can create and delete environments on demand, reducing costs and even increasing efficiency.
Another example would be to use it for event-based workloads, such as conferences or workshops. To host these events you might need servers and storage, but only for a very short time. By using a self-destructive architecture you could easily delete everything after the event, reducing complexity.
What about an application for disaster recovery? In the event of an outage you could transfer your traffic to this infrastructure. Once the original is restored and you route the traffic back, this environment would be automatically deleted.
How does it work?
This architecture is provisioned with AWS CloudFormation, a tool to provision workloads using infrastructure as code. The great thing about this is that, with all resources and dependencies defined in a template and deployed in a stack, it becomes really easy to delete all of them as a single unit.
Architecture diagram
Let's focus on the main components of the architecture: the EC2 Instance, the CloudWatch Alarm, the EventBridge Rule, the Lambda Function and its IAM Role, IAM Policy and Lambda Permission. Other supporting resources are also needed, such as a security group, an instance role and an instance profile, but they are not covered here.
The CloudFormation stack will create the following architecture:
Here is the architecture in motion:
Now, let's dive deep into each one of the elements of the architecture! 🤓
WebServer Instance
First and foremost, we need an application. This will be provisioned on an EC2 Instance. You can configure it however you want. In my case, I configured the application using the UserData section.
This is the definition of the resource:
"WebServerInstance": {
"Type" : "AWS::EC2::Instance",
"Properties": {
"ImageId" : "ami-0ab4d1e9cf9a1215a",
"InstanceType" : "t3.small",
"KeyName" : "YOUR_KEY_PAIR",
"IamInstanceProfile" : "YOUR_INSTANCE_PROFILE",
"BlockDeviceMappings" : [
{
"DeviceName" : "/dev/xvda",
"Ebs" : {
"VolumeType" : "gp2",
"VolumeSize" : "25",
"Encrypted" : "true",
"KmsKeyId" : "YOUR_KMS_KEY",
"DeleteOnTermination" : "true"
}
}],
"NetworkInterfaces" : [{
"AssociatePublicIpAddress" : "true",
"DeleteOnTermination" : "true",
"SubnetId" : "YOUR_SUBNET_ID",
"GroupSet" : ["YOUR_SECURITY_GROUP"],
"DeviceIndex" : 0
}],
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash\n",
"SOME_CONFIGURATION_FOR_YOUR_APP"
]]}}
}
}
Inactivity Alarm
Based on the CPU utilization metric, we can know if the application is still being used.
The alarm is programmed for this: once the maximum CPU utilization of the instance stays at or below the threshold for 1 hour (two consecutive 30-minute periods), the alarm will stop the instance. I used 12% for my application; the definition below shows a threshold of 3.
✏️Note: The threshold value for the CPU utilization must be defined according to your application. In my case, the application I deployed on the EC2 instance was a Splunk dashboard, so setting that threshold was my best option.
Here’s the definition of the Alarm:
"MyEC2Alarm": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmDescription": "Alarm to stop Instance",
"AlarmName": "Inactivity Alarm",
"AlarmActions":
[ "arn:aws:automate:us-east-1:ec2:stop" ],
"MetricName": "CPUUtilization",
"Namespace": "AWS/EC2",
"Statistic": "Maximum",
"Period": "1800",
"Threshold": "3",
"ComparisonOperator": "LessThanOrEqualToThreshold",
"EvaluationPeriods": "2",
"Dimensions": [
{
"Name": "InstanceId",
"Value": { "Ref" : "WebServerInstance" }
}
]
}
}
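To get a feeling for the alarm's timing, here is a simplified sketch of how it is evaluated. This is not CloudWatch's real implementation (it ignores missing data, for example); it only assumes that each value is the Maximum CPUUtilization of one 1800-second period and that the last `EvaluationPeriods` values must all breach the threshold.

```python
def alarm_would_fire(datapoints, threshold=3.0, evaluation_periods=2):
    """Return True if the most recent periods are all at or below the threshold.

    Each datapoint stands for the maximum CPU utilization (in percent)
    observed during one 30-minute period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        return False  # not enough history yet
    recent = datapoints[-evaluation_periods:]
    # Mirrors "LessThanOrEqualToThreshold" over every evaluated period.
    return all(value <= threshold for value in recent)


print(alarm_would_fire([40.0, 2.5, 1.0]))  # True: two quiet periods in a row
print(alarm_would_fire([2.0, 9.0]))        # False: the latest period was busy
```

Because the statistic is Maximum, a single CPU spike inside a 30-minute window is enough to keep the instance alive for that period.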
Event Rule
The EventBridge Rule will be waiting for your application to stop, so it can then perform an action. The action will be to trigger a Lambda Function that contains the code to delete the CloudFormation Stack.
"EventRule": {
"DependsOn": ["ADLambda", "WebServerInstance"],
"Type": "AWS::Events::Rule",
"Properties": {
"Description": "EventRule for EC2 Stopping",
"EventPattern": {
"source": [
"aws.ec2"
],
"detail-type": [
"EC2 Instance State-change Notification"
],
"detail": {
"state": [
"stopped"
],
"instance-id": [{
"Ref": "WebServerInstance"
}]
}
},
"State": "ENABLED",
"Targets": [{
"Arn": {"Fn::GetAtt": ["ADLambda", "Arn"] },
"Id": "ADLambda"
}]
}
}
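The rule only reacts to events that fit its pattern. The sketch below is a simplified matcher, not EventBridge's actual algorithm (the real service supports many more operators), and the instance ID is a hypothetical value standing in for the `Ref` to `WebServerInstance`.

```python
def matches(pattern, event):
    """Simplified EventBridge-style match.

    Every key in the pattern must exist in the event. A list holds the
    allowed values for a field; a nested dict is matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical instance ID, used only for illustration.
INSTANCE_ID = "i-0123456789abcdef0"

PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"], "instance-id": [INSTANCE_ID]},
}


def make_event(state):
    """Build a trimmed-down state-change event for the sample instance."""
    return {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": INSTANCE_ID, "state": state},
    }


print(matches(PATTERN, make_event("stopped")))   # True
print(matches(PATTERN, make_event("stopping")))  # False
```

An instance passes through "stopping" before it reaches "stopped", so matching only on "stopped" means the function is invoked once, after the instance is fully down.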
Lambda Function
Once the function is triggered by the event rule, it will run a Python script to delete the CloudFormation stack that created everything… I mean, how awesome is that?! 🤯
Here’s the definition of the Lambda:
"ADLambda": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Handler": "index.handler",
"Role": {
"Fn::GetAtt": [
"LambdaExecutionRole",
"Arn"
]
},
"Code": {
"ZipFile": "import boto3 \nimport os \nimport json \nstack_name = os.environ['stackName'] \n\ndef delete_cfn(stack_name):\n try:\n cfn = boto3.resource('cloudformation')\n stack = cfn.Stack(stack_name)\n stack.delete()\n return \"SUCCESS\"\n except:\n return \"ERROR\" \ndef handler(event, context):\n print(\"Received event:\")\n print(json.dumps(event))\n return delete_cfn(stack_name)"
},
"Environment": {
"Variables": {
"stackName": {
"Ref" : "AWS::StackName"
}
}
},
"Runtime": "python3.9"
}
}
Python Code
This is the script embedded in the “ZipFile” property in the previous section, shown here with its line breaks and indentation:
import boto3
import os
import json

# Name of the stack to delete, passed in through the function's environment.
stack_name = os.environ['stackName']

def delete_cfn(stack_name):
    try:
        cfn = boto3.resource('cloudformation')
        stack = cfn.Stack(stack_name)
        # Ask CloudFormation to delete the whole stack.
        stack.delete()
        return "SUCCESS"
    except:
        return "ERROR"

def handler(event, context):
    print("Received event:")
    print(json.dumps(event))
    return delete_cfn(stack_name)
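If a deletion ever fails, it helps to see why in the function's logs. Here is a slightly more defensive variant, as a sketch only: it uses the boto3 client's `delete_stack` call instead of the resource interface, prints the exception, and accepts an optional `cfn_client` argument (my own addition, not part of the original function) so it can be tried locally with a stand-in client.

```python
import json
import os


def delete_stack(stack_name, cfn_client):
    """Request deletion of the stack and report the outcome."""
    try:
        cfn_client.delete_stack(StackName=stack_name)
        return "SUCCESS"
    except Exception as exc:
        # Print the reason so it is visible in the function's logs.
        print(f"Could not delete stack {stack_name}: {exc}")
        return "ERROR"


def handler(event, context, cfn_client=None):
    """Lambda entry point; cfn_client is a hypothetical hook for local tests."""
    print("Received event:")
    print(json.dumps(event))
    if cfn_client is None:
        import boto3  # available by default in the Lambda Python runtime

        cfn_client = boto3.client("cloudformation")
    return delete_stack(os.environ["stackName"], cfn_client)
```

Lambda still calls `handler(event, context)` with two arguments, so the extra parameter only matters when you invoke the function yourself.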
For the Lambda Function to work, we need a role, a policy and a permission resource. The IAM Role and Policy allow the function to delete the stack. The Lambda Permission, on the other hand, grants the EventBridge Rule permission to invoke the function.
Lambda Execution Role
This is the role the Lambda Function assumes. Together with the policy below, it allows the function to delete all resources in the stack.
"LambdaExecutionRole": {
"Type": "AWS::IAM::Role",
"DeletionPolicy": "Retain",
"Properties": {
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": ["lambda.amazonaws.com"]
},
"Action": ["sts:AssumeRole"]
}
]
},
"Path": "/"
}
}
Lambda Execution Policy
Here’s the policy with the permissions to delete every resource that the stack provisioned.
✏️Note: If you deploy any other resources within the stack, don’t forget to add the permissions to the policy.
"LambdaExecutionPolicy": {
"Type": "AWS::IAM::Policy",
"DeletionPolicy": "Retain",
"Properties": {
"PolicyName": "autodestruction-policy",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["logs:*"],
"Resource": "arn:aws:logs:*:*:*"
},
{
"Effect": "Allow",
"Action": [ "cloudformation:DeleteStack" ],
"Resource": {
"Ref": "AWS::StackId"
}},
{
"Effect": "Allow",
"Action": [ "lambda:DeleteFunction" ],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [ "events:RemoveTargets" ],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [ "events:DeleteRule" ],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [ "lambda:RemovePermission" ],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["iam:DeleteRolePolicy","iam:DeleteRole"],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [ "ec2:TerminateInstances" ],
"Resource": [{ "Fn::Join": ["", [
"arn:aws:ec2:",{"Ref": "AWS::Region"},":",
{"Ref": "AWS::AccountId"}, ":instance/",
{"Ref": "WebServerInstance"}]]}]
},
{
"Effect": "Allow",
"Action": [ "iam:DeleteRolePolicy" ],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [ "cloudwatch:DeleteAlarms" ],
"Resource": [{"Fn::GetAtt" : ["MyEC2Alarm","Arn"]}]
}
]
},
"Roles": [{
"Ref" : "LambdaExecutionRole"
}]
}
}
Lambda Permission
This is the resource that allows the EventBridge Rule to invoke the function. ⚡
"PermissionForADLambda": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"FunctionName": {
"Ref": "ADLambda"
},
"Action": "lambda:InvokeFunction",
"Principal": "events.amazonaws.com",
"SourceArn": {
"Fn::GetAtt": [
"EventRule",
"Arn"
]
}
}
}
Now here comes the interesting part…
The Lambda Function cannot delete the whole stack, because doing so would delete the Lambda Policy (which contains the permissions for what can be deleted) and the Lambda Role (which the function assumes to exercise those permissions). If both are deleted, how could the task ever finish? The function cannot remove its own permissions and then keep carrying out the deletion it was asked to perform.
Even if we add dependencies to change the order of deletion, at some point those two resources would have to be deleted before the stack deletion is complete. That is why these two special resources are left out of the destruction with a “Retain” Deletion Policy.
I know what you are thinking… “Ana, this is no longer a self-destructive architecture 🤔”. Well, this was the closest I could get! And the good thing is that roles and policies do not incur costs. So you’re still saving!
💡Fun fact: I was stuck for a while trying to figure this out, until I went to the AWS Summit Mexico City and explained this architecture to an AWS Architect (at the Ask the Expert lounge). He was actually the one who enlightened me with the retention solution!
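If you want to tidy up the two retained resources later, a small script run with your own credentials could do it. This is only a sketch under my own assumptions: the role name is a placeholder you would look up in the IAM console, the role is assumed to have only inline policies, and pagination is ignored.

```python
def cleanup_retained_role(role_name, iam_client):
    """Delete the inline policies of a retained role, then the role itself.

    Returns the names of the inline policies that were removed.
    """
    removed = []
    response = iam_client.list_role_policies(RoleName=role_name)
    for policy_name in response["PolicyNames"]:
        # A role can only be deleted once its inline policies are gone.
        iam_client.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
        removed.append(policy_name)
    iam_client.delete_role(RoleName=role_name)
    return removed
```

For example, `cleanup_retained_role("YOUR_RETAINED_ROLE_NAME", boto3.client("iam"))` would remove the inline policy first and the role afterwards.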
This is how you should see your CloudFormation page once everything is created:
Now, the only thing left to do is to stop using the application and wait… ⏰
So go ahead and provision your stack, open your application and then stop using it. The CPU utilization will drop and the stack will eventually start deleting itself. Believe me, it is a great feeling to see how it automatically gets deleted. 🥲
Conclusion
Building an automatic self-destructive architecture is a solution that you can implement in your temporary projects to save costs, increase application efficiency, reduce complexity and even recover from a disaster or outage.
I invite you to test it out, break it and come up with new solutions around it. I would love to hear any feedback or improvements you might find! 🔍
👩💻 Let's keep building!