KEDA in Amazon EKS Part 2: Scale Based On AWS SQS Queue
Carlo Columna
Posted on July 1, 2023
Kia ora everyone!
Welcome to the second part of a series of articles discussing the use of KEDA (Kubernetes Event-driven Autoscaling) for the autoscaling of Amazon EKS workloads. If you haven't, check out the first part below to learn why and how to install KEDA in Amazon EKS.
KEDA in Amazon EKS Part 1: Why and How to Install KEDA
Part two of this blog series focuses on scaling based on an AWS SQS queue.
This post is structured as follows:
- Prerequisites
- Create a demo application
- Scale based on two identity owner models: Here we'll go deeper in explaining the two models which are not extensively covered in the KEDA documentation.
- Common errors
Prerequisites
Before we can proceed, let's lay out the prerequisites:
- AWS account
- Access to an Amazon EKS cluster
- IAM roles for service accounts (IRSA) set up for the EKS cluster. See here for more details.
- KEDA installed in the EKS cluster, see part 1
- Terminal with kubectl
Create a demo application
We'll use a demo application and set it up across different accounts to best illustrate the steps we'll go through. We will be using Terraform and Helm to deploy it.
Our demo application has the following requirements:
- It is a simple NGINX application deployed in an EKS cluster in Account A in Oregon.
- It has an IAM Role created in Account C in Canada Central. KEDA is deployed and running on the same EKS cluster using an IAM Role created in Account B in N. Virginia.
- It scales based on a queue in Account C in Canada Central.
To better understand the scenario, the following is the architecture diagram for KEDA and our demo application.
KEDA and Demo Application Architecture
Before we can install our demo application, we'll start by creating an IAM Role for it using Terraform, like so,
resource "aws_iam_role" "demo-app" {
  provider           = aws.account_c
  name               = "demo-app"
  # The trust policy is a data source (defined below), not a module output
  assume_role_policy = data.aws_iam_policy_document.demo-app-trust-policy.json
}
For now, we are creating a role with no permissions. Later on, we'll update the role by granting it permission to access AWS resources.
We also set up the role's trust policy so our demo application can assume the role.
resource "aws_iam_role" "demo-app" { ... }

data "aws_iam_policy_document" "demo-app-trust-policy" {
  statement {
    actions = [
      "sts:AssumeRoleWithWebIdentity"
    ]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"]
    }
    condition {
      test     = "StringEquals"
      variable = "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub"
      # Must match the namespace and ServiceAccount name used by the deployment (demo-app-sa)
      values   = ["system:serviceaccount:demo-ns:demo-app-sa"]
    }
  }
}
Run terraform apply.
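The sub condition in the trust policy has to match the namespace and ServiceAccount name of the workload exactly, otherwise sts:AssumeRoleWithWebIdentity is denied. As a minimal sketch of that check (the helper names irsa_subject and trust_policy_allows are hypothetical, the ServiceAccount name demo-app-sa is taken from this post's deployment manifest, and only the single StringEquals condition used here is handled), the expected value can be derived and compared in Python:

```python
import json

OIDC_PROVIDER = "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"


def irsa_subject(namespace: str, service_account: str) -> str:
    """Return the `sub` claim carried by a ServiceAccount's projected token."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_policy_allows(policy: dict, namespace: str, service_account: str) -> bool:
    """Check whether any statement's StringEquals `sub` condition matches the ServiceAccount."""
    expected = irsa_subject(namespace, service_account)
    for statement in policy.get("Statement", []):
        conditions = statement.get("Condition", {}).get("StringEquals", {})
        for key, value in conditions.items():
            values = value if isinstance(value, list) else [value]
            if key.endswith(":sub") and expected in values:
                return True
    return False


# A trust policy shaped like the Terraform above, written as the JSON document IAM stores.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Principal": {"Federated": f"arn:aws:iam::111122223333:oidc-provider/{OIDC_PROVIDER}"},
        "Condition": {"StringEquals": {f"{OIDC_PROVIDER}:sub": irsa_subject("demo-ns", "demo-app-sa")}},
    }],
}

if __name__ == "__main__":
    print(json.dumps(trust_policy, indent=2))
    print(trust_policy_allows(trust_policy, "demo-ns", "demo-app-sa"))
```

A mismatch between the ServiceAccount name in the condition and the one in the manifest is a common source of the WebIdentityErr listed under Common Errors.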
Now, let's create the Kubernetes manifest for our basic deployment using the NGINX image.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-app
  name: demo-app
  namespace: demo-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      serviceAccountName: demo-app-sa
      containers:
        - image: nginx
          name: nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-app-sa
  namespace: demo-ns
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/demo-app # demo-app IAM role ARN created in the previous step
This creates a Deployment with a ServiceAccount. Notice that we added the IAM role we created above as an annotation so our application can access AWS resources. Let's save this file as demo-app.yaml.
To deploy, ideally, you package this manifest into a Helm chart or use kustomize. But to keep things short and simple in this blog, let's simply run kubectl apply -f demo-app.yaml.
Scaling with KEDA
We've come to the more exciting part of this blog, setting up and testing the scaling using AWS SQS Queue. Before we can start, there is one important security consideration we have to make.
There are two ways that KEDA can get permission to access the SQS Queue and scale a workload. This is determined by setting identityOwner to either pod, which is the default, or operator.
- pod: This sets the keda-operator to temporarily assume the role of the demo application. This means that the demo application role should have the necessary permissions to access the SQS Queue. We'll call this the Pod Identity Owner model.
- operator: Grant the keda-operator direct access to the AWS SQS queue. We'll call this the Operator Identity Owner model.
Note that in this blog, when talking about permissions and IAM, KEDA's role means the keda-operator's IAM role.
What model should you use?
This depends on what works best in your setup. Let's summarise the pros and cons of each model.
Pod Identity Owner model
- Good, because we don't have to know beforehand what privileges to give KEDA, as long as our application already has access to the event source
- Good, in cases where KEDA cannot be given direct access to the event source
- Bad, because KEDA gets all the permissions of every workload that uses this model. This probably includes permissions that are not required for scaling (such as write privileges), which breaks the principle of least privilege
- Bad, because it requires a slightly more complicated setup
Operator Identity Owner model
- Good, because KEDA only gets the permission it needs to trigger scaling so it follows the principle of least privilege
- Good, because it's a simpler setup
- Bad, because it requires more up-front understanding of the privileges we need to grant KEDA to access an event source
- Bad, in cases where the event source cannot be updated or set to be accessed by KEDA (e.g. if an access policy cannot be set on the event source)
From a security and simplicity perspective, the Operator Identity Owner model is the better option, especially if your application has access to sensitive information that you don't want keda-operator to reach. However, there might be cases where you are restricted to using the Pod Identity Owner model. In these cases, just be aware that the keda-operator gets all the permissions of the application role.
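To make the least-privilege trade-off concrete, here is a small sketch that compares what KEDA can end up doing under each model. The function name keda_effective_actions and the example action lists are hypothetical and chosen only for illustration; real IAM evaluation also depends on resources, conditions and explicit denies, which this sketch ignores.

```python
def keda_effective_actions(identity_owner, keda_role_actions, app_role_actions):
    """Return the set of IAM actions KEDA can exercise under a given identity owner model.

    identity_owner: "pod" or "operator"
    keda_role_actions: actions granted directly to the keda-operator role
    app_role_actions: mapping of application role name -> actions granted to that role
    """
    if identity_owner == "operator":
        # KEDA only ever uses its own role.
        return set(keda_role_actions)
    if identity_owner == "pod":
        # KEDA can assume each application role, so it inherits everything they allow.
        effective = set(keda_role_actions)
        for actions in app_role_actions.values():
            effective |= set(actions)
        return effective
    raise ValueError(f"unknown identityOwner: {identity_owner!r}")


if __name__ == "__main__":
    # Hypothetical application role that can also consume messages.
    app_roles = {"demo-app": ["sqs:GetQueueAttributes", "sqs:ReceiveMessage", "sqs:DeleteMessage"]}
    print(sorted(keda_effective_actions("pod", ["sts:AssumeRole"], app_roles)))
    print(sorted(keda_effective_actions("operator", ["sqs:GetQueueAttributes"], app_roles)))
```

In the pod case the result grows with every application role KEDA may assume; in the operator case it stays at what was granted to KEDA itself.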
In this blog, we'll show how to scale with each identity owner model, starting with the default identityOwner: pod.
Pod Identity Owner Model
For this model to work it's important to understand the requirements:
- We need to add an AssumeRole policy to the demo application role's trust policy to allow keda-operator to assume it
- The demo application role has the permission to access the queue
- We need to add an AssumeRole policy to the keda-operator role so it is allowed to assume the demo application's role
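The three requirements above map to three IAM policy fragments. The sketch below builds them as plain JSON documents so the relationship is easier to see; the function name pod_identity_policies is hypothetical, the ARNs reuse this post's placeholder account ID, and the queue ARN is derived from the queue name and region used later in this post. The Terraform in the following steps is what actually creates them.

```python
import json

KEDA_OPERATOR_ROLE_ARN = "arn:aws:iam::111122223333:role/keda-operator"
DEMO_APP_ROLE_ARN = "arn:aws:iam::111122223333:role/demo-app"
QUEUE_ARN = "arn:aws:sqs:ca-central-1:111122223333:demo-app-sqs"


def policy(*statements):
    """Wrap statements in a standard IAM policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


def pod_identity_policies(keda_role_arn, app_role_arn, queue_arn):
    """Return the three policy fragments the Pod Identity Owner model relies on."""
    return {
        # 1. Extra statement in the application role's trust policy: KEDA may assume it.
        "app_role_trust_statement": {
            "Sid": "AssumeRoleKedaOperator",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {"AWS": keda_role_arn},
        },
        # 2. Permission policy on the application role: it may read queue attributes.
        "app_role_permissions": policy({
            "Sid": "SQS",
            "Effect": "Allow",
            "Action": "sqs:GetQueueAttributes",
            "Resource": queue_arn,
        }),
        # 3. Permission policy on the keda-operator role: it may assume the application role.
        "keda_role_permissions": policy({
            "Sid": "sts",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": app_role_arn,
        }),
    }


if __name__ == "__main__":
    docs = pod_identity_policies(KEDA_OPERATOR_ROLE_ARN, DEMO_APP_ROLE_ARN, QUEUE_ARN)
    print(json.dumps(docs, indent=2))
```

Note that items 1 and 3 point at each other: the application role trusts the keda-operator role, and the keda-operator role is permitted to assume the application role. Both sides are needed.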
Going back to our demo application's architecture and incorporating the use of the Pod Identity Owner model, the architecture will look like below.
KEDA and Demo Application: Pod Identity Owner Model
In this model, the demo application, through its IAM role, has access to the AWS SQS Queue. Meanwhile, KEDA can temporarily assume the demo application's role to get access to the same AWS SQS Queue to trigger scaling.
Here is the summary of the steps we'll go through.
- Create AWS SQS Queue
- Update the IAM Role of the demo application
- Update the IAM Role of KEDA
- Create scaler
- Test scaling
Steps
1. Create AWS SQS Queue
Using Terraform we can simply create a queue like so,
resource "aws_sqs_queue" "aws_sqs" {
  provider = aws.account_c
  name     = "demo-app-sqs"
}

output "demo_app_sqs_url" {
  value = aws_sqs_queue.aws_sqs.url
}
Run terraform apply.
2. Update the IAM Role of the demo application
Now, let's update the role of our demo application by doing two things:
- Grant our demo application permission to access the queue
- Add a policy to allow KEDA to assume it
For the first one, let's add permission to access the queue created in the previous step like so,
resource "aws_iam_role" "demo-app" { ... }

data "aws_iam_policy_document" "demo-app-trust-policy" { ... }

data "aws_iam_policy_document" "sqs-policy-document" {
  statement {
    sid    = "SQS"
    effect = "Allow"
    actions = [
      "sqs:GetQueueAttributes", # Add IAM action to the demo-app role
    ]
    resources = [
      aws_sqs_queue.aws_sqs.arn,
    ]
  }
}

resource "aws_iam_role_policy" "sqs-policy" {
  depends_on = [
    aws_sqs_queue.aws_sqs
  ]
  provider = aws.account_c
  name     = "demo-app-sqs-policy"
  role     = aws_iam_role.demo-app.id
  policy   = data.aws_iam_policy_document.sqs-policy-document.json
}
For the second one, let's add an sts:AssumeRole policy to our demo application's trust policy, passing in the keda-operator's role like so,
resource "aws_iam_role" "demo-app" { ... }

data "aws_iam_policy_document" "demo-app-trust-policy" {
  statement {
    sid    = "AssumeRoleKedaOperator"
    effect = "Allow"
    actions = [
      "sts:AssumeRole",
    ]
    principals {
      type        = "AWS"
      # identifiers expects a list; the variable holds a single role ARN
      identifiers = [var.keda_operator_role] # "arn:aws:iam::111122223333:role/keda-operator"
    }
  }
  statement {
    actions = [
      "sts:AssumeRoleWithWebIdentity"
    ]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"]
    }
    condition {
      test     = "StringEquals"
      variable = "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub"
      # Must match the namespace and ServiceAccount name used by the deployment (demo-app-sa)
      values   = ["system:serviceaccount:demo-ns:demo-app-sa"]
    }
  }
}
Run terraform apply.
3. Update the IAM Role of KEDA
Let's update the role of keda-operator so it is allowed to assume our demo application's role, by adding an sts:AssumeRole policy that passes in that role.
resource "aws_iam_role" "keda-operator" { ... }

data "aws_iam_policy_document" "keda-operator-trust-policy" { ... }

data "aws_iam_policy_document" "assume-role-policy-document" {
  statement {
    sid    = "sts"
    effect = "Allow"
    actions = [
      "sts:AssumeRole",
    ]
    resources = var.application_roles_list # ["arn:aws:iam::111122223333:role/demo-app"]
  }
}

resource "aws_iam_role_policy" "assume-role-policy" {
  provider = aws.account_b
  # This is needed, otherwise Terraform will fail to create this resource when the role list is empty
  count    = length(var.application_roles_list) == 0 ? 0 : 1
  name     = "keda-operator-assume-role-policy"
  role     = aws_iam_role.keda-operator.id
  policy   = data.aws_iam_policy_document.assume-role-policy-document.json
}
Run terraform apply.
4. Create scaler
Okay now, let's shift from AWS stuff to KEDA stuff. Let's start by creating the scaler.
KEDA scalers can both detect if a deployment should be activated or deactivated, and feed custom metrics for a specific event source.
To create a scaler, we have to define a ScaledObject custom resource.
The ScaledObject Custom Resource definition is used to define how KEDA should scale your application and what the triggers are.
The ScaledObject resource specifications are defined here. As part of the specification, we also have to define a trigger, which in our case is an AWS SQS Queue. The specification for AWS SQS Queue can be found here.
Following these specifications, our scaler configuration would be,
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-app-scaledobject
  namespace: demo-ns
spec:
  minReplicaCount: 0 # scale down to 0
  maxReplicaCount: 10
  pollingInterval: 30
  scaleTargetRef:
    name: demo-app # name of our demo-app deployment
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: demo-app-trigger-auth-aws-credentials # trigger auth object name
      metadata:
        queueURL: https://sqs.ca-central-1.amazonaws.com/111122223333/demo-app-sqs
        queueLength: "2" # 2 messages per pod
        awsRegion: ca-central-1
        identityOwner: pod # defaults to pod if unset
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: demo-app-trigger-auth-aws-credentials
  namespace: demo-ns
spec:
  podIdentity:
    provider: aws-eks # or aws-kiam when using kiam
Add this configuration to our manifest file, demo-app.yaml. We then deploy these resources in the same namespace as our demo application by running kubectl apply -f demo-app.yaml.
Describing our ScaledObject by running kubectl describe ScaledObject demo-app-scaledobject -n demo-ns shows it's ready and happy.
Status:
Conditions:
Message: ScaledObject is defined correctly and is ready for scaling
Reason: ScaledObjectReady
Status: True
Type: Ready
Message: Scaling is not performed because triggers are not active
Reason: ScalerNotActive
Status: False
Type: Active
Message: No fallbacks are active on this scaled object
Reason: NoFallbackFound
Status: False
Type: Fallback
External Metric Names:
s0-aws-sqs-demo-app
Health:
s0-aws-sqs-demo-app:
Number Of Failures: 0
Status: Happy
Hpa Name: keda-hpa-demo-app
Last Active Time: 2023-02-16T20:18:48Z
Original Replica Count: 1
Scale Target GVKR:
Group: apps
Kind: Deployment
Resource: deployments
Version: v1
Scale Target Kind: apps/v1.Deployment
Events: <none>
Let's check the logs of our KEDA deployments to see what they say.
keda-operator log:
INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"demo-app-scaledobject","namespace":"demo-ns"}, "namespace": "demo-ns", "name": "demo-app", "reconcileID": "e40b3625-6794-4358-b285-a9a57194edee"}
keda-operator-metrics-apiserver log:
1 trace.go:205] Trace[87882572]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/demo-ns/s0-aws-sqs-demo-app,user-agent:kube-controller-manager/v1.22.16 (linux/amd64) kubernetes/52e500d/system:serviceaccount:kube-system:horizontal-pod-autoscaler,audit-id:1570ee53-ba50-4de8-81c7-727482dca392,client:172.16.113.69,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (20-Feb-2023 05:04:01.958) (total time: 986ms):
Both KEDA deployments acknowledge and reconcile the new ScaledObject we created.
The keda-operator has also created an HPA for our demo application.
$ kubectl get hpa -n demo-ns
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-demo-app Deployment/demo-app <unknown>/1 (avg) 1 10 0 5s
From now on, this HPA is managed by keda-operator and sets the number of replicas based on the metrics provided by keda-operator-metrics-apiserver.
5. Test scaling
To trigger scaling, let's send some messages to our queue using AWS Console. Let's send 4 messages to target 2 replicas.
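The arithmetic behind "4 messages to target 2 replicas" can be sketched as follows. This is a simplified model under stated assumptions: the function name desired_replicas is hypothetical, and it ignores the HPA tolerance, stabilization windows, KEDA's cooldown period, and which message counts the scaler includes (that depends on the KEDA version and scaler settings).

```python
import math


def desired_replicas(visible_messages, queue_length, min_replicas=0, max_replicas=10):
    """Approximate the replica count for a queue-length trigger.

    queue_length is the target number of messages per pod (queueLength in the ScaledObject).
    """
    if queue_length <= 0:
        raise ValueError("queue_length must be positive")
    if visible_messages <= 0:
        # Nothing to process: fall back to the configured minimum (0 allows scale to zero).
        return min_replicas
    wanted = math.ceil(visible_messages / queue_length)
    # Once the trigger is active there is at least one replica, capped at the maximum.
    lower_bound = max(min_replicas, 1)
    return max(min(wanted, max_replicas), lower_bound)


if __name__ == "__main__":
    for messages in (0, 1, 4, 5, 100):
        print(messages, "->", desired_replicas(messages, queue_length=2))
```

With queueLength set to 2, four messages work out to two replicas, which is what we expect to see in the events below.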
Going back to the cluster, let's check the events from our ScaledObject and HPA.
keda-hpa-demo-app events:
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 2
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from external metric s0-aws-sqs-demo-app(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: demo-app,},MatchExpressions:[]LabelSelectorRequirement{},})
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 10s (x2 over 4d) horizontal-pod-autoscaler New size: 2; reason: external metric s0-aws-sqs-demo-app(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: demo-app,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
demo-app-scaledobject events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal KEDAScaleTargetActivated 10s (x3 over 4d) keda-operator Scaled apps/v1.Deployment demo-ns/demo-app from 0 to 2
Putting a watch on the demo-ns namespace shows the new pods that get created as a result of the scaling,
$ kubectl get pods -n demo-ns -w
NAME READY STATUS RESTARTS AGE
demo-app-674d8c455c-n7svp 0/1 Pending 0 0s
demo-app-674d8c455c-n7svp 0/1 Pending 0 0s
demo-app-674d8c455c-n7svp 0/1 ContainerCreating 0 0s
demo-app-674d8c455c-n7svp 1/1 Running 0 3s
demo-app-674d8c455c-64jvw 0/1 Pending 0 0s
demo-app-674d8c455c-64jvw 0/1 Pending 0 0s
demo-app-674d8c455c-64jvw 0/1 ContainerCreating 0 0s
demo-app-674d8c455c-64jvw 1/1 Running 0 2s
Let's trigger a scale-down by deleting the messages in the queue. Poll for messages, select all and delete them.
Getting the events from the demo-ns namespace shows:
$ kubectl get events -n demo-ns
LAST SEEN TYPE REASON OBJECT MESSAGE
4m2s Normal Killing pod/demo-app-674d8c455c-64jvw Stopping container demo-app
4m2s Normal Killing pod/demo-app-674d8c455c-n7svp Stopping container demo-app
4m2s Normal SuccessfulDelete replicaset/demo-app-674d8c455c Deleted pod: demo-app-674d8c455c-n7svp
4m2s Normal SuccessfulDelete replicaset/demo-app-674d8c455c Deleted pod: demo-app-674d8c455c-64jvw
4m3s Normal ScalingReplicaSet deployment/demo-app Scaled down replica set demo-app-674d8c455c to 0
4m3s Normal KEDAScaleTargetDeactivated scaledobject/demo-app Deactivated apps/v1.Deployment demo-ns/demo-app from 2 to 0
Getting the pods shows they are now in terminating state,
$ kubectl get pods -n demo-ns -w
NAME READY STATUS RESTARTS AGE
demo-app-674d8c455c-n7svp 1/1 Terminating 0 90m
demo-app-674d8c455c-64jvw 1/1 Terminating 0 90m
demo-app-674d8c455c-n7svp 0/1 Terminating 0 90m
demo-app-674d8c455c-n7svp 0/1 Terminating 0 90m
demo-app-674d8c455c-n7svp 0/1 Terminating 0 90m
demo-app-674d8c455c-64jvw 0/1 Terminating 0 90m
demo-app-674d8c455c-64jvw 0/1 Terminating 0 90m
demo-app-674d8c455c-64jvw 0/1 Terminating 0 90m
Awesome! Great work! You made it this far.
That's how easy it is to scale based on a queue using KEDA. Next, we'll look at the Operator Identity Owner model, which I personally recommend.
Operator Identity Owner Model
When identityOwner is set to operator, the only requirement is that the KEDA operator has the correct IAM permissions on the SQS queue. Additional authentication parameters are not required.
Going back to our demo application's architecture and incorporating the use of the Operator Identity Owner model, the architecture will look like below.
KEDA and Demo Application: Operator Identity Owner Model
Let's create a new set of resources to differentiate from the resources we created in the previous section. Here is the summary of the steps we'll go through.
- Create AWS SQS Queue with access policy
- Grant KEDA permissions to the AWS SQS Queue
- Create second demo application
- Create scaler
- Test scaling
Steps
1. Create AWS SQS Queue with access policy
The first step we have to do is create a queue. Using Terraform we can create it like so,
resource "aws_sqs_queue" "aws_sqs_operator" {
  provider = aws.account_c
  name     = "demo-app-sqs-operator"
}
An access policy is needed to explicitly allow KEDA to call certain SQS API actions on the queue. This is required for cross-account access. The only action KEDA needs in order to read the queue is SQS:GetQueueAttributes. We also need to pass in the KEDA IAM role created in Part 1: Why and How to Install KEDA.
data "aws_iam_policy_document" "sqs_access_policy_data_operator" {
  statement {
    actions   = ["SQS:GetQueueAttributes"]
    resources = [aws_sqs_queue.aws_sqs_operator.arn]
    effect    = "Allow"
    principals {
      type        = "AWS"
      # identifiers expects a list; the variable holds a single role ARN
      identifiers = [var.keda_operator_role] # "arn:aws:iam::111122223333:role/keda-operator"
    }
  }
}

resource "aws_sqs_queue_policy" "sqs_access_policy_operator" {
  provider = aws.account_c
  depends_on = [
    aws_sqs_queue.aws_sqs_operator
  ]
  queue_url = aws_sqs_queue.aws_sqs_operator.url
  policy    = data.aws_iam_policy_document.sqs_access_policy_data_operator.json
}
Run terraform apply.
2. Grant KEDA access to the AWS SQS Queue
For this step, we'll have to update our KEDA installation. We need to grant KEDA, or in particular the keda-operator, direct permission to the queue we created. We do this by adding the same permission we put in the queue access policy to the keda-operator role.
resource "aws_iam_role" "keda-operator" { ... }

data "aws_iam_policy_document" "keda-operator-trust-policy" { ... }

output "keda_operator_role_arn" { ... }

resource "aws_iam_role_policy" "sqs-policy" {
  name   = "sqs-queue-policy"
  role   = aws_iam_role.keda-operator.id
  policy = data.aws_iam_policy_document.sqs-policy-document.json
}

data "aws_iam_policy_document" "sqs-policy-document" {
  statement {
    sid    = "SQS"
    effect = "Allow"
    actions = [
      "sqs:GetQueueAttributes",
    ]
    resources = ["arn:aws:sqs:region-code:111122223333:demo-app-sqs-operator"] # queue arn
  }
}
Run terraform apply.
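Both the queue access policy and the keda-operator role policy are needed because the role and the queue live in different AWS accounts. A simplified sketch of that rule, under stated assumptions: the function names are hypothetical, the account ID 444455556666 for the queue's account is made up for illustration (this post uses a single placeholder ID throughout), and the model ignores explicit denies, permission boundaries and organization-level policies.

```python
def account_id(arn: str) -> str:
    """Extract the account ID (fifth colon-separated field) from an ARN."""
    return arn.split(":")[4]


def can_call(role_arn, resource_arn, identity_policy_allows, resource_policy_allows):
    """Simplified IAM evaluation for a role calling a resource with a resource policy.

    Same account: an Allow on either side is enough.
    Cross account: the role's policy AND the resource's policy must both allow the action.
    """
    if account_id(role_arn) == account_id(resource_arn):
        return identity_policy_allows or resource_policy_allows
    return identity_policy_allows and resource_policy_allows


if __name__ == "__main__":
    keda_role = "arn:aws:iam::111122223333:role/keda-operator"            # Account B (placeholder)
    queue = "arn:aws:sqs:ca-central-1:444455556666:demo-app-sqs-operator"  # Account C (made-up ID)

    # Only the queue access policy is in place: still denied across accounts.
    print(can_call(keda_role, queue, identity_policy_allows=False, resource_policy_allows=True))
    # Both steps applied: allowed.
    print(can_call(keda_role, queue, identity_policy_allows=True, resource_policy_allows=True))
```

If only one of the two steps has been applied, expect the AccessDenied error on the queue described under Common Errors.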
3. Create second demo application
Let's deploy a second demo application so we can keep the first one and test both as much as we like. We also have to create an IAM role for it; scroll to the top to see the Terraform for creating one.
Our second demo application will now look as follows:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-app-operator
  name: demo-app-operator
  namespace: demo-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app-operator
  template:
    metadata:
      labels:
        app: demo-app-operator
    spec:
      serviceAccountName: demo-app-operator-sa
      containers:
        - image: nginx
          name: nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-app-operator-sa
  namespace: demo-ns
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/demo-app-operator
Let's save this file as demo-app-operator.yaml and deploy our second demo application to the same namespace using kubectl.
4. Create scaler
Let's now create our scaler specifications. This time it will be simpler because we don't need the TriggerAuthentication resource anymore.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-app-scaledobject-operator
  namespace: demo-ns
spec:
  minReplicaCount: 0 # scale down to 0
  maxReplicaCount: 10
  pollingInterval: 30
  scaleTargetRef:
    name: demo-app-operator # name of our second demo-app deployment
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ca-central-1.amazonaws.com/111122223333/demo-app-sqs-operator
        queueLength: "2" # 2 messages per pod
        awsRegion: ca-central-1
        identityOwner: operator # defaults to pod if unset
Add this configuration to our manifest file, demo-app-operator.yaml. Let's deploy our ScaledObject and check if it's happy.
Status:
Conditions:
Message: ScaledObject is defined correctly and is ready for scaling
Reason: ScaledObjectReady
Status: True
Type: Ready
Message: Scaling is not performed because triggers are not active
Reason: ScalerNotActive
Status: False
Type: Active
Message: No fallbacks are active on this scaled object
Reason: NoFallbackFound
Status: False
Type: Fallback
External Metric Names:
s0-aws-sqs-demo-app-operator
Health:
s0-aws-sqs-demo-app-operator:
Number Of Failures: 0
Status: Happy
Hpa Name: keda-hpa-demo-app-operator
Last Active Time: 2023-02-16T21:44:24Z
Original Replica Count: 1
Scale Target GVKR:
Group: apps
Kind: Deployment
Resource: deployments
Version: v1
Scale Target Kind: apps/v1.Deployment
Events: <none>
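To make the difference between the two scaler definitions in this post concrete, the sketch below builds the trigger section for either model. The function name sqs_trigger is hypothetical and the structure simply mirrors the two manifests shown here: the pod model carries an authenticationRef pointing at a TriggerAuthentication, the operator model does not.

```python
def sqs_trigger(queue_url, aws_region, queue_length, identity_owner="pod", auth_ref=None):
    """Build the aws-sqs-queue trigger entry of a ScaledObject as a plain dict."""
    if identity_owner not in ("pod", "operator"):
        raise ValueError(f"unknown identityOwner: {identity_owner!r}")
    if identity_owner == "pod" and not auth_ref:
        raise ValueError("the pod model in this post needs a TriggerAuthentication name")

    trigger = {
        "type": "aws-sqs-queue",
        "metadata": {
            "queueURL": queue_url,
            "queueLength": str(queue_length),  # KEDA trigger metadata values are strings
            "awsRegion": aws_region,
            "identityOwner": identity_owner,
        },
    }
    if identity_owner == "pod":
        trigger["authenticationRef"] = {"name": auth_ref}
    return trigger


if __name__ == "__main__":
    print(sqs_trigger(
        "https://sqs.ca-central-1.amazonaws.com/111122223333/demo-app-sqs",
        "ca-central-1", 2, "pod", "demo-app-trigger-auth-aws-credentials",
    ))
    print(sqs_trigger(
        "https://sqs.ca-central-1.amazonaws.com/111122223333/demo-app-sqs-operator",
        "ca-central-1", 2, "operator",
    ))
```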
Also check the logs of both KEDA deployments to confirm that they have reconciled the new ScaledObject we created.
The keda-operator has also created an HPA for our second demo application.
$ kubectl get hpa -n demo-ns
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-demo-app-operator Deployment/demo-app-operator <unknown>/1 (avg) 1 10 0 4d16h
keda-hpa-demo-app Deployment/demo-app <unknown>/2 (avg) 1 10 0 4d16h
Awesome! It's time to trigger scaling!
5. Test scaling
Let's send messages to our queue.
In the cluster, let's get the latest events from our demo-ns namespace.
$ kubectl get events -n demo-ns
LAST SEEN TYPE REASON OBJECT MESSAGE
5m31s Normal SuccessfulRescale horizontalpodautoscaler/keda-hpa-demo-app-operator New size: 3; reason: external metric s0-aws-sqs-demo-app-operator(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: demo-app-operator,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
5m31s Normal Scheduled pod/demo-app-operator-7cff88c99b-5ddjv Successfully assigned demo-ns/demo-app-operator-7cff88c99b-5ddjv to ip-11-111-111-111.us-west-2.compute.internal
5m29s Normal Pulled pod/demo-app-operator-7cff88c99b-5ddjv Container image "nginx:latest" already present on machine
5m29s Normal Created pod/demo-app-operator-7cff88c99b-5ddjv Created container demo-app-operator
5m29s Normal Started pod/demo-app-operator-7cff88c99b-5ddjv Started container demo-app-operator
5m36s Normal Scheduled pod/demo-app-operator-7cff88c99b-gwz46 Successfully assigned demo-ns/demo-app-operator-7cff88c99b-gwz46 to ip-11-111-111-111.us-west-2.compute.internal
5m34s Normal Pulled pod/demo-app-operator-7cff88c99b-gwz46 Container image "nginx:latest" already present on machine
5m34s Normal Created pod/demo-app-operator-7cff88c99b-gwz46 Created container demo-app-operator
5m34s Normal Started pod/demo-app-operator-7cff88c99b-gwz46 Started container demo-app-operator
5m31s Normal Scheduled pod/demo-app-operator-7cff88c99b-xn7c8 Successfully assigned demo-ns/demo-app-operator-7cff88c99b-xn7c8 to ip-11-111-111-111.us-west-2.compute.internal
5m29s Normal Pulled pod/demo-app-operator-7cff88c99b-xn7c8 Container image "nginx:latest" already present on machine
5m29s Normal Created pod/demo-app-operator-7cff88c99b-xn7c8 Created container demo-app-operator
5m29s Normal Started pod/demo-app-operator-7cff88c99b-xn7c8 Started container demo-app-operator
5m36s Normal SuccessfulCreate replicaset/demo-app-operator-7cff88c99b Created pod: demo-app-operator-7cff88c99b-gwz46
5m31s Normal SuccessfulCreate replicaset/demo-app-operator-7cff88c99b Created pod: demo-app-operator-7cff88c99b-xn7c8
5m31s Normal SuccessfulCreate replicaset/demo-app-operator-7cff88c99b Created pod: demo-app-operator-7cff88c99b-5ddjv
5m36s Normal ScalingReplicaSet deployment/demo-app-operator Scaled up replica set demo-app-operator-7cff88c99b to 1
5m36s Normal KEDAScaleTargetActivated scaledobject/demo-app-operator Scaled apps/v1.Deployment demo-ns/demo-app-operator from 0 to 1
5m31s Normal ScalingReplicaSet deployment/demo-app-operator Scaled up replica set demo-app-operator-7cff88c99b to 3
Putting a watch on the demo-ns shows the new pods that get created as a result of the scaling,
$ kubectl get pods -n demo-ns -w
NAME READY STATUS RESTARTS AGE
demo-app-operator-7cff88c99b-gwz46 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-gwz46 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-gwz46 0/1 ContainerCreating 0 0s
demo-app-operator-7cff88c99b-gwz46 1/1 Running 0 3s
demo-app-operator-7cff88c99b-xn7c8 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-xn7c8 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-5ddjv 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-5ddjv 0/1 Pending 0 0s
demo-app-operator-7cff88c99b-xn7c8 0/1 ContainerCreating 0 0s
demo-app-operator-7cff88c99b-5ddjv 0/1 ContainerCreating 0 1s
demo-app-operator-7cff88c99b-5ddjv 1/1 Running 0 2s
demo-app-operator-7cff88c99b-xn7c8 1/1 Running 0 3s
Let's trigger descaling by deleting the messages in the queue.
Getting the events from the demo-ns shows:
$ kubectl get events -n demo-ns
LAST SEEN TYPE REASON OBJECT MESSAGE
4m2s Normal Killing pod/demo-app-674d8c455c-64jvw Stopping container demo-app
4m2s Normal Killing pod/demo-app-674d8c455c-n7svp Stopping container demo-app
4m2s Normal SuccessfulDelete replicaset/demo-app-674d8c455c Deleted pod: demo-app-674d8c455c-n7svp
4m2s Normal SuccessfulDelete replicaset/demo-app-674d8c455c Deleted pod: demo-app-674d8c455c-64jvw
4m3s Normal ScalingReplicaSet deployment/demo-app Scaled down replica set demo-app-674d8c455c to 0
4m3s Normal KEDAScaleTargetDeactivated scaledobject/demo-app Deactivated apps/v1.Deployment demo-ns/demo-app from 2 to 0
Getting the pods shows,
$ kubectl get pods -n demo-ns -w
NAME READY STATUS RESTARTS AGE
demo-app-operator-7cff88c99b-gwz46 1/1 Terminating 0 14m
demo-app-operator-7cff88c99b-xn7c8 1/1 Terminating 0 13m
demo-app-operator-7cff88c99b-5ddjv 1/1 Terminating 0 13m
demo-app-operator-7cff88c99b-5ddjv 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-xn7c8 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-xn7c8 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-xn7c8 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-5ddjv 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-5ddjv 0/1 Terminating 0 13m
demo-app-operator-7cff88c99b-gwz46 0/1 Terminating 0 14m
demo-app-operator-7cff88c99b-gwz46 0/1 Terminating 0 14m
demo-app-operator-7cff88c99b-gwz46 0/1 Terminating 0 14m
Yaaay! Awesome work in following through with this guide. We have scaled our demo application based on a queue using two different identity models.
If you ran into errors, see below for a list of errors that I encountered and how to fix them.
Watch out for the next part of the KEDA in Amazon EKS mini-series. In the next one, we'll use KEDA to scale a workload based on HTTP traffic! This ability is a game-changer for our scaling capabilities.
Common Errors
Here are some possible errors that you may encounter following this blog and how to solve them. These errors can be seen by describing the ScaledObject in your namespace.
WebIdentityErr
Warning KEDAScalerFailed 18s keda-operator WebIdentityErr: failed to retrieve credentials caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity status code: 403, request id: 24f2181f-8a1b-4a4d-8de3-f85d501787a7
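This is often caused by an issue with the role's trust policy. Double-check that the OIDC provider ARN and the sub condition (namespace and ServiceAccount name) match your cluster and workload.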
TriggerAuthentication
2023-02-16T03:46:46Z ERROR scalehandler Error getting triggerAuth {"type": "ScaledObject", "namespace": "demo-ns", "name": "demo-app", "triggerAuthRef.Name": "demo-app-trigger-auth-aws-credentials", "error": "TriggerAuthentication.keda.sh \"demo-app-trigger-auth-aws-credentials\" not found"}
Check that the TriggerAuthentication object gets created.
NonExistentQueue: wsdl
Warning KEDAScalerFailed 78s keda-operator AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist for this wsdl version status code: 400, request id: b01e1553-fa5d-5bde-8190-493cc508300f
You might have set the incorrect region in the values.yaml file. Double-check that the region you set (awsRegion in the trigger metadata) is the region where the queue is.
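One quick way to catch this before deploying is to compare the region embedded in the queue URL with the awsRegion value. A minimal sketch, assuming the standard sqs.<region>.amazonaws.com URL shape used in this post (other endpoint formats, such as legacy or China-region hostnames, are not handled); the function names are hypothetical:

```python
from urllib.parse import urlparse


def region_from_queue_url(queue_url: str) -> str:
    """Extract the region from a URL shaped like https://sqs.<region>.amazonaws.com/<account>/<name>."""
    host = urlparse(queue_url).hostname or ""
    parts = host.split(".")
    if len(parts) == 4 and parts[0] == "sqs" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    raise ValueError(f"unrecognised SQS queue URL: {queue_url!r}")


def region_matches(queue_url: str, aws_region: str) -> bool:
    """Return True when the awsRegion setting agrees with the queue URL."""
    return region_from_queue_url(queue_url) == aws_region


if __name__ == "__main__":
    url = "https://sqs.ca-central-1.amazonaws.com/111122223333/demo-app-sqs"
    print(region_from_queue_url(url))
    print(region_matches(url, "us-west-2"))
```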
AccessDenied
Warning KEDAScalerFailed 3m40s keda-operator AccessDenied: User: arn:aws:sts::620528608035:assumed-role/keda-operator/1677015712377894955 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::046921848075:role/demo-app status code: 403, request id: 6001b079-4cc9-4309-b364-5958316ce52b
This can be caused by a missing sts:AssumeRole permission on the keda-operator role, or a missing sts:AssumeRole statement in the trust policy of your workload's role.
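When there are several roles and accounts involved, it helps to pull the caller, action and target out of the message to see which pair of policies to inspect. A small sketch using a regular expression over the message format shown above (the function name parse_access_denied is hypothetical, and other AccessDenied message shapes will simply return None):

```python
import re

_ACCESS_DENIED = re.compile(
    r"User: (?P<caller>\S+) is not authorized to perform: (?P<action>\S+) "
    r"on resource: (?P<resource>\S+)"
)


def parse_access_denied(message: str):
    """Return caller, action and resource from an AccessDenied message, or None if it does not match."""
    match = _ACCESS_DENIED.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    event = (
        "AccessDenied: User: arn:aws:sts::620528608035:assumed-role/keda-operator/1677015712377894955 "
        "is not authorized to perform: sts:AssumeRole on resource: "
        "arn:aws:iam::046921848075:role/demo-app status code: 403"
    )
    print(parse_access_denied(event))
```

Here the caller is the keda-operator role and the resource is the workload's role, so the keda-operator permission policy and the workload role's trust policy are the two places to look.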
Warning KEDAScalerFailed 0s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied. status code: 403, request id: c9b51391-2132-551e-b998-4ffe48b8de74
This can be caused by missing permissions to access the queue. When using the Operator Identity Model, check that:
- access policy has been added to the queue
- permissions have been added to the KEDA role
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal KEDAScalersStarted 109s keda-operator Started scalers watch
Warning KEDAScalerFailed 106s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied.
status code: 403, request id: dabe543b-d61f-57a4-9c6b-089fdde8daa9
Warning KEDAScalerFailed 92s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied.
status code: 403, request id: 84ae713a-5e5b-5c73-98d9-2d9e525f953d
Warning KEDAScalerFailed 77s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied.
status code: 403, request id: 3955aad9-0221-5a6f-a7c4-f67d0e32145f
Normal ScaledObjectReady 69s (x2 over 109s) keda-operator ScaledObject is ready for scaling
Warning KEDAScalerFailed 62s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied.
status code: 403, request id: eddee0ca-6e76-520b-b4ac-bc6e62543864
Warning KEDAScalerFailed 47s keda-operator AccessDenied: Access to the resource https://sqs.ca-central-1.amazonaws.com/ is denied.
status code: 403, request id: 7f1a95e9-3e9c-550b-82d0-c94d32adf964
Normal KEDAScaleTargetDeactivated 33s keda-operator Deactivated apps/v1.Deployment demo-ns/demo-app-operator from 1 to 0
As the example above shows, it may also take a minute before KEDA can access the queue.
Reach out for a yarn
If you have some questions, feedback or just want to reach out for a good ol' yarn, please connect and flick me a message at https://www.linkedin.com/in/carlo-columna/.