Combining IAM Roles for Service Accounts with Pod level Security Groups for a defense-in-depth strategy
Chabane R.
Posted on March 20, 2021
In the previous part we created our RDS instance. In this part, we'll put everything together and deploy Metabase to Kubernetes. Our objective is to:
- Enable IAM roles for service accounts.
- Create an IAM role to connect to the RDS instance. It will be attached to the metabase service account.
- Enable pod security groups by attaching the managed policy AmazonEKSVPCResourceController to the Amazon EKS cluster role.
- Create a security group that is allowed inbound access to RDS. It will be assigned to pods that use the metabase service account.
- Upgrade the VPC CNI to the latest version. Version 1.7.7 or later is required to enable pod security groups in the EKS cluster.
- Enable pod ENI in the aws-node DaemonSet.
- Deploy and test our Kubernetes manifests.
Enabling IAM roles for Service Account
To assign an IAM role to a pod, we need:
- To create an IAM OIDC provider for the cluster. The cluster has an OpenID Connect issuer URL associated with it.
- To create the IAM role and attach an IAM policy to it with the rds-db:connect permission that the service account needs.
Complete infra/plan/eks-cluster.tf with:
data "tls_certificate" "cert" {
  url = aws_eks_cluster.eks.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "openid" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.cert.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks.identity[0].oidc[0].issuer
}

data "aws_iam_policy_document" "web_identity_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:metabase:metabase"]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.openid.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "web_identity_role" {
  assume_role_policy = data.aws_iam_policy_document.web_identity_assume_role_policy.json
  name               = "web-identity-role-${var.env}"
}
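The two condition keys in the trust policy above are the issuer URL without its https:// scheme, followed by :sub or :aud. As a minimal sketch of that string manipulation, the same keys can be derived in shell, for example to cross-check the policy against the issuer your cluster reports. The function name oidc_condition_key and the sample issuer URL are illustrative assumptions, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an IAM condition key from an OIDC issuer URL,
# mirroring replace(url, "https://", "") in the Terraform code above.
oidc_condition_key() {
  local issuer="$1" claim="$2"
  # Strip the leading scheme, then append the claim name.
  printf '%s:%s\n' "${issuer#https://}" "$claim"
}

# Example with a made-up issuer URL (yours comes from the cluster):
# oidc_condition_key "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE" sub
```

The real issuer can be read with aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text.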
By combining the OpenID Connect (OIDC) identity provider and Kubernetes service account annotations, we will be able to use IAM roles at the pod level.
Inside EKS, an admission controller injects the role ARN and a service account token into every pod whose service account carries the role annotation. They are exposed through the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables, which the AWS SDKs use to obtain temporary session credentials. [3]
For a detailed explanation of this capability, see the AWS blog post "Introducing fine-grained IAM roles for service accounts".
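To check that the injection described above actually happened, one can look for the two variables in a running pod's environment. This is a sketch only: the function name has_irsa_env is an assumption, and it simply scans whatever environment listing it receives on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both variables injected by the EKS
# admission controller appear in an `env`-style listing read from stdin.
has_irsa_env() {
  local env_dump
  env_dump="$(cat)"
  printf '%s\n' "$env_dump" | grep -q '^AWS_ROLE_ARN=' &&
    printf '%s\n' "$env_dump" | grep -q '^AWS_WEB_IDENTITY_TOKEN_FILE='
}

# Possible usage once the deployment from this article is running
# (assumes your kubectl accepts deploy/NAME as an exec target):
# kubectl exec -n metabase deploy/metabase -- env | has_irsa_env && echo "IRSA variables present"
```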
Now we can attach an IAM policy to this role to allow access to the RDS instance from Kubernetes pods.
Complete infra/plan/eks-cluster.tf with:
resource "aws_iam_role_policy" "rds_access_from_k8s_pods" {
  name = "rds-access-from-k8s-pods-${var.env}"
  role = aws_iam_role.web_identity_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "rds-db:connect",
        ]
        Effect   = "Allow"
        Resource = "arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${aws_db_instance.postgresql.resource_id}/metabase"
      }
    ]
  })
}
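The Resource in this policy follows the rds-db ARN layout arn:aws:rds-db:REGION:ACCOUNT_ID:dbuser:DB_RESOURCE_ID/DB_USER. As a rough sketch, the same string can be assembled in shell to compare against what Terraform generated. The helper name and the sample values below are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the rds-db ARN used as the policy Resource.
# Arguments: region, account id, DB resource id, database user name.
rds_db_user_arn() {
  local region="$1" account_id="$2" resource_id="$3" db_user="$4"
  printf 'arn:aws:rds-db:%s:%s:dbuser:%s/%s\n' \
    "$region" "$account_id" "$resource_id" "$db_user"
}

# Example with placeholder values:
# rds_db_user_arn eu-west-1 123456789012 db-ABCDEFGHIJKL01234 metabase
```

Note that the third field is the instance's resource ID (the db-... identifier), not the instance name.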
Pod Security Group
To enable pod security groups, we need to attach the managed policy AmazonEKSVPCResourceController to the cluster role. It allows the role to manage network interfaces, their private IP addresses, and their attachment to and detachment from instances.
Complete infra/plan/eks-cluster.tf with:
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSVPCResourceController" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks.name
}
Now let's create our pod security group.
Complete infra/plan/eks-node-group.tf with:
resource "aws_security_group" "rds_access" {
  name        = "rds-access-from-pod-${var.env}"
  description = "Allow RDS Access from Kubernetes Pods"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port = 3000
    to_port   = 3000
    protocol  = "tcp"
    self      = true
  }

  ingress {
    from_port       = 53
    to_port         = 53
    protocol        = "tcp"
    security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
  }

  ingress {
    from_port       = 53
    to_port         = 53
    protocol        = "udp"
    security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "rds-access-from-pod-${var.env}"
    Environment = var.env
  }
}
To allow the pod to access the Amazon RDS instance, we need to allow the pod security group as the source of inbound / outbound traffic on the RDS port.
Update the VPC security group aws_security_group.sg in infra/plan/rds.tf with the following ingress / egress rules:
  ingress {
    from_port       = var.rds_port
    to_port         = var.rds_port
    protocol        = "tcp"
    security_groups = [aws_security_group.rds_access.id]
  }

  egress {
    from_port       = 1025
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [aws_security_group.rds_access.id]
  }
Add the following outputs:
output "sg-eks-cluster" {
  value = aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id
}

output "sg-rds-access" {
  value = aws_security_group.rds_access.id
}
Let's deploy our modifications
cd infra/envs/dev
terraform apply ../../plan/
Kubernetes configuration
Let's connect to the EKS cluster:
aws eks --region $REGION update-kubeconfig --name $EKS_CLUSTER_NAME
Now we need to enable pods to receive their own network interfaces. Before doing that, use the following command to print your cluster's CNI version:
kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2
The Amazon EKS cluster must be running Kubernetes version 1.17 and Amazon EKS platform version eks.3 or later.
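The command above prints the CNI image name and tag. A minimal sketch of checking that tag against the 1.7.7 minimum mentioned earlier could look like the following. The function names are assumptions, the tag format (name:vX.Y.Z with an optional -suffix) is inferred from typical output, and the comparison relies on the -V option of GNU sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y.Z" from an image reference such as
# "amazon-k8s-cni:v1.7.5-eksbuild.1".
cni_version_from_image() {
  local tag="${1##*:v}"   # drop everything up to and including ":v"
  printf '%s\n' "${tag%%-*}"   # drop any "-suffix" after the version
}

# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Possible usage:
# version_ge "$(cni_version_from_image "amazon-k8s-cni:v1.7.5-eksbuild.1")" 1.7.7 \
#   || echo "CNI upgrade needed"
```

If the describe command prints more than one image line, the check can be applied to each line in turn.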
Upgrade your CNI version [1]
curl -o aws-k8s-cni.yaml https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.9/config/v1.7/aws-k8s-cni.yaml
sed -i "s/us-west-2/$REGION/g" aws-k8s-cni.yaml
kubectl apply -f aws-k8s-cni.yaml
Enable the CNI plugin to manage network interfaces for pods by setting the ENABLE_POD_ENI variable to true in the aws-node DaemonSet. Once this setting is set to true, the plugin adds the label vpc.amazonaws.com/has-trunk-attached=true to each node in the cluster. The VPC resource controller then creates and attaches one special network interface, called a trunk network interface, with the description aws-k8s-trunk-eni [2].
kubectl set env daemonset -n kube-system aws-node ENABLE_POD_ENI=true
You can see which of your nodes have the label vpc.amazonaws.com/has-trunk-attached set to true with the following command.
$ kubectl get nodes -o wide -l vpc.amazonaws.com/has-trunk-attached=true
NAME                                       STATUS   ROLES    AGE   VERSION              INTERNAL-IP   EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-3-109.eu-west-1.compute.internal   Ready    <none>   56m   v1.18.9-eks-d1db3c   10.0.3.109    <none>          Amazon Linux 2   4.14.219-164.354.amzn2.x86_64   docker://19.3.13
ip-10-0-7-157.eu-west-1.compute.internal   Ready    <none>   56m   v1.18.9-eks-d1db3c   10.0.7.157    34.253.89.183   Amazon Linux 2   4.14.219-164.354.amzn2.x86_64   docker://19.3.13
Testing metabase connection to the RDS Instance
We deploy our k8s manifests using Kustomize. Add the following manifests in the folder config/base
config/base/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: metabase
  name: metabase
config/base/security-group-policy.yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: metabase
spec:
  serviceAccountSelector:
    matchLabels:
      app: metabase
config/base/database-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: metabase
type: Opaque
data:
  password: metabase
config/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  selector:
    matchLabels:
      app: metabase
  replicas: 1
  template:
    metadata:
      labels:
        app: metabase
    spec:
      containers:
        - name: metabase
          image: metabase/metabase
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: "1Gi"
              cpu: "512m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 100
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10
config/base/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: 3000
      protocol: TCP
  selector:
    app: metabase
And finally our config/base/kustomization.yaml file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: metabase
resources:
  - security-group-policy.yaml
  - service-account.yaml
  - deployment.yaml
  - service.yaml
  - database-secret.yaml
Now that we have our kustomize base, we can patch the manifests with the values provided as Terraform outputs.
Create config/envs/$ENV/service-account.patch.yaml. We annotate the service account with the IAM role created earlier for RDS access.
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: <RDS_ACCESS_ROLE_ARN>
  labels:
    app: metabase
  name: metabase
Create config/envs/$ENV/security-group-policy.patch.yaml.
The SecurityGroupPolicy CRD specifies which security groups to assign to pods. Within a namespace, we can select pods based on pod labels, or based on labels of the service account associated with a pod. We define the security group IDs to be applied.
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: metabase
spec:
  serviceAccountSelector:
    matchLabels:
      app: metabase
  securityGroups:
    groupIds:
      - <POD_SECURITY_GROUP_ID>
      - <EKS_CLUSTER_SECURITY_GROUP_ID>
Create config/envs/$ENV/database-secret.patch.yaml
apiVersion: v1
kind: Secret
metadata:
  name: metabase
type: Opaque
data:
  password: <MB_DB_PASS>
Create config/envs/$ENV/deployment.patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  selector:
    matchLabels:
      app: metabase
  replicas: 1
  template:
    metadata:
      labels:
        app: metabase
    spec:
      serviceAccountName: metabase
      containers:
        - name: metabase
          image: metabase/metabase
          imagePullPolicy: IfNotPresent
          env:
            - name: MB_DB_TYPE
              value: postgres
            - name: MB_DB_HOST
              value: <MB_DB_HOST>
            - name: MB_DB_PORT
              value: "5432"
            - name: MB_DB_DBNAME
              value: metabase
            - name: MB_DB_USER
              value: metabase
            - name: MB_DB_PASS
              valueFrom:
                secretKeyRef:
                  name: metabase
                  key: password
      nodeSelector:
        type: private
And the config/envs/$ENV/kustomization.yaml file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: metabase
resources:
  - ../../base
patchesStrategicMerge:
  - security-group-policy.patch.yaml
  - service-account.patch.yaml
  - database-secret.patch.yaml
  - deployment.patch.yaml
Let's replace the placeholders with real values:
cd config/envs/dev
# Generate DB auth token
METABASE_PWD=$(aws rds generate-db-auth-token --hostname $(terraform output private-rds-endpoint) --port 5432 --username metabase --region $REGION)
METABASE_PWD=$(echo -n "$METABASE_PWD" | base64 -w 0)
# Base64 output can contain "/", so "|" is used as the sed delimiter here
sed -i "s|<MB_DB_PASS>|$METABASE_PWD|g" database-secret.patch.yaml
sed -i "s/<POD_SECURITY_GROUP_ID>/$(terraform output sg-rds-access)/g; s/<EKS_CLUSTER_SECURITY_GROUP_ID>/$(terraform output sg-eks-cluster)/g" security-group-policy.patch.yaml
sed -i "s,<RDS_ACCESS_ROLE_ARN>,$(terraform output rds-access-role-arn),g" service-account.patch.yaml
sed -i "s/<MB_DB_HOST>/$(terraform output private-rds-endpoint)/g" deployment.patch.yaml
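Because the substituted values can contain characters such as "/" (base64 strings and ARNs both can), the choice of sed delimiter matters. The following is a small sketch of a reusable substitution helper plus a check that no placeholder was left behind. Both function names are hypothetical, and the code assumes GNU sed's in-place flag and values that contain no "|", "&" or backslash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every occurrence of a placeholder in a file.
# "|" is the sed delimiter, so values containing "/" are safe.
replace_placeholder() {
  local file="$1" placeholder="$2" value="$3"
  sed -i "s|${placeholder}|${value}|g" "$file"
}

# Hypothetical helper: succeed only if no <UPPER_CASE> placeholder remains.
no_placeholders_left() {
  ! grep -q '<[A-Z_][A-Z_]*>' "$1"
}

# Possible usage:
# replace_placeholder database-secret.patch.yaml "<MB_DB_PASS>" "$METABASE_PWD"
# no_placeholders_left database-secret.patch.yaml || echo "unreplaced placeholder found"
```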
Run the manifests
kubectl create namespace metabase
kubectl config set-context --current --namespace=metabase
kustomize build . | kubectl apply -f -
Let's see if it worked
$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
metabase-6d47d7b94b-796sx   1/1     Running   2          98s
$ kubectl describe pods metabase-6d47d7b94b-796sx
Name:         metabase-6d47d7b94b-796sx
Namespace:    metabase
Priority:     0
Node:         ip-10-0-3-109.eu-west-1.compute.internal/10.0.3.109
[..]
Labels:       app=metabase
              pod-template-hash=6d47d7b94b
Annotations:  kubernetes.io/psp: eks.privileged
              vpc.amazonaws.com/pod-eni:
                [{"eniId":"eni-054df22ad2b1b89c3","ifAddress":"02:3b:a8:a7:9c:f5","privateIp":"10.0.3.128","vlanId":1,"subnetCidr":"10.0.2.0/23"}]
Status:       Running
IP:           10.0.3.128
IPs:
  IP:  10.0.3.128
[..]
Node-Selectors:  type=private
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason                  Age   From                     Message
  ----    ------                  ----  ----                     -------
  Normal  Scheduled               32s   default-scheduler        Successfully assigned metabase/metabase-6d47d7b94b-796sx to ip-10-0-3-109.eu-west-1.compute.internal
  Normal  SecurityGroupRequested  32s   vpc-resource-controller  Pod will get the following Security Groups [sg-0c0195a69b1b8bdc3 sg-0d4b509bad15ec963]
  Normal  ResourceAllocated       31s   vpc-resource-controller  Allocated [{"eniId":"eni-054df22ad2b1b89c3","ifAddress":"02:3b:a8:a7:9c:f5","privateIp":"10.0.3.128","vlanId":1,"subnetCidr":"10.0.2.0/23"}] to the pod
  Normal  Pulled                  31s   kubelet                  Container image "metabase/metabase" already present on machine
  Normal  Created                 31s   kubelet                  Created container metabase
  Normal  Started                 31s   kubelet                  Started container metabase
As we can see, the security groups have been attached to the pod.
$ kubectl logs metabase-6d47d7b94b-796sx
[..]
2021-03-20 13:22:35,660 INFO metabase.core :: Setting up and migrating Metabase DB. Please sit tight, this may take a minute...
2021-03-20 13:22:35,663 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:22:40,245 INFO db.setup :: Successfully verified PostgreSQL 12.5 application database connection. ✅
2021-03-20 13:22:40,246 INFO db.setup :: Running Database Migrations...
2021-03-20 13:22:40,387 INFO db.setup :: Setting up Liquibase...
2021-03-20 13:22:40,502 INFO db.setup :: Liquibase is ready.
2021-03-20 13:22:40,503 INFO db.liquibase :: Checking if Database has unrun migrations...
2021-03-20 13:22:42,900 INFO db.liquibase :: Database has unrun migrations. Waiting for migration lock to be cleared...
2021-03-20 13:22:42,980 INFO db.liquibase :: Migration lock is cleared. Running migrations...
2021-03-20 13:22:48,068 INFO db.setup :: Database Migrations Current ... ✅
[..]
2021-03-20 13:23:13,054 INFO metabase.core :: Metabase Initialization COMPLETE
If the deployment is created before the SecurityGroupPolicy, you will get a connect timed out error. Delete and recreate the deployment.
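One way to tell whether a pod was created after the SecurityGroupPolicy took effect is to look for a branch ENI in its vpc.amazonaws.com/pod-eni annotation, as seen in the kubectl describe output above. The following is only a sketch: the function name has_branch_eni is an assumption, and it checks whatever annotation text it receives on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the annotation value read from stdin
# contains an ENI id, meaning the pod received its own network interface.
has_branch_eni() {
  grep -q '"eniId":"eni-[0-9a-f]*"'
}

# Possible usage (POD holds the pod name; the backslashes escape the dots
# in the annotation key for kubectl's jsonpath):
# kubectl get pod "$POD" -n metabase \
#   -o jsonpath='{.metadata.annotations.vpc\.amazonaws\.com/pod-eni}' | has_branch_eni \
#   || echo "no pod ENI: recreate the deployment"
```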
Now, let's delete the security group policy and recreate the deployment to check that the connection fails.
$ kubectl delete -f security-group-policy.patch.yaml
$ kubectl delete -f deployment.patch.yaml
$ kubectl apply -f deployment.patch.yaml
$ kubectl logs metabase-6d47d7b94b-wbn4r
2021-03-20 13:31:32,993 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:31:43,052 ERROR metabase.core :: Metabase Initialization FAILED
clojure.lang.ExceptionInfo: Unable to connect to Metabase postgres DB.
[..]
Caused by: java.net.SocketTimeoutException: connect timed out
[..]
2021-03-20 13:31:43,072 INFO metabase.core :: Metabase Shutting Down ...
2021-03-20 13:31:43,077 INFO metabase.server :: Shutting Down Embedded Jetty Webserver
2021-03-20 13:31:43,088 INFO metabase.core :: Metabase Shutdown COMPLETE
As you can see, Metabase can no longer reach the RDS instance: without the pod security group, the connection times out at the network level.
As a last check, let's add the security group policy again and remove the annotation from the service account that attaches the IAM role to the pod.
$ kubectl annotate sa metabase eks.amazonaws.com/role-arn-
$ kubectl apply -f security-group-policy.patch.yaml
$ kubectl delete -f deployment.patch.yaml
$ kubectl apply -f deployment.patch.yaml
2021-03-20 13:43:42,329 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:43:42,710 ERROR metabase.core :: Metabase Initialization FAILED
clojure.lang.ExceptionInfo: Unable to connect to Metabase postgres DB.
[..]
Caused by: org.postgresql.util.PSQLException: FATAL: PAM authentication failed for user "metabase"
[..]
As you can see, Metabase can no longer authenticate as the database user "metabase", so it is not authorized to access the database.
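The two failed runs above fail in different ways: a connect timed out error points at the network layer, while a PAM authentication failed error points at the authentication layer. As a small sketch, a log scan can tell them apart. The function name classify_db_failure and its output labels are assumptions, and the matched strings are taken from the log excerpts above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Metabase logs on stdin and report which layer
# of the defense-in-depth setup rejected the connection.
classify_db_failure() {
  local logs
  logs="$(cat)"
  case "$logs" in
    *"connect timed out"*)         echo "network" ;;
    *"PAM authentication failed"*) echo "authentication" ;;
    *)                             echo "unknown" ;;
  esac
}

# Possible usage (POD holds the pod name):
# kubectl logs "$POD" -n metabase | classify_db_failure
```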
Conclusion
In this long workshop, we:
- Created an isolated network to host our Amazon RDS instance.
- Configured an Amazon EKS cluster with fine-grained access control to Amazon RDS.
- Tested the connectivity between a Kubernetes container and an RDS database instance.
That's it!
Clean up
kustomize build . | kubectl delete -f -
cd ../../../infra/envs/$ENV
terraform destroy ../../plan/
Final Words
The source code is available on GitLab.
If you have any questions or feedback, please feel free to leave a comment.
Otherwise, I hope I have helped you answer some of the hard questions about connecting Amazon EKS to Amazon RDS and providing a pod-level defense-in-depth security strategy at both the networking and authentication layers.
By the way, do not hesitate to share with peers 😊
Thanks for reading!
Documentation
[1] https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
[3] https://eksctl.io/usage/iamserviceaccounts/#how-it-works