Setup Multi-Cluster ServiceMesh with Istio on EKS
Alfred Valderrama
Posted on June 22, 2022
1. Provisioning
2. Istio Setup with Helm chart
3. Cross network gateway validation
Hey! In this post, we will explore a technology called service mesh, powered by Istio.
Short intro for non-Istio users
Istio is an open source service mesh that layers transparently onto existing distributed applications.
Istio's powerful features provide a uniform and more efficient way to secure, connect, and monitor services. Its powerful control plane brings vital features, including:
- Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization
- Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
- Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection
- A pluggable policy layer and configuration API supporting access controls, rate limits and quotas
- Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress
Most of you are already familiar with Istio. Kubernetes federation is not generally available yet (its latest version is still in beta), so Istio is an option if you want to distribute traffic across different clusters in a production-grade deployment.
So, let's quickly go through the step-by-step procedure to implement a multi-cluster deployment with Istio.
This tutorial relies heavily on AWS, Terraform, and Helm charts.
1. Provisioning
Required resources
- EKS Clusters
- Security Group / Rule
- S3 Bucket for terraform states
- IAM Roles
- IAM Permissions
- AWS ALB (Application LoadBalancer)
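The list mentions an S3 bucket for Terraform state, but the post does not show how it is created. A minimal sketch, assuming the AWS CLI is installed and configured; the function name, bucket name, and region are placeholders, not values from this setup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a versioned S3 bucket for Terraform state.
# The bucket name and region are placeholders, not values from this post.
create_state_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 rejects an explicit LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi &&
    # Versioning makes it possible to recover an earlier state file.
    aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled
}
```

Usage would look like `create_state_bucket my-tf-state-bucket eu-west-1`, with a bucket name that is globally unique.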
IAM Role
Create the IAM roles for your EKS cluster and its worker nodes.
iam.tf
resource "aws_iam_role" "eks_iam_role" {
name = "AmazonAwsEksRole"
assume_role_policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Principal" : {
"Service" : "eks.amazonaws.com"
},
"Action" : "sts:AssumeRole"
}
]
})
tags = local.default_tags
}
resource "aws_iam_policy_attachment" "eksClusterPolicyAttachmentDefault" {
name = "eksClusterPolicyAttachmentDefault"
roles = [aws_iam_role.eks_iam_role.name]
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
resource "aws_iam_role" "eks_iam_node_role" {
name = "AmazonAwsEksNodeRole"
assume_role_policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Principal" : {
"Service" : "ec2.amazonaws.com"
},
"Action" : "sts:AssumeRole"
}
]
})
tags = local.default_tags
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_policy_attachment.eksClusterPolicyAttachmentDefault
]
}
resource "aws_iam_policy_attachment" "AmazonEKSWorkerNodePolicyAttachment" {
name = "AmazonEKSWorkerNodePolicyAttachment"
roles = [aws_iam_role.eks_iam_node_role.name]
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}
resource "aws_iam_policy_attachment" "AmazonEC2ContainerRegistryReadOnlyAttachment" {
name = "AmazonEC2ContainerRegistryReadOnlyAttachment"
roles = [aws_iam_role.eks_iam_node_role.name]
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}
resource "aws_iam_policy_attachment" "AmazonEKSCNIPolicyAttachment" {
name = "AmazonEKSCNIPolicyAttachment"
roles = [aws_iam_role.eks_iam_node_role.name]
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}
Security Group
AWS security groups are required for your clusters to communicate.
securitygroup.tf
resource "aws_security_group" "cluster_sg" {
name = "cluster-security-group"
description = "Communication with Worker Nodes"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 0
to_port = 0
protocol = "-1"
self = true
}
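# NOTE: this rule sets no source (cidr_blocks, security_groups, or self);
# add one or remove the block.
ingress {
from_port = 0
to_port = 0
protocol = "-1"
}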
tags = local.default_tags
}
resource "aws_security_group" "cp_sg" {
name = "cp-sg"
description = "CP and Nodegroup communication"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "Allow all"
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
protocol = "-1"
}
tags = local.default_tags
}
resource "aws_security_group" "wrkr_node" {
name = "worker-sg"
description = "Worker Node SG"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "Allow All"
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
protocol = "-1"
}
ingress {
description = "Self Communication"
from_port = 0
to_port = 0
protocol = "-1"
self = true
}
tags = local.default_tags
}
EKS Clusters
In this section, you will provision the two EKS clusters.
eks.tf
locals {
default_tags = {
Provisioner = "Terraform"
Environment = "Testing"
}
}
resource "aws_eks_cluster" "istio_service_mesh_primary_1" {
name = "istio-service-mesh-primary-1"
role_arn = aws_iam_role.eks_iam_role.arn
vpc_config {
subnet_ids = var.subnet_ids
public_access_cidrs = ["0.0.0.0/0"]
security_group_ids = [
aws_security_group.cluster_sg.id,
aws_security_group.cp_sg.id
]
}
version = "1.21"
timeouts {
create = "15m"
}
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_role.eks_iam_node_role,
aws_iam_policy_attachment.eksClusterPolicyAttachmentDefault,
aws_security_group.cluster_sg,
aws_security_group.cp_sg,
aws_security_group.wrkr_node
]
}
resource "aws_eks_cluster" "istio_service_mesh_primary_2" {
name = "istio-service-mesh-primary-2"
role_arn = aws_iam_role.eks_iam_role.arn
vpc_config {
subnet_ids = var.subnet_ids
public_access_cidrs = ["0.0.0.0/0"]
security_group_ids = [
aws_security_group.cluster_sg.id,
aws_security_group.cp_sg.id
]
}
version = "1.21"
timeouts {
create = "15m"
}
tags = local.default_tags
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_role.eks_iam_node_role,
aws_iam_policy_attachment.eksClusterPolicyAttachmentDefault,
aws_security_group.cluster_sg,
aws_security_group.cp_sg,
aws_security_group.wrkr_node
]
}
resource "aws_eks_addon" "eks_addon_vpc-cni" {
cluster_name = aws_eks_cluster.istio_service_mesh_primary_1.name
addon_name = "vpc-cni"
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_role.eks_iam_node_role,
aws_security_group.cluster_sg,
aws_security_group.cp_sg,
aws_security_group.wrkr_node,
aws_eks_cluster.istio_service_mesh_primary_1
]
}
resource "aws_eks_addon" "eks_addon_vpc-cni_2" {
cluster_name = aws_eks_cluster.istio_service_mesh_primary_2.name
addon_name = "vpc-cni"
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_role.eks_iam_node_role,
aws_security_group.cluster_sg,
aws_security_group.cp_sg,
aws_security_group.wrkr_node,
aws_eks_cluster.istio_service_mesh_primary_2
]
}
resource "aws_eks_node_group" "istio_service_mesh_primary_worker_group_1" {
cluster_name = aws_eks_cluster.istio_service_mesh_primary_1.name
node_group_name = "istio-service-mesh-primary-worker-group-1"
node_role_arn = aws_iam_role.eks_iam_node_role.arn
subnet_ids = var.subnet_ids
remote_access {
ec2_ssh_key = var.ssh_key
source_security_group_ids = [aws_security_group.wrkr_node.id]
}
scaling_config {
desired_size = 2
max_size = 3
min_size = 2
}
instance_types = ["t3.medium"]
update_config {
max_unavailable = 1
}
depends_on = [
aws_iam_role.eks_iam_role,
aws_iam_role.eks_iam_node_role,
aws_security_group.cluster_sg,
aws_security_group.cp_sg,
aws_security_group.wrkr_node,
aws_eks_cluster.istio_service_mesh_primary_1,
aws_eks_addon.eks_addon_vpc-cni
]
timeouts {
create = "15m"
}
tags = local.default_tags
}
resource "aws_eks_node_group" "istio_service_mesh_primary_worker_group_2" {
cluster_name = aws_eks_cluster.istio_service_mesh_primary_2.name
node_group_name = "istio-service-mesh-primary-worker-group-2"
node_role_arn = aws_iam_role.eks_iam_node_role.arn
subnet_ids = var.subnet_ids
remote_access {
ec2_ssh_key = var.ssh_key
source_security_group_ids = [aws_security_group.wrkr_node.id]
}
scaling_config {
desired_size = 2
max_size = 3
min_size = 2
}
instance_types = ["t3.medium"]
update_config {
max_unavailable = 1
}
depends_on = [
aws_eks_cluster.istio_service_mesh_primary_2
]
timeouts {
create = "15m"
}
tags = local.default_tags
}
After creating the necessary Terraform configuration, it's time to apply it.
First, initialize the working directory and create a Terraform workspace.
terraform init
terraform workspace new istio-service-mesh
Next, verify that your configuration is valid.
terraform workspace select istio-service-mesh
terraform fmt
terraform validate
terraform plan -out='plan.out'
Then, apply it.
terraform apply 'plan.out'
Terraform now provisions the resources.
After 20 minutes or more, your clusters are ready!
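Later steps run commands against each cluster separately, so it helps to have one kubeconfig file per cluster. A minimal sketch, assuming the AWS CLI is installed; the function name and the region argument are placeholders, while the cluster names come from eks.tf and the file names match the ones used by the trust check later in this post:

```shell
#!/usr/bin/env bash
# Write one kubeconfig file per cluster so later commands can target each
# cluster explicitly. The region argument is a placeholder.
write_kubeconfigs() {
  local region="$1" n
  for n in 1 2; do
    aws eks update-kubeconfig \
      --region "$region" \
      --name "istio-service-mesh-primary-$n" \
      --alias "cluster$n" \
      --kubeconfig "$(pwd)/kubeconfig_cluster$n.yaml" || return 1
  done
}
```

After `write_kubeconfigs <your-region>`, commands such as `kubectl --kubeconfig kubeconfig_cluster1.yaml get nodes` can be used to check that the worker nodes have joined.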
2. Istio Setup with Helm chart
It's now time to install Istio on both clusters.
Required Charts
- Istio Base helm chart
- Istiod helm chart
- Istio ingress gateway helm chart
First, add the Istio Helm repository:
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
Cluster1
Installing Istio Base helm chart via:
helm upgrade --install istio-base istio/base \
--create-namespace -n istio-system \
--version 1.13.2 --wait
The Istio base chart is now installed. Next is the Istio control plane.
Note: You must specify the meshID, clusterName, and network to uniquely identify your clusters when installing the Istio control plane.
helm upgrade --install istiod istio/istiod -n istio-system --create-namespace \
--wait --version 1.13.2 \
--set global.meshID="cluster1" \
--set global.multiCluster.clusterName="cluster1" \
--set global.network="cluster1"
Now it's time to expose the cluster through an ingress, also called an edge router, by installing the Istio ingress gateway. In my case, I prefer to use an ALB instead of the load balancer that Istio provisions by default, so the gateway service is of type NodePort.
kubectl create namespace istio-ingress
kubectl label namespace istio-ingress istio-injection=enabled
helm upgrade --install istio-ingressgateway istio/gateway \
-n istio-ingress --create-namespace \
--version 1.13.2 --set service.type="NodePort"
Finally, create an Ingress resource and associate it with the istio-ingressgateway NodePort service.
ingress.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: istio-alb-ingress
  namespace: istio-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/certificate-arn: "<your-certificate-arn>"
    alb.ingress.kubernetes.io/listen-ports: '[{ "HTTP": 80 }, { "HTTPS": 443 }]'
    alb.ingress.kubernetes.io/security-groups: <your-security-group-id>
    alb.ingress.kubernetes.io/scheme: internet-facing
    #alb.ingress.kubernetes.io/target-type: instance
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/tags: Environment=Test,Provisioner=Kubernetes
  labels:
    app: "Istio"
    ingress: "Istio"
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /healthz/ready
            backend:
              serviceName: istio-ingressgateway
              servicePort: 15021
          - path: /*
            backend:
              serviceName: istio-ingressgateway
              servicePort: 443
Cluster2
The same steps apply to cluster2, but you must change the meshID, clusterName, and network values for the Istio control plane chart.
Installing Istio base chart via:
helm upgrade --install istio-base istio/base \
--create-namespace -n istio-system \
--version 1.13.2 --wait
Installing Istio Control Plane:
helm upgrade --install istiod istio/istiod \
-n istio-system --create-namespace \
--wait --version 1.13.2 \
--set global.meshID="cluster2" \
--set global.multiCluster.clusterName="cluster2" \
--set global.network="cluster2"
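Since the base and control plane installs are identical on both clusters apart from three values, they can be wrapped in one function. This is only a sketch that reuses the chart version and values from the commands above; the function name and the kubeconfig path argument are assumptions (the post itself switches kubectl contexts instead):

```shell
#!/usr/bin/env bash
# Install the Istio base chart and istiod on one cluster.
# $1: cluster identifier used for meshID, clusterName and network
# $2: path to that cluster's kubeconfig file (an assumption of this sketch)
ISTIO_VERSION="1.13.2"

install_control_plane() {
  local cluster="$1" kubeconfig="$2"
  helm upgrade --install istio-base istio/base \
    --kubeconfig "$kubeconfig" \
    --create-namespace -n istio-system \
    --version "$ISTIO_VERSION" --wait || return 1
  helm upgrade --install istiod istio/istiod \
    --kubeconfig "$kubeconfig" \
    -n istio-system --create-namespace \
    --wait --version "$ISTIO_VERSION" \
    --set global.meshID="$cluster" \
    --set global.multiCluster.clusterName="$cluster" \
    --set global.network="$cluster"
}
```

With this in place, `install_control_plane cluster1 kubeconfig_cluster1.yaml` and `install_control_plane cluster2 kubeconfig_cluster2.yaml` would cover both clusters.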
On cluster2, we don't have to set up an additional edge ingress gateway, since connections start from cluster1. But how can we distribute traffic from cluster1 to cluster2?
Answer: by exposing the cluster services.
On cluster1, create an additional load balancer by installing another Istio gateway.
helm upgrade --install istio-crossnetworkgateway istio/gateway \
-n istio-system --create-namespace --version 1.13.2
For cluster2:
helm upgrade --install istio-crossnetworkgateway istio/gateway \
-n istio-system --create-namespace --version 1.13.2
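Before moving on, it can be useful to confirm that each cross-network gateway actually received a load balancer address. A rough sketch, assuming the gateway service keeps the chart's default LoadBalancer type and that AWS reports a hostname for it; the function name and the kubeconfig path argument are hypothetical:

```shell
#!/usr/bin/env bash
# Poll until the cross-network gateway service reports a load balancer
# hostname, then print it. Gives up after `tries` attempts, 5 seconds apart.
gateway_hostname() {
  local kubeconfig="$1" tries="${2:-60}" host=""
  while [ "$tries" -gt 0 ]; do
    host=$(kubectl --kubeconfig "$kubeconfig" -n istio-system \
      get svc istio-crossnetworkgateway \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    if [ -n "$host" ]; then
      echo "$host"
      return 0
    fi
    tries=$((tries - 1))
    sleep 5
  done
  return 1
}
```

For example, `gateway_hostname kubeconfig_cluster2.yaml` prints the hostname once it is assigned and returns a non-zero status if it never appears.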
Expose the services on both clusters.
istio-exposeservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    app: istio-crossnetworkgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"
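The same Gateway manifest goes to both clusters. A small sketch of applying it in one pass, assuming one kubeconfig file per cluster named kubeconfig_cluster1.yaml and kubeconfig_cluster2.yaml in the current directory; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Apply the cross-network Gateway manifest to both clusters.
# $1: manifest path, defaulting to the file name used above
expose_services() {
  local manifest="${1:-istio-exposeservice.yaml}" n
  for n in 1 2; do
    kubectl --kubeconfig "kubeconfig_cluster$n.yaml" \
      apply -n istio-system -f "$manifest" || return 1
  done
}
```

The loop stops at the first failure so a problem on cluster1 is not hidden by a successful apply on cluster2.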
The services are now exposed. But how does Istio identify or discover resources from the other cluster?
We need to enable endpoint discovery.
I assume that your kubeconfig file points to the cluster1 context. This way, we can create Istio secrets that give each cluster API access to the other, so that they can discover each other's resources.
Create the Istio secret for cluster2. Run this command in the cluster1 context:
istioctl x create-remote-secret --name=cluster1 > cluster2-secret.yaml
Then create the Istio secret for cluster1. Run this command in the cluster2 context:
istioctl x create-remote-secret --name=cluster2 > cluster1-secret.yaml
If you view the files, each one is just a kubeconfig for one of the clusters, which enables API access.
Next, apply the secrets to both clusters.
Cluster1:
kubectl apply -f cluster1-secret.yaml
Cluster2:
kubectl apply -f cluster2-secret.yaml
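Switching contexts by hand makes it easy to apply a secret to the wrong cluster. The exchange can be sketched as one function that names the target cluster explicitly on every command. It assumes kubeconfig_cluster1.yaml and kubeconfig_cluster2.yaml exist in the current directory; the function name is hypothetical, and the file naming follows the post (each file is named after the cluster it is applied to):

```shell
#!/usr/bin/env bash
# Exchange remote secrets between the two clusters without switching
# kubectl contexts.
exchange_remote_secrets() {
  # Credentials for cluster1, to be applied to cluster2.
  istioctl x create-remote-secret --name=cluster1 \
    --kubeconfig kubeconfig_cluster1.yaml > cluster2-secret.yaml || return 1
  # Credentials for cluster2, to be applied to cluster1.
  istioctl x create-remote-secret --name=cluster2 \
    --kubeconfig kubeconfig_cluster2.yaml > cluster1-secret.yaml || return 1
  kubectl --kubeconfig kubeconfig_cluster1.yaml \
    apply -f cluster1-secret.yaml || return 1
  kubectl --kubeconfig kubeconfig_cluster2.yaml \
    apply -f cluster2-secret.yaml
}
```

The generated files contain cluster credentials, so they are best deleted or stored securely once applied.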
Last but not least, verify that your clusters already share a trust configuration.
diff \
<(export KUBECONFIG=$(pwd)/kubeconfig_cluster1.yaml && kubectl -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}') \
<(export KUBECONFIG=$(pwd)/kubeconfig_cluster2.yaml && kubectl -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}')
If no certificate is found on either cluster, you can generate a self-signed root CA certificate.
For more information, see: Generating self-signed root CA certificates
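The trust check above prints a diff, which is awkward to use in a script, and it also reports "no difference" when neither cluster has a cacerts secret. A sketch of the same comparison as a function with an exit status, assuming the same kubeconfig files as the diff command; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Print the base64-encoded root certificate stored in the cacerts secret.
root_cert() {
  kubectl --kubeconfig "$1" -n istio-system get secret cacerts \
    -o jsonpath='{.data.root-cert\.pem}'
}

# Succeed only if both clusters have a root certificate and they match.
# Returns 2 if a lookup fails, 1 if the certificates are missing or differ.
same_root_ca() {
  local a b
  a=$(root_cert "$1") || return 2
  b=$(root_cert "$2") || return 2
  # Empty output means the secret holds no root certificate.
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: `same_root_ca kubeconfig_cluster1.yaml kubeconfig_cluster2.yaml && echo "shared root CA"`.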
Generate Certificates
Istio provides basic security by default so that services are not accidentally exposed publicly. Istio automatically drops a client connection if the TLS handshake doesn't meet the requirements, because it verifies service-to-service communication using trust configurations.
Create a root CA certificate.
cd istio-tool
mkdir -p certs
pushd certs
make -f ../Makefile.selfsigned.mk root-ca
Generate a cluster1 certificate.
make -f ../Makefile.selfsigned.mk cluster1-cacerts
Generate a cluster2 certificate.
make -f ../Makefile.selfsigned.mk cluster2-cacerts
Now, apply the certificates to both clusters.
For Cluster1:
kubectl create secret generic cacerts -n istio-system \
--from-file=cluster1/ca-cert.pem \
--from-file=cluster1/ca-key.pem \
--from-file=cluster1/root-cert.pem \
--from-file=cluster1/cert-chain.pem
For Cluster2:
kubectl create secret generic cacerts -n istio-system \
--from-file=cluster2/ca-cert.pem \
--from-file=cluster2/ca-key.pem \
--from-file=cluster2/root-cert.pem \
--from-file=cluster2/cert-chain.pem
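`kubectl create secret` fails if the secret already exists, so re-running the step needs a delete first. One common workaround is to render the secret client-side and pipe it to `kubectl apply`. The sketch below does that; it also restarts istiod, on the assumption that istiod was installed before the secret existed and may not pick up the new CA otherwise. The function name and the kubeconfig argument are hypothetical:

```shell
#!/usr/bin/env bash
# Create or update the cacerts secret for one cluster, then restart istiod.
# Run from the certs directory that contains cluster1/ and cluster2/.
# $1: certificate directory (cluster1 or cluster2)
# $2: path to the target cluster's kubeconfig file
apply_cacerts() {
  local cluster="$1" kubeconfig="$2"
  kubectl create secret generic cacerts -n istio-system \
    --from-file="$cluster/ca-cert.pem" \
    --from-file="$cluster/ca-key.pem" \
    --from-file="$cluster/root-cert.pem" \
    --from-file="$cluster/cert-chain.pem" \
    --dry-run=client -o yaml |
    kubectl --kubeconfig "$kubeconfig" apply -f - || return 1
  # Assumption: istiod is already running and needs a restart to load the CA.
  kubectl --kubeconfig "$kubeconfig" -n istio-system \
    rollout restart deployment/istiod
}
```

Workloads that already have sidecars keep their old certificates until they are restarted as well.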
After applying all the necessary steps, cluster1 and cluster2 should be able to distribute traffic across both clusters.
3. Cross network gateway validation
After applying the necessary steps, you of course need to verify that it actually works. I've created a basic application called MetaPod that lets you view pod information and metadata through the web, so you can determine whether your traffic is actually being forwarded to the second cluster.
MetaPod sample deployment manifest.
For Cluster1, deploy the test deployment.
Note: You must change the hosts values to make it work on your end.
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: metapod-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "metapod.example.com"
      tls:
        mode: PASSTHROUGH
        httpsRedirect: true # sends 301 redirect for http requests
    - port:
        number: 443
        name: http-443
        protocol: HTTP # HTTP only, since the TLS certificate comes from the upstream load balancer
      hosts:
        - "metapod.example.com"
      tls:
        mode: PASSTHROUGH
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: metapod
spec:
  hosts:
    - "metapod.example.com"
    - "metapod.default.svc.cluster.local"
  gateways:
    - metapod-gateway
  http:
    - route:
        - destination:
            host: metapod.default.svc.cluster.local
            port:
              number: 80
      retries:
        attempts: 5
        perTryTimeout: 5s
---
apiVersion: v1
kind: Service
metadata:
  name: metapod
  labels:
    app: metapod
    service: metapod
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: metapod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metapod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metapod
      version: v1
  template:
    metadata:
      labels:
        app: metapod
        version: v1
    spec:
      containers:
        - image: docker.io/redopsbay/metapod:latest
          imagePullPolicy: IfNotPresent
          name: metapod
          ports:
            - containerPort: 8080
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
            - name: POD_CPU_REQUEST
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: requests.cpu
            - name: POD_CPU_LIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: limits.cpu
            - name: POD_MEM_REQUEST
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: requests.memory
            - name: POD_MEM_LIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: limits.memory
            - name: CLUSTER_NAME
              value: "Cluster1"
            - name: GIN_MODE
              value: "release"
For Cluster2:
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: metapod
spec:
  hosts:
    - "metapod.example.com"
    - "metapod.default.svc.cluster.local"
  gateways:
    - metapod-gateway
  http:
    - route:
        - destination:
            host: metapod.default.svc.cluster.local
            port:
              number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: metapod
  labels:
    app: metapod
    service: metapod
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: metapod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metapod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metapod
      version: v1
  template:
    metadata:
      labels:
        app: metapod
        version: v1
    spec:
      containers:
        - image: docker.io/redopsbay/metapod:latest
          imagePullPolicy: IfNotPresent
          name: metapod
          ports:
            - containerPort: 8080
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
            - name: POD_CPU_REQUEST
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: requests.cpu
            - name: POD_CPU_LIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: limits.cpu
            - name: POD_MEM_REQUEST
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: requests.memory
            - name: POD_MEM_LIMIT
              valueFrom:
                resourceFieldRef:
                  containerName: metapod
                  resource: limits.memory
            - name: CLUSTER_NAME
              value: "Cluster2"
            - name: GIN_MODE
              value: "release"
After a few seconds, visit the registered gateway host on your end.
In my case, that is https://metapod.example.com.
The CLUSTER NAME field shows that your traffic is forwarded to Cluster1. If you keep refreshing the browser page, you'll notice that your traffic is also forwarded to Cluster2.
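Refreshing the browser works, but a short loop gives a clearer picture of how requests are spread. The sketch below assumes that each MetaPod response body contains the CLUSTER_NAME value from the manifests ("Cluster1" or "Cluster2"); the response format itself is not documented here, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Turn a stream of cluster names (one per line) into "name=count" lines.
tally_clusters() {
  sort | uniq -c | awk '{print $2 "=" $1}'
}

# Send `count` requests to the URL and count responses per cluster.
# Only the first cluster name found in each response is counted.
probe_mesh() {
  local url="$1" count="${2:-20}" i=0
  while [ "$i" -lt "$count" ]; do
    curl -fsS "$url" | grep -o 'Cluster[12]' | head -n 1
    i=$((i + 1))
  done | tally_clusters
}
```

Running `probe_mesh https://metapod.example.com 20` prints one line per cluster that answered; if only Cluster1 shows up, the remote secrets and the cross-network gateways are the first things to check.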
Alright! That's it. You may encounter a lot of problems during your journey, but it's worth a try.
You can message me directly here or on my twitter account https://twitter.com/redopsbay if you need help.
I will try my best to help you fix it.
Hope you like it. Cheers! Thanks!