Setup Multi-Cluster ServiceMesh with Istio on EKS

1. Provisioning
2. Istio Setup with Helm chart
3. Cross network gateway validation

Hey! In this post, We will be exploring a technology called ServiceMesh powered by Istio.

Short Intro on Non-Istio users

Istio is an open source service mesh that layers transparently onto existing distributed applications.

Istio's powerful features provide a uniform and more efficient way to secure, connect, and monitor services. Its powerful control plane brings vital features, including:

Secure service-to-service communication in a cluster with TLS encryption, strong identity-based authentication and authorization.

Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic

Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection

A pluggable policy layer and configuration API supporting access controls, rate limits and quotas

Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress

Most of you are already familiar with Istio. Since, Kubernetes federation is currently not yet available and the latest version is on a Beta version and you want to distribute your traffic across different clusters with production grade deployment.

So, Let's quickly go through the Step-by-Step procedure to implement Multi Cluster Deployment with Istio.

This tutorial is highly based on AWS and Terraform and also Helm Charts.

Provisioning

Required resources

EKS Clusters

Security Group / Rule

S3 Bucket for terraform states

IAM Roles

IAM Permissions

AWS ALB (Application LoadBalancer)

IAM Role

Create your IAM Role for your EKS Cluster.

iam.tf



resource "aws_iam_role" "eks_iam_role" {
  name = "AmazonAwsEksRole"
  assume_role_policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "eks.amazonaws.com"
        },
        "Action" : "sts:AssumeRole"
      }
    ]
  })
  tags = local.default_tags
}

resource "aws_iam_policy_attachment" "eksClusterPolicyAttachmentDefault" {
  name       = "eksClusterPolicyAttachmentDefault"
  roles      = [aws_iam_role.eks_iam_role.name]
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_iam_role" "eks_iam_node_role" {
  name = "AmazonAwsEksNodeRole"
  assume_role_policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "ec2.amazonaws.com"
        },
        "Action" : "sts:AssumeRole"
      }
    ]
  })

  tags = local.default_tags
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_policy_attachment.eksClusterPolicyAttachmentDefault
  ]
}
resource "aws_iam_policy_attachment" "AmazonEKSWorkerNodePolicyAttachment" {
  name       = "AmazonEKSWorkerNodePolicyAttachment"
  roles      = [aws_iam_role.eks_iam_node_role.name]
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_policy_attachment" "AmazonEC2ContainerRegistryReadOnlyAttachment" {
  name       = "AmazonEC2ContainerRegistryReadOnlyAttachment"
  roles      = [aws_iam_role.eks_iam_node_role.name]
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_policy_attachment" "AmazonEKSCNIPolicyAttachment" {
  name       = "AmazonEKSCNIPolicyAttachment"
  roles      = [aws_iam_role.eks_iam_node_role.name]
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

Security Group

AWS Security Group is required in able for your cluster to communicate.

securitygroup.tf



resource "aws_security_group" "cluster_sg" {
  name        = "cluster-security-group"
  description = "Communication with Worker Nodes"
  vpc_id      = var.vpc_id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
  }

  tags = local.default_tags
}


resource "aws_security_group" "cp_sg" {
  name        = "cp-sg"
  description = "CP and Nodegroup communication"
  vpc_id      = var.vpc_id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "Allow all"
    cidr_blocks = ["0.0.0.0/0"]
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
  }

  tags = local.default_tags
}

resource "aws_security_group" "wrkr_node" {
  name        = "worker-sg"
  description = "Worker Node SG"
  vpc_id      = var.vpc_id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "Allow All"
    cidr_blocks = ["0.0.0.0/0"]
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
  }

  ingress {
    description = "Self Communication"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  tags = local.default_tags
}

EKS Clusters

In this section, you will provision the 2 EKS Cluster.

eks.tf



locals {
  default_tags = {
    Provisioner = "Terraform"
    Environment = "Testing"
  }
}

resource "aws_eks_cluster" "istio_service_mesh_primary_1" {
  name     = "istio-service-mesh-primary-1"
  role_arn = aws_iam_role.eks_iam_role.arn

  vpc_config {
    subnet_ids          = var.subnet_ids
    public_access_cidrs = ["0.0.0.0/0"]
    security_group_ids = [
      aws_security_group.cluster_sg.id,
      aws_security_group.cp_sg.id
    ]
  }
  version = "1.21"

  timeouts {
    create = "15m"
  }
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_role.eks_iam_node_role,
    aws_security_group.cluster_sg,
    aws_security_group.cp_sg,
    aws_security_group.wrkr_node
  ]
}

resource "aws_eks_cluster" "istio_service_mesh_primary_2" {
  name     = "istio-service-mesh-primary-2"
  role_arn = aws_iam_role.eks_iam_role.arn

  vpc_config {
    subnet_ids          = var.subnet_ids
    public_access_cidrs = ["0.0.0.0/0"]
    security_group_ids = [
      aws_security_group.cluster_sg.id,
      aws_security_group.cp_sg.id
    ]
  }
  version = "1.21"

  timeouts {
    create = "15m"
  }
  tags = local.default_tags
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_role.eks_iam_node_role,
    aws_iam_policy_attachment.eksClusterPolicyAttachmentDefault,
    aws_security_group.cluster_sg,
    aws_security_group.cp_sg,
    aws_security_group.wrkr_node
  ]
}


resource "aws_eks_addon" "eks_addon_vpc-cni" {
  cluster_name = aws_eks_cluster.istio_service_mesh_primary_1.name
  addon_name   = "vpc-cni"
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_role.eks_iam_node_role,
    aws_security_group.cluster_sg,
    aws_security_group.cp_sg,
    aws_security_group.wrkr_node,
    aws_eks_cluster.istio_service_mesh_primary_1
  ]
}

resource "aws_eks_addon" "eks_addon_vpc-cni_2" {
  cluster_name = aws_eks_cluster.istio_service_mesh_primary_2.name
  addon_name   = "vpc-cni"
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_role.eks_iam_node_role,
    aws_security_group.cluster_sg,
    aws_security_group.cp_sg,
    aws_security_group.wrkr_node,
    aws_eks_cluster.istio_service_mesh_primary_2
  ]
}

resource "aws_eks_node_group" "istio_service_mesh_primary_worker_group_1" {
  cluster_name    = aws_eks_cluster.istio_service_mesh_primary_1.name
  node_group_name = "istio-service-mesh-primary-worker-group-1"
  node_role_arn   = aws_iam_role.eks_iam_node_role.arn
  subnet_ids      = var.subnet_ids

  remote_access {
    ec2_ssh_key               = var.ssh_key
    source_security_group_ids = [aws_security_group.wrkr_node.id]
  }

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 2
  }
  instance_types = ["t3.medium"]

  update_config {
    max_unavailable = 1
  }
  depends_on = [
    aws_iam_role.eks_iam_role,
    aws_iam_role.eks_iam_node_role,
    aws_security_group.cluster_sg,
    aws_security_group.cp_sg,
    aws_security_group.wrkr_node,
    aws_eks_cluster.istio_service_mesh_primary_1,
    aws_eks_addon.eks_addon_vpc-cni
  ]

  timeouts {
    create = "15m"
  }
  tags = local.default_tags

}


resource "aws_eks_node_group" "istio_service_mesh_primary_worker_group_2" {
  cluster_name    = aws_eks_cluster.istio_service_mesh_primary_2.name
  node_group_name = "istio-service-mesh-primary-worker-group-2"
  node_role_arn   = aws_iam_role.eks_iam_node_role.arn
  subnet_ids      = var.subnet_ids
  remote_access {
    ec2_ssh_key               = var.ssh_key
    source_security_group_ids = [aws_security_group.wrkr_node.id]
  }

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 2
  }
  instance_types = ["t3.medium"]
  update_config {
    max_unavailable = 1
  }
  depends_on = [
    aws_eks_cluster.istio_service_mesh_primary_2
  ]

  timeouts {
    create = "15m"
  }
  tags = local.default_tags
}

After creating necessary tf configuration. It's now time to apply it.

First, Create a tf workspace.



terraform workspace new istio-service-mesh

Next, Verify if your tf configuration is smooth.



terraform init
terraform workspace select istio-service-mesh
terraform fmt
terraform validate
terraform plan -out='plan.out'

Then, Apply it.



terraform apply 'plan.out'

It is now provisioning:

After 20 minutes or more, Your cluster's is ready!

2. Istio Setup with Helm chart

It's now time to install Istio on both clusters.

Required Charts

Istio Base helm chart
Istiod helm chart
Istio ingress gateway helm chart

First, Add the helm istio repository via:



helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

Cluster1

Installing Istio Base helm chart via:



helm upgrade --install istio-base istio/base \
    --create-namespace -n istio-system \
    --version 1.13.2 --wait

Now, Istio based is now installed. Next one is the Istio Control Plane.
Note: You must specify the meshID, clusterName and network to uniquely identify your clusters when installing Istio control Plane.



helm upgrade --install istiod istio/istiod -n istio-system --create-namespace \
    --wait --version 1.13.2 \
    --set global.meshID="cluster1" \
    --set global.multiCluster.clusterName="cluster1" \
    --set global.network="cluster1"

Now, It's time to expose the cluster with ingress or what so called edge router by installing istio ingressgateway. In my case, I prepare to use ALB instead of prepared loadbalancer by Istio 😆.



kubectl create namespace istio-ingress
kubectl label namespace istio-ingress istio-injection=enabled

helm upgrade --install istio-ingressgateway istio/gateway \
     -n istio-ingress --create-namespace \
     --version 1.13.2 --set service.type="NodePort"

Finally, create an ingress resource then associate the ingress to istio-ingressgateway NodePort service.

ingress.yaml



apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: istio-alb-ingress
  namespace: istio-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/certificate-arn: "<your-certificate-arn>"
    alb.ingress.kubernetes.io/listen-ports: '[{ "HTTP": 80 }, { "HTTPS": 443 }]'
    alb.ingress.kubernetes.io/security-groups: <your-security-group-id>
    alb.ingress.kubernetes.io/scheme: internet-facing
    #alb.ingress.kubernetes.io/target-type: instance
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/tags: Environment=Test,Provisioner=Kubernetes
  labels:
    app:  "Istio"
    ingress: "Istio"
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /healthz/ready
            backend:
              serviceName: istio-ingressgateway
              servicePort: 15021
          - path: /*
            backend:
              serviceName: istio-ingressgateway
              servicePort: 443

Cluster2

Same steps applied to cluster2. But you must change the meshID, clusterName and network values on Istio Control plane chart.

Installing Istio base chart via:



helm upgrade --install istio-base istio/base \
    --create-namespace -n istio-system \
    --version 1.13.2 --wait

Installing Istio Control Plane:



helm upgrade --install istiod istio/istiod \
    -n istio-system --create-namespace \
    --wait --version 1.13.2 \
    --set global.meshID="cluster2" \
    --set global.multiCluster.clusterName="cluster2" \
    --set global.network="cluster2"

On cluster2, We don't have to setup additional edge ingressgateway. Since, the connection will be started from cluster1. But, How can we distribute the traffic from cluster1 to cluster2 ?

Answer: By exposing cluster services 💡

On cluster1, Create additional Loadbalancer by installing additional istio-ingressgateway.



helm upgrade --install istio-crossnetworkgateway istio/gateway \
     -n istio-system --create-namespace --version 1.13.2

For cluster2:



helm upgrade --install istio-crossnetworkgateway istio/gateway \
     -n istio-system --create-namespace --version 1.13.2

Exposing services for both cluster.
istio-exposeservice.yaml



apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    app: istio-crossnetworkgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"

Now, Services are now exposed. But how does istio identify or discover resources from the other cluster?

We need to enable Endpoint Discovery.

On cluster1, I assume that your kubeconfig file is pointed to cluster1 context. This way, We can create istio secret file that can give access on both clusters in able for them to discover resources.

Create an istio secret for cluster2. This command should be done on cluster1 context:



istioctl x create-remote-secret --name=cluster1 > cluster2-secret.yaml

On cluster2, Create an istio secret for cluster1. This command should be done on cluster2 context:



istioctl x create-remote-secret --name=cluster2 > cluster1-secret.yaml

If you view the file, It's just a kubeconfig context from both cluster contexts enabling API Access.

Next, We should apply the secret to both clusters.
Cluster1:


 apply -f cluster1-secret.yaml

Cluster2:


 apply -f cluster2-secret.yaml

Last, but not least. Verify if your clusters has already a trust configuration.



diff \
   <(export KUBECONFIG=$(pwd)/kubeconfig_cluster1.yaml && kubectl -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}') \
   <(export KUBECONFIG=$(pwd)/kubeconfig_cluster2.yaml && kubectl -n istio-system get secret cacerts -ojsonpath='{.data.root-cert\.pem}')

If there's no certificate found on both clusters. You can generate a self-signed root CA certificate.

Kindly, visit for more info: Generating self-signed root CA certificates

Generate Certificates

Istio provides basic security by default in able for the services not being accidentally exposed publicly. Istio will automatically drop client connection if the TLS handshake doesn't meet the requirements.

Because Istio verifies service-to-service communication by using Trust Configurations.

Creating a root-ca certificate.



cd istio-tool
mkdir -p certs
pushd certs
make -f ../Makefile.selfsigned.mk root-ca

Generate a cluster1 certificate.



make -f ../Makefile.selfsigned.mk cluster1-cacerts

Generate a cluster2 certificate.



make -f ../Makefile.selfsigned.mk cluster2-cacerts

Now, apply both the certificates on both cluster.
For Cluster1:



kubectl create secret generic cacerts -n istio-system \
      --from-file=cluster1/ca-cert.pem \
      --from-file=cluster1/ca-key.pem \
      --from-file=cluster1/root-cert.pem \
      --from-file=cluster1/cert-chain.pem

For Cluster2:



kubectl create secret generic cacerts -n istio-system \
      --from-file=cluster2/ca-cert.pem \
      --from-file=cluster2/ca-key.pem \
      --from-file=cluster2/root-cert.pem \
      --from-file=cluster2/cert-chain.pem

After applying all the necessary steps. The cluster1 and cluster2 should now be able to distribute traffic on both clusters.

3. Cross network gateway validation

After applying the necessary steps. Of course, you need to verify if it's actually working. I've created a basic application called MetaPod that allows you to extract the pod information or metadata through the web. So you can determine if your traffic is actually being forwarded to the 2nd cluster.

MetaPod sample deployment manifest.

For Cluster1, Try to deploy the test deployment.
Note: You must change the hosts values to make it work on your end.



---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: metapod-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "metapod.example.com"
    tls:
      mode: PASSTHROUGH
      httpsRedirect: true # sends 301 redirect for http requests
  - port:
      number: 443
      name: http-443
      protocol: HTTP  # http only since tls certificate is came from upstream (LoadBalancer) Level
    hosts:
    - "metapod.example.com"
    tls:
      mode: PASSTHROUGH
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: metapod
spec:
  hosts:
  - "metapod.example.com"
  - "metapod.default.svc.cluster.local"
  gateways:
  - metapod-gateway
  http:
  - route:
    - destination:
        host: metapod.default.svc.cluster.local
        port:
          number: 80
    retries:
      attempts: 5
      perTryTimeout: 5s

---
apiVersion: v1
kind: Service
metadata:
  name: metapod
  labels:
    app: metapod
    service: metapod
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  selector:
    app: metapod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metapod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metapod
      version: v1
  template:
    metadata:
      labels:
        app: metapod
        version: v1
    spec:
      containers:
      - image: docker.io/redopsbay/metapod:latest
        imagePullPolicy: IfNotPresent
        name: metapod
        ports:
        - containerPort: 8080
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: POD_CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: requests.cpu
        - name: POD_CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: limits.cpu
        - name: POD_MEM_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: requests.memory
        - name: POD_MEM_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: limits.memory
        - name: CLUSTER_NAME
          value: "Cluster1"
        - name: GIN_MODE
          value: "release"

For Cluster2:



---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: metapod
spec:
  hosts:
  - "metapod.example.com"
  - "metapod.default.svc.cluster.local"
  gateways:
  - metapod-gateway
  http:
  - route:
    - destination:
        host: metapod.default.svc.cluster.local
        port:
          number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: metapod
  labels:
    app: metapod
    service: metapod
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  selector:
    app: metapod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metapod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metapod
      version: v1
  template:
    metadata:
      labels:
        app: metapod
        version: v1
    spec:
      containers:
      - image: docker.io/redopsbay/metapod:latest
        imagePullPolicy: IfNotPresent
        name: metapod
        ports:
        - containerPort: 8080
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: POD_CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: requests.cpu
        - name: POD_CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: limits.cpu
        - name: POD_MEM_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: requests.memory
        - name: POD_MEM_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: metapod
              resource: limits.memory
        - name: CLUSTER_NAME
          value: "Cluster2"
        - name: GIN_MODE
          value: "release"

After a few seconds, Try to visit the registered gateway on your end.

In my case, https://metapod.example.com and it should look like this:

As you can see under the CLUSTER NAME. Your traffic is forwarded to Cluster1. If you constantly refresh your browser page. You'll notice that your traffic is being forwarded also to Cluster2. See below:

Alright! That's it. You may encounter a lot of problems during your journey. But it's worth to try.

You can message me directly here or on my twitter account https://twitter.com/redopsbay if you need help.

I will try my best to help you out to fix it. 😅😅😅
Hope you like it. Cheers!!!🍻🍻 Thanks!!!

Blog