Build own Kubernetes - ClusterIP Service networking

Jonatan Ezron

Posted on October 20, 2022

In the previous articles, we automated node creation with Docker. Everything looked good: we had a running node with a REST API agent and a running pod inside it. But what about the connection to the pod? The communication to the pod? If you try to send a request to the running pod from your environment it will not work, and even on the node itself it only works from inside the running containerd container, against localhost.
So we need to solve this problem.
This part was very difficult for me; I had to get deep into Linux namespaces and networking to solve it. The amazing blog post Container Networking Is Simple! helped me a lot, and some of the commands and drawings were even taken from there. I will explain the basics as we go along, but I recommend reading that post before this one; this article's flow will be roughly the same as the iximiuz and msazure blogs.


All of the commands we will execute will be inside the node container, so start by creating a node and entering its shell:

sudo ./bin/main node create
2022/10/14 16:00:16 node created: node-f8c7d6b8-ef24-4406-84e2-284e1f520bdf
2022/10/14 16:00:16 starting node
2022/10/14 16:00:16 node assign port: 49171
❯ sudo docker ps
CONTAINER ID   IMAGE           COMMAND     CREATED          STATUS          PORTS                      NAMES
73c64ab768c2   own-kube-node   "./agent"   28 seconds ago   Up 27 seconds   0.0.0.0:49171->10250/tcp   node-f8c7d6b8-ef24-4406-84e2-284e1f520bdf
❯ sudo docker exec -it 73c /bin/bash
root@73c64ab768c2:/agent#

Now we are going to use the ip, iptables, and ping commands, which are not installed in the container (later we will add them to the Dockerfile):

root@73c64ab768c2:/agent# apt-get install -y iproute2 iptables iputils-ping
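Since we don't want to install these tools by hand every time, a minimal sketch of the Dockerfile change could look like this (assuming the node image is Debian-based, as the apt-get above suggests):

# install the networking tools the node needs at build time
RUN apt-get update && apt-get install -y iproute2 iptables iputils-ping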

Let's start by creating 2 containers with ctr. We will use the whats-my-ip image so it will be easy to test the connection later.

root@73c64ab768c2:/agent# ctr image pull docker.io/cloudnativelabs/whats-my-ip:latest
root@73c64ab768c2:/agent# ctr run -d docker.io/cloudnativelabs/whats-my-ip:latest pod1
root@73c64ab768c2:/agent# ctr run -d docker.io/cloudnativelabs/whats-my-ip:latest pod2
root@73c64ab768c2:/agent# ctr c ls
CONTAINER    IMAGE                                           RUNTIME
pod1         docker.io/cloudnativelabs/whats-my-ip:latest    io.containerd.runc.v2
pod2         docker.io/cloudnativelabs/whats-my-ip:latest    io.containerd.runc.v2
root@73c64ab768c2:/agent# ctr t ls
TASK    PID     STATUS
pod2    974     RUNNING
pod1    920     RUNNING

We list the containers and tasks. As you can see, there is no IP address assigned to the containers, so we can't address them.
When a container is created, a Linux network namespace is created for it that isolates it from the host environment. Our current state looks like this:

[Diagram: the node with its eth0 interface; the two containers have no network devices]

The node has the eth0 interface for communication with the outside, and the 2 containers have nothing.
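You can see this for yourself: a fresh container's network namespace contains only a loopback device. Using pod1's task PID (920, from ctr t ls above), something like this should list just lo:

# only the loopback device (lo) should be listed
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip link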
So we need to create a virtual Ethernet device (veth). This is a virtual device that Linux provides so we are able to communicate between namespaces, and these veth devices come in pairs:

root@73c64ab768c2:/agent# ip link add veth0 type veth peer name ceth0

Now we need one end of the veth pair in our namespace and the other in the container's namespace. The container network namespaces are located at /proc/PID/ns/net, so we move ceth0 to pod1's network namespace, activate the devices, and assign an IP address to ceth0:

# move ceth0 to pod1 network namespace
root@73c64ab768c2:/agent# ip link set ceth0 netns /proc/920/ns/net
# activate ceth0 in pod1's network namespace
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip link set ceth0 up
# assign IP address to ceth0
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip addr add 10.0.1.2/16 dev ceth0
# activate veth0 
root@73c64ab768c2:/agent# ip link set veth0 up

Now we will do the same for pod2's network and assign a different IP address:

root@73c64ab768c2:/agent# ip link add veth1 type veth peer name ceth1
root@73c64ab768c2:/agent# ip link set veth1 up
root@73c64ab768c2:/agent# ip link set ceth1 netns /proc/974/ns/net
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip link set ceth1 up
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip addr add 10.0.1.3/16 dev ceth1

Now our system looks like this:

[Diagram: the veth0/ceth0 and veth1/ceth1 pairs linking the node's namespace to pod1 and pod2]

Now we need to add a virtual bridge to connect the two networks we have created:

# creating a new bridge
root@73c64ab768c2:/agent# ip link add br0 type bridge
# activating the device
root@73c64ab768c2:/agent# ip link set br0 up
# connecting the two veth devices to the bridge
root@73c64ab768c2:/agent# ip link set veth0 master br0
root@73c64ab768c2:/agent# ip link set veth1 master br0
# assign an address to the bridge
root@73c64ab768c2:/agent# ip addr add 10.0.1.1/16 dev br0
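If you want to double-check the wiring, iproute2 can list the devices enslaved to the bridge; both veth ends should appear:

root@73c64ab768c2:/agent# ip link show master br0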

Our system will look like this:

[Diagram: veth0 and veth1 attached to the br0 bridge at 10.0.1.1]

We also want to make the bridge the default gateway for each container's network. We do this using the following commands:

root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip route add default via 10.0.1.1
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip route add default via 10.0.1.1

Great, we have two pods connected using a bridge, and each of them has its own IP address.
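As a quick sanity check, pinging pod2 from inside pod1's network namespace should now work (the PIDs are the ones from ctr t ls above):

# pod1 -> pod2, going through br0
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ping -c 2 10.0.1.3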


Now we will focus on the ClusterIP service. This service provides communication and load balancing inside the cluster, between pods across nodes.

First, we need to create a new node with two pods inside it, and create the veth pairs and bridge as we did above. The IPs are as follows:
pod3 10.0.2.2/16
pod4 10.0.2.3/16
bridge 10.0.2.1/16

# installing packages
apt-get install iproute2 iptables -y

# setup pod3 - pid 537
ip link add veth0 type veth peer name ceth0
ip link set veth0 up
ip link set ceth0 netns /proc/537/ns/net
nsenter --net=/proc/537/ns/net ip link set ceth0 up
nsenter --net=/proc/537/ns/net ip addr add 10.0.2.2/16 dev ceth0

# setup pod4 - pid 595
ip link add veth1 type veth peer name ceth1
ip link set veth1 up
ip link set ceth1 netns /proc/595/ns/net
nsenter --net=/proc/595/ns/net ip link set ceth1 up
nsenter --net=/proc/595/ns/net ip addr add 10.0.2.3/16 dev ceth1

# setup bridge
ip link add br0 type bridge
ip link set br0 up
ip addr add 10.0.2.1/16 dev br0
ip link set veth0 master br0
ip link set veth1 master br0

# setup default route for both pods
nsenter --net=/proc/537/ns/net ip route add default via 10.0.2.1
nsenter --net=/proc/595/ns/net ip route add default via 10.0.2.1

Our system is as follows:

[Diagram: the second node with pod3, pod4, and its own br0 bridge]

Now we want to allow communication between the pods across nodes. We do this using VXLAN, which is an extended VLAN method for virtualizing a LAN connection between different hosts.
We create it with the following commands on both nodes:

# create vxlan
ip link add vxlan10 type vxlan id 10 group 239.1.1.1 dstport 0 dev eth0
# connect vxlan to the bridge
ip link set vxlan10 master br0
# activate the vxlan
ip link set vxlan10 up
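With vxlan10 bridged on both nodes, pods on different nodes should be able to reach each other. A quick check from pod1's namespace on the first node (assuming both node containers sit on the same Docker network, so the multicast group is reachable):

# pod1 on node 1 -> pod3 on node 2, over the VXLAN overlay
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ping -c 2 10.0.2.2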

Our system is as follows:

[Diagram: the two nodes' bridges connected through vxlan10 over eth0]

Now, to create a ClusterIP service, we need to add iptables rules.
We will use the nat table's PREROUTING chain with the DNAT target to redirect every request for the cluster IP (a virtual, chosen IP) to a pod. The command will look like this:

iptables -t nat -A PREROUTING -d $CLUSTER_IP -p tcp -m tcp --dport $CLUSTER_IP_PORT -j DNAT --to-destination $POD_IP:$POD_PORT

We also want to include load balancing using the round-robin method. iptables provides a solution for this too, using the statistic module, which can match every nth packet so requests are spread across all the options.
The full commands look like this:

iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 4 --packet 0  -j DNAT --to-destination 10.0.1.2:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 3 --packet 0  -j DNAT --to-destination 10.0.1.3:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 2 --packet 0  -j DNAT --to-destination 10.0.2.2:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -j DNAT --to-destination 10.0.2.3:8080

In the above commands, we establish four rules that load-balance every request to the virtual cluster IP (172.17.10.10:3001) across our four pods using the round-robin method. The first rule matches one out of every 4 packets; of the packets that fall through, the second matches one out of every 3, the third one out of every 2, and the last rule catches everything that remains, so each pod gets roughly a quarter of the requests.
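You can inspect the rules and their match counters with the standard listing command, just as a sanity check:

root@73c64ab768c2:/agent# iptables -t nat -L PREROUTING -n -v --line-numbers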

To test the ClusterIP service, we create a new pod in one of the nodes running a simple Linux image like Alpine, connect it to the network and bridge, and make requests. For each request you can see a different IP in the response!
In Alpine, curl is not installed by default, and since we don't have communication with the outside world to install it, we need to use wget:

wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html           100% |********************************|    34  0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.1.2
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html           100% |********************************|    34  0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.1.3
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html           100% |********************************|    34  0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.2.2
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html           100% |********************************|    34  0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.2.3

We covered a lot of Linux networking here. If you wonder how I arrived at these commands and this structure, I recommend reading the article I mentioned above.

All the networking we did above has to be done on every pod creation and deletion. For this, Kubernetes uses plugins that implement the CNI (Container Network Interface) standard, which containerd supports as well; I decided to do it on my own to better learn how things work underneath.

We now have communication between pods, but not with the outside world. In the next article, we will make a NodePort service!
