Build own Kubernetes - ClusterIP Service networking
Jonatan Ezron
Posted on October 20, 2022
In the previous articles, we automated node creation with Docker. Everything looked good: we had a running node with a REST API agent and a running pod inside it. But what about communication with the pod? If you try to send a request to the running pod from your environment, it will not work, and it won't work from inside the node either; it only works from inside the running container itself, against localhost.
So we need to solve this problem.
This part was very difficult for me; I needed to get deep into Linux namespaces and networking to solve it. The amazing blog post Container Networking Is Simple helped me a lot, and some of the commands and drawings were even taken from there. I will explain the basics as we go along, but I recommend reading that article before this one; the flow of this article will be roughly the same as the iximiuz and msazure blogs.
All of the commands we will execute will be inside the Node container, so start by creating a node and entering its shell:
❯ sudo ./bin/main node create
2022/10/14 16:00:16 node created: node-f8c7d6b8-ef24-4406-84e2-284e1f520bdf
2022/10/14 16:00:16 starting node
2022/10/14 16:00:16 node assign port: 49171
❯ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
73c64ab768c2 own-kube-node "./agent" 28 seconds ago Up 27 seconds 0.0.0.0:49171->10250/tcp node-f8c7d6b8-ef24-4406-84e2-284e1f520bdf
❯ sudo docker exec -it 73c /bin/bash
root@73c64ab768c2:/agent#
Now we are going to use the ip, iptables, and ping commands, which are not installed in the container (later we will add them to the Dockerfile):
root@73c64ab768c2:/agent# apt-get install -y iproute2 iptables iputils-ping
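If apt-get cannot find these packages, the image's package lists probably need to be refreshed first (an extra step I am assuming may be needed for a slim base image):
root@73c64ab768c2:/agent# apt-get update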
Let's start by creating 2 containers with ctr; we will use the whats-my-ip image so it will be easy to test the connection later.
root@73c64ab768c2:/agent# ctr image pull docker.io/cloudnativelabs/whats-my-ip:latest
root@73c64ab768c2:/agent# ctr run -d docker.io/cloudnativelabs/whats-my-ip:latest pod1
root@73c64ab768c2:/agent# ctr run -d docker.io/cloudnativelabs/whats-my-ip:latest pod2
root@73c64ab768c2:/agent# ctr c ls
CONTAINER IMAGE RUNTIME
pod1 docker.io/cloudnativelabs/whats-my-ip:latest io.containerd.runc.v2
pod2 docker.io/cloudnativelabs/whats-my-ip:latest io.containerd.runc.v2
root@73c64ab768c2:/agent# ctr t ls
TASK PID STATUS
pod2 974 RUNNING
pod1 920 RUNNING
We list the containers and tasks. As you can see, no IP address is assigned to the containers, so we can't address them.
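A quick way to see this for yourself is to list the interfaces inside a pod's network namespace; this is just a sanity check I am adding here, using the pod1 PID (920) from the ctr t ls output above:
# only the loopback device should show up at this point
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip addr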
When a container is created, a Linux network namespace is created for each instance, isolating it from the host environment. Our current state looks like this:
The node has the eth0 interface for communication with the outside world, and the 2 containers have nothing.
So we need to create a virtual Ethernet device (veth). This is a virtual device that Linux provides so we are able to communicate between namespaces; these veth devices come in pairs:
root@73c64ab768c2:/agent# ip link add veth0 type veth peer name ceth0
Now we need one end of the veth pair in our namespace and the other end in the container's namespace. The container network namespaces are located at /proc/PID/ns/net, so we move ceth0 to the pod's network namespace, activate the devices, and assign an IP address to ceth0:
# move ceth0 to pod1 network namespace
root@73c64ab768c2:/agent# ip link set ceth0 netns /proc/920/ns/net
# activate ceth0 in the pod1 network namespace
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip link set ceth0 up
# assign IP address to ceth0
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip addr add 10.0.1.2/16 dev ceth0
# activate veth0
root@73c64ab768c2:/agent# ip link set veth0 up
Now we will do the same for the pod2 network and assign a different IP address:
root@73c64ab768c2:/agent# ip link add veth1 type veth peer name ceth1
root@73c64ab768c2:/agent# ip link set veth1 up
root@73c64ab768c2:/agent# ip link set ceth1 netns /proc/974/ns/net
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip link set ceth1 up
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip addr add 10.0.1.3/16 dev ceth1
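At this point each pod namespace should show its ceth device with the address we assigned; here is a quick check (again just a sanity check I am adding, using the PIDs from above):
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip addr show ceth0
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip addr show ceth1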
Now our system looks like this:
Now we need to add a virtual bridge to connect the two networks we have created:
# creating a new bridge
root@73c64ab768c2:/agent# ip link add br0 type bridge
# activating the device
root@73c64ab768c2:/agent# ip link set br0 up
# connecting the two veth to the bridge
root@73c64ab768c2:/agent# ip link set veth0 master br0
root@73c64ab768c2:/agent# ip link set veth1 master br0
# assign an address to the bridge
root@73c64ab768c2:/agent# ip addr add 10.0.1.1/16 dev br0
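To double-check that both veth ends are attached to the bridge, ip link can filter by master device (an extra verification step I am adding, not part of the original flow):
root@73c64ab768c2:/agent# ip link show master br0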
Our system will look like this:
We also want to make the bridge the default gateway for each of the containers' networks; we do this using the following commands:
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ip route add default via 10.0.1.1
root@73c64ab768c2:/agent# nsenter --net=/proc/974/ns/net ip route add default via 10.0.1.1
Great, we have two pods connected through a bridge, and each of them has its own IP address.
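To verify the pods can actually reach each other and the bridge, a quick ping from pod1's namespace should now work (a sanity check I am adding, assuming the PIDs from above):
# ping the bridge from pod1
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ping -c 2 10.0.1.1
# ping pod2 (10.0.1.3) from pod1
root@73c64ab768c2:/agent# nsenter --net=/proc/920/ns/net ping -c 2 10.0.1.3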
Now we will focus on the ClusterIP service. This service provides communication and load balancing inside the cluster, between pods across nodes.
First, we need to create a new node with two pods inside it, and set up the veth pairs and bridge just as we did here. The IP addresses are as follows:
pod3 10.0.2.2/16
pod4 10.0.2.3/16
bridge 10.0.2.1/16
# installing packages
apt-get install iproute2 iptables -y
# setup pod3 - 537 pid
ip link add veth0 type veth peer name ceth0
ip link set veth0 up
ip link set ceth0 netns /proc/537/ns/net
nsenter --net=/proc/537/ns/net ip link set ceth0 up
nsenter --net=/proc/537/ns/net ip addr add 10.0.2.2/16 dev ceth0
# setup pod4 - 595 pid
ip link add veth1 type veth peer name ceth1
ip link set veth1 up
ip link set ceth1 netns /proc/595/ns/net
nsenter --net=/proc/595/ns/net ip link set ceth1 up
nsenter --net=/proc/595/ns/net ip addr add 10.0.2.3/16 dev ceth1
# setup bridge
ip link add br0 type bridge
ip link set br0 up
ip addr add 10.0.2.1/16 dev br0
ip link set veth0 master br0
ip link set veth1 master br0
# setup default route for both pods
nsenter --net=/proc/537/ns/net ip route add default via 10.0.2.1
nsenter --net=/proc/595/ns/net ip route add default via 10.0.2.1
Our system now looks like this:
Now we want to allow communication between the pods across nodes. We do this using vxlan (Virtual Extensible LAN), a method for virtualizing a LAN connection between different hosts. We create it with the following commands on both of the nodes:
# create vxlan
ip link add vxlan10 type vxlan id 10 group 239.1.1.1 dstport 0 dev eth0
# connect vxlan to the bridge
ip link set vxlan10 master br0
# activate the vxlan
ip link set vxlan10 up
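With the vxlan devices attached to both bridges, a pod on the first node should be able to reach a pod on the second node. Here is a quick cross-node check I am adding (run on the first node, assuming pod1's PID of 920 and that multicast traffic can pass between the two node containers):
# ping pod3 (10.0.2.2 on the second node) from pod1 on the first node
nsenter --net=/proc/920/ns/net ping -c 2 10.0.2.2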
Our system now looks like this:
Now, to create a ClusterIP service, we need to add iptables rules.
We will use the nat table and the DNAT target in the PREROUTING chain to redirect every request destined for the cluster IP (a virtual IP we select) to a pod. The command will look like this:
iptables -t nat -A PREROUTING -d $CLUSTER_IP -p tcp -m tcp --dport $CLUSTER_IP_PORT -j DNAT --to-destination $POD_IP4:$POD_PORT4
We also want to include load balancing using the round-robin method. iptables provides a solution for this too, using the statistic module, so requests are spread across every option. The full commands look like this:
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 4 --packet 0 -j DNAT --to-destination 10.0.1.2:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 3 --packet 0 -j DNAT --to-destination 10.0.1.3:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -m statistic --mode nth --every 2 --packet 0 -j DNAT --to-destination 10.0.2.2:8080
iptables -t nat -A PREROUTING -d 172.17.10.10 -p tcp -m tcp --dport 3001 -j DNAT --to-destination 10.0.2.3:8080
In the above commands, we establish four rules that load balance every request to the virtual cluster IP 172.17.10.10:3001 across our four pods using the round-robin method. The statistic match in nth mode operates on the packets that reach each rule: the first rule matches every 4th packet, the second matches every 3rd of the remainder, the third every 2nd of what is left, and the last rule catches the rest, so each pod receives roughly a quarter of the requests.
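To watch the distribution in action, the per-rule packet counters can be inspected (a verification step I am adding; it is not required for the service to work):
# list the PREROUTING rules with packet/byte counters
iptables -t nat -L PREROUTING -n -v --line-numbers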
To test the ClusterIP service, we create a new pod in one of the nodes running a simple Linux image like alpine, connect it to the network and the bridge, and make requests; for each request you can see a different IP in the response!
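Here is a minimal sketch of how that test pod could be wired up, following the exact same steps as before; the alpine image tag, the veth2/ceth2 names, the 10.0.1.4 address, and the PID 1050 are assumptions for illustration (take the real PID from ctr t ls):
# create the test pod
ctr image pull docker.io/library/alpine:latest
ctr run -d docker.io/library/alpine:latest testpod sleep 86400
# wire it to the bridge, assuming its task PID is 1050
ip link add veth2 type veth peer name ceth2
ip link set veth2 up
ip link set veth2 master br0
ip link set ceth2 netns /proc/1050/ns/net
nsenter --net=/proc/1050/ns/net ip link set ceth2 up
nsenter --net=/proc/1050/ns/net ip addr add 10.0.1.4/16 dev ceth2
nsenter --net=/proc/1050/ns/net ip route add default via 10.0.1.1
# open a shell inside the test pod to run the requests below
ctr t exec --exec-id test-shell -t testpod sh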
In alpine, curl is not installed by default, and we don't have communication with the outside world to install it, so we need to use wget:
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html 100% |********************************| 34 0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.1.2
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html 100% |********************************| 34 0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.1.3
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html 100% |********************************| 34 0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.2.2
rm index.html
wget http://172.17.10.10:3001
Connecting to 172.17.10.10:3001 (172.17.10.10:3001)
saving to 'index.html'
index.html 100% |********************************| 34 0:00:00 ETA
'index.html' saved
cat index.html
HOSTNAME:91cda5c7dade IP:10.0.2.3
We covered a lot of Linux networking here; if you wonder how I arrived at these commands and this structure, I recommend reading the article I mentioned above.
All the networking we did above has to be done on every pod creation and deletion. For this, Kubernetes relies on a plugin mechanism for the container runtime called CNI; I have decided to do it on my own to better learn how things work underneath.
We now have communication between pods, but not with the outside world; in the next article, we will build a NodePort service!