Introduction to Netlink with Go

sn3d

Zdenko Vrabel

Posted on April 28, 2024


A few years back, I got really interested in understanding how Docker and containers work. One area that grabbed my attention was containers and networking. Docker handles networking pretty simply. It mainly relies on virtual interfaces, bridges, NAT routing, and other features provided by the Linux operating system. But I won't be focusing on Docker and networking in this article.

As I delved deeper, I became curious about how Docker does all this networking. If you're like me, a developer who spends almost all their time in user space, you're probably familiar with interacting with the kernel through syscalls or the filesystem (like /proc). At first, I had the naive idea that the OS had some kind of magic syscall, or something in /proc, that Docker was tapping into. But nope, I was way off. In Docker's code, I discovered another method: Netlink.

So, what is Netlink? It's another way of exchanging information between user space and the kernel, but this time through sockets. Think of Netlink as a direct socket connection into the kernel that lets you send and receive messages. This approach is quite interesting, because I can communicate with the kernel asynchronously, or simply listen for messages from the kernel in user space.


With Netlink, I can communicate with various kernel subsystems. For example, I can receive events from SELinux, updates about routing or network links, and even modify routing tables and IP addresses.

If you're comfortable with basic socket programming like me, Netlink should be easy to understand. All you need to do is open a socket to the kernel, address the subsystem, and send or receive binary messages.

Bring interface UP

Let's get practical. Let's start with something super basic, a Netlink hello world. One of the simplest examples I could think of is bringing a network interface up. On my system, I've got this veth0 interface sitting there in the DOWN state:

$ ip link
...
78: veth0@veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether de:8f:4e:7e:9c:cd brd ff:ff:ff:ff:ff:ff

I'd like to write my own simple Go program that does the same as ip link set veth0 up.

One note before I start: since I want to use syscalls and related data structures, I'll need the golang.org/x/sys package (go get golang.org/x/sys/unix).

Let's begin by establishing the socket. The Netlink socket is created using unix.Socket():

// open the Netlink socket 
sock, err := unix.Socket(
    unix.AF_NETLINK,
    unix.SOCK_RAW,
    unix.NETLINK_ROUTE,
)

if err != nil {
    fmt.Printf("Error creating socket: %s\n", err)
    return
}

defer unix.Close(sock)

Instead of the AF_INET domain, which we typically use for TCP/IP communication, I'm using AF_NETLINK. What's also interesting here is the last parameter. This value determines which subsystem I want to communicate with. It could be NETLINK_NETFILTER, NETLINK_SELINUX, and so on. I'm opting for NETLINK_ROUTE, which is dedicated to interfaces, links, IP addresses, and such.

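However, just having the socket isn't enough. In TCP/IP communication, we associate a socket with a specific network address using bind(). Similarly, in Netlink, I set which multicast groups and which port ID (PID) I want to use. For simplicity, I'll use 0 for both: a port ID of 0 lets the kernel assign a unique one, and 0 groups means the socket doesn't subscribe to any notifications.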

// bind the socket to group and PID
err = unix.Bind(sock, &unix.SockaddrNetlink{
    Family: unix.AF_NETLINK,
    Groups: 0,
    Pid: 0,
})

if err != nil {
    fmt.Printf("Error in binding socket: %s\n", err)
    return
}
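To see what the zero port ID actually does, here's a minimal sketch that binds exactly as above and then asks the kernel which port ID it assigned. It is a variation of mine, not the original program. It uses the standard library's Linux-only syscall package instead of golang.org/x/sys/unix (so it has no dependencies) and adds SOCK_CLOEXEC. The helper name boundPortID is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// boundPortID opens a NETLINK_ROUTE socket, binds it with Pid 0 and
// Groups 0 (as in the article), and returns the port ID the kernel assigned.
func boundPortID() (uint32, error) {
	fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW|syscall.SOCK_CLOEXEC, syscall.NETLINK_ROUTE)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(fd)

	// Pid 0: let the kernel pick a port ID; Groups 0: no multicast groups.
	if err := syscall.Bind(fd, &syscall.SockaddrNetlink{Family: syscall.AF_NETLINK}); err != nil {
		return 0, err
	}

	// Getsockname reports the address the kernel actually bound.
	sa, err := syscall.Getsockname(fd)
	if err != nil {
		return 0, err
	}
	nl, ok := sa.(*syscall.SockaddrNetlink)
	if !ok {
		return 0, fmt.Errorf("unexpected address type %T", sa)
	}
	return nl.Pid, nil
}

func main() {
	pid, err := boundPortID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("kernel-assigned port ID:", pid)
}
```

Running it does not require root, and it should print a non-zero number. For the first Netlink socket in a process, Linux typically picks the process ID.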

Create and send message

The easy part is done. At this point, I've got everything ready for sending and receiving messages. Now comes the tricky part: building and parsing Netlink messages. Like many binary protocols, a Netlink message consists of a header and a payload.


Let's start from the end, with the payload. The payload describes what I want Netlink to do or return. In my case, that's enabling a network interface, so I'll use IfInfomsg. Its Change field is a mask of the flags I want to modify, and Flags holds their new values, so I set both to IFF_UP. The good news is that the structure is available in the golang.org/x/sys package, so I don't need to write it from scratch.

payload := unix.IfInfomsg{
    Family: unix.AF_UNSPEC,
    Change: unix.IFF_UP,
    Flags:  unix.IFF_UP,
    Index:  int32(ethIndex), // index of the interface to enable (in my case 78, veth0), passed as a command-line argument
}
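Why set both Change and Flags? As far as I can tell from the kernel source (rtnl_dev_combine_flags in net/core/rtnetlink.c), Change acts as a mask. Only the bits set in it are taken from Flags, and the rest keep their current values. For backward compatibility, a Change of 0 is treated as "every bit changes". Here is a small sketch of that merge in plain Go, with flag values from <linux/if.h>. combineFlags is a hypothetical name, not a kernel or library API.

```go
package main

import "fmt"

// Flag values from <linux/if.h>.
const (
	iffUp        = 0x1
	iffBroadcast = 0x2
	iffMulticast = 0x1000
)

// combineFlags mirrors, as a sketch, how the kernel merges an IfInfomsg
// into a link's current flags: bits selected by change come from flags,
// all other bits keep their current value. change == 0 means "all bits".
func combineFlags(current, flags, change uint32) uint32 {
	if change == 0 {
		return flags
	}
	return (flags & change) | (current &^ change)
}

func main() {
	// veth0 before: <BROADCAST,MULTICAST>; request: Flags=IFF_UP, Change=IFF_UP
	before := uint32(iffBroadcast | iffMulticast)
	after := combineFlags(before, iffUp, iffUp)
	fmt.Printf("%#x -> %#x\n", before, after) // 0x1002 -> 0x1003
}
```

By the same logic, Flags: 0 with Change: IFF_UP would clear only the UP bit, which is the equivalent of ip link set veth0 down.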

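Then, I need to build a header. The header carries information such as the type of the payload and the total length of the whole message. The structure for the header is NlMsghdr. The type I need to set is RTM_NEWLINK, which creates or modifies a link and carries an IfInfomsg payload. The NLM_F_REQUEST flag marks the message as a request, and NLM_F_ACK asks the kernel to reply with an acknowledgment, which I'll use later to check the result.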

// total length of message is size of header + size of payload
length := unix.SizeofNlMsghdr + unix.SizeofIfInfomsg

header := unix.NlMsghdr{
    Len:   uint32(length),
    Type:  uint16(unix.RTM_NEWLINK),
    Flags: uint16(unix.NLM_F_REQUEST) | uint16(unix.NLM_F_ACK),
    Seq:   1,
}

Alright, the message should be almost ready. I just need to put the header and payload into one message structure. I'll create an anonymous structure and fill it with the payload and header:

msg := struct {
    header  unix.NlMsghdr
    payload unix.IfInfomsg
}{
    header:  header,
    payload: payload,
}

The message is ready, and I could write the message data into the socket. But before I call Sendto(), I need to convert the message structure to an array (or slice) of bytes:

// first, I need to convert `msg` to a slice of bytes (requires importing "unsafe")
var asByteSlice []byte = (*(*[unix.SizeofNlMsghdr + unix.SizeofIfInfomsg]byte)(unsafe.Pointer(&msg)))[:]

// write the data to the socket; a destination with Pid 0 means "the kernel"
err = unix.Sendto(sock, asByteSlice, 0, &unix.SockaddrNetlink{Family: unix.AF_NETLINK})
if err != nil {
    fmt.Printf("Could not write message to socket: %s\n", err)
    return
}

Receiving message

At this point, if I compile and run my simple program with root privileges (more precisely, the CAP_NET_ADMIN capability), the code will bring the veth0 interface up. Mission accomplished, right? But what about receiving messages?

Receiving messages can be more complicated. Responses might be large, or the information might be split across multiple messages, and there are various factors to consider. But I'll stick with my simple scenario: I just want to know whether my up operation succeeded or failed.

To receive the response, I'll use unix.Recvfrom(), which reads one datagram from the socket into buf (up to the buffer's size):

var buf [1024]byte
n, _, err := unix.Recvfrom(sock, buf[:], 0)

if err != nil {
    fmt.Printf("Could not read data from socket: %s\n", err)
    return
}

The next step is parsing the received raw data. For that, I'll use ParseNetlinkMessage() from the standard library's syscall package.

// parse data to messages
msgs, err := syscall.ParseNetlinkMessage(buf[:n])

if err != nil {
    fmt.Printf("Could not parse the response: %s\n", err)
    return
}

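ParseNetlinkMessage() returns the parsed data as a slice of NetlinkMessage values ([]syscall.NetlinkMessage). NetlinkMessage is a simple structure with Header and Data fields: Header is an NlMsghdr, and Data is a slice of bytes holding the payload. Based on the type in the Header, I can cast Data to the proper type. Because I set NLM_F_ACK, the first response message will be NLMSG_ERROR, so I'll cast Data to NlMsgerr. Its Error field is 0 when the request succeeded (a plain acknowledgment) and a negative errno value when it failed.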

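// the first received message must be `NLMSG_ERROR` (the ACK requested with NLM_F_ACK)
if len(msgs) == 0 || msgs[0].Header.Type != unix.NLMSG_ERROR {
    fmt.Printf("The first received message is not NLMSG_ERROR\n")
    return
}

// make sure the payload is large enough to hold NlMsgerr
if len(msgs[0].Data) < unix.SizeofNlMsgerr {
    fmt.Printf("NLMSG_ERROR payload is too short\n")
    return
}

// cast the data to NlMsgerr; Error is 0 on success, otherwise a negative errno
errPayload := (*unix.NlMsgerr)(unsafe.Pointer(&msgs[0].Data[0]))
if errPayload.Error != 0 {
    fmt.Printf("Error returned by Netlink: %s\n", unix.Errno(-errPayload.Error))
    return
}

fmt.Printf("Interface is UP\n")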

The full code is available at github.com/sn3d/netlink-example.
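Multi-part responses, which I skipped above, show up with dump requests. The kernel answers with a series of messages flagged NLM_F_MULTI, possibly spread over several reads, and ends the series with NLMSG_DONE. The standard library's syscall package (Linux only) has a helper for this, NetlinkRIB, which sends a dump request and keeps reading until NLMSG_DONE. Here is a sketch that uses it to list interface indexes and names, the same information ip link shows. listLinks is my own name, and reading links doesn't need root.

```go
package main

import (
	"fmt"
	"strings"
	"syscall"
	"unsafe"
)

// listLinks sends an RTM_GETLINK dump request and returns index -> name.
func listLinks() (map[int32]string, error) {
	// NetlinkRIB sends the request with NLM_F_DUMP and keeps reading until
	// it sees NLMSG_DONE, concatenating every datagram it receives.
	raw, err := syscall.NetlinkRIB(syscall.RTM_GETLINK, syscall.AF_UNSPEC)
	if err != nil {
		return nil, err
	}
	msgs, err := syscall.ParseNetlinkMessage(raw)
	if err != nil {
		return nil, err
	}

	links := make(map[int32]string)
	for i := range msgs {
		m := &msgs[i]
		// skip NLMSG_DONE and anything that is not a link description
		if m.Header.Type != syscall.RTM_NEWLINK || len(m.Data) < syscall.SizeofIfInfomsg {
			continue
		}
		info := (*syscall.IfInfomsg)(unsafe.Pointer(&m.Data[0]))
		attrs, err := syscall.ParseNetlinkRouteAttr(m)
		if err != nil {
			return nil, err
		}
		for _, a := range attrs {
			if a.Attr.Type == syscall.IFLA_IFNAME {
				// IFLA_IFNAME is a NUL-terminated string
				links[info.Index] = strings.TrimRight(string(a.Value), "\x00")
			}
		}
	}
	return links, nil
}

func main() {
	links, err := listLinks()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for index, name := range links {
		fmt.Println(index, name)
	}
}
```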

Let's try...

It's time to play with my program. As I mentioned above, I have a veth0 interface in my system, and it's DOWN.

$ ip link
...
78: veth0@veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether de:8f:4e:7e:9c:cd brd ff:ff:ff:ff:ff:ff


If you look closely, you can see that veth0 has index 78. I need to pass this index to my program. When I run the program with this index, it should print Interface is UP:

$ sudo go run main.go 78
Interface is UP
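Looking up the index by hand works, but Go's standard library can also resolve it from the interface name. Here is a sketch of a hypothetical resolveIndex helper. It accepts either a numeric index, as my program does, or a name such as veth0, which it looks up with net.InterfaceByName. Wiring it into the program is an assumption, not part of the original code.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// resolveIndex accepts either a numeric interface index ("78")
// or an interface name ("veth0") and returns the index.
func resolveIndex(arg string) (int, error) {
	if idx, err := strconv.Atoi(arg); err == nil {
		return idx, nil
	}
	ifc, err := net.InterfaceByName(arg)
	if err != nil {
		return 0, err
	}
	return ifc.Index, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: main <index|name>")
		return
	}
	idx, err := resolveIndex(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("interface index:", idx)
}
```

With a helper like this, sudo go run main.go veth0 would work just as well as passing 78.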

This isn't just a message from my program. If I check veth0 again, the interface really is UP:

$ ip link
...
78: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether de:8f:4e:7e:9c:cd brd ff:ff:ff:ff:ff:ff

How to debug messages

As you may have noticed, creating a Netlink socket and reading from and writing to it is the easy part. Reading larger responses, or data split across several messages, is more involved, but it's something we're familiar with from socket programming.

The tricky part for me was building a correct Netlink message. Fortunately, there's a pretty useful way to debug and observe Netlink messages: strace. Modern versions of strace can parse and decode Netlink messages.

If you run ip link set veth0 up under strace, you can see the sendmsg calls with the parsed Netlink messages:

$ sudo strace -Tfe trace=sendmsg ip link set veth0 up
sendmsg(4, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=52, nlmsg_type=RTM_GETLINK, nlmsg_flags=NLM_F_REQUEST, nlmsg_seq=1714051690, nlmsg_pid=0}, {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0}, [[{nla_len=8, nla_type=IFLA_EXT_MASK}, RTEXT_FILTER_VF|RTEXT_FILTER_SKIP_STATS], [{nla_len=10, nla_type=IFLA_IFNAME}, "veth0"]]], iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52 <0.000094>
sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=32, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=1714051690, nlmsg_pid=0}, {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=if_nametoindex("veth0"), ifi_flags=IFF_UP, ifi_change=0x1}], iov_len=32}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32 <0.000064>

The output is a bit messy, but you can see two sendmsg calls: the first sends RTM_GETLINK, and the second sends RTM_NEWLINK. Each one shows the message header followed by the payload. For instance, the header of the second message (RTM_NEWLINK) is:

{nlmsg_len=32, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=1714051690, nlmsg_pid=0}

And the payload is:

{ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=if_nametoindex("veth0"), ifi_flags=IFF_UP, ifi_change=0x1}

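The payload matches what my program sends: ifi_flags is IFF_UP and ifi_change is 0x1 (IFF_UP). With strace, we can study other requests, such as creating a bridge, and reproduce the same messages in our own code.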

Netlink library

Working with sockets and building and parsing our own messages takes quite a lot of work. Thanks to Vish Abrams, we can use the github.com/vishvananda/netlink package in our projects. It provides a lot of Netlink functionality without requiring us to build message structures from scratch. It's well maintained and used by many projects, such as Docker, Cilium, Flannel, and Istio.

Thanks to this library, adding a new bridge to the system is a matter of a few lines:

la := netlink.NewLinkAttrs()
la.Name = "docker0"
dockerBridge := &netlink.Bridge{LinkAttrs: la}
err := netlink.LinkAdd(dockerBridge)

A few words in conclusion...

One important aspect I didn't mention earlier is that Netlink messages use the host's native byte order, which depends on the CPU architecture. This means we don't need to convert integers between little-endian and big-endian in the header or in structures like IfInfomsg. Additionally, it's crucial that messages follow the four-byte alignment rule: each message is padded to a multiple of four bytes. For example, if a message is 33 bytes long, we need to send 36 bytes.
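The padding rule corresponds to the NLMSG_ALIGN macro in <linux/netlink.h>, which rounds up to a multiple of NLMSG_ALIGNTO (4). Here is a small Go sketch of it, together with NLMSG_LENGTH and NLMSG_SPACE. The function names are mine.

```go
package main

import "fmt"

const (
	nlmsgAlignTo = 4  // NLMSG_ALIGNTO
	nlmsgHdrLen  = 16 // NLMSG_HDRLEN: aligned sizeof(struct nlmsghdr)
)

// nlmsgAlign rounds n up to the next multiple of 4 (NLMSG_ALIGN).
func nlmsgAlign(n int) int { return (n + nlmsgAlignTo - 1) &^ (nlmsgAlignTo - 1) }

// nlmsgLength is the nlmsg_len for a payload of n bytes (NLMSG_LENGTH).
func nlmsgLength(n int) int { return n + nlmsgHdrLen }

// nlmsgSpace is the room a message with an n-byte payload occupies,
// including trailing padding (NLMSG_SPACE).
func nlmsgSpace(n int) int { return nlmsgAlign(nlmsgLength(n)) }

func main() {
	fmt.Println(nlmsgAlign(33))  // 36
	fmt.Println(nlmsgLength(16)) // 32: header + IfInfomsg, as in the RTM_NEWLINK request
	fmt.Println(nlmsgSpace(17))  // 36
}
```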

I wrote about Netlink almost three years ago, but in Slovak. I decided to write this English version because, even after three years, I still find Netlink interesting and worth studying and experimenting with.
