The Home Server Journey - 4: Enter The Matrix

Hello

Time to forget about simple demo containers and struggle [not so much] with the configuration details of fully-featured applications. Today we not only put Kubernetes to production-level usage, but find out about some strengths and weaknesses of distributed computing

Notes on storage

Code nowadays, more than ever, works consuming and producing huge amounts of data, that usually has (or "have", since "data" is plural for "datum"?) to be kept for later. It's no surprise then that the capacity to store information is one of the most valuable commodities in IT, always at risk of being depleted due to poor optimization

In the containerized world, where application instances can be created and recreated dynamically, extra steps have to be taken for reusing disk space and avoiding duplication whenever possible. That's why K8s adopts a layered approach: deployments don't create Persistent Volumes, roughly equivalent to root folders, on their own, but set their storage requirements via Persistent Volume Claims

Long story short, administrators either create the necessary volumes manually or let the claims be managed by a Storage Class, to provide volumes on demand. Like ingresses have multiple controller implementations, depending on the backend, storage classes allow for different providers, offering a variety of local and remote (including cloud) data resources:

(Screenshot from Nana's tutorial)

Although local is an easy to use option, it quickly betrays the purpose of container orchestration: tying storage to a specific machine's disk path prevents multiple pod replicas from being distributed among multiple nodes for load balancing. Besides, if we wish to avoid cloud providers (like Google or AWS) to have full control of our data, we're left with [local] network storage solutions like NFS

If you remember the first article, my setup consists of one USB hard disk (meant for persistent volumes) connected to each node board. That's not ideal for NFS, as decoupling storage from the K8s cluster, using a dedicated computer, would prevent node crashes compromising data availability to the rest. Still, I can at least make the devices share disks with each other

In order to expose external drive's partition to the network, begin by ensuring that they're mounted during initialization by editing /etc/fstab:

# Finding the partition's UUID (assuming a EXT4-formatted disk)
[ancapepe@ChoppaServer-1 ~]$ lsblk -f
NAME        FSTYPE FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                     
└─sda1      ext4   1.0              69c7df98-0349-4c47-84ce-514a1699ccf1  869.2G     0% 
mmcblk0                                                                                 
├─mmcblk0p1 vfat   FAT16 BOOT_MNJRO 42D4-5413                             394.8M    14% /boot
└─mmcblk0p2 ext4   1.0   ROOT_MNJRO 83982167-1add-4b6a-ad63-3479493a4510   44.4G    18% /var/lib/kubelet/pods/ae500dcf-b765-4f37-be8c-ee20fbb3ea1b/volume-subpaths/welcome-volume/welcome/0
                                                                                        /
zram0                                                                                   [SWAP]
# Creating the mount point for the external partition
[ancapepe@ChoppaServer-1 ~]$ sudo mkdir /storage
# Adding line with mount configuration to fstab 
[ancapepe@ChoppaServer-1 ~]$ echo 'UUID=69c7df98-0349-4c47-84ce-514a1699ccf1 /storage ext4 defaults,noatime        0       2' | sudo tee -a /etc/fstab
# Check the contents
[ancapepe@ChoppaServer-1 ~]$ cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
PARTUUID=42045b6f-01  /boot   vfat    defaults,noexec,nodev,showexec     0   0
PARTUUID=42045b6f-02   /   ext4     defaults    0   1
UUID=69c7df98-0349-4c47-84ce-514a1699ccf1 /storage ext4 defaults,noatime        0       2

Reboot and check if you can access the contents of the /storage folder (or whatever path you've chosen). Proceed by setting up the NFS server:

# Install required packages (showing the ones for Manjaro Linux)
[ancapepe@ChoppaServer-1 ~]$ sudo pacman -S nfs-utils rpcbind
# ...
# Installation process ...
# ...
# Add disk sharing configurations (Adapt to you local network address range)
[ancapepe@ChoppaServer-1 ~]$ echo '/storage    192.168.3.0/24(rw,sync,no_wdelay,no_root_squash,insecure)' | sudo tee -a /etc/exports
# Check the contents
[ancapepe@ChoppaServer-1 ~]$ cat /etc/exports
# /etc/exports - exports(5) - directories exported to NFS clients
#
# Example for NFSv3:
#  /srv/home        hostname1(rw,sync) hostname2(ro,sync)
# Example for NFSv4:
#  /srv/nfs4        hostname1(rw,sync,fsid=0)
#  /srv/nfs4/home   hostname1(rw,sync,nohide)
# Using Kerberos and integrity checking:
#  /srv/nfs4        *(rw,sync,sec=krb5i,fsid=0)
#  /srv/nfs4/home   *(rw,sync,sec=krb5i,nohide)
#
# Use `exportfs -arv` to reload.
/storage    192.168.3.0/24(rw,sync,no_wdelay,no_root_squash,insecure)
# Update exports based on configuration
[ancapepe@ChoppaServer-1 ~]$ sudo exportfs -arv
# Enable and start related initialization services
[ancapepe@ChoppaServer-1 ~]$ systemctl enable --now rpcbind nfs-server 
# Check if shared folders are visible
[ancapepe@ChoppaServer-1 ~]$ showmount localhost -e
Export list for localhost:
/storage 192.168.3.0/24

(Remember that your mileage may vary)

Phew, and that's just the first part of it! Not only you have to do the same process for every machine whose storage you wish to share, but afterwards it's necessary to install a NFS provisioner to the K8s cluster. Here I have used nfs-subdir-external-provisioner, not as-it-is, but modifying 2 manifest files to my liking:

Modified deploy/rbac.yaml (Access permissions):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs-storage
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: nfs-storage
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs-storage
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs-storage
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: nfs-storage
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

(Here just set the namespace to your liking)

Modified deploy/deployment.yaml (Storage classes):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs-storage
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        # 1TB disk on Odroid N2+
        - name: nfs-client-provisioner-big
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-root-big
              mountPath: /persistentvolumes   # CAN'T CHANGE THIS PATH
          env:     # Environment variables
            - name: PROVISIONER_NAME
              value: nfs-provisioner-big
            - name: NFS_SERVER                # Hostnames don't resolve in setting env vars
              value: 192.168.3.10             # Looking for a way to make it more flexible
            - name: NFS_PATH
              value: /storage
        # 500GB disk on Raspberry Pi 4B
        - name: nfs-client-provisioner-small
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-root-small
              mountPath: /persistentvolumes   # CAN'T CHANGE THIS PATH
          env:     # Environment variables
            - name: PROVISIONER_NAME
              value: nfs-provisioner-small
            - name: NFS_SERVER
              value: 192.168.3.11
            - name: NFS_PATH
              value: /storage
      volumes:
        - name: nfs-root-big
          nfs:
            server: 192.168.3.10
            path: /storage
        - name: nfs-root-small
          nfs:
            server: 192.168.3.11
            path: /storage
---
# 1TB disk on Odroid N2+
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-big
  namespace: choppa
provisioner: nfs-provisioner-big
parameters:
  archiveOnDelete: "false"
reclaimPolicy: Retain
---
# 500GB disk on Raspberry Pi 4B
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-small
  namespace: choppa
provisioner: nfs-provisioner-small
parameters:
  archiveOnDelete: "false"
reclaimPolicy: Retain

(Make sure that all namespace settings match the RBAC rules. Also notice that, the way the provisioner works, I had to create on storage class for each shared disk)

If everything goes according to plan, applying a test like the one below will make volumes appear to meet the claims and the SUCCESS file to be written on each one:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim-big
spec:
  storageClassName: nfs-big
  accessModes:
    - ReadWriteMany          # Concurrent access
  resources:
    requests:
      storage: 1Mi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim-small
spec:
  storageClassName: nfs-small
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod-big
spec:
  containers:
  - name: test-pod-big
    image: busybox:stable
    command:               # Replace the default command for the image by this
      - "/bin/sh"
    args:                  # Custom command arguments
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: test-volume
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: test-claim-big
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod-small
spec:
  containers:
  - name: test-pod-small
    image: busybox:stable
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: test-volume
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: test-claim-small

(Thanks to Albert Weng for the very useful guide)

Your Very Own Messaging Service

Now that we have real storage at our disposal, some real apps would be nice. One of the first things that come to mind when I think about data I'd like to protect is the history of my private conversations, where most spontaneous interactions happen and most ideas get forgotten

If you're a old school person wishing to host your own messaging, I bet you'd go for IRC or XMPP. But I'm more of a late millennial, so it's easier to get drawn to fancy stuff like Matrix, a protocol for federated communication

"And what's federation?", you may ask. Well, to this day, the best video about it I've seen is the one from Simply Explained. Even though it's focused on Mastodon and the Fediverse (We'll get there eventually), the same concepts apply:

As any open protocol, the Matrix specification has a number of different implementations, most famously Synapse (Python) and Dendrite (Go). I've been, however, particularly endeared to Conduit, a lightweight Matrix server written in Rust. Here I'll show the deployment configuration as recommended by its maintainer, Timo Kösters:

kind: PersistentVolumeClaim       # Storage requirements component
apiVersion: v1
metadata:
  name: conduit-pv-claim          
  namespace: choppa
  labels:
    app: conduit
spec:
  storageClassName: nfs-big       # The used storage class (1TB drive)
  accessModes:
    - ReadWriteOnce               # For synchronous access, as we don't want data corruption
  resources:
    requests:
      storage: 50Gi               # Asking for a ~50 Gigabytes volume
---
apiVersion: v1
kind: ConfigMap                   # Read-only data component
metadata:
  name: conduit-config
  namespace: choppa
  labels:
    app: conduit
data:
  CONDUIT_CONFIG: ''              # Leave empty to inform that we're not using a configuration file
  CONDUIT_SERVER_NAME: 'choppa.xyz'                   # The server name that will appear after registered usernames 
  CONDUIT_DATABASE_PATH: '/var/lib/matrix-conduit/'   # Database directory path. Uses the permanent volume
  CONDUIT_DATABASE_BACKEND: 'rocksdb'                 # Recommended for performance, sqlite alternative
  CONDUIT_ADDRESS: '0.0.0.0'                          # Listen on all IP addresses/Network interfaces
  CONDUIT_PORT: '6167'                                # Listening port for the process inside the container
  CONDUIT_MAX_CONCURRENT_REQUESTS: '200'              # Network usage limiting options
  CONDUIT_MAX_REQUEST_SIZE: '20000000'                # (in bytes, ~20 MB)
  CONDUIT_ALLOW_REGISTRATION: 'true'                  # Allow other people to register in the server (requires token)
  CONDUIT_ALLOW_FEDERATION: 'true'                    # Allow communication with other instances
  CONDUIT_ALLOW_CHECK_FOR_UPDATES: 'true'   
  CONDUIT_TRUSTED_SERVERS: '["matrix.org"]'
  CONDUIT_ALLOW_WELL_KNOWN: 'true'                      # Generate "/.well-known/matrix/{client,server}" endpoints
  CONDUIT_WELL_KNOWN_CLIENT: 'https://talk.choppa.xyz'  # return value for "/.well-known/matrix/client"
  CONDUIT_WELL_KNOWN_SERVER: 'talk.choppa.xyz:443'      # return value for "/.well-known/matrix/server"
---
apiVersion: v1
kind: Secret                      # Read-only encrypted data
metadata:
  name: conduit-secret
  namespace: choppa
data:
  CONDUIT_REGISTRATION_TOKEN: bXlwYXNzd29yZG9yYWNjZXNzdG9rZW4=    # Registration token/password in base64 format
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: conduit-deploy
  namespace: choppa
spec:
  replicas: 1                     # Keep it at 1 as the database has an access lock
  strategy:
    type: Recreate                # Wait for replica to be terminated completely befor creating a new one
  selector:
    matchLabels:
      app: conduit
  template:
    metadata:
      labels:
        app: conduit
    spec:
      containers:
        - name: conduit
          image: ancapepe/conduit:next # fork of matrixconduit/matrix-conduit:next
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 6167                 # Has to match the configuration above
              name: conduit-port
          envFrom:                                # Take all data from a config file 
            - configMapRef:                       # As environment variables
                name: conduit-config
          env:                                    # Set environment variables one by one
            - name: CONDUIT_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:                     # Take data from a secret
                  name: conduit-secret            # Secret source
                  key: CONDUIT_REGISTRATION_TOKEN # Data entry key
          volumeMounts:
            - mountPath: /var/lib/matrix-conduit/ # Mount the referenced volume into this path (database)
              name: rocksdb
      volumes:
        - name: rocksdb                           # Label the volume for this deployment
          persistentVolumeClaim:
            claimName: conduit-pv-claim           # Reference volumen create by the claim
---
apiVersion: v1
kind: Service
metadata:
  name: conduit-service
  namespace: choppa
spec:
  selector:
    app: conduit
  ports:
    - protocol: TCP
      port: 8448                                  # Using the default port for federation (not required)
      targetPort: conduit-port

One thing you should know is that the K8s architecture is optimized for stateless applications, which don't store changing information within themselves and whose output depend solely on input from the user or auxiliary processes. Conduit, on the contrary, is tightly coupled with its high-performance database, RocksDB, and has stateful behavior. That's why we need to take extra care by not replicating our process in order to prevent data races when accessing storage

There's actually a way to run multiple replicas of a stateful pod, that requires different component types to create one volume copy for each container and keep all them in sync. But let's keep it "simple" for now. More on that in a future chapter

We introduce here a new component type, Secret, intended for sensible information such as passwords and access tokens. It requires that all data is set in a Base64-encoded format for obfuscation, which could be achieved with:

$ echo -n mypasswordoraccesstoken | base64                                                                                                                                                            
bXlwYXNzd29yZG9yYWNjZXNzdG9rZW4=

You might also have noticed that I jumped the gun a bit and acquired a proper domain for myself (so that my Matrix server looks nice, of course). I've found one available at a promotional price on Dynadot, which now seems even more like a good decision, considering how easy is to bend Cloudflare. Conveniently, the registrar also provides a DNS service, so... 2 for the price of one:

The CNAME record is basically an alias: the subdomain talk.choppa.xyz is simply redirected to choppa.xyz and just works as a hint for our reverse proxy from the previous chapter:

apiVersion: networking.k8s.io/v1                    
kind: Ingress                                     # Component type
metadata:
  name: proxy                                     # Component name
  namespace: choppa                               # You may add the default namespace for components as a paramenter
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    kubernetes.io/ingress.class: traefik
status:
  loadBalancer: {}
spec:
  ingressClassName: traefik
  tls:
  - hosts:
      - choppa.xyz
      - talk.choppa.xyz
    secretName: certificate      
  rules:                                          # Routing rules
  - host: choppa.xyz                              # Expected domain name of request, including subdomain
    http:                                         # For HTTP or HTTPS requests
      paths:                                      # Behavior for different base paths
        - path: /                                 # For all request paths
          pathType: Prefix
          backend:
            service:
              name: welcome-service               # Redirect to this service
              port:
                number: 8080                      # Redirect to this internal service port
        - path: /.well-known/matrix/              # Paths for Conduit delegation
          pathType: ImplementationSpecific
          backend:
            service:
              name: conduit-service
              port:
                number: 8448
  - host: talk.choppa.xyz               # Conduit server subdomain
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: conduit-service
              port:
                number: 8448

If your CONDUIT_SERVER_NAME differs from the [sub]domain where your server is hosted, other Matrix federated instances will look for the actual address in Well-known URIs. Normally you'd have to serve them manually, but Conduit has a convenient automatic delegation feature that helps your set it up with the CONDUIT_WELL_KNOWN_* env variables

Currently, at version 0.8, automatically delegation via env vars is not working properly. Timo has helped me debug it (Thanks!) at Conduit's Matrix room and I have submitted a patch. Let's hope it gets fixed for 0.9. Meanwhile, you may use the custom image I'm referencing in the configuration example

Update: Aaaaaand... it's merged!

With ALL that said, let's apply the deployment and hope for the best:

(Success !?)

With your instance running, you can now use Matrix clients [that support registration tokens] like Element or Nheko to register your user, login and have fun!

See you soon

Blog

The Home Server Journey - 4: Enter The Matrix

Beppe

Notes on storage

Your Very Own Messaging Service

Join Our Newsletter. No Spam, Only the good stuff.

Related