Building an Exocortex: MediaWiki on a microk8s Kubernetes Cluster

duplys

Paul Duplys

Posted on July 29, 2020

Boost Your Memory

In a book I read some time ago, I stumbled upon the idea of artificially augmenting one's memory by systematically storing everything one learns in a single place. The book used the term exocortex to refer to this concept of extending the natural brain's memory capabilities by external technical means. Since then, I have used such a mechanism to keep everything I consider important (mostly technical material) in one place.

In this post, I'll show you how you can use a MediaWiki instance on your private Raspberry Pi Kubernetes cluster to implement an exocortex. MediaWiki is open-source wiki software originally authored by Magnus Manske and improved by Lee Daniel Crocker. The most prominent website running MediaWiki is Wikipedia. MediaWiki is written in the PHP programming language and stores its contents in a database.

For ease of administration, we'll use Kubernetes to deploy the MediaWiki application. Kubernetes continuously compares the running state with the desired state we declare and corrects any drift. For example, if the application's container crashes, Kubernetes restarts it, and if a Pod disappears, Kubernetes creates a replacement.

A word on spelling: for easier reading -- and in alignment with conventions used in the official Kubernetes documentation -- I will capitalize the names of the Kubernetes objects. Hence, the words "Deployment" and "Pod" refer to specific Kubernetes objects (more details on that later).

The Ingredients

Here's a list of ingredients we'll need:

Starting Point

If you look up the official MediaWiki Docker image on DockerHub, you will also find a docker-compose file recommended for deploying the dockerized MediaWiki. We will use this file as a starting point for writing our Kubernetes manifests so that we can deploy MediaWiki using microk8s.kubectl apply on the Raspberry Pi Kubernetes cluster. This is what the docker-compose file looks like:

# MediaWiki with MariaDB
#
# Access via "http://localhost:8080"
#   (or "http://$(docker-machine ip):8080" if using docker-machine)
version: '3'
services:
  mediawiki:
    image: mediawiki
    restart: always
    ports:
      - 8080:80
    links:
      - database
    volumes:
      - /var/www/html/images
      # After initial setup, download LocalSettings.php to the same directory as
      # this yaml and uncomment the following line and use compose to restart
      # the mediawiki service
      # - ./LocalSettings.php:/var/www/html/LocalSettings.php
  database:
    image: mariadb
    restart: always
    environment:
      # @see https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/DefaultSettings.php
      MYSQL_DATABASE: my_wiki
      MYSQL_USER: wikiuser
      MYSQL_PASSWORD: example
      MYSQL_RANDOM_ROOT_PASSWORD: 'yes'

Essentially, it consists of two services: the mediawiki application itself and the database service for storing the data. The mediawiki service is published on port 8080 of the host and uses a volume for /var/www/html/images; a second, commented-out entry mounts LocalSettings.php once the initial setup is done. The database service runs a MariaDB container and needs several environment variables to be set so that the mediawiki application can access the database. In the remainder of this post, we'll recreate this setup using Kubernetes mechanisms.

The MediaWiki Application

Let's start by creating a microk8s Kubernetes deployment for the MediaWiki container.

The Minimal MediaWiki Deployment

A Kubernetes Deployment is a Kubernetes object — a persistent entity in the Kubernetes system defining the desired state — that basically specifies a ReplicaSet and the replicated Pods to be run. The specification of a Kubernetes object, referred to as a manifest, must contain the following required fields:

  • apiVersion specifying which version of the Kubernetes API is used to create the object
  • kind specifying the kind of the object to be created
  • metadata for uniquely identifying the object (at least a name string and a UID)
  • spec specifying the object's desired state

Therefore, as a minimum, the MediaWiki Deployment manifest would look something like this (feel free to choose your own names and label values; the UID is generated by Kubernetes):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki-app
spec:
  selector:
    matchLabels:
      app: mediawiki-app
  replicas: 2
  template:
    metadata:
      labels:
        app: mediawiki-app
    spec:
      containers:
        - name: mediawiki-container
          image: mediawiki

Note that the label in spec.selector.matchLabels must match the label in spec.template.metadata.labels. This is because the .spec.selector field defines how the Deployment finds which Pods to manage. Thus, you need to select a label that is defined in the Pod template (app: mediawiki-app).

We can create the Deployment by issuing microk8s.kubectl apply -f mediawiki.yml on the command line of the Raspberry Pi (assuming you saved the Deployment manifest in the file mediawiki.yml). It takes some time when we run it for the first time because the images must be downloaded first. After a while — it takes approximately 5-10 minutes on my RPi — you should see something like this:

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/mediawiki-app-656f6f8d64-7f4xz   1/1     Running   0          11m
pod/mediawiki-app-656f6f8d64-9x924   1/1     Running   0          11m

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes    ClusterIP   10.152.183.1    <none>        443/TCP        75d

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mediawiki-app   2/2     2            2           11m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/mediawiki-app-656f6f8d64   2         2         2       11m
ubuntu@ubuntu:~/mediawiki$

The Deployment is running, the ReplicaSet was created and the two Pods are running just like we specified in the above manifest.

Adding Volumes

In the docker-compose file, our starting point from above, the mediawiki service uses a Docker volume for the /var/www/html/images directory within the mediawiki container. So let's add a volume to our Kubernetes Deployment. Volumes are described in detail in the official Kubernetes documentation. It turns out we need to add a volumeMounts field to the container entry in spec.containers and pass it a list (that's why the next line starts with a -) containing the mountPath and the name of the volume. We also need to add a volumes field to the Pod spec and list the volume we want to add. The Deployment manifest now looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki-app
spec:
  selector:
    matchLabels:
      app: mediawiki-app
  replicas: 2
  template:
    metadata:
      labels:
        app: mediawiki-app
    spec:
      containers:
        - name: mediawiki-container
          image: mediawiki
          volumeMounts:
            - mountPath: /var/www/html/images
              name: mediawiki-volume
      volumes:
        - name: mediawiki-volume

To update our MediaWiki Deployment, we issue the same command again: microk8s.kubectl apply -f mediawiki.yml. If we now take a closer look at one of the Pods, we should see a volume attached to it (the actual output of microk8s.kubectl describe pod is very detailed, so I have trimmed it here for ease of reading):

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl describe pod mediawiki-app-656f6f8d64-7f4xz
Name:         mediawiki-app-656f6f8d64-7f4xz

# -- snip --

Containers:
  mediawiki-container:
    Container ID:   containerd://8525869985915306078e1c2b836682ddf145510a4fe04b7155618a749ed87e2d
    Image:          mediawiki

    # -- snip --

    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-krfdn (ro)
      /var/www/html/images from mediawiki-volume (rw)

# -- snip --

Volumes:
  mediawiki-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>

# -- snip --

ubuntu@ubuntu:~/mediawiki$ 

Note that microk8s reports that this volume is just a temporary directory. Because our manifest names the volume without specifying a volume type, Kubernetes falls back to an emptyDir volume. This type of volume is not persistent: it is deleted when the Pod ceases to exist.
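
The describe output shows an EmptyDir volume even though the manifest never names a volume type. To make that implicit default visible, the volume can also be declared explicitly. The fragment below is a minimal sketch of the volumes section of the Pod template (spec.template.spec in mediawiki.yml); it should behave the same as the manifest above and only spells out emptyDir.

```yaml
# Fragment of spec.template.spec in mediawiki.yml (not a complete manifest)
volumes:
  - name: mediawiki-volume
    # Explicit form of what Kubernetes uses when no volume type is given.
    # The directory starts empty and is removed together with the Pod.
    emptyDir: {}
```

If uploaded images should survive the deletion or rescheduling of a Pod, a persistentVolumeClaim volume would be the usual replacement for emptyDir. That requires a storage provisioner in the cluster, which this post does not set up.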

Restart

Our docker-compose file defines the restart policy always for the mediawiki service. As described in the official Kubernetes documentation for the Pod lifecycle, a Pod specification has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always. So we're done here!

Links

In docker-compose, the links directive creates a link to containers in another service. From the official Docker Compose documentation:

Containers for the linked service are reachable at a hostname identical to the alias, or the service name if no alias was specified.

Links are not required to enable services to communicate - by default, any service can reach any other service at that service's name. (See also, the Links topic in Networking in Compose.)

Links also express dependency between services in the same way as depends_on, so they determine the order of service startup.

I don't think we need this in Kubernetes, because the official documentation says:

Kubernetes gives every pod its own cluster-private IP address, so you do not need to explicitly create links between pods or map container ports to host ports. This means that containers within a Pod can all reach each other's ports on localhost, and all pods in a cluster can see each other without NAT.

Ports

The mediawiki service in the docker-compose file exposes port 80 of the mediawiki container and maps it to the port 8080 on the host. To replicate this in our Kubernetes manifest, we first add the spec.containers.ports field and pass it a list of the container ports to be exposed (- containerPort: 80 in this case). The manifest now looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki-app
spec:
  selector:
    matchLabels:
      app: mediawiki-app
  replicas: 2
  template:
    metadata:
      labels:
        app: mediawiki-app
    spec:
      containers:
        - name: mediawiki-container
          image: mediawiki
          volumeMounts:
            - mountPath: /var/www/html/images
              name: mediawiki-volume
          ports:
            - containerPort: 80
      volumes:
        - name: mediawiki-volume

We again update our Deployment using the microk8s.kubectl apply -f mediawiki.yml command. We can verify that the ports specification took effect:

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl describe deployment mediawiki-app
Name:                   mediawiki-app

# -- snip --

Pod Template:
  Labels:  app=mediawiki-app
  Containers:
   mediawiki-container:
    Image:        mediawiki
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>

# -- snip --

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl describe pod mediawiki-app-5494668f87-5xldp
Name:         mediawiki-app-5494668f87-5xldp

# -- snip --

Containers:
  mediawiki-container:
    Container ID:   containerd://6ba69454dc74ebc359466edc233197e2cf11f3ae8365d4fcf5ea60b0b62ec3b7
    Image:          mediawiki
    Image ID:       docker.io/library/mediawiki@sha256:e3be6a44c1d82e454657a013c5df29f7a9a0bfc112325d6d9df1ab1e21087b51
    Port:           80/TCP
    Host Port:      0/TCP

# -- snip --

ubuntu@ubuntu:~/mediawiki$ 

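As you can see, port 80 of the MediaWiki container is now declared in the Pod specification. What this means is that each Pod can be reached on port 80 at its own Pod IP address from anywhere within your cluster. (Strictly speaking, containerPort is mostly informational: the container listens on port 80 whether or not the field is set.)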

Adding a MediaWiki Service

But what happens if a Node dies? The Pods running on that Node die with it. The Deployment will create new Pods, but they will have different IP addresses. And that's exactly the problem solved by a Kubernetes Service.

A Kubernetes Service defines a logical set of Pods that provide the same functionality and run somewhere in the Kubernetes cluster. Upon creation, each Service is assigned a unique IP address, the so-called clusterIP. This address will not change while the Service is alive. Communication to the Service will be automatically load-balanced to some Pod that is a member of that Service.

Thus, we need to create a Service to expose the mediawiki pods to the outside world (outside of the Kubernetes cluster, that is). As with our MediaWiki Deployment, we store the Service manifest in a new file mediawiki-service.yml:

apiVersion: v1
kind: Service
metadata:
  name: mediawiki-srv
  labels:
    app: mediawiki-app
spec:
  ports:
    - port: 8883
      protocol: TCP
  selector:
    app: mediawiki-app

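Note that the Service's spec.selector must match the labels in the Pod template of the Deployment (in our case the label app: mediawiki-app), because this is how the Service finds the Pods to route traffic to. By the way: microk8s.kubectl returns an error saying Service is an unknown type if I use apiVersion: apps/v1 as in the Deployment. This is because Service belongs to the core API group, whose apiVersion is just v1, whereas apps/v1 covers workload objects such as Deployment and ReplicaSet.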

To start the Service, we issue the microk8s.kubectl apply -f mediawiki-service.yml command. Let's now inspect our Kubernetes cluster:

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/mediawiki-app-5494668f87-5xldp   1/1     Running   0          19m
pod/mediawiki-app-5494668f87-p7p4s   1/1     Running   0          19m

NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes      ClusterIP   10.152.183.1    <none>        443/TCP        75d
service/mediawiki-srv   ClusterIP   10.152.183.17   <none>        8883/TCP       3m51s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mediawiki-app   2/2     2            2           49m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/mediawiki-app-5494668f87   2         2         2       19m
replicaset.apps/mediawiki-app-656f6f8d64   0         0         0       49m
ubuntu@ubuntu:~/mediawiki$ 

From the above output, you can see that the Service was created. However, the Service has the type ClusterIP, meaning that it cannot be reached from outside of the Kubernetes cluster. To fix this, we need to change the type of the mediawiki-srv Service to NodePort. In addition, the port should be 80: since we don't set targetPort, it defaults to the value of port, and 80 is the port the mediawiki container listens on:

apiVersion: v1
kind: Service
metadata:
  name: mediawiki-srv
  labels:
    app: mediawiki-app
spec:
  type: NodePort
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: mediawiki-app

We microk8s.kubectl apply the Service manifest again and check the result:

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl apply -f mediawiki-service.yml 
service/mediawiki-srv configured
ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/mediawiki-app-5494668f87-5xldp   1/1     Running   0          30m
pod/mediawiki-app-5494668f87-p7p4s   1/1     Running   0          30m

NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes      ClusterIP   10.152.183.1    <none>        443/TCP        76d
service/mediawiki-srv   NodePort    10.152.183.17   <none>        80:32681/TCP   14m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mediawiki-app   2/2     2            2           59m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/mediawiki-app-5494668f87   2         2         2       30m
replicaset.apps/mediawiki-app-656f6f8d64   0         0         0       59m
ubuntu@ubuntu:~/mediawiki$ 
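
In the PORT(S) column, 80:32681 means that Kubernetes automatically allocated node port 32681 and forwards it to port 80 of the Service. If a predictable port is preferred, the mapping can be spelled out. The following is a sketch of the same Service with explicit targetPort and nodePort values; 32681 is simply reused from the output above and must lie within the cluster's node port range (30000-32767 by default).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mediawiki-srv
  labels:
    app: mediawiki-app
spec:
  type: NodePort
  ports:
    - port: 80          # port of the Service inside the cluster
      targetPort: 80    # containerPort of the mediawiki container
      nodePort: 32681   # port opened on every node; assumed to be free
      protocol: TCP
  selector:
    app: mediawiki-app
```

Pinning nodePort keeps the URL stable if the Service is ever deleted and recreated; leaving it out lets Kubernetes choose a free port again.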

Verifying the Intermediate Result

We can now check our MediaWiki Service and Deployment using the curl command line tool directly on the Raspberry Pi:

ubuntu@ubuntu:~/mediawiki$ curl http://127.0.0.1:32681
<!DOCTYPE html>
<html lang="en" dir="ltr">
    <head>
        <meta charset="UTF-8" />
        <title>MediaWiki 1.34.2</title>
        <style media="screen">
            body {
                color: #000;
                background-color: #fff;
                font-family: sans-serif;
                text-align: center;
            }

            h1 {
                font-size: 150%;
            }
        </style>
    </head>
    <body>
        <img src="/resources/assets/mediawiki.png" alt="The MediaWiki logo" />

        <h1>MediaWiki 1.34.2</h1>
        <div class="error">


            <p>LocalSettings.php not found.</p>


                <p>Please <a href="/mw-config/index.php">set up the wiki</a> first.</p>


        </div>
    </body>
</html>
ubuntu@ubuntu:~/mediawiki$ 

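Now that it works locally, we can try to access the mediawiki Service from our host system, i.e., from outside the Raspberry Pi. If you open your web browser and go to http://<ip address of your RPi>:32681, you should see the same page rendered: the MediaWiki logo, the message that LocalSettings.php was not found, and a link to set up the wiki.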

The Database Application

Next, we need to create a Deployment and a Service for the database (MariaDB in our case). In the docker-compose.yml file, the database service looks like this:

version: '3'
services:

  # -- snip --

  database:
    image: mariadb
    restart: always
    environment:
      # @see https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/DefaultSettings.php
      MYSQL_DATABASE: my_wiki
      MYSQL_USER: wikiuser
      MYSQL_PASSWORD: example
      MYSQL_RANDOM_ROOT_PASSWORD: 'yes'

The MariaDB Deployment

We already saw that we don't need to do anything explicitly for the always restart policy since it's the default in Kubernetes. However, we have to use a different Docker image for MariaDB because that Docker container will run on our Raspberry Pi.

Why do we need a different Docker image? Well, the Raspberry Pi CPU is based on the ARM architecture, which is different from that of your standard PC or laptop (which nowadays mostly happens to be x86_64). A Docker image is essentially a collection of layers (packaged in *.tar files) containing, among other things, the binaries needed to run your application, e.g., command-line utilities like cp or ls, compilers like gcc, scripting languages like python or ruby, and your application itself, say, nginx. If you're curious, you can run docker image save <image-name> -o <some-name>.tar and look into that .tar archive to see the layers and the binaries in each layer.

If the Docker image was built for the x86_64 architecture, the resulting binaries cannot be executed on the ARM architecture. Thus, many of the official Docker images from DockerHub (and the official MariaDB image happens to be one of those) will not run on a Raspberry Pi. To deal with this, you can either build your own MariaDB image for the RPi or use, for example, the jsurf/rpi-mariadb image, which is a port of the official mariadb image for the Raspberry Pi.

A word of caution: For security reasons, you should always be very careful when using unofficial Docker images from DockerHub! Since the Docker daemon itself runs with root permissions on your system, every Docker container can essentially also access your system with root permissions (unless you have configured your Docker installation to avoid this particular threat). If you want to learn more about this, a very accessible source is the book "Docker in Practice" by Ian Miell and Aidan Hobson Sayers, Chapter 14, "Docker and Security".

Finally, we need to define the four environment variables for the MariaDB container. In Kubernetes, you set environment variables using the env field of a container in spec.containers. So the Deployment manifest for MariaDB looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki-db
spec:
  selector:
    matchLabels:
      app: mediawiki-db
  replicas: 1
  template:
    metadata:
      labels:
        app: mediawiki-db
    spec:
      containers:
        - name: mediawiki-db-container
          image: jsurf/rpi-mariadb
          env:
            - name: MYSQL_DATABASE
              value: my_wiki
            - name: MYSQL_USER
              value: wikiuser
            - name: MYSQL_PASSWORD
              value: example
            - name: MYSQL_RANDOM_ROOT_PASSWORD
              value: 'yes'
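
Putting the database password in plain text into the Deployment manifest is fine for a first experiment, but Kubernetes also offers Secrets for this purpose. The sketch below is one possible variant, not part of the original setup: the Secret name mediawiki-db-secret and the key password are made up for illustration, and the second document is only the env fragment that would replace the MYSQL_PASSWORD entry above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mediawiki-db-secret      # hypothetical name
type: Opaque
stringData:
  password: example              # same value as in the docker-compose file
---
# Fragment of the container entry in the MariaDB Deployment.
# Shown for reading only; it is not a complete manifest and
# cannot be applied on its own.
env:
  - name: MYSQL_PASSWORD
    valueFrom:
      secretKeyRef:
        name: mediawiki-db-secret   # must match the Secret's metadata.name
        key: password               # key inside the Secret
```

The password still appears in the Secret manifest, so that file should be kept out of shared repositories; the gain is that the Deployment manifest itself no longer contains the credential.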

Use microk8s.kubectl apply -f <mariadb-deployment-manifest.yml> to create the Deployment. You can also check the DefaultSettings.php file to see why we need to set these environment variables in the MariaDB container for it to work.

Adding MariaDB Service

As with MediaWiki itself, we need to add a Kubernetes Service for MariaDB so that it gets a stable address and, because we again use the NodePort type, is also exposed on the Raspberry Pi itself. This way it can be accessed by the MediaWiki application. The default port number for MariaDB is 3306, and this is also the port MediaWiki will try to connect to. Hence, the manifest for the MariaDB Service looks like this:

apiVersion: v1
kind: Service
metadata:
  name: mediawiki-db-srv
  labels:
    app: mediawiki-db
spec:
  type: NodePort
  ports:
    - port: 3306
      protocol: TCP
  selector:
    app: mediawiki-db

Use microk8s.kubectl apply -f <mariadb-service-manifest.yml> to create the Service. You should now see the following Kubernetes objects running:

ubuntu@ubuntu:~/mediawiki$ microk8s.kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/mediawiki-app-55f45cf568-gmpzv   1/1     Running   0          9h
pod/mediawiki-db-5cb8db589f-r6q8k    1/1     Running   0          9h

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/kubernetes         ClusterIP   10.152.183.1     <none>        443/TCP          98d
service/mediawiki-db-srv   NodePort    10.152.183.195   <none>        3306:31501/TCP   9h
service/mediawiki-srv      NodePort    10.152.183.17    <none>        80:32681/TCP     9h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mediawiki-app   1/1     1            1           9h
deployment.apps/mediawiki-db    1/1     1            1           9h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/mediawiki-app-55f45cf568   1         1         1       9h
replicaset.apps/mediawiki-db-5cb8db589f    1         1         1       9h
ubuntu@ubuntu:~/mediawiki$ 
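
The NodePort type also opens the database port on the Raspberry Pi's network interface (31501 in the output above). If only MediaWiki needs to talk to MariaDB, a ClusterIP Service should be sufficient, since both run in the same cluster. The following variant is a sketch under that assumption; it was not part of the original setup.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mediawiki-db-srv
  labels:
    app: mediawiki-db
spec:
  type: ClusterIP        # reachable only from inside the cluster
  ports:
    - port: 3306         # port of the Service
      targetPort: 3306   # port MariaDB listens on in the container
      protocol: TCP
  selector:
    app: mediawiki-db    # must match the Pod labels of the MariaDB Deployment
```

Inside the cluster, such a Service is reachable at its cluster IP and, if cluster DNS is enabled (in microk8s, DNS is an add-on), under its name mediawiki-db-srv.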

Setting Up MediaWiki

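From your host computer, you can now point your browser to http://<ip address of your RPi>:32681 and click on the link to start setting up the MediaWiki application. The installation procedure is self-explanatory. The only thing you need to adjust during the setup is the database host, which should be set to 127.0.0.1 (instead of the default localhost). If MediaWiki cannot connect to the database with that address in your cluster, try the name of the database Service, mediawiki-db-srv, or its cluster IP from the output above; the name requires cluster DNS to be enabled.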

At the end of the installation, you will be prompted to download the LocalSettings.php file. You will need this settings file to run MediaWiki (otherwise, you will always end up in the setup mode). First, you need to scp this file to your Raspberry Pi:

paulduplys@Pauls-MBP Downloads % scp LocalSettings.php ubuntu@192.168.178.66:/home/ubuntu/mediawiki/
ubuntu@192.168.178.66's password: 
LocalSettings.php                             100% 4233   858.3KB/s   00:00
paulduplys@Pauls-MBP Downloads %

To mount this file into the mediawiki container, we use a hostPath volume, which mounts a file or directory from the host node's filesystem into your Pod. Note that the file must exist on the node where the Pod is scheduled. So, we need to add the following to our Deployment manifest:


# -- snip --

    spec:
      containers:
        - name: mediawiki-container
          image: mediawiki
          volumeMounts:

            # -- snip --

            - mountPath: /var/www/html/LocalSettings.php
              name: mediawiki-localsettings
          ports:
            - containerPort: 80

      # -- snip --

      volumes:
        - name: mediawiki-volume
        - name: mediawiki-localsettings
          hostPath: 
            path: /home/ubuntu/mediawiki/LocalSettings.php
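
A hostPath volume ties the Pod to the node that holds the file. As a possible alternative, LocalSettings.php could be stored in the cluster as a Secret (the file contains the database password) and mounted from there. The fragment below is a sketch under the assumption that a Secret named mediawiki-localsettings (a made-up name) with the key LocalSettings.php already exists, for example created with microk8s.kubectl create secret generic mediawiki-localsettings --from-file=LocalSettings.php.

```yaml
# Fragment of spec.template.spec in mediawiki.yml (not a complete manifest)
containers:
  - name: mediawiki-container
    image: mediawiki
    volumeMounts:
      - mountPath: /var/www/html/LocalSettings.php
        name: mediawiki-localsettings
        subPath: LocalSettings.php   # mount only this key as a single file
        readOnly: true
volumes:
  - name: mediawiki-localsettings
    secret:
      secretName: mediawiki-localsettings   # hypothetical Secret name
```

With this variant the Pod can start on any node of the cluster, at the cost of having to recreate the Secret whenever LocalSettings.php changes.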

The final MediaWiki Deployment manifest looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki-app
spec:
  selector:
    matchLabels:
      app: mediawiki-app
  replicas: 1
  template:
    metadata:
      labels:
        app: mediawiki-app
    spec:
      containers:
        - name: mediawiki-container
          image: mediawiki
          volumeMounts:
            - mountPath: /var/www/html/images
              name: mediawiki-volume
            - mountPath: /var/www/html/LocalSettings.php
              name: mediawiki-localsettings
          ports:
            - containerPort: 80
      volumes:
        - name: mediawiki-volume
        - name: mediawiki-localsettings
          hostPath:
            path: /home/ubuntu/mediawiki/LocalSettings.php
            type: File

Issue microk8s.kubectl apply -f mediawiki.yml one last time and you should have a Kubernetes-managed MediaWiki instance. Enjoy!
