HarperDB Containerization Journey
Zachary Fowler
Posted on November 19, 2020
How I Single-Handedly Containerized HarperDB
HarperDB is a simple database that ships adequately configured, works as a distributed database verging on serverless, and fits micro-service-oriented architectures well. It enables developers to think responsibly about the data they are collecting. Companies can begin to isolate the important data from the noise and store only the information they need, where they need it. This is more feasible now than ever before, and containers are a significant contributor to that shift.
Explaining containers is out of scope for this post; the long and short of it is that a container is an isolated set of an application's prerequisites that uses the host system's resources.
Docker-ization
At HarperDB, I pushed early on for things to be organized in a container-friendly manner. HarperDB on Docker Hub was one of our first release channels. Docker made it possible for HarperDB to spawn quickly, persist data on the host, or load data into the container for an ephemeral instance. The HarperDB application and the data store are not tightly coupled; the HarperDB application can point to any HarperDB datastore. Below is a quick Docker example of two HarperDB application instances pointing to the same database.
Create one container and make port 9925 available on the Docker host:
docker run -d -v /tmp/docker1/:/opt/harperdb/hdb -p 9925:9925 harperdb/hdb:latest
Log in to the local HarperDB management Studio at http://localhost:9925 with username HDB_ADMIN and password password.
Create a schema and a table, then add a few records.
Now create a second container attached to the same data directory:
docker run -d -v /tmp/docker1/:/opt/harperdb/hdb -p 9926:9925 harperdb/hdb:latest
Note that the host port is 9926 because 9925 is already in use.
Log in to the new instance, this time at http://localhost:9926, and create a new record!
Refresh the instance on 9925, and the new record shows up there too!
Fun!
Is this useful in real life? Probably somewhere somehow, this can do something neat!
The example illustrates that HarperDB application containers are isolated instances that can mount any HarperDB storage.
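The Studio steps above could also be scripted. The following is a minimal sketch, assuming the instance accepts JSON operations (`create_schema`, `create_table`, `insert`, `sql`) over HTTP POST with basic authentication on the same mapped ports. The schema `dev`, the table `dog`, and the helper function names are hypothetical, and operation names may differ between HarperDB versions.

```shell
# Sketch only: schema, table, and helper names are illustrative assumptions.
HDB_USER="HDB_ADMIN"
HDB_PASS="password"
FIRST_URL="http://localhost:9925"    # first container
SECOND_URL="http://localhost:9926"   # second container, same data directory

# Send one JSON operation to a HarperDB instance.
# $1 = base URL, $2 = JSON body
hdb_op() {
  curl -s -u "${HDB_USER}:${HDB_PASS}" \
    -H 'Content-Type: application/json' \
    -d "$2" "$1"
}

# Run after the first container is up: create a schema, a table, and one record.
seed_first() {
  hdb_op "$FIRST_URL" '{"operation":"create_schema","schema":"dev"}'
  hdb_op "$FIRST_URL" '{"operation":"create_table","schema":"dev","table":"dog","hash_attribute":"id"}'
  hdb_op "$FIRST_URL" '{"operation":"insert","schema":"dev","table":"dog","records":[{"id":1,"name":"Penny"}]}'
}

# Run after the second container is up: read the data through the other instance.
query_second() {
  hdb_op "$SECOND_URL" '{"operation":"sql","sql":"SELECT * FROM dev.dog"}'
}
```

Calling `seed_first` after starting the first container, then `query_second` after starting the second, mirrors the order of the manual steps; if the shared volume behaves as described, the query should return the record written through the first instance.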
Imagine a container host syncing data to a remote store like AWS S3, with a sync job updating an S3 bucket periodically. Another container host across the planet pulls down the S3 data and starts a HarperDB instance. That is for another blog post.
In real life, Docker is incredible for developers and database administrators. One-off instances are useful, but clusters and application stacks are more relevant for application tiers, and that is where Docker Compose is helpful. However, Docker Compose is not as robust as Kubernetes.
My most recent project has been getting HarperDB on Helm, the Kubernetes package manager. For updates when that happens, follow HarperDB by joining our Slack channel and/or subscribing to our company updates.
Please go check out HarperDB on Docker Hub; the page provides examples for getting a lot more out of the Docker image, along with all the configuration options available so far.
Kubernetes In A Digital Ocean
Kubernetes is a robust infrastructure that helps orchestrate running containers. It lets users create large-scale deployments of single or multiple containers and manage the containers' life cycles and the resources they rely on, i.e., storage, network configuration, CPU, and memory. HarperDB can already be deployed from the 1-click marketplace of a great cloud platform, DigitalOcean. DigitalOcean also provides 1-click Kubernetes apps, and HarperDB will be available there soon; this is a preview of HarperDB deployed on DigitalOcean with Helm.
This example assumes that you have the following in place: a DigitalOcean account, plus kubectl, doctl, and helm installed in your development environment.
Create a Kubernetes cluster; for simplicity, a two-node cluster is sufficient.
Use doctl to configure the kubectl context to point to the DigitalOcean Kubernetes cluster.
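The two steps above can be sketched from the command line as follows. This is only an illustration: the cluster name `hdb-demo` and the region `nyc1` are assumptions, and the node size is left at the doctl default.

```shell
# Sketch only: cluster name and region are illustrative assumptions.
CLUSTER_NAME="hdb-demo"
REGION="nyc1"

create_cluster() {
  # Create a cluster whose default node pool has two nodes.
  doctl kubernetes cluster create "$CLUSTER_NAME" --region "$REGION" --count 2
  # Add the cluster credentials to the local kubeconfig and select its context.
  doctl kubernetes cluster kubeconfig save "$CLUSTER_NAME"
  # Confirm that kubectl now talks to the new cluster.
  kubectl get nodes
}
```

Running `create_cluster` with an authenticated doctl should end with a listing of the two worker nodes.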
For security reasons, Kubernetes providers implement role-based access control (RBAC), a method of regulating access to computer or network resources based on the roles of individual users within your organization. Helm 2 depends on Tiller, its in-cluster component; the following commands give Tiller access to cluster resources.
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account=tiller
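Before installing anything, it can help to confirm that Tiller actually came up. A minimal check, assuming `helm init` created its usual `tiller-deploy` deployment in the `kube-system` namespace (the function name is hypothetical):

```shell
# Sketch only: assumes the default Tiller deployment name used by `helm init`.
verify_tiller() {
  # Block until the Tiller deployment reports that its rollout has finished.
  kubectl -n kube-system rollout status deployment/tiller-deploy
  # Print the Helm client and server (Tiller) versions.
  helm version
}
```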
Helm charts use a YAML configuration file for resources exposed by the Kubernetes cluster as well as application-specific resources. A few important configuration values for HarperDB in values.yaml:
image:
  repository: harperdb/hdb
  tag: latest
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 9925
volumes:
  storageClassProvisioner: dobs.csi.digitalocean.com
  storageClassName: do-block-storage
  persistentClaim: harperdb-pvc
  storage: 5Gi
  volumeName: harperdb-ps
harperdb:
  username: HDB_ADMIN
  password: password
  cluster_enabled: true
  cluster_username: clustering
  cluster_password: password
  cluster_port: 1111
  node_name: hdb-cluster-00
Most options in the "image" and "harperdb" blocks above should be self-explanatory. The volumes block needs some explanation. Each Kubernetes provider exposes its own storage parameters. On DigitalOcean, the provisioner and storage class values are shown in the Kubernetes dashboard, which is reachable from the DigitalOcean dashboard.
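The same storage values can also be read with kubectl instead of the dashboard. A small sketch (the function names are hypothetical; `do-block-storage` is the class name from the values above):

```shell
# List every storage class in the cluster together with its provisioner.
list_storage_classes() {
  kubectl get storageclass
}

# Print only the provisioner of one storage class.
# $1 = storage class name, e.g. do-block-storage
provisioner_for() {
  kubectl get storageclass "$1" -o jsonpath='{.provisioner}'
}
```

For example, `provisioner_for do-block-storage` prints the value to use for `storageClassProvisioner`.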
The Helm chart can now create an instance of HarperDB that persists information on the host and exposes it as a service on port 9925.
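The install step might look like the sketch below. It uses the Helm 2 form of `helm install` to match the `helm init` step earlier; the release name `harperdb` and the local chart path `./harperdb` are assumptions, since the chart location is not given here.

```shell
# Sketch only: release name and chart path are illustrative assumptions.
RELEASE_NAME="harperdb"
CHART_DIR="./harperdb"

install_harperdb() {
  # Helm 2 syntax: the release name is passed with --name.
  helm install --name "$RELEASE_NAME" -f values.yaml "$CHART_DIR"
  # Show the pod, service, and persistent volume claim in the current namespace.
  kubectl get pods,svc,pvc
}
```

With Helm 3 the release name is a positional argument instead, so the first command would need adjusting.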
The NOTES at the bottom of the Helm install output describe how to connect your local development environment to the HarperDB instance's management Studio.
Connect to the cluster through http://localhost:8080, which routes traffic to the Kubernetes HarperDB instance on port 9925.
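One way to get that local route is a kubectl port-forward. This is a sketch under the assumption that the chart creates a service named `harperdb`; the actual commands printed in the chart's NOTES may differ.

```shell
# Sketch only: the service name is an illustrative assumption.
SERVICE_NAME="harperdb"

forward_studio() {
  # Forward local port 8080 to port 9925 of the HarperDB service.
  # The command keeps running until interrupted with Ctrl+C.
  kubectl port-forward "svc/${SERVICE_NAME}" 8080:9925
}
```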
In the real world, HarperDB would generally not be exposed publicly, and deployments would run over HTTPS. Application containers or Pods in Kubernetes would access HarperDB within the Kubernetes network. Again, for another blog post: the HarperDB instance could be ephemeral, with the container getting a copy of HarperDB storage from a remote source and then syncing back to that source as data is updated. The HarperDB cluster could also publish new data to another instance of HarperDB.
Q.E.D., What Was To Be Shown
Containers are robust and HarperDB is powerful; their powers combined provide opportunities to think about data storage and data flow in innovative ways. Docker makes it easy to containerize an application. Kubernetes provides a command and control center to build the distributed infrastructure and orchestrate container resources and life cycles. Helm gives software providers better access to Kubernetes with easy-to-install deployments. HarperDB is working hard to expand its offerings to other deployment channels. If you did not know, HarperDB also offers database as a service. Your support is appreciated.