SQL Server container in Azure Kubernetes Services (AKS)

So recently I got involved with an ASP.NET project which was built over 10 years ago and over the years Developers and Change Requests came and went. And over the period the Application became quite cumbersome and quite hard to understand and manage, the Application became quite large in terms of functionality, codebase, and data. It was cumbersome and quite hard to understand for a new developer and manage for the Ops team. A lot of technical debts started accumulating because there was no real-time spend on optimizing or refactoring the systems. The database was slow and deadlocks were becoming normal. It was hosted on huge on-premise hardware making it a heavy and costly solution.

The management realized this and decided it was time for a refresh. The plan was made to look at utilizing cloud technologies. And I joined the project as the software architect for the upgrade/migration.

In this article the focus is on Data and I am putting down my thoughts on what would be the right Cloud Data architecture for this organization.

Let's start with categorizing Data, Use case, and the corresponding system.
Datastore can be categorized into two types of systems.

Transaction
Analytical

And here is the *Data Architecture diagram *

On the left we have the client Apps connecting to Microservices through an API Management platform.
The Microservices are hosted in Azure Kubernetes Service. The apps will be hosted within VMs appropriate for running regular applications.
For database we are going with SQL Server Container: The SQL Server Container we will host in a different Nodepool with VMs optimized for Database usage.
Then for Data Warehousing we will use Snowflake. Snowflake is a SaaS-based Cloud Warehouse. They are optimized for cloud and do not have an on-premise offering.
ETL: To move data from SQL Server to Snowflake we will be using Apache Spark Jobs hosted on Azure Databricks.

This article is going to focus on item #3. The SQL Server Container. The others will be visited in future articles. So now we will walk-through on how to

Deploy SQL Server DB Instance into an AKS Cluster.

We will be using Kubernetes Static Persistent Volume storage for DB as opposed to Kubernetes Dynamic Persistent Volume Storage for the DB. This will give us control when we have to restore an existing Database(s) into the SQL Server Container. Using this approach, we can take regular snapshots of our Azure Disk and restore from the snapshot when disaster strikes. Using Dynamic Provisioning we would not have the control to specify an existing Azure Disk for storage/restoration.
We will look at how we are gaining Performance, High Availability, and have a plan in for Disaster Recovery. (restoration from a Snapshot or restore DB from a backup file).

Databases are stateful. That means it needs storage that can be persisted. So we will first look at storage.

AKS Storage

Applications hosted on Azure Kubernetes Service (AKS) may need to store and retrieve data. The data storage requirements of many types:

Fast local data storage and that need not be persisted after the pod is deleted.
Data storage that needs to be persisted even after the pod is deleted or relocated to some other node in the cluster.
Storage may need to be shared between multiple pods.
Also, their Access Modes required by the applications (like read/write) will be different.
For some application workloads, this data storage can use local, fast storage on the node that is no longer needed when the pods are deleted.
Some storage may be used to inject configuration or sensitive data into pods.

Below we will address four concepts that provide storage to applications in AKS

Volumes
Persistent volumes
Storage classes
Persistent volume claims

Volumes

This is the storage and in Azure it comes in two forms.

Azure Disks (there are many flavors of this. Starting with HDD. SDD, Ultra SDD)
Azure Files For our SQL Server container we will be creating an Azure Disk for the data storage requirements. ###Persistent Volume### A persistent volume is a storage resource that is managed by the Kubernetes Master API that can exist beyond the lifetime of Pod. It can be statically created by the Kubernetes cluster or dynamically provisioned. We will be looking at static provisioning.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-disk-pv
  namespace: db
spec:
  capacity:
    storage: 80Gi
  storageClassName: ""
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  azureDisk:
    kind: Managed
    diskName: Kube_static_disk
    diskURI:  /subscriptions/15cxx96af-xxxxx-xxx-a760-1f58cxxxxxfe/resourceGroups/MC_maltax_southeastasia/providers/Microsoft.Compute/disks/Kube_static_disk

Storage Classes (SC)

A storage class defines the tier (Premium/Standard), Access Modes. Reclaim policy.

Persistent Volume Claim (PVC)

When an application requires some persistent storage from AKS it has an issue a claim or Persistent Volume Claim. This has to define the Storage Class, Access Mode, and Size.
If the annotation like below is set in the PVC; then Kubernetes will try to dynamically create the resource. Assuming a matching storageclass called my-storage-class is found.

volume.beta.kubernetes.io/storage-class: m-azure-disk

But in our case, we are going for Static provisioning. We already created an Azure Disk earlier 80GB size. We also created the Persistent Volume (PV).

Now let’s create a Persistent Volume Claim (PVC).

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data-pvc
  namespace: db
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 80Gi

As you can we don't have the annotation for Dynamic provisioning. Instead Kubernetes will map this PVC with earlier created PV.

The above screenshot confirms the PVC is mapped to PV. Or the claim is mapped to real storage(disk).

Now we got storage sorted. Let's deploy the SQL Server Container onto Kubernetes. Below is the file that has the Kubernetes SQL Server Deployment and Service details. Check the section under Volumes.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
  namespace: db
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        ports:
        - containerPort: 1433
        env:
        - name: MSSQL_PID
          value: "Developer"
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD 
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
  namespace: db
spec:
  selector:
    app: mssql
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer

We will deploy this. See the below screenshot.

We can see after around 40 seconds the SQL Server Instance is up and running. We are able to connect using sqlcmd command.

We can also see that the pod is running on the VM and below screenshot shows that the VM has mounted the storage disk that we provisioned earlier.

Key Parameters

Now SQL Server is set up and running; let's look at some key parameters.

Performance
High Availability
Disaster Recovery

Performance

In AKS a feature called Azure Accelerated Networking is turned by default. Since the Applications and our SQL Server Container Instance are deployed in the same AKS instance we are automatically able to gain these benefits of this feature.

Significantly improved network performance.
Network throughput of up to 30Gbps.
Reduced latency / higher packets per second (pps).
Reduced jitters.
Decreased CPU Utilization: Less CPU Utilization for processing network traffic.

Another way to gain performance is to use Ultra SSD which scale
performance up to 160,000* IOPS and 2 GB/s per disk with zero downtime.

High Availability

Container Level:
Kubernetes regularly check whether the SQL Server Containers Instances are running healthy. If for some reason the instance crashes or stop being responsive. Kubernetes restarts it. When I tried to delete a SQL Server Pod; Kubernetes instantly detected and spun a new Pod within 4 seconds. See the below screenshot.

Disaster Recovery

Our storage for the container is Azure Disks which provides an SLA of 99.999% and Microsoft had no reported outage till now. But still, we should be prepared for a Disaster. One of the options we have to take regular incremental snapshots of Azure Disks. If a disaster strikes, we can restore the most snapshot back to a Disk. This Disk can be then mounted onto a new SQL Server Instance. But since we have the data (maybe old by a few mins or hours based on our Recovery Point Objective (RPO) of Disaster Recovery strategy).

Another way to get back up and running when a disaster strike is to use SqlPackage command. We can regular backups (automated). Disaster strikes we can spin up a new SQL Server Instance and Azure Disk Storage. Then we will restore the bacpac file back on to the Azure Disk for use by SQL Server.

Blog