Kubernetes: PersistentVolume and PersistentVolumeClaim — an overview with examples

setevoy

Arseny Zinchenko

Posted on August 5, 2020

For persistent data, Kubernetes provides two main object types: PersistentVolume and PersistentVolumeClaim.

A PersistentVolume (PV) is a storage device with a filesystem volume on it, for example an AWS EBS volume attached to an AWS EC2 instance. From the cluster's point of view, a PersistentVolume is a resource much like a Kubernetes Worker Node.

A PersistentVolumeClaim (PVC), in turn, is a request to use such a PersistentVolume and is similar to a Kubernetes Pod: just as a Pod requests CPU and memory from a Worker Node, a PersistentVolumeClaim requests a storage size and an access mode (ReadWriteOnce, ReadOnlyMany, or ReadWriteMany; see AccessModes) from a PersistentVolume.

A PersistentVolume can be created in two ways: statically or dynamically (the recommended way).

When creating a PV statically, you first have to create a storage device, for example an AWS EBS volume, which the PersistentVolume will then use.

If the cluster can't find an appropriate PV for a PersistentVolumeClaim, it can create a new storage device specifically for this PVC. This is dynamic PV provisioning.

For this to work, the PVC has to reference a StorageClass, and this class has to be supported by the cluster.

For example, on AWS EKS we have the gp2 StorageClass:

$ kubectl get storageclass
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   64d

Storage types

To better understand the PersistentVolume concept, let's look at the available storage types:

  • Node-local storage (emptyDir and hostPath)
  • Cloud volumes (for example, awsElasticBlockStore, gcePersistentDisk, and azureDisk)
  • File-sharing volumes, such as Network File System
  • Distributed-file systems (for example, CephFS, RBD, and GlusterFS)
  • Special types such as PersistentVolumeClaim, secret, and gitRepo

emptyDir and hostPath are tied to a pod or to the node it runs on: an emptyDir is removed together with its pod, and hostPath data stays on one particular node. Cloud volumes, NFS, and PersistentVolumes are independent of pods and keep their data until the volume itself is deleted.
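
As a rough illustration of the node-local case, here is a minimal sketch of a pod with an emptyDir volume. The pod, container, and volume names are assumptions made up for this example. Whatever the container writes to /cache is removed together with the pod, which is exactly what a PersistentVolume avoids:

```yaml
# Hypothetical example: all names here are illustrative, not taken from the cluster in this post
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-pod
spec:
  volumes:
    # emptyDir is created when the pod is assigned to a node
    # and deleted when the pod is removed from that node
    - name: cache-storage
      emptyDir: {}
  containers:
    - name: emptydir-container
      image: nginx
      volumeMounts:
        - mountPath: "/cache"
          name: cache-storage
```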

Create a PersistentVolumeClaim

Static PersistentVolume provisioning

Create an EBS

For static provisioning, we first need to create a storage device, in this case an AWS EBS volume, and then create a PersistentVolume that uses it.

Create an EBS:

$ aws ec2 --profile arseniy --region us-east-2 create-volume --availability-zone us-east-2a --size 50
{
    "AvailabilityZone": "us-east-2a",
    "CreateTime": "2020-07-29T13:10:12.000Z",
    "Encrypted": false,
    "Size": 50,
    "SnapshotId": "",
    "State": "creating",
    "VolumeId": "vol-0928650905a2491e2",
    "Iops": 150,
    "Tags": [],
    "VolumeType": "gp2"
}

Note its ID: vol-0928650905a2491e2.

Create a PersistentVolume

Write a manifest file, let’s call it pv-static.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0928650905a2491e2

Here:

  • capacity: the storage size
  • accessModes: the access type; here it is ReadWriteOnce, which means that this PV can be mounted read-write by only one Worker Node at a time
  • storageClassName: the storage class, see below
  • awsElasticBlockStore: the device type used
  • fsType: the filesystem type to be created on this volume
  • volumeID: the AWS EBS volume ID

Create the PersistentVolume:

$ kubectl apply -f pv-static.yaml
persistentvolume/pv-static created

Check it:

$ kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv-static   5Gi        RWO            Retain           Available                                   69s

StorageClass

The storageClassName parameter sets the storage class.

Both the PVC and the PV must have the same class. Otherwise, the PVC will not find the PV, and the STATUS of the PVC will be Pending.

If a PVC has no StorageClass set, the default class is used:

$ kubectl get storageclass -o wide
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   65d

At the same time, if the StorageClass is not set for a PV, this PV is created without a class, and our PVC with the default class will not be able to use it. The binding fails with the 'Cannot bind to requested volume "pvname": storageClassName does not match' error:

…
Events:
  Type     Reason          Age                  From                         Message
  ----     ------          ----                 ----                         -------
  Warning  VolumeMismatch  12s (x17 over 4m2s)  persistentvolume-controller  Cannot bind to requested volume "pvname": storageClassName does not match
…
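
One way to avoid this mismatch, sketched here under the assumption that the PV really was created without a class, is to set storageClassName to an empty string in the PVC. An empty string tells Kubernetes to look only for PVs without a class and not to fall back to the default StorageClass. The names pvc-no-class and pv-no-class are hypothetical:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-no-class        # hypothetical name
spec:
  # Empty string: bind only to a PV with no class, do not use the default StorageClass
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-no-class   # hypothetical PV created without storageClassName
```

The other option, used in the rest of this post, is to set the same storageClassName on the PV itself.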

Create a PersistentVolumeClaim

Now we can create a PersistentVolumeClaim that uses the PersistentVolume created above. Save it to the pvc-static.yaml file:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-static
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-static

Create this PVC:

$ kubectl apply -f pvc-static.yaml
persistentvolumeclaim/pvc-static created

Check it:

$ kubectl get pvc pvc-static
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-static   Bound    pv-static   5Gi        RWO            gp2            31s

Dynamic PersistentVolume provisioning

The dynamic way of creating a PersistentVolume is similar to the static one, except that you don't need to create the AWS EBS volume and the PersistentVolume manually. Instead, you only create a PersistentVolumeClaim object, and Kubernetes creates an EBS volume via the AWS API and attaches it to the AWS EC2 instance acting as the Worker Node in the Kubernetes cluster. Save the claim as pvc-dynamic.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-dynamic
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Create this PVC:

$ kubectl apply -f pvc-dynamic.yaml
persistentvolumeclaim/pvc-dynamic created

Check it:

$ kubectl get pvc pvc-dynamic
NAME          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-dynamic   Pending                                      gp2            45s

Okay, but why is it in the Pending STATUS? Check its Events:

$ kubectl describe pvc pvc-dynamic
…
Events:
  Type    Reason                Age               From                         Message
  ----    ------                ----              ----                         -------
  Normal  WaitForFirstConsumer  1s (x4 over 33s)  persistentvolume-controller  waiting for first consumer to be created before binding
Mounted By: <none>

WaitForFirstConsumer

Let's look at the settings of our default StorageClass:

$ kubectl describe sc gp2
Name: gp2
IsDefaultClass: Yes
…
Provisioner: kubernetes.io/aws-ebs
Parameters: fsType=ext4,type=gp2
…
VolumeBindingMode: WaitForFirstConsumer
Events: <none>

Here, VolumeBindingMode defines when a PersistentVolume is created and bound. With the Immediate value, the PV is created as soon as the requesting PVC appears. With WaitForFirstConsumer, as in this case, Kubernetes waits for the first consumer, such as a pod that uses this PVC, and then creates a new PV and an AWS EBS volume in the AvailabilityZone of the Worker Node where this pod is scheduled.
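
For comparison, a StorageClass that provisions volumes right away could look roughly like the sketch below. The provisioner and parameters are copied from the gp2 class shown above, while the class name gp2-immediate is an assumption made up for this example:

```yaml
# Hypothetical StorageClass; the name "gp2-immediate" is illustrative only
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-immediate
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
# Immediate: provision and bind the PV as soon as the PVC is created.
# WaitForFirstConsumer: delay provisioning until a pod uses the PVC.
volumeBindingMode: Immediate
```

With Immediate, the EBS volume may end up in an AvailabilityZone where the pod cannot be scheduled, which is the reason to prefer WaitForFirstConsumer for zonal storage.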

Now, let’s create pods to consume those volumes.

Using PersistentVolumeClaim in Pods

Dynamic PersistentVolumeClaim

Let's describe a pod that will use our dynamic PVC in the pv-pods.yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: pv-dynamic-pod
spec:
  volumes:
    - name: pv-dynamic-storage
      persistentVolumeClaim:
        claimName: pvc-dynamic
  containers:
    - name: pv-dynamic-container
      image: nginx
      ports:
        - containerPort: 80
          name: "nginx"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: pv-dynamic-storage

Here:

  • volumes:
      • persistentVolumeClaim:
          • claimName: the name of the PVC that will be requested when the pod is created
  • containers:
      • volumeMounts: mount the pv-dynamic-storage volume to the /usr/share/nginx/html directory in the pod

Create it:

$ kubectl apply -f pv-pods.yaml
pod/pv-dynamic-pod created

Check our PVC again:

$ kubectl get pvc pvc-dynamic
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-dynamic   Bound    pvc-6d024b40-a239-4c35-8694-f060bd117053   5Gi        RWO            gp2            21h

Now we can see a new Volume with the ID pvc-6d024b40-a239-4c35-8694-f060bd117053. Check it:

$ kubectl describe pvc pvc-dynamic
Name: pvc-dynamic
Namespace: default
StorageClass: gp2
Status: Bound
Volume: pvc-6d024b40-a239-4c35-8694-f060bd117053
…
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWO
VolumeMode: Filesystem
Events: <none>
Mounted By: pv-dynamic-pod

Check that volume:

$ kubectl describe pv pvc-6d024b40-a239-4c35-8694-f060bd117053
Name: pvc-6d024b40-a239-4c35-8694-f060bd117053
…
StorageClass: gp2
Status: Bound
Claim: default/pvc-dynamic
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 5Gi
Node Affinity:
  Required Terms:
    Term 0: failure-domain.beta.kubernetes.io/zone in [us-east-2b]
            failure-domain.beta.kubernetes.io/region in [us-east-2]
Message:
Source:
  Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
  VolumeID: aws://us-east-2b/vol-040a5e004876f1a40
  FSType: ext4
  Partition: 0
  ReadOnly: false
Events: <none>

And the AWS EBS vol-040a5e004876f1a40:

$ aws ec2 --profile arseniy --region us-east-2 describe-volumes --volume-ids vol-040a5e004876f1a40 --output json
{
    "Volumes": [
        {
            "Attachments": [
                {
                    "AttachTime": "2020-07-30T11:08:29.000Z",
                    "Device": "/dev/xvdcy",
                    "InstanceId": "i-0a3225e9fe7cb7629",
                    "State": "attached",
                    "VolumeId": "vol-040a5e004876f1a40",
                    "DeleteOnTermination": false
                }
            ],
…

Check inside the pod:

$ kubectl exec -ti pv-dynamic-pod -- bash
root@pv-dynamic-pod:/# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0  50G  0 disk
|-nvme0n1p1   259:1    0  50G  0 part /etc/hosts
`-nvme0n1p128 259:2    0   1M  0 part
nvme1n1       259:3    0   5G  0 disk /usr/share/nginx/html

nvme1n1 is our 5G volume, mounted at /usr/share/nginx/html.

Let’s write some data:

root@pv-dynamic-pod:/# echo Test > /usr/share/nginx/html/index.html

Delete the pod:

$ kubectl delete pod pv-dynamic-pod
pod "pv-dynamic-pod" deleted

Re-create it:

$ kubectl apply -f pv-pods.yaml
pod/pv-dynamic-pod created

Check the data:

$ kubectl exec -ti pv-dynamic-pod -- cat /usr/share/nginx/html/index.html
Test

Everything is still in its place.

Static PersistentVolumeClaim

Now, let’s try to use our statically created PV.

We can use the same manifest, pv-static.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0928650905a2491e2

And let’s use the pvc-static.yaml manifest for our PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-static
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-static

Create the PV:

$ kubectl apply -f pv-static.yaml
persistentvolume/pv-static created

Check it:

$ kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv-static   5Gi        RWO            Retain           Available           gp2                     58s
…

Create the PVC:

$ kubectl apply -f pvc-static.yaml
persistentvolumeclaim/pvc-static created

Check it:

$ kubectl get pvc pvc-static
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-static   Bound    pv-static   5Gi        RWO            gp2            9s

The Bound STATUS means that the PVC found its PV and was successfully bound to it.

Pod nodeAffinity

Next, we need to find out in which AWS AvailabilityZone the AWS EBS volume for our static PV was created:

$ aws ec2 --profile arseniy --region us-east-2 describe-volumes --volume-ids vol-0928650905a2491e2 --query '[Volumes[*].AvailabilityZone]' --output text
us-east-2a

It's us-east-2a, so we need to create the pod on a Kubernetes Worker Node in the same AvailabilityZone.

Create a manifest, pv-pod-stat.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pv-static-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - us-east-2a
  volumes:
    - name: pv-static-storage
      persistentVolumeClaim:
        claimName: pvc-static
  containers:
    - name: pv-static-container
      image: nginx
      ports:
        - containerPort: 80
          name: "nginx"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: pv-static-storage

As opposed to the dynamic PVC, here we've used nodeAffinity to specify that we want a node from the us-east-2a AZ.

Create that pod:

$ kubectl apply -f pv-pod-stat.yaml
pod/pv-static-pod created

Check events:

0s   Normal   Scheduled                Pod   Successfully assigned default/pv-static-pod to ip-10-3-47-58.us-east-2.compute.internal
0s   Normal   SuccessfulAttachVolume   Pod   AttachVolume.Attach succeeded for volume "pv-static"
0s   Normal   Pulling                  Pod   Pulling image "nginx"
0s   Normal   Pulled                   Pod   Successfully pulled image "nginx"
0s   Normal   Created                  Pod   Created container pv-static-container
0s   Normal   Started                  Pod   Started container pv-static-container

Block devices in the pod:

$ kubectl exec -ti pv-static-pod -- bash
root@pv-static-pod:/# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0  50G  0 disk
|-nvme0n1p1   259:1    0  50G  0 part /etc/hosts
`-nvme0n1p128 259:2    0   1M  0 part
nvme1n1       259:3    0  50G  0 disk /usr/share/nginx/html

nvme1n1 is mounted, all works.

PersistentVolume nodeAffinity

Another option could be nodeAffinity for the PersistentVolume.

In this case, when a pod that uses this PV is created, Kubernetes first checks which Worker Nodes the volume can be attached to, and then schedules the pod on such a node.

In the pod’s manifest delete the nodeAffinity:

apiVersion: v1
kind: Pod
metadata:
  name: pv-static-pod
spec:
  volumes:
    - name: pv-static-storage
      persistentVolumeClaim:
        claimName: pvc-static
  containers:
    - name: pv-static-container
      image: nginx
      ports:
        - containerPort: 80
          name: "nginx"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: pv-static-storage

And add to the PV’s manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-east-2a
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0928650905a2491e2
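
The failure-domain.beta.kubernetes.io/zone label used above was deprecated in Kubernetes 1.17 in favor of topology.kubernetes.io/zone. On newer clusters the same PV might look like the sketch below. It assumes the worker nodes carry the newer label, which is worth checking with kubectl get nodes --show-labels first:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        # Assumption: nodes are labeled with the stable topology.kubernetes.io/zone key
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east-2a
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0928650905a2491e2
```

The rest of this post keeps using the older label, as in the manifest above.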

Create this PV:

$ kubectl apply -f pv-static.yaml
persistentvolume/pv-static created

Create its PVC; nothing has changed here:

$ kubectl apply -f pvc-static.yaml
persistentvolumeclaim/pvc-static created

Create the pod:

$ kubectl apply -f pv-pod-stat.yaml
pod/pv-static-pod created

Check events:

0s   Normal   Scheduled                Pod   Successfully assigned default/pv-static-pod to ip-10-3-47-58.us-east-2.compute.internal
0s   Normal   SuccessfulAttachVolume   Pod   AttachVolume.Attach succeeded for volume "pv-static"
0s   Normal   Pulling                  Pod   Pulling image "nginx"
0s   Normal   Pulled                   Pod   Successfully pulled image "nginx"
0s   Normal   Created                  Pod   Created container pv-static-container
0s   Normal   Started                  Pod   Started container pv-static-container

Delete PersistentVolume and PersistentVolumeClaim

When a user deletes a PVC that is currently used by a running pod, the PVC is not removed immediately: it stays in place as long as the pod that uses it exists.

Similarly, a PersistentVolume that is bound to a PersistentVolumeClaim is not removed as long as that binding exists, i.e. until its PVC is deleted.

Reclaiming

When we are done with a PersistentVolume, we can delete its PersistentVolumeClaim so that the cluster can reclaim the underlying resource, such as the AWS EBS volume.

The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. It can be Retain, Recycle, or Delete.

Retain

The Retain policy allows us to reclaim the disk manually.

After the related PersistentVolumeClaim is deleted, the PersistentVolume is not deleted and is marked as "Released", but it is not yet available for new PersistentVolumeClaims because it still holds the data of the previous claim.

To make it available again, you need to delete the PersistentVolume object from the cluster and create it anew.
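
The policy lives in the PV's spec.persistentVolumeReclaimPolicy field. A manually created PV gets Retain when the field is omitted, so the sketch below, which is the static PV from this post with the field written out, only makes that default explicit:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static
spec:
  # Retain is already the default for manually created PVs; shown here for clarity
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0928650905a2491e2
```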

Delete

With the Delete policy, deleting a PVC also removes the corresponding PersistentVolume and its storage device, such as an AWS EBS, GCE PD, or Azure Disk volume.

Keep in mind that dynamically provisioned volumes inherit the reclaim policy of their StorageClass, which defaults to Delete.
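
If dynamically provisioned volumes should survive the deletion of their PVC, one option is a separate StorageClass with reclaimPolicy set to Retain. This is only a sketch: the provisioner and parameters mirror the gp2 class shown earlier, and the name gp2-retain is an assumption made up for this example:

```yaml
# Hypothetical StorageClass; the name "gp2-retain" is illustrative only
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-retain
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
# PVs provisioned through this class get Retain instead of the default Delete
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```

A PVC would opt in by setting storageClassName: gp2-retain in its spec.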

Recycle

Deprecated. It was used to scrub the data on the volume with a basic rm -rf.

Deleting PV and PVC — an example

So, we have a pod running:

$ kubectl get pod pv-static-pod
NAME            READY   STATUS    RESTARTS   AGE
pv-static-pod   1/1     Running   0          19s

Which is using a PVC:

$ kubectl get pvc pvc-static
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-static   Bound    pv-static   50Gi       RWO            gp2            19h

And this PVC is bound to the PV:

$ kubectl get pv pv-static
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS   REASON   AGE
pv-static   50Gi       RWO            Retain           Bound       default/pvc-static   gp2                     19h

Our PV has its RECLAIM POLICY set to Retain, so after we delete its PVC and the PV itself, all data must be kept.

Let's check. Add some data:

$ kubectl exec -ti pv-static-pod -- bash
root@pv-static-pod:/# echo Test > /usr/share/nginx/html/test.txt
root@pv-static-pod:/# cat /usr/share/nginx/html/test.txt
Test

Exit from the pod and delete it, and then its PVC:

$ kubectl delete pod pv-static-pod
pod "pv-static-pod" deleted
$ kubectl delete pvc pvc-static
persistentvolumeclaim "pvc-static" deleted

Check the PV’s status:

$ kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS   REASON   AGE
pv-static   50Gi       RWO            Retain           Released    default/pvc-static   gp2                     25s

The STATUS is Released, and at this point we are not able to use this volume again via a new PVC.

Let's check. Create the PVC again:

$ kubectl apply -f pvc-static.yaml
persistentvolumeclaim/pvc-static created

Create a pod:

$ kubectl apply -f pv-pod-stat.yaml
pod/pv-static-pod created

And check its PVC status:

$ kubectl get pvc pvc-static
NAME         STATUS    VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-static   Pending   pv-static   0                         gp2            59s

The STATUS is Pending.

Delete the pod and the PVC, and this time delete the PersistentVolume too:

$ kubectl delete -f pv-pod-stat.yaml
pod "pv-static-pod" deleted

$ kubectl delete -f pvc-static.yaml
persistentvolumeclaim "pvc-static" deleted

$ kubectl delete -f pv-static.yaml
persistentvolume "pv-static" deleted

Create everything again:

$ kubectl apply -f pv-static.yaml
persistentvolume/pv-static created

$ kubectl apply -f pvc-static.yaml
persistentvolumeclaim/pvc-static created

$ kubectl apply -f pv-pod-stat.yaml
pod/pv-static-pod created

Check the PVC:

$ kubectl get pvc pvc-static
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-static   Bound    pv-static   50Gi       RWO            gp2            27s

And check the data we’ve added earlier:

$ kubectl exec -ti pv-static-pod -- cat /usr/share/nginx/html/test.txt
Test

All good: the data is still in its place.

Changing Reclaim Policy for PersistentVolume

Currently, our PV has the Retain value:

$ kubectl get pv pv-static -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
Retain

Apply a patch that updates its persistentVolumeReclaimPolicy parameter to the Delete value:

$ kubectl patch pv pv-static -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
persistentvolume/pv-static patched

Check it:

$ kubectl get pv pv-static -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
Delete

Delete the pod and its PVC:

$ kubectl delete -f pv-pod-stat.yaml
pod "pv-static-pod" deleted

$ kubectl delete -f pvc-static.yaml
persistentvolumeclaim "pvc-static" deleted

Check the PersistentVolume:

$ kubectl get pv pv-static
Error from server (NotFound): persistentvolumes "pv-static" not found

And the AWS EBS volume that was used for this PV:

$ aws ec2 --profile arseniy --region us-east-2 describe-volumes --volume-ids vol-0928650905a2491e2

An error occurred (InvalidVolume.NotFound) when calling the DescribeVolumes operation: The volume 'vol-0928650905a2491e2' does not exist.

Actually, that’s all.

Originally published at RTFM: Linux, DevOps, and system administration.

