Deploying Stateful Microservices: Persisting Data in Kubernetes
Persisting PostgreSQL Data
Kubernetes Volumes and the USB Drive Analogy
Containers are ephemeral by nature. When a PostgreSQL instance runs inside a container, it stores its data in the container's filesystem, and more specifically in the container's writable layer. If the container is deleted or recreated, everything in that writable layer is lost. To avoid such data loss, we need to use Kubernetes Volumes to persist the data.
Think of Volumes as plugging an external USB drive into your computer (Pod) to store your important files (PostgreSQL data, in this case).
All processes running on your computer (containers in the Pod) can read and write files to that USB drive (Volume).
In our example, PostgreSQL will store its data in the /var/lib/postgresql/data/postgres directory; therefore, our USB drive (Volume) needs to be mounted to that directory inside the computer (Pod).
If your computer (Pod) crashes, reboots, gets replaced, or even loses power, the files on the USB drive (Volume) remain intact and can be accessed again when you plug it back in. The PostgreSQL process (container) can continue to read and write files to the USB drive (Volume) as if nothing happened.
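To make the analogy concrete, here is a minimal sketch of a Pod that mounts a Volume at the PostgreSQL data directory. The names (`postgres-data`, `postgres-pvc`) and the `postgres:16` image tag are illustrative placeholders; the claim that backs the Volume is explained in the next section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        # PGDATA tells PostgreSQL to keep its data in a
        # subdirectory of the mounted Volume
        - name: PGDATA
          value: /var/lib/postgresql/data/postgres
      volumeMounts:
        # Plug the "USB drive" (Volume) into the data directory
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumes:
    # The Volume itself, backed by a PersistentVolumeClaim
    - name: postgres-data
      persistentVolumeClaim:
        claimName: postgres-pvc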
VolumeClaims: The Abstraction Layer for Storage in Kubernetes
To manage external resources, Kubernetes always uses an abstraction layer. For example, when you want to create an external LoadBalancer, you don't create it directly; instead, you create a Service of type LoadBalancer, and Kubernetes takes care of creating the actual LoadBalancer in your cloud provider. The same thing applies to storage.
When you want to use external storage (a Volume) in Kubernetes, you don't create it directly; instead, you create a PersistentVolumeClaim (PVC), an abstraction layer that lets you request storage resources without needing to know the details of how the storage is provided.
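As a sketch, a claim for our PostgreSQL data could look like the following; the name `postgres-pvc` and the `10Gi` size are illustrative assumptions, not requirements:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  # ReadWriteOnce: mountable read-write by a single node,
  # which is typical for block storage and enough for one PostgreSQL Pod
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # How much storage we are requesting
      storage: 10Gi
  # No storageClassName is set here, so the cluster's
  # default StorageClass is used (more on that below)
```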
StorageClass: Defining the Type of Storage
There are multiple types of storage that can be used in Kubernetes, such as local storage, network-attached storage (NAS), cloud storage, etc. Each type has its own characteristics in terms of performance, cost, and trade-offs, and therefore its own use cases. To manage these different types of storage, Kubernetes uses the StorageClass resource. This resource is usually created by the cluster administrator to define the different types of storage available in the cluster.
If you are running an on-premises cluster, you may need to create the StorageClass yourself based on the storage solution you are using. In this case, you need to set up a CSI-compliant storage driver for your storage solution. CSI (Container Storage Interface) is a standard for exposing storage systems to containerized workloads on Kubernetes. Examples of technologies that are CSI-compliant include Rook, OpenEBS, Longhorn, Ceph, etc. The cluster administrator can then create a StorageClass that uses the driver to provision storage dynamically.
In most cloud providers, the StorageClass is created by default when you create a Kubernetes cluster. You can check the available StorageClasses in your cluster with the following command:
```bash
kubectl get storageclass
```
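On a DigitalOcean cluster, for example, the output looks roughly like this (columns abridged):

```
NAME                         PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION
do-block-storage (default)   dobs.csi.digitalocean.com   Delete          Immediate           true
```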
This is, for example, how DigitalOcean defines its default StorageClass (run kubectl get storageclass do-block-storage -o yaml to see the full definition):
```yaml
# The version of the StorageClass API
apiVersion: storage.k8s.io/v1
# The kind of resource
kind: StorageClass
# Whether the volume can be expanded after creation
allowVolumeExpansion: true
# The provisioner that will be used to create the volume
# dobs.csi.digitalocean.com is the CSI driver for DigitalOcean Block Storage
provisioner: dobs.csi.digitalocean.com
# The reclaim policy for the volume
# "Delete" means that the volume will be deleted when the claim is deleted
reclaimPolicy: Delete
# The binding mode for the volume
# "Immediate" means that the volume is provisioned and bound
# as soon as the claim is created
volumeBindingMode: Immediate
metadata:
  annotations:
    # Annotation indicating that this is the default storage class
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    c3.doks.digitalocean.com/component: csi-controller-service
    c3.doks.digitalocean.com/plane: data
    doks.digitalocean.com/managed: "true"
  # The name of the StorageClass
  name: do-block-storage
```
The default StorageClass in DigitalOcean is named do-block-storage, which uses the CSI driver dobs.csi.digitalocean.com to provision DigitalOcean Block Storage volumes dynamically. If you were using another cloud provider, the StorageClass and the provisioner would be different. Here are some examples of common cloud providers and their respective provisioners:
| Cloud / Platform | Example provisioner |
|---|---|
| AWS EBS | ebs.csi.aws.com |
| Google Cloud Persistent Disk | pd.csi.storage.gke.io |
| Azure Disk | disk.csi.azure.com |
| OpenEBS | openebs.io/local |
| NFS | nfs.csi.k8s.io |
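If you want a claim to use a specific class instead of the cluster default, set `storageClassName` in the PVC spec. Here is a sketch assuming the DigitalOcean class shown above; the claim name and size are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  # Explicitly select a StorageClass instead of relying on the default
  storageClassName: do-block-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```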
You can check the documentation of your cloud provider or storage solution to find out which provisioner to use. Usually, the provisioner is provided by the CSI driver installed in your cluster. Most popular CSI drivers are open source and maintained by the community in public repositories.
Cloud-Native Microservices With Kubernetes - 2nd Edition
A Comprehensive Guide to Building, Scaling, Deploying, Observing, and Managing Highly-Available Microservices in Kubernetes
