Kubernetes CRD Versioning for Operator Developers

#Go   #Operator-sdk   #Kubernetes   #API  

Some guidelines for Kubernetes operator developers on how to version their CRDs

CRDs are similar to Kubernetes built-in types, and operator developers are expected to follow the same guidelines when versioning them. Kubernetes doesn’t enforce these guidelines, though.

CRD basics:

  • A CRD can define multiple versions of the custom resource.
  • Each version can be marked as served or not served.
  • Only one version can be marked as the storage version, which is the version in which CRs are ultimately stored in etcd.
  • If the schema differs across versions, a conversion webhook is needed to convert between them when necessary.
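
The served and storage flags above can be sketched in Go. This is only an illustration: CRDVersion and StorageVersion are hypothetical names, and the struct is a simplified stand-in for one entry of the CRD's version list, not the real apiextensions type.

```go
package main

import "fmt"

// CRDVersion is a simplified, hypothetical stand-in for one version
// entry of a CRD. It is not the real Kubernetes API type.
type CRDVersion struct {
	Name    string
	Served  bool // whether the API server serves this version
	Storage bool // whether CRs are persisted in etcd in this version
}

// StorageVersion returns the name of the single storage version, or an
// error if zero or more than one version is flagged as storage.
func StorageVersion(versions []CRDVersion) (string, error) {
	name := ""
	count := 0
	for _, v := range versions {
		if v.Storage {
			name = v.Name
			count++
		}
	}
	if count != 1 {
		return "", fmt.Errorf("expected exactly one storage version, found %d", count)
	}
	return name, nil
}

func main() {
	versions := []CRDVersion{
		{Name: "v1beta1", Served: true, Storage: false},
		{Name: "v1", Served: true, Storage: true},
	}
	name, err := StorageVersion(versions)
	fmt.Println(name, err)
}
```

Both versions are served here, but only v1 is stored, so a request for v1beta1 would have to be converted from the stored v1 object.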

K8s recommended approach for versioning

  • Don’t make backward-incompatible CRD API changes.
  • Adding a new required field or removing a field is a backward-incompatible change to the API. Doing so makes all the old versions immediately unusable.
  • If you still want to keep using the old versions, they need to be updated with the new field as an optional parameter.
  • This is because you must be able to round-trip between versions without losing data. As only one version is ultimately stored in etcd, it needs to be converted to other versions depending on the version the user requests.
  • Making an incompatible (breaking) change requires a version bump.
  • Don’t use semantic versioning (e.g. v1.2.3) for CRD versions.
  • A version bump isn’t always associated with an API schema change.
  • For example, v1alpha1, v1beta1 and v1 indicate stability levels, not necessarily backward-incompatible changes.
  • Other kinds of breaking changes are renaming a field or moving a field within the API schema.
  • Version bumps in these cases can be handled using conversion webhooks. The operator developer provides the webhook when the schema differs between versions. If there is no schema difference, Kubernetes handles the version conversion itself.
  • Kubernetes never removes a field from the schema. Instead, it just marks the field as deprecated.
  • Similarly, Kubernetes doesn’t add a required field. Instead, it adds an optional field with a sane default value.
  • For example, when a new field is added, the new version of the operator still behaves like the old version when the new field is not provided in the user input. This works much like a feature flag for new features in the operator.
  • This is why most built-in types never go beyond v1.
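
The "optional field with a sane default" idea can be sketched in Go. Everything here is hypothetical (WidgetSpec, Compression, the default "none"); it only shows how an operator could keep its old behavior when the new field is absent.

```go
package main

import "fmt"

// WidgetSpec is a hypothetical CR spec. Replicas existed from the
// start; Compression was added later as an optional field.
type WidgetSpec struct {
	Replicas    int
	Compression *string // nil means the user did not set the field
}

// defaultCompression is the assumed "sane default": it reproduces the
// behavior the operator had before the field existed.
const defaultCompression = "none"

// EffectiveCompression returns the value the operator should act on,
// falling back to the default when the optional field is unset.
func EffectiveCompression(spec WidgetSpec) string {
	if spec.Compression == nil {
		return defaultCompression
	}
	return *spec.Compression
}

func main() {
	oldStyle := WidgetSpec{Replicas: 3} // CR written before the field existed
	gzip := "gzip"
	newStyle := WidgetSpec{Replicas: 3, Compression: &gzip}

	fmt.Println(EffectiveCompression(oldStyle))
	fmt.Println(EffectiveCompression(newStyle))
}
```

Using a pointer lets the code tell "not set" apart from an explicitly provided value, which is what makes the field behave like a feature flag.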

CRD Examples

I looked at the CRDs of some operator-deployed apps. Many of them don’t have a conversion webhook. Either they don’t make any schema changes, or their authors don’t see the need for conversion webhooks when the schema changes across CRD versions. Even if the old versions are no longer actively served, conversion webhooks are still needed to convert old-version CRs already stored in etcd to the latest version.

Percona

crd: https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/crd.yaml

While the Percona operator strives to always make backward-compatible changes (all CRD versions are served), it doesn’t follow the Kubernetes approach. This is because:

  • Semantic versioning is used for the CRD versions.
  • Versions are bumped without any schema changes. I think they do it when some major features are added to the app the operator deploys. This is an anti-pattern.
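
A quick way to spot semantic-style version names is to check them against the Kubernetes naming convention (v1, v1beta1, v2alpha3 and so on). The sketch below is an assumption-laden illustration: the regular expression is my reading of that convention, and IsKubeStyleVersion is a hypothetical helper, not part of any Kubernetes library.

```go
package main

import (
	"fmt"
	"regexp"
)

// kubeVersion matches names such as v1, v1beta1 or v2alpha3.
// The pattern is an assumption based on the Kubernetes naming
// convention for API versions.
var kubeVersion = regexp.MustCompile(`^v[1-9][0-9]*((alpha|beta)[1-9][0-9]*)?$`)

// IsKubeStyleVersion reports whether name follows that convention.
func IsKubeStyleVersion(name string) bool {
	return kubeVersion.MatchString(name)
}

func main() {
	for _, name := range []string{"v1", "v1beta1", "v1.2.3", "v1-2-0"} {
		fmt.Println(name, IsKubeStyleVersion(name))
	}
}
```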

CertManager

crd: https://github.com/jetstack/cert-manager/blob/master/deploy/crds/crd-certificaterequests.yaml

This is one CRD where a conversion webhook is used. It looks like this is because of an API field being renamed from “csr” to “request”. Although this is a breaking API change, it is not a backward-incompatible one, so they continued to serve both versions and let the webhook handle the conversion. This operator follows the Kubernetes guidelines.
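
To make the rename case concrete, here is a minimal sketch of the kind of conversion such a webhook has to perform. The types and functions are hypothetical and heavily simplified; this is not cert-manager’s actual code, and a real webhook would also handle the ConversionReview request and response around it.

```go
package main

import "fmt"

// OldSpec is a hypothetical older version that calls the field CSR.
type OldSpec struct {
	CSR        string
	IssuerName string
}

// NewSpec is a hypothetical newer version that renames CSR to Request.
type NewSpec struct {
	Request    string
	IssuerName string
}

// ConvertOldToNew copies every field, mapping CSR to Request.
func ConvertOldToNew(in OldSpec) NewSpec {
	return NewSpec{Request: in.CSR, IssuerName: in.IssuerName}
}

// ConvertNewToOld is the inverse mapping, so that a round trip
// between the two versions loses no data.
func ConvertNewToOld(in NewSpec) OldSpec {
	return OldSpec{CSR: in.Request, IssuerName: in.IssuerName}
}

func main() {
	stored := OldSpec{CSR: "base64-encoded-csr", IssuerName: "ca-issuer"}
	roundTripped := ConvertNewToOld(ConvertOldToNew(stored))
	fmt.Println(roundTripped == stored)
}
```

Because a rename keeps all the information, both directions are lossless, which is why both versions can keep being served.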
