Cloud-Native Microservices With Kubernetes - 2nd Edition

A Comprehensive Guide to Building, Scaling, Deploying, Observing, and Managing Highly-Available Microservices in Kubernetes

Autoscaling Microservices in Kubernetes: Horizontal Autoscaling

Horizontal Pod Autoscaler in Practice

averageUtilization: Scaling Based on Relative Resource Usage

To configure how horizontal autoscaling works in our cluster, we need to create a HorizontalPodAutoscaler resource.

For example, to create an HPA that scales our stateless Flask API based on CPU usage, we can use the following manifest:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stateless-flask-hpa
  namespace: stateless-flask
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stateless-flask
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Scale when CPU usage exceeds 90% of the requested CPU
        averageUtilization: 90
EOF

Here is an explanation of the fields used in the manifest above:

  • scaleTargetRef: Specifies the target resource to scale, in this case, the stateless-flask Deployment.
  • minReplicas: The minimum number of replicas that the HPA will maintain.
  • maxReplicas: The maximum number of replicas that the HPA can scale up to.
  • metrics: Defines the metrics to monitor for scaling. In our case, we are telling K8s to scale based on CPU usage:
    • If the average CPU usage across all Pods exceeds 90% of the requested CPU, the HPA will increase the number of replicas.
    • If the average CPU usage falls below 90%, the HPA will decrease the number of replicas, but not below the minimum specified.

The goal of the HPA here is to keep the average CPU usage of the Pods around 90% of the requested CPU.

In other words, averageUtilization: 90 tells Kubernetes to keep the mean of (currentCPU / requestedCPU) across the target Pods near 90%.
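Behind the scenes, the HPA controller computes the desired replica count with the following formula; the numbers in the example below are hypothetical, purely for illustration:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

# Example: 2 replicas running at an average of 135% of their requested CPU,
# with a target of 90%:
# desiredReplicas = ceil(2 * 135 / 90) = ceil(3.0) = 3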

You can see the HPA by running the following command:

kubectl get hpa -n stateless-flask

You should see output similar to the following:

NAME                  REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
stateless-flask-hpa   Deployment/stateless-flask    0%/90%    1         10        1          2m

If you instead see <unknown>/90% in the TARGETS column, it could mean different things:

  • The Metrics Server is not installed.
  • The Metrics Server is not running properly.
  • The Metrics Server is not able to collect and serve metrics.
  • etc.
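A quick way to narrow this down is to check that the Metrics Server Deployment is healthy and that raw metrics are actually being served. The commands below assume the Metrics Server is installed in the kube-system namespace, which is where most installations place it:

# Check that the Metrics Server Deployment exists and is ready
kubectl get deployment metrics-server -n kube-system

# If metrics are flowing, this prints the CPU and memory usage of each Pod
kubectl top pods -n stateless-flask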

But the most important thing, and one that is often forgotten, is that the Pods must have resource requests defined for CPU (and/or memory) for the HPA to work. When we defined the HPA, we specified that we want to scale based on CPU usage, and that usage is measured relative to the requested CPU. If no request is set, the HPA has nothing to compare the current usage against and cannot compute a utilization percentage.

Let's set up the target Deployment with resource requests and limits:

kubectl apply -f - <<EOF
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: stateless-flask
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-flask
  namespace: stateless-flask
spec:
  # replicas: 1 # No need to specify replicas, HPA will manage it
  selector:
    matchLabels:
      app: stateless-flask
  template:
    metadata:
      labels:
        app: stateless-flask
    spec:
      containers:
      - name: stateless-flask
        image: eon01/stateless-flask:v0
        imagePullPolicy: Never
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
---
# ClusterIP Service
apiVersion: v1
kind: Service
metadata:
  name: stateless-flask
  namespace: stateless-flask
spec:
  selector:
    app: stateless-flask
  ports:
  - protocol: TCP
    port: 5000
    targetPort: 5000
  type: ClusterIP
EOF
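Before rechecking the HPA, you can confirm that the updated Pod template, which now carries the resource requests, has finished rolling out:

kubectl rollout status deployment/stateless-flask -n stateless-flask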

Recheck the HPA and you should see the CPU target properly populated:

kubectl get hpa -n stateless-flask
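For more detail than this one-line summary, describe the HPA. The output includes the current metric values, the scaling conditions (such as AbleToScale and ScalingActive), and recent scaling events:

kubectl describe hpa stateless-flask-hpa -n stateless-flask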

To verify that the HPA behaves as expected, we can put the API under load and watch how it reacts. However, the target CPU usage is set to 90% of the requested CPU (which is only 100m), so we either need to generate a lot of load or temporarily lower the target to something like 1% for testing purposes.

Run the following commands:

# Lower the target to 1%
kubectl patch hpa stateless-flask-hpa -n stateless-flask \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/metrics/0/resource/target/averageUtilization","value":1}]'

# Generate load
kubectl run -i --tty load-generator-$(date +%s) -n stateless-flask \
  --rm --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c \
  "while sleep 0.01; do wget -q -O- http://stateless-flask:5000/tasks; done"

After a short while, run the following command to watch the HPA in action:

kubectl get hpa -n stateless-flask -w
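In a separate terminal, you can also watch the Deployment itself to see the replica count grow as the HPA reacts to the load:

kubectl get deployment stateless-flask -n stateless-flask -w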

It's worth noting that HPA won’t scale below minReplicas or above maxReplicas. It also uses stabilization windows (default ~5 min for scale down) and tolerances to avoid flapping, so adjustments aren’t instantaneous. By default, metrics are checked every 15 seconds, but this can be configured.

We can also scale based on memory usage. Here's an example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stateless-flask-hpa-memory
  namespace: stateless-flask
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stateless-flask
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90

Or based on both CPU and memory usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stateless-flask-hpa-cpu-memory
  namespace: stateless-flask
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stateless-flask
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90

When multiple metrics are specified, the HPA computes a desired replica count for each metric separately and scales to the highest of them.
