Horizontally autoscale a Data Plane

Kong Gateway Operator can deploy data planes that will horizontally autoscale based on user defined criteria.

This guide shows how to autoscale data planes based on their average CPU utilization.

Before we begin

Kong Gateway Operator uses Kubernetes HorizontalPodAutoscaler to perform horizontal autoscaling of data planes.

In order to be able to use HorizontalPodAutoscaler in your clusters you’ll need to have a metrics server installed. More info on the metrics server can be found in official Kubernetes docs.

Create a DataPlane with horizontal autoscaling enabled

To enable horizontal autoscaling, you must specify the spec.deployment.scaling section in your DataPlane resource to indicate which metrics should be used for decision making.

In the example below autoscaling is triggered based on CPU utilization. The DataPlane resource can have between 2 and 10 replicas, and a new replica will be launched whenever CPU utilization is above 50%.

The scaleUp configuration states that either 100% of existing replicas, or 5 new pods (whichever is higher) may be launched every 10 seconds. If you have 3 replicas, 5 pods may be created. If you have 50 replicas, up to 50 more pods may be launched.

The scaleDown configuration states that 100% of pods may be removed (with a minReplicas value of 2).

echo '
apiVersion: gateway-operator.konghq.com/v1beta1
kind: DataPlane
metadata:
  name: horizontal-autoscaling
spec:
  deployment:
    scaling:
      horizontal:
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50
        behavior:
          scaleUp:
            stabilizationWindowSeconds: 1
            policies:
            - type: Percent
              value: 100
              periodSeconds: 10
            - type: Pods
              value: 5
              periodSeconds: 10
            selectPolicy: Max
          scaleDown:
            stabilizationWindowSeconds: 1
            policies:
            - type: Percent
              value: 100
              periodSeconds: 10
    podTemplateSpec:
      spec:
        containers:
        - name: proxy
          image: kong/kong-gateway:3.10.0.1
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "1024Mi"
              cpu: "1000m"
          # Add any Konnect-related configuration here: environment variables, volumes, and so on.
' | kubectl apply -f -

Please consult the CRD reference for all scaling options.

A DataPlane is created when the manifest above is applied. This creates 2 Pods running Kong Gateway, as well as a HorizontalPodAutoscaler which will manage the replica count of those Pods to ensure that the average CPU utilization is around 50%.

kubectl get hpa

The output will show the HorizontalPodAutoscaler resource:

NAME                     REFERENCE                                           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontal-autoscaling   Deployment/dataplane-horizontal-autoscaling-4q72p   2%/50%    2         10        2          30s

Test autoscaling with a load test

You can test if the autoscaling works by using a load testing tool (e.g. k6s) to generate traffic.

Fetch the DataPlane address and store it in the PROXY_IP variable:

 export PROXY_IP=$(kubectl get dataplanes.gateway-operator.konghq.com -o jsonpath='{.status.addresses[0].value}' horizontal-autoscaling)

Install k6s, then create a configuration file containing the following code:

 import http from "k6/http";
 import { check } from "k6";

 export const options = {
   insecureSkipTLSVerify: true,
   stages: [
     { duration: "120s", target: 5 },
   ],
 };

 // Simulated user behavior
 export default function () {
   let res = http.get(`https://${__ENV.PROXY_IP}`);
   check(res, { "status was 404": (r) => r.status == 404 });
 }

Start the load test.
```
k6 run k6.js
```

Observe the scaling events in the cluster while the test is running.

 kubectl get events --field-selector involvedObject.name=horizontal-autoscaling --field-selector involvedObject.kind=HorizontalPodAutoscaler

The output will show the scaling events:

 LAST SEEN   TYPE      REASON                         OBJECT                                           MESSAGE
 3m55s       Normal    SuccessfulRescale              horizontalpodautoscaler/horizontal-autoscaling   New size: 6; reason: cpu resource utilization (percentage of request) above target
 3m25s       Normal    SuccessfulRescale              horizontalpodautoscaler/horizontal-autoscaling   New size: 7; reason: cpu resource utilization (percentage of request) above target
 2m55s       Normal    SuccessfulRescale              horizontalpodautoscaler/horizontal-autoscaling   New size: 10; reason: cpu resource utilization (percentage of request) above target
 85s         Normal    SuccessfulRescale              horizontalpodautoscaler/horizontal-autoscaling   New size: 2; reason: All metrics below target

The DataPlane’s status field will also be updated with the number of ready/target replicas:

 kubectl get dataplanes.gateway-operator.konghq.com horizontal-autoscaling -o jsonpath-as-json='{.status}'

 [
     {
         ...
         "readyReplicas": 2,
         "replicas": 2,
         ...
     }
 ]