Establish a Kong Gateway performance benchmark

While Kong Gateway is optimized out of the box, tuning some of its configuration options can substantially increase performance in certain situations. To establish a performance baseline, run an initial benchmark of Kong Gateway, optimize the kong.conf file using the recommendations in this guide, and then run several additional benchmark tests.

This guide explains the following:

How to establish an initial Kong Gateway performance benchmark
How to optimize Kong Gateway performance before performing additional benchmarks
How to configure your kong.conf for benchmarking

Prerequisites

Before you conduct a benchmark test, make sure the testbed is configured correctly. Follow these general recommendations before you begin:

Use fewer, larger Kong Gateway nodes, each with 4 or 8 NGINX workers and a corresponding CPU allocation, rather than many smaller Kong Gateway nodes.
Run Kong Gateway in DB-less or hybrid mode. In these modes, Kong Gateway’s proxy nodes aren’t connected to a database, which can become another variable that might affect performance.

Perform a baseline Kong Gateway performance benchmark

Once you have implemented the recommendations in the prerequisites, you can begin the benchmark test:

Configure a route with a Request Termination plugin and measure Kong Gateway’s performance. In this case, Kong Gateway responds to the request and doesn’t send any traffic to the upstream server.
Run this test a few times to spot unexpected bottlenecks. Either Kong Gateway, the benchmarking client (such as k6 or Apache JMeter), or some other component will likely be an unexpected bottleneck. You should not expect higher performance from Kong Gateway until you solve these bottlenecks. Proceed to the next step only after this baseline performance is acceptable to you.
Once you have established the baseline, configure a route to send traffic to the upstream server without any plugins. This measures Kong Gateway’s proxy and your upstream server’s performance.
Verify that no components are unexpectedly causing a bottleneck before proceeding.
Run the benchmark multiple times to gain confidence in the data. Ensure that the difference between observations isn’t high (there’s a low standard deviation).
Discard the stats collected by the benchmark’s first one or two iterations. We recommend doing this to ensure that the system is operating at an optimal and stable level.
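The last two steps can be sketched in Python. This is a minimal sketch, assuming each benchmark iteration yields a single requests-per-second figure. The function name summarize_runs, the default of two warm-up runs, and the sample numbers are hypothetical and only illustrate the idea.

```python
import statistics


def summarize_runs(rps_per_run, warmup_runs=2):
    """Drop warm-up iterations, then summarize the remaining runs.

    rps_per_run: requests-per-second figure from each benchmark iteration,
    in the order the iterations ran (hypothetical input format).
    warmup_runs: number of leading iterations to discard.
    """
    steady = rps_per_run[warmup_runs:]
    if len(steady) < 2:
        raise ValueError("need at least two runs after discarding warm-up")
    mean = statistics.mean(steady)
    stdev = statistics.stdev(steady)  # sample standard deviation
    return {
        "runs": len(steady),
        "mean_rps": mean,
        "stdev_rps": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(summarize_runs([8200, 9100, 10050, 9980, 10020, 10010]))
```

A small coefficient of variation across the remaining runs suggests that the observations are stable enough to compare against later benchmarks. What counts as small depends on your environment.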

Only after the previous steps are completed should you proceed with benchmarking Kong Gateway with additional configuration. Carefully read the optimization recommendations in the following sections and make any changes to the configuration as needed before performing additional benchmarks.

Optimize Kong Gateway performance

The following subsections detail recommendations to improve your Kong Gateway performance for additional benchmark tests. Read each one carefully and make any necessary adjustments to your configuration file.

Check the `ulimit`

Action: Increase the ulimit if it’s less than 16384.

Explanation: While Kong Gateway can use as many resources as it can get from the system, the operating system (OS) limits the number of connections Kong Gateway can open with the upstream (or any other) server, or that it can accept from the client. The number of open connections in Kong Gateway defaults to the ulimit with an upper bound of 16384. This means that if the ulimit is unlimited or is a value higher than 16384, Kong Gateway limits itself to 16384.

You can shell into Kong Gateway’s container or VM and run ulimit -n to check the system’s ulimit. If Kong Gateway is running inside a container on top of a VM, you must shell into the container. If the value of ulimit is less than 16384, increase it. Also check and set the appropriate ulimit in the client and upstream server, since a connection bottleneck in these systems leads to suboptimal performance.
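As a rough illustration, the following Python sketch reads the same soft limit that ulimit -n reports and applies the rule described above (the ulimit, capped at 16384). The function name effective_connection_limit is hypothetical, and the code mirrors this guide's description rather than Kong Gateway's actual implementation. It relies on the standard library resource module, which is only available on Unix-like systems.

```python
import resource  # Unix-only standard library module

KONG_CONNECTION_CAP = 16384  # upper bound described in this guide


def effective_connection_limit(soft_limit):
    """Apply the rule described above: use the ulimit, capped at 16384.

    This mirrors the guide's description only; it is not Kong Gateway's
    actual implementation.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return KONG_CONNECTION_CAP
    return min(soft_limit, KONG_CONNECTION_CAP)


if __name__ == "__main__":
    # Soft limit on open file descriptors, the value `ulimit -n` reports.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    limit = effective_connection_limit(soft_limit)
    print(f"effective connection limit: {limit}")
    if limit < KONG_CONNECTION_CAP:
        print("ulimit is below 16384; consider increasing it")
```

Run the script in the same container or VM as Kong Gateway, and again on the client and upstream hosts, since each has its own limit.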

Increase connection reuse

Action: Configure upstream_keepalive_max_requests = 100000 and nginx_http_keepalive_requests = 100000.

Explanation: In high-throughput scenarios with 10,000 or more RPS, the overhead of setting up TCP and TLS connections, or an insufficient number of connections, can result in underutilization of network bandwidth or the upstream server. To increase connection reuse, you can increase upstream_keepalive_max_requests and nginx_http_keepalive_requests to 100000, or all the way up to 500000.

Avoid auto-scaling

Action: Ensure that Kong Gateway is not scaled in/out (horizontal) or up/down (vertical).

Explanation: During a benchmarking run, ensure that Kong Gateway is not scaled in/out (horizontal) or up/down (vertical). In Kubernetes, this is commonly done using a Horizontal or Vertical Pod autoscaler. Autoscalers interfere with statistics in a benchmark and introduce unnecessary noise.

Scale Kong Gateway out before running the benchmark to avoid auto-scaling during the benchmark. Monitor the number of Kong Gateway nodes to ensure that no new nodes are spawned during the benchmark and that existing nodes are not replaced.

Use multiple cores effectively

Action: On most VM setups, set nginx_worker_processes to auto. On Kubernetes, set nginx_worker_processes to one or two less than the worker node CPUs.

Explanation: Make sure nginx_worker_processes is configured correctly:

On most VM setups, set this to auto. This is the default setting. This ensures that NGINX spawns one worker process for each CPU core, which is desired.
We recommend setting this explicitly in Kubernetes. Ensure CPU requests and limits for Kong Gateway match the number of workers configured in Kong Gateway. For example, if you configure nginx_worker_processes=4, you must request 4 CPUs in your pod spec.

If you run Kong Gateway pods on Kubernetes worker nodes with n CPUs, allocate n-2 or n-1 to Kong Gateway, and configure a worker process count equal to this number. This ensures that any configured daemons and Kubernetes processes, like kubelet, don’t contend for resources with Kong Gateway.

Each additional worker uses additional memory, so you must ensure that Kong Gateway isn’t triggering the Linux Out-of-Memory Killer.
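The Kubernetes sizing rule above is simple arithmetic, sketched below in Python. The function name kubernetes_worker_plan and the dictionary keys cpu_request and cpu_limit are hypothetical labels for illustration, not Kubernetes or Kong Gateway field names.

```python
def kubernetes_worker_plan(node_cpus, reserved_cpus=1):
    """Return a worker count and matching CPU allocation for a Kong Gateway pod.

    node_cpus: CPUs on the Kubernetes worker node (n in the text above).
    reserved_cpus: CPUs left for kubelet and other daemons; the text
    suggests 1 or 2.
    """
    if reserved_cpus not in (1, 2):
        raise ValueError("reserve 1 or 2 CPUs for node daemons")
    workers = node_cpus - reserved_cpus
    if workers < 1:
        raise ValueError("node is too small to leave CPUs for Kong Gateway")
    # CPU requests and limits should match the number of workers.
    return {
        "nginx_worker_processes": workers,
        "cpu_request": workers,
        "cpu_limit": workers,
    }


if __name__ == "__main__":
    # An 8-CPU node with 2 CPUs reserved leaves 6 workers.
    print(kubernetes_worker_plan(8, reserved_cpus=2))
```

The sketch does not cover memory. Size memory separately so that the additional workers don't trigger the Out-of-Memory Killer.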

Resource contention

Action: Make sure the client (like Apache JMeter or k6), Kong Gateway, and upstream servers are on different machines (VM or bare metal) and run on the same local network with low latencies.

Explanation:

Ensure that the client (like Apache JMeter or k6), Kong Gateway, and the upstream servers run on different machines (VM or bare-metal). If these are all running in a Kubernetes cluster, ensure that the pods for these three systems are scheduled on dedicated nodes. Resource contention (usually CPU and network) between these can lead to suboptimal performance of any system.
Ensure the client, Kong Gateway, and upstream servers run on the same local network with low latencies. If requests between the client and Kong Gateway or Kong Gateway and the upstream server traverse the internet, then the results will contain unnecessary noise.

Upstream servers maxing out

Action: Verify that the upstream server isn’t maxing out.

Explanation: You can verify that the upstream server isn’t maxing out by checking the CPU and memory usage of the upstream server. If you deploy additional Kong Gateway nodes and the throughput or error rate remains the same, the upstream server or a system other than Kong Gateway is likely the bottleneck.

You must also ensure that upstream servers are not autoscaled.

Client maxing out

Action: The client must use keep-alive connections.

Explanation: Sometimes, the clients (such as k6 and Apache JMeter) max themselves out. To tune them, you need to understand the client. Increasing the CPU, threads, and connections on clients results in higher resource utilization and throughput.

The client must also use keep-alive connections. For example, k6 and the HttpClient4 implementation in Apache JMeter both enable keep-alive by default. Verify that this is set up appropriately for your test setup.

Custom plugins

Action: Ensure custom plugins aren’t interfering with performance.

Explanation: Custom plugins can sometimes cause issues with performance. First, you should determine if custom plugins are the source of the performance issues. You can do this by measuring three configuration variations:

Measure Kong Gateway’s performance without enabling any plugins. This provides a baseline for Kong Gateway’s performance.
Enable necessary bundled plugins (plugins that come with the product), and then measure Kong Gateway’s performance.
Next, enable custom plugins (in addition to bundled plugins), and then measure Kong Gateway’s performance once again.

If Kong Gateway’s baseline performance is poor, then it’s likely that either Kong Gateway’s configuration needs tuning or external factors are affecting it. For external factors, see the other sections in this guide. A large difference between the performance in the second and third steps indicates that performance problems could be due to custom plugins.
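One way to compare the three measurements is to express each step as a percentage drop from the previous one. The following Python sketch assumes you recorded one throughput figure (in requests per second) for each variation. The function name plugin_overhead and the sample numbers are hypothetical.

```python
def plugin_overhead(baseline_rps, bundled_rps, custom_rps):
    """Compare throughput across the three variations described above.

    baseline_rps: throughput with no plugins enabled.
    bundled_rps: throughput with the necessary bundled plugins enabled.
    custom_rps: throughput with bundled and custom plugins enabled.
    Returns the percentage drop introduced at each step.
    """
    if baseline_rps <= 0 or bundled_rps <= 0:
        raise ValueError("throughput figures must be positive")
    return {
        # Drop from step one to step two: cost of the bundled plugins.
        "bundled_drop_pct": 100 * (baseline_rps - bundled_rps) / baseline_rps,
        # Drop from step two to step three: cost of the custom plugins.
        "custom_drop_pct": 100 * (bundled_rps - custom_rps) / bundled_rps,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(plugin_overhead(10000, 9000, 4500))
```

A custom_drop_pct that is much larger than bundled_drop_pct points to the custom plugins as the likely source of the slowdown.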

Cloud-provider performance issues

Action: Ensure you aren’t using burstable instances or hitting bandwidth, TCP connection per unit time, or PPS limits.

Explanation: While AWS is mentioned in the following, the same recommendations apply to most cloud providers:

Ensure that you are not using burstable instances, like T type instances, in AWS. In this case, the CPU available to applications is variable, which leads to noise in the stats. For more information, see the Burstable performance instances AWS documentation.
Ensure you are not hitting bandwidth limits, TCP connections per unit time limits, or packets per second (PPS) limits. For more information, see the Amazon EC2 instance network bandwidth AWS documentation.

Configuration changes during benchmark tests

Action: Don’t change the Kong Gateway configuration during a benchmark test.

Explanation: If you change the configuration during a test, Kong Gateway’s tail latencies can increase sharply. Avoid doing this unless you are measuring Kong Gateway’s performance under a configuration change.

Large request and response bodies

Action: Keep request bodies below 8 KB and response bodies below 32 KB.

Explanation: Most benchmarking setups generally consist of an HTTP request with a small HTTP body and a corresponding HTTP response with a JSON or HTML response body. A request body of less than 8 KB and a response body of less than 32 KB is considered small. If your request or response bodies are larger, Kong Gateway will buffer the request and response using the disk, which significantly impacts Kong Gateway’s performance.

Bottlenecks in third-party systems

Explanation: More often than not, the bottlenecks in Kong Gateway are caused by bottlenecks in third-party systems used by Kong Gateway. The following sections explain common third-party bottlenecks and how to fix them.

Redis

Action: If any enabled plugin uses Redis, check whether Redis is limited by CPU. If it is, scale Redis vertically by giving it an additional CPU.

Explanation: If any enabled plugin uses Redis, ensure Redis is not a bottleneck. The CPU is generally what limits Redis, so check CPU usage first. If the CPU is saturated, scale Redis vertically by giving it an additional CPU.

DNS client

Action: Migrate to the new DNS client.

Explanation: The new DNS client is designed to be more performant than the old one, so migrating will improve performance. For more information, see the migration docs.

DNS TTL

Action: Increase <dns|resolver>_stale_ttl to 300 or up to 86400.

Explanation: DNS servers can bottleneck Kong Gateway since Kong Gateway depends on DNS to determine where to send the request.

In Kubernetes, DNS TTLs are 5 seconds long, so Kong Gateway has to re-resolve names frequently, which can cause problems.

You can increase dns_stale_ttl or resolver_stale_ttl, depending on the DNS client you are using, to 300 or up to 86400 to rule out DNS as the issue.

If DNS servers are the root cause, you will see the coredns pods bottlenecked on CPU.

Blocking I/O for access logs

Action: Disable access logs for high throughput benchmarking tests by setting the proxy_access_log configuration parameter to off.

Explanation: Kong Gateway and the underlying NGINX are programmed for non-blocking network I/O and they avoid blocking disk I/O as much as possible. However, access logs are enabled by default, and if the disk powering a Kong Gateway node is slow for any reason, it can result in performance loss. Disable access logs for high throughput benchmarking tests by setting the proxy_access_log configuration parameter to off.

Internal errors in Kong Gateway

Action: Make sure that there are no errors in Kong Gateway’s error log.

Explanation: Check Kong Gateway’s error log for internal errors. Internal errors can highlight issues within Kong Gateway or a third-party system that Kong Gateway relies on to proxy traffic.

Example kong.conf for benchmarking

The following kong.conf file examples contain all the recommended parameters from the previous sections:

If applying configuration by directly editing kong.conf, use the following:

# For a Kubernetes setup, change nginx_worker_processes to a number matching the CPU limit. We recommend 4 or 8.
nginx_worker_processes=auto

upstream_keepalive_max_requests=100000
nginx_http_keepalive_requests=100000

proxy_access_log=off

dns_stale_ttl=3600

If applying configuration through environment variables, use the following:

# For a Kubernetes setup, change nginx_worker_processes to a number matching the CPU limit. We recommend 4 or 8.
KONG_NGINX_WORKER_PROCESSES="auto"
KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS="100000"
KONG_NGINX_HTTP_KEEPALIVE_REQUESTS="100000"

KONG_PROXY_ACCESS_LOG="off"

KONG_DNS_STALE_TTL="3600"

If applying configuration through a Helm chart, use the following:

# The value of 1 for nginx_worker_processes is a placeholder.
# Change nginx_worker_processes to a number matching the CPU limit. We recommend 4 or 8.
# Allocate the same amount of CPU and enough memory to avoid the OOM killer.
env:
  nginx_worker_processes: "1"
  upstream_keepalive_max_requests: "100000"
  nginx_http_keepalive_requests: "100000"
  proxy_access_log: "off"
  dns_stale_ttl: "3600"

resources:
  requests:
    cpu: 1
    memory: "2Gi"

Next steps

Now that you’ve optimized the performance of Kong Gateway, you can perform additional benchmarks. Always measure, make some changes, and measure again. Maintain a log of changes so that you can work out the next steps when you get stuck, or backtrack and try another approach.

More information

Performance testing benchmark results: See Kong’s performance testing benchmark results for the current version and learn how to use Kong’s test suite to conduct your own performance tests.