HTTP Scaling with Kubernetes Gateway API

This guide demonstrates how to scale applications exposed through the Kubernetes Gateway API based on HTTP traffic. You’ll deploy a sample application, configure the necessary Gateway API resources (Gateway, HTTPRoute), deploy a KEDA ScaledObject, and observe how Kedify automatically manages traffic routing for efficient load-based scaling—including scale-to-zero when there’s no demand.

Architecture Overview

For applications exposed via the Gateway API, Kedify utilizes its autowiring feature. When using the kedify-http scaler with Gateway API resources, traffic flows similarly to other ingress methods, with Kedify intercepting traffic for scaling purposes:

Gateway -> HTTPRoute -> kedify-proxy -> Service -> Deployment

The kedify-proxy intercepts traffic directed by the HTTPRoute, collects metrics based on hosts and paths defined in the ScaledObject, and enables informed scaling decisions. When traffic increases, Kedify scales your application up; when traffic decreases, it scales down—even to zero if configured. Kedify automatically modifies the HTTPRoute backend references to point to the kedify-proxy.

Prerequisites

  • A running Kubernetes cluster (local or cloud-based).
  • The kubectl command-line utility installed and configured to access the cluster.
  • Your cluster connected to the Kedify Dashboard.
  • The hey load-testing tool installed for sending traffic to the application.

Step 1: Install and Configure Envoy Gateway

Install Envoy Gateway:

Terminal window
helm upgrade --install eg oci://docker.io/envoyproxy/gateway-helm --version v1.1.0 -n envoy-gateway-system --create-namespace

Wait for Envoy Gateway to become available:

Terminal window
kubectl wait --for=condition=Available --namespace envoy-gateway-system deployment/envoy-gateway --timeout=5m

Create GatewayClass:

Terminal window
kubectl apply -f gateway-class.yaml

The GatewayClass YAML:

gateway-class.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller

Create Gateway:

Terminal window
kubectl apply -f gateway.yaml

The Gateway YAML:

gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All

Step 2: Deploy Application and Gateway API Resources

Deploy the sample application, its Service, and an HTTPRoute to your cluster (the Gateway itself was created in Step 1):

Terminal window
kubectl apply -f application.yaml

The combined application YAML:

application.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      containers:
        - name: application
          image: ghcr.io/kedify/sample-http-server:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RESPONSE_DELAY
              value: '0.3'
---
apiVersion: v1
kind: Service
metadata:
  name: application-service
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: http
  selector:
    app: application
  type: ClusterIP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: application-httproute
spec:
  parentRefs:
    - name: eg
      namespace: envoy-gateway-system
  hostnames:
    - 'application.keda' # The hostname for accessing the application
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: application-service # Forwards traffic to the application Service
          port: 8080

  • Deployment: Defines the simple Go-based HTTP server application.
  • Service: Provides internal routing to the application Pods.
  • HTTPRoute: Defines routing rules. It attaches to the Envoy Gateway (eg) in the envoy-gateway-system namespace, matches requests for the host application.keda, and forwards traffic to the application-service on port 8080. Kedify will automatically update the backendRefs of this resource to point to the kedify-proxy when autowiring is active.
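For illustration, an autowired HTTPRoute's rules might look like the fragment below. The proxy Service name (kedify-proxy) and port shown here are assumptions about the proxy deployment, so inspect the live resource with kubectl get httproute application-httproute -o yaml to see the actual values Kedify wrote:

```yaml
# Hypothetical backendRefs after Kedify autowiring (name and port are assumptions):
rules:
  - matches:
      - path:
          type: PathPrefix
          value: /
    backendRefs:
      - name: kedify-proxy # traffic now flows through the proxy...
        port: 8080         # ...which forwards it on to application-service
```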

Step 3: Apply ScaledObject to Autoscale

Now, apply the following ScaledObject to enable autoscaling based on HTTP traffic routed via the Gateway API:

Terminal window
kubectl apply -f scaledobject.yaml

The ScaledObject YAML:

scaledobject.yaml
kind: ScaledObject
apiVersion: keda.sh/v1alpha1
metadata:
  name: application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application
  cooldownPeriod: 5
  minReplicaCount: 0 # Enable scale-to-zero
  maxReplicaCount: 10
  fallback:
    failureThreshold: 2
    replicas: 1
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 5
  triggers:
    - type: kedify-http
      metadata:
        hosts: application.keda # Must match the hostname in HTTPRoute
        service: application-service # The backend service name
        port: '8080' # The backend service port
        scalingMetric: requestRate
        targetValue: '1000' # Target requests per second per replica
        granularity: 1s
        window: 10s
        trafficAutowire: httproute # Explicitly enable autowiring for HTTPRoute

  • type (kedify-http): Specifies the Kedify HTTP scaler.
  • metadata.hosts (application.keda): The hostname defined in the HTTPRoute to monitor for traffic.
  • metadata.service (application-service): The Kubernetes Service associated with the application deployment.
  • metadata.port (8080): The port on the service to monitor.
  • metadata.scalingMetric (requestRate): The metric used for scaling decisions.
  • metadata.targetValue (1000): Target request rate. KEDA scales out when the rate per replica exceeds this value.
  • metadata.trafficAutowire (httproute): This explicitly enables Kedify’s autowiring feature for HTTPRoute resources. Kedify will manage the backendRefs in the corresponding HTTPRoute to route traffic via the kedify-proxy.
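To build intuition for how these parameters interact, here is a rough, illustrative sketch (not Kedify's actual implementation) of turning a stream of request timestamps into a windowed request rate and a desired replica count, mirroring granularity: 1s, window: 10s, and targetValue: 1000 from the ScaledObject above:

```python
import math

def windowed_request_rate(timestamps, now, window_s=10, granularity_s=1):
    """Average requests/second over the trailing window, bucketed by granularity."""
    buckets = [0] * int(window_s / granularity_s)
    for t in timestamps:
        age = now - t
        if 0 <= age < window_s:
            buckets[int(age // granularity_s)] += 1
    return sum(buckets) / window_s

def desired_replicas(rate, target_value=1000, min_replicas=0, max_replicas=10):
    """HPA-style sizing: enough replicas so each sees at most target_value req/s."""
    if rate == 0:
        return min_replicas  # scale to zero when there is no traffic
    return min(max_replicas, max(1, math.ceil(rate / target_value)))

# 2,500 requests in each of the last 10 seconds -> 2500 req/s on average
ts = [sec + 0.5 for sec in range(10) for _ in range(2500)]
rate = windowed_request_rate(ts, now=10)
print(rate, desired_replicas(rate))  # 2500 req/s -> ceil(2500/1000) = 3 replicas
```

With no requests in the window the rate is 0 and the sketch returns minReplicaCount, which is how scale-to-zero falls out of the same sizing rule.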

You should see the ScaledObject appear in the Kedify Dashboard:

Kedify Dashboard With ScaledObject

Step 4: Test Autoscaling

First, let’s verify that the application is accessible through the Gateway. The commands below assume the Gateway’s HTTP listener is reachable at localhost:9080 (for example, via a local cluster’s port mapping or a port-forward to the Envoy service):

Terminal window
curl -I -H "Host: application.keda" http://localhost:9080

If everything is correctly configured, you should receive a successful HTTP response:

Terminal window
HTTP/1.1 200 OK
content-length: 320
content-type: text/html
date: Tue, 29 Apr 2025 08:55:40 GMT
x-keda-http-cold-start: true
x-envoy-upstream-service-time: 6104
server: envoy

Now, let’s simulate higher load using hey:

Terminal window
hey -n 10000 -c 150 -host "application.keda" http://localhost:9080

After sending the load, you’ll see a response time histogram in the terminal:

Terminal window
Response time histogram:
0.301 [1] |
0.310 [2746] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.319 [3499] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.327 [2694] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.336 [683] |■■■■■■■■
0.345 [99] |
0.354 [20] |
0.362 [1] |
0.371 [23] |
0.380 [37] |
0.389 [80] |
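As a rough back-of-envelope estimate (assuming the configured 0.3 s RESPONSE_DELAY dominates each request's latency), the load offered by this hey run works out to about 500 requests per second:

```python
# Back-of-envelope offered load from the hey run (values from this guide):
concurrency = 150        # hey -c 150
response_time_s = 0.3    # RESPONSE_DELAY configured on the sample app
approx_total_rps = concurrency * (1 / response_time_s)
print(round(approx_total_rps))  # roughly 500 requests/s offered to the app
```

This window-averaged request rate is the quantity the requestRate trigger compares against targetValue per replica when sizing the deployment.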

In the Kedify Dashboard, you can also observe the traffic load and resulting scaling:

Kedify Dashboard ScaledObject Detail

Next steps

You can explore the complete documentation of the HTTP Scaler for more advanced configurations and details about its architecture and features, including autowiring for various ingress types.