HTTP Scaling with Kubernetes Gateway API
This guide demonstrates how to scale applications exposed through the Kubernetes Gateway API based on HTTP traffic. You'll deploy a sample application, configure the necessary Gateway API resources (Gateway, HTTPRoute), deploy a KEDA ScaledObject, and observe how Kedify automatically manages traffic routing for efficient load-based scaling, including scale-to-zero when there's no demand.
Architecture Overview
For applications exposed via the Gateway API, Kedify uses its autowiring feature. When the kedify-http scaler is used with Gateway API resources, traffic flows similarly to other ingress methods, with Kedify intercepting traffic for scaling purposes:
Gateway -> HTTPRoute -> kedify-proxy -> Service -> Deployment
The kedify-proxy intercepts traffic directed by the HTTPRoute, collects metrics based on the hosts and paths defined in the ScaledObject, and enables informed scaling decisions. When traffic increases, Kedify scales your application up; when traffic decreases, it scales down, even to zero if configured. Kedify automatically modifies the HTTPRoute backend references to point to the kedify-proxy.
Prerequisites
- A running Kubernetes cluster (local or cloud-based).
- The kubectl command-line utility installed and accessible.
- Your cluster connected in the Kedify Dashboard. If you do not have a connected cluster, you can find more information in the installation documentation.
- hey installed to send load to a web application.
Step 1: Install and Configure Envoy Gateway
Install Envoy Gateway:
helm upgrade --install eg oci://docker.io/envoyproxy/gateway-helm --version v1.1.0 -n envoy-gateway-system --create-namespace
Wait for the Envoy Gateway deployment to become available:
kubectl wait --for=condition=Available --namespace envoy-gateway-system deployment/envoy-gateway --timeout=5m
Create GatewayClass:
kubectl apply -f gateway-class.yaml
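As an optional sanity check, you can confirm that the Envoy Gateway controller has accepted the GatewayClass before moving on:

```shell
# The ACCEPTED column should show True once the Envoy Gateway
# controller has reconciled the GatewayClass.
kubectl get gatewayclass eg
```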
The GatewayClass YAML:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
Create Gateway:
kubectl apply -f gateway.yaml
The Gateway YAML:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: envoy-gateway-system
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
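Before deploying the application, you may want to verify that the Gateway has been programmed by the controller; this check uses the standard Gateway API `Programmed` condition:

```shell
# Wait until the Gateway is programmed, then show its assigned address.
kubectl wait --for=condition=Programmed gateway/eg \
  -n envoy-gateway-system --timeout=5m
kubectl get gateway eg -n envoy-gateway-system
```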
Step 2: Deploy Application and Gateway API Resources
Deploy the sample application, its Service, a Gateway, and an HTTPRoute to your cluster:
kubectl apply -f application.yaml
The combined application YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      containers:
        - name: application
          image: ghcr.io/kedify/sample-http-server:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RESPONSE_DELAY
              value: '0.3'
---
apiVersion: v1
kind: Service
metadata:
  name: application-service
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: http
  selector:
    app: application
  type: ClusterIP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: application-httproute
spec:
  parentRefs:
    - name: eg
      namespace: envoy-gateway-system
  hostnames:
    - 'application.keda' # The hostname for accessing the application
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: application-service # Forwards traffic to the application Service
          port: 8080
- Deployment: Defines the simple Go-based HTTP server application.
- Service: Provides internal routing to the application Pods.
- HTTPRoute: Defines routing rules. It attaches to the Envoy Gateway (eg) in the envoy-gateway-system namespace, matches requests for the host application.keda, and forwards traffic to the application-service on port 8080. Kedify will automatically update the backendRefs of this resource to point to the kedify-proxy when autowiring is active.
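To verify that the application rolled out, and later to observe autowiring in action, the following commands may help (a sketch; the resource names match the YAML above):

```shell
# Wait for the sample application to become ready.
kubectl rollout status deployment/application --timeout=2m

# Inspect the HTTPRoute backend: before the ScaledObject is applied this
# prints "application-service"; once autowiring is active it should point
# at the kedify-proxy service instead.
kubectl get httproute application-httproute \
  -o jsonpath='{.spec.rules[0].backendRefs[0].name}'
```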
Step 3: Apply ScaledObject to Autoscale
Now, apply the following ScaledObject to enable autoscaling based on HTTP traffic routed via the Gateway API:
kubectl apply -f scaledobject.yaml
The ScaledObject YAML:
kind: ScaledObject
apiVersion: keda.sh/v1alpha1
metadata:
  name: application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application
  cooldownPeriod: 5
  minReplicaCount: 0 # Enable scale-to-zero
  maxReplicaCount: 10
  fallback:
    failureThreshold: 2
    replicas: 1
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 5
  triggers:
    - type: kedify-http
      metadata:
        hosts: application.keda # Must match the hostname in HTTPRoute
        service: application-service # The backend service name
        port: '8080' # The backend service port
        scalingMetric: requestRate
        targetValue: '1000' # Target requests per second per replica
        granularity: 1s
        window: 10s
        trafficAutowire: httproute # Explicitly enable autowiring for HTTPRoute
- type (kedify-http): Specifies the Kedify HTTP scaler.
- metadata.hosts (application.keda): The hostname defined in the HTTPRoute to monitor for traffic.
- metadata.service (application-service): The Kubernetes Service associated with the application Deployment.
- metadata.port (8080): The port on the Service to monitor.
- metadata.scalingMetric (requestRate): The metric used for scaling decisions.
- metadata.targetValue (1000): Target request rate per replica. KEDA scales out when the per-replica rate exceeds this value.
- metadata.trafficAutowire (httproute): Explicitly enables Kedify's autowiring feature for HTTPRoute resources. Kedify will manage the backendRefs in the corresponding HTTPRoute to route traffic via the kedify-proxy.
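After applying the ScaledObject, you can check that the scaler is healthy and watch the replica count react to traffic; this is a quick check using standard kubectl commands:

```shell
# The READY and ACTIVE columns indicate whether the scaler is healthy
# and currently driving the HPA.
kubectl get scaledobject application

# Watch the Deployment scale between 0 and maxReplicaCount as load varies.
kubectl get deployment application --watch
```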
You should see the ScaledObject appear in the Kedify Dashboard:
Step 4: Test Autoscaling
First, let’s verify that the application is accessible through the Gateway:
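The commands below assume the Gateway's HTTP listener is reachable on localhost:9080. If your cluster does not expose it via a LoadBalancer, you can port-forward to the Envoy service that Envoy Gateway generates for the Gateway (the label selector below follows Envoy Gateway's conventions; verify the service name in your cluster):

```shell
# Find the Envoy service created for the "eg" Gateway and forward
# local port 9080 to its HTTP listener (port 80).
ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system \
  --selector=gateway.envoyproxy.io/owning-gateway-namespace=envoy-gateway-system,gateway.envoyproxy.io/owning-gateway-name=eg \
  -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n envoy-gateway-system "svc/${ENVOY_SERVICE}" 9080:80 &
```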
curl -I -H "Host: application.keda" http://localhost:9080
If everything is correctly configured, you should receive a successful HTTP response:
HTTP/1.1 200 OK
content-length: 320
content-type: text/html
date: Tue, 29 Apr 2025 08:55:40 GMT
x-keda-http-cold-start: true
x-envoy-upstream-service-time: 6104
server: envoy
Now, let's simulate higher load using hey:
hey -n 10000 -c 150 -host "application.keda" http://localhost:9080
After sending the load, you’ll see a response time histogram in the terminal:
Response time histogram:
  0.301 [1]    |
  0.310 [2746] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.319 [3499] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.327 [2694] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.336 [683]  |■■■■■■■■
  0.345 [99]   |■
  0.354 [20]   |
  0.362 [1]    |
  0.371 [23]   |
  0.380 [37]   |
  0.389 [80]   |■
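Once the load stops and the cooldownPeriod elapses, you can also watch the application scale back to zero; a simple way is to watch its Pods terminate:

```shell
# With minReplicaCount: 0 and a short cooldownPeriod, the application
# Pods should terminate shortly after traffic stops, confirming
# scale-to-zero.
kubectl get pods -l app=application --watch
```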
In the Kedify Dashboard, you can also observe the traffic load and resulting scaling:
Next steps
You can explore the complete documentation of the HTTP Scaler for more advanced configurations and details about its architecture and features, including autowiring for various ingress types.