Mastering Kubernetes Autoscaling for AI and Real-Time Traffic

Speaker: Zbynek Roubalik
Event: DevOpsCon Munich 2025

December 02, 2025

Autoscaling Kubernetes for real-time traffic, complex workloads, and AI/LLM applications introduces unique performance and scalability challenges. This session focuses on practical methods to scale efficiently for latency-sensitive scenarios and resource-intensive AI-driven tasks.

We’ll explore Kubernetes Event-Driven Autoscaling (KEDA) strategies for dynamic real-time traffic, custom metrics for AI workloads, and considerations for managing complex services. We’ll also cover the trade-offs involved and the pitfalls to avoid, and illustrate best practices through real-world examples.

What You’ll Learn

Real-time traffic autoscaling: Strategies for handling dynamic, latency-sensitive workloads
AI/LLM autoscaling patterns: Custom metrics and approaches for resource-intensive AI applications
KEDA for complex services: Advanced event-driven scaling techniques for sophisticated workloads
Performance optimization: Balancing responsiveness with resource efficiency
Trade-offs and pitfalls: Common mistakes and how to avoid them in production environments
Cost control strategies: Optimizing autoscaling for both performance and budget constraints
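
As a rough illustration of the kind of arithmetic behind metric-driven scaling, the sketch below computes a replica count from a total metric value (for example, queue length or in-flight requests) and a per-replica target, following the ceil(total / target) shape of the Kubernetes Horizontal Pod Autoscaler calculation for average-value metrics. The function name, parameter names, and default bounds are hypothetical and chosen for illustration only. The sketch deliberately ignores tolerance, stabilization windows, and scaling policies, all of which affect real behavior.

```python
import math


def desired_replicas(total_metric: float,
                     target_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Return a replica count for an average-value style metric.

    Hypothetical helper: computes ceil(total / target) and clamps the
    result to [min_replicas, max_replicas]. The default bounds are
    illustrative assumptions, not values taken from any real config.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(total_metric / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 250 pending items at a target of 20 per replica needs 13 replicas,
    # but the assumed ceiling of 10 caps the result.
    print(desired_replicas(250, 20))
    # With a higher ceiling the unclamped value is returned.
    print(desired_replicas(250, 20, max_replicas=50))
    # No load and a floor of zero allows scaling to zero.
    print(desired_replicas(0, 20))
```

The clamping step is where the balance between responsiveness and cost shows up: a higher ceiling absorbs bursts at greater expense, while a floor of zero saves money at the price of cold-start latency.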

Join us for an insightful look at building robust autoscaling strategies optimized for real-time responsiveness, AI efficiency, and cost control in Kubernetes.
