Kubernetes for Generative AI Solutions
A complete guide to designing, optimizing, and deploying Generative AI workloads on Kubernetes
Master the complete GenAI project lifecycle on Kubernetes, from design and optimization to deployment. This practical guide covers everything from setting up K8s clusters to scaling GenAI workloads in production, including model optimization, GPU efficiency, observability, security, and cost management.
Latest Posts
Reducing LLM Cold-Start Times on Amazon EKS: A Benchmark of Eight Model Loading Strategies
Benchmarks eight strategies for loading LLM weights on EKS, comparing cold-start time, throughput, and cost.
January 15, 2024
Cross EKS Cluster Execution of Argo Workflows
Run the argo-workflow-controller in a hub EKS cluster and execute workflows in a spoke cluster across AWS accounts.
January 10, 2024
Configure MostAllocated Scheduler Strategy in Amazon EKS
Create a custom kube-scheduler with the MostAllocated scoring strategy for efficient node bin packing.

