<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GPU on StackSimplify | DevOps &amp; Cloud Education by Kalyan Reddy</title><link>https://stacksimplify.com/tags/gpu/</link><description>Recent content in GPU on StackSimplify | DevOps &amp; Cloud Education by Kalyan Reddy</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 17 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://stacksimplify.com/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>GPU Scheduling on Kubernetes: MIG, Time-Slicing, and Node Pools</title><link>https://stacksimplify.com/blog/gpu-scheduling-kubernetes-ml/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><guid>https://stacksimplify.com/blog/gpu-scheduling-kubernetes-ml/</guid><description>One NVIDIA A100 GPU costs about $3 per hour on AWS. Your inference pod uses 12% of it. The other 88% sits idle, billed, and wasted.
Kubernetes schedules GPUs as whole devices by default. One pod gets one GPU. No sharing. No slicing. Massive waste for inference workloads.
The Problem: One GPU, One Pod
A fraud detection model needs 2GB of GPU memory and runs a few requests per second. The node has an A100 with 40GB.</description></item></channel></rss>