<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GPU on StackSimplify | DevOps &amp; Cloud Education by Kalyan Reddy</title><link>https://stacksimplify.com/tags/gpu/</link><description>Recent content in GPU on StackSimplify | DevOps &amp; Cloud Education by Kalyan Reddy</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 17 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://stacksimplify.com/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>GPU Scheduling on Kubernetes: MIG, Time-Slicing, and Node Pools</title><link>https://stacksimplify.com/blog/gpu-scheduling-kubernetes-ml/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><guid>https://stacksimplify.com/blog/gpu-scheduling-kubernetes-ml/</guid><description>One NVIDIA A100 GPU costs about $3 per hour on AWS. Your inference pod uses 12% of it. The other 88% sits idle, billed, and wasted.
Kubernetes schedules GPUs as whole devices by default. One pod gets one GPU. No sharing. No slicing. Massive waste for inference workloads.
The Problem: One GPU, One Pod
A fraud detection model needs 2GB of GPU memory and runs a few requests per second. The node has an A100 with 40GB.</description></item></channel></rss>