Kubernetes
ML Security on Kubernetes: 4 Layers Protecting Your Models
Your model endpoint has no auth. Anyone with the URL gets predictions. That is the default on most KServe deployments. Here are the 4 layers that fix it.
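As a taste of one such layer, an Istio AuthorizationPolicy can deny traffic to an inference endpoint unless the caller presents an allowed identity. A minimal sketch — the service name, namespace, and principal are all illustrative:

```yaml
# Hypothetical example: only the "frontend" service account in the "apps"
# namespace may call the sklearn-iris InferenceService. All other traffic
# is rejected by the sidecar before it reaches the model.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: sklearn-iris-authz        # illustrative name
  namespace: models               # illustrative namespace
spec:
  selector:
    matchLabels:
      serving.kserve.io/inferenceservice: sklearn-iris
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/apps/sa/frontend"]
```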
GPU Scheduling on Kubernetes: MIG, Time-Slicing, and Node Pools
One A100 GPU costs $3/hour. Your model uses 12% of it. Here is how GPU sharing on Kubernetes cuts ML infrastructure bills by 60% or more.
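For the time-slicing half of that story, the NVIDIA device plugin accepts a sharing config that advertises each physical GPU as multiple schedulable replicas. A sketch — the replica count of 4 is an assumption, tuned per workload:

```yaml
# NVIDIA k8s-device-plugin time-slicing config (illustrative):
# each physical GPU is exposed as 4 nvidia.com/gpu resources,
# so 4 low-utilization pods can share one card.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```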
Batch vs Real-Time ML Inference: 90% of Predictions Can Be Batch
Your model runs in real time. 90% of your predictions do not need to. Here is the decision framework and the cost math showing 99.5% savings.
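The cost math behind that claim can be sketched with illustrative numbers — a $3/hour GPU instance and a 6-minute nightly batch job are assumptions; your figures will differ:

```python
# Back-of-the-envelope comparison: an always-on real-time endpoint
# vs. a short daily batch job on the same $3/hour GPU instance.
HOURLY_RATE = 3.00               # USD per hour (assumed)
HOURS_PER_MONTH = 24 * 30

# Real-time: the endpoint runs around the clock.
realtime_cost = HOURLY_RATE * HOURS_PER_MONTH      # $2160/month

# Batch: one 6-minute scoring job per day.
batch_hours = (6 / 60) * 30                        # 3 hours/month
batch_cost = HOURLY_RATE * batch_hours             # $9/month

savings = 1 - batch_cost / realtime_cost
print(f"batch saves {savings:.1%}")                # ~99.6%
```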
5 Levels of ML Model Deployment on Kubernetes
From baked Docker images to explainable AI. Each level adds production capabilities. Here is the progression every DevOps engineer should know.
Canary Deployments for ML Models with KServe and Istio
You do canary deployments for APIs. Why not for ML models? Here is how KServe and Istio split traffic between champion and candidate models.
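In KServe terms, the split is a single field on the InferenceService. A sketch, assuming a hypothetical model name and storage path:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector            # illustrative name
spec:
  predictor:
    # 10% of traffic goes to this new revision (the candidate);
    # the previous revision (the champion) keeps the other 90%.
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/fraud/v2   # illustrative path
```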
DevOps Thinking Applied to MLOps: 5 Essential Tools
You already know 80% of MLOps. Here are 5 open-source tools that map directly to your existing DevOps skills.
ML Cost Optimization: One YAML Field Cut Our Bill by 80%
We changed minReplicas from 1 to 0. Infrastructure cost dropped 80%. Here is how the Knative Pod Autoscaler (KPA), scale-to-zero, and panic mode work for ML inference.
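The one-field change, roughly — service name and model details are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: churn-model               # illustrative name
spec:
  predictor:
    # Was 1. Setting 0 lets Knative scale the predictor to zero
    # pods when no requests arrive, so idle time costs nothing.
    minReplicas: 0
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/churn   # illustrative path
```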
ML Pipeline Orchestration with Kubeflow on Kubernetes
Your ML team has 47 Jupyter notebooks. 12 should run in order. Nobody remembers which 12. Kubeflow Pipelines fixes this on your existing K8s cluster.
Scale-to-Zero for ML Models: Stop Paying for Idle Compute
Your ML model runs 24/7. Inference requests come 2% of the time. KServe plus Knative scales to zero when idle. Here is how.
The Two-Container Pattern: Transformer + Predictor for ML Serving
Your ML model expects clean features. Your API receives raw data. The two-container pattern with KServe solves this with clear separation of concerns.
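Sketched as a KServe InferenceService, the two containers are two sibling specs — the image name and storage URI here are assumptions:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: recommender               # illustrative name
spec:
  transformer:                    # container 1: raw request -> clean features
    containers:
      - name: kserve-container
        image: example.registry.io/feature-transformer:latest  # illustrative image
  predictor:                      # container 2: clean features -> prediction
    model:
      modelFormat:
        name: xgboost
      storageUri: gs://example-bucket/models/recommender       # illustrative path
```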
5 Things I Wish I Knew Before Running EKS in Production
Hard-won lessons from running Amazon EKS in production — from Karpenter node consolidation to OpenTelemetry observability and real AWS database integrations.