MLOps for DevOps Engineers
A 25-part series bridging DevOps skills to MLOps. Same mindset, different artifacts.
DevOps Thinking Applied to MLOps: 5 Essential Tools
You already know 80% of MLOps. Here are 5 open-source tools that map directly to your existing DevOps skills.
MLflow in 60 Seconds: The Complete ML Model Lifecycle
From training to production in 5 steps. How MLflow tracks experiments, versions models, and enables instant rollbacks with zero code changes.
5 Questions to Ask Before Every ML Model Deployment
A data scientist hands you a model.pkl. Before deploying it, ask these 5 production-readiness questions every DevOps engineer should know.
DVC: Git for Your ML Training Data
You version code with Git. DVC does the same for ML training data. Here is your weekend starter guide to data version control.
5 Levels of ML Model Deployment on Kubernetes
From baked Docker images to explainable AI. Each level adds production capabilities. Here is the progression every DevOps engineer should know.
Canary Deployments for ML Models with KServe and Istio
You do canary deployments for APIs. Why not for ML models? Here is how KServe and Istio split traffic between champion and candidate models.
The Two-Container Pattern: Transformer + Predictor for ML Serving
Your ML model expects clean features. Your API receives raw data. The two-container pattern with KServe solves this with clear separation of concerns.
Scale-to-Zero for ML Models: Stop Paying for Idle Compute
Your ML model runs 24/7. Inference requests arrive only 2% of the time. KServe plus Knative scales it to zero when idle. Here is how.
SHAP Explainability: Why Your ML Model Flagged That Transaction
GDPR can require meaningful information about the logic behind automated decisions. SHAP values show how much each feature contributed to a given prediction. Here is how KServe serves explanations.
ML Model Monitoring: Your Grafana Dashboard Is Lying to You
Your model shows 10% CPU usage, zero errors, and a healthy pod status, yet it still returns garbage predictions. Here are the 3 alerts you need today.
Data Drift Detection: When Your Model Stops Being Right
Your model was trained on last year's data. The world moved on. Here are the 3 types of drift and how to detect them with Evidently AI.
ML Retraining Pipelines: From Drift Alert to Production Model
Your drift detector triggered. Now what? Here is the retraining pipeline every MLOps team needs, with quality gates to prevent deploying garbage.
A/B Testing for ML Models: When Offline Metrics Lie
You retrained the model. Accuracy went up 2% on the test set. Revenue dropped 5%. Here is why you need A/B testing for ML models.
ML Pipeline Orchestration with Kubeflow on Kubernetes
Your ML team has 47 Jupyter notebooks. 12 should run in order. Nobody remembers which 12. Kubeflow Pipelines fixes this on your existing K8s cluster.
Feature Stores: The Package Registry for ML Features
Your training pipeline computes 'average amount' as a 30-day mean. Your API computes it as a 7-day mean. Same name, different values. Feature stores fix this.
ML Governance: The Champion-Challenger Pattern for Model Deployment
Your ML serving code should never know version numbers. The champion-challenger pattern with MLflow aliases gives instant rollback.
CI/CD for ML: Same GitHub Actions, Different Artifact
Your CI/CD pipeline deploys code. Ours deploys models. Same tools: GitHub Actions, ArgoCD, Docker, DVC, MLflow. Here is the 7-job ML pipeline.
ML Cost Optimization: One YAML Field Cut Our Bill by 80%
We changed minReplicas from 1 to 0. Infrastructure cost dropped 80%. Here is how the Knative Pod Autoscaler (KPA), scale-to-zero, and panic mode work for ML inference.
Coming Soon
The remaining 7 parts are being written. Stay tuned.
Get notified when new parts drop
Weekly DevOps & Cloud insights from a 383K+ Udemy instructor