5 Questions to Ask Before Every ML Model Deployment
A data scientist hands you a model.pkl. Before deploying, ask these 5 production-readiness questions every DevOps engineer should know.
A data scientist hands you a model.pkl and says “deploy this.”
What do you ask?
Most engineers jump straight to containers and endpoints. But the questions that save you at 2 AM are the ones you ask before deployment, not during an incident.

The Checklist
| # | Question | Why It Matters |
|---|---|---|
| 1 | What input will break it? | Models return garbage confidently on bad input |
| 2 | What’s the rollback plan? | “Redeploy the old one” is not a plan |
| 3 | How do we know it’s broken? | ML models fail silently with HTTP 200 |
| 4 | What versions are pinned? | scikit-learn 1.3 vs 1.5 = model won’t load |
| 5 | Who gets paged at 2 AM? | Define ownership before production |
1. What Input Will Break It?
Missing fields? Nulls? Negative values where the model expects positive?
The dangerous thing about ML models: they don’t throw errors on bad input. They return predictions. Confident, wrong predictions. Your API returns HTTP 200 and the fraud detector happily approves a fraudulent transaction.
Fix: Validate inputs before they hit the model. Schema validation, range checks, null checks. The same input validation you’d put on any API endpoint.
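A minimal sketch of that gate, in plain Python. The payload fields (`amount`, `country`) are hypothetical, stand-ins for whatever your model actually expects:

```python
def validate(payload: dict) -> list[str]:
    """Return a list of validation errors; empty list means the payload
    is safe to pass to the model. Reject, don't coerce."""
    errors = []

    # Range + type check: a fraud model expecting positive amounts
    # will happily score a negative one if you let it through.
    amount = payload.get("amount")
    if amount is None:
        errors.append("amount is missing")
    elif not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")

    # Schema check on a categorical field.
    country = payload.get("country")
    if not isinstance(country, str) or len(country) != 2:
        errors.append("country must be a 2-letter code")

    return errors
```

In the request handler, return HTTP 400 with the error list instead of calling `model.predict` — the same pattern you would use for any API endpoint, just applied before inference.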
2. What’s the Rollback Plan?
“We’ll just redeploy the previous version” sounds reasonable until you’re doing it at 2 AM with production traffic flowing.
Fix: Keep the last working version warm and ready. Test the switch before you need it. With MLflow’s alias system, this is as simple as moving the @champion alias back to the previous version. Zero redeployment.
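To make the idea concrete, here is an illustrative sketch of alias-based rollback (not the MLflow API itself; in MLflow you would re-point the alias via the model registry). The serving layer resolves the alias at request time, so rollback is a pointer move, not a redeploy:

```python
# Toy registry illustrating the alias pattern. Real registries (e.g.
# MLflow's) persist this mapping server-side; the shape here is invented.
registry = {
    "versions": {1: "model_v1.pkl", 2: "model_v2.pkl"},
    "aliases": {"champion": 2},  # serving code always asks for @champion
}

def resolve(alias: str) -> str:
    """What the serving layer loads: alias -> version -> artifact."""
    return registry["versions"][registry["aliases"][alias]]

# v2 starts misbehaving at 2 AM: point champion back at v1.
registry["aliases"]["champion"] = 1
```

The switch is one write to the registry. Rehearse it on staging before you need it, so the 2 AM version is muscle memory.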
3. How Do We Know It’s Broken?
This is the one that gets people. Traditional monitoring says everything is fine: latency is low, error rate is zero, CPU is normal.
But the model is returning garbage predictions. ML models fail silently. HTTP 200, valid JSON, completely wrong answers.
Fix: Monitor what the model outputs, not just that it responds. Track prediction distributions, confidence scores, and input data drift. If the model suddenly starts predicting 95% “not fraud” when the historical baseline is 70%, something is wrong.
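A sketch of that baseline check, assuming a binary fraud model and an invented 70% "not fraud" baseline; the window size and tolerance are illustrative knobs, not recommendations:

```python
from collections import deque

class PredictionMonitor:
    """Compare the live rate of a given label against a historical
    baseline over a sliding window, and flag drift past a tolerance."""

    def __init__(self, baseline_rate: float, window: int = 1000,
                 tolerance: float = 0.15):
        self.baseline = baseline_rate
        self.tolerance = tolerance
        self.window = deque(maxlen=window)

    def record(self, label: str) -> None:
        self.window.append(label)

    def drifted(self) -> bool:
        # Don't alert until the window is full -- small samples are noisy.
        if len(self.window) < self.window.maxlen:
            return False
        rate = sum(1 for l in self.window if l == "not_fraud") / len(self.window)
        return abs(rate - self.baseline) > self.tolerance
```

Wire `record()` into the prediction path and scrape `drifted()` from your existing alerting stack; the point is that the signal comes from model outputs, not from HTTP status codes.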
4. What Versions Are Pinned?
scikit-learn 1.3 vs 1.5? Your model won’t even load. Same training code, different Python version? Different predictions. Same model, different numpy? Slightly different results.
Fix: Pin everything. Freeze the environment. Use Docker containers with exact dependency versions. No “latest” tags. No implicit upgrades. What you trained on is what you serve on.
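Pinning in a Dockerfile stops drift at build time; a cheap extra guard is to verify the serving environment at startup. A sketch, using only the standard library (the pinned versions shown are invented examples):

```python
from importlib import metadata

# Example pins -- in practice, generate these from the training
# environment (e.g. a frozen requirements file), don't hand-write them.
PINNED = {"scikit-learn": "1.5.2", "numpy": "2.1.3"}

def check_environment(pins: dict[str, str]) -> list[str]:
    """Return a list of mismatches between installed and pinned
    versions; empty list means the environment matches training."""
    mismatches = []
    for pkg, want in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            mismatches.append(f"{pkg}: not installed (want {want})")
            continue
        if have != want:
            mismatches.append(f"{pkg}: have {have}, want {want}")
    return mismatches
```

Call it before loading the pickle and refuse to start on any mismatch: failing fast at boot beats a model that loads under the wrong scikit-learn and quietly predicts differently.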
5. Who Gets Paged at 2 AM?
The data scientist built the model. You deployed it. The model starts drifting, predictions go bad.
Who owns it?
Fix: Define this before production. Create a shared runbook: if the model’s accuracy drops below X, who investigates? If the data pipeline is late, who gets paged? The answer might be different for infrastructure issues vs model quality issues.
ML models are just another artifact to operate. Same production questions. Different file type.
Key Takeaways
- Validate inputs before they reach the model
- Keep previous versions warm for instant rollback
- Monitor model outputs, not just service health
- Pin all dependencies in Docker containers
- Define ownership between data science and DevOps teams
These aren’t ML questions. They’re the same production readiness questions any DevOps engineer asks for any service. The artifact is different, the discipline is the same.
This is Part 3 of the MLOps for DevOps Engineers series. Next up: DVC for Data Version Control.
For weekly MLOps and DevOps tips, join the newsletter.