A/B Testing for ML Models: When Offline Metrics Lie

You retrained the model. Accuracy went up 2% on the test set. You deployed it. Revenue dropped 5%.

What happened? Offline metrics lie. A model that scores better on historical data can still perform worse with real users.


Canary vs A/B Testing

| Approach | Question It Answers | Traffic Split |
| --- | --- | --- |
| Canary | “Does it break anything?” | 10-20% to new model |
| A/B Testing | “Does it actually improve outcomes?” | 50/50 to both models |

You need both. Canary first, then A/B.


How to Split Traffic

KServe + Istio makes this simple:

  1. Deploy both models behind the same endpoint
  2. Set traffic split (50/50 for A/B)
  3. Tag requests so you can trace which model served which prediction
  4. Log everything: predictions, latency, and downstream outcomes

The split happens at the infrastructure level. Your application code doesn’t change.
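
As a rough sketch of steps 2 and 3, the snippet below builds an Istio VirtualService manifest as a plain Python dict, with a 50/50 weighted route and a response header that names the serving variant. The service hosts `model-a` and `model-b`, the endpoint host, the resource name, and the `x-model-variant` header are all assumptions made for illustration. KServe can also manage the split itself (for example through `canaryTrafficPercent` on an InferenceService), so treat this as one possible shape, not the required setup.

```python
import json


def build_ab_virtual_service(name, host, variants):
    """Build an Istio VirtualService dict that splits traffic by weight.

    `variants` maps a backing service host to its traffic weight in percent.
    """
    if sum(variants.values()) != 100:
        raise ValueError("traffic weights must sum to 100")
    routes = [
        {
            "destination": {"host": service},
            "weight": weight,
            # Tag each response so a prediction can be traced to its variant.
            "headers": {"response": {"set": {"x-model-variant": service}}},
        }
        for service, weight in variants.items()
    ]
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {"hosts": [host], "http": [{"route": routes}]},
    }


manifest = build_ab_virtual_service(
    name="recommender-ab",             # hypothetical resource name
    host="recommender.example.com",    # hypothetical endpoint host
    variants={"model-a": 50, "model-b": 50},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved and applied with `kubectl apply -f`. Changing the weights moves the same manifest between a canary split and a 50/50 A/B split.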


What to Measure

Technical metrics alone are not enough. You need business metrics.

| Type | Metrics |
| --- | --- |
| Technical | Accuracy, precision, recall, F1, latency |
| Business | Revenue per user, click-through rate, conversion rate, churn |
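
To make the two metric types concrete, here is a minimal sketch that summarizes both per variant from logged requests. The row layout (`variant`, `correct`, `converted`, `revenue`) is a hypothetical log schema, the rows are made-up illustration data, and revenue per request stands in for revenue per user.

```python
from collections import defaultdict

# Made-up log rows, one per request, joined with the outcome once it is known.
rows = [
    {"variant": "model-a", "correct": True, "converted": True, "revenue": 12.0},
    {"variant": "model-a", "correct": True, "converted": False, "revenue": 0.0},
    {"variant": "model-a", "correct": False, "converted": True, "revenue": 8.0},
    {"variant": "model-a", "correct": True, "converted": False, "revenue": 0.0},
    {"variant": "model-b", "correct": True, "converted": False, "revenue": 0.0},
    {"variant": "model-b", "correct": True, "converted": True, "revenue": 8.0},
    {"variant": "model-b", "correct": True, "converted": False, "revenue": 0.0},
    {"variant": "model-b", "correct": True, "converted": False, "revenue": 0.0},
]


def summarize(rows):
    """Return one technical and two business metrics for each variant."""
    totals = defaultdict(
        lambda: {"n": 0, "correct": 0, "converted": 0, "revenue": 0.0}
    )
    for row in rows:
        t = totals[row["variant"]]
        t["n"] += 1
        t["correct"] += row["correct"]      # bools count as 0 or 1
        t["converted"] += row["converted"]
        t["revenue"] += row["revenue"]
    return {
        variant: {
            "accuracy": t["correct"] / t["n"],
            "conversion_rate": t["converted"] / t["n"],
            "revenue_per_request": t["revenue"] / t["n"],
        }
        for variant, t in totals.items()
    }


summary = summarize(rows)
for variant, metrics in summary.items():
    print(variant, metrics)
```

In this toy data `model-b` has the higher accuracy and the lower conversion rate, which is the situation the article warns about.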

A model with 2% higher accuracy but 5% lower conversion rate is a worse model. Period.

Run the test long enough to reach statistical significance. Deciding too early is one of the most common A/B testing mistakes.
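
One common way to check significance for conversion rates is a two-proportion z-test. The sketch below uses only the standard library; the counts are made-up numbers, and the 0.05 threshold is a conventional choice, not something the article prescribes.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: model A converts 500 of 10,000, model B converts 450 of 10,000.
z, p_value = two_proportion_z_test(500, 10_000, 450, 10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("significant at 0.05" if p_value < 0.05 else "not significant yet, keep running")
```

With these counts the p-value lands around 0.1, so a drop from 5.0% to 4.5% is not yet conclusive at 10,000 requests per model. Stopping here and declaring a winner would be deciding too early.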


When A/B Testing Is Overkill

| Scenario | Use A/B? |
| --- | --- |
| Model directly impacts revenue | Yes |
| Enough traffic for significance in days | Yes |
| Internal batch predictions | No. Compare offline metrics |
| Low-traffic endpoints | No. Won’t reach significance |
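
Whether an endpoint has "enough traffic" can be estimated before the test starts. The sketch below applies the standard sample-size approximation for comparing two proportions and converts the result into days of traffic. The baseline rate, the lift worth detecting, the daily request counts, and the defaults (5% significance level, 80% power) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def required_sample_per_arm(p_base, p_new, alpha=0.05, power=0.8):
    """Approximate requests needed per model to detect p_base vs p_new."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_new) ** 2)


def days_to_collect(n_per_arm, daily_requests, share_per_arm=0.5):
    """Days of traffic needed when each model receives `share_per_arm`."""
    return math.ceil(n_per_arm / (daily_requests * share_per_arm))


# Assumed scenario: 5.0% baseline conversion, and a move to 5.5% is worth detecting.
n = required_sample_per_arm(0.05, 0.055)
print("requests per model:", n)
print("days at 100,000 requests/day:", days_to_collect(n, 100_000))
print("days at 2,000 requests/day:", days_to_collect(n, 2_000))
```

Under these assumptions the busy endpoint collects enough data in about a day, while the low-traffic one needs roughly a month, which is the gap between the "Yes" and "No" rows above.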

Start with canary. Graduate to A/B when the business impact justifies it.


This is Part 13 of the MLOps for DevOps Engineers series.

Kalyan Reddy Daida

Instructor with 383,000+ students across 21 courses on AWS, Azure, GCP, Terraform, Kubernetes & DevOps. Sharing real-world patterns from production environments.