
The Two-Container Pattern: Transformer + Predictor for ML Serving

Your ML model expects clean features. Your API receives raw data. The two-container pattern with KServe solves this with clear separation of concerns.

Your ML model expects clean features. Your API receives raw data. Where does the preprocessing live?

Every team gets this wrong the first time. They stuff everything into one container: data validation, feature engineering, ML inference, output formatting. It works. Until it doesn’t.

[Figure: the Transformer + Predictor pattern]


The Problem with One Container

Model retrained? Rebuild the whole container. Feature logic changed? Rebuild the whole container. Need to scale inference independently? Everything scales together. Or breaks together.

One change. Full redeploy. Every time.


The Fix: Two Containers

| Container | Responsibility | Owned By |
| --- | --- | --- |
| Transformer | Validate inputs, engineer features, format output | ML Engineer |
| Predictor | Load model, run inference, return raw scores | Data Scientist |

(Not the AI Transformer. This is a data transformer in KServe.)


The Full Flow

  1. Client sends raw request to Transformer
  2. Transformer validates and engineers features
  3. Clean features passed to Predictor
  4. Predictor runs inference, returns probabilities
  5. Transformer adds business labels, formats response
  6. Enriched response back to client

Round trip. Two containers. Neither knows the other’s internals.
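The round trip above can be sketched in plain Python. This is a toy illustration with hypothetical function names and made-up feature logic, not KServe code: in a real deployment, the Transformer's preprocess and postprocess hooks wrap an HTTP call to the Predictor container.

```python
def transformer_preprocess(raw_request: dict) -> dict:
    """Transformer, step 1: validate raw input and engineer features."""
    if "amount" not in raw_request:
        raise ValueError("missing field: amount")
    # Toy feature engineering: scale the raw amount into [0, 1].
    return {"instances": [[float(raw_request["amount"]) / 100.0]]}

def predictor_infer(features: dict) -> dict:
    """Stand-in for the Predictor container: run inference, return raw scores."""
    score = min(1.0, features["instances"][0][0])
    return {"predictions": [score]}

def transformer_postprocess(prediction: dict) -> dict:
    """Transformer, step 2: attach business labels before replying to the client."""
    score = prediction["predictions"][0]
    return {"score": score, "label": "fraud" if score > 0.5 else "ok"}

# Client -> Transformer -> Predictor -> Transformer -> client
response = transformer_postprocess(
    predictor_infer(transformer_preprocess({"amount": 42}))
)
```

Note that neither side sees the other's internals: the Predictor only ever receives clean feature tensors, and the Transformer only ever receives raw scores.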


Why This Changes Everything

  • Model retrained? Only the Predictor redeploys
  • Feature logic changed? Only the Transformer redeploys
  • Scale inference? Scale the Predictor alone
  • Debug preprocessing? Test the Transformer in isolation

Independent lifecycles. Independent scaling. Independent testing. One YAML file. Two containers. KServe handles routing automatically.
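In KServe, that one YAML file is the InferenceService manifest, which declares both containers side by side. A minimal sketch, assuming a scikit-learn model and a custom transformer image (the service name, image, and storage URI below are hypothetical):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-model          # hypothetical service name
spec:
  transformer:
    containers:
      - name: kserve-container
        image: example/feature-transformer:latest   # hypothetical image
  predictor:
    sklearn:
      storageUri: gs://example-bucket/fraud-model   # hypothetical model path
```

KServe routes client traffic to the transformer first, then to the predictor, with no routing code on your side.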


When to Use This Pattern

Not every model needs two containers. Here’s when the split makes sense:

| Scenario | Pattern |
| --- | --- |
| Simple model, minimal preprocessing | Single container is fine |
| Complex feature engineering | Two containers |
| Model and preprocessing change at different rates | Two containers |
| Need to scale inference independently | Two containers |
| Multiple models share the same preprocessing | Two containers (shared Transformer) |

Start simple. Split when the single container becomes a deployment bottleneck.


The DevOps Parallel

If you’ve built microservices, you already know this principle. The sidecar pattern in Kubernetes is the same idea: two containers in one pod, each with a clear responsibility.

Separation of concerns. Applied to ML serving.


This is Part 7 of the MLOps for DevOps Engineers series. Next: Scale-to-Zero for ML models.

For weekly updates, join the newsletter.

Kalyan Reddy Daida

Instructor with 383,000+ students across 21 courses on AWS, Azure, GCP, Terraform, Kubernetes & DevOps. Sharing real-world patterns from production environments.
