The Two-Container Pattern: Transformer + Predictor for ML Serving

Your ML model expects clean features. Your API receives raw data. Where does the preprocessing live?

Every team gets this wrong the first time. They stuff everything into one container: data validation, feature engineering, ML inference, output formatting. It works. Until it doesn’t.

The Problem with One Container

Model retrained? Rebuild the whole container. Feature logic changed? Rebuild the whole container. Need to scale inference independently? Everything scales together. Or breaks together.

One change. Full redeploy. Every time.

The Fix: Two Containers

Container	Responsibility	Owned By
Transformer	Validate inputs, engineer features, format output	ML Engineer
Predictor	Load model, run inference, return raw scores	Data Scientist

(Not the AI Transformer. This is a data transformer in KServe.)

The Full Flow

Client sends raw request to Transformer
Transformer validates and engineers features
Clean features passed to Predictor
Predictor runs inference, returns probabilities
Transformer adds business labels, formats response
Enriched response back to client

Round trip. Two containers. Neither knows the other’s internals.

Why This Changes Everything

Model retrained? Only the Predictor redeploys
Feature logic changed? Only the Transformer redeploys
Scale inference? Scale the Predictor alone
Debug preprocessing? Test the Transformer in isolation

Independent lifecycles. Independent scaling. Independent testing. One YAML file. Two containers. KServe handles routing automatically.

When to Use This Pattern

Not every model needs two containers. Here’s when the split makes sense:

Scenario	Pattern
Simple model, minimal preprocessing	Single container is fine
Complex feature engineering	Two containers
Model and preprocessing change at different rates	Two containers
Need to scale inference independently	Two containers
Multiple models share the same preprocessing	Two containers (shared Transformer)

Start simple. Split when the single container becomes a deployment bottleneck.

The DevOps Parallel

If you’ve built microservices, you already know this principle. The sidecar pattern in Kubernetes is the same idea: two containers in one pod, each with a clear responsibility.

Separation of concerns. Applied to ML serving.

This is Part 7 of the MLOps for DevOps Engineers series. Next: Scale-to-Zero for ML models.

For weekly updates, join the newsletter.

The Two-Container Pattern: Transformer + Predictor for ML Serving

The Problem with One Container

The Fix: Two Containers

The Full Flow

Why This Changes Everything

When to Use This Pattern

The DevOps Parallel

Related Articles

Multi-Model Serving on Kubernetes: 50 Models, One Cluster

5 Levels of ML Model Deployment on Kubernetes

Canary Deployments for ML Models with KServe and Istio

Enjoyed this? Get more in your inbox.

Wait! Don't miss out.

Ultimate DevOps Real-World Project Implementation on AWS

The Problem with One Container

The Fix: Two Containers

The Full Flow

Why This Changes Everything

When to Use This Pattern

The DevOps Parallel

Related Articles

Multi-Model Serving on Kubernetes: 50 Models, One Cluster

5 Levels of ML Model Deployment on Kubernetes

Canary Deployments for ML Models with KServe and Istio

Enjoyed this? Get more in your inbox.

Wait! Don't miss out.