Blog

DevOps tutorials, Kubernetes guides, Terraform tips, cost optimization strategies, and cloud career advice from an instructor with 383K+ students.

EKS AWS

5 Things I Wish I Knew Before Running EKS in Production

Running Amazon EKS in a tutorial and running it in production are two very different experiences. After deploying a 5-microservice retail store application with real AWS services, here are the five lessons that would have saved me time, money, and plenty of late-night debugging sessions.

1. Cluster Autoscaler Doesn’t Consolidate Nodes

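Cluster Autoscaler scales down conservatively. It removes a node only when the node is empty, or when its requested resources fall below a utilization threshold and every pod on it can be rescheduled elsewhere. If a node is running a single tiny pod at 10% utilization and that pod can't be moved, the node stays, and you keep paying for it.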

Feb 26, 2026 · 4 min read
OpenTelemetry EKS

Building a Complete Observability Stack for EKS with OpenTelemetry and ADOT

Most Kubernetes observability setups are incomplete. Teams install Prometheus, wire up a few dashboards, and call it done. Then a production incident hits and they’re grepping through logs at 3 AM, trying to find a needle in a haystack.

The problem isn’t the tooling — it’s the approach. You need all three observability pillars working together: Traces, Logs, and Metrics. Here’s how I built a complete stack on EKS using AWS Distro for OpenTelemetry (ADOT).

Feb 26, 2026 · 4 min read
Spot Instances EKS

How to Handle Spot Instance Interruptions on EKS with Zero Downtime

“Spot instances are too risky for production.”

That’s the most common objection I hear from DevOps engineers. And it’s wrong. With the right architecture, you can run production workloads on Spot instances with 70% cost savings and zero downtime during interruptions. Here’s exactly how.

The Fear (and Why It’s Overblown)

The concern is legitimate on the surface: AWS can reclaim a Spot instance with just 2 minutes of notice. Without preparation, your pods get terminated, requests fail, and users see errors.

Feb 26, 2026 · 4 min read
Terraform AWS

5 Terraform Mistakes That Cost You Money on AWS

If you’ve been running Terraform on AWS for any length of time, chances are your infrastructure has a few hidden cost leaks. I’ve seen these patterns across hundreds of student projects and enterprise environments. Here are the five most common Terraform mistakes that silently drain your AWS budget — and how to fix each one.

1. Not Setting instance_type Defaults Wisely

Many engineers copy-paste t3.large or m5.xlarge from tutorials without right-sizing. In Terraform, you should use variables with sensible defaults:
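A minimal sketch of what that could look like. The variable names, the `t3.small` default, the allowed-size list, and the `aws_instance.app` resource are illustrative assumptions, not values from the original post. Choose the default from the utilization you actually observe.

```hcl
# Hypothetical input: the AMI to launch, supplied by the caller.
variable "ami_id" {
  description = "AMI ID for the application servers"
  type        = string
}

# Hypothetical default: start small and scale up only when metrics justify it.
variable "instance_type" {
  description = "EC2 instance type for the application servers"
  type        = string
  default     = "t3.small"

  # Reject oversized types copied from tutorials unless the list is widened on purpose.
  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "The instance_type value must be one of t3.micro, t3.small, or t3.medium."
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type
}
```

With this shape, an environment that needs a bigger instance has to override the variable explicitly (for example in a `.tfvars` file), so a larger size is a visible decision rather than a leftover from a tutorial.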

Feb 25, 2026 · 3 min read