🎬 The Situation
It’s Monday.
Your production system is running smoothly.
- 10,000 users active
- Payments processing
- APIs responding
- Monitoring green
Now management says:
“Upgrade the infrastructure today.”
You need to:
- Upgrade EC2 instance type (t2 → t3)
- Apply OS security patches
- Update launch template
- Improve scaling configuration
- Harden security groups
And the scary part?
👉 You cannot bring the system down.
This is where professional infrastructure strategy begins.
❌ What Juniors Do
- Modify production EC2
- Restart instances
- Update AMI directly
- Hope nothing breaks
Result?
- Downtime
- Panic
- Rollback chaos
✅ What Architects Do
They never touch production directly.
They create a parallel world.
Welcome to:
🔵🟢 Blue/Green Infrastructure Deployment
🏗 The Concept
| Environment | Meaning |
|---|---|
| 🔵 Blue | Current production |
| 🟢 Green | New upgraded infrastructure |
Instead of upgrading live infra:
- Clone entire infrastructure
- Upgrade in isolation
- Test completely
- Shift traffic gradually
- Remove old infra later
No downtime.
No panic.
No surprises.
🌍 Visual Architecture
🟦 Phase 1 – Blue (Current Production)
Users
↓
Route53
↓
Blue ALB
↓
Blue EC2 / ASG / Cluster
Traffic:
| Environment | Traffic |
|---|---|
| Blue | 100% |
| Green | 0% |
Everything stable
🟩 Phase 2 – Build Green Infrastructure (The Smart Way)
Now we build a completely new infrastructure:
- New EC2 instances (upgraded type)
- New Auto Scaling Group
- New Launch Template
- New Target Group
- New ALB
- Same ACM certificate
- Same domain
Think of Green as:
“Production 2.0”
Important rule:
⚠ Never reuse old infrastructure
⚠ Never modify Blue
⚠ Green must be independent
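The rules above can be sketched as a small data exercise: describe Blue once, then derive Green as an independent copy with only the upgraded pieces changed. This is a minimal illustration, not a provisioning tool. The `BLUE` dictionary, its field names, the AMI IDs, the certificate ARN, and the `derive_green` helper are all hypothetical placeholders.

```python
import copy

# Hypothetical description of the Blue stack. Every name and value here is
# an illustrative placeholder, not taken from a real account.
BLUE = {
    "name": "blue",
    "launch_template": {"instance_type": "t2.medium", "ami_id": "ami-blue-placeholder"},
    "asg": {"min_size": 2, "max_size": 6},
    "certificate_arn": "arn:aws:acm:region:account:certificate/placeholder",
    "domain": "app.example.com",
}


def derive_green(blue, instance_type, ami_id):
    """Return a new Green spec; the Blue spec is left untouched."""
    green = copy.deepcopy(blue)  # deep copy so nested dicts are not shared
    green["name"] = "green"
    green["launch_template"]["instance_type"] = instance_type
    green["launch_template"]["ami_id"] = ami_id
    return green


GREEN = derive_green(BLUE, instance_type="t3.medium", ami_id="ami-patched-placeholder")
```

The deep copy is the point of the sketch: Green shares the certificate and domain values with Blue, but no mutable object, so changing Green can never alter Blue's definition.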
🔐 Why Use the Same Certificate?
Because:
- Domain remains the same
- Users should not see certificate change
- No trust warning
- Seamless switch
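Attach the same ACM certificate to the Green ALB. ACM certificates are regional, so this works as long as Green runs in the same region as Blue.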
🧪 Phase 3 – Deep Testing
Before touching traffic, test Green internally.
Check:
- Health checks passing?
- CPU stable?
- Memory normal?
- Logs clean?
- Auto scaling working?
Green must be 100% healthy before traffic shift.
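One way to turn "Green must be 100% healthy" into a hard gate is a small check over the target group's health states. The sketch below assumes input shaped like the `TargetHealthDescriptions` list returned by the ELBv2 DescribeTargetHealth API. The `green_is_ready` helper, the `min_healthy` parameter, and the instance IDs are assumptions made for illustration.

```python
def green_is_ready(target_health_descriptions, min_healthy=1):
    """Return True only if enough targets exist and every one is 'healthy'.

    Any other state (for example 'initial', 'unhealthy', 'draining')
    blocks the traffic shift.
    """
    states = [d["TargetHealth"]["State"] for d in target_health_descriptions]
    return len(states) >= min_healthy and all(s == "healthy" for s in states)


# Sample data with made-up instance IDs: one target is still warming up.
warming_up = [
    {"Target": {"Id": "i-0aaa", "Port": 80}, "TargetHealth": {"State": "healthy"}},
    {"Target": {"Id": "i-0bbb", "Port": 80}, "TargetHealth": {"State": "initial"}},
]
all_healthy = [
    {"Target": {"Id": "i-0aaa", "Port": 80}, "TargetHealth": {"State": "healthy"}},
    {"Target": {"Id": "i-0bbb", "Port": 80}, "TargetHealth": {"State": "healthy"}},
]

if __name__ == "__main__":
    print("warming up ->", green_is_ready(warming_up, min_healthy=2))
    print("all healthy ->", green_is_ready(all_healthy, min_healthy=2))
```

This covers only the health-check item. CPU, memory, logs, and scaling behavior still need their own checks before Green receives traffic.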
⚖ Phase 4 – Controlled Traffic Switching (Route53 Weighted Routing)
Now comes the magic.
Route53 allows weighted routing.
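Instead of an instant switch, we control the percentage of traffic each environment receives. Each record's share is its weight divided by the sum of all weights.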
Initial State
| Environment | Weight |
|---|---|
| Blue | 100 |
| Green | 0 |
No change yet.
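As a rough sketch of what a weight change looks like in practice, the function below builds a change batch with one weighted alias record per environment, in the shape the Route 53 ChangeResourceRecordSets API expects. The record name, the ALB DNS names, the hosted zone IDs, and the `weighted_change_batch` helper are placeholders chosen for illustration. Nothing here calls AWS.

```python
def weighted_change_batch(record_name, weights, alb_targets):
    """Build a Route 53 change batch with one weighted alias record per environment."""
    if sum(weights.values()) <= 0:
        # Design choice for this sketch: refuse an all-zero set of weights.
        raise ValueError("at least one environment needs a positive weight")
    changes = []
    for env, weight in weights.items():
        if not 0 <= weight <= 255:  # Route 53 accepts weights from 0 to 255
            raise ValueError(f"weight for {env} must be between 0 and 255")
        target = alb_targets[env]
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": env,  # distinguishes records sharing one name
                "Weight": weight,
                "AliasTarget": {
                    "HostedZoneId": target["hosted_zone_id"],
                    "DNSName": target["dns_name"],
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Comment": f"blue/green weights {weights}", "Changes": changes}


# Placeholder ALB details, not real resources.
ALBS = {
    "blue": {"dns_name": "blue-alb.example.elb.amazonaws.com", "hosted_zone_id": "ZBLUEPLACEHOLDER"},
    "green": {"dns_name": "green-alb.example.elb.amazonaws.com", "hosted_zone_id": "ZGREENPLACEHOLDER"},
}

initial = weighted_change_batch("app.example.com.", {"blue": 100, "green": 0}, ALBS)
```

With boto3, a batch like this would typically be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId` of the domain's own zone.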
🟡 Canary Phase (Gradual Infra Exposure)
Start with:
| Environment | Weight |
|---|---|
| Blue | 90 |
| Green | 10 |
Now:
- About 10% of users hit Green (weights apply per DNS lookup, so the split is approximate)
- About 90% still safe on Blue
Observe:
- Latency
- Error rate
- CPU load
- Logs
- DB performance
If stable…
Increase:
| Environment | Weight |
|---|---|
| Blue | 50 |
| Green | 50 |
Then finally:
| Environment | Weight |
|---|---|
| Blue | 0 |
| Green | 100 |
Green becomes production.
No downtime.
Users don’t even notice.
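The canary progression above (0 → 10 → 50 → 100) can be sketched as a tiny step function: advance one stage while Green looks stable, and drop back to zero the moment it does not. The `STAGES` list mirrors the weights in the tables. The helper names and the boolean `stable` flag are assumptions for illustration, and how "stable" is decided is left to your monitoring.

```python
# Green weights for each stage, taken from the tables in the text.
STAGES = [0, 10, 50, 100]


def next_green_weight(current, stable):
    """Advance Green one stage if stable; otherwise send everything back to Blue.

    Raises ValueError if `current` is not one of the known stages.
    """
    if not stable:
        return 0
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]


def weights_for(green_weight):
    """Return the Blue/Green weight pair for a given Green weight."""
    return {"blue": 100 - green_weight, "green": green_weight}


def green_share(weights):
    """Fraction of DNS answers that point at Green: weight / sum of weights."""
    return weights["green"] / (weights["blue"] + weights["green"])


if __name__ == "__main__":
    green = 0
    for _ in range(3):
        green = next_green_weight(green, stable=True)
        print(weights_for(green), green_share(weights_for(green)))
```

Each step should be held long enough to observe real traffic before advancing. The sketch only encodes the order of the stages, not how long to wait at each one.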
🔥 Why This Is Powerful
Because:
- Both infrastructures run simultaneously
- No service interruption
- Rollback is a weight change, not a rebuild
- DNS controls traffic
- No instance restarts required
🔁 Rollback Plan (The Real Safety Net)
If something goes wrong:
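Immediately change the weights back:
| Environment | Weight |
|---|---|
| Blue | 100 |
| Green | 0 |
Traffic flows back to Blue as cached DNS answers expire, usually within the record's TTL.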
Production saved.
This is what separates professionals from beginners.
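A rollback plan is easier to trust when "something goes wrong" is defined ahead of time. The sketch below is one possible trigger, assuming you can sample Green's request count, error count, and p95 latency during the canary. The `Observation` type, the `should_roll_back` helper, and both thresholds are hypothetical values to tune for your own system.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """A sample of Green's behavior over some monitoring window."""
    requests: int
    errors: int
    p95_latency_ms: float


# Weights to apply when rolling back, matching the table in the text.
ROLLBACK_WEIGHTS = {"blue": 100, "green": 0}


def should_roll_back(green, max_error_rate=0.01, max_p95_latency_ms=500.0):
    """Return True if Green breaches either threshold (both are assumptions)."""
    if green.requests == 0:
        return False  # no traffic observed yet, so nothing to judge
    error_rate = green.errors / green.requests
    return error_rate > max_error_rate or green.p95_latency_ms > max_p95_latency_ms


if __name__ == "__main__":
    sample = Observation(requests=1000, errors=50, p95_latency_ms=200.0)
    if should_roll_back(sample):
        print("rolling back to", ROLLBACK_WEIGHTS)
```

Rollback only works while Blue is still running, which is why the old infrastructure is removed later rather than at the moment of the switch.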