
🎬 The Situation

It’s Monday.
Your production system is running smoothly.

10,000 users active
Payments processing
APIs responding
Monitoring green

Now management says:

“Upgrade the infrastructure today.”

You need to:

Upgrade EC2 instance type (t2 → t3)
Apply OS security patches
Update launch template
Improve scaling configuration
Harden security groups

And the scary part?

👉 You cannot bring the system down.

This is where professional infrastructure strategy begins.

❌ What Juniors Do

Modify production EC2
Restart instances
Push a new AMI straight onto production
Hope nothing breaks

Result?

Downtime
Panic
Rollback chaos

✅ What Architects Do

They never touch production directly.

They create a parallel world.

Welcome to:

🔵🟢 Blue/Green Infrastructure Deployment

🏗 The Concept

| Environment | Meaning |
| 🔵 Blue | Current production |
| 🟢 Green | New upgraded infrastructure |

Instead of upgrading live infra:

Clone entire infrastructure
Upgrade in isolation
Test completely
Shift traffic gradually
Remove old infra later

No downtime.
No panic.
No surprises.

🌍 Visual Architecture

🟦 Phase 1 – Blue (Current Production)

Users
↓
Route53
↓
Blue ALB
↓
Blue EC2 / ASG / Cluster

Traffic:

| Blue | 100% |
| Green | 0% |

Everything is stable.

🟩 Phase 2 – Build Green Infrastructure (The Smart Way)

Now we build a completely new, parallel stack:

New EC2 instances (upgraded type)
New Auto Scaling Group
New Launch Template
New Target Group
New ALB
Same ACM certificate
Same domain

Think of Green as:

“Production 2.0”

Important rule:

⚠ Never reuse old infrastructure
⚠ Never modify Blue
⚠ Green must be independent
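One way to make these rules checkable rather than just remembered is to describe each stack as data and verify, before building, that Green shares nothing with Blue except the certificate and the domain. This is a minimal sketch under stated assumptions: `StackSpec`, `reused_from_blue`, and the IDs such as `tg-blue` are hypothetical placeholders, not real AWS resource names.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class StackSpec:
    # Hypothetical identifiers for one environment's resources.
    launch_template: str
    asg: str
    target_group: str
    alb: str
    certificate_arn: str  # sharing this with Blue is expected
    domain: str           # sharing this with Blue is expected

# The only things Green is allowed to have in common with Blue.
SHARED_OK = {"certificate_arn", "domain"}

def reused_from_blue(blue: StackSpec, green: StackSpec) -> list[str]:
    """Names of resources Green reuses from Blue that it should own itself."""
    return [
        f.name
        for f in fields(StackSpec)
        if f.name not in SHARED_OK
        and getattr(blue, f.name) == getattr(green, f.name)
    ]

blue = StackSpec("lt-blue", "asg-blue", "tg-blue", "alb-blue", "cert-123", "app.example.com")
green = StackSpec("lt-green", "asg-green", "tg-green", "alb-green", "cert-123", "app.example.com")
print(reused_from_blue(blue, green))  # []
```

A Green spec that accidentally pointed at `tg-blue` would return `["target_group"]`, which is exactly the kind of hidden coupling that breaks the "never modify Blue" rule.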

🔐 Why Use the Same Certificate?

Because:

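Domain remains the same, so Green needs a certificate valid for it
Users see no certificate change
No trust warnings
Seamless switch

An ACM certificate can be attached to more than one load balancer in the same region, so attach the same ACM certificate to the Green ALB's HTTPS listener.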
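As a sketch of what "attach the same certificate" means at the API level, the function below builds the keyword arguments for the ELBv2 `create_listener` call: HTTPS on port 443, the existing certificate ARN, and a forward action to Green's own target group. The function name and the ARNs you pass in are hypothetical; it only builds the payload and makes no AWS calls.

```python
def green_https_listener_params(green_alb_arn: str, certificate_arn: str,
                                green_target_group_arn: str) -> dict:
    """Keyword arguments for elbv2 create_listener: HTTPS on 443 with Blue's ACM cert."""
    return {
        "LoadBalancerArn": green_alb_arn,
        "Protocol": "HTTPS",
        "Port": 443,
        # Same certificate ARN that Blue's listener already uses.
        "Certificates": [{"CertificateArn": certificate_arn}],
        # Forward everything to Green's own target group, never Blue's.
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": green_target_group_arn}],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("elbv2").create_listener(**params)`.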

🧪 Phase 3 – Deep Testing

Before touching any traffic, test Green internally while no real users reach it.

Check:

Health checks passing?
CPU stable?
Memory normal?
Logs clean?
Auto scaling working?

Green must pass every one of these checks before any traffic shifts to it.
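The checklist above can be turned into an explicit go/no-go gate. A minimal sketch, assuming you already collect these numbers somewhere (for example from CloudWatch and the target group's health status); `GreenMetrics`, `green_ready`, and both thresholds are hypothetical and should be tuned against Blue's normal baseline.

```python
from dataclasses import dataclass

@dataclass
class GreenMetrics:
    healthy_targets: int
    total_targets: int
    cpu_percent: float
    memory_percent: float
    error_log_lines: int
    scaled_out_in_test: bool  # did the ASG actually scale during a load test?

# Hypothetical thresholds; derive real ones from Blue's normal behaviour.
MAX_CPU = 70.0
MAX_MEMORY = 80.0

def green_ready(m: GreenMetrics) -> list[str]:
    """Return the failed checks; an empty list means Green may take traffic."""
    failures = []
    if m.total_targets == 0 or m.healthy_targets < m.total_targets:
        failures.append("health checks")
    if m.cpu_percent > MAX_CPU:
        failures.append("cpu")
    if m.memory_percent > MAX_MEMORY:
        failures.append("memory")
    if m.error_log_lines > 0:
        failures.append("logs")
    if not m.scaled_out_in_test:
        failures.append("auto scaling")
    return failures
```

Returning the list of failures, rather than a single boolean, tells you *why* Green is not ready yet.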

⚖ Phase 4 – Controlled Traffic Switching (Route53 Weighted Routing)

Now comes the magic.

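Route53 supports weighted routing: several records share the same name and type, each with its own weight, and Route53 answers each DNS query with one of them in proportion to its weight divided by the total weight of the group.

Instead of an instant switch, we control what percentage of traffic each environment receives.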

Initial State

| Blue | 100 |
| Green | 0 |

No change yet.
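The weight tables in this section map onto two weighted alias records with the same name and type but different `SetIdentifier` values. The sketch below builds the `ChangeBatch` payload for them; the function name, the `blue`/`green` identifiers, and the example hostnames are hypothetical, and it assumes each ALB's DNS name and canonical hosted zone ID are already known. Route53 weights range from 0 to 255.

```python
from typing import NamedTuple

class AlbTarget(NamedTuple):
    dns_name: str           # the ALB's own DNS name
    canonical_zone_id: str  # the ALB's canonical hosted zone ID (not your domain's zone)

def weighted_change_batch(domain: str, blue: AlbTarget, green: AlbTarget,
                          blue_weight: int, green_weight: int) -> dict:
    """Build a Route53 ChangeBatch that UPSERTs the Blue and Green weighted records."""
    for w in (blue_weight, green_weight):
        if not 0 <= w <= 255:
            raise ValueError("Route53 weights must be between 0 and 255")
    if blue_weight + green_weight == 0:
        # All-zero weights make Route53 split traffic evenly, which is not what we want here.
        raise ValueError("at least one weight must be non-zero")
    changes = []
    for set_id, target, weight in (("blue", blue, blue_weight), ("green", green, green_weight)):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "SetIdentifier": set_id,
                "Weight": weight,
                "AliasTarget": {
                    "HostedZoneId": target.canonical_zone_id,
                    "DNSName": target.dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Comment": f"blue={blue_weight} green={green_weight}", "Changes": changes}
```

The batch would then be sent with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, where `HostedZoneId` is your domain's hosted zone.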

🟡 Canary Phase (Gradual Infra Exposure)

Start with:

| Blue | 90 |
| Green | 10 |

Now:

About 10% of DNS lookups resolve to Green, so roughly 10% of users hit it
About 90% stay safely on Blue

Observe:

Latency
Error rate
CPU load
Logs
DB performance

If stable…

Increase:

| Blue | 50 |
| Green | 50 |

Then finally:

| Blue | 0 |
| Green | 100 |

Green becomes production.

No downtime.

Users don’t even notice.
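The whole canary sequence (90/10, then 50/50, then 0/100, rolling back on the first bad signal) is a small loop. A minimal sketch: `run_canary`, `apply_weights`, and `looks_healthy` are hypothetical. In practice `apply_weights` would send a weighted Route53 change, and `looks_healthy` should wait out an observation window (longer than the DNS TTL) before judging latency, errors, CPU, logs, and DB load.

```python
from typing import Callable

# (blue_weight, green_weight) steps from the text above.
STEPS = [(90, 10), (50, 50), (0, 100)]

def green_share(blue_weight: int, green_weight: int) -> float:
    """Fraction of DNS answers that go to Green: its weight over the total weight."""
    return green_weight / (blue_weight + green_weight)

def run_canary(apply_weights: Callable[[int, int], None],
               looks_healthy: Callable[[], bool]) -> str:
    for blue_w, green_w in STEPS:
        apply_weights(blue_w, green_w)
        if not looks_healthy():
            apply_weights(100, 0)  # rollback: send all traffic back to Blue
            return f"rolled back at green={green_w}"
    return "green is production"
```

Keeping the steps as data makes it easy to add finer increments (for example 25/75) without touching the loop.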

🔥 Why This Is Powerful

Because:

Both infrastructures run simultaneously
No service interruption
Rollback is fast: set the weights back, and clients follow as their cached DNS answers expire
DNS weights control the traffic split
No instance restarts required

🔁 Rollback Plan (The Real Safety Net)

If something goes wrong:

Immediately change the weights: