Zero-Downtime Blue-Green and IP-Based Canary Deployments on ECS Fargate
POTHURAJU JA · 2026-05-23 · via DEV Community

Most ECS blue-green deployment tutorials eventually lead to the same stack:

  • AWS CodeDeploy
  • Deployment groups
  • AppSpec files
  • Lifecycle hooks
  • Weighted traffic shifting
  • Complex rollback orchestration

And while CodeDeploy works, I kept running into one practical limitation during real deployments:

I couldn’t let my internal team validate a new release on the actual production URL before exposing it to customers.

That became the entire motivation behind this setup.

I didn’t want:

  • separate staging domains
  • duplicate ALBs
  • temporary preview environments
  • “almost production” testing

I wanted something much simpler:

  • Internal users should see the new version first
  • Customers should continue seeing the stable version
  • Both should use the same production domain
  • Rollback should be immediate
  • Deployments should remain fully zero downtime

So I built a Terraform-driven deployment workflow using:

  • ECS Fargate
  • Application Load Balancer (ALB)
  • ALB listener priorities
  • Source IP routing
  • Terraform

without using CodeDeploy.

After running this setup in practice, I ended up preferring it for many ECS workloads.


The Core Idea

Both BLUE and GREEN environments run behind the same ALB.

Internal office/VPN IPs get routed to GREEN first.

Everyone else continues hitting BLUE.

That means QA and internal teams can validate the new release directly on the real production infrastructure before public rollout begins.

Same:

  • domain
  • SSL certificate
  • ALB
  • authentication flow
  • redirects
  • networking path

No “staging surprises” later.

A lot of deployment issues only appear on the real production routing path.


Real Example

Internal users open:

https://nginx.jayakrishnayadav.cloud

…and immediately see the GREEN version.

Meanwhile, public users continue seeing BLUE.

No DNS switching.

No duplicate infrastructure.

Just ALB listener routing.


Architecture Overview

The deployment flow looks like this:

                ┌────────────────────┐
                │   Application LB   │
                └─────────┬──────────┘
                          │
         ┌────────────────┴────────────────┐
         │                                 │
 Internal Office/VPN IPs             Public Users
         │                                 │
         ▼                                 ▼
   GREEN Target Group               BLUE Target Group
         │                                 │
    ECS GREEN Tasks                  ECS BLUE Tasks

The canary routing rule gets evaluated first.

If the request source IP matches internal CIDRs, traffic goes to GREEN.

Everything else falls back to BLUE.
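The routing decision above comes down to a CIDR membership test on the client's source IP. ALB performs this check itself. The bash sketch below only illustrates the logic for IPv4, for example to sanity-check a CIDR list before it goes into tfvars. The INTERNAL_CIDRS values and the function names are hypothetical.

```bash
# Sketch of the source-IP decision the canary rule makes (IPv4 only).
# INTERNAL_CIDRS is a hypothetical example list, not a real office/VPN range.
INTERNAL_CIDRS=("10.20.0.0/16" "203.0.113.7/32")

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds if IP ($1) is inside CIDR ($2): compare only the network bits.
ip_in_cidr() {
  local ip_int net_int mask bits=${2#*/}
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip_int & mask) == (net_int & mask) ))
}

# Internal CIDR match -> GREEN, everything else -> BLUE.
target_for_ip() {
  local cidr
  for cidr in "${INTERNAL_CIDRS[@]}"; do
    if ip_in_cidr "$1" "$cidr"; then
      echo "GREEN"
      return
    fi
  done
  echo "BLUE"
}
```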


Terraform Structure

I kept the Terraform layout modular so it could be reused across multiple services.

.
├── main.tf
├── variables.tf
├── outputs.tf
├── env/
│   ├── backend.hcl
│   └── terraform.tfvars
├── modules/
│   ├── vpc/
│   ├── iam/
│   ├── alb/
│   ├── ecs-cluster/
│   └── ecs-blue-green-service/
└── scripts/
    └── zero-downtime-test.sh

Each ECS service gets:

  • BLUE ECS service
  • GREEN ECS service
  • BLUE target group
  • GREEN target group
  • production listener rule
  • optional canary listener rule

ALB Listener Rule Logic

The entire deployment behavior depends on ALB listener priorities.

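ALB evaluates listener rules in ascending priority order, lowest number first. The canary rule (priority 99) is therefore checked before the production rule (priority 100).

If the request's source IP matches one of the internal CIDRs and the host header matches, traffic gets forwarded to GREEN.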

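resource "aws_lb_listener_rule" "canary" {
  count        = var.activate_canary ? 1 : 0
  listener_arn = aws_lb_listener.https.arn # required; listener resource name assumed
  priority     = 99

  # Conditions within one rule are ANDed: source IP and host must both match
  condition {
    source_ip {
      values = var.canary_source_ips
    }
  }

  condition {
    host_header {
      values = ["nginx.jayakrishnayadav.cloud"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }
}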

The production rule sits directly below it at priority 100 and forwards to whichever target group is currently active:

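resource "aws_lb_listener_rule" "production" {
  listener_arn = aws_lb_listener.https.arn # required; listener resource name assumed
  priority     = 100

  condition {
    host_header {
      values = ["nginx.jayakrishnayadav.cloud"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = local.active_target_group # BLUE, or GREEN once promote_to_all = true
  }
}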

That’s it.

No weighted routing.

No lifecycle hooks.

Just listener priorities.
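ALB evaluates a listener's rules in ascending priority order. It forwards to the first rule whose conditions all match, since conditions inside one rule are ANDed, and it falls back to the listener's default action when nothing matches. The following toy model of the two rules above is a sketch under assumptions: the pipe-separated rule format, the 203.0.113.0/24 canary range, and the DEFAULT fallback label are made up, and only one source CIDR per rule is modeled.

```bash
# Toy model of ALB listener-rule evaluation for the two rules above.
# Each rule: "priority|host|source_cidr|target"; "*" means "no such condition".
RULES=(
  "100|nginx.jayakrishnayadav.cloud|*|BLUE"
  "99|nginx.jayakrishnayadav.cloud|203.0.113.0/24|GREEN"
)

ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  local ip_int net_int mask bits=${2#*/}
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip_int & mask) == (net_int & mask) ))
}

# route HOST SOURCE_IP -> prints the target of the first matching rule.
route() {
  local host=$1 ip=$2 prio r_host r_cidr target
  # Sort numerically by priority, so 99 is evaluated before 100.
  while IFS='|' read -r prio r_host r_cidr target; do
    # host_header condition
    [[ $r_host != "*" && $r_host != "$host" ]] && continue
    # source_ip condition
    [[ $r_cidr != "*" ]] && ! ip_in_cidr "$ip" "$r_cidr" && continue
    echo "$target"
    return
  done < <(printf '%s\n' "${RULES[@]}" | sort -t '|' -k1,1n)
  echo "DEFAULT"   # listener default action (not shown in this post)
}
```

A request from 203.0.113.50 matches both conditions of the priority 99 rule and lands on GREEN. Any other client falls through to the priority 100 rule and gets BLUE.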


Real Deployment Workflow

This wasn’t built as a theoretical architecture exercise.

I tested the rollout flow directly from Terraform while continuously validating traffic behavior against live ECS Fargate services.

Terraform initialization:

terraform init -backend-config=env/backend.hcl

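Deployment apply. Note that -lock=false disables state locking, which is only safe when nobody else can run Terraform against the same state at the same time: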

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

During canary validation, I continuously verified my public IP:

curl ifconfig.me

That mattered because the ALB source-IP rule matches on the public IP the ALB sees (assuming no CDN or proxy in front of it), and that IP decides whether a request reaches BLUE or GREEN.

Once my IP matched the configured canary CIDRs, traffic immediately started routing to GREEN.
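The rule matches the public egress IP the ALB sees, so the output of curl ifconfig.me can be turned directly into a /32 entry for canary_source_ips. The helper below is hypothetical and not part of the repository. It only checks for a dotted-quad IPv4 address and prints a tfvars line.

```bash
# Hypothetical helper: turn a public IPv4 address into a tfvars line for
# canary_source_ips (the variable used by the canary listener rule).
canary_cidr_line() {
  local ip=$1 octet
  # Expect exactly four dot-separated decimal octets.
  if [[ ! $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so an octet like "010" is not read as octal
    (( 10#$octet <= 255 )) || { echo "octet out of range: $octet" >&2; return 1; }
  done
  printf 'canary_source_ips = ["%s/32"]\n' "$ip"
}
```

Typical use would be canary_cidr_line "$(curl -4 -s ifconfig.me)". The -4 flag forces an IPv4 lookup, because this sketch rejects IPv6 addresses.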


Deployment Flow

The nice part about this setup is that everything becomes variable-driven.
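Taken together, the three flags in the steps below behave like a small state machine. This sketch writes down the state each combination should produce according to the steps in this post. The function name, the output format, and the choice to reject combinations the post never uses are assumptions, not the repository's actual logic.

```bash
# Expected state per flag combination, as described in Steps 1-4 of this post.
# Args: enable_canary activate_canary promote_to_all (each "true"/"false").
# Output: "blue_tasks green_tasks canary_rule production_target"
expected_state() {
  case "$1 $2 $3" in
    "false false false") echo "on off absent BLUE" ;;   # Step 1 / rollback
    "true false false")  echo "on on absent BLUE" ;;    # Step 2: GREEN warming up
    "true true false")   echo "on on present BLUE" ;;   # Step 3: internal canary
    "true false true")   echo "off on absent GREEN" ;;  # Step 4: promoted
    *)
      echo "unsupported combination: $*" >&2
      return 1 ;;
  esac
}
```

Checking the values in env/terraform.tfvars this way before an apply would catch a slip such as activate_canary = true while enable_canary is still false. Under this model, that combination would point internal users at a GREEN target group with no running tasks.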


Step 1 — Normal Production State

BLUE handles all production traffic.

GREEN remains scaled down.

enable_canary   = false
activate_canary = false
promote_to_all  = false

Apply:

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

Result:

  • BLUE active
  • GREEN inactive
  • minimal Fargate cost
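Each step below edits terraform.tfvars and re-runs the same apply. The flags can also be passed on the command line instead: Terraform processes -var-file and -var options in the order given, and the last value for a variable wins. The wrapper below is a hypothetical convenience (the phase names and the DRY_RUN switch are made up). It keeps the -lock=false and -auto-approve flags used throughout this post.

```bash
# Hypothetical wrapper: map a phase name to the three flags and apply them
# on top of env/terraform.tfvars. With DRY_RUN=1 it only prints the command.
apply_phase() {
  local flags
  case "$1" in
    baseline) flags="false false false" ;;  # Step 1 / rollback
    warmup)   flags="true false false" ;;   # Step 2
    canary)   flags="true true false" ;;    # Step 3
    promote)  flags="true false true" ;;    # Step 4
    *) echo "unknown phase: $1" >&2; return 1 ;;
  esac

  local enable activate promote
  read -r enable activate promote <<< "$flags"

  # -var options come after -var-file, so they override the file's values.
  local cmd=(terraform apply
    -var-file=env/terraform.tfvars
    -var "enable_canary=$enable"
    -var "activate_canary=$activate"
    -var "promote_to_all=$promote"
    -lock=false
    -auto-approve)

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

DRY_RUN=1 apply_phase canary prints the command for review. apply_phase canary runs it.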

Step 2 — Start GREEN Tasks

Now we start the GREEN environment.

enable_canary   = true
activate_canary = false
promote_to_all  = false

Apply again:

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

At this stage:

  • GREEN tasks start
  • ECS health checks complete
  • ALB target registration completes
  • no production traffic reaches GREEN yet

Users never hit partially starting containers.
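GREEN's readiness can be checked rather than assumed before moving on to Step 3. The AWS CLI reports target health with aws elbv2 describe-target-health, or can block on it with aws elbv2 wait target-in-service. The sketch below contains only the decision logic, so it runs without credentials. The function name and the GREEN_TG_ARN placeholder are hypothetical.

```bash
# Decide whether a target group is ready for traffic, given its target states
# (e.g. the text output of describe-target-health). Requires at least one
# target, and every target must be "healthy"; "initial", "draining",
# "unhealthy", "unused", etc. all count as not ready.
all_targets_healthy() {
  local count=0 state
  # Unquoted on purpose: split the space/tab/newline-separated states.
  for state in $1; do
    [[ $state == healthy ]] || return 1
    count=$((count + 1))
  done
  (( count > 0 ))   # an empty target group is not "ready"
}

# Example (needs AWS credentials; GREEN_TG_ARN is a placeholder):
#   states=$(aws elbv2 describe-target-health --target-group-arn "$GREEN_TG_ARN" \
#     --query 'TargetHealthDescriptions[].TargetHealth.State' --output text)
#   all_targets_healthy "$states" && echo "GREEN is ready for canary traffic"
```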


Step 3 — Internal Canary Validation

Now we enable canary routing.

enable_canary   = true
activate_canary = true
promote_to_all  = false

Apply again:

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

Now:

  • internal office/VPN users hit GREEN
  • public users continue hitting BLUE

This became the most valuable phase of the deployment workflow.

Because now:

  • QA validates production behavior
  • developers inspect logs
  • authentication flows get tested
  • sessions and redirects get verified

while customers remain completely unaffected.


Internal Canary Routing

This is the ALB listener rules view while canary routing is enabled.

The priority 99 rule matches internal source IPs and forwards them to GREEN, while everyone else continues hitting BLUE.

[Image: ALB Canary Routing]


Step 4 — Promote GREEN to Production

Once validation looks good:

enable_canary   = true
activate_canary = false
promote_to_all  = true

Apply again:

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

Now:

  • production listener switches to GREEN
  • BLUE scales down
  • all users see the new version

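No downtime occurs. Traffic simply moves from one target group to another. As BLUE tasks are deregistered, requests already in flight to them get to finish during the target group's deregistration delay.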


Verifying Zero Downtime

I didn’t want to assume the deployment was safe.

I wanted to verify it continuously during rollout.

So I used a simple curl-based validation script that continuously hit both applications while traffic shifted between BLUE and GREEN.

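failures=0

for i in {1..100}
do
  for url in \
    "https://nginx.jayakrishnayadav.cloud/" \
    "https://apache.jayakrishnayadav.cloud/"
  do
    # -w appends the HTTP status after the body; --max-time keeps a hung request
    # from stalling the loop (a timeout or connection error reports status 000).
    # -k is omitted: the ACM certificate is valid, so TLS errors should count as failures.
    response=$(curl -s --max-time 5 -w " HTTPSTATUS:%{http_code}" "$url")

    body=${response% HTTPSTATUS:*}
    status=${response##*HTTPSTATUS:}

    # Each app's page is expected to contain "BLUE - v..." or "GREEN - v..."
    if [[ $body == *"BLUE - v"* ]]; then
      color="BLUE"
    elif [[ $body == *"GREEN - v"* ]]; then
      color="GREEN"
    else
      color="UNKNOWN"
    fi

    # Anything other than 200 counts as a failed request
    [[ $status == "200" ]] || failures=$((failures + 1))

    echo "Run: $i | URL: $url | Status: $status | Version: $color"
  done
done

echo "Failed requests: $failures"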

Output during deployment:

[Image: Zero Downtime Validation]

You can clearly see:

  • HTTP 200 responses throughout deployment
  • no failed requests
  • no 503s
  • clean traffic movement from BLUE to GREEN

That confirmed the deployment was genuinely zero downtime, at least for every request the loop sent during the rollout.
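Scanning a few hundred lines by eye works, but the same output can also be summarized. This hypothetical helper reads lines in the "Run: ... | URL: ... | Status: ... | Version: ..." format printed by the validation script. It counts non-200 responses and lists each URL's version changes. A clean rollout should show failed=0 and a single BLUE->GREEN change per URL.

```bash
# Summarize the validation script's output, read from stdin.
# Input lines look like: Run: 1 | URL: https://... | Status: 200 | Version: BLUE
summarize_run() {
  awk -F' [|] ' '
    /^Run: / {
      url = $2;     sub(/^URL: /, "", url)
      status = $3;  sub(/^Status: /, "", status)
      version = $4; sub(/^Version: /, "", version)
      total++
      if (status != "200") failed++
      # Record every change of version per URL (e.g. BLUE->GREEN).
      if ((url in last) && last[url] != version)
        switches[url] = switches[url] " " last[url] "->" version
      last[url] = version
    }
    END {
      printf "total=%d failed=%d\n", total, failed
      for (u in switches) printf "%s:%s\n", u, switches[u]
    }'
}
```

Usage would be something like bash scripts/zero-downtime-test.sh | summarize_run. Lines that do not start with "Run: ", such as a trailing summary line, are ignored.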


Production Promotion View

After promotion:

  • the canary rule disappears
  • the production listener points directly to GREEN
  • all traffic reaches the new version
  • BLUE scales down to zero

Clean and simple.

[Image: Production Listener Switch]

[Image: Final Traffic Flow]


Rollback

Rollback became extremely simple.

I just reverted the Terraform variables:

enable_canary   = false
activate_canary = false
promote_to_all  = false

Apply Terraform again:

terraform apply \
  -var-file=env/terraform.tfvars \
  -lock=false \
  -auto-approve

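ALB immediately routes traffic back to BLUE, provided BLUE tasks are still running. If BLUE was already scaled to zero after promotion, its tasks first have to start and pass health checks before the BLUE target group can serve requests again.

The rollback process stays predictable because traffic switching is entirely controlled through ALB listener rules.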
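After the rollback apply, a bounded polling loop can confirm that the public URL is really answering from BLUE again. The wait_until helper below is a hypothetical sketch. The serves_blue check in the comment relies on the "BLUE - v" marker that the validation script also looks for.

```bash
# wait_until MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD until it succeeds or the attempts run out (hypothetical helper).
wait_until() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    (( attempt < max )) && sleep "$delay"
  done
  echo "gave up after $max attempts: $*" >&2
  return 1
}

# Example after the rollback apply (network access required):
#   serves_blue() { curl -s --max-time 5 https://nginx.jayakrishnayadav.cloud/ | grep -q "BLUE - v"; }
#   wait_until 30 5 serves_blue && echo "rollback confirmed"
```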


HTTPS Configuration

The ALB uses ACM certificates for HTTPS.

Listeners:

  • Port 80 → redirect to HTTPS
  • Port 443 → production traffic
  • optional internal listener → restricted to internal CIDRs

Example:

test_listener_allowed_cidrs = [
  "160.30.39.198/32"
]

That keeps internal preview traffic private while still using the same production infrastructure.


Cost Optimization

One thing I specifically wanted to avoid was permanently doubling infrastructure cost.

Normal state:

  • only BLUE tasks run

Deployment window:

  • BLUE + GREEN both run temporarily

After promotion:

  • BLUE scales down again

So infrastructure cost only increases briefly during deployments.
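To put rough numbers on "briefly": the extra cost is just the GREEN task-hours that overlap with BLUE, priced per vCPU-hour and per GB-hour. The sketch compares running GREEN permanently (about 730 hours a month) with eight one-hour deployment windows. The rates, task count, and task size are placeholders, not current AWS prices. Check Fargate pricing for your region.

```bash
# Extra Fargate cost of GREEN = tasks * hours * (vCPU * vcpu_rate + GB * gb_rate).
# All rates and sizes passed in are placeholders, not real AWS prices.
green_cost() {
  local tasks=$1 vcpu=$2 mem_gb=$3 hours=$4 vcpu_rate=$5 gb_rate=$6
  awk -v t="$tasks" -v c="$vcpu" -v m="$mem_gb" -v h="$hours" \
      -v cr="$vcpu_rate" -v mr="$gb_rate" \
      'BEGIN { printf "%.2f\n", t * h * (c * cr + m * mr) }'
}

# Placeholder example: 2 tasks of 0.5 vCPU / 1 GB at made-up rates.
VCPU_RATE=0.04 GB_RATE=0.005
echo "always-on GREEN (730 h/month): $(green_cost 2 0.5 1 730 $VCPU_RATE $GB_RATE)"
echo "8 deployments x 1 h:           $(green_cost 2 0.5 1 8 $VCPU_RATE $GB_RATE)"
```

With these placeholder inputs, the always-on case comes to 36.50 per month and the eight windows to 0.40. Temporary GREEN capacity costs a small fraction of a permanently doubled fleet.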


Final Thoughts

This project started because I wanted a very practical deployment workflow:

Internal users should validate the new version on the actual production URL before customers ever see it.

Once I implemented that using ALB listener priorities and source IP routing, I realized I no longer really needed CodeDeploy for this workflow.

The end result became:

  • simpler
  • easier to operate
  • easier to roll back
  • easier to debug
  • easier to reason about
  • fully zero downtime

And because everything is Terraform-driven, the deployment process stays reproducible and predictable.


GitHub Repository

Full Terraform implementation:

https://github.com/jayakrishnayadav24/ecs-blue-green-deployment/tree/canary