Fix Untagged AWS Resources Automatically with Python and Boto3
Oleksandr Kuryzhev · 2026-06-24 · via DEV Community

Originally published on kuryzhev.cloud


If your AWS Cost Explorer has a fat "No Tag" line item and nobody on the team knows who owns those resources, your tagging strategy isn't a policy problem — it's an automation gap you can close in an afternoon. AWS resource tagging automation with Python and Boto3 is the fastest path from chaos to chargeback clarity. This runbook walks through diagnosing the gap, fixing existing drift, and making sure it never comes back.

Symptoms — Your AWS Bill Has Untagged Resources and You Don't Know Who Owns Them

The first sign is always the same: Cost Explorer shows a four-figure "No Tag" bucket and every team points at someone else. Here's what that situation looks like in concrete terms.

Cost Explorer's tag-based cost allocation report shows significant spend under the No Tag grouping for keys like Environment, Owner, and CostCenter. Finance wants a chargeback report. Engineering can't produce one. The argument goes in circles.

Meanwhile, AWS Config's required-tags managed rule is firing alerts — maybe dozens per day. But because there's no automated remediation wired up, those alerts turn into Jira tickets that sit in a backlog nobody prioritizes. The ticket count grows faster than the team can manually fix instances.

New resources keep arriving. Engineers spin up EC2 instances through the console at 11pm during an incident. A Terraform module someone copied from Stack Overflow doesn't include a tags block. A Lambda function gets created by a CI pipeline that was never updated to pass required tags. Each of these becomes a ghost resource — running, costing money, and owned by nobody according to your billing data.

The longer this runs, the worse the audit trail. By the time someone investigates, the IAM principal that created the resource has been recycled and the CloudTrail event is beyond your retention window. You're left guessing.

Root Cause — Why Tagging Enforcement Fails at Scale

Three structural failures cause untagged resources to accumulate. Fixing one without addressing the others just shifts the problem.

Tag policies in audit mode, not enforcement mode. AWS Organizations Tag Policies exist in most mature accounts — but they're almost always set to "audit" mode. Resources get created, the policy violation is logged, and nothing stops it. The policy runs after resource creation, not before. Audit mode generates visibility, not compliance.

Boto3 tagging scripts that fail silently. Most teams have some version of a tagging script. It runs on a cron. It worked once. Then an IAM policy change broke it, or a new region was added and nobody updated the region list, or pagination was never implemented so it only tags the first page of resources (at most 100) and stops. No one notices because there's no alerting on script exit codes and no audit log of what was actually tagged.

IAM permission gaps that produce silent no-ops. This one is subtle and painful. Tagging through the unified resourcegroupstaggingapi needs two permissions: tag:TagResources for the API itself, plus the owning service's own tagging permission (for EC2, ec2:CreateTags). The Lambda or script role often has ec2:CreateTags but is missing tag:TagResources. The call then fails with an error such as: botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the TagResources operation. If the code swallows that exception, or nobody alerts on the failed invocation, the job keeps running on schedule while tagging nothing. I've seen this run undetected for six weeks in a production account.
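
One way to keep a permission gap from passing quietly is to treat both a raised error and a non-empty FailedResourcesMap as hard failures. The sketch below assumes a hypothetical helper named tag_or_raise and a client object that exposes tag_resources() the way the boto3 resourcegroupstaggingapi client does; neither the helper nor the TaggingFailed exception comes from the original scripts.

```python
import logging

logger = logging.getLogger("tagging")


class TaggingFailed(RuntimeError):
    """Hypothetical exception: at least one resource in the batch was not tagged."""


def tag_or_raise(client, arns: list[str], tags: dict[str, str]) -> int:
    """Tag one batch and fail loudly instead of returning a quiet success."""
    try:
        response = client.tag_resources(ResourceARNList=arns, Tags=tags)
    except Exception as exc:  # e.g. a botocore ClientError carrying an access-denied code
        logger.error("tag_resources raised for %d ARN(s): %s", len(arns), exc)
        raise TaggingFailed(str(exc)) from exc

    # The API can return HTTP 200 and still report per-resource failures here.
    failed = response.get("FailedResourcesMap") or {}
    if failed:
        logger.error("tag_resources reported failures: %s", failed)
        raise TaggingFailed(f"{len(failed)} of {len(arns)} resources were not tagged")
    return len(arns)
```

Raising out of a cron script gives a non-zero exit code, and raising out of a Lambda handler marks the invocation as failed, so either can be alerted on.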

Fix #1 — Audit Existing Untagged Resources with a Boto3 Scanner

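Before you fix anything, you need to know the full scope. This script uses the resourcegroupstaggingapi client, the unified tagging API, to scan multiple regions concurrently and export a CSV of every resource it finds that is missing a required tag key. One limitation to keep in mind: get_resources() is documented to return resources that are tagged or were previously tagged, so a resource that has never carried any tag may not show up in this report.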

Watch out for: get_resources() returns at most one page of results per call (up to 100 resources) and gives no warning that more exist if you don't paginate. This is the single most common bug in tagging scripts. The script below handles it correctly with a PaginationToken loop. Without it, you'll think you have 80 untagged resources when you actually have 800.

The scanner uses concurrent.futures.ThreadPoolExecutor with a max of 5 workers. In my testing, scanning 6 regions sequentially took 8 to 12 minutes and generated roughly 2,000 API calls. With 5 concurrent workers it finished in under 90 seconds without being throttled. Don't go higher than 5 workers: I tested 10 and hit ThrottlingException in accounts with large resource counts.

# tag_auditor.py — Boto3 scanner for untagged AWS resources
# Requires: boto3>=1.34.0, Python 3.11+
# Usage: python tag_auditor.py --regions us-east-1 eu-west-1 --output untagged.csv

import boto3
import csv
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timezone

# Required tag keys: update to match your org's tag policy
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter"}

def get_untagged_resources(region: str) -> list[dict]:
    """
    Scan a single region for resources missing any required tag key.
    Uses resourcegroupstaggingapi for unified multi-service coverage.
    Note: get_resources() only lists resources that are, or once were, tagged.
    """
    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    untagged = []
    pagination_token = ""

    while True:
        kwargs = {
            "ResourcesPerPage": 100,  # max allowed per call
            "TagFilters": [],         # no filter: every resource the tagging API knows about
        }
        # Only pass PaginationToken once we have one from a previous page
        if pagination_token:
            kwargs["PaginationToken"] = pagination_token

        try:
            response = client.get_resources(**kwargs)
        except client.exceptions.InvalidParameterException as e:
            print(f"[{region}] InvalidParameterException: {e}")
            break

        for resource in response.get("ResourceTagMappingList", []):
            arn = resource["ResourceARN"]
            existing_keys = {tag["Key"] for tag in resource.get("Tags", [])}
            missing_keys = REQUIRED_TAGS - existing_keys

            if missing_keys:
                untagged.append({
                    "arn": arn,
                    "region": region,
                    "missing_tags": ", ".join(sorted(missing_keys)),
                    # datetime.utcnow() is deprecated as of Python 3.12; use an aware UTC timestamp
                    "scanned_at": datetime.now(timezone.utc).isoformat(),
                })

        pagination_token = response.get("PaginationToken", "")
        if not pagination_token:
            break  # no more pages

    print(f"[{region}] Found {len(untagged)} untagged resources")
    return untagged


def scan_all_regions(regions: list[str]) -> list[dict]:
    """
    Run region scans concurrently — max 5 workers to avoid API throttling.
    """
    all_results = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(get_untagged_resources, r): r for r in regions}
        for future in as_completed(futures):
            try:
                all_results.extend(future.result())
            except Exception as e:
                print(f"Error scanning region {futures[future]}: {e}")
    return all_results


def write_csv(results: list[dict], output_path: str) -> None:
    if not results:
        print("No untagged resources found.")
        return
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["arn", "region", "missing_tags", "scanned_at"])
        writer.writeheader()
        writer.writerows(results)
    print(f"Report written to {output_path} — {len(results)} resources need tagging")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Audit untagged AWS resources")
    parser.add_argument("--regions", nargs="+", default=["us-east-1"], help="AWS regions to scan")
    parser.add_argument("--output", default="untagged_resources.csv", help="Output CSV path")
    args = parser.parse_args()

    results = scan_all_regions(args.regions)
    write_csv(results, args.output)
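
As an alternative to the hand-written token loop in the scanner, boto3 also ships a built-in paginator for get_resources. The sketch below assumes that paginator and a hypothetical function name, iter_missing_tags; it takes the client as a parameter so the logic can be exercised without AWS credentials.

```python
from typing import Iterator

REQUIRED_TAGS = frozenset({"Environment", "Owner", "CostCenter"})


def iter_missing_tags(client, required=REQUIRED_TAGS) -> Iterator[tuple[str, list[str]]]:
    """Yield (arn, missing_keys) for every resource lacking a required tag key."""
    paginator = client.get_paginator("get_resources")
    # The paginator follows PaginationToken internally, one page at a time.
    for page in paginator.paginate(ResourcesPerPage=100):
        for mapping in page.get("ResourceTagMappingList", []):
            present = {tag["Key"] for tag in mapping.get("Tags", [])}
            missing = sorted(set(required) - present)
            if missing:
                yield mapping["ResourceARN"], missing
```

With real credentials this would be called as iter_missing_tags(boto3.client("resourcegroupstaggingapi", region_name="us-east-1")).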

Run this, push the CSV to S3, and share it with the team. Now everyone sees the actual scope. That number usually ends the debate about whether this is worth fixing.

One more gotcha: tag key names are case-sensitive. environment and Environment are two different keys in AWS. Standardize your required key names in a constants file — not a wiki page — and import that file everywhere. If it lives only in documentation, it will drift.
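
A minimal sketch of such a constants file, with one helper that flags keys differing from a required key only by case. The module name tag_constants.py and the function find_case_mismatches are hypothetical names chosen for illustration.

```python
# tag_constants.py (hypothetical module name): single source of truth for tag keys
REQUIRED_TAG_KEYS = ("Environment", "Owner", "CostCenter")


def find_case_mismatches(existing_keys, required=REQUIRED_TAG_KEYS) -> dict[str, str]:
    """Return {required_key: found_key} where a key exists but only in the wrong case."""
    existing = list(existing_keys)
    by_lower = {key.lower(): key for key in existing}
    mismatches = {}
    for required_key in required:
        if required_key in existing:
            continue  # exact match present, nothing to report
        found = by_lower.get(required_key.lower())
        if found is not None:
            mismatches[required_key] = found
    return mismatches
```

A scanner could call this on each resource's keys and report "environment should be Environment" instead of a bare "missing Environment".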

Fix #2 — Auto-Tag Resources on Creation with an EventBridge + Lambda Pipeline

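The audit tells you about the past. This fix handles the future. The pattern: EventBridge listens for CloudTrail-recorded API calls such as RunInstances, CreateDBInstance, and CreateFunction (CloudTrail records the Lambda call under a versioned event name, CreateFunction20150331), then triggers a Lambda that applies ownership tags immediately at creation time. The Lambda shown in this section handles RunInstances only.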

The Lambda extracts the userIdentity block from the CloudTrail event detail — this gives you the IAM principal ARN of whoever or whatever created the resource. That becomes the Owner tag. No more guessing.

Critical prerequisite: CloudTrail must be enabled and logging management events with Write (or All) API activity. A trail that logs only Read events will not capture RunInstances. Also, and this catches people constantly: EventBridge rules are regional. If your trail is single-region and your EventBridge rule exists only in us-east-1, resources created in eu-west-1 will never trigger the Lambda. Use a multi-region trail, and make sure a rule exists in every region you deploy to (or forward those regions' events to a central event bus). See the AWS CloudTrail multi-region documentation for setup details.
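
For reference, here is a sketch of the event pattern such a rule might use for RunInstances, written as a Python dict. The matches() function is a deliberately simplified local stand-in for EventBridge matching (exact-value lists and nested dicts only), included so the pattern can be sanity-checked offline; it is not how EventBridge itself is implemented.

```python
import json

# Pattern for EC2 RunInstances calls delivered to EventBridge through CloudTrail
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must match one of its listed values."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# EventBridge's put_rule takes the pattern as a JSON string (EventPattern=...)
PATTERN_JSON = json.dumps(EVENT_PATTERN)
```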

The Lambda execution role's minimum permissions: tag:TagResources, ec2:CreateTags (the unified tagging API also requires the owning service's tagging permission), logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents. Nothing more; the function as written never calls ec2:DescribeInstances. Scope the tagging permissions with condition keys such as aws:ResourceTag: an unrestricted tag:TagResources on * lets this function overwrite security-sensitive tags like data-classification or backup-policy. That's a privilege escalation vector you don't want.
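
A sketch of what a scoped execution-role policy could look like, expressed as a Python dict. Note the substitution: rather than aws:ResourceTag, this version restricts which tag keys may be written by using the aws:TagKeys condition key on ec2:CreateTags. The statement IDs, the allowed-key list, and the helper function are assumptions for illustration; validate the policy in your own account before relying on it.

```python
import json

ALLOWED_TAG_KEYS = ["Environment", "Owner", "CostCenter"]  # assumed allow-list

AUTO_TAGGER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UnifiedTaggingApi",
            "Effect": "Allow",
            "Action": "tag:TagResources",
            "Resource": "*",
        },
        {
            # The service-level permission is where the key restriction is applied
            "Sid": "Ec2TagsLimitedToKnownKeys",
            "Effect": "Allow",
            "Action": "ec2:CreateTags",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "ForAllValues:StringEquals": {"aws:TagKeys": ALLOWED_TAG_KEYS}
            },
        },
        {
            "Sid": "WriteLogs",
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
    ],
}


def allowed_actions(policy: dict) -> set[str]:
    """Flatten every Allow statement's actions into one set for quick review."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        listed = statement["Action"]
        actions.update([listed] if isinstance(listed, str) else listed)
    return actions


POLICY_JSON = json.dumps(AUTO_TAGGER_POLICY, indent=2)
```

Because the function can only write the three listed keys to EC2 instances, it cannot touch tags such as data-classification even if its code is changed.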

# lambda_auto_tagger.py — EventBridge-triggered Lambda for tagging resources on creation
# Deploy as Lambda function; trigger via EventBridge rule on CloudTrail RunInstances events
# Runtime: Python 3.12 | Memory: 128 MB | Timeout: 30s

import boto3
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Default fallback tags applied when owner cannot be determined from event
DEFAULT_TAGS = {
    "Environment": os.environ.get("DEFAULT_ENV", "unknown"),
    "CostCenter":  os.environ.get("DEFAULT_COST_CENTER", "unassigned"),
}

def extract_owner(event_detail: dict) -> str:
    """Pull principal identity from CloudTrail userIdentity block."""
    identity = event_detail.get("userIdentity") or {}
    # Prefer the principal ARN; fall back to the IAM user name; then 'unknown'
    return (
        identity.get("arn")
        or identity.get("userName")
        or "unknown"
    )

def build_resource_arns(event_detail: dict, region: str, account_id: str) -> list[str]:
    """Extract EC2 instance ARNs from RunInstances response items."""
    arns = []
    # responseElements is null (None) when the API call failed, so guard with `or {}`
    response = event_detail.get("responseElements") or {}
    items = (response.get("instancesSet") or {}).get("items", [])
    for item in items:
        instance_id = item.get("instanceId")
        if instance_id:
            # Full ARN required by tag_resources(); an instance ID alone causes InvalidResourceId
            arns.append(f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}")
    return arns

def handler(event, context):
    detail      = event.get("detail", {})
    region      = event.get("region", "us-east-1")
    account_id  = event.get("account", "")
    owner       = extract_owner(detail)
    arns        = build_resource_arns(detail, region, account_id)

    if not arns:
        logger.warning("No resource ARNs extracted from event — skipping")
        return {"status": "skipped", "reason": "no_arns"}

    tags = {**DEFAULT_TAGS, "Owner": owner}
    tagging_client = boto3.client("resourcegroupstaggingapi", region_name=region)

    # tag_resources() accepts max 20 ARNs per call, so batch accordingly
    tagged_count = 0
    for i in range(0, len(arns), 20):
        batch = arns[i:i+20]
        try:
            resp = tagging_client.tag_resources(ResourceARNList=batch, Tags=tags)
        except tagging_client.exceptions.InvalidParameterException as e:
            # Resource may be in terminal state (deleting): log and continue
            logger.warning(f"InvalidParameterException for batch {batch}: {e}")
            continue
        failed = resp.get("FailedResourcesMap", {})
        if failed:
            logger.error(f"Failed to tag resources: {failed}")
        # Count only the resources the API did not report as failed
        tagged_count += len(batch) - len(failed)

    logger.info(f"Tagged {tagged_count} of {len(arns)} instance(s) with Owner={owner}")
    return {"status": "ok", "tagged_count": tagged_count}

I stopped using service-specific tag APIs (ec2:create_tags, lambda:tag_resource) after spending two hours debugging why RDS instances weren't getting tagged when EC2 instances were. The unified resourcegroupstaggingapi handles EC2, RDS, Lambda, S3, and more in a single client. It's roughly 60% fewer API calls in multi-service environments. Use it exclusively.
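
Since the unified API takes ARNs from any supported service, the ARN-extraction step is the only part of the Lambda that needs to know about each event type. A sketch of how that dispatch might look follows. The function name extract_arns is hypothetical, and the response field names dBInstanceArn and functionArn are assumptions that should be checked against a real CloudTrail event from your account before use.

```python
def extract_arns(detail: dict, region: str, account_id: str) -> list[str]:
    """Return ARNs for the resources created by one CloudTrail event."""
    response = detail.get("responseElements") or {}  # null when the API call failed
    event_name = detail.get("eventName", "")

    if event_name == "RunInstances":
        items = (response.get("instancesSet") or {}).get("items", [])
        return [
            f"arn:aws:ec2:{region}:{account_id}:instance/{item['instanceId']}"
            for item in items
            if item.get("instanceId")
        ]
    if event_name == "CreateDBInstance":
        arn = response.get("dBInstanceArn")  # assumed field name
        return [arn] if arn else []
    if event_name.startswith("CreateFunction"):  # covers versioned names like CreateFunction20150331
        arn = response.get("functionArn")  # assumed field name
        return [arn] if arn else []
    return []  # unknown event type: tag nothing rather than guess
```

The returned list can be passed to tag_resources() in batches of 20, exactly as the handler does for EC2 instances.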

Fix #3 — Enforce Tag Compliance with AWS Config + Boto3 Auto-Remediation

EventBridge catches resources at creation. But things slip through — manual console actions, API calls from services that don't emit the events you're watching, or resources created before you deployed the EventBridge rule. AWS Config is your safety net.

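Deploy the required-tags managed Config rule scoped to AWS::EC2::Instance, AWS::RDS::DBInstance, and AWS::Lambda::Function, after confirming in the managed-rule reference that each type is on the rule's supported list. The rule supports up to 6 tag key/value pairs per deployment. Where the rule offers a schedule, set evaluation frequency to 24 hours, since continuous evaluation generates excessive Config API costs for large accounts. The managed-rule reference lists required-tags as triggered by configuration changes, so check which trigger types it actually offers in your account.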

Wire the AWS-TagResource SSM Automation document as the remediation action. Pass default tag values (Environment=unknown, Owner=unassigned) as remediation parameters. This keeps your cost reports clean — a resource tagged Owner=unassigned shows up in filtered reports and triggers follow-up. A resource with no tag at all is invisible.

Watch out for this one: The SSM remediation role needs both config:StartRemediationExecution AND tag:TagResources. Missing the second permission produces a silent no-op. Config shows the remediation as "in progress" indefinitely. There's no error surfaced in the Config console. You'll only catch it by checking CloudWatch Logs for the SSM automation execution. I've seen teams run this misconfigured for months thinking remediation was working.

Also: the AWS-TagResource document requires the full resource ARN in the format arn:aws:ec2:REGION:ACCOUNT:instance/i-XXXXXXXXX. Passing an instance ID alone raises InvalidResourceId. Config's RESOURCE_ID parameter mapping supplies the identifier Config recorded for the resource, which for an EC2 instance is the instance ID, so double-check your parameter bindings in the remediation configuration and confirm that what reaches the document is a full ARN. See the AWS Config remediation documentation for the full parameter reference.

Prevention — Enforce Tagging at the IaC Layer Before Resources Are Created

Remediation is reactive. Prevention is better. The goal is to make it impossible to deploy an untagged resource through your standard pipelines.

Add a check_required_tags() pre-flight validation function to every Boto3 deployment script. Call it before any create_* API call. If required tags are missing from the payload, fail fast with a clear error message, for example: "DeploymentError: Missing required tags: ['CostCenter', 'Owner']. Add them to your deployment config and retry." A clear error at deploy time is far better than a mystery line item in next month's bill.
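
A minimal sketch of that pre-flight check. The article names check_required_tags() and the DeploymentError message but does not show an implementation, so the body, the required-key tuple, and the rule that blank values count as missing are assumptions.

```python
REQUIRED_TAGS = ("Environment", "Owner", "CostCenter")


class DeploymentError(Exception):
    """Raised before any create_* call when required tags are missing."""


def check_required_tags(tags: dict, required=REQUIRED_TAGS) -> None:
    """Fail fast if any required key is absent or has a blank value."""
    missing = sorted(
        key for key in required
        if not str(tags.get(key) or "").strip()
    )
    if missing:
        raise DeploymentError(
            f"Missing required tags: {missing}. "
            "Add them to your deployment config and retry."
        )
```

In a deployment script the call goes immediately before the create call, for example check_required_tags(config["tags"]) followed by ec2.run_instances(...), so nothing is created when the check fails.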

At the organization level, combine AWS Organizations Tag Policies with Service Control Policies. Set enforced_for on specific resource types in Tag Policies, and add an SCP that denies ec2:RunInstances unless the aws:RequestTag/CostCenter condition is present. Tag policy enforcement only checks tags that are supplied and does not block a request that omits the tag entirely, which is why the SCP matters. Together they hard-block non-compliant creates at the API layer: no Lambda, no Config rule, no cron job required. It's the most reliable enforcement mechanism available.
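
To make the two policy documents concrete, here is a sketch of each as a Python dict, following the Null-condition form commonly used for "deny unless tag is present". The statement ID, the resource scope, and the would_deny() helper are illustrative assumptions; would_deny() is a rough local illustration of the Null condition, not a policy evaluator.

```python
import json

# SCP: deny launching instances when the request carries no CostCenter tag
SCP_REQUIRE_COSTCENTER = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRunInstancesWithoutCostCenter",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"Null": {"aws:RequestTag/CostCenter": "true"}},
        }
    ],
}

# Tag policy: standardize the key's casing and enforce it for EC2 instances
TAG_POLICY = {
    "tags": {
        "CostCenter": {
            "tag_key": {"@@assign": "CostCenter"},
            "enforced_for": {"@@assign": ["ec2:instance"]},
        }
    }
}


def would_deny(request_tags: dict, scp: dict = SCP_REQUIRE_COSTCENTER) -> bool:
    """Return True when a Null condition in the SCP is satisfied by the request."""
    for statement in scp["Statement"]:
        for condition_key, must_be_null in statement["Condition"]["Null"].items():
            tag_key = condition_key.split("/", 1)[1]
            is_null = tag_key not in request_tags
            if is_null == (must_be_null == "true"):
                return True
    return False


SCP_JSON = json.dumps(SCP_REQUIRE_COSTCENTER, indent=2)
```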

Finally, set up a weekly Cost Explorer report filtered by tag:CostCenter = N/A and alert via SNS if spend exceeds $50. Cost Explorer tag data has a 24-hour activation lag after first application — newly tagged resources won't appear in filtered reports until the next day, so don't panic if your first run still shows spend. This financial early-warning gives you a business-level signal before the next billing cycle closes, and it's the kind of alert that actually gets acted on because it has a dollar amount attached.
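
A sketch of the query and threshold check behind such an alert. It assumes that the ABSENT match option on the CostCenter tag is what corresponds to the untagged bucket, and that UnblendedCost is the metric you want; the function names are hypothetical. The AWS calls themselves are left to the caller so the logic can be exercised offline.

```python
from datetime import date, timedelta


def untagged_spend_query(end: date, days: int = 7) -> dict:
    """Build get_cost_and_usage arguments for spend with no CostCenter tag."""
    start = end - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": "CostCenter", "MatchOptions": ["ABSENT"]}},
    }


def total_spend(response: dict) -> float:
    """Sum the daily UnblendedCost amounts from a get_cost_and_usage response."""
    return sum(
        float(day["Total"]["UnblendedCost"]["Amount"])
        for day in response.get("ResultsByTime", [])
    )


def should_alert(response: dict, threshold_usd: float = 50.0) -> bool:
    """True when untagged spend for the period exceeds the threshold."""
    return total_spend(response) > threshold_usd
```

With credentials in place, the response would come from boto3.client("ce").get_cost_and_usage(**untagged_spend_query(date.today())), and a True result from should_alert() would be followed by an SNS publish() to the alerting topic.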

AWS resource tagging automation with Python is a weekend project that pays for itself in the first billing cycle. Audit what you have, stop new drift at the creation event, catch stragglers with Config, and enforce at the IaC layer. Stack all four layers and the "No Tag" line item disappears. For more automation patterns on this site, see kuryzhev.cloud.
