
Access Control Doesn't Scale Linearly -- Part 3
Anusha Mukka · 2026-05-26 · via DEV Community


One day you look up and realize your permissions model is something only two people on the team can explain. One of them just put in their notice.

  Nobody planned to be in that position. It happened one exception at a time. One "just add a role for this" at a time. One "we'll clean this up later" at a time. Later never comes. It never comes.

  This is Part 3 of a series about assumptions that quietly break systems at scale.

How 15 roles become 340 (a horror story in slow motion)

 

When we built out the permission model for one of the systems I worked on, we had 15 roles. Clean, well-defined, each with a clear purpose. You could explain the whole model in ten minutes to anyone new on the team. I was proud of it, honestly.

  Two years later there were 340 roles. Three. Hundred. And forty.

  Nobody planned for that. Nobody woke up one morning and said "you know what this system needs? 340 roles." It happened like this: a team needed access to one resource but not another, so a new role was created. A contractor role was almost identical to the standard role but needed one extra permission, so another role was created. An emergency access role was supposed to be temporary but was kept "just in case" and never revisited.

  Each decision made perfect sense at the time. Collectively they produced a permission model that no single person could fully explain, audit, or reason about confidently. Including me, and I'd been there since the beginning.

  That is role explosion. It's not a failure of discipline. It's what happens when a model designed for a clean set of cases gets pushed, one reasonable exception at a time, into a reality more complex than it was designed for.

Why simple RBAC always eventually breaks

 

Role-based access control works great when access decisions are binary: you either have the role or you don't. Clean, auditable, easy to reason about.

  The problem is that real-world access decisions are almost never that clean.

  You need a user who can access their own records but not others. You need access that expires after a project ends. You need a decision that depends on the current state of the resource, not just who's asking. Each of these requirements pushes you either toward more roles (which gets unwieldy fast) or toward a richer model that can express context-aware decisions.

  Most teams take the path of more roles because it's faster in the moment. I've done this. You've probably done this. The second path -- attribute-based or policy-based access control -- is more work upfront and dramatically less work over time. But "more work upfront" loses to "we need this shipped by Friday" approximately 100% of the time.
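The three requirements above (own records only, access that expires, decisions that depend on resource state) can be expressed as one policy function instead of three families of roles. This is a minimal sketch, not the model from the system described here: the `Request` fields, the `"locked"` state, and the `is_allowed` name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Request:
    user_id: str
    action: str
    resource_owner: str
    resource_state: str                 # hypothetical states, e.g. "draft" or "locked"
    grant_expires: Optional[datetime]   # None means the grant does not expire


def is_allowed(req: Request, now: datetime) -> bool:
    """Evaluate one context-aware policy instead of looking up a role name."""
    # Access that expires after a project ends.
    if req.grant_expires is not None and now >= req.grant_expires:
        return False
    # A user can access their own records but not others'.
    if req.user_id != req.resource_owner:
        return False
    # The decision depends on the current state of the resource.
    if req.action == "edit" and req.resource_state == "locked":
        return False
    return True


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    print(is_allowed(Request("alice", "edit", "alice", "draft", None), now))
    print(is_allowed(Request("bob", "read", "alice", "draft", None), now))
```

Each new requirement becomes one more condition in the policy rather than one more role multiplied across every existing role.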

The 10-minute incident (or: why caching permissions is terrifying)

 

Even a well-designed permission model has to be evaluated, and at scale the evaluation cost matters.

  The usual answer is caching. Cache the authorization decision with a TTL. Fast, cheap, easy to implement. But during that TTL window, you're making decisions based on permissions that may no longer be current. This is fine. This is a reasonable tradeoff. Until it isn't.

  We had a 10-minute TTL on cached permission decisions. The security team had asked what would happen if they needed to revoke access immediately. We said: up to 10 minutes. They accepted that.

  Then a credential was compromised.

  The security team revoked access and watched the logs. The system kept serving that user's requests for another eight minutes. Eight minutes is not long in most contexts. Standing in front of a security team watching real-time access logs during an active incident, trying to explain why the revocation hasn't taken effect yet, is a very different experience of those eight minutes. How did I eventually get around that problem? Take a guess in the comments.

  Anyway, I have never forgotten what that room felt like. I will never set a cache TTL on permissions without thinking about that room.

  That tradeoff -- cache TTL versus revocation speed -- exists whether or not your team has discussed it. The only variable is whether you made it consciously or discovered it during an incident.
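To make the tradeoff concrete, here is a sketch of a TTL cache that also has an explicit revocation path. It is one common approach, not necessarily the fix used in the incident above, which the post leaves open. The `PermissionCache` class, its `revoke` method, and the per-user generation counter are assumptions for illustration, and the sketch is single-process: with several servers the counter would have to live somewhere shared.

```python
import time
from typing import Callable, Dict, Tuple


class PermissionCache:
    """TTL cache for authorization decisions with an explicit revocation path."""

    def __init__(self, lookup: Callable[[str, str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # slow, authoritative permission check
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the TTL can be tested
        self._entries: Dict[Tuple[str, str], Tuple[bool, float, int]] = {}
        self._generation: Dict[str, int] = {}  # bumped on every revocation

    def revoke(self, user: str) -> None:
        """Invalidate every cached decision for this user right away."""
        self._generation[user] = self._generation.get(user, 0) + 1

    def is_allowed(self, user: str, permission: str) -> bool:
        gen = self._generation.get(user, 0)
        hit = self._entries.get((user, permission))
        if hit is not None:
            decision, expires_at, cached_gen = hit
            # Serve from cache only if unexpired and cached since the last revocation.
            if cached_gen == gen and self._clock() < expires_at:
                return decision
        decision = self._lookup(user, permission)
        self._entries[(user, permission)] = (decision, self._clock() + self._ttl, gen)
        return decision
```

Without the `revoke` call, a decision cached at minute zero keeps being served until the 10-minute TTL runs out, which is the eight-minute gap in the story. With it, the next check after revocation goes back to the authoritative lookup.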

Audit trails at volume (the compliance conversation from hell)

 

Every access decision needs to be attributable: who requested it, what they were authorized to do, what decision was made, and why. At 100,000 decisions per second, that's substantial write volume to your audit store.

  Synchronous writes add latency. Asynchronous writes mean you have to handle the failure case where a decision is made but the audit entry is lost -- which is a compliance conversation nobody wants to have. I've been in that conversation. It's not fun.

  I've worked on systems where the requirement was "log first, then execute." That constraint reshapes your entire architecture -- your latency budget, your failure handling, your storage design. It's buildable, but it needs to be in the design from the start. Retrofitting "log before execute" onto an existing system is expensive and almost never goes cleanly. Ask me how I know.
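A minimal sketch of the "log first, then execute" ordering, assuming a hypothetical `InMemoryAuditLog` in place of a real durable store. The names `execute_audited` and `AuditWriteError` are illustrative and do not come from any particular library.

```python
import json
from typing import Callable, List, TypeVar

T = TypeVar("T")


class AuditWriteError(Exception):
    """Raised when the audit entry could not be stored."""


class InMemoryAuditLog:
    """Stand-in for a durable audit store (hypothetical; real stores can fail)."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.available = True

    def append(self, who: str, action: str, decision: str, reason: str) -> None:
        if not self.available:
            raise AuditWriteError("audit store unavailable")
        self.entries.append(json.dumps(
            {"who": who, "action": action, "decision": decision, "reason": reason}))


def execute_audited(log: InMemoryAuditLog, who: str, action: str,
                    allowed: bool, reason: str, run: Callable[[], T]) -> T:
    """Write the audit entry first; run the action only if that write succeeded."""
    # If append() raises, nothing below runs, so no decision exists without a record.
    log.append(who, action, "allow" if allowed else "deny", reason)
    if not allowed:
        raise PermissionError(f"{who} may not {action}: {reason}")
    return run()
```

The cost is visible in the shape of the code: the audit write sits on the request path, so its latency and its failure modes become the request's latency and failure modes.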

Granting is easy. Revocation is the real test.

 

Granting access is trivial. Write a row somewhere. Done. Ship it.

  Revocation is where the design quality shows.

  Access needs to be revoked across every cache, every replica, every long-running process that may have loaded a stale copy of that permission. A batch job that started before the revocation happened, loaded permissions at startup, and is still running an hour later -- technically, every individual check it made was valid at the time. But the aggregate behavior is wrong.

  Explaining that gap to a compliance team is not a conversation you want. "Well, technically, at the time of each individual check..." doesn't land the way you hope it will.

  Designing revocation that actually works means deciding explicitly what "immediately" means in your system and then building infrastructure to deliver it. Not assuming it'll sort itself out. It won't sort itself out.
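For the long-running batch job case, one way to put a number on "immediately" is to re-check authorization at a fixed interval while the job runs. The sketch below assumes a hypothetical `still_allowed` callback that asks the authoritative source; `run_batch` and `recheck_every` are illustrative names.

```python
from typing import Callable, Iterable


class AccessRevoked(Exception):
    """Raised when a job's access is revoked while it is still running."""


def run_batch(items: Iterable[str], still_allowed: Callable[[], bool],
              process: Callable[[str], None], recheck_every: int = 100) -> int:
    """Process items, re-checking authorization every `recheck_every` items.

    `recheck_every` bounds how much work can happen after a revocation,
    so it is this job's definition of "immediately".
    """
    done = 0
    for item in items:
        # Check before the first item and then once per interval.
        if done % recheck_every == 0 and not still_allowed():
            raise AccessRevoked(f"access revoked after {done} items")
        process(item)
        done += 1
    return done
```

A job that loads permissions once at startup is the same loop with the interval set to infinity. Choosing the interval forces the question the section asks: how many more items is it acceptable to process after access is pulled?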

What I'd do differently

 

Authorize close to the data, not just at the API boundary. Edge authorization is necessary but not sufficient.

  Design the hot-path permission check to require no joins. It should be cheap by construction, not by optimization. Optimization after the fact is harder and less reliable than just designing it right.

  Treat the cache staleness window as a product decision, not a technical one. Write it down. Make sure the people responsible for security incidents know what it is before the incident happens.

  Build the audit trail into the design before anyone writes application code. Retrofitting it under compliance pressure is one of the more unpleasant engineering experiences I can describe. And I've had some unpleasant ones.
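As an illustration of the "no joins on the hot path" point, the sketch below resolves user to roles to grants once, when roles change, and leaves the request path with a single lookup. The data shapes and the names `flatten` and `can` are assumptions for the example, not the design of any system mentioned here.

```python
from typing import Dict, FrozenSet, Set, Tuple

Grant = Tuple[str, str]  # (resource, action)


def flatten(user_roles: Dict[str, Set[str]],
            role_grants: Dict[str, Set[Grant]]) -> Dict[str, FrozenSet[Grant]]:
    """Resolve user -> roles -> grants once, at write time, off the hot path."""
    flat: Dict[str, FrozenSet[Grant]] = {}
    for user, roles in user_roles.items():
        grants: Set[Grant] = set()
        for role in roles:
            grants |= role_grants.get(role, set())
        flat[user] = frozenset(grants)
    return flat


def can(flat: Dict[str, FrozenSet[Grant]], user: str, resource: str, action: str) -> bool:
    """Hot-path check: one dictionary lookup and one set membership test."""
    return (resource, action) in flat.get(user, frozenset())
```

The flattened table is itself a cached copy of the role model, so it has to be rebuilt whenever a role or assignment changes, and its staleness window deserves the same written-down decision as any other permission cache.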


 

Next week: latency. We have all watched a website show "loading..." before it responds. What was that experience like? Not great, right? So let's talk about the culprit behind that.

  What access control decision do you wish you'd made differently? The 340 roles story is mine. I want to hear yours. The worse, the better.