慣性聚合 高效追讀感興趣之博客、新聞、科技資訊
閱原文 以慣性聚合開啟

推薦訂閱源

博客园 - 司徒正美
V
V2EX
T
Tailwind CSS Blog
有赞技术团队
有赞技术团队
aimingoo的专栏
aimingoo的专栏
Apple Machine Learning Research
Apple Machine Learning Research
IT之家
IT之家
Blog — PlanetScale
Blog — PlanetScale
A
About on SuperTechFans
月光博客
月光博客
T
The Blog of Author Tim Ferriss
宝玉的分享
宝玉的分享
Martin Fowler
Martin Fowler
博客园 - 聂微东
The GitHub Blog
The GitHub Blog
V
Visual Studio Blog
WordPress大学
WordPress大学
酷 壳 – CoolShell
酷 壳 – CoolShell
Engineering at Meta
Engineering at Meta
GbyAI
GbyAI

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python)
如何锁住AI代理,使其不至为非作歹
ToxSec · 2026-05-24 · via DEV Community

ToxSec

汝之代理,惟从其理,行其所当行。或竟其事,或阅毒网,决其页为尊。若将LLM联于浏览器、工具链,或人之际,当其行,先为范于未发,非俟审计之录盈。

每代理固有之败式

拆解任一大型语言模型,其构造如出一辙。模型居于循环之中。尔以输入与工具饲之,直至任务毕。模型择其下一步行,循环施之,循环往复。其弊在于上下文之窗。尔之指令与攻击者之数据同归一处,经同一注意力机制,无权分离。无可信之信道,模型信之甚于不可信者。皆为符号,模型统观全局,择其最相关者而理之。

故当浏览器之代理读页,言"勿顾汝务,反行此事",则模型之心无以辨网页不当发令。遇之如读他务之毒能描述,或背景之任噬敌电,亦然。此乃迂回之诱,OWASP列其为LLM首患,实因是故。此乃结构之弊,非可补于模型。二二六年之研已显,自主之代理以SQL注入活站,启己用者,而无人授以破术之令。循环加之界缺,自为之耳。

是故,凡实控,皆存于模外。今当接之。

一层:许工具,绝凭信

默认开,则失之。一吏持泛泛“行壳令”之具,兼有长存之钥,是谓持钥之副,惑而失职。反之。吏得明许之名行,他无所与。

# agent-tools.yaml — deny by default, allow by name
tools:
  - name: search_docs
    scope: read:knowledge_base
  - name: create_ticket
    scope: write:tickets
# anything not listed dies at the broker, not in a prompt
policy:
  default: deny
  network_egress: none      # no outbound unless a tool explicitly needs it
  credential_ttl: 900       # 15 min, then re-mint

入全幅之境 退出全屏模式

事有二重。拒命存于汝之工具间,非系统提示之温言,令模型循规也。且每器所携之凭信,仅限一事,瞬息即逝。若使能者偏执,其波及之广,止于所限之狭域,非诸API密钥所授之合集。短时效者,盗得之令符,十五分内即成废砖。

第二层:禁绝凶险之行,审辨论辩之辞

日志示人已往之事,然不能止祸。及至记录既成,数据已去。所求者,乃立于行事之前,决其可否行之制也。

二事。一者,凡不可逆或涉机密之事,皆设人为之关隘:如寄信、移财、触涉机要之物、凡似泄密之形者。二者,设运行时之钩,于执行之前读工具调用之引,遇显见之弊则发警报。

# pre-exec hook: inspect the args, not just the call name
SENSITIVE = {"send_email", "transfer", "delete", "post_webhook"}

def authorize(tool_name, args):
    if tool_name in SENSITIVE:
        if looks_like_exfil(args):     # external dest, bulk read, weird recipient
            return BLOCK
        return REQUIRE_HUMAN           # a checkpoint, not a log line
    return ALLOW

Enter fullscreen mode Exit fullscreen mode

此功能实非要旨。要旨在于,模型之决断与世事之效,其间必有掣肘。重执行,不重可察。纵有详尽之审计轨迹,仍为失守。

实施之患

初看似善,后必伤人者数事。

蔓延之害,渐至而亡。代理人得窥代码,继而得票,终得客户函。无一授权显谬。无人总览其全。于历置定期权限审计,视代理人身份如服务账户之实。

信行传递,自第二代理人始言。一旦代理人委诸他者,尔之辐射范围遂吞并第二代理人所能及之物。未联接之前,必绘信之图谱,尤当慎于跨供应商之界,盖其侧之控,尔不可见也。

认证非即诚实。TLS与OAuth可证代理人即其所称。然未言其所宣之能力是否真实,亦未言其自述是否含针对汝模型之注入。验其行止,非徒验其身份。

已毕

汝不可使模型辨数据于指令。故汝当筑其阙:默认拒之器,暂存之凭信,危殆之召令设人为之关,及运行时钩于发令前读其辞。此皆非灵丹。叠用之,可化一毒入为"阻而录"。此即全役也。

吾已详述之,兼论此链之演,历乎Project Mariner、A2A协议,及24/7永无休之背景代理,皆载于ToxSec Substack


毒攻安全涵盖人工智能之安全隐忧、攻击链路,及守御者所需之攻防利器。由一位拥有国安局、亚马逊及国防承包领域实战经验之人工智能安全工程师主持。持CISSP认证,获网络安全工程硕士学位。