- 書: AI 代理口袋指南:用大語言模型構建自主系統之模式
- 亦余所譯: 思乎乎於Go(二書系列)— Go編程完備指南 + Go中六邊形架構
- 吾之項目: 赫爾墨斯IDE |GitHub — 一款供开发者使用之IDE,可配合Claude Code及其他人工智能编程工具使用
- 吾言: xgabriel.com | GitHub
用机之式已发。其使可点按钮,书密码,览密码簿,并倾瀹浏览器中每枚之饼干。三沙盒之式,可束爆域之广,而不损其使实善之事。
吾友曩月以Anthropic之用戶API織入收賬自動化之器。首度試運行,登入Stripe之測試儀板,自URL欄抄取一API鑰,貼入於一聊天完成之處。鑰終至於一追蹤輸出,與賣家共享之。無人欲泄之。此代理僅有瀏覽器與憑證於同一地址空間,而模型遂作所請之事。
此非智巧之模,乃视之如凡庸之程:束之,通其密,遏其不可逆耳。
为何"但予之浏览器"非正道
用机之模得见屏影。此乃真界面也。模所睹之每一像素,皆可为工具之唤、思理之迹、或长时记忆之条。
观其表面积:
- 瀏覽器自動填寫。Chrome之密码自动填充,于聚焦时将密码呈于DOM中一帧之顷。截图得之。
-
剪贴板。
Cmd+V显其用户所复之文,诚然,亦含其三刻前所贴之恢复语。 - 系统之通知。 令旗载两重密钥,日程邀约附会链,Slack密信往来。
- 缓存会话。 每所之站,皆载吾之档案所携之饼。以代理之浏览器登录Gmail,则代理亦已登录Gmail。
-
文件系统之对谈。
Cmd+O显家目录。SSH之钥,.env之件,~/.aws/credentials。
特工不必怀恶意。误一次足矣。网页提示“继续前,请用户确认其API密钥”即可致乱。
三层模式,各堵一漏。
模式一—会话一器,瞬息即逝
每代理会话皆有其一时之器。此器无恒存之态,无凭信内嵌,会话既终,则器亦消亡。
此乃承重之范式。余二者乃叠于其上。
此乃Docker之布置。察其所缺:无卷挂载,无主网络,无--privileged,无Docker之套接。
# docker-compose.agent-session.yml
services:
agent-browser:
image: agent-sandbox:1.4.2
init: true
read_only: true
tmpfs:
- /tmp:size=512m,mode=1777
- /home/agent/.cache:size=256m,mode=0700
security_opt:
- no-new-privileges:true
- seccomp=./seccomp-strict.json
cap_drop:
- ALL
cap_add:
- CHOWN
- SETUID
- SETGID
user: "1001:1001"
networks:
- agent-egress
mem_limit: 2g
pids_limit: 256
environment:
DISPLAY: ":99"
SESSION_TTL_SECONDS: "1800"
# no volumes. ever. the container dies, state goes with it.
此图像自带硬化Chromium构建,Xvfb供虚拟显示之用,并有纤薄HTTP服务器,以示截图及操作端点于控制器。无SSH。运行时无包管理器(万物皆存于只读层)。无外出DNS,唯经出口代理(详见模式三)。
控制器侧之生命周期,以Python言之:
import asyncio
import uuid
from contextlib import asynccontextmanager
import docker
_client = docker.from_env()
@asynccontextmanager
async def agent_session(*, ttl_seconds: int = 1800):
"""One container per agent session. Auto-killed on exit or TTL."""
session_id = str(uuid.uuid4())
container = _client.containers.run(
image="agent-sandbox:1.4.2",
name=f"agent-{session_id}",
detach=True,
remove=True,
# network attached, but everything goes through the egress proxy
network="agent-egress",
environment={"SESSION_ID": session_id},
# the rest is in docker-compose; this is the runtime override layer
)
killer = asyncio.create_task(_kill_after(container, ttl_seconds))
try:
await _wait_for_ready(container)
yield SessionHandle(container, session_id)
finally:
killer.cancel()
try:
container.kill()
except docker.errors.NotFound:
pass # already gone, great
async def _kill_after(container, seconds: int) -> None:
await asyncio.sleep(seconds)
# hard kill, no graceful shutdown. the agent had its turn.
try:
container.kill()
except docker.errors.NotFound:
pass
时序逻辑之要,甚于众人之所期。若使代理运行九十分钟,则积聚境况,涉猎非其所当涉之域,且浏览器中或存陈旧之认证令牌。当限其会话。若事未竟,则明示传递相关状态,开启新会话。
了然:勿使容器跨代理运行以"节启动时"。冷启动,调适之Chromium图像需时1.2秒。越会状态污染之代价,远逾此数级.
模式二 — 凭信中介
代理永不见密码、API密钥或刷新令牌。其自沙盒外运行之中介服务,得受短暂、限域之令牌。
此乃闭截屏漏之式。若凭据永不出屏,则模型不能发之。
经纪者显一术。赐令符以行事X于物Y,效期N秒。代理人呼之。经纪者决其铸券与否,授其域,定其寿。
# broker/api.py — runs OUTSIDE the agent sandbox, talks to it over a unix socket
import secrets
import time
from dataclasses import dataclass
from typing import Literal
Resource = Literal["stripe", "gmail", "github", "calendar"]
Action = Literal["read", "write", "delete"]
@dataclass(frozen=True)
class TokenGrant:
token: str
expires_at: float
scope: tuple[Resource, Action]
class CredentialBroker:
def __init__(self, vault, policy, audit_log):
self._vault = vault
self._policy = policy
self._audit = audit_log
self._active: dict[str, TokenGrant] = {}
def grant(
self,
*,
session_id: str,
resource: Resource,
action: Action,
ttl_seconds: int = 60,
) -> TokenGrant | None:
# policy check: is this session allowed to ask for this scope?
if not self._policy.allows(session_id, resource, action):
self._audit.deny(session_id, resource, action, "policy")
return None
# the actual secret lives in the vault. the broker mints a proxy token.
underlying = self._vault.fetch(resource, action)
if underlying is None:
self._audit.deny(session_id, resource, action, "no-credential")
return None
token = f"agent-{secrets.token_urlsafe(32)}"
grant = TokenGrant(
token=token,
expires_at=time.time() + ttl_seconds,
scope=(resource, action),
)
self._active[token] = grant
self._audit.grant(session_id, resource, action, token, ttl_seconds)
return grant
def resolve(self, token: str) -> tuple[Resource, Action, str] | None:
"""Called by the egress proxy to swap a token for the real header."""
grant = self._active.get(token)
if grant is None or grant.expires_at < time.time():
self._active.pop(token, None)
return None
resource, action = grant.scope
real = self._vault.fetch(resource, action)
return resource, action, real
代理之器得令牌,未得密钥:
async def request_stripe_token(session: SessionHandle, action: str) -> str | None:
"""Tool exposed to the agent. Returns a stub token, never the real key."""
grant = await session.broker.grant(
session_id=session.id,
resource="stripe",
action=action,
ttl_seconds=120,
)
return grant.token if grant else None
出逸之介(沙盒唯一能通互联网之物)见外向之 stub 令牌Authorization之表,检于市,易以真Stripe密钥而出,返则去之。使者未尝持其秘。截屏无虞。踪迹导出无虞。内存条目无虞。
有两事似可无,实不可缺:
- 令牌之TTL。六十至三百秒。若使者陷于循环而焚令牌,必再请之,此乃尔拒之之第二机也。
- 会话之策.只读会话,永无得写令。当将策铸于中涓,非铸于诘问。诘问者,软篱也.
善哉:勿于策拒时返丰谬。{"error": "denied: stripe.write requires admin policy"}示诘问所引之使,知下次所求。返{"error": "credential_unavailable"}而录其真由。
三式——毁性之行为之准
有行为可复。阅页可复:闭之则毕。寄函于客不可复。刷卡不可复。删行不可复。
准之制,截凡不可复之行为,必待人力(或更严之策)之认,而后行。
分类器居模型工具调用之输出与实际副作用之间:
from enum import Enum
from typing import Awaitable, Callable
class Risk(Enum):
REVERSIBLE = "reversible"
RECOVERABLE_WITH_AUDIT = "recoverable_with_audit"
IRREVERSIBLE = "irreversible"
# the policy lives in code, not in the prompt
ACTION_RISK: dict[str, Risk] = {
"browser.click": Risk.REVERSIBLE,
"browser.type": Risk.REVERSIBLE,
"browser.screenshot": Risk.REVERSIBLE,
"browser.navigate": Risk.REVERSIBLE,
"form.submit_purchase": Risk.IRREVERSIBLE,
"email.send_to_customer": Risk.IRREVERSIBLE,
"stripe.charge": Risk.IRREVERSIBLE,
"stripe.refund": Risk.RECOVERABLE_WITH_AUDIT,
"db.delete_row": Risk.IRREVERSIBLE,
"git.force_push": Risk.IRREVERSIBLE,
}
async def execute_with_gate(
action: str,
payload: dict,
*,
executor: Callable[[str, dict], Awaitable[dict]],
approver: "ApprovalChannel",
) -> dict:
risk = ACTION_RISK.get(action, Risk.IRREVERSIBLE) # unknown = strict
if risk is Risk.REVERSIBLE:
return await executor(action, payload)
if risk is Risk.RECOVERABLE_WITH_AUDIT:
# auto-approve but write a full audit trail with a "revert" link
await approver.audit(action, payload)
return await executor(action, payload)
# IRREVERSIBLE: block until a human says yes (or times out)
decision = await approver.request(
action=action,
payload=payload,
timeout_seconds=300,
)
if decision.approved:
return await executor(action, payload)
return {"status": "denied", "reason": decision.reason}
吾approver界面或为 Slack 消息,或为接入管理界面之 Web hook,或为开发时之 CLI 提示。其要在于此。非正道之讯非代理自身所能答。以提示注入之代理,不能自许其购,盖许购之境非其行域也。
调二事:
-
固守不可逆之常态于未知之行。其线
ACTION_RISK.get(action, Risk.IRREVERSIBLE)安知安行之别,与子夜之页乎? - 设硬时限。若五分钟内无人核准,则事不果。否则,代理人可滞留候眠评者,而汝之工役队列遂壅。
了然:勿唯器之名而分类。db.execute若为可逆,则可SELECT且不可逆若其为DELETE器具泛化时,宜使负载经次分类器而通。
架构参考——容器+中介+门__
三者合构。代理行于暂存之器内;中介踞器外,代传所有外发凭信;门居模型谋算之环与诸器执行间__
+--------------------+
| Orchestrator |
| (your service) |
+----------+---------+
|
| spawn session
v
+----------+---------+ +-----------------+
| Ephemeral | | Credential |
| Container |<---->| Broker |
| (Chromium + tools)| | (vault facade) |
+----------+---------+ +--------+--------+
| |
| HTTPS via egress proxy |
v v
+--------------------+ +-----------------+
| Egress Proxy |<-----| Audit Log |
| (token resolver) | | (immutable) |
+----------+---------+ +-----------------+
|
v
Internet
总司者决其事之需阖,经纪者决其会之可索之凭,代行者决其器之可及之主。三权独立,若一权失,余二权犹存.
尚漏何物
诚言此法所不及:
- 模型于思辨中所显之屏文。若使代理人于纸上阅得顾客之名,而书曰"此用户似为约翰·史密斯也"约翰·例姆普劳尔@邮箱地址"在其规划中,那串信息现已在汝之轨迹导出。于轨迹处理器处删改之(样本处理已涵盖于内)。"《大语言模型可观测性之书》,另发一帖。
- 旁道时序。 必勇之攻者,或可由迟滞度推知凭据之长。此非多数应用之范畴;若输送于规管之业,则属其域。
- 模型之记忆。 若于运行间恒存会话之要,则会话N之漏,可复现于会话N+1。视记忆为迹之输出——存前必删。
出境之控——此乃众队所忘之部
汝之沙盒,其坚若网略之周遭。一容器能curl任意之主机,则可窃取。出站之代理,不可商榷。
最低之略:默认拒之,按会话允之,依域而定。
# Network policy — denied to anything except the proxy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: agent-sandbox-egress
spec:
podSelector:
matchLabels:
role: agent-sandbox
policyTypes:
- Egress
egress:
- to:
- podSelector:
matchLabels:
role: egress-proxy
ports:
- protocol: TCP
port: 8443
此代理自运一许列,以会元之元为钥。一会有责于“撮要Stripe之仪表”,则得api.stripe.com,得dashboard.stripe.com,除此之外,别无他物。不得pastebin.com,不得discord.com,亦不得汝之内部管理之面板.
亦阻绝代理外之DNS解析——否则,一巧慧之提示注入,可借DNS查询以窃数据,纵HTTP已严锁。
使用计算机之代理,非新患也,乃旧患而败速耳。视此代理如可信之实习生,授意而不授断。为其设净室,次第授钥,遇不可逆之事,则立于门侧以观。
汝所睹最甚之泄,或险而免者,自代理得真浏览器始,言于注。
若此有益
吾等AI Agents 袖珍指南深究代理之约束,兼及凭证中介之模式、核准之关隘决断之树,并章于最小权限之器物设计。若尔运送计算机使用之代理,则器物范围之极小化与流出政策之配对,直合于前述。













