惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

www.infosecurity-magazine.com
www.infosecurity-magazine.com
Security Archives - TechRepublic
Security Archives - TechRepublic
TaoSecurity Blog
TaoSecurity Blog
Cloudbric
Cloudbric
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
N
News and Events Feed by Topic
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
S
Securelist
The Cloudflare Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
D
DataBreaches.Net
S
Schneier on Security
L
LangChain Blog
Jina AI
Jina AI
M
MIT News - Artificial intelligence
Recent Announcements
Recent Announcements
T
Tenable Blog
B
Blog RSS Feed
V
Visual Studio Blog
Simon Willison's Weblog
Simon Willison's Weblog
G
Google Developers Blog
T
The Exploit Database - CXSecurity.com
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
WordPress大学
WordPress大学
W
WeLiveSecurity
I
InfoQ
The Hacker News
The Hacker News
雷峰网
雷峰网
月光博客
月光博客
P
Privacy & Cybersecurity Law Blog
O
OpenAI News
Hacker News: Ask HN
Hacker News: Ask HN
T
Threat Research - Cisco Blogs
GbyAI
GbyAI
The Last Watchdog
The Last Watchdog
P
Privacy International News Feed
Cyberwarzone
Cyberwarzone
S
SegmentFault 最新的问题
L
Lohrmann on Cybersecurity
人人都是产品经理
人人都是产品经理
V
V2EX
V
Vulnerabilities – Threatpost
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Cybersecurity and Infrastructure Security Agency CISA
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
T
Troy Hunt's Blog
Application and Cybersecurity Blog
Application and Cybersecurity Blog
阮一峰的网络日志
阮一峰的网络日志
SecWiki News
SecWiki News
Microsoft Azure Blog
Microsoft Azure Blog

CSO Online

New malware turns Linux systems into P2P attack networks Poisoned truth: The quiet security threat inside enterprise AI Train like you fight: Why cyber operations teams need no-notice drills Die besten DAST- & SAST-Tools CISA mulls new three-day remediation deadline for critical flaws CISA pushes critical infrastructure operators to prepare to work in isolation CISOs step up to the security workforce challenge 10 Anzeichen für einen schlechten CSO Anthropic Mythos spurs White House to weigh pre-release reviews for high-risk AI models Security agencies draw red lines around agentic AI deployments The fake IT worker problem CISOs can’t ignore How CISOs should utilize data security posture management to inform risk Was ist ein Botnet? Human-centric failures: Why BEC continues to work despite MFA Just 34% of cyber pros plan to stick with their current employer Managing OT risk at scale: Why OT cyber decisions are leadership decisions 4 ways to prepare your SOC for agentic AI ‘Trivial’ exploit can give attackers root access to Linux kernel Bank regulator sounds warning over cybersecurity threat posed by AI models Dismantle implicit trust in OT networks, CISA tells critical infrastructure operators Max-severity RCE flaw found in Google Gemini CLI Stopping the quiet drift toward excessive agency with re-permissioning ODNI to CISOs on threat assessments: You’re on your own 10 wichtige Security-Eigenschaften: So setzen Sie die Kraft Ihres IT-Sicherheitstechnik-Teams frei Researchers unearth industrial sabotage malware that predated Stuxnet by 5 years AWS leans on prior ingenuity to face future AI and quantum threats What it takes to win that CSO role Third Party Risk Management: So vermeiden Sie Compliance-Unheil Critical Cursor bug could turn routine Git into RCE Securing RAG pipelines in enterprise SaaS What CISOs need to get right as identity enters the agentic era Stopping AiTM attacks: The defenses that actually work after authentication succeeds EDR-Software – ein Kaufratgeber Microsoft patched an ‘agent-only’ role that was not AI is reshaping DevSecOps to bring security closer to the code The 'manager of agents': How AI evolves the SOC analyst role 4 Wege aus der Security-Akronymhölle Autonome KI-Agenten: Strategien für die neue Bedrohungslage New US House privacy bills raise hard questions about enterprise data collection Scattered Spider co-conspirator pleads guilty Security-KPIs und -KRIs: So messen Sie Cybersicherheit Bitwarden CLI password manager trojanized in supply chain attack 3 practical ways AI threat detection improves enterprise cyber resilience The curious case of Sean Plankey’s derailed CISA nomination Google gets agent-ready for the Mythos age Google drafts AI agents secure systems against AI hackers CNAPP – ein Kaufratgeber Riddled with flaws, serial-to-Ethernet converters endanger critical infrastructure NFC tap-to-pay gets tapped by hackers Anthropic bets on EPSS for the coming bug surge SBOM erklärt: Was ist eine Software Bill of Materials? Thousands of Apache ActiveMQ instances still unpatched, weeks after an actively exploited hole discovered Prompt injection turned Google’s Antigravity file search into RCE Why identity is the driving force behind digital transformation Top techniques attackers use to infiltrate your systems today The thin gray line: Handala, CyberAv3ngers and Iran’s proxy ops Attackers abuse Microsoft Teams to impersonate the IT helpdesk in a new enterprise intrusion playbook CISOs reshape their roles as business risk strategists Copilot & Agentforce offen für Prompt-Injection-Tricks Claude Mythos – ist der Hype gerechtfertigt? Für Cyberattacken gewappnet – Krisenkommunikation nach Plan Critical sandbox bypass fixed in popular Thymeleaf Java template engine White House moves to give federal agencies access to Anthropic’s Claude Mythos Another Microsoft Defender privilege escalation bug emerges days after patch Palo Alto’s Helmut Reisinger sees a cyber sea change ahead as AI advances Positiv denken für Sicherheitsentscheider: 6 Mindsets, die Sie sofort ablegen sollten NIST cuts down CVE analysis amid vulnerability overload Was bei der Cloud-Konfiguration schiefläuft – und wie es besser geht The endless CISO reporting line debate — and what it says about cybersecurity leadership Behind the Mythos hype, Glasswing has just one confirmed CVE Insurance carriers quietly back away from covering AI outputs RCE by design: MCP architectural choice haunts AI agent ecosystem Critical nginx UI tool vulnerability opens web servers to full compromise Copilot and Agentforce fall to form-based prompt injection tricks The deepfake dilemma: From financial fraud to reputational crisis 7 biggest healthcare security threats The need for a board-level definition of cyber resilience Mallory Launches AI-Native Threat Intelligence Platform, Turning Global Threat Data Into Prioritized Action 13 Fragen gegen Drittanbieterrisiken April Patch Tuesday roundup: Zero day vulnerabilities and critical bugs 4 questions to ask before outsourcing MDR 5 trends defining the future of AI-powered cybersecurity EU regulators largely denied access to Anthropic Mythos China-linked cloud credential heist runs on typos and SMTP How AI is transforming threat detection The AI inflection point: What security leaders must do now Cyber-Inspekteur: Hybride Attacken nehmen weiter zu Anthropic’s Mythos signals a structural cybersecurity shift Seven IBM WebSphere Liberty flaws can be chained into full takeover Old Docker authorization bypass pops up despite previous patch Hacker Unknown now known, named on Europol’s most-wanted list The cyber winners and losers in Trump’s 2027 budget CMMC compliance in the age of AI Claude uncovers a 13‑year‑old ActiveMQ RCE bug within minutes Was CISOs von Moschusochsen lernen können Hackers have been exploiting an unpatched Adobe Reader vulnerability for months New ClickFix variant bypasses Apple safeguards with one‑click script execution Cloudflare ‘actively adjusting’ quantum priorities in wake of Google warning Patch windows collapse as time-to-exploit accelerates So geht Post-Incident Review
5 runtime signals for catching a compromised AI agent
Ax Sharma · 2026-06-15 · via CSO Online

Once a signal of exploitation risk, Willison’s ‘lethal trifecta’ describes the baseline operations of every AI agent today. As a result, agent security is no longer architectural. Here’s what to watch for instead.

In June 2025, Simon Willison, the engineer who coined the term “prompt injection,” published a warning that circulated widely through the security community. He called it the lethal trifecta — three capabilities that, when combined in a single AI agent, create a near-guaranteed path to exploitation through indirect prompt injection: access to private data; exposure to untrusted content; the ability to communicate externally.

The framing was sharp and useful. If your agent reads your email, ingests arbitrary web content, and can make outbound requests, an attacker who embeds malicious instructions anywhere in that content pipeline can direct the agent to exfiltrate your data without you ever knowing. Willison illustrated the point with a long list of real production exploits: Microsoft 365 Copilot, GitHub’s MCP server, GitLab Duo, Slack AI, Google Bard, Amazon Q. The same class of attack, over and over.

The trifecta worked as a signal because, at the time, agents were mostly narrowly scoped. An agent capable of performing only one or two of the lethal trifecta activities could be assessed as lower risk. Avoiding the combination felt like a viable design strategy.

That window has closed given what practitioners deploy today: A customer-facing support agent reads ticket histories and customer records, ingests user messages and attached files, and calls CRMs, refund APIs, or ticketing systems. An email AI reads your inbox and calendar, processes inbound messages from strangers, and sends replies on your behalf.

Rather than being edge cases or poorly designed deployments, these are the agents enterprises and individuals actually want, and they’re the ones vendors are building toward.

Lethal trifecta as default configuration

Ross McKerchar, CISO at Sophos, put it plainly in a piece published this May: “the capabilities practitioners actually want (read my data, understand external context, take action) push firmly into dangerous territory. This isn’t a misconfiguration; it’s the architectural cost of usefulness.” He’s right. An agent without private data access is useless, one that can’t process external content is isolated, and the one that can’t communicate externally is inert. Strip any leg of the trifecta and you have something closer to a search box than an agent.

If every legitimate agent architecture exhibits all three trifecta properties, the trifecta is no longer a meaningful indicator of elevated risk. It’s the default configuration. Treating it as a red flag is like treating DNS resolution as a signal of network compromise. Technically true in some threat models, but universally present in every real deployment.

McKerchar’s piece frames the response as “blast radius reduction”: a reasonable operational philosophy, but one that accepts the trifecta as a given condition rather than a preventable one. That’s a reasonable call. The question is what comes after the acceptance.

Meta’s security team arrived at the same conclusion from the other direction. In October 2025, they published the “Rule of Two,” a framework that recommends agents satisfy no more than two of the three trifecta properties in a single session, with human-in-the-loop approval required if all three are necessary. Willison himself endorsed the framework as “the best practical advice for building secure LLM-powered agent systems today.”

Meta’s limitations section, however, concedes that many sought-after use cases won’t fit the framework cleanly, and that “designs that satisfy the Agents Rule of Two can still be prone to failure.” That’s not a criticism of the framework but confirmation that the problem has outgrown the architecture-level solution.

The scale of exposure is no longer theoretical. Google’s April 2026 sweep of the Common Crawl repository found prompt injection attempts across public web pages, ranging from pranks to data exfiltration payloads, with malicious attempts up 32% between November 2025 and February 2026. Google noted sophistication remains low for now but flagged the trend as a signal of maturing attacker interest.

The environment the trifecta warned about has arrived.

How to sleuth out a compromised agent

If the trifecta describes nearly every deployed agent, practitioners need signals that distinguish compromised behavior from normal operation within a trifecta-exhibiting system. That means shifting from architecture-level assessments to runtime behavioral detection.

The production evidence arrived in a cluster. From Jan. 7 to Jan. 15, 2026, researchers disclosed exploits against four separate AI productivity tools in eight days: IBM Bob, Superhuman AI, Notion AI, and Anthropic’s Claude Cowork. Each used indirect prompt injection to exfiltrate data via a channel the agent had legitimate access to. In the Cowork case, a hidden prompt embedded in an uploaded document directed the agent to exfiltrate files via Anthropic’s own allowlisted API domain, invisible to any perimeter control and indistinguishable from normal agent behavior until the data was already gone. In all of these cases, the trifecta wasn’t a risk factor but the operating condition.

Here’s what’s worth watching to detect an agent has been compromised.

Instruction-following anomalies. A compromised agent doesn’t usually do something structurally different from a healthy one. Following instructions is its normal function. The difference is whose instructions it’s following. Look for agent actions that have no plausible correspondence to a user-initiated task. An agent that was asked to summarize a quarterly report but then attempts an outbound DNS request to an unfamiliar domain didn’t spontaneously decide to do that. Something in the content it ingested told it to.

Tool call sequences that break expected topology. In a well-designed agent system, the graph of tool calls for any given task should be relatively predictable. A coding agent invoked to fix a bug should touch files, run tests, perhaps check documentation. It shouldn’t be reaching for email or calendar APIs. Tool call sequences that cross expected workflow boundaries are worth flagging even when each individual call looks legitimate on its own.

Exfiltration via low-bandwidth channels. The classic prompt injection exfiltration attack routes stolen data through a mechanism the agent has legitimate access to: a rendered image URL with encoded query parameters, an API call with data embedded in a parameter, a link in a generated document. These don’t look like data theft in isolation; they look like normal agent output. Detection requires correlating what data the agent had access to against what it embedded in its output. That requires end-to-end visibility into the agent’s actions, not just the final response.

Credential and secret access outside task scope. If an agent with legitimate access to a secrets store or key vault touches credentials that have no relationship to the current task, that’s a signal. An agent fixing a React rendering bug should likely not be reading AWS credentials. Least-privilege scoping is the architectural defense here, but monitoring for out-of-scope credential access is the detection layer that catches failures in that scoping.

Memory-write anomalies. Agents with persistent memory are a growing attack surface. A poisoned memory entry that looks like legitimate user context but contains dormant trigger instructions can persist across sessions and fire long after the initial injection. Monitoring for memory-writes containing instruction-like content, or writes made during sessions that ingested untrusted content, is worth adding to any agent observability pipeline.

Runtime alone can address the agent redirection threat

For practitioners operating production agent infrastructure, the lethal trifecta tells you what you know: Your agents are exposed. The question is what to do about it.

The answers are at the runtime layer, not the architecture layer. That’s where EDR and SIEM live for traditional infrastructure — agents need the same instrumentation, and most deployments don’t have it yet. Full execution traces on every agent invocation. Tool call anomaly detection. Input screening at ingest. Credential access monitoring scoped to task context. Memory-write auditing. Not a human attacker logging in. An agent that’s been quietly redirected.

Willison’s trifecta was the right alarm for its moment, which was last year. Almost every production agent now fits the profile. Because of that, only runtime anomaly detection can potentially provide adequate defense. The above signals are a good place to start.

SUBSCRIBE TO OUR NEWSLETTER

From our editors straight to your inbox

Get started by entering your email address below.