Vectra AI

‍Update May 12, 2026: GTIG's May 2026 AI threat tracker documents PROMPTSPY, an Android backdoor that operationalizes prompt-based command and control in the wild. Its GeminiAutomationAgent module serializes the device's UI hierarchy, sends it to the Gemini API alongside an attacker-supplied objective, and parses the model's response into executable gestures on the device. The prompt is the instruction set. The model is the interpreter. The C2 channel runs through the reasoning loop, exactly the pattern described below.

---

Traditional command-and-control is explicit. An infected system reaches out, receives instructions, executes them, and reports back. Even when encrypted, the structure remains. Something external is directing behavior.

Autonomous agents change that model.

They do not wait for instructions in the same way. They continuously ingest input, interpret it, and act. Emails, chats, APIs, documents… everything becomes context, and everything can influence behavior.

This creates a different control surface.

An attacker no longer needs a persistent channel if they can shape what the agent sees, remembers, and prioritizes.

Control becomes indirect, continuous, embedded in normal operation.

This is the basis of prompt control.

Recent research has already demonstrated prompt-based command-and-control frameworks where compromised agents receive tasks, execute them, and return results using only prompts and context, without traditional C2 infrastructure.

From Prompt Injection to Prompt Control

In these samples, agents trust external content. They execute tasks with real privileges. They coordinate across systems.

Each of these expands the attack surface.

Early security discussions focused heavily on prompt injection. A malicious instruction embedded in content triggers an unintended action.

That explains entry but it doesn’t explain persistence.

In recent demonstrations, a single prompt injection delivered through email or web content was enough to compromise an agent and modify its working context. From that point forward, the agent continued to retrieve attacker-controlled instructions from its own environment, effectively maintaining control without requiring re-exploitation.

A recent OpenClaw investigation showed how a single indirect prompt injection embedded in a webpage could do more than trigger one action. It invoked an execution tool, then planted instructions into the agent’s future context, allowing the attacker to continue issuing commands over time without re-accessing the system.

*Screenshot of the researchers prompt that instructs OpenClaw to retrieve and execute a malicious bash script*

The initial injection disappears but the influence remains.

Prompt control shapes how the system continues to behave after the initial interaction.

Prompt Control as Behavioral Influence

Prompt control steers behavior without issuing direct commands.

Instead of sending instructions, the attacker shapes what the agent treats as relevant and how it builds context. The agent then acts using its existing capabilities and permissions.

This follows the same principle as social engineering: influence the decision-maker, and the decision-maker executes the action.

The difference is scale and persistence. Agents operate continuously and rely on whatever context is available, even when that context has been shaped adversarially.

Prompt-Based Command and Control in Practice

Prompt control is not just influence, it can be operationalized.

Recent research shows how compromised agents can be enrolled into a centralized control system where tasks are issued as prompts and results are returned through normal agent workflows.

Once an agent is compromised, it does not need to be re-accessed. Instructions are persisted in the same places the agent already uses to operate: files, memory, and retrieved context. Execution loops become control loops.

Attackers issue tasks as prompts. The agent executes them using its existing permissions and returns results through normal workflows.

In one example, agents were configured to read a “heartbeat” file at regular intervals. By inserting malicious instructions into that file, attackers created a recurring execution point. Each time the agent processed the file, it retrieved new instructions and continued operating under attacker influence.

This mirrors traditional C2 behavior. The difference is that the communication channel is not traditional network beaconing. It is embedded in the agent’s own reasoning loop and execution paths.

Control shifts into what can be described as a cognitive control plane, where influence operates through:

Files the agent periodically reads
Memory stores used for retrieval
External content sources the agent trusts
Tool outputs feeding back into reasoning

Prompt Control as a Form of Persistence

In agent systems, persistence is not an implant. It is context that continues to be reloaded: memory entries, configuration files, or external sources the agent repeatedly consults. As long as that context remains, the control remains.

In practice, persistence is a context engineering problem. The challenge is not writing one malicious prompt, but getting the right instructions into the right context layer, in the right format, with enough precedence that they are repeatedly loaded and acted on. Modern agent frameworks already manage this holistic state through memory files, rules, agent configuration files, and scheduled or background re-entry points.

OpenClaw highlights how this plays out in practice. Agent memory stores often treat all inputs the same, regardless of source. Once malicious context is introduced, it can persist and continue influencing decisions with no distinction in trust.

Removing the attacker’s access does not remove the effect. If the agent continues to read attacker-influenced context, control persists.

In observed cases, this persistence survived restarts and continued until the underlying context was explicitly cleaned.

MITRE ATLAS and Continuous Influence

One important nuance is that prompt control is not deterministic. Agent behavior depends on probabilistic reasoning, context selection, and retrieval quality. The same prompt may produce different outcomes across runs, and attacks may partially succeed, fail, or require repetition.

From an attacker’s perspective, this introduces variability rather than preventing exploitation. Control becomes probabilistic: repeated influence, reinforcement, and multiple execution paths increase the likelihood of success over time.

Agents may also surface signs of compromise. In some observed cases, agents identified suspicious instructions or anomalous behavior during self-analysis or logging. These can act as early indicators of compromise. However, most agents are not yet trained or configured to treat these signals as security events or trigger defensive actions.

This is likely to evolve. As detection logic becomes embedded into agents themselves, these weak signals could become meaningful controls. For now, they remain inconsistent and rarely enforced.

MITRE ATLAS describes several relevant techniques:

Data poisoning influences inputs
Prompt injection overrides behavior
Model manipulation steers outputs

What changes in agent systems is not the techniques themselves, but how they combine. Prompt injection becomes the entry point, memory or context manipulation provides persistence, and tool use enables execution. Together, they function as a continuous control loop rather than isolated steps.

MITRE ATLAS' OpenClaw attack graph shows how these techniques combine in practice. Rather than a linear sequence, influence, execution, and persistence are interconnected and can reinforce each other across the agent lifecycle.

When Control Blends into Normal Activity

From a detection perspective, this doesn’t behave like traditional compromise.

Most SOC pipelines focus on execution artifacts such as network anomalies, process behavior, credential misuse, or lateral movement. Prompt control often doesn’t trigger these signals early.

Agents operate with valid access, call approved APIs, and follow expected workflows. From a technical standpoint, activity appears normal.

The difference is in how behavior evolves. The agent is not executing attacker commands, it is making decisions that happen to align with attacker objectives.

In one demonstration, an agent was asked to summarize a document containing an indirect prompt injection. The user received a normal response in Slack, with no indication anything was wrong. At the same time, the compromised agent began sending sensitive data to an attacker-controlled Telegram bot.

To the user, the system behaves correctly. To the attacker, it is already controlled.

The same access can be used for impact. Agents can retrieve data, modify it, or delete it using the permissions they were given to be useful.

Individual actions make sense. The overall pattern drifts.

There’s no single alert that explains the behavior. The signal emerges over time.

Detection needs to focus less on isolated events and more on how activity connects across identity, network, cloud, and SaaS environments.

This is the core challenge. When control is embedded in context, there is no single point to block. The only reliable signal is how behavior changes over time.

The Vectra AI Platform correlates behavior across these domains to identify coordination, misuse, and subtle deviations that don’t appear in individual alerts, providing visibility into how activity develops rather than relying on a single point of failure.

推荐订阅源

Vectra AI Blog