


















System prompt leakage refers to the risk that the system prompts or hidden instructions used to steer an LLM’s behavior may contain sensitive information that was not intended to be disclosed.
System prompts guide model behavior according to application requirements. However, they are not secure storage mechanisms and must not be treated as secrets or security controls.
Importantly, the real risk lies in what the system prompt contains and what the application improperly delegates to it.
Attackers can often infer guardrails and formatting rules simply by interacting with the model, even without directly extracting the exact wording of the system prompt.
System prompts can be influenced by prompt injection and extracted through meta-prompt techniques. They are not deterministic enforcement mechanisms and should not contain sensitive operational data.
If sensitive information (credentials, API keys, role definitions, connection strings) is embedded in system prompts, its exposure is a design failure, not merely a leakage event.
Additionally, if authorization rules or privilege logic are implemented inside the system prompt rather than in deterministic back-end systems, the architecture itself is insecure.
System prompts may reveal API keys, database credentials, user tokens, system architecture details, tool configuration or back-end technologies. For example, if a prompt reveals the type of database being used, attackers may tailor injection attacks accordingly.
System prompts may describe internal thresholds or decision rules.
Attackers can use this knowledge to manipulate workflows, bypass controls or target logic weaknesses.
A system prompt may instruct: “If a user requests information about another user, respond with, ‘Sorry, I cannot assist.’”
Knowing this rule allows attackers to craft bypass strategies.
System prompts may reveal role definitions such as: “Admin users have full access to modify records.”
Attackers may then attempt privilege escalation.
A system prompt contains credentials for accessing a tool. An attacker extracts the prompt and uses those credentials independently to access back-end systems.
A system prompt prohibits offensive content, external links and code execution. An attacker extracts these rules and then uses prompt injection techniques to override or bypass them, potentially enabling remote code execution.
Disclosure of the system prompt is not the core vulnerability. The core vulnerability is storing sensitive data where it does not belong, delegating authorization logic to an LLM and relying on prompt text for enforcement of critical controls. Even if the exact wording of the system prompt remains hidden, attackers can often reverse-engineer guardrails through interaction.
Never embed API keys, authentication tokens, database names, role structures or permission mappings in system prompts. Sensitive information must reside in secure back-end systems inaccessible to the model.
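One way to back this rule with tooling is a pre-deployment check that scans prompt text for secret-like strings. The following is a minimal sketch under stated assumptions: the pattern names, the regular expressions and the `find_secrets` helper are all hypothetical illustrations, not part of any standard tool, and a pattern list like this will miss many real secrets.

```python
import re

# Hypothetical patterns for illustration only; tune them to the
# kinds of secrets your own stack actually uses.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}"),
    "connection_string": re.compile(
        r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@[^\s]+", re.IGNORECASE
    ),
    "key_assignment": re.compile(
        r"\b(?:api[_-]?key|password|secret|token)\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def find_secrets(prompt: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the prompt."""
    return [
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)
    ]


if __name__ == "__main__":
    prompt = "You are a support assistant. Answer questions about orders."
    findings = find_secrets(prompt)
    if findings:
        raise SystemExit(f"Refusing to deploy prompt, found: {findings}")
    print("Prompt passed the secret scan.")
```

A check like this could run in CI against every prompt template. It is a safety net rather than a substitute for keeping credentials in a back-end secret store that the model never sees.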
LLMs are vulnerable to prompt injection. Critical controls (e.g., content filtering, policy enforcement) must be implemented outside the LLM in deterministic systems.
Use independent systems to inspect model outputs, validate compliance and enforce content restrictions. Model training alone is not sufficient.
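As a rough illustration of an independent output check, the sketch below inspects model output with plain deterministic rules before anything reaches the user. The `Verdict` type, the `check_output` and `deliver` functions, the blocked-term list and the fallback message are assumptions made for this example. A production guardrail would likely use a richer policy engine or a dedicated classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical policy: no external links, no terms from a blocklist.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
BLOCKED_TERMS = {"internal-only"}  # placeholder list for illustration
FALLBACK_MESSAGE = "Sorry, I cannot share that response."


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reasons: tuple[str, ...]


def check_output(text: str) -> Verdict:
    """Apply fixed rules to model output; the model has no say in the result."""
    reasons = []
    if URL_PATTERN.search(text):
        reasons.append("external_link")
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        reasons.append("blocked_term")
    return Verdict(allowed=not reasons, reasons=tuple(reasons))


def deliver(text: str) -> str:
    """Return the model output only if it passes the check."""
    verdict = check_output(text)
    return text if verdict.allowed else FALLBACK_MESSAGE
```

Because the check runs after generation and outside the model, a successful prompt injection can change what the model writes but not whether the rule is applied.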
Authorization, privilege separation and access control must occur outside the LLM. These controls must be deterministic and auditable, and must not rely on model reasoning. If agents require different privilege levels, use separate agents configured with least privilege.
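A minimal sketch of what this can look like in practice: the application checks permissions in ordinary code before executing any tool call the model requests. The role names, the `PERMISSIONS` table, the `Session` type and the `run_tool_call` function are hypothetical, and the returned dictionary stands in for a real back-end operation.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping. It lives in the back end,
# never in the system prompt.
PERMISSIONS = {
    "admin": {"read_record", "modify_record"},
    "viewer": {"read_record"},
}


@dataclass(frozen=True)
class Session:
    user_id: str
    role: str  # set by the authentication layer, not by the model


class Forbidden(Exception):
    """Raised when a session lacks permission for the requested action."""


def authorize(session: Session, action: str) -> None:
    """Deterministic check: unknown roles and unknown actions are denied."""
    if action not in PERMISSIONS.get(session.role, set()):
        raise Forbidden(f"{session.user_id} may not perform {action}")


def run_tool_call(session: Session, action: str, args: dict) -> dict:
    """Execute a model-requested tool call only after the check passes."""
    authorize(session, action)
    # Placeholder for the real back-end operation.
    return {"action": action, "args": args, "performed_by": session.user_id}
```

The model may ask for `modify_record` on behalf of a viewer, for instance after a prompt injection, but the call fails in `authorize` regardless of what the prompt or the model output says. Logging each `Forbidden` would also give an audit trail.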
Treat system prompts as configuration hints, not security boundaries. Security must be enforced at the application layer in back-end systems through deterministic access control mechanisms. LLMs are probabilistic systems. Authorization and security enforcement must not be probabilistic.
System prompt leakage highlights a deeper issue. If leaking the system prompt breaks your security model, the architecture is already flawed. Do not store secrets in prompts. Do not rely on prompts for access control and do not treat hidden instructions as security mechanisms.
Security must exist outside the model.