Guardrails for Agent Output: Pluggable Validation Before and After LLM Calls

One of the harder problems in agent systems is constraining output quality without turning every prompt into a wall of instructions. You can ask the LLM to stay under 3000 characters, or to always include a conclusion section, or to never mention competitor products. But prompt-based constraints are probabilistic. The LLM might follow them. It might not.

Guardrails are the deterministic layer. They run as Java code before and after the LLM call, and they enforce rules that prompts cannot guarantee.

The Model

AgentEnsemble implements guardrails as two functional interfaces: InputGuardrail and OutputGuardrail. Both return a GuardrailResult -- either success or failure with a reason.

Input guardrails run before the LLM is contacted. If any fails, execution stops immediately and the agent's LLM is never called. Output guardrails run after the agent produces a response (and after structured output parsing, if configured).

InputGuardrail piiGuardrail = input -> {
    String desc = input.taskDescription().toLowerCase();
    if (desc.contains("ssn") || desc.contains("credit card")) {
        return GuardrailResult.failure(
            "Task description may contain personally identifiable information");
    }
    return GuardrailResult.success();
};

OutputGuardrail lengthGuardrail = output -> {
    if (output.rawResponse().length() > 3000) {
        return GuardrailResult.failure(
            "Response is " + output.rawResponse().length()
            + " chars, exceeds limit of 3000");
    }
    return GuardrailResult.success();
};

Both are configured per-task:

var task = Task.builder()
    .description("Write an executive summary")
    .expectedOutput("A concise summary")
    .agent(writer)
    .inputGuardrails(List.of(piiGuardrail))
    .outputGuardrails(List.of(lengthGuardrail))
    .build();

Why Functional Interfaces

The choice to make guardrails functional interfaces rather than annotation-based or configuration-driven has a few practical consequences.

First, guardrails are composable. You can build them from lambdas, combine them, or wrap them in utility methods. A guardrail that checks for PII can be reused across every task in the ensemble without any framework-specific wiring.

Second, they are testable in isolation. A guardrail is a pure function from input to result. You can unit test it without standing up an ensemble or mocking an LLM.

Third, they are stateless by default. Since guardrails may run concurrently (in parallel workflows), stateless lambdas are inherently thread-safe. If you need stateful validation, thread safety is your responsibility.
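To make the composability and testability points concrete, here is a minimal sketch. The nested GuardrailResult, GuardrailInput, and InputGuardrail types are stand-ins that mirror the shapes used in this post so the example compiles on its own; they are not the real AgentEnsemble classes. The rejectKeywords and allOf helpers are hypothetical names invented for illustration.

```java
import java.util.List;
import java.util.Locale;

// Stand-in types so the sketch compiles on its own. They mirror the shapes used
// in this post but are NOT the real AgentEnsemble classes.
class GuardrailCompositionSketch {

    record GuardrailResult(boolean ok, String message) {
        static GuardrailResult success() { return new GuardrailResult(true, ""); }
        static GuardrailResult failure(String reason) { return new GuardrailResult(false, reason); }
        boolean isSuccess() { return ok; }
        String getMessage() { return message; }
    }

    record GuardrailInput(String taskDescription, String agentRole) {}

    @FunctionalInterface
    interface InputGuardrail {
        GuardrailResult validate(GuardrailInput input);
    }

    // Hypothetical factory: a keyword filter that any task can reuse.
    // Keywords are expected in lower case.
    static InputGuardrail rejectKeywords(String reason, String... keywords) {
        return input -> {
            String desc = input.taskDescription().toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (desc.contains(keyword)) {
                    return GuardrailResult.failure(reason);
                }
            }
            return GuardrailResult.success();
        };
    }

    // Hypothetical combinator: passes only if every delegate passes,
    // returning the first failure it encounters.
    static InputGuardrail allOf(List<InputGuardrail> guardrails) {
        return input -> {
            for (InputGuardrail g : guardrails) {
                GuardrailResult r = g.validate(input);
                if (!r.isSuccess()) {
                    return r;
                }
            }
            return GuardrailResult.success();
        };
    }

    public static void main(String[] args) {
        InputGuardrail pii = rejectKeywords("Possible PII in task description", "ssn", "credit card");
        InputGuardrail writerOnly = input -> "writer".equals(input.agentRole())
            ? GuardrailResult.success()
            : GuardrailResult.failure("Agent role must be 'writer'");
        InputGuardrail combined = allOf(List.of(pii, writerOnly));

        // Exercising a guardrail is a plain method call: no ensemble, no LLM mock.
        GuardrailResult clean = combined.validate(new GuardrailInput("Summarize Q3 revenue", "writer"));
        GuardrailResult blocked = combined.validate(new GuardrailInput("Look up the customer's SSN", "writer"));
        System.out.println(clean.isSuccess());    // true
        System.out.println(blocked.getMessage()); // Possible PII in task description
    }
}
```

Because each guardrail here captures only immutable values (a reason string and a keyword array that is never modified), the same instance could be shared across tasks that run in parallel.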

What Input Guardrails See

The GuardrailInput record carries everything you need to make a pre-execution decision:

taskDescription() -- the task description text
expectedOutput() -- the expected output specification
contextOutputs() -- outputs from prior context tasks (immutable)
agentRole() -- the role of the agent about to execute

This means you can write guardrails that check not just the current task, but the outputs of upstream tasks. For example, a guardrail that rejects a writing task if the research task upstream produced no findings:

InputGuardrail requireResearch = input -> {
    boolean hasResearch = input.contextOutputs().stream()
        .anyMatch(o -> o.getRaw().length() > 100);
    if (!hasResearch) {
        return GuardrailResult.failure("No substantive research output found");
    }
    return GuardrailResult.success();
};

Output Guardrails and Typed Output

When a task uses outputType for structured output, the execution order is:

1. Input guardrails run (before LLM)
2. LLM executes and produces raw text
3. Structured output parsing (JSON extraction + deserialization)
4. Output guardrails run (with both rawResponse() and parsedOutput() available)

This means output guardrails can inspect the typed Java object directly:

record ResearchReport(String title, List<String> findings, String conclusion) {}

OutputGuardrail findingsGuardrail = output -> {
    if (output.parsedOutput() instanceof ResearchReport report) {
        if (report.findings() == null || report.findings().isEmpty()) {
            return GuardrailResult.failure(
                "Report must include at least one finding");
        }
    }
    return GuardrailResult.success();
};

This is where guardrails and typed outputs reinforce each other. The type system gives you a parsed object; the guardrail gives you a place to enforce business rules on that object.

Multiple Guardrails and Evaluation Order

Multiple guardrails per task are evaluated in order. The first failure stops evaluation -- subsequent guardrails are not called.

var task = Task.builder()
    .description("Write an article")
    .expectedOutput("An article")
    .agent(writer)
    .inputGuardrails(List.of(piiGuardrail, roleGuardrail, domainGuardrail))
    .outputGuardrails(List.of(lengthGuardrail, conclusionGuardrail))
    .build();
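The short-circuit behavior can be pictured with a small sketch. This is not the framework's actual executor code; it is an illustration of the semantics described above, using stand-in types and a hypothetical evaluateInOrder method. The calls list exists only to show which guardrails ran.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; NOT the real AgentEnsemble classes.
class ShortCircuitSketch {

    record GuardrailResult(boolean ok, String message) {
        static GuardrailResult success() { return new GuardrailResult(true, ""); }
        static GuardrailResult failure(String reason) { return new GuardrailResult(false, reason); }
        boolean isSuccess() { return ok; }
        String getMessage() { return message; }
    }

    record GuardrailOutput(String rawResponse) {}

    @FunctionalInterface
    interface OutputGuardrail {
        GuardrailResult validate(GuardrailOutput output);
    }

    // Hypothetical evaluator: walks the list in order and returns the first failure.
    static GuardrailResult evaluateInOrder(List<OutputGuardrail> guardrails, GuardrailOutput output) {
        for (OutputGuardrail g : guardrails) {
            GuardrailResult r = g.validate(output);
            if (!r.isSuccess()) {
                return r; // guardrails later in the list are never invoked
            }
        }
        return GuardrailResult.success();
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>(); // records which guardrails actually ran

        OutputGuardrail length = out -> {
            calls.add("length");
            return out.rawResponse().length() > 20
                ? GuardrailResult.failure("Response too long")
                : GuardrailResult.success();
        };
        OutputGuardrail conclusion = out -> {
            calls.add("conclusion");
            return out.rawResponse().contains("Conclusion")
                ? GuardrailResult.success()
                : GuardrailResult.failure("Missing conclusion");
        };

        GuardrailResult result =
            evaluateInOrder(List.of(length, conclusion), new GuardrailOutput("x".repeat(50)));
        System.out.println(result.getMessage() + " " + calls); // Response too long [length]
    }
}
```

One consequence of this ordering is that list position matters: putting cheap checks first means an expensive check only runs on input that has already passed the cheap ones.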

If you want to collect all failures rather than short-circuit, compose them into a single guardrail:

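// Requires: import java.util.ArrayList; import java.util.List;
InputGuardrail compositeGuardrail = input -> {
    // Run every guardrail and collect all failure messages
    // instead of stopping at the first one.
    List<String> failures = new ArrayList<>();
    for (InputGuardrail g : List.of(piiGuardrail, roleGuardrail)) {
        GuardrailResult r = g.validate(input);
        if (!r.isSuccess()) {
            failures.add(r.getMessage());
        }
    }
    // Report all collected failures as one combined reason.
    return failures.isEmpty()
        ? GuardrailResult.success()
        : GuardrailResult.failure(String.join("; ", failures));
};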

Exception Propagation

When a guardrail fails, GuardrailViolationException is thrown. It propagates through the workflow executor and is wrapped in TaskExecutionException, following the same pattern as other task failures.

The exception carries structured information -- guardrail type (INPUT or OUTPUT), violation message, task description, and agent role -- so you can route failures to metrics or alerting without parsing error strings.

try {
    ensemble.run();
} catch (TaskExecutionException ex) {
    if (ex.getCause() instanceof GuardrailViolationException gve) {
        metrics.increment("guardrail.violation." + gve.getGuardrailType());
        log.warn("Guardrail blocked task '{}': {}",
            gve.getTaskDescription(), gve.getViolationMessage());
    }
}

The Tradeoff

Guardrails are deterministic checks, not semantic analysis. A length limit is easy to enforce. A toxicity check is harder -- you would need to call an external classifier inside the guardrail, which adds latency and its own failure modes.

The design intentionally keeps guardrails as simple synchronous functions. If you need external API calls, retry logic, or validation that is asynchronous under the hood, you implement that inside the guardrail function and block until a result is available. The framework does not impose an opinion on how complex your validation should be.
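As a rough sketch of what "inside the guardrail function" could look like for an external check, the example below wraps a classifier call in a CompletableFuture and blocks with a timeout. The classifier is a hypothetical ToDoubleFunction standing in for whatever service you would call, the guardrail types are stand-ins rather than the real AgentEnsemble classes, and the choice to fail closed on timeout or error is an assumption you may want to reverse.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.ToDoubleFunction;

// Stand-in types for illustration only; NOT the real AgentEnsemble classes.
class BlockingClassifierSketch {

    record GuardrailResult(boolean ok, String message) {
        static GuardrailResult success() { return new GuardrailResult(true, ""); }
        static GuardrailResult failure(String reason) { return new GuardrailResult(false, reason); }
        boolean isSuccess() { return ok; }
        String getMessage() { return message; }
    }

    record GuardrailOutput(String rawResponse) {}

    @FunctionalInterface
    interface OutputGuardrail {
        GuardrailResult validate(GuardrailOutput output);
    }

    // Hypothetical factory. 'classifier' stands in for an external toxicity service.
    static OutputGuardrail toxicityGuardrail(
            ToDoubleFunction<String> classifier, double threshold, Duration timeout) {
        return output -> {
            // Run the call on another thread so the wait can be bounded.
            CompletableFuture<Double> score = CompletableFuture.supplyAsync(
                () -> classifier.applyAsDouble(output.rawResponse()));
            try {
                // The guardrail stays synchronous: it blocks here until a score or a timeout.
                double s = score.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
                return s > threshold
                    ? GuardrailResult.failure("Toxicity score " + s + " exceeds limit " + threshold)
                    : GuardrailResult.success();
            } catch (TimeoutException e) {
                score.cancel(true);
                // Assumption: fail closed when the classifier is too slow.
                return GuardrailResult.failure("Toxicity check timed out");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return GuardrailResult.failure("Toxicity check interrupted");
            } catch (ExecutionException e) {
                return GuardrailResult.failure("Toxicity check failed: " + e.getCause());
            }
        };
    }

    public static void main(String[] args) {
        // A fake classifier that always returns the same score.
        OutputGuardrail guardrail = toxicityGuardrail(text -> 0.9, 0.5, Duration.ofSeconds(1));
        GuardrailResult result = guardrail.validate(new GuardrailOutput("some response"));
        System.out.println(result.isSuccess() + ": " + result.getMessage());
    }
}
```

The sketch shows the extra failure modes directly: a timeout, an interrupt, and a classifier error each need an explicit decision about whether the task should be blocked.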

This means guardrails are most useful for structural and policy checks -- length limits, required sections, PII filters, role-based access, schema validation on typed outputs. For semantic quality checks, the phase review and task reflection mechanisms (covered in earlier posts) are a better fit.
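A required-sections check is a typical structural guardrail. The sketch below is one possible way to write it, assuming sections appear as heading lines on their own (optionally prefixed with Markdown '#' marks). The requireSections and headingPattern helpers are hypothetical, and the guardrail types are stand-ins, not the real AgentEnsemble classes.

```java
import java.util.List;
import java.util.regex.Pattern;

// Stand-in types for illustration only; NOT the real AgentEnsemble classes.
class RequiredSectionsSketch {

    record GuardrailResult(boolean ok, String message) {
        static GuardrailResult success() { return new GuardrailResult(true, ""); }
        static GuardrailResult failure(String reason) { return new GuardrailResult(false, reason); }
        boolean isSuccess() { return ok; }
        String getMessage() { return message; }
    }

    record GuardrailOutput(String rawResponse) {}

    @FunctionalInterface
    interface OutputGuardrail {
        GuardrailResult validate(GuardrailOutput output);
    }

    // Matches a line consisting only of the heading text, optionally prefixed by
    // up to six '#' characters and optionally followed by a colon. Case-insensitive.
    static Pattern headingPattern(String heading) {
        return Pattern.compile(
            "^\\s*#{0,6}\\s*" + Pattern.quote(heading) + "\\s*:?\\s*$",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical factory: fails and lists every required heading that is absent.
    static OutputGuardrail requireSections(List<String> headings) {
        return output -> {
            List<String> missing = headings.stream()
                .filter(h -> !headingPattern(h).matcher(output.rawResponse()).find())
                .toList();
            return missing.isEmpty()
                ? GuardrailResult.success()
                : GuardrailResult.failure("Missing required sections: " + String.join(", ", missing));
        };
    }

    public static void main(String[] args) {
        OutputGuardrail guardrail = requireSections(List.of("Summary", "Conclusion"));
        String draft = "# Summary\nRevenue grew.\n\nIn conclusion, things went well.";
        // "In conclusion, ..." is prose, not a heading line, so the section counts as missing.
        System.out.println(guardrail.validate(new GuardrailOutput(draft)).getMessage());
        // Missing required sections: Conclusion
    }
}
```

Note that this only verifies that a heading exists. Whether the conclusion actually follows from the body is a semantic question that a check like this cannot answer.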

The full guardrails guide is in the AgentEnsemble documentation.

I'd be interested in whether the input/output split feels like the right abstraction, or whether you have seen validation needs that do not fit cleanly into either category.
