GitHub - wdh107/agent-audit-trail: The open specification and reference SDK for recording AI Agent decision chains. Every decision, recorded. Every alternative, documented.

Agent Audit Trail Format (AATF)

Every Agent decision, recorded. Every alternative, documented.

The open specification and reference SDK for recording AI Agent decision chains.

Quick Start · The Format · Why Not Existing Tools? · SPEC · Examples

What Is This?

AATF is not another logging library. It's an open specification for recording why an AI Agent made each decision — including what alternatives it considered, how confident it was, and what it chose not to do.

Think of it as:

OpenTelemetry → for observability
AATF → for Agent decision accountability

User asks: "Book a flight to Shanghai"

Step 1: [human_input]  → User request received
Step 2: [reasoning]    → Intent: flight booking (confidence: 0.95)
                          Alt: hotel booking → rejected (user said "flight")
                          Alt: train booking → rejected (user said "flight")
Step 3: [tool_call]    → flight_search_api (342ms) → 3 results
Step 4: [reasoning]    → Decision: CA1234 at ¥2580 (confidence: 0.88)
                          Alt: MU5678 at ¥2890 → rejected (¥310 more)
                          Alt: CZ9012 at ¥3200 → rejected (over budget)

→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ email, phone
→ Export: JSON / CSV / HTML (AATF-compliant)
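
The tamper-evidence line above rests on chaining step hashes together. Here is a minimal sketch of that idea using only the standard library. The `prev_hash` field, the all-zero genesis value, and the sorted-key JSON serialization are assumptions made for illustration; the actual hashing scheme is whatever the AATF spec defines.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first step


def hash_step(step: dict, prev_hash: str) -> str:
    """Hash a step's content together with the previous step's hash."""
    payload = json.dumps(step, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(steps: list[dict]) -> list[dict]:
    """Return copies of the steps, each carrying prev_hash and step_hash."""
    chained, prev = [], GENESIS
    for step in steps:
        digest = hash_step(step, prev)
        chained.append({**step, "prev_hash": prev, "step_hash": digest})
        prev = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered step breaks the chain."""
    prev = GENESIS
    for record in chained:
        content = {k: v for k, v in record.items()
                   if k not in ("prev_hash", "step_hash")}
        if record["prev_hash"] != prev:
            return False
        if hash_step(content, prev) != record["step_hash"]:
            return False
        prev = record["step_hash"]
    return True


trail = build_chain([
    {"type": "human_input", "name": "User request received"},
    {"type": "reasoning", "name": "intent", "confidence": 0.95},
    {"type": "tool_call", "name": "flight_search_api", "duration_ms": 342},
])
```

Calling `verify_chain(trail)` succeeds on the untouched trail and fails once any recorded field is changed. A chain like this does not, by itself, detect steps being dropped from the end of the trail.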

Quick Start

from agent_audit_trail import AuditSession, Decision, Alternative

with AuditSession(agent_id="my-agent") as session:
    session.add_reasoning_step(
        name="choose_tool",
        decision=Decision(
            input_summary="User wants weather info",
            decision="Use weather API",
            reasoning="Factual query requiring real-time data",
            confidence=0.95,
            alternatives_considered=[
                Alternative(description="Answer from memory",
                           reason_rejected="Weather changes constantly"),
                Alternative(description="Ask for clarification",
                           reason_rejected="Query is clear enough"),
            ]
        )
    )

That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives — in AATF-compliant format.

The AATF Format

The heart of AATF is the Decision record:

{
  "type": "reasoning",
  "name": "intent_classification",
  "decision": {
    "input_summary": "User wants to book a flight to Shanghai",
    "decision": "Classified as flight-booking intent",
    "reasoning": "Explicit keywords: 'flight' + destination + budget",
    "confidence": 0.95,
    "confidence_basis": "All three slots explicitly stated by user",
    "alternatives_considered": [
      {
        "description": "Hotel booking intent",
        "reason_rejected": "User said 'flight', not 'hotel'",
        "score": 0.05
      },
      {
        "description": "Train booking intent",
        "reason_rejected": "User explicitly said 'flight'",
        "score": 0.02
      }
    ]
  },
  "step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}

Three things no other format captures:

Feature	What It Does	Why It Matters
`alternatives_considered`	Forces agents to list what they didn't choose	Proves the agent didn't just rationalize a foregone conclusion
`confidence` + `confidence_basis`	Numeric confidence + how it was determined	Lets auditors distinguish "95% sure because X" from "95% sure because vibes"
`confidence_trajectory`	Tracks confidence across the full decision chain	Reveals when an agent becomes more or less certain as it gathers information
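
One way to read `confidence_trajectory` is as the ordered list of confidence values taken from the reasoning steps in a trail. The sketch below derives it from step records shaped like the Decision record above. The helper names and the derivation itself are assumptions, and the spec may define the field differently.

```python
def confidence_trajectory(steps: list[dict]) -> list[float]:
    """Collect confidence values, in order, from steps that carry a decision."""
    return [
        step["decision"]["confidence"]
        for step in steps
        if "decision" in step and "confidence" in step["decision"]
    ]


def largest_drop(trajectory: list[float]) -> float:
    """Biggest decrease between consecutive confidences (0.0 if there is none)."""
    drops = [a - b for a, b in zip(trajectory, trajectory[1:])]
    return max([0.0, *drops])


steps = [
    {"type": "human_input", "name": "request"},
    {"type": "reasoning", "name": "intent_classification",
     "decision": {"confidence": 0.95}},
    {"type": "tool_call", "name": "flight_search_api"},
    {"type": "reasoning", "name": "choose_flight",
     "decision": {"confidence": 0.88}},
]

trajectory = confidence_trajectory(steps)
```

For the flight-booking example, the trajectory holds the two reasoning confidences in order, and `largest_drop` gives an auditor a single number to flag trails where certainty fell sharply between steps.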

Why Not Existing Tools?

We respect the existing ecosystem. Here's where AATF fits:

Tool	What It Does	What AATF Does Differently
Blockchain ledgers (Notary, Action Ledger)	Store agent actions on-chain for immutability	We're storage-agnostic: keep the trail wherever you want. We focus on what to record, not where.
LangChain callbacks	Framework-specific tracing	We're framework-agnostic. Works with CrewAI, AutoGen, raw Python, or anything.
MCP audit tools	Audit tool calls in MCP protocol	We go deeper: not just what tool was called, but why it was chosen over alternatives.
General logging (structlog, etc.)	Key-value event logs	We're structured for decision reasoning, not generic events.

TL;DR: Other tools audit what the agent did. AATF audits why the agent did it.

Integrations

# LangChain
from agent_audit_trail.integrations.langchain import AATFCallbackHandler
# create_agent stands in for however you construct your LangChain agent
agent = create_agent(callbacks=[AATFCallbackHandler()])

# OpenAI
from openai import OpenAI
from agent_audit_trail.integrations.openai import AATFOpenAIWrapper
client = AATFOpenAIWrapper(OpenAI())

# Generic decorator (any framework)
from agent_audit_trail import audit_traced

@audit_traced(agent_id="my-agent")
def my_agent_function(query):
    return "answer"
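
For readers curious how a decorator-based integration can work in principle, here is a generic sketch of a tracing decorator built on `functools.wraps`. It is not the implementation of `audit_traced`: `traced_sketch`, the in-memory `TRAIL` list, and the recorded field names are hypothetical and chosen only for this example.

```python
import functools
import time

TRAIL: list[dict] = []  # hypothetical in-memory sink for recorded steps


def traced_sketch(agent_id: str):
    """Decorator factory that records each call's input, output, and duration."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            TRAIL.append({
                "agent_id": agent_id,
                "name": func.__name__,
                "input_summary": repr(args)[:200],   # truncated for readability
                "output_summary": repr(result)[:200],
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
            })
            return result
        return wrapper
    return decorator


@traced_sketch(agent_id="my-agent")
def my_agent_function(query):
    return "answer"


my_agent_function("What's the weather?")
```

After the call, `TRAIL` holds one record describing it. A sketch like this only captures what went in and what came out; recording why a decision was made still needs explicit Decision data from the agent.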

Installation

pip install agent-audit-trail

Zero external dependencies. Python 3.10+. 700 lines of pure stdlib.

Real Self-Audit Example

We used AATF to audit ourselves: an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10KB audit trail in which every reasoning step is recorded and hash-chained, so any later edit to the trail is detectable.

📄 View the full audit trail JSON

AATF is an open specification, not a product. The SDK is the reference implementation.

📋 Read the full AATF v0.1.0 Specification

This is a draft spec. We want your feedback. Open an issue if you disagree with any design decision. Especially:

Should alternatives_considered be mandatory or optional?
Is confidence (0.0-1.0) the right abstraction, or should we use qualitative labels?
What hash algorithm should be standard? (Currently SHA-256)
Should the format support streaming, that is, traces that are still in progress?
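
To make the first two questions concrete, here is a sketch of a validator in which both choices are explicit. The function and its `require_alternatives` flag are hypothetical and not part of the SDK; the field names follow the Decision record shown earlier.

```python
def validate_decision(decision: dict, require_alternatives: bool = False) -> list[str]:
    """Return a list of problems found in a Decision record (empty if valid)."""
    problems = []

    confidence = decision.get("confidence")
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        problems.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")

    alternatives = decision.get("alternatives_considered", [])
    if require_alternatives and not alternatives:
        problems.append("alternatives_considered must not be empty")
    for i, alt in enumerate(alternatives):
        for field in ("description", "reason_rejected"):
            if not alt.get(field):
                problems.append(f"alternative {i} is missing {field}")

    return problems
```

With `require_alternatives=False` a record without alternatives passes, which corresponds to the "optional" answer; flipping the flag shows how many existing trails a "mandatory" rule would reject.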

Who Is This For?

Role	What You Get
Agent Developer	Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain.
Compliance Officer	Machine-parseable audit trails that can be mapped to EU AI Act, GDPR, and SOC 2 requirements.
CISO	Tamper-evident hash chains. PII redaction built-in. Export for auditors.
Researcher	Structured data on agent reasoning patterns. Confidence trajectories. Decision trees.

Project Status

✅ AATF Specification v0.1.0
✅ Reference SDK (Python) — 134 tests passing
✅ PII Redaction (email, phone)
✅ Hash Chain Integrity Verification
✅ LangChain / OpenAI / Generic Integrations
✅ JSON / CSV / HTML Export
🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
🔲 TypeScript/JavaScript SDK
🔲 Community RFC process for spec changes
🔲 LangChain/CrewAI published plugins
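
As an illustration of what pattern-based redaction of emails and phone numbers can look like, here is a deliberately simple sketch. The regular expressions and the `[EMAIL]` / `[PHONE]` placeholders are assumptions for this example and are not the SDK's actual patterns; real-world matching needs more care.

```python
import re

# Illustrative patterns only; they will miss some valid forms
# and may match some digit runs that are not phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Contact li.wei@example.com or +86 138 0013 8000 about flight CA1234."
redacted = redact(sample)
```

Short digit runs such as the flight number are left alone because the phone pattern requires at least nine characters. Extending this approach to the planned categories (credit card, SSN, API keys, IP) would mean one more pattern and placeholder per category.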

Contributing

This project wants contributors. If you care about Agent accountability:

Read the SPEC: understand the format
Open an issue: disagree with something? We want to hear it
Build an integration: plugins for your framework are welcome
Spread the word: star, tweet, blog post

License

MIT. Use it, fork it, improve it. The spec belongs to everyone.

If your Agent can think, its thinking should be auditable.

pip install agent-audit-trail
