Agent Audit Trail Format (AATF)
Every Agent decision, recorded. Every alternative, documented.
The open specification and reference SDK for recording AI Agent decision chains.
Quick Start · The Format · Why Not Existing Tools? · SPEC · Examples
What Is This?
AATF is not another logging library. It's an open specification for recording why an AI Agent made each decision — including what alternatives it considered, how confident it was, and what it chose not to do.
Think of it as:
- OpenTelemetry → for observability
- AATF → for Agent decision accountability
User asks: "Book a flight to Shanghai"
Step 1: [human_input] → User request received
Step 2: [reasoning] → Intent: flight booking (confidence: 0.95)
Alt: hotel booking → rejected (user said "flight")
Alt: train booking → rejected (user said "flight")
Step 3: [tool_call] → flight_search_api (342ms) → 3 results
Step 4: [reasoning] → Decision: CA1234 at ¥2580 (confidence: 0.88)
Alt: MU5678 at ¥2890 → rejected (¥310 more)
Alt: CZ9012 at ¥3200 → rejected (over budget)
→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ email, phone, card numbers
→ Export: JSON / CSV / HTML (AATF-compliant)
Quick Start (5 Lines)
from agent_audit_trail import AuditSession, Decision, Alternative with AuditSession(agent_id="my-agent") as session: session.add_reasoning_step( name="choose_tool", decision=Decision( input_summary="User wants weather info", decision="Use weather API", reasoning="Factual query requiring real-time data", confidence=0.95, alternatives_considered=[ Alternative(description="Answer from memory", reason_rejected="Weather changes constantly"), Alternative(description="Ask for clarification", reason_rejected="Query is clear enough"), ] ) )
That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives — in AATF-compliant format.
The AATF Format
The heart of AATF is the Decision record:
{
"type": "reasoning",
"name": "intent_classification",
"decision": {
"input_summary": "User wants to book a flight to Shanghai",
"decision": "Classified as flight-booking intent",
"reasoning": "Explicit keywords: 'flight' + destination + budget",
"confidence": 0.95,
"confidence_basis": "All three slots explicitly stated by user",
"alternatives_considered": [
{
"description": "Hotel booking intent",
"reason_rejected": "User said 'flight', not 'hotel'",
"score": 0.05
},
{
"description": "Train booking intent",
"reason_rejected": "User explicitly said 'flight'",
"score": 0.02
}
]
},
"step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}Three things no other format captures:
| Feature | What It Does | Why It Matters |
|---|---|---|
alternatives_considered |
Forces agents to list what they didn't choose | Proves the agent didn't just rationalize a foregone conclusion |
confidence + confidence_basis |
Numeric confidence + how it was determined | Lets auditors distinguish "95% sure because X" from "95% sure because vibes" |
confidence_trajectory |
Tracks confidence across the full decision chain | Reveals when an agent becomes more or less certain as it gathers information |
Why Not Existing Tools?
We respect the existing ecosystem. Here's where AATF fits:
| Tool | What It Does | What AATF Does Differently |
|---|---|---|
| Blockchain ledgers (Notary, Action Ledger) | Store agent actions on-chain for immutability | We're format-agnostic. Store wherever you want. We focus on what to record, not where. |
| LangChain callbacks | Framework-specific tracing | We're framework-agnostic. Works with CrewAI, AutoGen, raw Python, or anything. |
| MCP audit tools | Audit tool calls in MCP protocol | We go deeper: not just what tool was called, but why it was chosen over alternatives. |
| General logging (structlog, etc.) | Key-value event logs | We're structured for decision reasoning, not generic events. |
TL;DR: Other tools audit what the agent did. AATF audits why the agent did it.
Integrations
# LangChain from agent_audit_trail.integrations.langchain import AATFCallbackHandler agent = create_agent(callbacks=[AATFCallbackHandler()]) # OpenAI from agent_audit_trail.integrations.openai import AATFOpenAIWrapper client = AATFOpenAIWrapper(OpenAI()) # Generic decorator (any framework) from agent_audit_trail import audit_traced @audit_traced(agent_id="my-agent") def my_agent_function(query): return "answer"
Installation
pip install agent-audit-trail
Zero external dependencies. Python 3.10+. 700 lines of pure stdlib.
Real Self-Audit Example
We used AATF to audit ourselves — an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10KB audit trail that proves every reasoning step was genuine and not post-hoc rationalized.
📄 View the full audit trail JSON
AATF is an open specification, not a product. The SDK is the reference implementation.
📋 Read the full AATF v0.1.0 Specification
This is a draft spec. We want your feedback. Open an issue if you disagree with any design decision. Especially:
- Should
alternatives_consideredbe mandatory or optional? - Is
confidence(0.0-1.0) the right abstraction, or should we use qualitative labels? - What hash algorithm should be standard? (Currently SHA-256)
- Should the format support streaming/traces that are still in-progress?
Who Is This For?
| Role | What You Get |
|---|---|
| Agent Developer | Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain. |
| Compliance Officer | Machine-parseable audit trails that map to EU AI Act, GDPR, SOC2 requirements. |
| CISO | Tamper-evident hash chains. PII redaction built-in. Export for auditors. |
| Researcher | Structured data on agent reasoning patterns. Confidence trajectories. Decision trees. |
Project Status
- ✅ AATF Specification v0.1.0
- ✅ Reference SDK (Python) — 134 tests passing
- ✅ PII Redaction (email, phone)
- ✅ Hash Chain Integrity Verification
- ✅ LangChain / OpenAI / Generic Integrations
- ✅ JSON / CSV / HTML Export
- 🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
- 🔲 TypeScript/JavaScript SDK
- 🔲 Community RFC process for spec changes
- 🔲 LangChain/CrewAI published plugins
Contributing
This project wants contributors. If you care about Agent accountability:
- Read the SPEC — understand the format
- Open an issue — disagree with something? We want to hear it
- Build an integration — your framework? Your plugin welcome
- Spread the word — star, tweet, blog post
License
MIT. Use it, fork it, improve it. The spec belongs to everyone.
If your Agent can think, its thinking should be auditable.
pip install agent-audit-trail























