A self-hostable, open-source laboratory for studying how LLMs behave when offered a set of fake tools.
Quick Start
Docker Compose (recommended)
docker compose up --build
Open http://localhost:5173 for the UI. The frontend will proxy API calls to the backend container on port 8000.
Backend
cd backend
uv venv
uv pip install -e ".[dev]"
cp ../.env.example ../.env  # fill in your API keys
uv run uvicorn app.main:app --reload
Frontend
cd frontend
npm install
npm run dev
Open http://localhost:5173; the frontend proxies /api to :8000.
Workflow
- Tool Library → create fake tools with static or dynamic responses
- Model Configs → configure a provider endpoint + model snapshot + API key env var
- Plans → compose tools + model + prompts into a versioned testing plan
- Run → launch sessions; watch the live event stream; inspect tool calls and model responses
- Sessions → view history, metrics, and per-session event timelines
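To make the first workflow step concrete, here is a minimal sketch of what a fake tool with a static or a dynamic response could look like. The `FakeTool` class, its fields, and the `respond` method are hypothetical names chosen for illustration; they are not taken from this project's code, whose actual tool schema may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FakeTool:
    """Hypothetical fake tool: a name plus a static or dynamic response."""

    name: str
    static_response: Optional[Any] = None
    dynamic_handler: Optional[Callable[[dict], Any]] = None

    def respond(self, arguments: dict) -> Any:
        # A dynamic handler computes the reply from the call arguments;
        # otherwise the canned static response is returned unchanged.
        if self.dynamic_handler is not None:
            return self.dynamic_handler(arguments)
        return self.static_response


# Static tool: always returns the same payload, whatever the model sends.
weather = FakeTool(name="get_weather", static_response={"temp_c": 21})

# Dynamic tool: the reply depends on the arguments of the tool call.
echo = FakeTool(
    name="echo",
    dynamic_handler=lambda args: {"echoed": args.get("text", "")},
)
```

A static response is useful when only the model's decision to call the tool matters; a dynamic one helps when the follow-up behavior depends on what the tool returns.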
Security
Dynamic tool code runs in-process without sandboxing. This is intentional for locally-authored tools. Never execute dynamic code from untrusted sources. See §10.6 of the PRD for the full rationale.
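The sketch below illustrates why unsandboxed, in-process execution matters. It assumes dynamic tool source is run with Python's built-in `exec`, which may not be how this project loads tools; `TOOL_SOURCE`, `run_dynamic_tool`, and `DEMO_SECRET` are hypothetical names used only for the demonstration.

```python
import os

# Hypothetical dynamic tool source, as a user might author it locally.
TOOL_SOURCE = '''
def handle(arguments):
    import os
    # Nothing prevents this code from reading the host environment.
    return {"sees_secret": "DEMO_SECRET" in os.environ}
'''


def run_dynamic_tool(source: str, arguments: dict) -> dict:
    """Run tool source in the current process (assumed mechanism: exec)."""
    namespace: dict = {}
    exec(source, namespace)  # same process, same privileges, no sandbox
    return namespace["handle"](arguments)


# Placeholder value standing in for a real API key in the environment.
os.environ["DEMO_SECRET"] = "not-a-real-key"
result = run_dynamic_tool(TOOL_SOURCE, {})
```

Because the tool code shares the backend's process, it can reach environment variables, the filesystem, and the network with the same privileges as the server, which is why only locally-authored tools should be run this way.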
Running Tests
Backend
cd backend
uv run pytest -v --tb=short




















