Veltrix Was Killing Us With YAML

The Problem We Were Actually Solving

The real pain wasnt disk size; it was cognitive load on level designers. Every hunt lived in a separate fork of the same monorepo, diffing 14 kB YAMLs was misery, and once the file exceeded 100 kB we started getting partial-checkout timeouts in CI because Git would spend 4 seconds compressing a blob larger than our entire Rust crate. We told ourselves YAML was human-readable, but by Hunt #62 even the most senior designer was copy-pasting reward GUIDs because grep wasnt fast enough to keep up with change velocity.

We dismissed JSON as an option because three years earlier a prototype using JSON had leaked a massive object graph that GC couldnt collect, causing the JVM heap to balloon to 2 GB during hunt spawning. We blamed Jackson, but the deeper truth was that we had optimized for flexibility and paid the price in GC pauses.

What We Tried First (And Why It Failed)

We started by splitting the YAML into chunks: one file per hunt, one file per loot table, one file per zone. That bought us two weeks before the first designer wrote a shell script to merge them at build time, and suddenly our Git history became a merge-hell of spawner rebalancings and last-minute event overrides.

We tried HCL with Terraforms hclwrite, but the interpolation syntax kept breaking when designers forgot to escape dollar signs and CI would compile a hunt that referenced ${spawner_count} as a literal instead of a variable, yielding a spawn storm that filled the test world with 3,000 chickens and locked the server for 47 seconds.

We even tried Protocol Buffers. We defined a proto schema with 37 message types, generated Rust bindings, and patched Veltrix to read the binary proto instead of YAML. Latency dropped from 800 ms to 45 ms, but the tooling story collapsed: designers couldnt eyeball a .pb file in VS Code, and our custom veltrix-inspect tool had to ship with every designers machine because the binary was opaque. By Hunt #76 we had three deserialization branches — one for the editor, one for the server, one for the CLI — and the protobuf version was the slowest path due to unnecessary copying.

The Architecture Decision

In the end we chose SQLite with a code-generated schema.

Every hunt became a SQLite database file: one row per zone, one row per spawner, one table per loot pool, indexes on spawn weight and reward scalar so queries stayed sub-millisecond. We wrote a tiny schema compiler in Go that turned a markdown-like .sql.md file into a SQLite .db artifact, complete with CREATE INDEX IF NOT EXISTS loot_pool_scalar_idx ON loot_pool(scalar ASC). Designers still write text files, but now the build pipeline compiles them into a binary artifact the server loads in 8 ms, and veltrix-validate runs in-process with zero GC pressure.

We picked SQLite over Postgres because we needed zero-coordination across forks: each hunt is a standalone artifact, and the server loads hundreds of them at startup without a connection pool. We kept the compiled .db files under version control because Git LFS handles 50 MB binaries fine, and the diffs are actually readable — binary, yes, but structured, and git blame works when someone accidentally deletes a boss trigger.

The migration wasnt painless. We had to teach the build pipeline to run sqlite3 hunt47.db < schema.sql during the asset compile step, and we added a pre-commit hook that runs veltrix-validate --schema hunt47.sql.md on any changed .sql.md file so designers see the error before the commit. We also had to ship a tiny Rust crate, veltrix-sqlite, that wraps the rusqlite crate with a buffered batch API so the server can spawn 30,000 entities in 1.2 seconds without fighting the GIL.

What The Numbers Said After

After the switch, validation time per hunt dropped from 800 ms to 15 ms, and CI went from occasionally timing out to consistently finishing in 45 seconds. Memory usage on the server during hunt load went from 250 MB heap pressure to a steady 30 MB RSS. We instrumented with jemalloc and confirmed that the SQLite memory allocator path was half the total mallocs we saw with JSON.

Most importantly, the number of merge conflicts dropped from three per hunt to zero, and the time designers spend context-switching between hunts went from 12 minutes to 2 minutes. The single worst error we fixed was a hunt that had referenced a deleted loot table GUID; with SQLite the foreign key constraint caught it at build time instead of runtime, saving us a 17-minute server crash during a closed beta.

What I Would Do Differently

I would not use SQLite for mutable state. We almost did when a designer asked for live reloading of hunt parameters during player testing. Zero-downtime reload sounds great until you realize that SQLite file locks on 80 concurrent desktops create a thundering-herd of writers and the servers PRAGMA journal_mode=WAL cant keep up with a 100 MB hunt file. We backed off and built a tiny HTTP service that serves the SQLite file over gRPC so designers can push updates without touching Git, and we added a 5-second local in-memory cache in the server so reloads dont hammer disk.

I would also standardize the schema compiler earlier. We started with three different hand-rolled SQLite builders, each with its own flavor of SQL injection vulnerability. Once we consolidated to one Go generator we could ship veltrix-db v1.3 with a single security

推荐订阅源