One of our AI agents deleted a directory it was never supposed to touch. The Python it wrote was valid. The model was confident. It did the wrong thing.
The agent was only supposed to query a database. But we'd given it a full Python runtime, so it had access to os, shutil, everything. That's when we realized the problem wasn't the model. It was us, handing it far more power than the job needed.
Why sandboxing is harder than it looks
The usual options aren't great:
- Full runtime (Python/Node.js): easy to set up, hard to lock down properly. Restricting it after the fact is whack-a-mole.
- Docker per agent: proper isolation, but ~200ms cold start and 100MB+ RAM each. At 50 concurrent agents, that's 5GB+ (50 × 100MB) just idling.
We wanted something lighter. Not "restricted Python" — something designed from scratch for how AI actually writes code.
AI code has a specific profile
After running a lot of agent scripts in production, the pattern is pretty consistent:
- Almost always under 100 lines
- Runs frequently, not once
- Doesn't need filesystem, network, or OS access
- Prone to infinite loops, type errors, and null accesses
General-purpose languages aren't built for this. So we built Autolang — a small scripting VM where AI can only call functions you explicitly registered. Nothing else is reachable.
How it works
AI writes Autolang script
→ static compiler validates types and scope
→ your registered JS / C++ functions do the actual work
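The "static compiler validates types and scope" step is what lets a bad script fail before anything runs. Here's a toy version in JavaScript to make that concrete. It's a hypothetical sketch, not Autolang's compiler: it assumes the script has already been parsed into a flat list of call nodes, and `signatures` and `validate` are made-up names.

```javascript
// Hypothetical sketch of "validate before run". The script is modelled as an
// already-parsed list of call nodes; a real compiler walks a full AST.
// Signature table and node shapes are illustrative assumptions.
const signatures = {
  get_products: { params: [] },
  find_product: { params: ["string"] },
};

// Returns a list of errors; an empty list means the script may run.
function validate(callNodes, sigs) {
  const errors = [];
  for (const { name, args } of callNodes) {
    const sig = sigs[name];
    if (!sig) {
      // Scope check: the name was never registered.
      errors.push(`unknown function "${name}"`);
      continue;
    }
    if (args.length !== sig.params.length) {
      errors.push(`${name}: expected ${sig.params.length} argument(s), got ${args.length}`);
      continue;
    }
    // Type check: each argument must match its declared type.
    sig.params.forEach((type, i) => {
      if (typeof args[i] !== type) {
        errors.push(`${name}: argument ${i + 1} should be ${type}, got ${typeof args[i]}`);
      }
    });
  }
  return errors;
}

const script = [
  { name: "get_products", args: [] },
  { name: "find_product", args: [42] },
  { name: "rmtree", args: ["/"] },
];

// Two errors: a type mismatch on find_product and an unknown rmtree.
console.log(validate(script, signatures));
```

A real compiler also infers types through expressions, but the principle is the same: an unknown name or a type mismatch is a compile error, not a runtime surprise.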
You wrap your existing functions as bindings. The AI calls those. That's it. It can't reach outside what you've registered.
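Conceptually, the registration side boils down to an allowlist. Here's a minimal sketch of that idea in plain JavaScript. `BindingRegistry`, `register`, and `call` are hypothetical names, not the autolang-compiler API. On its own this is not a sandbox, because JS code could still require anything. The isolation comes from the VM giving scripts no primitive other than this lookup.

```javascript
// Hypothetical allowlist: scripts can only reach functions registered here.
// Names are illustrative, not the autolang-compiler API.
class BindingRegistry {
  constructor() {
    // Nothing is callable unless it is in this map.
    this.bindings = new Map();
  }

  register(name, fn) {
    if (typeof fn !== "function") {
      throw new TypeError(`Binding "${name}" must be a function`);
    }
    this.bindings.set(name, fn);
    return this; // allow chaining
  }

  // Every call from a script goes through this single entry point.
  call(name, ...args) {
    const fn = this.bindings.get(name);
    if (fn === undefined) {
      throw new ReferenceError(`"${name}" is not a registered binding`);
    }
    return fn(...args);
  }
}

// Expose one read-only query; nothing else is reachable through the registry.
const registry = new BindingRegistry().register("get_products", () => [
  { name: "Mug", price: 12, inStock: true },
]);

console.log(registry.call("get_products").length); // 1

try {
  registry.call("delete_directory", "/var/data");
} catch (err) {
  console.log(err.message); // "delete_directory" is not a registered binding
}
```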
Here's a real example — register a database binding:
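```javascript
// fetchFromYourDB() stands in for your existing data-access code
compiler.registerBuiltInLibrary("company/products", `
class Product (val name: String, val price: Int, val inStock: Bool)

class Database {
  @native("get_products")
  static func get_products(): Array<Product>
}
`, { autoImport: true }, {
  // Maps the @native name to the JS function that does the real work
  "get_products": () => fetchFromYourDB()
})
```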
The AI then writes something like:
```
@import("company/products")

val affordable = Database.get_products()
    .filter {|p| p.inStock && p.price <= 30 }

affordable.forEach {|p| println("- ${p.name}: $${p.price}") }
```
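If you want to see what that script prints without opening the live editor, here's the same filter-and-print logic in plain JavaScript over made-up rows. `mockProducts` is invented sample data standing in for `fetchFromYourDB()`, and we're assuming Autolang's `$${...}` renders a literal dollar sign the same way a JS template literal does.

```javascript
// Plain-JS mirror of the Autolang snippet, run on invented sample rows.
// mockProducts stands in for whatever fetchFromYourDB() returns.
const mockProducts = [
  { name: "Mug", price: 12, inStock: true },
  { name: "Lamp", price: 45, inStock: true },
  { name: "Notebook", price: 8, inStock: false },
  { name: "Poster", price: 30, inStock: true },
];

// In stock and priced at 30 or less.
const affordable = mockProducts.filter((p) => p.inStock && p.price <= 30);

// "$${...}" is a literal "$" followed by an interpolation.
const lines = affordable.map((p) => `- ${p.name}: $${p.price}`);

lines.forEach((line) => console.log(line));
// - Mug: $12
// - Poster: $30
```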
It can't touch anything outside company/products. If it writes an infinite loop, the opcode limit kills it before it hangs your process.
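An opcode limit is simple to reason about: count every instruction the VM executes and abort once a budget is spent. Below is a toy stack VM showing the idea. The opcodes, `OpcodeLimitError`, and the default budget of 10,000 are illustrative assumptions, not Autolang's actual instruction set or limit.

```javascript
// Hypothetical opcode budget: a toy stack VM that counts executed
// instructions and aborts once the budget is spent.
class OpcodeLimitError extends Error {}

function run(program, maxOps = 10_000) {
  const stack = [];
  let pc = 0;
  let ops = 0;
  while (pc < program.length) {
    // Charge one unit per instruction, before executing it.
    if (++ops > maxOps) {
      throw new OpcodeLimitError(`opcode limit of ${maxOps} exceeded at pc=${pc}`);
    }
    const [op, arg] = program[pc];
    switch (op) {
      case "PUSH": stack.push(arg); pc++; break;
      case "ADD": stack.push(stack.pop() + stack.pop()); pc++; break;
      case "JMP": pc = arg; break;
      case "HALT": return stack.pop();
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return stack.pop();
}

// Terminates normally: 2 + 3.
console.log(run([["PUSH", 2], ["PUSH", 3], ["ADD"], ["HALT"]])); // 5

// An infinite loop (JMP 0 forever): the budget stops it instead of hanging.
try {
  run([["JMP", 0]], 1000);
} catch (err) {
  console.log(err instanceof OpcodeLimitError, err.message); // true opcode limit of 1000 exceeded at pc=0
}
```

Counting instructions rather than wall-clock time keeps the cutoff deterministic: the same script hits the limit at the same point on every run, regardless of machine load.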
The numbers
| | Native | npm |
|---|---|---|
| Cold start | ~10ms | ~20ms |
| Warm start | 1–2ms | 2–4ms |
| RAM per instance | ~4MB | ~12MB |
50 concurrent agents: ~200MB total on the native build (~600MB via npm). Docker would be 5GB+.
When it makes sense
Good fit if you're running 5+ concurrent agents, scripts are short and frequent, and you want controlled access to existing functions without rewriting them.
Probably not worth it if you have only a handful of agents, need OS-level security guarantees, need Python bindings (not ready yet), or your AI writes long, complex programs.
```shell
npm install autolang-compiler
```
Philosophy: autolang.vercel.app/docs/philosophy-vision
Live editor: autolang.vercel.app/docs/editor
Curious how others are handling this. What's your current setup for sandboxed agent code?