


















The chip serves a web agent IDE, a native MCP server and a REST API,
runs the code in a sandbox (with live web-request / fetch support), and drives
its own screen + LED — up to 8 parallel PySpell processes on that same half-megabyte of RAM.
Like MicroPython, but with two syntaxes; the full-fidelity parsers never ship to the device, and English is the third syntax.
no_std + alloc core | Rust & Python front-ends | ~62 kB on ESP32 | deny-by-default sandbox | live over Tailscale | 512 kB SRAM · no PSRAM | 0.45 M-param model, in-browser offline AI agent
A PySpell program is a single expression (Python) or some let bindings followed by a
trailing expression (Rust). It evaluates to a value — a number, a boolean, a string, or a list. Free
identifiers are resolved at evaluation time against a host-supplied environment: CLI variables
on a laptop, or live device readings on a microcontroller. The only I/O is host-granted: an allowlisted
fetch / fetch_json and a config-gated show. There are no loops, function definitions, or imports, and that is the point: small, fast, and
safe to accept from elsewhere.
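To make the evaluation model concrete, here is a minimal host-side sketch in plain Python. It is not PySpell's implementation: it borrows CPython's `ast` module as the parser, supports only a handful of operators, and the names `evaluate`, `BIN_OPS`, and `CMP_OPS` are illustrative. What it shows is the shape of the idea: only allowlisted node types are evaluated, and free names are resolved against a host-supplied environment.

```python
import ast
import operator

# Deny-by-default: only operators listed here are evaluated.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


def evaluate(source, env):
    """Evaluate one expression; free names are looked up in env."""
    return _eval(ast.parse(source, mode="eval").body, env)


def _eval(node, env):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, bool, str)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise NameError(f"unknown name `{node.id}`")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in CMP_OPS:
        return CMP_OPS[type(node.ops[0])](_eval(node.left, env), _eval(node.comparators[0], env))
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        return all(_eval(value, env) for value in node.values)  # short-circuits
    if isinstance(node, ast.IfExp):
        return _eval(node.body if _eval(node.test, env) else node.orelse, env)
    # Anything not matched above (calls, attributes, lambdas, ...) is rejected.
    raise ValueError(f"not allowed: {type(node).__name__}")
```

With `env = {"free_heap": 120000, "uptime_s": 45}`, the health check from the examples evaluates to a boolean, while an expression such as `__import__('os')` is rejected because call nodes are not on the allowlist in this sketch.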
"Micro-containers" — the direction, honestly stated. The aim is lightweight, pushable units of code on tiny devices. Today it's a sandboxed evaluator, not OS containers: the sandbox is at the language level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust subset — not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.
Two ways to compile. On the host, full-fidelity front-ends use syn
(Rust) and rustpython-parser (Python). For "type code in a browser and run it on the
chip", a tiny hand-written parser (a few kB, no_std) builds the same AST on the device.
Either way: source → AST → evaluate.
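The evaluate step is also where an instruction budget would naturally be enforced. The sketch below assumes one unit is charged per evaluated AST node, which is a guess at the accounting rather than a description of PySpell's actual budget; `Budget`, `BudgetExceeded`, and `run` are hypothetical names, and only integer addition is supported to keep it short.

```python
import ast


class BudgetExceeded(Exception):
    pass


class Budget:
    def __init__(self, limit):
        self.remaining = limit

    def charge(self, cost=1):
        self.remaining -= cost
        if self.remaining < 0:
            raise BudgetExceeded("program exceeded its instruction budget")


def _eval(node, budget):
    budget.charge()  # every evaluated node costs one unit (assumed accounting)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return _eval(node.left, budget) + _eval(node.right, budget)
    raise ValueError(f"not allowed: {type(node).__name__}")


def run(source, limit):
    """Parse, then evaluate under a hard cap on evaluated nodes."""
    return _eval(ast.parse(source, mode="eval").body, Budget(limit))
```

Because the language has no loops, a per-node charge like this bounds total work by program size, which is one reason a budget of this kind is cheap to enforce.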
Open http://<dongle>/ over the tunnel and you get a Cursor-like agent. Type
"flash the light", "show the text "hello"", "what is 7 plus 5",
or "reverse the word robot" — a ~0.45 M-parameter language model (< 500 kB, int8)
turns it into PySpell code, runs it live on the chip, and shows the result, or the
physical action (the screen lights up, the RGB LED blinks). Runtime, model, tokenizer and dictionary
are all served from the dongle, offline — no cloud, no key (OpenAI is optional, behind
the ⚙).
A model that small is only useful because of a chain of tricks — the full write-up is in
tech.md. The headlines:
A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked
to. It emits tiny semantic directives; the browser copies the literal content verbatim.
For example, "calculate 3 + 2" becomes print(3 + 2), and "change add to
subtract" becomes the directive @@ + ==> -. Quoted text is literal content: it is copied byte-for-byte and
excluded from vocab checks.
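One way to picture the "copy, don't generate" split is placeholder masking: quoted literals are lifted out before the model sees the instruction and pasted back into its output afterwards. The sketch below is an assumption about the mechanism, not the project's code; the `<S0>` placeholder format and both function names are hypothetical.

```python
import re

QUOTED = re.compile(r'"([^"]*)"')


def mask_literals(instruction):
    """Replace each quoted span with a placeholder; return masked text and literals."""
    literals = []

    def stash(match):
        literals.append(match.group(1))
        return f"<S{len(literals) - 1}>"

    return QUOTED.sub(stash, instruction), literals


def fill_literals(template, literals):
    """Copy the stashed literals back into the model output, unchanged."""

    def restore(match):
        return '"' + literals[int(match.group(1))] + '"'

    return re.sub(r"<S(\d+)>", restore, template)
```

Here the model would only need to map `show the text <S0>` to `show(<S0>)`; the string itself never passes through its vocabulary.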
Inference runs in WebAssembly, client-side. The 0.5 MB model image streams off flash a TCP segment at a time (HTTP Range) and is never resident in the chip's ~60 kB heap. Inverted edge inference: the constrained device serves and grades, the browser runs the model.
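The ranged download can be sketched from the client's side as follows. The chunk size and the `fetch_range` callback are placeholders (the real client runs in the browser and the actual segment size is whatever the firmware serves); the point is only that each request names an inclusive byte range, so the server never needs more than one chunk in memory.

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values covering total_size bytes in chunk_size pieces."""
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"
        start = end + 1


def stream_image(fetch_range, total_size, chunk_size=1024):
    """Assemble the image client-side from successive ranged requests."""
    image = bytearray()
    for header in byte_ranges(total_size, chunk_size):
        # fetch_range is supplied by the caller and performs one ranged GET.
        image += fetch_range(header)
    return bytes(image)
```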
The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a part-of-speech vector, and frozen — the tiny model starts with meaningful word geometry instead of spending its tiny budget learning it.
Those same 512 tokens + embeddings are served back to the browser for input validation ("outside the model's vocabulary…") and related-word RAG over the model's own vocabulary.
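A rough sketch of both uses of the served vocabulary, in plain Python rather than the browser's actual code: a membership check for input validation, and a cosine-similarity ranking for related-word lookup. The whitespace tokenization and the function names are simplifying assumptions, and the handling of quoted literals is omitted.

```python
import math


def out_of_vocab(text, vocab):
    """Return the words in text that the model has no token for."""
    return [word for word in text.lower().split() if word not in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_words(word, embeddings, k=3):
    """Rank the other vocabulary words by cosine similarity to word."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]
```

With 512 tokens and 128 dimensions, a brute-force scan like this is small enough that no index structure is needed.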
Retrain it for your language. The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and train, then flash. Full guide in tech.md.
free_heap > 100000 and uptime_s < 60
250 if distance > 1000 else 0
0 < temp < 60 # chained
20 not in peers
sum([1, 2, 3])
readings[-1] # negative index
max(a, b)
free_heap > 100000 && uptime_s < 60
if distance > 1000 { 250 } else { 0 }
let used = total - free; used * 100 / total
!peers.contains(20)
sum([1, 2, 3])
readings[readings.len() - 1]
max(a, b)
| Kind | Examples | Notes |
|---|---|---|
| Integer | 0, 42, -7 | 64-bit signed |
| Float | 1.5, 3.14 | 64-bit |
| Boolean | true/True, false/False | both spellings accepted |
| String | "hello", 'oslo' | + concatenates; ==/< compare; len() counts chars |
| List | [1, 2, 3] | elements are values |
| Group | Python | Rust | Notes |
|---|---|---|---|
| Arithmetic | + - * / % // | + - * / % | on integers, / and // both truncate toward zero; a float operand promotes to float division. There is no separate float floor-div. |
| Comparison | == != < <= > >= | == != < <= > >= | Python allows chaining (a < b < c) |
| Boolean | and, or, not | &&, \|\|, ! | short-circuiting |
| Unary | -x, not x | -x, !x | |
| Membership | x in list, x not in list | list.contains(x) | numeric equality |
| Index | list[i] | list[i] | negative indexing supported |
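The integer-division rule is worth a second look because it differs from CPython, where `//` floors. A small Python reference for truncation toward zero, useful when checking a PySpell result on the host (`trunc_div` is an illustrative helper, not a PySpell builtin):

```python
def trunc_div(a, b):
    """Integer division that rounds toward zero, as described for PySpell integers."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


print(trunc_div(-7, 2))  # -3: truncation toward zero
print(-7 // 2)           # -4: CPython floors instead
```

The two only disagree when exactly one operand is negative and the division is inexact.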
| Feature | Python | Rust |
|---|---|---|
| Conditional | a if cond else b | if cond { a } else { b } (else required) |
| Local bindings | (single expression only) | let x = e; let y = e2; final_expr |
| Free variables | any bare name is read from the host environment | any bare name not bound by let is read from the host environment |
| Function | Result | Description |
|---|---|---|
| len(list) | int | number of elements |
| abs(x) | number | absolute value |
| min(list) / min(a, b, …) | number | minimum |
| max(list) / max(a, b, …) | number | maximum |
| sum(list) | number | sum of a numeric list |
| any(list) | bool | true if any element is truthy |
| all(list) | bool | true if all elements are truthy |
| round(x) | int | round to nearest integer |
| int(x) | int | truncate toward zero |
| float(x) | float | convert to float |
| bool(x) | bool | truthiness |
| index(list, x) | int | position of first x, or -1 |
| before(list, a, b) | bool | true if a occurs before b |
| first(list) | value | first element, or -1 if empty |
| last(list) | value | last element, or -1 if empty |
| str(x) | string | string representation of a value |
| json_get(text, "a.b.0.c") | scalar | extract the scalar at a dotted/indexed JSON path (no full parse; only the matched value is materialized) |
| fetch(url) | string | HTTP(S) GET body. Gated by a host allowlist; errors if the host isn't allowed or no network capability is present |
| fetch_json(url, "a.b.0.c") | scalar | stream the response and extract just the scalar at the path, stopping as soon as it's found, so the whole body is never buffered. Preferred on the device. |
| show(x) | x | render x to text and display it (the ESP32 screen; stdout on host), returning x so it composes. Device gates it via config (allow on/off, auto-revert seconds). |
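The list helpers that are not standard Python builtins use -1 as a sentinel instead of raising. A plain-Python reading of the table rows above, for reference only; the behaviour of `before` when either value is missing is not stated in the table, so returning false in that case is an assumption.

```python
def index(items, x):
    """Position of the first x, or -1 if absent."""
    for i, item in enumerate(items):
        if item == x:
            return i
    return -1


def first(items):
    """First element, or -1 if the list is empty."""
    return items[0] if items else -1


def last(items):
    """Last element, or -1 if the list is empty."""
    return items[-1] if items else -1


def before(items, a, b):
    """True if a occurs before b (assumed false when either is missing)."""
    ia, ib = index(items, a), index(items, b)
    return ia != -1 and ib != -1 and ia < ib
```

Sentinels keep every helper total, which suits a language whose programs are single expressions with no exception handling.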
Classic one-liner — fetch a value and show it on the dongle's screen:
show("Oslo: " + fetch_json(
"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
"properties.timeseries.0.data.instant.details.air_temperature") + " C")
# screen shows: Oslo: 14.9 C (and the call returns that string)
fetch(url) + json_get(text, path) let a program pull live data and read one
field out of it. fetch is a mediated capability — the host/device decides which hosts are
reachable (an allowlist), so a program can't reach arbitrary URLs.
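The allowlist check itself is small. A sketch of what a host-side gate might look like, assuming exact hostname matching and http/https only; the function name and both of those rules are illustrative, not taken from the PySpell source.

```python
from urllib.parse import urlsplit


def host_allowed(url, allowed_hosts):
    """Return True only if the URL's hostname is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname and strips any userinfo and port.
    return parts.hostname is not None and parts.hostname in allowed_hosts
```

Matching on the parsed hostname rather than a substring of the URL avoids lookalikes such as `api.met.no.evil.example` or `api.met.no@evil.example`.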
# Host CLI (allow the host explicitly):
pyspell run oslo_temp.py --allow-host api.met.no
# where oslo_temp.py is:
json_get(
fetch("https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75"),
"properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9
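The path syntax can be pinned down with a host-side reference. Note that this version parses the whole document with Python's `json` module, so it only illustrates what a path such as `a.b.0.c` selects, not how the device extracts it; `json_get_reference` is a hypothetical name.

```python
import json


def json_get_reference(text, path):
    """Walk a dotted path; numeric segments index into lists."""
    node = json.loads(text)
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]
        else:
            node = node[key]
    return node


doc = '{"properties": {"timeseries": [{"data": {"instant": {"details": {"air_temperature": 14.9}}}}]}}'
print(json_get_reference(doc, "properties.timeseries.0.data.instant.details.air_temperature"))  # 14.9
```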
Memory note (device): json_get is path-directed so it never builds the
whole document in RAM — it materializes only the matched value. On the ESP32 (≈60 kB free, no PSRAM)
reading a field out of a large response is feasible because fetch_json streams the
HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early) — so a ~50 kB yr.no
response never has to fit in RAM at once.
# On the ESP32, over Tailscale (single process; ≈60 kB free; verified live):
fetch_json(
"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
"properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9 (the dongle fetched yr.no itself)
# Evaluate, binding free variables:
cargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000
# → true
# Compile to a portable IR blob:
cargo run -p pyspell-cli -- compile examples/health.py # → examples/health.py.psb
# Push live to a device over USB-serial, or an interactive REPL:
cargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python
The portable evaluator (pyspell-core, no_std + alloc) runs unchanged on the
ESP32-S3. Programs read live device variables from the environment:
| Variable | Meaning |
|---|---|
| free_heap | free heap, bytes |
| min_free_heap | lowest free heap seen since boot, bytes |
| uptime_ms | milliseconds since boot |
| uptime_s | seconds since boot |
The demo/esp32-tailscale-pyspell firmware adds a web text window and a /run
API inside a Tailscale tunnel — open the device's Tailscale IP in a browser, type an
expression, set a timeout, and run it on the chip. PySpell adds only ~62 kB on top of the networking
firmware.
# Web window:
open http://100.x.y.z/
# POST (preferred): program in the body, lang/timeout in the query.
# More room for code than a URL, and no percent-encoding.
curl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000' # → true
curl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000' # → 22
# GET (also supported): code is URL-encoded in the query.
curl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000' # → true
timeout is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on
the device. The single request must fit one TCP segment (≈1.2 kB) — POST leaves more of that for code.
The reply is text/plain (no JSON wrapper):
| Outcome | Body |
|---|---|
| Success | the raw value — true/false, an integer, a float, or a list like [1, 2, 3] |
| Failure | a line starting with error: — e.g. error: parse error: unexpected end of input, error: unknown name `foo`, or error: program exceeded its time limit |
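A thin client for this contract needs two pieces: building the query string and telling success from failure in a text/plain body. The sketch below does both without touching the network; the function names are illustrative, and the client-side clamp simply mirrors the 1 to 60 second range described above.

```python
from urllib.parse import urlencode


def run_url(base, lang, timeout_s):
    """Build the /run URL, clamping the timeout to the 1-60 s range."""
    timeout_s = max(1, min(60, int(timeout_s)))
    return f"{base}/run?{urlencode({'lang': lang, 'timeout': timeout_s})}"


def parse_reply(body):
    """Split a text/plain reply into (ok, payload)."""
    text = body.strip()
    if text.startswith("error:"):
        return False, text[len("error:"):].strip()
    return True, text
```

One caveat of a prefix-based protocol: a program whose string result happens to start with `error:` would be indistinguishable from a failure, so a client may want to avoid such values.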
The ESP32-S3 has 512 kB of SRAM and no PSRAM, yet it runs a full Tailscale node (control plane and DERP), the PySpell evaluator, a browser agent IDE served off the chip, a native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.
Honest headline. The "~260 kB free" you see between requests is a calm-moment reading. The number that matters is the free heap at the worst-case peak: ≈60 kB, measured during a TLS fetch with the Tailscale control session live. Every trick below keeps transient spikes within that headroom. The blunt consequence is that an 8-way parallel pool and full Tailscale don't coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.
SPKI leaf-key pinning instead of CA-chain validation — one RSA-PSS verify, no 6 kB
chain buffer (a TLS fetch drops ~45→30 kB). A heap admission gate bounds concurrency
so peak heap is K × per-fetch, never N × per-fetch.
The netmap is read with serde_json::from_reader over the HTTP/2 frames, so serde
skips the huge DERPMap field instead of buffering it (~60 kB → one 4 kB chunk).
fetch_json stops the moment the value is found, and raw byte-scans
replace JSON DOM trees.
Static content lives in flash as &'static str (zero heap) and is streamed out as
512-byte TCP segments — only the current segment is ever in RAM, so the 4.3 kB agent
IDE serves without a full-page buffer.
Heap and stack share one DRAM pool (+16 kB heap = −16 kB stack), tuned by hand.
SO_LINGER=0 frees lwIP sockets immediately (no TIME_WAIT pile-up), and a
cooperative shared stack on the lean build makes parallelism cheap where per-thread
stacks can't.
The full catalog — every trick with the exact file and symbol — is in
docs/memory-512kb.md.
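The admission-gate idea (peak heap is K × per-fetch, never N × per-fetch) can be illustrated with a counting semaphore. This is a Python sketch under the assumption that the gate is a fixed number of slots; the firmware may instead gate on measured free heap, and `AdmissionGate` and `peak_in_flight` are hypothetical names.

```python
import threading
import time


class AdmissionGate:
    """Allow at most max_in_flight holders at once; others wait."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def __enter__(self):
        self._slots.acquire()
        return self

    def __exit__(self, *exc):
        self._slots.release()
        return False


def peak_in_flight(gate, jobs):
    """Run jobs threads through the gate and report the peak concurrency seen."""
    lock = threading.Lock()
    state = {"now": 0, "peak": 0}

    def job():
        with gate:
            with lock:
                state["now"] += 1
                state["peak"] = max(state["peak"], state["now"])
            time.sleep(0.01)  # stand-in for a fetch holding its buffers
            with lock:
                state["now"] -= 1

    threads = [threading.Thread(target=job) for _ in range(jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

Eight jobs behind a two-slot gate never exceed two concurrent holders, so worst-case memory scales with the slot count rather than the number of queued requests.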