PySpell — sandboxed Rust/Python expressions, live on ESP32

punnerud · 2026-06-18 · via Hacker News - Newest: "LLM"

The chip serves a web agent IDE, a native MCP server and a REST API, runs the code in a sandbox (with live web-request / fetch support), and drives its own screen and LED, with up to 8 parallel PySpell processes on that same half-megabyte of RAM. Like MicroPython, but with two syntaxes (Python and Rust), the full-fidelity parsers stay on the host (only a tiny safe-subset parser ships to the device), and English is the third syntax.

no_std + alloc core · Rust & Python front-ends · ~62 kB on ESP32 · deny-by-default sandbox · live over Tailscale · 512 kB SRAM, no PSRAM · 0.45 M-param model, in-browser · offline AI agent

What it is

A PySpell program is a single expression (Python) or some let bindings followed by a trailing expression (Rust). It evaluates to a value: a number, a boolean, a string, or a list. Free identifiers are resolved at evaluation time against a host-supplied environment: CLI variables on a laptop, or live device readings on a microcontroller. The only I/O goes through host-granted built-ins (the allowlisted fetch and fetch_json, and the config-gated show). There are no loops, user-defined functions, or imports. That is the point: small, fast, and safe to accept from elsewhere.

"Micro-containers" — the direction, honestly stated. The aim is lightweight, pushable units of code on tiny devices. Today it's a sandboxed evaluator, not OS containers: the sandbox is at the language level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust subset — not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.

Two ways to compile. On the host, full-fidelity front-ends use syn (Rust) and rustpython-parser (Python). For "type code in a browser and run it on the chip", a tiny hand-written parser (a few kB, no_std) builds the same AST on the device. Either way: source → AST → evaluate.

An offline AI coding agent, served off the chip

Open http://<dongle>/ over the tunnel and you get a Cursor-like agent. Type "flash the light", 'show the text "hello"', "what is 7 plus 5", or "reverse the word robot": a ~0.45 M-parameter language model (< 500 kB, int8) turns it into PySpell code, which then runs live on the chip, and the agent shows the result or the physical action (the screen lights up, the RGB LED blinks). Runtime, model, tokenizer and dictionary are all served from the dongle, offline: no cloud, no key (OpenAI is optional, behind the ⚙).

A model that small is only useful because of a chain of tricks — the full write-up is in tech.md. The headlines:

The model points, the browser copies

A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked to. It emits tiny semantic directives; the browser copies the literal content verbatim. calculate 3 + 2 → print(3 + 2); change add to subtract → @@ + ==> -. Quoted text is literal content — copied byte-for-byte, excluded from vocab checks.
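The split between "pointing" and "copying" can be sketched in a few lines. This is only an illustration: the `<L0>` slot syntax and the function names are hypothetical, and the real directive format is the one described in tech.md. The idea shown is that literals are masked out before the model sees the text and copied back verbatim afterwards, so the model never has to reproduce them.

```python
import re

# Hypothetical slot syntax (<L0>, <L1>, ...); not the project's actual directives.
LITERAL = re.compile(r'"[^"]*"|\d+(?:\.\d+)?')

def mask_literals(instruction):
    """Replace quoted strings and numbers with slots; return (masked, literals)."""
    literals = []

    def to_slot(match):
        literals.append(match.group(0))
        return f"<L{len(literals) - 1}>"

    return LITERAL.sub(to_slot, instruction), literals

def fill_slots(template, literals):
    """Copy each literal back, byte-for-byte, into the model's output template."""
    return re.sub(r"<L(\d+)>", lambda m: literals[int(m.group(1))], template)

masked, literals = mask_literals("calculate 3 + 2")
template = "print(<L0> + <L1>)"   # stand-in for what the small model would emit
program = fill_slots(template, literals)
```

Here `program` ends up as `print(3 + 2)` without the model ever emitting a digit itself.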

The device serves; the browser computes

Inference runs in WebAssembly, client-side. The 0.5 MB model image streams off flash a TCP segment at a time (HTTP Range) and is never resident in the chip's ~60 kB heap. Inverted edge inference: the constrained device serves and grades, the browser runs the model.
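As a rough illustration of fetching an image piece by piece with HTTP Range, the sketch below computes the header values a client would send. The function name and the chunk size are assumptions made for the example; the actual client code and segment size are not shown in this README.

```python
def range_headers(total_bytes, chunk_bytes):
    """Yield HTTP 'Range' header values that cover total_bytes in order."""
    for start in range(0, total_bytes, chunk_bytes):
        end = min(start + chunk_bytes, total_bytes) - 1   # Range ends are inclusive
        yield f"bytes={start}-{end}"

# Made-up sizes, chosen only to keep the output short.
headers = list(range_headers(total_bytes=5000, chunk_bytes=1024))
```

Each request asks for one small window, so the server only ever needs that window in memory.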

Frozen embeddings, distilled

The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a part-of-speech vector, and frozen — the tiny model starts with meaningful word geometry instead of spending its tiny budget learning it.

The vocabulary is the dictionary

Those same 512 tokens + embeddings are served back to the browser for input validation ("outside the model's vocabulary…") and related-word RAG over the model's own vocabulary.
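A minimal sketch of both uses, assuming the vocabulary arrives as a word-to-vector table. The four words and their 3-dimensional vectors are toy values invented for the example (the real table has 512 tokens), and the function names are hypothetical.

```python
import math

# Toy stand-in for the served vocabulary + embeddings.
VOCAB = {
    "flash": [0.9, 0.1, 0.0],
    "blink": [0.8, 0.2, 0.1],
    "light": [0.7, 0.3, 0.0],
    "reverse": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unknown_words(text):
    """Words outside the model's vocabulary (input validation)."""
    return [w for w in text.lower().split() if w not in VOCAB]

def related(word, k=2):
    """The k in-vocabulary words closest to `word` by cosine similarity."""
    others = [w for w in VOCAB if w != word]
    return sorted(others, key=lambda w: cosine(VOCAB[word], VOCAB[w]), reverse=True)[:k]
```

With this toy table, `unknown_words("flash the light")` flags `the`, and `related("flash")` suggests `blink` and `light`.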

Retrain it for your language. The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and train, then flash. Full guide in tech.md.

Syntax at a glance

Python

free_heap > 100000 and uptime_s < 60
250 if distance > 1000 else 0
0 < temp < 60          # chained
20 not in peers
sum([1, 2, 3])
readings[-1]           # negative index
max(a, b)

Rust

free_heap > 100000 && uptime_s < 60
if distance > 1000 { 250 } else { 0 }
let used = total - free; used * 100 / total
!peers.contains(20)
sum([1, 2, 3])
readings[readings.len() - 1]
max(a, b)

Language reference

Literals & values

Kind	Examples	Notes
Integer	`0`, `42`, `-7`	64-bit signed
Float	`1.5`, `3.14`	64-bit
Boolean	`true`/`True`, `false`/`False`	both spellings accepted
String	`"hello"`, `'oslo'`	`+` concatenates; `==`/`<` compare; `len()` counts chars
List	`[1, 2, 3]`	elements are values

Operators

Group	Python	Rust	Notes
Arithmetic	`+ - * / %` (and `//`)		on integers, `/` and `//` both truncate toward zero; a float operand promotes to float division. There is no separate float floor-div.
Comparison	`== != < <= > >=`		Python allows chaining (`a < b < c`)
Boolean	`and`, `or`, `not`	`&&`, `||`, `!`	short-circuiting
Unary	`-x`, `not x`	`-x`, `!x`	
Membership	`x in list`, `x not in list`	`list.contains(x)`	numeric equality
Index	`list[i]`		negative indexing supported
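The integer-division rule is the one place where a Python-syntax program can differ from CPython, which floors instead of truncating. A small sketch of the rule in the table (the helper name is made up; how PySpell reports division by zero is not covered here):

```python
def pyspell_int_div(a, b):
    """Integer division that truncates toward zero, as the Operators table describes."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

# (a, b, truncating result, CPython's floor result)
results = [(a, b, pyspell_int_div(a, b), a // b) for a, b in [(7, 2), (-7, 2), (7, -2)]]
```

For `-7` and `2` the truncating result is `-3`, while CPython's `-7 // 2` is `-4`; the two agree whenever both operands have the same sign.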

Control flow & bindings

Feature	Python	Rust
Conditional	`a if cond else b`	`if cond { a } else { b }` (else required)
Local bindings	(single expression only)	`let x = e; let y = e2; final_expr`
Free variables	any bare name not bound by `let` is read from the host environment

Built-in functions

Function	Result	Description
`len(list)`	int	number of elements
`abs(x)`	number	absolute value
`min(list)` / `min(a, b, …)`	number	minimum
`max(list)` / `max(a, b, …)`	number	maximum
`sum(list)`	number	sum of a numeric list
`any(list)`	bool	true if any element is truthy
`all(list)`	bool	true if all elements are truthy
`round(x)`	int	round to nearest integer
`int(x)`	int	truncate toward zero
`float(x)`	float	convert to float
`bool(x)`	bool	truthiness
`index(list, x)`	int	position of first `x`, or `-1`
`before(list, a, b)`	bool	true if `a` occurs before `b`
`first(list)`	value	first element, or `-1` if empty
`last(list)`	value	last element, or `-1` if empty
`str(x)`	string	string representation of a value
`json_get(text, "a.b.0.c")`	scalar	extract the scalar at a dotted/indexed JSON path (no full parse — only the matched value is materialized)
`fetch(url)`	string	HTTP(S) GET body. Gated by a host allowlist; errors if the host isn't allowed or no network capability is present
`fetch_json(url, "a.b.0.c")`	scalar	stream the response and extract just the scalar at the path, stopping as soon as it's found — never buffers the whole body. Preferred on the device.
`show(x)`	x	render `x` to text and display it (the ESP32 screen; stdout on host), returning `x` so it composes. Device gates it via config (allow on/off, auto-revert seconds).
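The list helpers that have no direct Python built-in can be read as the reference sketch below. It follows the table's descriptions; the one assumption is that `before` returns false when either value is missing, which the table does not specify.

```python
def index(items, x):
    """Position of the first x, or -1 if absent."""
    for i, item in enumerate(items):
        if item == x:
            return i
    return -1

def before(items, a, b):
    """True if a occurs before b. Assumption: False when either is missing."""
    ia, ib = index(items, a), index(items, b)
    return ia != -1 and ib != -1 and ia < ib

def first(items):
    """First element, or -1 if the list is empty."""
    return items[0] if items else -1

def last(items):
    """Last element, or -1 if the list is empty."""
    return items[-1] if items else -1
```

Returning `-1` instead of raising keeps every call total, which suits a language with no exception handling.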

Classic one-liner — fetch a value and show it on the dongle's screen:

show("Oslo: " + fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature") + " C")
# screen shows:  Oslo: 14.9 C   (and the call returns that string)

Network & JSON

fetch(url) + json_get(text, path) let a program pull live data and read one field out of it. fetch is a mediated capability — the host/device decides which hosts are reachable (an allowlist), so a program can't reach arbitrary URLs.

# Host CLI (allow the host explicitly):
pyspell run oslo_temp.py --allow-host api.met.no
# where oslo_temp.py is:
json_get(
  fetch("https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75"),
  "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9

Memory note (device): json_get is path-directed so it never builds the whole document in RAM — it materializes only the matched value. On the ESP32 (≈60 kB free, no PSRAM) reading a field out of a large response is feasible because fetch_json streams the HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early) — so a ~50 kB yr.no response never has to fit in RAM at once.

# On the ESP32, over Tailscale (single process; ≈60 kB free; verified live):
fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9   (the dongle fetched yr.no itself)
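To make the path syntax concrete, here is a host-side sketch of the lookup semantics only. It deliberately does not reproduce the device's memory behaviour: it parses the whole document with Python's `json` module, whereas the device scans for the matched value. The sample document is a hand-made fragment shaped like the path above, not a real API response.

```python
import json

def json_get(text, path):
    """Return the scalar at a dotted path such as 'a.b.0.c'.

    Unlike the device implementation, this parses the whole document first.
    """
    node = json.loads(text)
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]   # numeric segments index into arrays
        else:
            node = node[part]        # other segments are object keys
    if isinstance(node, (dict, list)):
        raise ValueError("path does not end at a scalar")
    return node

# Made-up sample with the same nesting as the path used in this section.
sample = (
    '{"properties": {"timeseries": [{"data": {"instant": '
    '{"details": {"air_temperature": 14.9}}}}]}}'
)
value = json_get(sample, "properties.timeseries.0.data.instant.details.air_temperature")
```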

Running on the host

# Evaluate, binding free variables:
cargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000
# → true

# Compile to a portable IR blob:
cargo run -p pyspell-cli -- compile examples/health.py    # → examples/health.py.psb

# Push live to a device over USB-serial, or an interactive REPL:
cargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python

Running on the ESP32

The portable evaluator (pyspell-core, no_std + alloc) runs unchanged on the ESP32-S3. Programs read live device variables from the environment:

Variable	Meaning
`free_heap`	free heap, bytes
`min_free_heap`	lowest free heap seen since boot, bytes
`uptime_ms`	milliseconds since boot
`uptime_s`	seconds since boot

Demo: PySpell over Tailscale

The demo/esp32-tailscale-pyspell firmware adds a web text window and a /run API inside a Tailscale tunnel — open the device's Tailscale IP in a browser, type an expression, set a timeout, and run it on the chip. PySpell adds only ~62 kB on top of the networking firmware.

# Web window:
open http://100.x.y.z/

# POST (preferred): program in the body, lang/timeout in the query.
# More room for code than a URL, and no percent-encoding.
curl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000'   # → true
curl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000'       # → 22

# GET (also supported): code is URL-encoded in the query.
curl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000'   # → true

timeout is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on the device. The single request must fit one TCP segment (≈1.2 kB) — POST leaves more of that for code.
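A client can mirror these limits before sending anything. The sketch below only builds the request with Python's standard library; the helper name and the 1200-byte figure (a rough reading of "≈1.2 kB", ignoring header overhead) are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_REQUEST_BYTES = 1200   # rough stand-in for "≈1.2 kB"; headers are not counted

def build_run_request(base_url, code, lang="py", timeout_s=10):
    """Build (but do not send) a POST /run request for a PySpell program."""
    timeout_s = max(1, min(60, int(timeout_s)))   # mirror the device's 1-60 clamp
    body = code.encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        raise ValueError("program too large for a single-segment request")
    query = urlencode({"lang": lang, "timeout": timeout_s})
    return Request(f"{base_url}/run?{query}", data=body, method="POST")

req = build_run_request("http://100.x.y.z", "free_heap > 100000", timeout_s=300)
```

Passing `req` to `urllib.request.urlopen` would send it; here the out-of-range timeout of 300 is clamped to 60 in the query string.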

Response format

The reply is text/plain (no JSON wrapper):

Outcome	Body
Success	the raw value — `true`/`false`, an integer, a float, or a list like `[1, 2, 3]`
Failure	a line starting with `error:`, e.g. `error: parse error: unexpected end of input`, ``error: unknown name `foo` ``, or `error: program exceeded its time limit`
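Because the reply is bare text, a client has to tell values from errors itself. One possible sketch, assuming that successful replies happen to be JSON-compatible (`true`, `22`, `[1, 2, 3]`) and falling back to the raw text otherwise; the class and function names are hypothetical.

```python
import json

class PySpellError(Exception):
    """Raised when the device replies with an 'error:' line."""

def parse_reply(body):
    """Turn a text/plain /run reply into a Python value, or raise on 'error:'."""
    text = body.strip()
    if text.startswith("error:"):
        raise PySpellError(text[len("error:"):].strip())
    try:
        return json.loads(text)      # covers true/false, numbers, and [1, 2, 3]
    except json.JSONDecodeError:
        return text                  # anything else is kept as raw text
```

A string result such as `Oslo: 14.9 C` is not valid JSON, so it comes back unchanged through the fallback branch.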

How it fits in 512 kB

The ESP32-S3 has 512 kB of SRAM and no PSRAM, yet it runs a full Tailscale node (control plane and DERP), the PySpell evaluator, a browser agent IDE served off the chip, a native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.

Honest headline. The "~260 kB free" you see between requests is a calm-moment reading. The number that matters is the worst-case free heap: ≈60 kB, measured during a TLS fetch with the Tailscale control session live. Every trick below keeps transient allocation spikes within that headroom. The blunt consequence is that an 8-way parallel pool and full Tailscale don't coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.

Crypto & TLS

SPKI leaf-key pinning instead of CA-chain validation: one RSA-PSS verify, no 6 kB chain buffer (a TLS fetch drops from ~45 kB to ~30 kB). A heap admission gate bounds concurrency so peak heap is K × per-fetch (K = fetches admitted at once), never N × per-fetch (N = requests that happen to arrive together).
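The admission gate is essentially a counter checked before any fetch allocates. A single-threaded sketch of the idea follows; the class name, K = 2, and the 30 kB per-fetch figure (taken loosely from the numbers above) are assumptions, and a real firmware version would need the counter to be atomic.

```python
class AdmissionGate:
    """Admit at most K fetches at once, so peak heap <= K * per-fetch bytes."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.active = 0

    def try_admit(self):
        if self.active >= self.max_concurrent:
            return False          # caller should queue or reject the request
        self.active += 1
        return True

    def release(self):
        self.active -= 1          # call when the fetch has freed its buffers

PER_FETCH = 30_000                            # assumed heap cost of one TLS fetch
gate = AdmissionGate(max_concurrent=2)        # K = 2 is a made-up value
admitted = [gate.try_admit() for _ in range(5)]   # N = 5 requests arrive together
peak_bytes = gate.active * PER_FETCH
```

Five simultaneous requests still cost only two fetches' worth of heap; the other three are turned away until `release` is called.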

Stream, don't buffer

The netmap is read with serde_json::from_reader over the HTTP/2 frames, so serde skips the huge DERPMap field instead of buffering it (~60 kB → one 4 kB chunk). fetch_json stops the moment the value is found, and raw byte-scans replace JSON DOM trees.

Pages from flash

Static content lives in flash as &'static str (zero heap) and is streamed out as 512-byte TCP segments — only the current segment is ever in RAM, so the 4.3 kB agent IDE serves without a full-page buffer.

Allocator & sockets

Heap and stack share one DRAM pool (+16 kB heap = −16 kB stack), tuned by hand. SO_LINGER=0 frees lwIP sockets immediately (no TIME_WAIT pile-up), and a cooperative shared stack on the lean build makes parallelism cheap where per-thread stacks can't.

The full catalog — every trick with the exact file and symbol — is in docs/memory-512kb.md.

Sandbox & limits

Deny-by-default grammar. Only the whitelisted expression nodes and the built-ins above exist: no loops, user-defined functions, recursion, general attribute access, imports, or I/O beyond the host-gated built-ins (fetch, fetch_json, show).
Instruction budget. Every evaluation has a step limit (runaway guard).
Wall-clock timeout. A caller can supply a deadline (e.g. 10 s); on the device the ESP timer enforces it.
Parser stays small. The on-device parser accepts only the safe subset, so the device's attack surface is just a bounded decoder + evaluator.
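The first two guards can be illustrated with a toy evaluator. This is a sketch built on CPython's own `ast` module, not PySpell's parser or evaluator, and it covers only a handful of node types; the names `evaluate` and `BudgetExceeded` are hypothetical. What it shows is the shape of the technique: anything not explicitly handled is rejected, and every node visit spends one step of a fixed budget.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}

class BudgetExceeded(Exception):
    pass

def evaluate(source, env, max_steps=1000):
    """Evaluate one expression against env; unlisted node types are rejected."""
    steps = 0

    def ev(node):
        nonlocal steps
        steps += 1                                   # one step per node visited
        if steps > max_steps:
            raise BudgetExceeded("program exceeded its step budget")
        if isinstance(node, ast.Constant) and isinstance(node.value, (bool, int, float, str)):
            return node.value
        if isinstance(node, ast.Name):               # free variable: ask the host
            if node.id not in env:
                raise NameError(f"unknown name `{node.id}`")
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.BoolOp):             # generators short-circuit
            if isinstance(node.op, ast.And):
                return all(ev(v) for v in node.values)
            return any(ev(v) for v in node.values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in CMP_OPS:
            return CMP_OPS[type(node.ops[0])](ev(node.left), ev(node.comparators[0]))
        if isinstance(node, ast.IfExp):
            return ev(node.body) if ev(node.test) else ev(node.orelse)
        raise ValueError(f"node not allowed: {type(node).__name__}")   # deny by default

    return ev(ast.parse(source, mode="eval").body)
```

With `env = {"free_heap": 120000, "uptime_s": 45}`, the health expression from "Syntax at a glance" evaluates to true, while something like `__import__('os')` is refused because call nodes are simply not on the list.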
