The LLM never writes the query: a declarative search layer over sensitive records

alechash · 2026-05-22 · via Hacker News - Newest: "LLM"

We have an internal assistant. One of the things it does is find people.

By people I mean records in a system of record — names, contacts, home addresses, current assignments, and other personal details that exist in exactly one place and actually matter. This is the most sensitive data we hold. Staff can search it by typing a request in plain language, such as “find translators in France who speak Spanish,” and getting an answer back.

The model handles the request itself without much trouble. This post is about what happens between the request and the answer.

The setup

First, some background on what the assistant is.

It’s an internal chat tool. A staff member opens it, types a request in plain language, and gets an answer. Under the hood it’s an LLM with a set of tools, which are small functions it’s allowed to call. The model has no direct connection to a database. When it needs data, it calls a tool, and the tool is the only code that touches a real record.

Finding people is one of those tools. When someone types “find translators in France who speak Spanish,” the model reads the request, works out the search criteria, and calls the person-search tool with them. The tool runs the query and returns the matches, and the model presents them.

So the model turns language into criteria, and the tool turns criteria into people. The rest of this post is about the tool: the interface it exposes and how it runs a query.

The model doesn’t get a query language

The obvious way to build this is to give the model a flexible search tool and let it improvise — hand it something query-shaped and let it filter however it likes.

For data like this, that’s a bad idea. The records are read-only to the assistant by construction; it can’t write to them, and the whole feature is gated behind a permission claim. But read-only access still leaves room to read more than you should. If the model improvises its own queries, it can request data in shapes nobody reviewed beforehand.

So instead of a query language, the model gets a fixed vocabulary. Every criterion it can express is a small declared object with a field, an operator, and a value:

{ "field": "departments", "operator": "current",
  "value": { "departmentId": "…", "isManager": true } }

The model can’t invent a field or an operator. Whatever it sends is validated against a registry of things we marked as searchable, and that validation runs before any record is read. The model still works out which criteria a request needs, but it doesn’t control the shape of the search.
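As a rough illustration of that check, here is a minimal sketch of validating one criterion against a registry before anything else happens. The registry contents, the operator names for `country`, and the `ValidateCriterion` and `Check` helpers are assumptions made up for this example, not the real tool's code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical registry: field name -> the operators that field declared.
var registry = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal)
{
    ["departments"] = new() { "current", "past", "future", "ever", "never" },
    ["country"]     = new() { "equals" },
};

// Returns null when the criterion is legal; otherwise the error text
// that would be handed back to the model. No record is read here.
string? ValidateCriterion(JsonElement criterion)
{
    if (criterion.ValueKind != JsonValueKind.Object)
        return "criterion must be a JSON object";
    if (!criterion.TryGetProperty("field", out var field) || field.ValueKind != JsonValueKind.String)
        return "criterion needs a string 'field'";
    if (!criterion.TryGetProperty("operator", out var op) || op.ValueKind != JsonValueKind.String)
        return "criterion needs a string 'operator'";
    if (!criterion.TryGetProperty("value", out _))
        return "criterion needs a 'value'";
    if (!registry.TryGetValue(field.GetString()!, out var declaredOperators))
        return $"unknown field '{field.GetString()}'";
    if (!declaredOperators.Contains(op.GetString()!))
        return $"operator '{op.GetString()}' is not declared for field '{field.GetString()}'";
    return null;
}

// Small helper for the demo: parse a JSON string and validate its root element.
string? Check(string json)
{
    using var doc = JsonDocument.Parse(json);
    return ValidateCriterion(doc.RootElement);
}

Console.WriteLine(Check("{\"field\":\"departments\",\"operator\":\"current\",\"value\":{\"isManager\":true}}") ?? "ok");
// prints: ok
Console.WriteLine(Check("{\"field\":\"homeAddress\",\"operator\":\"equals\",\"value\":\"x\"}") ?? "ok");
// prints: unknown field 'homeAddress'
```

The error string matters as much as the rejection: it goes back to the model, which can then retry with a criterion the registry does allow.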

Not everyone sees the same person

There’s a second reason the model doesn’t get a query language, and it matters more than the first.

Not everyone who uses the assistant sees the same data. Permissions here aren’t a single yes-or-no on whether you can search people; they’re more granular. Two people can run the same search and get back different fields on the same person, because each record is authorized field by field against the claims of whoever is asking. Which details you can see depends on who you are.

A free-form query language has no good way to handle that. It lets any caller name any field, and then the backend has to strip out, on every query, whatever that caller wasn’t cleared for. Meanwhile the model reasons in a vocabulary that may not be valid for the person it’s working for.

A declared registry handles it in one place. The fields are finite and known, so “what can this caller search?” has a definite answer. You give each user only the fields their claims allow, and the model can’t form a criterion for a field it was never given. Fields a user isn’t cleared for are never offered to the model, so they don’t have to be filtered out of the results afterward.
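One way to picture the per-caller vocabulary is a filter over the field list, run before the model is given anything. This is only a sketch: the claim names, the field list, and the `FieldsFor` helper are hypothetical, and the real authorization rules are certainly richer than one claim per field.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical field list: each searchable field names the claim it requires.
var allFields = new (string Name, string RequiredClaim)[]
{
    ("country",     "people.search"),
    ("languages",   "people.search"),
    ("homeAddress", "people.address.read"),
};

// The vocabulary offered to the model for one caller:
// only the fields whose required claim the caller actually holds.
List<string> FieldsFor(ISet<string> callerClaims) =>
    allFields.Where(f => callerClaims.Contains(f.RequiredClaim))
             .Select(f => f.Name)
             .ToList();

var basicCaller = new HashSet<string> { "people.search" };
var clearedCaller = new HashSet<string> { "people.search", "people.address.read" };

Console.WriteLine(string.Join(", ", FieldsFor(basicCaller)));
// prints: country, languages
Console.WriteLine(string.Join(", ", FieldsFor(clearedCaller)));
// prints: country, languages, homeAddress
```

Because the filtered list is what the model sees, a caller without the address claim never gets a model that knows `homeAddress` is a searchable thing.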

One definition, five jobs

The old version of this tool had two parallel systems bolted together. There were about sixteen typed parameters for the fast path, plus a stringly-typed JSON blob for everything else, with the filter logic reflected out of attributes at runtime. Adding one searchable field meant editing five files and making sure they stayed consistent. The two halves didn’t even agree on what “searchable” meant.

The rewrite replaces that with a single concept. A search field is defined once, and that one definition does five separate jobs.

SearchField.ObjectCollection<DepartmentAssignment>("departments")
    .Operators(Current, Past, Future, Ever, Never)
    .TemporalRange(d => d.StartDate, d => d.EndDate)
    .Member("departmentId", …)
    .Member("isManager", …)
    .Phase1("departmentIds")              // how it narrows the server query
    .Selects("departments { … }");        // what it fetches back

That one block produces all five things: the description the model reads, the rules its input is validated against, how the criterion gets pushed into the upstream search, what data we fetch back, and how a match is decided. Adding a field means writing one of these blocks, and all five jobs come from it.

The builder produces one object behind a small interface. The interface is those five jobs:

public interface ISearchField
{
    string Name { get; }

    // 1 — what the model is told this field is
    void DescribeForLlm(StringBuilder text);

    // 2 — is a given { operator, value } even legal here?
    //     null means fine; otherwise it's the error handed back to the model
    string? Validate(SearchOperator op, JsonElement value);

    // 3 — push this criterion into the Phase-1 server query, if it can be
    bool TryContributeToPhase1(SearchOperator op, JsonElement value, Phase1Query query);

    // 4 — what to pull back from the full record in Phase 2
    string? GraphQlSelection { get; }

    // 5 — the final, authoritative yes/no for one person
    bool Evaluate(PersonRecord person, SearchOperator op, JsonElement value);
}

Two implementations cover every field we have. A scalar field is a single value on the record, such as a name, a date, or a status. An object-collection field is a list of sub-records, like assignments or languages, each with its own shape. The registry is a dictionary of these, keyed by name. Generating the prompt text, validating input, building the query, choosing what to fetch, and deciding the match are all the same loop over the same objects.

This is useful for sensitive data specifically. Because what’s searchable and what’s fetchable come from the same definition, you can’t accidentally fetch a field you didn’t mean to expose. A mistake in a field definition is contained to that one definition.
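To make the "searchable and fetchable come from the same definition" point concrete, here is a small sketch in which both the model-facing description and the fetch selection are derived from one record per field. `FieldDef`, the descriptions, and the GraphQL sub-selections are invented for illustration; the real builder is the one shown earlier.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical definitions; the GraphQL sub-selections are invented for this example.
var registry = new Dictionary<string, FieldDef>(StringComparer.Ordinal)
{
    ["name"]      = new("name", "Full name of the person", "name"),
    ["languages"] = new("languages", "Languages the person speaks", "languages { code level }"),
    ["country"]   = new("country", "Country of the current location", "country"),
};

// The text the model reads, generated from the definitions (sorted for a stable prompt).
string DescribeForLlm() =>
    string.Join("\n", registry.Values
        .OrderBy(f => f.Name, StringComparer.Ordinal)
        .Select(f => $"- {f.Name}: {f.Description}"));

// The selection for one search, built only from the fields the criteria named.
// Assumes the names were already validated against the registry.
string SelectionFor(IEnumerable<string> criterionFields) =>
    string.Join(" ", criterionFields.Distinct().Select(name => registry[name].Selection));

Console.WriteLine(DescribeForLlm());
Console.WriteLine(SelectionFor(new[] { "languages", "country" }));
// prints: languages { code level } country

record FieldDef(string Name, string Description, string Selection);
```

There is no second list of "fields to fetch" to drift out of sync: a field with no definition cannot appear in either output.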

Two phases, and a hard limit

The search runs in two phases, because the shape of the data doesn’t allow anything simpler.

There are around 8–10 million person records, and each one has hundreds of fields. You can’t query that in a single shot. The search index, which is the thing that can scan all of those records quickly, only covers some of the fields — the common ones that are worth indexing. The rest of the fields live in the full record, which a fast query can’t see.

The two phases split along that line.

Phase 1 is a query against the upstream service. It narrows on the indexed fields — location, department, team — on the server, across all ten million records at once. Its job is to reduce ten million records to a small candidate set. That work has to happen on the upstream service; you don’t want to pull a large set of sensitive personal records into your own process just to sift through them.

Phase 2 takes the narrowed set, fetches the full records including the un-indexed fields, and evaluates the remaining criteria in memory. Those criteria are the ones the index can’t express, like “currently in this department,” which depends on a start date and an end date.

You can’t merge the two phases. Indexing hundreds of fields across ten million records and keeping it fast isn’t feasible, and neither is loading ten million full records into a process to filter them there. Each phase does the part it can.

There’s one rule between the phases that matters more than speed: a Phase-1 narrowing has to be sound. It’s allowed to return too many records, but it can’t drop a real match. If a criterion can’t be pushed down to Phase 1 without risking a false exclusion, we don’t push it, and Phase 2 evaluates it instead. We would rather return a slow result than a wrong one.

This is why TryContributeToPhase1 returns a bool. The return value doesn’t mean “did this field contribute something.” It means “did it fully express this criterion on the server.” The tool tracks that across every criterion:

var needsPhase2 = false;
foreach (var criterion in criteria)
{
    bool fullyPushed = criterion.Field.TryContributeToPhase1(
        criterion.Operator, criterion.Value, phase1Query);
    needsPhase2 |= !fullyPushed;
}

If every criterion pushes cleanly into the indexed query, needsPhase2 stays false and Phase 2 is skipped. The Phase-1 result is the answer, and nothing gets enriched. If one criterion can’t be fully expressed on the server — a temporal check, an un-indexed field, a compound object match — Phase 2 runs, but only as a verification pass over a candidate set that’s already small. So the expensive path only runs when the criteria actually need it.
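The soundness rule can be sketched for a single department field. This example rests on two loud assumptions that are not from the real system: the index stores every department ID a person has ever held, without dates, and it only supports positive "contains" filters. The `TryContribute` function and the filter strings are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumption: the index knows which departments a person has EVER held, with no dates,
// and can only express positive "contains" filters.
// Returns true only when the server filter expresses the criterion exactly.
bool TryContribute(string op, string departmentId, List<string> phase1Filters)
{
    switch (op)
    {
        case "ever":
            phase1Filters.Add($"departmentIds contains {departmentId}");
            return true;    // the index expresses this exactly
        case "current":
        case "past":
        case "future":
            // Sound superset: every real match has held the department at some point,
            // so this filter may keep too many people but never drops one.
            phase1Filters.Add($"departmentIds contains {departmentId}");
            return false;   // the dates still have to be checked in Phase 2
        case "never":
            // A "contains" filter would exclude exactly the people we want,
            // so contribute nothing and leave the whole criterion to Phase 2.
            return false;
        default:
            throw new ArgumentOutOfRangeException(nameof(op), op, "unknown operator");
    }
}

var filters = new List<string>();
var needsPhase2 = false;
foreach (var (op, departmentId) in new[] { ("ever", "dept-a"), ("current", "dept-b"), ("never", "dept-c") })
{
    needsPhase2 |= !TryContribute(op, departmentId, filters);
}

Console.WriteLine(filters.Count);  // prints: 2
Console.WriteLine(needsPhase2);    // prints: True
```

Note that `current` both adds a filter and returns false: contributing a narrowing and fully expressing the criterion are separate things, which is the distinction the bool carries.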

There’s also a ceiling. If Phase 1 returns more than 2,000 candidates, the tool doesn’t fetch them; it asks for a narrower search instead. If a question is vague enough to match ten thousand people, the right response is to ask for more detail, not to load ten thousand full records into memory.
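A minimal sketch of that ceiling, assuming Phase 1 reports a total match count before any full record is fetched. The `CheckCeiling` helper and the wording of the message are hypothetical; only the 2,000 limit comes from the description above.

```csharp
#nullable enable
using System;

const int MaxCandidates = 2000; // the ceiling described above

// Returns null when Phase 2 may proceed; otherwise the message sent back
// instead of results. Nothing is fetched in the refusal case.
string? CheckCeiling(int phase1Total) =>
    phase1Total > MaxCandidates
        ? $"The search matched more than {MaxCandidates} people. Add a criterion to narrow it."
        : null;

Console.WriteLine(CheckCeiling(150) ?? "proceed to Phase 2");
// prints: proceed to Phase 2
Console.WriteLine(CheckCeiling(10_000) ?? "proceed to Phase 2");
// prints: The search matched more than 2000 people. Add a criterion to narrow it.
```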

“Currently”

Temporal scope is a good example of how an ordinary word ends up with a precise meaning here.

“Who’s in the translation department” and “who used to be in it” are different questions, and the difference is two dates on an assignment. So temporal scope is an operator. Every object field with a start and end date gets the same five operators — current, past, future, ever, never — and one function handles all of them:

bool InScope(Assignment a, SearchOperator op, DateOnly today) => op switch
{
    Current => (a.Start is null || a.Start <= today)
            && (a.End   is null || a.End   >= today),
    Past    => a.End   is { } end   && end   <  today,
    Future  => a.Start is { } start && start >  today,
    Ever    => true,
    Never   => true,    // every item is in scope; the match is negated afterward
    _       => false,   // unknown operator: nothing is in scope
};

The model doesn’t work with dates at all. It picks the word current, and the function defines what current means.

The other half of a match is the value. For an object field, the value isn’t a single scalar; it’s a partial object. Matching works by subset containment: the criterion holds if every member it names matches the item, and members it doesn’t name are ignored.

bool Contains(Assignment item, JsonElement value)
{
    foreach (var member in value.EnumerateObject())
        if (!field.Members[member.Name].Matches(item, member.Value))
            return false;   // a named member disagreed
    return true;            // everything named agreed
}

This is what lets one field cover a range of questions. { departmentId: X } means “in department X, in any role.” Adding isManager: true narrows it to “managing department X.” Choosing the operator current narrows it further, to “managing department X right now.” It’s the same field and the same two functions in each case; the model just names more members or picks a different operator.

Evaluating the field end to end works as you’d expect: take the person’s assignments, keep the ones that are InScope for the operator, and check whether any of them Contains the value. Never runs the same check and negates the result.
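Put together, the evaluation described above might look like the following self-contained sketch. It simplifies the real thing in ways that are assumptions of this example: the partial value is two nullable parameters instead of a `JsonElement`, the `Op` enum and `Assignment` record are local stand-ins, and the dates are arbitrary. Never keeps every item in scope and negates the final answer, as the comment in InScope describes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Temporal scope: which assignments count for a given operator on a given day.
bool InScope(Assignment a, Op op, DateOnly today) => op switch
{
    Op.Current => (a.Start is null || a.Start <= today) && (a.End is null || a.End >= today),
    Op.Past    => a.End is { } end && end < today,
    Op.Future  => a.Start is { } start && start > today,
    Op.Ever or Op.Never => true,   // Never scopes like Ever; the result is negated below
    _ => throw new ArgumentOutOfRangeException(nameof(op)),
};

// Subset containment: a null parameter means "the criterion did not name this member".
bool Contains(Assignment a, string? departmentId, bool? isManager) =>
    (departmentId is null || a.DepartmentId == departmentId) &&
    (isManager is null || a.IsManager == isManager);

// Keep the in-scope assignments, check whether any contains the value, negate for Never.
bool Evaluate(IEnumerable<Assignment> assignments, Op op, string? departmentId, bool? isManager, DateOnly today)
{
    bool any = assignments.Any(a => InScope(a, op, today) && Contains(a, departmentId, isManager));
    return op == Op.Never ? !any : any;
}

var today = new DateOnly(2026, 5, 22);   // arbitrary fixed date for the demo
var person = new List<Assignment>
{
    new("translation", IsManager: true,  Start: new DateOnly(2020, 1, 1), End: new DateOnly(2023, 12, 31)),
    new("legal",       IsManager: false, Start: new DateOnly(2024, 1, 1), End: null),
};

Console.WriteLine(Evaluate(person, Op.Current, "translation", null, today)); // prints: False
Console.WriteLine(Evaluate(person, Op.Past,    "translation", true, today)); // prints: True
Console.WriteLine(Evaluate(person, Op.Never,   "finance",     null, today)); // prints: True
Console.WriteLine(Evaluate(person, Op.Never,   "translation", null, today)); // prints: False

enum Op { Current, Past, Future, Ever, Never }
record Assignment(string DepartmentId, bool IsManager, DateOnly? Start, DateOnly? End);
```

The scope and the value are checked on the same assignment, so "managing department X right now" cannot be satisfied by a past manager role in X plus an unrelated current assignment.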

Read-only by construction

The safety properties here come from the structure of the tool, not from runtime checks added at the end.

The tool has no write path, so there’s no code path that could change a record. The feature is gated behind a claim, so it isn’t available to people who shouldn’t have it. The model can only name fields from a fixed registry, only use the operators those fields declared, and only receive the data those definitions specify. The candidate pool is capped. Every criterion is validated before any record is read.

Taken together, this means the model only ever receives fields it named explicitly, and it can only name fields from a vocabulary we defined.

Why build it this way

Giving the model a plain search box would have been faster to build, and for most features that tradeoff is fine.

But the data being searched is a record of real people, and I didn’t want “the AI improvised a query” to be a possible explanation for an incident involving that data. The declarative layer isn’t there to be elegant, although it is convenient that adding a field now takes one block instead of edits across five files. It’s there so that the only way to use the tool is also a way we reviewed in advance.

The model is good at turning language into criteria, and that’s the job it has here. What data a query is allowed to touch is decided by the tool and the field definitions, not improvised by the model. Keeping those two responsibilities separate is most of what the design does.
