The LLM never writes the query: a declarative search layer over sensitive records
alechash · 2026-05-22 · via Hacker News - Newest: "LLM"

We have an internal assistant. One of the things it does is find people.

By people I mean records in a system of record — names, contacts, home addresses, current assignments, and other personal details that exist in exactly one place and actually matter. This is the most sensitive data we hold. Staff can search it by typing a request in plain language, such as “find translators in France who speak Spanish,” and getting an answer back.

The model handles the request itself without much trouble. This post is about what happens between the request and the answer.

The setup

First, some background on what the assistant is.

It’s an internal chat tool. A staff member opens it, types a request in plain language, and gets an answer. Under the hood it’s an LLM with a set of tools, which are small functions it’s allowed to call. The model has no direct connection to a database. When it needs data, it calls a tool, and the tool is the only code that touches a real record.

Finding people is one of those tools. When someone types “find translators in France who speak Spanish,” the model reads the request, works out the search criteria, and calls the person-search tool with them. The tool runs the query and returns the matches, and the model presents them.

So the model turns language into criteria, and the tool turns criteria into people. The rest of this post is about the tool: the interface it exposes and how it runs a query.

The model doesn’t get a query language

The obvious way to build this is to give the model a flexible search tool and let it improvise — hand it something query-shaped and let it filter however it likes.

For data like this, that’s a bad idea. The records are read-only to the assistant by construction; it can’t write to them, and the whole feature is gated behind a permission claim. But read-only access still leaves room to read more than you should. If the model improvises its own queries, it can request data in shapes nobody reviewed beforehand.

So instead of a query language, the model gets a fixed vocabulary. Every criterion it can express is a small declared object with a field, an operator, and a value:

{ "field": "departments", "operator": "current",
  "value": { "departmentId": "…", "isManager": true } }

The model can’t invent a field or an operator. Whatever it sends is validated against a registry of things we marked as searchable, and that validation runs before any record is read. The model still works out which criteria a request needs, but it doesn’t control the shape of the search.
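
As a rough illustration of validating a criterion before any record is read, here is a minimal sketch. It assumes a registry that maps each field name to the operators it declared. The `languages` entry, its operators, the error wording, and the `ValidateCriterion` helper are all hypothetical and are not taken from the real tool.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical registry: field name -> the operators that field declared.
var registry = new Dictionary<string, HashSet<string>>
{
    ["departments"] = new HashSet<string> { "current", "past", "future", "ever", "never" },
    ["languages"] = new HashSet<string> { "ever", "never" },
};

// Returns null when the criterion is legal, otherwise the error handed back to the model.
// No record is touched here; only the shape of the request is checked.
string? ValidateCriterion(JsonElement criterion)
{
    if (criterion.ValueKind != JsonValueKind.Object)
        return "criterion must be an object";
    if (!criterion.TryGetProperty("field", out var field) || field.ValueKind != JsonValueKind.String)
        return "criterion needs a string 'field'";
    if (!criterion.TryGetProperty("operator", out var op) || op.ValueKind != JsonValueKind.String)
        return "criterion needs a string 'operator'";
    if (!criterion.TryGetProperty("value", out _))
        return "criterion needs a 'value'";
    if (!registry.TryGetValue(field.GetString()!, out var operators))
        return $"unknown field '{field.GetString()}'";
    if (!operators.Contains(op.GetString()!))
        return $"operator '{op.GetString()}' is not declared for field '{field.GetString()}'";
    return null;
}

JsonElement Parse(string json) => JsonDocument.Parse(json).RootElement;

var legal = Parse(@"{ ""field"": ""departments"", ""operator"": ""current"", ""value"": { ""isManager"": true } }");
var inventedField = Parse(@"{ ""field"": ""homeAddress"", ""operator"": ""ever"", ""value"": {} }");
var inventedOperator = Parse(@"{ ""field"": ""languages"", ""operator"": ""current"", ""value"": {} }");

Console.WriteLine(ValidateCriterion(legal) ?? "ok");      // ok
Console.WriteLine(ValidateCriterion(inventedField));       // unknown field 'homeAddress'
Console.WriteLine(ValidateCriterion(inventedOperator));    // operator 'current' is not declared for field 'languages'
```

Returning the error as a string, instead of throwing, mirrors the idea that a rejected criterion goes back to the model as feedback it can act on.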

Not everyone sees the same person

There’s a second reason the model doesn’t get a query language, and it matters more than the first.

Not everyone who uses the assistant sees the same data. Permissions here aren’t a single yes-or-no on whether you can search people; they’re more granular. Two people can run the same search and get back different fields on the same person, because each record is authorized field by field against the claims of whoever is asking. Which details you can see depends on who you are.

A free-form query language has no good way to handle that. It lets any caller name any field, then relies on the backend to drop whatever they weren’t cleared for on every query. The model ends up reasoning in a vocabulary that may not be valid for the person it’s working for.

A declared registry handles it in one place. The fields are finite and known, so “what can this caller search?” has a definite answer. You give each user only the fields their claims allow, and the model can’t form a criterion for a field it was never given. Fields a user isn’t cleared for are never offered to the model, so they don’t have to be filtered out of the results afterward.
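
One way to picture the per-caller vocabulary is the sketch below. It assumes each field declares a single claim a caller must hold. The claim names, the `requiredClaim` map, and the `FieldsFor` helper are hypothetical; the real authorization model is likely richer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: each searchable field names the claim a caller must hold to use it.
var requiredClaim = new Dictionary<string, string>
{
    ["name"] = "people.read",
    ["departments"] = "people.read",
    ["homeAddress"] = "people.read.address",
};

// The vocabulary offered to the model for one caller: only fields their claims allow.
IReadOnlyList<string> FieldsFor(ISet<string> callerClaims) =>
    requiredClaim
        .Where(entry => callerClaims.Contains(entry.Value))
        .Select(entry => entry.Key)
        .OrderBy(name => name, StringComparer.Ordinal)
        .ToList();

var staffClaims = new HashSet<string> { "people.read" };
var hrClaims = new HashSet<string> { "people.read", "people.read.address" };

Console.WriteLine(string.Join(", ", FieldsFor(staffClaims))); // departments, name
Console.WriteLine(string.Join(", ", FieldsFor(hrClaims)));    // departments, homeAddress, name
```

Because the model is only ever described the fields in that filtered list, a criterion on `homeAddress` from the first caller would fail validation as an unknown field.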

One definition, five jobs

The old version of this tool had two parallel systems bolted together. There were about sixteen typed parameters for the fast path, plus a stringly-typed JSON blob for everything else, with the filter logic reflected out of attributes at runtime. Adding one searchable field meant editing five files and making sure they stayed consistent. The two halves didn’t even agree on what “searchable” meant.

The rewrite replaces that with a single concept. A search field is defined once, and that one definition does five separate jobs.

SearchField.ObjectCollection<DepartmentAssignment>("departments")
    .Operators(Current, Past, Future, Ever, Never)
    .TemporalRange(d => d.StartDate, d => d.EndDate)
    .Member("departmentId", /* … */)
    .Member("isManager", /* … */)
    .Phase1("departmentIds")              // how it narrows the server query
    .Selects("departments { … }");        // what it fetches back

That one block produces all five things: the description the model reads, the rules its input is validated against, how the criterion gets pushed into the upstream search, what data we fetch back, and how a match is decided. Adding a field means writing one of these blocks, and all five jobs come from it.

The builder produces one object behind a small interface. The interface is those five jobs:

public interface ISearchField
{
    string Name { get; }

    // 1 — what the model is told this field is
    void DescribeForLlm(StringBuilder text);

    // 2 — is a given { operator, value } even legal here?
    //     null means fine; otherwise it's the error handed back to the model
    string? Validate(SearchOperator op, JsonElement value);

    // 3 — push this criterion into the Phase-1 server query, if it can be
    bool TryContributeToPhase1(SearchOperator op, JsonElement value, Phase1Query query);

    // 4 — what to pull back from the full record in Phase 2
    string? GraphQlSelection { get; }

    // 5 — the final, authoritative yes/no for one person
    bool Evaluate(PersonRecord person, SearchOperator op, JsonElement value);
}

Two implementations cover every field we have. A scalar field is a single value on the record, such as a name, a date, or a status. An object-collection field is a list of sub-records, like assignments or languages, each with its own shape. The registry is a dictionary of these, keyed by name. Generating the prompt text, validating input, building the query, choosing what to fetch, and deciding the match are all the same loop over the same objects.

This is useful for sensitive data specifically. Because what’s searchable and what’s fetchable come from the same definition, you can’t accidentally fetch a field you didn’t mean to expose. A mistake in a field definition is contained to that one definition.
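
To make that containment concrete, here is a small sketch of deriving the Phase-2 selection from the same definitions that make a field searchable. The selection strings and sub-field names are invented for illustration (the real ones are elided in the post), and `BuildSelection` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical definitions: field name -> the selection that field fetches back.
var selections = new Dictionary<string, string>
{
    ["name"] = "name",
    ["departments"] = "departments { departmentId isManager startDate endDate }",
    ["languages"] = "languages { code level }",
};

// Build the fetch from the fields the criteria actually name.
// A name that is not in the registry cannot contribute a selection at all.
string BuildSelection(IEnumerable<string> criterionFields)
{
    var parts = criterionFields
        .Distinct()
        .Select(name => selections.TryGetValue(name, out var selection)
            ? selection
            : throw new ArgumentException($"unknown field '{name}'"))
        .ToList();
    return "{ " + string.Join(" ", parts) + " }";
}

Console.WriteLine(BuildSelection(new[] { "departments", "name", "departments" }));
// { departments { departmentId isManager startDate endDate } name }
```

There is no second list of fetchable fields to drift out of sync with the searchable ones, which is the property the paragraph above relies on.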

Two phases, and a hard limit

The search runs in two phases, because the shape of the data doesn’t allow anything simpler.

There are around 8–10 million person records, and each one has hundreds of fields. You can’t query that in a single shot. The search index, which is the thing that can scan all of those records quickly, only covers some of the fields — the common ones that are worth indexing. The rest of the fields live in the full record, which a fast query can’t see.

The two phases split along that line.

Phase 1 is a query against the upstream service. It narrows on the indexed fields — location, department, team — on the server, across all ten million records at once. Its job is to reduce ten million records to a small candidate set. That work has to happen on the upstream service; you don’t want to pull a large set of sensitive personal records into your own process just to sift through them.

Phase 2 takes the narrowed set, fetches the full records including the un-indexed fields, and evaluates the remaining criteria in memory. Those criteria are the ones the index can’t express, like “currently in this department,” which depends on a start date and an end date.

You can’t merge the two phases. Indexing hundreds of fields across ten million records and keeping it fast isn’t feasible, and neither is loading ten million full records into a process to filter them there. Each phase does the part it can.

There’s one rule between the phases that matters more than speed: a Phase-1 narrowing has to be sound. It’s allowed to return too many records, but it can’t drop a real match. If a criterion can’t be pushed down to Phase 1 without risking a false exclusion, we don’t push it, and Phase 2 evaluates it instead. We would rather return a slow result than a wrong one.

This is why TryContributeToPhase1 returns a bool. The return value doesn’t mean “did this field contribute something.” It means “did it fully express this criterion on the server.” The tool tracks that across every criterion:

var needsPhase2 = false;
foreach (var criterion in criteria)
{
    bool fullyPushed = criterion.Field.TryContributeToPhase1(
        criterion.Operator, criterion.Value, phase1Query);
    needsPhase2 |= !fullyPushed;
}

If every criterion pushes cleanly into the indexed query, needsPhase2 stays false and Phase 2 is skipped. The Phase-1 result is the answer, and nothing gets enriched. If one criterion can’t be fully expressed on the server — a temporal check, an un-indexed field, a compound object match — Phase 2 runs, but only as a verification pass over a candidate set that’s already small. So the expensive path only runs when the criteria actually need it.
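
A sketch of what a sound pushdown could look like for a departments criterion follows. It rests on a loud assumption: that the index lists every department a person has ever been assigned to, so filtering on a department id can only over-include. The `TryContributeDepartments` helper and the list standing in for the Phase-1 query are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Phase-1 query: the department ids the server-side index filters on.
var phase1DepartmentIds = new List<string>();

// Returns true only when the server filter expresses the criterion completely.
bool TryContributeDepartments(string searchOperator, string? departmentId, bool? isManager)
{
    // "never" cannot be narrowed soundly: filtering the index to a department
    // would drop exactly the people who were never in it.
    if (searchOperator == "never" || departmentId is null)
        return false;

    // Assumption: the index covers every department a person was ever assigned to,
    // so this filter returns a superset of any temporal scope.
    phase1DepartmentIds.Add(departmentId);

    // Only "ever" with no further members is fully answered by the index.
    // Date checks and isManager still need Phase 2.
    return searchOperator == "ever" && isManager is null;
}

var needsPhase2 = false;
needsPhase2 |= !TryContributeDepartments("current", "dept-42", isManager: true);

Console.WriteLine(string.Join(",", phase1DepartmentIds)); // dept-42
Console.WriteLine(needsPhase2);                            // True
```

The criterion still narrows the server query, but because it returned false the tool knows the candidates must be verified in Phase 2.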

There’s also a ceiling. If Phase 1 returns more than 2,000 candidates, the tool doesn’t fetch them; it asks for a narrower search instead. If a question is vague enough to match ten thousand people, the right response is to ask for more detail, not to load ten thousand full records into memory.
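
The ceiling itself can be a very small check between the phases. In this sketch the 2,000 limit comes from the text above, while the `CheckCeiling` helper and the message wording are hypothetical.

```csharp
using System;

const int MaxCandidates = 2000; // the ceiling described above

// Returns null when enrichment may proceed, otherwise the message handed back
// so the caller is asked to narrow the search.
string? CheckCeiling(int candidateCount) =>
    candidateCount > MaxCandidates
        ? $"That search matches {candidateCount} people. Add a location, department, or team to narrow it."
        : null;

Console.WriteLine(CheckCeiling(150) ?? "proceed to Phase 2"); // proceed to Phase 2
Console.WriteLine(CheckCeiling(10000));
// That search matches 10000 people. Add a location, department, or team to narrow it.
```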

“Currently”

Temporal scope is a good example of how an ordinary word ends up with a precise meaning here.

“Who’s in the translation department” and “who used to be in it” are different questions, and the difference is two dates on an assignment. So temporal scope is an operator. Every object field with a start and end date gets the same five operators — current, past, future, ever, never — and one function handles all of them:

bool InScope(Assignment a, SearchOperator op, DateOnly today) => op switch
{
    Current => (a.Start is null || a.Start <= today)
            && (a.End   is null || a.End   >= today),
    Past    => a.End   is { } end   && end   <  today,
    Future  => a.Start is { } start && start >  today,
    Ever or Never => true,   // Never: every item is in scope; the match is negated afterward
    _       => false,        // unrecognized operator: nothing is in scope
};

The model doesn’t work with dates at all. It picks the word current, and the function defines what current means.

The other half of a match is the value. For an object field, the value isn’t a single scalar; it’s a partial object. Matching works by subset containment: the criterion holds if every member it names matches the item, and members it doesn’t name are ignored.

bool Contains(Assignment item, JsonElement value)
{
    foreach (var member in value.EnumerateObject())
        if (!field.Members[member.Name].Matches(item, member.Value))
            return false;   // a named member disagreed
    return true;            // everything named agreed
}

This is what lets one field cover a range of questions. { departmentId: X } means “in department X, in any role.” Adding isManager: true narrows it to “managing department X.” Choosing the operator current narrows it further, to “managing department X right now.” It’s the same field and the same two functions in each case; the model just names more members or picks a different operator.

Evaluating the field end to end works as you’d expect: take the person’s assignments, keep the ones that are InScope for the operator, and check whether any of them Contains the value. Never treats every assignment as in scope, runs the same containment check, and negates the result.
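
Putting the two functions together, the end-to-end evaluation could be sketched as below. The `Assignment` record, the `Op` enum, and the hard-coded member matching are stand-ins invented for this example; the real tool looks members up through the field definition. `DateOnly` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

bool InScope(Assignment a, Op op, DateOnly today) => op switch
{
    Op.Current => (a.Start is null || a.Start <= today) && (a.End is null || a.End >= today),
    Op.Past => a.End is { } end && end < today,
    Op.Future => a.Start is { } start && start > today,
    Op.Ever or Op.Never => true, // Never: everything is in scope, negated in Evaluate
    _ => false,
};

// Subset containment: every member the criterion names must match the item.
bool Contains(Assignment item, JsonElement value)
{
    foreach (var member in value.EnumerateObject())
    {
        bool matches = member.Name switch
        {
            "departmentId" => member.Value.ValueKind == JsonValueKind.String
                              && member.Value.GetString() == item.DepartmentId,
            "isManager" => (member.Value.ValueKind is JsonValueKind.True or JsonValueKind.False)
                           && member.Value.GetBoolean() == item.IsManager,
            _ => false, // unknown members never match; validation should reject them earlier
        };
        if (!matches) return false;
    }
    return true;
}

bool Evaluate(IEnumerable<Assignment> items, Op op, JsonElement value, DateOnly today)
{
    bool any = items.Any(a => InScope(a, op, today) && Contains(a, value));
    return op == Op.Never ? !any : any;
}

var asOf = new DateOnly(2026, 5, 22);
var assignments = new[]
{
    new Assignment("X", IsManager: true, Start: new DateOnly(2024, 1, 1), End: null),
    new Assignment("Y", IsManager: false, Start: new DateOnly(2020, 1, 1), End: new DateOnly(2023, 12, 31)),
};
var managesX = JsonDocument.Parse(@"{ ""departmentId"": ""X"", ""isManager"": true }").RootElement;
var inY = JsonDocument.Parse(@"{ ""departmentId"": ""Y"" }").RootElement;
var inZ = JsonDocument.Parse(@"{ ""departmentId"": ""Z"" }").RootElement;

Console.WriteLine(Evaluate(assignments, Op.Current, managesX, asOf)); // True
Console.WriteLine(Evaluate(assignments, Op.Current, inY, asOf));      // False
Console.WriteLine(Evaluate(assignments, Op.Past, inY, asOf));         // True
Console.WriteLine(Evaluate(assignments, Op.Never, inY, asOf));        // False
Console.WriteLine(Evaluate(assignments, Op.Never, inZ, asOf));        // True

record Assignment(string DepartmentId, bool IsManager, DateOnly? Start, DateOnly? End);
enum Op { Current, Past, Future, Ever, Never }
```

The person in this example manages X today and left Y at the end of 2023, so the same two functions answer the current, past, and never questions just by switching the operator.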

Read-only by construction

The safety properties here come from the structure of the tool, not from runtime checks added at the end.

The tool has no write path, so there’s no code path that could change a record. The feature is gated behind a claim, so it isn’t available to people who shouldn’t have it. The model can only name fields from a fixed registry, only use the operators those fields declared, and only receive the data those definitions specify. The candidate pool is capped. Every criterion is validated before any record is read.

Taken together, this means the model only ever receives fields it named explicitly, and it can only name fields from a vocabulary we defined.

Why build it this way

Giving the model a plain search box would have been faster to build, and for most features that tradeoff is fine.

But the data being searched is a record of real people, and I didn’t want “the AI improvised a query” to be a possible explanation for an incident involving that data. The declarative layer isn’t there to be elegant, although it is convenient that adding a field now takes one block instead of edits across five files. It’s there so that the only way to use the tool is also a way we reviewed in advance.

The model is good at turning language into criteria, and that’s the job it has here. What data a query is allowed to touch is decided by the tool and the field definitions, not improvised by the model. Keeping those two responsibilities separate is most of what the design does.