GitHub - EvanZhouDev/umr: The Unified Model Registry for all your local AI apps.
2026-04-10 · via Hacker News - Newest: "LLM"


npm i -g umr-cli

What is UMR?

UMR is the Unified Model Registry for your local AI apps. It allows you to maintain a single, centralized copy of a model to use across your favorite local AI apps, instead of having each one manage a separate copy.

That means you can:

  • Save disk space
  • Use the same model across all of your apps instantly
  • Manage all your local models in one place

Install

Install UMR via NPM or your JS package manager of choice.

npm i -g umr-cli

The umr CLI will be available after installation.

Getting Started

Get started by adding a model to the UMR-maintained registry.

# Add a model from Hugging Face
# You will be prompted to choose a quant version
# The file stays in your HF cache; UMR only records where it is
umr add hf ggml-org/gemma-4-E2B-it-GGUF

# Add a GGUF file manually
# This will make a copy of the GGUF to UMR's own store
umr add ./gemma-4-E2B-it-q8-0.gguf

After adding, check your available models:

# Output depends on which quant you chose
umr list


# NAME                 SOURCE  FORMAT  SIZE     CLIENTS    STATUS
# gemma-4-e2b-it-q8-0  hf      gguf    4.63 GB  -          ok

Now you can use the model in your apps right away. umr link does not download or store another copy of the model, so it completes quickly, and the model usually appears in the linked app immediately.

# Link the model to LM Studio
umr link lmstudio gemma-4-e2b-it-q8-0

# Link the model to Ollama
umr link ollama gemma-4-e2b-it-q8-0

# Link the model to Jan
umr link jan gemma-4-e2b-it-q8-0

Alternatively, get the raw GGUF path to use with other AI runtimes:

# Get the path to the GGUF
umr show gemma-4-e2b-it-q8-0 --path

# Run it with llama.cpp, for example
llama-cli -m "$(umr show gemma-4-e2b-it-q8-0 --path)"

Docs

UMR has 3 main concepts:

  • Source: where a model comes from, like Hugging Face or a local file
  • Model: the canonical instance of a model's weights UMR tracks and stores
  • Client: an app that uses that model, like LM Studio, Ollama, or Jan

Note that a Model is not always a literal file stored by UMR. Often it is a reference, for example to a file in your existing Hugging Face cache. UMR simply keeps track of where all the files are.

Whenever you add a Model from a Source, you can use that Model across all your Clients without storing an extra copy of it. To do that, UMR either hardlinks the model into the Client's own model directory or points the Client at UMR's managed instance of the model.
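
To make the hardlink idea concrete, here is a small sketch that uses only standard shell tools. It does not show how UMR itself links models; the function name demo_hardlink and every path in it are hypothetical, and the 1 MiB placeholder merely stands in for a GGUF file. It shows that a hard link is a second name for the same data, so no bytes are duplicated.

```shell
#!/usr/bin/env bash
# Illustration only: what a hard link is, not UMR's actual implementation.
# demo_hardlink and all paths below are hypothetical.

demo_hardlink() {
  local workdir store client
  workdir="$(mktemp -d)"
  store="$workdir/store"
  client="$workdir/client-models"
  mkdir -p "$store" "$client"

  # 1 MiB placeholder standing in for a GGUF file
  head -c 1048576 /dev/zero > "$store/model.gguf"

  # Hard link: a second directory entry for the same data, no bytes copied
  ln "$store/model.gguf" "$client/model.gguf"

  # -ef is true when both paths refer to the same device and inode
  if [ "$store/model.gguf" -ef "$client/model.gguf" ]; then
    echo "same file"
  else
    echo "different files"
  fi

  rm -rf "$workdir"
}
```

Calling demo_hardlink prints "same file" on filesystems that support hard links. Hard links only work within a single filesystem, which may be one reason a tool would fall back to pointing a client at the managed copy instead.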

Commands

umr add

Add a model to UMR from Hugging Face or a local GGUF file.

UMR currently supports two Sources.

Hugging Face

When you add a Hugging Face model, UMR will attempt to find the model in your HF Cache first (see available models with hf cache list). If not present, UMR will ask if you want to install it. Note that if a repo has multiple GGUF files, UMR will let you pick one.

umr add hf <repo>

Local File

When you add a local file, UMR will copy the file into its own store in ~/.umr (by default). This prevents later changes to the original file from affecting the UMR-managed copy.

umr add ./model.gguf

umr list

List the models UMR is tracking, including source, format, linked clients, and status.

umr list

umr show

Show details for a tracked model, or print only the managed file path with --path.

umr show <model>
umr show <model> --path

The --path flag is useful for runtimes that take a model file path directly, such as llama.cpp. For example:

llama-cli -m "$(umr show gemma-4-e2b-it-q8-0 --path)"
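
If the path lookup fails inside a command substitution, the runtime would still start with an empty argument. The sketch below guards against that. resolve_model_path is a hypothetical helper, not part of the umr CLI, and it assumes that umr show exits non-zero for an unknown model, which this README does not confirm; the file-existence check covers the case where it does not.

```shell
#!/usr/bin/env bash
# Sketch: resolve a UMR model path and fail early if it is unusable.
# resolve_model_path is a hypothetical helper, not part of the umr CLI.

resolve_model_path() {
  local model="$1" path
  # `umr show <model> --path` prints only the managed file path (per the README).
  # Assumption: a failed lookup returns a non-zero exit status.
  if ! path="$(umr show "$model" --path)"; then
    echo "error: umr could not resolve '$model'" >&2
    return 1
  fi
  # Guard against a stale entry whose file has gone missing
  if [ ! -f "$path" ]; then
    echo "error: '$path' does not exist; try 'umr check'" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```

A possible use: model_path="$(resolve_model_path gemma-4-e2b-it-q8-0)" && llama-cli -m "$model_path", so that llama-cli only starts when the path was resolved.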

umr link

Link a tracked model to a client app.

umr link lmstudio <model>
umr link ollama <model>
umr link jan <model>

Each client app uses a different linking method under the hood, but all of them should be fast, especially compared to downloading the file. Occasionally, you may need to restart the app for it to discover the new model.
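
To link one model to every supported client in a single step, a small loop over the three documented client names is enough. link_everywhere is a hypothetical helper; the sketch assumes umr link returns a non-zero exit status on failure, which this README does not state.

```shell
#!/usr/bin/env bash
# Sketch: link one tracked model to all three documented clients.
# link_everywhere is a hypothetical helper, not part of the umr CLI.

link_everywhere() {
  local model="$1" client failed=0
  for client in lmstudio ollama jan; do
    # Assumption: `umr link` exits non-zero when linking fails
    if umr link "$client" "$model"; then
      echo "linked: $client"
    else
      echo "failed: $client" >&2
      failed=1
    fi
  done
  # Non-zero if any client could not be linked
  return "$failed"
}
```

For example, link_everywhere gemma-4-e2b-it-q8-0 would attempt all three links and keep going if one client is not installed.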

umr unlink

Remove the linked model from a Client.

umr unlink lmstudio <model>
umr unlink ollama <model>
umr unlink jan <model>

Occasionally, you may need to restart the app for the unlinking to take effect.

umr remove

Remove a model from UMR tracking. A model must be unlinked from all clients before it can be removed.

umr remove <model>

Note that remove will only remove UMR's tracking of the model and not necessarily the model itself. That is:

  • Hugging Face Sources: When these models are removed, they will not be deleted from the Hugging Face cache.
  • Local Sources: When these models are removed, the instance stored in UMR will be deleted.
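
Because a model must be unlinked everywhere before it can be removed, the two steps can be combined. purge_model below is a hypothetical helper. It assumes that unlinking a model from a client it was never linked to fails harmlessly, which this README does not confirm, so unlink errors are ignored and umr remove is left to report any link that remains.

```shell
#!/usr/bin/env bash
# Sketch: unlink a model from every documented client, then remove it.
# purge_model is a hypothetical helper, not part of the umr CLI.

purge_model() {
  local model="$1" client
  for client in lmstudio ollama jan; do
    # Assumption: unlinking a model that is not linked to this client
    # fails harmlessly, so errors are ignored here.
    umr unlink "$client" "$model" >/dev/null 2>&1 || true
  done
  # Per the README, remove refuses to run while any client link remains,
  # so its exit status is the result of the whole operation.
  umr remove "$model"
}
```

Keep in mind that for Local Sources this deletes the copy stored by UMR, so purge_model is best reserved for models you no longer need.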

umr check

Check UMR for missing files or stale client links. Some errors may be automatically fixable.

umr check

Use --fix to remove stale UMR-side links automatically when it is safe to do so.

umr check --fix

For example, if you link a model to a Client and then delete it from within that Client, you may need to run umr check --fix so that UMR updates its own registry to match.
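
One cautious way to use the two commands together is to read the report first and apply fixes only after confirming. check_then_fix is a hypothetical helper. Since this README does not document the exit status of umr check, the sketch does not rely on it.

```shell
#!/usr/bin/env bash
# Sketch: show the `umr check` report, then ask before running --fix.
# check_then_fix is a hypothetical helper, not part of the umr CLI.

check_then_fix() {
  local answer
  # Exit status of `umr check` is undocumented here, so it is not relied on
  umr check || true
  printf 'Apply automatic fixes? [y/N] '
  read -r answer || answer=""
  case "$answer" in
    y|Y) umr check --fix ;;
    *)   echo "No changes made." ;;
  esac
}
```

Running check_then_fix interactively prints the report, waits for y or n, and leaves the registry untouched unless you answer y.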