Factory Router
Factory · 2026-06-18 · via Hacker News - Newest: "LLM"

Frontier performance at lower cost

Automatic model selection for every Droid session. Factory Router picks the right model for each task, maintains frontier performance, and cuts cost by up to 25%.

$ droid --model router "refactor auth middleware"

Refactor auth middleware to use JWT validation

Droid is routing…

Auto-Model · Auto · MCP (3) · Skills (12)

router-classifier · classifier · ~2s

Reads the first user message, recent tool calls and repo signals, then emits a scalar quality probability for each model.

signal         weight   score
message        0.30     0.84
recent tools   0.20     0.62
repo size      0.15     0.77
language mix   0.20     0.91
difficulty     0.15     0.88

Final Score    0.80
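
The classifier panel lists five signals, each with two numbers (for example, message: 0.30 and 0.84), and a final score of 0.80. One plausible reading, which the page does not confirm, is that the first number is a weight (the five sum to 1.00) and the second is a per-signal score, combined as a weighted sum. A minimal Python sketch under that assumption; the names `SIGNALS` and `weighted_score` are hypothetical:

```python
# Hypothetical reconstruction: treats the first number for each signal as a
# weight and the second as a score. The page does not state the real formula.
SIGNALS = {
    "message": (0.30, 0.84),
    "recent tools": (0.20, 0.62),
    "repo size": (0.15, 0.77),
    "language mix": (0.20, 0.91),
    "difficulty": (0.15, 0.88),
}


def weighted_score(signals):
    """Return the weighted sum of (weight, score) pairs."""
    total_weight = sum(weight for weight, _ in signals.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    return sum(weight * score for weight, score in signals.values())


final_score = weighted_score(SIGNALS)
print(f"{final_score:.4f}")
```

Under this reading the weighted sum comes to roughly 0.81, close to the 0.80 shown in the panel, which is consistent with the weight-and-score interpretation but does not prove it.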

candidate scoring · threshold 0.70

sorted cheapest → most expensive · quality_threshold

Kimi K2.6         Moonshot    $     0.81
MiniMax-M2.7      MiniMax     $$    0.88
Claude Opus 4.7   Anthropic   $$$   0.95

Kimi K2.6
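
The candidate panel sorts models from cheapest to most expensive, shows a quality value for each, and ends with Kimi K2.6 selected against a threshold of 0.70. That is consistent with picking the cheapest candidate whose predicted quality clears the threshold. The sketch below illustrates that rule; `Candidate`, `pick_model`, and the fallback to the highest-quality model when nothing clears the bar are assumptions, not Factory's documented behavior:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    provider: str
    cost_tier: int   # number of "$" signs shown in the panel
    quality: float   # quality value shown in the panel


CANDIDATES = [
    Candidate("Kimi K2.6", "Moonshot", 1, 0.81),
    Candidate("MiniMax-M2.7", "MiniMax", 2, 0.88),
    Candidate("Claude Opus 4.7", "Anthropic", 3, 0.95),
]


def pick_model(candidates, quality_threshold=0.70):
    """Return the cheapest candidate whose quality meets the threshold."""
    if not candidates:
        raise ValueError("no candidate models supplied")
    ordered = sorted(candidates, key=lambda c: c.cost_tier)
    for candidate in ordered:
        if candidate.quality >= quality_threshold:
            return candidate
    # Assumed fallback: if nothing clears the bar, use the strongest model.
    return max(ordered, key=lambda c: c.quality)


print(pick_model(CANDIDATES).name)
```

Raising the threshold in this sketch pushes the choice up the cost ladder: at 0.85 it returns MiniMax-M2.7, and at 0.90 it returns Claude Opus 4.7.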

streaming

Reading src/auth/middleware.ts...

Found legacy session cookie validation

Replacing with JWT verify (RS256)

Generated 7 tests covering edge cases

PR #418 opened — ready for review

AI coding costs are rising across organizations.

Enterprise AI costs are climbing, and a bigger token bill does not mean more work is getting done. To avoid sacrificing performance, engineers usually default to the most capable model for every task. Simple questions, mechanical refactors, documentation updates, small bug fixes, and search-heavy investigations end up on the same premium path as work that truly needs frontier performance. Budgets are exhausted without a clear increase in organization-level output.

Stop choosing a model for every task.

Today you pick a model per task and lean on the most expensive one to be safe. With Factory Router you choose once and it picks the best model for each session.

Same prompts. Different cost.

Without Routing · Always Claude Opus 4.7

“reset my password” · Claude Opus 4.7 · $0.00
“add a copyright header” · Claude Opus 4.7 · $0.00
“design a caching layer” · Claude Opus 4.7 · $0.00

With Factory Router · Routed per task

“reset my password” · Kimi K2.6 · $0.00
“add a copyright header” · MiniMax-M2.7 · $0.00
“design a caching layer” · Kimi K2.6 · $0.00

Savings on identical work · 0%

On our enterprise engineering benchmarks.

Compared with Claude Opus 4.7, Factory Router maintains frontier performance at lower cost per session. At enterprise scale, those savings apply across every Droid session, with spend tied to the work being done rather than a blanket default to the most expensive model.


TERMINAL-BENCH 2

PASS RATE · vs OPUS 4.7 · 0% of Claude Opus 4.7 pass rate
COST PER SESSION · vs OPUS 4.7 · 0% lower
Factory Router runs at 80% of Opus cost
Cost per successful run · 80.5% of Opus

LEGACY-BENCH

PASS RATE · vs OPUS 4.7 · 0% of Claude Opus 4.7 pass rate
COST PER SESSION · vs OPUS 4.7 · 0% lower
Factory Router runs at 75% of Opus cost
Cost per successful run · 78.0% of Opus

Reported relative to Claude Opus 4.7 · cost measured as full-session cost · averaged across multiple runs
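
The benchmark panel gives two cost figures per benchmark: session cost relative to Opus (80% and 75%) and cost per successful run relative to Opus (80.5% and 78.0%). The page does not define "cost per successful run". If it is taken to mean session cost divided by pass rate, which is an assumption, the two figures together imply a relative pass rate. The helper name `implied_relative_pass_rate` is hypothetical:

```python
# Assumption: cost_per_successful_run = session_cost / pass_rate, so
# (router / Opus) pass-rate ratio = session-cost ratio / cost-per-success ratio.
BENCHMARKS = {
    "TERMINAL-BENCH 2": (0.80, 0.805),
    "LEGACY-BENCH": (0.75, 0.780),
}


def implied_relative_pass_rate(session_cost_ratio, cost_per_success_ratio):
    """Pass rate relative to the baseline implied by the two cost ratios."""
    if cost_per_success_ratio <= 0:
        raise ValueError("cost_per_success_ratio must be positive")
    return session_cost_ratio / cost_per_success_ratio


implied = {
    name: implied_relative_pass_rate(session, per_success)
    for name, (session, per_success) in BENCHMARKS.items()
}
for name, ratio in implied.items():
    print(f"{name}: {ratio:.1%} of the Opus pass rate (implied)")
```

Under that definition the figures would correspond to roughly 99.4% and 96.2% of the Opus pass rate. These are derived estimates, not numbers reported on the page.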

Reliability you can count on.

When a provider degrades, rate limits hit, or capacity gets constrained, your sessions keep going. Factory Router routes across models, providers, and capacity to deliver 99.9%+ request reliability.

Claude Opus 4.7 · Bedrock · degraded

Claude Opus 4.7 · Vertex · healthy

If a provider path degrades, Factory Router keeps the session running on the same model through a healthy provider.

Enterprise customers get reserved throughput for critical work instead of relying only on shared public capacity.

Factory Router keeps frontier models available as they come online, so high-complexity work gets the strongest model class.

US-hosted open-source models

Route eligible work to US-hosted open-source models when you need cost-efficient or controlled options.
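
The provider failover described in this section, where a session stays on the same model but moves to a healthy provider, can be sketched as an ordered retry across provider paths. Everything here is illustrative: `ProviderUnavailable`, `call_with_failover`, and `fake_send` are hypothetical names, and the page does not describe how Factory Router implements this.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider path that is degraded or rate limited."""


def call_with_failover(model, providers, send):
    """Try each provider in order; return (provider, response) on success."""
    errors = []
    for provider in providers:
        try:
            return provider, send(provider, model)
        except ProviderUnavailable as exc:
            errors.append(f"{provider}: {exc}")
    raise RuntimeError(f"all providers failed for {model}: {'; '.join(errors)}")


def fake_send(provider, model):
    # Stand-in for a real request; mirrors the statuses shown on the page.
    if provider == "Bedrock":
        raise ProviderUnavailable("degraded")
    return f"{model} answered via {provider}"


used_provider, response = call_with_failover(
    "Claude Opus 4.7", ["Bedrock", "Vertex"], fake_send
)
print(used_provider, "->", response)
```

The model stays fixed while only the provider changes, so the caller sees the same model's behavior whichever path served the request.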

Routing that reflects how your organization works.

Routing guidance brings your team's context into Factory Router, so automatic model selection reflects how work actually happens inside your organization. The same policy surfaces that govern other Factory models apply here, so admins manage access, compliance, and eligibility without a separate control plane.

Admin routing guidance

Automatic model selection for every Droid session · Enabled org-wide

Routing rules & context

Routine refactors, formatting, and doc updates → favor cost-efficient models

auth/ and payments/ need deeper reasoning → keep on frontier models

Search-heavy investigation → route to open-source models
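
The admin panel expresses routing guidance as plain-language rules. As a rough illustration of how such guidance could narrow model choice, the sketch below encodes the three example rules as first-match rules over a task kind and the touched paths. The `Rule` type, the task-kind labels, and the `"auto"` default are all hypothetical; the page does not say how Factory Router interprets guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    preference: str
    path_prefixes: tuple = ()
    task_kinds: tuple = ()

    def matches(self, task_kind, paths):
        # str.startswith accepts a tuple of prefixes.
        if self.path_prefixes and any(
            path.startswith(self.path_prefixes) for path in paths
        ):
            return True
        return task_kind in self.task_kinds


# Sensitive paths come first so they win over the cheaper "refactor" rule.
RULES = [
    Rule("frontier", path_prefixes=("auth/", "payments/")),
    Rule("cost-efficient", task_kinds=("refactor", "format", "docs")),
    Rule("open-source", task_kinds=("search",)),
]


def preference_for(task_kind, paths, rules=RULES):
    """Return the preference of the first matching rule, else 'auto'."""
    for rule in rules:
        if rule.matches(task_kind, paths):
            return rule.preference
    return "auto"


print(preference_for("refactor", ["auth/middleware.ts"]))
```

Ordering matters in a first-match scheme: a refactor that touches `auth/` stays on frontier models here because the path rule is checked before the task-kind rule.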

Use Factory Router in the Factory CLI and Desktop App.

Factory Router is in private research preview in the Factory CLI and Desktop App. Once enabled for your org, it appears in the model picker for every user with no setup required. Mission workers can use it too, so long-running autonomous work gets the same automatic model selection and savings as interactive and headless sessions.