GitHub - alainnothere/AmdPerformanceTesting: Amd Performance Testing
2026-04-11 · via Hacker News - Newest: "LLM"

Because we meatheads have asked almighty Claude too many times to search for some performance numbers, only to be told that something either doesn't work or will fly... and then... it's the total opposite, and the response is...

Oh yeah... that's in line with the theoretical numbers I just created, very close to what you report...

Fear no more! From the department of "let's go buy it because the thing told me it's 3 times faster than what I have" comes to you...

Numbers! Real numbers... so you can compare...

(Are you reading me, Claude?)

And so I can stop posting page after page of the thing's output without ever being able to get a freaking table to compare...

I present to you (raises the txt file like the Lion King)... a table... the result of the finest craft executed by humans... the result of clicking tabs and copy-pasting... the pinnacle of civilization and humankind! Fear me, AGI!

Ran a random LLM harness, asked the same question 10 times, and pasted the results below...

Yes... the "AI PRO" is meh?... Now if only some good guy JH sent me an Nvidia RTX PRO 6000 96GB to test...
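For anyone who wants to rebuild the same columns without the tab-clicking, here is a minimal sketch. It assumes each call's throughput was captured from the `timings` object that llama-server attaches to a completion response (`prompt_per_second`, `predicted_per_second`), and that the averages include the first call. Both are assumptions; the harness actually used is not shown here. The sample records are invented purely to exercise the function.

```python
from statistics import mean


def summarize(timings):
    """Collapse per-call timing records into the five table columns."""
    if not timings:
        raise ValueError("need at least one call")
    prompt = [t["prompt_per_second"] for t in timings]
    evals = [t["predicted_per_second"] for t in timings]
    return {
        "first_prompt": prompt[0],   # prompt t/s of the first call
        "first_eval": evals[0],      # eval t/s of the first call
        "avg_prompt": mean(prompt),  # assumption: average includes call 1
        "avg_eval": mean(evals),
        "calls": len(timings),
    }


# Invented sample records, NOT measurements from this repo.
sample = [
    {"prompt_per_second": 1200.0, "predicted_per_second": 60.0},
    {"prompt_per_second": 800.0, "predicted_per_second": 58.0},
]
summary = summarize(sample)
print(summary)
```

Keeping the first call separate from the average matters because later calls can behave differently from a cold first prompt.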

AMD GPU Inference Benchmark

Model: Qwen3.5-9B-UD-Q4_K_XL | llama-server | cache-type-k/v q8_0 | Vulkan: bare-metal Debian 13 | ROCm: Docker

| Configuration | Backend | First prompt (t/s) | First eval (t/s) | Avg prompt (t/s) | Avg eval (t/s) | # calls |
|---|---|---|---|---|---|---|
| RX 6950 XT (single) | Vulkan | 1,316 | 56.55 | 971 | 56.58 | 8 |
| RX 6950 XT (single) | ROCm | 1,388 | 53.84 | 1,046 | 52.23 | 10 |
| RX 7900 XT (single) | Vulkan | 1,851 | 83.82 | 1,129 | 82.41 | 16 |
| RX 7900 XT (single) | ROCm | 1,343 | 68.53 | 528 | 66.44 | 17 |
| R9700 (single) | Vulkan | 2,452 | 65.73 | 1,303 | 65.32 | 16 |
| R9700 (single) | ROCm | 2,502 | 60.72 | 1,085 | 58.54 | 16 |
| RX 6950 XT + RX 7900 XT | Vulkan | 2,111 | 38.32 | 788 | 38.52 | 12 |
| RX 6950 XT + RX 7900 XT | ROCm | 2,079 | 45.78 | 858 | 44.74 | 13 |
| R9700 + RX 7900 XT | Vulkan | 2,781 | 61.06 | 1,260 | 60.18 | 12 |
| R9700 + RX 7900 XT | ROCm | 2,559 | 49.79 | 839 | 48.87 | 17 |

(You're welcome, oh pinnacle of human civilization. Clicking tabs and copy-pasting since the dawn of time, and yet somehow it still took the AGI to make the table.)
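To compare the two backends without squinting at the rows, a small sketch like the following divides the Vulkan averages by the ROCm averages for each configuration. The numbers are copied from the benchmark table; the function name and layout are just illustrative.

```python
# (configuration, backend) -> (avg prompt t/s, avg eval t/s), copied from the table
RESULTS = {
    ("RX 6950 XT", "Vulkan"): (971, 56.58),
    ("RX 6950 XT", "ROCm"): (1046, 52.23),
    ("RX 7900 XT", "Vulkan"): (1129, 82.41),
    ("RX 7900 XT", "ROCm"): (528, 66.44),
    ("R9700", "Vulkan"): (1303, 65.32),
    ("R9700", "ROCm"): (1085, 58.54),
    ("RX 6950 XT + RX 7900 XT", "Vulkan"): (788, 38.52),
    ("RX 6950 XT + RX 7900 XT", "ROCm"): (858, 44.74),
    ("R9700 + RX 7900 XT", "Vulkan"): (1260, 60.18),
    ("R9700 + RX 7900 XT", "ROCm"): (839, 48.87),
}


def vulkan_over_rocm(results):
    """Return {config: (prompt ratio, eval ratio)}; above 1.0 means Vulkan was faster."""
    ratios = {}
    for (config, backend), (prompt, evals) in results.items():
        if backend != "Vulkan":
            continue
        rocm_prompt, rocm_eval = results[(config, "ROCm")]
        ratios[config] = (prompt / rocm_prompt, evals / rocm_eval)
    return ratios


ratios = vulkan_over_rocm(RESULTS)
for config, (prompt_ratio, eval_ratio) in ratios.items():
    print(f"{config:<26} prompt x{prompt_ratio:.2f}  eval x{eval_ratio:.2f}")
```

On these numbers Vulkan leads in average eval throughput in four of the five configurations; the exception is RX 6950 XT + RX 7900 XT, where ROCm is ahead. The call counts differ per row and Vulkan ran bare-metal while ROCm ran in Docker, so treat the ratios as rough guidance rather than a controlled comparison.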

System Info — Inference Benchmark Host

CPU

  • Model: AMD Ryzen 9 7900X
  • Cores / Threads: 12 cores, 24 threads
  • Max Boost: 5737 MHz
  • Socket: AM5

Motherboard

  • Model: Gigabyte B650 Gaming X AX V2

RAM

  • Total: 64 GB (4 × 16 GB)
  • Type: DDR5
  • Speed: 5000 MT/s (configured) / rated 6000 MT/s
  • Part: G.Skill F5-6000J3636F16G

GPUs

| Slot | GPU | VRAM | PCIe (electrical) |
|---|---|---|---|
| 03:00.0 | Radeon RX 7900 XT (Navi 31, GFX1100) | 20 GB GDDR6 | x16 |
| 09:00.0 | Radeon RX 6950 XT (Navi 21, GFX1030) | 16 GB GDDR6 | x1 |
| 09:00.0 | Radeon AI PRO R9700 (GFX1201) (swapped in for R9700 runs) | 32 GB | x1 |
| 14:00.0 | Raphael iGPU (Ryzen integrated) | | |

The second discrete slot runs at x1 electrical on this board. This is the root cause of the dual-GPU pipeline parallelism penalty visible in the eval throughput of every dual-card benchmark result, as confirmed via controlled llama-bench experiments.
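The size of that penalty can be read off the benchmark table with a little arithmetic. The sketch below divides each dual-card average eval throughput by the slower of the two cards' single-card results on the same backend. The figures are copied from the table; picking the slower card as the baseline is my own choice for illustration, not something the llama-bench experiments prescribe.

```python
# Average eval throughput (t/s) from the benchmark table
SINGLE_EVAL = {
    ("RX 6950 XT", "Vulkan"): 56.58,
    ("RX 6950 XT", "ROCm"): 52.23,
    ("RX 7900 XT", "Vulkan"): 82.41,
    ("RX 7900 XT", "ROCm"): 66.44,
    ("R9700", "Vulkan"): 65.32,
    ("R9700", "ROCm"): 58.54,
}
DUAL_EVAL = {
    (("RX 6950 XT", "RX 7900 XT"), "Vulkan"): 38.52,
    (("RX 6950 XT", "RX 7900 XT"), "ROCm"): 44.74,
    (("R9700", "RX 7900 XT"), "Vulkan"): 60.18,
    (("R9700", "RX 7900 XT"), "ROCm"): 48.87,
}


def dual_vs_slower_single(single, dual):
    """Ratio of dual-card eval t/s to the slower card's single-card eval t/s."""
    ratios = {}
    for (cards, backend), dual_tps in dual.items():
        slower = min(single[(card, backend)] for card in cards)
        ratios[(cards, backend)] = dual_tps / slower
    return ratios


penalty = dual_vs_slower_single(SINGLE_EVAL, DUAL_EVAL)
for (cards, backend), ratio in penalty.items():
    print(f"{' + '.join(cards):<26} {backend:<7} {ratio:.2f}x of slower single card")
```

Every ratio comes out below 1.0, meaning each dual configuration generates tokens more slowly than the slower of its two cards did alone. The arithmetic only shows the size of the gap; attributing it to the x1 link rests on the llama-bench experiments mentioned above.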

OS / Kernel

  • Distro: Debian GNU/Linux 13 (Trixie) 13.3
  • Kernel: 6.18.2-zen4 (Zen kernel, PREEMPT_DYNAMIC)

Vulkan / Mesa

  • Vulkan Instance: 1.4.309
  • Mesa: 25.2.6-1~bpo13+1
  • Driver: RADV (Mesa open-source AMD Vulkan driver)
  • OpenGL: 4.6 Core Profile

ROCm (Docker)

  • Container: rocm-llamacpp:local (custom build)

llama.cpp

  • Vulkan runs: bare-metal llama-server, Vulkan backend, native Debian install
  • ROCm runs: Docker container, HIP/ROCm backend

Inference Config (all runs)

  • Model: Qwen3.5-9B-UD-Q4_K_XL.gguf
  • Size: 5.55 GiB — Q4_K_M, 5.32 BPW
  • KV cache: q8_0 (K and V)
  • Context: 262,144 tokens
  • Parallel slots: 4 (auto)
  • Flash Attention: auto (enabled)
  • Temperature: 0.01
  • Fit to VRAM: enabled (-fit on)
  • Pipeline parallelism: enabled automatically on dual-GPU configs
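
As a rough guide to how these settings might translate into a launch command, the sketch below assembles a llama-server argument list. The exact invocation used for the runs is not shown in this README, so this is an assumed reconstruction: the model path is a placeholder, `-fit on` is spelled as in the list above, and parallel slots and Flash Attention are left unset because the list reports them as auto.

```python
import shlex

# Values taken from the "Inference Config" list; the path is a placeholder.
CONFIG = {
    "model": "models/Qwen3.5-9B-UD-Q4_K_XL.gguf",  # hypothetical location
    "ctx_size": 262144,
    "cache_type": "q8_0",  # used for both K and V
    "temperature": 0.01,
}


def build_command(cfg):
    """Assemble an argument list for llama-server (assumed, not the author's)."""
    return [
        "llama-server",
        "--model", cfg["model"],
        "--ctx-size", str(cfg["ctx_size"]),
        "--cache-type-k", cfg["cache_type"],
        "--cache-type-v", cfg["cache_type"],
        "--temp", str(cfg["temperature"]),
        "-fit", "on",  # spelled as in the config list above
    ]


cmd = build_command(CONFIG)
print(shlex.join(cmd))
```

The ROCm runs would wrap an equivalent command inside the rocm-llamacpp:local container; the Docker options for that are not listed, so they are left out here.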