GitHub - tensorzero/tensorzero: TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
hek2sch · 2026-06-13 · via Hacker News: Front Page

GitHub Trending - #1 Repository Of The Day

TensorZero is an open-source LLMOps platform that unifies:

  • Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
  • Observability: store inferences and feedback in your database, available programmatically or in the UI
  • Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
  • Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
  • Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

You can take what you need, adopt incrementally, and complement with other tools. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today.

Features

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.

It dramatically improves the performance of LLM agents across diverse tasks:

Bar chart showing baseline vs. optimized scores across diverse LLM tasks

Learn more →

🌐 LLM Gateway

Integrate with TensorZero once and access every major LLM provider.

Supported Model Providers

Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI (Grok).

Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).

Usage Example

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

  1. Deploy the TensorZero Gateway (one Docker container).
  2. Update the base_url and model in your OpenAI-compatible client.
  3. Run inference:
```python
from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Share a fun fact about TensorZero.",
        }
    ],
)
```

See Quick Start for more information.
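
The model string in the example above appears to follow the shape `tensorzero::model_name::<provider>::<model>`. The following is a minimal sketch of a helper that builds such strings. Both the helper name `tensorzero_model` and the assumption that the pattern generalizes beyond this one example are ours; neither is confirmed by this README.

```python
def tensorzero_model(provider: str, model: str) -> str:
    """Build a model string shaped like the one in the usage example.

    Assumption: the `tensorzero::model_name::<provider>::<model>` pattern is
    inferred from a single example; check the Quick Start before relying on it.
    """
    if not provider or not model:
        raise ValueError("provider and model must be non-empty")
    return f"tensorzero::model_name::{provider}::{model}"


# Reproduces the string used in the usage example.
print(tensorzero_model("anthropic", "claude-sonnet-4-6"))
```

Keeping the string construction in one place makes it easier to switch providers without touching the call site.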

🔍 LLM Observability

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.

📈 LLM Optimization

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.

  • Optimize your models with supervised fine-tuning, RLHF, and other techniques
  • Optimize your prompts with automated prompt engineering algorithms like GEPA
  • Optimize your inference strategy with dynamic in-context learning, best/mixture-of-N sampling, etc.
  • Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
  • Soon: synthetic data generation

📊 LLM Evaluation

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.

  • Evaluate individual inferences with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs)
  • Evaluate end-to-end workflows with workflow evaluations with complete flexibility (≈ integration tests for LLMs)
  • Optimize LLM judges just like any other TensorZero function to align them to human preferences
  • Soon: more built-in evaluators; headless evaluations
Evaluation » CLI

```bash
docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5
```

```text
Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)
```
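
To make the idea of heuristic evaluators concrete, here is a rough sketch of how per-datapoint scores like `exact_match` and `item_count` might be averaged over a dataset. The evaluator names are borrowed from the sample output above, but the function signatures, the data layout, and the `evaluate` helper are hypothetical and do not reflect how TensorZero implements evaluations.

```python
from statistics import mean


def exact_match(output: dict, reference: dict) -> float:
    """Score 1.0 when the output equals the reference exactly, else 0.0."""
    return 1.0 if output == reference else 0.0


def item_count(output: dict, reference: dict) -> float:
    """Count the extracted items across all fields (reference is unused)."""
    return float(sum(len(items) for items in output.values()))


def evaluate(datapoints: list, evaluators: dict) -> dict:
    """Average each evaluator's score over all datapoints."""
    return {
        name: mean(fn(d["output"], d["reference"]) for d in datapoints)
        for name, fn in evaluators.items()
    }


# Hypothetical datapoints for an entity-extraction task.
datapoints = [
    {"output": {"person": ["Ada"]}, "reference": {"person": ["Ada"]}},
    {"output": {"person": ["Ada", "Bob"]}, "reference": {"person": ["Ada"]}},
]
scores = evaluate(datapoints, {"exact_match": exact_match, "item_count": item_count})
print(scores)
```

Each evaluator is a plain function of one inference, which is what makes the comparison to unit tests apt.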

🧪 LLM Experimentation

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

  • Run adaptive A/B tests to ship with confidence and identify the best prompts and models for your use cases.
  • Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.
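
The README does not say which algorithm backs its adaptive A/B tests. As one illustration of the general idea, the sketch below uses Thompson sampling, a standard bandit technique that we have swapped in here for demonstration. The class name, variant names, and success rates are all hypothetical.

```python
import random


class ThompsonSampler:
    """Adaptive traffic allocation over variants with binary feedback."""

    def __init__(self, variants, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per variant, stored as [alpha, beta].
        self.stats = {v: [1, 1] for v in variants}

    def choose(self) -> str:
        """Sample a plausible success rate per variant and pick the highest."""
        draws = {v: self.rng.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, variant: str, success: bool) -> None:
        """Update the chosen variant's posterior with one observation."""
        self.stats[variant][0 if success else 1] += 1


# Simulated feedback with made-up success rates for two prompt variants.
true_rates = {"prompt_a": 0.55, "prompt_b": 0.80}
sampler = ThompsonSampler(true_rates, seed=42)
feedback_rng = random.Random(7)
counts = {v: 0 for v in true_rates}

for _ in range(2000):
    variant = sampler.choose()
    counts[variant] += 1
    sampler.record(variant, feedback_rng.random() < true_rates[variant])

print(counts)  # traffic should concentrate on the stronger variant
```

Unlike a fixed 50/50 split, this kind of allocation shifts traffic toward the better variant as evidence accumulates, so fewer requests are spent on the weaker one.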

& more!

Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.

  • Build simple applications or massive deployments with GitOps-friendly orchestration
  • Extend TensorZero with built-in escape hatches, programmatic-first usage, direct database access, and more
  • Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
  • Iterate quickly by experimenting with prompts interactively using the Playground UI

Frequently Asked Questions

How is TensorZero different from other LLM frameworks?

  1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
  2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
  3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

Can I use TensorZero with ___?

Yes. Every major programming language is supported. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

Is TensorZero production-ready?

Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today.

Here's a case study: Automating Code Changelogs at a Large Bank with LLMs

How much does TensorZero cost?

TensorZero (LLMOps platform) is 100% self-hosted and open-source.

TensorZero Autopilot (automated AI engineer) is a complementary paid product powered by TensorZero.

Who is building TensorZero?

Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and coverage from VentureBeat. We're hiring in NYC.

How do I get started?

You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.

Get Started

Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero.

Questions? Ask us on Slack or Discord.

Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero

This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
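
The core idea behind dynamic in-context learning can be sketched in a few lines: retrieve the stored examples most similar to the incoming input and prepend them to the prompt as demonstrations. The sketch below is a toy under stated assumptions, not TensorZero's implementation. It uses bag-of-words cosine similarity in place of a real embedding model, and `embed`, `cosine`, and `build_messages` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_messages(query: str, examples: list, k: int = 2) -> list:
    """Prepend the k most similar past examples as demonstrations."""
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True)
    messages = []
    for ex in ranked[:k]:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    messages.append({"role": "user", "content": query})
    return messages


# Hypothetical store of past inferences with good outputs.
examples = [
    {"input": "Marie Curie worked in Paris", "output": '{"person": ["Marie Curie"], "location": ["Paris"]}'},
    {"input": "The match ended two to one", "output": "{}"},
    {"input": "Alan Turing worked in Manchester", "output": '{"person": ["Alan Turing"], "location": ["Manchester"]}'},
]
query = "Grace Hopper worked in Arlington"
messages = build_messages(query, examples, k=2)
print(len(messages))
```

The resulting `messages` list can be passed to any chat-completions style client; the demonstrations change per request, which is what makes the technique dynamic.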

Agentic RAG — Multi-Hop Question Answering with LLMs

This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
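
The control flow of such an agent can be sketched as a loop that alternates between searching and deciding whether to answer. Everything below is a hypothetical stand-in: the toy `CORPUS` replaces Wikipedia, and `toy_policy` is a hard-coded rule that replaces the LLM's decision step. The actual example in the repository may be structured differently.

```python
# Toy stand-in for a search backend such as Wikipedia.
CORPUS = {
    "Ada Lovelace": "Ada Lovelace worked with Charles Babbage on the Analytical Engine.",
    "Charles Babbage": "Charles Babbage was born in London.",
}


def search(query: str) -> str:
    """Return the matching article text, or an empty string if none."""
    return CORPUS.get(query, "")


def toy_policy(question: str, context: list) -> tuple:
    """Hard-coded stand-in for the LLM: returns ("search", query) or ("answer", text)."""
    text = " ".join(context)
    if not context:
        return ("search", "Ada Lovelace")
    if "born in" not in text:
        return ("search", "Charles Babbage")
    return ("answer", "London")


def run_agent(question: str, policy, search_fn, max_steps: int = 5) -> tuple:
    """Search iteratively until the policy decides it has enough context."""
    context = []
    for _ in range(max_steps):
        action, arg = policy(question, context)
        if action == "answer":
            return arg, context
        context.append(search_fn(arg))
    return None, context  # gave up after max_steps


answer, context = run_agent(
    "Where was Ada Lovelace's collaborator born?", toy_policy, search
)
print(answer, len(context))
```

The `max_steps` guard matters in practice: without it, a policy that never chooses to answer would loop forever.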

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.

Image Data Extraction — Multimodal (Vision) Fine-tuning

This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
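
In its simplest form, best-of-N sampling generates several candidates and keeps the one a scoring function rates highest. The sketch below shows that general technique with stand-ins: `propose_move` replaces the LLM call, `judge` replaces the evaluator, and the scores in `MOVES` are made up for illustration. It is not the implementation used in the example.

```python
import random
from typing import Callable


def best_of_n(generate: Callable[[], str], score: Callable[[str], float], n: int = 5) -> str:
    """Generate n candidates and return the one with the highest score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical evaluator scores for a handful of opening moves.
MOVES = {"e4": 0.6, "Nf3": 0.5, "h4": 0.1, "Qh5": 0.2}
rng = random.Random(0)


def propose_move() -> str:
    """Stand-in for sampling one move from an LLM."""
    return rng.choice(list(MOVES))


def judge(move: str) -> float:
    """Stand-in for an evaluator that rates a candidate move."""
    return MOVES[move]


best = best_of_n(propose_move, judge, n=8)
print(best)
```

Quality improves with larger `n` at the cost of more generation calls, so `n` is the main knob for trading latency and spend against move quality.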

Blog Posts

We write about LLM engineering on the TensorZero Blog. Here are some of our favorite posts: