AgenticVBench
ameddserM · 2026-05-24 · via Hacker News - Newest: "AI"

May 2026

Can AI agents do real-world post-production work?

We gave the 7 best frontier models 100 expert-authored tasks across the four stages of post-production. The best agent barely crosses 30%. Human experts scored 89%.

100 tasks · 20 industry experts · 7 frontier models · 4 task families

Why this benchmark exists

Verification does not come for free.

RLVR (reinforcement learning with verifiable rewards) works in math and code because centuries of humanistic work built the verifiers; the bill was paid before we got there. Creative work hasn't paid that bill. AgenticVBench is what paying it looks like in film.


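| Rank | Agent | Avg | Repurpose | Sequencing | Repair | Assembly |
|---|---|---|---|---|---|---|
| · | Human experts (reference) | 88.5% | 95% | 90% | 88% | 81% |
| 1 | GPT-5.5 · Codex | 31.0% ± 4.0 | 30% | 26% | 30% | 38% |
| 2 | GPT-5.5 · OpenCode | 27.4% ± 3.5 | 27% | 20% | 27% | 37% |
| 4 | Claude Opus 4.7 · Claude Code | 22.1% ± 3.5 | 30% | 20% | 17% | 22% |
| 5 | GPT-5.5 · OpenClaw | 21.9% ± 2.9 | 20% | 29% | 21% | 18% |
| 6 | Claude Opus 4.7 · OpenClaw | 21.1% ± 3.4 | 18% | 19% | 25% | 22% |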

What the bench tests

Four task families spanning the real-world post-production workflow.

Authored by 20 industry experts averaging 6 years of post-production experience. Tasks span 30 minutes to one week of human work.

Assembly · 18 tasks · 43 pp gap

Given a storyboard with 3–6 slots and a shuffled pool of candidate clips, select the clip that matches each slot.

Best agent 38% · Human 81%

Repair · 18 tasks · 59 pp gap

Given a video with defects (frozen scene, scene swap, color drift, or audio noise), localize them and produce a fixed cut.

Best agent 30% · Human 88%

Sequencing · 28 tasks · 61 pp gap

Given a brief story overview and a shuffled set of clips, recover the correct narrative order.

Best agent 29% · Human 90%

Repurpose · 36 tasks · 65 pp gap

Given 4–150 minutes of source video and a creative brief, repurpose it into a short deliverable that follows the brief and preserves the story.

Best agent 30% · Human 95%
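The "pp gap" on each task-family card is the human score minus the best agent score, in percentage points. A minimal sketch of that arithmetic follows, using only the leaderboard rows visible on this page. The names `HUMAN`, `AGENTS`, and `family_gaps` are illustrative and not part of AgenticVBench. Because the inputs are the rounded percentages shown here, the Repair gap comes out to 58 rather than the 59 on the card; a likely explanation (an assumption, not confirmed by the page) is that the published gaps use unrounded scores.

```python
# Per-family scores (percent), transcribed from the leaderboard on this page.
# Only the rows visible here are included; all names are illustrative.
HUMAN = {"Repurpose": 95, "Sequencing": 90, "Repair": 88, "Assembly": 81}

AGENTS = {
    ("GPT-5.5", "Codex"):
        {"Repurpose": 30, "Sequencing": 26, "Repair": 30, "Assembly": 38},
    ("GPT-5.5", "OpenCode"):
        {"Repurpose": 27, "Sequencing": 20, "Repair": 27, "Assembly": 37},
    ("Claude Opus 4.7", "Claude Code"):
        {"Repurpose": 30, "Sequencing": 20, "Repair": 17, "Assembly": 22},
    ("GPT-5.5", "OpenClaw"):
        {"Repurpose": 20, "Sequencing": 29, "Repair": 21, "Assembly": 18},
    ("Claude Opus 4.7", "OpenClaw"):
        {"Repurpose": 18, "Sequencing": 19, "Repair": 25, "Assembly": 22},
}


def family_gaps(human, agents):
    """Return, per task family, the best agent score and the gap to humans."""
    gaps = {}
    for family, human_score in human.items():
        # Best score any (model, harness) pair achieved on this family.
        best = max(scores[family] for scores in agents.values())
        gaps[family] = {
            "best_agent": best,
            "human": human_score,
            "gap_pp": human_score - best,  # percentage points
        }
    return gaps


for family, row in family_gaps(HUMAN, AGENTS).items():
    print(f"{family}: best agent {row['best_agent']}%, "
          f"human {row['human']}%, gap {row['gap_pp']} pp")
```

Taking the maximum per family, rather than the scores of the top-ranked agent, matters here: the best Sequencing score (29%) belongs to the rank-5 agent, not the rank-1 agent.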

The harness finding

The harness matters as much as the model.

Holding the model fixed and varying the harness shifts GPT-5.5's Assembly score by 20 percentage points, comparable to the gap between adjacent models on the leaderboard.

Most benchmarks today still report scores per model alone. The data here says that is not enough: agent performance is determined by both the model and the scaffolding around it, and reporting only the model misses the larger story.

Agent = model × harness.

GPT-5.5 on Assembly · score by harness

Codex: 38%
OpenCode: 37%
OpenClaw: 18%

Same model. 20-point swing.
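The harness comparison amounts to grouping results by model and measuring the spread across harnesses. The sketch below is one way to compute that swing, assuming scores are keyed by a (model, harness) pair. The names `ASSEMBLY_SCORES` and `harness_swing` are hypothetical, and the values are the three GPT-5.5 Assembly scores reported above.

```python
from collections import defaultdict

# GPT-5.5 Assembly scores (percent) by harness, as reported on this page.
# Keys are (model, harness) pairs; the structure is illustrative.
ASSEMBLY_SCORES = {
    ("GPT-5.5", "Codex"): 38,
    ("GPT-5.5", "OpenCode"): 37,
    ("GPT-5.5", "OpenClaw"): 18,
}


def harness_swing(scores):
    """For each model, return (best harness, worst harness, swing in pp)."""
    by_model = defaultdict(dict)
    for (model, harness), score in scores.items():
        by_model[model][harness] = score

    swings = {}
    for model, per_harness in by_model.items():
        best = max(per_harness, key=per_harness.get)
        worst = min(per_harness, key=per_harness.get)
        swings[model] = (best, worst, per_harness[best] - per_harness[worst])
    return swings


for model, (best, worst, swing) in harness_swing(ASSEMBLY_SCORES).items():
    print(f"{model}: {best} vs {worst}, swing {swing} pp")
```

Keying every result by the (model, harness) pair, instead of by model name, keeps the harness visible in any downstream report.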