Why Great Engineers Get Worse When They Use AI


sneruz · 2026-05-24 · via Hacker News - Newest: "AI"

The 10x engineer is regressing to the mean.

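Francis Galton named the effect in 1886 ^[1], after noticing that the children of unusually tall parents ended up closer to average height. LLMs are regression machines by design. The decoding step samples from the most likely continuations of the prompt. That is the mean of the training distribution, conditioned on what you typed.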
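A minimal sketch of that decoding step, assuming a toy vocabulary of four candidate continuations (three common, one rare). The token names and logit values are invented for illustration and are not measured from any model.

```python
import math
import random

# Hypothetical next-token scores (logits): three common continuations, one rare.
logits = {"common_a": 3.0, "common_b": 2.0, "common_c": 1.5, "rare": -4.0}

def softmax(scores, temperature=1.0):
    """Turn logits into a probability distribution."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample(probs, rng) for _ in range(1000)]
counts = {tok: draws.count(tok) for tok in logits}
greedy = max(probs, key=probs.get)  # greedy decoding always returns the mode
```

Under these made-up numbers the rare continuation has probability below 0.1%, so sampling almost never produces it and greedy decoding never does. The conditioning on the prompt lives entirely in the logits.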

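Figure: Four code samples arranged around a regression curve, converging on a center. Regression to the mean: standard patterns are lifted up to it; novel algorithms are dragged down to it. Same mechanism, opposite outcomes.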

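The effect is asymmetric. On routine work, a 10x engineer becomes 100x. On novel work, the same engineer is dragged to the average and ships code that looks right and isn't. The model doesn't know which kind you are.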

What failure looks like

A docstring describes behavior. You can only specify what you already know.

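I took a paper from ICML 2026 ^[2] whose contribution is an attention kernel formula. I removed the implementation, sent the signature and docstring to DeepSeek V4 Pro, and captured the logprobs on the completion.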

import torch
import torch.nn.functional as F

def spherical_attention(Q, K, V):
    """
    Attention with spherical-constrained Q, K and positive scoring kernel.

    Queries and keys are normalized to the unit sphere. A positive kernel
    function maps the cosine similarity between query and key directions
    to an attention score. Scores are normalized per query and used to
    weight V.

    Args:
        Q, K, V: (batch, heads, seq, head_dim) tensors.

    Returns:
        Attention output of shape (batch, heads, seq, head_dim).
    """
    Q = F.normalize(Q, dim=-1)
    K = F.normalize(K, dim=-1)
    S = torch.einsum('bhqd,bhkd->bhqk', Q, K)
    C = 2.0 + 1e-6
    S = S**2 / (C - 2*S)                      # Yat-kernel
    A = S / S.sum(dim=-1, keepdim=True)
    O = torch.einsum('bhqk,bhkd->bhqd', A, V)
    return O

Figure: Model completion with per-token uncertainty heatmap. The kernel line torch.relu(S) + 1e-6 is highlighted red. This is the model's completion; deeper red means lower confidence.

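Seven lines are identical. One differs: where the paper writes S**2 / (C - 2*S) (the Yat-kernel, the paper's contribution), the model wrote torch.relu(S) + 1e-6. The model sampled from the common positive functions: ReLU, softplus, exp. The Yat-kernel was not in the candidate set.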

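Given the formula, the model gets it right. But if you already know the formula, you didn't need the model for that part. The code is structurally correct and formula-wrong on the load-bearing line.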
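To make "structurally correct, formula-wrong" concrete, here is a scalar sketch that applies both kernels to the same cosine similarities. The two formulas are taken from the listing and the completion above; the cosine values and helper names are made up for illustration, and the tensor code is reduced to plain Python lists.

```python
EPS = 1e-6
C = 2.0 + EPS  # same constant as in the listing above

def yat_kernel(s):
    """Score from the paper's listing: s**2 / (C - 2*s) for cosine s."""
    return s * s / (C - 2.0 * s)

def relu_kernel(s):
    """The model's substitute: relu(s) + 1e-6."""
    return max(s, 0.0) + EPS

def attention_weights(cosines, kernel):
    """Apply a positive kernel to each cosine and normalize to sum to 1."""
    scores = [kernel(s) for s in cosines]
    total = sum(scores)
    return [x / total for x in scores]

cosines = [0.9, 0.5, 0.0, -0.5]  # hypothetical query-key cosines
w_yat = attention_weights(cosines, yat_kernel)
w_relu = attention_weights(cosines, relu_kernel)
```

Both results satisfy everything the docstring states: the scores are non-negative and the weights sum to one per query. They still disagree. On these inputs the Yat-kernel puts over 90% of the weight on the closest key and keeps a small weight on the anti-aligned one, while the ReLU version spreads weight more evenly across the positive cosines and effectively zeroes the negative one. A check on shape or normalization alone would not tell them apart.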

Where it can't fail

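In May 2026, OpenAI's reasoning model disproved the Erdős unit distance conjecture ^[4], a combinatorics problem that had been open since 1946. In the same week, DeepMind's AlphaProof Nexus solved nine of 353 open Erdős problems ^[5].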

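Both use the same structure: the model generates candidate constructions; Lean, a formal proof checker, verifies each one. A proof either compiles or it doesn't. What looks like AI solving novel mathematics is actually search in a space that has a ground-truth oracle.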
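The generate-and-verify structure can be sketched in a few lines. This is a toy under stated assumptions, not how either system works internally: the generator is a plain range of integers instead of a model, and the oracle is an exact primality test instead of Lean. The function names are hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def search_with_oracle(candidates: Iterable[T],
                       verify: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate the verifier accepts, or None."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None

def is_prime(m: int) -> bool:
    """Exact trial-division primality test (the toy oracle)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def is_counterexample(n: int) -> bool:
    """Claim to refute: n*n + n + 41 is prime for every n >= 0."""
    return not is_prime(n * n + n + 41)

counterexample = search_with_oracle(range(100), is_counterexample)
# counterexample is 40, because 40*40 + 40 + 41 = 1681 = 41 * 41
```

The generator can be wrong as often as it likes. Only candidates the verifier accepts survive, so the quality of the result is bounded by the oracle and not by the proposer.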

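The kernel experiment had no oracle. The model generated a completion, nothing verified it, and the most likely token was ReLU. The logprobs show uncertainty on that line; the model knew it was in the tail. But with no verifier downstream, the uncertainty collapses to the modal token.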
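The uncertainty is present in the logprobs even though the emitted text hides it. As a rough sketch, assuming per-token logprobs are available and have been grouped by source line, one could surface the low-confidence lines for human review. The numbers, the threshold, and the helper names below are invented for illustration; this is not a substitute for a real verifier.

```python
# Hypothetical per-token logprobs, grouped by line of the completion.
line_logprobs = {
    "Q = F.normalize(Q, dim=-1)": [-0.01, -0.02, -0.01],
    "S = torch.relu(S) + 1e-6": [-1.6, -0.9, -1.2],
    "A = S / S.sum(dim=-1, keepdim=True)": [-0.03, -0.05, -0.02],
}

def mean_logprob(logprobs):
    """Average token logprob for one line; closer to 0 means more confident."""
    return sum(logprobs) / len(logprobs)

def flag_uncertain_lines(lines, threshold=-0.5):
    """Return lines whose average token logprob falls below the threshold."""
    return [text for text, lps in lines.items()
            if mean_logprob(lps) < threshold]

flagged = flag_uncertain_lines(line_logprobs)
```

With these made-up values only the kernel line is flagged. A flag says where to look; deciding what the line should contain still takes someone who knows the formula.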

What's permanent

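You might expect this to fix itself: the paper gets published, the next model trains on it, and the gap closes. Some of it does. But the frontier always sits past the training cutoff, and the highest-value work is never published. High-frequency trading pricing logic, FAANG infrastructure, and bank risk systems stay behind corporate firewalls ^[6]. There is always a tail, and the best engineers work in it.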

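Rarity is the diagnostic. Standard application code sits near the center of the distribution, and the model lifts it. Rare patterns sit in the tail, where the model under-learns them ^[3] and produces output that has the same shape and is confidently wrong.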

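The engineers who stay sharp know which lines carry the contribution. The model doesn't. If you have been delegating the judgment of which lines matter, you are the one regressing.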

References

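[1] Wikipedia. Regression toward the mean, "Discovery" section.
[2] Luna, Bouhsine, and Choromanski. SLAY: Geometry-Aware Spherical Linearized Attention with Yat-Kernel. arXiv:2602.04915, 2026. ICML 2026.
[3] Kandpal et al. Large Language Models Struggle to Learn Long-Tail Knowledge. arXiv:2211.08411, 2023. ICML 2023.
[4] OpenAI. Remarks on the proof concerning the unit distance conjecture. arXiv:2605.20695, 2026.
[5] Google DeepMind. AlphaProof Nexus. arXiv:2605.22763, 2026.
[6] Ahmed et al. Studying LLM Performance on Closed- and Open-source Data. arXiv:2402.15100, 2024.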
