Few-Shot vs Zero-Shot: When Adding Examples Actually Hurts Accuracy
Gabriel Anha · 2026-05-24 · via DEV Community

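"Add an example to show it what you want" is the most widely taught prompt tip on the internet. Stack Overflow. Twitter. Half of prompt-engineering YouTube. In 2022 it worked remarkably well.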

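For today's production tasks, it's wrong. The example doesn't just fail to help: it lowers accuracy, raises cost, and biases the output toward the shape of the example. Then you ship the bug and blame the model.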

This post is about catching that mistake before it reaches production.

The 2022 advice became a cargo cult

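When GPT-3 was the frontier, few-shot prompting was a neat trick. The base model didn't understand the framing of the task, so you had to show it what an answer looked like. The 2020 paper "Language Models are Few-Shot Learners" wasn't being cute; it described a real capability gap.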

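Three generations later, things have changed. Instruction-tuned models (GPT-4o, Claude Sonnet 4, Gemini 2.5) understand "summarize this article, return JSON" without an example. Reasoning models (the o-series, Claude Opus with extended thinking, Gemini with thinking enabled) go further still. Describe the task and get out of the way; that often works better. Examples become noise the model has to filter out.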

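The advice hasn't changed. People still paste "here is the input, here is the output" pairs into Claude 4.7 prompts because a 2022 blog post told them to. Then they ask why their eval outputs look strange.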

Three task shapes where examples hurt

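Not every task. Few-shot still earns its place in narrow classification, novel output formats, and math reasoning (more on that below). But in three task shapes, adding examples costs accuracy, and all three show up constantly in production.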

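High-recall extraction. Pull every entity of some type out of a document: names, dates, line items, citations, transaction IDs. The failure: your single example extracts four dates from a 200-word passage. The model implicitly learns "this task returns four things" and stops at four, even when the real document contains eleven. Recall drops. The fix isn't more examples; it's removing the anchor. Use no example at all.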
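This failure is easy to miss if you only spot-check correctness, because every date the anchored model does return is right. A minimal sketch of scoring recall and precision separately on sets of extracted strings; the eleven-date document is hypothetical.

```python
def recall(predicted: list[str], truth: list[str]) -> float:
    """Fraction of ground-truth entities that the model actually returned."""
    truth_set = set(truth)
    if not truth_set:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(predicted) & truth_set) / len(truth_set)


def precision(predicted: list[str], truth: list[str]) -> float:
    """Fraction of returned entities that are really in the document."""
    pred_set = set(predicted)
    if not pred_set:
        return 1.0  # returned nothing, so returned nothing wrong
    return len(pred_set & set(truth)) / len(pred_set)


# Hypothetical document with eleven dates; the anchored model stops after four.
truth = [f"2024-03-{day:02d}" for day in range(1, 12)]
anchored = truth[:4]
print(f"precision={precision(anchored, truth):.2f} recall={recall(anchored, truth):.2f}")
# precision=1.00 recall=0.36
```

Perfect precision with 36% recall is the signature of an over-anchored extraction prompt.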

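Creative generation. Marketing copy, alt text, summaries in a specific voice. The example is the most concrete thing in the prompt, so the model copies its rhythm, sentence length, and vocabulary. You wanted variety; you get fifty outputs that all read like the example. Worse, in a landing-page A/B test, the example sentence itself leaks verbatim into production copy. The fix: describe the voice in prose, then trust the model.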
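A cheap guard against the verbatim-leak failure is to flag outputs that share long word sequences with the example. A minimal sketch; the 5-word window and the sample copy are assumptions to tune, not values from the post.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams, with punctuation stripped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaked_phrases(example: str, output: str, n: int = 5) -> list[str]:
    """n-word phrases that appear verbatim in both the prompt example and a model output."""
    shared = word_ngrams(example, n) & word_ngrams(output, n)
    return sorted(" ".join(gram) for gram in shared)


# Hypothetical example copy and two generated variants.
example = "Ship faster with a dashboard your whole team will actually open."
outputs = [
    "Ship faster with a dashboard your whole team will actually open every day.",
    "Know where every deploy stands, at a glance.",
]
for out in outputs:
    print(len(leaked_phrases(example, out)), out)
```

Run it over a batch before an A/B test goes live; any output with a nonzero count is copying the example rather than the described voice.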

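Strict format compliance. "Output JSON with fields A, B, C; no preamble; no Markdown fences." That instruction works well on both Claude 4 and GPT-4o. Add an example and the model starts copying the example's structural choices: quoting style, field order, whether to wrap the JSON in a code fence. If the example ends with a trailing newline, half your outputs now do too. You've hard-coded a bug.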
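To see whether an example is leaking its structural choices, validate raw replies against the written spec rather than eyeballing them. A minimal sketch using only the standard library; the field names are hypothetical.

```python
import json


def format_violations(raw: str, fields: list[str]) -> list[str]:
    """List the ways a raw reply breaks 'JSON with these fields; no preamble; no fences'."""
    problems = []
    if raw.lstrip().startswith("```"):
        problems.append("wrapped in a Markdown code fence")
    if raw != raw.strip():
        # json.loads tolerates surrounding whitespace, so check it explicitly
        problems.append("leading or trailing whitespace")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        problems.append("not parseable as JSON (preamble or fence?)")
        return problems
    if not isinstance(data, dict):
        problems.append("top-level value is not an object")
        return problems
    missing = [f for f in fields if f not in data]
    extra = [k for k in data if k not in fields]
    if missing:
        problems.append(f"missing fields: {missing}")
    if extra:
        problems.append(f"unexpected fields: {extra}")
    return problems
```

Counting violations per variant (zero-shot vs few-shot) over the same inputs shows directly which quirks the example introduced.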

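The common thread in all three: the example over-anchors. The model treats it as a precedent, not a hint. For models that already understand the task, that's a regression.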


Why frontier reasoning models invert the few-shot math

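Reasoning models think before they answer. They spend hidden tokens planning, drafting, and revising. Give them few-shot examples and they spend that reasoning budget imitating the examples instead of working out the underlying problem. You can watch it happen in tracing tools: the thinking trace refers back to the example, tries to match its shape, copies its result, and bends the model's own output to fit.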

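When you want the model to work the problem out itself, that's wasted budget. OpenAI's o-series documentation calls this out explicitly: keep few-shot prompting to a minimum and state the goal of the task clearly. Anthropic's Claude 4.x prompt-engineering guidance says much the same about extended thinking. Examples can derail the chain of thought.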
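In practice, "state the task and step back" means a plain instruction with thinking enabled and no examples. A minimal sketch of the request arguments for the Anthropic Python SDK's `client.messages.create`; the model alias and budget are assumptions, and `final_text` is a hypothetical helper.

```python
def build_zero_shot_thinking_request(task: str, text: str, budget_tokens: int = 4000) -> dict:
    """kwargs for client.messages.create(): task statement only, no examples,
    with extended thinking enabled so the budget goes to the problem itself."""
    if budget_tokens < 1024:
        raise ValueError("Anthropic documents 1024 as the minimum thinking budget")
    return {
        "model": "claude-sonnet-4-5",  # assumption: any model that supports extended thinking
        "max_tokens": budget_tokens + 2000,  # max_tokens must exceed budget_tokens
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": f"{task}\n\nText:\n{text}"}],
    }


def final_text(content_blocks) -> str:
    """With thinking on, the reply starts with thinking blocks; keep only the text blocks."""
    return "".join(b.text for b in content_blocks if getattr(b, "type", None) == "text")
```

Usage would be `msg = client.messages.create(**build_zero_shot_thinking_request(task, text))` followed by `final_text(msg.content)`. Reading `msg.content[0].text` directly, as a non-thinking prompt often does, would hit a thinking block instead of the answer.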

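This isn't a small effect. I watched a team run an internal math-formatting eval and switch from a three-shot prompt to zero-shot on Claude Opus with extended thinking: accuracy went up six points. The few-shot version was the one they had already shipped.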

A 30-line ablation to prove it on your own task

Don't trust this post. Don't trust model cards. Run the ablation on your actual task with your actual data. It's thirty lines of Python.

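import asyncio
import json

from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-5"

ZERO_SHOT = "Extract every date mentioned in the text. Return JSON array of ISO-8601 strings. No preamble."
FEW_SHOT = ZERO_SHOT + """

Example input: "We met on March 3rd 2024 and again on the 7th."
Example output: ["2024-03-03", "2024-03-07"]"""


async def run(prompt: str, text: str) -> list[str]:
    msg = await client.messages.create(
        model=MODEL,
        max_tokens=512,
        messages=[{"role": "user", "content": f"{prompt}\n\nText:\n{text}"}],
    )
    try:
        dates = json.loads(msg.content[0].text)
    except json.JSONDecodeError:
        return []  # preamble or code fence: score it as a miss instead of crashing the run
    # Keep only string entries so set() below never fails on odd replies
    return [d for d in dates if isinstance(d, str)] if isinstance(dates, list) else []


async def main(eval_set: list[dict]) -> None:
    # Each row looks like {"text": "...", "truth": ["2024-03-03", ...]}
    for variant, prompt in [("zero", ZERO_SHOT), ("few", FEW_SHOT)]:
        preds = await asyncio.gather(*[run(prompt, row["text"]) for row in eval_set])
        # Exact match on the set of dates: order and duplicates are ignored
        hits = sum(set(p) == set(r["truth"]) for p, r in zip(preds, eval_set))
        print(f"{variant}: {hits}/{len(eval_set)} exact-match")


if __name__ == "__main__":
    with open("eval.json") as f:
        asyncio.run(main(json.load(f)))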


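That's it. Label fifty examples from real traffic, and in under a minute the gap (or the lack of one) shows up. The shape of the result matters more than the absolute numbers. If zero-shot beats few-shot by four points across those fifty examples, your few-shot prompt is actively costing you accuracy. Period.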

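Two rules for running it honestly. First, vary the examples. A bad few-shot result is often a failure of the example, not of the method. Try three different examples; if all of them lose to zero-shot, the method isn't right for the task. Second, read the wrong answers, not just the score. If few-shot fails by under-extracting and zero-shot fails by hallucinating, those are different failures that need different fixes.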
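Reading the wrong answers can start with bucketing them. A minimal sketch that labels each extraction by failure mode, intended to be fed the per-variant `preds` lists from the ablation above; the category names are my own, not from the post.

```python
from collections import Counter


def failure_mode(predicted: list[str], truth: list[str]) -> str:
    """Label one extraction result by how it went wrong, if at all."""
    pred, gold = set(predicted), set(truth)
    missed, invented = gold - pred, pred - gold
    if not missed and not invented:
        return "correct"
    if missed and not invented:
        return "under-extraction"  # the anchoring failure: stopped early
    if invented and not missed:
        return "hallucination"  # returned entities that are not in the text
    return "mixed"


def failure_profile(preds: list[list[str]], truths: list[list[str]]) -> Counter:
    """Count failure modes across an eval set, for one prompt variant."""
    return Counter(failure_mode(p, t) for p, t in zip(preds, truths))
```

Comparing `failure_profile(zero_preds, truths)` with `failure_profile(few_preds, truths)` tells you whether the two prompts fail in the same way or in opposite ways, which a single exact-match score hides.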

When examples actually help

There are still three task shapes where they do:

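Classification with opaque labels. You're routing support tickets into fourteen internal categories with names like T1-billing-disputed-charge. Don't drop the examples here. The model has no idea what your taxonomy means; all it sees is the label names. Two examples per class teach it where the boundaries are, and even one clearly labeled example per class goes a long way. Accuracy jumps from "guessing" to "useful".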
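One common way to supply those per-class examples is as alternating user/assistant turns, so each example is a short ticket followed by its bare label. A minimal sketch; apart from the `T1-billing-...` naming style, the labels and tickets are hypothetical, and a real taxonomy would list all fourteen classes.

```python
# Hypothetical labels and tickets; only the label-name style comes from the post.
LABELED_EXAMPLES = {
    "T1-billing-disputed-charge": [
        "I was charged twice for March and want one refunded.",
        "There's a $49 charge on my card I never authorized.",
    ],
    "T1-billing-plan-change": [
        "How do I downgrade from Pro to Basic?",
        "Please switch us to annual billing starting next month.",
    ],
}


def build_request(ticket: str) -> dict:
    """system + messages for a chat API: two short examples per class, then the real ticket."""
    labels = ", ".join(LABELED_EXAMPLES)
    messages = []
    for label, tickets in LABELED_EXAMPLES.items():
        for example in tickets:
            messages.append({"role": "user", "content": example})
            messages.append({"role": "assistant", "content": label})  # bare label, no explanation
    messages.append({"role": "user", "content": ticket})
    return {
        "system": f"Classify each support ticket. Reply with exactly one label from: {labels}.",
        "messages": messages,
    }
```

Keeping each example to one sentence teaches the label boundary without paying the long-example cost discussed below.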

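Novel output formats. You need output in a custom DSL the model has never seen: the config format of an internal rules engine, a proprietary query language. Show, don't tell. Two examples beat 400 words of rules, because the model picks up a pattern from examples faster than from prose.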

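Few-shot chain-of-thought for math. "Think step by step" combined with worked examples still works. Here the example anchors the pattern of reasoning, not the shape of the answer, and for math that's a win. Reasoning models have largely absorbed this benefit already, but on non-reasoning models (Sonnet without thinking, GPT-4o-mini) it still moves the numbers.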
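A minimal sketch of the pattern: one short worked example that models the reasoning and ends with a fixed `Answer:` line, plus a parser for that line. The worked problem and the `Answer:` convention are assumptions for illustration.

```python
import re
from typing import Optional

# One short worked example: it shows the reasoning steps, then a fixed answer line.
COT_EXAMPLE = """Q: A crate holds 12 boxes and each box holds 8 jars. How many jars are in 5 crates?
A: Let's think step by step. One crate holds 12 * 8 = 96 jars. Five crates hold 5 * 96 = 480 jars.
Answer: 480"""


def build_cot_prompt(question: str) -> str:
    """Worked example first, then the new question, primed with the same opening phrase."""
    return f"{COT_EXAMPLE}\n\nQ: {question}\nA: Let's think step by step."


def extract_answer(completion: str) -> Optional[str]:
    """Number on the last 'Answer:' line, commas removed; None if the model never wrote one."""
    matches = re.findall(r"^Answer:\s*(-?[\d,]*\.?\d+)", completion, flags=re.MULTILINE)
    return matches[-1].replace(",", "") if matches else None
```

Because the example fixes the reasoning format and the answer line, scoring only needs `extract_answer`, and a `None` result is itself a useful signal that the model drifted from the pattern.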

If your task isn't one of these three, stay zero-shot, and add examples only once an ablation shows they help.


Example length matters more than example count

This is the part most teams miss. The damage scales with the length of the examples, not with how many there are.

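Five examples of 15 tokens each: fine. One example of 800 tokens (a whole document plus its complete extraction): that silently wrecks your accuracy. While processing the real input, the model has to keep 800 tokens of "this is what good looks like" in attention, and the signal-to-noise ratio collapses.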

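Token economics point the same way. Examples aren't cached unless you use prompt caching (Anthropic's cache_control, OpenAI's automatic prefix caching). If your one example is 800 tokens and you send a million requests, you pay for 800 million tokens replaying the same example. That's a large line item for something that may be hurting your accuracy.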
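The arithmetic is worth writing down, because caching changes the bill but not the count. A minimal sketch; the cache write/read multipliers are assumptions you should take from your provider's current pricing page, not values from the post.

```python
def replayed_example_tokens(example_tokens: int, requests: int) -> int:
    """Input tokens spent re-sending the same example on every request, with no caching."""
    return example_tokens * requests


def effective_tokens_with_cache(example_tokens: int, requests: int,
                                write_multiplier: float, read_multiplier: float,
                                cache_writes: int = 1) -> float:
    """Billing-equivalent tokens when the example is cached.
    ASSUMPTION: the multipliers (relative to the base input price) come from your provider's pricing;
    cache_writes counts how often the cache had to be rebuilt (e.g. after the TTL expired)."""
    reads = requests - cache_writes
    return example_tokens * (cache_writes * write_multiplier + reads * read_multiplier)


print(replayed_example_tokens(800, 1_000_000))  # 800000000
```

Even with caching, the model still attends to all 800 tokens on every request; only the price per token changes.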

Two practical rules:

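  1. Cap each example at about one-tenth of the length of the input the model has to process for the task. If you're processing 2,000-word inputs, each example should be under 200 words. Trim ruthlessly. The model doesn't need the full text, just the shape of the input and the output.
  2. If you can't get an example under that cap, the task probably calls for a different approach. Constrained decoding, structured-output APIs, or fine-tuning all beat "one huge, sticky example" on production economics.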
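Rule 1 is easy to enforce in CI. A minimal sketch that flags examples over the one-tenth cap; it uses whitespace word counts as a rough stand-in for tokens, which is an approximation.

```python
def word_count(text: str) -> int:
    """Whitespace word count; a rough proxy for token length."""
    return len(text.split())


def oversized_examples(examples: list[str], typical_input: str, ratio: float = 0.1) -> list[int]:
    """Indices of examples longer than `ratio` times a typical input (the one-tenth rule)."""
    cap = ratio * word_count(typical_input)
    return [i for i, example in enumerate(examples) if word_count(example) > cap]
```

A nonzero result is the cue from rule 2: trim the example, or reach for structured outputs instead of a longer demonstration.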

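If you really do need long examples, prompt caching is a stopgap. Put cache_control: {"type": "ephemeral"} on the example block, and after the first request the example's cost is amortized across requests until the cache TTL expires (five minutes by default on Anthropic). That reduces cost, but it does nothing for the diluted attention. Cached or not, an 800-token example can still drown out a 200-token input.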
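A minimal sketch of that setup for the Anthropic Python SDK: the stable prefix (instructions plus the long example) goes in `system`, with `cache_control` on its last block so everything up to that point can be cached, and the per-request input stays outside the cached prefix. The helper names are hypothetical. One caveat worth checking in Anthropic's docs: there is a minimum cacheable prompt length (1,024 tokens for many models at the time of writing), so a prefix shorter than that is silently not cached; the `usage` fields show whether caching happened.

```python
def build_cached_request(instructions: str, long_example: str, text: str,
                         model: str = "claude-sonnet-4-5") -> dict:
    """kwargs for client.messages.create() with the example block marked for caching."""
    return {
        "model": model,
        "max_tokens": 512,
        "system": [
            {"type": "text", "text": instructions},
            # The prefix up to and including this block is what gets cached.
            {"type": "text", "text": long_example, "cache_control": {"type": "ephemeral"}},
        ],
        # Only the per-request input varies, so it goes after the cached prefix.
        "messages": [{"role": "user", "content": f"Text:\n{text}"}],
    }


def show_cache_usage(kwargs: dict, calls: int = 2) -> None:
    """Send the same request twice and print cache writes vs reads (needs the SDK and an API key)."""
    from anthropic import Anthropic  # imported here so the builder works without the SDK installed

    client = Anthropic()
    for _ in range(calls):
        msg = client.messages.create(**kwargs)
        # Expect writes on the first call and reads on the second, if the prefix is long enough.
        print(msg.usage.cache_creation_input_tokens, msg.usage.cache_read_input_tokens)
```

If both numbers stay at zero, the prefix is probably under the minimum length, and you are paying full price for the example on every call.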

Takeaways

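Few-shot is a tool, not a default. The 2022 advice suited 2022 models. Three model generations later, it's a tool people reach for without ever measuring it.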

Three rules to steer by:

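  • Default to zero-shot. Add examples only when an ablation shows they help on your task.
  • For reasoning models, lean even harder toward zero-shot. Examples compete with the thinking trace.
  • Example length matters more than count. A long example is a tax on every request.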

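Those thirty lines pay for themselves the first time you delete a 600-word few-shot example that was costing you both accuracy and money.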

Which of your prompts got better when you deleted the examples? Share your before-and-after numbers in the comments.


If this was useful

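The Prompt Engineering Pocket Guide goes deeper on what each technique actually buys you: zero-shot, few-shot, chain-of-thought, self-consistency, and the structured-output patterns that have quietly replaced half the old tricks. Its treatment of ablation pairs well with this post: how to test what works, how to read why answers are wrong, and how to tell when a "best practice" has become cargo cult for your particular task.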

Prompt Engineering Pocket Guide: Techniques for Getting the Most from LLMs