惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

DEV Community

How to Pass the EAA 2025 Accessibility Audit — A Step-by-Step WCAG Checklist Building an Autonomous MCP Lead Generation System with Hermes Agent LangGraph 워크플로우 템플릿 (v40) How I Built 100 Browser-Based Image Tools With No Server (FFmpeg WASM, PDF-lib, AI Background Removal) Nginx CVE-2026-9256, AI Prompt Injection Defenses, and Claude AI Data Leak Demo Scaling RAG for 10M+ Docs, .md Agent Memory, & Claude Code for Motion Graphics Diagram as Code with draw.io DuckDB Delta, PostgreSQL 17 Migration, & SQLite Optimization Deep Dives Windows 11 Microsoft Account Login Recovery During Internet Restrictions The Linux Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again) Spec-Driven Development Without an IDE: I Generated NestJS, Go, Spring Boot, Laravel, and Rust Apps From a Single PRD File Components are states Edge SEO y Middleware: Cómo Interceptar a Googlebot y LLMs antes de llegar a tu Servidor Context window exceeded at turn 23. Here's how I track token usage without a tokenizer. My Hermes agent spent $3 before I noticed. Now it can't. My Hermes agent's stop condition was a 40-line if/elif chain. I replaced it with 3 lines. My agent kept hitting context limits. This one function fixed it. Create and configure Azure Firewall Your Hermes agent's audit log is leaking customer emails. Here's a 100-line lib that fixes that. My agent kept forgetting what it was doing. A scratchpad fixed it. I replaced 200 lines of ad-hoc state management in my Hermes agent with one object. Per-Key Rate Limiting for Agent Tool Calls: Stop One User From Breaking Everything Composable Output Guardrails: Filter Agent Responses Before They Reach Users Sanitize Your LLM Message Lists Before Every API Call Thread a Run ID Through Every Agent Call So You Can Debug Anything Normalize Provider Error JSON So Your Agent Can Actually Handle Failures Priority Queue for Agent Sub-Tasks: Stop Processing Low-Priority Work First Static Lint Rules for Your LLM Prompts (Before They Hit Production) tool-call-budgets: Stop Runaway Agent Loops Before They Hit Your Invoice Step Through Your Agent's Failures Like a Debugger The Simplest Stop Condition: A Hard Cap on Agent Loop Iterations Score Your Agent's Responses With a 0.0-1.0 Rubric (No LLM Judge Required) Fix Bad Structured Output by Feeding the Error Back to the Model Building an effective Storyblok Tool Plugin with SvelteKit How to Get Your Renault / Dacia Radio Code for Free RAG 시스템 실전 구축 (v39) Retraction — scrml’s Living Compiler I built a fitness app where the AI roasts you for eating pizza (and hypes you when you PR) The Top SaaS Founder Communities on Discord (Beyond the AI Hype) I Built a Production-Grade Async Job Queue from Scratch — Here's Everything That Actually Happened How to watch SMS from multiple Android phones in one iOS app We Didn’t Want Another AI Wrapper — So We Explored a High-Speed Hermes Orchestrator for Engineering Crews Multi-tenant além do TenantId: problemas reais e aprendizados em sistemas .NET After failing 23 times, I am sharing How I Actually Prepare for a Tech Interview Every Single Time Now. I built an app that works like a nutritionist for your brain. Here's what happened in 7 days. GoBadge Dynamic: From Module Stats to Universal Badges LangGraph 워크플로우 템플릿 (v39) The git Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again) Six Levels of MCP Servers One container to replace Grafana + Loki + Tempo + Prometheus The Request/Response Cycle, HTTP, Auth, JWT, OAuth & Sessions — Explained Properly Python Week 3: We Stopped Repeating Ourselves (Loops!) Creating a Custom Grid Editor tool in Unreal Engine 我做了个付费 Telegram bot。Telegram Stars 实际给开发者多少钱,我算了一笔账。 I Got 96% Recall on LLM Hallucination Detection With No ML Model – Just 50 Lines of Python A practitioner's guide to getting more value out of AI coding: agent quality & token optimization How to Handle Telegram Albums in Telegraf I Built a Multilingual Spam Detection Dataset with 149K+ Messages Across 23 Languages How to Handle Telegram Albums in grammY RAG 시스템 실전 구축 (v38) Beyond Pip Install: Why Your AI Agent Needs a "Hermetic" Life-Support System to Survive Resume Building using HTML & CSS SpecFlow: Multi-Agent SDD in Cursor (4 phases, /approve, single code writer) Running ASR for smart homes in the NPU of Intel processors "Building a CI/CD Pipeline From Scratch: A Practical Guide for Developers (with GitHub Actions)" SpecFlow: SDD multi-agente en Cursor (4 fases, /approve, un solo escritor de código) How to Extract Your Full Team Hierarchy from HubSpot (the API doesn't expose it) Adobe Commerce Cloud now costs $40k/year. We migrated from Adobe Commerce to Magento Open Source — here's the honest breakdown .klickd v4.0.0 — Portable AI memory with constraints, strict schemas, and test vectors We Trust Third Party Code, It’s Time to Trust AI Generated Code LangGraph 워크플로우 템플릿 (v38) Sustainable AI Starts with Efficient AI Find Remove duplicated files in Google Drive How to Detect GPU Waste in a Kubernetes Cluster The Privacy Bug in My First Chrome Extension (And How to Avoid It) Serverless Mental Models: What They Don't Tell You Before You Build Preventing GPT hallucination in automated content pipelines: how I structure Make.com flows with data injection Hmm, where were we? AI Visibility Tools, Math Proofs, and Stripped Guardrails Shape Developer Landscape How AI and Electronics Are Changing Healthcare Devices: The Future of Smart Healthcare Author: Shivam Wakade | Founder, PrivSR Making Claude Sound Like Optimus Prime Understanding Reinforcement Learning with Human Feedback Part 5: Training the Reward Model with Loss Functions Learning Progress Pt.20 How Secure LoRa Communication Devices Work: Building the Future of Private and Long-Range Connectivity Author: Shivam Wakade | Founder, PrivSR How I Rebuilt an RPG Map Editor with Rust, React, and WASM Building a System That Automates YouTube Post-Production Building a 100% Serverless Digital Asset Packager in the Browser Game Recommended AI What is Human-In-The-Loop (HITL)? Deep Dive: React Server Components in TanStack Start Migrating off Google Analytics: Umami vs Plausible vs Fathom Building a Portfolio That Actually Demonstrates Software Engineering Async/Await in JavaScript: From Callbacks to Clean Code (2026) Benchmarking LLM Structured Outputs Angular 21 Multiselect Dropdown: A Migration-Friendly Component with Live Functional Tests ShareBox v5 — GPU transcoding, Netflix-style grid, and why I don't need Plex anymore TOML Schema is live Handling Duplicate Shopify Webhook Events (And Why You Must) Original Kubernetes Dashboard — retired upstream, upgraded to Angular 21. لماذا أسست ترينافو للتجار العرب الذين تتجاهلهم المنصات الغربية
A prompt is not a conversation. It's a component contract.
Carlos Salda · 2026-05-26 · via DEV Community

Most of us use LLMs by trial and error. This post gives you a structure: the building blocks of an LLM, and a reusable template for writing production prompts.

What is an LLM?

Foundation models are very large models pretrained on internet-data; that's what builds Generative AI. With a foundation model, you can adapt one pretrained model to many tasks.

A Large Language Model (LLM) is a foundation model for text, and at its core, the same FM can be used for many tasks: summarisation, classification, translation, code generation.

So what an LLM does is predict the next word (next token) in a sequence. At each step it checks the surrounding context of what it has seen so far and then produces a probability distribution over the possible next tokens. Running this in a loop produces fluent, coherent new content. So basically, an LLM gets text and returns text. The input is called the prompt and the output is called completion.

What goes into a prompt

A prompt is any input given to a generative model to produce a desired output. Prompt engineering is the practice of designing and refining those prompts to get the best possible results from the model. Refining a prompt means experimenting with the factors that influence the model's output. A vague prompt can lead to many reasonable responses; every constraint you add reduces the number of possible responses.

A well-structured prompt is usually assembled from four parts.

Building block Role
Instruction The task you want performed.
Context Relevant background that frames the situation for the model.
Input data The specific content the task should operate on.
Output indicator A description (or example) of the form the response should take.

The instruction and context here are two of the template slots, Task and Context, we'll pull together at the end.

And best practice for writing prompts can be organised into four dimensions. Strong prompts attend to all four.

Dimension Practice
Clarity Use simple, direct language. Avoid ambiguous or overly complex terminology so the prompt is easily understood.
Context Provide relevant background and specific details to guide the model's understanding of the situation.
Precision Clearly state the type of response you want, and use examples to illustrate the expected output.
Role-play / Persona Write the prompt from the perspective of a specific character or expert, with enough detail for the model to assume that role effectively.

Think of a prompt like a search beam. Vague = wide beam, the model lands somewhere in a large valid region. Each constraint narrows the beam. Specificity isn't politeness toward the model; it's aiming.

Who reads the output: code or a person

The output is the more interesting part for us. An LLM's output has two possible audiences, a parser and a person, and you write a contract for each. Even when a person reads the output, "looks fine" isn't the same as "matches what I needed." When an LLM's output is read by code instead of eyes, the output is an API response and the prompt is its schema. A human forgives a messy answer; json.loads() doesn't. It either succeeds or throws. Without an explicit spec, the model decides format, length, tone, and depth, and it picks something plausible but generic. Controlling output here means moving that decision from the model to you.

Two kinds of control:

  • Format control: the shape of the output (JSON keys and types; or prose vs. bullets vs. table, headings, sections).
  • Behavior control: what the model may and may not do (e.g. "valid Terraform only, no comments"; or "concise executive tone, no technical jargon").

Each audience fails differently, and one of them fails silently:

Parser (output read by code) Person (output read by a human)
What breaks Markdown fences, chatty preamble, invented/renamed keys, numbers as strings Too long, wrong tone, missing a section, pitched at the wrong level
How it breaks Loud: json.loads() throws, the pipeline stops; you notice immediately Quiet: output looks fluent and complete; the gap only shows on a second read, and nobody flags it

A loud failure is annoying; a quiet failure is more dangerous because a subtly off-target answer can go unnoticed.

There are some mitigations you can apply for better results:

For the parser:

  1. Specify the exact schema.
  2. Forbid the noise.
  3. Give one example of the exact shape.
  4. Set temperature: 0.
  5. Use native structured outputs / tool-calling.

For the person:

  1. State the structure explicitly.
  2. Give a length budget.
  3. Name the audience.
  4. List the required elements.
  5. For a recurring format, show one example of the desired output.

So the point here is that prompt-only format control is a request; decode-time constraints are a guarantee. Treat the model's output as an API response, and the prompt as its schema.

Constraints and Output are template slots too: the rules that prune behavior, and the exact shape you contract for.

The system and user split

We already saw the role as one of the four dimensions of an effective prompt. It's the part lots of people forget about.

A prompt can include three types of messages: system, user, and assistant.

  • System message: defines the model's behavior, rules, and overall role.
  • User message: contains the request or input for the current task.
  • Assistant message: previous responses from the model, used as conversational context or examples.

The "type" is usually set through the role field of each message.

The system message is the component's configuration, while the user message is the input for a specific call.

The system prompt defines persistent behavior and rules. The user message provides the variable data for that particular request. The system sets how the component behaves in general; the user message tells it what to do right now.

Why do roles work? It's not magic.

When you say, "Act as a senior security engineer," the model shifts its output toward patterns statistically associated with that kind of writing in its training data.

Likewise, "Explain this to a junior developer" pushes the model toward simpler, more educational, and more heavily explained responses.

A role doesn't give the model a real personality. It changes the probability distribution of the kind of text the model is likely to generate. This is the Role slot, the first line of the template.

Why the system/user split matters in production:

  1. Caching. The system message is usually stable and reused across requests, which makes it ideal for prompt caching. Keep the system prompt consistent, and place most of the changing data in the user message.
  2. Testability. The system prompt is one of the highest-leverage parts of an AI application. Treat it like code: version it, compare changes, and test it carefully.
  3. Security. Trusted instructions should live in the system message. Untrusted content (user input, retrieved documents, tool outputs) should stay in the user message. When untrusted text lands somewhere the model treats as instructions, you get prompt injection. Clean separation is the first line of defense.

Showing the model examples

One good prompt gets you far. A few techniques get you further. The most useful one: showing the model examples.

In few-shot prompting, the prompt includes a few worked demonstrations. The model uses in-context learning to infer the pattern and apply it to the new input.

The few-shot examples are unit tests that double as a spec.

Why it works: the model is a pattern continuator. A few input→output pairs establish a strong, low-ambiguity pattern, and the model continues it.

When to use which:

  • Zero-shot: simple, common tasks the model has clearly seen many times.

    [
      {
        "role": "user",
        "content": "Summarize this article in 3 bullet points."
      }
    ]
    
  • One-shot: when you mainly need to pin the format.

    [
      {
        "role": "user",
        "content": "Convert countries to JSON.\n\nExample:\nFrance -> {\"country\": \"France\"}\n\nNow convert:\nBrazil"
      }
    ]
    
  • Few-shot: classification, structured output, code style, anything with a specific schema or edge cases the model wouldn't guess.

Those worked demonstrations are the Examples slot.

In the example below, the last message is the new input; the model produces the assistant turn.

[
   {"role": "user",      "content": "Today the weather is fantastic"},
   {"role": "assistant", "content": "positive"},
   {"role": "user",      "content": "I don't like your attitude"},
   {"role": "assistant", "content": "negative"},
   {"role": "user",      "content": "That shot selection was awful"},
]

Enter fullscreen mode Exit fullscreen mode

Tokens are what you pay for

LLMs are not free, and the token is the meter. For an LLM, it's the unit of both cost and latency. For most of us the mental model is familiar: tokens are network payload, and each LLM call is a billable, latency-bearing API request.

Output tokens dominate latency. The model generates text one token at a time, and each new token depends on all the previous ones, so generation has to happen sequentially.

Input works differently: the prompt is processed in a single parallel "prefill" pass, which is relatively fast (even though you still pay for those tokens).

Four levers do most of the work:

  1. Compress the input

    Instead of sending entire documents, summarize them first or extract only the relevant parts. Most prompts ship context the model never reads.

  2. Limit and structure the output

    Set max_tokens, prefer structured formats like JSON or arrays instead of long prose, and ask for concise summaries such as "keep under 120 tokens."

  3. Use the right model for the task

    Small models are faster and cheaper for simple work like classification, routing, or extraction. Save the stronger models for tasks that actually require reasoning or synthesis.

  4. Use caching

    There are two different kinds:

- **Prompt/prefix caching**

    Reuses stable prompt sections like system prompts, examples, or reference documents. Since the provider caches this server-side, you avoid recomputing the expensive input processing step.

    Practical implication: put stable content first and variable content last to maximize cache hits.

- **Response/semantic caching**

    Your own infrastructure stores previous answers and reuses them when the same or a very similar request appears again. This caches outputs, not prompts.

Enter fullscreen mode Exit fullscreen mode

A Reusable Prompt Template

All that we've seen here gives us the structure to build a template to use when working on a production prompt.

The template R-T-C-C-E-O:

[ROLE]        Who the model acts as. Sets the output distribution.
[TASK]        The one thing to do, stated unambiguously.
[CONTEXT]     Inputs, background, data; clearly delimited from instructions.
[CONSTRAINTS] Rules that prune the space. Each maps to a real failure mode.
[EXAMPLES]    1–3 representative input→output pairs (for structured/edge-case tasks).
[OUTPUT]      The exact shape of the response. Schema + example if machine-consumed.

Enter fullscreen mode Exit fullscreen mode

Not every prompt needs all six, but every prompt should be a deliberate subset, not an accident.

Finally, let's make the production checklist, a pre-flight pass before a prompt ships:

  • Role set and matched to the task?
  • Task: could a competent stranger misread it?
  • Constraints: does each map to a failure mode you've actually seen? Drop the decorative ones.
  • Output contract: explicit; schema + example if consumed by code?
  • Format guarantee: decode-time constraint (structured output / tool call), not just a worded request?
  • Examples: present for classification/structured work; diverse, consistent, minimal?
  • Sampling: temperature matched to the task?
  • Token budget: max_tokens capped; output format compact?
  • Model: right-sized, not just "the strong one"?
  • Cache-friendliness: stable content first, variable content last?
  • Injection safety: instructions in the system message, data in the user message?
  • Versioned & tested: in source control with a few regression cases?

So we covered six slots: Role, Task, Context, Constraints, Examples, Output, plus a pre-flight checklist. Run every production prompt through both, and you've turned a hopeful request into a tested component. A prompt is not a conversation. It's a component contract.