AI Agents in Practice — Part 2: What Makes Something an Agent
Gursharan Si · 2026-05-23 · via DEV Community

Part 1 ended with Priya's order shipped and the agent confidently refunding her anyway.

Here's the same request, in a system that's been built differently:

"Hi, I'd like to cancel order #4471 and get a refund."

The system reads the order status — shipped. It sees that the cancellation procedure requires the order not to be shipped. It doesn't try to cancel. It doesn't apologize and ask if there's anything else. It says:

"Order #4471 already shipped yesterday. Automatic cancellation only applies before shipment. I can start a return when it arrives, or connect you with a human agent right now. Which would you prefer?"

Then it stops and waits.

Nothing about that response required a smarter model. The model is the same one that confidently refunded Priya in Part 1. What changed is the system around the model.

This article is about what that system actually is.

Same Request, Different System

The Part 1 cancellation case wasn't a story about a bad agent. It was a story about a system that didn't have the right pieces in the right places.

Walk through what the "different system" did, without naming the pieces yet:

  • Before acting, it checked the actual state of the order.
  • It compared that state against the procedure that governed what's allowed — and "don't cancel" was a legitimate path, not an exception.
  • It offered the customer alternatives that fit the actual situation.
  • It stopped and waited for the customer to choose, instead of confidently picking one.

Notice what's not in that list: smarter natural language, better wording in the system prompt, a more advanced model. Every difference is structural. The system made room for the right decision to be made.
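A minimal sketch of those four moves in Python, under stated assumptions: the in-memory `ORDERS` table, the `Reply` type, and `handle_cancel_request` are hypothetical names for illustration, not part of this series' build. The point is only that refusing to cancel is an ordinary return path.

```python
from dataclasses import dataclass, field

# Hypothetical order store; a real system would read this through a tool.
ORDERS = {"4471": {"status": "shipped"}, "4472": {"status": "processing"}}


@dataclass
class Reply:
    message: str
    options: list = field(default_factory=list)  # choices offered to the customer
    waiting_for_customer: bool = False
    cancelled: bool = False


def handle_cancel_request(order_id: str) -> Reply:
    # 1. Check the actual state before acting.
    status = ORDERS[order_id]["status"]

    # 2. Compare state against the procedure: "don't cancel" is a normal path.
    if status == "shipped":
        # 3. Offer alternatives that fit the situation, 4. then stop and wait.
        return Reply(
            message=f"Order #{order_id} has already shipped.",
            options=["start_return", "human_agent"],
            waiting_for_customer=True,
        )

    ORDERS[order_id]["status"] = "cancelled"
    return Reply(message=f"Order #{order_id} is cancelled.", cancelled=True)
```

Calling `handle_cancel_request("4471")` leaves the order untouched and returns two options with `waiting_for_customer` set, which is the "stop and wait" behavior described above.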

Part 1's three gaps — state awareness, stopping condition, and escalation path — all had structural answers here.

How those pieces actually compose into a working agent is Part 6's full build. For now, the point is just: the system did things in the right order, with the right checks, and used composition where the broken agent used prompt stuffing.

What Changed Is the Loop, Not the Model

The model is one component. The agent is the system you build around it.

The simplest accurate way to describe an agent is: a loop that runs the model multiple times, with state that carries across turns and tools that let the model do things in the world.

The loop has five recognizable steps:

Observe → decide → act → check → repeat.

Step      What happens
Observe   Gather the current state: request, prior turns, last tool result, what's known.
Decide    The model picks the next step: call a tool, ask the user, or stop.
Act       The chosen step runs: a tool fires, a message goes out, a decision is recorded.
Check     The result comes back. The next observation includes it.
Repeat    Until done, blocked, or escalated.

That's the shape. It's not exotic. The loop itself is simple.

What makes an agent an agent is not the cleverness of the loop. It's the fact that the model gets to decide which step to take on every iteration. That's the move. Not a fixed script. Not a hard-coded flow. The model decides — within the boundaries the system gave it.
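The shape can be sketched in a few lines of Python. This is an illustration under assumptions, not the series' implementation: `run_agent`, the step dictionary layout, and `scripted_decide` (a stand-in for the model) are all hypothetical, and the turn limit is one simple stopping condition among many.

```python
def run_agent(request, decide, tools, max_turns=5):
    """Observe -> decide -> act -> check -> repeat, with a turn limit."""
    state = {"request": request, "history": []}
    for _ in range(max_turns):
        # Observe: state holds the request and every prior tool result.
        step = decide(state)  # Decide: the model picks the next step.
        if step["type"] in ("stop", "escalate", "ask_user"):
            return step["type"], state  # done, escalated, or blocked on the user
        result = tools[step["tool"]](**step["arguments"])  # Act
        state["history"].append((step["tool"], result))    # Check: feeds the next observation
    return "max_turns", state


# Hypothetical stand-in for the model: read the order status first, then stop.
def scripted_decide(state):
    if not state["history"]:
        return {"type": "tool", "tool": "get_order_status",
                "arguments": {"order_id": "4471"}}
    return {"type": "stop"}
```

The loop itself never chooses a tool. Swapping `scripted_decide` for a real model call changes who decides, not the structure around the decision.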

(The mechanics of how the loop actually works, including state, stopping conditions, and context as a finite resource, are covered in Part 3. For now, just hold the shape.)

The "different system" from earlier was running this kind of loop. The loop created room to read state before attempting cancellation. In some systems, the model may choose that step. In others, the system may require it as a gate. Either way, the important point is that the agent does not jump straight from request to action.

For contrast: a workflow runs steps the developer wrote in advance. An agent decides each step at runtime. Same pieces — different wiring. The diagram makes the difference visible.

Workflow vs. Agent — Same parts, different wiring. The workflow shows a fixed path from input to LLM, tool, LLM, and output, where the developer defines the steps. The agent shows an LLM calling a tool, receiving an observation, and looping back until done, with a dashed exit to output. The same LLM and tool pieces can exist in both systems; the difference is who decides the next step.
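The same contrast as a short Python sketch. `summarize` and `lookup` are hypothetical stand-ins for an LLM call and a tool, and `choose_next` stands in for the model's runtime decision; none of these names come from the article.

```python
def summarize(text):
    # Hypothetical stand-in for an LLM call.
    return text[:20]


def lookup(query):
    # Hypothetical stand-in for a tool.
    return f"result for {query}"


def workflow(user_input):
    # The developer fixed the path in advance: LLM -> tool -> LLM -> output.
    query = summarize(user_input)
    data = lookup(query)
    return summarize(data)


def agent(user_input, choose_next, max_turns=4):
    # choose_next (the model) picks each step at runtime, until it says "done".
    observations = []
    for _ in range(max_turns):
        step = choose_next(user_input, observations)
        if step == "done":
            break
        observations.append(lookup(step))
    return observations
```

Both functions use the same `lookup` tool. Only the second lets something other than the developer decide how many times it runs and with what input.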


Agents Compose Three Practical Primitives

An agent doesn't need to invent its capabilities from scratch. It composes three primitives that you've probably already encountered:

MCP — for acting.
Standardized way for the agent to call tools that do things in the world: query a database, call an API, run a calculation, send an email. The agent's "verbs."

This is the same MCP covered in the MCP in Practice series. New to MCP? You do not need that background to follow this article. For now, the mental model is enough: MCP helps the agent invoke tools through a clean protocol.

RAG — for knowing.
Retrieval that brings outside knowledge into the agent's context when it needs it: company policies, product documentation, historical case notes, eligibility rules.

This is the same RAG covered in the RAG in Practice series. New to RAG? Same here — this article is self-contained. For now, the mental model is enough: RAG helps the agent ground decisions in retrieved facts instead of relying only on what the model was trained on.

Skills — for following reusable procedures.
A markdown file that names a procedure the agent can apply repeatedly: when to use it, the steps, the failure modes, the approval rule. Instead of stuffing "if the order is shipped, escalate to a human" into the system prompt every turn, the skill file holds the procedure and the agent loads it when relevant.

For example, a cancel-order skill might say: check status first, refuse if shipped, offer the customer a return when applicable, and escalate if the customer asks for an exception. That keeps procedures versioned, reviewable, and loaded only when relevant instead of buried in one growing prompt. Skills become more important later when we talk about patterns, control surfaces, and production builds.
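A rough sketch of "loaded only when relevant", with several assumptions labeled: the skill text layout, the `SKILLS` registry, and the keyword-based `build_context` are all hypothetical. In a real agent the model or runtime would decide when to load a skill; keyword matching here only keeps the example self-contained.

```python
# Hypothetical skill content; the layout is an illustration, not a standard.
CANCEL_ORDER_SKILL = """\
# cancel-order
When to use: the customer asks to cancel an order.
Steps:
1. Check the order status first.
2. If the order has shipped, do not cancel; offer a return.
3. If the customer asks for an exception, escalate to a human.
"""

# Registry: trigger keywords -> skill text. In practice each entry would be
# read from a versioned file such as cancel-order/SKILL.md.
SKILLS = {("cancel", "refund"): CANCEL_ORDER_SKILL}

BASE_PROMPT = "You are a support assistant."


def build_context(user_message: str) -> str:
    """Attach a skill only when the request matches one of its triggers."""
    text = user_message.lower()
    loaded = [skill for triggers, skill in SKILLS.items()
              if any(word in text for word in triggers)]
    return "\n\n".join([BASE_PROMPT, *loaded])
```

A question about opening hours gets only the base prompt. A cancellation request gets the base prompt plus the procedure, so the system prompt does not grow with every rule.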

The agent's job is to decide when to use which.

That decision — which primitive applies right now — is the central agent move. Not all three on every turn. Often just one. Sometimes none, and the agent answers directly.

The cancellation system from earlier used a skill to name the procedure and MCP tools to read state and act. RAG can supply the policy details when the system needs the exact return policy text. The model didn't have to invent any of that — it picked from what the system already had, in the right order. Part 6 walks through the full composition end-to-end.

Three Primitives an Agent Composes — Acting, knowing, and following reusable procedures. An Agent container box sits at the top, with arrows descending into three columns: MCP for acting (when the agent needs to do something, example: call cancel_order), RAG for knowing (when the agent needs outside facts, example: retrieve return policy), and Skills for procedures (when the agent needs a reusable playbook, example: cancel-order/SKILL.md). Caption: The agent decides when to use which.


From Manual ReAct to Native Tool Calling

Manual ReAct treats the model's output as text your code has to parse. Native tool calling treats the model's output as structured intent your code can run. That single contract change is what this section is about.

Part 1 showed a manual ReAct prompt with a STRICT RULES section growing as the developer discovered new edge cases. That prompt was doing manual ReAct: the model returns a string in a specific format, regex extracts an "Action:" line, the system calls the named tool, the result gets stuffed back into the prompt as an "Observation:" line, and the cycle continues.

Manual ReAct is useful because it is easy to prototype and great for demos — you can see the model thinking and acting in one place, all in plain text. But in production, that same simplicity becomes brittle.

Three things break:

  1. The model has to format its output as a string the regex can parse. If the model phrases the action slightly differently — different capitalization, an extra word, a typo — the regex misses it and the agent stalls.

  2. Every rule about how the model should behave lives in the prompt. "Don't cancel shipped orders" is English. "Use the exact format Action: tool_name" is English. "Stop after final answer" is English. The model sometimes follows English rules and sometimes ignores them.

  3. Tool descriptions are part of the prompt text. Add a tool, the prompt gets longer. Change a tool, the prompt has to be edited. The prompt is doing the job of a schema, a parser, a state machine, and a procedure manual — all in one block.
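The first failure is easy to reproduce. The sketch below assumes a prompt that asks for the exact line `Action: tool_name`; the regex and `parse_action` are illustrative, not the parser from Part 1.

```python
import re

# The format the prompt asks the model to follow: "Action: tool_name"
ACTION_RE = re.compile(r"^Action: (\w+)$", re.MULTILINE)


def parse_action(model_output: str):
    """Return the tool name, or None when the text drifts from the format."""
    match = ACTION_RE.search(model_output)
    return match.group(1) if match else None


exact = parse_action("Thought: check the order\nAction: get_order_status")
lowercase = parse_action("Thought: check the order\naction: get_order_status")
extra_word = parse_action("Thought: check the order\nAction: call get_order_status")
```

Only `exact` yields a tool name. The lowercase and extra-word variants both come back as `None`, and the agent has nothing to run. Loosening the regex helps with one variant at a time, which is how the parsing rules start to sprawl.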

Native tool calling is the production move. It's not a new model capability; it's a different contract between the application and the model.

It does not fix Priya's refund failure by itself. But it gives the system a structural place to enforce "do not cancel shipped orders" as a check, instead of leaving it as one more sentence in a prompt.

In native tool calling:

  • Tool definitions live as structured schemas the model is given as a parameter to the API call, not as English in the prompt.
  • When the model wants to call a tool, it returns a structured tool-use block — not a string the application has to parse.
  • The application sees {"tool": "cancel_order", "arguments": {"order_id": "4471"}} directly. No regex. No format brittleness.
  • The system prompt shrinks. Format rules go away. Tool descriptions are no longer prose.

Structured tool calls don't enforce policy by themselves — the application or tool server still validates arguments, checks permissions, and rejects unsafe actions. The improvement is that those checks now happen at a structured boundary instead of being buried as another English rule in the prompt.

In plain language: instead of the model writing Action: cancel_order in text and your code parsing it, the model returns a structured object your app can read directly. The "schema" is the formal description of what tools exist and what arguments they take; the "tool-use block" is what the model returns when it wants to call one. Both are objects, not text.
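A minimal sketch of that boundary, assuming a JSON-Schema-like tool definition and the tool-call shape shown above. `CANCEL_ORDER_TOOL`, `ORDER_STATUS`, and `handle_tool_call` are hypothetical; real provider APIs differ in field names, so treat this as the idea rather than any vendor's format.

```python
# Hypothetical tool schema, written in a JSON-Schema-like shape.
CANCEL_ORDER_TOOL = {
    "name": "cancel_order",
    "description": "Cancel an order that has not shipped.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

ORDER_STATUS = {"4471": "shipped", "4472": "processing"}  # hypothetical state


def handle_tool_call(call: dict) -> dict:
    """Validate a structured tool call, then enforce policy before acting."""
    if call.get("tool") != CANCEL_ORDER_TOOL["name"]:
        return {"ok": False, "error": "unknown tool"}

    args = call.get("arguments", {})
    required = CANCEL_ORDER_TOOL["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return {"ok": False, "error": f"missing arguments: {missing}"}
    if not isinstance(args["order_id"], str):
        return {"ok": False, "error": "order_id must be a string"}

    # Policy lives in code at the boundary, not as a sentence in the prompt.
    status = ORDER_STATUS.get(args["order_id"])
    if status is None:
        return {"ok": False, "error": "unknown order"}
    if status == "shipped":
        return {"ok": False, "error": "order already shipped; offer return or escalate"}

    ORDER_STATUS[args["order_id"]] = "cancelled"
    return {"ok": True}
```

Passing `{"tool": "cancel_order", "arguments": {"order_id": "4471"}}` is rejected with a structured error the loop can observe on its next turn, and the order stays shipped.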

That structural change is where the fix starts — not where it ends.

MCP fits into this picture as the protocol layer.

Native tool calling is the contract between one model and one application. MCP is the standardized contract between the application and many tool servers. Native tool calling structures the model-to-app boundary; MCP structures the app-to-tool-server boundary.

Critically: native tool calling and MCP compose. They are not competitors. A production agent uses native tool calling on the model side and MCP on the tool-server side. The series uses both, most fully in Part 6's build.
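To make the two boundaries concrete, here is a hedged sketch of the hand-off: the application takes the model's structured tool call and wraps it in an MCP `tools/call` request, which is a JSON-RPC 2.0 message. `to_mcp_request` is a hypothetical helper; transport, session setup, and the server itself are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def to_mcp_request(tool_call: dict) -> str:
    """Wrap a model's structured tool call in an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call["arguments"],
        },
    })
```

The model-side object and the server-side message carry the same tool name and arguments. The application sits between them, which is where validation and policy checks belong.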

(If MCP or RAG is new, I have separate series on both; here we only need the mental model: MCP helps the agent act, RAG helps it know. The agent uses each the same way a non-agent system would.)

Manual ReAct vs. Native Tool Calling — Same agent, same task, different contract. The left panel labeled Manual ReAct shows everything in the prompt: one tall gray-tinted box with a stuffed system prompt containing tools described in prose, format spec for Thought/Action/Action Input cycles, a STRICT RULES section, and a stopping rule. Below it, the Model outputs raw text like Action: cancel_order, passes through a parse/regex step with dashed outline signaling fragility, then reaches a tool call. A dashed arrow drops to a label reading parse failure if format slips. The right panel labeled Native tool calling shows three separate stacked boxes: a short purple system prompt with just role and tone, a blue tool schemas box with structured tool definitions, and a tool call box showing the structured emission. Below it, the Model outputs a structured JSON object that passes through a runtime validates step with solid outline signaling stability, then reaches a tool call — no failure fork. Caption: Same task. Different contract: parse text vs. run structured intent.


Agents vs Chatbots vs Workflows

The word "agent" gets used for several different things. Some of them are agents. Some of them are not. The distinction isn't snobbery — different systems have different failure modes, and confusing them leads to building the wrong thing.

Chatbot.
Reply-only. The user says something; the model replies. It may remember conversation history, but it does not call tools, take actions in the world, or run a control loop.
Failure mode: makes things up confidently when it doesn't know.

Workflow.
A controller (not the model) decides which step happens next, based on conditions. The model is called inside specific steps to do specific work, but the model isn't choosing what step to take. A prompt chain is the simplest case: a workflow with one fixed path, where every step always runs in the same order.
Failure mode: edge cases the controller's branching logic didn't anticipate fall through.

Agent.
The model decides what step to take on each turn, within designed boundaries. State persists across turns. Tools are available. The loop continues until done, blocked, or escalated.
Failure mode: confident-and-wrong decisions, plus the three gaps Part 1 named: missing state awareness, no stopping condition, and no escalation path.

Workflows are not lesser agents. For many production problems, a workflow is the right answer — the path is well-known, the steps are stable, the model doesn't need to decide what comes next. Part 5 of this series is about when to choose which.

The line is not "smart vs dumb." The line is who decides what happens next — and how much room the system gives the model to be wrong.

The Line That Defines an Agent

The important design question is not which model you picked. It is what the system allows the model to decide.

That's the identity move of this series.

Bounded autonomy: model-driven choice inside designed boundaries. The boundaries are real engineering — what tools the agent has, what state it can read, what state it can write, what actions require approval, what escalation paths exist, what the stopping condition is. The system composes three primitives (MCP, RAG, Skills) and gives the model the room to choose between them — and the room to say "I shouldn't be the one to do this."
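One way to picture those boundaries as code, assuming a deliberately small subset (allowed tools, approval rules, a turn limit). `Boundaries` and `check_step` are hypothetical names for this sketch; a production system would also cover readable and writable state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundaries:
    allowed_tools: frozenset   # what the agent can call
    needs_approval: frozenset  # actions a human must approve first
    max_turns: int             # stopping condition


def check_step(step: dict, turn: int, limits: Boundaries) -> str:
    """Return 'run', 'ask_approval', 'escalate', or 'stop' for a proposed step."""
    if turn >= limits.max_turns:
        return "stop"
    if step["type"] == "escalate":  # the model may decline to act
        return "escalate"
    if step["tool"] not in limits.allowed_tools:
        return "escalate"
    if step["tool"] in limits.needs_approval:
        return "ask_approval"
    return "run"
```

The model still proposes every step. The system decides whether that proposal runs, waits for a person, or ends the loop.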

What makes something an agent isn't how smart the model is. It's what the system lets the model decide.

That decision shows up across the rest of the series. Part 3 opens the loop: state, stopping, and context as production concerns. From there, the series builds outward into patterns, tradeoffs, the TechNova build, diagnostics, evaluation, and guardrails.


Three takeaways

  1. An agent is a control loop with tools, knowledge, and a stopping condition. Five words: observe → decide → act → check → repeat. The model chooses the step. The system gives it room and limits.

  2. Agents compose MCP for acting, RAG for knowing, and Skills for following reusable procedures. The agent decides when to use which.

  3. What makes something an agent isn't how smart the model is. It's what the system lets the model decide.


We have the components. We have the primitives. We have the boundary between manual ReAct and native tool calling. What we do not have yet is the actual loop — what happens turn by turn when the agent runs. That is where state, stopping, and context become engineering problems instead of definitions. That is Part 3.