AI Agents in Practice — Part 2: What Makes Something an Agent

Part 1 ended with Priya's order shipped and the agent confidently refunding her anyway.

Here's the same request, in a system that's been built differently:

"Hi, I'd like to cancel order #4471 and get a refund."

The system reads the order status — shipped. It sees that the cancellation procedure requires the order not to be shipped. It doesn't try to cancel. It doesn't apologize and ask if there's anything else. It says:

"Order #4471 already shipped yesterday. Automatic cancellation only applies before shipment. I can start a return when it arrives, or connect you with a human agent right now. Which would you prefer?"

Then it stops and waits.

Nothing about that response required a smarter model. The model is the same one that confidently refunded Priya in Part 1. What changed is the system around the model.

This article is about what that system actually is.

Same Request, Different System

The Part 1 cancellation case wasn't a story about a bad agent. It was a story about a system that didn't have the right pieces in the right places.

Walk through what the "different system" did, without naming the pieces yet:

Before acting, it checked the actual state of the order.
It compared that state against the procedure that governed what's allowed — and "don't cancel" was a legitimate path, not an exception.
It offered the customer alternatives that fit the actual situation.
It stopped and waited for the customer to choose, instead of confidently picking one.

Notice what's not in that list: smarter natural language, better wording in the system prompt, a more advanced model. Every difference is structural. The system made room for the right decision to be made.

Part 1's three gaps — state awareness, stopping condition, and escalation path — all had structural answers here.

Part 6's full build shows how those pieces compose into a working agent. For now, the point is just this: the system did things in the right order, with the right checks, and used composition where the broken agent used prompt stuffing.

What Changed Is the Loop, Not the Model

The model is one component. The agent is the system you build around it.

The simplest accurate way to describe an agent is: a loop that runs the model multiple times, with state that carries across turns and tools that let the model do things in the world.

The loop has five recognizable steps:

Observe → decide → act → check → repeat.

Step	What happens
Observe	Gather the current state — request, prior turns, last tool result, what's known.
Decide	The model picks the next step: call a tool, ask the user, or stop.
Act	The chosen step runs — a tool fires, a message goes out, a decision is recorded.
Check	The result comes back. The next observation includes it.
Repeat	Until done, blocked, or escalated.

That's the shape. It's not exotic. The loop itself is simple.

What makes an agent an agent is not the cleverness of the loop. It's the fact that the model gets to decide which step to take on every iteration. That's the move. Not a fixed script. Not a hard-coded flow. The model decides — within the boundaries the system gave it.
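To make the shape concrete, here is a minimal sketch of the loop in Python. Everything in it is an assumption for illustration: the in-memory ORDERS store, the get_order_status tool, and especially decide(), which is a hand-written stand-in for the model call. A real agent would send the state to a model and get the next step back.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory order store standing in for a real backend.
ORDERS = {"4471": {"status": "shipped"}}

def get_order_status(order_id: str) -> dict:
    """Tool: read the current state of an order."""
    return {"order_id": order_id, "status": ORDERS[order_id]["status"]}

TOOLS = {"get_order_status": get_order_status}

@dataclass
class State:
    request: str
    order_id: str
    history: list = field(default_factory=list)  # prior steps and their results

def decide(state: State) -> dict:
    """Stand-in for the model call: pick the next step from what is known."""
    if not state.history:
        return {"type": "tool", "name": "get_order_status",
                "arguments": {"order_id": state.order_id}}
    last = state.history[-1]["result"]
    if last["status"] == "shipped":
        return {"type": "ask_user",
                "message": "Order already shipped. Start a return, or talk to a human?"}
    return {"type": "stop", "message": "Order can be cancelled."}

def run_agent(state: State, max_steps: int = 5) -> dict:
    for _ in range(max_steps):                                  # repeat
        step = decide(state)                                    # observe + decide
        if step["type"] != "tool":                              # done or blocked: hand back
            return step
        result = TOOLS[step["name"]](**step["arguments"])       # act
        state.history.append({"step": step, "result": result})  # check: feeds the next observation
    return {"type": "escalate", "message": "Step budget exhausted."}

outcome = run_agent(State(request="cancel order #4471 and refund", order_id="4471"))
print(outcome["type"], "-", outcome["message"])
```

The loop body is a handful of lines. The interesting part is that decide() is free to return a tool call, a question for the user, or a stop, and the loop honors whichever it picks.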

(The mechanics of how the loop actually works, including state, stopping conditions, and context as a finite resource, are the subject of Part 3. For now, just hold the shape.)

The "different system" from earlier was running this kind of loop. The loop created room to read state before attempting cancellation. In some systems, the model may choose that step. In others, the system may require it as a gate. Either way, the important point is that the agent does not jump straight from request to action.

For contrast: a workflow runs steps the developer wrote in advance. An agent decides each step at runtime. Same pieces — different wiring. The diagram makes the difference visible.

Workflow vs. Agent — Same parts, different wiring.
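The same contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real implementation: the step functions are hypothetical, the workflow's fixed path is deliberately naive (a real workflow controller could branch on status), and stub_model is a stand-in for the model's runtime choice.

```python
def check_status(ctx: dict) -> dict:
    ctx["status"] = "shipped"          # hypothetical lookup result
    return ctx

def offer_return(ctx: dict) -> dict:
    ctx["reply"] = "offer return or human handoff"
    return ctx

def cancel(ctx: dict) -> dict:
    ctx["reply"] = "cancelled"
    return ctx

# Same parts for both systems.
STEPS = {"check_status": check_status, "offer_return": offer_return, "cancel": cancel}

def run_workflow(ctx: dict) -> dict:
    """Workflow: the developer fixed the order of steps in advance."""
    for name in ("check_status", "cancel"):
        ctx = STEPS[name](ctx)
    return ctx

def run_agent(ctx: dict, choose_next, max_steps: int = 5) -> dict:
    """Agent: a chooser (the model, in a real system) picks each step at runtime."""
    for _ in range(max_steps):
        name = choose_next(ctx)
        if name is None:               # chooser decided to stop
            break
        ctx = STEPS[name](ctx)
    return ctx

def stub_model(ctx: dict):
    """Stand-in for the model's decision, based on what the context holds so far."""
    if "status" not in ctx:
        return "check_status"
    if "reply" in ctx:
        return None
    return "offer_return" if ctx["status"] == "shipped" else "cancel"

print(run_workflow({})["reply"])
print(run_agent({}, stub_model)["reply"])
```

Both functions call into the same STEPS table. The only difference is who holds the sequence: a tuple written by the developer, or a chooser consulted on every iteration.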

Agents Compose Three Practical Primitives

An agent doesn't need to invent its capabilities from scratch. It composes three primitives that you've probably already encountered:

MCP — for acting.
Standardized way for the agent to call tools that do things in the world: query a database, call an API, run a calculation, send an email. The agent's "verbs."

This is the same MCP covered in the MCP in Practice series. New to MCP? You do not need that background to follow this article. For now, the mental model is enough: MCP helps the agent invoke tools through a clean protocol.

RAG — for knowing.
Retrieval that brings outside knowledge into the agent's context when it needs it: company policies, product documentation, historical case notes, eligibility rules.

This is the same RAG covered in the RAG in Practice series. New to RAG? Same here — this article is self-contained. For now, the mental model is enough: RAG helps the agent ground decisions in retrieved facts instead of relying only on what the model was trained on.

Skills — for following reusable procedures.
A markdown file that names a procedure the agent can apply repeatedly: when to use it, the steps, the failure modes, the approval rule. Instead of stuffing "if the order is shipped, escalate to a human" into the system prompt every turn, the skill file holds the procedure and the agent loads it when relevant.

For example, a cancel-order skill might say: check status first, refuse if shipped, offer the customer a return when applicable, and escalate if the customer asks for an exception. That keeps procedures versioned, reviewable, and loaded only when relevant instead of buried in one growing prompt. Skills become more important later when we talk about patterns, control surfaces, and production builds.
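As a rough sketch of "loaded only when relevant", the Python below keeps the cancel-order procedure as markdown text and attaches it to the context only when the request matches. The Skill class, the when_to_use keywords, and the keyword matching itself are assumptions made for illustration; a real system might let the model or a router pick the skill instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    when_to_use: tuple   # hypothetical trigger keywords
    body: str            # the markdown procedure text

CANCEL_ORDER = Skill(
    name="cancel-order",
    when_to_use=("cancel", "refund"),
    body=(
        "# cancel-order\n"
        "1. Check order status first.\n"
        "2. Refuse if shipped.\n"
        "3. Offer a return when applicable.\n"
        "4. Escalate if the customer asks for an exception.\n"
    ),
)

SKILLS = [CANCEL_ORDER]
BASE_PROMPT = "You are a support agent for an online store."

def build_context(request: str) -> str:
    """Attach only the skills whose triggers match the request."""
    text = request.lower()
    relevant = [s.body for s in SKILLS if any(k in text for k in s.when_to_use)]
    return "\n\n".join([BASE_PROMPT, *relevant])

print(build_context("Hi, I'd like to cancel order #4471 and get a refund."))
```

A question about opening hours would get the base prompt alone, so the procedure costs no context on turns where it does not apply.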

The agent's job is to decide when to use which.

That decision — which primitive applies right now — is the central agent move. Not all three on every turn. Often just one. Sometimes none, and the agent answers directly.

The cancellation system from earlier used a skill to name the procedure and MCP tools to read state and act. RAG can supply the policy details when the system needs the exact return policy text. The model didn't have to invent any of that — it picked from what the system already had, in the right order. Part 6 walks through the full composition end-to-end.

Three Primitives an Agent Composes — Acting, knowing, and following reusable procedures.

From Manual ReAct to Native Tool Calling

Manual ReAct treats the model's output as text your code has to parse. Native tool calling treats the model's output as structured intent your code can run. That single contract change is what this section is about.

Part 1 showed a manual ReAct prompt with a STRICT RULES section growing as the developer discovered new edge cases. That prompt was doing manual ReAct: the model returns a string in a specific format, regex extracts an "Action:" line, the system calls the named tool, the result gets stuffed back into the prompt as an "Observation:" line, and the cycle continues.

Manual ReAct is useful because it is easy to prototype and great for demos — you can see the model thinking and acting in one place, all in plain text. But in production, that same simplicity becomes brittle.

Three things break:

The model has to format its output as a string the regex can parse. If the model phrases the action slightly differently — different capitalization, an extra word, a typo — the regex misses it and the agent stalls.
Every rule about how the model should behave lives in the prompt. "Don't cancel shipped orders" is English. "Use the exact format Action: tool_name" is English. "Stop after final answer" is English. The model sometimes follows English rules and sometimes ignores them.
Tool descriptions are part of the prompt text. Add a tool, the prompt gets longer. Change a tool, the prompt has to be edited. The prompt is doing the job of a schema, a parser, a state machine, and a procedure manual — all in one block.
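The first failure is easy to reproduce. The sketch below assumes a hypothetical "Action: tool_name[argument]" format and a regex written to match it exactly; neither is taken from the Part 1 prompt, they only illustrate the pattern.

```python
import re

# Hypothetical format the prompt asks the model to follow: "Action: tool_name[argument]"
ACTION_RE = re.compile(r"^Action: (\w+)\[(.+)\]$", re.MULTILINE)

def parse_action(model_output: str):
    """Return (tool, argument) if the output matches the expected format, else None."""
    match = ACTION_RE.search(model_output)
    return (match.group(1), match.group(2)) if match else None

outputs = [
    "Thought: check the order\nAction: get_order_status[4471]",       # exact format
    "Thought: check the order\naction: get_order_status[4471]",       # lowercase "a"
    "Thought: check the order\nAction: call get_order_status[4471]",  # one extra word
    "Thought: check the order\nAction: get_order_status(4471)",       # parentheses, not brackets
]

for text in outputs:
    print(parse_action(text))
```

Only the first output parses. The other three carry the same intent, but the parser returns None and the loop has nothing to run. Loosening the regex helps with one variation at a time, which is how the format rules in the prompt keep growing.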

Native tool calling is the production move. It doesn't require a smarter model; it's a different contract between the application and the model.

It does not fix Priya's refund failure by itself. But it gives the system a structural place to enforce "do not cancel shipped orders" as a check, instead of leaving it as one more sentence in a prompt.

In native tool calling:

Tool definitions live as structured schemas the model is given as a parameter to the API call, not as English in the prompt.
When the model wants to call a tool, it returns a structured tool-use block — not a string the application has to parse.
The application sees {"tool": "cancel_order", "arguments": {"order_id": "4471"}} directly. No regex. No format brittleness.
The system prompt shrinks. Format rules go away. Tool descriptions are no longer prose.
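Here is a minimal sketch of that contract from the application's side. The field names in the schema (name, description, input_schema) are illustrative, since each provider's API spells them a little differently, and cancel_order is a hypothetical handler. The tool-use block is the one shown above.

```python
import json

# Tool definition as structured data. Field names vary by provider; these are illustrative.
CANCEL_ORDER_TOOL = {
    "name": "cancel_order",
    "description": "Cancel an order that has not shipped yet.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}
TOOLS = {"cancel_order": CANCEL_ORDER_TOOL}

def cancel_order(order_id: str) -> dict:
    """Hypothetical tool implementation; a real one would call the order system."""
    return {"order_id": order_id, "cancelled": True}

HANDLERS = {"cancel_order": cancel_order}

def dispatch(tool_use: dict) -> dict:
    """Run a structured tool-use block: look up the tool, check required arguments, call it."""
    name, args = tool_use["tool"], tool_use["arguments"]
    required = TOOLS[name]["input_schema"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"{name}: missing arguments {missing}")
    return HANDLERS[name](**args)

# What the application receives when the model wants to call a tool.
tool_use = json.loads('{"tool": "cancel_order", "arguments": {"order_id": "4471"}}')
result = dispatch(tool_use)
print(result)
```

There is no regex anywhere in this path. Note that dispatch only checks the shape of the call; it says nothing yet about whether cancelling this particular order is allowed.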

Structured tool calls don't enforce policy by themselves — the application or tool server still validates arguments, checks permissions, and rejects unsafe actions. The improvement is that those checks now happen at a structured boundary instead of being buried as another English rule in the prompt.
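One way to picture that structured boundary: the "do not cancel shipped orders" rule becomes a check in code that runs before the action, and the refusal goes back to the model as data. The sketch below is an assumption-heavy illustration; ORDERS, PolicyViolation, and handle_tool_use are hypothetical names, and a real system would read order state from its own backend.

```python
# Hypothetical order state; a real system would query its order service.
ORDERS = {"4471": {"status": "shipped"}, "5002": {"status": "processing"}}

class PolicyViolation(Exception):
    """Raised when a requested action breaks a business rule."""

def cancel_order(order_id: str) -> dict:
    order = ORDERS.get(order_id)
    if order is None:
        raise PolicyViolation(f"unknown order {order_id}")
    if order["status"] == "shipped":
        # The rule lives here, in code, not as a sentence in the prompt.
        raise PolicyViolation("order already shipped; offer a return or escalate")
    order["status"] = "cancelled"
    return {"order_id": order_id, "status": "cancelled"}

def handle_tool_use(tool_use: dict) -> dict:
    """Run a tool call and return either a result or a structured refusal."""
    if tool_use["tool"] != "cancel_order":
        return {"ok": False, "error": f"unknown tool {tool_use['tool']}"}
    try:
        return {"ok": True, "result": cancel_order(**tool_use["arguments"])}
    except PolicyViolation as exc:
        return {"ok": False, "error": str(exc)}

print(handle_tool_use({"tool": "cancel_order", "arguments": {"order_id": "4471"}}))
print(handle_tool_use({"tool": "cancel_order", "arguments": {"order_id": "5002"}}))
```

Even if the model asks to cancel order 4471, the order stays shipped and the model's next observation includes the reason, which gives it something concrete to act on.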

In plain language: instead of the model writing Action: cancel_order in text and your code parsing it, the model returns a structured object your app can read directly. The "schema" is the formal description of what tools exist and what arguments they take; the "tool-use block" is what the model returns when it wants to call one. Both are structured data, not free-form text.

That structural change is where the fix starts — not where it ends.

MCP fits into this picture as the protocol layer.

Native tool calling is the contract between one model and one application. MCP is the standardized contract between the application and many tool servers. Native tool calling structures the model-to-app boundary; MCP structures the app-to-tool-server boundary.
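To see the two boundaries side by side, the sketch below takes the tool-use block from the model side and wraps it as an MCP tools/call request, which is a JSON-RPC 2.0 message. It is only the message shape: the transport, the initialization handshake, and response handling are omitted, and to_mcp_request is a hypothetical helper. A real application would normally go through an MCP client library rather than build messages by hand.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests carry an id so responses can be matched

def to_mcp_request(tool_use: dict) -> str:
    """Wrap a model tool-use block as an MCP tools/call JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_use["tool"], "arguments": tool_use["arguments"]},
    })

# Model-to-app boundary: the structured block the model returned.
tool_use = {"tool": "get_order_status", "arguments": {"order_id": "4471"}}

# App-to-tool-server boundary: the request the app sends on.
print(to_mcp_request(tool_use))
```

The application sits in the middle: structured intent arrives from the model, and a structured request goes out to whichever tool server owns that tool.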

Critically: native tool calling and MCP compose. They are not competitors. A production agent uses native tool calling on the model side and MCP on the tool-server side. The series uses both, most fully in Part 6's build.

(If MCP or RAG is new, I have separate series on both; here we only need the mental model: MCP helps the agent act, RAG helps it know. The agent uses each the same way a non-agent system would.)

Manual ReAct vs. Native Tool Calling — Same agent, same task, different contract.

Agents vs Chatbots vs Workflows

The word "agent" gets used for several different things. Some of them are agents. Some of them are not. The distinction isn't snobbery — different systems have different failure modes, and confusing them leads to building the wrong thing.

Chatbot.
Reply-only. The user says something; the model replies. It may remember conversation history, but it does not call tools, take actions in the world, or run a control loop.
Failure mode: makes things up confidently when it doesn't know.

Workflow.
A controller (not the model) decides which step happens next, based on conditions. The model is called inside specific steps to do specific work, but the model isn't choosing what step to take. A prompt chain is the simplest case: a workflow with one fixed path, where every step always runs in the same order.
Failure mode: edge cases the controller's branching logic didn't anticipate fall through.

Agent.
The model decides what step to take on each turn, within designed boundaries. State persists across turns. Tools are available. The loop continues until done, blocked, or escalated.
Failure mode: confident-and-wrong decisions, plus the gaps Part 1 named: missing state awareness, no stopping condition, and no escalation path.

Workflows are not lesser agents. For many production problems, a workflow is the right answer — the path is well-known, the steps are stable, the model doesn't need to decide what comes next. Part 5 of this series is about when to choose which.

The line is not "smart vs dumb." The line is who decides what happens next — and how much room the system gives the model to be wrong.

The Line That Defines an Agent

The important design question is not which model you picked. It is what the system allows the model to decide.

That's the identity move of this series.

Bounded autonomy: model-driven choice inside designed boundaries. The boundaries are real engineering — what tools the agent has, what state it can read, what state it can write, what actions require approval, what escalation paths exist, what the stopping condition is. The system composes three primitives (MCP, RAG, Skills) and gives the model the room to choose between them — and the room to say "I shouldn't be the one to do this."
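Some of those boundaries can be written down as plain data and checked on every proposed step. The sketch below is one possible shape, not the series' implementation: the Boundaries class, the tool names, and the authorize function are all hypothetical, and it covers only three of the boundaries listed above (available tools, approval rules, and a step limit).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Boundaries:
    allowed_tools: frozenset    # what the agent may call at all
    needs_approval: frozenset   # actions that require a human sign-off
    max_steps: int              # stopping condition for the loop

SUPPORT_AGENT = Boundaries(
    allowed_tools=frozenset(
        {"get_order_status", "cancel_order", "start_return", "escalate_to_human"}
    ),
    needs_approval=frozenset({"cancel_order"}),
    max_steps=8,
)

def authorize(b: Boundaries, tool: str, step: int, approved: bool = False) -> str:
    """Return 'run', 'ask_approval', 'escalate', or 'reject' for a proposed tool call."""
    if step >= b.max_steps:
        return "escalate"        # out of budget: hand off instead of looping
    if tool not in b.allowed_tools:
        return "reject"
    if tool in b.needs_approval and not approved:
        return "ask_approval"
    return "run"

print(authorize(SUPPORT_AGENT, "get_order_status", step=0))
print(authorize(SUPPORT_AGENT, "cancel_order", step=1))
print(authorize(SUPPORT_AGENT, "delete_account", step=2))
print(authorize(SUPPORT_AGENT, "start_return", step=8))
```

The model still chooses the tool. The system decides whether that choice runs, waits for approval, gets rejected, or ends in a handoff.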

What makes something an agent isn't how smart the model is. It's what the system lets the model decide.

That decision shows up across the rest of the series. Part 3 opens the loop: state, stopping, and context as production concerns. From there, the series builds outward into patterns, tradeoffs, the TechNova build, diagnostics, evaluation, and guardrails.

Three takeaways

An agent is a control loop with tools, knowledge, and a stopping condition. Five words: observe → decide → act → check → repeat. The model chooses the step. The system gives it room and limits.
Agents compose MCP for acting, RAG for knowing, and Skills for following reusable procedures. The agent decides when to use which.
What makes something an agent isn't how smart the model is. It's what the system lets the model decide.

We have the components. We have the primitives. We have the boundary between manual ReAct and native tool calling. What we do not have yet is the actual loop — what happens turn by turn when the agent runs. That is where state, stopping, and context become engineering problems instead of definitions. That is Part 3.
