From Chatbot to Agent — Tool Calling with NVIDIA NIM
Torkian · 2026-05-26 · via DEV Community

In Parts 1 through 4 we built a useful tool: a USC campus assistant that knows when to retrieve, when to refuse, and which endpoint to call. It is still a chatbot. The model writes a string; we print it. Everything interesting happened inside one model call.

This post turns it into an agent. By agent I mean something specific and small — the model can choose a tool from a list, your Python code runs that tool, and the result goes back into the conversation. That's it. No LangGraph, no AutoGen, no LangChain. Two functions, one loop, and a NIM call with tools=....

You'll watch the model decide for itself whether to consult the clock, search the USC knowledge base, or just answer directly. Once you see the loop, the framework abstractions on top of it are easier to read because you already know what they hide.

I'm B Torkian, NVIDIA Developer Champion at USC. Final part of the series.


What you're adding

User question
  → NIM call (with tools schema)
  → model returns either a final answer OR a tool_calls list
  → if tool_calls: run each one, append the result, NIM call again
  → repeat until model returns an answer (or hit the loop limit)


The chat call shape from Part 1 carries forward. The retriever from Part 2 becomes a tool. The guardrail pattern from Part 3 still applies — we keep the assistant scoped, and the agent only gets to use tools we expose.


What "agent" actually means here

Most marketing pages use agent to mean "anything with a memory or a loop." For this post the definition is narrower and worth pinning down up front:

  1. You describe a small number of Python functions to the model via a JSON schema (the tools parameter).
  2. The model returns either a normal message OR a tool_calls field with the name and arguments of the function it wants to run.
  3. Your code runs that function and appends the result to the message list as a tool role.
  4. You make another NIM call. The model sees the tool result and either calls another tool or writes the final answer.

That's the entire pattern. Real production agents add planning, retries, sub-agents, and observability. The center is still these four steps.
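The four steps can be traced as plain data. This is a minimal sketch, assuming the OpenAI-style chat message format that the loop in Step 4 uses; the call ID `call_1` and the tool output string are placeholders made up for illustration, and no model is called.

```python
import json

# Step 1 happens outside the message list: the tools schema is sent with each request.
messages = [
    {"role": "user", "content": "What time is it in Los Angeles?"},
]

# Step 2: the model replies with a tool call instead of text.
# The id "call_1" is a placeholder; the real value comes from the API.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_current_time",
                # Arguments arrive as a JSON string, not a dict.
                "arguments": json.dumps({"timezone": "America/Los_Angeles"}),
            },
        }
    ],
}
messages.append(assistant_turn)

# Step 3: your code runs the function and appends the result with role "tool".
call = assistant_turn["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
fake_result = f"(time in {args['timezone']})"  # stand-in for the real tool output
messages.append({"role": "tool", "tool_call_id": call["id"], "content": fake_result})

# Step 4: this whole list is what the next model call receives.
roles = [m["role"] for m in messages]
print(roles)
```

The order matters: the assistant turn that requested the tool sits between the user question and the tool result, which is how the model can match the result to its own request.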


Step 1 — Carry forward the setup, and bump the model

You need everything from Parts 1, 2, and 3 — the client, MODEL, ask, knowledge_base, embed_texts, and retrieve_context. A compact prerequisite cell is in the Colab notebook for this workshop. The standalone script part5_agent.py in the repo defines everything from scratch so you can run it without any prior cell.

One change worth flagging up front. Parts 1-4 used meta/llama-3.1-8b-instruct: fast, cheap, fine for chat and RAG. For Part 5 we switch to meta/llama-3.3-70b-instruct, because tool calling is noticeably more reliable on the larger model. I tested both: the 8B model picked the right tool inconsistently across reruns (on some runs it refused instead), while the 70B model behaved the same way every time. Reliability matters more than speed once a model has to choose between tools instead of just answering. Both run on the same hosted endpoint; only the MODEL string changes.

MODEL = "meta/llama-3.3-70b-instruct"   # was 'meta/llama-3.1-8b-instruct' in Parts 1-4
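If you want to repeat that comparison yourself, one way is to ask the same question several times and tally which tool the model picks. This is a rough sketch, not the harness used for this post: `tool_choice_counts` and `fake_pick_tool` are hypothetical names, and the fake stands in for a real NIM call so the snippet runs offline.

```python
from collections import Counter
from typing import Callable

def tool_choice_counts(pick_tool: Callable[[str], str], question: str, runs: int = 5) -> Counter:
    """Call pick_tool(question) `runs` times and tally what it returned."""
    return Counter(pick_tool(question) for _ in range(runs))

# Stand-in so the sketch runs without an API key. A real version would make one
# chat call with tools=... and return the first tool name, or "none" if the
# model answered directly.
def fake_pick_tool(question: str) -> str:
    return "search_campus_info"

counts = tool_choice_counts(fake_pick_tool, "When does the USC AI Club meet?")
print(counts)
```

A model that is reliable for this question yields a single key in the counter; several keys for the same question mean the tool choice is changing between runs.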



Step 2 — Define two tiny tools

import json
from datetime import datetime
from zoneinfo import ZoneInfo

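def get_current_time(timezone: str = "America/Los_Angeles") -> str:
    """Return the current time in the given IANA time zone as readable text."""
    try:
        zone = ZoneInfo(timezone)
    except Exception:
        # Unknown or malformed zone name: fall back to UTC instead of raising.
        zone = ZoneInfo("UTC")
    return datetime.now(zone).strftime("%A, %B %d, %Y at %I:%M %p %Z")

def search_campus_info(query: str) -> str:
    """Return retrieved knowledge-base context (top 3 chunks) for the query."""
    # Reuse the retriever from Part 2 — the agent gets semantic search for free.
    return retrieve_context(query, k=3)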


Two functions. Plain Python. They don't know anything about the model — the model has no idea they exist yet. That's fixed in the next step.


Step 3 — Describe the tools to the model in JSON schema

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time in an IANA time zone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "IANA time zone, e.g. America/Los_Angeles or UTC.",
                    },
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_campus_info",
            "description": "Search the USC campus assistant knowledge base for information about USC clubs (including AI Club), labs (GPU lab, robotics lab), workshops, faculty office hours, peer tutoring, and the NVIDIA Developer Program at USC. Always call this for any USC-related question — do not answer from your own knowledge.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The USC campus question or search phrase.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]

available_tools = {
    "get_current_time": get_current_time,
    "search_campus_info": search_campus_info,
}


The schema is what the model sees. The names, descriptions, and parameter docs are how it decides which to call. Take these descriptions seriously — vague tool descriptions produce a confused agent.

The available_tools dict is the dispatch table on the Python side. Always pair the two — the schema describes intent, the dict provides execution.


Step 4 — The agent loop

def ask_agent(question: str) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a USC campus assistant with two tools: "
                "get_current_time and search_campus_info. "
                "When the user asks something a tool can answer, call the tool, "
                "then write the final answer based on the tool's result. "
                "Do not call the same tool twice for the same question. "
                "If after using the tools you still cannot find the answer, "
                "reply exactly: I don't have that information — check with the USC AI Club."
            ),
        },
        {"role": "user", "content": question},
    ]

    for _ in range(3):                                # hard cap on model calls; one call may request several tools
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
            tool_choice="auto",
            temperature=0.2,
            max_tokens=400,
        )
        message = response.choices[0].message
        messages.append(message.model_dump(exclude_none=True))

        if not message.tool_calls:                    # model finished — return its text
            return message.content

        for tool_call in message.tool_calls:
            name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments or "{}")

            if name not in available_tools:
                result = f"Tool {name} is not available."
            else:
                result = available_tools[name](**arguments)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": name,
                "content": str(result),
            })

    return "I hit the tool loop limit."


Four things worth slowing down for:

  • tools=... and tool_choice="auto" — this is how the model knows it has tools available and that it can pick. "auto" means use a tool if useful, otherwise answer directly.
  • messages.append(message.model_dump(...)) — the model's tool-call request itself becomes part of the conversation. Skip this and the next NIM call has no idea why you're showing it a tool result.
  • The tool role — when you send the function's return value back, it has to be a message with role="tool" plus the matching tool_call_id. Get that ID wrong and the model treats the result as orphan text.
  • The loop cap (3 iterations) — agents that don't have a hard stop will sometimes spiral. Keep the cap visible and small for workshops; widen it as you understand the model's behavior.
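One thing the loop above does not guard against is the model sending malformed arguments: `json.loads` or the tool itself would raise and end the run. A hedged variant of the dispatch step is sketched below. `run_tool_call` is a hypothetical helper, not the repo's code; it converts every failure into a string so the model can read the error and try again.

```python
import json

def run_tool_call(name: str, raw_arguments: str, registry: dict) -> str:
    """Run one tool call and always return a string the model can read."""
    if name not in registry:
        return f"Tool {name} is not available."
    try:
        arguments = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return f"Could not parse arguments for {name}: {exc.msg}"
    if not isinstance(arguments, dict):
        return f"Arguments for {name} must be a JSON object."
    try:
        return str(registry[name](**arguments))
    except TypeError as exc:
        # Typically a missing or unexpected keyword argument.
        return f"Bad arguments for {name}: {exc}"

# Demo with a throwaway tool.
registry = {"echo": lambda text: text.upper()}
print(run_tool_call("echo", '{"text": "hi"}', registry))    # valid call
print(run_tool_call("echo", '{"text": ', registry))          # truncated JSON
print(run_tool_call("echo", '{"wrong": 1}', registry))       # wrong parameter name
print(run_tool_call("missing", "{}", registry))              # unknown tool
```

Inside `ask_agent`, the body of the inner `for tool_call in message.tool_calls` loop could call this helper with `tool_call.function.name` and `tool_call.function.arguments`, then append the returned string as the tool message content.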

Step 5 — Run it

for question in [
    "What time is it in Los Angeles?",            # → uses get_current_time
    "When does the USC AI Club meet?",            # → uses search_campus_info
    "Can I get the wifi password?",               # → searches, finds nothing, refuses
]:
    print(f"Q: {question}")
    print(f"A: {ask_agent(question)}\n")


What you should see:

  • The clock question makes the model call get_current_time and answer from the returned string.
  • The AI Club question makes it call search_campus_info, read the retrieved chunks, and answer from them.
  • The wifi question makes it call search_campus_info, see that none of the chunks mention passwords, and fall back to the refusal line — same guardrail logic from Part 3, just delivered through a different control flow.

For a compound question (e.g. "what time is it and when does the club meet?") the model may call both tools. The loop handles that without changes: each iteration runs every tool call in the response, appends all the results, and asks again.
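To see the two-tool case without spending API calls, the same control flow can be driven by a scripted stand-in for the model. Everything here is an assumption made for illustration: `run_loop`, `scripted_model`, and the `clock` and `search` tools are hypothetical, and messages are plain dicts instead of SDK objects.

```python
import json

def run_loop(model_fn, registry, question, max_turns=3):
    """Same control flow as the agent loop, with the model call passed in as a function."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        message = model_fn(messages)
        messages.append(message)
        calls = message.get("tool_calls")
        if not calls:
            return message["content"], messages
        for call in calls:
            fn = call["function"]
            result = registry[fn["name"]](**json.loads(fn["arguments"] or "{}"))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    return "I hit the tool loop limit.", messages

def scripted_model(messages):
    """Fake model: asks for both tools once, then answers from the tool messages."""
    tool_outputs = [m["content"] for m in messages if m["role"] == "tool"]
    if not tool_outputs:
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_a", "type": "function",
             "function": {"name": "clock", "arguments": "{}"}},
            {"id": "call_b", "type": "function",
             "function": {"name": "search", "arguments": json.dumps({"query": "club"})}},
        ]}
    return {"role": "assistant", "content": " | ".join(tool_outputs)}

registry = {
    "clock": lambda: "clock result",
    "search": lambda query: f"search result for {query}",
}

answer, transcript = run_loop(scripted_model, registry, "time and club?")
print(answer)
print([m["role"] for m in transcript])
```

Both tool results land in the transcript after a single assistant turn, so a two-tool question costs two model calls, not three, and stays inside the loop cap.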


Step 6 — What you actually built

The full assistant is now agent-shaped:

  • Workshop 1 gave it a brain (the chat call).
  • Workshop 2 gave it memory (retrieval).
  • Workshop 3 gave it judgment (guardrails).
  • Workshop 4 gave it portability (hosted or local).
  • Workshop 5 gave it hands (tool calling).

You still own the behavior — the model only gets to call functions you expose, with arguments it has to declare, inside a loop you control. Real systems extend each piece, but the spine is what you just built. The most common follow-ups are:

  • More tools (calendar, ticketing, web search, code execution sandboxes).
  • Structured outputs so the final answer is JSON, not prose.
  • A planner that decomposes a question into sub-questions before any tool fires.
  • Observability — log every tool call, every argument, every return value. Production agents live or die on this.
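The observability follow-up is the cheapest one to start on, because the dispatch table gives you a single place to hook in. A minimal sketch using only the standard library is below; `logged` and `with_logging` are hypothetical helper names, and the log format is just one reasonable choice.

```python
import functools
import logging
import time

logger = logging.getLogger("agent.tools")

def logged(name, fn):
    """Wrap a tool so each call logs its name, arguments, result, and duration."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.perf_counter()
        try:
            result = fn(**kwargs)
        except Exception:
            # Log the failure with its traceback, then let the caller decide what to do.
            logger.exception("tool=%s args=%r failed", name, kwargs)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("tool=%s args=%r result=%r ms=%.1f", name, kwargs, result, elapsed_ms)
        return result
    return wrapper

def with_logging(registry):
    """Return a copy of the dispatch table with every tool wrapped."""
    return {name: logged(name, fn) for name, fn in registry.items()}

# Demo with a throwaway tool.
logging.basicConfig(level=logging.INFO)
tools_logged = with_logging({"echo": lambda text: text})
print(tools_logged["echo"](text="hello"))
```

Since the agent loop calls tools as `available_tools[name](**arguments)`, replacing the dict with `with_logging(available_tools)` would add logging without touching the loop. Tool results can contain user data, so decide what is safe to log before turning this on outside a workshop.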

If you take one thing from the whole series, take this: an LLM is a normal Python function with a weird interior. Everything you've built — retrieval, guardrails, deployment, tool calling — is normal software wrapped around that function. Frameworks save typing; they don't change the model.


Get the code

Repo: github.com/torkian/nvidia-nim-workshop
One-click Colab: Open part5_agent.ipynb
Local Python: part5_agent.py in the repo (python3 part5_agent.py after pip install -r requirements.txt).

MIT licensed. I run this at USC — fork it, swap the knowledge base and the tools for your school, your club, your project, and run it wherever you are.


The full series

  • Part 1: Build Your First AI App with NVIDIA NIM in 30 Minutes
  • Part 2: From Manual RAG to Real Retrieval — Embedding-Based RAG with NVIDIA NIM
  • Part 3: Add Guardrails So Your AI App Doesn't Lie
  • Part 4: Run NVIDIA NIM on Your Own GPU
  • Part 5 (this post): From Chatbot to Agent — Tool Calling with NVIDIA NIM

A consolidated long-form version of the whole series is on Medium for anyone who'd rather read it in one sitting.