I Gave My AI Agent the Ability to Research Before It Writes — Here’s What Changed
ivan cazares · 2026-05-26 · via DEV Community

Four weeks ago, I had no idea what an AI agent was. Now I'm building one that researches market trends before writing about them, synthesizes information from three independent sources, and produces work that scores 96/100 on my eval system.

The change didn't come from a new model or a fancy framework. It came from stopping my agent from writing blind.

The Problem: Writing From Memory, Not Evidence

When I first built ShipStack's article factory, it was simple. I'd prompt Claude: "Write an article about AI agents and multi-agent orchestration." Claude would write. It was fine. Coherent. On-brand.

But it was hollow.

I realized what was happening: my agent was writing from pattern matching. It knew what an article about AI agents should look like because it had seen thousands. But it didn't know what was actually happening in the market right now. It didn't know that Cursor just hit a $9.9B valuation with Agent Mode as the headline feature. It didn't know that enterprise leaders are abandoning 60% of AI projects because their data isn't ready. It didn't know that errors compound across steps: if each agent action succeeds 85% of the time, a 10-step workflow succeeds only about 20% of the time (0.85^10 ≈ 0.197).

It was writing confidently about things it didn't actually understand.

That's the gap between a chatbot and a real agent. A chatbot answers questions. An agent investigates before it acts.

The Insight: Research First, Then Write

I started noticing something in my own process. When I write about something I actually understand, the work is sharper. More specific. I reference concrete numbers, real tools, actual timelines. When I'm writing from half-memory, it's generic. Filler. Safe.

So I asked myself: what if my writing agent worked the same way?

Instead of "Write an article about X," the prompt became:

  1. Research X using three independent sources (Brave Search, DuckDuckGo, Wikipedia)
  2. Synthesize what you find into a structured brief with four sections: background, what's happening now, gaps nobody's talking about, and actual numbers
  3. Then write the article using the brief as your foundation

The research agent runs independently. Error handling per source. If one fails, the others still work. Claude Haiku synthesizes the raw results into a clean brief—background noise removed, signal amplified. The brief gets injected into the writer's context before the first word is written.
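The hand-off from brief to writer can be sketched roughly like this. It is a minimal illustration, not ShipStack's actual code: `ResearchBrief`, `build_writer_prompt`, and the prompt wording are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchBrief:
    """Hypothetical container for the four brief sections described above."""
    background: str = ""
    happening_now: str = ""
    gaps: str = ""
    numbers: list[str] = field(default_factory=list)


def build_writer_prompt(topic: str, brief: ResearchBrief) -> str:
    """Put the brief ahead of the writing instruction so the writer reads it first."""
    stats = "\n".join(f"- {s}" for s in brief.numbers) or "- (none found)"
    return (
        "RESEARCH BRIEF\n"
        f"Background: {brief.background}\n"
        f"What's happening now: {brief.happening_now}\n"
        f"Gaps nobody's talking about: {brief.gaps}\n"
        f"Actual numbers:\n{stats}\n\n"
        "Using the brief above as your factual foundation, "
        f"write an article about {topic}."
    )
```

The point of the ordering is that the writer model sees evidence before it sees the instruction, so the "write" step starts from the brief rather than from memory.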

The first article written with research scored 96/100 on my eval.

What Actually Changed

1. Specificity Became Default

Before research: "AI agents are transforming business automation."

After research: "Cursor's Agent Mode hit 8 parallel agents and $9.9B valuation. NVIDIA's GTC 2026 saw agentic frameworks draw the largest attendance, signaling enterprise deployment momentum."

One is a claim. The other is evidence.

The brief gives the writer ammunition. Real stats. Real context. Real angles. The writing doesn't have to be cautious anymore because it's grounded in something verifiable.

2. Gaps Became Visible

Here's what shocked me: the research agent found problems that nobody is talking about, even though they're critical.

Accuracy compounding is a perfect example. Everyone talks about 85% per-action accuracy as a win. Almost nobody mentions that this cascades to ~20% success in 10-step workflows. The brief highlighted this as an "angle nobody's exploring." The article could then address it directly.
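The arithmetic behind that ~20% figure is short enough to check directly. This sketch assumes every step must succeed and that steps fail independently, which is a simplification; the function names are illustrative.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return per_step_accuracy ** steps


def required_step_accuracy(target_success: float, steps: int) -> float:
    """Per-step accuracy needed to reach a target end-to-end success rate."""
    return target_success ** (1 / steps)


print(f"{workflow_success_rate(0.85, 10):.1%}")   # 19.7%
print(f"{required_step_accuracy(0.90, 10):.1%}")  # 99.0%
```

Read the other way around, a 10-step workflow that should succeed 90% of the time needs each step to be right about 99% of the time.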

A writer without research writes from memory gaps. A writer with research writes from knowledge gaps—and those are infinitely more valuable.

3. Trust Became Quantifiable

When I read the 96/100 article, I didn't just feel it was better. I could point to why. The piece mentioned three validated statistics. It cited specific products and company valuations. It acknowledged real problems with real consequences. The eval system rated it higher because the work was verifiable.

That's the real shift. The agent isn't smarter. But it's more honest.

How It Actually Works (The Technical Part)

I'm not going to pretend this is rocket science. It's not. But it's also not trivial, and I had to think through some real problems.

# Simplified version of the research pipeline
import asyncio

# search_brave, search_duckduckgo, search_wikipedia and synthesize_with_claude
# are async helpers defined elsewhere in the project.


async def research_topic(topic: str) -> dict:
    """
    Research a topic across three independent sources.
    Returns structured brief with background, current discussion, gaps, and stats.
    """

    sources = [
        {"name": "Brave Search", "func": search_brave},
        {"name": "DuckDuckGo", "func": search_duckduckgo},
        {"name": "Wikipedia", "func": search_wikipedia}
    ]

    # Run all searches in parallel. return_exceptions=True hands back each
    # failure as a value, so one broken source can't take down the others.
    outcomes = await asyncio.gather(
        *(source["func"](topic) for source in sources),
        return_exceptions=True,
    )

    results = {}
    for source, outcome in zip(sources, outcomes):
        if isinstance(outcome, Exception):
            # Individual source failure doesn't kill the whole pipeline
            results[source["name"]] = {"error": str(outcome), "data": None}
        else:
            results[source["name"]] = outcome

    # Synthesize results into structured brief
    brief = await synthesize_with_claude(
        results,
        sections=[
            "background",
            "what_is_being_discussed_now",
            "gaps_and_underexplored_angles",
            "key_stats_and_data_points"
        ]
    )

    return brief


The critical decisions:

Independent error handling: If Brave Search fails, DuckDuckGo still runs. If Wikipedia times out, the brief still synthesizes from two sources. I learned this the hard way—the first version failed if any source failed. Production taught me otherwise.

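Parallel execution: All three search queries run at the same time using asyncio.gather(). Run sequentially, the total wait is the sum of the three latencies (about 3x when the sources are equally slow); run in parallel, it's roughly the latency of the slowest source. In production, speed matters because every second of latency is a second the user waits.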

Structured synthesis: The brief isn't just raw search results dumped together. Claude is instructed to organize findings into four specific sections. Background is history and context. "What's being discussed now" is current momentum and trends. "Gaps" is the angle—what everyone's talking about versus what's actually critical. "Key stats" is the ammunition. This structure forces clarity.
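Here is one way to exercise all three decisions in a single runnable sketch. Everything in it is an assumption for illustration: `fake_source` stands in for the real search calls, the per-source timeout via `asyncio.wait_for` is an addition the article doesn't describe, and `build_synthesis_prompt` is a guess at what a synthesis prompt could look like, with the model call itself left out.

```python
import asyncio
import json
import time

SECTIONS = (
    "background",
    "what_is_being_discussed_now",
    "gaps_and_underexplored_angles",
    "key_stats_and_data_points",
)


async def fake_source(name: str, delay: float, fail: bool = False) -> dict:
    """Stand-in for a real search call; sleeps instead of hitting the network."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is down")
    return {"data": [f"{name} result"]}


async def fetch_all(calls: dict, timeout: float) -> dict:
    """Run every source concurrently, each under its own timeout."""
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls.values()),
        return_exceptions=True,  # a failure becomes a value, not a crash
    )
    return {
        name: {"error": repr(out), "data": None} if isinstance(out, Exception) else out
        for name, out in zip(calls, outcomes)
    }


def build_synthesis_prompt(results: dict) -> str:
    """Ask for the four brief sections, using only the sources that succeeded."""
    usable = {n: r for n, r in results.items() if not r.get("error")}
    if not usable:
        raise ValueError("every source failed; nothing to synthesize")
    return (
        f"Organize the findings below into JSON with the keys {', '.join(SECTIONS)}.\n\n"
        + json.dumps(usable, indent=2)
    )


async def main() -> tuple[dict, float]:
    calls = {
        "Brave Search": fake_source("Brave Search", 0.2),
        "DuckDuckGo": fake_source("DuckDuckGo", 0.2, fail=True),
        "Wikipedia": fake_source("Wikipedia", 5.0),  # slower than the timeout
    }
    start = time.perf_counter()
    results = await fetch_all(calls, timeout=0.5)
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(f"elapsed: {elapsed:.1f}s")
print(build_synthesis_prompt(results))
```

With one source failing and one timing out, the run finishes in about the length of the timeout rather than the sum of the delays, and the prompt is built from the single source that answered.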

The Real Cost

I need to be honest about the downside: this costs more tokens.

Each research cycle uses Haiku (cheap) for searches and synthesis, then Sonnet (more expensive) for actual writing. A single article now pulls maybe 15,000-20,000 tokens where it used to pull 8,000-10,000. It's not dramatic, but it adds up across the article factory.

What I found: the 20% increase in token cost produces articles that score 15-20% higher on eval. The math works. But only if you're actually shipping and measuring. If you're just trying to sound smart, research doesn't matter.
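Token counts roughly double in the numbers above, while the cost increase is quoted at about 20%. Both can be true when the extra tokens run on a cheaper model. The sketch below shows that arithmetic with placeholder prices and a made-up token split; neither comes from this article or from any real price list.

```python
def article_cost(tokens_by_model: dict[str, int], price_per_million: dict[str, float]) -> float:
    """Total cost in dollars for one article, given tokens used per model."""
    return sum(
        tokens / 1_000_000 * price_per_million[model]
        for model, tokens in tokens_by_model.items()
    )


# Placeholder prices in dollars per million tokens; not real rates.
PRICES = {"haiku": 1.0, "sonnet": 10.0}

# Made-up split: the writer's usage stays flat, research adds Haiku tokens.
before = article_cost({"sonnet": 9_000}, PRICES)
after = article_cost({"sonnet": 9_000, "haiku": 9_000}, PRICES)

token_growth = 18_000 / 9_000
cost_growth = after / before
print(f"tokens x{token_growth:.1f}, cost x{cost_growth:.2f}")  # tokens x2.0, cost x1.10
```

Swap in your own prices and per-model token counts to see where your pipeline actually lands.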

What I'd Do Differently (If I Started Over)

  1. Measure before and after: I didn't quantify article quality until after I built this. If I were starting over, I'd eval the old system, then the new one, so I'd have concrete proof. (I got lucky—the 96/100 score validated it retroactively.)

  2. Automate source selection: Right now the three sources are hardcoded. But different topics benefit from different research. A technical deep-dive needs StackOverflow and GitHub. A market analysis needs Crunchbase and SEC filings. A framework update needs the official docs. A future version should route the query to the right sources automatically.

  3. Build a feedback loop from eval back to research: If an article scores 72/100 because it's missing recent data, the research agent should know that. Right now research and writing are separate. Next step is making them iterative.
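Points 2 and 3 could look something like the sketch below. None of this exists yet in the pipeline described here: the routing table just encodes the examples from the list, and `write_with_feedback`, its `min_score` of 80, and its `max_rounds` of 3 are hypothetical choices.

```python
# Hypothetical routing table built from the examples in the list above.
SOURCE_ROUTES = {
    "technical": ["StackOverflow", "GitHub"],
    "market": ["Crunchbase", "SEC filings"],
    "framework": ["official docs"],
}
DEFAULT_SOURCES = ["Brave Search", "DuckDuckGo", "Wikipedia"]


def pick_sources(topic_kind: str) -> list[str]:
    """Return the sources for a topic kind, falling back to the current three."""
    return SOURCE_ROUTES.get(topic_kind, DEFAULT_SOURCES)


def write_with_feedback(topic, research, write, evaluate, min_score=80, max_rounds=3):
    """Re-run research with the evaluator's notes until the score clears the bar.

    research(topic, notes) -> brief, write(topic, brief) -> article, and
    evaluate(article) -> (score, notes) are caller-supplied callables.
    max_rounds must be at least 1.
    """
    notes = None
    for _ in range(max_rounds):
        brief = research(topic, notes)
        article = write(topic, brief)
        score, notes = evaluate(article)
        if score >= min_score:
            break
    return article, score
```

The loop passes the evaluator's notes (for example "missing recent data") back into the next research round, which is the connection that's missing while research and writing stay separate.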

The Broader Pattern

This is bigger than article writing.

I'm seeing this pattern across all six of ShipStack's pipelines. When agents have access to real-time context—whether that's current market data, your inbox status, your GitHub repos, or your company priorities—they make better decisions. When they're operating on stale mental models, they fail quietly.

The Morning Brief pipeline needs current date and recent priorities (stored in memory) to actually be useful. The Inbox Zero processor needs to know which senders matter historically. The Repo Triage system needs to know what's actively shipping versus abandoned. Right now memory is an island—only the article factory reads from it. Next is connecting memory to all five other pipelines.

Research before action. Memory-informed decisions. Real-time context.

That's the move from agent-as-tool to agent-as-actually-useful.

What I'm Building Next

I want to push this further. Instead of research happening once per article, I want continuous background research running on topics I care about—AI agents, agent engineering, multi-agent orchestration, memory architecture. The agent saves interesting findings to memory automatically. When I sit down to write, the brief isn't built from scratch—it's pulled from three weeks of background research plus fresh daily snapshots.
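As a rough picture of what "pulled from three weeks of background research" might mean in code, here is a minimal in-memory sketch. `Finding` and `FindingsMemory` are hypothetical names, and a real version would persist findings somewhere instead of keeping them in a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Finding:
    topic: str
    note: str
    found_at: datetime


class FindingsMemory:
    """Hypothetical in-memory store; a real one would persist to disk or a database."""

    def __init__(self) -> None:
        self._findings: list[Finding] = []

    def save(self, topic: str, note: str, found_at: datetime) -> None:
        self._findings.append(Finding(topic, note, found_at))

    def recent(self, topic: str, now: datetime, days: int = 21) -> list[Finding]:
        """Findings for a topic from the last `days` days, newest first."""
        cutoff = now - timedelta(days=days)
        hits = [f for f in self._findings if f.topic == topic and f.found_at >= cutoff]
        return sorted(hits, key=lambda f: f.found_at, reverse=True)
```

A background job would call `save` as it finds things, and the brief builder would call `recent` for the article's topic before running any fresh searches.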

That's still hypothetical. But I'm building toward it.

The Real Lesson

Four weeks ago I thought building an AI agent meant connecting APIs and writing prompts. Now I understand it's about giving agents the ability to think before they act.

A chatbot without research writes confidently about things it doesn't know. An agent with research writes carefully about things it does.

The difference is measurable. It's in the eval scores. It's in the specificity of the output. It's in the actual value delivered.

I'm one month into this. I'm still figuring out what's possible. But I know for certain: the best agent isn't the one that can generate text fastest. It's the one that can verify reality before speaking.

That's worth building toward.