Making LLM Calls Reliable: Retry, Semaphore, Cache, and Batch
Oscar Rieken · 2026-05-24 · via DEV Community

When TestSmith generates tests with --llm, it calls an LLM for every public member of every source file being processed. A project with 20 files and 5 public functions each means up to 100 API calls in a single run. That's a lot of surface area for things to go wrong.

Here's the reliability stack we built, layer by layer.

Layer 1: Retry with Exponential Backoff

LLM APIs fail transiently. Rate limits, timeouts, occasional 5xx responses — all of these are recoverable if you wait and retry. We built a retry middleware that wraps any Provider:

type RetryProvider struct {
    inner      Provider
    maxRetries int
}

func (r *RetryProvider) Complete(ctx context.Context, req CompletionRequest) (CompletionResponse, error) {
    var lastErr error
    for attempt := 0; attempt < r.maxRetries; attempt++ {
        if attempt > 0 {
            wait := time.Duration(math.Pow(2, float64(attempt))) * 100 * time.Millisecond
            select {
            case <-time.After(wait):
            case <-ctx.Done():
                return CompletionResponse{}, ctx.Err()
            }
        }
        resp, err := r.inner.Complete(ctx, req)
        if err == nil {
            return resp, nil
        }
        lastErr = err
    }
    return CompletionResponse{}, fmt.Errorf("after %d attempts: %w", r.maxRetries, lastErr)
}

MaxRetryAttempts defaults to 3. The backoff is 2^attempt × 100ms with a zero-based attempt counter, so the first attempt is immediate, the second waits 200ms, and the third waits 400ms. The worst-case added wait per call is 600ms, which is acceptable latency for a background tool.

Layer 2: Semaphore for Concurrency Control

With up to 100 calls to make, goroutine fan-out is the obvious approach. But hitting an LLM API with 100 concurrent requests triggers rate limiting immediately. A semaphore caps the in-flight calls:

type SemaphoreProvider struct {
    inner Provider
    sem   chan struct{}
}

func NewSemaphoreProvider(inner Provider, maxConcurrent int) *SemaphoreProvider {
    return &SemaphoreProvider{inner: inner, sem: make(chan struct{}, maxConcurrent)}
}

func (s *SemaphoreProvider) Complete(ctx context.Context, req CompletionRequest) (CompletionResponse, error) {
    select {
    case s.sem <- struct{}{}:
        defer func() { <-s.sem }()
    case <-ctx.Done():
        return CompletionResponse{}, ctx.Err()
    }
    return s.inner.Complete(ctx, req)
}

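MaxConcurrentCalls defaults to 5. Each retry attempt acquires its own semaphore slot, and this ordering matters: the slot is released as soon as the inner call returns, so a request sleeping through its backoff holds no slot. If the retry logic held a slot while waiting between attempts, other goroutines would be blocked unnecessarily. That is why the retry wrapper is the outer layer and the semaphore is the inner layer.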

The middleware stack assembled by the factory:

retry → semaphore → raw provider

Layer 3: Result Cache

Many test generation runs touch the same files repeatedly — watch mode is the extreme case. Calling the LLM for the same source code twice is wasteful. A content-addressed cache avoids it:

type ResultCache struct {
    mu      sync.RWMutex
    entries map[string][]BodyGenResult
    hits    int
    misses  int
}

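func cacheKey(req BodyGenRequest) string {
    h := sha256.New()
    // hash.Hash implements io.Writer, so Fprintf streams the fields straight into the digest.
    fmt.Fprintf(h, "%s\n%s\n%s\n%s", req.Language, req.MemberName, req.SourceCode, req.Framework.Name)
    // Sum(nil) returns the digest bytes; a hash.Hash value cannot be sliced directly.
    return hex.EncodeToString(h.Sum(nil))
}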

The key is a SHA-256 hash of the language, member name, source code, and framework. If the source file changes, the hash changes and the cache misses — you always get fresh results for changed code.

After a run, --verbose prints the cache stats:

LLM cache — hits: 12  misses: 8  entries: 8

Layer 4: Batch Generation

The fan-out approach makes one API call per public member. For a file with 10 functions, that's 10 calls. Batch generation collapses this to one:

func (g *LLMBodyGenerator) GenerateBatchBodies(
    ctx context.Context,
    reqs []BodyGenRequest,
) ([]BodyGenResult, error) {
    prompt := buildBatchPrompt(reqs)
    resp, err := g.provider.Complete(ctx, CompletionRequest{
        SystemPrompt:   batchSystemPrompt,
        UserPrompt:     prompt,
        Model:          g.model,
        MaxTokens:      g.maxTokens * len(reqs), // scale with request count
        Temperature:    g.temperature,
        ResponseFormat: "json_object",            // structured output
    })
    // ...
}

We use OpenAI's response_format: {"type": "json_object"} to get structured output. The model returns a JSON envelope with one entry per member:

{
  "tests": [
    {"name": "ProcessPayment", "code": "func TestProcessPayment(t *testing.T) { ... }"},
    {"name": "RefundPayment",  "code": "func TestRefundPayment(t *testing.T) { ... }"}
  ]
}

We parse that with a primary JSON parser, with a fallback to a delimiter-regex parser for providers that don't support structured output.

The pipeline checks for the BatchBodyGenerator interface via type assertion. If the generator implements it, batch mode is used. If not (or if the driver explicitly opts out), it falls back to goroutine fan-out with individual calls. This keeps the interface opt-in and backward compatible.

Observability: Cache Stats

With all this happening in the background, it's useful to know what actually ran. The cacheStatsReporter interface lets the CLI query stats without importing the llm package:

// In cmd/testsmith/generate.go — avoids importing internal/llm from the CLI layer
type cacheStatsReporter interface {
    CacheStats() (hits, misses, size int)
}

func printCacheStats(bg domain.BodyGenerator) {
    if !verbose {
        return
    }
    if r, ok := bg.(cacheStatsReporter); ok {
        hits, misses, size := r.CacheStats()
        fmt.Printf("LLM cache — hits: %d  misses: %d  entries: %d\n", hits, misses, size)
    }
}

This is the interface segregation principle at work: the CLI knows about domain.BodyGenerator (which it needs for the pipeline) and cacheStatsReporter (which it needs for stats output). It doesn't need to know anything else about the LLM implementation.

The Numbers

In practice, on a mid-size Go project with 40 source files and an average of 6 public functions each:

  • Without batch: 240 API calls, ~4 minutes at 5 concurrent
  • With batch: 40 API calls (one per file), ~45 seconds
  • Second run with warm cache: near-instant for unchanged files

The cache and batch generation together turn what would be a "go make coffee" operation into something you can run while you're still in the flow.

Next: how we structure context for both AI agents working on TestSmith itself and for the LLM generating tests for your project.