惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Jina AI
Jina AI
NISL@THU
NISL@THU
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
GbyAI
GbyAI
SecWiki News
SecWiki News
Microsoft Azure Blog
Microsoft Azure Blog
J
Java Code Geeks
B
Blog RSS Feed
Blog — PlanetScale
Blog — PlanetScale
Schneier on Security
Schneier on Security
V
Vulnerabilities – Threatpost
C
CXSECURITY Database RSS Feed - CXSecurity.com
V
Visual Studio Blog
宝玉的分享
宝玉的分享
Recent Announcements
Recent Announcements
T
True Tiger Recordings
F
Full Disclosure
Martin Fowler
Martin Fowler
D
Docker
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
A
About on SuperTechFans
雷峰网
雷峰网
Know Your Adversary
Know Your Adversary
Application and Cybersecurity Blog
Application and Cybersecurity Blog
Hacker News: Ask HN
Hacker News: Ask HN
B
Blog
V
V2EX - 技术
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Google DeepMind News
Google DeepMind News
S
Security Archives - TechRepublic
Google DeepMind News
Google DeepMind News
人人都是产品经理
人人都是产品经理
Malwarebytes
Malwarebytes
C
Check Point Blog
美团技术团队
P
Privacy International News Feed
Recorded Future
Recorded Future
博客园 - 司徒正美
T
The Blog of Author Tim Ferriss
L
LangChain Blog
Project Zero
Project Zero
P
Proofpoint News Feed
有赞技术团队
有赞技术团队
P
Proofpoint News Feed
Scott Helme
Scott Helme
C
CERT Recently Published Vulnerability Notes
云风的 BLOG
云风的 BLOG
T
ThreatConnect
F
Fox-IT International blog

DEV Community

I Thought Coding Was The Job Beginning to market Why Your Treasure Hunt Engine Kept Crashing at 1.2M Concurrent Connections Kiln Crisis Management: Controlling Irregular Raw Meal in CCR Using Python The Grilling Optimizing a High-Throughput Browser-Based Box Shadow Generator: Debounced State Updates and Chunked File Readers I Was Spending $3,200/Month on GPT. Then I Tried Chinese Models. Why You Must Stop Pasting Production Payloads into Web Decoders: Building a Secure Base64 Decode Strategy Message Brokers Comparison 2026 — Kafka, RabbitMQ, NATS & Redis Streams: Which One Should You Choose? Your Git Tree Looks Like a Crime Scene: How to Write Commits That Don’t Suck I tried every popular library for programmatic PDF form filling. None of them survived production The const enum that took down our payments Architecture of Chaos Part 3 — Event Sourcing Saved Our Audit Trail, Then a Fiber Cable Broke Stop Paying Per Cert. It's Crazy. Building Embeddable Browser Games for Website Engagement Build a Privacy-First Tampermonkey Script for Long ChatGPT Conversations XSS Attacks Are Everywhere: Reflected, Stored, DOM-Based — How to Actually Fix Them (2026) Stop letting LLMs hallucinate dates — a tool for AI agents The Platform Team Became a Finance Team /align v0.8 — personal evals for Claude Code, maintained by an LLM agent Copilot helped me deploy my passion project to the App Store Software Engineering: The Art of Thinking Out Loud (with AI) Leaked Kubernetes Secrets: Impact Assessment and Mitigation Strategies First 90 days as a junior engineer on an AI-heavy team: what to learn first Something Honest About Being a Developer on This Kind of Team JSON Schema Validator Advanced Techniques for Power Users I Built Hermes Immune System — A Safety Lab for AI Agents Google I/O 2026: MCP Is Now Infrastructure (Spark, Managed Agents, WebMCP & More) Probabilistic Graph Neural Inference for deep-sea exploration habitat design for extreme data sparsity scenarios QuantConnect Review: Running 2,400 Backtests Without Installing a Single Python Library The Complete Guide to Video APIs in 2026 (And Why Your Choice of Tool Actually Matters) Alpha Vantage vs Yahoo Finance API: Free Market Data for Side Projects — An Honest Comparison Day 20 of 60: I Built a Production-Grade Authentication System with JWT Tokens and API Key Managemen Nobody on the internet knows if you are a human The fastest way to optimize images for your web projects (Zero Server Roundtrips) We Got Burned by Veltrix Configuration Layer and Lived to Tell the Story Why Block Handed Goose to the Linux Foundation: Agentic AI Goes Open The Delve Scandal Proved SOC 2 Is Broken — Here's What Micro-SaaS Founders Should Do Instead OpenTelemetry: The Foundation of Modern Cloud-Native Observability — Traces, Metrics, Logs, and the Future of Observability Arc Browser Review: 18 Months With a Browser That Thinks Differently [Boost] Docker healthchecks: what they actually measure and what you shouldn't promise Docker healthchecks: qué miden de verdad y qué no deberías prometer I Built an AI That Roasts Cold Emails — Here's What 18,000 Drafts Taught Me Are You My Parent?: Scaffolding in the architecture necessary for keyboard handling between components. The AI Labs Found Product-Market Fit in April How I Stopped Fighting AI Context: JetBrains AI vs. Copilot in Rider I Accidentally force-pushed to main at 11 PM — So I Built an Interactive Git Undo Tool Perplexity Spaces vs You.com vs Phind: which AI search fits your dev research workflow I'm 14, can't code, and built a cognitive state app in one day — here's what happened Three Cloudflare Patterns Earned the Hard Way Aider Review: The Open-Source AI Pair Programmer That Works With Any LLM How to Measure and Improve Core Web Vitals in Under 30 Minutes Standardizing Feature Flags Is Easy to Agree On. Migrating Safely Is the Hard Part. What if UI tests validated user experience instead of selectors? Why I Stopped Believing 'Best Practices' and Started Trusting 'Works For Us' PrestaShop Doctrine: Automatically Manage the DB Prefix PrestaShop Enterprise vs Shopify Plus A .NET Dinosaur in Web3 — Day 15: DAO Voting Halyra IDE Wearable App Development Cost: How to Build a Quality MVP Without Overspending New in Vue - May 2026 427 Remote Companies Using TypeScript in 2026 MCP CI gates need receipts: tools/list is not enough 📖 DICTIONARIES IN PYTHON: THE SMART DATA VAULT I Generated a Tableau Dashboard Using Gemma 4 — Locally, No API Key, No Cloud The Hidden Way Electronics Can Start a Fire — Even Without an Open Flame I Built a Beginner-Friendly NGINX Automation CLI for Linux Servers Vibe Thinking - The PM Who Writes Requirements That an AI Can Actually Use A Refreshing Perspective on AI and Truth Kubelet Metrics: How cAdvisor and CRI Collect Kubernetes Stats How to Optimize MongoDB on Bare Metal Servers: SRE Playbook Why I Built Bamise Instead of Using Laravel How to Build a Clean Academic Dataset Without Losing Your Mind (or Your Weekend) Kubernetes Is Eating Your Budget: How to Fix EKS Over-Provisioning What Awnings Taught Me About Developer Experience Tree Traversal: Why the Order You Pick Is a Data Flow Decision I built my own forum using PHP- it came out great Optimizing Chunking and Data Extraction for Zero-Hallucination RAG Controlling Blender with AI — Building an MCP Server for 3D Creation 5 Smart Contract Vulnerabilities Every Developer Should Know in 2026 Cursor users who write failing tests before prompting the AI complete features in 37% fewer iterations than those who pr When AI Becomes a Danger: 370,000 Grok Conversations Exposed I Refactored 100 Functions With Claude. CI Was Green. Production Got Slower in 7 Spots. I read my own commits like a stranger Child Safety vs. Data Center Dollars The Reason Your AI Chatbot Feels Fast Has Nothing to Do With a Better Model Beyond Vibe-Coding What I learned testing AI translation tools in 2026 (DeepL is still good, but LLMs caught up) AWS ECS Fargate Cost Allocation: Why Your Per-Cluster Spend Shows as One Line How to Surface License Violations in GitHub Advanced Security with feluda We Deleted 10 Real Users with a Test-Cleanup Script — RCA The Decision Subtraction Framework: How to Evaluate Any AI Tool How I Access My Home PC From Anywhere Without Spending a Penny # agents.md: Teaching AI Agents How to Scrape (The Future of Web Automation) KAI vs Global vs Tojiro vs Miyabi: How to Actually Tell Japanese Knife Brands Apart Why We Accidentally Blocked Our Users: A Deep Dive into Idempotency in Distributed Systems I Connected Hermes Agent to a Live MCP Server with 59 Tools and Here's What It Actually Built Our first app is finally live on the Play Store after 4 months of hard work 🚀 I Built UUIDs That Look Random But Sort Like Timestamps (50% Smaller Indexes!)
Introducing Batch Processing for ZeroGPU
Josh at Zero · 2026-05-28 · via DEV Community

Josh at ZeroGPU

Running AI inference one request at a time works well for real-time product experiences. But many workloads do not need an immediate response. Data enrichment, classification, extraction, content moderation, summarization, and offline analytics often involve hundreds or thousands of requests that can be processed asynchronously.

That is where the ZeroGPU Batch API comes in.

With Batch Processing, you can upload a JSONL file, submit it as a batch job, and retrieve the results when processing is complete. It is designed for large asynchronous workloads where throughput, reliability, and simplicity matter more than instant response time.
Why Batch Processing?

Many AI workflows are naturally asynchronous.
For example, you might want to:

  • Classify thousands of documents.
  • Extract structured data from customer records.
  • Run content moderation over historical user-generated content.
  • Summarize support tickets, reviews, or research notes.
  • Process backfills or recurring data pipelines.

Sending each request individually can add unnecessary orchestration complexity. You need retry logic, request tracking, output matching, rate management, and failure handling.

The Batch API gives you a cleaner workflow.

How It Works
Batch Processing in ZeroGPU follows a simple file-based flow:

  1. Create a JSONL input file.
  2. Upload it using the Files API.
  3. Create a batch using the returned file ID.
  4. Poll the batch until it completes.
  5. Download the output and error files.

Each line in the JSONL file represents one request. ZeroGPU processes those requests asynchronously and writes the results back to output files.

A minimal input line looks like this:

{“custom_id”:”request-1",”method”:”POST”,”url”:”/v1/chat/completions”,”body”:{“model”:”your-model-id”,”messages”:[{“role”:”user”,”content”:”Classify this text.”}]}}

Enter fullscreen mode Exit fullscreen mode

The custom_id is returned in the output, so you can match every result back to your original input.

Built For AI Workloads At Scale

The Batch API is especially useful when you need to process a large amount of data without holding open client connections or building your own job orchestration layer.

ZeroGPU currently supports batch jobs for /v1/chat/completions, with JSONL files uploaded through /v1/files.

The core endpoints are:

POST /v1/files to upload input JSONL.
POST /v1/batches to create a batch job.
GET /v1/batches/{batch_id} to check status.
GET /v1/files/{file_id}/content to download results.

Enter fullscreen mode Exit fullscreen mode

This makes batch processing easy to integrate into existing backend systems, cron jobs, data pipelines, and internal tools.

OpenAI-Compatible Shape
ZeroGPU’s Batch and Files APIs are wire-compatible with the OpenAI-style batch workflow, while using ZeroGPU authentication headers:

x-api-key: your-api-key
x-project-id: your-project-id

Enter fullscreen mode Exit fullscreen mode

That means developers familiar with OpenAI batch jobs should feel at home, while still getting ZeroGPU’s routing, project isolation, logging, and model infrastructure.

When Should You Use Batch?
Use the real-time API when your user is waiting for a response.
Use the Batch API when the work can happen in the background.
Good fits include:

  • Nightly data processing.
  • Bulk document classification.
  • Large-scale extraction jobs.
  • Offline analytics.
  • Backfills.
  • Evaluation datasets.
  • Reprocessing historical data.

Batch jobs are also easier to audit because each request has a stable custom_id, and outputs are written to downloadable files.

Get Started

The fastest way to try it:

  1. Prepare a JSONL file.
  2. Upload it with POST /v1/files.
  3. Create a batch with POST /v1/batches.
  4. Poll for completion.
  5. Download the output file.

You can try the new interactive playgrounds in the ZeroGPU docs:

Upload file: /api-reference/batch/upload-file
Create batch: /api-reference/batch/create-batch
Retrieve batch: /api-reference/batch/retrieve-batch
Download file: /api-reference/batch/download-file

Enter fullscreen mode Exit fullscreen mode

Batch Processing makes it easier to run AI workloads at scale without managing queues, workers, retries, or GPU infrastructure.

ZeroGPU handles the execution. You focus on the data.