Bulk Downloading 1688 Product Images: A Lesson in Maxing Out Bandwidth
yanmoheluo · 2026-05-27 · via DEV Community

Our purchasing system suddenly went down. Monitoring showed that outbound bandwidth was maxed out at 500Mbps, causing all external API requests to time out. The culprit was a script for bulk downloading 1688 product images: it launched 200 concurrent download threads without any rate limiting, completely saturating our shared bandwidth.

Problem Scenario: A Brutal Approach to Image Downloading

We needed to sync approximately 3,000 1688 products daily, including main images and detail images, averaging 5 images per product. The initial implementation was straightforward but crude:

// Old brute-force download script
function downloadAllImages($productIds) {
    foreach ($productIds as $id) {
        $images = get1688ProductImages($id); // Call 1688 API to get image URL list
        foreach ($images as $url) {
            $content = file_get_contents($url); // Synchronous blocking download
            file_put_contents("/images/$id/".basename($url), $content);
        }
    }
}


This script had 3 critical issues:

  1. No concurrency control: Each file_get_contents call is synchronous, but nothing capped how many downloads ran at once, so the job fanned out into 200 concurrent threads, all firing HTTP requests simultaneously
  2. No retry mechanism: If an image download failed (e.g., network jitter), the script simply skipped it, leaving product images missing
  3. No bandwidth limiting: 200 concurrent requests, each averaging 2MB, put roughly 400MB of data in flight at once, while a 500Mbps link can move only about 62.5MB per second

The immediate consequence: all other business operations (including order processing and logistics queries) were interrupted for 18 minutes. We had to manually kill the process and spend 2 hours re-downloading the failed images.
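
The arithmetic behind the saturation is worth spelling out. The sketch below uses only the figures quoted above (a 500Mbps link, 200 concurrent requests, 2MB per image); the function names are illustrative and not part of our codebase, and it assumes the downloader has the whole link to itself.

```php
<?php
// Convert a link speed in megabits per second to megabytes per second.
function mbpsToMBps(float $mbps): float {
    return $mbps / 8;
}

// Seconds needed to drain a burst of in-flight data over a link,
// assuming nothing else is using that link.
function drainSeconds(int $requests, float $avgSizeMB, float $linkMbps): float {
    return ($requests * $avgSizeMB) / mbpsToMBps($linkMbps);
}

$linkMBps = mbpsToMBps(500);             // 62.5 MB/s
$burstMB  = 200 * 2;                     // 400 MB in flight
$seconds  = drainSeconds(200, 2.0, 500); // 6.4 s

printf("Link capacity: %.1f MB/s\n", $linkMBps);
printf("Burst size: %d MB\n", $burstMB);
printf("Time to drain one burst: %.1f s\n", $seconds);
```

A single wave of 200 images needs more than six seconds of the entire link, and with thousands of images still queued behind it, other traffic never got a turn.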

Solution: A Downloader with Rate Limiting and Queue

We redesigned the downloader using Guzzle's async capabilities, adding bandwidth control and retry mechanisms.

Step one: Use Guzzle's concurrent request pool with a maximum concurrency limit.
Step two: Implement a simple token bucket algorithm for bandwidth control.
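
Before the full class, here is the token bucket idea in isolation. This is a minimal sketch rather than the code we deployed: the TokenBucket class and its reserve method are hypothetical names, the clock is passed in as an argument so the logic can be checked without real sleeps, and a shortfall is tracked as a negative balance instead of sleeping inside the bucket.

```php
<?php
// Minimal token bucket (illustrative). Tokens refill at a fixed rate up to
// a capacity; spending more than is available creates a debt that the
// caller repays by waiting.
final class TokenBucket {
    private float $rate;       // tokens added per second (e.g. bytes/s)
    private float $capacity;   // maximum tokens the bucket can hold
    private float $tokens;     // current balance, may go negative
    private float $lastRefill; // timestamp of the last refill

    public function __construct(float $rate, float $capacity, float $now) {
        $this->rate = $rate;
        $this->capacity = $capacity;
        $this->tokens = $capacity;
        $this->lastRefill = $now;
    }

    // Reserve $amount tokens at time $now.
    // Returns the number of seconds the caller should wait before proceeding.
    public function reserve(float $amount, float $now): float {
        // Refill for the time elapsed since the last call, capped at capacity.
        $elapsed = max(0.0, $now - $this->lastRefill);
        $this->tokens = min($this->capacity, $this->tokens + $elapsed * $this->rate);
        $this->lastRefill = $now;

        // Spend the tokens; a negative balance is the debt to wait out.
        $this->tokens -= $amount;
        return $this->tokens < 0 ? -$this->tokens / $this->rate : 0.0;
    }
}

$bucket = new TokenBucket(10.0, 10.0, 0.0);    // 10 tokens/s, burst of 10
printf("%.1f\n", $bucket->reserve(4.0, 0.0));  // 0.0: enough tokens
printf("%.1f\n", $bucket->reserve(10.0, 0.0)); // 0.4: 4 tokens short
printf("%.1f\n", $bucket->reserve(5.0, 0.4));  // 0.5: debt repaid, 5 short again
```

In a downloader, the amount would be the number of bytes just received and the caller would sleep for the returned duration before starting more work.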

// New rate-limited downloader
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

class ThrottledImageDownloader {
    private $client;
    private $concurrency = 10; // Maximum concurrency
    private $bandwidthLimit = 50 * 1024 * 1024; // 50MB/s bandwidth limit
    private $tokens;
    private $lastRefillTime;

    public function __construct() {
        $this->client = new Client(['timeout' => 30]);
        $this->tokens = $this->bandwidthLimit;
        $this->lastRefillTime = microtime(true);
    }

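    // Token bucket algorithm for bandwidth control.
    // It runs after a response has arrived, so it throttles how fast the pool
    // moves on to new requests rather than the speed of a single transfer.
    private function consumeBandwidth($bytes) {
        // Refill tokens for the time elapsed since the last call, capped at one second's worth
        $now = microtime(true);
        $elapsed = $now - $this->lastRefillTime;
        $this->tokens = min($this->bandwidthLimit, $this->tokens + $elapsed * $this->bandwidthLimit);
        $this->lastRefillTime = $now;

        if ($this->tokens < $bytes) {
            // Not enough tokens: sleep until the deficit has been refilled
            $sleepTime = ($bytes - $this->tokens) / $this->bandwidthLimit;
            usleep((int) round($sleepTime * 1e6)); // usleep expects an integer number of microseconds
            $this->tokens = 0;
            // Restart the refill clock so the time slept is not credited twice
            $this->lastRefillTime = microtime(true);
        } else {
            $this->tokens -= $bytes;
        }
    }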

    public function downloadBatch(array $imageUrls) {
        $requests = function ($urls) {
            foreach ($urls as $url) {
                yield new Request('GET', $url);
            }
        };

        $pool = new Pool($this->client, $requests($imageUrls), [
            'concurrency' => $this->concurrency,
            'fulfilled' => function ($response, $index) use ($imageUrls) {
                $content = $response->getBody()->getContents();
                $this->consumeBandwidth(strlen($content));
                // Save image logic
                $filename = basename($imageUrls[$index]);
                file_put_contents("/images/$filename", $content);
            },
            'rejected' => function ($reason, $index) use ($imageUrls) {
                // Retry on failure, up to 3 times
                $this->retryDownload($imageUrls[$index], 3);
            },
        ]);

        $pool->promise()->wait();
    }

    private function retryDownload($url, $maxRetries) {
        for ($i = 0; $i < $maxRetries; $i++) {
            try {
                $response = $this->client->get($url);
                $content = $response->getBody()->getContents();
                $this->consumeBandwidth(strlen($content)); // Retries count against the bandwidth budget too
                $filename = basename($url);
                file_put_contents("/images/$filename", $content);
                return;
            } catch (\Exception $e) {
                if ($i === $maxRetries - 1) {
                    // Out of retries: log the failure and give up without a pointless final sleep
                    error_log("Failed to download $url after $maxRetries retries");
                    return;
                }
                sleep(2 ** $i); // Exponential backoff: 1s, then 2s
            }
        }
    }
}


Key improvements:

  • Concurrency control: concurrency set to 10, preventing instant bandwidth saturation
  • Token bucket rate limiting: The consumeBandwidth method paces downloads so that they average no more than 50MB per second
  • Exponential backoff retry: Wait 2^i seconds between retries, with a maximum of 3 retries after the initial attempt
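
The backoff schedule can also be looked at on its own. The sketch below is an illustration, not the retryDownload method from the class: backoffDelay and retryWithBackoff are hypothetical helpers, the 60-second cap is an assumption, and the sleep function is injected so the schedule can be observed without actually waiting.

```php
<?php
// Delay before retry number $attempt (0-based): base * 2^attempt, capped.
function backoffDelay(int $attempt, int $baseSeconds = 1, int $capSeconds = 60): int {
    return min($capSeconds, $baseSeconds * (2 ** $attempt));
}

// Run $operation, retrying up to $maxRetries times when it throws.
// $sleep receives the delay in seconds; pass 'sleep' for real waiting.
function retryWithBackoff(callable $operation, int $maxRetries, callable $sleep) {
    for ($attempt = 0; ; $attempt++) {
        try {
            return $operation();
        } catch (\Exception $e) {
            if ($attempt >= $maxRetries) {
                throw $e; // Out of retries: surface the last error
            }
            $sleep(backoffDelay($attempt));
        }
    }
}

// Usage: an operation that fails twice, then succeeds.
$calls = 0;
$delays = [];
$result = retryWithBackoff(
    function () use (&$calls) {
        $calls++;
        if ($calls < 3) {
            throw new \RuntimeException("simulated network error");
        }
        return "ok";
    },
    3,
    function (int $seconds) use (&$delays) {
        $delays[] = $seconds; // Record the delay instead of sleeping
    }
);

echo $result, " after ", $calls, " calls; delays: ", implode(", ", $delays), "\n";
// prints "ok after 3 calls; delays: 1, 2"
```

Capping the delay keeps a long outage from turning into multi-minute sleeps, and rethrowing the last exception leaves the logging decision to the caller.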

Lessons Learned: From Bandwidth Disaster to Stable Sync

After deploying the new downloader, we ran A/B tests. The old script took 12 minutes to download images for 3,000 products (~15,000 images), but consumed 500Mbps of bandwidth. The new script took 18 minutes for the same task, but bandwidth remained stable at 45-50Mbps with zero impact on other services.

Further optimization: Incremental downloads and caching

We also added a simple file size check to avoid re-downloading existing images:

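// Incremental check - only download new or changed images
function needsDownload($url, $localPath) {
    if (!file_exists($localPath)) {
        return true;
    }
    // get_headers() sends a GET by default; force a HEAD request via a stream context (PHP 7.1+)
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'timeout' => 10]]);
    $headers = @get_headers($url, true, $context);
    if ($headers === false) {
        return true; // Server unreachable: let the downloader and its retries handle it
    }
    // Header names are case-insensitive, so normalize before looking up the size
    $headers = array_change_key_case($headers, CASE_LOWER);
    $remoteSize = $headers['content-length'] ?? null;
    if (is_array($remoteSize)) {
        $remoteSize = end($remoteSize); // Redirects produce one value per hop; use the last
    }
    if ($remoteSize === null) {
        return true; // No size reported: download to be safe
    }
    return (int) $remoteSize !== filesize($localPath);
}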


This optimization reduced daily incremental sync time from 18 minutes to 3-5 minutes, since only about 10% of product images are updated daily.
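
A size comparison will miss an image that was replaced by another of exactly the same size. If the image host returns ETag or Last-Modified headers (an assumption we have not verified for every 1688 image URL), a stricter check could compare those against values saved at the last download. The sketch below is hypothetical: hasChanged is not part of our script, and both arrays are assumed to use lowercase header names as keys.

```php
<?php
// Decide whether a remote file changed, given the headers from a HEAD
// request ($remote) and the values recorded at the last successful
// download ($saved). Both arrays are assumed to have lowercase keys.
function hasChanged(array $remote, array $saved): bool {
    // Prefer the strongest validator both sides have: ETag, then
    // Last-Modified, then fall back to the size.
    foreach (['etag', 'last-modified', 'content-length'] as $key) {
        if (isset($remote[$key], $saved[$key])) {
            return (string) $remote[$key] !== (string) $saved[$key];
        }
    }
    return true; // Nothing comparable: download to be safe
}

// Same size, different ETag: the size check alone would have skipped this file.
$saved  = ['etag' => '"abc123"', 'content-length' => '204800'];
$remote = ['etag' => '"def456"', 'content-length' => '204800'];
var_dump(hasChanged($remote, $saved)); // bool(true)
```

The saved values would need to live somewhere durable, for example a small JSON manifest next to the images.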

Summary: When bulk downloading third-party resources, never assume that "faster is better." Brute-force concurrent downloads may seem efficient, but they often sacrifice system stability. Rate limiting, retry mechanisms, and incremental checks are the three core elements of a reliable download system. If your image sync script is still running file_get_contents without protection, it's time for an upgrade.

Has your system encountered similar issues when handling large volumes of external resource downloads? Feel free to share your solutions.

About the Author: Building cross-border purchasing solutions with taocarts — a daigou system for 1688/Taobao purchasing, order management, and international shipping.