Why Too Many Parts Hurt ClickHouse Performance
Mohamed Huss · 2026-05-25 · via DEV Community

A lot of people initially think ClickHouse performance problems come from:

  • large queries
  • bad joins
  • massive datasets
  • missing indexes

And honestly, those things can matter.

But one of the most common operational problems in ClickHouse often starts much earlier:

too many tiny parts.

This is one of those issues that usually stays invisible at first.

Then suddenly:

  • merges fall behind
  • queries slow down
  • memory usage increases
  • inserts become unstable

And the cluster starts behaving strangely.


Every Insert Creates Parts

This is the first thing that’s important to understand.

In MergeTree-based engines, ClickHouse stores data as immutable parts.

Something as simple as:

INSERT INTO events VALUES (...);

creates at least one new part on disk for every partition the inserted rows belong to.

And this is completely normal.

ClickHouse is designed around this storage model.

So:

parts themselves are not the problem.

The real issue starts when parts begin accumulating faster than merges can stabilize them.


Why Tiny Inserts Become Dangerous

At smaller scale, tiny inserts may seem harmless.

For example:

  • inserting row-by-row
  • extremely frequent micro-batches
  • tiny streaming flush intervals

Initially:

everything still works.

But over time, the number of parts starts growing aggressively.

Now ClickHouse has to manage:

  • more metadata
  • more merges
  • more scheduling
  • more file operations

This creates operational overhead.

Meaning:

the system spends a growing share of its resources managing fragmentation instead of serving inserts and queries.
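
To put rough numbers on this, here is a small sketch of how the insert pattern alone drives part creation. It assumes the simplest possible model: every insert produces one part per partition it touches, and merges are ignored. The function name `parts_created` and the row counts are made up for illustration.

```python
import math


def parts_created(total_rows: int, rows_per_insert: int, partitions_per_insert: int = 1) -> int:
    """Lower bound on new parts: one part per insert per partition touched.

    Simplified model (assumption): merges are ignored and every insert
    touches the same number of partitions.
    """
    if rows_per_insert <= 0 or partitions_per_insert <= 0:
        raise ValueError("rows_per_insert and partitions_per_insert must be positive")
    inserts = math.ceil(total_rows / rows_per_insert)
    return inserts * partitions_per_insert


# The same one million rows, written two different ways.
row_by_row = parts_created(1_000_000, rows_per_insert=1)
batched = parts_created(1_000_000, rows_per_insert=100_000)
print(row_by_row, batched)  # 1000000 10
```

The data volume is identical in both cases. Only the number of parts the merge system has to clean up afterwards changes.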


Why Merges Matter So Much

ClickHouse relies heavily on background merges.

These merges:

  • combine smaller parts
  • reduce fragmentation
  • improve compression
  • optimize query performance

Under healthy ingestion patterns, merges naturally keep the system stable over time.

That is the ideal state.

But problems start when:

parts created per second
        >
parts merged per second

Now fragmented parts begin accumulating faster than ClickHouse can compact them.
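
The inequality above can be sketched as a tiny simulation. This is a toy model, not how ClickHouse schedules merges: it assumes constant rates, and `merged_away_per_sec` is a hypothetical number standing for how many parts background merges manage to remove per second.

```python
from typing import List


def simulate_part_backlog(created_per_sec: int, merged_away_per_sec: int, seconds: int) -> List[int]:
    """Return the backlog of unmerged parts at the end of each second.

    Both rates are assumed constant (a simplification); real merge
    throughput depends on part sizes, disk speed and the merge pool.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        # Merges can only remove parts that exist, so the backlog never goes below zero.
        backlog = max(0, backlog + created_per_sec - merged_away_per_sec)
        history.append(backlog)
    return history


healthy = simulate_part_backlog(created_per_sec=5, merged_away_per_sec=8, seconds=60)
overloaded = simulate_part_backlog(created_per_sec=10, merged_away_per_sec=8, seconds=60)
print(healthy[-1], overloaded[-1])  # 0 120
```

A gap of only two parts per second looks small, but in this model it never closes. The backlog grows linearly for as long as the ingestion pattern stays the same.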

And this is usually where instability slowly starts building.


The Dangerous Part Is That It Builds Slowly

This is what makes the issue tricky operationally.

You usually do not notice the problem immediately.

The cluster may look perfectly healthy initially.

Then gradually:

  • insert latency increases
  • merges lag behind
  • CPU usage becomes unstable
  • queries become heavier
  • replication slows down

And eventually ClickHouse may start throwing errors like:

Too many parts

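At that point, the number of active parts in a partition has passed the parts_to_throw_insert limit, ClickHouse is rejecting inserts to protect itself, and the merge system is already under serious pressure.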
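
As a rough sketch of how that pressure escalates, the model below classifies partitions by their active part count. It assumes the limits are applied per partition, which is my understanding of the `parts_to_delay_insert` and `parts_to_throw_insert` MergeTree settings. The threshold values and the function name `insert_pressure` are illustrative only, since defaults differ between ClickHouse versions.

```python
from typing import Dict


def insert_pressure(parts_per_partition: Dict[str, int], delay_at: int, throw_at: int) -> Dict[str, str]:
    """Classify each partition as "ok", "delay" or "reject".

    Simplified model (assumption): inserts are slowed down once a partition
    has more than `delay_at` active parts and rejected above `throw_at`.
    """
    status = {}
    for partition, parts in parts_per_partition.items():
        if parts > throw_at:
            status[partition] = "reject"  # inserts fail with "Too many parts"
        elif parts > delay_at:
            status[partition] = "delay"   # inserts are artificially slowed down
        else:
            status[partition] = "ok"
    return status


# Hypothetical part counts per monthly partition; thresholds are examples, not defaults to rely on.
counts = {"202603": 320, "202604": 180, "202605": 40}
print(insert_pressure(counts, delay_at=150, throw_at=300))
# {'202603': 'reject', '202604': 'delay', '202605': 'ok'}
```

The useful part of this model is the middle state. Slower inserts are the early warning, and the error only shows up after that warning has been ignored for a while.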


Queries Also Become More Expensive

A lot of people think parts only affect inserts.

But queries suffer too.

Because queries now need to:

  • open more parts
  • scan more metadata
  • coordinate more files

Even when the actual dataset itself is not massive.

So sometimes:

performance degradation comes more from fragmentation than raw data volume.

That is a very important operational insight.


FINAL Does Not Really Solve This

One thing that’s important to understand:

FINAL is not really a solution for too many parts.

For example:

SELECT *
FROM events FINAL;

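FINAL applies the table engine's merge logic (for example, ReplacingMergeTree deduplication) at query time, so the result looks as if the parts had already been merged.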

But the fragmented parts still physically exist underneath.

So if the system already has excessive fragmentation:

  • queries still scan many parts
  • merge pressure still exists
  • query execution can become heavier

Which means:

FINAL can actually become more expensive when fragmentation becomes unhealthy.

The real fix is usually improving ingestion and merge behavior itself.


Over-Partitioning Can Quietly Make This Worse

Another thing that often accelerates part explosion is overly granular partitioning.

For example:

PARTITION BY toStartOfHour(timestamp)

instead of something broader like:

PARTITION BY toYYYYMM(timestamp)

Now even small inserts may create parts across many partitions simultaneously.

Which means:

a single insert can end up creating multiple fragmented parts underneath.

And over time, merge pressure increases much faster than expected.
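
Here is a minimal sketch of that effect. It assumes one insert creates at least one part per partition it touches, and it mimics an hourly and a monthly partition key with plain Python date formatting. The helper names and the sample batch are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable


def hourly_key(ts: datetime) -> str:
    """Stand-in for an hourly partition key."""
    return ts.strftime("%Y%m%d%H")


def monthly_key(ts: datetime) -> str:
    """Stand-in for a monthly partition key."""
    return ts.strftime("%Y%m")


def min_parts_for_insert(timestamps: Iterable[datetime], key_fn: Callable[[datetime], str]) -> int:
    """Minimum number of parts one insert creates: one per distinct partition touched."""
    return len({key_fn(ts) for ts in timestamps})


# One small insert of 96 rows whose timestamps spread over two days (for example, a backfill).
start = datetime(2026, 5, 1, 0, 0)
batch = [start + timedelta(minutes=30 * i) for i in range(96)]

print(min_parts_for_insert(batch, hourly_key), min_parts_for_insert(batch, monthly_key))  # 48 1
```

Same rows, same insert. With the hourly key the insert fans out into dozens of tiny parts, and parts in different partitions are never merged together.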


ClickHouse Also Has Ways to Help

Modern ClickHouse versions also support asynchronous inserts (enabled with the async_insert setting) to help reduce excessive tiny-part creation.

Instead of immediately flushing every small insert into separate parts, ClickHouse can buffer inserts internally before writing larger parts to disk.

This helps reduce fragmentation and merge pressure in workloads that naturally produce smaller inserts.

But async inserts are not a replacement for healthy ingestion patterns themselves.

Stable batching still matters a lot.
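
The buffering idea can be sketched on the client side as well. This is a minimal illustration of "collect rows, flush when the batch is big enough or old enough", not ClickHouse's own async insert implementation. The class name `InsertBuffer`, its parameters and the `flush` callback are all assumptions; in a real application the callback would send one batched INSERT.

```python
import time
from typing import Callable, List, Optional


class InsertBuffer:
    """Collect rows and hand them to `flush` as one batch.

    A batch is flushed when it reaches `max_rows`, or when a row arrives
    and the oldest buffered row has waited `max_wait_s` seconds or more.
    There is no background timer in this sketch, so age is only checked on add().
    """

    def __init__(self, flush: Callable[[List[tuple]], None], max_rows: int,
                 max_wait_s: float, clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._rows: List[tuple] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: tuple) -> None:
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_row_at >= self._max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; does nothing when the buffer is empty."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._first_row_at = None
            self._flush(batch)


# Usage: 2500 single-row writes reach the callback as a handful of batches.
batches: List[List[tuple]] = []
buffer = InsertBuffer(batches.append, max_rows=1000, max_wait_s=1.0)
for i in range(2500):
    buffer.add((i, "click"))
buffer.flush()  # do not forget the tail on shutdown
print(len(batches), "batches instead of 2500 inserts")
```

The trade-off is the usual one for buffering. Rows sit in memory until the flush, so a crash can lose the current batch unless the caller handles that.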


Why Batch Size Matters So Much

ClickHouse generally performs much better with:

  • larger batches
  • fewer inserts
  • healthier merge behavior

Because fewer parts mean:

  • fewer merges
  • lower metadata overhead
  • better compression
  • more efficient scans

This is one of the reasons ClickHouse ingestion patterns often look very different from traditional OLTP systems.


Too Many Parts Also Affects Startup and Recovery

Another thing people often discover late:

Large numbers of parts also affect:

  • startup time
  • replication recovery
  • metadata loading
  • server restarts

Because ClickHouse now has to:

  • scan part metadata
  • validate parts
  • rebuild internal state

before the server becomes fully operational again.

So the issue is not just query performance.

It becomes an overall operational stability problem.


The Important Lesson

One thing I’ve noticed with ClickHouse is that many performance problems are actually merge-management problems underneath.

And too many parts is one of the clearest examples of that.

Because the issue usually is not:

“ClickHouse cannot handle large data.”

The issue is more often:

fragmentation and merge pressure slowly became unhealthy.

That is a very different operational problem.


Final Thought

ClickHouse is extremely good at handling massive analytical workloads.

But it performs best when the storage engine is allowed to merge parts efficiently.

And sometimes the biggest performance problem is not the query itself.

It is the thousands of tiny fragmented parts quietly building underneath the system over time.