Metric Cardinality: High or Low? 4 Steps to Making the Right Choice
Mustafa ERBA · 2026-05-25 · via DEV Community

In metric collection systems, cardinality is a critical concept for balancing performance and cost. I have prepared a 4-step guide on how this balance is established in the real world.

In this post, I will explain what metric cardinality is, why it matters, and how we can find the right balance in our systems based on my own experiences. We won't just stick to theoretical knowledge; we will address this topic with concrete examples and steps.

What is Metric Cardinality and Why Should We Care?

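Metric cardinality is the number of unique time series a metric produces, that is, the number of unique combinations of label values it is reported with. The more labels a metric carries, and the more distinct values each label can take, the higher its cardinality; in the worst case it is the product of the number of distinct values of every label. For example, when monitoring a server's CPU usage, labels like instance, job, region, and az each add a dimension, and instance usually dominates because it has one value per server.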
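To make the definition concrete, here is a minimal Python sketch that counts the distinct label combinations of one metric. The label names follow the CPU example above; the values and the helper names (series_key, cardinality) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# One dict of labels per observed series of a single metric (hypothetical data).
Series = Dict[str, str]


def series_key(labels: Series) -> Tuple[Tuple[str, str], ...]:
    """Canonical, order-independent identity of a label set."""
    return tuple(sorted(labels.items()))


def cardinality(series: Iterable[Series]) -> int:
    """Number of distinct label combinations, i.e. distinct time series."""
    return len({series_key(s) for s in series})


observed: List[Series] = [
    {"instance": "web-1", "job": "node", "region": "eu-central-1", "az": "a"},
    {"instance": "web-2", "job": "node", "region": "eu-central-1", "az": "b"},
    # Same label set as the first entry, so it is the same series.
    {"instance": "web-1", "job": "node", "region": "eu-central-1", "az": "a"},
]

print(cardinality(observed))  # -> 2
```

Because a series is identified by its whole label set, sorting the pairs makes the identity independent of the order in which labels are written.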

This has a direct impact on storage space, query performance, and costs. High cardinality consumes more memory and disk space, makes queries slower, and leads to higher costs in cloud environments.

ℹ️ The Relationship Between Cardinality and Cost

In Prometheus, for example, each unique label combination is stored as a separate time series, which directly increases disk usage and database load. Suppose you have 1000 servers. Two static labels such as environment: production and region: eu-central-1 add no extra series, because each has a single value. A per-server label such as instance_id, by contrast, turns every metric into 1000 series, and once that is multiplied by the metric's other labels you can quickly reach thousands or even millions of unique time series. This is a cost item that should not be ignored, especially in large-scale distributed systems.
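The worst case for a single metric is the product of the number of distinct values of each of its labels. The sketch below applies that arithmetic to the 1000-server scenario; the extra http_method and endpoint labels and their value counts are assumptions chosen only for illustration.

```python
from math import prod
from typing import Dict


def max_series(distinct_values: Dict[str, int]) -> int:
    """Worst-case series count for one metric: product of distinct values per label."""
    return prod(distinct_values.values())


# Hypothetical request-count metric: the two static labels have one value each.
static_only = {"environment": 1, "region": 1, "http_method": 5, "endpoint": 20}
# The same metric after adding a per-server label on 1000 servers.
with_instance_id = {**static_only, "instance_id": 1000}

print(max_series(static_only))       # 100
print(max_series(with_instance_id))  # 100000
```

The real count is the number of combinations that actually occur, so treat this as an upper bound; it still shows why one per-server label multiplies everything else.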

The mindset of "let's label everything" when collecting metrics might provide more visibility initially, but in the long run, it can make our systems unmanageable. Therefore, we must definitely consider cardinality when defining our metric collection strategies.

Step 1: Analyze Your Existing Metrics and Labels

The first step is to understand which metrics you are collecting and what labels you have assigned to them. This analysis will help you identify unnecessary or over-labeled metrics. Most monitoring systems can list current metrics and their labels. In Prometheus, promtool tsdb analyze reports which metric names and label names account for the most series, the TSDB Status page of the web UI shows similar statistics, and tools like Grafana can visualize series counts over time.
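promtool tsdb analyze works against the server's stored data. If you only have access to a target's scrape output, a rough first pass is to count how many series each metric name exposes. The Python sketch below does this on a hypothetical snippet in the Prometheus text exposition format; the parser is deliberately simplified and does not handle every detail of the format.

```python
from collections import Counter
from typing import List, Tuple

# A small, hypothetical scrape in the Prometheus text exposition format.
SCRAPE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="200"} 3
http_requests_total{method="GET",code="500"} 12
process_open_fds 42
"""


def series_per_metric(text: str) -> Counter:
    """Count exposed series per metric name (simplified parser)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


def top_offenders(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """The n metric names with the most series: candidate cardinality monsters."""
    return series_per_metric(text).most_common(n)


print(top_offenders(SCRAPE))
# [('http_requests_total', 3), ('process_open_fds', 1)]
```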

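During this analysis, determine which labels provide truly distinctive information and which only repeat static values. For example, a label like deployment_version: v1.2.3 on every metric adds little if your entire system runs the same version. A single constant value does not multiply the series count by itself, but it is stored on every series, and each rollout to a new version creates a complete new set of series (churn). It usually makes more sense to expose this information once, for example as a separate info-style metric like Prometheus's own prometheus_build_info, and join it in queries when needed.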
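The sketch below illustrates the deployment_version case with three hypothetical pods of a service called api. At any single moment the constant label adds no series, but a rollout from v1.2.3 to v1.2.4 creates a full new set, so the number of series seen across the rollout doubles.

```python
from typing import FrozenSet, List, Set, Tuple

# A series is identified by its set of (name, value) label pairs.
Series = FrozenSet[Tuple[str, str]]


def series_for(pods: List[str], version: str, version_label: bool) -> Set[Series]:
    """Series one metric produces across pods (hypothetical service 'api')."""
    result: Set[Series] = set()
    for pod in pods:
        labels = {"service": "api", "instance": pod}
        if version_label:
            labels["deployment_version"] = version  # same value on every pod
        result.add(frozenset(labels.items()))
    return result


pods = ["api-0", "api-1", "api-2"]
for version_label in (False, True):
    before = series_for(pods, "v1.2.3", version_label)
    after = series_for(pods, "v1.2.4", version_label)  # after a rollout
    # Series at one instant vs. series seen across the rollout window.
    print(version_label, len(before), len(before | after))
# False 3 3
# True 3 6
```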

💡 Practical Tips for Label Analysis

When examining labels for a metric, ask these questions:

  • Is the value of this label always the same? (e.g., environment: staging)
  • Do the values of this label actually provide distinctive filtering when querying the metric?
  • Can this label be removed without losing the meaning of the metric?
  • Can another method with lower cardinality be used instead of this label? (e.g., storing information within the metric name)
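The first two questions can be partly automated by counting the distinct values of each label. In this minimal sketch, label_report and flag_labels are hypothetical helpers, the threshold of 100 distinct values is an arbitrary assumption, and the sample data mirrors the environment, http_method, and user_agent examples in this step. The last two questions still need human judgment.

```python
from collections import defaultdict
from typing import Dict, List, Set


def label_report(series: List[Dict[str, str]]) -> Dict[str, int]:
    """Distinct value count per label across the series of one metric."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(v) for name, v in values.items()}


def flag_labels(series: List[Dict[str, str]], high: int = 100) -> Dict[str, str]:
    """Mark labels that are constant (always one value) or high-cardinality."""
    flags: Dict[str, str] = {}
    for name, count in label_report(series).items():
        if count == 1:
            flags[name] = "constant"
        elif count >= high:
            flags[name] = "high"
    return flags


# Hypothetical API request series: 200 requests, each with its own user agent.
request_series = [
    {"environment": "staging", "http_method": m, "user_agent": f"agent-{i}"}
    for i, m in enumerate(["GET", "POST"] * 100)
]
print(label_report(request_series))
# {'environment': 1, 'http_method': 2, 'user_agent': 200}
print(flag_labels(request_series))
# {'environment': 'constant', 'user_agent': 'high'}
```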

For example, if you have an API request count metric and use the http_method label, this makes sense because you might want to monitor GET and POST requests separately. However, adding a label like user_agent: <browser_info> to every request blows up cardinality, since user-agent strings take an almost unbounded number of distinct values, and it is usually unnecessary.

This analysis will reveal the "cardinality monsters" in your system. Recognizing these monsters is the first step to defeating them.

Step 2: Clean Up Non-Distinctive Labels

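You should remove the unnecessary or repetitive labels you identified during the analysis. This usually requires updating the configuration of your metric collectors (agents). In Prometheus, relabel_configs controls which target labels (for example, labels from service discovery) are attached to every series scraped from a target, while metric_relabel_configs rewrites the labels of individual scraped series before they are stored. Both can be used to remove unwanted labels.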

It is important to be careful during this cleanup process. Accidentally removing an important label can limit your monitoring capabilities. Therefore, it is best to apply changes in a test environment first and carefully observe their effects. As an example, in a microservices architecture, a label like service_name is critical. However, more dynamic and always-unique labels like pod_name can increase cardinality and can be removed if they are not used in general queries.
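Before dropping a label such as pod_name, it helps to estimate how many series remain and whether series that used to be distinct would become identical. The sketch below groups hypothetical series by their label set without the dropped label; a group larger than 1 signals a collision. Prometheus does not sum colliding series for you, so in that case the data has to be aggregated (for example in the application or the collecting agent) rather than the label simply dropped.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


def drop_label(series: List[Dict[str, str]], label: str) -> Dict[LabelSet, int]:
    """Group series by their label set once `label` is removed.

    Each value is how many original series end up with that identical
    label set; any value > 1 means those series are no longer unique.
    """
    groups: Dict[LabelSet, int] = defaultdict(int)
    for labels in series:
        remaining = frozenset((k, v) for k, v in labels.items() if k != label)
        groups[remaining] += 1
    return dict(groups)


# Hypothetical series of one metric, labeled by service and pod.
observed = [
    {"service_name": "auth-service", "pod_name": "auth-7f9c-x1"},
    {"service_name": "auth-service", "pod_name": "auth-7f9c-x2"},
    {"service_name": "user-service", "pod_name": "user-5d2b-k8"},
]
after = drop_label(observed, "pod_name")
print(len(observed), "->", len(after))             # 3 -> 2
print(any(count > 1 for count in after.values()))  # True: the auth-service pods collide
```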

⚠️ Things to Consider in Label Cleanup

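When cleaning up labels, make sure you do not lose the meaning or queryability of the metrics. The following metric_relabel_configs rule removes the http_status_code label from the http_requests_total metric only. It overwrites the label with an empty value whenever the metric name matches, and Prometheus treats a label with an empty value as absent. A keep action is not suitable for scoping such a rule, because keep discards every series that does not match, and labeldrop matches label names on all metrics rather than on one:

  metric_relabel_configs:
    # Applies only where the metric name (__name__) is http_requests_total.
    - source_labels: [__name__]
      regex: 'http_requests_total'
      # Overwrite http_status_code with an empty value, which removes the label.
      target_label: http_status_code
      replacement: ''
      action: replace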

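Before applying such a rule, consider the queries in which you filter http_requests_total by http_status_code. Also check that the remaining labels still identify each series uniquely: series that differed only in http_status_code end up with identical label sets, and the Prometheus documentation warns that metrics must stay uniquely labeled after labels are removed. If you filter by status code often, or the remaining labels are not unique, look for a lower-cardinality alternative instead of removing the label entirely.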
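One lower-cardinality alternative, assuming you mainly need to tell successes from client and server errors, is to record the status class (2xx, 4xx, 5xx) instead of the exact code. status_class is a hypothetical helper you would call wherever your instrumentation sets the label.

```python
def status_class(code: str) -> str:
    """Map an HTTP status code such as '503' to its class label, e.g. '5xx'."""
    if len(code) != 3 or not code.isdigit() or code[0] not in "12345":
        return "unknown"  # keep malformed values from leaking into the label
    return code[0] + "xx"


observed_codes = ["200", "201", "204", "301", "400", "401", "404", "429", "500", "503"]
print(len(set(observed_codes)))                           # 10
print(sorted({status_class(c) for c in observed_codes}))  # ['2xx', '3xx', '4xx', '5xx']
```

The label can then take at most six values (five classes plus unknown), however many distinct codes the service returns.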

The cleanup done in this step will directly provide storage space savings and query performance improvements.

Step 3: Adjust the Metric Collection Level

In some cases, adjusting the collection level of metrics also helps keep costs under control. For less critical systems, or where less detailed monitoring is enough, you can collect metrics with fewer labels, which lowers cardinality, or sample them less frequently, which leaves the number of series unchanged but reduces the number of samples stored. Many monitoring tools let you tune the collection interval; in Prometheus, scrape_interval is set globally or per scrape job.
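A quick calculation shows what the collection interval actually changes. Assuming a hypothetical 10,000 series, moving from a 15-second to a 60-second interval cuts the daily sample volume by a factor of four, while the series count stays the same.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(series_count: int, interval_seconds: int) -> int:
    """Samples ingested per day; the number of series is not affected by the interval."""
    return series_count * (SECONDS_PER_DAY // interval_seconds)


for interval in (15, 60):
    print(interval, samples_per_day(10_000, interval))
# 15 57600000
# 60 14400000
```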

However, this approach also has trade-offs. Lower sampling rates can make it harder to detect transient issues. Therefore, when adjusting the collection level, you need to carefully evaluate your monitoring needs and potential risks.

🔥 Risks of Low Sampling Rates

For example, suppose you are measuring a web server's response time per request. If you sample this measurement too infrequently, you might not notice sudden, short-lived performance drops. This can hurt the user experience and make it harder to find the root cause of the problem. Especially during security incidents or sudden performance spikes, detailed, high-frequency metrics can be lifesavers.
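The following sketch simulates this with hypothetical numbers: a 50 ms baseline latency with a 20-second spike to 800 ms, read every 15 or every 60 seconds. It models point-in-time (gauge-style) readings; counters and histograms accumulate between scrapes, so they still record such events but lose the detail of exactly when they happened.

```python
from typing import List


def latency_series(seconds: int, spike_start: int, spike_len: int) -> List[float]:
    """Hypothetical per-second latency in ms: 50 ms baseline with one short spike."""
    return [800.0 if spike_start <= t < spike_start + spike_len else 50.0
            for t in range(seconds)]


def sample_every(values: List[float], interval: int) -> List[float]:
    """Keep only the values a collector polling every `interval` seconds would see."""
    return values[::interval]


latency = latency_series(seconds=300, spike_start=130, spike_len=20)
print(max(sample_every(latency, 15)))  # 800.0: the poll at t=135 lands in the spike
print(max(sample_every(latency, 60)))  # 50.0: polls at t=0, 60, 120, 180, 240 miss it
```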

The goal in this step is to ensure that the collection level of each metric is at the level of detail truly needed.

Step 4: Use Static Values Instead of Dynamic Labels

Using static values instead of dynamic labels in metrics is one of the most effective ways to manage cardinality. For example, instead of a unique ID for each pod or container, it is better to use static labels that only indicate the environment (production, staging) or the service (auth-service, user-service).

This is especially important when label values change constantly. If you are attaching a constantly changing value to a metric as a label, a metric is probably not the right way to access that information. For such information, other mechanisms like logging or distributed tracing might be more appropriate.
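To see why constantly changing label values are costly, the sketch below counts the series one metric accumulates when a hypothetical auth-service with four replicas goes through three rollouts, using either pod_name or service_name as the label. For brevity, each series is modeled by that single label.

```python
from typing import Set, Tuple


def series_over_rollouts(replicas: int, rollouts: int, use_pod_name: bool) -> Set[Tuple[str, str]]:
    """Distinct series one metric accumulates as pods are replaced on each rollout."""
    seen: Set[Tuple[str, str]] = set()
    for generation in range(rollouts + 1):  # initial deploy plus each rollout
        for replica in range(replicas):
            pod = f"auth-service-gen{generation}-{replica}"  # hypothetical pod names
            label = ("pod_name", pod) if use_pod_name else ("service_name", "auth-service")
            seen.add(label)
    return seen


print(len(series_over_rollouts(4, 3, use_pod_name=True)))   # 16
print(len(series_over_rollouts(4, 3, use_pod_name=False)))  # 1
```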

ℹ️ Dynamic Labels and Their Alternatives

There may also be cases where you need per-instance detail. For example, in a microservices architecture, you might want to know which specific pod served a request. In this case, it makes more sense to get that information from the trace associated with the request, through its trace ID, than from metrics, which avoids putting dynamic labels like pod_name directly on metrics. For example, the requests_total metric generated by a service can carry a service_name label instead of pod_name. Because service_name takes a small, stable set of values, this avoids the cardinality growth that pod_name causes as pods are constantly replaced.

By following these steps, you can effectively manage your metric cardinality, optimize your systems' performance, and reduce costs.

In conclusion, metric cardinality is an important but often overlooked aspect of system monitoring. With the right analysis, cleanup, and labeling strategies, it is possible to build both more efficient and more cost-effective monitoring systems. By applying these 4 steps, you can fully leverage the power of metrics in your systems.