How to optimize high-volume log data without compromising visibility
2025-04-17 · via Datadog | The Monitor blog
Edith Méndez

Melanie Yu

Aaron Kaplan

As distributed systems grow in complexity and the threat landscape evolves, Security, DevOps, and other teams are faced with an explosion of log data—often hundreds of terabytes per day—from a growing number of on-prem and multi-cloud sources. As a result, managing log data efficiently has become more complex, more costly, and more challenging than ever before.

Meanwhile, organizations are grappling with the rigid pricing models and rising, frequently unpredictable costs of many logging platforms and SIEM tools. Cold-storage solutions, truncated retention periods, and data filtering and sampling can help rein in the growing costs associated with log management, but not without trade-offs: Overreliance on these methods can put critical data out of reach in moments of urgency, weakening security, prolonging incidents, and damaging trust. As a result, organizations are often forced into difficult compromises between visibility, security, and cost efficiency.

This post will explore strategies for optimizing how you manage high-volume logs in order to maintain critical visibility while controlling costs. In it, we’ll discuss the importance of knowing how your logs align with your business priorities as well as three best practices for cost-effective log management at scale:

  • Reduce noisy log data at the edge
  • Route logs proactively and selectively
  • Fine-tune your log storage on a per-use-case basis

Know how your logs align with your business priorities

Different teams have different priorities when it comes to logs, and every team tends to believe their logging needs are paramount. Governance engineers often want unrestricted access to every possible type of log for compliance and auditing purposes. SREs and DevOps teams collect a diverse range of logs for troubleshooting. Security teams prioritize log retention for threat detection and forensic investigations. And CTOs and business leaders are left to balance all of these needs with budget constraints.

As a first step, it’s essential to understand where your log spend is going and consider exactly how each of the many types of logs you collect fits into your business priorities. This means answering a few key questions:

  • Which of your services are the most critical sources of logs?
  • How are logs currently accessed and used across teams?
  • What types of logs are you currently collecting, and what insights do they provide?

When it comes to understanding your logging spend, tagging is key: Tag logs with their sources, the teams associated with them, and level or tiering info (e.g., hot, warm, cold, debugging, compliance) to facilitate cost analysis. For logs sent to Datadog, organizations can rely on features such as Usage Attribution and the out-of-the-box estimated usage dashboard for Log Management for granular analysis of their logging costs.

The out-of-the-box estimated usage dashboard for Datadog Log Management.
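
To make the tagging advice concrete, here is a minimal sketch of how tagged logs can be rolled up into a per-team, per-tier volume breakdown. The record layout (`source`, `team`, `tier`, `bytes`) and the sample values are assumptions made for illustration; this is not how Usage Attribution is implemented.

```python
from collections import defaultdict

# Hypothetical sample records: each log carries source, team, and tier tags.
logs = [
    {"source": "cdn", "team": "platform", "tier": "cold", "bytes": 1200},
    {"source": "payments-api", "team": "payments", "tier": "hot", "bytes": 800},
    {"source": "cdn", "team": "platform", "tier": "cold", "bytes": 1000},
    {"source": "auth", "team": "security", "tier": "hot", "bytes": 500},
]


def bytes_by_tags(records, *tag_keys):
    """Sum log volume in bytes for each combination of the given tags."""
    totals = defaultdict(int)
    for record in records:
        # Records missing a tag are grouped under "untagged" so gaps stay visible.
        key = tuple(record.get(k, "untagged") for k in tag_keys)
        totals[key] += record["bytes"]
    return dict(totals)


usage = bytes_by_tags(logs, "team", "tier")
```

Grouping untagged records under their own key is a deliberate choice in this sketch: a large "untagged" bucket is often the first sign that cost analysis is missing coverage.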

Understanding your logs in the context of your business priorities—and cultivating this understanding across teams—is essential to making informed decisions on which logs to collect, how they should be handled, and how to manage logging costs effectively. Otherwise, logging costs can easily spiral out of control.

Reduce noisy log data at the edge

Once you’ve determined which logs you need to collect, you can zero in on the precise data you need from them. Noisy, context-heavy logs can drive up storage costs and slow down investigations. For example, CDN and firewall logs, which provide indispensable visibility to security and operations teams, often contain extraneous data.

Before your logs leave your environment, ensure that you’ve filtered out any redundant data, and stay ahead of potentially costly log surges by sampling and imposing quotas where appropriate. Dropping redundant metadata, stripping null fields, and normalizing data (such as dates, times, IP addresses, and location information) for consistency prior to routing can have a significant impact on your log management overhead. Generating metrics from logs can also help you control log volumes while effectively tracking KPIs. Instead of storing every CDN or WAF log, for example, you may want to simply generate metrics from them for alerts and general performance monitoring, so you can still extract meaningful insights without incurring unnecessary costs.

Generating metrics from logs with Datadog Observability Pipelines.
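
As a rough illustration of the clean-up and log-to-metric steps described above, the following sketch strips null fields, normalizes a timestamp, and derives a count per HTTP status class. The field names (`timestamp`, `status`, `debug_context`) and the assumption that timestamps arrive as Unix epoch seconds are hypothetical; a real pipeline would apply equivalent processors before the logs leave your environment.

```python
from collections import Counter
from datetime import datetime, timezone


def clean_event(event, drop_fields=("debug_context",)):
    """Drop listed fields and null/empty values, then normalize the timestamp."""
    cleaned = {
        k: v
        for k, v in event.items()
        if k not in drop_fields and v not in (None, "", [], {})
    }
    if "timestamp" in cleaned:
        # Assumption: the source emits Unix epoch seconds; emit UTC ISO 8601.
        ts = datetime.fromtimestamp(cleaned["timestamp"], tz=timezone.utc)
        cleaned["timestamp"] = ts.isoformat()
    return cleaned


def status_class_counts(events):
    """Count events per HTTP status class (2xx, 4xx, ...) instead of storing each log."""
    return Counter(f"{e['status'] // 100}xx" for e in events)
```

The counts returned by `status_class_counts` are the kind of aggregate you could alert on while sending the underlying CDN or WAF logs to cheaper storage, or dropping them.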

Meanwhile, bugs, errors, and various unpredictable events triggered in the course of CI/CD can lead to unexpected surges in log volumes. By configuring rule-based quotas for your log sources, you can prevent surges from inundating your storage and causing cost overruns.
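
The idea behind a rule-based quota can be sketched in a few lines: track how many bytes each source has sent in a given day and stop admitting events once a per-source limit is reached. The class name, the byte-based limits, and the per-day key are assumptions for illustration and do not reflect the Observability Pipelines configuration format.

```python
class DailyQuota:
    """Track bytes per source per day and flag events beyond a per-source limit."""

    def __init__(self, limits_bytes, default_limit=None):
        self.limits = limits_bytes          # e.g. {"ci-runner": 1_000}
        self.default_limit = default_limit  # None means "no quota"
        self.used = {}                      # (source, day) -> bytes admitted so far

    def allow(self, source, size_bytes, day):
        """Return True if the event fits within the source's quota for that day."""
        limit = self.limits.get(source, self.default_limit)
        if limit is None:
            return True
        key = (source, day)
        used = self.used.get(key, 0)
        if used + size_bytes > limit:
            return False  # over quota: drop, or divert to cheaper storage
        self.used[key] = used + size_bytes
        return True
```

What happens to over-quota events is a policy decision. Dropping them caps cost outright, while diverting them to an archive preserves the data for later investigation.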

For example, say a DevOps team managing a payment service notices an uptick in log volume after rolling out a new feature to improve transaction validation. The surge includes redundant error messages, verbose debugging logs mistakenly enabled in production, and a surplus of user interaction logs offering little to no aid in troubleshooting. While some of these logs provide valuable insights, their sheer volume makes it difficult to pinpoint real issues and unnecessarily drives up storage and observability costs. To address this, the team takes several steps to reduce noise while preserving meaningful data:

  • First, they group identical errors instead of logging the same issues repeatedly.
  • Next, they adjust log levels, ensuring that debug-level logs are disabled in production so that only warnings, errors, and critical alerts are captured.
  • Meanwhile, they introduce log sampling, retaining detailed logs for failed transactions while sampling one percent of the logs for successful transactions.
  • Finally, they filter out nonessential data that doesn’t contribute to their debugging or performance monitoring, such as the user interaction logs mentioned earlier. (They already have filtering in place in order to ensure that sensitive data, such as credit card numbers, IP addresses, and tokens, is properly redacted before their logs are shipped to destinations outside of their infrastructure, in compliance with regulations.)

With these adjustments, their log volume drops by 50 percent, allowing them to quickly pinpoint issues while significantly reducing their log storage and observability costs.
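
The team’s four steps can be approximated in a single filtering pass. This is only a sketch under assumed field names (`kind`, `level`, `outcome`, `service`, `message`), and it handles transaction logs separately from the level filter so that sampled successes survive; the actual reduction a team sees will depend on its own log mix.

```python
import random

KEEP_LEVELS = {"warning", "error", "critical"}


def reduce_noise(events, success_sample_rate=0.01, rng=None):
    """Filter, deduplicate, and sample a stream of log events."""
    rng = rng or random.Random()
    seen_errors = {}  # (service, message) -> first kept copy of that error
    kept = []
    for e in events:
        # Filter out nonessential data such as user interaction logs.
        if e.get("kind") == "user_interaction":
            continue
        # Keep every failed transaction; sample the successful ones.
        if e.get("kind") == "transaction":
            if e["outcome"] == "failed" or rng.random() < success_sample_rate:
                kept.append(e)
            continue
        # Drop debug- and info-level logs in production.
        if e["level"] not in KEEP_LEVELS:
            continue
        # Group identical errors: keep one copy and count the repeats.
        if e["level"] in ("error", "critical"):
            key = (e["service"], e["message"])
            if key in seen_errors:
                seen_errors[key]["count"] += 1
                continue
            e = {**e, "count": 1}
            seen_errors[key] = e
        kept.append(e)
    return kept
```

Keeping a `count` on the first copy of each repeated error preserves the signal that an error is frequent without storing every occurrence.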

Setting up a rule-based quota with Datadog Observability Pipelines.

To help teams manage their log volumes, Datadog Observability Pipelines provides a range of out-of-the-box processing capabilities for filtering and sampling logs before they leave your environment, generating metrics from logs, and imposing rule-based quotas in order to control log volumes. This type of control over your log data can help you attain essential visibility while managing costs.

Route logs proactively and selectively

Once you’ve homed in on how you are collecting and processing log data, it’s crucial to ensure that you’re sending that data exactly (and only) where you need it. Organizations are collecting log data from more and more parts of their distributed systems and sending that data to a wide variety of endpoints, such as cloud storage, log management systems, and SIEM providers. In the midst of this complexity, following a tiered logging strategy and proactively routing your logs is essential to cost efficiency.

As a general rule, selectively route your log data at the earliest possible points in your data pipelines (ideally at the edge) in order to avoid unnecessary costs. Not all data demands storage in a premium log management platform. For example, CDN, load balancer, and VPC flow logs are typically high-volume and often essential to collect, but they are queried relatively infrequently. Other logs may only be used to support keyword searches or other basic aggregation queries, or stored strictly for compliance reasons.

To optimize costs, there are a few general guidelines to follow when it comes to where to send which of your logs:

  • Send most low-priority and noisy logs directly to an archive, such as Amazon S3, Google Cloud Storage (GCS), or Azure blob storage. From there, these logs can be rehydrated in Datadog or queried using other tools on an ad hoc basis. Generally speaking, this covers Info and Debug logs or any others that you rarely or never need to query with any urgency, such as those indicating successful HTTP requests, read-only access, health checks, and standard operations.
  • Send Error, Warning, and Critical-status logs—such as those recording failed authentication attempts, admin activities, security tool alerts, configuration changes, and data modification events—to a hot or warm storage provider such as Datadog. Generally speaking, this tends to account for about 10 to 30 percent of log data.
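
The two guidelines above amount to a small routing rule. The sketch below assumes each event carries a `status` field and an optional `security_relevant` flag; both names, and the destination labels `indexed` and `archive`, are hypothetical placeholders for whatever your pipeline uses.

```python
HOT_STATUSES = {"error", "warning", "critical"}


def route(event):
    """Return the destination for one log event: 'indexed' or 'archive'."""
    status = str(event.get("status", "info")).lower()
    # Security-relevant events (failed logins, config changes, and so on)
    # go to hot or warm storage regardless of their status.
    if status in HOT_STATUSES or event.get("security_relevant", False):
        return "indexed"
    # Info, debug, health checks, and other routine logs go to the archive.
    return "archive"


def indexed_share(events):
    """Fraction of events routed to hot or warm storage."""
    if not events:
        return 0.0
    return sum(route(e) == "indexed" for e in events) / len(events)
```

Checking `indexed_share` against the 10 to 30 percent range mentioned above is one quick way to see whether your routing rules are sending too much to premium storage.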

Proactively routing logs can be a challenge when you’re managing multiple agents, collectors, or log forwarders. In addition to the processing capabilities covered in the previous section of this post, Observability Pipelines can help you orchestrate routing before your logs leave your on-prem or cloud environments, easily integrating with many popular downstream logging applications and storage environments. With Observability Pipelines, you can build, manage, and deploy pipelines in your own environment from Datadog’s SaaS control plane.

Routing logs with Datadog Observability Pipelines.

Fine-tune your log storage on a per-use-case basis

The configurability of different storage solutions is an important consideration when it comes to making routing decisions. Controlling how your logs are stored is not just a matter of pointing them to the right endpoints: When it comes to balancing costs with visibility, it’s important to fine-tune your storage to your logging use cases as much as possible. This means tailoring indexing and retention for each type of log you collect, on a case-by-case basis, in each of your log storage solutions.

For example, organizations in highly regulated industries like banking, healthcare, and insurance are often required to store certain types of high-volume logs long-term for auditing and security purposes. While simply sending these logs to cold storage may be the most economical option, rehydrating security and transaction logs every time you need to query them can be burdensome and cost precious time during incidents.

Datadog provides various solutions for reining in your log storage costs without the sacrifices to rapid queryability imposed by cold storage solutions: Logging Without Limits™ allows you to enrich, parse, and archive 100 percent of your logs while storing only what you need. And Flex Logs decouples log storage and querying costs, enabling teams to take more granular control over their log storage by maintaining logs in a rapidly queryable state for between 30 and 450 days. Datadog Log Management allows you to choose between Standard Indexing, archiving, and Flex Indexing—or dynamically combine these solutions—on a per-use-case basis.

Configuring log storage with Datadog Flex Logs.
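
One way to reason about per-use-case storage is to write the policy down as data and validate it. The sketch below is purely illustrative: the use-case names, tiers, and retention values are hypothetical examples, and the only figure taken from this post is the 30-to-450-day range for Flex Logs. It is not a Datadog API or configuration schema.

```python
from dataclasses import dataclass

VALID_TIERS = {"standard", "flex", "archive"}


@dataclass(frozen=True)
class StoragePolicy:
    tier: str            # "standard", "flex", or "archive"
    retention_days: int


# Hypothetical example values; tune these to your own use cases.
POLICIES = {
    "live-troubleshooting": StoragePolicy("standard", 15),
    "security-investigation": StoragePolicy("flex", 180),
    "compliance-audit": StoragePolicy("archive", 2555),
}


def validate(policy):
    """Raise ValueError if a policy uses an unknown tier or an out-of-range retention."""
    if policy.tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {policy.tier}")
    if policy.retention_days <= 0:
        raise ValueError("retention must be positive")
    # Range taken from the Flex Logs description above.
    if policy.tier == "flex" and not 30 <= policy.retention_days <= 450:
        raise ValueError("flex retention must be between 30 and 450 days")
    return policy
```

Keeping the policy in one reviewable table makes it easier for teams to agree on which logs deserve rapid queryability and for how long.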

Learn more about controlling logging costs while maintaining critical observability

In this post, we’ve outlined some key strategies for cost-effectively managing high-volume logs without compromising critical visibility. We’ve also shown how you can implement these recommendations using Datadog Observability Pipelines, Logging Without Limits, and Flex Logs.

If you’re looking to learn more about controlling log costs, you may want to check out our guides to reducing log volumes, strategically indexing your logs, and getting started with Logging Without Limits™. You can also sign up to receive our solution brief on modern log management or our webinar on controlling log volumes and costs while boosting visibility. And if you’re new to Datadog, you can sign up for a 14-day free trial.