Key metrics for measuring your organization's security posture
2025-03-18 · via Datadog | The Monitor blog
Mallory Mooney


In today’s evolving cloud landscape, balancing security and compliance is increasingly challenging. Security protects an organization’s applications, resources, and data from threats, while compliance demonstrates a commitment to building services that align with industry standards. Although both goals are key components of a strong security posture, they require distinct approaches that can be difficult to integrate. The challenge lies in detecting and responding to threats efficiently while also tracking that work for reporting and auditing purposes.

In Part 1 of this series, we’ll look at data that is helpful for tracking your organization’s security posture in the following categories:

  • Response and remediation
  • Incidents and threats
  • Governance, compliance, and preparedness

In Part 2, we’ll show you how Datadog bridges the gap between security and compliance by enabling you to track each of the metrics we discuss in this post.

A note on the metrics discussed in this post

Some of the metrics we’ll discuss are widely used as indicators for measuring an organization’s effectiveness in responding to security incidents. However, there is an ongoing conversation within the industry about whether approaches like Service Level Objectives (SLOs) offer a more actionable framework for gauging success and connecting security to broader operational goals. In this post, we’ll reference both traditional metrics and approaches like SLOs, but it’s important to note that there’s no one-size-fits-all solution. Organizations should tailor approaches to meet their unique goals.

Response and remediation

The first category of metrics focuses on how well a team responds to and remediates security incidents, such as those resulting from denial-of-service (DoS) attacks, data breaches, unauthorized access, and vulnerability exploits. Traditional, time-based operational metrics, like the ones described in this section, provide a starting point for measuring a team’s efficiency during these events. They also help teams tune their detection systems to better distinguish anomalous activity from typical activity in the future.

Mean Time to Detect (MTTD)

The mean time to detect (MTTD) metric measures the average time it takes your threat detection systems (we’ll focus primarily on cloud SIEMs in this post) to identify an issue within your environment. This metric can help establish a baseline for how quickly your threat detection systems respond to threats. The end goal for a threat detection system is a consistently low false positive rate, which indicates that it is configured to accurately distinguish legitimately malicious activity from benign activity.
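As a rough illustration, MTTD is the average, across incidents, of the time between when malicious activity began and when it was detected. The sketch below assumes each incident record carries both timestamps; the record layout and the field names `started_at` and `detected_at` are hypothetical, and in practice the start time is often established only during post-incident investigation.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records: when malicious activity began and when the
# threat detection system (e.g., a cloud SIEM) first flagged it.
incidents = [
    {"id": "inc-1", "started_at": datetime(2025, 3, 1, 9, 0), "detected_at": datetime(2025, 3, 1, 9, 20)},
    {"id": "inc-2", "started_at": datetime(2025, 3, 4, 14, 0), "detected_at": datetime(2025, 3, 4, 14, 5)},
    {"id": "inc-3", "started_at": datetime(2025, 3, 9, 2, 30), "detected_at": datetime(2025, 3, 9, 3, 5)},
]


def mean_time_to_detect(records):
    """Return MTTD as a timedelta: the mean of (detected_at - started_at)."""
    if not records:
        raise ValueError("MTTD is undefined for an empty set of incidents")
    seconds = mean((r["detected_at"] - r["started_at"]).total_seconds() for r in records)
    return timedelta(seconds=seconds)


print(mean_time_to_detect(incidents))  # 0:20:00
```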

A consistent increase in MTTD could indicate a few issues. Misconfigured detection signals, for example, can produce a high false positive rate and increase the time it takes to detect legitimate threats. Missing or out-of-date threat intelligence lists can limit how well your systems detect emerging or sophisticated threats. Additionally, systems that are not fully integrated with your environment can overlook critical activity captured in logs.

It can also be helpful to monitor anomalies in MTTD to get a more complete picture of what affects your systems’ ability to detect threats. A single spike, as seen in the following screenshot, can be the result of a particularly challenging or uncommon incident. For example, a threat actor may evade initial detection with covert techniques that your threat detection systems don’t account for, such as password spraying.

Monitor the mean time to detect security metrics
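One simple way to flag a single spike like this is to compare each period against the median of the series, which a lone outlier barely moves. The weekly values and the 2× threshold below are hypothetical; tune both to your own baseline.

```python
from statistics import median

# Hypothetical weekly MTTD values, in minutes.
weekly_mttd = {"W1": 18, "W2": 22, "W3": 19, "W4": 95, "W5": 21}


def flag_spikes(series, factor=2.0):
    """Return periods whose value exceeds `factor` times the median of the series."""
    baseline = median(series.values())
    return [period for period, value in series.items() if value > factor * baseline]


print(flag_spikes(weekly_mttd))  # ['W4']
```

Using the median rather than the mean as the baseline keeps the spike itself from raising the threshold it is being compared against.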

Mean Time to Acknowledge (MTTA)

The mean time to acknowledge (MTTA) metric measures the average time between a threat detection system’s initial detection of an issue and when your security team reviews it. Like MTTD, MTTA can show a need to fine-tune your threat detection systems. For example, a consistently high MTTA could indicate an issue in one of the following areas:

  • Difficulty with prioritizing high-risk signals
  • Understaffed security teams, which lead to delays in analyzing signals
  • Threat detection systems that are generating a high number of false positives
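A minimal MTTA sketch, assuming hypothetical signal records with `detected_at` and `acknowledged_at` timestamps. Unacknowledged signals cannot be included in the average, so the sketch also reports them separately; otherwise a growing backlog could make MTTA look deceptively good.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical signals; acknowledged_at is None if no one has reviewed the signal yet.
alerts = [
    {"id": "sig-1", "detected_at": datetime(2025, 3, 1, 9, 20), "acknowledged_at": datetime(2025, 3, 1, 9, 32)},
    {"id": "sig-2", "detected_at": datetime(2025, 3, 4, 14, 5), "acknowledged_at": datetime(2025, 3, 4, 14, 53)},
    {"id": "sig-3", "detected_at": datetime(2025, 3, 9, 3, 5), "acknowledged_at": None},
]


def mean_time_to_acknowledge(records):
    """Mean of (acknowledged_at - detected_at) over acknowledged signals only."""
    deltas = [
        (r["acknowledged_at"] - r["detected_at"]).total_seconds()
        for r in records
        if r["acknowledged_at"] is not None
    ]
    if not deltas:
        raise ValueError("no acknowledged signals")
    return timedelta(seconds=mean(deltas))


def unacknowledged(records):
    """IDs of signals still waiting for review."""
    return [r["id"] for r in records if r["acknowledged_at"] is None]


print(mean_time_to_acknowledge(alerts), unacknowledged(alerts))  # 0:30:00 ['sig-3']
```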

Mean Time to Resolve (MTTR)

The mean time to resolve (MTTR) metric tracks the average amount of time it takes a team to fully resolve a security incident after their threat detection system detects it. A consistently high MTTR could indicate a need to reassess your team’s incident response plan. For example, issues like unclear roles or a lack of training can cause confusion during an incident and extend MTTR.

A sudden but temporary increase in MTTR, as seen in the following screenshot, can be the result of a particularly complex incident that required more time to investigate. It can also shed light on weaknesses in your incident management process that are worth looking into, such as poor coordination between responders.

Monitor the mean time to resolve security metrics
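Because a single complex incident can inflate the mean, it can help to report the median alongside MTTR. This sketch uses hypothetical records with `detected_at` and `resolved_at` timestamps; in the sample data, one 24-hour incident pulls the mean well above the median.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical resolved incidents.
incidents = [
    {"id": "inc-1", "detected_at": datetime(2025, 3, 1, 9, 20), "resolved_at": datetime(2025, 3, 1, 13, 20)},
    {"id": "inc-2", "detected_at": datetime(2025, 3, 4, 14, 5), "resolved_at": datetime(2025, 3, 4, 16, 5)},
    {"id": "inc-3", "detected_at": datetime(2025, 3, 9, 3, 5), "resolved_at": datetime(2025, 3, 10, 3, 5)},
]


def resolution_hours(records):
    """Hours from detection to full resolution for each incident."""
    return [(r["resolved_at"] - r["detected_at"]).total_seconds() / 3600 for r in records]


def mttr_summary(records):
    """MTTR (mean) plus the median, which is less sensitive to one long incident."""
    hours = resolution_hours(records)
    return {"mean_hours": mean(hours), "median_hours": median(hours)}


print(mttr_summary(incidents))  # {'mean_hours': 10.0, 'median_hours': 4.0}
```

A large gap between the two numbers suggests one outlier incident rather than a systemic slowdown, which is the distinction the screenshot above illustrates.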

SLO considerations

While these time-based metrics offer baselines for understanding how long it takes for your team to detect, acknowledge, and resolve an incident, they do not always offer insight into why it took that amount of time, how long it should take to resolve, or how to improve resolution times. For example, these metrics alone often do not account for circumstances like having only one security engineer on call, or your team becoming proficient at resolving the same recurring incident without looking into why it keeps happening.

To address these gaps, SLOs give you proactive, measurable targets for handling incidents efficiently. Traditional, time-based metrics are still valuable for reporting after a security incident, but SLOs complement that data by giving your teams a more complete picture of their efficiency and goals. Asking questions about security-related SLO expectations and users, such as the following, can serve as a good starting point:

  • Is there an expectation to resolve a certain percentage of critical vulnerabilities within a specific time?
  • Do we identify our users as internal IT teams, company leadership, employees, or customers (or all of the above)?
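For example, an SLO such as “resolve 95% of critical vulnerabilities within 7 days” can be checked by measuring the share of vulnerabilities that met the deadline. The target, window, and remediation times below are hypothetical.

```python
from datetime import timedelta

# Hypothetical SLO: resolve 95% of critical vulnerabilities within 7 days.
SLO_TARGET = 0.95
SLO_WINDOW = timedelta(days=7)

# Hypothetical time-to-remediate for each critical vulnerability closed this quarter.
remediation_times = [timedelta(days=d) for d in (1, 2, 3, 3, 5, 6, 6, 8, 10, 4)]


def slo_attainment(durations, window):
    """Fraction of items resolved within `window`."""
    if not durations:
        raise ValueError("no items to evaluate")
    within = sum(1 for d in durations if d <= window)
    return within / len(durations)


attainment = slo_attainment(remediation_times, SLO_WINDOW)
print(f"{attainment:.0%} within {SLO_WINDOW.days} days; SLO met: {attainment >= SLO_TARGET}")
# 80% within 7 days; SLO met: False
```

This sketch only covers vulnerabilities that have been closed; any still open past the window should also count as misses.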

In this section, we looked at important time-based metrics for assessing the effectiveness of your organization’s security incident response and threat detection systems. We also briefly looked at how SLOs can take this data a step further by enabling your organization to set realistic goals for incident response. Next, we’ll look at metrics that provide a high-level overview of your security posture and how it affects your organization.

Incidents and threats

The second category of metrics applies to the overall state of your environment and threat detection systems. This information is especially helpful for security audits, which rely on quantifiable data to evaluate performance and make informed decisions about system and process improvements. In addition to audits, these metrics can help you identify gaps in your security posture and determine their cause.

Intrusion attempts

Intrusion attempts reflect the number and frequency of attempts to compromise your systems. These attempts span the tactics and techniques that attackers use to gain, or try to gain, access to a system, and they include events like unusual login attempts, unauthorized changes to valid accounts, resource modifications, and data exfiltration.

Monitoring the total number of intrusion attempts over a period of time can help you compare trends with changes to your systems, while tracking the frequency of attempts could highlight specific misconfigurations or vulnerabilities. For example, a sudden spike in intrusion attempts could be the result of an attacker taking advantage of a new vulnerability, while a steady increase could simply reflect your organization’s heightened profile.

In either case, monitoring intrusion attempts can provide valuable insights into which vulnerabilities to prioritize and where to implement better safeguards. In Part 2 of this series, we’ll look at how Datadog can help you monitor intrusion attempts and explore their causes.
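To tell a sudden spike from a steady increase, you can bucket attempts by day and apply a simple heuristic. The sketch below is illustrative only: the data is hypothetical, and the 3× spike factor and the “never decreases” rule for a rising trend are arbitrary choices, not a standard method.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def daily_counts(attempt_dates):
    """Count attempts per calendar day, ordered by day.

    Days with zero attempts are omitted; a real pipeline would fill those gaps.
    """
    counts = Counter(attempt_dates)
    return [counts[day] for day in sorted(counts)]


def classify_trend(counts, spike_factor=3.0):
    """Rough heuristic: 'spike' if any day exceeds spike_factor x the median of
    the preceding days, 'rising' if counts never decrease and end higher than
    they started, otherwise 'stable'."""
    for i in range(1, len(counts)):
        if counts[i] > spike_factor * median(counts[:i]):
            return "spike"
    non_decreasing = all(a <= b for a, b in zip(counts, counts[1:]))
    if len(counts) > 1 and non_decreasing and counts[-1] > counts[0]:
        return "rising"
    return "stable"


# Hypothetical log: one date per intrusion attempt over five days.
start = date(2025, 3, 1)
attempts = [start + timedelta(days=d) for d, n in enumerate([10, 11, 12, 60, 13]) for _ in range(n)]
print(daily_counts(attempts), classify_trend(daily_counts(attempts)))  # [10, 11, 12, 60, 13] spike
```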

False positive rates (FPR)

False positive rate (FPR) measures the percentage of benign signals that a threat detection system incorrectly classifies as malicious. Monitoring FPR enables you to assess the effectiveness of your threat detection systems, with the goal of keeping false positives low. For example, a consistently high FPR for a cloud SIEM often indicates poorly tuned signals, which can create alert fatigue and lead responders to overlook serious threats.
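The definition above corresponds to the standard formula FPR = FP / (FP + TN). The sketch below computes it, along with the related share of raised signals that turn out to be false, from hypothetical counts; how you label signals as true or false positives depends on your own triage process.

```python
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): the share of benign activity flagged as malicious."""
    benign = false_positives + true_negatives
    if benign == 0:
        raise ValueError("no benign activity observed")
    return false_positives / benign


def false_alert_share(false_positives, true_positives):
    """FP / (FP + TP): the share of raised signals that turned out to be benign."""
    raised = false_positives + true_positives
    if raised == 0:
        raise ValueError("no signals raised")
    return false_positives / raised


# Hypothetical month: 10,000 benign events, 40 of them wrongly flagged; 10 real threats caught.
print(false_positive_rate(40, 9_960))  # 0.004
print(false_alert_share(40, 10))       # 0.8
```

Even with an FPR of 0.4%, four out of five signals in this example are false alarms, because benign activity vastly outnumbers real threats. True negatives are hard to count in a SIEM, so in practice many teams track the second ratio (also known as the false discovery rate).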

Investigating a high FPR can show you how to create higher-fidelity signals for your cloud SIEM. A signal that detects a single event, such as a failed login attempt, may not indicate anything malicious. But combining that event with other behavior, such as multiple failed attempts followed by a successful login and lateral movement, can surface a real threat. Assessing your signals this way helps ensure that your cloud SIEM has sufficient coverage and that you are building the right detection signals for your environment.
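The correlation idea can be sketched in plain Python. This is not Datadog’s detection rule syntax; the event names (`login_failure`, `login_success`, `lateral_movement`), the five-failure threshold, and the 30-minute window are all hypothetical values chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def detect_brute_force_then_pivot(events, min_failures=5, window=timedelta(minutes=30)):
    """Flag users with >= min_failures failed logins shortly before a successful
    login, followed by lateral movement within `window` of that success.

    `events` is an iterable of (timestamp, user, action) tuples.
    """
    by_user = defaultdict(list)
    for ts, user, action in events:
        by_user[user].append((ts, action))

    flagged = set()
    for user, user_events in by_user.items():
        failures = []       # timestamps of failed logins since the last success
        success_at = None   # time of a success that followed enough failures
        for ts, action in sorted(user_events):
            if action == "login_failure":
                failures.append(ts)
            elif action == "login_success":
                recent = [f for f in failures if ts - f <= window]
                success_at = ts if len(recent) >= min_failures else None
                failures = []
            elif action == "lateral_movement" and success_at is not None:
                if ts - success_at <= window:
                    flagged.add(user)
    return flagged


t0 = datetime(2025, 3, 1, 9, 0)
events = [(t0 + timedelta(minutes=i), "alice", "login_failure") for i in range(5)]
events += [
    (t0 + timedelta(minutes=5), "alice", "login_success"),
    (t0 + timedelta(minutes=10), "alice", "lateral_movement"),
    (t0, "bob", "login_failure"),  # a single failure on its own is not suspicious
    (t0 + timedelta(minutes=1), "bob", "login_success"),
    (t0 + timedelta(minutes=2), "bob", "lateral_movement"),
]
print(detect_brute_force_then_pivot(events))  # {'alice'}
```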

You can also compare FPR with other metrics, like the number of security incidents and MTTA, to determine if your signals and other preventative measures are improving your organization’s security posture. For example, if the number of severe incidents, MTTA, and FPR are consistently low, that can indicate an effective security incident response.

Security incidents

Security incidents are events that endanger (or could endanger) the availability, integrity, or confidentiality of your systems and their data, or that violate security policies. It’s important to distinguish a security incident from a security event because incidents require more attention. Events are occurrences that indicate a threat, risk, or vulnerability to a system, while incidents are events that compromise, or will compromise, a system or violate a policy. Many security events are negligible, so recognizing the difference can help you refine the data you track.

Monitoring the number, frequency, and severity of security incidents (not security events) provides a better understanding of gaps in your organization’s security posture. For example, a steady increase in the frequency of critical security incidents could indicate a flaw in your organization’s security processes, such as overlooking recommended best practices like properly configuring metadata services.

On the other hand, a noticeable decline in the frequency of significant security incidents could be the result of a well-configured environment and efficient threat detection systems. You can compare this trend with other metrics, like response times, and with your SLOs to confirm that it reflects improved incident management. For example, satisfactory response times and low false positive rates can help explain a decline in significant security incidents. However, keep in mind that other factors can also contribute to a decline, such as emerging threats that create blind spots in your threat detection systems. In Part 2, we’ll look at how Datadog can help you track your security incidents and determine their root cause.
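To watch for rising or falling trends like these, you can count incidents per month and severity while leaving plain security events out of the tally. The record format and the `kind` field used to separate events from incidents are hypothetical.

```python
from collections import Counter

# Hypothetical triage records; only records classified as incidents are counted.
records = [
    {"month": "2025-01", "kind": "event", "severity": "low"},
    {"month": "2025-01", "kind": "incident", "severity": "critical"},
    {"month": "2025-02", "kind": "incident", "severity": "critical"},
    {"month": "2025-02", "kind": "incident", "severity": "medium"},
    {"month": "2025-02", "kind": "event", "severity": "high"},
    {"month": "2025-03", "kind": "incident", "severity": "critical"},
    {"month": "2025-03", "kind": "incident", "severity": "critical"},
    {"month": "2025-03", "kind": "incident", "severity": "critical"},
]


def incident_counts(rows):
    """Count incidents per (month, severity), ignoring plain security events."""
    return Counter((r["month"], r["severity"]) for r in rows if r["kind"] == "incident")


def severity_trend(counts, months, severity="critical"):
    """Monthly counts for one severity level; months with no incidents yield 0."""
    return [counts[(m, severity)] for m in months]


counts = incident_counts(records)
print(severity_trend(counts, ["2025-01", "2025-02", "2025-03"]))  # [1, 1, 3]
```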

Governance, compliance, and preparedness

The final category measures your organization’s governance effectiveness, compliance, and level of preparedness. Overlooking a governance policy or compliance requirement can be costly, so having this information for routine audits and reporting purposes is essential. Tracking this data can also help you determine which improvements need to be made to your services and resources, as well as your threat detection systems.

Level of preparedness

This assessment looks at an organization’s ability to detect, contain, and recover from security incidents. While not a standard metric like the response and remediation metrics we talked about earlier, the level of preparedness can be measured by looking at a few factors, including:

  • The frequency of incident management training sessions that cover security response
  • The frequency of security-focused exercises, such as red team and purple team exercises
  • The quality of incident response plans, such as well-defined roles and steps for communicating with customers
  • The ability to maintain a history of events, such as those captured in audit logs

Comparing this information with other metrics, such as trends in the number of incidents, intrusion attempts, and FPR can give you a better understanding of your organization’s overall level of preparedness. It can also help you develop a grading system that supports your goals. For example, a moderate-high level of preparedness could mean that all responders have completed incident management training, your organization conducts 1–2 regular audits a year, and your cloud SIEM has a consistently low FPR.
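A grading system like this can be expressed as a simple points-based rubric. Every threshold below (full training completion, at least one audit and two exercises a year, an FPR at or below 5%) and the grade boundaries are hypothetical; the sample input mirrors the moderate-high example above.

```python
from dataclasses import dataclass


@dataclass
class PreparednessInputs:
    responders_trained: float  # share of responders who completed incident management training (0-1)
    audits_per_year: int
    exercises_per_year: int    # red team / purple team exercises
    siem_fpr: float            # recent cloud SIEM false positive rate (0-1)
    audit_logs_retained: bool  # whether a history of events (audit logs) is maintained


def preparedness_grade(p):
    """Hypothetical rubric: one point per satisfied check, mapped to a grade."""
    checks = [
        p.responders_trained >= 1.0,
        p.audits_per_year >= 1,
        p.exercises_per_year >= 2,
        p.siem_fpr <= 0.05,
        p.audit_logs_retained,
    ]
    points = sum(checks)
    if points == len(checks):
        return "high"
    if points >= 3:
        return "moderate-high"
    if points == 2:
        return "moderate"
    return "low"


# Everyone trained, two audits a year, low FPR, no exercises, no retained audit logs.
print(preparedness_grade(PreparednessInputs(1.0, 2, 0, 0.02, False)))  # moderate-high
```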

Policy for security and compliance

In addition to evaluating your organization’s level of preparedness, it’s important to track your adherence to established security policies and industry standards, such as CIS, GDPR, HIPAA, and PCI DSS. Tracking this information not only gives you a baseline for reducing security risks in your environment but also helps protect your organization from costly lawsuits and compliance violations. Compliance metrics can be customized to fit your organization’s goals, but we’ll look at a couple of examples to consider.

First, you can look at the severity levels of flagged misconfigurations. Not all misconfigurations need to be addressed immediately, so knowing their status, as seen in the following screenshot, will help you prioritize the most pressing violations. Critical misconfigurations require an immediate, concentrated effort and should be resolved before high-severity ones.

Monitor misconfigurations in your environment
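A minimal sketch of severity-based triage: sort flagged findings so that critical ones come first. The severity scale, resource names, and rule names below are hypothetical and not tied to any particular scanner.

```python
# Lower number = more urgent. The severity scale is illustrative.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

# Hypothetical findings from a posture scan.
findings = [
    {"resource": "bucket-a", "rule": "public-read-access", "severity": "high"},
    {"resource": "vm-7", "rule": "metadata-service-unrestricted", "severity": "medium"},
    {"resource": "cluster-1", "rule": "api-server-allows-http", "severity": "critical"},
    {"resource": "key-42", "rule": "access-key-not-rotated", "severity": "low"},
]


def prioritize(items):
    """Sort findings from most to least severe; ties keep their original order."""
    return sorted(items, key=lambda f: SEVERITY_ORDER[f["severity"]])


for f in prioritize(findings):
    print(f["severity"], f["resource"], f["rule"])
```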

Second, tracking which resources pass or fail recommended configurations gives you an overview of how well your environment complies with a particular framework or benchmark. This can serve as a good starting point for keeping your resources secure. A high proportion of resources that pass benchmarks indicates effective deployment, change management, and configuration processes.

A high percentage of failed configuration checks, on the other hand, could indicate one of the following issues:

  • An overreliance on default settings when deploying new resources
  • Changes made during incidents that were not documented
  • Poor visibility into resource changes
  • A lack of training on compliance policies
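To track the pass/fail ratio described above per framework, you can tally check results and divide passes by total evaluations. The result tuples below are hypothetical and simplified to one check per resource per framework.

```python
from collections import defaultdict

# Hypothetical check results: (framework, resource, passed)
results = [
    ("CIS", "vm-1", True), ("CIS", "vm-2", False), ("CIS", "vm-3", True), ("CIS", "vm-4", True),
    ("PCI DSS", "db-1", True), ("PCI DSS", "db-2", False),
]


def pass_rate_by_framework(rows):
    """Return {framework: share of evaluated resources that passed}."""
    tallies = defaultdict(lambda: [0, 0])  # framework -> [passed, total]
    for framework, _resource, passed in rows:
        tallies[framework][1] += 1
        if passed:
            tallies[framework][0] += 1
    return {fw: passed / total for fw, (passed, total) in tallies.items()}


print(pass_rate_by_framework(results))  # {'CIS': 0.75, 'PCI DSS': 0.5}
```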

You can take these elements of monitoring policy compliance (severity levels and pass/fail status) a step further by establishing security baselines, which are the set of configuration settings required for your environment. These baselines can either be customized for your resources or based on industry-standard recommendations.

For example, you can create a security baseline that applies the minimum compliance requirements across all of your Kubernetes environments. With it, you can assign severity levels and remediation Service Level Agreements (SLAs) to configurations that you always want to enforce, such as requiring HTTPS connections to the API server.
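Such a baseline can be modeled as a list of rules, each pairing a check with a severity and a remediation SLA. This is a sketch under stated assumptions: the configuration dictionary and its keys (`api_server.https_only`, `audit_logging`, `anonymous_auth`) are hypothetical stand-ins, not actual kube-apiserver flags or any Datadog rule format, and the severities and SLAs are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class BaselineRule:
    name: str
    severity: str
    remediation_sla: timedelta
    check: Callable[[dict], bool]  # True if the configuration complies


# Hypothetical Kubernetes baseline; a missing setting counts as a violation.
K8S_BASELINE = [
    BaselineRule("api-server-https-only", "critical", timedelta(days=1),
                 lambda cfg: cfg.get("api_server", {}).get("https_only") is True),
    BaselineRule("audit-logging-enabled", "high", timedelta(days=7),
                 lambda cfg: cfg.get("audit_logging") is True),
    BaselineRule("anonymous-auth-disabled", "high", timedelta(days=7),
                 lambda cfg: cfg.get("anonymous_auth") is False),
]


def evaluate(cfg, rules, checked_at):
    """Return one violation per failed rule, with a remediation deadline from its SLA."""
    return [
        {"rule": r.name, "severity": r.severity, "remediate_by": checked_at + r.remediation_sla}
        for r in rules
        if not r.check(cfg)
    ]


cluster_cfg = {"api_server": {"https_only": False}, "audit_logging": True, "anonymous_auth": False}
for violation in evaluate(cluster_cfg, K8S_BASELINE, datetime(2025, 3, 18)):
    print(violation)
```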

Understand your security posture with these key metrics

In this post, we looked at the key metrics for monitoring your organization’s security posture. We also looked at how SLOs can supplement these traditional security metrics to create a comprehensive strategy for handling incidents efficiently. In Part 2, we’ll discuss how you can use Datadog to monitor these data points and strengthen your organization’s security posture.

Acknowledgments

We’d like to thank Kendra Ash and Elise Burke for their invaluable feedback on this article.