Datadog for Serverless: End-to-end visibility for modern applications

Daniel Langer · 2018-11-19 · via Datadog | The Monitor blog

To make modern application architectures more observable, we’re excited to announce general availability of serverless monitoring in Datadog. In the Serverless view, you can search, filter, and explore all your AWS Lambda functions in one place, and jump straight into detailed performance data for each of them:

Troubleshoot performance issues with distributed traces, and filter your serverless traces by cold starts, timeouts, duration, or errors
Collect custom metrics in real time from serverless functions to monitor key business metrics without adding latency to your applications
See the bigger picture in the Service Map, which visualizes Lambda functions along with their dependencies (e.g., API Gateway, DynamoDB, S3)

In our conversations with customers since announcing the Serverless beta in late 2018, we’ve often heard how serverless architectures augment existing servers and infrastructure. It’s not easy to go all-in on serverless right from the start, which is why we’ve ensured that Serverless works seamlessly with the rest of Datadog. For example, we’ve made it easy to trace requests across any kind of infrastructure, whether in Lambda, EC2, or even running on-prem. And you enjoy the same power to drill down into a specific metric, log line, or trace from anywhere in Datadog.

With these additions to the Datadog platform, all the data you need for troubleshooting serverless functions is now available in one place.

Serverless meets the three pillars of observability

From an operations standpoint, serverless functions are fundamentally different from applications running on servers, VMs, or containers. From a monitoring perspective, however, you still need observability data like work metrics (requests, errors, latency), request traces, and logs to ensure that your serverless functions are performing properly, and to troubleshoot any issues that arise.

The Serverless view in Datadog provides a searchable, sortable view of all your functions. Faceted search allows you to filter your functions using metadata such as function name, AWS account, region, runtime, and team name, whether that metadata is collected from your cloud provider or applied as a custom tag. In the Serverless view, you can also sort all your functions using high-level statistics such as number of invocations, average duration, error count, and memory usage.

Clicking on any function from the Serverless view takes you straight to detailed, function-specific data from all three pillars of observability: metrics, logs, and distributed traces.

Metrics, request traces, and logs for a Lambda function in Datadog.

Monitoring cold starts, timeouts, and more

Serverless monitoring does have some fundamental differences from traditional infrastructure monitoring. The runtime environment is minimal—and beyond the developer’s control. These differences mean that there is no place to install a monitoring agent, and some of the metrics you care about are unique to serverless environments.

There are also strict limits that control how your code runs: allocated memory, maximum timeout, and concurrency limit. When you exceed these limits, your Lambda functions will time out or be killed by the runtime. Knowing when a function is approaching these limits can therefore help you prevent user-facing availability or performance issues.
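One way to notice a looming timeout from inside a function is to check the remaining execution time that the Lambda context object reports. The following is a minimal sketch for a Python handler, not a description of how Datadog tracks timeouts. `get_remaining_time_in_millis()` is the standard Lambda context method; `SAFETY_MARGIN_MS`, `process`, and the shape of `event` are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical threshold: stop taking on new work when less than this remains.
SAFETY_MARGIN_MS = 2000


def process(item):
    """Placeholder for real per-item work (hypothetical)."""
    return item


def handler(event, context):
    """Process items until the invocation gets close to its timeout."""
    items = event.get("items", [])
    done = []
    for item in items:
        # The Lambda context reports how many milliseconds remain before timeout.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            logger.warning(
                "Stopping early: %d of %d items processed", len(done), len(items)
            )
            break
        done.append(process(item))
    return {"processed": len(done), "remaining": len(items) - len(done)}
```

Returning the count of unprocessed items lets the caller retry the remainder instead of losing work when the runtime kills the function.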

Cold starts are less predictable: they occur when the cloud provider scales up your functions behind the scenes to handle more requests. Your customers could see requests taking longer to complete when your functions need to cold start. Datadog detects when cold starts occur and automatically applies a cold_start attribute to your request traces, so you can search all your traces to find cold starts for any service or function and determine their impact on overall application performance.
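To make the idea concrete, a common way to recognize a cold start from inside a function relies on the fact that module-level code runs once per execution environment. The sketch below illustrates that general technique only. It is not necessarily how Datadog detects cold starts, and the returned `cold_start` key is a hypothetical name used for illustration.

```python
# Module-level code runs once per execution environment, so this flag is
# True only for the first invocation after a cold start.
_is_cold_start = True


def handler(event, context):
    """Report whether this invocation ran in a freshly started environment."""
    global _is_cold_start
    cold_start = _is_cold_start
    _is_cold_start = False
    # Illustrative only: return the flag so it can be logged or attached to a trace.
    return {"cold_start": cold_start}
```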

Datadog APM for serverless

Regardless of where your code runs, Datadog traces it. Whether in a Lambda function, on a host, or on a combination of both, you can see exactly what happens in the full lifespan of a request.

Tracing Lambda functions with X-Ray

Datadog’s integration with AWS X-Ray enables you to visualize Lambda trace data, so you can zero in on the source of any errors or slowdowns, and see how the performance of your functions impacts your users’ experience.

Tracing is especially valuable for Lambda and other serverless platforms, because it allows you to visualize how requests travel between the numerous components of a serverless architecture. If your application’s end-to-end latency starts increasing, you can see in an instant whether the bottleneck is due to code-level issues in one of your Lambda functions, hitting your limits for Lambda concurrency, or issues in a service dependency like DynamoDB.

If your request invokes multiple Lambda functions connected by other AWS components such as SNS or API Gateway, X-Ray will automatically instrument each of these functions and tie them together in a single trace. Within the trace detail view in Datadog, you can easily jump from one function to another via auto-generated links.

Distributed end-to-end tracing, anywhere

Your requests may originate from on-prem hosts and hit Lambda functions, or may skip between Docker containers and Lambda functions. Datadog intelligently ties together traces from VMs and X-Ray traces from Lambda. To automatically create unified traces in Datadog for end-to-end analysis, you only need to add a few lines of code to your Lambda functions. (No changes are required to code running on VMs.)

Not only does Datadog track request traffic flowing through Lambda functions and other application code, but it follows requests across your microservices, through message queues, and deep into database tables. Seeing these distributed traces and logs in the same view, along with your infrastructure metrics, makes it easier to get to the root cause quickly.

Instrument Lambda functions for tracing

To instrument a Lambda function for tracing with X-Ray, navigate to the Lambda function in the AWS console, scroll to the “Debugging and error handling” section, and check the box to “Enable active tracing.”

To do this with the Serverless Framework, add the tracing section to your provider configuration:

provider:
  name: aws
  runtime: python3.7
  tracing:
    lambda: true
    apiGateway: true

Then, in your function code, import the X-Ray SDK and patch all supported libraries. For Python applications, instrumentation is as simple as importing the SDK and adding a one-liner to your function to start automatically tracing all calls to AWS services and other X-Ray-supported integrations:

from aws_xray_sdk.core import patch_all
patch_all()

Instrumentation is available for Python, Node.js, Go, Java, Ruby, and .NET.

Service Map for serverless

On the Service Map, you can now visualize how your Lambda functions fit together into services, and how requests flow across your infrastructure.

Compose your serverless functions into services, and create alerts to see the health of your infrastructure all in one place. The Service Map automatically reflects the status of your alerts, giving you a color-coded view into the health of your infrastructure and application. When alerts are firing, you’ll see right away which services were affected, regardless of whether they are API Gateways, Lambda functions, or web applications running on EC2.

Serverless metrics and logs

The addition of Lambda tracing to the Datadog platform complements the insights that we’ve long provided for your functions via metrics and logs. Each of these data types provides valuable insights into the usage and performance of your serverless functions, from overall performance summaries to low-level error reporting.

Track business goals with real-time custom metrics

With Datadog custom metrics for Lambda, you can track the number of signups, conversions, or orders being placed, all in real time. Our forwarder function sends data to Datadog asynchronously, without relying on your Lambda functions to make API calls, so you can monitor key business metrics from your serverless functions without adding unnecessary latency to your applications. Enable custom metrics in your serverless functions with the Datadog Lambda Layer. Asynchronous reporting of custom metrics is now available for Python, Node.js, Ruby, Java, and Go functions.
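The reason asynchronous reporting adds no latency is that the function only writes a log line, and a separate forwarder turns that line into a metric later. The sketch below illustrates that pattern with the standard library alone. The JSON field names, `emit_metric`, and the `orders.placed` metric are hypothetical and are not the format used by the Datadog Lambda Layer, which handles this for you in practice.

```python
import json
import time


def emit_metric(name, value, tags=None, timestamp=None):
    """Write one metric point to stdout as a JSON log line and return the line.

    No network call is made here; a log forwarder is assumed to parse the
    line and submit the metric out of band.
    """
    point = {
        "metric": name,
        "value": value,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "tags": list(tags or []),
    }
    line = json.dumps(point, sort_keys=True)
    print(line)  # Lambda sends stdout to CloudWatch Logs
    return line


def handler(event, context):
    """Hypothetical order handler that records a business metric."""
    emit_metric("orders.placed", 1, tags=["env:demo"])
    return {"statusCode": 200}
```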

Logs: Errors, duration, memory usage

Lambda logs are extremely valuable for debugging, as they provide detailed error reporting and low-level details for each invocation. Logs capture data such as the execution time for a particular invocation, the billed duration, and the actual memory usage as compared to the memory allocated to the Lambda function. Because you pay for allocated resources rather than actual usage, these memory statistics can help you identify overprovisioned functions so you can balance Lambda performance with costs. Similarly, you can identify Lambda functions that are underprovisioned, and assign more memory to those functions to improve their performance. These memory statistics are available at full granularity in the Lambda logs, and are also aggregated automatically for each Lambda function in the Serverless view.
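As an illustration of the memory statistics described above, the sketch below parses the REPORT line that Lambda writes at the end of each invocation and computes how much of the allocated memory was used. The function name `parse_report`, the result keys, and the sample request ID are hypothetical; the Serverless view aggregates these statistics for you, so this is only meant to show where the numbers come from.

```python
import re

# Fields of the REPORT line that Lambda logs when an invocation finishes.
REPORT_PATTERN = re.compile(
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB"
)


def parse_report(line):
    """Return duration and memory statistics from a REPORT line, or None."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    stats = {
        "duration_ms": float(match["duration_ms"]),
        "billed_ms": float(match["billed_ms"]),
        "memory_mb": int(match["memory_mb"]),
        "used_mb": int(match["used_mb"]),
    }
    # A low ratio suggests the function may be overprovisioned.
    stats["memory_utilization"] = stats["used_mb"] / stats["memory_mb"]
    return stats


sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 12.34 ms\tBilled Duration: 100 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 64 MB"
)
print(parse_report(sample)["memory_utilization"])  # prints 0.125
```

A utilization close to 1.0 points the other way: the function may be underprovisioned and could benefit from more memory.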

Get full visibility into your serverless functions

If you’re already using Datadog to monitor your applications, cloud infrastructure, and serverless functions, visit the Serverless view today to start monitoring and exploring all the performance data from your Lambda functions in one place. Then enable the X-Ray integration to start tracing requests as they travel through your architecture for even greater visibility.

If you don’t yet have a Datadog account, you can start a free, full-featured trial today to get deep visibility into your applications, infrastructure, and serverless functions in one platform.
