BrontoScope: AI-Powered Error Investigations

Authored by Marco Aquilanti

Today we're introducing BrontoScope, one of the Bronto AI Labs initiatives aimed at reducing user toil, increasing team efficiency, and reducing MTTR.

The Problem with AI in Observability

Almost every software company is adding AI features to their products — often with mixed results. As a user, I'm frequently annoyed by the continuous stream of AI features popping up everywhere: messaging apps that want you to chat with an LLM while you're looking for your friends, search engines surfacing LLM answers first and leaving you wondering whether what you're reading is true or a hallucination.

The observability space is no exception. Many products are being "enriched" with AI features, but most are missing the point. Here's why.

Observability has always been hard. A production system can easily produce terabytes of logs, millions of traces, and millions of metrics every hour, far too much for any human to inspect. LLMs should be the next pillar of observability, reducing burden and improving reliability, but only if they focus on making the user's life simpler.

Most current AI features in observability actually make the user's life harder by:

Requiring a detailed prompt as input — users must invest significant time crafting prompts to get well-structured responses
Producing long, verbose text responses — even when the AI has nailed the request, the answer is often diluted across lines and lines of text
Taking too long — complex multi-step LLM workflows leave users waiting far too long for answers during an incident

The Bronto Approach

At Bronto, we're extending the logging platform with LLM capabilities focused on one goal: automating recurring work patterns to make the user's life simpler, not harder.

Our Bronto Labs initiative is built around three tools:

Auto-Parsing — using AI to automatically structure logs
AI Dashboard Creation — generating dashboards from natural language
BrontoScope — AI-powered incident investigation

The philosophy behind all of these: before adding any new feature, we make sure it will be genuinely useful to most users and won't slow down or hinder any of their existing tasks.

BrontoScope

Incidents don't wait for business hours. When an alert fires at 3am, a small number of on-call engineers need to move fast, often without access to the domain experts who know the affected system best.

The first steps of any incident are always the same:

Understand the scope of the incident
Estimate the impact on customers and the broader system
Assign a priority and decide how to tackle it

Staying calm, thinking clearly, and acting quickly are all required, even when you've just been woken up. But too much haste leads to an incorrect diagnosis.

LLMs can help enormously in these scenarios — they can summarize large amounts of data in seconds and are not affected by panic, confusion, or a 3am wake-up call.

BrontoScope automates the incident investigation process with a single click on any error event in your logs. The LLM writes and runs tens of queries against your data, analyzes the results, generates a summary report, and delivers it to you in just a few seconds.

What the Report Includes

Scope — when the errors started appearing, and which users, customers, services, regions, or hosts are affected
Probable causes — resource exhaustion, network issues, software bugs, traffic spikes, etc.
Suggestions — how to stop the error occurring or how to continue the investigation
Supporting data — the query results and charts that led the LLM to its conclusions, so you can validate that the model isn't hallucinating

How It Works

The process works in stages: first, the LLM analyzes the error and its surrounding context to guide subsequent data retrieval. The search engine then queries the relevant data and presents all findings to the LLM in a single comprehensive prompt — essentially, an ad-hoc dashboard built around the error and composed of many charts. The final response is streamed to the user via Server-Sent Events, allowing them to read the output as it's generated in real time.
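The three stages described above can be sketched roughly as follows. This is only an illustration, not Bronto's implementation: `plan_queries`, `run_query`, and `summarize` are hypothetical callables standing in for the LLM planning step, the search engine, and the streaming LLM call, and the `report` and `done` event names are assumptions. Only the Server-Sent Events framing (`event:` and `data:` fields, with a blank line ending each event) follows the standard wire format.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable, Iterator


def sse_event(data: str, event: str = "message") -> str:
    """Frame one payload as a Server-Sent Events message.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the event, as the SSE wire format requires.
    """
    lines = [f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def investigate(
    error_event: dict,
    plan_queries: Callable[[dict], list[str]],   # hypothetical: LLM planning step
    run_query: Callable[[str], dict],            # hypothetical: search engine call
    summarize: Callable[[str], Iterable[str]],   # hypothetical: streaming LLM call
) -> Iterator[str]:
    # Stage 1: the LLM reads the error and decides which data to retrieve.
    queries = plan_queries(error_event)

    # Stage 2: the search engine runs every query; all findings are
    # gathered into one comprehensive prompt.
    findings = [{"query": q, "result": run_query(q)} for q in queries]
    prompt = json.dumps({"error": error_event, "findings": findings})

    # Stage 3: the report is streamed to the client chunk by chunk.
    for chunk in summarize(prompt):
        yield sse_event(chunk, event="report")
    yield sse_event("", event="done")


if __name__ == "__main__":
    # Stub callables so the sketch runs without any external service.
    stream = investigate(
        {"message": "connection timeout"},
        plan_queries=lambda err: ["count by host", "count by region"],
        run_query=lambda q: {"rows": 3},
        summarize=lambda prompt: ["Scope: 3 hosts affected", "Probable cause: network"],
    )
    for message in stream:
        print(message, end="")
```

Writing the pipeline as a generator means the caller can forward each framed event to the HTTP response as soon as it is produced, which is what lets the reader see the report while it is still being generated.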

BrontoScope is powered by the most advanced AI models available on Amazon Bedrock, so all data is processed within the AWS ecosystem. Prompts and responses are never stored or shared with model providers or third parties.

Why It Actually Makes Life Easier

No prompt required — just click on a log event. The LLM analyzes and understands the error, writes its own filter to find similar occurrences, and scans the data autonomously
Concise reports — each report goes straight to the point, with charts included to maximize information density
Fast — in most cases the report is streamed to the user in under 10 seconds, even though tens of queries are run per investigation, thanks to the speed of Bronto's search engine

Availability

BrontoScope is currently available on request and is being used internally by the Bronto team as well as by a number of design partner customers in real-world situations. Improvements will be made in the coming months.

This is just one of the AI features being developed at Bronto — stay tuned for future posts, or join our AI initiative and help shape what we build next.

Join Bronto Labs
