惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

GbyAI
GbyAI
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
P
Proofpoint News Feed
L
Lohrmann on Cybersecurity
S
Secure Thoughts
Attack and Defense Labs
Attack and Defense Labs
人人都是产品经理
人人都是产品经理
Stack Overflow Blog
Stack Overflow Blog
W
WeLiveSecurity
O
OpenAI News
SecWiki News
SecWiki News
博客园 - Franky
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
T
Tor Project blog
Microsoft Security Blog
Microsoft Security Blog
aimingoo的专栏
aimingoo的专栏
Security Latest
Security Latest
H
Hacker News: Front Page
Google Online Security Blog
Google Online Security Blog
P
Privacy & Cybersecurity Law Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
D
Darknet – Hacking Tools, Hacker News & Cyber Security
月光博客
月光博客
李成银的技术随笔
Spread Privacy
Spread Privacy
F
Full Disclosure
F
Fortinet All Blogs
T
The Exploit Database - CXSecurity.com
Vercel News
Vercel News
AWS News Blog
AWS News Blog
WordPress大学
WordPress大学
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
V
Visual Studio Blog
J
Java Code Geeks
博客园 - 三生石上(FineUI控件)
G
Google Developers Blog
云风的 BLOG
云风的 BLOG
博客园 - 司徒正美
Engineering at Meta
Engineering at Meta
Last Week in AI
Last Week in AI
P
Palo Alto Networks Blog
宝玉的分享
宝玉的分享
T
True Tiger Recordings
N
News and Events Feed by Topic
酷 壳 – CoolShell
酷 壳 – CoolShell
Cisco Talos Blog
Cisco Talos Blog
N
News | PayPal Newsroom
S
SegmentFault 最新的问题
Jina AI
Jina AI

Security @ Cisco Blogs

Inside the SOC: AI-powered DNS defense against ransomware State-sponsored actors, better known as the friends you don’t want Security Insights: A Threat-First View for the Platform That Enforces Access From Strategy to Architecture: How Cisco is Building a Quantum-Safe Future AI-Ready, Simpler, and More Secure WAN: Cisco SD-WAN Innovations Designing for What’s Next: Securing AI-Scale Infrastructure Without Compromise Preparing for Post-Quantum Cryptography: The Secure Firewall Roadmap Mobile World Congress 2026: AI-powered Network Security Powering MWC Barcelona – Building a Unified SOC and NOC with Splunk in Record Time AI-powered Network Security at the Mobile World Congress 2026 SNOC Inside the Mobile World Congress 2026 SOC: Detecting Shadow Traffic with Firepower 6100 Data Optimization in Security: A Splunk Architect’s Perspective Inside the Talos 2025 Year in Review: A discussion on what the data means for defenders Zero Trust for Agentic AI: Safeguarding your Digital Workforce The Agent Trust gap: What Our Research Reveals About Agentic AI Security Meet Your Incident Responders
AI-generated reporting: Lessons learned from Cisco Talos Incident Response
Nate Pors · 2026-05-21 · via Security @ Cisco Blogs

The Cisco AI Readiness Index shows that most organizations are already seeing tangible value from investment in artificial intelligence (AI). However, early adopters quickly encountered limitations when attempting to generate long-form, technical content. For instance, when given raw notes and asked to create technical reports, large language models (LLMs) such as ChatGPT, Claude, and Gemini generated polished-looking results that often contained significant inaccuracies, unusual conclusions, and inconsistent writing styles.

The Cisco Talos Incident Response (Talos IR) AI Tiger Team set out to identify the root causes of these output problems, which we collectively refer to as “inconsistencies.” After defining these issues, we experimented with various solutions through prompt engineering. In the following sections, we share our findings on the consistency problem and our control methods based on a specific case study, drafting an experimental AI-assisted Tabletop Exercise (TTX) report.

In a nutshell, a TTX involves cybersecurity stakeholders gathering in a virtual or physical conference room and talking through a fictitious, tailored scenario involving a cybersecurity incident. Facilitators guide them through a discussion of incident resolution, asking probing questions to highlight areas of strength and potential gaps in the organization’s incident response processes. While this case study focuses on a TTX report, the methodology could be adapted to any cybersecurity reporting use case with standardized inputs and predictable outputs.

As an important note, the Talos IR AI Tiger Team experiments with and publishes these findings from a strictly research-oriented perspective.

Defining the inconsistency problem in AI reporting

Various types of inconsistencies in AI output frequently diminish the efficiency gains that AI reporting processes promise to deliver. At their core, most inconsistencies stem from the probability-driven nature of LLMs. These models generate output by predicting the next token, typically a word or sub-word, in a sequence, based on model weights and training data. In essence, this means that no two LLM outputs will be identical, even when provided with the exact same prompt multiple times.

Talos IR identified four ways this probabilistic nature manifests itself during report content generation, detailed in the following list:

  • Inconsistency in research and sourcing: LLMs utilize various data sources, ranging from static training sets to real-time internet access. Because a model may pull from different websites during separate runs, the underlying data often shifts. This variability in source material directly leads to inconsistent results, making it difficult to rely on an LLM for repeatable, standardized research outcomes.
  • Inconsistency in conclusions: Even with identical data, LLMs may produce different conclusions. For example, in a data breach scenario, a model might suggest a full organization-wide password reset in one instance and a targeted reset in another. Without the nuance to evaluate specific context, the model often defaults to whichever recommendation it generates first. This lack of consistency complicates decision-making, as the model may fail to provide the most appropriate solution for the specific circumstances at hand.
  • Inconsistency in output format: Because LLMs generate content token-by-token, document structure and formatting can fluctuate between runs. This unpredictability is problematic for professional environments where standardized layouts, such as consistent executive summaries or recommendation sections, are essential for quality control. Achieving a predictable, uniform output remains a significant challenge when using LLMs for formal report generation.
  • Inconsistency due to context drift and pollution: LLMs use a “context window” to track conversation history, but this creates two primary issues. First, when the window hits its limit, the model discards older information, potentially losing critical initial instructions. Second, performing multiple unrelated tasks in one session leads to “context pollution,” where conflicting data causes the model to produce unpredictable or blended results. As a session grows, these factors degrade performance, as the model struggles to maintain focus on the original task requirements.

Methods to control inconsistencies

The Talos IR AI Tiger Team developed and tested various prompt engineering methods to control each type of inconsistency. While none of these methods are particularly groundbreaking individually, they collectively produced the highly accurate report described in the “Case Study” section. The four following inconsistency control methods are described and discussed to help others on their own prompt-writing journey.

  • Prompt specialization: Prompt specialization mitigates context drift and pollution by replacing large, unified prompts with granular, single-task instructions. By focusing each prompt on a specific, small portion of the report, the risk of hallucination or cross-contamination between sections is significantly reduced. This modular approach allows for greater transparency and easier optimization of individual components.
  • Specified source constraints: Specified source constraints address inconsistencies in research and conclusions by mandating exactly where the LLM should retrieve information. By providing explicit instructions on data provenance, users limit the model’s ability to pull from unreliable or conflicting sources. This control ensures that the final output remains grounded in authoritative data, preventing the generation of inaccurate or speculative content. Defining these boundaries within the prompt is essential for maintaining integrity and ensuring that the model’s conclusions align strictly with the provided source material.
  • Output format specification: Output format specification ensures consistency by providing the LLM with rigid parameters regarding length, tone, content, and structure. Without these instructions, models often produce excessive or overly creative content that deviates from professional standards. By explicitly defining the target audience, preferred writing style, and necessary content elements, users can force the model to adhere to a predictable structure. This level of guidance is critical for quality control, ensuring that the generated report meets professional requirements and remains free of unnecessary or redundant information.
  • Template-guided prompting: Template-guided prompting is a method for strictly enforcing structural consistency. By embedding a rigid template directly into the prompt, users can control exactly how the final output is laid out. Clear instructions are provided to the model to distinguish between static text that must remain unchanged and dynamic placeholders that require replacement. This approach eliminates formatting variability, ensuring that every document follows a uniform, professional structure. By combining these templates with clear delimiter instructions, users achieve highly predictable, repeatable output that requires minimal post-processing or manual formatting.

Case study: TTX report

We selected the TTX report as an ideal case study candidate for two key reasons. First, its content is largely a reorganization of notes captured during a TTX event, meaning the LLM’s role is focused on restructuring existing data rather than generating new content creatively. Second, unlike a forensics report, which contains timestamps, file paths, and other technical elements that are difficult to manually verify, a TTX report is straightforward enough for the human author to review at a glance. This makes it significantly less likely that a hallucination would go undetected during research and testing.

As mentioned earlier, during our research the team created three TTX reporting prompts named the “Discussion Organizer,” the “Recommendation Polisher,” and the “Executive Summarizer.” One of these, the “Executive Summarizer,” is shown in full below to assist other researchers in their work. It is designed to write an accurate, concise executive summary given the rest of the report as input.

Prompt Graphic

The benefits

There were many clear benefits to AI-generated reporting during our testing:

  • Efficiency: As noted at the start of this post, case study test results predicted a 50% reduction in total report drafting time. This included the time spent manually writing the 10% of content that could not be efficiently AI-generated and manually editing the AI-generated content.
  • Better content: The “Recommendation Polisher” prompt was effective in suggesting corollaries of recommendations that the TTX participants and facilitators may not have explicitly identified during the discussion. Our testing resulted in more robust lists of recommendations.
  • Consistent quality: A blind test of the sample report in our quality assurance process showed no noticeable drop in overall writing quality. The peer reviewer, professional editor, and management reviewer all made complimentary comments about the report while unaware that it was AI-generated. The peer reviewer commented that the incidence of typos and grammatical errors was far lower than in the average report.

Cautions

There were also some drawbacks and considerations that would need to be closely managed in a production environment:

  • Data management: First, proper AI tool selection is critical to protect sensitive data. Uploading organizational data into a publicly hosted AI tool would often constitute a policy violation and significant data privacy incident. Talos IR carefully adheres to Cisco’s Responsible AI principles and urges other organizations and individuals to exercise extreme caution in data handling.
  • Model selection: Testing confirmed that model selection is critical for output quality. As of late 2025, Claude Sonnet 4.5 emerged as the most effective model, delivering high-quality, consistent prose. Its ability to proactively identify and flag internal conflicts in source notes significantly reduced the need for manual corrections.
  • Input quality control: Unsurprisingly, we found that input quality determines output quality. To quote a coding aphorism, “Garbage in, garbage out.” The primary area where this can be problematic is the recommendations. While the model can and does identify missed recommendations, it cannot be relied upon to do so.
  • LLM over-reliance: Perhaps the most obvious consideration is that report authors retain accountability for the quality of the final product. That being the case, they must edit, understand, and take ownership of every word of the final report. While testing, we found that the LLMs generated recommendations that were duplicative, irrelevant, or not actionable. If this were used in a production environment without manual checks, it could result in poor-quality recommendations in a final report.

Technology limitations

The Talos IR AI Tiger Team found during testing that editing multiple sample reports within a single session resulted in cross-contamination of content from one report’s source material to another, even if the notes used to generate the first report were deleted from the project’s reference documents. We determined that it was critical to run each prompt in a new session or project to ensure the integrity of the output. 

Separately, we developed and tested a fourth prompt intended to edit a full report for errors in grammar, spelling, etc. While the process was highly effective in identifying misspellings, multiple iterations hallucinated numerous grammar issues (false positives) and failed to identify actual issues (false negatives), with a success rate below 50%. The most concerning aspect was that multiple runs with the same model, prompt, and draft report input would behave inconsistently, sometimes catching issues and sometimes overlooking them. While our team will continue to test this use case as models improve, it is currently unsuitable for production use. 

What’s next

Cisco has invested considerable resources in the responsible adoption and development of AI. The primary goal of the Talos IR AI Tiger Team is to take that broad mandate and convert it into actionable applications within the fields of incident response and forensics. With that in mind, we continuously test, develop, and publish new capabilities in accordance with Cisco’s Responsible AI principles. Again, the Talos IR AI Tiger Team experiments with and publishes these findings from a strictly research-oriented perspective.

If you’re interested in learning more about Cisco Talos Incident Response and how our services could benefit your organization, we’d love to talk with you further. You can read more about us and contact us via our website.

Disclaimer: Some of the individuals posting to this site, including the moderators, work for Cisco. Opinions expressed here and in any corresponding comments are the personal opinions of the original authors, not those of Cisco.


We’d love to hear what you think! Ask a question and stay connected with Cisco Security on social media.

Cisco Security Social Media

LinkedIn
Facebook
Instagram