Can Your SOC's AI Actually Think? Evaluating LLMs with the Vectra AI MCP Server
2025-11-04 · via Vectra AI Blog

You know that moment when someone says, “Let’s just plug ChatGPT into the SOC” — and everyone nods like it’s totally fine? Yeah, this post is about what happens after that moment.

Because as cool as it sounds, adding GenAI to a SOC isn’t magic. It’s messy. It’s data-hungry. And if you don’t measure what’s really happening under the hood, you might just end up automating the confusion.

So… we decided to measure it.

GenAI in the SOC: cool idea, hard reality

Let’s start with the obvious: AI is everywhere in security right now.

Every SOC slide deck has a big “GenAI Assistant” bubble somewhere in it. But how those assistants actually perform when faced with real SOC workflows — that’s the real test.

Enter the Vectra MCP Server — the air traffic controller for all your AI agents.

It connects your LLM (say ChatGPT or Claude) to your security tools (and their data!) — in this case, Vectra AI.

The MCP server orchestrates enrichment, correlation, containment, and context, letting your AI agent interact directly with the signals that matter instead of getting lost in dashboards.

And because we want everyone to experience these capabilities, we have released two MCP servers that let you connect any Vectra platform to your AI workflows.

So, if you’ve been thinking, “I wish I could just connect my LLM to my security stack and see what happens,” — now you can. No license hoops, no NDAs, just plug it in and play.

At Vectra AI, we genuinely believe that GenAI + MCP will fundamentally change how SOCs operate.

This isn’t a “someday” idea — it’s already happening, and we are making sure that Vectra AI users are fully equipped to leverage this change.

That’s also why we spend a lot of time talking with customers, prospects, and partners — to understand how fast these technologies are moving, and what “LLM-ready” really means in a live SOC.

So… we decided to measure it.

Because if GenAI is going to reshape security operations, then we need to be absolutely sure our platform, our data, and our MCP integrations can plug into that new world seamlessly. Measuring efficacy isn’t a side project — it’s how we future-proof the SOC.

It’s not about more data — it’s about better data

We’ll be blunt: GenAI without good data is like hiring Sherlock Holmes and giving him a blindfold.

At Vectra AI, data is the differentiator. Two things make it special:

  1. AI-based detections: built on years of research into attacker behaviors, not anomalies. They're designed to be robust, meaning they stay effective even as attackers change tools. Each detection focuses on intent and behavior rather than static indicators, giving SOC teams confidence that what they're seeing is real and relevant.
  2. Enriched network metadata: high-context telemetry that spans hybrid environments, structured and correlated so it's machine-readable and immediately actionable.

That's the kind of data GenAI can actually use. Feed that into an LLM, and it starts reasoning like a seasoned analyst. Feed it raw logs, and you'll get a very confident hallucination about DNS.

So, how do you evaluate an AI analyst anyway?

Turns out, you can’t just ask it to “find bad guys faster.”

You need to measure how it reasons. And when you deal with an AI agent connected through MCP, there are three main things you can influence:

  1. The model (GPT-5, Claude, Deepseek, etc.)
  2. The prompt (how you tell it to act: tone, structure, goals)
  3. The MCP server itself (how it plugs into your detection stack)

Each of those can move the performance needle.

Change the prompt slightly, and suddenly your “confident” AI analyst forgets how to spell “PowerShell.”

Change the model, and latency doubles.

Change the MCP integration, and half your context disappears.

That’s why we built a repeatable testbed — automated evaluation, real SOC scenarios, and a dash of brutal honesty.
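
To make "repeatable" concrete, here is a minimal sketch of how the three knobs above could be crossed into an evaluation grid. It is not the testbed's actual code: the `EvalConfig` class, the `build_grid` helper, and every label in the lists are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EvalConfig:
    """One combination of the three knobs (hypothetical structure)."""
    model: str       # which LLM answers the task
    prompt: str      # system prompt variant
    mcp_server: str  # MCP integration under test


def build_grid(models, prompts, mcp_servers):
    """Return one EvalConfig per combination of model, prompt, and MCP server."""
    return [EvalConfig(m, p, s) for m, p, s in product(models, prompts, mcp_servers)]


# Hypothetical labels; swap in whatever your own testbed uses.
models = ["gpt-5", "claude-sonnet-4.5", "deepseek-3.1"]
prompts = ["minimal-soc-prompt"]
mcp_servers = ["vectra-mcp"]

grid = build_grid(models, prompts, mcp_servers)
for cfg in grid:
    print(cfg)
```

Holding two knobs fixed while varying the third is what lets you attribute a change in results to the model, the prompt, or the integration rather than to a mix of all three.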

The testbed (a.k.a. “we actually tried it”)

For the first run, we kept things intentionally simple: tier-1 tasks, light reasoning (two hops max), no fancy multi-agent choreography.

The stack looked like this:

  • n8n for quick prototyping and automation
  • A minimal SOC prompt (basically: “You’re an AI analyst. Help out. If you don’t know, say so.”)

But this wasn’t a toy experiment. We tested 28 real SOC tasks — the kind analysts actually face every single day. Things like:

  • Listing hosts in high or critical status
  • Pulling detections for specific endpoints (piper-desktop, deacon-desktop, etc.)
  • Checking for command-and-control detections tied to IPs or domains
  • Finding exfiltration over 1 GB
  • Tagging and deleting host artifacts
  • Looking up accounts in “high” or “critical” risk quadrants
  • Hunting for “Admin” accounts involved in Entra ID operations
  • Querying detections with specific JA3 fingerprints
  • Assigning analysts to hosts or detections

Basically, everything a Tier-1 or Tier-2 SOC analyst would touch on a busy Tuesday morning.

Each run was scored for correctness, speed, token use, and tool activity — all measured on a scale of 1-5.
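
As a rough illustration of what a scored run might look like, the sketch below records the four metrics on a 1-5 scale and averages one of them per model. The `RunScore` class and `average_by_model` helper are hypothetical, the "higher is better" direction for every metric is an assumption, and the numbers are placeholders rather than results from the testbed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunScore:
    """One scored run; assumes 5 is best for every metric."""
    model: str
    task: str
    correctness: int    # 1 (wrong) to 5 (fully correct)
    speed: int          # 1 (slowest) to 5 (fastest)
    token_use: int      # 1 (most tokens) to 5 (fewest tokens)
    tool_activity: int  # 1 (many wasted calls) to 5 (precise calls)

    def __post_init__(self):
        # Reject anything outside the 1-5 scale early.
        for name in ("correctness", "speed", "token_use", "tool_activity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def average_by_model(runs, metric):
    """Mean of one metric per model, rounded to two decimals."""
    by_model = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(getattr(run, metric))
    return {model: round(mean(values), 2) for model, values in by_model.items()}


# Placeholder scores for illustration only; not results from the testbed.
runs = [
    RunScore("model-a", "list critical hosts", 5, 3, 2, 4),
    RunScore("model-a", "find exfiltration over 1 GB", 4, 2, 2, 3),
    RunScore("model-b", "list critical hosts", 4, 5, 4, 4),
]

print(average_by_model(runs, "correctness"))
```

Keeping the four metrics separate, instead of collapsing them into one number up front, makes it easier to see when a model is accurate but slow, or fast but wasteful with tokens.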

What makes a good GenAI agent?

Evaluating GenAI inside a SOC isn’t about which model sounds smarter. It’s about how efficiently it thinks, acts, and learns. A good AI agent behaves like a sharp analyst — it doesn’t just get the right answer, it gets there efficiently. Here’s what to look for:

  1. Efficient token usage. The fewer words it needs to reason, the better. Long-winded models waste compute and context space.
  2. Smart tool calls. When a model keeps calling the same tool over and over, it’s basically saying “let me try again.” The best ones understand when and how to use a tool: minimal trial and error, maximum precision.
  3. Speed without sloppiness. Fast is good, but only if accuracy holds. The ideal model balances responsiveness with reasoning depth.

In short: your best AI analyst doesn’t just talk — it thinks efficiently.
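
One simple way to spot the "let me try again" pattern is to count exact repeats in an agent's tool-call trace. The sketch below assumes a trace is a list of `(tool_name, arguments)` pairs; that format, the `tool_call_stats` helper, and the tool names are all made up for illustration and are not taken from the Vectra MCP server.

```python
import json
from collections import Counter


def tool_call_stats(trace):
    """Summarise a trace of (tool_name, arguments) pairs."""
    # Serialise arguments with sorted keys so identical calls compare equal
    # regardless of the order in which the arguments were written.
    keys = [(name, json.dumps(args, sort_keys=True)) for name, args in trace]
    counts = Counter(keys)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "total_calls": len(keys),
        "unique_calls": len(counts),
        "repeated_calls": repeated,
    }


# Hypothetical trace; the tool names are illustrative only.
trace = [
    ("list_hosts", {"threat_level": "critical"}),
    ("list_hosts", {"threat_level": "critical"}),
    ("get_detections", {"host": "piper-desktop"}),
]

print(tool_call_stats(trace))
```

This only catches identical calls. A model that retries the same tool with slightly different arguments would need a looser comparison, for example grouping by tool name alone.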

Here’s what we found:

Highlights and practical takeaways

  • GPT-5 wins on accuracy and reasoning depth, but it's slow and pricey. Use when precision matters more than speed.
  • Claude Sonnet 4.5 delivers the best overall balance: accuracy, speed, and efficiency. Great for production SOCs.
  • Claude Haiku 4.5 is perfect for fast triage: quick, cheap, and "good enough" for first-line decisions.
  • Deepseek 3.1 is the value champion: impressive performance at a fraction of the cost.
  • Grok Code Fast 1 is for tool-heavy workflows (automation, enrichment, etc.), but watch your token bill.
  • GPT-4.1... let's just say it's not invited back for another shift.

And because every good article needs graphs, here are some:

Correctness score comparison

GPT-5 is technically the winner at 4.32/5, but honestly? Claude Sonnet 4.5 and Deepseek 3.1 are basically tied at 4.11 and you probably won't notice the difference. The real plot twist? GPT-4.1 absolutely faceplants with 2.61/5. Like, yikes. Don't use that one for security stuff.

Execution time

Claude Haiku 4.5 is flying through these queries at 38 seconds. Meanwhile GPT-5 is taking a leisurely 93-second stroll, nearly 2.5x as long. When there's a potential security incident, those extra seconds feel like forever. Haiku gets it done.
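
For anyone who wants to check the gap, the arithmetic on the two timings quoted above is short. The variable names are ours, and whether the figures are per-task averages or something else is not stated, so treat this as a sanity check on the ratio only.

```python
haiku_seconds = 38  # Claude Haiku 4.5, as quoted above
gpt5_seconds = 93   # GPT-5, as quoted above

# How many times longer GPT-5 takes, and the absolute gap in seconds.
ratio = gpt5_seconds / haiku_seconds
extra_seconds = gpt5_seconds - haiku_seconds

print(f"GPT-5 takes {ratio:.2f}x as long ({extra_seconds} s more)")
```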

Value proposition matrix

Bigger bubble = fewer tokens used. GPT-4.1's bubble is huge, but that's not a flex: it's like saying "I finished the test super fast" when you failed it. Cheap and wrong isn't a value proposition, it's just... wrong. The models you actually want are in the upper-right corner: Deepseek 3.1 (efficient AND accurate), Claude Sonnet 4.5 (balanced beast), and Grok Code Fast 1 (solid all-around). GPT-5's micro-bubble confirms it's the expensive option.

So, what did we learn?

  1. Accuracy isn’t everything. A model that’s slightly more accurate but takes twice as long, and burns five times the tokens, might not be your best option. In a SOC, efficiency and scale are part of accuracy.
  2. Tool use is a window into reasoning. If an LLM needs ten tool calls to answer a simple question, it’s not being thorough; it’s lost. The best-performing models didn’t just get the answer right; they got there efficiently, using one or two smart queries through the MCP. Tool use isn’t about quantity; it’s about how quickly the model figures out the right path. The LLM isn’t always to blame, though: a good MCP server is essential for optimal tool calling. But let’s keep MCP evaluation for a later time.
  3. Prompt design is underrated. The tiniest tweak in wording can swing accuracy or hallucination rates wildly. We kept the prompt minimal on purpose, as a baseline for future tuning, but it’s clear that small design choices have big effects.

Wrapping up (and a little reality check)

So, here’s the thing — it’s not really about which model wins a beauty contest. Sure, GPT-5 might edge out Claude on one metric or another, but that’s missing the point.

The real lesson is that evaluating your AI agent is not optional.
If you’re going to rely on GenAI inside your SOC — to triage alerts, summarize incidents, or even call containment actions — then you need to know how it behaves, where it fails, and how it evolves over time.

AI without evaluation is just automation without accountability.

And equally important: your security tools need to speak LLM.

That means structured data, clean APIs, and context that’s machine-readable — not locked in dashboards or vendor silos. The most advanced model in the world can’t reason if it’s fed half-broken telemetry.

That’s why at Vectra AI, we’re obsessed with making sure our platform — and our MCP server — are LLM-ready by design. The signals we produce aren’t just meant for humans; they’re built to be consumed by machines, by AI agents that can reason, enrich, and act.

Because in the next wave of security operations, it’s not enough to use AI — your entire ecosystem has to be AI-compatible.

The SOC of the future isn’t just AI-powered. It’s AI-measured, AI-connected, and AI-ready.