惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

L
LangChain Blog
博客园 - 司徒正美
美团技术团队
WordPress大学
WordPress大学
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
人人都是产品经理
人人都是产品经理
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
T
Troy Hunt's Blog
S
Schneier on Security
T
The Exploit Database - CXSecurity.com
P
Proofpoint News Feed
云风的 BLOG
云风的 BLOG
Engineering at Meta
Engineering at Meta
Cisco Talos Blog
Cisco Talos Blog
T
Tor Project blog
B
Blog
NISL@THU
NISL@THU
月光博客
月光博客
博客园 - 【当耐特】
AWS News Blog
AWS News Blog
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
腾讯CDC
L
Lohrmann on Cybersecurity
The Cloudflare Blog
L
LINUX DO - 最新话题
S
Security @ Cisco Blogs
S
Secure Thoughts
Spread Privacy
Spread Privacy
有赞技术团队
有赞技术团队
The Last Watchdog
The Last Watchdog
Project Zero
Project Zero
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
Vercel News
Vercel News
H
Hacker News: Front Page
S
SegmentFault 最新的问题
Schneier on Security
Schneier on Security
aimingoo的专栏
aimingoo的专栏
P
Privacy & Cybersecurity Law Blog
博客园 - 三生石上(FineUI控件)
Forbes - Security
Forbes - Security
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
T
Tailwind CSS Blog
Application and Cybersecurity Blog
Application and Cybersecurity Blog
G
GRAHAM CLULEY
W
WeLiveSecurity
小众软件
小众软件
Recorded Future
Recorded Future
Cyberwarzone
Cyberwarzone
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org

Aikido Security's Blog

Axios CVE-2026-40175: a critical bug that’s… not exploitable GlassWorm goes native: New Zig dropper infects every IDE on your machine Aikido Attack finds multiple 0-days in Hoppscotch The cybersecurity doomerism around Mythos doesn't match what we see on the ground axios compromised on npm: maintainer account hijacked, RAT deployed Popular telnyx package compromised on PyPI by TeamPCP Aikido × Lovable: Vibe, Fix, Ship CanisterWorm Gets Teeth: TeamPCP's Kubernetes Wiper Targets Iran TeamPCP deploys CanisterWorm on NPM following Trivy compromise Security testing is validating software that no longer exists Aikido Recognized by Frost & Sullivan with the 2026 Customer Value Leadership Award in ASPM GlassWorm Hides a RAT Inside a Malicious Chrome Extension fast-draft Open VSX Extension Compromised by BlokTrooper Glassworm Strikes Popular React Native Phone Number Packages Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Hundreds of Repositories How Security Teams Fight Back Against AI-Powered Hackers Introducing Betterleaks, an open source secrets scanner by the author of Gitleaks Trump’s 2026 cybersecurity strategy: From compliance to consequence How does AI pentesting work with compliance? What continuous pentesting actually requires Rare Not Random: Using Token Efficiency for Secrets Scanning Persistent XSS/RCE using WebSockets in Storybook’s dev server Why Determinism Is Still a Necessity in Security WAF vs. RASP vs. ADR Introducing Aikido Infinite: A new model of self-securing software How Aikido secures AI pentesting agents by design Astro Full-Read SSRF via Host Header Injection How to Get Your Board to Care About Security (Before a Breach Forces the Issue) What is Slopsquatting? The AI Package Hallucination Attack Already Happening SvelteSpill: A Cache Deception Bug in SvelteKit + Vercel Top 6 Wiz Code Alternatives Aikido recognized as Platform Leader in Latio Tech's 2026 Application Security Report From detection to prevention: How Zen stops IDOR vulnerabilities at runtime npm backdoor lets hackers hijack gambling outcomes Introducing Upgrade Impact Analysis: When breaking changes actually matter to your code Why Trying to Secure OpenClaw is Ridiculous Claude Opus 4.6 found 500 vulnerabilities. What does this change for software security? Introducing Aikido Expansion Packs: Safer defaults inside the IDE Self-Securing Software: What It Is, Why It Matters, and How It Works npx Confusion: Packages That Forgot to Claim Their Own Name What Is Continuous Pentesting? Introducing Aikido Package Health: a Better Way to Trust Your Dependencies AI Pentesting: Minimum Safety Requirements for Security Testing Secure SDLC for Engineering Teams (+ Checklist) Fake Clawdbot VS Code Extension Installs ScreenConnect RAT G_Wagon: npm Package Deploys Python Stealer Targeting 100+ Crypto Wallets Gone Phishin': npm Packages Serving Custom Credential Harvesting Pages Malicious PyPI Packages spellcheckpy and spellcheckerpy Deliver Python RAT Top 10 AI Security Tools For 2026 Agent Skills Are Spreading Hallucinated npx Commands Understanding Open-Source License Risk in Modern Software The CISO Vibe Coding Checklist for Security Top 6 Graphite alternatives for AI code review in 2026 From “No Bullsh*t Security” to $1B: We Just Raised Our $60m Series B Critical n8n Vulnerability Allows Unauthenticated Remote Code Execution (CVE-2026-21858) Top 14 VS Code Extensions for 2026 AI-Driven Pentesting of Coolify: Seven CVEs Identified Top Continuous Pentesting Tools in 2026 SAST vs SCA: Securing the Code You Write and the Code You Depend On JavaScript, MSBuild, and the Blockchain: Anatomy of the NeoShadow npm Supply-Chain Attack How Engineering and Security Teams Can Meet DORA’s Technical Requirements IDOR Vulnerabilities Explained: Why They Persist in Modern Applications Shai Hulud strikes again - The golden path MongoBleed: MongoDB Zlib Vulnerability (CVE-2025-14847) and How to Fix It First Sophisticated Malware Discovered on Maven Central via Typosquatting Attack on Jackson The Fork Awakens: Why GitHub’s Invisible Networks Break Package Security Top 10 Cyber Security Tools For 2026 SAST in the IDE is now free: Moving SAST to where development actually happens AI Pentesting in Action: A TL;DV Recap of Our Live Demo The Top 7 Threat Intelligence Tools in 2026 React & Next.js DoS Vulnerability (CVE-2025-55184): What You Need to Fix After React2Shell OWASP Top 10 for Agentic Applications (2026): What Developers and Security Teams Need to Know DAST vs Pentesting v AI Pentesting: Why DAST Cannot Replace Modern Pentesting PromptPwnd: Prompt Injection Vulnerabilities in GitHub Actions Using AI Agents Top 7 Cloud Security Vulnerabilities Critical React & Next.js RCE Vulnerability (CVE-2025-55182): What You Need to Fix Now How to Comply With the UK Cybersecurity & Resilience Bill: A Practical Guide for Modern Engineering Teams Shai Hulud 2.0: What the Unknown Wonderer Tells Us About the Attackers’ Endgame SCA Everywhere: Scan and Fix Open-Source Dependencies in Your IDE Safe Chain now enforces a minimum package age before install Shai Hulud Attacks Persist Through GitHub Actions Vulnerabilities Shai Hulud Launches Second Supply-Chain Attack: Zapier, ENS, AsyncAPI, PostHog, Postman Compromised CORS Security: Beyond Basic Configuration Revolut Selects Aikido Security to Power Developer-First Software Security The Future of Pentesting Is Autonomous How Aikido and Deloitte are bringing developer-first security to enterprise Secrets Detection: A Practical Guide to Finding and Preventing Leaked Credentials Invisible Unicode Malware Strikes OpenVSX, Again AI as a Power Tool: How Windsurf and Devin Are Changing Secure Coding Building Fast, Staying Secure: Supabase’s Approach to Secure-by-Default Development OWASP Top 10 2025: Official List, Changes, and What Developers Need to Know Top 10 JavaScript Security Vulnerabilities in Modern Web Apps The Return of the Invisible Threat: Hidden PUA Unicode Hits GitHub repositorties Top 7 Black Duck Alternatives in 2026 What Is IaC Security Scanning? Terraform, Kubernetes & Cloud Misconfigurations Explained AutoTriage and the Swiss Cheese Model of Security Noise Reduction Top Software Supply Chain Security Vulnerabilities Explained The Top 7 Kubernetes Security Tools Top 10 Web Application Security Vulnerabilities Every Team Should Know What Is CSPM (and CNAPP)? Cloud Security Posture Management Explained
International AI Safety Report 2026: What It Means for Autonomous AI Systems
Dania Durnas · 2026-02-09 · via Aikido Security's Blog

The International AI Safety Report 2026  is one of the most comprehensive overviews to date of the risks posed by general-purpose AI systems. It’s compiled by over 100 independent experts from more than 30 countries, and shows that while AI systems are performing at levels that seemed like science fiction only a few years ago, the risks of misuse, malfunction, and systematic and cross-border harms are clear.

It makes a compelling case for better evaluation, transparency, and guardrails. But one direct question remains under-explored: what does “safe” look like when AI operates autonomously against real systems?

A summary of the interesting takeaways from the International AI Safety Report includes:

  • At least 700 million people use AI systems weekly, with adoption rates faster than the personal computer in its early years
  • Several AI companies released their 2025 models with additional safety measures after pre-deployment testing failed to rule out that the systems could help non-experts develop biological weapons. (!!!) (Unclear if the additional safety measure would still prevent it entirely)
  • Security teams have documented AI tools being used in actual cyberattacks by both independent actors and state-sponsored groups. 

The report talks at length about the approaches to manage many of the risks associated with AI– here’s our take: 

Where Aikido agrees with the report (and ways it could go further) 

1. Layered defense matters

The report outlines a defense-in-depth approach to AI safety, breaking it into three layers: building safer models during training, adding controls at deployment, and monitoring systems after they're live. We agree broadly with the application of these layers

The report uses a "Swiss Cheese" diagram showing how different layers have different vulnerabilities that only combined provide strong protection.

Source: International AI Safety Report 2026, Figure 3.9. Licensed under the UK Open Government Licence v3.0. Based on data from Zou et al. (2025), as cited in Anthropic (2025).

The report emphasizes the first layer, the safer model development. They’re cautiously optimistic that training-based mitigations can help, but also acknowledge that they are hard to implement at scale. While we agree that AI operators should give their best effort during training, our philosophy diverges slightly from the report in this case. We can’t rely on prompts or instructions to keep agentic systems in scope. Layered defense only works if each layer can fail independently.

2. Validation as a Safety Requirement

The report stays light on implementation details for the second layer, deployment-time controls, but we think this is where the most immediate progress can happen.

The International Report documents models gaming their evaluations in concerning ways. Some find shortcuts that score well on tests without actually solving the underlying problem (reward hacking). Others intentionally underperform when they detect they're being evaluated, attempting to avoid restrictions that high scores might trigger (sandbagging). In both cases, the models optimize for something other than the intended goal.

We reached the same conclusion: once AI systems operate autonomously, you can't trust what they self-report, their confidence levels, or their reasoning traces. An agent that validates its own discoveries creates a single point of failure disguised as redundancy. Safe operation requires treating initial findings as hypotheses, reproducing behavior before reporting, and using validation logic that is separate from discovery. This validation can even come from another AI agent.

3. Reduce risk before allowing agents to run in live environments

The report's third layer covers observability, emergency controls, and continuous monitoring after systems go live. This aligns with what we’ve seen in our operations.

Black box operation isn't acceptable for autonomous systems that interact with production infrastructure, so we treat emergency stop mechanisms as non-negotiable requirements. If you can't see what an agent is doing or stop it when it’s off the rails, you're not operating it safely, regardless of how good the underlying model is.

4. Prompt Injection Requires Enforced Constraints, Not Hope 

The report shows that prompt injection attacks are still a serious vulnerability– many major models in 2025 could be successfully attacked by prompt injection with relatively few attempts. The success rate is falling but remains relatively high. We go a step further than the report and maintain that any agent interacting with untrusted application content must be assumed vulnerable to prompt injection by default. Safety, in this context, comes from enforcing constraints and not hoping models behave correctly.

Chart showing prompt injection attack success rates across major AI models released between May 2024 and August 2025, with most models remaining vulnerable despite gradual improvements.

Source: International AI Safety Report 2026, Figure 3.9. Licensed under the UK Open Government Licence v3.0. Based on data from Zou et al. (2025), as cited in Anthropic (2025).

What We Think Should Come Next 

Systems, Not Just Models

The report makes a strong case for defense-in-depth, transparency, and evaluation. These matter, but many of the most immediate problems occur once models connect to tools, credentials, and live environments. This is why implementation-level requirements are so important (and necessary). We have to translate these principles into concrete technical requirements that teams can implement.

Based on operating AI pentesting systems in production, we believe the minimum safety requirements for autonomous AI systems should include:

  • Abuse prevention and ownership validation
  • Enforced scope control at the network level
  • Isolation between reasoning and execution
  • Full observability and emergency controls
  • Data residency and processing guarantees
  • Prompt injection containment
  • Validation and false positive control

We found that these are the minimum enforceable requirements for safety. If you omit any one of them, you introduce unacceptable risk into the system. We go more into these requirements in our blog post on AI Pentesting Safety

Safety Baselines as Policy Building Blocks

The International AI Safety Report represents significant progress toward a shared understanding of AI risks across governments, researchers, and industry. The challenge now is bridging research findings, regulatory frameworks, and real-world deployment practices.

The report does bring up some genuinely high-stakes scenarios and unsettling stats about how quickly capabilities are advancing. That said, this isn’t a reason to panic or regulate “AI” as a scary monolith. The report itself notes that safeguards vary widely across developers and

that prescriptive mandates can stifle defensive innovation. We agree. Regulation should avoid mandating a single implementation path. Instead, policies should define clear, outcome-oriented safety baselines that can be building blocks for broader frameworks.

As part of the movement toward creating more outcome-focused safety frameworks, we've published our document on the Minimum Safety Requirements for AI-driven Security  Testing. For teams evaluating AI pentesting tools or building autonomous security systems, this guide serves as a vendor-neutral reference. We hope this helps teams evaluate AI pentesting tools, build safer autonomous security systems, and contribute to establishing clear baselines that work for both builders and regulators.