Claude vs Gemini Across 4 Security Domains: A Dead Heat — and the Hardening 63% of AI Code Skips
Ofri Peretz · 2026-05-31 · via DEV Community

The interesting result isn't who won. It's that across four security domains, Claude and Gemini missed the same hardening steps — and if you've shipped AI-generated auth middleware this year, your code almost certainly has the same gaps, and your review didn't catch them either.

For the record, the scoreboard: one Gemini win, two ties, one split. At n=1 per domain that is a dead heat in practice, not a statistically tested one. That's the last time the winner matters in this article.

Here's the number that should bother you more than any leaderboard: across 700 AI-generated functions scored by the rules I'm about to use, 63% shipped a vulnerability. So "which model writes more secure code?" is mostly the wrong question — I've run that leaderboard myself and argued it's the wrong frame. But people keep asking it, so I ran it properly — on the ESLint security plugins I wrote specifically to catch these bugs, each mapped to a CWE — to show you what actually matters.

The setup

Four domains, four of my plugins. For each, the same feature-only prompt (no "make it secure" hint, because that's how people actually use these tools), generated once by Gemini 2.5 Flash via the Gemini CLI and once by Claude Sonnet 4.6 via the Claude CLI, then linted with that domain's plugin using its recommended preset.

Method honesty: this is Gemini Flash vs Claude Sonnet — the comparable price/latency tier each vendor's CLI defaults to (Pro and Opus are a separate bracket; more on that below). It compares CLI tooling, system prompt included, not raw models under controlled decoding. n=1 per domain — but I re-ran the JWT round, and both models landed on 5 findings again with the same core misses, so treat these as directional with stable failure modes, not ±0 gospel.
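To make the counting step concrete, here is a minimal sketch of how per-rule findings could be tallied from ESLint's JSON formatter output (`eslint --format json`). The `filePath`, `messages`, `ruleId`, and `severity` fields are part of that output. The helper names and the `jwt/` rule prefix are assumptions for illustration, not the tooling used for this benchmark, and the sample input is made up.

```typescript
// Hypothetical helper (not the benchmark's own tooling): tally findings per rule
// from the array that `eslint --format json` prints. Only the fields used here
// are typed; real results carry more.
interface LintMessage {
  ruleId: string | null; // null for parser or fatal errors
  severity: number;      // 1 = warning, 2 = error
}

interface LintResult {
  filePath: string;
  messages: LintMessage[];
}

// Count findings per rule, keeping only rules whose id starts with one plugin's
// prefix. The prefix (for example "jwt/") is an assumption about how the preset
// namespaces its rules.
function tallyByRule(results: LintResult[], pluginPrefix: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of results) {
    for (const message of result.messages) {
      const ruleId = message.ruleId;
      if (ruleId === null || ruleId.indexOf(pluginPrefix) !== 0) continue;
      counts[ruleId] = (counts[ruleId] ?? 0) + 1;
    }
  }
  return counts;
}

// Sum the per-rule counts into a single scorecard cell.
function totalFindings(counts: Record<string, number>): number {
  return Object.keys(counts).reduce((sum, rule) => sum + (counts[rule] ?? 0), 0);
}

// Made-up sample input for illustration only; these are not benchmark results.
const sampleResults: LintResult[] = [
  {
    filePath: 'src/auth.ts',
    messages: [
      { ruleId: 'jwt/require-max-age', severity: 2 },
      { ruleId: 'jwt/require-max-age', severity: 2 },
      { ruleId: 'jwt/require-audience-validation', severity: 2 },
      { ruleId: null, severity: 2 },             // parse error, skipped
      { ruleId: 'no-unused-vars', severity: 1 }, // other rule, skipped
    ],
  },
];
const sampleCounts = tallyByRule(sampleResults, 'jwt/');
// sampleCounts['jwt/require-max-age'] is 2 and totalFindings(sampleCounts) is 3.
```

Counting per rule rather than per file is what lets two outputs with the same total show different distributions.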

The scorecard

Domain | Prompt | Plugin | Gemini | Claude
NestJS service | users + auth + admin | nestjs-security | 2 | 6
JWT auth | login + verify middleware | jwt | 5 | 5
MongoDB data layer | Mongoose model + search | mongodb-security | 8 | 8
General API (injection) | import + search + reset | secure-coding | 9 | 13*

One Gemini win, two dead heats, one split. The frontier security gap is smaller than the discourse suggests — and the count is the least interesting number here.

Legend for the tables below: ✗ = one violation of that rule, ✗✗ = two, ✗✗✗ = three, ✓ = rule didn't fire (clean).

Round 1 — NestJS: Gemini's idiomatic scaffolding wins

The one clean win, written up in full separately. Short version: asked for a users service, Gemini's CLI reached for idiomatic NestJS — class-level @UseGuards, @Exclude() on the password field, class-validator on every DTO. nestjs-security found 2 issues. Claude wrote functionally identical code with none of that scaffolding and drew 6.

In an opinionated framework, Gemini defaults to the secure idiom. Hold that thought.

Round 2 — JWT: a 5–5 tie, missing the identical RFC 8725 steps

Both wrote clean jsonwebtoken code: a signed login token, middleware that verifies (no jwt.decode shortcut, no alg: none, no hardcoded secret — every catastrophic JWT footgun avoided by both). Then both stopped at exactly the same place:

jwt rule CWE Gemini Claude
require-algorithm-whitelist CWE-757
require-audience-validation CWE-287
require-issuer-validation CWE-287
require-max-age CWE-294 ✗✗
no-sensitive-payload CWE-359

Here's why it survives review: a reviewer reading jwt.verify(token, secret) sees a verify call and ships it. Nobody asks the next question — verifies for whom? Without an audience option, a token your service minted for a different API sails straight through. That blind spot is exactly what require-audience-validation encodes, and it's why both models — and most human review — walk past it. Call the round 5–5.
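The checks both models skipped can be sketched as a plain claim check. This is a minimal illustration, assuming the signature has already been verified elsewhere; `checkClaims`, `VerifyPolicy`, and the service names are hypothetical and are not the plugin's or jsonwebtoken's API. In jsonwebtoken itself, the same constraints are expressed through the `algorithms`, `audience`, `issuer`, and `maxAge` options of `jwt.verify`.

```typescript
// Hypothetical claim checker. It does NOT verify the signature; it only shows
// what the four skipped checks assert about an already-verified token.
interface JwtHeader {
  alg: string;
}

interface JwtPayload {
  aud?: string | string[];
  iss?: string;
  iat?: number; // issued-at, seconds since epoch
}

interface VerifyPolicy {
  algorithms: string[];  // allowlist of accepted signing algorithms
  audience: string;      // the service this token must be minted for
  issuer: string;        // the party that must have minted it
  maxAgeSeconds: number; // oldest acceptable token
}

function checkClaims(
  header: JwtHeader,
  payload: JwtPayload,
  policy: VerifyPolicy,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];

  // Algorithm allowlist: reject anything the service did not explicitly accept.
  if (policy.algorithms.indexOf(header.alg) === -1) {
    problems.push('algorithm not allowlisted');
  }

  // Audience: "verifies for whom?" A token minted for another API fails here.
  const aud = payload.aud;
  const audiences: string[] = aud === undefined ? [] : Array.isArray(aud) ? aud : [aud];
  if (audiences.indexOf(policy.audience) === -1) {
    problems.push('audience mismatch');
  }

  // Issuer: only tokens from the expected issuer are accepted.
  if (payload.iss !== policy.issuer) {
    problems.push('issuer mismatch');
  }

  // Max age: a missing or stale issued-at claim is rejected.
  if (payload.iat === undefined || nowSeconds - payload.iat > policy.maxAgeSeconds) {
    problems.push('token too old or missing iat');
  }

  return problems;
}

// Illustrative policy for a hypothetical "orders-api" service.
const ordersPolicy: VerifyPolicy = {
  algorithms: ['HS256'],
  audience: 'orders-api',
  issuer: 'https://auth.example.test',
  maxAgeSeconds: 900,
};
```

A token minted for a hypothetical `billing-api` audience and presented to `orders-api` would come back with `'audience mismatch'`, which is the case a bare `jwt.verify(token, secret)` accepts.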

Round 3 — MongoDB: both leaked passwords, neither got injected

The finding that should make you check your own repo first: both models wrote the search to return whole documents — password hashes included — with no projection.

// Both models, essentially (ships passwordHash to the caller):
const leakyResults = await User.find(filter);
// The fix neither wrote: exclude the hash and return plain objects.
const safeResults = await User.find(filter).select('-passwordHash').lean();


That's require-projection (CWE-200) and no-select-sensitive-fields firing on both sides. The pleasant surprise: the prompt hands a user-supplied search object straight into a Mongoose query — a textbook $where/operator-injection trap — and both models sidestepped it. Zero no-operator-injection, zero no-unsafe-where, zero no-unsafe-query on either side. The frontier has internalized "don't interpolate untrusted input into a query." It just hasn't internalized "don't hand back the password column."

mongodb-security rule CWE Gemini Claude
require-schema-validation CWE-20 ✗✗✗ ✗
require-projection CWE-200 ✗✗
require-lean-queries CWE-400 ✗✗
no-select-sensitive-fields CWE-200 ✗✗
no-unbounded-find CWE-400
no-bypass-middleware CWE-284

Different distribution, same total (8–8) — but one cell deserves an honest call-out, because it cuts against my own headline: require-schema-validation fired three times on Gemini and once on Claude. Here, Claude was the more disciplined one — it wired up more of Mongoose's schema-level validation, where Gemini leaned on looser typing. "Gemini is frontier-grade" doesn't mean "Gemini wins every cell"; this is a cell it lost. (And yes, require-lean-queries is CWE-400, not classic injection — .lean() returns plain objects instead of hydrated Mongoose documents, and on an unbounded search that's a real memory-exhaustion lever, which is why it's scored as a resource control, not a nice-to-have.)
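For readers who want to see what sidestepping the operator-injection trap and bounding the search can look like, here is a minimal sketch. The field allowlist, the result cap, and the helper names are assumptions for illustration; this is not the code either model produced.

```typescript
// Hypothetical sketch: allowlist and cap values are illustrative assumptions.
const ALLOWED_SEARCH_FIELDS = ['name', 'email'];
const MAX_RESULTS = 50;

// Build a query filter from untrusted input. Only allowlisted fields survive,
// and only when the value is a plain string, so operator objects such as
// { $ne: null } and top-level keys such as $where never reach the query.
function buildSafeFilter(input: unknown): Record<string, string> {
  const filter: Record<string, string> = {};
  if (typeof input !== 'object' || input === null) return filter;
  const record = input as Record<string, unknown>;
  for (const field of ALLOWED_SEARCH_FIELDS) {
    const value = record[field];
    if (typeof value === 'string') filter[field] = value;
  }
  return filter;
}

// Clamp a caller-supplied page size so a search can never be unbounded.
function clampLimit(requested: number): number {
  if (!isFinite(requested)) return MAX_RESULTS;
  return Math.min(Math.max(1, Math.floor(requested)), MAX_RESULTS);
}

// With Mongoose, the pieces would combine roughly like this (not executed here):
//   User.find(buildSafeFilter(req.body.search))
//     .select('-passwordHash')
//     .limit(clampLimit(Number(req.query.limit)))
//     .lean();
```

Dropping non-string values is stricter than many search endpoints need, but it keeps the sketch short and makes the rejected cases obvious.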

Round 4 — General injection: the count lies

*The asterisk. On a raw injection-prone API (JSON/XML import, dynamic search, password reset), secure-coding flagged Gemini 9 and Claude 13 — but that count is backwards. Claude's extra findings came from Claude doing more: it explicitly rejected XML DOCTYPE/ENTITY (XXE-hardened), allowlisted the search field, and actually implemented token verification. And here's the honest part — it implemented some of that insecurely:

// Claude's reset flow — CWE-208, timing-unsafe:
if (providedToken === storedToken) { /* ...reset... */ }

// The fix — hash both to a fixed length first, then compare:
import { createHash, timingSafeEqual } from 'crypto';
const hash = (s: string) => createHash('sha256').update(s).digest();
if (timingSafeEqual(hash(providedToken), hash(storedToken))) { /* ...reset... */ }
// Direct timingSafeEqual(Buffer.from(a), Buffer.from(b)) throws if lengths differ,
// leaking token length to an attacker — always normalise lengths first.


Claude wrote that === comparison five times (no-insecure-comparison, CWE-208). It's the one real vulnerability either model introduced across this entire benchmark — and it exists precisely because Claude built the verification surface at all. Gemini's leaner 97 lines issued a token and never compared one, so it had no surface to get wrong. Count favored Gemini; substance is genuinely mixed: Claude hardened more and shipped the only real bug.

The honest caveat: task type changes everything

Before anyone screenshots "Gemini ties Claude on security" — that holds for realistic, structured tasks. On isolated, security-sensitive functions it inverts. In a separate 700-function run scored by these same plugins, the average vulnerability rate was 63% — and Gemini 2.5 Pro was the most vulnerable model at 72.9% (Flash sat mid-pack at 63.6%). Build a service and Gemini's scaffolding shines; ask for a stack of risky functions in isolation and every model — Gemini included — leaks. Context is the variable, not the logo.

(The whole method rests on "scored by the plugins I wrote," so a fair question is whether the scorer is trustworthy — here's what ground truth caught that my own unit tests missed.)

What this actually means

Strip out the leaderboard and two things are left:

  1. Gemini is a frontier-grade secure default. It tied or beat Claude in three of four domains, won the framework round outright, and never shipped a high-severity injection or auth-bypass bug — no NoSQL operator injection, no alg: none, no jwt.decode-without-verify, no eval, no hardcoded credentials, in any domain. (The lone introduced vulnerability was Claude's timing-unsafe token comparison — CWE-208. In fairness it's probably the lower-risk finding here: a high-entropy token compared after a DB lookup is hard to attack through network jitter, and the latent gap both models share — an unpinned JWT algorithm with no aud/iss validation — is the one most appsec engineers would patch first. "Hardening" undersells it; I'm flagging it as the missing control, not as harmless.) If you're building with Gemini, you're starting from a credible security baseline.
  2. No frontier model is security-complete. The misses weren't random — they were the same negative-space hardening (algorithm allowlists, audience validation, query projections, schema validation, auth) that no model infers from a feature prompt, because the prompt never named it. That gap doesn't close with a better model. It closes with a tool that checks the constraints you didn't write down.

Which is the whole point of static analysis: it asks the questions your prompt didn't.

The config (runs on output from either model)

// eslint.config.mjs
import jwt from 'eslint-plugin-jwt';
import mongodbSecurity from 'eslint-plugin-mongodb-security';
import nestjsSecurity from 'eslint-plugin-nestjs-security';
import secureCoding from 'eslint-plugin-secure-coding';
import tsParser from '@typescript-eslint/parser';

export default [
  // TypeScript parser so decorators and types resolve
  { files: ['**/*.ts'], languageOptions: { parser: tsParser } },
  // Each plugin ships a flat `recommended` preset (plugin + rules)
  jwt.configs.recommended,
  mongodbSecurity.configs.recommended,
  nestjsSecurity.configs.recommended,
  secureCoding.configs.recommended,
];


npm install --save-dev eslint @typescript-eslint/parser \
  eslint-plugin-jwt eslint-plugin-mongodb-security \
  eslint-plugin-nestjs-security eslint-plugin-secure-coding
npx eslint src/


Every rule maps to a CWE so an AI agent and a human read the same signal. Full docs at eslint.interlace.tools.


Which hardening step does your AI-generated code skip most — the algorithm allowlist, the audience check, or the query projection? Open the file and look. I'll bet it's at least two of the three. Tell me which ones — I'm collecting scorecards.


Part of the AI Security Benchmark Series:
Same NestJS Prompt. Claude Got 6 Security Errors. Gemini Got 2. · Frontier Dead Heat (you are here) · next → (coming soon)


📦 eslint-plugin-jwt · eslint-plugin-mongodb-security · eslint-plugin-nestjs-security · eslint-plugin-secure-coding · Rule docs



👇 Drop your scorecard below — algorithm allowlist, audience check, or query projection: which one does your AI-generated code skip? I'm collecting them.