惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
G
GRAHAM CLULEY
P
Privacy & Cybersecurity Law Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
宝玉的分享
宝玉的分享
P
Proofpoint News Feed
H
Help Net Security
V
Visual Studio Blog
阮一峰的网络日志
阮一峰的网络日志
C
Cisco Blogs
人人都是产品经理
人人都是产品经理
Know Your Adversary
Know Your Adversary
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Recorded Future
Recorded Future
I
Intezer
罗磊的独立博客
T
The Exploit Database - CXSecurity.com
Blog — PlanetScale
Blog — PlanetScale
Malwarebytes
Malwarebytes
Spread Privacy
Spread Privacy
T
Tor Project blog
V
Vulnerabilities – Threatpost
云风的 BLOG
云风的 BLOG
腾讯CDC
B
Blog RSS Feed
Stack Overflow Blog
Stack Overflow Blog
F
Future of Privacy Forum
MyScale Blog
MyScale Blog
Latest news
Latest news
IT之家
IT之家
MongoDB | Blog
MongoDB | Blog
The Hacker News
The Hacker News
S
Securelist
博客园 - 【当耐特】
C
CXSECURITY Database RSS Feed - CXSecurity.com
T
Threat Research - Cisco Blogs
Jina AI
Jina AI
Cisco Talos Blog
Cisco Talos Blog
B
Blog
博客园 - 三生石上(FineUI控件)
Last Week in AI
Last Week in AI
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
M
MIT News - Artificial intelligence
V
V2EX
D
Darknet – Hacking Tools, Hacker News & Cyber Security
The Cloudflare Blog
The GitHub Blog
The GitHub Blog
博客园 - 聂微东
F
Full Disclosure
C
CERT Recently Published Vulnerability Notes

DEV Community

Stop Being Nice, Start Being Right": The Day My User Reconfigured My Reward Function Hot To Run LLMs Locally Research blockchain with post-quantum Dilithium and custom zk-STARKs from scratch AI agents do not just need tool access. They need execution control. The CTO’s Blueprint for Governing Multi-Agent AI Systems in the Enterprise I audited our CMS and 86% of our articles were invisible. A Sanity gotcha. Upselling Explained Industry-Specific Tactics for EC Owners 2026 I Keep Hermes Agent's Self-Improvement OFF For the First 14 Days — Here's What Happens When I Don't I Built the Hermes + Claude Code Dual-Stack: Orchestrator Meets Coder — Here's the Full Architecture Stop Using .iterrows(). Here's What Actually Fast Looks Like I Built a SaaS to Stop the Awkward "Hey, Did You Get My Invoice?" Conversation I Renamed a Hot Postgres Table Without Dropping a Request How to Build a Self-Hosted AI Gateway With LiteLLM and Open WebUI What is a Webhook? A Complete Guide for Beginners Headless BI: How a Universal Semantic Layer Replaces Tool-Specific Models Beyond Translation: A Developer's Guide to App Localization (i18n & l10n) Aegis: Designing an Offline Ambient Co-Working Companion for High-Burnout Medical and STEM Grinds Local LLM Code Completion Showdown: Zed AI vs Continue vs Cursor (Honest 2026 Review) The Agentic Payment Protocol Wars Your No-Code AI Agent Has a Memory Problem The Agentic Payment Protocol Wars How to Bypass LinkedIn Commercial Use Limit in 2026 (Without Paying $150/mo) We built a statechart hosting platform where two actors in the same state can migrate to different versions — here's why that matters Playwright vs TWD: A Frontend Developer's Honest Comparison Claude Code's skillListingBudgetFraction: The Undocumented Setting Silently Killing Half Your Skills O GitHub pode mudar sua carreira mais do que você imagina Just redesigned and launched my developer portfolio 🚀 Would genuinely love some honest feedback from the dev community 👨‍💻 Data Virtualization and the Semantic Layer: Query Without Copying Launching opub: donated compute for open-source maintainers Four iteration rounds on a security scanner I run, all of them visible. Here is what the loop actually looks like. Why Good Abstractions Make Debugging Harder Found a Coordinated Inauthentic Network on GitHub: 24 Accounts, Fabricated History, and a Generator That Left Its PID in Three READMEs Cursor Just Released Composer 2.5. Here's What Actually Changed for AI Coding Agents. What Wrong Docs Cost Test Automation Teams Export Your DeepSeek Chats to Word, PDF, Google Docs, Markdown & Notion in One Click When the Docs Lie OpenShift Observability: Built-in vs. Bring-Your-Own If your AI initiative is pending for 6 months, the bottleneck is probably not technology Hermes Agent Under the Hood: The Open-Source Runtime for Autonomous AI Systems Expert Systems -The AI That Existed Before AI Was Cool AI-generated accessibility, an update — frontier models still fail, but skills change the game My HTML Learning Journey 🚀 The Day PayPal Failed and the Rust Rewrite Saved the Product Launch Google Sheets CRM: 4 Ways I've Actually Done It (with Apps Script Code) BrontoScope: AI-Powered Error Investigations The job of an AI engineer inside a 40-person company is not what most CEOs think it is Building a Clinical Speech-Therapy App With a Real SLP: 4 Lessons From PhoenixSteps 7 overlooked .Net features How Stripe Took 48 Hours and 3 API Calls to Break My Freelance Income Stream in Lagos Pretty normal Both Camps in the 'Left Behind' Argument Are Right About Each Other Flutter MCP Toolkit v3 Google Just Shipped Gemini 3.5 Flash. Here's What Developers Actually Need to Know. 🔐 Working with Private Symfony Recipes Rate limiting in web apps: what to protect before picking a library Rate limiting en aplicaciones web: qué proteger antes de elegir una librería What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg What It Really Takes to Become a Senior Software Engineer Microservices Were Never About Technology JS Crime Scene: The Misleading Array Project-as-code for a Directus v9 backend When the API literally burned your database after a typo COOKIES DPRK Hacking Trends 2026: AI‑Powered Supply Chain and Developer Environment Attacks Phone control for AI coding sessions is not a tiny terminal PayPal and Crypto Are Not Equals: How I Built a Gumroad Alternative for Restricted Countries Exploring Tech as a Content Writer I Raised Gemma 4's Token Cap. The Dense Model Stopped Refusing. React Server Components Don't Make Your App Fast by Default Multi-Stage Builds for a Next.js App — Reduce Image Size by 70% I Built a Chrome Extension That Teaches Vocabulary While You Browse Why I Walked Back from Next.js and RSC to a Plain SPA and a Separate Backend NeuralPocket: Private On-Device AI with Gemma 4 — Android & Web Github Speckit: Revolucionando o Desenvolvimento com SDD Cloud Cost Elasticity I Built a Payment System for Bangladesh—Heres Why Stripe Failed Us Polyglot Persistence in Microservices: Choosing the Right Database for Each Service Centralized Authentication for a Multi-Brand Laravel Ecosystem How I made a perfect recording button. Simple yet complex thing. Mumbli – my personal Wispr Flow Getting Paid Should Not Be a Geopolitical Nightmare: My NOWPayments Integration Story Four Layers of Validation in Kubernetes with Claude Code Prompt Flow — a visual side project for flow design, trace, and integration steps (looking for feedback) AI Citation Registry: Temporal Gaps in Government Publishing Cycles ShowDev: I built a 100% local, zero-upload PDF editor using WebAssembly JavaC Written by an AI Pipeline, Verified by Three Models. Is It Slop? Part1 Vulkan: Drawing Triangle 1 Why I Stopped Using useEffect to Sync State — and What I Use Instead Por qué dejé de usar useEffect para sincronizar estado y qué uso ahora Migrating a Long-Running WordPress Site to Payload CMS (And All The Chaos That Came With It) Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans Azure DevOps Structure Explained: Organizations, Projects, and Repos Without the Mess A Simple React Hook for localStorage State, Expiry, and Sync I sold you on /scratchpad. Then I migrated to /note. Fixing WSL Errors on Windows 11 Your app is not Netflix. Stop building like it is. Resolving inter-service communication issue I built an email cleaner. CSV parsing took longer than the actual validators. How I Would Learn Full-Stack Development in 2026 If I Started From Zero
Building a Database Performance Testing Tool With AI: The Honest Breakdown
Alicia Maria · 2026-05-21 · via DEV Community

It still feels a little strange to have AI writing practically all the code — but I decided to give it a real shot on this new project. A bit of context: I was running low on project ideas to share here, so I asked the AI for a list and picked one called Database Performance Testing. The goal was to run performance tests against a relational database. I built it in roughly two days, and in this article I want to share my honest impressions — both technical and about the AI-assisted workflow itself.


Why run performance tests on a database?

I think this is the first question we need to ask before starting any project. As it happens, I'm currently on a project where data performance is a critical system concern — which got me thinking: what would it look like to run performance tests directly against a relational database?

From a QA perspective, performance isn't just about how the API connects to the database. It's also about how queries are written and how the chosen database handles query concurrency, especially in synchronous systems. A slow endpoint isn't always a slow endpoint — sometimes it's a slow query hiding behind it. That's what drove me to start here.


What are we testing?

For this project, we have four distinct test scenarios:

  • N+1 Query Detection
  • Deadlock Simulation
  • Query Regression Tracking Across Schema Changes
  • Slow Queries

N+1 Query Detection

N+1 is a classic performance problem in applications that access databases — especially systems using ORMs like Entity Framework, SQLAlchemy, or Sequelize. The name describes exactly what happens: instead of one optimized query, the application ends up running 1 initial query + N additional queries.

Consider an e-commerce system that lists 20 orders and shows the email of each order's user. A naive approach would be:

SELECT id, user_id FROM orders LIMIT :n;
SELECT email FROM users WHERE id = :uid;

Enter fullscreen mode Exit fullscreen mode

The first query fetches the orders. The second query then runs 20 times — once per order. That's your N+1. The consequences at scale include multiplied latency, cache pollution, and unpredictable response times for users.

This can also appear in concurrent scenarios. Two threads both fetching user data simultaneously, each triggering their own cascade of lookups, can quickly multiply database load in ways that are hard to trace without instrumentation.


Deadlock Simulation

Imagine two transactions happening concurrently:

Transaction A:

UPDATE orders SET status = 'pending' WHERE id = 1;
UPDATE orders SET status = 'paid'    WHERE id = 2;

Enter fullscreen mode Exit fullscreen mode

Transaction B (running simultaneously, in reverse order):

UPDATE orders SET status = 'paid'    WHERE id = 2;
UPDATE orders SET status = 'pending' WHERE id = 1;

Enter fullscreen mode Exit fullscreen mode

Each transaction holds a lock the other needs. The database detects the cycle and resolves it by rolling back one of the transactions — but that rollback has a cost. At scale, this kind of contention can cause noticeable latency spikes and, in the worst case, lock up parts of your application. The goal of the deadlock test isn't just to confirm that a deadlock can*happen, but to measure *how the database recovers and what the timing impact looks like.


Query Regression Tracking Across Schema Changes

This scenario focuses on measuring the impact of schema changes — index creation, migrations, table alterations — on query execution plans and timings. It captures EXPLAIN ANALYZE output before and after a migration, then diffs the results.

This is especially useful for:

  • Validating that an optimization actually improved things before merging
  • Documenting schema versions and their performance characteristics over time
  • Supporting production investigations when run periodically as a baseline

Slow Queries

The slow query tests act as a latency gate — a set of critical queries that must stay under a defined threshold (configured in config.py as SLOW_QUERY_THRESHOLD_MS). If any of these queries breach the threshold in CI, the pipeline fails.

Think of it as a performance budget for your most important database operations. In this project, five critical queries were monitored. Here are the results from a low-volume benchmark run:

Query Avg (ms) p95 (ms) Max (ms)
user_lookup 0.20 0.29 0.35
order_history 0.31 0.43 0.75
inventory_search 0.24 0.32 0.62

All three passed the threshold comfortably. But the value here isn't just the green light — it's having a historical baseline. The next time someone adds a filter, changes a join, or drops an index, you'll know immediately if these numbers move.


The solution structure

Here's how the project is organized:

.
├── config.py                          # DB_URL and SLOW_QUERY_THRESHOLD_MS
├── conftest.py                        # pytest fixtures: engine, instrumented_engine
├── pyproject.toml                     # pytest config and markers
├── requirements.txt
├── analysis/
│   ├── n_plus_one_detector.py         # N+1 detection and simulation
│   ├── deadlock_simulator.py          # Concurrent deadlock demo
│   └── explain_analyzer.py            # EXPLAIN plan capture and diff
├── benchmarks/
│   ├── queries/                       # Raw .sql files (12 queries)
│   ├── scenarios/
│   │   └── run_benchmark.py           # Volume benchmark runner
│   ├── test_n_plus_one.py             # N+1 detection tests
│   ├── test_deadlock.py               # Deadlock tests
│   ├── test_explain.py                # EXPLAIN ANALYZE plan tests
│   └── test_slow_queries.py           # Latency threshold gate (5 critical queries)
├── data/
│   ├── seed.py
│   └── distributions.json
├── migrations/
│   ├── baseline/
│   │   └── 001_initial_schema.sql
│   └── v2_add_indexes/
│       └── 002_add_indexes.sql        # Sample migration for regression demo
├── reports/
│   ├── query_regression_report.py     # Delta reporter (also exports to Grafana)
│   ├── export_metrics.py              # Writes results to benchmark_results table
│   ├── output/                        # Timestamped benchmark JSON results
│   └── plans/                         # Saved EXPLAIN plans
├── scripts/
│   └── setup_schema.py                # Apply migration + snapshot schema
├── .github/
│   └── workflows/
│       └── performance-tests.yml      # CI: schema → seed → pytest
└── docker/
    ├── docker-compose.yml             # PostgreSQL + Grafana
    ├── init.sql
    └── grafana/
        ├── provisioning/
        └── dashboards/benchmark.json  # Auto-provisioned dashboard

Enter fullscreen mode Exit fullscreen mode

A few things worth highlighting in this structure:

analysis/ is designed to be run independently. Each module — N+1 detection, deadlock simulation, EXPLAIN analysis — can be executed on its own. You don't have to run the full suite every time. This matters in practice: there are moments where you only care about regression tracking after a migration, and running deadlock simulation alongside it is just noise.

explain_analyzer.py captures and diffs EXPLAIN plans. This is what powers the query regression scenario — it saves plan snapshots before and after a schema change and computes the delta. The results are stored under reports/plans/ as timestamped JSON files.

Grafana is provisioned automatically. The benchmark_results table gets populated by export_metrics.py after each run, and the dashboard in grafana/dashboards/benchmark.json is auto-loaded. No manual setup needed.


What the data actually showed

The most interesting results came from the query regression scenario. After adding a composite index (idx_orders_user_created) via migration 002_add_indexes.sql, the EXPLAIN plans for order_history changed dramatically:

Before the index — Sequential Scan:

  • Execution time: 0.532ms
  • The planner scanned the entire orders table (touching 39 blocks) and filtered out 2,497 rows to find 3 matching records
  • Total cost estimate: 101.27

After the index — Bitmap Index Scan:

  • Execution time: 0.090ms
  • The planner used idx_orders_user_created directly, reading only 5 blocks
  • Total cost estimate: 10.92

That's an 83% reduction in execution time and a ~10x drop in planner cost — from a full table scan touching 2,500 rows down to a targeted index lookup. Without the regression tracking in place, this kind of change would have been invisible unless someone happened to run EXPLAIN manually.

This is exactly the kind of validation that's worth having in CI: you add an index, run the suite, and get a clear before/after diff that confirms the optimization actually worked.


On building with AI: the honest take

This is where I want to spend some real time, because the AI-assisted workflow was just as interesting to reflect on as the technical output itself.

What I actually did was prompt the AI for a project idea, refine the scope, and then iteratively ask it to generate the solution — structure first, then individual modules, then the CI pipeline. I didn't write much code from scratch. What I did was spend a significant amount of time reading what it produced, questioning whether the test logic made sense, and validating that the scenarios matched the real-world problems they were meant to simulate.

What felt different about this workflow compared to writing everything myself:

  • I spent more time thinking about whether the tests were correct than about how to implement them. That's actually a good shift for a QA-focused project — but it felt unfamiliar.
  • The AI was excellent at boilerplate and structure (docker-compose setup, pytest fixtures, file organization) and decent at logic, but it needed guidance on edge cases and test validity.
  • I felt genuinely uncertain about "ownership" of the result. The code works, I understand it, and I made real decisions throughout — but I also couldn't have shipped it in two days without the AI. That's a strange feeling to sit with.

Where it genuinely helped: spinning up the Docker + Grafana integration was something I would have spent half a day on manually. The AI had a working docker-compose.yml with auto-provisioned dashboards in minutes. That alone was worth it for a POC.

Where it fell short: the AI tends to produce confident-looking code that needs careful review. A few of the generated SQL files had column names that didn't match the schema. The N+1 detection logic needed an extra pass to handle the concurrent case correctly. None of these were blockers, but they reinforced that the human-in-the-loop isn't optional — it's the job.

I'll keep experimenting with AI-assisted workflows, particularly for testing tooling and POCs. If you've built something similar or have thoughts on the approach, I'd love to hear it in the comments. You can see the full solution here.


Key takeaways

  • Database performance testing is a QA concern, not just a backend one. Slow queries, N+1 patterns, and regression after migrations are all things a QA engineer should be equipped to catch.
  • EXPLAIN plans are underused. Having automated before/after diffs on schema changes is low-effort and high-value. The index migration result above would have been invisible otherwise.
  • Modular test design matters. Being able to run N+1 detection independently from deadlock simulation means you can integrate specific checks into specific workflows without overhead.
  • AI accelerates, but doesn't replace judgment. The two days I spent on this were productive — but the value came from the decisions I made, not from the code the AI generated.

I'll keep experimenting with AI-assisted workflows, particularly for testing tooling and POCs. If you've built something similar or have thoughts on the approach, I'd love to hear it in the comments.You can check the full solution here.