From mock-only-works to real-world-works: 48 hours of reCAPTCHA debugging
MiniKao · 2026-05-25 · via DEV Community

Honest framing first: mk-qa-master is an open-source MCP server for QA engineers. The reCAPTCHA solver in it is a Tier 3 fallback for testing your own apps when Tier 1 (Google's official test keys) and Tier 2 (feature flags / IP allowlist) aren't available. It is not a "beat captcha" tool. It refuses to run on Google / Apple / Microsoft / Discord login pages regardless of consent flag. With that out of the way…

This is a diary about the 48 hours it took to go from "shipped a reCAPTCHA solver, all unit tests green" to "it actually works against the real Google demo." Four versions (v0.7.0 → v0.7.4), three broken intermediate ones, and a bunch of lessons that I want to write down before I forget.

The setup

The idea behind the solver is simple. Two atomic MCP tools:

  • inspect_visual_challenge — finds the captcha iframe on the current page, screenshots it, returns the tile grid coordinates + a screenshot.
  • solve_visual_challenge — accepts the AI client's tile selection (which tiles contain buses, which contain crosswalks, etc.), clicks them, presses Verify, returns the token.

The AI client (Claude Code, Cursor, etc.) sees the screenshot, decides which tiles match the prompt, and calls solve. The server is the eyes and hands; the AI is the brain. Multimodal models like Claude 4.7 are surprisingly good at this — they were trained on the open web, which has a lot of bus pictures.

So far so good in theory.

Day 1 — v0.7.0 ships

The first version landed Monday. It detected the reCAPTCHA challenge frame (iframe[src*="bframe"]), screenshotted it, computed tile coordinates by dividing the iframe's bounding box into a 3×3 or 4×4 grid, and clicked at the center of each selected tile.

Unit tests passed. The bundled mock fixture (a self-contained HTML page that mimics reCAPTCHA's structure) round-tripped end-to-end. I wrote a PRD, shipped a release, posted a Dev.to walkthrough. Felt great.

The mock fixture's structure was:

<table class="rc-imageselect-table">
  <tr><td>...</td><td>...</td><td>...</td></tr>
  ...
</table>


Selectors in the fingerprint:

"tile_table_selector": ".rc-imageselect-table",
"tile_cell_selector": "td",


What could go wrong?

Day 2 — v0.7.1 adds hCaptcha

The next day I extended the fingerprint table to support hCaptcha. Same architecture — different selectors. No new MCP tools. Tests stayed green. I felt good about the design: when a vendor changes, you add a row to the fingerprint table, you're done.

I didn't run a real-world dogfood for hCaptcha either. (We'll come back to this.)

Day 3 — v0.7.2: the first "fix"

I wrote a tiny dogfood script — open Chromium, navigate to https://www.google.com/recaptcha/api2/demo, click the anchor to trigger an image challenge, call inspect_visual_challenge, save the screenshot, ask the AI for tile indices, call solve_visual_challenge, see if a token comes back.

The first run came back with status failed. I asked the user (in this case: me) what they saw in the browser. The answer was unsettling: "I told it to click 2, 5, and 8 — only 5 and 8 actually got highlighted."

I dug into the coordinate math. The iframe-divide approach split the full iframe into rows × cols cells. But the iframe contains a header banner (the prompt text) above the grid and a footer (the Verify button) below. So:

  • For a 3×3 grid in a 400×580 iframe with header ~130px and footer ~130px:
    • The actual grid is 320px tall, ~106px per row.
    • Naive iframe-divide gives 193px per row.
    • Row 0's computed center lands in the header banner.
    • Row 2's computed center lands in the footer.
    • Only row 1 happens to be roughly correct.
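The arithmetic above can be checked with a short sketch. The 400×580 iframe and the ~130px header and footer are the approximate figures from the example; the function names are hypothetical and not taken from mk-qa-master.

```python
# Sketch of the row-center arithmetic from the example above.
# Sizes are the approximate ones quoted in the text; names are illustrative.

IFRAME_H = 580
HEADER_H = 130   # prompt banner above the grid (approximate)
FOOTER_H = 130   # Verify button area below the grid (approximate)
ROWS = 3


def naive_row_centers(iframe_h: float, rows: int) -> list[float]:
    """Divide the whole iframe height evenly, ignoring header and footer."""
    row_h = iframe_h / rows
    return [row_h * (r + 0.5) for r in range(rows)]


def real_row_bounds(iframe_h: float, header_h: float,
                    footer_h: float, rows: int) -> list[tuple[float, float]]:
    """Top and bottom of each row when only the middle band is grid."""
    row_h = (iframe_h - header_h - footer_h) / rows
    return [(header_h + row_h * r, header_h + row_h * (r + 1))
            for r in range(rows)]


centers = naive_row_centers(IFRAME_H, ROWS)
bounds = real_row_bounds(IFRAME_H, HEADER_H, FOOTER_H, ROWS)
# Does each naive center fall inside the row it was meant to hit?
hits = [top <= c < bottom for c, (top, bottom) in zip(centers, bounds)]

for r, (c, hit) in enumerate(zip(centers, hits)):
    print(f"row {r}: naive center y={c:.0f}, inside real row: {hit}")
```

With these numbers, only the middle row's naive center falls inside the row it was aimed at.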

I wrote v0.7.2 to fix this. Instead of dividing the iframe, I'd read each cell's real bounding_box() from the DOM via Playwright:

for index in range(tile_count):
    bb = cells.nth(index).bounding_box()
    if not is_real_dict(bb):
        # fall back to iframe-divide for mock fixtures
        break
    candidate.append({"viewport_x": bb["x"], ...})


The unit test (against the mock fixture) immediately confirmed the fix. I bumped to v0.7.2, opened a PR, merged, released, published to PyPI. Done.

Day 4 morning — wait, it's still broken

Next morning, ran the dogfood again. Console output for inspect:

"tiles": [
  {"index": 0, "viewport_x": 85, "viewport_y": 92,  "w": 133, "h": 193},
  {"index": 1, "viewport_x": 218, "viewport_y": 92, "w": 133, "h": 193},
  ...
]


133 × 193 rectangles. The exact dimensions you'd get from dividing a 400×580 iframe by 3×3. Which meant the per-cell bounding_box() path was returning None on every cell in real reCAPTCHA, silently falling back to the same broken iframe-divide math.

Looked at the code path: it had a try / except swallowing the error. I added a debug field _coord_method so the inspect response would show which path actually fired:

"_coord_method": "iframe_divide"  //  v0.7.2's "fix" never ran


So v0.7.2 fixed the mock fixture and shipped to PyPI. In production, against real Google reCAPTCHA, it behaved identically to v0.7.0. The unit test was green because the mock fixture's <td> elements had real CSS dimensions; in real reCAPTCHA the tiles aren't <td>. I just didn't know that yet.
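The failure mode is easier to see in a stripped-down sketch. This is not the project's code: `compute_tiles`, its arguments, and the `"dom_bounding_box"` label are hypothetical, and the iframe origin of (85, 92) is an assumption chosen to reproduce the numbers in the console output. Only the `_coord_method` field and the `iframe_divide` value come from the post. The point is that whichever path fires is reported alongside the result.

```python
from typing import Optional

Box = dict  # {"x": float, "y": float, "width": float, "height": float}


def compute_tiles(cell_boxes: list[Optional[Box]], iframe_box: Box,
                  rows: int, cols: int) -> dict:
    """Prefer per-cell boxes; fall back to dividing the iframe, and say so."""
    if len(cell_boxes) == rows * cols and all(b is not None for b in cell_boxes):
        tiles = [
            {"index": i, "viewport_x": b["x"], "viewport_y": b["y"],
             "w": b["width"], "h": b["height"]}
            for i, b in enumerate(cell_boxes)
        ]
        return {"tiles": tiles, "_coord_method": "dom_bounding_box"}

    # Fallback: the naive grid. Still returned, but labelled, never silent.
    w = iframe_box["width"] / cols
    h = iframe_box["height"] / rows
    tiles = [
        {"index": r * cols + c,
         "viewport_x": iframe_box["x"] + c * w,
         "viewport_y": iframe_box["y"] + r * h,
         "w": w, "h": h}
        for r in range(rows) for c in range(cols)
    ]
    return {"tiles": tiles, "_coord_method": "iframe_divide"}


# Assumed iframe position; every per-cell lookup failed (None).
iframe = {"x": 85, "y": 92, "width": 400, "height": 580}
result = compute_tiles([None] * 9, iframe, rows=3, cols=3)
first = result["tiles"][0]
print(result["_coord_method"], round(first["w"]), round(first["h"]))
```

A caller, or a CI check, can then branch on `_coord_method` instead of inferring the path from suspicious tile dimensions.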

Day 4 afternoon — going DOM-spelunking

Wrote a one-off debug script that opened the real reCAPTCHA bframe and ran arbitrary JavaScript inside it. The first query was: "does .rc-imageselect-table even exist?"

{
  "tableExists": false,
  "altSelectors": {
    "table[class*=\"rc-imageselect\"]": true,
    ".rc-imageselect-target": true,
    ".rc-image-tile-wrapper": true
  }
}


false. The class I'd been targeting since v0.7.0 doesn't exist in production.

The real DOM looks like this:

| Element | Mock fixture | Real Google reCAPTCHA |
| Table class | rc-imageselect-table | rc-imageselect-table-33 (or -44) |
| Tile element | <td> with real CSS | <div class="rc-image-tile-wrapper"> (the <td> is 0×0 because tiles are absolutely-positioned) |
| Challenge text | .rc-imageselect-desc | .rc-imageselect-desc-no-canonical (in dynamic-replace mode) |

The whole fingerprint table had been wrong all along. Unit tests passed because I wrote the mock fixture to match the selectors I'd hardcoded. Tautology. The mock fixture lied because the person who wrote it was the same person who wrote the selectors.

Day 4 evening — v0.7.3 actually fixes it

I rewrote the fingerprint to chain both real and mock selectors in CSS selector lists (a comma between selectors means "or"):

"challenge_text_selector": (
    ".rc-imageselect-desc-no-canonical, .rc-imageselect-desc"
),
"tile_table_selector": (
    'table[class*="rc-imageselect-table"], '
    '.rc-imageselect-target, .rc-imageselect-table'
),
"tile_cell_selector": ".rc-image-tile-wrapper, .rc-imageselect-table td",


Now the same fingerprint matches both production reCAPTCHA AND the mock fixture. The per-cell bounding_box() path finally runs against real DOM, returning real 95×95 squares instead of distorted 133×193 rectangles. Tile 0 sits at y=211 (just below the 200px header), not y=92 (inside the header banner).

I also fixed a different UX problem in the same release. The MCP server was returning the screenshot as a base64 string embedded in a JSON TextContent. Multimodal AI clients can't "see" base64 — they see a giant string of iVBORw0KG.... The fix: return the screenshot as a native MCP ImageContent:

return [
    ImageContent(type="image", data=b64, mimeType="image/png"),
    TextContent(type="text", text=json.dumps(metadata)),
]


Now Claude Code receives the screenshot as if you'd dragged it into the chat. No manual screenshot juggling.
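To make the difference concrete, here is a sketch of the two-part response that uses plain dictionaries as stand-ins for the MCP ImageContent and TextContent objects. `build_inspect_response` is a hypothetical helper, not the server's actual function, and the "screenshot" is fake filler data. It also shows where the iVBORw0KG prefix comes from: it is the Base64 encoding of the 8-byte PNG file signature.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def build_inspect_response(png_bytes: bytes, metadata: dict) -> list[dict]:
    """Return image and metadata as separate content parts (dict stand-ins)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return [
        {"type": "image", "data": b64, "mimeType": "image/png"},
        {"type": "text", "text": json.dumps(metadata)},
    ]


# A fake "screenshot": just the PNG signature plus filler bytes.
fake_png = PNG_SIGNATURE + b"\x00" * 16
parts = build_inspect_response(fake_png, {"grid": "3x3", "tile_count": 9})

print(parts[0]["data"][:9])          # iVBORw0KG
print(json.loads(parts[1]["text"]))  # metadata stays small and readable
```

Keeping the image in its own content part means the JSON text part carries only geometry and status, which is the part a client actually needs to parse.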

Day 4 night — v0.7.4 closes the multi-round gap

One more dogfood run, this time the challenge text was different: "Select all images with buses. Click verify once there are none left."

This is reCAPTCHA's dynamic-replace mode. Click a matching tile, the tile gets replaced with a new image. You have to keep selecting until no buses remain, then click Verify. v0.7.3 always clicked Verify after the first round, so it always failed against this mode even with perfect tile judgment.

v0.7.4 added a new return status: "continue". When solve detects dynamic mode (the prompt contains "none left" / "確定沒有遺漏" / equivalent), it does the clicks, waits for the replace animation, re-screenshots the iframe, and returns status: "continue" with a fresh screenshot + new tile geometry. The AI client looks at the new grid, finds any remaining matches, calls solve again. When the AI sees no more matches, it passes an empty selected_tile_indices: [] to signal "click Verify now."

// Round 1 response
{
  "status": "continue",
  "rounds_used": 1,
  "screenshot_base64": "...new grid...",
  "tiles": [...],
  "hint": "Dynamic-replace round 1/5. Look at the new screenshot and call solve again."
}

// Round 2 request: the AI sees no more buses
{ "selected_tile_indices": [], "confirm": true }

// Round 2 response
{ "status": "passed", "token": "03AGdBq25..." }


Hard cap of 5 rounds prevents infinite loops on pathological challenges. Static mode (no marker phrase) is unchanged — legacy flow runs verbatim.
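From the client's side, the multi-round protocol reduces to a bounded loop. The sketch below is an assumption about how a caller might drive it, not the project's implementation: `solve` and `pick_tiles` are hypothetical callables standing in for the solve_visual_challenge tool call and the AI's tile judgment, the marker tuple contains only the two phrases quoted above, and the response returned when the cap is hit is invented for illustration.

```python
from typing import Callable

MAX_ROUNDS = 5  # the hard cap described above
DYNAMIC_MARKERS = ("none left", "確定沒有遺漏")


def is_dynamic_replace(prompt: str) -> bool:
    """Dynamic mode is signalled by a marker phrase in the challenge text."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in DYNAMIC_MARKERS)


def run_rounds(first_screenshot: str,
               pick_tiles: Callable[[str], list[int]],
               solve: Callable[[list[int]], dict]) -> dict:
    """Call solve until it stops answering "continue" or the cap is hit."""
    screenshot = first_screenshot
    for _ in range(MAX_ROUNDS):
        # An empty selection means "nothing left to click, press Verify".
        response = solve(pick_tiles(screenshot))
        if response["status"] != "continue":
            return response
        screenshot = response["screenshot_base64"]
    return {"status": "failed", "reason": "round cap reached"}  # hypothetical


# Fake "AI" and server for illustration: buses on tiles 1 and 4, then none.
selections = [[1, 4], []]


def fake_pick(_screenshot: str) -> list[int]:
    return selections.pop(0)


def fake_solve(indices: list[int]) -> dict:
    if indices:
        return {"status": "continue", "screenshot_base64": "new-grid"}
    return {"status": "passed", "token": "fake-token"}


result = run_rounds("first-grid", fake_pick, fake_solve)
print(is_dynamic_replace("Click verify once there are none left."), result)
```

The cap lives in the loop itself, so a challenge that keeps answering "continue" still terminates after five calls.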

And — the lesson from this entire saga — I added a weekly GitHub Action that runs the dogfood script against the real Google reCAPTCHA demo and asserts _coord_method != "iframe_divide". If Google ships a DOM change next week that breaks the fingerprint again, I'll get a CI failure email within seven days instead of finding out from a user issue six months later.

on:
  schedule:
    - cron: "0 2 * * 0"  # Sunday 02:00 UTC

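The assertion that the scheduled job runs could be as small as the following sketch. The `_coord_method` field and the `iframe_divide` value are from the inspect output shown earlier; `check_inspect_response`, `main`, and the sample payload are hypothetical and not copied from the dogfood script.

```python
import json
import sys


def check_inspect_response(raw_json: str) -> None:
    """Fail loudly if inspect fell back to the naive iframe-divide path."""
    response = json.loads(raw_json)
    method = response.get("_coord_method")
    if method is None:
        raise AssertionError("inspect response has no _coord_method field")
    if method == "iframe_divide":
        raise AssertionError(
            "selectors no longer match the real DOM: fell back to iframe_divide"
        )


def main(raw_json: str) -> int:
    """Return a process exit code so a CI step fails on selector drift."""
    try:
        check_inspect_response(raw_json)
    except AssertionError as exc:
        print(f"dogfood check failed: {exc}", file=sys.stderr)
        return 1
    return 0


# A payload like the one captured on Day 4 morning would fail the check.
exit_code = main('{"_coord_method": "iframe_divide", "tiles": []}')
print(exit_code)
```

Treating a missing `_coord_method` as a failure too means the check cannot pass by accident if the debug field is ever dropped.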

What works now

  • ✅ reCAPTCHA v2 image-grid (3×3 + 4×4) — verified against the real Google demo
  • ✅ hCaptcha image-select — same fingerprint infrastructure, fixture verified, real-vendor TBD
  • ✅ Multi-round dynamic-replace — unit-test verified, end-to-end real-vendor TBD
  • ✅ MCP ImageContent — multimodal clients see screenshots natively
  • ✅ Consent gate, domain allowlist, hard-stop blacklist (Google / Apple / Microsoft / Discord login pages)
  • ✅ Weekly real-world CI guard

What doesn't (yet)

  • ❌ Mobile WebView — v0.8.0 mini-PRD drafted, ~6 working days of implementation ahead
  • ❌ reCAPTCHA v3 — pure behavior scoring, no visible challenge, out of scope by design
  • ❌ Cloudflare Turnstile — same reason
  • ❌ Audio captcha fallback — accessibility tier, low usage in QA context
  • ❌ The dynamic-replace loop on real Google reCAPTCHA with AI in the loop — that's my next dogfood session

Lessons I want to remember

  1. Mock fixtures can lie. When the same person writes both the production selectors and the mock that tests them, the mock matches by construction. There's no signal. The fix is to dogfood against the real thing, and if you can't dogfood, at minimum replay a recorded HAR or a saved DOM snapshot of the real page and assert against that.

  2. Silent fallbacks are the worst kind of bug. v0.7.2's try / except swallowed the failure of every per-cell bounding_box() and quietly fell back to broken math. A _coord_method debug field that surfaces which path actually fired would have caught this in minutes. I now add a debug field every time I have more than one code path for the same output.

  3. Multi-round is UX, not a bug. reCAPTCHA's "Click verify once there are none left" isn't an edge case — it's the dominant mode on hard challenges. I built the static-only solver, said "ship it," and was surprised when most real-world challenges fell into the dynamic-replace bucket I hadn't designed for.

  4. Weekly CI catches what unit tests can't. The dogfood workflow runs once a week against a third-party demo. It's noisy, it depends on a vendor's continued cooperation, and it'd be wrong to depend on it for blocking merges. But as a background signal that catches selector drift, it's exactly the right level of investment.
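Lesson 1's fallback, asserting against a capture of the real DOM rather than a hand-written fixture, can be sketched with the standard library alone. The snapshot string below is an abbreviated stand-in built from the class names in the comparison table, not a real capture, and `ClassCollector` and `missing_classes` are hypothetical names.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class token that appears in an HTML snapshot."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_classes(snapshot_html: str, required: set[str]) -> set[str]:
    """Return the required class names that the snapshot does not contain."""
    collector = ClassCollector()
    collector.feed(snapshot_html)
    return required - collector.classes


# Stand-in for a snapshot saved from the real page (not a real capture).
real_snapshot = (
    '<table class="rc-imageselect-table-33"><tr><td>'
    '<div class="rc-image-tile-wrapper"></div></td></tr></table>'
)
# The exact class name the v0.7.0 fingerprint assumed.
assumed = {"rc-imageselect-table"}

print(missing_classes(real_snapshot, assumed))  # {'rc-imageselect-table'}
```

Because the snapshot comes from the vendor's page and not from the person who wrote the selectors, a non-empty result is real signal, which the hand-written mock could never provide.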

Try it

pip install mk-qa-master==0.7.4

# In your MCP host (Claude Code config, Cursor, etc.)
{
  "mcpServers": {
    "qa-master": {
      "command": "python",
      "args": ["-m", "mk_qa_master.server"],
      "env": {
        "QA_VISUAL_CHALLENGE_CONSENT": "true",
        "QA_VISUAL_CHALLENGE_AUTHORIZED_DOMAINS": "your-staging.example.com"
      }
    }
  }
}


Then ask Claude: "Test the signup flow on staging. If you hit a captcha, solve it." The MCP tools take it from there.

Repo + walkthrough: https://github.com/kao273183/mk-qa-master.

What's next

v0.8.0 — mobile WebView captcha via Maestro CLI (same fingerprint table, new driver). PRD is up in the repo. Probably another diary entry when that one ships.

If you find a bug, the dogfood script lives at scripts/dogfood-inspect-only.py — run it against the page that broke and the inspect output will tell you exactly which coordinate path fired. Beats debugging blind.