Stop Fighting the DOM. Selector-First Thinking Can Save Your Scraper.
SIÁN Agency · 2026-05-24 · via DEV Community

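Every broken scraper I have seen has the same shape: someone wrote the scraping logic first, and the selectors were an afterthought, whatever happened to work in DevTools at 2 a.m.

That is backwards. The selector is the contract between your code and the page. Get it wrong and nothing else matters.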

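The mindset shift

Selector-first thinking means that before you write a single line of extraction code, you decide how the data is identified. The question is not "how do I get the price?" but "what on this page says, programmatically, that this thing is the price?"

Three answers, in order of preference:

  1. Semantic: getByRole, getByLabel, getByText. These mirror what the accessibility tree exposes, and they tend to survive redesigns.
  2. Data attributes: data-testid, data-product-id, itemprop. Developers often add them for their own tests; you get to piggyback.
  3. Structured data: JSON-LD, microdata, OpenGraph. The page has already told Google what the price is; let it tell you too.

CSS classes are the last resort. A class name is not an identity; it changes with the design. It is like asking for "the third button from the top": it works until the menu changes.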

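The three-item checklist

Before you write a selector:

  1. Open the accessibility tree in DevTools (Chrome: Elements → Accessibility tab). If the data has a role and an accessible name, use getByRole.
  2. Search the page source for application/ld+json. If it exists and contains your fields, parse it directly; no DOM traversal is needed.
  3. Look for data-* attributes near the data. Developers leave test hooks everywhere. Use them.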
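
Step 2 of the checklist can be tried without a browser at all. The following is a minimal sketch, assuming the page embeds schema.org-style JSON-LD with an offers.price field. The helper names extractJsonLd and findPrice are hypothetical, and the regex is a shortcut for illustration, not a full HTML parser.

```javascript
// Sketch: pull JSON-LD blocks out of raw HTML and look for an offer price.
// Assumption: the data follows the schema.org `offers.price` shape.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const nodes = [];
  for (const match of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold one object, an array, or an object with "@graph".
      for (const item of [parsed].flat()) {
        const graph = item?.['@graph'];
        nodes.push(...(Array.isArray(graph) ? graph : [item]));
      }
    } catch {
      // Malformed block: skip it rather than abort the whole extraction.
    }
  }
  return nodes;
}

function findPrice(nodes) {
  for (const node of nodes) {
    // `offers` may be a single object or an array of offers.
    const offer = [node?.offers].flat()[0];
    if (offer?.price != null) return String(offer.price);
  }
  return null;
}

// Made-up sample page for illustration.
const sampleHtml = `
  <html><head>
    <script type="application/ld+json">
      {"@type": "Product", "name": "Desk", "offers": {"@type": "Offer", "price": "129.00"}}
    </script>
  </head></html>`;

console.log(findPrice(extractJsonLd(sampleHtml))); // prints 129.00
```

If findPrice returns null on a real page, that is the signal to move on to the accessibility tree and data-* attributes.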

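If none of these work, fall back to CSS or XPath. When you do, anchor on something stable (a parent id, an aria-label, a data- attribute) rather than on a class chain alone.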
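
One way to hold yourself to that rule is a quick check before a CSS selector goes into the codebase. This is a heuristic sketch only: hasStableAnchor and its regexes are my own illustration, and they can be fooled (for example by a "#" inside an attribute value).

```javascript
// Heuristic sketch: does a CSS selector anchor on something stable?
// "Stable" here means an id, an aria-label, or a data-* attribute.
function hasStableAnchor(selector) {
  const stablePatterns = [
    /#[A-Za-z_][\w-]*/,       // #listing-details
    /\[aria-label[~|^$*]?=/,  // [aria-label="Pricing"]
    /\[data-[\w-]+/,          // [data-product-id] or [data-testid="price"]
  ];
  return stablePatterns.some((pattern) => pattern.test(selector));
}

console.log(hasStableAnchor('.grid > .item > .price'));               // false
console.log(hasStableAnchor('#listing-details .price'));              // true
console.log(hasStableAnchor('[data-product-id] .price'));             // true
console.log(hasStableAnchor('section[aria-label="Pricing"] strong')); // true
```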

The ten-line swap

Here is the priority order as I run it on new scrapers:

async function extractPrice(page) {
  // 1. Structured data first. Read every JSON-LD block without waiting:
  //    a page with no such block should fall through, not time out.
  const blocks = await page
    .locator('script[type="application/ld+json"]')
    .allTextContents();
  for (const raw of blocks) {
    try {
      // A block may hold one object or an array; offers may be either too.
      for (const item of [JSON.parse(raw)].flat()) {
        const price = [item?.offers].flat()[0]?.price;
        if (price != null) return String(price);
      }
    } catch {
      // Malformed JSON-LD: ignore this block and keep looking.
    }
  }

  // 2. Semantic selectors.
  const priceByLabel = page.getByLabel(/^price$/i);
  if (await priceByLabel.count()) return priceByLabel.first().textContent();

  // 3. Data attributes.
  const priceByData = page.locator('[data-testid="price"]');
  if (await priceByData.count()) return priceByData.first().textContent();

  // 4. Last resort: CSS class. Logged loudly so we know we're in fallback.
  console.warn('Falling back to CSS selector: selector audit needed.');
  return page.locator('.price-tag').first().textContent();
}


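Watch for the warning on the fallback path. The day it starts showing up in your logs, you know the site has changed its higher-priority signals and you are one redesign away from breakage. Fix it while it still works, not after it fails.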
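
A warning in the logs is easier to act on once it is counted. Below is a minimal sketch of a per-tier counter. The tier names, the 5% threshold, and createTierTracker itself are assumptions for illustration, and the sample counts are made up.

```javascript
// Sketch: count which tier of the ladder produced each value, so a rising
// fallback rate is visible before anything actually breaks.
function createTierTracker(fallbackTier = 'css', threshold = 0.05) {
  const counts = new Map();
  return {
    record(tier) {
      counts.set(tier, (counts.get(tier) ?? 0) + 1);
    },
    fallbackRate() {
      let total = 0;
      for (const n of counts.values()) total += n;
      return total === 0 ? 0 : (counts.get(fallbackTier) ?? 0) / total;
    },
    needsAudit() {
      return this.fallbackRate() > threshold;
    },
  };
}

// Made-up sample run: 100 extractions, one of which hit the CSS fallback.
const tracker = createTierTracker();
for (let i = 0; i < 95; i++) tracker.record('json-ld');
for (let i = 0; i < 4; i++) tracker.record('role');
tracker.record('css');

console.log(tracker.fallbackRate()); // 0.01
console.log(tracker.needsAudit());   // false
```

In a real scraper, record() would be called at each return point of the extraction function, and needsAudit() checked at the end of a run.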

Fig. 1 — Selector-priority ladder. Top is most stable. Bottom is most fragile.

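A quick example

On my Idealista actor, the priority order above turned a "fix the selectors every six weeks" routine into a "fix the selectors twice a year" routine. The JSON-LD path catches 95% of listings without touching the DOM. The accessibility-role fallback catches another 4%. The CSS fallback fires only on edge-case property types, and it tells me a new layout has shipped, usually a week before our other monitoring notices.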

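The unsolicited CTA

This selector ladder is the order we follow in everything we ship, alongside the request blocking covered in last week's post. You can see it at work on Idealista and similar sites. It matters enough that we built a tool around it.

So:

Open your scraper's selector code right now. Count the class-name chains and compare them with the semantic and structured-data lookups. Post the ratio in the comments. The longest CSS chain wins a prize: I bet someone has .product-grid > .item:nth-child(3) > .price > span > strong.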
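
If counting by hand sounds tedious, something like the following rough sketch can do it. It assumes you have already listed each lookup your scraper makes as a string. The function names are hypothetical, and chainDepth splits naively on combinators, so attribute values containing spaces or "~" will be miscounted.

```javascript
// Sketch: report the share of class-based selectors and the longest chain.
function chainDepth(selector) {
  // Split on descendant (space), child (>), and sibling (+, ~) combinators.
  return selector.trim().split(/\s*[>+~]\s*|\s+/).filter(Boolean).length;
}

function auditSelectors(selectors) {
  const usesClass = (s) => /\.[A-Za-z_][\w-]*/.test(s);
  const classBased = selectors.filter(usesClass);
  const longest = classBased.reduce(
    (best, s) => (chainDepth(s) > chainDepth(best) ? s : best),
    ''
  );
  return {
    classBased: classBased.length,
    other: selectors.length - classBased.length,
    ratio: selectors.length ? classBased.length / selectors.length : 0,
    longest,
  };
}

// Made-up sample inventory.
const report = auditSelectors([
  '[data-testid="price"]',
  'script[type="application/ld+json"]',
  '.price-tag',
  '.product-grid > .item:nth-child(3) > .price > span > strong',
]);

console.log(report.ratio);               // 0.5
console.log(chainDepth(report.longest)); // 5
```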

Agree? Or is there a case where a CSS chain really is the right call? Reply below.


Written by Nova Chen, automation developer advocate at SIÁN Agency. For more from Nova, visit dev.to. If you need custom scraping or automation work, hire SIÁN Agency.