How to Build a Bluesky Scraper with the AT Protocol API (and Publish It on Apify)
Daniel Ainsw · 2026-05-28 · via DEV Community

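Bluesky passed 40 million users earlier this year. Unlike Twitter, it runs on an open protocol, the AT Protocol, where public data is genuinely public and designed to be machine-readable. No $5,000-a-month enterprise API tier. No rate limits that take a lawyer to understand. Just a clean REST API that anyone can query.

I wanted to scrape it. Here is how I built a production-grade actor, and what I learned along the way.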

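Why Bluesky is easy to scrape (legally)

Most social media scrapers are a fight against Cloudflare, rotating proxies, and terms-of-service gray areas. Bluesky is different. The AT Protocol is explicitly designed for third-party clients and data access. The public API at public.api.bsky.app serves unauthenticated read requests only. No fingerprinting, no CAPTCHAs, no DOM parsing.

The only catch: the search endpoint (app.bsky.feed.searchPosts) now requires authentication with a free app password. Everything else (author feeds, threads, profiles) works without credentials.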
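
To make the unauthenticated side concrete, here is a minimal sketch of an author-feed read against the public host. The host comes from the text above and app.bsky.feed.getAuthorFeed is the Bluesky endpoint for author feeds; the helper names (authorFeedUrl, fetchAuthorFeedPage) are hypothetical and are not taken from the actor's source.

```typescript
// Sketch only: helper names are illustrative, not the actor's real code.
const PUBLIC_HOST = "https://public.api.bsky.app";

/** Build the URL for an unauthenticated author-feed read. */
function authorFeedUrl(actor: string, limit = 50, cursor?: string): string {
  const url = new URL(`${PUBLIC_HOST}/xrpc/app.bsky.feed.getAuthorFeed`);
  url.searchParams.set("actor", actor); // a handle or a DID
  url.searchParams.set("limit", String(limit)); // the API accepts 1 to 100
  if (cursor) url.searchParams.set("cursor", cursor); // pagination token from the previous page
  return url.toString();
}

/** Fetch one page of an author's feed. No Authorization header is sent. */
async function fetchAuthorFeedPage(
  actor: string,
  cursor?: string,
): Promise<{ feed: unknown[]; cursor?: string }> {
  const res = await fetch(authorFeedUrl(actor, 50, cursor));
  if (!res.ok) throw new Error(`getAuthorFeed failed: HTTP ${res.status}`);
  return (await res.json()) as { feed: unknown[]; cursor?: string };
}
```

Keeping the URL builder separate from the network call makes it possible to unit-test the request shape without touching the API.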

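The three modes I built

I wanted a single actor that covers the main B2B use cases:

Search posts: search by keyword and hashtag, with a configurable date range, language filter, and sort order. Uses bsky.social/xrpc/app.bsky.feed.searchPosts with a Bearer token.

Author feed: pull every post from one or more accounts. No auth required. Useful for competitor monitoring or auditing a creator's content history.

Thread: fetch the full conversation tree from a post URL. The API returns a nested tree; I flatten it depth-first so you get a clean, ordered list of posts.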
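
The depth-first flattening in the thread mode can be sketched as follows. This is an assumption about the shape of the work, not the actor's actual code: ThreadNode is a trimmed-down, hypothetical view of the nested response (a node with an optional post and optional replies), and placeholder nodes without a post, such as deleted or blocked replies, are skipped.

```typescript
// Sketch only: ThreadNode is a simplified, hypothetical view of the nested thread.
interface ThreadNode<P> {
  post?: P; // missing on placeholder nodes (for example, not-found replies)
  replies?: ThreadNode<P>[];
}

/** Pre-order depth-first walk: each post is followed by its full reply subtree. */
function flattenThread<P>(root: ThreadNode<P>): P[] {
  const out: P[] = [];
  const stack: ThreadNode<P>[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.post !== undefined) out.push(node.post);
    const replies = node.replies ?? [];
    // Push in reverse so the first reply is popped (and visited) first.
    for (let i = replies.length - 1; i >= 0; i--) stack.push(replies[i]);
  }
  return out;
}
```

An explicit stack is used instead of recursion so that very deep reply chains cannot overflow the call stack.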

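The one gotcha: API routing

This one burned me. I was sending authenticated requests (with a JWT) to public.api.bsky.app. That endpoint is proxied by Cloudflare, and if you send it an auth token it returns a 403; it only serves unauthenticated traffic.

The fix: authenticated calls go to bsky.social, and unauthenticated reads go to public.api.bsky.app. You authenticate against bsky.social, get a JWT, and then use that JWT only on subsequent bsky.social calls.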
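
One way to enforce that routing rule is to pick the host and the headers in the same place, so a token can never be attached to a public-host request. The sketch below assumes the standard com.atproto.server.createSession login with an app password; the function names (hostFor, buildRequest, createSession, searchPosts) are hypothetical.

```typescript
// Sketch only: function names are illustrative, not the actor's real code.
const AUTH_HOST = "https://bsky.social";
const PUBLIC_HOST = "https://public.api.bsky.app";

/** A token means bsky.social; no token means the public API. */
function hostFor(accessJwt?: string): string {
  return accessJwt ? AUTH_HOST : PUBLIC_HOST;
}

/** Build URL and headers together so the two can never disagree. */
function buildRequest(
  nsid: string,
  params: Record<string, string>,
  accessJwt?: string,
): { url: string; headers: Record<string, string> } {
  const url = new URL(`${hostFor(accessJwt)}/xrpc/${nsid}`);
  for (const key of Object.keys(params)) url.searchParams.set(key, params[key]);
  const headers: Record<string, string> = {};
  if (accessJwt) headers["Authorization"] = `Bearer ${accessJwt}`;
  return { url: url.toString(), headers };
}

/** Log in with a handle and an app password; returns the access JWT. */
async function createSession(identifier: string, appPassword: string): Promise<string> {
  const res = await fetch(`${AUTH_HOST}/xrpc/com.atproto.server.createSession`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ identifier, password: appPassword }),
  });
  if (!res.ok) throw new Error(`createSession failed: HTTP ${res.status}`);
  const body = (await res.json()) as { accessJwt: string };
  return body.accessJwt;
}

/** Authenticated search: the token guarantees the request goes to bsky.social. */
async function searchPosts(q: string, accessJwt: string): Promise<unknown> {
  const { url, headers } = buildRequest(
    "app.bsky.feed.searchPosts",
    { q, sort: "latest", limit: "25" },
    accessJwt,
  );
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`searchPosts failed: HTTP ${res.status}`);
  return res.json();
}
```

With this shape, the 403 described above cannot be triggered by accident, because there is no code path that combines the public host with an Authorization header.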

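The monorepo deployment headache

I build my Apify actors in a TypeScript monorepo using npm workspaces. A shared library (@apify-actors/shared) holds the PPE (pay-per-event) charging helpers and the error classes. Locally, workspace resolution handles this without trouble. On Apify's build servers there is no monorepo, just the uploaded actor folder.

The solution: copy the shared source into src/shared/ inside each actor and use relative imports. tsup bundles everything into a single dist/main.js. The shared code keeps one canonical location in the repo, and each actor gets its own copy inlined at build time.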
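
The copy step could be automated with a small prebuild script along these lines. The repository layout is an assumption made purely for illustration (a canonical packages/shared/src folder and actors under actors/<name>), and planSharedCopies and syncShared are hypothetical names; the article does not say how the copy is actually performed.

```typescript
import { cpSync, rmSync } from "node:fs";
import { posix } from "node:path";

// Sketch only: the directory layout below is assumed, not taken from the article.
interface CopyStep {
  from: string;
  to: string;
}

/** Compute which folder is copied where, without touching the disk. */
function planSharedCopies(repoRoot: string, actors: string[]): CopyStep[] {
  const from = posix.join(repoRoot, "packages/shared/src");
  return actors.map((name) => ({
    from,
    to: posix.join(repoRoot, "actors", name, "src/shared"),
  }));
}

/** Replace each actor's src/shared with a fresh copy of the canonical source. */
function syncShared(repoRoot: string, actors: string[]): void {
  for (const step of planSharedCopies(repoRoot, actors)) {
    rmSync(step.to, { recursive: true, force: true }); // drop any stale copy first
    cpSync(step.from, step.to, { recursive: true });
  }
}
```

After a sync like this, actor code would import from a relative path such as ./shared/ rather than from @apify-actors/shared, which is what lets the build succeed when only the actor folder is uploaded.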

Output schema

Every post is returned as a flat JSON record:

{
  "url": "https://bsky.app/profile/user.bsky.social/post/3lhxxxxxxxxx",
  "text": "Post content here",
  "authorHandle": "user.bsky.social",
  "authorDisplayName": "User Name",
  "likeCount": 142,
  "repostCount": 28,
  "replyCount": 19,
  "images": [{ "thumb": "...", "fullsize": "...", "alt": "..." }],
  "externalEmbed": { "uri": "...", "title": "...", "description": "..." },
  "createdAt": "2025-11-15T10:30:00.000Z"
}
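
A mapping from the API's nested post object to this flat record might look like the sketch below. FlatPost mirrors the fields in the JSON above; PostView is a trimmed-down, hypothetical subset of what the API returns, and the defaults (empty string, 0, empty array, null) are assumptions about how missing values are handled.

```typescript
// Sketch only: PostView is a simplified subset; defaults for missing fields are assumed.
interface FlatPost {
  url: string;
  text: string;
  authorHandle: string;
  authorDisplayName: string;
  likeCount: number;
  repostCount: number;
  replyCount: number;
  images: { thumb: string; fullsize: string; alt: string }[];
  externalEmbed: { uri: string; title: string; description: string } | null;
  createdAt: string;
}

interface PostView {
  uri: string; // at://<did>/app.bsky.feed.post/<rkey>
  author: { handle: string; displayName?: string };
  record: { text?: string; createdAt?: string };
  likeCount?: number;
  repostCount?: number;
  replyCount?: number;
  embed?: {
    images?: { thumb: string; fullsize: string; alt: string }[];
    external?: { uri: string; title: string; description: string };
  };
}

function toFlatPost(post: PostView): FlatPost {
  // The record key is the last path segment of the at:// URI.
  const rkey = post.uri.split("/").pop() ?? "";
  return {
    url: `https://bsky.app/profile/${post.author.handle}/post/${rkey}`,
    text: post.record.text ?? "",
    authorHandle: post.author.handle,
    authorDisplayName: post.author.displayName ?? "",
    likeCount: post.likeCount ?? 0,
    repostCount: post.repostCount ?? 0,
    replyCount: post.replyCount ?? 0,
    images: post.embed?.images ?? [], // only direct image embeds are handled here
    externalEmbed: post.embed?.external ?? null,
    createdAt: post.record.createdAt ?? "",
  };
}
```

A flat shape like this is what makes CSV and spreadsheet exports straightforward, since every top-level key maps to a column.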


Export directly from Apify as JSON, CSV, or Excel. Connect it to Zapier or Make for no-code workflows.

The actor is live

If you want to use it without building anything: Bluesky Posts Scraper on the Apify Store

PPE pricing: $0.25 per run + $0.003 per post ($3 per 1,000 posts). No subscription required.
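
As a quick worked example of that pricing, the sketch below estimates the cost of a single run from the two figures stated above. The function name estimateCostUsd is hypothetical, and the amounts are kept in thousandths of a dollar so the arithmetic stays in integers until the final division.

```typescript
// Sketch only: prices come from the line above; the helper name is illustrative.
const RUN_FEE_MILLS = 250; // $0.25 per run, in thousandths of a dollar
const PER_POST_MILLS = 3; // $0.003 per post

/** Estimated cost in USD for one run that returns `postCount` posts. */
function estimateCostUsd(postCount: number): number {
  if (!Number.isInteger(postCount) || postCount < 0) {
    throw new RangeError("postCount must be a non-negative integer");
  }
  return (RUN_FEE_MILLS + PER_POST_MILLS * postCount) / 1000;
}
```

Under these numbers, a run that returns 1,000 posts costs $0.25 + $3.00 = $3.25, and a run that returns nothing still costs the $0.25 run fee.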

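The AT Protocol makes Bluesky one of the cleanest data sources you can work with right now. If your use case involves social listening, brand monitoring, or lead signals from a fast-growing, tech-forward audience, it is worth adding to your toolkit.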