How to Connect Your AI Coding Agent to a Browser on macOS
אחיה כהן · 2026-05-26 · via freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More

AI coding agents like Claude Code, Cursor, and the rest have gotten remarkably good at reading and writing code. But the moment they need to look at something on the web, they hit a wall. They can't see your staging site. They can't read the error in your analytics dashboard. They can't check whether the form they just built actually submits.

The usual fix is to hand the agent a headless browser: Puppeteer or Playwright driving a fresh Chromium instance. That works, sort of. But a headless Chromium starts every session as a stranger: no logins, no cookies, no sessions. It runs a second browser engine that loads your CPU and spins up your fan. And a growing number of sites simply block it on sight.

There's another option, and on a Mac it's a good one: let the agent drive the Safari you already use — the one that's already logged into GitHub, your analytics, your staging environment. That's what Safari MCP does. It's an open-source MCP server that exposes Safari to any MCP-capable agent through around 80 tools, with no Chromium, no WebDriver, and no separate browser to babysit.

In this tutorial you'll connect Safari MCP to an AI agent, run your first automation, and then build something a headless browser fundamentally cannot do: an automation that works inside a page you're logged into. By the end you'll understand not just how to wire this up, but when native browser automation is the right call — and when it isn't.

Here's what you'll need:

  • A Mac (Safari MCP is macOS-only — more on that trade-off later)

  • Node.js 18 or newer

  • An MCP-capable AI agent — this tutorial uses Claude Code and Cursor, but any MCP client works

What is MCP, and Why Does Browser Automation Need It?

Before wiring anything up, it helps to know what the "MCP" in Safari MCP stands for.

MCP is the Model Context Protocol — an open standard for connecting AI agents to external tools and data. Think of it the way you'd think of a USB port. Before USB, every device needed its own connector. MCP is the equivalent of agreeing on one connector: an agent that speaks MCP can use any tool that speaks MCP, with no custom integration code on either side.

An MCP server exposes a set of tools. An MCP client — your AI agent — discovers those tools and calls them. The server describes each tool (its name, what it does, what arguments it takes) and the agent decides when to call it. When Claude Code decides it needs to read a web page, it doesn't run browser code itself. It calls a tool that some MCP server provides.

Browser automation is a natural fit for this model. The agent's job is reasoning — "I need to see what's on the staging site, then check the console for errors." The actual mechanics — open a tab, wait for load, read the DOM, capture console output — are well-defined operations that belong behind a stable interface. That interface is exactly what an MCP server provides.

Safari MCP is one such server. It runs as a local process, exposes around 80 browser tools (navigate, click, fill, read, screenshot, extract, and more), and any MCP client can drive it. The agent never touches AppleScript or WebKit internals. It just calls safari_navigate and gets a result.
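
For a sense of what "calls a tool" means on the wire, here is a minimal sketch of the two JSON-RPC 2.0 requests an MCP client sends: tools/list to discover tools and tools/call to invoke one. The helper function names are illustrative, not part of any SDK, and a real client would use an MCP SDK and complete the initialization handshake first rather than building messages by hand.

```python
import json


def list_tools_request(request_id: int) -> dict:
    """Build the JSON-RPC request a client sends to discover a server's tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build the JSON-RPC request a client sends to invoke one tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Print both messages as they would appear on the wire.
    print(json.dumps(list_tools_request(1)))
    print(json.dumps(call_tool_request(2, "safari_navigate", {"url": "https://example.com"})))
```

Your agent does all of this for you. The sketch is only meant to show that a tool call is a named operation plus a JSON arguments object, which is why the same server works with any MCP client.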

The "USB port" framing matters for a practical reason: nothing in this tutorial is Claude-specific. Wire Safari MCP into Cursor, Cline, Windsurf, or your own MCP client and the tools are identical.

Why Safari Instead of Chrome or Playwright?

If you've automated a browser before, you've almost certainly used Chrome through Puppeteer, Playwright, or Selenium. So why reach for Safari?

It comes down to three differences that matter once an AI agent, not a test script, is the thing driving the browser.

1. It's your real browser, with your real sessions. A headless Chromium launched by Playwright is a clean room. It has never logged into anything. If you want your agent to read your analytics dashboard, you first have to solve authentication — store credentials somewhere, script the login, handle two-factor prompts, refresh tokens. Safari MCP skips all of that. It drives the Safari instance you use every day, which is already logged into your dashboards, your GitHub, your email. The agent inherits those sessions for free.

2. It doesn't melt your laptop. A headless Chromium is a second, full browser engine running alongside the browser you already have open. On a laptop that's real CPU, real memory, and a fan you can hear. Safari MCP uses the WebKit engine that's already running on every Mac — there's no second engine to start. The project measures this at roughly 60% less CPU for the browsing work, and the automation runs with Safari in the background, so it doesn't steal your screen.

3. Sites don't treat it as a bot. Headless browsers leak. They expose navigator.webdriver, they ship with telltale automation fingerprints, and bot-detection services — Cloudflare's challenge pages, reCAPTCHA, the WAFs in front of a lot of B2B sites — have gotten very good at spotting them. Your real Safari, driven through the operating system, looks like exactly what it is: a person's browser. (To be clear: this is for automating your own accounts and sites — not for evading access controls you don't own.)

The cost of all this is the obvious one: Safari MCP is macOS-only. It's built on WebKit and AppleScript, so there's no Windows or Linux story. If your agent runs on a Linux CI box, this isn't your tool. If it runs on your Mac — which, for a coding agent, it very often does — the trade is a good one. We'll come back to limitations honestly at the end.

Installing Safari MCP

Installation is genuinely one command, but there are two Safari settings to flip first. Let's do it in order.

Step 1 — Enable Safari's developer features

Safari MCP reads and controls pages by running JavaScript inside Safari. Two settings have to be on:

  1. Open Safari → Settings → Advanced and check "Show features for web developers." This reveals the Develop menu.

  2. Open the new Develop menu and check "Allow JavaScript from Apple Events."

That second one is the important one. It's what lets an outside process — the MCP server — ask Safari to run JavaScript on a page. Without it, every tool call fails.

Step 2 — Run the server

npx safari-mcp

That's the whole install. npx fetches the package and runs it; there's nothing to build. The first time an agent calls a tool, macOS will pop up a permission prompt — something like "Terminal wants to control Safari." Click OK. That's the standard Automation permission, and you can review it later under System Settings → Privacy & Security → Automation.

If you'd rather have it installed permanently:

npm install -g safari-mcp

Step 3 — Tell your agent about it

Your AI agent needs to know the server exists. For Claude Code, one command does it:

claude mcp add safari -- npx safari-mcp

For Cursor, create .cursor/mcp.json in your project:

{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["safari-mcp"]
    }
  }
}

The process is the same for every client — Claude Desktop, Cline, Windsurf, Continue, VS Code. You're telling the agent: "there's an MCP server named safari; start it by running npx safari-mcp."
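
If you already have other MCP servers configured, you may prefer to merge the safari entry into the existing file instead of overwriting it. The following is a small sketch of that merge, assuming the config uses the mcpServers layout shown above. The add_mcp_server helper is a hypothetical name, and the script is not part of Safari MCP or Cursor.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list) -> dict:
    """Add or update one server entry in an mcpServers-style JSON config file."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep any servers that are already configured; only touch our own entry.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    # Create the parent directory (for example .cursor/) if it is missing.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling add_mcp_server(Path(".cursor/mcp.json"), "safari", "npx", ["safari-mcp"]) from your project root should produce the same safari entry as the JSON above while leaving other servers in place.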

Restart your agent (or reload its MCP servers) and it will connect. In Claude Code you can confirm with the /mcp command, which lists connected servers and their tools. You should see safari with around 80 tools available.

That's it. Your agent now has a browser.

Your First Automation: Reading a Page

Let's prove the wiring works with the simplest possible task: have the agent read a web page.

In your agent, just ask in plain language:

"Use the safari tools to open example.com and tell me what the page says."

Behind that request, the agent makes two tool calls. First it navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://example.com" } }

Then it reads the content:

{ "tool": "safari_read_page", "arguments": {} }

safari_read_page returns the page's title, URL, and text content with the HTML stripped out — exactly the form an LLM wants. The agent gets back something like this:

Example Domain
https://example.com/
This domain is for use in illustrative examples in documents. You may
use this domain in literature without prior coordination or asking for
permission.

And it relays that to you. You just watched your agent browse.

A quick note on how the agent should look at a page, because it changes everything downstream. safari_read_page is great for "what does this say." But when the agent needs to act — click a button, fill a field — text isn't enough. It needs to know what's actually there and how to target it. For that, the better first move is safari_snapshot:

{ "tool": "safari_snapshot", "arguments": {} }

This returns an accessibility-tree view of the page, where every interactive element has a stable ref ID:

[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"

Those ref IDs are the agent's reliable handles. CSS selectors break when a page re-renders. A snapshot ref stays valid for the life of the page. Keep that in mind — it's the difference between an automation that works once and one that works every time.
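
To see why refs make good handles, here is a rough sketch that parses snapshot lines into records and looks up a ref by role and label. The line pattern is inferred only from the three sample lines above, so treat it as an assumption: the real snapshot output may carry more fields or a different layout. The function names are hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the sample lines: [role ref=ID] "label" value="..."
LINE_RE = re.compile(
    r'^\[(?P<role>\w+) ref=(?P<ref>\w+)\] "(?P<label>[^"]*)"(?: value="(?P<value>[^"]*)")?$'
)


def parse_snapshot(text: str) -> list:
    """Turn snapshot lines into dicts; lines that don't match are skipped."""
    elements = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            elements.append(match.groupdict())
    return elements


def find_ref(elements: list, role: str, label: str) -> Optional[str]:
    """Return the ref of the first element with this role and label, if any."""
    for element in elements:
        if element["role"] == role and element["label"] == label:
            return element["ref"]
    return None


SAMPLE = """[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"
"""

if __name__ == "__main__":
    print(find_ref(parse_snapshot(SAMPLE), "button", "Submit"))  # prints 0_15
```

The agent does this matching on its own when it reads a snapshot. The point of the sketch is that targeting by role and label needs no knowledge of the page's CSS.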

The Payoff: Automating a Logged-in Workflow

Reading example.com is a wiring test. Here's the thing a headless browser genuinely cannot do.

Pick a site you're logged into in Safari right now — your analytics, your project board, your CI dashboard. We'll use GitHub, because every developer has an account and the notifications page is a real, mildly annoying chore. The task: have the agent open your GitHub notifications and summarize what actually needs your attention.

Ask the agent:

"Open my GitHub notifications, read them, and group them into 'needs a reply' versus 'just FYI'."

The agent navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://github.com/notifications" } }

Stop and notice what didn't happen. No login screen. No OAuth dance. No personal access token in an environment variable. Safari is already authenticated as you, so the agent lands directly on your real notifications. A headless Chromium would have hit a login wall here and stopped.

Notification lists load incrementally, so the agent should wait for content before reading. safari_wait_for polls the page until a selector or piece of text appears, or a timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "text": "Inbox", "timeout": 10000 } }

Then it reads. safari_read_page scoped to the notifications region returns the list as clean text:

{ "tool": "safari_read_page", "arguments": { "selector": "main" } }

The agent reasons over that text and hands you the grouped summary. The whole loop — navigate, wait, read, summarize — is a handful of tool calls.
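
Written out as code, that loop might look like the sketch below. The call_tool parameter is a hypothetical stand-in for whatever function your MCP client uses to invoke a tool, and fake_call_tool exists only so the sketch runs without Safari. In practice the agent issues these calls itself; this just makes the order explicit.

```python
from typing import Callable

# Hypothetical signature: call_tool(name, arguments) -> result text.
ToolCaller = Callable[[str, dict], str]


def read_notifications(call_tool: ToolCaller) -> str:
    """Run navigate -> wait -> read and return the page text for summarizing."""
    call_tool("safari_navigate", {"url": "https://github.com/notifications"})
    # Wait for the list to render before reading, as described above.
    call_tool("safari_wait_for", {"text": "Inbox", "timeout": 10000})
    return call_tool("safari_read_page", {"selector": "main"})


def fake_call_tool(name: str, arguments: dict) -> str:
    """Stand-in for a real MCP client so the sketch runs without Safari."""
    print(f"-> {name} {arguments}")
    return "(page text)" if name == "safari_read_page" else "ok"


if __name__ == "__main__":
    print(read_notifications(fake_call_tool))
```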

When you need data in a precise shape rather than prose — to feed another step, or to write to a file — the agent can reach for safari_evaluate, which runs custom JavaScript on the page and returns whatever you build:

{
  "tool": "safari_evaluate",
  "arguments": {
    "expression": "JSON.stringify([...document.querySelectorAll('li')].map(li => li.innerText.trim()))"
  }
}

The agent writes that expression itself, against the structure it just saw on the page, so you don't hand-author selectors.
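
If you ever script this step yourself, two details are worth getting right: escaping the selector when you embed it in the JavaScript, and decoding the JSON string that comes back. Here is a minimal sketch of both. It assumes the tool returns the expression's value as a string; how Safari MCP wraps that value is not shown in this tutorial, and the helper names are hypothetical.

```python
import json


def build_evaluate_call(selector: str) -> dict:
    """Build a safari_evaluate call that returns matching elements' text as JSON."""
    # json.dumps yields a quoted, escaped literal that is also valid JavaScript.
    js_selector = json.dumps(selector)
    expression = (
        f"JSON.stringify([...document.querySelectorAll({js_selector})]"
        ".map(el => el.innerText.trim()))"
    )
    return {"tool": "safari_evaluate", "arguments": {"expression": expression}}


def decode_result(result: str) -> list:
    """Parse the JSON string the expression produces, dropping empty entries."""
    return [item for item in json.loads(result) if item]
```

Passing the selector through json.dumps means a selector containing quotes, such as a[href="/x"], cannot break out of the string literal in the generated expression.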

You might be thinking: GitHub has an API, why scrape the page? Fair. For GitHub specifically, the API is excellent. But the point generalizes. Most of the dashboards you stare at every day — your billing portal, your error tracker's specific filtered view, a client's analytics, the admin panel of some tool your company pays for — either have no usable API or would cost you an afternoon of OAuth setup to reach. With Safari MCP, "the page I'm already looking at" is the API. The agent reads what you can see, because it's using the browser you're seeing it in.

That's the capability headless automation can't match. Not speed, not features — access.

Handling the Tricky Parts

A first automation always looks easy. Three things tend to bite on the second one.

Tab Safety: The Agent Must Not Hijack Your Tabs

This is the scariest failure mode: you're typing in a tab, the agent navigates that tab, and your work is gone. Safari MCP guards against it by stamping each automation tab with an identity marker — it uses window.name, which survives page navigations — and resolving "the agent's tab" through that marker on every call. If it can't positively identify its own tab, it refuses to act and raises a re-anchor error rather than guessing.

The practical rule for you: let the agent open its own tab with safari_new_tab, and it will stay in its lane. Don't point it at "the current tab" and assume.
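
The "refuse rather than guess" idea is easy to picture in code. The sketch below is a simplified model, not Safari MCP's implementation: the Tab class, the marker value, and the ReanchorError name are all hypothetical, and the only detail taken from the description above is that the marker lives in window.name.

```python
from dataclasses import dataclass


class ReanchorError(RuntimeError):
    """Raised when the automation tab cannot be positively identified."""


@dataclass
class Tab:
    index: int
    url: str
    window_name: str  # value of window.name inside the tab


def resolve_agent_tab(tabs: list, marker: str) -> Tab:
    """Return the single tab carrying the marker; refuse to guess otherwise."""
    matches = [tab for tab in tabs if tab.window_name == marker]
    if len(matches) != 1:
        # Zero or several candidates: acting now could hit a tab you are using.
        raise ReanchorError(
            f"expected exactly one tab marked {marker!r}, found {len(matches)}"
        )
    return matches[0]
```

Matching on a marker instead of "the frontmost tab" is what keeps the tab you are typing in out of the agent's reach, even if you switch tabs mid-run.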

Waiting for Dynamic Content

Modern pages render after load. If the agent reads too early, it reads an empty shell. Don't have it guess with fixed sleeps — use safari_wait_for, which polls for a selector or text until it appears or the timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "selector": ".results-list", "timeout": 8000 } }

This is the single most common fix for "the automation works when I step through it slowly but fails when it runs."
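
The pattern behind a tool like this is a poll loop with a deadline. Here is a generic sketch of that pattern. It illustrates the idea only and is not Safari MCP's code; the condition callable and the interval_ms parameter are assumptions made for the example.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_ms: int = 8000, interval_ms: int = 100) -> bool:
    """Poll condition until it returns True or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            # Give up and let the caller decide what a timeout means.
            return False
        time.sleep(interval_ms / 1000)
```

Compared with a fixed sleep, the loop returns as soon as the content shows up and still fails predictably when it never does.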

Framework Forms

Set a React or Vue input's .value directly and the framework never notices — its internal state stays empty, and your "filled" form submits blank. Safari MCP's safari_fill and safari_fill_form use the native value setters and dispatch the input and change events the framework listens for, so React, Vue, Angular, and Svelte state all stay in sync:

{
  "tool": "safari_fill_form",
  "arguments": {
    "fields": [
      { "selector": "#email", "value": "jane@example.com" },
      { "selector": "#message", "value": "Looks great." }
    ]
  }
}

For framework-heavy pages where CSS selectors are fragile, go back to the snapshot refs introduced in the first automation: pass { "ref": "0_9" } instead of { "selector": "#email" }. Refs survive re-renders; selectors don't.
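
If you assemble a safari_fill_form call in your own script, a small check that every field names exactly one target can catch mistakes before they reach the browser. This is a sketch under one assumption: that a ref-based field has the shape { "ref": ..., "value": ... }, which the text above implies but does not show in full. The helper name is hypothetical.

```python
def build_fill_form_call(fields: list) -> dict:
    """Build a safari_fill_form call, checking each field targets one element."""
    checked = []
    for field in fields:
        # A field must be addressed by a CSS selector or a snapshot ref, not both.
        targets = [key for key in ("selector", "ref") if key in field]
        if len(targets) != 1:
            raise ValueError(f"field needs exactly one of selector or ref: {field}")
        if "value" not in field:
            raise ValueError(f"field is missing a value: {field}")
        key = targets[0]
        checked.append({key: field[key], "value": field["value"]})
    return {"tool": "safari_fill_form", "arguments": {"fields": checked}}
```

For example, build_fill_form_call([{"ref": "0_8", "value": "Jane Doe"}, {"selector": "#email", "value": "jane@example.com"}]) mixes both targeting styles in one call.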

None of these are exotic. They're just the difference between a demo and an automation you'd actually leave running.

Limitations: When Not to Use This

A tool tutorial that only lists strengths isn't worth much. Here's where Safari MCP is the wrong choice.

It's macOS-only, and that's structural. Safari MCP is built on WebKit and AppleScript. There's no Windows or Linux port coming, because Safari and AppleScript don't exist on those platforms. If your agent runs in Linux CI, use Playwright.

It drives one Safari, on one Mac. This is browser automation for your machine — a coding agent working alongside you. It is not a fleet. If you need 50 parallel browsers scraping in a data center, that's a headless-Chromium-in-containers job, and Safari MCP is the wrong shape for it.

Cross-browser test suites should stay on Playwright. If you're writing end-to-end tests that must pass on Chrome, Firefox, and Safari, use the tool built for that. Safari MCP drives exactly one engine: WebKit.

It shares a browser with you. Because it uses your real Safari, the agent and you are in the same browser. That's the entire point — but it means you should let the agent work in its own tabs and not fight it for the same window.

The honest summary: Safari MCP is built for one specific situation — an AI agent doing real browser work on the Mac you're sitting at, against sites you're already logged into. In that situation it's hard to beat. Outside it, reach for the headless tools. Knowing which situation you're in is the actual skill.

Wrapping Up

You've gone from an AI agent that could only see code to one that can see the web — the real web, behind your real logins.

To recap what you did: you learned what MCP is and why browser automation belongs behind that interface. You saw why a native Safari engine beats a headless Chromium for an agent working on your Mac, and you installed Safari MCP with one command and two settings. You ran a first read, and then you did the thing that actually matters: an automation inside a logged-in page, with no auth code at all. Finally, you saw the edges: tab safety, waiting for dynamic content, framework forms, and the cases where you should pick a different tool.

The bigger idea is worth holding onto. An AI agent is only as capable as the tools you connect to it. Giving it a browser — a real one — turns "write me code" into "go look at the staging site, find the bug, and tell me what's wrong." That's a different kind of collaborator.

Safari MCP is open source under the MIT license, and its roughly 80 tools go well beyond the handful you used here: screenshots, network inspection, storage, accessibility audits, multi-tab workflows. The repository and full tool reference are at github.com/achiya-automation/safari-mcp. Point your agent at it and see what it does when it can finally look around.


