Anthropic Self-Hosted Sandboxes + MCP Tunnels: Enterprise AI Agents That Keep Your Data Behind Your Walls
Ramsis Hammadi · 2026-05-27 · via DEV Community

TL;DR Summary

  • Anthropic now supports self-hosted sandboxes: agent orchestration stays on Anthropic's side, but code execution runs on your own servers (Cloudflare, Vercel, Daytona, Modal, or on-prem)
  • MCP tunnels provide encrypted access to private databases and internal APIs through a single outbound connection — no inbound firewall holes, no public endpoints
  • Mid-session tool swapping lets you change tools and MCP servers without restarting the agent session
  • 100K+ token MCP outputs auto-offload to sandbox files instead of bloating the agent's context window
  • Powered by OS-level sandboxing (Seatbelt on macOS, bubblewrap on Linux) with layered filesystem and network isolation

Direct Answer Block

Anthropic's enterprise infrastructure upgrade separates agent reasoning (which stays on Anthropic's cloud) from code execution (which moves to your infrastructure). Self-hosted sandboxes keep sensitive files behind your firewall. MCP tunnels connect Claude to private databases and APIs through one encrypted outbound connection with zero inbound firewall rules. Mid-session tool swapping eliminates restarts, and large output offloading prevents context bloat.

Introduction

The enterprise AI adoption conversation has shifted from "can it do the work?" to "where does the work happen?" For regulated industries — finance, healthcare, defense — the answer can't be "on a vendor's cloud." Anthropic's latest infrastructure moves address this directly: self-hosted sandboxes that execute code on your servers, MCP tunnels that reach private services without exposing them, and quality-of-life improvements like mid-session tool swapping. The age of "just trust our cloud" is yielding to "keep everything behind your own walls."

How do self-hosted sandboxes split agent orchestration from code execution — and why does this matter for enterprise data residency?

[Figure: the architectural split. Claude's thinking happens on Anthropic's side, but code execution (files, shell, packages) happens on your own servers via Cloudflare, Vercel, or Modal.]

The architectural split is the core innovation. According to the AlphaSignal newsletter: "Agent orchestration stays on Anthropic's side, but tool execution moves to your infrastructure. Files never leave your perimeter."

This means Claude's reasoning — the model thinking, the decision-making, the prompt processing — happens on Anthropic's infrastructure. But when the agent needs to execute code (read a file, run a shell command, install a package, generate output), that execution happens inside a sandbox running on your servers.

The sandbox can run on managed providers (Cloudflare, Vercel, Daytona, Modal) or on your own on-prem infrastructure. The key property: your files never leave your network. Source code, proprietary data, environment variables, API keys — everything the agent touches during execution stays behind your firewall.
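To make the split concrete, here is a minimal sketch under stated assumptions: a remote orchestrator sends a small instruction, and an executor on your side runs it and returns a size-capped result. `ToolCall`, `ToolResult`, and `execute_locally` are hypothetical names for illustration, not part of any Anthropic API.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Instruction sent from the remote orchestrator (hypothetical shape)."""
    tool: str
    args: list[str]


@dataclass
class ToolResult:
    """What the local executor chooses to send back (hypothetical shape)."""
    exit_code: int
    stdout: str


def execute_locally(call: ToolCall, max_chars: int = 2000) -> ToolResult:
    """Run the call on this machine; only a capped result leaves it."""
    if call.tool != "python":
        raise ValueError(f"unsupported tool: {call.tool}")
    proc = subprocess.run(
        [sys.executable, *call.args],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return ToolResult(proc.returncode, proc.stdout[:max_chars])


# The orchestrator only ever sees the instruction and the returned result.
call = ToolCall(tool="python", args=["-c", "print(2 + 2)"])
result = execute_locally(call)
print(result)
```

The point of the sketch is the boundary: files and processes live where `execute_locally` runs, and only the `ToolResult` crosses back.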

Anthropic's existing OS-level sandboxing architecture (Seatbelt on macOS, bubblewrap on Linux) provides the enforcement layer. According to Anthropic's sandboxing documentation: "The sandboxed bash tool uses OS-level primitives to enforce both filesystem and network isolation." The self-hosted sandbox extends this architecture — instead of the sandbox running on Anthropic's machines, it runs on yours, with the same OS-level enforcement guarantees.
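As a rough illustration of what OS-level enforcement looks like on Linux, the sketch below only builds a bubblewrap command line; it does not run anything. The flags are standard `bwrap` options, but the specific layout (a merged-`/usr` distro, a single writable work directory) and the helper name `bwrap_argv` are assumptions, and this is not Anthropic's actual sandbox configuration.

```python
import shlex


def bwrap_argv(workdir: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a bubblewrap command line; nothing is executed here."""
    argv = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",         # system files, read-only
        "--symlink", "usr/bin", "/bin",      # assumes a merged-/usr distro layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",                   # private scratch space
        "--bind", workdir, workdir,          # the only writable host directory
        "--chdir", workdir,
        "--die-with-parent",
    ]
    if not allow_network:
        argv.append("--unshare-net")         # fresh network namespace, no connectivity
    return argv + command


argv = bwrap_argv("/srv/agent-work", ["ls", "-la"])
print(shlex.join(argv))
```

Because the restrictions are applied to the process tree by the kernel, anything the command spawns inherits the same view of the filesystem and network.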

For enterprises with data residency or compliance requirements (GDPR, HIPAA, SOC 2, FedRAMP), this architectural split means the agent can work on sensitive data without the underlying files touching third-party infrastructure during code execution. The model's reasoning still runs on Anthropic's cloud, where it sees prompts, tool call instructions, and whatever tool results are returned to it, so what crosses the boundary is limited to the output the agent reads back rather than the raw files themselves.

How do MCP tunnels let Claude access private databases and internal APIs through a single outbound connection?

MCP (Model Context Protocol) tunnels solve the enterprise network access problem. The traditional approach to letting an external service access your internal APIs involves: opening firewall ports, configuring VPNs, setting up public endpoints, managing certificates. Each step is a security review. Each endpoint is an attack surface.

MCP tunnels reverse the connection: the tunnel is initiated from inside your network, as a single outbound connection to Claude Code. No inbound firewall rules. No public endpoints. No exposed services.

The AlphaSignal newsletter describes the mechanism: "MCP tunnels let agents talk to internal databases and APIs through a single outbound encrypted connection — no inbound firewall rules, no public endpoints."

Traffic is encrypted end-to-end. The tunnel carries MCP tool calls — Claude accessing your private Postgres database, your internal ticketing system, your proprietary API — as if the agent were running inside your network. But the only network change is one outbound connection.

This pattern is similar to how Cloudflare Tunnels and ngrok work: the client inside the network establishes an outbound connection to the service, and traffic flows through that tunnel. No ports are opened. No DNS records are changed. The connection is initiated from the trusted side.
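The outbound-only pattern can be sketched with plain sockets on localhost. This is a toy illustration, not the MCP tunnel protocol: the `relay` socket stands in for the vendor side, `tunnel_client` stands in for the process inside your network, and `private_lookup` is a hypothetical internal service. Encryption and authentication are left out; a real tunnel would wrap the connection in TLS.

```python
import json
import socket
import threading


def private_lookup(ticket_id: str) -> dict:
    """Stand-in for an internal service that is never exposed publicly."""
    return {"ticket": ticket_id, "status": "open"}


def tunnel_client(relay_addr: tuple[str, int]) -> None:
    """Runs inside the private network: dials OUT, then answers requests."""
    with socket.create_connection(relay_addr) as conn:
        for line in conn.makefile("r"):
            request = json.loads(line)
            response = private_lookup(request["ticket_id"])
            conn.sendall((json.dumps(response) + "\n").encode())


# The relay is the only listening socket; the private side never listens.
relay = socket.create_server(("127.0.0.1", 0))
relay_addr = relay.getsockname()

threading.Thread(target=tunnel_client, args=(relay_addr,), daemon=True).start()

# Requests travel back down the connection the inside client opened.
conn, _ = relay.accept()
with conn:
    conn.sendall(b'{"ticket_id": "T-1"}\n')
    reply = json.loads(conn.makefile("r").readline())
relay.close()
print(reply)
```

Note which side calls `create_server`: the private network never accepts a connection, which is why no inbound firewall rule is needed.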

The newsletter notes that the tunnel configuration can be changed mid-session — you don't need to restart the agent to connect to a different database or API. This is part of the broader "mid-session tool swapping" capability.

How does mid-session tool and MCP server swapping eliminate restarts in long-running agent sessions?

One of the frustrations of long-running agent sessions: you start a session, realize you need a tool that wasn't configured, and have to restart. Every restart loses context. Every restart costs time.

Anthropic's update allows mid-session tool and MCP server changes. According to the newsletter: "Swap tools and MCP servers mid-session without restarting." This means:

  1. Add tools during an active session: if the agent discovers it needs a database connector halfway through a task, you can add it without stopping
  2. Switch MCP server configurations: change which backend the agent connects to (e.g., switch from staging to production database)
  3. Remove unused tools: reduce context bloat by dropping tools the agent no longer needs
  4. Update tool configurations: change API endpoints, authentication tokens, or tool parameters mid-task

This is particularly valuable for complex multi-step tasks where the agent's tool requirements evolve. A security audit might start with code analysis tools, then need database access when it finds a potential SQL injection, then need Slack access to notify the team — all in the same session.
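A minimal sketch of the idea, assuming the runtime keeps the tool list separate from the conversation history so one can change without resetting the other. `AgentSession` and its methods are hypothetical names; the real runtime's interface is not described in the source.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]


@dataclass
class AgentSession:
    """Toy session: history persists while the tool set changes."""
    history: list[str] = field(default_factory=list)
    tools: dict[str, Tool] = field(default_factory=dict)

    def add_tool(self, name: str, fn: Tool) -> None:
        self.tools[name] = fn

    def remove_tool(self, name: str) -> None:
        self.tools.pop(name, None)

    def tool_list(self) -> list[str]:
        """What the model would see as available tools on its next turn."""
        return sorted(self.tools)

    def call(self, name: str, arg: str) -> str:
        result = self.tools[name](arg)
        self.history.append(f"{name}({arg!r}) -> {result}")
        return result


session = AgentSession()
session.add_tool("scan_code", lambda path: f"possible SQL injection in {path}")
session.call("scan_code", "orders.py")

# Mid-session: add database access, drop the tool that is no longer needed.
session.add_tool("query_db", lambda sql: "3 rows")
session.call("query_db", "SELECT 1")
session.remove_tool("scan_code")

print(session.tool_list())
print(len(session.history))
```

The earlier `scan_code` result is still in `history` after the tool itself has been removed, which mirrors the audit scenario described above.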

How does offloading 100K+ token MCP outputs to sandbox files prevent context bloat and improve session length?

Large MCP tool outputs are a context problem. When an agent queries a database and gets back 100,000 tokens of results, those tokens consume the context window — the agent has less room for reasoning, instruction following, and conversation history. Long sessions degrade as context fills with tool output rather than productive content.

The solution: auto-offload large outputs to sandbox files. According to the newsletter: "Large MCP outputs (>100K tokens) auto-offload to sandbox files instead of bloating context."

The mechanism:

  1. Agent makes an MCP tool call (e.g., "query all customer records from last quarter")
  2. The tool returns a large result set
  3. Instead of inserting the raw output into the agent's context, the system writes it to a file in the sandbox
  4. The agent reads from the file when it needs specific data (using file search, grep, or chunked reads)
  5. The context stays lean — the agent has a reference to the data without the data consuming its working memory
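The steps above can be sketched as follows. The 100K threshold comes from the newsletter quote; everything else is an assumption for illustration, including the crude four-characters-per-token estimate (not a real tokenizer), the helper names `deliver` and `grep`, and the file layout.

```python
import tempfile
from pathlib import Path
from typing import Optional

TOKEN_THRESHOLD = 100_000


def estimate_tokens(text: str) -> int:
    """Crude estimate, assuming roughly 4 characters per token."""
    return len(text) // 4


def deliver(output: str, sandbox_dir: Path, name: str) -> tuple[str, Optional[Path]]:
    """Return text for the context window, offloading large outputs to a file."""
    if estimate_tokens(output) <= TOKEN_THRESHOLD:
        return output, None
    path = sandbox_dir / f"{name}.txt"
    path.write_text(output, encoding="utf-8")
    stub = f"[{estimate_tokens(output)} tokens offloaded to {path.name}]"
    return stub, path


def grep(path: Path, needle: str, limit: int = 5) -> list[str]:
    """Pull only the matching lines back into context."""
    matches = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if needle in line:
                matches.append(line.rstrip("\n"))
                if len(matches) == limit:
                    break
    return matches


sandbox = Path(tempfile.mkdtemp())
big = "\n".join(f"customer-{i},active" for i in range(60_000))

context_text, path = deliver(big, sandbox, "customers")
small_text, small_path = deliver("3 rows", sandbox, "tiny")

print(context_text)
print(grep(path, "customer-42,"))
```

The context receives a one-line stub instead of the full result set, and later lookups cost only the lines they return.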

This is similar to how human engineers work: you don't load an entire database dump into your brain. You query it, get a reference to the results, and inspect subsets as needed. The sandbox file acts as the agent's external memory for large data.

For long-running enterprise sessions that might process multiple large data sources, this feature extends the effective session length significantly. A session that would hit context limits after 20 minutes might run for hours, referencing large data files as needed without context exhaustion.

How does the OS-level sandbox (Seatbelt/bubblewrap) layer with self-hosted execution for defense-in-depth?

[Figure: dual filesystem and network isolation. Seatbelt (macOS) or bubblewrap (Linux) provides OS-level enforcement; a network proxy controls domain access.]

Anthropic's sandboxing architecture provides defense-in-depth through layered isolation:

  1. Filesystem isolation: The sandbox restricts read and write access to specific directories using OS-level primitives (Seatbelt on macOS, bubblewrap on Linux). According to Anthropic's documentation: "These restrictions are enforced at the OS level, so they apply to all subprocess commands, including tools like kubectl, terraform, and npm."

  2. Network isolation: A proxy server running outside the sandbox controls domain access. Only approved domains are reachable. New domain requests trigger permission prompts.

  3. Self-hosted boundary: With self-hosted sandboxes, an additional boundary — your network perimeter — sits between the agent and sensitive data. Even if the sandbox's OS-level isolation were compromised, the data is still behind your firewall.

Anthropic's sandboxing documentation emphasizes: "Effective sandboxing requires both filesystem and network isolation. Without network isolation, a compromised agent could exfiltrate sensitive files like SSH keys. Without filesystem isolation, a compromised agent could backdoor system resources to gain network access."
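The two policies can be illustrated with a pair of checks. This is only a sketch of the decision logic: a real sandbox enforces these rules in the OS and the proxy rather than in application code, and it must also handle symlinks, which this example ignores. The allowlists and function names are made up for illustration.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical policy values for illustration only.
ALLOWED_DIRS = [PurePosixPath("/workspace")]
ALLOWED_DOMAINS = {"pypi.org", "registry.npmjs.org"}


def path_allowed(path: str) -> bool:
    """Filesystem policy: normalize away '..' before checking the prefix."""
    normalized = PurePosixPath(posixpath.normpath(path))
    return any(normalized.is_relative_to(d) for d in ALLOWED_DIRS)


def domain_allowed(host: str) -> bool:
    """Network policy: exact match or subdomain of an approved domain."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


# A traversal attempt toward SSH keys fails the filesystem check...
print(path_allowed("/workspace/../home/dev/.ssh/id_rsa"))
# ...and even if a file were read, an unapproved host fails the network check.
print(domain_allowed("attacker.example"))
print(path_allowed("/workspace/src/app.py"), domain_allowed("pypi.org"))
```

Either check alone leaves a gap, which is the point of the quoted documentation: exfiltration needs both a readable secret and a reachable destination.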

The self-hosted sandbox adds a third layer: physical/organizational separation. The sandbox runs on infrastructure you control, under your monitoring, with your access controls. This matters for compliance frameworks that require demonstrated control over data processing locations.

How does Anthropic's enterprise infrastructure compare to OpenAI Codex and Cursor Cloud on data control?

The competitive landscape on enterprise data control:

| Feature | Anthropic (2026) | OpenAI Codex | Cursor Cloud |
| --- | --- | --- | --- |
| Code execution location | Self-hosted (your infra) or Anthropic cloud | Codex cloud sandbox or local | Cursor cloud or local IDE |
| Private service access | MCP tunnels (outbound only, encrypted) | MCP connectors via API | IDE-local tools |
| Mid-session tool changes | Yes | Limited | IDE-native (local) |
| Context offloading | 100K+ token auto-offload | Compaction features | IDE manages context |
| OS-level sandbox | Seatbelt/bubblewrap | Container-based | IDE + cloud VM |
| Data residency | Files stay in your perimeter | Cloud sandbox (files on OpenAI infra) | Cloud or local (user's choice) |

The key differentiator for Anthropic is the self-hosted execution model. Both OpenAI and Cursor offer cloud execution (where files are processed on their infrastructure) and local options (where files stay on your machine). Anthropic splits the difference: the model's reasoning runs on Anthropic's cloud (giving access to Claude's capabilities without local GPU requirements), but code execution — where sensitive data is actually touched — runs on your servers.

For enterprises where "data leaves our perimeter" is a hard compliance boundary, Anthropic's model provides a middle ground that neither pure-cloud nor pure-local alternatives match. The model's thinking uses Anthropic's infrastructure (which you're already trusting with your prompts), while your data stays behind your walls.

Frequently Asked Questions

Q: Does self-hosted sandbox execution cost more?

Anthropic hasn't published specific pricing for self-hosted sandboxes. The sandbox's compute (CPU, memory) comes from your own infrastructure, which you're already paying for, while model usage is still billed per token regardless of where execution happens. The cost difference therefore comes down to the compute you supply versus the compute Anthropic would otherwise have supplied.

Q: What are the minimum requirements for running a self-hosted sandbox?

The sandbox runs as a managed execution environment: you can use Cloudflare Workers, Vercel Functions, Daytona, Modal, or your own container infrastructure. The specific requirements depend on the provider. Cloudflare and Vercel require no infrastructure management on your part, while on-prem requires Docker or a similar container runtime with the Anthropic sandbox runtime installed.

Q: Can MCP tunnels work with on-prem databases behind a corporate proxy?

MCP tunnels initiate an outbound encrypted connection from inside your network. If your corporate proxy allows outbound connections (as most do), the tunnel works through it. The key property is that no inbound connections are required — the tunnel client connects out.

Q: How does mid-session tool swapping affect agent context?

The agent's context adjusts dynamically — new tools appear in the tool list, removed tools disappear. The conversation history and task state are preserved. This is handled by the agent runtime, not the model — the model sees an updated tool list in the next turn.

Q: What happens if the self-hosted sandbox crashes mid-task?

The agent's conversation state and task progress are maintained on Anthropic's side (the orchestration layer). If the sandbox crashes, the agent can restart execution in a new sandbox — either resuming from a snapshot or restarting the current step. State loss depends on whether sandbox snapshots were configured.

Q: Is the MCP tunnel approach compatible with zero-trust architecture?

Yes. MCP tunnels follow zero-trust principles: outbound-only connections, encrypted end-to-end, per-session authentication, and no persistent network exposure. Each tunnel is scoped to a specific session and tool, not a persistent network bridge.

Glossary

  • Self-hosted sandbox: A code execution environment running on the customer's infrastructure (or managed provider) rather than Anthropic's cloud — files and data stay behind the customer's firewall
  • MCP tunnel: An encrypted outbound connection from inside a private network to Claude Code, enabling tool access to internal services without inbound firewall rules
  • OS-level sandboxing: Filesystem and network isolation enforced by operating system primitives (Seatbelt on macOS, bubblewrap on Linux) rather than application-level controls
  • Mid-session tool swapping: The ability to add, remove, or modify agent tools and MCP server configurations during an active session without restarting
  • Context offloading: Automatically writing large tool outputs (100K+ tokens) to sandbox files instead of inserting them directly into the agent's context window

Author

Ramsis Hammadi, AI/ML engineer specializing in GenAI, LLM engineering, and automation.