How to Give Claude Access to Snowflake Without Exposing PII
DataWorkers · 2026-05-31 · via DEV Community

You want Claude — or Cursor, or ChatGPT, or any MCP-aware agent — to answer questions about your Snowflake data. You also do not want the agent to read social security numbers, free-text customer notes, or anything subject to GDPR / HIPAA / SOC 2. The default MCP setup hands the agent everything its connection role can see. That is the problem.

This post walks through five layers of defense, ordered from cheapest to most thorough. Each is independent, so pick the ones that match your risk tolerance. The SQL-level layers take roughly an hour to set up on an existing Snowflake account; the catalog guardrail takes longer to wire in.

The Default Posture (and Why It Is Wrong)

A typical MCP server for Snowflake — including the official one — connects with a service account, exposes a query tool, and lets the model run any SQL the role can run. That role is usually scoped to a warehouse and a database, but rarely to columns or row sets. The model gets a fluent SQL interface to your warehouse and the warehouse trusts every query it sees.

The blast radius is large. According to IBM's 2024 Cost of a Data Breach Report, the global average cost of a data breach reached $4.88M, with breaches involving extensive cloud data exposure costing 23% more than average. Letting an AI agent run uncurated queries against a production warehouse is exactly the cloud-data-exposure category that drives the premium.

Layer 1: A Dedicated MCP Role

First step, every time: create a role that exists only for the agent. Do not reuse the analytics role, do not reuse the dbt role, and definitely do not use SYSADMIN.

  • Grant USAGE on the warehouse you want the agent to use. Use a small, dedicated warehouse (X-Small or Small) so a runaway query has a bounded cost ceiling.
  • Grant USAGE on the database and the specific schemas the agent should see.
  • Grant SELECT on the specific views the agent should query — not raw tables. Views give you a place to apply masking, filters, and joins without modifying the underlying data.
  • Never grant CREATE, INSERT, UPDATE, DELETE, or TRUNCATE. The agent's role must stay read-only.

A read-only role with view-only SELECT grants is roughly 80% of what most teams need. The remaining 20% is where the PII risk actually lives.
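As a minimal sketch of the grants described above, the helper below renders the Snowflake statements for a read-only agent role. It only builds strings and executes nothing. The role, warehouse, database, schema, and view names in the usage example are hypothetical placeholders, not names from any real account.

```python
import re

# Plain unquoted identifiers only, so a stray name cannot smuggle in extra SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _ident(name):
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def agent_role_grants(role, warehouse, database, schema, views):
    """Render the statements for a read-only agent role (nothing is executed)."""
    role, warehouse = _ident(role), _ident(warehouse)
    database, schema = _ident(database), _ident(schema)
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role}",
    ]
    # SELECT is granted view by view, never on raw tables.
    for view in views:
        stmts.append(
            f"GRANT SELECT ON VIEW {database}.{schema}.{_ident(view)} TO ROLE {role}"
        )
    return stmts


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in agent_role_grants(
        "MCP_AGENT_RO", "MCP_AGENT_WH", "ANALYTICS", "GOVERNED", ["ORDERS_V"]
    ):
        print(stmt + ";")
```

Keeping the grant list in one function makes it easy to review: if a write privilege ever shows up in the output, it was added on purpose.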

Layer 2: Column-Level Masking Policies

Snowflake supports masking policies that fire based on the executing role. The same SELECT statement returns the raw value for an analyst role and a masked value for the agent role. This is the single most important PII control because it does not depend on the agent or the MCP server behaving correctly.

A masking policy that returns SHA2(email) for any role except ANALYTICS_HUMAN means even if the model is jailbroken into producing a SELECT * query, it gets hashes, not addresses. The policy is enforced at the SQL engine layer, not at the application layer.
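The policy described above could look roughly like the statements this sketch renders. ANALYTICS_HUMAN and SHA2 come from the text; the policy name EMAIL_MASK and the table name in the usage example are hypothetical, and the column is assumed to be a string type. Names are interpolated as-is, so validate them before using anything like this for real.

```python
def email_masking_policy(policy, unmasked_role):
    """Render a masking policy: raw value for one role, SHA2 hash for everyone else."""
    return (
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() = '{unmasked_role}' THEN val ELSE SHA2(val) END"
    )


def attach_masking_policy(table, column, policy):
    """Render the statement that binds the policy to a table column."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy}"


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    print(email_masking_policy("EMAIL_MASK", "ANALYTICS_HUMAN") + ";")
    print(attach_masking_policy("ANALYTICS.RAW.CUSTOMERS", "EMAIL", "EMAIL_MASK") + ";")
```

The CASE expression defaults to the masked branch, so a role nobody thought about gets hashes rather than addresses.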

Apply masking policies to every column tagged as PII. If you do not have PII tags yet, an audit tool (or the Data Workers governance agent) can scan the schema and tag candidate columns automatically — emails, phone numbers, SSNs, free-text columns, IP addresses, dates of birth.

Layer 3: Row Access Policies

Masking hides values. Row access policies hide entire rows. For multi-tenant data — or any case where the agent should see only one customer's, one region's, or one fiscal-year's data — row access policies are the right primitive.

Common patterns: scope the agent role to the last 90 days of data, exclude rows tagged sensitive = true, restrict to a specific tenant_id. Like masking policies, these are enforced inside the engine — no application-layer code can bypass them.
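Here is a hedged sketch that combines the three patterns above in one row access policy. The column names (tenant_id, created_at, sensitive), their types, and every object name in the usage example are assumptions for illustration; adapt them to your schema. Because a row access policy applies to every role, the policy has to state explicitly that roles other than the agent see all rows.

```python
def row_access_policy(policy, agent_role, tenant_id, days=90):
    """Render a policy limiting one role to a tenant, a time window, and non-sensitive rows."""
    return (
        f"CREATE ROW ACCESS POLICY IF NOT EXISTS {policy}\n"
        "AS (tenant_id VARCHAR, created_at TIMESTAMP_NTZ, sensitive BOOLEAN) RETURNS BOOLEAN ->\n"
        # Roles other than the agent are left unrestricted by this policy.
        f"  CURRENT_ROLE() <> '{agent_role}'\n"
        "  OR (\n"
        f"    tenant_id = '{tenant_id}'\n"
        f"    AND created_at >= DATEADD(day, -{int(days)}, CURRENT_TIMESTAMP()::TIMESTAMP_NTZ)\n"
        "    AND sensitive = FALSE\n"
        "  )"
    )


def attach_row_access_policy(table, policy):
    """Render the statement that binds the policy to the table's columns."""
    return (
        f"ALTER TABLE {table} ADD ROW ACCESS POLICY {policy} "
        "ON (tenant_id, created_at, sensitive)"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(row_access_policy("AGENT_ROWS", "MCP_AGENT_RO", "acme") + ";")
    print(attach_row_access_policy("ANALYTICS.RAW.ORDERS", "AGENT_ROWS") + ";")
```

Rows where sensitive is NULL fail the `sensitive = FALSE` test and stay hidden from the agent, which is the conservative default.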

Layer 4: Audit Logging

Every query the agent runs should be auditable for at least 30 days. Snowflake's QUERY_HISTORY view is the source of truth: it records the SQL text, the executing role, the start and end times, and the number of rows produced. Pipe it into your SIEM or log store (Datadog, Splunk, S3+Athena) so you can answer 'what did the agent see last week' without writing custom code.

  • Tag every agent-driven query with a comment header (e.g., /* mcp_agent=data_workers, session=abc123 */) so you can filter QUERY_HISTORY trivially.
  • Set up an alert for any agent query that returns more than 10,000 rows. That is almost never the intended behavior.
  • Set up a hard query timeout on the agent's warehouse (try 60 seconds to start). Runaway agents are cheap when they cannot run for 30 minutes.
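The three bullets above can be sketched as small helpers: one prepends the comment header, one renders the warehouse timeout, and one renders a QUERY_HISTORY query for oversized results. The sketch assumes the ACCOUNT_USAGE copy of QUERY_HISTORY and its role_name and rows_produced columns; the role and warehouse names in the usage example are hypothetical, and the alerting itself (who gets paged, how often the query runs) is left to your SIEM.

```python
def tag_sql(sql, agent, session_id):
    """Prepend the comment header so agent queries are easy to find in QUERY_HISTORY."""
    return f"/* mcp_agent={agent}, session={session_id} */\n{sql}"


def warehouse_timeout(warehouse, seconds=60):
    """Render a hard statement timeout for the agent's warehouse."""
    return f"ALTER WAREHOUSE {warehouse} SET STATEMENT_TIMEOUT_IN_SECONDS = {int(seconds)}"


def large_result_query(role, max_rows=10_000, days=7):
    """Render a query listing recent agent queries that produced too many rows."""
    return (
        "SELECT query_id, start_time, end_time, rows_produced, query_text\n"
        "FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY\n"
        f"WHERE role_name = '{role}'\n"
        f"  AND start_time >= DATEADD(day, -{int(days)}, CURRENT_TIMESTAMP())\n"
        f"  AND rows_produced > {int(max_rows)}\n"
        "ORDER BY start_time DESC"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(tag_sql("SELECT COUNT(*) FROM ORDERS_V", "data_workers", "abc123"))
    print(warehouse_timeout("MCP_AGENT_WH") + ";")
    print(large_result_query("MCP_AGENT_RO") + ";")
```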

Layer 5: Schema-Aware Catalog as a Guardrail

The most subtle PII leak is the one that comes from the agent picking the wrong table. The agent does not know that customers_legacy was deprecated in 2024 but never deleted. It does not know that orders_raw has unredacted payment data but orders has the cleaned version. Without a catalog, the agent picks whichever table sounds right.

A data catalog that the agent reads before writing SQL solves this. The agent asks the catalog: 'Where is order data?' and the catalog responds with the governed view, the ownership, the freshness, and the PII tags. The agent never sees the legacy table because the catalog never surfaces it.
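To make the idea concrete, here is a toy in-memory catalog, not the Data Workers implementation. The table names come from the examples above; the CatalogEntry fields, owners, and the card_number column are hypothetical. The only point is that discovery filters out ungoverned and deprecated tables before the agent ever sees a name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    topic: str
    owner: str
    governed: bool
    deprecated: bool = False
    pii_columns: tuple = ()


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    CatalogEntry("orders", "order data", "data-platform", governed=True),
    CatalogEntry("orders_raw", "order data", "data-platform", governed=False,
                 pii_columns=("card_number",)),
    CatalogEntry("customers_legacy", "customer data", "unknown", governed=False,
                 deprecated=True),
]


def discover(topic, catalog=CATALOG):
    """Return only governed, non-deprecated entries for a topic."""
    return [e for e in catalog if e.topic == topic and e.governed and not e.deprecated]


if __name__ == "__main__":
    for entry in discover("order data"):
        print(entry.name, entry.owner, entry.pii_columns)
```

An agent that asks for 'order data' gets `orders` back and never learns that `orders_raw` exists. The database-side grants still matter, since the catalog only shapes what the agent asks for.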

This is exactly what Data Workers' Catalog Agent does. It exposes catalog discovery as MCP tools, so when Claude queries it for 'order data', it gets the governed answer — same response shape, same masking policies applied. The catalog itself enforces what the agent can see.

What Each Layer Buys You

Layer | Defends Against | Setup Time | Production Impact
Dedicated MCP role | Privilege escalation | 10 min | None
Column masking | PII column exfiltration | 20 min per table | <1ms per query
Row access policies | Tenant / scope leakage | 30 min per table | <5ms per query
Audit logging | Detection after the fact | 1 hr (with SIEM) | Storage cost
Catalog guardrail | Wrong-table selection | 1 day to wire MCP | Adds 1 round-trip

Frequently Asked Questions

Do these controls work with ChatGPT and Cursor too, or just Claude? Yes. All of these are Snowflake-side controls. They apply regardless of which MCP client is connecting — Claude, Cursor, OpenClaw, ChatGPT (via remote MCP), or a custom agent.

What about BigQuery and Databricks? Same five layers. BigQuery has authorized views and column-level access controls; Databricks has Unity Catalog row filters and column masks. The naming differs, the pattern does not.

Will masking break joins or aggregations? A masking policy returns the same data type as the column it masks, so JOIN and GROUP BY still work; they just operate on the masked value. For deterministic hash-based masks, this means 'group by hashed email' still gives you per-customer counts.
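A small local simulation shows why grouping survives a deterministic hash. It uses Python's hashlib SHA-256 as a stand-in for the masked column and made-up order rows; no Snowflake connection is involved.

```python
import hashlib
from collections import Counter


def mask_email(email):
    """Deterministic mask: the same input always yields the same hex digest."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def orders_per_customer(rows, masked):
    """Count orders per customer key, using either raw or masked emails."""
    keys = (mask_email(email) if masked else email for email, _ in rows)
    return Counter(keys)


# Made-up (email, order amount) rows, for illustration only.
ORDERS = [("a@example.com", 30), ("b@example.com", 12), ("a@example.com", 7)]

if __name__ == "__main__":
    print(sorted(orders_per_customer(ORDERS, masked=False).values()))
    print(sorted(orders_per_customer(ORDERS, masked=True).values()))
```

Both calls produce the same set of per-customer counts; only the keys differ. A mask that is not deterministic (a random token per row, for example) would not have this property.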

How do I know if my current MCP server is leaking PII? Run a query against QUERY_HISTORY filtered to your agent role, look at the SQL text, and check whether any of those queries select columns that should have been masked. If you cannot tell whether a column should be masked, you do not have PII tagging yet — start there.
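Once you have exported the agent's query texts, the check above can be roughed out locally. This is a text-matching heuristic, not a SQL parser: it flags queries that name a tagged column or use SELECT *, and it cannot tell whether a masking policy was actually in force. The column names in the usage example are hypothetical.

```python
import re


def queries_touching_pii(query_texts, pii_columns):
    """Return (query, matched PII columns) for queries that deserve a closer look."""
    hits = []
    for sql in query_texts:
        lowered = sql.lower()
        matched = sorted(
            col for col in pii_columns
            if re.search(rf"\b{re.escape(col.lower())}\b", lowered)
        )
        # SELECT * may include PII columns even though none is named.
        if matched or re.search(r"select\s+\*", lowered):
            hits.append((sql, matched))
    return hits


if __name__ == "__main__":
    # Hypothetical query texts and PII column names, for illustration only.
    texts = [
        "SELECT order_id FROM orders",
        "SELECT email, ssn FROM customers",
        "select * from customers",
    ]
    for sql, cols in queries_touching_pii(texts, {"email", "ssn"}):
        print(cols, "<-", sql)
```

Treat the output as a review queue: each hit is a query to compare against the masking policies on the columns it touched.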


The Data Workers governance agent ships a pii_audit_snowflake tool that does the scan, the tag, and the policy generation in one call. It is open source. The point of this post, though, is that you do not need it: the four SQL-level controls and an hour of work close the largest part of the gap. The catalog guardrail is the icing.

If you are running into specific issues setting this up, we keep notes on what works in our open-source repo at github.com/DataWorkersProject/dataworkers-claw-community. Issues and PRs welcome.


Originally published at https://dataworkers.io/blog/claude-snowflake-without-pii-exposure/. Data Workers is an open-source autonomous agent swarm for data engineering — see the repo.