Semantic Layer Best Practices: 7 Mistakes to Avoid
Alex Merced · 2026-05-23 · via DEV Community

[Image: Semantic layer best practices checklist, showing checks and mistakes]

Semantic layers don't fail because the technology is wrong. They fail because of design decisions made in the first two weeks — choices that seem reasonable at the time and create compounding problems for months afterward.

Here are the seven mistakes that kill semantic layer projects, and how to avoid each one.

Mistake 1: Defining Metrics in Multiple Places

What happens: Revenue is defined in a Tableau calculated field, a Power BI DAX measure, a dbt model, and a SQL view. Four sources of truth. None of them agree.

Why it's common: Teams adopt new tools without migrating metric definitions. Each tool gets its own model. Over time, the definitions drift.

The fix: Every metric gets exactly one canonical definition in the semantic layer. All downstream tools query that definition. No exceptions. When someone needs Revenue, they query business.revenue, not their own formula.

This principle extends to AI agents. If your AI generates its own metric formulas instead of referencing the semantic layer, you've just added another source of truth — the least trustworthy one.
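A minimal sketch of the single-definition idea, using Python's built-in sqlite3 module as a stand-in for a real semantic layer. The orders table, its columns, and the rule that revenue counts only completed orders are assumptions invented for illustration. The view is named business_revenue rather than business.revenue because SQLite would read the dotted form as database.table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 100.0, 'completed'),
        (2, 250.0, 'completed'),
        (3,  75.0, 'refunded');

    -- The one canonical definition (hypothetical rule: completed orders only).
    CREATE VIEW business_revenue AS
    SELECT SUM(amount) AS revenue
    FROM orders
    WHERE status = 'completed';
""")


def dashboard_revenue(conn):
    """Stand-in for a BI dashboard: reads the metric, never recomputes it."""
    return conn.execute("SELECT revenue FROM business_revenue").fetchone()[0]


def notebook_revenue(conn):
    """Stand-in for a notebook or AI agent: same view, so the same answer."""
    return conn.execute("SELECT revenue FROM business_revenue").fetchone()[0]


# A tool-local formula that drifted: it forgot the status filter.
drifted = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(dashboard_revenue(conn), notebook_revenue(conn), drifted)  # 350.0 350.0 425.0
```

Both consumers return 350.0 because neither owns a formula. The drifted query returns 425.0, which is the kind of disagreement that appears once each tool carries its own definition.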

Mistake 2: Skipping the Bronze Layer

What happens: A data engineer creates a Silver view that joins raw source tables directly, mixing data cleanup (type casting, column renaming) with business logic (filters, calculations) in a single query. When the source schema changes — a column is renamed, a type is modified — the Silver view breaks.

Why it's common: The Bronze layer feels redundant. It's just a 1:1 mapping of the source. Why add a layer that doesn't change anything?

The fix: The Bronze layer absorbs schema changes. When a source renames col_7 to order_date_utc, you update one Bronze view. The Silver and Gold views above it don't change. This insulation is worth the tiny overhead of maintaining passthrough views.

Bronze views also standardize data formats. Timestamps normalized to UTC. Strings cast to consistent encodings. Column names made human-readable. This cleanup happens once, at the bottom of the stack, and every view above benefits.
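The insulation described above can be sketched with sqlite3 views. This is an illustration under assumptions, not Dremio syntax: the src_orders table, the view names, and the sample rows are hypothetical, and the source rename is simulated by dropping and recreating the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def load_source(date_column):
    """Simulate the source system; date_column is whatever it calls the date today."""
    conn.executescript(f"""
        DROP TABLE IF EXISTS src_orders;
        CREATE TABLE src_orders (id INTEGER, {date_column} TEXT, amt REAL);
        INSERT INTO src_orders VALUES
            (1, '2026-01-05', 100.0), (2, '2026-01-05', 50.0), (3, '2026-01-06', 25.0);
    """)


def create_bronze(date_column):
    """Bronze: a 1:1 passthrough whose only job is to expose stable, readable names."""
    conn.executescript(f"""
        DROP VIEW IF EXISTS bronze_orders;
        CREATE VIEW bronze_orders AS
        SELECT id AS order_id, {date_column} AS order_date_utc, amt AS amount
        FROM src_orders;
    """)


load_source("col_7")
create_bronze("col_7")

# Silver: business logic written against Bronze names only. Created once.
conn.execute("""
    CREATE VIEW silver_daily_revenue AS
    SELECT order_date_utc, SUM(amount) AS revenue
    FROM bronze_orders
    GROUP BY order_date_utc
""")

SILVER_QUERY = "SELECT * FROM silver_daily_revenue ORDER BY order_date_utc"
before = conn.execute(SILVER_QUERY).fetchall()

# The source renames col_7 to order_date_utc. Only the Bronze view is redefined.
load_source("order_date_utc")
create_bronze("order_date_utc")
after = conn.execute(SILVER_QUERY).fetchall()

print(before == after)  # True
```

The Silver view is never touched after its creation, yet it returns the same rows before and after the source change.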

Mistake 3: Using SQL Reserved Words as Column Names

[Image: Bad vs. good naming conventions, contrasting cryptic abbreviations with clear business names]

What happens: A Bronze view exposes a column called Date. Now every downstream query must reference "Date" with double quotes. Analysts forget. AI agents don't quote it at all. Queries break intermittently. Debugging is frustrating because the error messages are cryptic.

Why it's common: Source systems often use generic names. Date, Timestamp, Order, Group, and Role are all reserved words or keywords in many SQL dialects, though the exact list varies by engine. Bronze views that don't rename them propagate the problem to every consumer.

The fix: Rename early. In the Bronze layer, map Date to TransactionDate, Timestamp to EventTimestamp, Order to CustomerOrder. Use domain-specific prefixes that are unambiguous and never conflict with SQL keywords.

This small decision can save many hours of debugging over the life of the semantic layer. It also improves AI agent accuracy, since language models generating SQL often omit the quoting that reserved words require.
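A small sketch of the failure and the fix, using Python's sqlite3. Two caveats: SQLite happens to accept Date unquoted, so this example uses Order and Group, which SQLite does treat as keywords. The src_events table and the CustomerGroup name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source table whose column names collide with SQL keywords.
    CREATE TABLE src_events ("Order" TEXT, "Group" TEXT);
    INSERT INTO src_events VALUES ('A-100', 'retail');

    -- Bronze view: the quoting is handled once, here, and never again.
    CREATE VIEW bronze_events AS
    SELECT "Order" AS CustomerOrder, "Group" AS CustomerGroup
    FROM src_events;
""")


def try_query(sql):
    """Return the rows on success, or the error text if the statement fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


unquoted = try_query("SELECT Order FROM src_events")       # what people tend to write
quoted = try_query('SELECT "Order" FROM src_events')       # correct but easy to forget
renamed = try_query("SELECT CustomerOrder FROM bronze_events")

print(unquoted)
print(quoted, renamed)
```

The unquoted query fails to parse, while the query against the Bronze view needs no quoting at all.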

Mistake 4: Building Without Stakeholder Input

What happens: A data engineering team builds 50 Silver views based on the database schema. They expose every table, every column, every possible metric. Business users look at the result, don't recognize any of the terms, and go back to their spreadsheets.

Why it's common: Data engineers understand the schema. They assume the schema structure maps to business needs. It usually doesn't.

The fix: Start with a metric glossary co-created with stakeholders from Sales, Finance, Marketing, and Product. Ask them: What are your top 5 metrics? How do you calculate them? What decisions do they drive? Build the Silver layer around those answers, not around the database schema.

This step feels slow. It's the fastest path to adoption. A semantic layer that uses business language and models business concepts gets adopted. A semantic layer that mirrors the database schema gets ignored.

Mistake 5: Treating Documentation as Optional

What happens: Views are created with no Wikis, no column descriptions, no Labels. The semantic layer works for the person who built it. Everyone else — analysts, AI agents, new team members — can't figure out what the views mean.

Why it's common: Documentation takes time. Deadlines are tight. Teams plan to "add documentation later." Later never comes.

The fix: Make documentation part of the view creation process, not a follow-up task. At minimum, every view gets:

  • A one-sentence description of what it represents
  • Labels for governance (PII, Finance, Certified)
  • Column descriptions for any non-obvious field
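One way to make this checklist part of view creation is a small check that runs before a view is published. The sketch below is an assumption about how such a check might look, not a feature of any platform: ViewDoc and missing_docs are hypothetical names, and the check is stricter than the list above because it flags every column without a description, not only the non-obvious ones.

```python
from dataclasses import dataclass, field


@dataclass
class ViewDoc:
    """Hypothetical record of the documentation attached to one view."""
    name: str
    description: str = ""
    labels: list[str] = field(default_factory=list)
    columns: dict[str, str] = field(default_factory=dict)  # column name -> description


def missing_docs(view: ViewDoc) -> list[str]:
    """Return a list of documentation gaps; an empty list means the view passes."""
    problems = []
    if not view.description.strip():
        problems.append("missing description")
    if not view.labels:
        problems.append("missing labels")
    undocumented = sorted(col for col, desc in view.columns.items() if not desc.strip())
    if undocumented:
        problems.append("undocumented columns: " + ", ".join(undocumented))
    return problems


documented = ViewDoc(
    name="business_revenue",
    description="Daily revenue from completed orders.",
    labels=["Finance", "Certified"],
    columns={"revenue": "Sum of completed order amounts."},
)
bare = ViewDoc(name="silver_orders", columns={"amt": "", "order_id": "Primary key"})

print(missing_docs(documented))  # []
print(missing_docs(bare))
```

Run in a review step or CI job, a check like this turns "add documentation later" into a failed build instead of a forgotten task.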

Modern platforms reduce this burden with AI-generated documentation. Dremio's generative AI samples table data and auto-generates Wiki descriptions and Label suggestions. The AI provides a first draft that covers roughly 70% of the work. The data team adds domain context for the remaining 30%.

Undocumented views are invisible to AI agents. If the Wiki is empty, the AI agent has no context to generate accurate SQL. Documentation isn't just nice to have. It's an accuracy requirement.

Mistake 6: Applying Security at the BI Tool Level Only

What happens: Row-level security is configured in Tableau so regional managers only see their region. Then an analyst opens a SQL client, queries the underlying table directly, and sees all regions. The security was enforced in the dashboard, not in the data.

Why it's common: BI tools make it easy to apply filters and security rules. Data platforms require more setup. Teams take the easy path.

The fix: Enforce access policies at the semantic layer, not the BI layer. Row-level security and column masking should be applied on the virtual datasets (views). Every query path — dashboard, notebook, API, AI agent — inherits the same rules.

Dremio implements this through Fine-Grained Access Control (FGAC): policies defined as UDFs at the view level. A regional manager queries business.revenue and automatically sees only their region, regardless of how they access the data. No security gaps between tools.
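To illustrate why placing the policy in the view closes the gap, here is a rough sqlite3 sketch. It is not Dremio's FGAC syntax. SQLite has no users or permissions, so a one-row session table stands in for the current user, and the user_regions mapping and all names are hypothetical. The sketch also does not model locking down the base table, which a real platform would handle with its own grants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('EMEA', 100.0), ('EMEA', 40.0), ('APAC', 70.0);

    CREATE TABLE user_regions (user_name TEXT, region TEXT);
    INSERT INTO user_regions VALUES ('emea_manager', 'EMEA'), ('apac_manager', 'APAC');

    -- One-row table standing in for the engine's notion of "current user".
    CREATE TABLE session (user_name TEXT);
    INSERT INTO session VALUES (NULL);

    -- The row filter lives in the view, so every consumer inherits it.
    CREATE VIEW business_revenue AS
    SELECT o.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN user_regions AS ur ON ur.region = o.region
    JOIN session AS s ON s.user_name = ur.user_name
    GROUP BY o.region;
""")


def revenue_as(user_name, sql="SELECT region, revenue FROM business_revenue"):
    """Run any query against the view on behalf of user_name."""
    conn.execute("UPDATE session SET user_name = ?", (user_name,))
    return conn.execute(sql).fetchall()


# Two different "tools" issue different SQL but hit the same governed view.
dashboard_rows = revenue_as("emea_manager")
notebook_rows = revenue_as("emea_manager", "SELECT * FROM business_revenue ORDER BY region")
apac_rows = revenue_as("apac_manager")
unknown_rows = revenue_as("intern")

print(dashboard_rows, notebook_rows, apac_rows, unknown_rows)
```

Neither consumer adds a region filter of its own. The filter comes from the view, so a new access path cannot forget it.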

Mistake 7: Trying to Model Everything at Once

[Image: Incremental growth, from a small core to a comprehensive semantic layer]

What happens: The team commits to building a complete semantic layer covering every source, every table, and every metric. The project takes six months. By the time it launches, requirements have changed, stakeholder interest has waned, and half the views are out of date.

Why it's common: Ambitious leaders want a "complete" solution. Data teams want to avoid rework. Neither wants to ship an incomplete layer.

The fix: Start with 3-5 core metrics that the organization actively debates (usually Revenue, Active Users, Churn). Build one Bronze → Silver → Gold pipeline per metric. Validate that the same question produces the same answer across two different tools.
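The validation step can be as simple as collecting the answer each tool reports for a metric and comparing them. A minimal sketch follows. The function names, tool names, and numbers are all made up for illustration, and a real check would pull these values from the tools themselves.

```python
import math


def consistent(answers: dict[str, float], rel_tol: float = 1e-9) -> bool:
    """True when every tool reports the same value, within a small tolerance."""
    values = list(answers.values())
    return all(math.isclose(values[0], v, rel_tol=rel_tol) for v in values[1:])


def disagreements(metric_answers: dict[str, dict[str, float]]) -> list[str]:
    """Names of the metrics whose tools do not agree, sorted alphabetically."""
    return sorted(
        metric for metric, answers in metric_answers.items() if not consistent(answers)
    )


# Hypothetical values reported by two tools for three core metrics.
reported = {
    "revenue": {"dashboard": 350.0, "notebook": 350.0},
    "active_users": {"dashboard": 1200.0, "notebook": 1200.0},
    "churn": {"dashboard": 0.042, "notebook": 0.051},
}

print(disagreements(reported))  # ['churn']
```

A metric that shows up in this list is not ready to build on. Fix its definition before adding the next one.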

Once those metrics are stable, expand incrementally. Add new sources, new views, new metrics — one at a time. Each addition is low-risk because the layered architecture isolates changes. A new Gold view doesn't affect existing Silver views.

The semantic layers that reach 80% organizational coverage fastest get there not by modeling everything up front, but by proving value quickly and building on that momentum.

What to Do Next

Pick one mistake from this list. Check whether your semantic layer (or your plan for one) is making it. Fix that one thing this week. Then come back for the next one.

Try Dremio Cloud free for 30 days