惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

I
Intezer
Jina AI
Jina AI
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
有赞技术团队
有赞技术团队
J
Java Code Geeks
人人都是产品经理
人人都是产品经理
博客园 - 叶小钗
M
MIT News - Artificial intelligence
月光博客
月光博客
C
Check Point Blog
Y
Y Combinator Blog
S
SegmentFault 最新的问题
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
C
Cybersecurity and Infrastructure Security Agency CISA
A
Arctic Wolf
S
Security Archives - TechRepublic
S
Securelist
美团技术团队
SecWiki News
SecWiki News
H
Help Net Security
V
Vulnerabilities – Threatpost
S
Secure Thoughts
F
Fortinet All Blogs
量子位
aimingoo的专栏
aimingoo的专栏
T
Tor Project blog
大猫的无限游戏
大猫的无限游戏
Scott Helme
Scott Helme
MyScale Blog
MyScale Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
D
Docker
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
L
Lohrmann on Cybersecurity
F
Fox-IT International blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
博客园 - 三生石上(FineUI控件)
Engineering at Meta
Engineering at Meta
Microsoft Security Blog
Microsoft Security Blog
Recorded Future
Recorded Future
V
Visual Studio Blog
WordPress大学
WordPress大学
S
Schneier on Security
Stack Overflow Blog
Stack Overflow Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
Apple Machine Learning Research
Apple Machine Learning Research
N
News | PayPal Newsroom
GbyAI
GbyAI
T
Threat Research - Cisco Blogs

Databricks

Reliable LLM Inference at Scale How the lakebase architecture stays resilient to cloud failures Introducing Always-On pricing: automatic savings for Databricks Lakebase Announcing Lakebase Change Data Feed (CDF) Building a FHIR-native health data platform on Databricks Lakebase AI readiness in telecommunications Pharma launch analytics: How to compress the first 90 days and win the three years that follow Scaling for MHHS: 50x cost-efficient margin data engineering at Octopus Energy Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks Observability for any agent, anywhere: Production-ready tracing with OpenTelemetry & Unity Catalog on Databricks How World Bank Group uses Databricks to eradicate poverty through shared knowledge How Databricks Genie democratizes data access in financial services Using observability data to prevent incidents How security teams can report cyber risk to boards Transforming industries with conversational AI: Partner solutions built on Databricks Genie From emissions reporting to decarbonization decisions You’ve built the media products, now make them personalized From "What Happened?" to "What Will Happen?" Unlock seamless and cost-effective marketing campaigns with Lakebase Governing AI agents at scale with Unity Catalog How telecom CFOs can make smarter network capex decisions with AI How Databricks Genie improves retail personalization Databricks for Good and Virtue Foundation: Partnering to Connect Medical Volunteers to Critical Health Services in 72 Countries Automate Data & KPI Monitoring with SQL Alerts How to Build Real-Time Fraud Detection using Spark Real-Time Mode and Lakebase How Databricks Genie improves supply chain visibility with real-time AI analytics A CFO’s guide to managing value-based care financial performance Stop Rogue AI: How Unity Catalog Secures Your Agent Actions Why AI Security Infrastructure is Now a CMO Priority Databricks context engineer associate: the industry’s first certification for reliable AI agent systems Introducing AI Spend Controls with Unity AI Gateway How to safeguard AI workloads with Unity AI Gateway Guardrails What’s new in Unity AI Gateway: service policies, guardrails, observability, and cost controls for AI agents and MCPs MemEx: A Programmable Scratchpad for LLM Agents How Deutsche Börse built a generative AI tool to tackle the large-scale migration of Zeppelin notebooks to Databricks Announcing the Databricks analytics engineer learning pathway The question your commercial data should already be able to answer PipelineIQ: Forward‑Looking Sales Intelligence That Drives Action Backstage with Lakebase, part 2 Expanded interoperability with Unity Catalog Open APIs From manual to autonomous: how AI agents are transforming electric grid operations Data quality is the AI strategy Clinical operations intelligence belongs on the Lakehouse The Rosetta stone of CPS: Claroty’s AI-powered library ABAC row filtering and column masking policies, governed tags, and data classification are now generally available in Unity Catalog The Rise of Sports Intelligence: How the Lakehouse Turns Tracking Data into Competitive Advantage How CFOs in consulting can recover margin with Databricks Announcing Native Lakehouse Sync Announcing Databricks Student Fellows Faster Queries and New Capabilities with the Open-Source Databricks JDBC Driver The Convergence of Open Table Formats and Open Catalogs: Catalog Commits is Generally Available Unlocking the Archives: Turning Unstructured Documents into a Searchable Database for Groundwater Discovery Using MemAlign to Improve Evaluation of Traditional Machine Learning in Genie Code How Superhuman and Databricks built a 200K QPS inference platform together Addressing HR's widening capacity gap with AI MCP Marketplace Brings Real-Time Intelligence to Agentic Applications Pushing the Frontier for Data Agents with Genie Energy trading analytics in a real-time market Operating room utilization is hiding in your scheduling data Predictive Quality Starts Where Defect Detection Stops Retail markdown optimization: from reactive markdowns to proactive Why telecom churn prediction misses the intervention window Growth Analytics Is What Comes After Growth Hacking Real-world evidence for medical affairs: who can actually use it? Wealth advisor productivity starts with the client conversation How lakebase architecture delivers 5x faster Postgres writes Why Talent Transformation Is the Missing Focus of Enterprise AI Public Health Intelligence Shouldn't Require a Data Scientist Mean Time to Detect Is a Data Access Problem First-party audience data is the ad sales relationship now Rethinking Distributed Systems for Serverless Performance and Reliability The AI Scaling Gap Hiding in Digital Native Companies 10 trillion samples a day: Scaling beyond traditional monitoring infra at Databricks AI success starts with clean data, not just better models How nOps Rebuilt Their Cloud Optimization Platform on Databricks Lakebase, and Why Other ISVs Should Too Peril Predicts: Precision Payouts for a Volatile World The foundation of AI scalability: one team, one platform, one operating model The Federal Data Paradox: Rich in Data, Poor in Access Driving Budapest Forward: How BKK Uses Databricks to Transform City Mobility LLM Vs AI: A Practical Guide to Differences, Use Cases, and Tools Model Risk Governance Is Not the Same as Risk Intelligence Generative AI for Business: A Complete Strategy and Implementation Guide Data Science vs Data Engineering: Choosing Analysis or Infrastructure AI Applications: Tools, Use Cases, and Platforms MLOps vs DevOps: A Practical Guide for Data Scientists and IT Teams Top Data Warehouse Tools For Modern Data Analytics Unlocking SAP Business Context in Databricks with Semantic Metadata Delta Sharing The marketing activation gap has a fix: Databricks and Stitch partner to turn data infrastructure into marketing performance Alert Fatigue Is a Business Risk Backstage with Lakebase Shipping Faster isn’t Learning Faster Why Your OEE Dashboard Is Lying to You The Turbine That Tried to Tell You It Was Failing Predicting Readmissions Isn't Enough. Acting in Time Is. Clinical Trials Run Longer Than They Have To. That's a Patient Problem Network Quality Is a Revenue Problem, Not a Technical One Shelf Availability Starts with Better Demand Visibility When Predicting the Next Hit Requires More Than Intuition Approximate Answers, Exact Decisions: New Sketch Functions for Analytics Companies Winning with AI Built the Data Layer First
BI Serving Pointers; Maximizing for Performance and TCO
2026-05-28 · via Databricks

Your BI dashboards are slow, and tuning them is costing too much time and money.

It's a familiar pattern. A dashboard query takes 30 seconds, so someone builds an aggregate table to speed it up. That table needs a refresh pipeline. The pipeline needs monitoring. Then a second BI tool needs the same data in a slightly different shape, so someone builds another aggregate table using a separate pipeline. Before long, you're managing a sprawl of aggregates, extracts, and tool-specific semantic layers — each with its own staleness window, its own governance gaps, and its own line item on the compute bill.

BI workloads are different from other analytical workloads. They're highly concurrent, latency-sensitive, and repetitive in their query patterns. That combination demands a deliberate approach to modeling, storing, optimizing, and serving data. The good news: Databricks provides a full stack for BI serving — from physical data layout to a governed semantic layer — and each layer compounds the performance gains of the layer below it.

This post walks through that stack bottom-up, with practical guidance on where to focus for the biggest improvements in query performance and cost.

The BI Serving Stack

Before diving into each layer, here's the full picture:

Unity Catalog Governance

Unity Catalog provides governance throughout — lineage and access control from raw data through semantics to consumption. Each layer addresses a different aspect of performance and cost. Let's walk through them.

Optimize the Physical Layer

The physical layer is where most BI performance is won or lost. Get this right and every query benefits — before you've touched the semantic layer.

Start with Dimensional Modeling

Star schemas remain the gold standard for BI query performance. Wide, denormalized dimension tables joined to fact tables via surrogate keys give the query optimizer clean, predictable join paths.

Databricks fully supports the relational modeling constructs you need: primary and foreign key constraints (with RELY for optimizer hints), identity columns for surrogate keys, and CHECK and NOT NULL constraints. If you're following a medallion architecture, keep your normalized or Data Vault models in Silver and build denormalized star schemas in Gold for BI consumption.

For detailed implementation patterns — SCD Type-1 and Type-2 handling, fact table ETL with MERGE, late-arriving dimensions — see the Implementing a Dimensional Data Warehouse in Databricks SQL blog series.

Use Managed Tables

Unity Catalog managed tables are the foundation for everything else in this stack. Unity Catalog manages all read, write, storage, and optimization responsibilities for managed tables. This unlocks automatic features you don't get with external tables: Predictive Optimization (covered below) is enabled by default. Automatic liquid clustering selects clustering keys that adapt as query patterns change. Metadata caching is always on, reducing cloud storage requests and speeding up query planning.

Use managed tables throughout the platform — not just for BI-serving, but across Bronze, Silver, and Gold layers. They're the default table type in Unity Catalog, and the performance and governance benefits compound with every other optimization in this stack.

Apply Liquid Clustering

Liquid clustering replaces static partitioning and manual Z-ORDER — and unlike those approaches, you can redefine clustering keys without rewriting existing data. Add CLUSTER BY (col1, col2) at table creation or use ALTER TABLE on existing tables. If you're not sure which columns to choose, CLUSTER BY AUTO lets Predictive Optimization select keys based on observed query patterns.

For BI workloads, cluster on your most common filter and join columns — date keys, region, product category. You can select up to four columns, and if two columns are highly correlated, include only one. When dashboards filter on cluster columns, liquid clustering improves query performance through data skipping.

Let Predictive Optimization Handle the Rest

Predictive Optimization automatically runs OPTIMIZE, VACUUM, and statistics collection on tables that would benefit from these operations — so you don't need to schedule these jobs yourself. It collects both Delta data-skipping statistics and query optimizer statistics during Photon writes, and back-fills stats for existing tables. In observed workloads, this delivered an average 22% performance improvement. For BI workloads with repetitive filter patterns, the impact is especially significant — better statistics mean better data skipping and more efficient query plans.

Enable Predictive Optimization at the catalog level and let it run. Using Predictive Optimization is one of the highest-return, lowest-effort optimizations you can make.

The result: BI queries scan less data, join more efficiently, and cost less to run — and you haven't touched the semantic layer yet.

Metric Views: Define Your Metrics Once

Here's where things get interesting. Most organizations have the same business metrics defined in multiple places — a revenue calculation in one BI tool, a slightly different one in another, a third variant in a SQL notebook someone wrote last quarter. Each definition drifts independently. Nobody's sure which one is right.

Metric Views in Unity Catalog solve this by providing a headless BI layer — a single, governed semantic layer where you define your data model and KPIs once, independent of any specific BI tool. You define them centrally in SQL or the point-and-click UI in Unity Catalog Explorer. AI/BI Dashboards, Genie, SQL notebooks, and third-party BI tools all resolve metrics from the same definitions. Define a metric once, and every consumer — human or AI — gets the same answer.

Metric Views go beyond centralized metric definitions — the semantic metadata is what sets them apart. Fields like display_name, comment, and synonyms give AI systems the context they need to interpret business questions correctly. When a user asks Genie "what was our revenue last week?", those annotations are how Genie maps natural language to the right measure and dimensions. No custom prompts, no separate glossary. The same applies to AI agents built on Databricks — any agent with access to Unity Catalog can discover and query governed metrics through the semantic layer instead of hard-coded SQL. The richer your metadata, the more accurately AI serves the right answer.

Here's an example using a system table, since every Databricks customer has access — but the same pattern applies to business KPIs like revenue, order volume, or customer retention. This Metric View calculates DBSQL warehouse metrics:

Consumers query the Metric View using MEASURE() to reference the governed metric definitions:

The metrics are defined once in the Metric View. Every dashboard, Genie space, or notebook that queries metv_dbsql_metrics gets the same result. Below is a dashboard using the metric view as a source.

Warehouse Metrics

Here's Genie using the same metric view.

Daily Query Count

For teams with metric definitions scattered across multiple BI tools, Metric Views provide a path to consolidate the semantic layer into Databricks. Instead of maintaining separate metric logic in each tool, you define it once in Unity Catalog and connect your BI tools to that governed source.

The core implementation is open-sourced in Apache Spark™ (SPARK-54119), with Unity Catalog OSS support coming — so you're building on an open standard with no vendor lock-in. That openness matters more as AI takes on more of the BI workload. Agents querying your data need a consistent, machine-readable definition of what each metric means, and an open standard lets any tool or agent — not just vendor-specific ones — reason over the same governed metrics.

Metric View Materialization: OLAP Performance Without the Overhead

Traditionally, when BI dashboards were too slow, the answer was to build aggregate tables. You'd create materialized views or custom pre-aggregation tables on top of your star schema, set up refresh pipelines, and re-point your BI tools at the new tables. It worked, but it added a whole layer of objects and pipelines to maintain — and every time the aggregation logic changed, you had to update the BI tool queries to match.

Metric View materialization offers a simpler alternative. When you enable materialization on a Metric View, the platform automatically maintains pre-aggregated results behind the same metric definitions your BI tools already query — no separate aggregate tables to build, no BI tool queries to refactor. Here's what happens under the hood:

  • Automatic pre-aggregation: Metric results are pre-computed and stored
  • Incremental refresh: Metrics stay current without full recomputation
  • Intelligent query rewriting: The engine routes queries to the best available materialization
  • Transparent routing: Users query metrics the same way — the system serves the fastest path

Dashboard queries that previously scanned full fact tables now hit pre-aggregated materializations — with lower latency and lower compute cost. The dashboard and Genie examples above both queried the same Metric View, and both had their queries transparently routed to a materialization. The query plan below from Genie shows this in action.

Query Plan

Practical TCO Pointers

Faster queries and lower cost aren't competing goals — every optimization that reduces data scanned also reduces the compute you pay for. And each optimization in the stack compounds. Liquid clustering and better statistics improve data skipping and query plans. Materializations can be refreshed incrementally, reducing the compute SQL warehouses need to serve dashboards. Here are a few more ways to lower cost:

  • Right-size your SQL warehouse. Use serverless SQL warehouses with auto-scaling for BI concurrency bursts. You pay for what you use, not peak capacity.
  • Lean on DBSQL's caching tiers. Disk cache keeps hot data local to the warehouse, and query result cache (QRC) serves repeated queries without re-execution. For dashboards with consistent query patterns, caching turns many requests into millisecond-latency responses at near-zero compute cost.
  • Eliminate redundant data movement. Serve BI directly from the lakehouse via DirectQuery or live connections, rather than using extracts or imports.
  • Monitor with system tables. System tables such as system.billing.usage and system.query.history can be used to track BI usage by dashboard, user, and warehouse. Build Metric Views and an AI/BI Dashboard on system tables to gain visibility into your BI usage.

Get Started

You don't need to implement the entire stack at once. Start where you'll see the most impact:

  1. Build (or validate) your Gold-layer star schema with managed tables, primary/foreign keys, and liquid clustering
  2. Enable Predictive Optimization on your catalog to auto-manage OPTIMIZE, VACUUM, and statistics collection
  3. Define Metric Views for your core business KPIs — start with SQL or the UC Explorer UI
  4. Enable Metric View materialization for your highest-traffic metrics
  5. Monitor the results — point dashboards at Metric Views and track query performance via system tables

Databricks provides optimizations at every layer of the BI serving stack. Managed tables, liquid clustering, and Predictive Optimization minimize data scanned and compute spent. Metric Views centralize your business logic in a governed semantic layer that serves dashboards, Genie, and AI agents consistently. Materialization delivers sub-second query performance without manual pre-aggregation pipelines. Together, these layers compound — driving down both query latency and total cost of ownership.

Start by defining your first Metric View on an existing Gold-layer table and enabling materialization. See the resources below to get started.