Fabric AI Functions Turn GenAI Into a Data Pipeline Step
Shai Karmani · 2026-05-26 · via DEV Community

Originally published at https://shai-kr.github.io/data-ninja-ai-lab/blog/2026-05-24-fabric-ai-functions-data-workflows.html

[Figure: AI Functions in Fabric data workflow]

Most enterprise GenAI demos start in the wrong place.

They start with a chat window.

The more useful place is usually earlier: inside the data workflow, before the dashboard, before the semantic model, before the analyst has to clean the same messy text for the tenth time.

That is why Fabric AI Functions are worth paying attention to.

They let data teams use GenAI directly inside pandas and Spark workflows in Microsoft Fabric. Not as a separate app. Not as a one-off script sitting outside the platform. As a transformation step inside the work data teams already do.

That changes the shape of the use cases.

Instead of asking “how do we add a chatbot?”, the better question becomes:

Where is language, document mess, or unstructured content slowing down our data pipeline?

What you can actually do with it

Fabric AI Functions expose common GenAI operations as DataFrame-friendly functions.

You can use them to:

  • classify support tickets, survey responses, incidents, or customer feedback
  • summarize notes, long text fields, operational logs, and service records
  • extract fields from documents or semi-structured text
  • translate records as part of a data preparation flow
  • fix grammar or normalize messy text before reporting
  • create embeddings for search, RAG, and semantic retrieval
  • compare similarity between text values
  • generate structured responses from instructions
  • enrich rows in pandas or Spark without moving the workflow outside Fabric

That sounds simple, but it is a useful shift.

For years, a lot of GenAI work around data platforms has looked like this:

  1. Export data from the platform.
  2. Send it to a separate script or service.
  3. Call an AI model.
  4. Stitch the result back into the data estate.
  5. Hope the process is governed enough to survive production.

Fabric AI Functions make a cleaner pattern possible.

The AI step can live closer to the lakehouse, notebook, Spark job, data science workflow, Power BI preparation layer, and downstream semantic model.

That is a much better starting point for teams that want AI to improve real data work, not just demo well.

The big changes that make this interesting

There are a few parts that matter more than the feature list.

1. GenAI becomes part of the pipeline

The most important change is architectural.

AI enrichment can become a normal transformation step.

A notebook can read raw records, apply an AI function, store the output as another column or table, and send that enriched dataset into the next layer of the platform.

That means AI output can be reviewed, versioned, refreshed, tested, governed, and consumed like other data assets.
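A minimal sketch of that shape follows. It uses plain Python rather than pandas or Spark so it runs anywhere. `enrich_column` and `fake_classify` are hypothetical names, and `fake_classify` is a keyword stand-in for a real AI Function call, which would go to the configured model. The point is the contract: the original column stays, the AI output lands in a new column, and each row records which model and which run produced it, so the result can be refreshed, versioned, and compared later.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def fake_classify(text: str) -> str:
    """Hypothetical stand-in for an AI classification call: keyword rules, no model."""
    lowered = text.lower()
    if "refund" in lowered or "invoice" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "other"


def enrich_column(
    rows: List[Row],
    source_col: str,
    output_col: str,
    ai_fn: Callable[[str], str],
    model_name: str,
    run_id: str,
) -> List[Row]:
    """Return new rows with the AI output added; input rows are left unchanged."""
    enriched: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy, so every original column survives as-is
        new_row[output_col] = ai_fn(str(row[source_col]))
        new_row["ai_model"] = model_name  # which model produced the value
        new_row["ai_run_id"] = run_id  # which refresh run produced it
        enriched.append(new_row)
    return enriched


raw = [
    {"ticket_id": 1, "text": "App crash when exporting a report"},
    {"ticket_id": 2, "text": "Please resend the invoice for March"},
]
enriched_rows = enrich_column(raw, "text", "category", fake_classify, "gpt-4.1-mini", "run-001")
for row in enriched_rows:
    print(row["ticket_id"], row["category"], row["ai_model"], row["ai_run_id"])
```

In a real notebook the stand-in would be replaced by an AI Function applied to a DataFrame column, and the result would be written to a lakehouse table rather than printed.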

That is very different from treating GenAI as a sidecar experiment.

[Figure: Before and after workflow for Fabric AI Functions]

2. Multimodal input makes the use cases much better

Text classification is useful, but many business workflows are not clean text.

They are PDFs.

Screenshots.

Images.

CSV files.

JSON files.

Markdown notes.

Operational documents that never quite made it into a table.

Microsoft's documentation lists AI Functions support for image files such as JPG, PNG, GIF, and WebP, documents such as PDF, and common text formats such as MD, TXT, CSV, JSON, and XML.

That opens the door to better Fabric workflows.

A team can bring files into the lakehouse, use AI to extract or summarize what matters, and store the result in structured tables for review and reporting.
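The first step of that file workflow is deciding which files are eligible at all. A small sketch, assuming the extension list quoted above from Microsoft's documentation is complete; check the current docs for size limits and any formats added since. `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names, and the paths are made-up lakehouse-style examples.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Assumption: exactly the extensions named in the text, compared in lower case.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".png", ".gif", ".webp",          # images
    ".pdf",                                   # documents
    ".md", ".txt", ".csv", ".json", ".xml",   # text formats
}


def split_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file paths into (supported, unsupported) by extension, case-insensitively."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        (supported if suffix in SUPPORTED_EXTENSIONS else unsupported).append(path)
    return supported, unsupported


files = [
    "Files/inbox/inspection_0412.PDF",
    "Files/inbox/site_photo.png",
    "Files/inbox/notes.md",
    "Files/inbox/legacy_report.docx",
]
ok, skipped = split_by_support(files)
print("send to AI step:", ok)
print("needs another route:", skipped)
```

Keeping the unsupported list, instead of silently dropping it, gives the team a visible backlog of files that need a different processing path.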

That is the kind of AI use case that can save real operational time.

3. Embeddings can be created where the content already lives

ai.embed is one of the more important functions because it connects Fabric directly to search and RAG preparation.

A team can take product documentation, policy files, support resolutions, internal wiki pages, field notes, or knowledge base articles and create embeddings as part of the data workflow.

That creates a cleaner path from raw business content to retrieval-ready datasets.

The useful part is not just the embedding itself. It is that the data team can decide what content is approved, what should be excluded, how often embeddings refresh, and which downstream applications are allowed to use them.
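Those decisions can be made explicit before any embedding call runs. A sketch, assuming each content record carries an approval flag, a category, its text, and the hash and timestamp of its last embedding. The field names, the excluded category, and the seven-day refresh window are illustrative choices, not Fabric settings; only the records this selects would be passed to `ai.embed`.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class ContentRecord:  # hypothetical schema for a content table
    doc_id: str
    text: str
    category: str
    approved: bool
    last_hash: Optional[str] = None          # hash of the text when last embedded
    last_embedded: Optional[datetime] = None


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(
    rec: ContentRecord, now: datetime, excluded_categories: Set[str], max_age: timedelta
) -> bool:
    if not rec.approved or rec.category in excluded_categories:
        return False  # governance first: unapproved or excluded content is never embedded
    if rec.last_hash is None or rec.last_embedded is None:
        return True  # never embedded before
    if rec.last_hash != content_hash(rec.text):
        return True  # content changed since the last run
    return now - rec.last_embedded > max_age  # unchanged but stale: scheduled refresh


now = datetime(2026, 5, 1)
records = [
    ContentRecord("policy-1", "Refunds within 30 days.", "policy", True),
    ContentRecord("hr-7", "Salary bands by level.", "hr_confidential", True),
    ContentRecord("kb-3", "Reset the router.", "kb", True,
                  content_hash("Reset the router."), now - timedelta(days=2)),
    ContentRecord("draft-9", "Unreviewed draft.", "kb", False),
]
batch = [r.doc_id for r in records
         if needs_embedding(r, now, {"hr_confidential"}, timedelta(days=7))]
print(batch)
```

Hashing the text means unchanged content is not re-embedded on every run, which also keeps capacity usage predictable.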

4. The model/provider configuration is becoming more serious

The documentation now covers configuration details around providers and models, including the default model behavior.

That matters because production teams eventually need answers to basic governance questions:

  • Which model is being used?
  • Who approved it?
  • Which data can be sent to it?
  • Which capacity pays for it?
  • Which workloads are allowed to use it?
  • What happens when the output is wrong?

This is where Fabric AI Functions become more than a notebook convenience. They become part of the data platform operating model.

5. The best output is not “AI magic”. It is a reviewable data asset.

The mistake is to take AI output and treat it as automatically trusted.

The better pattern is to produce reviewable enrichment.

Keep the original value.

Add the AI-generated label, summary, extracted field, or embedding.

Add review flags where needed.

Store the result in a table with ownership and downstream rules.

Then decide what is safe enough for reporting, automation, search, or user-facing apps.
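One way to make that last decision concrete is a small gate that decides how far each enriched row may travel. This is a sketch under assumed rules, not a Fabric feature: `EnrichedRow`, the destination names, and the rule that only human-reviewed rows reach user-facing apps are hypothetical choices a team would set for itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichedRow:  # hypothetical enrichment-table schema
    original_text: str            # the source value, never overwritten
    ai_label: Optional[str]       # AI-generated label (None if the call failed)
    needs_review: bool            # set by validation rules or low confidence
    human_reviewed: bool = False  # set when a person has checked the label
    owner: str = "unassigned"     # team accountable for this table


def destination(row: EnrichedRow) -> str:
    """Decide how far an enriched row is allowed to flow downstream."""
    if row.ai_label is None or (row.needs_review and not row.human_reviewed):
        return "review_queue"        # not trusted yet
    if row.human_reviewed:
        return "user_facing"         # checked by a person
    return "internal_reporting"      # unflagged, but internal use only


rows = [
    EnrichedRow("Login page times out", "technical", needs_review=False),
    EnrichedRow("(blank scan)", None, needs_review=True),
    EnrichedRow("Charged twice", "billing", needs_review=True, human_reviewed=True),
]
for r in rows:
    print(r.original_text, "->", destination(r))
```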

That is how this becomes useful without becoming sloppy.

Three practical things I would build first

1. Support ticket enrichment

Most support datasets contain useful signal, but the text is messy.

A Fabric notebook can add AI-generated columns for:

  • topic classification
  • urgency
  • sentiment
  • short summary
  • product area
  • likely ownership team

The key is not to pretend the model is perfect. The key is to create a reviewable enrichment layer that helps analysts and operations teams move faster.

A good output table might include the original text, AI-generated labels, confidence or review flags where available, and a human-reviewed status column.
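If those ticket columns come back from a single structured-response call (the article lists “generate structured responses from instructions” among the operations), the notebook still has to turn each response into columns and catch replies that are not what was asked for. A sketch, assuming the model is instructed to return a JSON object with the keys below; `parse_ticket_response`, the key set, and the `review_reason` column are hypothetical, and the strings stand in for real model output.

```python
import json
from typing import Dict

# Assumed response contract: one JSON object with these string fields.
EXPECTED_KEYS = ("topic", "urgency", "sentiment", "summary", "product_area", "owner_team")


def parse_ticket_response(raw: str) -> Dict[str, object]:
    """Turn one model response into ticket columns plus review metadata."""
    columns: Dict[str, object] = {key: None for key in EXPECTED_KEYS}
    columns["human_reviewed"] = False  # flipped later by a person, never by the model
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        columns["review_reason"] = "malformed_json"
        return columns
    missing = []
    for key in EXPECTED_KEYS:
        value = payload.get(key)
        if isinstance(value, str) and value.strip():
            columns[key] = value.strip()
        else:
            missing.append(key)  # absent, empty, or not a string
    columns["review_reason"] = ",".join(missing) if missing else None
    return columns


good = ('{"topic": "login", "urgency": "high", "sentiment": "negative", '
        '"summary": "User locked out after reset.", "product_area": "auth", '
        '"owner_team": "identity"}')
partial = '{"topic": "billing", "urgency": "", "summary": "Double charge."}'
broken = "Sure! Here is the JSON you asked for:"
for raw in (good, partial, broken):
    cols = parse_ticket_response(raw)
    print(cols["topic"], "|", cols["review_reason"])
```

Partial answers keep whatever was usable and record exactly which fields are missing, so the review flag carries a reason rather than a bare yes/no.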

That gives Power BI a better dataset without hiding the uncertainty.

2. Document extraction into structured tables

A lot of business data is trapped in semi-structured documents.

Invoices, forms, reports, agreements, field notes, inspection PDFs, and vendor files often contain fields that teams later retype manually.

With AI Functions, the useful pattern is:

  1. Store the files in the lakehouse.
  2. List file paths as input.
  3. Use extraction or generation instructions to pull out the fields.
  4. Store the result as a structured table.
  5. Review exceptions before the data becomes trusted.
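Steps 3 to 5 can be sketched as a loop that keeps the source path next to every extracted row, type-checks the fields, and routes problems to an exceptions list instead of dropping them. `fake_extract` is a hypothetical stand-in for the AI extraction call (it returns canned strings), and the invoice fields and ISO-date rule are illustrative assumptions.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, List, Tuple

REQUIRED_FIELDS = ("vendor", "invoice_date", "total")  # hypothetical invoice schema


def fake_extract(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for an AI extraction call; returns canned strings."""
    canned = {
        "Files/invoices/a.pdf": {"vendor": "Contoso", "invoice_date": "2026-03-02", "total": "1250.00"},
        "Files/invoices/b.pdf": {"vendor": "Fabrikam", "invoice_date": "March-ish", "total": "980"},
    }
    return canned.get(path, {})


def extract_to_table(
    paths: List[str], extractor: Callable[[str], Dict[str, str]]
) -> Tuple[List[dict], List[dict]]:
    """Return (clean_rows, exceptions); every row keeps its source path."""
    clean: List[dict] = []
    exceptions: List[dict] = []
    for path in paths:
        fields = extractor(path)
        problems = [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
        row = {"source_path": path, **fields}
        if not problems:
            try:
                row["invoice_date"] = date.fromisoformat(fields["invoice_date"])
            except ValueError:
                problems.append("invoice_date not ISO")
            try:
                row["total"] = Decimal(fields["total"])
            except InvalidOperation:
                problems.append("total not numeric")
        if problems:
            exceptions.append({"source_path": path, "problems": problems})
        else:
            clean.append(row)
    return clean, exceptions


clean, exceptions = extract_to_table(
    ["Files/invoices/a.pdf", "Files/invoices/b.pdf", "Files/invoices/c.pdf"], fake_extract
)
print(len(clean), "clean row(s)")
for exc in exceptions:
    print(exc)
```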

That does not replace proper document processing for every scenario. It does make small and medium internal automation projects much easier to test inside Fabric.

3. Embeddings for search and RAG preparation

A team can take approved internal content and create embeddings as part of the Fabric workflow.

That content might include:

  • product documentation
  • policy files
  • support resolutions
  • internal wiki pages
  • knowledge base articles
  • implementation notes

The output can become a governed retrieval layer instead of a random pile of files passed into an AI app.

That matters because RAG quality starts before the chat interface. It starts with content selection, metadata, refresh rules, ownership, and preparation.
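To show what a governed retrieval layer can mean at query time, here is a sketch that ranks stored embeddings by cosine similarity, but only over rows whose metadata allows the calling application. The three-dimensional vectors are made up for illustration (real embeddings from `ai.embed` have far more dimensions), and the `allowed_apps` column is a hypothetical governance field.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: List[dict], query_vec: Sequence[float], app: str, k: int = 2) -> List[Tuple[str, float]]:
    """Top-k (doc_id, score) among rows the calling app is allowed to use."""
    allowed = [row for row in index if app in row["allowed_apps"]]  # filter before ranking
    scored = [(row["doc_id"], cosine(query_vec, row["embedding"])) for row in allowed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = [
    {"doc_id": "kb-router", "embedding": [0.9, 0.1, 0.0], "allowed_apps": {"support_bot", "intranet"}},
    {"doc_id": "policy-refund", "embedding": [0.1, 0.9, 0.1], "allowed_apps": {"support_bot"}},
    {"doc_id": "hr-salary", "embedding": [0.8, 0.2, 0.1], "allowed_apps": {"hr_portal"}},
]
print(search(index, [1.0, 0.0, 0.0], app="support_bot"))
```

Note that `hr-salary` would score highly for this query, but the support bot never sees it: the access rule lives in the data, not in the prompt.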

[Figure: Good use cases for Fabric AI Functions]

Where I would be careful

Being positive about this feature does not mean being careless with it.

AI Functions make enrichment easier, but the usual production questions still matter:

  • Which data is allowed to be sent to the model?
  • Is the Fabric tenant setting for Copilot and Azure OpenAI enabled intentionally?
  • Does the workload require cross-geo processing approval?
  • Which Fabric capacity will pay for the work?
  • Which model/provider is configured?
  • How will output quality be reviewed?
  • Which outputs are allowed to flow into reports or user-facing apps?
  • How will failures, blanks, and hallucinated values be handled?
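The last question on that list can be partly answered in code for label-style outputs. A sketch, assuming a classification column whose allowed labels are known in advance: blanks (including failed calls) and labels outside the allowed set, one visible symptom of the model inventing a category, get a reason code instead of flowing downstream silently. `validate_labels` and the reason codes are hypothetical; free-text outputs such as summaries need different checks.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def validate_labels(labels: Iterable[Optional[str]], allowed: Set[str]) -> List[Optional[str]]:
    """Return a reason code per row, or None when the label is acceptable.

    `allowed` is assumed to contain lower-case labels.
    """
    reasons: List[Optional[str]] = []
    for label in labels:
        if label is None or not label.strip():
            reasons.append("blank")             # failed call or empty answer
        elif label.strip().lower() not in allowed:
            reasons.append("unexpected_label")  # not one of the requested categories
        else:
            reasons.append(None)
    return reasons


allowed = {"billing", "technical", "account", "other"}
outputs = ["billing", "", "Technical", "refund_fraud_escalation", None]
reasons = validate_labels(outputs, allowed)
print(reasons)
print(Counter(r for r in reasons if r))
```

Counting reason codes per run gives a cheap quality metric to track over time alongside the enrichment table.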

Microsoft notes that Fabric AI Functions require a paid Fabric capacity (F2 or higher) or any P capacity. The documentation also states that AI Functions are supported in Fabric Runtime 1.3 and later, and that the default model is gpt-4.1-mini unless a different model is configured.
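Those requirements are easy to encode as a preflight check at the top of a notebook or job, so a misconfigured workspace fails loudly before any rows are sent. A sketch only: how you read the capacity SKU and runtime version in a real Fabric session is not shown, so they are passed in as plain strings, and `preflight` is a hypothetical helper. The rules mirror the requirements quoted above and should be re-checked against the current documentation.

```python
import re
from typing import List, Optional, Tuple

DEFAULT_MODEL = "gpt-4.1-mini"  # default named in the documentation quoted above


def capacity_ok(sku: str) -> bool:
    """F2 or higher, or any P SKU (rule taken from the text, not queried live)."""
    sku = sku.strip().upper()
    f_match = re.fullmatch(r"F(\d+)", sku)
    if f_match:
        return int(f_match.group(1)) >= 2
    return re.fullmatch(r"P\d+", sku) is not None


def runtime_ok(version: str, minimum: Tuple[int, int] = (1, 3)) -> bool:
    parts = [int(p) for p in version.split(".")[:2]]
    parts += [0] * (2 - len(parts))  # treat "2" as "2.0"
    return tuple(parts) >= minimum


def preflight(sku: str, runtime: str, configured_model: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (effective_model, problems); an empty problem list means go."""
    problems: List[str] = []
    if not capacity_ok(sku):
        problems.append(f"capacity {sku!r} is not F2+ or a P SKU")
    if not runtime_ok(runtime):
        problems.append(f"runtime {runtime!r} is older than 1.3")
    return configured_model or DEFAULT_MODEL, problems


print(preflight("F64", "1.3"))
print(preflight("F1", "1.2"))  # hypothetical below-threshold values
```

Returning the effective model name alongside the problems also gives the run log an explicit answer to “which model is being used?”.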

Those details matter. They turn this from a cool notebook feature into a platform decision.

My take

Fabric AI Functions are useful because they move GenAI into the unglamorous part of AI work.

The pipeline.

The notebook.

The enrichment step.

The document cleanup.

The semantic preparation layer.

That is where a lot of business value actually sits.

Not every AI feature needs to become a chat window. Some of the most valuable AI work will happen quietly inside pipelines, quality checks, enrichment jobs, and retrieval preparation steps.

The practical opportunity is simple:

Take the data you already manage in Fabric. Add AI where language, documents, and meaning slow the team down. Store the result as a governed data asset. Review it before it reaches users.

That is a much better direction than treating AI as a separate island next to the data platform.

When did this become available?

The official Microsoft Learn page for Fabric AI Functions currently has a documentation date of November 13, 2025, and an updated timestamp of May 7, 2026.

The GitHub history for the Fabric documentation shows the AI Functions overview page existed by February 28, 2025. A later documentation commit on November 24, 2025, is titled “Update AI Functions documentation for GA release with enhancements.” Recent documentation updates in February, March, and May 2026 added more coverage of multimodal input, schema extraction, configuration, providers, and file workflows.

So the short version is:

  • The documentation trail starts in early 2025.
  • The GA documentation update appears in November 2025.
  • The more interesting expansion for practical teams is the 2026 work around multimodal inputs, broader model/provider configuration, schema extraction, and file workflows.

Sources