AI Data Pipeline Automation with AIDB

Dr. Sala Muthukrishnan · 2026-05-18 · via EDB

AIDB is EDB's Postgres extension that automates the entire AI data preparation pipeline — chunking, embedding, and vector indexing — triggered live inside the database the moment new data arrives. To show it in action, I used an investigation story PDF: a real document full of players, events, and relationships that needed to become a queryable knowledge base without any manual data wrangling.

What AIDB automates — and why it matters

Every AI pipeline that works with unstructured documents has the same hidden cost: data preparation. Before a single query can be answered, raw text must be cleaned, split into model-friendly chunks, converted into vector embeddings, and indexed for retrieval. In a conventional setup, each of those steps is a separate job — something you write, schedule, monitor, and re-run every time source data changes.

AIDB moves that work into the database. You declare a preparer and a knowledge base as live database objects, and from that moment on the database owns data preparation. Every INSERT into a source table automatically triggers chunking, embedding, and indexing, with no external scheduler or ETL script, so the knowledge base stays in step with newly inserted data.

The core principle: data preparation is not a job you run on a schedule — it is a behaviour the database exhibits automatically on every INSERT. AIDB's Live mode makes this real.

To put this to the test, I took an investigation story PDF — a dense, character-rich document full of named players, a timeline of events, and a web of relationships — and used AIDB to turn it into a fully automated, queryable RAG knowledge base. Here is the exact pipeline, in SQL.

The automated pipeline at a glance

PDF file → source_documents → target_preparer (auto) → RAG_KB (auto) → Query ready

Only the first step — loading parsed PDF content into source_documents — involves any manual action. Everything beyond that is owned by AIDB. The target_preparer chunks each passage the moment it arrives. The RAG_KB embeds each chunk immediately after. The investigation story is query-ready before you close your SQL client.

Step 1 — load the PDF content into source_documents

The investigation story PDF is parsed externally — each passage, section, or paragraph extracted as a discrete piece of text. Those pieces land as rows in the source table. The table is intentionally simple: just an ID, a part number, a generated unique key, and the raw text.

The generated unique_id column — combining document ID and part number — ensures every passage is traceable all the way through chunking and embedding. When you query RAG_KB later and get a result back, you can trace it directly to the original story passage.

DROP TABLE IF EXISTS source_documents;

CREATE TABLE source_documents (
  id        TEXT,
  part_id   INTEGER NOT NULL,
  -- explicit cast keeps the generation expression a plain text concatenation
  unique_id TEXT NOT NULL GENERATED ALWAYS AS
            (id || '.part.' || part_id::text) STORED,
  result    TEXT,
  CONSTRAINT source_documents_pkey PRIMARY KEY (unique_id)
);

CREATE INDEX source_documents_id_idx
  ON source_documents (id);

-- once the parsed passages have been inserted, confirm they are present
SELECT * FROM source_documents;

In the investigation story use case, each row represents one passage — a player profile, a scene description, a timeline entry, or a relationship note. The richer and more granular the parsing, the more precise the retrieval. Once the rows are inserted, AIDB automation takes over immediately.
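As a minimal sketch of what the load step might look like, the statements below insert two placeholder passages and read back the generated key. The document ID story-001 and the passage text are hypothetical stand-ins for the output of whatever PDF parser you use; only the table and column names come from the definition above.

```sql
-- Hypothetical rows standing in for externally parsed PDF passages.
-- unique_id is a generated column, so it is omitted from the INSERT.
INSERT INTO source_documents (id, part_id, result)
VALUES
  ('story-001', 1, 'Placeholder text for the first parsed passage.'),
  ('story-001', 2, 'Placeholder text for the second parsed passage.');

-- Read back the generated keys for this document.
SELECT unique_id, part_id, left(result, 40) AS preview
FROM source_documents
WHERE id = 'story-001'
ORDER BY part_id;
```

With the table definition above, the two rows come back with the keys story-001.part.1 and story-001.part.2, which is the value that later stages carry forward.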

Step 2 — automate chunking with target_preparer

Story passages vary in length and structure. AIDB's target_preparer applies the ChunkText operation to break each passage into consistently sized, embedding-ready chunks and writes them automatically to the prepared_document table. This is the first stage of automated data preparation: configured once, runs forever.

Calling aidb.set_auto_preparer with 'Live' activates the automation. Every subsequent INSERT into source_documents is chunked instantly — no cron job, no Celery worker, no polling loop.

-- drop any earlier definition so the script can be re-run
SELECT aidb.delete_preparer('target_preparer');

SELECT aidb.create_table_preparer(
  name                    => 'target_preparer',
  operation               => 'ChunkText',
  source_table            => 'source_documents',
  source_data_column      => 'result',
  destination_table       => 'prepared_document',
  destination_data_column => 'chunks',
  source_key_column       => 'unique_id',
  destination_key_column  => 'id',
  options => '{"desired_length": 100}'::JSONB
);

-- Live mode: chunking fires automatically on every INSERT
SELECT aidb.set_auto_preparer('target_preparer', 'Live');

A chunk size of 100 tokens suits narrative content well — specific enough to isolate a player or event, broad enough to carry surrounding context. For the investigation story, this means each chunk typically covers a single character trait, a single scene, or a single relationship link.
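To check what the preparer produced, a query along these lines can be run once at least one passage has been inserted. It relies only on column names that appear in the article's own configuration: id and chunks from the preparer definition, and unique_id, which the knowledge base definition in Step 3 uses as the key of prepared_document. The length column is there purely to see how desired_length translates into actual chunk sizes.

```sql
-- List the chunks written by target_preparer, ordered by source passage.
-- As configured, prepared_document.id should carry the source row's
-- unique_id (source_key_column => 'unique_id',
-- destination_key_column => 'id').
SELECT id             AS source_unique_id,
       unique_id      AS chunk_key,
       length(chunks) AS chunk_chars,
       chunks
FROM prepared_document
ORDER BY id, unique_id
LIMIT 20;
```

If chunks look too short to carry context, or too long to isolate a single player or event, desired_length is the option to adjust before recreating the preparer.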

Step 3 — automate embedding with RAG_KnowledgeBase

With chunks flowing automatically from the preparer, RAG_KB handles the second stage: embedding. AIDB converts each chunk into a BERT vector and stores it in the backing vector table. Set to Live mode, this fires the moment a new chunk is written — completing the fully automated data preparation chain.

No manual embedding run. No batch job. The entire investigation story — every player, every event, every relationship — is encoded as semantic vectors inside Postgres, continuously and automatically.

-- drop any earlier definition so the script can be re-run
SELECT aidb.delete_knowledge_base('RAG_KB');

SELECT aidb.create_table_knowledge_base(
  name               => 'RAG_KB',
  model_name         => 'bert',
  source_table       => 'prepared_document',
  source_data_column => 'chunks',
  source_data_format => 'Text',
  source_key_column  => 'unique_id'
);

-- Live mode: embedding fires automatically on every new chunk
SELECT aidb.set_auto_knowledge_base('RAG_KB', 'Live');

-- verify vectors are populated
SELECT * FROM RAG_KB_vector LIMIT 10;

The pipeline is now fully live. Insert a new passage from the investigation story into source_documents — new testimony, a newly discovered document, an updated player profile — and it flows through target_preparer and into RAG_KB automatically. No further action needed.
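A quick way to watch the automation work is to insert one more row and look at each stage. This is a sketch rather than a guaranteed test: the document ID and text are placeholders, and the vector table name is taken from the verification query in Step 3.

```sql
-- Insert one placeholder passage (hypothetical ID and text).
INSERT INTO source_documents (id, part_id, result)
VALUES ('story-002', 1, 'Placeholder text for a newly added passage.');

-- Stage 1: chunks derived from that passage.
SELECT count(*) AS chunks_for_passage
FROM prepared_document
WHERE id = 'story-002.part.1';

-- Stage 2: total number of stored vectors; this should grow
-- once the new chunks have been embedded.
SELECT count(*) AS total_vectors
FROM RAG_KB_vector;
```

Running the two counts before and after the INSERT makes the effect of Live mode easy to see without any knowledge of the vector table's internal columns.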

The result: querying the story like a database

With the automated pipeline live, RAG_KB can be queried in natural language: the question is passed directly to AIDB's semantic retrieval function. Ask about a player by name, a type of event, a location, or a relationship between parties. The retrieval is similarity-based, not keyword-based, so even if the query words differ from the original text, AIDB finds and ranks the most relevant passages.
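The article does not show the retrieval call itself. As a hedged sketch, AIDB's documentation describes a function aidb.retrieve_text that takes a knowledge base name, a query string, and a result count, and returns key, value, and distance columns. Treat that signature and those column names as assumptions to verify against your AIDB version. The question text is hypothetical. The joins use the key mapping set up in Steps 2 and 3 to walk from a matching chunk back to the original passage.

```sql
-- Assumed signature: aidb.retrieve_text(knowledge_base, query, top_k)
-- Assumed result columns: key, value, distance.
SELECT r.key       AS chunk_key,
       r.distance,
       r.value     AS chunk_text,
       s.unique_id AS source_passage_key,
       s.result    AS source_passage
FROM aidb.retrieve_text('RAG_KB',
                        'Who met the courier before the handover?',
                        5) AS r
JOIN prepared_document AS p ON p.unique_id = r.key  -- chunk row
JOIN source_documents  AS s ON s.unique_id = p.id   -- original passage
ORDER BY r.distance;
```

Assuming distance is a dissimilarity measure, the closest matches sort first, and each one arrives alongside the full passage it was cut from.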

This is what AIDB's automated data preparation makes possible. The investigation story stops being a static PDF and becomes a structured, searchable intelligence layer — one that stays current automatically as new material is added.

Summary: what AIDB automates end to end

The investigation story example demonstrates a general capability. Any document (PDF, report, case file, research paper, HTML page, and so on) can be parsed into text, loaded into source_documents, and immediately become part of a live, automated RAG pipeline. AIDB handles the rest:

Chunking — target_preparer splits raw text into model-ready chunks automatically on INSERT.

Embedding — RAG_KB converts each chunk to a vector automatically as chunks arrive.

Indexing — vectors are stored and indexed inside Postgres, with no external vector store required.

Freshness — the knowledge base is always in sync with source data. No scheduled re-runs, no drift.

Swap bert for any AIDB-supported model — OpenAI embeddings, a local Ollama model, or a fine-tuned domain model — without changing the pipeline. target_preparer and RAG_KB are model-agnostic by design.
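In practice, switching models comes down to the model_name argument of the knowledge base. The sketch below reuses only the calls shown in Step 3; my_embedding_model is a hypothetical name for a model that has already been registered with AIDB, and registering it is outside the scope of this example. Because a different model generally produces vectors of a different size, the sketch recreates the knowledge base rather than altering it, and chunks that already exist may need to be embedded again.

```sql
-- Recreate the knowledge base against a different embedding model.
-- 'my_embedding_model' is a placeholder, not a model AIDB ships with.
SELECT aidb.delete_knowledge_base('RAG_KB');

SELECT aidb.create_table_knowledge_base(
  name               => 'RAG_KB',
  model_name         => 'my_embedding_model',
  source_table       => 'prepared_document',
  source_data_column => 'chunks',
  source_data_format => 'Text',
  source_key_column  => 'unique_id'
);

-- Re-enable Live mode so new chunks are embedded automatically.
SELECT aidb.set_auto_knowledge_base('RAG_KB', 'Live');
```

The preparer is left untouched, since chunking does not depend on the embedding model.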
