Can AI in Manufacturing Work Without the Cloud? A Guide
Tiioluwani · 2026-05-18 · via DEV Community

Keeping external traffic out of operational networks is a best practice that most manufacturing facilities build into their architecture from the ground up.

Many manufacturing networks follow the Purdue Model, a layered reference architecture (Levels 0 through 5) that has shaped industrial network design for decades. The lowest levels sit closest to the physical process: sensors, motors, and actuators at Level 0; real-time controllers such as PLCs at Level 1; and supervisory servers, SCADA, and HMI systems at Level 2. Level 3 manages operations. Levels 4 and 5 connect to the enterprise network and to the internet.

IEC 62443 defines strict boundaries between these levels, so traffic from Level 2 does not reach the internet. For defense contractors, ITAR adds a second constraint: technical data must stay on U.S. soil and remain accessible only to U.S. persons. Cloud-hosted vector databases like Pinecone, Weaviate Cloud, and Qdrant Cloud run into both requirements. A Level 2 system has no route to send a query to them in the first place, and other industries learned this lesson the hard way.

Why cloud AI cannot reach the factory floor

Latency compounds the problem. Cloud round-trips average 50 to 500 milliseconds. PLC-level control loops require responses in under 10 milliseconds. Teams that need AI during outages use edge deployment patterns designed for disconnected environments.

Cost adds another layer. AWS standard egress starts at $0.09 per GB. At any serious production scale, sensor and vision data add up quickly, and the bill arrives faster than most teams expect.

Architecture, latency, and cost all point in the same direction. AI on the factory floor needs to run where the data lives.

This tutorial shows you how to build a local RAG pipeline that runs entirely on factory-floor hardware, where a technician can ask a question about any piece of equipment and get a cited answer from decades of maintenance records, with no internet connection required.

What You Are Building

You’ll build a three-layer RAG pipeline that runs fully inside your factory network and returns cited answers fast enough for interactive use on factory-floor hardware. The three layers are:

  • Ingestion: Reads the PDF maintenance documents, splits them into 256-token chunks with a 25-token overlap, generates embeddings using sentence-transformers on a CPU, and stores everything in VectorAI DB with metadata for equipment line, document date, and source file.
  • Query: Takes a technician's question, embeds it with the same model, runs a hybrid search in VectorAI DB filtered by equipment line and date range, and sends the top results to a local LLM running with Ollama, which generates a cited answer in plain English.
  • Audit: Logs every ingestion and query event as a structured JSON entry to ./data/audit.log, timestamped in UTC, and stored inside your security boundary to satisfy IEC 62443 traceability requirements.

VectorAI DB sits at the center of the pipeline. It stores the embeddings that the ingestion layer produces and serves the searches that the query layer runs. Running it on-premises instead of in the cloud keeps the whole pipeline inside your security boundary.

[Figure: Pipeline architecture]

The pipeline runs on standard factory edge server hardware, with Ubuntu 22.04 LTS, 16 GB of RAM, and a 4-core CPU.

Building a Local RAG Pipeline with VectorAI DB

Set up VectorAI DB, build the ingestion pipeline, run your first query, add hybrid filters, and connect a local LLM.

Prerequisites

Sign up for the VectorAI DB community edition before you start, then make sure you have these installed:

  • Docker and Docker Compose
  • Python 3.10 or higher
  • uv package manager. Install with curl -LsSf https://astral.sh/uv/install.sh | sh
  • Ollama. Install from Ollama.com and pull the model with ollama pull llama3.2:3b

Your machine needs at least 8 GB RAM (16 GB or more recommended) and 10 GB of disk space (100 GB or more recommended) to run VectorAI DB. If you're on Windows, the uv install command needs 'sh', which PowerShell doesn't have. Run all commands in WSL2 (Windows Subsystem for Linux). To set up WSL2, run 'wsl --install' in PowerShell, then use the Ubuntu terminal for this tutorial.
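
Before moving on, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch and not part of the original project: the function names `missing_tools` and `python_is_supported`, and the exact tool list, are assumptions chosen for illustration.

```python
import shutil
import sys

# Command-line tools this tutorial relies on (an assumed list based on the prerequisites).
REQUIRED_TOOLS = ("docker", "uv", "ollama")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not python_is_supported():
        print("Python 3.10 or higher is required.")
    if not missing and python_is_supported():
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use you call `missing_tools()` with no arguments.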

Project structure

Set up your project folder like this:

factory-rag/
├── docker-compose.yml
├── data/
│   └── audit.log
├── config/
└── src/
    ├── healthcheck.py
    ├── ingest.py
    ├── query.py
    ├── llm.py
    ├── audit.py
    └── test_e2e.py


Create the directories:

mkdir -p factory-rag/{data,config,src}
cd factory-rag


Step 1: Deploy VectorAI DB

Create docker-compose.yml in your project root:

services:
  vectorai-db:
    image: williamimoh/actian-vectorai-db:latest
    platform: linux/amd64
    container_name: vectorai-db
    ports:
      - "50051:50051"
    volumes:
      - ./data:/data
      - ./config:/config
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "nc -z localhost 50051 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s


Start the container with:

docker compose up -d


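Install the SDK from its wheel file. The command below assumes the wheel sits in the project root and that the folder is already a uv project (run uv init first if there is no pyproject.toml):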

uv add actian_vectorai-0.1.0b2-py3-none-any.whl


Install these required libraries:

uv add sentence-transformers pypdf


Check that the server is running. Make a file called src/healthcheck.py:

from actian_vectorai import VectorAIClient

with VectorAIClient("localhost:50051") as client:
    info = client.health_check()
    print("✓ VectorAI DB is running")
    print(f"  Title:   {info['title']}")
    print(f"  Version: {info['version']}")


Run the script:

uv run python src/healthcheck.py


Terminal output:

[Screenshot: terminal output]
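
If the health check fails right after `docker compose up -d`, the container may simply still be starting. The sketch below waits for the port to accept TCP connections, which is roughly what the `nc -z` healthcheck in docker-compose.yml does. The function name `wait_for_port` and its defaults are assumptions for illustration; an open port only shows that something is listening, so run src/healthcheck.py afterwards to confirm the service itself responds.

```python
import socket
import time


def wait_for_port(host="localhost", port=50051, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling `wait_for_port()` before the health check gives the container up to a minute to come up instead of failing on the first attempt.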

Step 2: Build the ingestion pipeline

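Put your PDF maintenance documents in the data/ folder before running this step. Add any equipment maintenance records, inspection reports, or failure logs there.

The pipeline uses sentence-transformers/all-MiniLM-L6-v2, which needs less than 200 MB of RAM on CPU. The model files are downloaded the first time the script loads them, so do that once on a connected machine (or copy the model cache over) before cutting the server off from the internet. We split text into 256-token chunks with a 25-token overlap, so a sentence that straddles a chunk boundary still appears with some of its surrounding context.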
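
To make the chunking arithmetic concrete: with 256-token chunks and a 25-token overlap, each new chunk starts 231 tokens after the previous one. The sketch below reproduces only the window arithmetic of chunk_text in src/ingest.py, without a tokenizer; `chunk_spans` is a hypothetical helper added here for illustration.

```python
def chunk_spans(n_tokens, chunk_size=256, overlap=25):
    """Return (start, end) token offsets for each overlapping window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap  # 231 tokens with the defaults
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


print(chunk_spans(600))
# [(0, 256), (231, 487), (462, 600)]
```

A 600-token document therefore yields three chunks, and tokens 231 to 255 appear in both the first and second chunk.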

Create src/ingest.py:

from __future__ import annotations

import argparse
import uuid
from pathlib import Path

from actian_vectorai import Distance, PointStruct, VectorAIClient, VectorParams
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

from audit import log_ingestion

COLLECTION = "maintenance_records"
HOST = "localhost:50051"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
VECTOR_DIM = 384
CHUNK_TOKENS = 256
OVERLAP_TOKENS = 25

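def chunk_text(text, tokenizer, chunk_size=CHUNK_TOKENS, overlap=OVERLAP_TOKENS):
    """Split text into windows of at most chunk_size tokens that share overlap tokens."""
    # Tokenize once without special tokens so windows are cut on raw token ids.
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = []
    start = 0
    while start < len(token_ids):
        end = min(start + chunk_size, len(token_ids))
        window = token_ids[start:end]
        decoded = tokenizer.decode(window, skip_special_tokens=True).strip()
        if decoded:
            chunks.append(decoded)
        if end >= len(token_ids):
            break
        # Advance by the stride so consecutive chunks overlap by `overlap` tokens.
        start += chunk_size - overlap
    return chunks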

def ingest_pdf(pdf_path, equipment_line, doc_date, model, client):
    reader = PdfReader(str(pdf_path))
    full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if not full_text.strip():
        print(f"  [warn] No extractable text in {pdf_path.name}, skipping.")
        return 0
    tokenizer = model.tokenizer
    chunks = chunk_text(full_text, tokenizer)
    points = []
    for idx, chunk in enumerate(chunks):
        embedding = model.encode(chunk, show_progress_bar=False).tolist()
        points.append(
            PointStruct(
                id=str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{pdf_path.name}:{idx}")),
                vector=embedding,
                payload={
                    "equipment_line": equipment_line,
                    "doc_date": doc_date,
                    "source_file": pdf_path.name,
                    "text": chunk,
                    "chunk_index": idx,
                },
            )
        )
    if points:
        client.points.upsert(COLLECTION, points)
    return len(points)

def main(data_dir, equipment_line, doc_date):
    data_path = Path(data_dir)
    pdfs = sorted(data_path.glob("*.pdf"))
    if not pdfs:
        print(f"No PDF files found in '{data_dir}'. Add PDFs to ./data/ and retry.")
        return
    print(f"Loading embedding model '{MODEL_NAME}'...")
    model = SentenceTransformer(MODEL_NAME)
    with VectorAIClient(HOST) as client:
        if not client.collections.exists(COLLECTION):
            client.collections.create(
                COLLECTION,
                vectors_config=VectorParams(size=VECTOR_DIM, distance=Distance.Cosine),
            )
            print(f"Created collection '{COLLECTION}' ({VECTOR_DIM}-dim, Cosine)")
        else:
            print(f"Collection '{COLLECTION}' already exists, appending chunks.")
        total = 0
        for pdf_path in pdfs:
            print(f"Ingesting {pdf_path.name} ...")
            count = ingest_pdf(pdf_path, equipment_line, doc_date, model, client)
            print(f"{count} chunks stored")
            log_ingestion(pdf_path.name, equipment_line, count)
            total += count
        print(f"\nDone. {total} total chunks stored in '{COLLECTION}'.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Ingest PDFs into VectorAI DB")
    parser.add_argument("--data-dir", default="./data")
    parser.add_argument("--equipment-line", required=True)
    parser.add_argument("--doc-date", required=True)
    args = parser.parse_args()
    main(args.data_dir, args.equipment_line, args.doc_date)


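ingest.py imports log_ingestion from src/audit.py, which you create in Step 6, so add that file first. Then run the ingestion step: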

uv run python src/ingest.py --equipment-line turbine-A --doc-date 2024-03-15


Expected output:

[Screenshot: terminal output]

The metadata schema saves the equipment line, document date, and source file with each chunk. This lets you filter searches by equipment line or date range without searching the whole collection.
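
Because doc_date is stored as a plain string, date-range filtering only behaves sensibly if every value uses the same sortable format. ISO dates (YYYY-MM-DD) sort in chronological order as strings. The sketch below shows one way to enforce that at the command line; `iso_date` is a hypothetical helper, and whether VectorAI DB compares string payloads lexicographically is an assumption rather than something the original states.

```python
import argparse
from datetime import date


def iso_date(value):
    """argparse type: accept an ISO 8601 date and return it normalized to YYYY-MM-DD."""
    try:
        return date.fromisoformat(value).isoformat()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}") from None


parser = argparse.ArgumentParser()
parser.add_argument("--doc-date", required=True, type=iso_date)

args = parser.parse_args(["--doc-date", "2024-03-15"])
print(args.doc_date)  # 2024-03-15
```

Passing `type=iso_date` to the `--doc-date` argument in ingest.py would reject a value such as 15/03/2024 before anything is written to the collection.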

Step 3: Run your first query

Your ingestion pipeline has stored the maintenance records in VectorAI DB, so the pipeline can now retrieve them. When a technician asks something in plain English, query.py embeds the question, searches the maintenance_records collection, and returns the five most relevant chunks with similarity scores.

Create src/query.py:

from __future__ import annotations

import argparse
import time

from actian_vectorai import Field, FilterBuilder, VectorAIClient
from sentence_transformers import SentenceTransformer

from audit import log_query

COLLECTION = "maintenance_records"
HOST = "localhost:50051"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
TOP_K = 5

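def build_filter(equipment_line=None, doc_date=None, doc_date_to=None):
    """Build a metadata filter, or return None when no filter was requested."""
    fb = FilterBuilder()
    if equipment_line:
        fb.must(Field("equipment_line").eq(equipment_line))
    if doc_date and doc_date_to:
        # Both bounds given: inclusive date range.
        fb.must(Field("doc_date").range(gte=doc_date, lte=doc_date_to))
    elif doc_date:
        # Only doc_date given: exact match on that date.
        fb.must(Field("doc_date").eq(doc_date))
    # Note: doc_date_to on its own is ignored.
    return fb.build() if (equipment_line or doc_date) else None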

def search(question, equipment_line=None, doc_date=None, doc_date_to=None):
    model = SentenceTransformer(MODEL_NAME)
    embedding = model.encode(question, show_progress_bar=False).tolist()
    query_filter = build_filter(equipment_line, doc_date, doc_date_to)
    with VectorAIClient(HOST) as client:
        hits = client.points.search(
            COLLECTION,
            vector=embedding,
            limit=TOP_K,
            filter=query_filter,
        )
    return [
        {
            "score": round(r.score, 4),
            "source_file": r.payload.get("source_file", ""),
            "equipment_line": r.payload.get("equipment_line", ""),
            "doc_date": r.payload.get("doc_date", ""),
            "chunk_index": r.payload.get("chunk_index", -1),
            "text": r.payload.get("text", ""),
        }
        for r in hits
    ]

def main():
    parser = argparse.ArgumentParser(description="Search maintenance records")
    parser.add_argument("question", help="Natural language question")
    parser.add_argument("--equipment-line", default=None)
    parser.add_argument("--doc-date", default=None)
    parser.add_argument("--doc-date-to", default=None)
    args = parser.parse_args()

    start = time.monotonic()
    results = search(args.question, equipment_line=args.equipment_line,
        doc_date=args.doc_date, doc_date_to=args.doc_date_to)
    latency_ms = (time.monotonic() - start) * 1000
    log_query(args.question, args.equipment_line or "", results, latency_ms)

    if not results:
        print("No results found.")
        return

    print(f"Top {len(results)} results for: \"{args.question}\"\n")
    for i, r in enumerate(results, 1):
        print(f"[{i}] score={r['score']:.4f}  {r['source_file']} "
              f"(chunk {r['chunk_index']})  {r['doc_date']}  {r['equipment_line']}")
        print(f"     {r['text'][:200].strip()}...")
        print()

if __name__ == "__main__":
    main()


Try your first query:

uv run python src/query.py "What caused the bearing failure?"


The search embeds the query with the same model used for ingestion, so the query and the stored vectors share one embedding space. For maintenance records with this model, similarity scores between 0.4 and 0.6 typically indicate relevant matches; treat that range as a starting point and check it against your own documents.
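
The collection was created with Distance.Cosine, so the reported score is presumably a cosine similarity (that reading of the score is an assumption; the SDK may scale it differently). As a reference for interpreting the numbers, here is a minimal pure-Python sketch of the measure; `cosine_similarity` is an illustrative helper, not an SDK function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors; undefined mathematically
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(round(cosine_similarity([1.0, 1.0], [1.0, 0.0]), 4))  # 0.7071
```

Identical directions score 1.0 and unrelated (orthogonal) directions score 0.0, which is why a mid-range score can still mark a useful match for short questions against long passages.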

Step 4: Add hybrid filters

Filtering by equipment line and date keeps search results relevant to the technician's current work. Run the same query from Step 3, but add an equipment-line filter:

uv run python src/query.py "What caused the bearing failure?" --equipment-line turbine-A


Add a date filter to narrow the results even more:

uv run python src/query.py "What caused the bearing failure?" --equipment-line turbine-A --doc-date 2024-03-15


Expected output:

[Screenshot: terminal output]

The build_filter function constructs a FilterBuilder query that combines vector similarity with exact metadata matching. A technician working on turbine-A only sees results from that equipment line, not from the entire maintenance history.
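
To see which records a given combination of flags lets through, the conditions in build_filter can be mirrored in plain Python. This is only an illustration of the logic: `payload_matches` is a hypothetical helper, the real filtering happens inside VectorAI DB, and the range comparison assumes ISO YYYY-MM-DD date strings.

```python
def payload_matches(payload, equipment_line=None, doc_date=None, doc_date_to=None):
    """Plain-Python mirror of the conditions build_filter assembles."""
    if equipment_line and payload.get("equipment_line") != equipment_line:
        return False
    stored = payload.get("doc_date", "")
    if doc_date and doc_date_to:
        # Inclusive range; ISO YYYY-MM-DD strings compare in date order.
        return doc_date <= stored <= doc_date_to
    if doc_date:
        # A single date means an exact match, not "on or after".
        return stored == doc_date
    return True


record = {"equipment_line": "turbine-A", "doc_date": "2024-03-15"}
print(payload_matches(record, equipment_line="turbine-A"))  # True
print(payload_matches(record, equipment_line="turbine-B"))  # False
print(payload_matches(record, doc_date="2024-01-01", doc_date_to="2024-03-31"))  # True
```

Note that --doc-date on its own is an exact match; to search a period, pass both --doc-date and --doc-date-to.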

Step 5: Connect the local LLM

The search results feed into a local LLM running via Ollama, which generates a cited answer in plain English. The entire round trip runs on factory-floor hardware.

Create src/llm.py:

from __future__ import annotations

import json
import os
import sys
import urllib.request
from typing import Any

OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = "llama3.2:3b"
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.1
TIMEOUT_SECONDS = 300

def build_prompt(question: str, results: list[dict[str, Any]]) -> str:
    if not results:
        return f"Question: {question}\n\nAnswer: I have no relevant context to answer this question."
    context_blocks = []
    for i, r in enumerate(results, 1):
        source = r.get("source_file", "unknown")
        date = r.get("doc_date", "unknown")
        equip = r.get("equipment_line", "unknown")
        text = r.get("text", "").strip()
        context_blocks.append(
            f"[{i}] Source: {source} | Equipment: {equip} | Date: {date}\n{text}"
        )
    context = "\n\n".join(context_blocks)
    return (
        "You are a maintenance records assistant. "
        "Answer the question using ONLY the provided context. "
        "Cite sources inline using [1], [2], etc. "
        "If the context does not contain enough information, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )

def generate(question: str, results: list[dict[str, Any]]) -> str:
    prompt = build_prompt(question, results)
    payload = json.dumps({
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_predict": MAX_NEW_TOKENS,
            "temperature": TEMPERATURE,
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
        body = json.loads(resp.read().decode())
    return body["response"].strip()

def answer(question: str, results: list[dict[str, Any]]) -> str:
    reply = generate(question, results)
    print(reply)
    print()
    print("Sources")
    for i, r in enumerate(results, 1):
        print(
            f"  [{i}] {r.get('source_file', '?')} "
            f"(chunk {r.get('chunk_index', '?')}, score {r.get('score', 0):.4f}) "
            f"{r.get('doc_date', '?')} / {r.get('equipment_line', '?')}"
        )
    return reply

if __name__ == "__main__":
    question = sys.argv[1] if len(sys.argv) > 1 else "What maintenance was performed?"
    dummy_results = [
        {
            "source_file": "example.pdf",
            "doc_date": "2024-03-15",
            "equipment_line": "turbine-A",
            "chunk_index": 0,
            "score": 0.95,
            "text": (
                "Performed scheduled bearing inspection on turbine-A. "
                "Replaced worn bearing race on shaft 2. "
                "Torque settings verified per spec TRB-004."
            ),
        }
    ]
    answer(question, dummy_results)


Wire everything together by creating src/test_e2e.py:

from query import search
from llm import answer

question = "What maintenance was performed on the gearbox?"
results = search(question, equipment_line="turbine-A")
answer(question, results)


Run the full pipeline:

uv run python src/test_e2e.py


llama3.2:3b fits in the memory of a standard factory edge server. The LLM receives only the retrieved chunks as context, not the full document collection, which keeps responses fast and grounded in cited sources.
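
A small model can occasionally cite a source number that was never in its context. The sketch below checks the [n] markers in a reply against the number of retrieved chunks. It is an optional add-on rather than part of the original pipeline, and the names `cited_indices` and `invalid_citations` are assumptions for illustration.

```python
import re


def cited_indices(reply):
    """Return the distinct [n] citation numbers found in the reply, ascending."""
    return sorted({int(m) for m in re.findall(r"\[(\d+)\]", reply)})


def invalid_citations(reply, n_results):
    """Return citation numbers that do not correspond to any retrieved chunk."""
    return [n for n in cited_indices(reply) if not 1 <= n <= n_results]


reply = "The bearing race on shaft 2 was replaced [1], per spec TRB-004 [1][3]."
print(cited_indices(reply))         # [1, 3]
print(invalid_citations(reply, 2))  # [3]
```

In llm.py, calling `invalid_citations(reply, len(results))` after `generate` would let you flag or discard answers that reference sources the model was never given.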

Expected output:

[Screenshot: terminal output]

The pipeline is fully up and running. A technician can ask a question, get a cited answer from local maintenance records, and never need to use the internet.

Step 6: Add audit logging

IEC 62443 calls for security-relevant activity in the OT network to be traceable. Without a local audit trail, your pipeline has no record of what was queried, when, or what it returned.

Create src/audit.py:

from __future__ import annotations

import json
import logging
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("./data/audit.log")
LOG_PATH.parent.mkdir(parents=True, exist_ok=True)

# No formatter is set, so each record is written as the bare message:
# one JSON object per line.
handler = logging.FileHandler(str(LOG_PATH))
handler.setLevel(logging.INFO)

logger = logging.getLogger("actian_vectorai.audit")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def log_query(question: str, equipment_line: str, results: list, latency_ms: float) -> None:
    entry = {
        "event": "query",
        "timestamp": datetime.now(tz=timezone.utc).isoformat(),
        "question": question,
        "equipment_line": equipment_line,
        "results_returned": len(results),
        "latency_ms": round(latency_ms, 2),
    }
    logger.info(json.dumps(entry))

def log_ingestion(source_file: str, equipment_line: str, chunks_stored: int) -> None:
    entry = {
        "event": "ingestion",
        "timestamp": datetime.now(tz=timezone.utc).isoformat(),
        "source_file": source_file,
        "equipment_line": equipment_line,
        "chunks_stored": chunks_stored,
    }
    logger.info(json.dumps(entry))


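audit.py is imported by ingest.py and query.py rather than run on its own. After running an ingestion or a query, view the log with this command: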

cat data/audit.log


Expected output:

[Screenshot: terminal output]

The pipeline now keeps a structured record of every ingestion and query event in ./data/audit.log, timestamped in UTC and stored inside your security boundary.
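
Because every entry is one JSON object per line, the log is easy to review with a few lines of Python. The sketch below counts events by type and averages query latency. `summarize_audit` is a hypothetical helper, and the sample entries are made up for illustration; they only reuse the field names that audit.py writes.

```python
import json


def summarize_audit(lines):
    """Count events by type and average the query latency from JSON Lines input."""
    counts = {}
    latencies = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        event = entry.get("event", "unknown")
        counts[event] = counts.get(event, 0) + 1
        if event == "query" and "latency_ms" in entry:
            latencies.append(entry["latency_ms"])
    mean_latency = sum(latencies) / len(latencies) if latencies else None
    return {"counts": counts, "mean_query_latency_ms": mean_latency}


# Made-up sample entries using the field names from audit.py.
sample = [
    '{"event": "ingestion", "source_file": "example.pdf", "chunks_stored": 12}',
    '{"event": "query", "question": "What caused the bearing failure?", "latency_ms": 100.0}',
    '{"event": "query", "question": "What maintenance was performed?", "latency_ms": 300.0}',
]
print(summarize_audit(sample))
# {'counts': {'ingestion': 1, 'query': 2}, 'mean_query_latency_ms': 200.0}
```

To run it against the real log, open data/audit.log and pass the file object to `summarize_audit`, since iterating over a file yields one line at a time.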

Wrapping Up

You just built a local RAG pipeline that runs entirely on factory-floor hardware, serves queries during network outages, and returns cited answers from decades of maintenance records.

AI in manufacturing can operate without a cloud connection. VectorAI DB makes this possible by running entirely inside your IEC 62443 security boundary. Cut the internet connection, and the pipeline keeps working.

Your pipeline ingests PDF maintenance documents, stores embeddings in VectorAI DB at Level 2 of your OT network, and answers natural-language questions using a local LLM with no cloud dependency at any step. From here, you can extend the pipeline by adding more document types, tuning the embedding model for your specific equipment vocabulary, adding role-based query filtering by technician, or scaling ingestion across multiple equipment lines.

Find the full VectorAI DB documentation and the GitHub repository to explore further.

Join the community and learn more about Actian.