Building a RAG System in Practice (v38)
matias yoon · 2026-05-26 · via DEV Community


Real-World RAG Implementation Guide for ML Engineers

1. RAG Fundamentals: The Core Loop

Retrieval-Augmented Generation (RAG) combines information retrieval with language generation, so the model answers from retrieved documents rather than from its parameters alone. The core loop consists of three phases:

  1. Retrieval: Find relevant documents from a knowledge base
  2. Augmentation: Inject retrieved context into prompts
  3. Generation: Generate responses using the augmented prompt
# Simplified RAG Loop
class BasicRAG:
    def __init__(self, vector_db, embedding_model, llm):
        self.vector_db = vector_db
        self.embedding_model = embedding_model
        self.llm = llm

    def query(self, user_query):
        # 1. Retrieve relevant documents
        query_embedding = self.embedding_model.encode(user_query)
        relevant_docs = self.vector_db.search(query_embedding, k=5)

        # 2. Augment prompt with context
        context = "\n".join([doc.content for doc in relevant_docs])
        augmented_prompt = f"Context: {context}\n\nQuestion: {user_query}"

        # 3. Generate response
        response = self.llm.generate(augmented_prompt)
        return response

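The three phases can be tried end to end without a model or a database. The sketch below is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorDB` is a hypothetical brute-force store, and `echo_llm` is a placeholder for an actual LLM call. None of these names come from a real library.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class InMemoryVectorDB:
    """Hypothetical brute-force store: keeps (embedding, text) pairs in a list."""
    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query_embedding, k=5):
        ranked = sorted(self.items, key=lambda it: cosine(query_embedding, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def echo_llm(prompt):
    """Placeholder for a real LLM call: reports the prompt length only."""
    return f"[stub answer based on {len(prompt)} prompt characters]"

def rag_query(db, llm, user_query, k=1):
    docs = db.search(embed(user_query), k=k)                      # 1. retrieve
    context = "\n".join(docs)                                     # 2. augment
    prompt = f"Context: {context}\n\nQuestion: {user_query}"
    return docs, llm(prompt)                                      # 3. generate

db = InMemoryVectorDB()
db.add("The capital of France is Paris.")
db.add("Machine learning is a subset of artificial intelligence.")
docs, answer = rag_query(db, echo_llm, "What is the capital of France?")
print(docs[0])   # The capital of France is Paris.
print(answer)
```

Swapping `embed`, `InMemoryVectorDB`, and `echo_llm` for real components leaves `rag_query` unchanged, which is the point of keeping the three phases separate.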

2. Chunking Strategies

How documents are chunked largely determines retrieval quality: chunks that are too large dilute the match, and chunks that are too small lose context. Two common approaches follow:

Semantic Chunking

import re

import numpy as np
from sentence_transformers import SentenceTransformer

class SemanticChunker:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def chunk_by_semantic(self, text, threshold=0.7, max_sentences=20):
        # Split after sentence-ending punctuation so each sentence keeps its period
        sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
        if not sentences:
            return []
        embeddings = self.model.encode(sentences)

        # Group consecutive sentences while they stay close to the chunk's mean embedding
        chunks = []
        current_chunk = [sentences[0]]
        embedding_sum = embeddings[0].copy()

        for sentence, embedding in zip(sentences[1:], embeddings[1:]):
            # True running mean of the chunk (sum divided by sentence count)
            mean_embedding = embedding_sum / len(current_chunk)
            similarity = np.dot(mean_embedding, embedding) / (
                np.linalg.norm(mean_embedding) * np.linalg.norm(embedding)
            )
            if similarity < threshold or len(current_chunk) >= max_sentences:
                # Topic shift or size cap reached: close the chunk and start a new one
                chunks.append(' '.join(current_chunk))
                current_chunk = [sentence]
                embedding_sum = embedding.copy()
            else:
                current_chunk.append(sentence)
                embedding_sum += embedding

        chunks.append(' '.join(current_chunk))
        return chunks

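To see how a similarity threshold decides chunk boundaries without loading a model, the grouping logic can be run on hand-made vectors. This is a rough sketch, not the chunker itself: `group_by_similarity` is a hypothetical helper, the 2-D vectors are invented for illustration, and the comparison against the running mean of the current group is an assumption about how one might define "similar enough".

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def group_by_similarity(vectors, threshold=0.7):
    """Return lists of indices; a new group starts when the similarity to the
    running mean of the current group drops below `threshold`."""
    groups, current, total = [], [], None
    for i, vec in enumerate(vectors):
        if current:
            mean = [t / len(current) for t in total]
            if cosine(mean, vec) < threshold:
                groups.append(current)
                current, total = [], None
        current.append(i)
        total = list(vec) if total is None else [t + v for t, v in zip(total, vec)]
    if current:
        groups.append(current)
    return groups

# Two "topics": vectors 0-1 point along x, vectors 2-3 point along y
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
print(group_by_similarity(vectors))  # [[0, 1], [2, 3]]
```

Raising the threshold produces more, smaller groups; lowering it merges neighbouring topics, so the value is worth tuning on a sample of real documents.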

Recursive Chunking

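class RecursiveChunker:
    def __init__(self, max_chunk_size=512, overlap=50):
        if not 0 <= overlap < max_chunk_size:
            raise ValueError("overlap must be non-negative and smaller than max_chunk_size")
        self.max_chunk_size = max_chunk_size
        self.overlap = overlap

    def chunk_recursive(self, text):
        chunks = []

        def split_recursive(start=0):
            # Base case: the remaining text fits in a single chunk
            if len(text) - start <= self.max_chunk_size:
                if start < len(text):
                    chunks.append(text[start:])
                return

            # Try to split at a sentence boundary first
            end = start + self.max_chunk_size
            split_point = text.rfind('. ', start, end)
            if split_point <= start + self.overlap:
                # No usable boundary: fall back to a hard cut
                split_point = end
            else:
                split_point += 1  # keep the period with its sentence

            chunks.append(text[start:split_point])
            # Step back by `overlap` characters; `start` always advances.
            # One recursion level per chunk, so very long texts can hit
            # Python's recursion limit; convert to a loop for those.
            split_recursive(split_point - self.overlap)

        split_recursive()
        return chunks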

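Whichever chunker is used, it helps to check its output before indexing. The helper below is a hypothetical sanity check, not part of any library: `chunk_report` and its field names are assumptions, and it measures length in characters, matching the character-based `max_chunk_size` above.

```python
def chunk_report(chunks, max_chunk_size):
    """Summarize chunk lengths and flag chunks that are empty or over budget."""
    lengths = [len(c) for c in chunks]
    return {
        "count": len(chunks),
        "longest": max(lengths, default=0),
        "oversized": [i for i, n in enumerate(lengths) if n > max_chunk_size],
        "empty": [i for i, n in enumerate(lengths) if n == 0],
    }

report = chunk_report(["First sentence.", "", "x" * 600], max_chunk_size=512)
print(report)
# {'count': 3, 'longest': 600, 'oversized': [2], 'empty': [1]}
```

A non-empty `oversized` or `empty` list usually points to a bug in the splitting logic rather than to unusual input.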

3. Embedding Model Selection

Choosing the right embedding model affects both performance and cost:

# Model comparison benchmark
import time
from sentence_transformers import SentenceTransformer

def benchmark_embeddings():
    models = {
        "all-MiniLM-L6-v2": {
            "dimensions": 384,
            "size_mb": 80,
            "speed": "fast"
        },
        "all-mpnet-base-v2": {
            "dimensions": 768,
            "size_mb": 400,
            "speed": "medium"
        },
        "BAAI/bge-small-en": {
            "dimensions": 384,
            "size_mb": 120,
            "speed": "fast"
        }
    }

    test_sentences = [
        "The quick brown fox jumps over the lazy dog",
        "Machine learning models require large datasets",
        "Natural language processing enables human-like interactions"
    ]

    for name, config in models.items():
        model = SentenceTransformer(name)
        start = time.time()
        embeddings = model.encode(test_sentences)
        end = time.time()

        print(f"{name}: {end-start:.2f}s for {len(test_sentences)} sentences")
        print(f"  Dimensions: {config['dimensions']}, Size: {config['size_mb']}MB")

# Example output (timings vary by hardware):
# all-MiniLM-L6-v2: 0.15s for 3 sentences
# all-mpnet-base-v2: 0.35s for 3 sentences  
# BAAI/bge-small-en: 0.20s for 3 sentences

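Embedding dimensions also drive storage cost, since each float32 value takes 4 bytes. The back-of-the-envelope sketch below uses a hypothetical helper, `index_size_mb`, and counts raw vector data only; index structures and metadata add overhead that depends on the database.

```python
def index_size_mb(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for float32 vectors, excluding index overhead and metadata."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000

for dims in (384, 768):
    print(f"{dims} dims, 1M vectors: {index_size_mb(1_000_000, dims):.0f} MB")
# 384 dims, 1M vectors: 1536 MB
# 768 dims, 1M vectors: 3072 MB
```

Doubling the dimension doubles the raw footprint, so a 768-dimensional model has to earn its extra retrieval quality on your own data.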

4. Vector Database Comparison

| Database | Pros | Cons | Best For |
|----------|------|------|----------|
| Chroma | Easy setup, Python native, good for dev | Limited scalability | Local/development |
| Qdrant | High performance, advanced filtering | Complex setup | Production |
| pgvector | PostgreSQL integration, ACID | Requires PostgreSQL | Existing SQL systems |
| Milvus | Scalable, distributed | Steep learning curve | Large deployments |

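# Example implementation with different vector DBs
class VectorDBFactory:
    @staticmethod
    def create_vector_db(db_type, **kwargs):
        if db_type == "chroma":
            import chromadb
            client = chromadb.Client()  # in-memory client
            # kwargs are forwarded to the collection, e.g. name="documents"
            return client.get_or_create_collection(**kwargs)
        elif db_type == "qdrant":
            from qdrant_client import QdrantClient
            return QdrantClient(**kwargs)  # e.g. url="http://localhost:6333"
        elif db_type == "pgvector":
            import psycopg2
            # Returns a plain connection; the pgvector extension must be enabled in the database
            return psycopg2.connect(**kwargs)
        elif db_type == "milvus":
            from pymilvus import Collection
            # Assumes pymilvus.connections.connect(...) has already been called
            return Collection(**kwargs)
        raise ValueError(f"Unsupported vector DB type: {db_type}")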


5. Full RAG Pipeline Implementation

from sentence_transformers import SentenceTransformer
from chromadb import PersistentClient
from typing import List, Dict
import json

class CompleteRAGPipeline:
    def __init__(self, model_name="all-MiniLM-L6-v2", db_path="./chroma_db"):
        # Initialize components
        self.embedding_model = SentenceTransformer(model_name)
        # PersistentClient (chromadb >= 0.4) stores the collection on disk at db_path
        self.vector_client = PersistentClient(path=db_path)
        self.collection = self.vector_client.get_or_create_collection("documents")

        # Simple LLM placeholder (replace with actual implementation)
        self.llm = self._simple_llm_response

    def _simple_llm_response(self, prompt):
        # This would be replaced with actual LLM call
        return f"Generated response to: {prompt[:50]}..."

    def add_documents(self, documents: List[Dict]):
        """Add documents to the vector database"""
        embeddings = self.embedding_model.encode([doc['content'] for doc in documents])

        # Upsert into Chroma so re-running with the same ids overwrites instead of duplicating
        self.collection.upsert(
            embeddings=embeddings.tolist(),
            documents=[doc['content'] for doc in documents],
            metadatas=[doc.get('metadata', {}) for doc in documents],
            ids=[doc['id'] for doc in documents]
        )

    def search_and_generate(self, query: str, top_k: int = 5):
        """Main RAG workflow"""
        # 1. Retrieve (never request more results than the collection holds)
        query_embedding = self.embedding_model.encode([query])
        results = self.collection.query(
            query_embeddings=query_embedding.tolist(),
            n_results=max(1, min(top_k, self.collection.count())),
            include=['documents', 'metadatas']
        )

        # 2. Augment
        retrieved_docs = results['documents'][0]
        context = "\n---\n".join(retrieved_docs)
        augmented_prompt = f"""
Context: {context}
Question: {query}
Answer:"""

        # 3. Generate
        response = self.llm(augmented_prompt)
        return {
            "query": query,
            "context": context,
            "response": response,
            "retrieved_docs": retrieved_docs
        }

# Usage example
pipeline = CompleteRAGPipeline()

# Add sample documents
sample_docs = [
    {
        "id": "1",
        "content": "The capital of France is Paris. Paris is known for the Eiffel Tower.",
        "metadata": {"source": "wiki"}
    },
    {
        "id": "2", 
        "content": "Machine learning is a subset of artificial intelligence that focuses on algorithms.",
        "metadata": {"source": "tech_blog"}
    }
]

pipeline.add_documents(sample_docs)
result = pipeline.search_and_generate("What is the capital of France?")
print(json.dumps(result, indent=2, ensure_ascii=False))


6. Advanced Techniques

Query Transformation

class QueryTransformer:
    def __init__(self):
        self.transformations = [
            self.expand_query,
            self.rephrase_query,


---

📥 **Get the full guide on Gumroad**: https://gumroad.com/l/auto ($7)
