惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

DEV Community

Vibe Coding Problems: 7 Visual Bugs AI Code Generators Always Ship The Quiet AI War Inside Your Browser The 12-Line Anti-Bot Trick That Saved Our Airdrop Snapshot From Sybil Farms Building a production-ready SaaS dashboard in Next.js 16 — Recharts, TanStack Table, dark mode, and collapsible sidebar Why 2026 Belongs to Agentic AI (And How to Build Your First Local Agent) It Was 2024 When We Tried to Outsmart the Treasure Hunt Engine I Found a Tool That Generates a Complete .NET 8 or Java Spring Boot API From SQL Schema in 30 Seconds I Added a 4th Agent That Audits My Other Agents. It Caught My Strategist Procrastinating for 3 Weeks. Streaming LLM responses to the browser in Go (Server-Sent Events) How We Publish and Manage Educational Admission Updates at Scale on DailyAxom A prompt is not a conversation. It's a component contract. How to Pass the EAA 2025 Accessibility Audit — A Step-by-Step WCAG Checklist Building an Autonomous MCP Lead Generation System with Hermes Agent LangGraph 워크플로우 템플릿 (v40) How I Built 100 Browser-Based Image Tools With No Server (FFmpeg WASM, PDF-lib, AI Background Removal) Nginx CVE-2026-9256, AI Prompt Injection Defenses, and Claude AI Data Leak Demo Scaling RAG for 10M+ Docs, .md Agent Memory, & Claude Code for Motion Graphics Diagram as Code with draw.io DuckDB Delta, PostgreSQL 17 Migration, & SQLite Optimization Deep Dives Windows 11 Microsoft Account Login Recovery During Internet Restrictions The Linux Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again) Spec-Driven Development Without an IDE: I Generated NestJS, Go, Spring Boot, Laravel, and Rust Apps From a Single PRD File Components are states Edge SEO y Middleware: Cómo Interceptar a Googlebot y LLMs antes de llegar a tu Servidor Context window exceeded at turn 23. Here's how I track token usage without a tokenizer. My Hermes agent spent $3 before I noticed. Now it can't. My Hermes agent's stop condition was a 40-line if/elif chain. I replaced it with 3 lines. My agent kept hitting context limits. This one function fixed it. Create and configure Azure Firewall Your Hermes agent's audit log is leaking customer emails. Here's a 100-line lib that fixes that. My agent kept forgetting what it was doing. A scratchpad fixed it. I replaced 200 lines of ad-hoc state management in my Hermes agent with one object. Per-Key Rate Limiting for Agent Tool Calls: Stop One User From Breaking Everything Composable Output Guardrails: Filter Agent Responses Before They Reach Users Sanitize Your LLM Message Lists Before Every API Call Thread a Run ID Through Every Agent Call So You Can Debug Anything Normalize Provider Error JSON So Your Agent Can Actually Handle Failures Priority Queue for Agent Sub-Tasks: Stop Processing Low-Priority Work First Static Lint Rules for Your LLM Prompts (Before They Hit Production) tool-call-budgets: Stop Runaway Agent Loops Before They Hit Your Invoice Step Through Your Agent's Failures Like a Debugger The Simplest Stop Condition: A Hard Cap on Agent Loop Iterations Score Your Agent's Responses With a 0.0-1.0 Rubric (No LLM Judge Required) Fix Bad Structured Output by Feeding the Error Back to the Model Building an effective Storyblok Tool Plugin with SvelteKit How to Get Your Renault / Dacia Radio Code for Free RAG 시스템 실전 구축 (v39) Retraction — scrml’s Living Compiler I built a fitness app where the AI roasts you for eating pizza (and hypes you when you PR) The Top SaaS Founder Communities on Discord (Beyond the AI Hype) I Built a Production-Grade Async Job Queue from Scratch — Here's Everything That Actually Happened How to watch SMS from multiple Android phones in one iOS app We Didn’t Want Another AI Wrapper — So We Explored a High-Speed Hermes Orchestrator for Engineering Crews Multi-tenant além do TenantId: problemas reais e aprendizados em sistemas .NET After failing 23 times, I am sharing How I Actually Prepare for a Tech Interview Every Single Time Now. I built an app that works like a nutritionist for your brain. Here's what happened in 7 days. GoBadge Dynamic: From Module Stats to Universal Badges LangGraph 워크플로우 템플릿 (v39) The git Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again) Six Levels of MCP Servers One container to replace Grafana + Loki + Tempo + Prometheus The Request/Response Cycle, HTTP, Auth, JWT, OAuth & Sessions — Explained Properly Python Week 3: We Stopped Repeating Ourselves (Loops!) Creating a Custom Grid Editor tool in Unreal Engine 我做了个付费 Telegram bot。Telegram Stars 实际给开发者多少钱,我算了一笔账。 I Got 96% Recall on LLM Hallucination Detection With No ML Model – Just 50 Lines of Python A practitioner's guide to getting more value out of AI coding: agent quality & token optimization How to Handle Telegram Albums in Telegraf I Built a Multilingual Spam Detection Dataset with 149K+ Messages Across 23 Languages How to Handle Telegram Albums in grammY RAG 시스템 실전 구축 (v38) Beyond Pip Install: Why Your AI Agent Needs a "Hermetic" Life-Support System to Survive Resume Building using HTML & CSS SpecFlow: Multi-Agent SDD in Cursor (4 phases, /approve, single code writer) Running ASR for smart homes in the NPU of Intel processors "Building a CI/CD Pipeline From Scratch: A Practical Guide for Developers (with GitHub Actions)" SpecFlow: SDD multi-agente en Cursor (4 fases, /approve, un solo escritor de código) How to Extract Your Full Team Hierarchy from HubSpot (the API doesn't expose it) Adobe Commerce Cloud now costs $40k/year. We migrated from Adobe Commerce to Magento Open Source — here's the honest breakdown .klickd v4.0.0 — Portable AI memory with constraints, strict schemas, and test vectors We Trust Third Party Code, It’s Time to Trust AI Generated Code LangGraph 워크플로우 템플릿 (v38) Sustainable AI Starts with Efficient AI Find Remove duplicated files in Google Drive How to Detect GPU Waste in a Kubernetes Cluster The Privacy Bug in My First Chrome Extension (And How to Avoid It) Serverless Mental Models: What They Don't Tell You Before You Build Preventing GPT hallucination in automated content pipelines: how I structure Make.com flows with data injection Hmm, where were we? AI Visibility Tools, Math Proofs, and Stripped Guardrails Shape Developer Landscape How AI and Electronics Are Changing Healthcare Devices: The Future of Smart Healthcare Author: Shivam Wakade | Founder, PrivSR Making Claude Sound Like Optimus Prime Understanding Reinforcement Learning with Human Feedback Part 5: Training the Reward Model with Loss Functions Learning Progress Pt.20 How Secure LoRa Communication Devices Work: Building the Future of Private and Long-Range Connectivity Author: Shivam Wakade | Founder, PrivSR How I Rebuilt an RPG Map Editor with Rust, React, and WASM Building a System That Automates YouTube Post-Production Building a 100% Serverless Digital Asset Packager in the Browser Game Recommended AI What is Human-In-The-Loop (HITL)?
RAG 시스템 실전 구축 (v40)
matias yoon · 2026-05-26 · via DEV Community

RAG 시스템 실전 구축 (v40)

개요

RAG(Retrieval-Augmented Generation) 시스템은 대규모 언어 모델(LLM)의 지식 범위를 확장하고, 정확한 정보를 기반으로 출력을 생성하기 위한 핵심 아키텍처입니다. 이 가이드는 실전에서 구현 가능한 RAG 시스템을 구축하고 최적화하는 방법을 다룹니다.

1. RAG 기본 구조

RAG 시스템은 다음과 같은 루프로 작동합니다:

  1. 검색(Retrieval): 사용자 질문과 유사한 문서 검색
  2. 보완(Augmentation): 검색된 문서와 질문을 결합하여 프롬프트 생성
  3. 생성(Generation): LLM이 결합된 프롬프트를 기반으로 답변 생성
# RAG 루프 기본 구조
class BasicRAG:
    def __init__(self, embedding_model, vector_db, llm):
        self.embedding_model = embedding_model
        self.vector_db = vector_db
        self.llm = llm

    def process_query(self, query):
        # 1. 질문 임베딩
        query_embedding = self.embedding_model.encode(query)

        # 2. 문서 검색
        relevant_docs = self.vector_db.search(query_embedding, k=5)

        # 3. 프롬프트 구성
        prompt = self._build_prompt(query, relevant_docs)

        # 4. 생성
        answer = self.llm.generate(prompt)
        return answer

    def _build_prompt(self, query, docs):
        context = "\n\n".join([doc.content for doc in docs])
        return f"질문: {query}\n\n문맥: {context}"

Enter fullscreen mode Exit fullscreen mode

2. 청킹 전략

청킹은 문서를 모델이 처리할 수 있는 크기로 나누는 과정입니다.

2.1 의미적 청킹

from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
import numpy as np

class SemanticChunker:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50,
            separators=["\n\n", "\n", " ", ""]
        )

    def chunk_documents(self, documents):
        chunks = []
        for doc in documents:
            # 문서를 작은 청크로 분할
            doc_chunks = self.text_splitter.split_text(doc.content)

            # 각 청크의 임베딩을 계산
            chunk_embeddings = self.model.encode(doc_chunks)

            # 의미적 유사도 기반으로 청크를 재조합
            chunks.extend(self._semantic_merge(doc_chunks, chunk_embeddings))
        return chunks

    def _semantic_merge(self, chunks, embeddings):
        # 유사도가 높은 청크를 병합하는 로직
        # 간단한 예: 0.8 이상 유사도 청크는 병합
        merged_chunks = []
        return chunks  # 실제 구현은 복잡한 로직 필요

Enter fullscreen mode Exit fullscreen mode

2.2 재귀적 청킹

class RecursiveChunker:
    def __init__(self, chunk_size=1000, chunk_overlap=100):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk_recursive(self, text):
        # 재귀적으로 청킹 수행
        chunks = []
        start = 0

        while start < len(text):
            end = min(start + self.chunk_size, len(text))
            chunk = text[start:end]
            chunks.append(chunk)

            # 오버랩 부분 추가
            start = max(0, end - self.chunk_overlap)

        return chunks

Enter fullscreen mode Exit fullscreen mode

2.3 에이전트 기반 청킹

class AgentBasedChunker:
    def __init__(self, llm):
        self.llm = llm

    def chunk_with_agent(self, text):
        # LLM이 청킹 전략을 결정
        prompt = f"""
        문서를 의미 있는 단위로 나누어주세요:
        {text}

        각 청크는 500자 이내로 구성하고, 의미 단위를 유지하세요.
        """
        response = self.llm.generate(prompt)
        return response.split("\n\n")  # 예시: 줄바꿈 기준으로 분할

Enter fullscreen mode Exit fullscreen mode

3. 임베딩 모델 선택과 비교

import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel
import numpy as np

class EmbeddingBenchmark:
    def __init__(self):
        self.models = {
            "all-MiniLM-L6-v2": SentenceTransformer("all-MiniLM-L6-v2"),
            "all-mpnet-base-v2": SentenceTransformer("all-mpnet-base-v2"),
            "sentence-t5-base": SentenceTransformer("sentence-t5-base")
        }

    def benchmark_models(self, texts):
        results = {}
        for name, model in self.models.items():
            # 임베딩 생성
            embeddings = model.encode(texts)
            results[name] = {
                "embedding_shape": embeddings.shape,
                "memory_usage": embeddings.nbytes,
                "speed": self._measure_inference_time(model, texts)
            }
        return results

    def _measure_inference_time(self, model, texts):
        import time
        start = time.time()
        model.encode(texts)
        end = time.time()
        return end - start

# 사용 예시
benchmark = EmbeddingBenchmark()
texts = ["문서 내용 1", "문서 내용 2", "문서 내용 3"]
results = benchmark.benchmark_models(texts)

Enter fullscreen mode Exit fullscreen mode

4. 벡터 데이터베이스 비교

4.1 Chroma

import chromadb
from chromadb import Client

class ChromaVectorDB:
    def __init__(self, collection_name="rag_collection"):
        self.client = Client()
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )

    def add_documents(self, documents, embeddings):
        ids = [str(i) for i in range(len(documents))]
        self.collection.add(
            ids=ids,
            embeddings=embeddings,
            documents=[doc.content for doc in documents]
        )

    def search(self, query_embedding, k=5):
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=k
        )
        return results["documents"][0]

Enter fullscreen mode Exit fullscreen mode

4.2 Qdrant

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

class QdrantVectorDB:
    def __init__(self, host="localhost", port=6333):
        self.client = QdrantClient(host=host, port=port)
        self.collection_name = "rag_collection"

    def create_collection(self):
        self.client.create_collection(
            collection_name=self.collection_name,
            vectors_config={"size": 384, "distance": "Cosine"}
        )

    def add_documents(self, documents, embeddings):
        points = [
            {
                "id": i,
                "vector": embedding.tolist(),
                "payload": {"content": doc.content}
            }
            for i, (doc, embedding) in enumerate(zip(documents, embeddings))
        ]
        self.client.upsert(
            collection_name=self.collection_name,
            points=points
        )

    def search(self, query_embedding, k=5):
        results = self.client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding.tolist(),
            limit=k
        )
        return [point.payload["content"] for point in results]

Enter fullscreen mode Exit fullscreen mode

4.3 pgvector


python
import psycopg2
from psycopg2.extras import Json
import numpy as np

class PGVectorDB:
    def __init__(self, connection_string):
        self.conn = psycopg2.connect(connection_string)

    def create_table(self):
        with self.conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS rag_documents (
                    id SERIAL PRIMARY KEY,
                    content TEXT,
                    embedding VECTOR(384),
                    metadata JSONB
                )
            """)
            cur.execute("""
                CREATE INDEX IF NOT EXISTS idx_embedding 
                ON rag_documents USING ivfflat (embedding vector_cosine_ops)
            """)
        self.conn.commit()

    def add_document(self, content, embedding, metadata=None):
        with self.conn.cursor() as cur:
            cur.execute("""
                INSERT INTO rag_documents (content, embedding, metadata)
                VALUES (%s, %s, %s)
            """, (content, embedding.tolist(), Json(metadata)))
        self.conn.commit()

    def search(self, query_embedding, k=5):
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT content FROM rag_documents
                ORDER BY embedding <-> %s
                LIMIT %s
            """, (query_embedding.tolist(), k))


---

📥 **Get the full guide on Gumroad**: https://gumroad.com/l/auto ($7)

Enter fullscreen mode Exit fullscreen mode