惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

Perl 🐪 Weekly #774 - Perl is too HOT How to Track AI Usage Without Losing Revenue (Complete Guide) 77 Rules Later: What Graduating Our First Stack Actually Looked Like When Premature Scaling Leads to Operator Burnout Multi-Repo Microservice Changes Are a Coordination Problem. I Solved It With AI Agent Teams. The Next Frontier: How Multi-Agent Systems are Redefining Productivity The Kimwolf Bust Just Outed Android Webcams as Botnet Fodder — Here's the Question Every Repurposed-Phone Camera Setup Has to Answer I'm an autonomous AI agent. I shipped 18 fixes to myself in one session. Building a Secure Future with Zero Trust Security Architecture Asynchronous Functions in Dart How I migrated magic-link login from Resend to AWS SES + Lambda five days before launch Edge Computing He creado una empresa ficticia IT/OT para poder encontrar sus vulnerabilidades y reforzar su seguridad en sus activos críticos Why I Built @editora/react I built a tiny UGC script generator because hooks are the hardest part The Phone Is Becoming the New Terminal Why Most AI Music Tools Feel Wrong to Developers Goroutines vs. Promises: Why Go and JavaScript Look at Concurrency Completely Differently How I Use Antigravity 2.0 to Navigate Open-Source Codebases and Make Better Technical Decisions Understanding Basic HTML & CSS Concepts for Beginners Go Error Handling: Annoying or Awesome? Your To-Do List Doesn't Know You — So I Gave Mine Three Brains Shell Basics (Bash, Zsh, Sh) Free MongoDB GUI Tool for Developers, Students, and Teams Designing High-Performance Blockchain Indexers Choosing Models for an Agentic Chat App on Amazon Bedrock How Smart Growth Teams Automate Their Marketing Stack in 2026 (Without Hiring More People) What I Learned About Memory-Augmented AI Agents Seven Docker Tips Every Engineer Should Know (from Docker Captains) Welcome to the Fast-Food Era of Testing: Over-Weight by Tests How to use Claude in vscode? Prompt Engineering for Automated Evaluation: Making LLMs the Judge in AI Builder Solutions Full Stack Projects Are Not Enough Anymore Virtualization & Cloud Basics Orakle: Turning Raw Blockchain Data into Intelligence with Gemma 4 Building an Autoposting Pipeline with Hermes Agent: Why Waterfall Beats Parallel, and the Edge Cases Nobody Talks About OpenShift Virtualization Migration Advisor — Local-First, Powered by Gemma 4 26B MoE WebMCP is coming — so I’m building webmcp.js I Disappeared for 4 Months After Launch - Here's What Brought Me Back Jira Is Turing-Complete (And You've Been Coding in It) NyayAI: Building an AI Legal Assistant for 1.4 Billion People — A Technical Deep Dive E-commerce Order Automation: Stripe + Invoice + Shipping Workflow How to Evaluate AI Agents: LLM-as-Judge Tutorial The Interview Prep Stack I Used as a Senior Software Engineer Targeting Big Tech Gemma4 Challenge OptiLearn - Powered by Google Gemma 4 Aura — The Gemma 4 Powered Agentic Web Copilot & Self-Healing Accessibility Engine I built a tool that catches misleading charts using Gemma 4 running locally Worklog companion with Gemma4 GBase: Building LLM Agents That Actually Learn from Their Mistakes Blossom — a small step toward student mental wellbeing WordPress Performance Monitoring: A Complete Guide Principal Components in TypeScript (Part 4) When three sharp wallets agree: what consensus signals on Polymarket actually mean I Built a Fail-Fast Rust Scheduler with Background OAuth Auto-Refresh (Part 2) Sharing is caring How Putting Faces (Literally) to My AI Garden Images Gave It a Personality Sofi Log #001: Thailand's Tourism Tax & the 180-Day AI Surveillance Wall Sofi Log #006: Decentralized IP-Address Obfuscation Specs Sofi Log #008: Bypassing Legacy Cross-Border Bank Fee Traps Secret Rotation Automation: The Operational Cost of Security Sofi Log #009: Portable Identity & DID Passport Framework Sofi Log #011: Autonomous Smart Treasury Repatriation Specs History of Linux & Unix I asked Claude if my plan was on track for the goal — and got an honest 'No' PHPStan 'expects X, Y given' — the trace it doesn't give you Using Gemma4 2B to Assist Community Health Workers Open-source Playwright wrapper that passes bot.sannysoft.com, pixelscan, and CreepJS in headless mode Policy Storyteller: Turning Nepali Bills into Human Stories with Gemma 4 Avoid Cross Module Dependencies with Dependency Cruiser Invariant-Driven Architecture: 20M transactions on a €80/mo Cloud VM. Stop using external npm packages just to generate a UUID v4 Choosing the Right Gemma 4 Model Matters More Than Choosing the Best One Your LLM Is Not an Agent. Your Framework Is Not Enough. You Need a Harness. From HTTPS to UCP: Shopping Is About to Stop Being Your Problem From Creation to Consumption: How Antigravity 2.0 and Gemini Spark Are Defining the Agentic Era 10 Mistakes I Wish I Knew Before Taking the CKA Exam AI That Actually Does Stuff: Autonomous Agents Explained Exploring AI workflow Orchestration: Comparing Weft, Python & Alternative Pipeline Approaches El Poder del Aprendizaje Federado: Cuando los Algoritmos Distribuidos Entrenan a la IA Email Marketing Automation in 2026: 5 Tools (and 1 Self-Hosted) Through Their APIs A Replay Runbook For Missed Publishing Windows Why timeout handling matters more than most backend logic How I Make $6,800/Month Selling Niche VS Code Extensions Model Routing Cost Checklist: Hosted APIs, Open Models, Or Self-Hosted Inference? ORA-00207 오류 원인과 해결 방법 완벽 가이드 Deno 2.8 Operator Upgrade Checklist: CI, Lockfiles, Node Compatibility, And Rollback AI-Discovered Vulnerabilities Need A Triage Queue, Not A Panic Channel AI Agent Workboards Need Audit Controls Before They Need More Agents Demystifying DevRel: What It Actually Is (And Why Should You Become One?) Your AI, Your Device, Your Data - Introducing Aide Gemma 4 GenAI Coach - GenAI Concepts Made Easy with an Interactive Playground QuietPulse - Mood Tracker Principal Components in TypeScript (Part 3) The pgAudit Attribution Gap: Why Role-Level Logging Fails GDPR and How to Close It Gemma 4 CAD Orchestrator I built a local Postgres triage co-pilot because HIPAA says I can't paste plans into ChatGPT or Claude Live Holographic Editor In Fractal Time Everbench: A document management system with Local Intelligence Instanton in Fractal Time
RAG 시스템 실전 구축 (v26)
matias yoon · 2026-05-25 · via DEV Community

RAG 시스템 실전 구축 (v26)

개요

이 가이드는 실전에서 RAG(Retrieval-Augmented Generation) 시스템을 구축하는 데 필요한 모든 단계를 다룹니다. 개발자들은 단순한 RAG 시스템을 구현하는 것에서 벗어나 실제 운영 환경에서의 성능, 비용, 유지보수를 고려한 완전한 솔루션을 만들고자 합니다.

1. RAG 기초: 검색 → 보완 → 생성 루프

RAG 시스템은 세 가지 핵심 단계로 구성됩니다:

  1. 검색(Retrieval): 사용자의 질문과 관련된 문서 조각들을 데이터베이스에서 찾습니다.
  2. 보완(Augmentation): 검색된 문서를 프롬프트에 추가하여 생성 모델이 더 많은 컨텍스트를 갖도록 합니다.
  3. 생성(Generation): 보완된 프롬프트를 기반으로 답변을 생성합니다.
# 기본 RAG 루프 구현
class BasicRAG:
    def __init__(self, embedding_model, vector_db):
        self.embedding_model = embedding_model
        self.vector_db = vector_db

    def retrieve(self, query, top_k=5):
        query_embedding = self.embedding_model.encode([query])
        return self.vector_db.search(query_embedding, top_k)

    def generate(self, query, context):
        prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
        return self.llm.generate(prompt)

    def run(self, query):
        context = self.retrieve(query)
        return self.generate(query, context)

Enter fullscreen mode Exit fullscreen mode

2. 청킹 전략: 의미적, 재귀적, 에이전트 기반

청킹 전략은 문서를 모델이 이해할 수 있는 단위로 분할하는 방법입니다.

의미적 청킹 (Semantic Chunking)

from sentence_transformers import SentenceTransformer
import numpy as np

class SemanticChunker:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.model = SentenceTransformer(model_name)

    def chunk(self, text, max_chunk_size=512):
        sentences = text.split('.')
        embeddings = self.model.encode(sentences)

        # 문장 간 거리 계산 후 의미 있는 단위로 분할
        chunks = []
        current_chunk = []
        current_length = 0

        for sentence, embedding in zip(sentences, embeddings):
            if current_length + len(sentence.split()) > max_chunk_size:
                chunks.append(' '.join(current_chunk))
                current_chunk = [sentence]
                current_length = len(sentence.split())
            else:
                current_chunk.append(sentence)
                current_length += len(sentence.split())

        if current_chunk:
            chunks.append(' '.join(current_chunk))
        return chunks

Enter fullscreen mode Exit fullscreen mode

재귀적 청킹 (Recursive Chunking)

class RecursiveChunker:
    def __init__(self, chunk_size=512, overlap=50):
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, text):
        chunks = []
        start = 0

        while start < len(text):
            end = min(start + self.chunk_size, len(text))
            chunk = text[start:end]

            # 오버랩 추가
            if start > 0:
                overlap_start = max(0, start - self.overlap)
                chunk = text[overlap_start:end]

            chunks.append(chunk)
            start = end - self.overlap

        return chunks

Enter fullscreen mode Exit fullscreen mode

3. 임베딩 모델 선택과 비교

모델 비교 테스트

import time
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel
import torch

class EmbeddingBenchmark:
    def __init__(self):
        self.models = {
            'all-MiniLM-L6-v2': SentenceTransformer('all-MiniLM-L6-v2'),
            'all-mpnet-base-v2': SentenceTransformer('all-mpnet-base-v2'),
            'sentence-tiny': SentenceTransformer('sentence-tiny')
        }

    def benchmark(self, texts):
        results = {}
        for name, model in self.models.items():
            start_time = time.time()
            embeddings = model.encode(texts)
            end_time = time.time()

            results[name] = {
                'time': end_time - start_time,
                'memory': len(embeddings) * len(embeddings[0]) * 4  # float32
            }
        return results

# 사용 예시
benchmark = EmbeddingBenchmark()
test_texts = ["This is test text 1", "This is test text 2", "Another sample text"]
results = benchmark.benchmark(test_texts)

Enter fullscreen mode Exit fullscreen mode

성능 기준 정리

  • all-MiniLM-L6-v2: 최적의 균형 (256차원, 50MB)
  • all-mpnet-base-v2: 높은 정확도 (768차원, 100MB)
  • sentence-tiny: 빠른 속도 (128차원, 20MB)

4. 벡터 데이터베이스 비교

Chroma vs Qdrant vs pgvector vs Milvus

# Chroma 예제
import chromadb
from chromadb.utils import embedding_functions

class ChromaRAG:
    def __init__(self):
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(
            name="rag_collection",
            embedding_function=embedding_functions.DefaultEmbeddingFunction()
        )

    def add_documents(self, documents, ids):
        self.collection.add(documents=documents, ids=ids)

    def search(self, query, top_k=5):
        results = self.collection.query(
            query_texts=[query],
            n_results=top_k
        )
        return results['documents'][0]

# Qdrant 예제
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

class QdrantRAG:
    def __init__(self):
        self.client = QdrantClient(host="localhost", port=6333)
        self.collection_name = "rag_collection"

    def create_collection(self):
        self.client.recreate_collection(
            collection_name=self.collection_name,
            vectors_config={"size": 384, "distance": "Cosine"}
        )

    def search(self, query, top_k=5):
        results = self.client.search(
            collection_name=self.collection_name,
            query_vector=query,
            limit=top_k
        )
        return [hit.payload['text'] for hit in results]

# pgvector 예제
import psycopg2
from psycopg2.extras import RealDictCursor

class PGVectorRAG:
    def __init__(self, connection_string):
        self.conn = psycopg2.connect(connection_string)

    def search(self, query_embedding, top_k=5):
        with self.conn.cursor(cursor_factory=RealDictCursor) as cursor:
            cursor.execute("""
                SELECT text, 1 - (embedding <-> %s) as similarity
                FROM documents
                ORDER BY similarity DESC
                LIMIT %s
            """, (query_embedding, top_k))
            return cursor.fetchall()

Enter fullscreen mode Exit fullscreen mode

성능 비교

데이터베이스 성능 메모리 사용량 설치 복잡도
Chroma 빠름 낮음 쉬움
Qdrant 빠름 중간 중간
pgvector 중간 높음 어려움
Milvus 빠름 높음 어려움

5. 전체 RAG 파이프라인 코드


python
import numpy as np
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.utils import embedding_functions
import logging

class CompleteRAGPipeline:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.embedding_model = SentenceTransformer(model_name)
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(
            name="documents",
            embedding_function=embedding_functions.DefaultEmbeddingFunction()
        )
        self.logger = logging.getLogger(__name__)

    def setup(self, documents, ids):
        """문서 추가 및 임베딩"""
        try:
            embeddings = self.embedding_model.encode(documents)
            self.collection.add(
                documents=documents,
                embeddings=embeddings.tolist(),
                ids=ids
            )
            self.logger.info(f"Added {len(documents)} documents")
        except Exception as e:
            self.logger.error(f"Error adding documents: {e}")

    def retrieve_context(self, query, top_k=5):
        """문서 검색"""
        query_embedding = self.embedding_model.encode([query])
        results = self.collection.query(
            query_texts=[query],
            n_results=top_k,
            include=['documents', 'distances']
        )
        return results['documents'][0], results['distances'][0]

    def generate_answer(self, query, context):
        """답변 생성 (모델 사용)"""
        prompt = f"""
        사용자 질문: {query}


---

📥 **Get the full guide on Gumroad**: https://gumroad.com/l/auto ($7)

Enter fullscreen mode Exit fullscreen mode