A Unified View of AI Evolution: From Machine Learning to LLMs, RAG, and Fine-Tuning
Naimul Karim · 2026-04-28 · via DEV Community


Artificial Intelligence has progressed far beyond its early rule-based origins. What once depended on predefined logic has evolved into systems that can learn from data, reason through problems, and even generate entirely new content. This transformation is largely driven by Machine Learning (ML), where algorithms improve their performance by identifying patterns in large datasets rather than following hard-coded instructions.

Building on ML, Deep Learning introduces neural networks—layered computational structures inspired by the human brain. These networks excel at processing complex data such as images, speech, and text. The next leap in this evolution is Generative AI, which shifts the focus from analyzing data to creating it. Whether producing text, images, or audio, generative systems mimic human creativity in increasingly sophisticated ways.

Large Language Models: The Core of Modern GenAI

At the center of today’s generative revolution are Large Language Models (LLMs). These models are designed to interpret and produce human-like language, enabling natural conversations, content generation, and problem-solving.

Most modern LLMs are built on the Transformer architecture, introduced in the 2017 paper “Attention Is All You Need.” This architecture uses attention mechanisms to weigh how strongly each word relates to every other word in its context, making it far more effective than earlier sequence models.
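
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. The toy 2-dimensional vectors and the function names are assumptions for illustration. Real Transformers add learned projections, multiple heads, and far larger dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy "token" vectors (hypothetical numbers, not real embeddings).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(x, x, x)
```

Each row of `result` is a blend of all three input vectors, weighted by how similar they are to that row's query.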

Some prominent LLM families include:

  • OpenAI models such as GPT-4.5, GPT-4o, and smaller optimized variants
  • Anthropic’s Claude series (e.g., Claude 3.5 Sonnet, Claude 3 Opus)
  • Meta’s Llama models (e.g., Llama 3.x series)
  • Google’s Gemini models

These models power a wide range of applications, from chatbots and virtual assistants to marketing content generation, document summarization, and even software development tasks like debugging and code generation.

How LLMs Work in Practice

Accessing LLMs

Users can interact with LLMs through intuitive interfaces (chat-based systems) or integrate them into applications using APIs.

Prompting and Instructions

To guide an LLM toward the desired output, users provide structured inputs—this process is known as prompt engineering. The clarity and design of prompts significantly influence the quality of responses.

Understanding Language via Embeddings

LLMs convert text into numerical representations called embeddings. These vectors capture semantic meaning, enabling the model to understand relationships between words, phrases, and broader contexts.
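
As a rough illustration of how vectors can capture meaning, the sketch below compares hand-written embeddings with cosine similarity. The three-dimensional vectors are invented for this example. Real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand so related words point the same way.
embeddings = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

sim_close = cosine_similarity(embeddings["river"], embeddings["stream"])
sim_far = cosine_similarity(embeddings["river"], embeddings["invoice"])
```

Here `sim_close` comes out much higher than `sim_far`, which is the property that lets systems search by meaning rather than exact wording.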

Controlling Output with Temperature

LLMs do not always produce the same answer. A parameter called temperature controls how deterministic or creative the output is. Lower values lead to predictable responses, while higher values increase variability and creativity.
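
One common way temperature is applied is by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes three candidate tokens with made-up logits. The function name is hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
```

With these toy logits, the low-temperature distribution `cold` puts almost all probability on the first token, while `hot` spreads it more evenly across the three.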

Grounding LLMs with Real-World Knowledge

Despite their capabilities, LLMs are inherently general-purpose. They do not automatically know company-specific or real-time information. To make them useful in practical settings, additional context must be provided. Two key approaches enable this: Retrieval-Augmented Generation (RAG) and Fine-Tuning.

Tokenization: How Models Read Language

Computers don’t interpret language the way humans do. Instead of understanding full words or sentences directly, Large Language Models break text into smaller units called tokens. These tokens are then converted into numbers so the model can process them mathematically.

A token isn’t always a whole word—it can be:

  • A complete word (“river”)
  • A fragment of a word (“run” + “ning”)
  • Symbols or punctuation (like “?” or “.”)

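Different AI models use different methods to split text into tokens. Some, such as byte-pair encoding (BPE), build their vocabulary by merging frequently occurring character sequences, while others, such as unigram language-model tokenizers, pick the segmentation that a statistical model scores as most likely.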

On average, in English:

  • 1 token is roughly equal to 4 characters
  • Or about ¾ of a word
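
The rule of thumb above can be turned into a quick estimator. This is only a sketch: the function name is hypothetical, and actual counts depend on the specific tokenizer and the language of the text.

```python
import math

def estimate_tokens(text, chars_per_token=4):
    """Rough English-only estimate based on about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)

sample = "Large Language Models break text into smaller units called tokens."
approx = estimate_tokens(sample)
```

An estimate like this is useful for budgeting, but a request should be sized with the provider's own tokenizer when the exact count matters.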

Why this matters:

  • Efficiency: Breaking text into tokens allows models to process language in a structured way.
  • Flexibility: Even unfamiliar words can be understood by splitting them into smaller parts.
  • Cost impact: More tokens directly increase usage costs.

Example:
Sentence: “Learning AI is fun.”
Possible tokens: “Learn”, “ing”, “ AI”, “ is”, “ fun”, “.”
Each of these is internally mapped to a numeric ID for computation.
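
The example above can be reproduced with a toy tokenizer. The six-entry vocabulary and the greedy longest-match rule are assumptions made for illustration. Production tokenizers learn vocabularies of tens of thousands of entries and use more elaborate algorithms.

```python
# Hypothetical vocabulary mapping each token string to a numeric ID.
VOCAB = {"Learn": 0, "ing": 1, " AI": 2, " is": 3, " fun": 4, ".": 5}

def tokenize(text, vocab):
    """Greedy longest-match segmentation (a simplification of real subword tokenizers)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink until one matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches text at position {i}")
    return tokens

tokens = tokenize("Learning AI is fun.", VOCAB)
ids = [VOCAB[t] for t in tokens]
```

Here `tokens` holds the six pieces listed above and `ids` holds the numbers the model would actually receive.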

Context Windows: The Model’s Working Memory

LLMs don’t have unlimited memory. Instead, they operate within a fixed limit called a context window, which defines how many tokens the model can consider at one time.

Think of it as short-term memory:

  • Smaller models handle a few thousand tokens
  • Advanced models can process very large inputs, even entire books

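If the input exceeds this limit, something has to be left out. Chat applications typically drop the oldest messages first, while many APIs simply reject the oversized request. Either way, the model has no access to whatever falls outside the window.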

Why this matters:

  • Conversations: Important details from earlier messages can disappear in long chats.
  • Large files: Long reports or documents must often be split into sections.
  • Cost vs capability: Larger windows provide more context but require more resources.

Example:
Imagine summarizing a long report. If the report is longer than the model’s context window, only the most recent sections might be considered unless special techniques are used.
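
One simple way an application might keep a conversation inside the window is to retain only the most recent messages that fit. The sketch below assumes a hypothetical budget of 25 tokens and a rough 4-characters-per-token counter. Real systems use the model's tokenizer and often summarize older content instead of discarding it.

```python
def fit_to_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined size fits within max_tokens."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

def rough_count(text):
    # Assumed heuristic: about 4 characters per token, rounded up.
    return -(-len(text) // 4)

history = ["a" * 40, "b" * 40, "c" * 40]  # three messages of about 10 tokens each
window = fit_to_window(history, max_tokens=25, count_tokens=rough_count)
```

With this budget, only the last two messages survive, and the first one is exactly the kind of earlier detail that disappears in a long chat.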

Token-Based Pricing: Why Usage Adds Up

Most AI platforms charge based on token usage. This includes both:

  • Input tokens: The text you send
  • Output tokens: The text generated in response

The total cost depends on the combined number of tokens processed.

Simple breakdown:

  • Total tokens = input + output
  • Pricing is typically quoted per 1,000 or per 1,000,000 tokens, and many providers charge different rates for input and output tokens

Why this matters:

  • Efficiency saves money: Shorter prompts reduce cost
  • Control output: Limiting response length prevents unnecessary usage
  • Planning: Developers often estimate token usage before sending requests

Example:

  • Input: 300 tokens (a short question with context)
  • Output: 450 tokens (a detailed answer)
  • Total: 750 tokens

This total determines the cost of that interaction.
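
The arithmetic for this example can be sketched as follows. The per-1,000-token prices are placeholders invented for illustration, not any provider's actual rates, and the separate input and output rates reflect how many providers quote their pricing.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Cost of one request, with separate rates for input and output tokens."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost

# Hypothetical prices in dollars per 1,000 tokens.
cost = estimate_cost(300, 450, input_price_per_1k=0.001, output_price_per_1k=0.002)
```

At these placeholder rates the 750-token interaction costs a little over a tenth of a cent, which shows why cost only becomes noticeable at high request volumes.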

Retrieval-Augmented Generation (RAG)

RAG enhances LLMs by connecting them to external knowledge sources such as databases, documents, or APIs.

How It Works

  1. A user submits a query
  2. Relevant information is retrieved from a knowledge source
  3. This information is added to the model’s input
  4. The LLM generates a response grounded in both its training and the retrieved data
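
The first three steps can be sketched in a few lines. Everything here is an assumption for illustration: the three-sentence knowledge base, the word-overlap ranking (a stand-in for embedding search), and the prompt wording. The final LLM call is left out.

```python
import re

# Hypothetical knowledge base; a real system would query a database or vector index.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Shipping to Europe takes 7 to 10 days.",
]

def words(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & words(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, context_docs):
    """Add the retrieved text to the model's input ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

query = "How long do refunds take?"          # step 1: the user's query
context_docs = retrieve(query, DOCUMENTS)    # step 2: retrieval
prompt = build_prompt(query, context_docs)   # step 3: augmentation
# Step 4 would send `prompt` to an LLM; that call is omitted here.
```

Because the knowledge lives in `DOCUMENTS` rather than in the model, updating an answer only requires editing the stored text.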

Benefits

  • Produces more accurate, fact-based responses
  • Keeps information up to date without retraining
  • Cost-efficient since the base model remains unchanged

Trade-offs

  • Slightly slower due to the retrieval step

Fine-Tuning: Customizing Intelligence

Fine-tuning takes a pre-trained LLM and further trains it on domain-specific data. This process embeds specialized knowledge directly into the model.

How It Works

  • A base model is trained further on curated datasets
  • It learns domain terminology, patterns, and workflows
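
The core idea, continuing training from existing weights instead of starting from scratch, can be shown with a deliberately tiny analogy. The one-parameter model below is not an LLM, and the data is made up. It only illustrates how further training on domain data shifts what a pre-trained model has already learned.

```python
def train(weight, data, learning_rate=0.01, epochs=200):
    """Gradient descent on squared error for the one-parameter model y = weight * x."""
    for _ in range(epochs):
        for x, y in data:
            error = weight * x - y
            weight -= learning_rate * 2 * error * x
    return weight

# "Pre-training" on general data where y equals 2x (hypothetical numbers).
general_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
base_weight = train(0.0, general_data)

# "Fine-tuning" starts from the pre-trained weight and trains on domain data where y equals 3x.
domain_data = [(1.0, 3.0), (2.0, 6.0)]
tuned_weight = train(base_weight, domain_data, epochs=50)
```

After the second call, `tuned_weight` has moved from roughly 2 to roughly 3: the domain data is now baked into the parameter itself, which is also why new information requires another round of training.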

Benefits

  • Faster responses (no external lookup required)
  • Highly tailored outputs aligned with specific use cases

Trade-offs

  • Expensive in terms of compute and maintenance
  • Requires retraining to incorporate new information

RAG vs Fine-Tuning

RAG and fine-tuning serve different but complementary purposes:

  • RAG is ideal for dynamic, frequently changing knowledge
  • Fine-tuning is best for consistent, domain-specific expertise

In real-world systems, combining both often yields the best results—fine-tuning for behavior and tone, and RAG for factual accuracy and freshness.

Understanding LLM Performance

The effectiveness of an LLM is closely tied to its size, typically measured by the number of parameters (often in billions). Larger models tend to perform better on complex tasks but require significant computational resources to train and deploy.
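
A back-of-the-envelope calculation shows why parameter count drives resource needs. The sketch assumes 2 bytes per parameter (16-bit precision) and counts the weights only. Activations, caches, and training state add more on top, and the two model sizes are hypothetical.

```python
def weights_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory for the weights alone; 2 bytes per parameter corresponds to 16-bit precision."""
    return num_parameters * bytes_per_parameter / 1e9

small = weights_memory_gb(7e9)    # hypothetical 7-billion-parameter model
large = weights_memory_gb(70e9)   # hypothetical 70-billion-parameter model
```

Under these assumptions the smaller model needs about 14 GB just to hold its weights and the larger one about 140 GB, a tenfold gap in hardware before any request is served.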

This trade-off has led to the rise of smaller, efficient models—sometimes called “mini-giants”—which aim to deliver strong performance with lower cost and latency.