惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

WordPress大学
WordPress大学
D
Docker
博客园 - 聂微东
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
博客园 - 叶小钗
李成银的技术随笔
Hugging Face - Blog
Hugging Face - Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
大猫的无限游戏
大猫的无限游戏
Jina AI
Jina AI
罗磊的独立博客
小众软件
小众软件
月光博客
月光博客
量子位
雷峰网
雷峰网
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
博客园 - Franky
The Cloudflare Blog
Microsoft Azure Blog
Microsoft Azure Blog
B
Blog RSS Feed
Last Week in AI
Last Week in AI
J
Java Code Geeks
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
宝玉的分享
宝玉的分享
H
Help Net Security
腾讯CDC
T
ThreatConnect
Cyberwarzone
Cyberwarzone
S
Securelist
A
Arctic Wolf
B
Blog
有赞技术团队
有赞技术团队
Y
Y Combinator Blog
Stack Overflow Blog
Stack Overflow Blog
A
About on SuperTechFans
F
Fox-IT International blog
P
Proofpoint News Feed
The Register - Security
The Register - Security
G
GRAHAM CLULEY
C
CXSECURITY Database RSS Feed - CXSecurity.com
阮一峰的网络日志
阮一峰的网络日志
P
Privacy & Cybersecurity Law Blog
美团技术团队
博客园 - 司徒正美
Apple Machine Learning Research
Apple Machine Learning Research
Security Latest
Security Latest
F
Full Disclosure
Recent Commits to openclaw:main
Recent Commits to openclaw:main
L
Lohrmann on Cybersecurity

DEV Community

HDD Eksternal Tiba-Tiba Tidak Bisa Diakses di Windows? Ini Tiga Lapis Fix-nya I built an AI faceless video generator in 2 months — here's the stack Diffusion Language Models: How NVIDIA Nemotron-Labs Diffusion Shatters the Autoregressive Speed Ceiling llm-nano-vm v0.8.0 — deterministic FSM runtime for LLM pipelines, now with output validation and per-step timeouts From the Renaissance to the Quantum Dawn: AI, Computation, and the Next Paradigm Shift How I Built a Review Site with 800+ Articles Using AI I Built a Smart Kitchen AI with Gemma 4 That Turns Fridge Photos Into Recipes Why your vulnerability dashboard is lying to you (and how to fix it) From Abandoned Prototype to Smart AI System: Reviving Trafiq AI with GitHub Copilot Why Country/State/City Pickers Are Weirdly Hard Node.js 22 LTS — EOL Date, Support Timeline, and What Comes Next The 7-Layer Memory Architecture Behind Modern AI Agents I Imagined Hermes Agent Running an Entire Smart City — And It Changed How I See AI One backend, four products: why we bet on platform-per-brand AI's tech debt is invisible — even to AI. I solved it at the architecture layer. Why ROAS 300% Can Still Mean Losses — Gross Margin in 5 Ecommerce Verticals You Don’t Need to Try Every AI Tool to Keep Up NovelPilot: A Novel Writing Agent Powered by Gemma 4 BoxAgnts is an Out-Of-The-Box Secure AI Agent ToolBox in a WASM SandBox Gemma 4 deep dive: why a 1.5 GB model scores 37.5% on competition mathematics, how the MoE routing actually works, and which model fits your hardware. Full breakdown inside. BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090 Google Just Declared the Chat-Log Interface Dead. Here's What Neural Expressive Actually Signals for Developers. ARCHITECTURE SPECIFICATION & FORMAL SYSTEM REPORT: k501-AIONARC Notes from a Hammock What's Google Antigravity 2.0 ? Here's What the Agent Harness Actually Changes for Developers. Building an E2EE Chat App in Flask - Part 3: Keeping File Uploads Safe Google's Gemini Spark. Here's What It Actually Does for Developers. Microsoft Just Shipped MCP Governance for .NET. Here's What It Actually Enforces. How I Built a Pakistan Internet Speed Test Platform at 16 How to Build a Supervisor Agent Architecture Without Frameworks I Built My Own Corner of the Internet — Here's What It Looks Like How does VuReact compile Vue 3's defineExpose() to React? Neo-VECTR's Rift Ascent Idempotency Keys: The API Safety Net You Probably Aren't Using Building E-Commerce Sites for Niche Products: Technical Lessons from Specialty Outdoor Retailers Audit Logs: The Silent Guardian of Every Serious System Open-source SDS tooling for Japanese MHLW compliance: the gap nobody filled BetAGracevI I Built a Post-Quantum Cryptographic Identity SDK for AI Agents — Here's Why It Needs to Exist Running Claude Code across multiple repos without losing context There Are Cameras in Every Room of My House. I Put Them There. Why your AI agent loops forever (and how to break the cycle) How does VuReact compile Vue 3's defineSlots() to React? Building a Privacy-First Resume Editor with Typst WASM and React One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd From Pixels to Prescriptions: Building an Autonomous Healthcare Booking Agent with LangGraph MonoGame - A Game Engine for Those Who Love Reinventing the Wheel # Day 24: In Solana, Everything is an Account Mastering Node.js HTTP Module: Build Servers, REST APIs, and Handle Requests Mastering Node.js HTTP Module: Build Servers, REST APIs, and Handle Requests RP2040 Wristwatch Tells Time With a Vintage VU Meter Needle observations about models / 2026, may From Video Transcripts to Source-Grounded AI Notes: A Practical Look at Notesnip AI Agent Dev Environment Guide — Real Experience from an AI Living Inside a Server How I Run 7 AI Models 24/7: Multi-Agent Architecture in Practice What exactly changes with the Claude Max plan? I Revived a Broken MLOps Platform — Now It's Self-Service, Policy-Guarded, and Operationally Credible OpenAI's $2M-tokens-for-equity YC deal, decoded Why DMX Infrastructure is Still Stuck in the 90s Agent Series (2): ReAct — The Most Important Agent Reasoning Paradigm Open Source Project (No.73): Sub2API - All-in-One Claude/OpenAI/Gemini Subscription-to-API Relay I Made the Wrong Bet on Event Streaming in Our Treasure Hunt Engine #ai #productivity #chatgpt #python Symbolic Constant Conundrum From Manual RAG to Real Retrieval — Embedding-Based RAG with NVIDIA NIM Building an outbound-only WebSocket bridge for local AI agents Our System's Sins in Ghana: Why We Had to Rethink Digital Product Sales Execution Governance, AI Drift, and the Security Paradox of Runtime Enforcement Differential Pair Impedance: Why USB and HDMI Routing Is a Geometry Problem Small AI database questions can become big scans Claude Code 2.1 Agent View & /goal: Autonomous Dev Guide 2026 Your AI database agent should not see every column Rust's Low-Latency Conquest: Why We Ditched C++ for a Treasure Hunt Engine Floating-point will quietly corrupt your emissions math, and 0.1 + 0.2 already warned you Autonomous Agents: what breaks first (and why that's the real product) [2026-05-23] Agent payments are the new cloud bill footgun ORA-00069 오류 원인과 해결 방법 완벽 가이드 How I Built a Local, Multimodal Gemma 4 Visual Regression & Patch Agent: Closed-Loop Validation, Canvas Pixel Diffing, and Reproducible Benchmarks Pressure-testing Ota on Supabase: from setup prose to executable repo readiness VPC CNI en EKS: cómo dejar de pagar nodos que no usás The Future of Text Analysis: Introducing TechnoHelps Semantic Engine I built a Chrome Extension that saves product images + context directly to Google Drive & Sheets 95+ browser-based dev tools that never touch a server Running Qwen 2.5 Coder 14B Locally in Cursor with Ollama From a 10,000-line OpenSearch export script to a log analysis tool Ghost Bugs Cost $40K: A Neural Debugging Postmortem SECPAC: A Lightweight CLI Tool to Password-Protect Your Environment Variables 🚀 PasteCheck v1.7 + v1.8 — Hints that tell you what to fix, and a nudge panel that tells you where to start 8 Real Ways Developers Make Money in 2026 (Ranked by Effort) I built a free AI-powered Git CLI that writes your commit messages for you sds-converter: Converting Safety Data Sheets to MHLW Standard JSON with Rust and LLMs OpenLiDARViewer: A Browser-Based LiDAR and Point-Cloud Viewer Local-First Browser Tools: What You Should Not Upload Online Why most freelancers undercharge (and the maths behind fixing it) We built a mahjong dangerous-tile predictor calibrated on 4.97M real hands Building a Chord Progression Generator in the Browser — Music Theory in JS, Sound via Web Audio API tutorial #10: 148 Opens, 0 Replies — How My Forge Cold Email v1 Completely Failed 9 in 10 Docker Compose files skip the basic security flags How to Forward Android SMS to Telegram Automatically I built the first security scanner for MCP servers — here's what I found
DSA Application in Real Life: How Git Diff Works: LCS Intuition, Myers Algorithm, and Real Code Changes
Naimur Rahma · 2026-05-23 · via DEV Community

The Algorithm Hiding Behind git diff

You've run git diff hundreds of times.

Red lines. Green lines. Done.

But have you ever stopped and asked — what algorithm is actually doing that?

It turns out, it's one of the most classic problems in computer science: Longest Common Subsequence. And it's been hiding inside your terminal every single day.

In this article, we'll explore how Git-style diffing works, why LCS is the right mental model, how the actual algorithm Git uses (Myers diff) connects to it, and what tradeoffs real tools make when choosing a diff algorithm.

This is the first article in my series "DSA Application in Real Life" — where I explore how common data structures and algorithms power the tools developers use every day.


The Problem Git Is Solving

Imagine we have an old version of a file:

function add(a, b) {
  return a + b;
}

Enter fullscreen mode Exit fullscreen mode

Then we update it:

function addNumbers(a, b) {
  return a + b;
}

Enter fullscreen mode Exit fullscreen mode

When we run git diff, Git shows:

-function add(a, b) {
+function addNumbers(a, b) {
   return a + b;
 }

Enter fullscreen mode Exit fullscreen mode

This looks obvious to us as humans. Only the function name changed.

But Git does not "understand" JavaScript the way we do. At the diffing level, Git treats the file as a sequence of lines. Its job is to compare two sequences and decide:

  • Which lines stayed the same?
  • Which lines were deleted?
  • Which lines were added?

This is a sequence comparison problem — and that's exactly where LCS comes in.


Why Simple Line-by-Line Comparison Is Not Enough

A beginner might think Git just compares files line by line:

Old line 1 vs New line 1
Old line 2 vs New line 2
Old line 3 vs New line 3

Enter fullscreen mode Exit fullscreen mode

This works only when changes happen at the same position. But real code changes are rarely that simple.

Consider this old file:

login()
validate()
save()
logout()

Enter fullscreen mode Exit fullscreen mode

Now we insert one new line:

login()
checkPermission()
validate()
save()
logout()

Enter fullscreen mode Exit fullscreen mode

A naive line-by-line comparison would produce:

Old: login()      New: login()            same
Old: validate()   New: checkPermission()  different
Old: save()       New: validate()         different
Old: logout()     New: save()             different
Old: (nothing)    New: logout()           added

Enter fullscreen mode Exit fullscreen mode

That makes it look like almost the entire file changed — which is completely wrong. Only one line was added.

A smarter approach doesn't compare by position. It finds what's common between the two files first. That's the LCS idea.


LCS: The Mental Model Behind Diffing

LCS stands for Longest Common Subsequence.

A subsequence means you can pick elements from a sequence while keeping their relative order — but they don't need to be adjacent.

Example:

Old = [A, B, C, D]
New = [A, C, E, D]

Enter fullscreen mode Exit fullscreen mode

The longest common subsequence is [A, C, D] — because A, C, and D appear in both sequences in the same order.

Applied to file diffing, the lines of each file become the sequences:

Old = [login(), validate(), save(), logout()]
New = [login(), checkPermission(), validate(), save(), logout()]

Enter fullscreen mode Exit fullscreen mode

LCS = [login(), validate(), save(), logout()]

Now Git can reason:

  • These lines are common → unchanged
  • checkPermission() is only in the new file → added

Result:

 login()
+checkPermission()
 validate()
 save()
 logout()

Enter fullscreen mode Exit fullscreen mode

That's the core idea.


The Actual LCS Algorithm (with Code)

Here's the classic dynamic programming solution you've likely seen in competitive programming:

def lcs_length(A, B):
    m, n = len(A), len(B)
    # dp[i][j] = LCS length of A[:i] and B[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i-1] == B[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    return dp[m][n]

Enter fullscreen mode Exit fullscreen mode

For sequences A = [A, B, C, D] and B = [A, C, E, D], the DP table looks like:

     ""  A   C   E   D
""  [ 0   0   0   0   0 ]
A   [ 0   1   1   1   1 ]
B   [ 0   1   1   1   1 ]
C   [ 0   1   2   2   2 ]
D   [ 0   1   2   2   3 ]

Enter fullscreen mode Exit fullscreen mode

The answer is dp[4][4] = 3 → LCS length is 3 → [A, C, D].

Time complexity: O(m × n)
Space complexity: O(m × n)

For large files, this gets expensive — which is why Git doesn't use this directly.


How LCS Builds the Diff

Once you know the LCS, building the diff is straightforward:

  • Lines in the LCS → unchanged
  • Lines in old but not in LCS → deleted (prefix with -)
  • Lines in new but not in LCS → added (prefix with +)

Example:

Old = [A, B, C, D]
New = [A, C, E, D]
LCS = [A, C, D]

B is only in Old  → deleted
E is only in New  → added

Enter fullscreen mode Exit fullscreen mode

Diff output:

 A
-B
 C
+E
 D

Enter fullscreen mode Exit fullscreen mode

This is the basic shape of what Git, GitHub pull requests, VS Code comparison, and merge tools show: unchanged lines, deleted lines, and added lines.


Does Git Actually Use Textbook LCS?

Not directly. Git's default algorithm is Myers diff — and it solves a slightly different (but deeply related) problem called the Shortest Edit Script.

What is the minimum number of insertions and deletions needed to transform the old file into the new file?

The connection to LCS is direct:

LCS finds what is common.
Shortest Edit Script finds what changed.

Enter fullscreen mode Exit fullscreen mode

If LCS is long → fewer edits needed.
If LCS is short → more edits needed.

They are two sides of the same coin.

So when we say "Git uses LCS-based diffing," the accurate meaning is:

Git's diffing is based on sequence comparison ideas rooted in LCS, but its default implementation uses Myers' shortest edit script algorithm — which is faster in practice.


How Myers Diff Actually Works (Simplified)

Myers models the diff problem as a graph search.

Imagine a grid where:

  • The X-axis represents lines of the old file
  • The Y-axis represents lines of the new file
  • Moving right = delete a line from the old file
  • Moving down = insert a line from the new file
  • Moving diagonally = lines match (no edit needed)

For Old = [A, B, C, D] and New = [A, C, E, D]:

     A    B    C    D
   ┌────┬────┬────┬────┐
A  │╲   │    │    │    │
   ├────┼────┼────┼────┤
C  │    │    │╲   │    │
   ├────┼────┼────┼────┤
E  │    │    │    │    │
   ├────┼────┼────┼────┤
D  │    │    │    │╲   │
   └────┴────┴────┴────┘

Enter fullscreen mode Exit fullscreen mode

Myers finds the path from (0,0) to (N,M) that uses the most diagonal moves — because diagonals are free (matched lines). That path is the shortest edit script.

Time complexity: O(n × d) where d = number of differences
Space complexity: O(n)

This is much faster than O(m × n) LCS for files that are mostly similar — which is the common case in real codebases.


Diff as an Edit Script

Let's walk through a concrete edit script:

Old file → [A, B, C, D]
New file → [A, C, E, D]

Step 1: Delete B

A, C, D

Enter fullscreen mode Exit fullscreen mode

Step 2: Insert E after C

A, C, E, D

Enter fullscreen mode Exit fullscreen mode

Edit script: Delete B, Insert E — that's just 2 operations.

Git's diff output:

 A
-B
 C
+E
 D

Enter fullscreen mode Exit fullscreen mode

Clean, minimal, exactly right.


Why This Matters in Real Development

When we review code, we're not just looking at text changes — we're trying to understand intent.

A good diff makes that easy:

 function calculateTotal(items) {
-  return items.length;
+  return items.reduce((sum, item) => sum + item.price, 0);
 }

Enter fullscreen mode Exit fullscreen mode

Any reviewer immediately understands: the old code counted items, the new code sums their prices.

A bad diff creates noise and confusion. That's why diff algorithms matter — they're not just about correctness, they're about readability.


The Tradeoff: Shortest Diff vs Most Readable Diff

The "smallest" diff is not always the most readable one — especially in code with repeated patterns:

if (user) {
  return true;
}

if (admin) {
  return true;
}

if (owner) {
  return true;
}

Enter fullscreen mode Exit fullscreen mode

When many lines look similar, a diff algorithm can match the wrong lines. The result is technically correct but hard to read. That's why Git ships multiple algorithms.


Git's Four Diff Algorithms — With a Real Example


git diff --diff-algorithm=myers      # default
git diff --diff-algorithm=minimal
git diff --diff-algorithm=patience
git diff --diff-algorithm=histogram

Enter fullscreen mode Exit fullscreen mode

Here's what each one does and when to use it:

Myers (default)

Fast, generally good results. This is what runs when you just type git diff. Best for everyday use.

Minimal

Tries harder to find the absolute smallest diff. Slower, but useful when patch size matters (e.g., generating patches to send via email).

Patience

Prioritizes human readability. It only matches unique lines first, avoiding false alignments on repeated code. Best for reviewing refactors or moved code blocks.

Histogram

An evolution of Patience that also handles low-frequency lines well. Often produces the most readable output for real codebases. Many developers set this as their global default.

Side-by-side example — Myers vs Patience:

Given this change (a function was refactored):

# Old
def process(data):
    result = []
    for item in data:
        result.append(item * 2)
    return result

# New
def process(data):
    return [item * 2 for item in data]

Enter fullscreen mode Exit fullscreen mode

Myers output might show:

 def process(data):
-    result = []
-    for item in data:
-        result.append(item * 2)
-    return result
+    return [item * 2 for item in data]

Enter fullscreen mode Exit fullscreen mode

Patience output groups the change more cleanly around unique anchors, making it immediately obvious that the body was replaced with a list comprehension — less noise, same information.

To set histogram globally (recommended for most developers):

git config --global diff.algorithm histogram

Enter fullscreen mode Exit fullscreen mode


Algorithm Complexity Summary

Algorithm Rough Idea Best For
Textbook LCS DP O(m × n) time and space Learning the concept
Myers diff Efficient when files are mostly similar Default everyday diffs
Minimal Spends extra work to reduce diff size Smaller patches
Patience Uses unique lines as anchors Refactors / moved blocks
Histogram Extends patience using low-frequency lines Often readable code diffs

Where m = number of lines in the old file and n = number of lines in the new file.


Where the DSA Is Hiding

In competitive programming, LCS is a textbook DP problem. In the real world, the same idea powers:

Git diff
GitHub pull request review
VS Code file comparison
Merge conflict resolution
Google Docs version history
Code review platforms (Gerrit, Phabricator)
Patch generation

Enter fullscreen mode Exit fullscreen mode

The input changes — lines of code, words in a document, DOM nodes in a UI, events in a timeline — but the core question is always the same:

What stayed the same, and what changed?


A Real-World Developer Example

Old code:

function createUser(name, email) {
  const user = { name, email };
  saveUser(user);
  return user;
}

Enter fullscreen mode Exit fullscreen mode

New code:

function createUser(name, email, role) {
  const user = { name, email, role };
  validateUser(user);
  saveUser(user);
  return user;
}

Enter fullscreen mode Exit fullscreen mode

A well-tuned diff shows:

-function createUser(name, email) {
+function createUser(name, email, role) {
-  const user = { name, email };
+  const user = { name, email, role };
+  validateUser(user);
   saveUser(user);
   return user;
 }

Enter fullscreen mode Exit fullscreen mode

Any reviewer immediately understands: a role parameter was added, it's stored on the user object, and validation was introduced before saving. Three changes, instantly clear.

That's the value of a good diff algorithm — it's not just computing differences. It's helping humans understand change.


Why Git Works at the Line Level (Not Character Level)

Git diffs at the line level because source code is naturally line-based. A character-level diff would be more precise:

-const total = price * quantity;
+const total = price * quantity * tax;

Enter fullscreen mode Exit fullscreen mode

vs character diff: * tax was appended.

But character-level diffs get noisy fast for code review. Line-level is the right abstraction for most developer workflows. (You can get word-level diffs with git diff --word-diff when you need them.)

The best algorithm isn't always the most precise one. It's the one that gives the most useful output for the context.


LCS vs Myers: The Mental Model

LCS:    Find the longest part that stayed the same.
Myers:  Find the shortest set of changes to get from old to new.

Enter fullscreen mode Exit fullscreen mode

LCS gives you the intuition.
Myers gives Git an efficient practical algorithm.

If the LCS is long → few changes needed.
If the LCS is short → many changes needed.

They measure the same thing from different directions.


Why This Is a Great Example of DSA in Real Life

Many beginners ask: "Where do we actually use DSA in real projects?"

git diff is one of the best answers — because every developer runs it daily without thinking about it.

When you run git diff, you're using an algorithm.
When you review a pull request on GitHub, you're using an algorithm.
When you resolve merge conflicts, you're relying on algorithms that compare versions of files.

The algorithm is invisible behind a clean developer experience. That's what good engineering looks like — the user sees red and green lines, and behind it is a carefully designed algorithmic solution built on decades of computer science research.

That's the beauty of DSA. Not just for interviews. Inside the tools you use every day.


Practical Commands to Try

# Try different algorithms on any repo
git diff --diff-algorithm=myers
git diff --diff-algorithm=patience
git diff --diff-algorithm=histogram
git diff --diff-algorithm=minimal

# Word-level diff (great for prose or config files)
git diff --word-diff

# Set histogram as your permanent default
git config --global diff.algorithm histogram

Enter fullscreen mode Exit fullscreen mode


Final Thoughts

LCS may look like just another DP problem when you first learn it. But the idea is powerful:

Find what stayed the same so we can understand what changed.

Git uses it to show file changes. Code review tools use it to help developers understand pull requests. Merge tools use it to combine work from different branches. Document editors use it to show version history.

So the next time you run git diff, remember: you're not just seeing red and green lines. You're seeing dynamic programming, graph search, and decades of algorithmic research — all compressed into a two-word command.


References