TIL: Dolt - Git for Data and Database Version Control
2021-03-05 · via Stonecharioteer on Tech

GitHub - dolthub/dolt: Dolt – It’s Git for Data

A SQL database with Git-style version control built into the core:

Core Concept:

  • Git + SQL: Combines familiar Git workflows with SQL database operations
  • Data Versioning: Track changes to data like you track changes to code
  • Collaboration: Multiple people can work on the same dataset simultaneously
  • Audit Trail: Complete history of who changed what and when

Key Features:

Git-Style Operations:

# Clone a database and enter its directory
dolt clone dolthub/ip-to-country
cd ip-to-country

# Make changes to data (the table name and values here are illustrative)
dolt sql -q "INSERT INTO countries VALUES ('XX', 'Test Country')"

# Stage changes
dolt add .

# Commit changes
dolt commit -m "Added test country"

# Push to remote (requires write access to the remote)
dolt push origin main

SQL Database Functionality:

  • Standard SQL: MySQL-compatible SQL dialect and wire protocol
  • ACID Transactions: Complete transaction support
  • Indexes: Performance optimization with standard database indexes
  • Constraints: Primary keys, foreign keys, unique constraints

Unique Capabilities:

Branch and Merge Data:

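# Create feature branch
dolt checkout -b feature/cleanup-data

# Make data changes
dolt sql -q "DELETE FROM users WHERE last_login < '2020-01-01'"

# Commit the changes on the feature branch so there is something to merge
dolt add .
dolt commit -m "Remove users inactive since before 2020"

# Merge back to main
dolt checkout main
dolt merge feature/cleanup-data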
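
To make the branch-and-merge flow above a little more concrete, here is a minimal Python sketch of the underlying idea: a branch is a name pointing at a commit, and merging an unchanged `main` with a branch that is ahead of it is a fast-forward. The `Commit` and `Repo` classes and the sample rows are hypothetical, written only for illustration; they are not Dolt's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot of one table plus a link to its parent commit."""
    message: str
    rows: frozenset  # frozenset of (id, last_login) tuples
    parent: Optional["Commit"] = None


@dataclass
class Repo:
    """Hypothetical model: branches are just names pointing at commits."""
    branches: dict = field(default_factory=dict)
    current: str = "main"

    def commit(self, message, rows):
        parent = self.branches.get(self.current)
        self.branches[self.current] = Commit(message, frozenset(rows), parent)

    def checkout_new(self, name):
        # A new branch starts at the same commit as the current branch.
        self.branches[name] = self.branches[self.current]
        self.current = name

    def checkout(self, name):
        self.current = name

    @staticmethod
    def is_ancestor(older, newer):
        # Walk parent links from `newer` back toward the root.
        while newer is not None:
            if newer is older:
                return True
            newer = newer.parent
        return False

    def merge_fast_forward(self, source):
        ours, theirs = self.branches[self.current], self.branches[source]
        if not self.is_ancestor(ours, theirs):
            raise ValueError("branches diverged; a real merge is needed")
        self.branches[self.current] = theirs  # just move the pointer


repo = Repo()
repo.commit("Initial users", {(1, "2019-06-01"), (2, "2021-02-10")})
repo.checkout_new("feature/cleanup-data")
repo.commit("Remove stale users", {(2, "2021-02-10")})
main_before_merge = set(repo.branches["main"].rows)  # main is untouched so far
repo.checkout("main")
repo.merge_fast_forward("feature/cleanup-data")
main_after_merge = set(repo.branches["main"].rows)
```

Until the merge, `main` still points at the original snapshot, which is the isolation that makes it safe to experiment on a branch.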

Time Travel Queries:

-- Query data as of specific commit
SELECT * FROM users AS OF 'abc123';

-- Compare data between two revisions (branches or commits)
SELECT * FROM dolt_diff('main', 'feature/cleanup-data', 'users');

-- Show history of specific row
SELECT * FROM dolt_history_users WHERE id = 123;
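
As a rough mental model for the queries above, time travel amounts to looking up an older snapshot, and row history amounts to collecting one row's versions across commits. The Python sketch below assumes a simple linear history; the commit ids, rows, and the helper names `as_of` and `row_history` are made up for illustration and are not part of Dolt.

```python
# Each commit is (commit_id, snapshot); a snapshot maps primary key -> row.
# The ids and rows below are made-up illustration data.
commits = [
    ("c1", {123: {"name": "Alice"}}),
    ("c2", {123: {"name": "Alice B."}, 124: {"name": "Bob"}}),
    ("c3", {124: {"name": "Bob"}}),
]


def as_of(commit_id: str) -> dict:
    """Return the table as it looked at the given commit (similar to AS OF)."""
    for cid, snapshot in commits:
        if cid == commit_id:
            return snapshot
    raise KeyError(f"unknown commit: {commit_id}")


def row_history(pk: int) -> list:
    """List (commit_id, row) for every commit in which the row exists."""
    return [(cid, snap[pk]) for cid, snap in commits if pk in snap]


users_at_c1 = as_of("c1")
history_123 = row_history(123)
```

Here `history_123` holds two entries, one per commit in which row 123 existed, and stops at `c2` because the row was deleted in `c3`.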

Use Cases:

Data Analytics:

  • Experiment Tracking: Different feature engineering approaches
  • Model Versioning: Track training data versions with model performance
  • Reproducible Research: Exact data state for research papers
  • A/B Testing: Compare dataset variants and results

Data Engineering:

  • ETL Pipeline Versioning: Track data transformation steps
  • Data Quality: Rollback corrupted data changes
  • Collaboration: Multiple analysts working on same dataset
  • Audit Compliance: Complete change history for regulations

Application Development:

  • Schema Evolution: Version database schema alongside data
  • Feature Flags: Different data configurations for different features
  • Testing: Isolated test data environments
  • Rollback Safety: Safe deployment with easy rollback

Architecture Benefits:

Storage Efficiency:

  • Content Addressable: Deduplication of identical data blocks
  • Incremental Changes: Only store what actually changed
  • Compression: Efficient storage of large datasets
  • Remote Sync: Only transfer changed data
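
The deduplication idea behind content-addressable storage can be sketched in a few lines of Python. This is a simplification under stated assumptions: it uses fixed-size chunks and SHA-256, and the `ChunkStore` class is hypothetical. It is not Dolt's actual storage format.

```python
import hashlib


class ChunkStore:
    """Hypothetical store that keys each chunk by the hash of its content."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return the chunk addresses needed to rebuild it."""
        addresses = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            address = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(address, chunk)  # identical chunks stored once
            addresses.append(address)
        return addresses

    def get(self, addresses: list) -> bytes:
        """Reassemble the original bytes from a list of chunk addresses."""
        return b"".join(self.chunks[a] for a in addresses)


store = ChunkStore(chunk_size=4)
v1 = store.put(b"AAAABBBBCCCC")
chunks_after_v1 = len(store.chunks)
v2 = store.put(b"AAAABBBBDDDD")  # only the last chunk differs from v1
chunks_after_v2 = len(store.chunks)
```

Storing the second version adds a single new chunk, because the first two chunks hash to addresses the store already holds. The same property explains why syncing with a remote only needs to transfer chunks the other side is missing.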

Concurrent Access:

  • MVCC: Multiversion concurrency control
  • Branch Isolation: Changes don’t interfere until merge
  • Conflict Resolution: Merge conflict handling for data
  • Distributed: Clone and work offline
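
Conflict handling for data can be pictured as a three-way merge keyed by primary key: compare each side with the common ancestor and flag rows that both sides changed differently. The sketch below is an assumption-laden illustration that treats a whole row as the unit of change; the function `three_way_merge` and the sample rows are hypothetical and say nothing about how Dolt resolves conflicts internally.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two row sets (primary key -> row) against a common ancestor.

    Returns (merged, conflicts). A missing key means the row was deleted.
    """
    merged, conflicts = {}, []
    for pk in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(pk), ours.get(pk), theirs.get(pk)
        if o == t:      # both sides agree (including both deleted)
            result = o
        elif o == b:    # only their side changed the row
            result = t
        elif t == b:    # only our side changed the row
            result = o
        else:           # both sides changed it differently
            conflicts.append(pk)
            result = o  # keep ours as a placeholder until resolved
        if result is not None:
            merged[pk] = result
    return merged, conflicts


base = {1: "Alice", 2: "Bob", 3: "Carol"}
ours = {1: "Alice A.", 2: "Bob", 3: "Carol C."}
theirs = {1: "Alice", 3: "Carole", 4: "Dave"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this run, row 1 takes our edit, row 2 is deleted because only their side removed it, row 4 is added from their side, and row 3 is reported as a conflict since both branches edited it.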

Comparison with Traditional Approaches:

vs Database Backups:

  • Granular Changes: See individual row changes, not just snapshots
  • Efficient Storage: Don’t duplicate unchanged data
  • Branch Support: Multiple parallel data versions
  • Merge Capability: Combine changes intelligently
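
The difference from a plain backup is easiest to see with a row-level diff. Given two snapshots keyed by primary key, the following Python sketch reports which rows were added, removed, or modified; `diff_rows` and the sample data are hypothetical and only illustrate the idea.

```python
def diff_rows(old: dict, new: dict) -> dict:
    """Compare two snapshots that map primary key -> row."""
    added = {pk: new[pk] for pk in new.keys() - old.keys()}
    removed = {pk: old[pk] for pk in old.keys() - new.keys()}
    modified = {
        pk: (old[pk], new[pk])
        for pk in old.keys() & new.keys()
        if old[pk] != new[pk]
    }
    return {"added": added, "removed": removed, "modified": modified}


before = {1: ("Alice",), 2: ("Bob",)}
after = {1: ("Alicia",), 3: ("Carol",)}
changes = diff_rows(before, after)
```

Two backups would only tell you that the files differ; a diff like this names the exact rows and shows their old and new values.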

vs Data Lakes:

  • Structured Data: SQL interface with schema enforcement
  • ACID Properties: Transactional consistency
  • Version Control: Built-in change tracking
  • Query Performance: Optimized for relational queries

Getting Started:

Installation:

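# Install Dolt (the install script needs root to write the binary)
sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | bash'

# Set the identity used for commits (required once before the first commit)
dolt config --global --add user.name "Your Name"
dolt config --global --add user.email "you@example.com"

# Initialize new database
mkdir my-database && cd my-database
dolt init

# Create table and add data
dolt sql -q "CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100))"
dolt sql -q "INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')"

# Track changes
dolt add .
dolt commit -m "Initial users"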

Connect Existing Tools:

# Start SQL server (run from inside the database directory; it stays in the foreground)
dolt sql-server

# Connect with any MySQL client from a second terminal
mysql -h 127.0.0.1 -P 3306 -u root my-database

Enterprise Features:

  • DoltHub: GitHub-like hosting for Dolt databases
  • Access Control: User permissions and authentication
  • API Access: HTTP API for running SQL queries against databases hosted on DoltHub
  • Integration: Works with existing BI and analytics tools

Limitations and Considerations:

  • Performance: Not optimized for high-throughput OLTP
  • Ecosystem: Newer tool with growing ecosystem
  • Learning Curve: New concepts for traditional database users
  • Storage: Version history can grow large over time

Dolt brings version-control practices from software engineering to database operations, so that collaborating on data works much like collaborating on code.