慣性聚合 高效追讀感興趣之博客、新聞、科技資訊
閱原文 以慣性聚合開啟

推薦訂閱源

博客园 - 司徒正美
V
V2EX
T
Tailwind CSS Blog
有赞技术团队
有赞技术团队
aimingoo的专栏
aimingoo的专栏
Apple Machine Learning Research
Apple Machine Learning Research
IT之家
IT之家
Blog — PlanetScale
Blog — PlanetScale
A
About on SuperTechFans
月光博客
月光博客
T
The Blog of Author Tim Ferriss
宝玉的分享
宝玉的分享
Martin Fowler
Martin Fowler
博客园 - 聂微东
The GitHub Blog
The GitHub Blog
V
Visual Studio Blog
WordPress大学
WordPress大学
酷 壳 – CoolShell
酷 壳 – CoolShell
Engineering at Meta
Engineering at Meta
GbyAI
GbyAI

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python)
揭车之秘:释数据库之内涵(InnoDB与ARCHIVE之辨)
Tejas · 2026-05-24 · via DEV Community

众开发者视数据库引擎为黑箱:汝书一问,数据便自现。然当数据增至百万行,欲明其理是故,此数据存于实器,则制器之道异矣。

若将所有数据悉数倾入默认数据库,则存储之费终将暴涨,查询之速亦将迟滞。

欲建可扩之系统,须明二种截然对立之存储哲理:基于页面的交易存储(InnoDB)及流式压缩存储(檔案), 及其如何連接使用表分区.


一、InnoDB之层级(自字节至表空间)

MySQL之默认引擎InnoDB,为速、为众、为重读重写而建。非徒追加素文于牍,乃于高构之严阶,统御数据。

存储之栈

  • 表空间:此乃最高之逻辑容器。启现代之设若innodb_file_per_table者,每表皆有其自之物理.ibd檔於硬碟之上。
  • 者,幅也:者,以防操作系統之檔案系統散佈吾表於碟之物理扇區,InnoDB乃以塊塊分配空間,名曰Extents。每幅皆正1兆字。 之巨细,并聚续页为束,使序次之数据恒居碟上相邻之地.
  • 页: 乃 InnoDB 所为之根本微单元。常制,InnoDB 之页,正为 十六千字 。每值 InnoDB 读或书数据,必载入十六KB之页全之乃至君之服务器之RAM(Buffer Pool),纵尔仅索一列。
  • 行:此结构之极基,乃独行之列,密布于数据页之内。因 InnoDB 之用也。聚簇索引B+树汝之行列,依其主键,于磁盘上物理排序而存。

2. 存档引擎(反页面之法)

若汝无需更新数据?若汝所存者,乃亿万系统日志、点击流或审计轨迹,惟于事有损时方观之?

投此于InnoDB,实乃极费,盖索引之树与16KB页面结构,徒增汝之磁盘之耗。存档引擎 完全颠倒 InnoDB 之设计,以解此问题:

  • 无固定页: 不将数据结构于严整之 16KB 块,ARCHIVE 视数据为连续无界之 仅有追加之二进制字节流,存于 .arz 文件中。
  • 即时流压缩: 資料既入,乃經於記憶壓縮之緩衝。尾隙之餘,悉刪,而一優化之位首,處理 NULL 之值。原始之行列,隨即以 zlib 之法壓縮,方及於盤。
  • 索引之權衡: 以存其微渺之跡,ARCHIVE 許 無次級索引。所允之索引,惟AUTO_INCREMENT之列而已。

盖因其为纯文本之压缩字符串,非索引之块结构,故ARCHIVE常得之比,较之InnoDB,其压缩率可达三比一至十比一之奇效


3. 架构比较之表

架构之特征 InnoDB引擎 存档引擎
存储布局 严苛十六页之页/一兆字节之区 连续zlib压缩字节流
主文件格式 .ibd(数据与索引合并) .arz(数据)+.frm(元数据)
写入模型 B+树插入,页分裂 仅存入式内存缓冲流
支持之变更 INSERTSELECTUPDATEDELETE INSERTSELECT唯(WORM模式)
锁定之机制 细粒度行锁(MVCC) 交织行流锁定

4. 连接寰宇:表分区与之交易

既明二机之理,可融此心智模型为寒热数据层级系统。其旨简矣:使近时之数据(热)存于InnoDB,以便速更速索;徙旧时之数据(冷)于他储,以节磁盘之用。

MySQL不允許於單一分割表中混用儲存引擎,故吾等採用一策,名曰分割交換。此乃僅涉及元數據之換置,能於一瞬之間轉移百萬行數。

步驟甲:創分割之InnoDB表

吾等以時戳或日期,按月分割一活動之InnoDB表。

CREATE TABLE system_logs (
    log_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    log_date DATE NOT NULL,
    subsystem VARCHAR(50) NOT NULL,
    message TEXT NOT NULL,
    PRIMARY KEY (log_id, log_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(log_date)) (
    PARTITION p_past_month VALUES LESS THAN (TO_DAYS('2026-05-01')),
    PARTITION p_current_month VALUES LESS THAN (TO_DAYS('2026-06-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

步入全屏模式 退出全屏模式

步骤二:即时分区交换

月终之时,欲压缩旧数据。若运行巨量DELETE之语,则表将封锁,CPU骤升。故当创建一空而同之InnoDB暂存表,即时易旧分区:

-- 1. Create a temporary staging table matching the schema
CREATE TABLE system_logs_stage LIKE system_logs;
ALTER TABLE system_logs_stage REMOVE PARTITIONING;

-- 2. Instantly swap the old partition into the staging table (Takes < 1 second)
ALTER TABLE system_logs EXCHANGE PARTITION p_past_month WITH TABLE system_logs_stage;

-- 3. Drop the now empty partition from the active table
ALTER TABLE system_logs DROP PARTITION p_past_month;

步入全屏模式 退出全屏模式

自此,可直将system_logs_stage之内容,注入以ARCHIVE支持的表,使汝之冷日志,于数据库中自然压缩存之


5. 产业实况:今之现状

分区间换是否实用于生产?然,日日皆用之。 此乃弃置或隔离历史数据之圭臬,而活表不锁。

然,虽 ARCHIVE 引擎为流压缩之宏伟蓝图,然今之云基建团队,于"冷"级分层之道,常别有处理。今时,生产数据库之存储与 IOPS(每秒输入输出操作数)实属昂贵。工程师不欲冷、旧之日志数据,虽经压缩,亦存于同活生产数据库实例之上。

然,诸公司乃用分表交换之法,将旧数据隔离于暂存之表,悉流之出 MySQL,入云中数据湖(如 AWS S3)为超压缩列式之Apache Parquet文件,复弃其暂存之表。如此则生产数据库之存留,既轻且速,运行之费亦殊为廉。


结论

尔建侧项或习软件工课,易徇常例配置数据库。然窥机发微,则改尔应事之方。

洞悉InnoDB之固板页分配与ARCHIVE之流变压缩流间之张力,助君审思数据访问之格局——适得其所之存储策略,配适得其所之工作负载,乃由仅撰标准增删改查之应用,进至工于优化、可投于产之系统之道也。