VictoriaMetrics 1.146.0 源码专题【左扬精讲】—— 整体数据流：一条监控数据的完整生命周期

博客园 - 左扬

左扬 · 2026-06-29 · via 博客园 - 左扬

VictoriaMetrics 1.146.0 源码专题【左扬精讲】—— 整体数据流：一条监控数据的完整生命周期

在监控系统中，一个监控数据点从 Prometheus 发出 Remote Write 请求，到最终写入磁盘成为可查询的时序数据，经历了怎样的旅程？本篇文章将从源码层面完整追踪一条监控数据的生命周期，包括协议解析、索引构建、数据压缩、存储合并等关键环节。

本文目录

数据写入全链路概览
协议解析层：Remote Write 到 RawRow
索引构建：MetricName 到 MetricID
存储层：Partition 到 Part 文件
查询层：PromQL 到结果集
面试高频提问

一、数据写入全链路概览

思考记忆提示 — 理解数据全链路是理解 VM 架构的基础

写入链路：Remote Write → HTTP → Parser → Storage.Add → Partition → InmemoryPart
合并链路：InmemoryPart → Small Part → Big Part → 更大 Part
查询链路：PromQL → Parser → Search → Merge → Results
面试高频提问：一条数据从写入到可查询需要经历哪些步骤？

让我们先从宏观角度理解一条监控数据在 VictoriaMetrics 中的完整旅程：

  VictoriaMetrics 数据写入全链路
┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│  ① 协议接入层                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ Prometheus ──Remote Write──► :8428/api/v1/write                       │  │
│  │                                       │                                │  │
│  │                                       ▼                                │  │
│  │                          lib/protoparser/prometheus/                   │  │
│  │                          WriteRequestParser.Parse()                    │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼                                         │
│  ② 协议解析层                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ TimeSeries → []RemoteWriteSample                                        │  │
│  │                                                                           │  │
│  │ 每个 Sample 包含：                                                       │  │
│  │   - MetricName: __name__="cpu_usage" job="prometheus"                   │  │
│  │   - Timestamp: 1704067200000 (毫秒)                                     │  │
│  │   - Value: 95.5                                                         │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼                                         │
│  ③ 索引构建层                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ MetricName ──编码──► []byte ──查/创建──► MetricID (uint64)            │  │
│  │                                                                           │  │
│  │ lib/storage/metric_name.go: Marshal()                                   │  │
│  │ lib/storage/tsid.go: GetOrCreateTSID()                                  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼                                         │
│  ④ 存储写入层                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ RawRow{TSID, Timestamp, Value, PrecisionBits}                           │  │
│  │                                                                           │  │
│  │ lib/storage/partition.go: AddRawRows()                                 │  │
│  │ lib/storage/storage.go: Add()                                           │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼                                         │
│  ⑤ InmemoryPart 内存缓冲                                                    │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ InmemoryPart.AddRow(RawRow)                                           │  │
│  │                                                                           │  │
│  │ 4 个 ChunkedBuffer 并行写入：                                          │  │
│  │   - metaindex buffer                                                   │  │
│  │   - index buffer                                                       │  │
│  │   - data buffer                                                        │  │
│  │   - lens buffer                                                        │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼ (1秒后)                                 │
│  ⑥ 刷盘生成 Part                                                          │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ InmemoryPart.MustStoreToDisk()                                         │  │
│  │                                                                           │  │
│  │ 生成: small_001.tar (metaindex + index + data + lens)                 │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

源码视角：数据写入的 6 个关键函数

理解数据写入链路，只需记住 6 个关键函数：

WriteRequestParser.Parse()：解析 Remote Write 协议 → lib/protoparser/prometheus/
MetricName.Marshal()：标签编码 → lib/storage/metric_name.go
GetOrCreateTSID()：查/建 MetricID → lib/storage/tsid.go
Storage.Add()：存储入口 → lib/storage/storage.go
InmemoryPart.AddRow()：内存缓冲 → lib/mergeset/inmemory_part.go
MustStoreToDisk()：刷盘 → lib/mergeset/inmemory_part.go

二、协议解析层：Remote Write 到 RawRow

思考记忆提示 — 协议解析是数据的入口，理解它才能理解整个数据流

Remote Write 协议基于 Protobuf 编码
lib/protoparser 支持 12+ 种协议
面试高频提问：VictoriaMetrics 支持哪些写入协议？

2.1 Remote Write 协议解析

当 Prometheus 发送 Remote Write 请求时，VictoriaMetrics 通过 lib/protoparser/prometheus/ 进行协议解析：

 // lib/protoparser/prometheus/parser.go
// Remote Write 协议解析入口

// WriteRequest 结构（来自 prompb）：
// message WriteRequest {
//   repeated.TimeSeries timeseries = 1;
// }
// message TimeSeries {
//   labels  = 1;
//   samples = 2;
// }
// message Sample {
//   value       = 1;
//   timestamp   = 2;
// }

// 解析流程：
// 1. HTTP 请求到达 /api/v1/write
// 2. WriteRequestParser.Parse() 解析 Protobuf
// 3. 遍历每个 TimeSeries，提取 labels 和 samples
// 4. 将 labels 转换为 MetricName
// 5. 将 samples 转换为 RawRow

// MetricName 结构：
// 二进制格式：[__name__=cpu_usage\x00job=prometheus\x00instance=localhost:9090]
// 注意：标签之间用 \x00 分隔，键值对用 = 分隔

// RawRow 结构：
// type RawRow struct {
//     TSID          TSID           // 20 字节：MetricID + 其他
//     Timestamp     int64          // 毫秒时间戳
//     Value         float64        // 指标值
//     PrecisionBits int           // 精度位数
// }

2.2 MetricName 二进制编码

MetricName 的二进制编码在 lib/storage/metric_name.go 中实现：

 // lib/storage/metric_name.go
// MetricName 二进制编码格式

// 原始标签：
// __name__="cpu_usage"
// job="prometheus"
// instance="localhost:9090"

// 二进制编码过程：
// 1. 标签排序（黄金排序：__name__ → job → instance）
// 2. 拼接：__name__=cpu_usage\x00job=prometheus\x00instance=localhost:9090
// 3. 转义特殊字符（\x00 → \x00\xff, = → \x00\xfe）
// 4. 最终二进制：[__name__=cpu_usage][\x00][job=prometheus][\x00][instance=localhost:9090]

// 关键设计点：
// - \x00 作为分隔符，高概率不在标签中出现
// - 黄金排序确保相同标签组合的编码结果唯一
// - 转义机制处理边界情况

// MarshalBinary() 实现：
func (mn *MetricName) MarshalBinary() ([]byte, error) {
    // 1. 分配足够大的 buffer
    // 2. 按黄金顺序写入每个标签
    // 3. 写入分隔符和转义
    // 4. 返回编码后的字节数组
}

三、索引构建：MetricName 到 MetricID

思考记忆提示 — MetricID 是时序数据的唯一标识，理解它的创建过程很重要

MetricID 是 uint64，通过 TSID 结构关联到 MetricName
indexDB 存储 MetricName → MetricID 的映射
面试高频提问：MetricID 是如何生成的？如何保证唯一性？

3.1 TSID 四字段体系

TSID（TimeSeries ID）是 VictoriaMetrics 中时序数据的核心标识，在 lib/storage/tsid.go 中定义：

 // lib/storage/tsid.go
// TSID 结构：20 字节定长标识符

type TSID struct {
    // 字段 1：MetricGroupID (uint32, 4 字节)
    // 用于同批次写入的快速分组
    MetricGroupID uint32
    
    // 字段 2：JobID (uint32, 4 字节)
    // 标签中 job 的索引 ID
    
    // 字段 3：InstanceID (uint32, 4 字节)
    // 标签中 instance 的索引 ID
    
    // 字段 4：MetricID (uint64, 8 字节)
    // 全局唯一递增的时序 ID
}

// TSID 生成流程（GetOrCreateTSID）：
// 1. 接收 MetricName（二进制编码后的标签）
// 2. 查询 indexDB：MetricName → TSID
// 3. 如果找到，返回现有 TSID
// 4. 如果未找到，生成新 TSID：
//    - atomic.AddUint64(&globalMetricIDCounter, 1) 获取新 MetricID
//    - 写入 indexDB：MetricName → TSID
//    - 返回新 TSID

// 为什么是 20 字节？
// - 4 + 4 + 4 + 8 = 20 字节
// - 对齐到 8 字节边界，访问效率高
// - 20 字节比 24 字节（3 个 uint64）更节省空间

3.2 indexDB 倒排索引

indexDB 是 VictoriaMetrics 的倒排索引引擎，在 lib/storage/index_db.go 中实现：

 // lib/storage/index_db.go
// indexDB 8 种索引前缀

// indexDB 是一个 LSM-like 的键值存储，专门存储索引数据
// 索引前缀定义了不同类型的索引：

const (
    // 前缀 0x00：MetricName → MetricID
    // 用于 TSID 查询
    nsPrefixMetricID = 0x00
    
    // 前缀 0x01：TagKey → TagValue → MetricID[]
    // 用于标签查询（如 job=prometheus）
    nsPrefixTagMetricID = 0x01
    
    // 前缀 0x02：TagValue → MetricID[]
    // 用于无键标签查询（如 =prometheus）
    nsPrefixMetricIDTagValue = 0x02
    
    // 前缀 0x03：Date → MetricID[]
    // 用于按日期查询
    nsPrefixDateMetricID = 0x03
    
    // 前缀 0x04-0x07：其他专用索引
    nsPrefix4 = 0x04
    nsPrefix5 = 0x05
    nsPrefix6 = 0x06
    nsPrefix7 = 0x07
)

// 写入索引数据：
// AddMetricName(tsid TSID, mn *MetricName)
// 1. 编码 MetricName 为二进制
// 2. 写入 nsPrefixMetricID: MetricName → TSID
// 3. 遍历每个标签，写入 nsPrefixTagMetricID
// 4. 写入 nsPrefixDateMetricID

// 查询索引数据：
// SearchTagKeys() / SearchTagValues() / GetTSIDByMetricName()

四、存储层：Partition 到 Part 文件

思考记忆提示 — Partition 是数据存储的核心，理解它才能理解查询

Partition 按月分区，不同月份的数据在不同 Partition
数据先写入 InmemoryPart，1秒后刷盘生成 Part
面试高频提问：数据是如何按时间分区存储的？

4.1 Partition 分区策略

VictoriaMetrics 按月分区存储数据，在 lib/storage/partition.go 中实现：

 // lib/storage/partition.go
// Partition 分区策略：按月分区

// Partition 目录结构：
// /data/2024_01/     ← 2024 年 1 月的分区
// ├── small_001.tar
// ├── small_002.tar
// ├── big_001.tar
// └── ...
// /data/2024_02/     ← 2024 年 2 月的分区
// └── ...

// 分区键计算：
func getPartitionKey(timestamp int64) string {
    // 时间戳转月份
    t := time.Unix(timestamp/1000, 0)  // 毫秒转秒
    return fmt.Sprintf("%d_%02d", t.Year(), t.Month())
    // 例如：2024-01-15 10:00:00 → "2024_01"
}

// GetOrCreatePartition：
// 1. 根据时间戳计算分区键
// 2. 在 partitions map 中查找
// 3. 如果不存在，创建新 Partition
// 4. 返回 Partition 实例

// Partition 内部结构：
type Partition struct {
    // 分区键，如 "2024_01"
    path string
    
    // 该分区的 InmemoryPart
    ip *InmemoryPart
    
    // 该分区已刷盘的 Part 列表
    parts []*Part
    
    // 后台合并任务
    merger *Merger
}

4.2 InmemoryPart 刷盘机制

InmemoryPart 是内存缓冲层，在 lib/mergeset/inmemory_part.go 中实现：

 // lib/mergeset/inmemory_part.go
// InmemoryPart：内存缓冲 + 1秒刷盘

// 4 个并行 buffer：
type InmemoryPart struct {
    // metaindex buffer：存储 Block 元信息
    metaindexBuf *bytes.Buffer
    
    // index buffer：存储 MetricName → BlockID 映射
    indexBuf *bytes.Buffer
    
    // data buffer：存储时序数据点
    dataBuf *bytes.Buffer
    
    // lens buffer：存储每行数据的长度
    lensBuf *bytes.Buffer
    
    // 写入锁
    mu sync.Mutex
    
    // 写入计数器
    rowsCount uint64
}

// AddRow 流程：
// 1. 加锁
// 2. 将 RawRow 写入 4 个 buffer（可能并行）
// 3. 解锁
// 4. 如果达到刷盘阈值，触发后台刷盘

// 刷盘流程（MustStoreToDisk）：
// 1. 创建临时文件 small_XXX.tar.tmp
// 2. 将 4 个 buffer 的内容打包写入 .tar 文件
// 3. 调用 tar.Close() 关闭
// 4. os.Rename() 原子性重命名为 small_XXX.tar
// 5. 更新 partition.parts 列表

// 刷盘触发条件：
// - 定时器：每 1 秒强制刷盘
// - 大小：buffer 超过阈值（默认 8MB）

五、查询层：PromQL 到结果集

思考记忆提示 — 查询是写入的逆过程，理解它才能理解数据如何被消费

查询链路：PromQL → Parser → Search → Merge → Results
涉及多分区并行查询和 k-way 归并
面试高频提问：查询是如何跨分区执行的？

5.1 PromQL 执行引擎

PromQL 查询执行在 lib/promql/exec.go 中实现：

 // lib/promql/exec.go
// PromQL 执行引擎

// 查询流程：
// 1. 解析 PromQL → AST（抽象语法树）
//   parseWithCache(q.PromQL)
//
// 2. 执行 AST → 时序数据
//   evalExpr(ast, q)
//
// 3. 返回结果
//   return results

// AST 节点类型：
// - *parser.MatrixSelector：瞬时向量选择器（如 cpu_usage{job="prometheus"})
// - *parser.VectorSelector：向量选择器
// - *parser.AggregateExpr：聚合表达式（如 sum()、avg()）
// - *parser.Call：函数调用（如 rate()、irate()）
// - *parser.BinaryExpr：二元运算
// - *parser.SubqueryExpr：子查询

// 执行顺序：
// 1. 从叶子节点（VectorSelector）开始
// 2. 调用 Search() 在存储层查询数据
// 3. 逐层向上执行算子（聚合、函数、二元运算）
// 4. 返回最终结果

// VectorSelector 执行：
// func (vs *VectorSelector) eval(q *Query) []Sequence {
//     // 1. 解析标签选择器
//     tf := NewTagFilters(vs.LabelMatchers)
//     
//     // 2. 调用存储层搜索
//     searchResult := storage.Search(tf, vs.Start, vs.End)
//     
//     // 3. 应用时间范围过滤
//     return filterByTimeRange(searchResult, vs.Start, vs.End)
// }

5.2 多分区并行查询

当查询跨越多个月份时，VictoriaMetrics 并行查询所有相关分区：

  多分区并行查询流程
┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│  PromQL: cpu_usage{job="prometheus"}[1h]                                   │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ 查询时间范围：2024-01-15 09:00 ~ 10:00                                 │  │
│  │ 需要查询的分区：2024_01                                                  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    ▼                                         │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │ Table.Search()                                                        │  │
│  │                                                                       │  │
│  │ 1. 根据时间范围确定需要查询的 Partition 列表                           │  │
│  │ 2. 并行向每个 Partition 发送 Search 请求                               │  │
│  │ 3. 收集所有 Partition 的结果                                           │  │
│  │ 4. k-way 归并排序                                                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│              ┌─────────────────────┼─────────────────────┐                  │
│              ▼                     ▼                     ▼                  │
│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐      │
│  │ Partition 2024_01 │  │ Partition 2024_02 │  │ Partition 2024_03 │      │
│  │                   │  │                   │  │                   │      │
│  │ Part 1: hits      │  │ Part 1: hits      │  │ Part 1: hits      │      │
│  │ Part 2: hits      │  │ Part 2: no hit    │  │ Part 2: hits      │      │
│  │ Part 3: no hit    │  │ Part 3: hits      │  │ Part 3: no hit    │      │
│  └───────────────────┘  └───────────────────┘  └───────────────────┘      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

源码视角：查询链路的 5 个关键函数

理解查询链路，只需记住 5 个关键函数：

parseWithCache()：解析 PromQL → AST → lib/promql/
evalExpr()：执行 AST → 时序数据 → lib/promql/exec.go
Table.Search()：多分区并行查询 → lib/storage/table_search.go
Partition.Search()：分区级搜索 → lib/storage/partition_search.go
Merge()：k-way 归并排序 → lib/storage/merge.go

六、面试高频提问

面试问答 — 本节精选面试高频问题，直接命中面试官想听到的答案

Q1：一条数据从写入到可查询需要经历哪些步骤？

经历 6 个关键步骤：协议解析 → 索引构建 → 存储写入 → 内存缓冲 → 刷盘 → 合并。具体来说：1）Remote Write 请求到达 lib/protoparser/prometheus/ 被解析为 TimeSeries；2）MetricName 被编码为二进制；3）通过 GetOrCreateTSID() 获取或创建 MetricID；4）数据写入 Partition.AddRawRows()；5）InmemoryPart 每秒刷盘生成 Part；6）后台合并任务将小 Part 合并成大 Part。

Q2：MetricID 是如何生成的？如何保证唯一性？

通过原子操作全局递增计数器生成，通过 indexDB 去重保证唯一性。在 lib/storage/tsid.go 中，GetOrCreateTSID() 使用 atomic.AddUint64() 原子递增全局计数器获取新 MetricID。写入 indexDB 前先查询是否已存在，已存在则直接返回，避免重复。

Q3：数据是如何按时间分区存储的？

按月分区，通过时间戳计算分区键。在 lib/storage/partition.go 中，getPartitionKey() 将毫秒时间戳转换为 "YYYY_MM" 格式的分区键。不同月份的数据存储在不同目录（如 /data/2024_01/、/data/2024_02/），查询时根据时间范围只扫描相关分区。

Q4：InmemoryPart 为什么每秒刷盘而不是等内存满？

平衡数据可靠性和写入性能。如果只依赖内存阈值刷盘，在低写入场景下数据可能在内存中停留很久，进程崩溃会丢失大量数据。1 秒刷盘保证最多丢失 1 秒数据，同时保持高写入吞吐。对于超低写入场景（如每分钟只有几条数据），可以配置 -retentionPeriod 控制数据保留时间。

全篇必记口诀

数据全链路 = 写入链路 + 查询链路。写入链路：Remote Write → Parser → Marshal → GetOrCreateTSID → Add → MustStoreToDisk。查询链路：PromQL → parseWithCache → evalExpr → Table.Search → Partition.Search → Merge。记住这个口诀："解析建索引，缓冲后刷盘，查询并归并"。

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

博客园 - 左扬