Node, Index, Shard in Elasticsearch
2025-11-14 · via jdhao's digital space

relationship between cluster, node, index, shard, segment#

Explanations of basic terminology:

  • An Elasticsearch cluster consists of one or more nodes, which can have different roles, for example data nodes and ML nodes.
  • A node is a JVM instance that is running Elasticsearch.
  • An index is a collection of documents. An index can have multiple primary shards and replica shards.
  • Each shard is allocated to a node in the Elasticsearch cluster.
  • A shard is an Apache Lucene index.
  • A Lucene index consists of multiple segments (the internal structure used by Lucene).
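To make the hierarchy concrete, here is a minimal sketch that models it with plain dataclasses. The class names, field names, and the sample node and segment names are all made up for illustration; this is not how Elasticsearch represents these objects internally.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shard:
    index: str       # name of the index this shard belongs to
    number: int      # shard number within the index, starting at 0
    primary: bool    # True for a primary shard, False for a replica
    segments: List[str] = field(default_factory=list)  # Lucene segment names


@dataclass
class Node:
    name: str
    shards: List[Shard] = field(default_factory=list)


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)

    def shards_of(self, index: str) -> List[Tuple[str, Shard]]:
        """Return (node name, shard) pairs for every shard copy of an index."""
        return [
            (node.name, shard)
            for node in self.nodes
            for shard in node.shards
            if shard.index == index
        ]


# Hypothetical two-node cluster: one primary and one replica of shard 0.
cluster = Cluster(nodes=[
    Node("node-1", [Shard("my-index", 0, primary=True, segments=["_0", "_1"])]),
    Node("node-2", [Shard("my-index", 0, primary=False, segments=["_0"])]),
])

for node_name, shard in cluster.shards_of("my-index"):
    kind = "primary" if shard.primary else "replica"
    print(node_name, shard.number, kind, shard.segments)
```

The point of the sketch is only the nesting: a cluster holds nodes, a node holds shard copies, and each shard copy holds its own set of segments.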

You can use the cat APIs to get information about nodes, indices, shards, and segments:

GET _cat/nodes?v=true

GET _cat/indices/my-index

GET _cat/shards/my-index

GET _cat/segments/my-index

# alternatively, use the index segments API to get segment info about an index
GET my-index/_segments

ref:

shards and replicas (primary and replica)#

An index can have multiple primary shards and replica shards. Primary shards handle both read and write requests, while replica shards serve read requests and receive writes only as replication from their primary.

The cat shards API can be used to check the status of shards:

# v=true will show the column header
GET _cat/shards/my_index?v=true&h=index,shard,prirep,state,docs,unassigned.for,unassigned.reason&s=state
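If you want to process the result in a script instead of reading the table, the cat APIs can also return JSON via `format=json`. Below is a small sketch that filters unassigned shard copies from such a response. The function name and the sample response are assumptions for illustration, not output copied from a real cluster.

```python
import json


def unassigned_shards(cat_shards_json):
    """Return (index, shard, prirep) for every shard copy that is UNASSIGNED."""
    rows = json.loads(cat_shards_json)
    return [
        # cat APIs return all values as strings, so convert the shard number
        (row["index"], int(row["shard"]), row["prirep"])
        for row in rows
        if row["state"] == "UNASSIGNED"
    ]


# Hypothetical response body of:
# GET _cat/shards/my_index?format=json&h=index,shard,prirep,state
sample = """
[
  {"index": "my_index", "shard": "0", "prirep": "p", "state": "STARTED"},
  {"index": "my_index", "shard": "0", "prirep": "r", "state": "STARTED"},
  {"index": "my_index", "shard": "0", "prirep": "r", "state": "UNASSIGNED"}
]
"""

print(unassigned_shards(sample))
```

JSON is used here instead of the text table because empty columns (for example `docs` on an unassigned shard) make whitespace splitting unreliable.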

number of shards and number of replicas#

For guidance on choosing a proper number of shards, refer to the official doc

Constraints on the number of replicas: a replica shard cannot be placed on the same node as its primary shard, and two replicas of the same primary cannot share a node either. This effectively means that the number of replicas must be less than or equal to num_node - 1. For example, if you have 3 nodes and primary1 is on node 1, its replica shards can only go to node 2 and node 3. If you break this constraint and set the number of replicas to a larger value, the health status of the index will be yellow instead of green when you check it:

GET _cat/indices/my_index?v

If you check the shard info for this index (GET _cat/shards/my_index), you will see that some replicas are in the UNASSIGNED state:

my_index 0     r      UNASSIGNED
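The replica constraint is simple arithmetic, which the following sketch spells out. It assumes that every node in the cluster is eligible to hold shards of the index and ignores other allocation rules (disk watermarks, allocation filtering, and so on); the function name is made up.

```python
def assignable_replicas(num_nodes, number_of_replicas):
    """Split the configured replicas per primary into (assigned, unassigned).

    Assumes every node can hold shards of the index.
    """
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # One node is taken by the primary; each replica needs its own node.
    max_assignable = num_nodes - 1
    assigned = min(number_of_replicas, max_assignable)
    return assigned, number_of_replicas - assigned


print(assignable_replicas(3, 2))  # all replicas fit
print(assignable_replicas(3, 3))  # one replica per primary stays unassigned
```

Whenever the second number is greater than 0, you should expect UNASSIGNED replicas and a yellow index.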

The number of shards is a static index setting and can only be set at index creation time. The number of replicas is a dynamic setting that can be changed on an existing index without interrupting search and indexing requests. You can set the number of shards and replicas using the create index API:

DELETE my_index

PUT my_index
{
  "settings": {
    "index.number_of_shards": "1",
    "index.number_of_replicas": "2"
  }
}

As explained, the number of replicas is a dynamic setting, so you can change its value after index creation with the update index settings API:

PUT my_index/_settings
{
  "settings": {
    "index.number_of_replicas": "1"
  }
}

When you decrease the number of replicas, Elasticsearch deletes the extra replicas. When you increase the number of replicas, Elasticsearch automatically copies the primary shards to suitable nodes. For some time, the index status will be yellow. If you use the cat shards API, you will see that the state of the new replica shards is INITIALIZING. After some time, their state becomes STARTED and the index status becomes green.
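The relation between shard states and the index status can be sketched as a small function. This is my own simplified reading of the green/yellow/red rules, not code from Elasticsearch: a shard copy counts as active when it is STARTED or RELOCATING, the index is red when a primary is not active, and yellow when only replicas are not active.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def index_health(shards):
    """Derive the index status from (prirep, state) pairs, e.g. ("p", "STARTED")."""
    shards = list(shards)
    if any(prirep == "p" and state not in ACTIVE_STATES for prirep, state in shards):
        return "red"     # at least one primary is not active
    if any(state not in ACTIVE_STATES for _, state in shards):
        return "yellow"  # all primaries are active, some replica is not
    return "green"       # every shard copy is active


# Right after increasing number_of_replicas:
print(index_health([("p", "STARTED"), ("r", "INITIALIZING")]))
# Once the new replica has finished initializing:
print(index_health([("p", "STARTED"), ("r", "STARTED")]))
```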

ref:

shard write and read model#

When we index documents into an index, each operation is first executed on the primary shard and then replicated to its replica shards. If you have a large number of documents to index, this is usually slower than writing to the primary shards only. The official Elasticsearch doc therefore recommends setting the number of replicas to 0 for a large initial load. After indexing, you can set the number of replicas back to its original value, and Elasticsearch will then copy the data to the replicas under the hood.

Having multiple replicas helps Elasticsearch prevent data loss and also lets it handle more search requests, because it can distribute read operations across the nodes holding the replica shards. When Elasticsearch receives a search/read request, the request is routed to nodes that contain the relevant data, see shard-routing.
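The "replicas to 0, load, restore" workflow can be wrapped in a context manager so that the original value is restored even if the load fails. This is only a sketch: `put_settings` is a hypothetical callable that would send `PUT <index>/_settings` with the given body, and here it is replaced by a stub that records the calls.

```python
from contextlib import contextmanager


@contextmanager
def replicas_disabled(put_settings, index, original_replicas):
    """Set number_of_replicas to 0 inside the block, then restore the old value.

    put_settings is a hypothetical callable (index, body), not a real client API.
    """
    put_settings(index, {"settings": {"index.number_of_replicas": 0}})
    try:
        yield
    finally:
        # Restore the replicas even if the bulk load raised an exception.
        put_settings(
            index, {"settings": {"index.number_of_replicas": original_replicas}}
        )


# Stub that records the requests instead of sending them.
calls = []


def record(index, body):
    calls.append((index, body))


with replicas_disabled(record, "my_index", original_replicas=1):
    pass  # the large initial load would run here

print(calls)
```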

explain why shard is unassigned or assigned to a certain node#

If you see in the cat shards API that a shard is unassigned and want more detailed info, the cluster allocation explain API can explain why a shard is unassigned or why it is assigned to a certain node. The API: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-allocation-explain.html

GET _cluster/allocation/explain
{
  "index": "my_custom_index",
  "shard": 0,
  "primary": true
}

For the index parameter, it seems that we cannot use an alias and have to use the actual index name. shard refers to the shard number. primary indicates whether this is a primary or a replica shard.

Note that when we want to explain an unassigned shard, we should not set the current_node parameter:

To explain an unassigned shard, omit this parameter.
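As a small sketch, the request body can be built by a helper that only adds `current_node` when it is given, so the unassigned case naturally omits it. The helper name is made up, and `node-2` is a placeholder node name.

```python
import json


def allocation_explain_body(index, shard, primary, current_node=None):
    """Build the body for GET _cluster/allocation/explain."""
    body = {"index": index, "shard": shard, "primary": primary}
    # For an unassigned shard, current_node must be left out entirely.
    if current_node is not None:
        body["current_node"] = current_node
    return body


# Unassigned replica of shard 0: no current_node in the body.
print(json.dumps(allocation_explain_body("my_custom_index", 0, False), indent=2))

# Assigned shard copy on a (placeholder) node named node-2.
print(json.dumps(allocation_explain_body("my_custom_index", 0, True, "node-2"), indent=2))
```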

References#