How to Upgrade Ceph Clusters in Stretch Mode

OneUptime Blog

Nawaz Dhandala · 2026-03-31 · via OneUptime Blog

Upgrade Considerations for Stretch Mode

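Upgrading a Ceph cluster in stretch mode requires extra care because the monitors must keep quorum and at least one site must keep serving data throughout the upgrade. Restarting monitors on both sites at the same time can break quorum, and restarting OSDs on both sites at the same time can leave placement groups without an active copy. Either failure can cause a cluster-wide outage.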

Pre-Upgrade Checklist

Before starting the upgrade, verify the cluster is healthy:

ceph status
ceph health detail
ceph osd stat
ceph mon stat

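Ensure all PGs are active+clean. The active+clean count in the output should equal the total PG count, because grep also matches when only some PGs are in that state: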

ceph pg stat | grep "active+clean"
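
To make the comparison explicit, the check can be scripted. The following is a minimal sketch, assuming the plain-text summary printed by ceph pg stat starts with "<total> pgs:" and lists each state as "<count> <state>". The function name all_pgs_clean is hypothetical, and the output format should be confirmed on your release before relying on it.

```shell
# Succeeds only when every PG in the summary read from stdin is active+clean.
# Assumed input shape (verify on your cluster):
#   "129 pgs: 129 active+clean; 1.2 GiB data, ..."
all_pgs_clean() {
  local line total clean
  IFS= read -r line || [ -n "$line" ] || return 2
  # Total PG count: the number directly before " pgs:"
  total=$(printf '%s\n' "$line" | sed -n 's/^\([0-9][0-9]*\) pgs:.*/\1/p')
  # Count for the exact state "active+clean" (not active+clean+scrubbing etc.)
  clean=$(printf '%s\n' "$line" | sed -n 's/.*[ :]\([0-9][0-9]*\) active+clean[;,].*/\1/p')
  [ -n "$total" ] && [ -n "$clean" ] && [ "$total" -eq "$clean" ]
}

# Possible usage:
#   ceph pg stat | all_pgs_clean && echo "all PGs active+clean"
```

The check is deliberately strict: PGs in states such as active+clean+scrubbing are counted as not yet settled.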

Set maintenance flags so that restarted OSDs are not marked out (which would trigger rebalancing) and so that scrubbing is paused during the upgrade:

ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
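
Before moving on, it can help to confirm that the flags actually took effect. This sketch assumes ceph osd dump prints a line of the form "flags noout,noscrub,...". The helper name missing_osd_flags is hypothetical.

```shell
# Prints each required flag that is absent from the "flags" line of
# `ceph osd dump` (read from stdin); succeeds only if none are missing.
# Assumed line shape: "flags noout,noscrub,nodeep-scrub,sortbitwise,..."
missing_osd_flags() {
  local flags_line flag missing
  missing=0
  flags_line=$(grep -m1 '^flags ' | cut -d' ' -f2-)
  for flag in "$@"; do
    case ",$flags_line," in
      *",$flag,"*) ;;                      # flag is set
      *) echo "$flag"; missing=1 ;;        # flag is not set
    esac
  done
  return "$missing"
}

# Possible usage:
#   ceph osd dump | missing_osd_flags noout noscrub nodeep-scrub
```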

Step 1 - Upgrade the Arbiter Monitor

Start with the arbiter (the tiebreaker monitor), since it hosts no OSDs and restarting it has minimal impact on cluster operations:

ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mon --hosts mon-arbiter

Monitor the upgrade:

ceph orch upgrade status
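
Rather than re-running the status command by hand, a script can test whether an upgrade is still running. This is a sketch under the assumption that ceph orch upgrade status prints JSON containing an "in_progress" boolean. The function name is hypothetical.

```shell
# Succeeds while the status JSON read from stdin reports a running upgrade.
# Assumed field (verify on your release): "in_progress": true|false
upgrade_in_progress() {
  grep -Eq '"in_progress"[[:space:]]*:[[:space:]]*true'
}

# Possible usage: block until the current staggered step has finished.
#   while ceph orch upgrade status | upgrade_in_progress; do sleep 30; done
```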

Wait for the arbiter to rejoin quorum:

ceph mon stat
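
Waiting for quorum can also be automated with a small polling helper. The sketch below is illustrative only: wait_until and mon_in_quorum are hypothetical names, the quorum check assumes jq is installed, and it assumes the monitor name passed in matches an entry in the quorum_names array of ceph quorum_status.

```shell
# Re-runs a command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check: is monitor $1 listed in quorum_names?
# Assumes jq is available and the monitor name is known.
mon_in_quorum() {
  ceph quorum_status -f json |
    jq -e --arg m "$1" '.quorum_names | index($m) != null' >/dev/null
}

# Possible usage: poll every 10 seconds, for at most 5 minutes.
#   wait_until 30 10 mon_in_quorum mon-arbiter
```

Bounding the number of attempts means a monitor that never rejoins stops the procedure instead of hanging it.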

Step 2 - Upgrade Site A Monitors

Upgrade the two monitors on site A one at a time:

ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mon --hosts mon-dc1a

Wait for the upgraded monitor to rejoin quorum:

ceph quorum_status

Then upgrade the second monitor on site A:

ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mon --hosts mon-dc1b

Step 3 - Upgrade Site B Monitors

Repeat for the site B monitors, again one at a time, checking quorum after each:

ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mon --hosts mon-dc2a
ceph quorum_status
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mon --hosts mon-dc2b
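
Because the per-monitor commands differ only in the host name, the sequence can be generated and reviewed before anything is run. This is a dry-run sketch: the helper name mon_upgrade_plan is hypothetical, and it only prints commands without executing them.

```shell
# Prints one staggered-upgrade command per monitor host, in the order given,
# each followed by a quorum check. Nothing is executed.
# Usage: mon_upgrade_plan <image> <host>...
mon_upgrade_plan() {
  local image host
  image=$1
  shift
  for host in "$@"; do
    printf 'ceph orch upgrade start --image %s --daemon-types mon --hosts %s\n' "$image" "$host"
    printf 'ceph quorum_status\n'
  done
}

# Possible usage, with the host names from this article:
#   mon_upgrade_plan quay.io/ceph/ceph:v18.2.0 mon-arbiter mon-dc1a mon-dc1b mon-dc2a mon-dc2b
```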

Step 4 - Upgrade Managers

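Upgrade the MGR daemons. Note that cephadm enforces its own daemon-type order during staggered upgrades, with mgr ahead of mon. If the monitor commands in the earlier steps are blocked as out of order, run this step first: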

ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 --daemon-types mgr

Step 5 - Upgrade OSDs by Site

Upgrade the OSDs on site A first, and start on site B only after the PGs have returned to active+clean. This keeps data accessible from at least one site at all times:

# Upgrade site A OSDs
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 \
  --daemon-types osd --hosts host-dc1a,host-dc1b

# Wait for PGs to recover
watch ceph pg stat

# Upgrade site B OSDs
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0 \
  --daemon-types osd --hosts host-dc2a,host-dc2b

Step 6 - Post-Upgrade Verification

After all daemons are upgraded, clear the maintenance flags:

ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub

Verify that every daemon runs the new version and that the cluster is healthy:

ceph versions
ceph status
ceph orch ps | grep -v "running"
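
A quick way to confirm that no daemon was left behind is to count the distinct version strings the cluster reports. This sketch assumes the JSON printed by ceph versions uses keys of the form "ceph version ..." for each daemon type. The function name is hypothetical.

```shell
# Prints how many distinct "ceph version ..." strings appear in the
# `ceph versions` JSON read from stdin. A result of 1 means every
# reported daemon runs the same release.
count_ceph_versions() {
  grep -o '"ceph version [^"]*"' | sort -u | wc -l | tr -d ' '
}

# Possible usage:
#   [ "$(ceph versions | count_ceph_versions)" = "1" ] && echo "all daemons on one version"
```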

Summary

Upgrading Ceph clusters in stretch mode requires a sequential per-site approach starting with the arbiter monitor. By upgrading monitors and OSDs site by site and waiting for quorum and PG health at each step, you maintain continuous availability throughout the upgrade process. Setting maintenance flags before starting prevents unnecessary rebalancing during the rolling upgrade.
