Proxmox Support Forum

Ceph rbd mirror force promote
invalid@exam · 2026-06-25 · via Proxmox Support Forum

Hello everyone,

I’m currently setting up my first Ceph mirror configuration and have a few questions regarding its behavior.
For example, I’m uncertain about how to force-promote an image on my DR cluster (site-b) during the synchronization process.
From what I’ve read in the documentation, in a disaster scenario occurring during synchronization, a force-promote operation promotes the last snapshot received by the DR cluster. However, as noted:

"Since this mode is not as fine-grained as journaling, the complete delta between two snapshots will need to be synced prior to use during a failover scenario. Any partially applied set of deltas will be rolled back at the moment of failover."

When I attempt to force-promote an image, I encounter the following error:

Code:

root@pve1-b:~# rbd mirror image promote ceph-pool/vm-103-disk-1 --force
2025-01-09T09:42:40.412+0100 7983d4e006c0 -1 librbd::mirror::snapshot::util:  can_create_primary_snapshot: cannot rollback
2025-01-09T09:42:40.412+0100 7983d4e006c0 -1 librbd::mirror::snapshot::PromoteRequest: 0x7983b0001d40 send: cannot promote
2025-01-09T09:42:40.412+0100 7983d4e006c0 -1 librbd::mirror::PromoteRequest: 0x7983b401a810 handle_promote: failed to promote image: (22) Invalid argument
rbd: error promoting image to primary
2025-01-09T09:42:40.412+0100 7983d84f1780 -1 librbd::api::Mirror: image_promote: failed to promote image

I’ve checked the snapshots on my DR cluster (site-b) and always see the latest snapshot of the image present there.

I have configured periodic snapshots to run every 3 minutes.
On the main cluster (site-a), I always retain the last 5 snapshots, while on the DR cluster (site-b), only the most recent snapshot is kept.
I assume that this latest snapshot is overwritten during the synchronization process.

My main question is: How does Ceph handle promotion for an image when the data hasn’t been fully received on the DR cluster (site-b)?

Thank you!

Regards

Hi, we also have a Wiki page for Ceph Mirroring [0]. If site A is still available, you first need to demote it:

Promote images on site B

By promoting an image or all images in a pool, we can tell Ceph that they are now the primary ones to be used. In a planned failover, we would first demote the images on site A before we promote the images on site B. In a recovery situation with site A down, we need to `--force` the promotion.

To promote a single image, run the following command:

Code:

root@site-b $ rbd mirror image promote <pool>/<image> --force

To promote all images in a pool, run the following command:

Code:

root@site-b $ rbd mirror pool promote <pool> --force

After this, our guests should start fine.
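The two cases described in the wiki excerpt above (planned failover versus recovery with site A down) can be sketched as a small helper that returns the commands in order. This is only an illustration: the function name `failover_commands` and the site labels are hypothetical, and the only CLI calls assumed are `rbd mirror image demote` and `rbd mirror image promote [--force]`.

```python
from __future__ import annotations


def failover_commands(pool: str, image: str, site_a_available: bool) -> list[tuple[str, list[str]]]:
    """Return (site, argv) pairs in the order they would be run.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    spec = f"{pool}/{image}"
    if site_a_available:
        # Planned failover: demote on site A first, then a normal promote on site B.
        return [
            ("site-a", ["rbd", "mirror", "image", "demote", spec]),
            ("site-b", ["rbd", "mirror", "image", "promote", spec]),
        ]
    # Recovery with site A down: only a forced promotion on site B is possible.
    return [("site-b", ["rbd", "mirror", "image", "promote", spec, "--force"])]


if __name__ == "__main__":
    for site, argv in failover_commands("ceph-pool", "vm-103-disk-1", site_a_available=False):
        print(f"{site}: {' '.join(argv)}")
```

Keeping the command construction separate from execution makes it easy to review what would run on which site before touching either cluster.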

[0] https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring#Failover_Recovery

Hi @KevinS,

Thank you for your response.
In a planned failover, the system works perfectly.

But the issue I’m facing is that, in a recovery scenario (with no connection between site-a and site-b, similar to a DR scenario), there might be ongoing synchronization processes.
As a result, the complete image may not be fully available on site-b.

For this reason, when I attempt to force-promote an image (VM disk), the command returns the error I mentioned earlier.

In general, during a disaster scenario, I cannot guarantee that all synchronization processes have been fully completed.

The message in the output, "cannot rollback," makes me think that Ceph doesn’t have a restore point to revert to the previous snapshot.
From what I’ve observed, Ceph RBD mirror in snapshot mode keeps, by default, 5 snapshots on site-a and 1 snapshot on site-b (the recovery site). However, if I interrupt the incremental sync process for the single snapshot on site-b, the image is no longer available, even if I use promote --force.
Is what I’m saying correct?
I hope I’ve been as clear as possible.

Thank you

Hi @KevinS,
I wanted to share my findings with you:
When I attempt to force-promote an image whose latest snapshot is 88% copied and still in a syncing state, the promotion gets stuck.
Please see snapshot number 1247065 below:

Code:

Image: vm-103-disk-0
Snapshots:
SNAPID   NAME                                                                                           SIZE    PROTECTED  TIMESTAMP                 NAMESPACE                                                 
1247064  .mirror.non_primary.0e0c83ba-b709-4bf9-832d-013ca194b00e.fff33483-93fa-4923-8240-6ca810a68744  21 GiB             Fri Jan 10 16:54:01 2025  mirror (non-primary peer_uuids:[] a946fac3-067c-47c1-a80c-05187eb77a30:431652 copied)
----------------------------------------
Image: vm-103-disk-1
Snapshots:
SNAPID   NAME                                                                                           SIZE   PROTECTED  TIMESTAMP                 NAMESPACE                                                   
1247059  .mirror.non_primary.ab2549b1-78bc-48f5-be58-85cdc470b3ae.3bb6f0ff-b22d-4d6b-b210-2f30edbbc011  5 GiB             Fri Jan 10 16:51:00 2025  mirror (non-primary peer_uuids:[] a946fac3-067c-47c1-a80c-05187eb77a30:431647 copied)
1247065  .mirror.non_primary.ab2549b1-78bc-48f5-be58-85cdc470b3ae.2f65824c-f6bc-49f9-9982-c4804fa9a0e1  5 GiB             Fri Jan 10 16:54:01 2025  mirror (non-primary peer_uuids:[] a946fac3-067c-47c1-a80c-05187eb77a30:431653 88% copied)
----------------------------------------
Image: vm-104-disk-0
Snapshots:
SNAPID   NAME                                                                                           SIZE    PROTECTED  TIMESTAMP                 NAMESPACE                                                 
1247066  .mirror.non_primary.04c18313-1f08-4343-89dd-b236a6e3937a.97d6f893-aa52-4d16-a951-cd9c3410711b  21 GiB             Fri Jan 10 16:54:01 2025  mirror (non-primary peer_uuids:[] a946fac3-067c-47c1-a80c-05187eb77a30:431654 copied)
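A listing like the one above can also be checked programmatically to see which non-primary mirror snapshots are still syncing. The sketch below is a rough illustration based only on the NAMESPACE text shown in this thread; the function names are hypothetical, and it assumes that a bare "copied" without a percentage means the snapshot is fully copied.

```python
import re

# Tail of the NAMESPACE column for a non-primary mirror snapshot, e.g.
# "... :431653 88% copied)" (still syncing) or "... :431647 copied)" (done).
_COPIED = re.compile(r"mirror \(non-primary .*?(?:(\d+)% )?copied\)")


def sync_state(line):
    """Return (snap_id, percent) for a non-primary mirror snapshot row, else None."""
    m = _COPIED.search(line)
    if m is None:
        return None  # header, separator, or a row of another kind
    snap_id = int(line.split()[0])
    # Assumption: a bare "copied" with no percentage means fully copied.
    percent = int(m.group(1)) if m.group(1) else 100
    return snap_id, percent


def incomplete_snapshots(listing):
    """Return (snap_id, percent) for every snapshot that is not fully copied."""
    states = (sync_state(line) for line in listing.splitlines())
    return [s for s in states if s is not None and s[1] < 100]
```

Run against the listing above, `incomplete_snapshots` would report only snapshot 1247065 of vm-103-disk-1, which is the one that blocks the promotion.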

This command is totally stuck:

Code:

rbd mirror image promote ceph-pool/vm-103-disk-1 --force

I tried to catch the problem:

Code:

 subprocess.run(promote_command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
  File "/usr/lib/python3.11/subprocess.py", line 550, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1199, in communicate
    self.wait()
  File "/usr/lib/python3.11/subprocess.py", line 1262, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1997, in _wait
    (pid, sts) = self._try_wait(0)
                 ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1955, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
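The traceback above shows `subprocess.run` waiting indefinitely on the child process. A minimal sketch of bounding that wait with the standard `timeout` parameter follows; the helper name `run_with_timeout` and the 60-second value in the usage example are assumptions, not something taken from the original script. It does not fix the promotion itself, it only keeps a calling script from blocking forever.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv; return its exit code, or None if it did not finish in time."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
    return done.returncode


if __name__ == "__main__":
    # Hypothetical usage: give the forced promotion at most 60 seconds.
    promote_command = ["rbd", "mirror", "image", "promote", "ceph-pool/vm-103-disk-1", "--force"]
    result = run_with_timeout(promote_command, 60)
    print("timed out" if result is None else f"exit code {result}")
```

A `None` result then tells the caller that the promotion hung, which can be logged or escalated instead of requiring a manual Ctrl+C.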

This is a very critical situation: if I have a disaster at site-a and the communication between site-a and site-b is down, I cannot promote site-b.
I hope I’ve been as clear as possible.

Thank you
bye

I have the same problem. Here is my situation:

  1. I have two Ceph clusters called "main" and "secondary".
  2. I created an rbd image named my-pool/test and enabled mirroring in snapshot mode.
  3. I mounted the image and wrote 1 GB of data using sudo dd if=/dev/zero of=/dev/rbd0 bs=1M count=1024
  4. I unmounted the image and created a mirror snapshot with rbd mirror image snapshot my-pool/test
  5. While the mirror snapshot was being synced, I powered off the "main" server.
  6. I promoted the image on the "secondary" cluster with rbd mirror image promote my-pool/test --force
  7. The promote command either got stuck or crashed with a core dump.

I think that the purpose of rbd mirroring is disaster recovery. However, in a real disaster scenario, a failure can occur at any time, even while an image is being synced. In this case, the image is no longer available.

It seems quite possible to promote the image using the previous mirror snapshot while the latest mirror snapshot is still being synced. Is there a feature that allows this?

Is it possible to keep multiple versions of snapshot copies, then promote a specified version?
According to https://docs.ceph.com/en/squid/rbd/rbd-mirroring/#rbd-mirroring
--------------------------------------------------------------------------------------------------------------------------------------------
For example:
$ rbd --cluster site-a mirror image snapshot image-pool/image-1
By default up to 5 mirror-snapshots will be created per-image. The most recent mirror-snapshot is automatically pruned if the limit is reached. The limit can be overridden via the rbd_mirroring_max_mirroring_snapshots configuration option if required. Additionally, mirror-snapshots are automatically deleted when the image is removed or when mirroring is disabled.
--------------------------------------------------------------------------------------------------------------------------------------------

Thank you for your reply

By default up to 5 mirror-snapshots will be created per-image. The most recent mirror-snapshot is automatically pruned if the limit is reached. The limit can be overridden via the rbd_mirroring_max_mirroring_snapshots configuration option if required.

This option affects only the primary (promoted) image. Regardless of this option, a demoted image always has only one mirror-snapshot. While syncing, two mirror-snapshots exist temporarily in the demoted image. Once the sync is complete, the previous mirror-snapshot is deleted.

What I wonder is whether I can cancel the sync and promote the image using the previous mirror-snapshot while two mirror-snapshots exist in the demoted image.

You are right, this is the core of the problem.
From what I’ve observed, Ceph RBD mirror in snapshot mode keeps, by default, 5 snapshots on site-a and 1 snapshot on site-b (the recovery site). However, if I interrupt the incremental sync process for the single snapshot on site-b, the image is no longer available, even if I use promote --force.
I don't like that the Proxmox team hasn't addressed this topic.
I would have expected more engagement from them, or at least a response.

This is so frustrating. I have worked with enterprise storage (HDS/DellEMC) for over two decades. In a properly designed long-distance DR solution, keeping at least 2 copies at the remote site is a basic requirement, because no one can predict when a disaster will happen. How the remote site can take over the primary site's work in any situation always needs to be kept in mind. I googled some rbd mirror keywords and couldn't find how to keep more than 2 copies at the remote site between two Ceph clusters. As a result, I personally think that using rbd mirror as a disaster recovery solution is currently not a good idea. I also doubt this issue can be solved by Proxmox, because Ceph is an upstream project. Instead, we may need to find some other way to achieve DR with Ceph, but that will be more complex...

aaron

Proxmox Staff Member

So, I tried to reproduce this problem:

  • 2 Proxmox VE + Ceph clusters (v18.2.4 / reef)
  • Snapshot based sync
  • Overwrite test disk image with random data (lots of changes, more data to sync)
  • Watch target cluster for any new snapshots: watch -n0.5 -d rbd snap list --all vm-101-disk-3
  • Once the new snapshot is being copied:

    Code:

    on-primary peer_uuids:[] b7d7c03d-573a-4028-ad04-a6530f15d4a9:33779 26% copied)
  • Kill all machines of the source cluster

Then try to promote the image on the target cluster: rbd mirror image promote vm-101-disk-3 --force

And so far, it hangs there. @willybong how long did you have to wait until you got the error message in the first post, roughly?
Which Ceph version did you test this with?

I will investigate further to see what can be done and if there are certain Ceph versions that work better.

We plan to integrate DR options into the Datacenter Manager. Then we won't need to use Ceph's RBD mirroring directly anymore, or other options like Backup -> remote sync -> (live) restore.

  • Off-site replication copies of guest for manual recovery on DC failure (not HA!)

https://pve.proxmox.com/wiki/Proxmox_Datacenter_Manager_Roadmap

Hi Aaron,
thank you for your feedback!
I had to wait a few seconds before getting that error.
In my lab, I tried with both ceph 18 and 19.2.0 (my current version) but the problem persists.

Based on your experience, has Ceph RBD snapshot mirroring ever worked reliably in the past?

Since Ceph RBD mirror is used for DR scenarios, I had assumed that it would be able to handle this kind of critical case, but that is not so.
In my humble opinion (and not just mine), a robust and secure disaster recovery mechanism will be the key factor at a corporate level in determining whether Proxmox is considered the right choice to adopt or not.

Happy to hear that Proxmox will have its own DR mechanism.