How stale Ceph RBD locks and blocklisted clients caused OpenStack VMs to fail after a datacenter power outage — and how we recovered them.
After the power restoration, the Ceph cluster reported HEALTH_OK and OpenStack services appeared operational. A test VM booted successfully. However, all pre-existing VMs failed to start, dropping into initramfs with I/O errors before reaching the root filesystem:
No init found. Try passing init= bootarg.
BusyBox v1.36.1 (Ubuntu 1:1.36.1-6ubuntu3.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)
The key observation: newly created VMs worked without issues. This indicated the problem was specific to the relationship between existing VMs and their storage, rather than network or storage infrastructure issues.
Root Cause Analysis
The issue stemmed from Ceph's RBD exclusive locking mechanism. This feature prevents simultaneous writes to the same image from multiple clients, avoiding data corruption. When a compute node connects to an RBD volume, it acquires an exclusive lock; when disconnected cleanly, it releases the lock.
During the power outage, compute nodes lost power without disconnecting cleanly, so their locks were never released. After power returned, the client instances that had been running on those nodes appeared on the Ceph blocklist:
$ ceph osd blocklist ls
10.88.10.91:0/3853293677 2026-05-06T08:59:47.102488+0000
10.88.10.90:0/316670229 2026-05-07T00:26:11.581329+0000
10.88.10.90:0/3783311129 2026-05-07T00:26:11.581329+0000
...
listed 14 entries
Ceph blocklists clients that crash without releasing locks to prevent zombie processes from corrupting data. The old locks remained held by client IDs that no longer existed, creating a deadlock where VMs needed the locks to boot, but the locks were held by processes that would never release them.
Resolution
Verifying Lock State
We verified the theory by checking an affected volume's lock state:
$ rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.3406724  auto 135766063836400  10.88.10.91:0/3853293677
The address matched a blocklisted entry. The lock was held by a client that would not return to release it.
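One way to script this comparison is sketched below. The helper names lock_addresses and is_blocklisted are hypothetical, and the sketch assumes the output layouts shown in this post: lock rows start with "client." and end with the address, and each blocklist entry starts with the address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming the output formats shown in this post.

# Prints the address column of every lock in `rbd lock list` output (stdin).
lock_addresses() {
    # Lock rows start with "client."; the address is the last field.
    awk '$1 ~ /^client\./ {print $NF}'
}

# Succeeds if address $1 is an entry in `ceph osd blocklist ls` output (stdin).
is_blocklisted() {
    awk -v addr="$1" '$1 == addr {found = 1} END {exit !found}'
}

# Example against a live cluster (not run here):
#   blocklist=$(ceph osd blocklist ls 2>/dev/null)
#   rbd lock list volumes/<image> | lock_addresses | while read -r addr; do
#       if printf '%s\n' "$blocklist" | is_blocklisted "$addr"; then
#           echo "stale lock holder: $addr"
#       fi
#   done
```

A lock whose holder address is not on the blocklist may belong to a live client and should be left alone.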
Removing Stale Locks
The command syntax for force-removing an RBD lock requires positional arguments with quoted strings:
rbd lock remove volumes/volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be \
"auto 135766063836400" "client.3406724"
Verification:
$ rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be
No locks on this image.
The VM rebooted successfully.
Bulk Resolution
For multiple affected volumes, we used the following script:
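# Caution: this removes the first lock on every locked image in the pool,
# including locks held by live clients.
for vol in $(rbd ls volumes); do
    # Query the lock state once and reuse the output.
    locks=$(rbd lock list "volumes/$vol" 2>/dev/null)
    if echo "$locks" | grep -q "client"; then
        echo "Removing lock on: $vol"
        # Line 3 holds the lock row: locker, lock ID (two words), address.
        lock_id=$(echo "$locks" | awk 'NR==3{print $2" "$3}')
        locker=$(echo "$locks" | awk 'NR==3{print $1}')
        rbd lock remove "volumes/$vol" "$lock_id" "$locker"
        echo "Done: $vol"
    fi
done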
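A more conservative variant, sketched here as an assumption rather than what we ran, only targets locks whose holder is on the blocklist and prints the removal commands for review instead of executing them. The function name stale_lock_commands and the saved blocklist file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original runbook).
# Prints one `rbd lock remove` command for each lock whose holder address
# is on the blocklist. Nothing is executed; review the output first.
#   $1     image spec, e.g. volumes/<image>
#   $2     file containing saved `ceph osd blocklist ls` output
#   stdin  `rbd lock list` output for that image
stale_lock_commands() {
    awk -v image="$1" -v bl="$2" '
        BEGIN {
            # Load blocklisted addresses (first column of each entry).
            while ((getline line < bl) > 0) {
                split(line, f, " ")
                blocked[f[1]] = 1
            }
        }
        # Lock rows: locker, lock ID (two words), address.
        $1 ~ /^client\./ && ($NF in blocked) {
            printf "rbd lock remove %s \"%s %s\" \"%s\"\n", image, $2, $3, $1
        }
    '
}

# Example against a live cluster (not run here):
#   ceph osd blocklist ls > /tmp/blocklist.txt 2>/dev/null
#   for vol in $(rbd ls volumes); do
#       rbd lock list "volumes/$vol" 2>/dev/null |
#           stale_lock_commands "volumes/$vol" /tmp/blocklist.txt
#   done
```

Printing the commands first gives a chance to spot locks that belong to clients that are still running before anything is removed.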
We then hard rebooted all affected VMs:
# Caution: this hard-reboots every server in every project.
for vm in $(openstack server list --all-projects -f value -c ID); do
    name=$(openstack server show "$vm" -f value -c name)
    status=$(openstack server show "$vm" -f value -c status)
    echo "Rebooting: $name ($vm) - Current status: $status"
    openstack server reboot --hard "$vm"
done
All VMs recovered.
Clearing Blocklist Entries
After confirming all locks were released and VMs were healthy, we cleared the blocklist entries:
ceph osd blocklist rm 10.88.10.90
ceph osd blocklist rm 10.88.10.91
Important: Only perform this step after confirming crashed nodes will not return with stale state. Reconnecting zombie processes while another client holds the lock risks data corruption.
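Blocklist entries are listed with a port and nonce (for example 10.88.10.90:0/316670229), and some Ceph releases may only accept a removal that names the entry exactly as listed. A minimal sketch for that case follows, assuming IPv4 addresses and the ls output format shown earlier. The helper name entries_for_host is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the blocklist entries (first column) whose
# IP part equals $1. Reads `ceph osd blocklist ls` output from stdin.
# Assumes IPv4 entries of the form ip:port/nonce.
entries_for_host() {
    awk -v ip="$1" '{ split($1, parts, ":"); if (parts[1] == ip) print $1 }'
}

# Example against a live cluster (not run here):
#   ceph osd blocklist ls 2>/dev/null | entries_for_host 10.88.10.90 |
#       while read -r entry; do
#           ceph osd blocklist rm "$entry"
#       done
```

Running ceph osd blocklist ls again afterwards confirms whether the entries are gone.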
Prevention Measures
Granting OpenStack Blocklist Capabilities
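The OpenStack Ceph client needs permission to issue blocklist commands so that the RBD client can recover from a dead lock holder on its own. Without allow command "osd blocklist" in its monitor capabilities, the client used by Nova cannot blocklist the previous lock owner, and therefore cannot safely break the stale lock.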
Step 1: Check current capabilities
ceph auth get client.openstack
Step 2: Add blocklist capability
# First, save existing OSD caps
ceph auth get client.openstack -o /tmp/openstack.keyring
# Then update caps (adjust pool names and OSD caps for your environment)
ceph auth caps client.openstack \
mon 'allow r, allow command "osd blocklist"' \
osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=backups'
Note: Adjust pool names according to your environment (e.g., vms, volumes, images).
Step 3: Verify the update
ceph auth get client.openstack
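The verification can also be scripted, which helps when several client keys need checking. The sketch below assumes the keyring-style output of ceph auth get, where monitor capabilities appear on a line beginning with "caps mon =". It also treats profile rbd as sufficient, since that profile includes the blocklist permission. The function name has_blocklist_cap is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the monitor caps in `ceph auth get`
# output (stdin) include the blocklist command or the rbd profile.
has_blocklist_cap() {
    grep -E '^[[:space:]]*caps mon = ' | grep -Eq 'osd blocklist|profile rbd'
}

# Example against a live cluster (not run here):
#   if ceph auth get client.openstack 2>/dev/null | has_blocklist_cap; then
#       echo "client.openstack can blocklist"
#   else
#       echo "client.openstack is missing the blocklist capability"
#   fi
```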
Nova Configuration Tuning
The following settings were added to nova.conf on compute nodes:
[libvirt]
hw_disk_discard = unmap
disk_cachemodes = network=writeback
rbd_io_timeout = 30
The rbd_io_timeout parameter gives the RBD client additional time to recover during transient issues rather than immediately failing I/O.
Key Takeaways
- Ceph's blocklist mechanism protects data from split-brain scenarios. The issue arises from unclean shutdowns leaving orphaned locks behind.
- New VMs working while existing VMs fail is a diagnostic indicator. This pattern after an outage strongly suggests blocklist-related issues, and recognizing it avoids time spent investigating network or OSD problems.
- Proactively grant blocklist permissions to the OpenStack Ceph client. The allow command "osd blocklist" capability enables automatic recovery without manual intervention.
- The rbd lock remove syntax requires positional arguments with quoted strings. The --locker flag is not available in many versions. Use the format:
rbd lock remove <pool>/<image> "<lock_id>" "<locker>"
- Include lock-related failure scenarios in disaster recovery testing. Standard monitoring and backup verification may not catch this failure mode.