Troubleshooting OpenStack Instance I/O Errors: A Ceph Blocklist Case

How stale Ceph RBD locks and blocklisted clients caused OpenStack VMs to fail after a datacenter power outage — and how we recovered them.

After power was restored, the Ceph cluster reported HEALTH_OK and the OpenStack services appeared operational. A newly created test VM booted successfully. However, every pre-existing VM failed to start, dropping into initramfs with I/O errors before reaching the root filesystem:

No init found. Try passing init= bootarg.

BusyBox v1.36.1 (Ubuntu 1:1.36.1-6ubuntu3.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

The key observation: newly created VMs worked without issues. This indicated the problem was specific to the relationship between existing VMs and their storage, rather than network or storage infrastructure issues.

Root Cause Analysis

The issue stemmed from Ceph's RBD exclusive locking mechanism. This feature prevents simultaneous writes to the same image from multiple clients, avoiding data corruption. When a compute node connects to an RBD volume, it acquires an exclusive lock; when disconnected cleanly, it releases the lock.

During the power outage, compute nodes lost power without disconnecting cleanly, so their locks were never released. After the outage, the client sessions those nodes had been using were on the Ceph blocklist:

$ ceph osd blocklist ls
10.88.10.91:0/3853293677 2026-05-06T08:59:47.102488+0000
10.88.10.90:0/316670229 2026-05-07T00:26:11.581329+0000
10.88.10.90:0/3783311129 2026-05-07T00:26:11.581329+0000
...
listed 14 entries

Ceph blocklists clients that crash without releasing locks to prevent zombie processes from corrupting data. The old locks remained held by client IDs that no longer existed, creating a deadlock where VMs needed the locks to boot, but the locks were held by processes that would never release them.
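To see which compute nodes the blocklist entries belong to, the listing can be grouped by IP. This is a minimal sketch, assuming the text format shown above (address first, expiry second); blocklist_by_host is a hypothetical helper name, not a Ceph command.

```bash
# Hypothetical helper: count blocklist entries per host IP.
# Reads `ceph osd blocklist ls` text on stdin and prints "<ip> <count>".
blocklist_by_host() {
  # Only lines that start with an IP are entries; summary lines are skipped.
  awk '$1 ~ /^[0-9]+\./ { split($1, a, ":"); n[a[1]]++ }
       END { for (h in n) print h, n[h] }' | sort
}

# Usage against a live cluster (not executed here):
#   ceph osd blocklist ls 2>/dev/null | blocklist_by_host
```

Hosts with many entries are the ones whose clients went away uncleanly and are worth checking first.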

Resolution

Verifying Lock State

We verified the theory by checking an affected volume's lock state:

$ rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be

There is 1 exclusive lock on this image.
Locker          ID                    Address
client.3406724  auto 135766063836400  10.88.10.91:0/3853293677

The address matched a blocklisted entry. The lock was held by a client that would not return to release it.
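The comparison between a lock holder and the blocklist can be scripted instead of done by eye. The sketch below assumes the plain-text output formats shown in this article; lock_addresses and is_blocklisted are hypothetical helper names introduced for illustration.

```bash
# Hypothetical helper: print the Address column of every lock row
# found in `rbd lock list` text output on stdin.
lock_addresses() {
  awk '/^client\./ { print $NF }'
}

# Hypothetical helper: succeed if address $1 is the first field of any
# line of `ceph osd blocklist ls` text on stdin.
is_blocklisted() {
  awk -v addr="$1" '$1 == addr { found = 1 } END { exit !found }'
}

# Usage against a live cluster (not executed here):
#   bl=$(ceph osd blocklist ls 2>/dev/null)
#   rbd lock list volumes/<image> | lock_addresses | while read -r a; do
#     printf '%s\n' "$bl" | is_blocklisted "$a" && echo "stale lock holder: $a"
#   done
```

A lock whose holder address is blocklisted cannot be released by that client, which is the condition described above.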

Removing Stale Locks

Force-removing an RBD lock takes the lock ID and the locker as positional arguments. The lock ID contains a space ("auto 135766063836400"), so it must be quoted:

rbd lock remove volumes/volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be \
  "auto 135766063836400" "client.3406724"

Verification:

$ rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be
No locks on this image.

The VM rebooted successfully.

Bulk Resolution

For multiple affected volumes, we used the following script:

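# Caution: this removes the lock on every locked image in the pool,
# including locks held by healthy, running clients.
for vol in $(rbd ls volumes); do
  # Query the lock list once and reuse the output
  locks=$(rbd lock list "volumes/$vol" 2>/dev/null)
  if echo "$locks" | grep -q "client"; then
    echo "Removing lock on: $vol"
    # Select the lock row by its "client." prefix rather than a fixed line number
    lock_id=$(echo "$locks" | awk '/^client\./{print $2" "$3; exit}')
    locker=$(echo "$locks" | awk '/^client\./{print $1; exit}')
    rbd lock remove "volumes/$vol" "$lock_id" "$locker"
    echo "Done: $vol"
  fi
done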
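Before removing locks in bulk, it can help to generate the removal commands and review them first. The following is a dry-run sketch, assuming lock IDs have the two-word form seen in this incident ("auto <number>"); print_lock_removals is a hypothetical helper name.

```bash
# Hypothetical helper: print one `rbd lock remove` command per lock row.
# $1 is the pool/image spec; `rbd lock list` text is read from stdin.
# Nothing is executed: the output is meant to be reviewed, then run by hand.
print_lock_removals() {
  awk -v spec="$1" '/^client\./ {
    # Columns: locker, lock ID (two words, assumed), address
    printf "rbd lock remove %s \"%s %s\" \"%s\"\n", spec, $2, $3, $1
  }'
}

# Usage against a live cluster (not executed here):
#   for vol in $(rbd ls volumes); do
#     rbd lock list "volumes/$vol" 2>/dev/null | print_lock_removals "volumes/$vol"
#   done > /tmp/lock-removals.sh
```

Reviewing the generated file makes it possible to drop lines for images whose lock holder is still alive before anything is removed.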

We then hard rebooted the VMs. Note that this loop reboots every server in every project, not only the affected ones:

for vm in $(openstack server list --all-projects -f value -c ID); do
  name=$(openstack server show $vm -f value -c name)
  status=$(openstack server show $vm -f value -c status)
  echo "Rebooting: $name ($vm) - Current status: $status"
  openstack server reboot --hard $vm
done

All VMs recovered.

Clearing Blocklist Entries

After confirming all locks were released and VMs were healthy, we cleared the blocklist entries:

ceph osd blocklist rm 10.88.10.90
ceph osd blocklist rm 10.88.10.91

Important: Only perform this step after confirming crashed nodes will not return with stale state. Reconnecting zombie processes while another client holds the lock risks data corruption.
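The entries reported by ceph osd blocklist ls carry a port and nonce (for example 10.88.10.90:0/316670229). As a sketch, assuming a bare IP passed to ceph osd blocklist rm may not match such entries on every Ceph release, each entry can be removed exactly as it was listed. blocklist_entries_for_ip is a hypothetical helper name.

```bash
# Hypothetical helper: print the full blocklist entries (ip:port/nonce)
# whose IP equals $1. Reads `ceph osd blocklist ls` text on stdin.
blocklist_entries_for_ip() {
  # index(...) == 1 means the entry starts with "<ip>:", so 10.88.10.9
  # does not accidentally match 10.88.10.90.
  awk -v ip="$1" 'index($1, ip ":") == 1 { print $1 }'
}

# Usage against a live cluster (not executed here):
#   ceph osd blocklist ls 2>/dev/null | blocklist_entries_for_ip 10.88.10.90 |
#     while read -r entry; do ceph osd blocklist rm "$entry"; done
```

Running ceph osd blocklist ls again afterwards confirms whether any entries for the host remain.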

Prevention Measures

Granting OpenStack Blocklist Capabilities

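The Ceph client used by OpenStack needs permission to manage blocklist entries before it can recover from this situation on its own. Without allow command "osd blocklist" in its monitor capabilities, the client cannot blocklist a dead lock holder and take over its exclusive lock, so stale locks have to be removed by hand.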

Step 1: Check current capabilities

ceph auth get client.openstack

Step 2: Add blocklist capability

# First, save the current keyring: `ceph auth caps` replaces ALL existing caps
ceph auth get client.openstack -o /tmp/openstack.keyring

# Then update caps (adjust pool names and OSD caps for your environment)
ceph auth caps client.openstack \
  mon 'allow r, allow command "osd blocklist"' \
  osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=backups'

Note: Adjust pool names according to your environment (e.g., vms, volumes, images).

Step 3: Verify the update

ceph auth get client.openstack
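The verification step can be turned into a pass/fail check, for example in a post-deployment script. This is a minimal sketch, assuming the keyring-style output of ceph auth get with a "caps mon = ..." line; has_blocklist_cap is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the "caps mon" line of `ceph auth get`
# output on stdin mentions the osd blocklist command.
has_blocklist_cap() {
  awk '/^[ \t]*caps mon[ \t]*=/ && /osd blocklist/ { found = 1 }
       END { exit !found }'
}

# Usage against a live cluster (not executed here):
#   if ceph auth get client.openstack 2>/dev/null | has_blocklist_cap; then
#     echo "blocklist capability present"
#   else
#     echo "blocklist capability MISSING"
#   fi
```

The check only looks at the monitor caps line, so a matching string in the OSD caps does not produce a false positive.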

Nova Configuration Tuning

The following settings were added to nova.conf on compute nodes:

[libvirt]
hw_disk_discard = unmap
disk_cachemodes = network=writeback
rbd_io_timeout = 30

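The rbd_io_timeout parameter gives the RBD client additional time to recover during transient issues rather than immediately failing I/O. Check each option against the configuration reference for your Nova release, since option names and availability vary between releases.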
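To confirm that every compute node actually carries these settings, the values can be read back from the file. The sketch below is a small INI reader built on awk, assuming the simple "key = value" layout shown above; ini_get is a hypothetical helper name and the file path in the usage note is an assumption.

```bash
# Hypothetical helper: print the value of key $2 in section $1 of an
# INI-style file read from stdin.
ini_get() {
  awk -v section="[$1]" -v key="$2" '
    /^\[/ { in_section = ($0 == section); next }   # track the current section
    in_section {
      split($0, kv, "=")
      k = kv[1]; gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)         # keep "=" inside the value
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
      }
    }'
}

# Usage on a compute node (path is an assumption, not executed here):
#   ini_get libvirt disk_cachemodes < /etc/nova/nova.conf
```

Splitting on the first "=" only matters for values such as network=writeback, which contain an equals sign themselves.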

Key Takeaways

Ceph's blocklist mechanism protects data from split-brain scenarios. The issue arises from unclean shutdowns leaving orphaned locks behind.
New VMs working while existing VMs fail is a diagnostic indicator. After an outage, this pattern strongly suggests stale locks and blocklisted clients, and recognizing it saves time that would otherwise go into investigating network or OSD problems.
Proactively grant blocklist permissions to the OpenStack Ceph client. The allow command "osd blocklist" capability enables automatic recovery without manual intervention.
The rbd lock remove syntax requires positional arguments with quoted strings. The --locker flag is not available in many versions. Use the format:

rbd lock remove <pool>/<image> "<lock_id>" "<locker>"

Include lock-related failure scenarios in disaster recovery testing. Standard monitoring and backup verification may not catch this failure mode.
