[lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

Fri Mar 14 20:06:40 PDT 2025

Thank you for your advice.

A user named Oyvind replied on the users at clusterlabs.org mailing list:
You need the systemd drop-in functionality introduced in RHEL 9.3 
to avoid this issue: https://bugzilla.redhat.com/show_bug.cgi?id=2184779

My understanding of the cause is as follows: 
During reboot, both the system and Pacemaker try to unmount the Lustre resource at the same time. 
If the system starts unmounting first and Pacemaker then attempts its own unmount, Pacemaker's stop operation immediately returns success. 
However, at that point the unmount started by the system has not yet completed, 
so Pacemaker goes on to mount the resource on the failover node while it is still mounted here, which triggers this issue.

My current modification is as follows: 
Add the following lines to the `[Unit]` section of the file `/usr/lib/systemd/system/resource-agents-deps.target`:
```
After=remote-fs.target  
Before=shutdown.target reboot.target halt.target
```

After making this modification, the issue no longer occurs during reboot.
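
The same ordering could probably also be expressed as a systemd drop-in rather than by editing the packaged unit file, since files under `/usr/lib/systemd/system` can be overwritten by package updates. The following is an untested sketch under that assumption; the file name `lustre-order.conf` is an arbitrary example, not something from the bug report.

```
# /etc/systemd/system/resource-agents-deps.target.d/lustre-order.conf
# (hypothetical file name; any *.conf file in this directory is read as a drop-in)
[Unit]
# Same ordering directives as in the modification described above
After=remote-fs.target
Before=shutdown.target reboot.target halt.target
```

After creating the file, `systemctl daemon-reload` is needed for systemd to pick up the change.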

chenzufei at gmail.com

From: Laura Hild
Date: 2025-03-06 06:12
To: chenzufei at gmail.com
CC: lustre-discuss
Subject: Re: [lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker
I'm not sure what to say about how Pacemaker *should* behave, but I *can* say I virtually never try to (cleanly) reboot a host from which I have not already evacuated all resources, e.g. with `pcs node standby` or by putting Pacemaker in maintenance mode and unmounting/exporting everything manually.  If I can't evacuate all resources and complete a lustre_rmmod, the host is getting power-cycled.

So maybe I can say, my guess would be that in the host's shutdown process, stopping the Pacemaker service happens before filesystems are unmounted, and that Pacemaker doesn't want to make an assumption whether its own shut-down means it should standby or initiate maintenance mode, and therefore the other host ends up knowing only that its partner has disappeared, while the filesystems have yet to be unmounted.
