[lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

Laura Hild lsh at jlab.org
Wed Mar 5 14:12:00 PST 2025


I'm not sure what to say about how Pacemaker *should* behave, but I *can* say I virtually never try to (cleanly) reboot a host from which I have not already evacuated all resources, e.g. with `pcs node standby` or by putting Pacemaker in maintenance mode and unmounting/exporting everything manually.  If I can't evacuate all resources and complete a lustre_rmmod, the host is getting power-cycled.
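
Roughly, the sequence I mean could be sketched as below. This is only an illustration under my own assumptions, not a tested procedure: the `run` wrapper, the `DRY_RUN` variable, the `evacuate_and_reboot` function name, and the 300-second wait are all made up for the example, and it defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch: evacuate a node via Pacemaker before a clean reboot.
# DRY_RUN, run, and evacuate_and_reboot are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

evacuate_and_reboot() {
    node="$1"
    # Ask Pacemaker to move every resource off this node and wait for it
    # (the 300 s timeout is an arbitrary assumption).
    run pcs node standby "$node" --wait=300 || return 1
    # Unload the Lustre modules; this fails if a target is still mounted.
    if run lustre_rmmod; then
        run systemctl reboot
    else
        echo "lustre_rmmod failed on $node: power-cycle instead of a clean reboot" >&2
        return 1
    fi
}
```

Calling `evacuate_and_reboot node1` with the default `DRY_RUN=1` just lists the three commands in order; with `DRY_RUN=0` it would actually run them, so `node1` has to be replaced with the real cluster node name.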

So my guess would be that, in the host's shutdown process, the Pacemaker service is stopped before the filesystems are unmounted, and that Pacemaker doesn't want to assume its own shutdown means the node should go into standby or the cluster into maintenance mode.  The other host therefore ends up knowing only that its partner has disappeared, while the filesystems have yet to be unmounted.


