[lustre-discuss] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker
Alex Vodeyko
alex.vodeyko at gmail.com
Thu Mar 13 06:58:18 PDT 2025
+1 to the pacemaker/lustre startup problem after an unexpected reboot
(power loss in my case).
rocky9.5 + lustre(2.16.52) + pacemaker(2.1.8) + corosync(3.1.8) + pcs(0.11.8)
"pcs status" afer "pcs cluster start --all" shows the following errors:
Failed Resource Actions:
* lustre-mgt start on mds2-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:21
2025 after 20.001s
* lustre-mdt00 start on mds2-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:21
2025 after 20.002s
* lustre-mdt01 start on mds2-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:01
2025 after 20.001s
* lustre-mgt start on mds1-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:01
2025 after 20.002s
* lustre-mdt00 start on mds1-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:01
2025 after 20.001s
* lustre-mdt01 start on mds1-ha could not be executed (Timed Out:
Resource agent did not complete within 20s) at Thu Mar 13 08:20:21
2025 after 20.001s
/var/log/messages:
Mar 13 08:20:01 mds1 Lustre(lustre-mgt)[4213]: INFO: Starting to mount
/dev/mapper/mgt
Mar 13 08:20:01 mds1 Lustre(lustre-mdt00)[4224]: INFO: Starting to
mount /dev/mapper/mdt00
Mar 13 08:20:01 mds1 kernel: LDISKFS-fs warning (device dm-2):
ldiskfs_multi_mount_protect:334: MMP interval 42 higher than expected,
please wait.
Mar 13 08:20:01 mds1 kernel: LDISKFS-fs warning (device dm-3):
ldiskfs_multi_mount_protect:334: MMP interval 42 higher than expected,
please wait.
Mar 13 08:20:21 mds1 kernel: LDISKFS-fs warning (device dm-2):
ldiskfs_multi_mount_protect:338: MMP startup interrupted, failing
mount
Mar 13 08:20:21 mds1 kernel: LustreError:
4222:0:(osd_handler.c:8348:osd_mount()) MGS-osd: can't mount
/dev/mapper/mgt: -110
Mar 13 08:20:21 mds1 kernel: LustreError:
4222:0:(obd_config.c:777:class_setup()) setup MGS-osd failed (-110)
Mar 13 08:20:21 mds1 kernel: LustreError:
4222:0:(obd_mount.c:193:lustre_start_simple()) MGS-osd setup error
-110
Mar 13 08:20:21 mds1 kernel: LustreError:
4222:0:(tgt_mount.c:2203:server_fill_super()) Unable to start osd on
/dev/mapper/mgt: -110
Mar 13 08:20:21 mds1 kernel: LustreError:
4222:0:(super25.c:170:lustre_fill_super()) llite: Unable to mount
<unknown>: rc = -110
Mar 13 08:20:21 mds1 pacemaker-controld[3039]: error: Result of start
operation for lustre-mgt on mds1-ha: Timed Out after 20s (Resource
agent did not complete within 20s)
Mar 13 08:20:21 mds1 kernel: LDISKFS-fs warning (device dm-3):
ldiskfs_multi_mount_protect:338: MMP startup interrupted, failing
mount
The only way to proceed is to stop the HA cluster (and sometimes it
simply would not stop, so I had to reset the server), manually mount the
mgt/mdt/ost targets, unmount them, and then start the HA cluster again.
The same happens with both the ldiskfs and zfs (2.2.7) backends.
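For reference, the manual recovery sequence looks roughly like this (a
sketch for the ldiskfs case only; /mnt/mgt and /mnt/mdt00 are placeholder
mount points, use whatever directories the resource agents normally mount
on, and the zfs backend would need a zpool import first):

# pcs cluster stop --all                       # stop pacemaker/corosync on both nodes
# mount -t lustre /dev/mapper/mgt /mnt/mgt     # blocks for the MMP wait, then succeeds
# mount -t lustre /dev/mapper/mdt00 /mnt/mdt00 # (repeat for the remaining mdt/ost targets)
# umount /mnt/mdt00 /mnt/mgt                   # clean unmount resets the MMP state
# pcs cluster start --all                      # agents can now mount within the 20s window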
Another problem is the following error:
# pcs resource describe ocf:lustre:Lustre
Error: Unable to process agent 'ocf:lustre:Lustre' as it implements
unsupported OCF version '1.0.1', supported versions are: '1.0', '1.1'
Error: Errors have occurred, therefore pcs is unable to continue
I was thinking of increasing some Lustre resource agent timeouts (the
start timeout appears to be 20s, while the MMP warning reports an
interval of 42s), but that does not seem possible because of the pcs
error above.
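That said, if anyone wants to experiment, something along these lines
might work around it (untested on my side; 300s is just an example picked
to be comfortably above twice the 42s MMP interval, and --force may or
may not bypass the OCF-version validation):

# pcs resource update lustre-mgt op start timeout=300s --force
# pcs resource update lustre-mdt00 op start timeout=300s --force
# pcs resource update lustre-mdt01 op start timeout=300s --force

If pcs still refuses to touch the agent metadata, the CIB can be edited
and pushed back directly:

# pcs cluster cib > cib.xml
  (edit the start-operation timeouts of the lustre resources in cib.xml)
# pcs cluster cib-push cib.xml

Another option might be to change the OCF version the agent advertises in
its meta-data (the <version> element in
/usr/lib/ocf/resource.d/lustre/Lustre) to 1.0, assuming that is where pcs
reads it from.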
Thanks,
Alex
On Thu, 6 Mar 2025 at 19:20, Cameron Harr via lustre-discuss
<lustre-discuss at lists.lustre.org> wrote:
>
> To add to this, instead of issuing a straight reboot, I prefer running
> 'pcs stonith fence <node>' which will fail over resources appropriately
> AND reboot the node (if doable) or otherwise power it off. The advantage
> to doing it this way is that it keeps Pacemaker in-the-know about the
> state of the node so it doesn't (usually) shoot it as it's trying to
> boot back up. When you're doing maintenance on a node without letting
> Pacemaker know about it, results can be unpredictable.
>
> Cameron
>
> On 3/5/25 2:12 PM, Laura Hild via lustre-discuss wrote:
> > I'm not sure what to say about how Pacemaker *should* behave, but I *can* say I virtually never try to (cleanly) reboot a host from which I have not already evacuated all resources, e.g. with `pcs node standby` or by putting Pacemaker in maintenance mode and unmounting/exporting everything manually. If I can't evacuate all resources and complete a lustre_rmmod, the host is getting power-cycled.
> >
> > So maybe I can say, my guess would be that in the host's shutdown process, stopping the Pacemaker service happens before filesystems are unmounted, and that Pacemaker doesn't want to make an assumption whether its own shut-down means it should standby or initiate maintenance mode, and therefore the other host ends up knowing only that its partner has disappeared, while the filesystems have yet to be unmounted.
> >