[lustre-discuss] Lustre HA

Dominika Wanat d.wanat at cyfronet.pl
Thu Mar 30 02:42:31 PDT 2023


Hi,

we run an HA stack based on Pacemaker/Corosync with the ZFS and
Lustre resource agents in production, using the setup provided by
Laura together with ZFS multi-mount protection. The main advantage is
that resources are moved automatically when there is a problem with a
server or with Lustre RPC. The main disadvantage we have experienced
is that the ZFS resource agents sometimes misbehave when larger ZFS
pools are remounted at the same time. The agents call 'zpool' commands
so frequently that they occasionally cause a lock, which has to time
out and leaves the resource in the Pacemaker 'failed' state. We then
need to clean up the HA resources manually so that Pacemaker redetects
the current state and mounts the pools. So in our case, recovery is
not always automatic.
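
For illustration only, the manual cleanup step could be scripted roughly
as below. This is a minimal sketch, not the procedure we actually run: it
assumes the cluster is managed with 'pcs' (so 'pcs resource cleanup <id>'
is available), and the function names, the timeout value, and the
resource IDs in the usage note are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def cleanup_command(resource_id: str) -> List[str]:
    """Build the argv that asks Pacemaker (via pcs) to forget a resource's
    failure history and re-probe its current state."""
    return ["pcs", "resource", "cleanup", resource_id]


def cleanup_resources(
    resource_ids: Iterable[str],
    runner: Callable = subprocess.run,  # injectable so the logic can be tested without pcs
    timeout_s: int = 120,               # assumed value; tune to how long your pools take
) -> Dict[str, bool]:
    """Run a cleanup for each resource, one at a time.

    Returns a mapping of resource ID to True if the command exited with
    status 0, False otherwise (non-zero exit, timeout, or pcs not found).
    """
    results: Dict[str, bool] = {}
    for rid in resource_ids:
        try:
            proc = runner(
                cleanup_command(rid),
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            results[rid] = proc.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            results[rid] = False
    return results
```

On a cluster node this would be called as, for example,
cleanup_resources(["zfs-ost0", "zfs-ost1"]), where the IDs are
placeholders for whatever your ZFS resources are named. Running the
cleanups sequentially is a deliberate choice in this sketch, to avoid
adding more concurrent 'zpool' activity to an already contended system.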

Dominika 
 
-- 
Dominika Wanat
Mass Storage Department
ACK Cyfronet AGH
tel.: +48 12 632 33 55 ext. 704
