[lustre-discuss] how does lustre handle node failure
Shawn
neutronsharc at gmail.com
Fri Jul 21 15:04:32 PDT 2023
Hi Laura, thanks for your reply.
It seems that the two OSSs in each pair share disks provisioned from a
shared SAN, so if one node goes down, its partner takes over its targets
in a pre-defined manner, coordinated by an HA manager.
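
To check my understanding, here is a rough sketch of that static pairing
as I picture it. This is only a toy model, not Pacemaker or Lustre code:
the class HAPair, the method placement(), and the node and OST names are
all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HAPair:
    """Hypothetical model: two OSS nodes that both see the same SAN disks."""
    nodes: tuple[str, str]
    # OST name -> node that serves it while both nodes are healthy
    preferred: dict[str, str] = field(default_factory=dict)

    def placement(self, down: set[str]) -> dict[str, str | None]:
        """Return which node serves each OST, given the set of failed nodes."""
        result: dict[str, str | None] = {}
        for ost, primary in self.preferred.items():
            partner = self.nodes[1] if primary == self.nodes[0] else self.nodes[0]
            if primary not in down:
                result[ost] = primary   # normal operation
            elif partner not in down:
                result[ost] = partner   # fail over to the pre-defined partner
            else:
                result[ost] = None      # both nodes down: target unavailable
        return result


pair = HAPair(
    nodes=("oss01", "oss02"),
    preferred={"OST0000": "oss01", "OST0001": "oss02"},
)
print(pair.placement(down=set()))
print(pair.placement(down={"oss01"}))
```

In this model each pair is an independent failover domain, and a target
becomes unavailable only when both nodes of its pair are down.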
This can certainly work on a limited scale. I'm curious whether this
static scheme can scale to a large cluster with hundreds of OSS servers.
Regards,
Shawn
On Tue, Jul 18, 2023 at 1:25 PM Laura Hild <lsh at jlab.org> wrote:
> I'm not familiar with using FLR to tolerate OSS failures. My site does
> the HA pairs with shared storage method. It's sort of described in the
> manual
>
> https://doc.lustre.org/lustre_manual.xhtml#configuringfailover
>
> but in more, Pacemaker-specific detail at
>
> https://wiki.lustre.org/Creating_a_Framework_for_High_Availability_with_Pacemaker
>
> and
>
> https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services