[lustre-discuss] Node Failure in Lustre

Wed Mar 15 04:27:29 PDT 2023

Hi

Okay, thank you for the information.
Can you tell me how the failure will be handled at the Lustre level if the
MDS/MGS or an OSS server goes down?

On Wed, 15 Mar 2023 at 13:45, Andreas Dilger <adilger at whamcloud.com> wrote:

> No, because the remote-attached SSDs are part of the ZFS pool, and any
> drive failures at that level are the responsibility of ZFS to manage
> (e.g. with RAID). It is up to you to have system monitors in place to
> detect this case and alert you to the drive failures.
> This is no different than if the drives inside a RAID enclosure fail.
>
> Lustre cannot magically know that drives below the filesystem layer have
> problems. It only cares about being able to access the whole filesystem,
> and that the filesystem is intact even in the case of drive failures.
>
> Cheers, Andreas
>
> > On Mar 15, 2023, at 01:26, Nick dan via lustre-discuss
> > <lustre-discuss at lists.lustre.org> wrote:
> >
> > Hi
> >
> > There is a situation where disks from multiple servers are attached
> > remotely to a main server (the Lustre storage server). A zpool is
> > created from the SSDs, and mkfs.lustre is run using ZFS as the backend
> > file system. A Lustre client is also connected. If one of the nodes
> > that provides the SSDs goes down, will the node failure be handled?
> >
> > Thanks and regards,
> > Nick Dan
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>