[lustre-discuss] I/O error on lctl ping although ibping successful

John Hearns hearnsj at gmail.com
Wed Jun 21 05:13:08 PDT 2023


Have you run ibdiagnet?
You will also want to run ibqueryerrors to check the port error counters on the fabric.
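
If it helps to capture the results from several nodes in one place, here is a rough sketch of a wrapper that runs both tools and, optionally, an LNet ping. The function names run_tool and collect, and the timeout value, are my own placeholders and not part of any Lustre or InfiniBand package. The tools are called with no options, so add whatever flags suit your fabric.

```python
import shutil
import subprocess


def run_tool(argv, timeout=600):
    """Run one diagnostic command and return (status, combined output)."""
    # Skip cleanly when the tool is not installed on this node.
    if shutil.which(argv[0]) is None:
        return ("missing", "")
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except OSError as exc:
        return ("error", str(exc))
    # The status reflects only the exit code; the output still needs reading.
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.stdout + proc.stderr)


def collect(mds_nid=None):
    """Run the fabric checks, plus an LNet ping if an MDS NID is given."""
    commands = [["ibdiagnet"], ["ibqueryerrors"]]
    if mds_nid is not None:
        # mds_nid is supplied by the caller; no real NID is assumed here.
        commands.append(["lctl", "ping", mds_nid])
    return {" ".join(argv): run_tool(argv) for argv in commands}
```

Calling collect() on a failing node and on a healthy one, passing your MDS NID as the argument, gives two result sets to compare side by side. I would not read too much into the exit codes alone; the counters in the ibqueryerrors output are the part to look at.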

On Tue, 20 Jun 2023, 17:11 Youssef Eldakar via lustre-discuss
<lustre-discuss at lists.lustre.org> wrote:

> In a cluster with ~100 Lustre clients (compute nodes) connected to the MDS
> and OSS over Intel True Scale InfiniBand (a discontinued product), we
> started seeing certain nodes fail to mount the Lustre file system and
> return an I/O error on LNet ping (lctl ping), even though an ibping test
> to the MDS gives no errors. We tried rebooting the problematic nodes and
> even reinstalling the OS and Lustre client from scratch, which did not help.
> Rebooting the MDS seems to help briefly once it is back up, but the same
> set of problematic nodes always eventually reverts to failing to ping the
> MDS over LNet.
>
> Thank you for any pointers we may pursue.
>
> Youssef Eldakar
> Bibliotheca Alexandrina
> www.bibalex.org
> hpc.bibalex.org