[lustre-discuss] I/O error on lctl ping although ibping successful

Youssef Eldakar youssefeldakar at gmail.com
Tue Jun 20 09:09:56 PDT 2023


In a cluster of ~100 Lustre clients (compute nodes) connected to the MDS
and OSS over Intel True Scale InfiniBand (a discontinued product), certain
nodes have started failing to mount the Lustre file system. On these nodes,
an LNET ping (lctl ping) to the MDS returns an I/O error even though an
ibping test to the MDS gives no errors. We tried rebooting the problematic
nodes and even fresh-installing the OS and Lustre client, which did not
help. Rebooting the MDS seems to help briefly once it is back up, but the
same set of problematic nodes seems to always eventually revert to the
state where they fail to ping the MDS over LNET.
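
To pin down when a node goes from working to failing after an MDS reboot,
the LNET ping could be repeated on a timer and each state change logged. The
following is only a minimal sketch: the helper names (lnet_ping, watch) and
the NID in the usage comment are hypothetical, and it assumes that
"lctl ping <nid>" exits with a non-zero status when the ping fails.

```python
import subprocess
import time


def lnet_ping(nid, timeout=10, runner=subprocess.run):
    """Return True if `lctl ping <nid>` exits with status 0.

    Assumption: lctl reports a failed ping through a non-zero exit status.
    """
    try:
        result = runner(
            ["lctl", "ping", nid],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # lctl is missing, or the ping hung past the timeout.
        return False
    return result.returncode == 0


def watch(nid, attempts, interval, ping=lnet_ping,
          sleep=time.sleep, clock=time.time):
    """Ping `nid` repeatedly and print a line whenever the result changes.

    Returns the full history as a list of (timestamp, ok) tuples.
    """
    history = []
    previous = None
    for _ in range(attempts):
        ok = ping(nid)
        now = clock()
        if ok != previous:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
            print(f"{stamp} {nid}: {'ok' if ok else 'FAILED'}")
            previous = ok
        history.append((now, ok))
        sleep(interval)
    return history


# Example (hypothetical MDS NID; run on an affected client as root):
# watch("192.168.1.10@o2ib", attempts=120, interval=30)
```

Run on one affected node and one healthy node right after the MDS comes
back, this would give timestamps to line up against the MDS and client
kernel logs.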

Thank you for any pointers we may pursue.

Youssef Eldakar
Bibliotheca Alexandrina
www.bibalex.org
hpc.bibalex.org