[lustre-discuss] [EXTERNAL] I/O error on lctl ping although ibping successful

Youssef Eldakar youssefeldakar at gmail.com
Wed Jun 21 09:08:45 PDT 2023


Thanks, Rick, for that suggestion. A plain IP ping between a problematic host
and the MDS over the IPoIB addresses indeed does not go through.

Not exactly sure what to investigate next, but that gives me somewhere to
start...
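Below is a minimal Python sketch of the layer-by-layer comparison Rick suggested. It is meant to be run on a problematic client and pointed at the MDS. It makes several assumptions: the MDS's IPoIB address is known; the LNet network is named `o2ib` (a hypothetical default, so check `lnetctl net show` for the real name); iputils `ping` and `lctl` are on the PATH; and the script runs as root. The reason this check matters, as far as I understand it, is that ibping exchanges InfiniBand management datagrams and never touches IP. By contrast, o2iblnd resolves peers through RDMA CM using the IPoIB addresses. A broken IPoIB path can therefore make `lctl ping` fail while ibping still succeeds.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical LNet network name; replace with the one configured on your
# nodes (see the "net type" entries in `lnetctl net show`).
LNET_NET = "o2ib"

# A runner takes a command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


@dataclass
class LayerResult:
    ip: str
    ipoib_ok: bool  # plain ICMP ping to the peer's IPoIB address
    lnet_ok: bool   # `lctl ping` to the NID built from the same address


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit status."""
    return subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def nid_for(ip: str, net: str = LNET_NET) -> str:
    """Build an LNet NID such as 10.0.0.5@o2ib from an IPoIB address."""
    return f"{ip}@{net}"


def check_host(ip: str, run: Runner = default_runner,
               net: str = LNET_NET) -> LayerResult:
    # One ICMP echo with a 2-second reply timeout (Linux iputils flags).
    ipoib_ok = run(["ping", "-c", "1", "-W", "2", ip]) == 0
    # LNet-level ping of the same peer; exits non-zero on failure.
    lnet_ok = run(["lctl", "ping", nid_for(ip, net)]) == 0
    return LayerResult(ip, ipoib_ok, lnet_ok)


def diagnose(r: LayerResult) -> str:
    """Classify where the path breaks, based on which layer answered."""
    if r.ipoib_ok and r.lnet_ok:
        return "ok"
    if not r.ipoib_ok and not r.lnet_ok:
        return "ipoib-down"  # IP over IB broken too: look below LNet
    if r.ipoib_ok:
        return "lnet-only"   # IP works, LNet does not: look at LNet config
    return "ip-only"         # LNet answers, ICMP does not (e.g. ICMP filtered)
```

For example, `diagnose(check_host("10.0.0.5"))` would report where the path breaks, with 10.0.0.5 standing in for the MDS's IPoIB address. The runner is injectable so the classification logic can be exercised without touching the network.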

- Youssef

On Tue, Jun 20, 2023 at 7:00 PM Mohr, Rick via lustre-discuss <
lustre-discuss at lists.lustre.org> wrote:

> Have you tried tcp pings on the IP addresses associated with the IB
> interfaces?
>
> --Rick
>
>
> On 6/20/23, 12:11 PM, "lustre-discuss on behalf of Youssef Eldakar via
> lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of
> lustre-discuss at lists.lustre.org> wrote:
>
>
> In a cluster of ~100 Lustre clients (compute nodes) connected to the MDS
> and OSS over Intel True Scale InfiniBand (a discontinued product), we
> started seeing certain nodes fail to mount the Lustre file system and
> return an I/O error on an LNet (lctl) ping, even though an ibping test to
> the MDS shows no errors. We tried rebooting the problematic nodes and
> even reinstalling the OS and Lustre client from scratch, which did not
> help. Rebooting the MDS, however, seems to help briefly once it comes
> back up, but the same set of problematic nodes always seems to
> eventually revert to the state where they fail to ping the MDS over
> LNet.
>
>
> Thank you for any pointers we may pursue.
>
> Youssef Eldakar
> Bibliotheca Alexandrina
> www.bibalex.org
> hpc.bibalex.org
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>