[lustre-discuss] LNet nid down after something changed the NICs

CJ Yin woshifuxiuyin at gmail.com
Sat Feb 18 22:22:47 PST 2023


Hi Chris,

Thanks for your help. I have collected the relevant logs following your
hints, but I need an account to open a ticket on Jira. I have sent an
email to the administrator at info at whamcloud.com; is this the correct
way to apply for an account? That was the only address I could find on
the site.

Regards,
Chuanjun

Horn, Chris <chris.horn at hpe.com> wrote on Sat, Feb 18, 2023 at 00:52:

> If deleting and re-adding it restores the status to up, then this sounds
> like a bug to me.
>
>
>
> Can you enable debug tracing, reproduce the issue, and add this
> information to a ticket?
>
> To enable/gather debug:
>
> # lctl set_param debug=+net
> <reproduce issue>
> # lctl dk > /tmp/dk.log
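>
> (To revert afterwards, a sketch assuming the default debug mask was in
> effect beforehand: "-net" removes the flag added above, and lctl clear
> empties the kernel debug buffer.)
>
> # lctl set_param debug=-net
> # lctl clear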
>
> You can create a ticket at https://jira.whamcloud.com/
>
> Please provide the dk.log with the ticket.
>
>
>
> Thanks,
>
> Chris Horn
>
>
>
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
> behalf of 腐朽银 via lustre-discuss <lustre-discuss at lists.lustre.org>
> Date: Friday, February 17, 2023 at 2:53 AM
> To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] LNet nid down after something changed the NICs
>
> Hi,
>
>
>
> I encountered a problem when using the Lustre client on k8s with kubenet.
> I would be very happy if you could help me.
>
>
>
> My LNet configuration is:
>
>
>
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: tcp
>       local NI(s):
>         - nid: 10.224.0.5@tcp
>           status: up
>           interfaces:
>               0: eth0
>
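> (For reference, a sketch of how such a listing is obtained: `lnetctl net
> show` prints the configured nets, and its -v flag adds statistics and
> tunables as well.)
>
> # lnetctl net show -v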
>
>
> It works, but after I deploy or delete a pod on the node, the NID goes
> down:
>
>
>
>         - nid: 10.224.0.5@tcp
>           status: down
>           interfaces:
>               0: eth0
>
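> (One way to confirm that the NI is really unusable rather than just
> mis-reported in the show output, a sketch that pings the node's own NID
> from above:)
>
> # lnetctl ping 10.224.0.5@tcp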
>
>
> k8s uses veth pairs, so it adds or deletes network interfaces whenever
> pods are deployed or deleted, but it doesn't touch the eth0 NIC itself.
> I can fix the problem by deleting the tcp net with `lnetctl net del` and
> re-adding it with `lnetctl net add` (see the sketch just below), but I
> need to do this every time a pod is scheduled to this node.
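>
> (Concretely, the workaround is something like the following; the net and
> interface names come from my configuration above:)
>
> # lnetctl net del --net tcp
> # lnetctl net add --net tcp --if eth0
>
> (A crude, untested sketch of automating it until the root cause is
> fixed: watch for veth link events and re-create the net each time.)
>
> #!/bin/sh
> # Sketch: re-create the LNet tcp net whenever a veth interface
> # appears or disappears. Heavy-handed, but illustrates the idea.
> ip monitor link | grep --line-buffered veth | while read -r _; do
>     # del fails harmlessly if the net is already gone
>     lnetctl net del --net tcp 2>/dev/null
>     lnetctl net add --net tcp --if eth0
> done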
>
>
>
> My node OS is Ubuntu 18.04 with kernel 5.4.0-1101-azure. I built the
> Lustre client myself from 2.15.1. Is this expected LNet behavior, or did
> I get something wrong? I rebuilt and tested it several times and got the
> same problem.
>
>
>
> Regards,
>
> Chuanjun
>