[lustre-discuss] [EXTERNAL] client failing off network
John Hearns
hearnsj at gmail.com
Fri Oct 31 07:42:32 PDT 2025
For information: arpwatch can be used to alert on duplicate IP addresses.
https://en.wikipedia.org/wiki/Arpwatch
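To illustrate the idea behind this kind of alerting, here is a minimal Python sketch. It is not arpwatch's actual implementation. Given (IP, MAC) pairs observed on the network, it flags any IP address that has answered from more than one MAC address. arpwatch collects such pairs itself by listening to ARP traffic. Here the input list and the function name `find_ip_conflicts` are hypothetical. A hit is a lead to investigate rather than proof of a conflict, because a replaced NIC also changes the MAC behind an address.

```python
from collections import defaultdict
from typing import Iterable


def find_ip_conflicts(observations: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Return every IP address that was seen with more than one MAC address.

    `observations` is a hypothetical stream of (ip, mac) pairs, for example
    gathered from ARP replies on the client's subnet. This only sketches the
    idea behind arpwatch-style alerting, not how arpwatch is implemented.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so "AA:BB:..." and "aa:bb:..." count as one MAC.
        seen[ip].add(mac.lower())
    # Keep only addresses that map to two or more distinct MACs.
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample: 10.0.0.42 answers from two different NICs.
    sample = [
        ("10.0.0.41", "52:54:00:aa:bb:01"),
        ("10.0.0.42", "52:54:00:aa:bb:02"),
        ("10.0.0.42", "52:54:00:AA:BB:99"),
        ("10.0.0.41", "52:54:00:AA:BB:01"),
    ]
    for ip, macs in find_ip_conflicts(sample).items():
        print(f"possible duplicate address {ip}: {sorted(macs)}")
```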
On Fri, 31 Oct 2025 at 13:13, Michael DiDomenico via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> Unfortunately, I don't think so. We're pretty good about assigning
> addresses, but we're still human. I don't see any evidence of a duplicated
> address, but I'll keep looking.
>
> Thanks
>
> On Thu, Oct 30, 2025 at 8:10 PM Mohr, Rick <mohrrf at ornl.gov> wrote:
> >
> > Michael,
> >
> > It might be a long shot, but is there any chance another machine has the
> > same IP address as the one having problems?
> >
> > --Rick
> >
> >
> >
> > On 10/30/25, 3:09 PM, "lustre-discuss on behalf of Michael DiDomenico via
> > lustre-discuss" wrote:
> > Our network is running Lustre 2.15.6 everywhere on RHEL 9.5. We recently
> > built a new machine using 2.15.7 on RHEL 9.6, and I'm seeing a strange
> > problem. The client is Ethernet-connected to ten LNet routers, which bridge
> > Ethernet to InfiniBand. I can mount the client just fine and read/write
> > data, but several hours later the client marks all the routers offline.
> > The only recovery is to do a lazy unmount, run lustre_rmmod, and then
> > restart the Lustre mount.
> >
> > Nothing unusual comes out in the journal/dmesg logs. To Lustre it "looks"
> > like someone pulled the network cable, but there's no evidence that this
> > has happened physically, or even at the switch/software layers.
> >
> > We upgraded two other machines to see if the problem replicates, but so
> > far it hasn't. The only significant difference between the three machines
> > is that the one with the problem has heavy container (podman) usage; the
> > others have zero. I'm not sure if this is a cause or just a red herring.
> > Any suggestions?
> >
> >
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>