[lustre-discuss] LNet Multi-Rail config - with BODY!

Horn, Chris chris.horn at hpe.com
Wed Jan 17 09:53:44 PST 2024


NRS only affects Lustre traffic, so it will not factor into lnet_selftest (LST) results.

I gave some talks on troubleshooting multi-rail that you may want to review.
Overview:
https://youtu.be/j3m-mznUdac?feature=shared
Demo:
https://youtu.be/TLN56cw9Zgs?feature=shared

You should probably start by verifying that the client and server see each other as multi-rail peers, and by checking the send and receive counts for each interface on your client and server to ensure that traffic is being spread across them.
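
A rough sketch of the second check follows. It assumes the per-interface counters have been captured as text shaped like the verbose output of `lnetctl net show -v`. The sample text, the NIDs, and the helper names `interface_counts` and `send_share` are hypothetical; adjust the field names if your Lustre version prints them differently.

```python
import re

# Hypothetical sample shaped like `lnetctl net show -v` output; the NIDs and
# counter values are made up for illustration only.
SAMPLE = """\
net:
    - net type: o2ib
      local NI(s):
        - nid: 10.0.0.1@o2ib
          status: up
          statistics:
              send_count: 4000
              recv_count: 3900
              drop_count: 0
        - nid: 10.0.0.2@o2ib
          status: up
          statistics:
              send_count: 12
              recv_count: 10
              drop_count: 0
"""


def interface_counts(text):
    """Return {nid: {"send_count": n, "recv_count": n}} parsed from the text."""
    counts = {}
    nid = None
    for line in text.splitlines():
        m = re.match(r"\s*-\s*nid:\s*(\S+)", line)
        if m:
            # A new interface entry starts; later counters belong to it.
            nid = m.group(1)
            counts[nid] = {"send_count": 0, "recv_count": 0}
            continue
        m = re.match(r"\s*(send_count|recv_count):\s*(\d+)", line)
        if m and nid is not None:
            counts[nid][m.group(1)] = int(m.group(2))
    return counts


def send_share(counts):
    """Return each interface's fraction of the total send count."""
    total = sum(c["send_count"] for c in counts.values())
    return {
        nid: (c["send_count"] / total if total else 0.0)
        for nid, c in counts.items()
    }


if __name__ == "__main__":
    for nid, share in send_share(interface_counts(SAMPLE)).items():
        print(f"{nid}: {share:.1%} of sends")
```

If one interface carries nearly all of the sends, as in the made-up sample, traffic is not being spread across the rails. Taking the counters before and after an LST run and comparing the difference gives a cleaner picture than the absolute values.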

Chris Horn

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Gwen Dawes via lustre-discuss <lustre-discuss at lists.lustre.org>
Date: Wednesday, January 17, 2024 at 5:48 AM
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] LNet Multi-Rail config - with BODY!

Hi Andreas,

Thanks for the pointer. I have a second server set up running 2.15.3 as
well specifically to check this, and can set it up with lnet_selftest,
same as the client. After taking a bit to convince the fabric manager
to accept the moved IPs, I get the exact same results between the two.

Good to know that it is possible, though - I wonder what needs to be
modified to achieve that. It's completely stock - the UDSP is just
blank, and the default NRS config is in play.

I don't suppose there's any chance the NRS config is what I'm missing?

Gwen

On Wed, 2024-01-17 at 03:14 +0000, Andreas Dilger wrote:
> Hello Gwen,
> I'm not a networking expert, but it seems entirely possible that the
> MR discovery in 2.12.9 isn't doing as well as what is in 2.15.3 (or
> 2.15.4 for that matter). It would make more sense to have both nodes
> running the same (newer) version before digging too deeply into this.
>
> We have definitely seen performance exceeding that of a single IB
> interface from a single node in our testing, though I can't say if
> that was done with lnet_selftest or with something else.
>
> Cheers, Andreas
>
> > On Jan 16, 2024, at 08:14, Gwen Dawes via lustre-discuss
> > <lustre-discuss at lists.lustre.org> wrote:
> >
> > Hi folks,
> >
> > Let's try that again.
> >
> > I'm in the luxurious position of having four IB cards, and I'm trying
> > to squeeze as much Lustre performance out of them as I can.
> >
> > I have a small test setup - two machines - a client (2.12.9) and a
> > server (2.15.3) with four IB cards each. I'm able to set them up as
> > Multi-Rail and each one can discover the other as such. However, I
> > can't seem to get lnet_selftest to give me more speed than a single
> > interface, as reported by ib_send_bw.
> >
> > Am I missing some config here? Is LNet just not capable of doing
> > more than one connection per NID?
> >
> > Gwen
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
