<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">NRS only affects Lustre traffic, so it will not factor into lnet_selftest (LST) results.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I gave some talks on troubleshooting multi-rail that you may want to review.<o:p></o:p></p>
<p class="MsoNormal">Overview:<o:p></o:p></p>
<p class="MsoNormal"><a href="https://youtu.be/j3m-mznUdac?feature=shared">https://youtu.be/j3m-mznUdac?feature=shared</a>
<o:p></o:p></p>
<p class="MsoNormal">Demo:<br>
<a href="https://youtu.be/TLN56cw9Zgs?feature=shared">https://youtu.be/TLN56cw9Zgs?feature=shared</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">You should probably start by verifying that the client and server see each other as multi-rail peers, and by checking the send and receive counts for each interface on your client and server to ensure that traffic is being spread across
them.<o:p></o:p></p>
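<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A rough sketch of the per-interface count check, assuming you have saved the text printed by "lnetctl net show -v" on each node. The SAMPLE text and its field layout (nid, send_count, recv_count) are illustrative assumptions, not output from a real system, and the helper names are hypothetical; compare against what your Lustre version actually prints.<o:p></o:p></p>
<pre>
```python
import re

# Hypothetical sample, shaped like the YAML that `lnetctl net show -v` prints.
# The NIDs and counters are made up; field names are assumed, so verify them
# against your own output before relying on this.
SAMPLE = """\
net:
    - net type: o2ib
      local NI(s):
        - nid: 10.0.0.1@o2ib
          status: up
          statistics:
              send_count: 1000
              recv_count: 900
              drop_count: 0
        - nid: 10.0.0.2@o2ib
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
"""


def per_nid_counts(text):
    """Return {nid: {"send_count": n, "recv_count": n}} from lnetctl-style text."""
    counts = {}
    nid = None
    for line in text.splitlines():
        m = re.match(r"\s*-\s*nid:\s*(\S+)", line)
        if m:
            # Start of a new local NI entry.
            nid = m.group(1)
            counts[nid] = {}
            continue
        m = re.match(r"\s*(send_count|recv_count):\s*(\d+)", line)
        if m and nid is not None:
            counts[nid][m.group(1)] = int(m.group(2))
    return counts


def idle_nids(counts):
    """NIDs whose send_count is zero, i.e. rails carrying no outbound traffic."""
    return [n for n, c in counts.items() if c.get("send_count", 0) == 0]


if __name__ == "__main__":
    counts = per_nid_counts(SAMPLE)
    for n, c in counts.items():
        print(n, c)
    print("idle:", idle_nids(counts))
```
</pre>
<p class="MsoNormal">Taking the counts before and after an LST run and comparing them shows whether every interface moved traffic during the test, rather than since boot.<o:p></o:p></p>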
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Chris Horn<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">lustre-discuss &lt;lustre-discuss-bounces@lists.lustre.org&gt; on behalf of Gwen Dawes via lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt;<br>
<b>Date: </b>Wednesday, January 17, 2024 at 5:48 AM<br>
<b>To: </b>lustre-discuss@lists.lustre.org &lt;lustre-discuss@lists.lustre.org&gt;<br>
<b>Subject: </b>Re: [lustre-discuss] LNet Multi-Rail config - with BODY!<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal">Hi Andreas,<br>
<br>
Thanks for the pointer. I have a second server set up running 2.15.3 as<br>
well specifically to check this, and can set it up with lnet_selftest,<br>
same as the client. After taking a bit to convince the fabric manager<br>
to accept the moved IPs, I get the exact same results between the two.<br>
<br>
Good to know that it is possible, though - I wonder what needs to be<br>
modified to achieve that. It's completely stock - the UDSP is just<br>
blank, and the default NRS config is in play.<br>
<br>
I don't suppose there's any chance the NRS config is what I'm missing?<br>
<br>
Gwen<br>
<br>
On Wed, 2024-01-17 at 03:14 +0000, Andreas Dilger wrote:<br>
> Hello Gwen,<br>
> I'm not a networking expert, but it seems entirely possible that the<br>
> MR discovery in 2.12.9 isn't doing as well as what is in 2.15.3 (or<br>
> 2.15.4 for that matter). It would make more sense to have both nodes<br>
> running the same (newer) version before digging too deeply into this.<br>
> <br>
> We have definitely seen performance > 1 IB interface from a single<br>
> node in our testing, though I can't say if that was done with<br>
> lnet_selftest or with something else.<br>
> <br>
> Cheers, Andreas<br>
> <br>
> > On Jan 16, 2024, at 08:14, Gwen Dawes via lustre-discuss<br>
> > &lt;lustre-discuss@lists.lustre.org&gt; wrote:<br>
> > <br>
> > Hi folks,<br>
> > <br>
> > Let's try that again.<br>
> > <br>
> > I'm in the luxurious position of having four IB cards, and I'm trying<br>
> > to squeeze as much Lustre performance out of them as I can.<br>
> > <br>
> > I have a small test setup - two machines - a client (2.12.9) and a<br>
> > server (2.15.3) with four IB cards each. I'm able to set them up as<br>
> > Multi-Rail and each one can discover the other as such. However, I<br>
> > can't seem to get lnet_selftest to give me more speed than a single<br>
> > interface, as reported by ib_send_bw.<br>
> > <br>
> > Am I missing some config here? Is LNet just not capable of doing<br>
> > more than one connection per NID?<br>
> > <br>
> > Gwen<br>
> > _______________________________________________<br>
> > lustre-discuss mailing list<br>
> > lustre-discuss@lists.lustre.org<br>
> > <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
<br>
> <br>
> Cheers, Andreas<br>
> --<br>
> Andreas Dilger<br>
> Lustre Principal Architect<br>
> Whamcloud<br>
<br>
<o:p></o:p></p>
</div>
</div>
</body>
</html>