<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-GB" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Hi Anna,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">This isn’t Lustre specific, but the last time I spoke to Mellanox I was worried about oversubscription between InfiniBand L1 and L2 switches. They said that the IB ASICs monitor congestion on the
HCAs of the connected channel and will throttle back HCA transfer rates if congestion is detected on either end.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">This may help to smear out some of the imbalances, but you would probably still get Lustre ‘waiting’ type warnings.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Cheers<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US">Marc<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">lustre-discuss <lustre-discuss-bounces@lists.lustre.org> on behalf of lustre-discuss-request@lists.lustre.org <lustre-discuss-request@lists.lustre.org><br>
<b>Date: </b>Friday, 7 July 2023 at 21:12<br>
<b>To: </b>lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org><br>
<b>Subject: </b>lustre-discuss Digest, Vol 208, Issue 6<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal">Send lustre-discuss mailing list submissions to<br>
lustre-discuss@lists.lustre.org<br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
or, via email, send a message with subject or body 'help' to<br>
lustre-discuss-request@lists.lustre.org<br>
<br>
You can reach the person managing the list at<br>
lustre-discuss-owner@lists.lustre.org<br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of lustre-discuss digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Imbalanced incoming and outgoing network load (Anna Fuchs)<br>
2. Re: Imbalanced incoming and outgoing network load<br>
(Kulyavtsev, Alex Ivanovich)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Fri, 7 Jul 2023 13:48:34 +0200<br>
From: Anna Fuchs <anna.fuchs@uni-hamburg.de><br>
To: <lustre-discuss@lists.lustre.org><br>
Subject: [lustre-discuss] Imbalanced incoming and outgoing network<br>
load<br>
Message-ID: <ad173ee0-101a-29ee-6995-fa1a66aa5290@uni-hamburg.de><br>
Content-Type: text/plain; charset="UTF-8"; format=flowed<br>
<br>
Dear all,<br>
<br>
I have some questions regarding the following scenario:<br>
- A large HPC system.<br>
- Let's assume that Job X is running on 1 compute node and is reading a <br>
very large file with a stripe count of >>1, up to -1. Alternatively, tons of <br>
files are read at once with smaller striping each, but distributed <br>
across all OSS/OSTs.<br>
- The compute node is connected, for example, with a 100Gb/s link, and <br>
there are 50 servers, each with a 200Gb/s link. This generates a network <br>
load of 50x200Gb/s, which is processed at 100Gb/s.<br>
- Job Y, which requires the same network and potentially doesn't even <br>
perform I/O, suffers a lot as a result.<br>
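The oversubscription in the scenario above can be sketched numerically; a minimal back-of-the-envelope check, using only the illustrative figures from this message (not measurements):<br>

```python
# Back-of-the-envelope arithmetic for the scenario described above.
# All figures are the illustrative ones from this message.
n_servers = 50
server_link_gbps = 200    # per-OSS link speed
client_link_gbps = 100    # the single compute node's link speed

# Aggregate traffic the servers can offer vs. what the client link drains:
aggregate_gbps = n_servers * server_link_gbps
oversubscription = aggregate_gbps / client_link_gbps

print(aggregate_gbps, oversubscription)  # 10000 Gb/s offered, factor 100.0
```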
<br>
Does this scenario sound familiar to you?<br>
Is the sequence of events correct?<br>
What could be done in this situation?<br>
<br>
To avoid:<br>
a) having such single/few-nodes jobs<br>
b) striping large files with up to -1<br>
c) reading millions of files at once<br>
One could try, but I have concerns that the users will persist in doing <br>
it, either intentionally or accidentally, and it would only shift the <br>
problem, rather than solving it.<br>
One could tweak the network design, reconfigure it, separate I/O from <br>
communication, but it would hardly optimize all use cases. Virtual lanes <br>
could potentially be a solution as well. Though, that might not help if <br>
the Job Y also involves some I/O.<br>
<br>
Wouldn't it be better if Lustre somehow recognized this imbalance <br>
between incoming and outgoing network traffic and loaded the <br>
file(s)/data gradually rather than all at once, saturating or slightly <br>
overloading the consumer 100Gb/s connection rather than by a factor of <br>
100? Does this sound reasonable, and is there already a solution for it?<br>
I would appreciate any opinions.<br>
<br>
Best regards<br>
Anna<br>
<br>
--<br>
Anna Fuchs<br>
Universität Hamburg<br>
<a href="https://wr.informatik.uni-hamburg.de/people/anna_fuchs">https://wr.informatik.uni-hamburg.de/people/anna_fuchs</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Fri, 7 Jul 2023 16:18:55 +0000<br>
From: "Kulyavtsev, Alex Ivanovich" <alexku@anl.gov><br>
To: Anna Fuchs <anna.fuchs@uni-hamburg.de><br>
Cc: "lustre-discuss@lists.lustre.org"<br>
<lustre-discuss@lists.lustre.org><br>
Subject: Re: [lustre-discuss] Imbalanced incoming and outgoing network<br>
load<br>
Message-ID: <5F635719-E080-4CA0-BE2E-55ED330C6A7F@anl.gov><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
There is QoS in Lustre, a feature called NRS (Network Request Scheduler).<br>
It is possible to set different policies.<br>
Will it address the issue?<br>
<br>
The manual has an entry, and there were a few presentations at LUG/LAD.<br>
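As a starting point, a minimal sketch of an NRS TBF (Token Bucket Filter) setup, following the syntax in the Lustre Operations Manual's NRS section; the rule name, NID, and rate below are placeholder values, and the exact parameter names should be verified against your Lustre version:<br>

```shell
# Hedged sketch: enable the NRS TBF policy on the OSS bulk I/O service
# and rate-limit RPCs from one client NID. Run on each OSS.
# "greedy_client", the NID, and the rate are placeholders, not recommendations.
lctl set_param ost.OSS.ost_io.nrs_policies="tbf nid"
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start greedy_client nid={192.168.1.10@o2ib} rate=100"

# Inspect the active TBF rules:
lctl get_param ost.OSS.ost_io.nrs_tbf_rule
```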
<br>
I did not use NRS myself but I would like to learn.<br>
Alex.<br>
<br>
> On Jul 7, 2023, at 06:48, Anna Fuchs via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:<br>
> <br>
> Dear all,<br>
> <br>
> I have some questions regarding the following scenario:<br>
> - A large HPC system.<br>
> - Let's assume that Job X is running on 1 compute node and is reading a very large file with a stripe count of >>1, up to -1. Alternatively, tons of files are read at once with smaller striping each, but distributed across all OSS/OSTs.<br>
> - The compute node is connected, for example, with a 100Gb/s link, and there are 50 servers, each with a 200Gb/s link. This generates a network load of 50x200Gb/s, which is processed at 100Gb/s.<br>
> - Job Y, which requires the same network and potentially doesn't even perform I/O, suffers a lot as a result.<br>
> <br>
> Does this scenario sound familiar to you?<br>
> Is the sequence of events correct?<br>
> What could be done in this situation?<br>
> <br>
> To avoid:<br>
> a) having such single/few-nodes jobs<br>
> b) striping large files with up to -1<br>
> c) reading millions of files at once<br>
> One could try, but I have concerns that the users will persist in doing it, either intentionally or accidentally, and it would only shift the problem, rather than solving it.<br>
> One could tweak the network design, reconfigure it, separate I/O from communication, but it would hardly optimize all use cases. Virtual lanes could potentially be a solution as well. Though, that might not help if the Job Y also involves some I/O.<br>
> <br>
> Wouldn't it be better if Lustre somehow recognized this imbalance between incoming and outgoing network traffic and loaded the file(s)/data gradually rather than all at once, saturating or slightly overloading the consumer 100Gb/s connection rather than by
a factor of 100? Does this sound reasonable, and is there already a solution for it?<br>
> I would appreciate any opinions.<br>
> <br>
> Best regards<br>
> Anna<br>
> <br>
> --<br>
> Anna Fuchs<br>
> Universität Hamburg<br>
> <a href="https://wr.informatik.uni-hamburg.de/people/anna_fuchs">
https://wr.informatik.uni-hamburg.de/people/anna_fuchs</a><br>
> _______________________________________________<br>
> lustre-discuss mailing list<br>
> lustre-discuss@lists.lustre.org<br>
> <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
<br>
<br>
------------------------------<br>
<br>
Subject: Digest Footer<br>
<br>
_______________________________________________<br>
lustre-discuss mailing list<br>
lustre-discuss@lists.lustre.org<br>
<a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
<br>
<br>
------------------------------<br>
<br>
End of lustre-discuss Digest, Vol 208, Issue 6<br>
**********************************************<o:p></o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>