<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">I believe I have root caused this, and posted detailed analysis on the opened JIRA issue (link in the previous message). Questions for the community:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">1. The Lustre manual claims that “By default, sync_journal is enabled (sync_journal=1), so that journal entries are committed synchronously,” but I’m finding that the reverse is and has been true for over a decade. This is the cause of
my client OOM malaise – my clients are holding onto referenced pages until the OSTs commit their journals *<b>and</b>* the clients ping the MGS or somebody else that updates their last committed transaction number to a value greater than the outstanding requests.
These small clients (in fact, even ones as large as 64GB) can easily write fast enough to exhaust memory before the OSTs decide it’s time to flush the transactions. Can somebody clarify if this is just a clerical error in the manual and async journal committing
is expected to be default and safe?<o:p></o:p></p>
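<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">For reference, this is the parameter I mean; a minimal sketch of how to inspect and force it (the obdfilter glob may need adjusting for your target names):<o:p></o:p></p>
<pre><span style="font-size:9.0pt;font-family:Consolas;color:black"># on each OSS: show the current journal commit behavior per OST
lctl get_param obdfilter.*.sync_journal
# force synchronous journal commits as a workaround
lctl set_param obdfilter.*.sync_journal=1</span></pre>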
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">2. It appears that although the default “commit” mount option for ext4 is 5 seconds, this is either disabled entirely or set to a much higher value in ldiskfs. Can somebody clarify what the ldiskfs default setting is for commit (I’m failing
hard trying to locate it in code or ldiskfs patches)? Adjusting the mount option on the OST to use “commit=5” does the right thing (prevents my client from going OOM without the workaround in #1) from what I can tell, so 5s must not be the default for ldiskfs.<o:p></o:p></p>
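<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">For anyone wanting to repeat the experiment, the rough shape of what I did is below (a sketch only; /dev/OSTDEV and /mnt/ost0 are placeholders, and note that tunefs.lustre --mountfsoptions replaces rather than appends, so carry forward whatever options your target already has):<o:p></o:p></p>
<pre><span style="font-size:9.0pt;font-family:Consolas;color:black"># bake the 5s journal commit interval into the OST's mount options
tunefs.lustre --mountfsoptions="errors=remount-ro,commit=5" /dev/OSTDEV
# remount the OST to pick up the change
umount /mnt/ost0
mount -t lustre /dev/OSTDEV /mnt/ost0</span></pre>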
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">3. Are there thoughts from the community on whether setting “sync_journal=1” in lctl or changing the mount option to “commit=5” are preferable? The latter seems like it will be slightly more performant for very busy systems, but for streaming
I/O so far they produce identical results.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">4. OFD targets appear to maintain grant info relating to dirty, pending, and current available grant. I’m witnessing pending well exceed the ldiskfs journal size on my OSTs (which defaults to 1GB). Code suggests these two are discrete
concepts, as pending is correctness checked against blocks in the filesystem shifted left by the power of two associated with the block size. What’s the rationale behind the pending value?<o:p></o:p></p>
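<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">For context, these are the per-target counters I’m watching on the OSS side (a sketch; the dumpe2fs line assumes a reasonably recent e2fsprogs that reports the journal size):<o:p></o:p></p>
<pre><span style="font-size:9.0pt;font-family:Consolas;color:black"># grant accounting per OST (values in bytes)
lctl get_param obdfilter.*.tot_dirty obdfilter.*.tot_pending obdfilter.*.tot_granted
# journal size of the backing ldiskfs device, for comparison
dumpe2fs -h /dev/OSTDEV | grep -i journal</span></pre>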
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Best,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">ellis<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Ellis Wilson <br>
<b>Sent:</b> Thursday, January 20, 2022 2:28 PM<br>
<b>To:</b> Peter Jones <pjones@whamcloud.com>; Raj <rajgautam@gmail.com>; Patrick Farrell <pfarrell@ddn.com><br>
<b>Cc:</b> lustre-discuss@lists.lustre.org<br>
<b>Subject:</b> RE: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under Buffered I/O (2.14/2.15)<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for facilitating a login for me Peter. The bug with all logs and info I could think to include has been opened here:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><a href="https://jira.whamcloud.com/browse/LU-15468">https://jira.whamcloud.com/browse/LU-15468</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m going to keep digging on my end, but if anybody has any other bright ideas or experiments they’d like me to try, don’t hesitate to say so here or in the bug.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Peter Jones <<a href="mailto:pjones@whamcloud.com">pjones@whamcloud.com</a>>
<br>
<b>Sent:</b> Thursday, January 20, 2022 9:28 AM<br>
<b>To:</b> Ellis Wilson <<a href="mailto:elliswilson@microsoft.com">elliswilson@microsoft.com</a>>; Raj <<a href="mailto:rajgautam@gmail.com">rajgautam@gmail.com</a>>; Patrick Farrell <<a href="mailto:pfarrell@ddn.com">pfarrell@ddn.com</a>><br>
<b>Cc:</b> <a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
<b>Subject:</b> Re: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under Buffered I/O (2.14/2.15)<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" align="left" width="100%" style="width:100.0%">
<tbody>
<tr>
<td style="background:#A6A6A6;padding:5.25pt 1.5pt 5.25pt 1.5pt"></td>
<td width="100%" style="width:100.0%;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 11.25pt">
<div>
<p class="MsoNormal" style="mso-element:frame;mso-element-frame-hspace:2.25pt;mso-element-wrap:around;mso-element-anchor-vertical:paragraph;mso-element-anchor-horizontal:column;mso-height-rule:exactly">
<span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">You don't often get email from
</span><span style="color:black"><a href="mailto:pjones@whamcloud.com"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">pjones@whamcloud.com</span></a></span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">.
</span><span style="color:black"><a href="http://aka.ms/LearnAboutSenderIdentification"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">Learn why this is important</span></a></span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121"><o:p></o:p></span></p>
</div>
</td>
<td width="75" style="width:56.25pt;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 3.75pt;align:left">
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-size:14.0pt">Ellis<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:14.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:14.0pt">JIRA accounts can be requested from
</span><a href="mailto:info@whamcloud.com"><span style="font-size:14.0pt">info@whamcloud.com</span></a><span style="font-size:14.0pt"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:14.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:14.0pt">Peter<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:14.0pt"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span lang="EN-CA" style="font-size:12.0pt;color:black">From:
</span></b><span lang="EN-CA" style="font-size:12.0pt;color:black">lustre-discuss <</span><a href="mailto:lustre-discuss-bounces@lists.lustre.org"><span lang="EN-CA" style="font-size:12.0pt">lustre-discuss-bounces@lists.lustre.org</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">>
on behalf of Ellis Wilson via lustre-discuss <</span><a href="mailto:lustre-discuss@lists.lustre.org"><span lang="EN-CA" style="font-size:12.0pt">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">><br>
<b>Reply-To: </b>Ellis Wilson <</span><a href="mailto:elliswilson@microsoft.com"><span lang="EN-CA" style="font-size:12.0pt">elliswilson@microsoft.com</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">><br>
<b>Date: </b>Thursday, January 20, 2022 at 6:20 AM<br>
<b>To: </b>Raj <</span><a href="mailto:rajgautam@gmail.com"><span lang="EN-CA" style="font-size:12.0pt">rajgautam@gmail.com</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">>, Patrick Farrell <</span><a href="mailto:pfarrell@ddn.com"><span lang="EN-CA" style="font-size:12.0pt">pfarrell@ddn.com</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">><br>
<b>Cc: </b>"</span><a href="mailto:lustre-discuss@lists.lustre.org"><span lang="EN-CA" style="font-size:12.0pt">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">" <</span><a href="mailto:lustre-discuss@lists.lustre.org"><span lang="EN-CA" style="font-size:12.0pt">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="font-size:12.0pt;color:black">><br>
<b>Subject: </b>Re: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under Buffered I/O (2.14/2.15)<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-CA"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span lang="EN-CA">Thanks Raj – I’ve checked all of the nodes in the cluster and they all have peer_credits set to 8, and credits are set to 256. AFAIK that’s quite low – 8 concurrent sends to any given peer at a time. Since I only have
two OSSes, for this client, that’s only 16 concurrent sends at a given moment. IDK if at this level this devolves to the maximum RPC size of 1MB or the current max BRW I have set of 4MB, but in either case these are small MB values.<o:p></o:p></span></p>
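<p class="MsoNormal"><span lang="EN-CA">&nbsp;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA">(For completeness, this is roughly how I checked those values on each node; a sketch assuming the socket LND, since our NIDs are @tcp:)<o:p></o:p></span></p>
<pre><span lang="EN-CA" style="font-size:9.0pt;font-family:Consolas;color:black"># LND module parameters
cat /sys/module/ksocklnd/parameters/peer_credits
cat /sys/module/ksocklnd/parameters/credits
# or per network interface, including the tunables in effect
lnetctl net show -v</span></pre>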
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA">I’ve reached out to Andreas and Patrick to try to get a JIRA account to open a bug, but have not heard back yet. If somebody on-list is more appropriate to assist with this, please ping me. I collected quite a bit of
logs/traces yesterday and have sysrq stacks to share when I can get access to the whamcloud JIRA.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA">ellis<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span lang="EN-CA">From:</span></b><span lang="EN-CA"> Raj <</span><a href="mailto:rajgautam@gmail.com"><span lang="EN-CA">rajgautam@gmail.com</span></a><span lang="EN-CA">>
<br>
<b>Sent:</b> Thursday, January 20, 2022 8:14 AM<br>
<b>To:</b> Patrick Farrell <</span><a href="mailto:pfarrell@ddn.com"><span lang="EN-CA">pfarrell@ddn.com</span></a><span lang="EN-CA">><br>
<b>Cc:</b> Andreas Dilger <</span><a href="mailto:adilger@whamcloud.com"><span lang="EN-CA">adilger@whamcloud.com</span></a><span lang="EN-CA">>; Ellis Wilson <</span><a href="mailto:elliswilson@microsoft.com"><span lang="EN-CA">elliswilson@microsoft.com</span></a><span lang="EN-CA">>;
</span><a href="mailto:lustre-discuss@lists.lustre.org"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA"><br>
<b>Subject:</b> [EXTERNAL] Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O (2.14/2.15)<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" align="left" width="100%" style="width:100.0%">
<tbody>
<tr>
<td style="background:#A6A6A6;padding:5.25pt 1.5pt 5.25pt 1.5pt"></td>
<td width="100%" style="width:100.0%;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 11.25pt">
<div>
<p class="MsoNormal" style="mso-element:frame;mso-element-frame-hspace:2.25pt;mso-element-wrap:around;mso-element-anchor-vertical:paragraph;mso-element-anchor-horizontal:column;mso-height-rule:exactly">
<span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">You don't often get email from
</span><span style="color:black"><a href="mailto:rajgautam@gmail.com"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">rajgautam@gmail.com</span></a></span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">.
</span><span style="color:black"><a href="http://aka.ms/LearnAboutSenderIdentification"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">Learn why this is important</span></a></span><o:p></o:p></p>
</div>
</td>
<td width="75" style="width:56.25pt;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 3.75pt;align:left">
</td>
</tr>
</tbody>
</table>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="color:black">Ellis, I would also check the peer_credit between server and the client. They should be same.</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-CA">On Wed, Jan 19, 2022 at 9:27 AM Patrick Farrell via lustre-discuss <</span><a href="mailto:lustre-discuss@lists.lustre.org"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA">> wrote:<o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black">Ellis,</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black">As you may have guessed, that function just set looks like a node which is doing buffered I/O and thrashing for memory. No particular insight available from the count of functions
there.<br>
<br>
<span style="background:white">Would you consider opening a bug report in the Whamcloud JIRA? You should have enough for a good report, here's a few things that would be helpful as well:</span><br>
<br>
It sounds like you can hang the node on demand. If you could collect stack traces with:
</span><span lang="EN-CA"><o:p></o:p></span></p>
<pre><span lang="EN-CA" style="font-size:9.0pt;font-family:Consolas;color:black">echo t > /proc/sysrq-trigger</span><span lang="EN-CA"><o:p></o:p></span></pre>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black">after creating the hang, that would be useful. (It will print to dmesg.)<br>
<br>
You've also collected debug logs. Could you include, say, the last 100 MiB of that log set? That should be reasonable to attach if compressed.</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-CA" style="font-size:12.0pt;color:black">Regards,</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-CA" style="font-size:12.0pt;color:black">Patrick</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div class="MsoNormal" align="center" style="text-align:center"><span lang="EN-CA">
<hr size="1" width="100%" align="center">
</span></div>
<div id="m_-1787521865844447704divRplyFwdMsg">
<p class="MsoNormal"><b><span lang="EN-CA" style="color:black">From:</span></b><span lang="EN-CA" style="color:black"> lustre-discuss <</span><a href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss-bounces@lists.lustre.org</span></a><span lang="EN-CA" style="color:black">>
on behalf of Ellis Wilson via lustre-discuss <</span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="color:black">><br>
<b>Sent:</b> Wednesday, January 19, 2022 8:32 AM<br>
<b>To:</b> Andreas Dilger <</span><a href="mailto:adilger@whamcloud.com" target="_blank"><span lang="EN-CA">adilger@whamcloud.com</span></a><span lang="EN-CA" style="color:black">><br>
<b>Cc:</b> </span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="color:black"> <</span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA" style="color:black">><br>
<b>Subject:</b> Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O (2.14/2.15)</span><span lang="EN-CA">
<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-CA"> <o:p></o:p></span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p><span lang="EN-CA">Hi Andreas,<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Apologies in advance for the top-post. I’m required to use Outlook for work, and it doesn’t handle in-line or bottom-posting well.<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Client-side defaults prior to any tuning of mine (this is a very minimal 1-client, 1-MDS/MGS, 2-OSS cluster):<o:p></o:p></span></p>
<p><span lang="EN-CA"><br>
~# lctl get_param llite.*.max_cached_mb<o:p></o:p></span></p>
<p><span lang="EN-CA">llite.lustrefs-ffff8d52a9c52800.max_cached_mb=<o:p></o:p></span></p>
<p><span lang="EN-CA">users: 5<o:p></o:p></span></p>
<p><span lang="EN-CA">max_cached_mb: 7748<o:p></o:p></span></p>
<p><span lang="EN-CA">used_mb: 0<o:p></o:p></span></p>
<p><span lang="EN-CA">unused_mb: 7748<o:p></o:p></span></p>
<p><span lang="EN-CA">reclaim_count: 0<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl get_param osc.*.max_dirty_mb<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_dirty_mb=1938<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_dirty_mb=1938<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl get_param osc.*.max_rpcs_in_flight<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_rpcs_in_flight=8<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_rpcs_in_flight=8<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl get_param osc.*.max_pages_per_rpc<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_pages_per_rpc=1024<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_pages_per_rpc=1024<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Thus far I’ve reduced the following to what I felt were really conservative values for a 16GB RAM machine:<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl set_param llite.*.max_cached_mb=1024<o:p></o:p></span></p>
<p><span lang="EN-CA">llite.lustrefs-ffff8d52a9c52800.max_cached_mb=1024<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl set_param osc.*.max_dirty_mb=512<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_dirty_mb=512<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_dirty_mb=512<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl set_param osc.*.max_pages_per_rpc=128<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_pages_per_rpc=128<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_pages_per_rpc=128<o:p></o:p></span></p>
<p><span lang="EN-CA">~# lctl set_param osc.*.max_rpcs_in_flight=2<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0000-osc-ffff8d52a9c52800.max_rpcs_in_flight=2<o:p></o:p></span></p>
<p><span lang="EN-CA">osc.lustrefs-OST0001-osc-ffff8d52a9c52800.max_rpcs_in_flight=2<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">This slows down how fast I get to basically OOM from <10 seconds to more like 25 seconds, but the trend is identical.<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">As an example of what I’m seeing on the client, you can see below we start with most free, and then iozone rapidly (within ~10 seconds) causes all memory to be marked used, and that stabilizes at about 140MB free until at some point it
stalls for 20 or more seconds and then some has been synced out:<o:p></o:p></span></p>
<p><span lang="EN-CA"><br>
~# dstat --mem<o:p></o:p></span></p>
<p><span lang="EN-CA">------memory-usage-----<o:p></o:p></span></p>
<p><span lang="EN-CA">used free buff cach<o:p></o:p></span></p>
<p><span lang="EN-CA">1029M 13.9G 2756k 215M<o:p></o:p></span></p>
<p><span lang="EN-CA">1028M 13.9G 2756k 215M<o:p></o:p></span></p>
<p><span lang="EN-CA">1028M 13.9G 2756k 215M<o:p></o:p></span></p>
<p><span lang="EN-CA">1088M 13.9G 2756k 215M<o:p></o:p></span></p>
<p><span lang="EN-CA">2550M 11.5G 2764k 1238M<o:p></o:p></span></p>
<p><span lang="EN-CA">3989M 10.1G 2764k 1236M<o:p></o:p></span></p>
<p><span lang="EN-CA">5404M 8881M 2764k 1239M<o:p></o:p></span></p>
<p><span lang="EN-CA">6831M 7453M 2772k 1240M<o:p></o:p></span></p>
<p><span lang="EN-CA">8254M 6033M 2772k 1237M<o:p></o:p></span></p>
<p><span lang="EN-CA">9672M 4613M 2772k 1239M<o:p></o:p></span></p>
<p><span lang="EN-CA">10.6G 3462M 2772k 1240M<o:p></o:p></span></p>
<p><span lang="EN-CA">12.1G 1902M 2772k 1240M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.4G 582M 2772k 1240M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 2488k 1161M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 1528k 1174M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 140M 896k 1175M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 676k 1176M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 142M 528k 1177M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 140M 484k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 492k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 488k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 141M 488k 1186M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 141M 480k 1187M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 492k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 141M 600k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 580k 1187M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 140M 536k 1186M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 141M 668k 1186M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 139M 580k 1188M<o:p></o:p></span></p>
<p><span lang="EN-CA">13.9G 140M 568k 1187M<o:p></o:p></span></p>
<p><span lang="EN-CA">12.7G 1299M 2064k 1197M missed 20 ticks <-- client is totally unresponsive during this time<o:p></o:p></span></p>
<p><span lang="EN-CA">11.0G 2972M 5404k 1238M^C<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Additionally, I’ve messed with sysctl settings. Defaults:<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_background_bytes = 0<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_background_ratio = 10<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_bytes = 0<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_expire_centisecs = 3000<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_ratio = 20<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_writeback_centisecs = 500<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Revised to conservative values:<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_background_bytes = 1073741824<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_background_ratio = 0<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_bytes = 2147483648<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_expire_centisecs = 200<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_ratio = 0<o:p></o:p></span></p>
<p><span lang="EN-CA">vm.dirty_writeback_centisecs = 500<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">No observed improvement.<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="color:black">I’m going to trawl two logs today side-by-side, one with ldiskfs backing the OSTs, and one with zfs backing the OSTs, and see if I can see what the differences are since the zfs-backed version
never gave us this problem. The only other potentially useful thing I can share right now is that when I turned on full debug logging and ran the test until I hit OOM, the following were the most frequently hit functions in the logs (count, descending, is
the first column). This was approximately 30s of logs:</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="color:black"><br>
</span><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 205874 cl_page.c:518:cl_vmpage_page())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206587 cl_page.c:545:cl_page_owner_clear())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206673 cl_page.c:551:cl_page_owner_clear())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206748 osc_cache.c:2483:osc_teardown_async_page())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206815 cl_page.c:867:cl_page_delete())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206862 cl_page.c:837:cl_page_delete0())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206878 osc_cache.c:2478:osc_teardown_async_page())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206928 cl_page.c:869:cl_page_delete())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206930 cl_page.c:441:cl_page_state_set0())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 206988 osc_page.c:206:osc_page_delete())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207021 cl_page.c:179:__cl_page_free())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207021 cl_page.c:193:cl_page_free())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207021 cl_page.c:532:cl_vmpage_page())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207024 cl_page.c:210:cl_page_free())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207075 cl_page.c:430:cl_page_state_set0())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207169 osc_cache.c:2505:osc_teardown_async_page())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207175 cl_page.c:475:cl_pagevec_put())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207202 cl_page.c:492:cl_pagevec_put())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207211 cl_page.c:822:cl_page_delete0())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207384 osc_page.c:178:osc_page_delete())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 207422 osc_page.c:177:osc_page_delete())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 413680 cl_page.c:433:cl_page_state_set0())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p style="background:white"><span lang="EN-CA" style="font-size:10.5pt;font-family:"Segoe UI",sans-serif;color:black"> 413701 cl_page.c:477:cl_pagevec_put())</span><span lang="EN-CA"><o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">If anybody has any additional suggestions or requests for more info don’t hesitate to ask.<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">Best,<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<p><span lang="EN-CA">ellis<o:p></o:p></span></p>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p><b><span lang="EN-CA">From:</span></b><span lang="EN-CA"> Andreas Dilger <</span><a href="mailto:adilger@whamcloud.com" target="_blank"><span lang="EN-CA">adilger@whamcloud.com</span></a><span lang="EN-CA">>
<br>
<b>Sent:</b> Tuesday, January 18, 2022 9:54 PM<br>
<b>To:</b> Ellis Wilson <</span><a href="mailto:elliswilson@microsoft.com" target="_blank"><span lang="EN-CA">elliswilson@microsoft.com</span></a><span lang="EN-CA">><br>
<b>Cc:</b> </span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA"><br>
<b>Subject:</b> [EXTERNAL] Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O (2.14/2.15)<o:p></o:p></span></p>
</div>
</div>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" align="left" width="100%" style="width:100.0%">
<tbody>
<tr>
<td style="background:#A6A6A6;padding:5.25pt 1.5pt 5.25pt 1.5pt"></td>
<td width="100%" style="width:100.0%;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 11.25pt">
<div>
<p style="mso-element:frame;mso-element-frame-hspace:2.25pt;mso-element-wrap:around;mso-element-anchor-vertical:paragraph;mso-element-anchor-horizontal:column;mso-height-rule:exactly">
<span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">You don't often get email from
</span><span style="font-size:10.0pt;font-family:"Times New Roman",serif;color:black"><a href="mailto:adilger@whamcloud.com" target="_blank"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">adilger@whamcloud.com</span></a></span><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif;color:#212121">.
</span><span style="font-size:10.0pt;font-family:"Times New Roman",serif;color:black"><a href="http://aka.ms/LearnAboutSenderIdentification" target="_blank"><span style="font-size:9.0pt;font-family:"Segoe UI",sans-serif">Learn why this is important</span></a></span><span style="font-size:10.0pt;font-family:"Times New Roman",serif"><o:p></o:p></span></p>
</div>
</td>
<td width="75" style="width:56.25pt;background:#EAEAEA;padding:5.25pt 3.75pt 5.25pt 3.75pt">
</td>
</tr>
</tbody>
</table>
<div>
<p><span lang="EN-CA" style="color:white">On Jan 18, 2022, at 13:40, Ellis Wilson via lustre-discuss <</span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA">> wrote:<o:p></o:p></span></p>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<div>
<div>
<p><span lang="EN-CA">Recently we've switched from using ZFS to ldiskfs as the backing filesystem to work around some performance issues and I'm finding that when I put the cluster under load (with as little as a single client) I can almost completely lockup
the client. SSH (even existing sessions) stall, iostat, top, etc all freeze for 20 to 200 seconds. This alleviates for small windows and recurs as long as I leave the io-generating process in existence. It reports extremely high CPU and RAM usage, and appears
to be consumed exclusively doing 'system'-tagged work. This is on 2.14.0, but I've reproduced on more or less HOL for master-next. If I do direct-IO, performance is fantastic and I have no such issues regarding CPU/memory pressure.<br>
<br>
Uname: Linux 85df894e-8458-4aa4-b16f-1d47154c0dd2-lclient-a0-g0-vm 5.4.0-1065-azure #68~18.04.1-Ubuntu SMP Fri Dec 3 14:08:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux<br>
<br>
In dmesg I see consistent spew on the client about:<br>
[19548.601651] LustreError: 30918:0:(events.c:208:client_bulk_callback()) event type 1, status -5, desc 00000000b69b83b0<br>
[19548.662647] LustreError: 30917:0:(events.c:208:client_bulk_callback()) event type 1, status -5, desc 000000009ef2fc22<br>
[19549.153590] Lustre: lustrefs-OST0000-osc-ffff8d52a9c52800: Connection to lustrefs-OST0000 (at
</span><a href="mailto:10.1.98.7@tcp" target="_blank"><span lang="EN-CA">10.1.98.7@tcp</span></a><span lang="EN-CA">) was lost; in progress operations using this service will wait for recovery to complete<br>
[19549.153621] Lustre: 30927:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1642535831/real 1642535833] req@0000000002361e2d x1722317313374336/t0(0) o4-></span><a href="mailto:lustrefs-OST0001-osc-ffff8d52a9c52800@10.1.98.10" target="_blank"><span lang="EN-CA">lustrefs-OST0001-osc-ffff8d52a9c52800@10.1.98.10</span></a><span lang="EN-CA">@tcp:6/4
lens 488/448 e 0 to 1 dl 1642535883 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:''<br>
[19549.153623] Lustre: 30927:0:(client.c:2282:ptlrpc_expire_one_request()) Skipped 4 previous similar messages<br>
<br>
But I actually think this is a symptom of extreme memory pressure causing the client to timeout things, not a cause.<br>
<br>
Testing with obdfilter-survey (local) on the OSS side shows expected performance of the disk subsystem. Testing with lnet_selftest from client to OSS shows expected performance. In neither case do I see the high cpu or memory pressure issues.<br>
<br>
Reducing a variety of lctl tunables that appear to govern memory allowances for Lustre clients does not improve the situation. <o:p></o:p></span></p>
</div>
</div>
</blockquote>
<div>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
</div>
<p><span lang="EN-CA">What have you reduced here? llite.*.max_cached_mb, osc.*.max_dirty_mb, osc.*.max_rpcs_in_flight and osc.*.max_pages_per_rpc?<o:p></o:p></span></p>
</div>
<div>
<p style="margin-bottom:12.0pt"><span lang="EN-CA"> <o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p><span lang="EN-CA">By all appearances, the running iozone or even simple dd processes gradually (i.e., over a span of just 10 seconds or so) consumes all 16GB of RAM on the client I'm using. I've generated bcc profile graphs for both on- and off-cpu analysis,
and they are utterly boring -- they basically just reflect rampant calls to shrink_inactive_list resulting from page_cache_alloc in the presence of extreme memory pressure.<o:p></o:p></span></p>
</div>
</div>
</blockquote>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA">We have seen some issues like this that are being looked at, but this is mostly only seen on smaller VM clients used in testing and not larger production clients. Are you able to test with more RAM on the client? Have you tried with
2.12.8 installed on the client?<o:p></o:p></span></p>
</div>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p><span lang="EN-CA" style="color:black">Cheers, Andreas</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black">--</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black">Andreas Dilger</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black">Lustre Principal Architect</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black">Whamcloud</span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<div>
<p><span lang="EN-CA" style="color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
<p><span lang="EN-CA" style="color:black"> </span><span lang="EN-CA"><o:p></o:p></span></p>
</div>
<p style="margin-bottom:12.0pt"><span lang="EN-CA"> <o:p></o:p></span></p>
</div>
<p><span lang="EN-CA"> <o:p></o:p></span></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><span lang="EN-CA">_______________________________________________<br>
lustre-discuss mailing list<br>
</span><a href="mailto:lustre-discuss@lists.lustre.org" target="_blank"><span lang="EN-CA">lustre-discuss@lists.lustre.org</span></a><span lang="EN-CA"><br>
</span><a href="https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.lustre.org%2Flistinfo.cgi%2Flustre-discuss-lustre.org&data=04%7C01%7Celliswilson%40microsoft.com%7Cfa5278382e5642deae0208d9dc210892%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637782856719975502%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=DT%2Bb%2BQ1ec7rQcLhU1Pm9p60JHNQTZKQq51hRT2zouLc%3D&reserved=0" target="_blank"><span lang="EN-CA">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</span></a><span lang="EN-CA"><o:p></o:p></span></p>
</div>
</body>
</html>