<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,</p>
<p>You could also try this:<br>
</p>
<p> <a href="https://github.com/quentinbouyer/topmdt">https://github.com/quentinbouyer/topmdt</a></p>
<div class="moz-cite-prefix">On 28/05/2020 at 18:32, Chad DeWitt wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAyf6vCiTJA0SgT=sXh_Ew6JLGqF3EjZt+jimQX0+9pW3U1dww@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Hi Heath,
<div><br>
</div>
<div>Hope you're doing well!</div>
<div><br>
</div>
<div>Your mileage may vary (and, quite frankly, there may
be better approaches), but this is a quick-and-dirty
set of steps to find which client is issuing a large
number of metadata operations:</div>
</div>
</div>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div>
<ul>
<li>Log into the affected MDS.</li>
</ul>
</div>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div>
<ul>
<li>Change into the exports directory.<br>
</li>
</ul>
</div>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div>
<div>
<div>
<div><font face="monospace">cd
/proc/fs/lustre/mdt/<i>&lt;Your affected
MDT&gt;</i>/exports/</font></div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<ul>
<li>OPTIONAL: Set all your stats to zero and clear out
stale clients. (If you don't want to do this step, you
don't really have to, but it does make it easier to
see the stats if you are starting with a clean slate.
In fact, you may want to skip this the first time
through and just look for high numbers. If a
particular client is the source of the issue, the
stats should clearly be higher for that client when
compared to the others.)<br>
</li>
</ul>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div>
<div>
<div>
<div><font face="monospace">echo "C" > clear</font></div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div>
<ul>
<li>Wait for a few seconds and dump the stats.<br>
</li>
</ul>
</div>
</blockquote>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px"><font face="monospace">for
client in */ ; do printf '\n\n%s\n' "${client}" ; cat
"${client}stats" ; echo ; done</font><br>
</blockquote>
</blockquote>
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>You'll get a listing of stats for each mounted
client like so:</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div>
<div>
<div>
<div>
<div><font face="monospace">open
278676 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">close
278629 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">mknod
2320 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">unlink
495 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">mkdir
575 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">rename
1534 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">getattr
277552 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">setattr
550 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">getxattr
2742 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">statfs
350058 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
<div>
<div>
<div>
<div>
<div><font face="monospace">samedir_rename
1534 samples [reqs]</font></div>
</div>
</div>
</div>
</div>
</blockquote>
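<div><br>
</div>
<div>If clearing the counters is not an option, one possible
alternative is to take two snapshots a few seconds apart and look
only at the difference. This is a minimal sketch, not a Lustre
utility: the function names <font face="monospace">snapshot_stats</font>
and <font face="monospace">delta_stats</font> are made up for
illustration, and it assumes every counter line has the form shown
in the listing above.</div>
<pre>
```shell
# snapshot_stats DIR: print "client operation count" for every client under DIR.
# DIR is assumed to be the exports directory of the affected MDT.
snapshot_stats() {
    for stats in "$1"/*/stats; do
        [ -r "$stats" ] || continue
        client=$(basename "$(dirname "$stats")")
        # Counter lines are assumed to look like: open 278676 samples [reqs]
        awk -v c="$client" '$3 == "samples" { print c, $1, $2 }' "$stats"
    done
}

# delta_stats BEFORE AFTER: print "growth client operation" for every counter
# that increased between two snapshot files, largest growth first.
delta_stats() {
    awk 'FILENAME == ARGV[1] { before[$1 " " $2] = $3; next }
         { d = $3 - before[$1 " " $2]; if (d > 0) print d, $1, $2 }' "$1" "$2" |
        sort -rn
}
```
</pre>
<div>From the exports directory, a run might look like this (the
file names under /tmp are arbitrary): <font face="monospace">snapshot_stats
. > /tmp/before ; sleep 10 ; snapshot_stats . > /tmp/after ;
delta_stats /tmp/before /tmp/after</font></div>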
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>(Don't worry if some of the clients return what
appear to be empty stats. That just means they are
mounted but have not yet performed any metadata
operations.) In this data, you are looking for any
unusually high sample counts. The client with the highest
counts is usually the culprit. For the example client stats
above, I would look to see which process(es) on this
client are listing, opening, and then closing files in
Lustre. The advantage of this method is that you
see exactly which metadata operations are
occurring. (I know there are also various utilities
included with Lustre that may give this information as
well, but I just go to the source.)</div>
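<div><br>
</div>
<div>With many clients mounted, picking out the high samples by eye
gets tedious. As a rough sketch (the function name
<font face="monospace">rank_clients</font> is hypothetical, and it
assumes the counter lines look like the example above), the
per-client totals can be summed and sorted:</div>
<pre>
```shell
# rank_clients DIR: print "total client" for each client under DIR, busiest first.
# DIR is assumed to be the exports directory of the affected MDT.
rank_clients() {
    for stats in "$1"/*/stats; do
        [ -r "$stats" ] || continue
        client=$(basename "$(dirname "$stats")")
        # Sum the second column of every "samples" line; empty stats give 0.
        awk -v c="$client" '$3 == "samples" { total += $2 }
                            END { print total + 0, c }' "$stats"
    done | sort -rn
}
```
</pre>
<div>For example, from the exports directory: <font face="monospace">rank_clients
. | head</font>. A total hides which operation dominates, so it is
still worth reading the full stats of the top few clients.</div>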
<div><br>
</div>
<div>Once you find the client, you can use various
commands, such as <font face="monospace">mount</font>
and <font face="monospace">lsof</font> to get a
better understanding of what may be hitting Lustre.</div>
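<div><br>
</div>
<div>On a busy client the raw <font face="monospace">lsof</font>
output can be long. One way to condense it, sketched here with a
hypothetical helper named <font face="monospace">tally_open_files</font>,
is to count open files per command and PID. It assumes the default
<font face="monospace">lsof</font> column layout, where COMMAND and
PID are the first two columns.</div>
<pre>
```shell
# tally_open_files: read lsof output on stdin and print "count command pid",
# with the process holding the most open files first.
tally_open_files() {
    # NR > 1 skips the header row of the default lsof output.
    awk 'NR > 1 { n[$1 " " $2]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}
```
</pre>
<div>Usage, assuming Lustre is mounted at /mnt/lustre (substitute
the mount point reported by <font face="monospace">mount</font>):
<font face="monospace">lsof /mnt/lustre | tally_open_files | head</font></div>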
<div><br>
</div>
<div>Some of the more common issues I've found that can
cause a high MDS load:</div>
<div>
<ul>
<li>Listing a directory that contains a large number of
files. (Instead, unalias <font face="monospace">ls</font>
or, better yet, use <font face="monospace">lfs
find</font>.)</li>
<li>Removing many files.</li>
<li>Opening and closing many files. (It may be better to
move the data over to another file system, such as
XFS. We keep some of our deep learning work off
Lustre because of the sheer number of small
files.)</li>
</ul>
Of course the actual mitigation of the load depends on
what the user is attempting to do...</div>
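<div><br>
</div>
<div>For the first case, users often only want to know how many
files a directory holds. A hedged sketch of a helper for that
(the name <font face="monospace">count_files</font> is made up, and
it assumes <font face="monospace">lfs find</font> accepts
<font face="monospace">--maxdepth</font> and
<font face="monospace">--type</font> as described in its man page),
falling back to plain <font face="monospace">find</font> where
<font face="monospace">lfs</font> is not installed:</div>
<pre>
```shell
# count_files DIR: count regular files directly inside DIR without a long listing.
# Assumption: when lfs is installed, DIR lives on a Lustre file system.
count_files() {
    if [ -x "$(command -v lfs)" ]; then
        lfs find "$1" --maxdepth 1 --type f
    else
        find "$1" -maxdepth 1 -type f
    fi | wc -l   # file names containing newlines would be over-counted
}
```
</pre>
<div>Usage: <font face="monospace">count_files /path/to/directory</font></div>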
<div><br>
</div>
<div>I hope this helps...</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Chad</div>
<div><br clear="all">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<p style="margin:0in 0in 0.0001pt"><span
style="background-color:rgba(255,255,255,0)"><font size="2"
face="monospace, monospace">------------------------------------------------------------</font></span></p>
<p style="margin:0in 0in 0.0001pt"><span
style="background-color:rgba(255,255,255,0)"><font size="2"
face="monospace, monospace">Chad
DeWitt, CISSP</font></span></p>
<p style="margin:0in 0in 0.0001pt"><span
style="background-color:rgba(255,255,255,0)"><font size="2"
face="monospace, monospace">UNC
Charlotte <b>| </b>ITS –
University Research Computing</font></span></p>
<p style="margin:0in 0in 0.0001pt"><span
style="background-color:rgba(255,255,255,0)"></span></p>
<p style="margin:0in 0in 0.0001pt"><font
size="2" face="monospace, monospace"
color="#000000"><span
style="background-color:rgba(255,255,255,0)"><a
href="mailto:ccdewitt@uncc.edu"
style="color:rgb(17,85,204)"
target="_blank"
moz-do-not-send="true">ccdewitt@uncc.edu</a> <b>| </b><a
style="color:rgb(34,34,34)"
moz-do-not-send="true">www.uncc.edu</a></span></font></p>
<p style="margin:0in 0in 0.0001pt"><span
style="background-color:rgba(255,255,255,0)"><font size="2"
face="monospace, monospace">------------------------------------------------------------</font></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, May 28, 2020 at 11:37
AM Peeples, Heath &lt;<a href="mailto:heathp@hpc.msstate.edu"
target="_blank" moz-do-not-send="true">heathp@hpc.msstate.edu</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div>
<p class="MsoNormal">I have 2 MDSs, and periodically the load on one
of them (either one, at one time or another) peaks above 300,
causing the file system to basically stop. This lasts
for a few minutes and then goes away. We can’t identify
any one user running jobs at the times we see this, so
it’s hard to pin this on a user doing something to
cause it. Could anyone point me in the direction of
how to begin debugging this? Any help is greatly
appreciated.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Heath</p>
</div>
</div>
_______________________________________________<br>
lustre-discuss mailing list<br>
<a href="mailto:lustre-discuss@lists.lustre.org"
target="_blank" moz-do-not-send="true">lustre-discuss@lists.lustre.org</a><br>
<a
href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>