<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Inline comments follow:<br>
<br>
On 5/30/11 1:51 PM, Jinshan Xiong wrote:
<blockquote
cite="mid:BA5D598A-2A89-48DF-A67A-4ACDD8B1F409@whamcloud.com"
type="cite"><base href="x-msg://164/"><br>
<div>
<div>On May 26, 2011, at 6:01 AM, Eric Barton wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite"><span class="Apple-style-span"
style="border-collapse: separate; font-family: 'Trebuchet
MS'; font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: normal;
orphans: 2; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
font-size: medium;">
<div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
<div class="WordSection1" style="page: WordSection1;">
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">Nasf,<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><o:p> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">Interesting
results. Thank you - especially for graphing the
results so thoroughly.<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">I’m attaching them
here and cc-ing lustre-devel since these are of
general interest.<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><o:p> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">I don’t think your
conclusion number (1), to say CLIO locking is
slowing us down<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">is as obvious from
these results as you imply. If you just compare the
1.8 and<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">patched 2.x
per-file times and how they scale with #stripes you
get this…<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><o:p> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><span>&lt;image001.png&gt;</span></span><span
style="color: rgb(31, 73, 125);"><o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><o:p> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">The gradients of
these lines should correspond to the additional time
per stripe required<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">to stat each file
and I’ve graphed these times below (ignoring the
0-stripe data for this<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">calculation because
I’m just interested in the incremental per-stripe
overhead).<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><o:p> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);"><span>&lt;image004.png&gt;</span></span><span
style="color: rgb(31, 73, 125);"><o:p></o:p></span></div>
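<div>A minimal sketch of the gradient calculation described above, assuming hypothetical (stripe count, per-file time) samples rather than the measured results. The function name and every number here are illustrative only:</div>
<pre>
```python
# Hypothetical (stripe_count, seconds_per_file) samples; NOT measured data.
samples = [(0, 1.0), (1, 1.5), (2, 2.0), (4, 3.0), (8, 5.0)]


def per_stripe_increments(points):
    """Return (stripe_count, extra_seconds_per_stripe) for each segment.

    The 0-stripe point is skipped, since only the incremental
    per-stripe cost is of interest.
    """
    striped = sorted(p for p in points if p[0] != 0)
    result = []
    for (n0, t0), (n1, t1) in zip(striped, striped[1:]):
        # Slope of the line between two neighbouring stripe counts.
        result.append((n1, (t1 - t0) / (n1 - n0)))
    return result


increments = per_stripe_increments(samples)
for stripes, extra in increments:
    print(stripes, extra)
```
</pre>
<div>Fed with the real per-file times, a rising sequence of slopes would indicate per-stripe overhead that worsens with stripe count, and a falling one would indicate the opposite.</div>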
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">They show
per-stripe overhead for 1.8 well above patched 2.x
for the lower stripe<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">counts, but whereas
1.8 gets better with more stripes, patched 2.x gets
worse. I’m<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">guessing that at
high stripe counts, 1.8 puts many concurrent
glimpses on the wire<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">and does it quite
efficiently. I’d like to understand better how you
control the #<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">of glimpse-aheads
you keep on the wire – is it a single fixed number,
or a fixed<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">number per OST or
some other scheme? In any case, it will be
interesting to see<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);">measurements at
higher stripe counts.<o:p></o:p></span></div>
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;">
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><span
style="color: rgb(31, 73, 125);" lang="EN-US">Cheers,<span
class="Apple-converted-space"> </span><br>
Eric<o:p></o:p></span></div>
</blockquote>
<div style="border-style: none none none solid;
border-left: 1.5pt solid blue; padding: 0cm 0cm 0cm
4pt; position: static; z-index: auto;">
<div>
<div style="border-style: solid none none;
border-top: 1pt solid rgb(181, 196, 223); padding:
3pt 0cm 0cm;">
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: 'Times New Roman',serif;
color: black;"><b><span style="font-size: 10pt;
font-family: Tahoma,sans-serif; color:
windowtext;" lang="EN-US">From:</span></b><span
style="font-size: 10pt; font-family:
Tahoma,sans-serif; color: windowtext;"
lang="EN-US"><span
class="Apple-converted-space"> </span>Fan
Yong [<a class="moz-txt-link-freetext" href="mailto:yong.fan@whamcloud.com">mailto:yong.fan@whamcloud.com</a>]<span
class="Apple-converted-space"> </span><br>
<b>Sent:</b><span
class="Apple-converted-space"> </span>12 May
2011 10:18 AM<br>
<b>To:</b><span class="Apple-converted-space"> </span>Eric
Barton<br>
<b>Cc:</b><span class="Apple-converted-space"> </span>Bryon
Neitzel; Ian Colle; Liang Zhen<br>
<b>Subject:</b><span
class="Apple-converted-space"> </span>New
test results for "ls -Ul"<o:p></o:p></span></div>
</div>
</div>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><o:p> </o:p></div>
<p class="MsoNormal" style="margin: 0cm 0cm 12pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">I have improved the
statahead load-balancing mechanism to distribute the
statahead load across more CPUs on the client, and
adjusted AGL to fit the CLIO lock state machine.
With those improvements, 'ls -Ul' runs faster
than with the old patches, especially on large SMP nodes.<br>
<br>
On the other hand, as the degree of parallelism
increases, the lower-level network scheduler is becoming
the performance bottleneck, so I combined my patches
with Liang's SMP patches in the test.<o:p></o:p></p>
<table class="MsoNormalTable" style="width: 1019px;"
width="100%" border="1" cellpadding="0">
<tbody>
<tr>
<td style="padding: 1.5pt;" valign="top"><br>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">client
(fat-intel-4, 24 cores)<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">server
(client-xxx, 4 OSSes, 8 OSTs on each OSS)<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">b2x_patched<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">my patches +
SMP patches<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">my patches<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">b18<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">original b1_8<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">share the same
server with "b2x_patched"<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">b2x_original<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">original b2_x<o:p></o:p></div>
</td>
<td style="padding: 1.5pt;" valign="top">
<div style="margin: 0cm 0cm 0.0001pt;
font-size: 12pt; font-family: 'Times New
Roman',serif; color: black;">original b2_x<o:p></o:p></div>
</td>
</tr>
</tbody>
</table>
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><br>
Some notes:<br>
<br>
1) Stripe count strongly affects traversal performance,
and the impact is more than linear. Even with all
the patches applied to b2_x, stripe count still has
a larger impact than on b1_8. This is
related to the complex CLIO lock state machine and
its tedious iteration/repeat operations. It is not easy
to make it run as efficiently as b1_8.<br>
</div>
</div>
</div>
</div>
</span></blockquote>
<div><br>
</div>
<div><br>
</div>
<div>Hi there,</div>
<div><br>
</div>
<div>I ran some tests to investigate the overhead of the CLIO lock
state machine and glimpse locks, and I found something new.</div>
<div><br>
</div>
<div>Basically I did the same thing Nasf did, but I
only cared about the overhead of glimpse locks. For this
purpose, I ran 'ls -lU' twice for each test. The first run
is only used to populate the IBITS UPDATE lock cache for the files.
I then dropped cl_locks and ldlm_locks from the client-side cache
by setting lru_size of the ldlm namespaces to zero, and ran 'ls
-lU' once again. In the second run of 'ls -lU', the statahead
thread always finds a cached IBITS lock (we can check the mdc
lock_count to be sure), so the elapsed time of ls is
glimpse related.</div>
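<div><br>
</div>
<div>The procedure above could be sketched roughly as follows. This only builds the command strings and does not run them. The function name, the directory path, and the choice to limit the lru_size pattern to OSC namespaces (so that the MDC IBITS locks stay cached) are assumptions, not taken from the actual test script:</div>
<pre>
```python
import shlex


def glimpse_test_plan(directory, lru_param="ldlm.namespaces.*osc*.lru_size"):
    """Return the shell commands for one glimpse-only measurement.

    lru_param is an assumed pattern; restricting it to OSC namespaces
    is a guess made so that the MDC IBITS locks stay cached.
    """
    target = shlex.quote(directory)
    return [
        # 1st run: populate the IBITS UPDATE lock cache.
        "ls -lU " + target,
        # Drop cached data locks by setting lru_size to zero.
        "lctl set_param " + shlex.quote(lru_param + "=0"),
        # Check that the MDC locks are still cached.
        "lctl get_param " + shlex.quote("ldlm.namespaces.*mdc*.lock_count"),
        # 2nd run: the timed one, expected to be dominated by glimpses.
        "time ls -lU " + target,
    ]


# Hypothetical mount point and test directory.
plan = glimpse_test_plan("/mnt/lustre/testdir")
for command in plan:
    print(command)
```
</pre>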
<div><br>
</div>
<div>This is what I got from the test:</div>
<div><br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<div class="AppleOriginalContents">
<div><br>
</div>
<div><br>
</div>
<div>Description and test environment:</div>
<div>- `ls -Ul time' means the time to finish the second run; </div>
<div>- 100K means 100K files under the same directory; 400K
means 400K files under the same directory;</div>
<div>- there are two OSSes in my test, and each OSS has 8
OSTs; the OSTs are interleaved across the two OSSes, i.e., OST0, 2, 4, ...
are on OSS0 and OST1, 3, 5, ... are on OSS1;</div>
<div>- each node has 12 GB of memory and 4 CPU cores;</div>
<div>- latest lustre-master build, b140</div>
<div><br>
</div>
<div>and, prorated per stripe overhead:</div>
<div><br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<div class="AppleOriginalContents">
<div><br>
</div>
<div><br>
</div>
<div>From the above test, it is very hard to conclude
that cl_lock causes ls time to increase with the stripe
count.</div>
<div><br>
</div>
<div>Here is the script I used for the test; the test
output is attached as well. Please let me know if I missed
something.</div>
</div>
</blockquote>
<br>
<br>
In theory, the glimpse RPCs for the stripes of the same file
should be processed in parallel, so a higher stripe count should mean a lower average
per-stripe overhead; at least, that is the expectation. A flat line
therefore does not show that the overhead is small enough. I suggest comparing
with b1_8 using the same tests.<br>
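<br>
To illustrate the expectation, here is a small sketch with a made-up per-RPC cost (1 ms is an assumption, not a measurement). It contrasts the prorated per-stripe overhead when glimpses are fully parallel with the case where they are fully serialized:<br>
<pre>
```python
def prorated_overhead(total_ms, stripe_count):
    """Per-stripe share of the time spent on one file."""
    return total_ms / stripe_count


RPC_MS = 1.0  # hypothetical cost of one glimpse RPC, in milliseconds
stripe_counts = [1, 2, 4, 8]

# Fully parallel glimpses: total time per file stays at one RPC time,
# so the prorated per-stripe overhead falls as 1/stripe_count.
parallel = [prorated_overhead(RPC_MS, n) for n in stripe_counts]

# Fully serialized glimpses: total time per file grows with stripe
# count, so the prorated per-stripe overhead is a flat line.
serial = [prorated_overhead(RPC_MS * n, n) for n in stripe_counts]

print(parallel)
print(serial)
```
</pre>
Under these assumptions a flat prorated line matches the serialized case, not the parallel one.<br>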
<br>
<br>
<blockquote
cite="mid:BA5D598A-2A89-48DF-A67A-4ACDD8B1F409@whamcloud.com"
type="cite">
<div class="AppleOriginalContents">
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<div>
<div><br>
</div>
<div><br>
</div>
<div>===================</div>
<div>Let's take a step back and reconsider the real cause in
Nasf's test. I tend to think the load on the OSSes might cause
that symptom. Async Glimpse Lock clearly puts
more stress on the OSSes, especially in his test environment, where multiple
OSTs are on the same OSS. This would also make the ls time
increase with the stripe count, since each OSS has to
handle more RPCs in a given time as the stripe count
increases. This problem may be mitigated by distributing the OSTs across
more OSSes.</div>
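<div><br>
</div>
<div>A rough sketch of that effect, assuming stripes occupy OST0, OST1, ... in order and that OST i sits on OSS (i mod number of OSSes). Real stripe allocation may differ, and the function name is illustrative:</div>
<pre>
```python
from collections import Counter


def glimpses_per_oss(stripe_count, oss_count):
    """Count the glimpse RPCs each OSS receives for one file.

    Assumes stripes occupy OST0, OST1, ... in order and that OST i
    lives on OSS (i mod oss_count); real allocation may differ.
    """
    return Counter(ost % oss_count for ost in range(stripe_count))


two = glimpses_per_oss(16, 2)    # 16 stripes served by 2 OSSes
eight = glimpses_per_oss(16, 8)  # the same stripes served by 8 OSSes

print(max(two.values()), max(eight.values()))
```
</pre>
<div>With the same stripe count, spreading the OSTs over more OSSes lowers the number of glimpse RPCs each OSS handles per file.</div>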
</div>
</blockquote>
<br>
<br>
Basically, I agree with you that the heavy load on the OSSes may be the
performance bottleneck. As I said in my earlier email, we found the
CPU load on the OSSes was quite high during "ls -Ul" in the large-striped
cases. This would be easy to verify if we had sufficiently powerful
OSSes; unfortunately, we do not have them now.<br>
<br>
Cheers,<br>
--<br>
Nasf<br>
<br>
<br>
<blockquote
cite="mid:BA5D598A-2A89-48DF-A67A-4ACDD8B1F409@whamcloud.com"
type="cite">
<div>
<div><br>
</div>
<div>Thanks,</div>
<div>Jinshan</div>
<br>
<blockquote type="cite">
<div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
<div class="WordSection1" style="page: WordSection1;">
<div style="border-style: none none none solid;
border-left: 1.5pt solid blue; padding: 0cm 0cm 0cm 4pt;
position: static; z-index: auto;">
<div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',serif; color: black;"><br>
2) Patched b2_x is much faster than original b2_x. For
traversing a directory of 400K 32-striped files, it is
faster by a factor of 100 or more.<br>
<br>
3) Patched b2_x is also faster than b1_8. In our
tests, patched b2_x is at least 4X faster than b1_8,
which meets the requirement in the ORNL contract.<br>
<br>
4) Original b2_x is faster than b1_8 only for small
stripe counts, up to 4 stripes. For large
stripe counts it is slower than b1_8, which is consistent
with the ORNL test results.<br>
<br>
5) The largest stripe count in our test is 32. We do
not have enough resources to test larger stripe counts,
and I also wonder whether it is worth testing
directories with larger stripe counts. How many
customers want to use large, fully striped
directories, that is, 1M 160-striped items in a
single directory? If that is a rare case, then spending
a lot of time on it is not worthwhile.<br>
<br>
We need to confirm with ORNL the final
acceptance test cases and environment, including:<br>
a) stripe count<br>
b) item count<br>
c) network latency, with or without an LNET router (I suggest
without)<br>
d) OST count on each OSS<br>
d) OST count on each OSS<br>
<br>
<br>
Cheers,<br>
--<br>
Nasf<o:p></o:p></div>
</div>
</div>
<span>&lt;result_20110512.xls&gt;</span><br>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>