<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">As HW latencies shrink to zero, does it not make you nervous to suggest adding compression into the metadata critical path?</div><div dir="ltr"><br>On Nov 21, 2018, at 7:27 PM, Patrick Farrell &lt;<a href="mailto:paf@cray.com">paf@cray.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div dir="ltr">
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<div id="divtagdefaultwrapper" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Andreas,</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Thanks for the informative reply.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">You raise an interesting and nasty point about breaking the compact layout with movement. It's not possible today to move an individual OST object/stripe, though it's certainly something I've heard people ask for. So
it wouldn't be an issue if all such operations must address whole components, as is required today.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">If we did add the ability to switch out an individual OST object/stripe (which would be pretty easy to implement - data copy, layout swap, rm now-unused object), we could add those modifications as additional "traditional"
layout info "atop" the compact layout. So just the usual layout format, with OST IDs, and where present, it supersedes the relevant part of the compact layout. This implicitly assumes we don't do a ton of this to a particular layout.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">But as to reasons, it's a few things.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">The primary concern is improving the open performance of very widely striped files, which means your second case - reduce the xattr and rpc size.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">The same things that motivate this would also motivate raising the count limit, but my understanding from comments in the code is that 2000 is arbitrary, and the actual max could be quite a bit higher. The first limit
I'm aware of - I'm not sure if this is right? - is 1 MiB of extended attribute. That's a little over 5000 stripes. (Obviously, 1 MiB of layout is probably a non-starter...)</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Your suggestion of gzip is very intriguing. Ideally, I'd pick something available in kernel and with good performance. A bit of experimentation is probably in order if we go that route. Thanks for the pointer there.
I'd probably start with extracting the binary xattr and seeing how it compresses.</p>
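<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">A rough sketch of that experiment, using Python's zlib module (the DEFLATE algorithm that gzip uses). The blob here is synthetic: the 24-byte record in synthetic_layout is a made-up stand-in, not the real on-disk layout format, so any numbers it produces say nothing definite about real xattrs. Both function names are hypothetical.</p>
<pre>
```python
import struct
import zlib
from typing import Tuple


def synthetic_layout(stripe_count: int) -> bytes:
    """Build a stand-in layout blob with one fixed-size record per stripe.

    Assumed record (hypothetical): 8-byte object ID, 8-byte sequence,
    4-byte generation, 4-byte OST index, in native byte order.
    """
    records = [struct.pack("=QQII", 1000 + i, 0, 0, i)
               for i in range(stripe_count)]
    return b"".join(records)


def compression_report(blob: bytes, level: int = 6) -> Tuple[int, int]:
    """Return (raw size, compressed size) after a round-trip check."""
    packed = zlib.compress(blob, level)
    if zlib.decompress(packed) != blob:
        raise ValueError("round trip did not reproduce the input")
    return len(blob), len(packed)


raw_size, packed_size = compression_report(synthetic_layout(2000))
print(f"raw={raw_size} bytes, compressed={packed_size} bytes")
```
</pre>
<p style="margin-top:0;margin-bottom:0">To try this on a real file, pass the extracted xattr bytes to compression_report in place of the synthetic blob.</p>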
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">- Patrick</p>
<p style="margin-top:0;margin-bottom:0"></p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Andreas Dilger <<a href="mailto:adilger@whamcloud.com">adilger@whamcloud.com</a>><br>
<b>Sent:</b> Wednesday, November 21, 2018 5:53:03 PM<br>
<b>To:</b> Patrick Farrell<br>
<b>Cc:</b> Lustre Development<br>
<b>Subject:</b> Re: [lustre-devel] Compact layouts</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On Nov 16, 2018, at 11:06, Patrick Farrell <<a href="mailto:paf@cray.com">paf@cray.com</a>> wrote:<br>
> <br>
> All,<br>
> <br>
> There is an old idea for reducing the data required to describe file striping by using a bitmap to record which OSTs are in use. As best I can tell, this was most recently described here:<br>
> <a href="http://wiki.lustre.org/Layout_Enhancement_Solution_Architecture#Compact_Layouts_2">
http://wiki.lustre.org/Layout_Enhancement_Solution_Architecture#Compact_Layouts_2</a><br>
> <br>
> I’m curious if this has been pursued any further, if there’s a JIRA or other place that might have more info or be tracking the idea. I poked around and didn’t find anything.<br>
> <br>
> In particular, this comment:<br>
> “with enough data that for each OST index set in the bitmap, a corresponding OST object FID may be computed”<br>
> Points at the difficult part of implementing this.<br>
> <br>
> So, before I get too far considering this problem - Is there more out there somewhere? Hoping to avoid duplicating work!<br>
<br>
Patrick,<br>
as you mention above, the tricky part is that there would need to be sequential FID sequence allocation across all of the OSTs. Then, each of the compact files would allocate/reserve the same OID in each of the sequences so that the mapping could be compact.
I don't think that is insurmountable - we already have a good mechanism for allocating FID sequences to different targets, but it would need to be extended so that compact layouts would allocate sequences from a different range of values from regular layouts.<br>
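<br>
A minimal sketch of the mapping described above, not real Lustre code. It assumes each OST index i owns sequence base_seq + i inside a range set aside for compact layouts, and that the file reserves the same OID in every one of those sequences. The names Fid, CompactLayout, base_seq and ost_bitmap are hypothetical.<br>
<pre>
```python
from dataclasses import dataclass
from typing import Iterator, NamedTuple


class Fid(NamedTuple):
    seq: int
    oid: int
    ver: int


@dataclass(frozen=True)
class CompactLayout:
    base_seq: int    # assumption: first sequence of the compact range
    oid: int         # same object ID reserved in every OST's sequence
    ost_bitmap: int  # bit i set means OST index i holds a stripe

    def ost_indices(self) -> Iterator[int]:
        """Yield the OST indices in use, in ascending order."""
        bitmap, idx = self.ost_bitmap, 0
        while bitmap:
            if bitmap & 1:
                yield idx
            bitmap >>= 1
            idx += 1

    def fid_for(self, ost_idx: int) -> Fid:
        """Compute the object FID for one OST index set in the bitmap."""
        if not (self.ost_bitmap >> ost_idx) & 1:
            raise KeyError(f"OST index {ost_idx} is not in this layout")
        return Fid(self.base_seq + ost_idx, self.oid, 0)


layout = CompactLayout(base_seq=0x1000, oid=42, ost_bitmap=0b10110)
fids = [layout.fid_for(i) for i in layout.ost_indices()]
```
</pre>
Note that a plain bitmap only records which OSTs are used, so this sketch assumes stripes are ordered by ascending OST index.<br>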
<br>
It would also likely need to implement "OST object create on write" so that there aren't large numbers of unused objects on each OST (one for each OID that isn't used on a particular file).<br>
<br>
The other issue is that anything like migrating any single object to another OST (e.g. for mirror resync, tiering, etc) would potentially break the compact layout.<br>
<br>
I guess the question is what the need for compact layouts is? To handle more than 2000 stripes, to reduce the xattr size/RPC size, to allow more complex PFL layouts to fit into the layout size limit?<br>
<br>
In the past we discussed compressing the layout with gzip, which might be quite effective since large parts of it are zero-filled and repetitive. This would help the xattr/RPC size, and I think that even compact layouts would still be expanded in
RAM to allow easier processing.<br>
<br>
Cheers, Andreas<br>
---<br>
Andreas Dilger<br>
Principal Lustre Architect<br>
Whamcloud<br>
<br>
</div>
</span></font></div>
</div></blockquote><blockquote type="cite"><div dir="ltr"><span>_______________________________________________</span><br><span>lustre-devel mailing list</span><br><span><a href="mailto:lustre-devel@lists.lustre.org">lustre-devel@lists.lustre.org</a></span><br><span><a href="http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org</a></span><br></div></blockquote></body></html>