[lustre-devel] Design proposal for client-side compression
jinshan.xiong at intel.com
Fri Jul 21 12:12:02 PDT 2017
Please see inserted lines.
From: Anna Fuchs <anna.fuchs at informatik.uni-hamburg.de>
Date: Friday, July 21, 2017 at 8:15 AM
To: "Xiong, Jinshan" <jinshan.xiong at intel.com>
Cc: Matthew Ahrens <mahrens at delphix.com>, "Zhuravlev, Alexey" <alexey.zhuravlev at intel.com>, lustre-devel <lustre-devel at lists.lustre.org>
Subject: Re: [lustre-devel] Design proposal for client-side compression
For compression within the osc module we need a bunch of pages for the
compressed output (at most the same size as the original data), and a few
pages for working memory of the algorithms. Since allocating (and later
freeing) the pages every time we enter the compression loop might be
expensive and annoying, we thought about a pool of pages that is
reserved exclusively for compression purposes.
We would create that pool at file system start (when loading the osc
module) and destroy it at file system stop (when unloading the osc
module). The condition is, of course, the configure option
--enable-compression.
Is it possible to enable this by writing to a sysfs or procfs entry, so that users can try this out without having to recompile Lustre?
The pool would be a queue of page bunches where a thread can pop pages
for compression and put them back after the compressed portion was
transferred. The page content will not be visible to anyone outside and
will also not be cached after the transmission.
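To make the pop/put-back idea concrete, here is a minimal userspace sketch in C. It is not Lustre code: the names (cmp_pool, cmp_pool_get, cmp_pool_put) are hypothetical, it hands out single page-sized buffers from a stack rather than a queue of page bunches, it uses malloc in place of kernel page allocation, and it omits the locking a real multi-threaded pool would need.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u  /* assumed page size, for illustration only */

/* Hypothetical pool: a stack of preallocated page-sized buffers. */
struct cmp_pool {
    void  **free_pages;  /* buffers that are currently unused */
    size_t  nr_free;     /* number of valid entries in free_pages */
    size_t  nr_total;    /* number of buffers allocated at init time */
};

/* Release every buffer still held by the pool, then the stack itself. */
void cmp_pool_fini(struct cmp_pool *pool)
{
    while (pool->nr_free > 0)
        free(pool->free_pages[--pool->nr_free]);
    free(pool->free_pages);
    pool->free_pages = NULL;
    pool->nr_total = 0;
}

/* Allocate all buffers up front. Returns 0 on success, -1 on failure. */
int cmp_pool_init(struct cmp_pool *pool, size_t nr_pages)
{
    pool->nr_free = 0;
    pool->nr_total = 0;
    pool->free_pages = calloc(nr_pages, sizeof(*pool->free_pages));
    if (pool->free_pages == NULL)
        return -1;
    for (size_t i = 0; i < nr_pages; i++) {
        void *page = malloc(PAGE_BYTES);
        if (page == NULL) {
            cmp_pool_fini(pool);  /* undo the partial allocation */
            return -1;
        }
        pool->free_pages[pool->nr_free++] = page;
    }
    pool->nr_total = nr_pages;
    return 0;
}

/* Pop one buffer. Returns NULL when the pool is empty; never blocks. */
void *cmp_pool_get(struct cmp_pool *pool)
{
    if (pool->nr_free == 0)
        return NULL;
    return pool->free_pages[--pool->nr_free];
}

/* Put a buffer back. Returns -1 if the pool is already full. */
int cmp_pool_put(struct cmp_pool *pool, void *page)
{
    if (pool->nr_free >= pool->nr_total)
        return -1;
    pool->free_pages[pool->nr_free++] = page;
    return 0;
}
```

A caller would take a buffer before compressing a chunk and return it once the compressed data has been sent. Because the buffers are allocated once at init time, the per-chunk cost is a pointer pop and push rather than an allocation.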
We would like to make the pool static since we think we do not need a
lot of memory. However, it depends on the number of stripes or MBs that
one client can handle at the same time. E.g. for 32 stripes of 1MB
processed at the same time, we need at most 32 MB + a few MB for
overhead.
Actually, we have increased the default RPC size to 4MB, so this assumption is no longer true.
Where can I find the exact number, or how can I estimate how many
stripes there are at most at the same time?
It’s not scalable to have a pool per OSC because Lustre can support up to 2000 stripes. However, we don’t need to worry about the wide-stripe problem, because no one can write a full stripe even with a 1MB stripe size: that would mean the application has to issue a single write of about 2GB.
Another limitation is the number of threads that can work on
compression in parallel. We are thinking of exclusively reserving no
more than 50 MB for the compression page pool per client. Do you think
it might hurt the
Yes, it’s reasonable to have a global pool for each client node. Let’s start from this number but please make it adjustable via sysfs or procfs.
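The sizing arithmetic in this exchange can be written down as a short C sketch. The function names are hypothetical, MB is treated as MiB, and the overhead is passed in as a parameter because the thread only says "a few MB". The sketch assumes the worst case from the proposal, where each chunk in flight needs an output buffer as large as its input.

```c
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/* Worst-case pool size: one full-size output buffer per chunk in flight,
 * plus working memory for the compression algorithms. */
uint64_t pool_bytes_needed(uint64_t chunks_in_flight, uint64_t chunk_bytes,
                           uint64_t overhead_bytes)
{
    return chunks_in_flight * chunk_bytes + overhead_bytes;
}

/* Number of chunks that can be compressed at the same time under a
 * fixed budget, after setting the overhead aside. */
uint64_t chunks_within_budget(uint64_t budget_bytes, uint64_t chunk_bytes,
                              uint64_t overhead_bytes)
{
    if (chunk_bytes == 0 || budget_bytes < overhead_bytes)
        return 0;
    return (budget_bytes - overhead_bytes) / chunk_bytes;
}
```

With 32 chunks of 1 MiB, pool_bytes_needed gives 32 MiB plus overhead, matching the estimate in the proposal. With 4 MiB chunks the same call gives 128 MiB plus overhead. Under a 50 MiB budget and an assumed 2 MiB of overhead, chunks_within_budget returns 12 for 4 MiB chunks, which is one reason to keep the limit adjustable.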
If there are not enough pages, for whatever reason, we wouldn't wait,
but would just skip the compression for the respective chunk.
Are there any problems you see in that approach?
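The "do not wait, just skip" behaviour could look roughly like the following C sketch. Everything here is illustrative: prepare_chunk and the chunk_form enum are hypothetical names, and the byte-wise run-length encoder is only a stand-in so the example runs, not an algorithm proposed in this thread. The sketch also sends the chunk uncompressed when the output does not fit or does not shrink, which is an assumption on top of what is described above.

```c
#include <stddef.h>
#include <stdint.h>

enum chunk_form { CHUNK_RAW = 0, CHUNK_COMPRESSED = 1 };

/* Stand-in compressor: emits (run length, byte) pairs.
 * Returns the encoded length, or 0 if the output would exceed cap. */
static size_t rle_encode(const uint8_t *src, size_t len,
                         uint8_t *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    while (i < len) {
        uint8_t byte = src[i];
        size_t run = 1;

        while (i + run < len && src[i + run] == byte && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (uint8_t)run;
        dst[out++] = byte;
        i += run;
    }
    return out;
}

/* Decide what goes on the wire for one chunk. scratch is the buffer taken
 * from the pool, or NULL if the pool had nothing to give; in that case the
 * chunk is sent as-is and the caller never waits for pages. */
enum chunk_form prepare_chunk(const uint8_t *src, size_t len,
                              uint8_t *scratch, size_t scratch_cap,
                              const uint8_t **wire, size_t *wire_len)
{
    size_t n;

    *wire = src;
    *wire_len = len;
    if (scratch == NULL)
        return CHUNK_RAW;           /* pool exhausted: skip compression */

    n = rle_encode(src, len, scratch, scratch_cap);
    if (n == 0 || n >= len)
        return CHUNK_RAW;           /* no gain: send the original data */

    *wire = scratch;
    *wire_len = n;
    return CHUNK_COMPRESSED;
}
```

The return value tells the caller whether the chunk went out compressed, so the receiving side can be told which form it is getting.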