[lustre-devel] Design proposal for client-side compression

Fri Jul 28 08:12:16 PDT 2017

> Ah.  As it turns out, much more complicated than I anticipated.
>  Thanks for explaining...
> 
> I have no expertise in compression algorithms, so that I will have to
> just watch from the sidelines.  Good luck.
> 
> When you are further along, I remain interested in helping out with
> the Lustre side of things.
> 
> One more question - Do you have a plan to make this work *without*
> the ZFS integration as well, for those using ldiskfs?  That seems
> straightforward enough - compress/decompress at send and receive time
> - even if the benefits would be smaller, but not everyone (Cray,
> f.x.) is using ZFS, so I'm very interested in something that would
> help ldiskfs as well.  (Which is not to say don't do the deeper
> integration with ZFS.  Just that we'd like something available for
> ldiskfs too.)

I fear it is also much more complicated :)

At the very beginning of the project we hoped we would not need to
touch the server much. That turned out to be wrong: we have to modify
not only the Lustre server, but also, to a large extent, the backend
itself. We chose ZFS because it already provides a lot of
infrastructure that we would have to implement from scratch in
ldiskfs. Since, at least for me, this is a research project, ldiskfs is
out of scope. Once we have proved the concept, one could re-implement
the whole compression stack for ldiskfs. So it is not impossible, but
it is not our focus for this project.

Nevertheless, we tried to keep our changes as backend-independent as
possible. For example, we need some additional information to be stored
per compressed chunk. One possibility would be to change the ZFS block
pointer and add those fields, but I don't think anyone except us would
like the BP to be modified :) So we decided to store them as a header
for every chunk. For ldiskfs, since one would need to implement
everything from scratch anyway, one might not need that header, but
could take the required fields into account from the beginning and add
them to ldiskfs' "block pointer". For that reason, we wanted to leave
the compressed data "headerless" on the client side, and add the header
only on the server side if the corresponding backend requires it.
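
To make the per-chunk header idea concrete, here is a minimal sketch in
Python. It is not our actual code or on-disk format: the field set
(magic, algorithm id, compressed size, original size), the 16-byte
layout, the names HEADER, MAGIC, add_header and parse_chunk, and the use
of zlib are all assumptions made only for illustration.

```python
import struct
import zlib

# Hypothetical header layout (an assumption, not the real format):
# magic (u32), algorithm id (u16), 2 pad bytes,
# compressed size (u32), original size (u32) -> 16 bytes, little-endian.
HEADER = struct.Struct("<IHxxII")
MAGIC = 0x4C5A4348          # arbitrary marker chosen for this sketch
ALG_NONE, ALG_ZLIB = 0, 1   # hypothetical algorithm ids


def add_header(payload: bytes, original_size: int, alg: int) -> bytes:
    """Prepend the per-chunk header to an already compressed payload."""
    return HEADER.pack(MAGIC, alg, len(payload), original_size) + payload


def parse_chunk(chunk: bytes):
    """Split a stored chunk into (algorithm id, original size, payload)."""
    magic, alg, csize, osize = HEADER.unpack_from(chunk, 0)
    if magic != MAGIC:
        raise ValueError("not a compressed chunk")
    payload = chunk[HEADER.size:HEADER.size + csize]
    if len(payload) != csize:
        raise ValueError("truncated chunk")
    return alg, osize, payload


if __name__ == "__main__":
    data = b"lustre" * 1000
    chunk = add_header(zlib.compress(data), len(data), ALG_ZLIB)
    alg, osize, payload = parse_chunk(chunk)
    assert zlib.decompress(payload) == data and osize == len(data)
```

In the "headerless" variant only the compressed payload would leave the
client, and something like add_header() would run on the server for
backends that need it.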

Well, we did it, and it even works sometimes, but it looks horrible
and is really counterintuitive. The client sends less data than lands
on the OST, so, because the header is added while receiving on the
server side, we have to recalculate offsets, recalculate the sent and
received sizes, shift buffers by offsets, and so on. The only advantage
of this approach is the client's independence from the backend. We
decided the price is too high. So now I will construct the chunk with
the header right after compressing the data on the client side, and get
rid of all that offset handling on the server. But ldiskfs will have to
deal with those ZFS-motivated details.
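
The bookkeeping difference between the two variants can be shown with a
toy model in Python. It is heavily simplified and assumption-laden: it
pretends the chunks of one request are packed back to back in a single
buffer, uses a made-up 16-byte header size, and the function names are
hypothetical. The real code paths look different.

```python
HEADER_SIZE = 16  # hypothetical header size, for illustration only


def server_side_layout(payload_sizes):
    """Headerless on the wire; the server inserts a header per chunk.

    Returns (layout, bytes_sent, bytes_stored), where layout holds one
    (payload offset on the wire, payload offset as stored) pair per chunk.
    """
    layout, wire, stored = [], 0, 0
    for size in payload_sizes:
        # The payload moves: every earlier header shifts it further.
        layout.append((wire, stored + HEADER_SIZE))
        wire += size
        stored += HEADER_SIZE + size
    return layout, wire, stored


def client_side_layout(payload_sizes):
    """The client builds header + payload; the server stores it as is."""
    layout, offset = [], 0
    for size in payload_sizes:
        # Same offset on the wire and as stored: nothing to recalculate.
        layout.append((offset + HEADER_SIZE, offset + HEADER_SIZE))
        offset += HEADER_SIZE + size
    return layout, offset, offset


if __name__ == "__main__":
    sizes = [100, 200, 50]
    print(server_side_layout(sizes))
    print(client_side_layout(sizes))
```

In the first variant the sent and stored sizes differ and every payload
offset has to be shifted on the server; in the second they are
identical, at the price that the client now knows about the header.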

However, a light version of compression could work with smaller changes
to ldiskfs, if we only allow files that are either completely
compressed or not compressed at all, and accept potential performance
drops from broken read-ahead (due to gaps within the data).

I hope it is somewhat clearer now.

Regards,
Anna