[lustre-discuss] Lustre optimize for sparse data files?

Thu Sep 17 14:47:06 PDT 2020

Hello, T.H.Hsieh.

You asked, "... is it possible to enable file compression on the ZFS backend only without any side effect in the whole Lustre file system ?"

The answer is yes.  As Robert Redl explained earlier, an OST can hold some objects compressed and others not, even if compression is toggled while the OST is in use.  The effect is that files and file systems can be a mix.  In fact, a single file can be composed of compressed objects (stripes) on ZFS, uncompressed objects on ZFS, and objects on ldiskfs too.

-Cory

On 9/12/20, 12:53 PM, "lustre-discuss on behalf of Tung-Han Hsieh" <lustre-discuss-bounces at lists.lustre.org on behalf of thhsieh at twcp1.phys.ntu.edu.tw> wrote:

    Dear Andreas,

    Sorry for replying late. In the past few days we fell into the
    nightmare of upgrading a large Lustre system from version 1.8.8
    directly to version 2.10.7. Fortunately everything seems to be done.

    Our scenario for storing large sparse data files is simple. We have
    standard C code for some computation. After the computation, it produces
    a large sparse matrix of around 3GB with more than 90% zeros. It just
    uses fwrite() to write the whole matrix into a binary file.

    We send this code to run on many different clusters. We expect
    every generated data file to be 3GB in size. But by chance we found
    that on one cluster (which was developed by another lab, not our
    group) the space actually occupied by each file is less than 10% of
    that. We use

    	du -s <filename>

    to see this phenomenon. But we don't know which backend file
    system that cluster uses, because we only see that the storage is
    mounted as nfs4.
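
The same comparison that "du -s" makes can also be done from C, which may help when checking many output files. The following is only a minimal sketch, assuming a Linux client where st_blocks is counted in 512-byte units; the function names file_usage and print_usage are made up for this illustration and are not part of Lustre or of the code discussed in this thread.

```c
#define _FILE_OFFSET_BITS 64   /* so 3GB files also work on 32-bit builds */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: report the apparent size (what "ls -l" shows)
 * and the allocated size (what "du" shows) of a file.
 * Returns 0 on success, -1 if stat() fails. */
int file_usage(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    *allocated = (long long)st.st_blocks * 512LL;
    return 0;
}

/* Hypothetical helper: print both sizes for one file. */
void print_usage(const char *path)
{
    long long apparent, allocated;

    if (file_usage(path, &apparent, &allocated) != 0) {
        perror(path);
        return;
    }
    printf("%s: apparent %lld bytes, allocated %lld bytes\n",
           path, apparent, allocated);
}
```

An allocated size far below the apparent size means the backend stores the file sparsely or compressed. The numbers alone do not say which of the two it is.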

    Note that our C code only uses fwrite() to write the whole large
    sparse matrix. There is no trick played in our code. So the
    "compression" of the large sparse file must be done by the backend
    file system itself. So I am just curious whether Lustre implements
    this or not.

    It seems that Lustre with ZFS backend can do this, because the file
    compression is actually done by ZFS. Hence here comes another question:
    If we have a lot of OSTs, some with ZFS, and some with ldiskfs, is it
    possible to enable file compression on the ZFS backend only without
    any side effect in the whole Lustre file system ?

    Any comments are very welcome. Thanks in advance.

    Best Regards,

    T.H.Hsieh

    On Wed, Sep 09, 2020 at 01:45:10PM -0600, Andreas Dilger wrote:
    > On Sep 8, 2020, at 9:13 PM, Tung-Han Hsieh <thhsieh at twcp1.phys.ntu.edu.tw> wrote:
    > > 
    > > I would like to ask whether Lustre file system has implemented the
    > > function to optimize for large sparse data files ?
    > > 
    > > For example, a 3GB data file but with more than 80% bytes zero, can
    > > Lustre file system optimize the storage not actually taking the whole
    > > 3GB of disk space ?
    > 
    > Could you please explain your usage further?  Lustre definitely has
    > support for sparse files - if they are written by an application with
    > "seek" or by multiple threads in parallel, then only the blocks that
    > are written will use space on the OST.
    > 
    > For ldiskfs the block size is 4KB.  For ZFS the OST block size is up
    > to 1MB, if the file size is 1MB or larger.  That is why compression
    > on ZFS can help reduce space usage on the OST, because it can effectively
    > compress the 1MB blocks that are nearly full of zeroes, if your sparse
    > writes are smaller than the blocksize.
    > 
    > If you are *copying* a sparse file, that depends on the tool that is
    > doing the copy.  For example, "cp --sparse=always" will generate a
    > sparse file.  We are also working on adding SEEK_HOLE and SEEK_DATA,
    > which will help tools to copy sparse files.
    > 
    > Cheers, Andreas
    > 
    > > I know that some file systems (e.g., ZFS) have this function. If Lustre
    > > does not have it, is there a roadmap to implement it in the future ?
    > > 
    > > Thanks for your reply in advance.
    > > 
    > > Best Regards,
    > > 
    > > T.H.Hsieh
    > > _______________________________________________
    > > lustre-discuss mailing list
    > > lustre-discuss at lists.lustre.org
    > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org 
    > 
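
To illustrate Andreas's point that only the blocks an application actually writes use space when it seeks over the rest, here is a hedged sketch of a writer that skips all-zero chunks instead of passing them to fwrite(). The function name write_sparse and the 4096-byte chunk size are assumptions made for this example (the chunk size follows the 4KB ldiskfs block size quoted above). How much space this saves depends on the backend and its block size.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

#define CHUNK 4096  /* assumed granularity, not a Lustre tunable */

/* Return 1 if the n bytes at p are all zero, 0 otherwise. */
static int all_zero(const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: write len bytes from buf to path, seeking over
 * all-zero chunks so the file system can leave holes there.
 * Returns 0 on success, -1 on error. */
int write_sparse(const char *path, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t off = 0;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    while (off < len) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;

        if (all_zero(p + off, n)) {
            /* Skip the chunk; nothing is written for it. */
            if (fseek(fp, (long)n, SEEK_CUR) != 0)
                goto fail;
        } else if (fwrite(p + off, 1, n, fp) != n) {
            goto fail;
        }
        off += n;
    }
    /* Seeking alone does not extend the file, so set the final
     * length explicitly in case the data ends in a hole. */
    if (fflush(fp) != 0 || ftruncate(fileno(fp), (off_t)len) != 0)
        goto fail;
    return fclose(fp) == 0 ? 0 : -1;
fail:
    fclose(fp);
    return -1;
}
```

Reading the file back returns zeros for the skipped ranges, so the contents match what a plain fwrite() of the whole matrix would have produced.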
