[lustre-discuss] Interesting disk usage of tar of many small files on zfs-based Lustre 2.10

Nathan R.M. Crawford nrcrawfo at uci.edu
Thu Aug 3 17:28:38 PDT 2017


Off-list, it was suggested that tar's default 10 KiB blocking (20 blocks
of 512 bytes per record) may be the cause. I increased it to 1 MiB using
"tar -b 2048 ...", which seems to result in the expected 9.3 GiB disk
usage. It probably makes archives incompatible with very old versions of
tar, but meh.
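
For anyone wanting to try this, here is a minimal sketch of the arithmetic
and the invocation. The -b value is counted in 512-byte blocks, so 2048
blocks gives 1 MiB records. The directory and file names below are made up
for the demo (a throwaway temp directory, not the real data set), and the
blocking_factor helper is just an illustrative function, not part of tar.

```shell
#!/bin/sh
# tar counts its blocking factor in 512-byte blocks.
blocking_factor() {
    # $1 = desired record size in bytes (assumed to be a multiple of 512)
    echo $(( $1 / 512 ))
}

default_record=$((20 * 512))                     # default: 20 blocks = 10 KiB
big_factor=$(blocking_factor $((1024 * 1024)))   # 1 MiB records
echo "default record: $default_record bytes, new blocking factor: $big_factor"

# Demo on a throwaway directory of tiny files (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/small_files"
i=0
while [ "$i" -lt 100 ]; do
    printf 'sample %s\n' "$i" > "$workdir/small_files/f$i"
    i=$((i + 1))
done

# Write the archive with 1 MiB records instead of the 10 KiB default.
tar -b "$big_factor" -cf "$workdir/small_files.tar" -C "$workdir" small_files

# rm -rf "$workdir"   # clean up when done
```

Whether the larger record size actually reduces on-disk usage will depend on
the pool layout, so it is worth comparing "zfs list" output before and after
on a small sample first.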

-Nate

On Thu, Aug 3, 2017 at 3:07 PM, Nathan R.M. Crawford <nrcrawfo at uci.edu>
wrote:

>   In testing how to cope with naive users generating millions of tiny
> files, I noticed some surprising (to me) behavior on a Lustre 2.10/ZFS
> 0.7.0 system.
>
>   The test directory (based on actual user data) contains about 4 million
> files (avg size 8.6K) in three subdirectories. Making tar files of each
> subdirectory gives the total nominal size of 34GB, and using "zfs list",
> the tar files took up 33GB on disk.
>
>   The initially surprising part is that making copies of the tar files
> only adds 9GB to the disk usage. I suspect that the tar files are created
> as a series of tiny appends, and with a raidz2 on ashift=12 disks (4MB
> max recordsize), there is some overhead/wasted space on each mini-write.
> The copies of the tar files, however, could be made as a single write
> that avoided the overhead and probably allowed the lz4 compression to be
> more efficient.
>
>   Are there any tricks or obscure tar options that make archiving millions
> of tiny files on a Lustre system avoid this? It is not a critical issue, as
> taking a minute to copy the tar files is simple enough.
>
> -Nate
>
> --
>
> Dr. Nathan Crawford              nathan.crawford at uci.edu
> Modeling Facility Director
> Department of Chemistry
> 1102 Natural Sciences II         Office: 2101 Natural Sciences II
> University of California, Irvine  Phone: 949-824-4508
> Irvine, CA 92697-2025, USA
>
>

