[lustre-discuss] Interesting disk usage of tar of many small files on zfs-based Lustre 2.10

Nathan R.M. Crawford nrcrawfo at uci.edu
Thu Aug 3 15:07:29 PDT 2017


  In testing how to cope with naive users generating millions of tiny
files, I noticed some surprising (to me) behavior on a Lustre 2.10/ZFS
0.7.0 system.

  The test directory (based on actual user data) contains about 4 million
files (average size 8.6K) in three subdirectories. Making a tar file of
each subdirectory gives a total nominal size of 34GB, and according to
"zfs list", the tar files took up 33GB on disk.
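
  For reference, numbers like those can be read off with standard tools;
the file and dataset names below are placeholders, not the actual test
paths:

    # nominal (apparent) size of the archives
    du --apparent-size -sh sub1.tar sub2.tar sub3.tar
    # space actually consumed on the ZFS dataset, plus compression ratio
    zfs list -o name,used,referenced,compressratio tank/ost0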

  The initially surprising part is that making copies of the tar files only
adds 9GB to the disk usage. I suspect the tar files were created as a
stream of tiny appends, and with a raidz2 on ashift=12 disks (4MB max
recordsize), there is some overhead/wasted space on each mini-write. The
copies of the tar files, however, could be made as large sequential writes
that avoided the overhead and probably allowed the lz4 compression to be
more efficient.
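
  A minimal sketch of the effect (file names are hypothetical): the copy
rewrites the same bytes in large sequential chunks, so the allocated size
can shrink even though the apparent size is unchanged:

    # original archive: built up by tar one small write at a time
    tar -cf sub1.tar sub1/
    # copy: read back and rewritten in large sequential chunks
    cp sub1.tar sub1-copy.tar
    # compare allocated vs. apparent sizes of the two files
    du -sh sub1.tar sub1-copy.tar
    du --apparent-size -sh sub1.tar sub1-copy.tar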

  Are there any tricks or obscure tar options that avoid this when
archiving millions of tiny files on a Lustre system? It is not a critical
issue, as taking a minute to copy the tar files is simple enough.
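
  (Purely as an illustration of the kind of option meant, not a confirmed
fix: GNU tar's blocking factor, or staging the stream through dd, should
both turn the output into larger writes. The sizes below are guesses.)

    # -b counts 512-byte blocks, so 8192 gives 4MiB records per write
    tar -b 8192 -cf sub1.tar sub1/
    # or accumulate the tar stream into 4MiB output blocks with dd
    tar -cf - sub1/ | dd of=sub1.tar obs=4M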

-Nate

-- 

Dr. Nathan Crawford              nathan.crawford at uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II         Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA