[lustre-discuss] Interesting disk usage of tar of many small files on zfs-based Lustre 2.10

Alexander I Kulyavtsev aik at fnal.gov
Thu Aug 3 18:53:08 PDT 2017


Lustre's IO size is 1 MB, and you have a 4 MB ZFS recordsize.
Do you see the IO rate change when the tar record size is set to 4 MB (tar -b 8192)?

How many data disks do you have in the raidz2?

ZFS may write a few extra empty blocks to reduce future fragmentation; IIRC this change is on by default in ZFS 0.7 to improve IO rates on some disks:
https://github.com/zfsonlinux/zfs/pull/5931

If I understand it correctly, for very small files (untarred) there is overhead to pad each file out to the record size, then extra padding up to a multiple of P+1 records (= up to P extra), plus the parity records themselves (+P), plus the metadata for the Lustre OST object. For raidz2 with P=2 that can be a factor of 5x or more.
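A back-of-envelope version of that accounting, using my own simplified model (assumptions: ashift=12 gives 4 KiB sectors, each small file's record is rounded up to whole sectors, P parity sectors are added, and the raidz allocation is padded to a multiple of P+1 sectors; the 8806-byte file size is just the ~8.6 KiB average from the test set):

```shell
# Back-of-envelope raidz2 small-file allocation (assumed model, not ZFS code).
sector=4096                                # 4 KiB sectors from ashift=12
p=2                                        # raidz2 parity
file=8806                                  # ~8.6 KiB average file

data=$(( (file + sector - 1) / sector ))   # round data up to whole sectors
total=$(( data + p ))                      # add P parity sectors
rem=$(( total % (p + 1) ))
if [ "$rem" -ne 0 ]; then
  total=$(( total + p + 1 - rem ))         # pad allocation to a multiple of P+1
fi
echo "$(( total * sector )) bytes allocated for $file bytes of data"
```

For the average file here this model already gives 24576 bytes allocated for 8806 bytes (~2.8x), before any Lustre OST object metadata or compression; the factor grows as the files shrink.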

Alex.

On Aug 3, 2017, at 7:28 PM, Nathan R.M. Crawford <nrcrawfo at uci.edu> wrote:

Off-list, it was suggested that tar's default 10K blocking may be the cause. I increased it to 1MiB using "tar -b 2048 ...", which seems to result in the expected 9.3 GiB disk usage. It probably makes archives incompatible with very old versions of tar, but meh.
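For concreteness, a self-contained sketch of that invocation (the directory, file count, and file sizes are made up for the demo; tar's -b counts 512-byte records, so 2048 * 512 bytes = 1 MiB):

```shell
# Demo of tar with a 1 MiB blocking factor (-b counts 512-byte records,
# so 2048 * 512 = 1 MiB). Directory contents are illustrative only.
d=$(mktemp -d)
mkdir "$d/tinyfiles"
for i in $(seq 1 100); do
  head -c 8600 /dev/urandom > "$d/tinyfiles/f$i"   # ~8.6 KB dummy files
done
tar -b 2048 -cf "$d/tiny.tar" -C "$d" tinyfiles    # written in 1 MiB records
wc -c < "$d/tiny.tar"   # with GNU tar, padded out to a whole 1 MiB record
```

The blocking factor only changes the write pattern, not the archive format; GNU tar auto-detects the blocking on read, so only very old tar implementations would notice.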

-Nate

On Thu, Aug 3, 2017 at 3:07 PM, Nathan R.M. Crawford <nrcrawfo at uci.edu> wrote:
  In testing how to cope with naive users generating millions of tiny files, I noticed some surprising (to me) behavior on a lustre 2.10/ZFS 0.7.0 system.

  The test directory (based on actual user data) contains about 4 million files (average size 8.6 KB) in three subdirectories. Making a tar file of each subdirectory gives a total nominal size of 34 GB, and according to "zfs list", the tar files took up 33 GB on disk.

  The initially surprising part is that making copies of the tar files only adds 9 GB to the disk usage. I suspect that the tar files were created as a long series of tiny appends, and with a raidz2 on ashift=12 disks (4 MB max recordsize), there is some overhead/wasted space on each mini-write. The copies of the tar files, by contrast, could be made as large sequential writes that avoid the overhead and probably let the lz4 compression work more efficiently.
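  The large-sequential-write copy can also be made explicit with a block size matching the recordsize. A sketch (the mktemp paths and 8 MiB stand-in file are mine; on the real system the files would sit on the Lustre/ZFS OSTs, where bs=4M matches the 4 MiB recordsize):

```shell
# Sketch: rewrite a tar in large sequential chunks so each ZFS record
# is written whole. Paths and sizes are stand-ins for the Lustre case.
src=$(mktemp); dst=$(mktemp)
head -c $(( 8 * 1024 * 1024 )) /dev/urandom > "$src"   # stand-in "archive"
dd if="$src" of="$dst" bs=4M 2>/dev/null               # two 4 MiB writes
cmp -s "$src" "$dst" && echo "copy is identical"
rm -f "$src" "$dst"
```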

  Are there any tricks or obscure tar options that make archiving millions of tiny files on a Lustre system avoid this? It is not a critical issue, as taking a minute to copy the tar files is simple enough.

-Nate

--

Dr. Nathan Crawford              nathan.crawford at uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II         Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA




_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


