[lustre-discuss] Disk usage / quota discrepancy

Hans Henrik Happe happe at nbi.dk
Fri Jan 9 00:40:42 PST 2026


Hi,

We just hit this one badly when migrating files to new hardware.

Is there a reason it was not fixed in 2.15.8? Or is the revert of patch 
47768 not the real solution?

Cheers,
Hans Henrik

On 11/12/2025 17.24, Nehring, Shane R [ITS] via lustre-discuss wrote:
> A pretty naive approach might be to use lfs find to locate any file with
> more than one component and then compare the results of du and
> du --apparent-size for instances where the --apparent-size is
> significantly smaller than the regular du size.
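>
> Something along these lines might be a starting point (an untested
> sketch; the --component-count filter to lfs find and the 2x threshold
> are assumptions to adjust for your site):
>
> lfs find /lustre/hdd -type f --component-count +1 |
> while read -r f; do
>     # allocated bytes vs apparent (logical) bytes for each candidate
>     used=$(du -B1 -s "$f" | cut -f1)
>     apparent=$(du -B1 -s --apparent-size "$f" | cut -f1)
>     # flag files whose allocated size is more than twice the apparent size;
>     # note this also walks PFL files whose later components were never initialized
>     if [ "$used" -gt $((apparent * 2)) ]; then
>         printf '%s\t%s\t%s\n' "$used" "$apparent" "$f"
>     fi
> done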
>
> On Thu, 2025-12-11 at 15:28 +0100, Rasool Almasikoupaei via lustre-discuss wrote:
>> Hi All,
>>
>> Just wondering how we can identify the affected files and migrate them
>> back to the optimal block size (after applying the patch, of course).
>> We upgraded to 2.15.7 when it was released (from 2.15.6).
>>
>> Best,
>> Rasool
>>
>> On 20.11.25 07:05, David Schanzenbach via lustre-discuss wrote:
>>> Hi Shane,
>>>
>>> If you are using Lustre 2.15.7 + zfs with PFL, you are possibly
>>> encountering issue LU-19193
>>> (https://jira.whamcloud.com/browse/LU-19193).
>>>
>>> For our site, until we applied the patch provided in LU-19193, we saw
>>> newly written files balloon to 4-10 times the uncompressed file size,
>>> unless they were written to a single stripe.
>>>
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 11/19/2025 5:56 AM, Nehring, Shane R [ITS] via lustre-discuss wrote:
>>>> Hello All,
>>>>
>>>> As background, this lustre volume currently has only 1 mdt and 5
>>>> osts; the mdt and osts are using zfs with compression on. We also
>>>> have a default PFL layout defined to stripe any file larger than 1T
>>>> across 2 osts.
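>>>>
>>>> (For reference, that kind of default layout would have been set with
>>>> something along the lines of the following, though the exact command
>>>> used here may have differed:
>>>> # lfs setstripe -E 1T -c 1 -E -1 -c 2 /lustre/hdd
>>>> i.e. a single stripe for the first 1T of a file, then two stripes
>>>> out to EOF.)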
>>>>
>>>> Recently a user created a swath of large files, and I had to reach
>>>> out to them because the volume was filling up pretty quickly. The
>>>> files compress very nicely, so they're compressing them now, which
>>>> has resulted in a very quick reclaiming of the used space. It all
>>>> sounds innocuous, but the rate at which the space was consumed and
>>>> then freed was somewhat suspicious, and the reported usage was also
>>>> much higher than what the user expected. So I got to looking.
>>>>
>>>> The usage as reported by lfs quota for this project directory is:
>>>> # lfs quota -h -p 212496 /lustre/hdd
>>>> Disk quotas for prj 212496 (pid 212496):
>>>>     Filesystem    used   quota   limit   grace   files   quota   limit   grace
>>>>    /lustre/hdd  260.3T      0k      0k       -    7205       0       0       -
>>>> At its highest point it was around 660T. The file count isn't too
>>>> high so on average the files are quite large.
>>>>
>>>> A du on the directory reports numbers in line with the quota
>>>> report:
>>>> # du -sh /lustre/hdd/LAS/<directory>
>>>> 261T    /lustre/hdd/LAS/<directory>
>>>>
>>>> The bulk of this space is used by a single directory containing
>>>> multiple large (3T+ before compression) files.
>>>>
>>>> A du from within it looks like:
>>>> # du -sh .
>>>> 253T    .
>>>>
>>>> However, that number does not appear realistic when you roughly
>>>> sum the sizes of the files reported with ls:
>>>> -rw-rw----+ 1 <user> <group> 470G Nov  1 01:34 run2025.ZmChr10.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 150K Nov  1 01:34 run2025.ZmChr10.all.vcf.idx
>>>> -rw-r-----+ 1 <user> <group> 259G Oct 21 20:10 run2025.ZmChr10.variant.vcf.gz
>>>> -rw-r-----+ 1 <user> <group> 150K Oct 21 17:49 run2025.ZmChr10.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 942G Nov 11 20:21 run2025.ZmChr1.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 302K Nov 11 20:21 run2025.ZmChr1.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 830G Nov 10 01:10 run2025.ZmChr1.variant.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 302K Nov 10 01:10 run2025.ZmChr1.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 736G Nov 10 22:02 run2025.ZmChr2.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 239K Nov 10 22:02 run2025.ZmChr2.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 407G Oct 31 21:16 run2025.ZmChr2.variant.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 239K Oct 31 21:16 run2025.ZmChr2.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 732G Nov 10 19:34 run2025.ZmChr3.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 233K Nov 10 19:34 run2025.ZmChr3.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 401G Nov  9 22:45 run2025.ZmChr3.variant.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 233K Nov  9 22:45 run2025.ZmChr3.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 792G Nov 11 06:22 run2025.ZmChr4.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 245K Nov 11 06:22 run2025.ZmChr4.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 437G Oct 31 21:33 run2025.ZmChr4.variant.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 245K Oct 31 21:33 run2025.ZmChr4.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 6.9T Oct 31 19:36 run2025.ZmChr5.all.vcf
>>>> -rw-------+ 1 <user> <group> 295G Nov 19 09:15 run2025.ZmChr5.all.vcf.gz
>>>> -rw-rw----+ 1 <user> <group> 222K Oct 31 19:36 run2025.ZmChr5.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 3.0T Oct 31 22:47 run2025.ZmChr5.variant.vcf
>>>> -rw-rw----+ 1 <user> <group> 222K Oct 31 22:47 run2025.ZmChr5.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 5.5T Nov  9 16:19 run2025.ZmChr6.all.vcf
>>>> -rw-rw----+ 1 <user> <group> 178K Nov  9 16:19 run2025.ZmChr6.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 2.3T Oct 30 15:51 run2025.ZmChr6.variant.vcf
>>>> -rw-rw----+ 1 <user> <group> 178K Oct 30 15:51 run2025.ZmChr6.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 5.7T Nov  9 10:58 run2025.ZmChr7.all.vcf
>>>> -rw-rw----+ 1 <user> <group> 182K Nov  9 10:58 run2025.ZmChr7.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 2.5T Oct 31 10:04 run2025.ZmChr7.variant.vcf
>>>> -rw-rw----+ 1 <user> <group> 182K Oct 31 10:04 run2025.ZmChr7.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 5.6T Oct 29 19:34 run2025.ZmChr8.all.vcf
>>>> -rw-rw----+ 1 <user> <group> 179K Oct 29 19:34 run2025.ZmChr8.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 2.4T Oct 29 16:11 run2025.ZmChr8.variant.vcf
>>>> -rw-rw----+ 1 <user> <group> 179K Oct 29 16:11 run2025.ZmChr8.variant.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 5.0T Oct 30 00:25 run2025.ZmChr9.all.vcf
>>>> -rw-rw----+ 1 <user> <group> 160K Oct 30 00:25 run2025.ZmChr9.all.vcf.idx
>>>> -rw-rw----+ 1 <user> <group> 2.2T Oct 29 18:05 run2025.ZmChr9.variant.vcf
>>>> -rw-rw----+ 1 <user> <group> 160K Oct 29 18:05 run2025.ZmChr9.variant.vcf.idx
>>>>
>>>> What reflects reality better is the result of du with
>>>> --apparent-size added:
>>>> # du -sh --apparent-size .
>>>> 47T     .
>>>>
>>>> Looking at an individual file we see similar discrepancies:
>>>> # du -sh run2025.ZmChr5.all.vcf
>>>> 47T     run2025.ZmChr5.all.vcf
>>>> # du -sh --apparent-size run2025.ZmChr5.all.vcf
>>>> 6.9T    run2025.ZmChr5.all.vcf
>>>> # ls -lah run2025.ZmChr5.all.vcf
>>>> -rw-rw----+ 1 <user> <group> 6.9T Oct 31 19:36
>>>> run2025.ZmChr5.all.vcf
>>>>
>>>> I'm used to this being the reverse, with --apparent-size showing
>>>> the 'logical usage' of a file before any transparent compression
>>>> that might be in place (we use ZFS a lot).
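>>>>
>>>> Side note: du's default number comes from the allocated block count,
>>>> while --apparent-size uses the file length, so the same comparison
>>>> can be made with stat, e.g. for the file above:
>>>> # stat -c 'size=%s bytes  allocated=%b blocks of %B bytes' run2025.ZmChr5.all.vcf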
>>>>
>>>> This appears to be caused by the PFL striping to two osts, as I
>>>> don't see a discrepancy on files with a single stripe. Here's one of
>>>> the files that have already been compressed (and are under 1T, so
>>>> not striped):
>>>> # du -h --apparent-size run2025.ZmChr10.all.vcf.gz
>>>> 470G    run2025.ZmChr10.all.vcf.gz
>>>> # du -h  run2025.ZmChr10.all.vcf.gz
>>>> 470G    run2025.ZmChr10.all.vcf.gz
>>>>
>>>> I copied this file to another volume with a different default
>>>> striping rule, and you can see the same discrepancy from striping
>>>> there:
>>>> # du -h run2025.ZmChr10.variant.vcf.gz
>>>> 418G    run2025.ZmChr10.variant.vcf.gz
>>>> # du -h --apparent-size run2025.ZmChr10.variant.vcf.gz
>>>> 259G    run2025.ZmChr10.variant.vcf.gz
>>>>
>>>> The same file in the same volume but with the striping set to
>>>> only one stripe:
>>>> # du -h run2025.ZmChr10.variant.vcf.gz.copy
>>>> 259G    run2025.ZmChr10.variant.vcf.gz.copy
>>>> # du -h --apparent-size run2025.ZmChr10.variant.vcf.gz.copy
>>>> 259G    run2025.ZmChr10.variant.vcf.gz.copy
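>>>>
>>>> (The .copy is simply the same data re-written under a one-stripe
>>>> layout; something like the following reproduces it, though my exact
>>>> steps may have differed:
>>>> # lfs setstripe -c 1 run2025.ZmChr10.variant.vcf.gz.copy
>>>> # cp run2025.ZmChr10.variant.vcf.gz run2025.ZmChr10.variant.vcf.gz.copy
>>>> and lfs migrate -c 1 on the original would be the in-place
>>>> equivalent.)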
>>>>
>>>> getstripe for the first large striped file:
>>>> # lfs getstripe -y run2025.ZmChr5.all.vcf
>>>>     lcm_layout_gen:    3
>>>>     lcm_mirror_count:  1
>>>>     lcm_entry_count:   2
>>>>     component0:
>>>>       lcme_id:             1
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          init
>>>>       lcme_extent.e_start: 0
>>>>       lcme_extent.e_end:   1099511627776
>>>>       sub_layout:
>>>>         lmm_stripe_count:  1
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    0
>>>>         lmm_stripe_offset: 1
>>>>         lmm_objects:
>>>>         - l_ost_idx: 1
>>>>           l_fid:     0x100010000:0x1869cb06:0x0
>>>>
>>>>     component1:
>>>>       lcme_id:             2
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          init
>>>>       lcme_extent.e_start: 1099511627776
>>>>       lcme_extent.e_end:   EOF
>>>>       sub_layout:
>>>>         lmm_stripe_count:  2
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    65535
>>>>         lmm_stripe_offset: 0
>>>>         lmm_objects:
>>>>         - l_ost_idx: 0
>>>>           l_fid:     0x100000000:0x18ad637a:0x0
>>>>         - l_ost_idx: 1
>>>>           l_fid:     0x100010000:0x18717a96:0x0
>>>>
>>>> getstripe for the second single striped file:
>>>> # lfs getstripe -y run2025.ZmChr10.all.vcf.gz
>>>>     lcm_layout_gen:    2
>>>>     lcm_mirror_count:  1
>>>>     lcm_entry_count:   2
>>>>     component0:
>>>>       lcme_id:             1
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          init
>>>>       lcme_extent.e_start: 0
>>>>       lcme_extent.e_end:   1099511627776
>>>>       sub_layout:
>>>>         lmm_stripe_count:  1
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    0
>>>>         lmm_stripe_offset: 1
>>>>         lmm_objects:
>>>>         - l_ost_idx: 1
>>>>           l_fid:     0x100010000:0x18d70b19:0x0
>>>>
>>>>     component1:
>>>>       lcme_id:             2
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          0
>>>>       lcme_extent.e_start: 1099511627776
>>>>       lcme_extent.e_end:   EOF
>>>>       sub_layout:
>>>>         lmm_stripe_count:  2
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    0
>>>>         lmm_stripe_offset: -1
>>>>
>>>> getstripe of the smaller file copied to another volume with a
>>>> smaller striping threshold:
>>>> # lfs getstripe -y run2025.ZmChr10.variant.vcf.gz
>>>>     lcm_layout_gen:    3
>>>>     lcm_mirror_count:  1
>>>>     lcm_entry_count:   2
>>>>     component0:
>>>>       lcme_id:             1
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          init
>>>>       lcme_extent.e_start: 0
>>>>       lcme_extent.e_end:   107374182400
>>>>       sub_layout:
>>>>         lmm_stripe_count:  1
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    0
>>>>         lmm_stripe_offset: 1
>>>>         lmm_objects:
>>>>         - l_ost_idx: 1
>>>>           l_fid:     0x100010000:0x11ae544:0x0
>>>>
>>>>     component1:
>>>>       lcme_id:             2
>>>>       lcme_mirror_id:      0
>>>>       lcme_flags:          init
>>>>       lcme_extent.e_start: 107374182400
>>>>       lcme_extent.e_end:   EOF
>>>>       sub_layout:
>>>>         lmm_stripe_count:  2
>>>>         lmm_stripe_size:   1048576
>>>>         lmm_pattern:       raid0
>>>>         lmm_layout_gen:    0
>>>>         lmm_stripe_offset: 0
>>>>         lmm_objects:
>>>>         - l_ost_idx: 0
>>>>           l_fid:     0x100000000:0x1689eaa:0x0
>>>>         - l_ost_idx: 1
>>>>           l_fid:     0x100010000:0x11ae545:0x0
>>>>
>>>> getstripe of the same file copied with the striping disabled:
>>>>
>>>> # lfs getstripe -y run2025.ZmChr10.variant.vcf.gz.copy
>>>> lmm_stripe_count:  1
>>>> lmm_stripe_size:   1048576
>>>> lmm_pattern:       raid0
>>>> lmm_layout_gen:    0
>>>> lmm_stripe_offset: 0
>>>> lmm_objects:
>>>>         - l_ost_idx: 0
>>>>           l_fid:     0x100000000:0x1689eab:0x0
>>>>
>>>> Is this behavior expected or is something strange going on?
>>>>
>>>> Shane
>>>> _______________________________________________
>>>> lustre-discuss mailing list
>>>> lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>