[lustre-discuss] fiemap, final chapter.

John Bauer bauerj at iodoctors.com
Fri Aug 19 12:58:02 PDT 2022


Andreas,

As I mentioned in an earlier email, this had been working for a long
time.  I think that using an old header file is at the root of the
issue.  My development platform has neither Lustre nor e2fsprogs
installed, so I had simply copied the Lustre files I needed from the
site I was working with.  The fiemap.h file I was using, the top of
which is shown below (as I see you mentioned), has fe_device explicitly
in the structure.  Was it this way before the #define fe_device was
implemented?  The #define was using fe_reserved[0], which always had a
value of 0.  What puzzles me is why this ever worked at all.  That will
have to wait for a rainy day to mess with.  What started me down this
path at this time was getting my Lustre extents plotting program
working with PFL.

Again, thanks much for your excellent/quick assistance in tracking this 
down.

John

/*
 * FS_IOC_FIEMAP ioctl infrastructure.
 *
 * Some portions copyright (C) 2007 Cluster File Systems, Inc
 *
 * Authors: Mark Fasheh <mfasheh at suse.com>
 *          Kalpak Shah <kalpak.shah at sun.com>
 *          Andreas Dilger <adilger at sun.com>
 */

#ifndef _LINUX_FIEMAP_H
#define _LINUX_FIEMAP_H

struct fiemap_extent {
        __u64 fe_logical;       /* logical offset in bytes for the start of
                                 * the extent from the beginning of the file */
        __u64 fe_physical;      /* physical offset in bytes for the start
                                 * of the extent from the beginning of the disk */
        __u64 fe_length;        /* length in bytes for this extent */
        __u64 fe_reserved64[2];
        __u32 fe_flags;         /* FIEMAP_EXTENT_* flags for this extent */
        __u32 fe_device;        /* device number (fs-specific if FIEMAP_EXTENT_NET) */
        __u32 fe_reserved[2];
};

On 8/18/22 23:44, Andreas Dilger wrote:
> The "fe_device" field is actually Lustre-specific, so it is a macro 
> that overlays on fe_reserved[0]:
>
>  #define fe_device       fe_reserved[0]
>
> but that shouldn't affect compiler alignment.  On my system, "pahole 
> lustre/llite/lustre.ko" reports:
>
> struct fiemap_extent {
>         __u64                      fe_logical;           /*     0     8 */
>         __u64                      fe_physical;          /*     8     8 */
>         __u64                      fe_length;            /*    16     8 */
>         __u64                      fe_reserved64[2];     /*    24    16 */
>         __u32                      fe_flags;             /*    40     4 */
>         __u32                      fe_reserved[3];       /*    44    12 */
>
>         /* size: 56, cachelines: 1, members: 6 */
>         /* last cacheline: 56 bytes */
> };
>
> So there is definitely something going wrong with the struct alignment 
> for fe_reserved, even though there doesn't need to be (all of the 
> fields have "natural" alignment on their 4/8-byte sizes).
>
> The other thing that is strange is that you show only 2 fe_reserved[] 
> fields, when I have 3.  Is there some other field added to your 
> version of struct fiemap_extent after fe_flags?  I don't see anything 
> in the upstream kernel, nor in the Lustre headers.
>
> You could try adding "__attribute__((packed))" at the end of the 
> struct definition to see if that fixes the problem.
>
> Cheers, Andreas
>
>> On Aug 18, 2022, at 21:54, John Bauer <bauerj at iodoctors.com> wrote:
>>
>> Andreas,
>>
>> This is no longer Lustre related, but I hope you can shed some light 
>> on this.  It appears that my compiler, gcc 8.5.0, which I upgraded 
>> to recently when I upgraded my build system to CentOS 8, is not 
>> padding the struct fiemap_extent correctly.  I put the following 
>> prints in to see what's going on.  The sizeof the structure is good at 
>> 56, but notice that both fe_device and fe_reserved[0] have an offset 
>> of 48 bytes into the structure.  Odd that the sizeof fe_flags is 4, 
>> but fe_device is 8 bytes away from it.  I traced the compile to ensure 
>> that I am getting the lustre_include/ext2fs/fiemap.h and there is 
>> nothing odd in the fiemap.h (it's the one I've been using for years).  
>> Any thoughts on how to remedy this?
>>
>> John
>>
>> fprintf(stderr, "%s() logical  %zu %td\n", __func__, sizeof(fm_ext->fe_logical), (char *)&fm_ext->fe_logical - (char *)fm_ext);
>> fprintf(stderr, "%s() physical %zu %td\n", __func__, sizeof(fm_ext->fe_physical), (char *)&fm_ext->fe_physical - (char *)fm_ext);
>> fprintf(stderr, "%s() length   %zu %td\n", __func__, sizeof(fm_ext->fe_length), (char *)&fm_ext->fe_length - (char *)fm_ext);
>> fprintf(stderr, "%s() res64[0] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved64[0]), (char *)&fm_ext->fe_reserved64[0] - (char *)fm_ext);
>> fprintf(stderr, "%s() res64[1] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved64[1]), (char *)&fm_ext->fe_reserved64[1] - (char *)fm_ext);
>> fprintf(stderr, "%s() flags    %zu %td\n", __func__, sizeof(fm_ext->fe_flags), (char *)&fm_ext->fe_flags - (char *)fm_ext);
>> fprintf(stderr, "%s() device   %zu %td\n", __func__, sizeof(fm_ext->fe_device), (char *)&fm_ext->fe_device - (char *)fm_ext);
>> fprintf(stderr, "%s() res32[0] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved[0]), (char *)&fm_ext->fe_reserved[0] - (char *)fm_ext);
>> fprintf(stderr, "%s() res32[1] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved[1]), (char *)&fm_ext->fe_reserved[1] - (char *)fm_ext);
>> StripeChunks_get() fm_ext->fe_device=0 fe_logical=0 sizeof(struct fiemap_extent)56
>> StripeChunks_get() logical  8 0
>> StripeChunks_get() physical 8 8
>> StripeChunks_get() length   8 16
>> StripeChunks_get() res64[0] 8 24
>> StripeChunks_get() res64[1] 8 32
>> StripeChunks_get() flags    4 40
>> StripeChunks_get() device   4 48
>> StripeChunks_get() res32[0] 4 48
>> StripeChunks_get() res32[1] 4 52
>>
>>
>>
>> On 8/18/22 16:11, Andreas Dilger wrote:
>>> On Aug 18, 2022, at 14:28, John Bauer <bauerj at iodoctors.com> wrote:
>>>>
>>>> Andreas,
>>>>
>>>> Thanks for the reply.  I don't think I'm accessing the Lustre 
>>>> filefrag ( see below ).  Where would I normally find that 
>>>> installed? I downloaded the lustre-release git repository and can't 
>>>> find filefrag stuff to build my own.  Is that somewhere else?
>>>>
>>> filefrag is part of the e2fsprogs package ("rpm -qf $(which 
>>> filefrag)"), so you need to download and install the Lustre-patched 
>>> e2fsprogs from 
>>> https://downloads.whamcloud.com/public/e2fsprogs/latest/
>>>
>>>> More info:
>>>>
>>>> pfe27.jbauer2 334> cat /sys/fs/lustre/version
>>>> 2.12.8_ddn12
>>>
>>> You should really use "lctl get_param version", since the Lustre 
>>> /proc and /sys files move around on occasion.
>>>
>>> The PFL/FLR change for FIEMAP is not included in this version, but 
>>> it _should_ be irrelevant because the file you are testing is using 
>>> a plain layout, not PFL or FLR.
>>>> pfe27.jbauer2 335> filefrag -v /nobackupp17/jbauer2/dd.dat
>>>> Filesystem type is: bd00bd0
>>>> File size of /nobackupp17/jbauer2/dd.dat is 104857600 (25600 blocks of 4096 bytes)
>>>> /nobackupp17/jbauer2/dd.dat: FIBMAP unsupported
>>>>
>>>> pfe27.jbauer2 336> which filefrag
>>>> /usr/sbin/filefrag
>>>>
>>>>
>>>> John
>>>>
>>>> On 8/18/22 14:57, Andreas Dilger wrote:
>>>>> What version of Lustre are you using?  Does "filefrag -v" from a 
>>>>> newer Lustre e2fsprogs (1.45.6.wc3+) work properly?
>>>>>
>>>>> There was a small change to the Lustre FIEMAP handling in order to 
>>>>> handle overstriped files and PFL/FLR files with many stripes and 
>>>>> multiple components, since the FIEMAP "restart" mechanism was 
>>>>> broken for files that had multiple objects on the same OST index. 
>>>>> See LU-11484 for details.  That change was included in the 2.14.0 
>>>>> release.
>>>>>
>>>>> In essence, the fe_device field now encodes the absolute file 
>>>>> stripe number in the high 16 bits, and the device number in the 
>>>>> low 16 bits (as it did before).  "filefrag -v" prints fe_device 
>>>>> in hex, so it now shows "0x<stripe><device>" instead of 
>>>>> "0x0000<device>"; this was considered an acceptable tradeoff 
>>>>> compared to other "less compatible" changes that would have been 
>>>>> needed to implement PFL/FLR handling.
>>>>>
>>>>> That said, I would have expected this change to result in your 
>>>>> tool reporting very large values for fe_device (e.g. OST index + N 
>>>>> * 65536), so returning all-zero values is somewhat unexpected.
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>>> On Aug 18, 2022, at 06:27, John Bauer <bauerj at iodoctors.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to get my llfie program (which uses fiemap) going 
>>>>>> again, but now the struct fiemap_extent structures I get back 
>>>>>> from the ioctl call all have fe_device=0.  The output from lfs 
>>>>>> getstripe indicates that the devices are not all 0.  The sum of 
>>>>>> the fe_length members adds up to the file size, so that is 
>>>>>> working.  The fe_physical members look reasonable also.  Has 
>>>>>> something changed?  This used to work.
>>>>>>
>>>>>> Thanks, John
>>>>>>
>>>>>> pfe27.jbauer2 300> llfie /nobackupp17/jbauer2/dd.dat
>>>>>> LustreStripeInfo_get() lum->lmm_magic=0xbd30bd0
>>>>>> listExtents() fe_physical=30643484360704 fe_device=0 fe_length=16777216
>>>>>> listExtents() fe_physical=30646084829184 fe_device=0 fe_length=2097152
>>>>>> listExtents() fe_physical=5705226518528 fe_device=0 fe_length=16777216
>>>>>> listExtents() fe_physical=5710209351680 fe_device=0 fe_length=2097152
>>>>>> listExtents() fe_physical=30621271326720 fe_device=0 fe_length=16777216
>>>>>> listExtents() fe_physical=31761568366592 fe_device=0 fe_length=16777216
>>>>>> listExtents() fe_physical=24757567225856 fe_device=0 fe_length=16777216
>>>>>> listExtents() fe_physical=14196460748800 fe_device=0 fe_length=16777216
>>>>>> listExtents() nMapped=8 byteCount=104857600
>>>>>>
>>>>>>
>>>>>> pfe27.jbauer2 301> lfs getstripe /nobackupp17/jbauer2/dd.dat
>>>>>> /nobackupp17/jbauer2/dd.dat
>>>>>> lmm_stripe_count:  6
>>>>>> lmm_stripe_size:   2097152
>>>>>> lmm_pattern:       raid0
>>>>>> lmm_layout_gen:    0
>>>>>> lmm_stripe_offset: 126
>>>>>> lmm_pool:          ssd-pool
>>>>>> obdidx        objid       objid        group
>>>>>>   126     13930025    0xd48e29            0
>>>>>>   113     13115889    0xc821f1            0
>>>>>>   120     14003176    0xd5abe8            0
>>>>>>   109     12785483    0xc3174b            0
>>>>>>   102     13811117    0xd2bdad            0
>>>>>>   116     13377285    0xcc1f05            0
>>>>>>
>>>>>> _______________________________________________
>>>>>> lustre-discuss mailing list
>>>>>> lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>>>
>>>>> Cheers, Andreas
>>>>> --
>>>>> Andreas Dilger
>>>>> Lustre Principal Architect
>>>>> Whamcloud