[lustre-discuss] fiemap, final chapter.
Andreas Dilger
adilger at whamcloud.com
Fri Aug 19 13:28:10 PDT 2022
On Aug 19, 2022, at 13:58, John Bauer <bauerj at iodoctors.com> wrote:
Andreas,
As I mentioned in an earlier email, this had been working for a long time. I think that using an old header file is at the root of the issue. On my development platform, which has neither Lustre nor e2fsprogs installed, I had simply copied the Lustre files I needed from the site I was working with. The fiemap.h file I was using, the top of which is shown below (which I see you mentioned), has fe_device explicitly in the structure. Was it this way before the #define fe_device was implemented?
Yes, we used to patch the fiemap.h header to add fe_device ourselves, but changed to the #define mechanism to reduce the changes to the core kernel.
Cheers, Andreas
With the #define in effect, fe_device resolved to fe_reserved[0], which always held 0. What puzzles me is why this ever worked at all. That will have to wait for a rainy day. What started me down this path was getting my Lustre extents plotting program working with PFL.
Again, thanks much for your excellent/quick assistance in tracking this down.
John
/*
* FS_IOC_FIEMAP ioctl infrastructure.
*
* Some portions copyright (C) 2007 Cluster File Systems, Inc
*
* Authors: Mark Fasheh <mfasheh at suse.com>
* Kalpak Shah <kalpak.shah at sun.com>
* Andreas Dilger <adilger at sun.com>
*/
#ifndef _LINUX_FIEMAP_H
#define _LINUX_FIEMAP_H
struct fiemap_extent {
__u64 fe_logical; /* logical offset in bytes for the start of
* the extent from the beginning of the file */
__u64 fe_physical; /* physical offset in bytes for the start
* of the extent from the beginning of the disk */
__u64 fe_length; /* length in bytes for this extent */
__u64 fe_reserved64[2];
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
__u32 fe_device; /* device number (fs-specific if FIEMAP_EXTENT_NET)*/
__u32 fe_reserved[2];
};
On 8/18/22 23:44, Andreas Dilger wrote:
The "fe_device" field is actually Lustre-specific, so it is a macro that overlays on fe_reserved[0]:
#define fe_device fe_reserved[0]
but that shouldn't affect compiler alignment. On my system, "pahole lustre/llite/lustre.ko" reports:
struct fiemap_extent {
__u64 fe_logical; /* 0 8 */
__u64 fe_physical; /* 8 8 */
__u64 fe_length; /* 16 8 */
__u64 fe_reserved64[2]; /* 24 16 */
__u32 fe_flags; /* 40 4 */
__u32 fe_reserved[3]; /* 44 12 */
/* size: 56, cachelines: 1, members: 6 */
/* last cacheline: 56 bytes */
};
So there is definitely something going wrong with the struct alignment for fe_reserved, even though there doesn't need to be (all of the fields have "natural" alignment on their 4/8-byte sizes).
The other thing that is strange is that you show only 2 fe_reserved[] fields, when I have 3. Is there some other field added to your version of struct fiemap_extent after fe_flags? I don't see anything in the upstream kernel, nor in the Lustre headers.
You could try adding "__attribute__((packed))" at the end of the struct definition to see if that fixes the problem.
Cheers, Andreas
On Aug 18, 2022, at 21:54, John Bauer <bauerj at iodoctors.com> wrote:
Andreas,
This is no longer Lustre related, but I hope you can shed some light on this. It appears that my compiler, gcc 8.5.0, which I upgraded to recently when I upgraded my build system to CentOS 8, is not padding the struct fiemap_extent correctly. I put the following prints in to see what's going on. The sizeof the structure is good at 56, but notice that both fe_device and fe_reserved[0] have an offset of 48 bytes into the structure. Odd that the sizeof fe_flags is 4, but fe_device is 8 bytes away from it. I traced the compile to ensure that I am getting the lustre_include/ext2fs/fiemap.h and there is nothing odd in the fiemap.h (it's the one I've been using for years). Any thoughts on how to remedy this?
John
fprintf(stderr,"%s() logical %zu %td\n", __func__, sizeof(fm_ext->fe_logical), (char *)&fm_ext->fe_logical - (char *)fm_ext);
fprintf(stderr,"%s() physical %zu %td\n", __func__, sizeof(fm_ext->fe_physical), (char *)&fm_ext->fe_physical - (char *)fm_ext);
fprintf(stderr,"%s() length %zu %td\n", __func__, sizeof(fm_ext->fe_length), (char *)&fm_ext->fe_length - (char *)fm_ext);
fprintf(stderr,"%s() res64[0] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved64[0]), (char *)&fm_ext->fe_reserved64[0] - (char *)fm_ext);
fprintf(stderr,"%s() res64[1] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved64[1]), (char *)&fm_ext->fe_reserved64[1] - (char *)fm_ext);
fprintf(stderr,"%s() flags %zu %td\n", __func__, sizeof(fm_ext->fe_flags), (char *)&fm_ext->fe_flags - (char *)fm_ext);
fprintf(stderr,"%s() device %zu %td\n", __func__, sizeof(fm_ext->fe_device), (char *)&fm_ext->fe_device - (char *)fm_ext);
fprintf(stderr,"%s() res32[0] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved[0]), (char *)&fm_ext->fe_reserved[0] - (char *)fm_ext);
fprintf(stderr,"%s() res32[1] %zu %td\n", __func__, sizeof(fm_ext->fe_reserved[1]), (char *)&fm_ext->fe_reserved[1] - (char *)fm_ext);
StripeChunks_get() fm_ext->fe_device=0 fe_logical=0 sizeof(struct fiemap_extent)56
StripeChunks_get() logical 8 0
StripeChunks_get() physical 8 8
StripeChunks_get() length 8 16
StripeChunks_get() res64[0] 8 24
StripeChunks_get() res64[1] 8 32
StripeChunks_get() flags 4 40
StripeChunks_get() device 4 48
StripeChunks_get() res32[0] 4 48
StripeChunks_get() res32[1] 4 52
On 8/18/22 16:11, Andreas Dilger wrote:
On Aug 18, 2022, at 14:28, John Bauer <bauerj at iodoctors.com> wrote:
Andreas,
Thanks for the reply. I don't think I'm accessing the Lustre filefrag (see below). Where would I normally find that installed? I downloaded the lustre-release git repository and can't find the filefrag sources to build my own. Is that somewhere else?
filefrag is part of the e2fsprogs package ("rpm -qf $(which filefrag)"), so you need to download and install the Lustre-patched e2fsprogs from https://downloads.whamcloud.com/public/e2fsprogs/latest/
More info:
pfe27.jbauer2 334> cat /sys/fs/lustre/version
2.12.8_ddn12
You should really use "lctl get_param version", since the Lustre /proc and /sys files move around on occasion.
The PFL/FLR change for FIEMAP is not included in this version, but it _should_ be irrelevant because the file you are testing is using a plain layout, not PFL or FLR.
pfe27.jbauer2 335> filefrag -v /nobackupp17/jbauer2/dd.dat
Filesystem type is: bd00bd0
File size of /nobackupp17/jbauer2/dd.dat is 104857600 (25600 blocks of 4096 bytes)
/nobackupp17/jbauer2/dd.dat: FIBMAP unsupported
pfe27.jbauer2 336> which filefrag
/usr/sbin/filefrag
John
On 8/18/22 14:57, Andreas Dilger wrote:
What version of Lustre are you using? Does "filefrag -v" from a newer Lustre e2fsprogs (1.45.6.wc3+) work properly?
There was a small change to the Lustre FIEMAP handling in order to handle overstriped files and PFL/FLR files with many stripes and multiple components, since the FIEMAP "restart" mechanism was broken for files that had multiple objects on the same OST index. See LU-11484 for details. That change was included in the 2.14.0 release.
In essence, the fe_device field now encodes the absolute file stripe number in the high 16 bits of fe_device, and the device number in the low 16 bits (as it did before). Since "filefrag -v" prints fe_device in hex and would show as "0x<stripe><device>" instead of "0x0000<device>", this was considered an acceptable tradeoff compared to other "less compatible" changes that would have been needed to implement PFL/FLR handling.
That said, I would have expected this change to result in your tool reporting very large values for fe_device (e.g. OST index + N * 65536), so returning all-zero values is somewhat unexpected.
Cheers, Andreas
On Aug 18, 2022, at 06:27, John Bauer <bauerj at iodoctors.com> wrote:
Hi all,
I am trying to get my llfie program (which uses fiemap) going again, but now the struct fiemap_extent structures I get back from the ioctl call all have fe_device=0. The output from lfs getstripe indicates that the devices are not all 0. The sum of the fe_length members adds up to the file size, so that is working. The fe_physical members look reasonable also. Has something changed? This used to work.
Thanks, John
pfe27.jbauer2 300> llfie /nobackupp17/jbauer2/dd.dat
LustreStripeInfo_get() lum->lmm_magic=0xbd30bd0
listExtents() fe_physical=30643484360704 fe_device=0 fe_length=16777216
listExtents() fe_physical=30646084829184 fe_device=0 fe_length=2097152
listExtents() fe_physical=5705226518528 fe_device=0 fe_length=16777216
listExtents() fe_physical=5710209351680 fe_device=0 fe_length=2097152
listExtents() fe_physical=30621271326720 fe_device=0 fe_length=16777216
listExtents() fe_physical=31761568366592 fe_device=0 fe_length=16777216
listExtents() fe_physical=24757567225856 fe_device=0 fe_length=16777216
listExtents() fe_physical=14196460748800 fe_device=0 fe_length=16777216
listExtents() nMapped=8 byteCount=104857600
pfe27.jbauer2 301> lfs getstripe /nobackupp17/jbauer2/dd.dat
/nobackupp17/jbauer2/dd.dat
lmm_stripe_count: 6
lmm_stripe_size: 2097152
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 126
lmm_pool: ssd-pool
obdidx objid objid group
126 13930025 0xd48e29 0
113 13115889 0xc821f1 0
120 14003176 0xd5abe8 0
109 12785483 0xc3174b 0
102 13811117 0xd2bdad 0
116 13377285 0xcc1f05 0
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud