[lustre-discuss] Lustre 2.12 chokes on files created by 2.4/2.5?

Franke, Knut knut.franke at atos.net
Mon Apr 27 01:23:11 PDT 2020


Hello list,

After upgrading a filesystem (with ZFS backend) from 2.10 to 2.12, I've
started to see errors stat()ing some of the older files (lstat() hangs;
after disabling auto-scrub on the OSSes, it returns with -1 EREMCHG),
and I'm wondering why nobody else seems to be having this issue. Errors
like the following appear on OST1 (but not OST0):

00080000:00020000:4.0:1585213596.566408:0:36486:0:(osd_object.c:481:osd_check_lma()) fs-OST0001: FID-in-LMA [0x100000000:0x145:0x0] does not match the object self-fid [0x100010000:0x145:0x0]

Has anyone else been seeing these?

The check that fails here was added in commit 89ead21 (LU-7585 zfs: OI
scrub for ZFS), which has been included since Lustre 2.11, so I'm
guessing that Lustre 2.10 simply ignored the inconsistency.

In lustre/osp/osp_internal.h I found the following comment:

> In 2.6+ ost_idx is packed into IDIF FID, while in 2.4 and 2.5 IDIF is
> always FID_SEQ_IDIF(0x100000000ULL), which does not include OST index
> in the seq.

Looking at the inaccessible files (and the OSS logs), it seems that the
issue can be traced to lookup failures of objects on OST 1 whose
FID-in-LMA sequence number is 0x100000000 (i.e. written by Lustre
2.4/2.5 according to the comment above, which is a reasonable assumption
for the filesystem and files in question). Either Lustre erroneously
adds the OST index to the self-fid, or it erroneously compares the
new-style self-fid to the old-style FID-in-LMA. If this is true, the
error should occur for basically all files written by Lustre 2.4/2.5
(except if they have a stripe count of 1 and only reside on OST 0).
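
To illustrate the suspected mismatch, here is a minimal Python sketch
(not Lustre code). It assumes the OST index is OR-ed into bits 16 and up
of the IDIF sequence, which is inferred only from the log line above
(0x100010000 on OST0001); idif_seq() is a hypothetical helper name.

```python
# Base IDIF sequence, as quoted in the osp_internal.h comment.
FID_SEQ_IDIF = 0x100000000


def idif_seq(ost_idx: int) -> int:
    """New-style (2.6+) IDIF sequence for a given OST index.

    Assumption: the OST index is shifted left by 16 bits and OR-ed
    into the base sequence, as the self-fid in the log suggests.
    """
    return FID_SEQ_IDIF | (ost_idx << 16)


lma_seq = 0x100000000    # FID-in-LMA sequence from the log (old style)
self_seq = idif_seq(1)   # what the OSD on OST0001 appears to expect

print(hex(self_seq))         # 0x100010000
print(lma_seq == self_seq)   # False: the comparison fails on OST 1
print(lma_seq == idif_seq(0))  # True: no mismatch on OST 0
```

Under this assumption the two sequences only agree for OST index 0,
which would explain why the errors show up on OST1 but not on OST0.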

Does this make sense? Should osd_check_lma simply be more lenient in
its check in order to allow for old-style FIDs? Should I manually
change the affected trusted.lma EAs on OST1 to include the OST index in
the sequence number? Or would either of these cause issues in other
places?
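
To make the "more lenient" option concrete, here is a rough Python
sketch of the comparison I have in mind. It is not the actual
osd_check_lma() logic; the function name, the tuple layout and the
position of the OST index (bits 16 to 31 of the sequence) are
assumptions for illustration only.

```python
FID_SEQ_IDIF = 0x100000000      # base IDIF sequence (from the comment above)
OST_IDX_MASK = 0xFFFF << 16     # assumed position of the OST index in the seq


def fids_match_lenient(lma_fid, self_fid):
    """Compare two (seq, oid, ver) tuples, tolerating old-style IDIF.

    An exact match always passes. Otherwise the FIDs are accepted only
    if the FID-in-LMA is an IDIF without an OST index and the self-fid
    differs from it solely by the OST index bits.
    """
    if lma_fid == self_fid:
        return True
    lma_seq, self_seq = lma_fid[0], self_fid[0]
    # Upper bits must equal the IDIF base, and no OST index may be set.
    lma_is_old_idif = (
        (lma_seq & ~0xFFFFFFFF) == FID_SEQ_IDIF
        and (lma_seq & OST_IDX_MASK) == 0
    )
    # Strip the OST index from the self-fid sequence before comparing.
    seq_matches = (self_seq & ~OST_IDX_MASK) == lma_seq
    return lma_is_old_idif and seq_matches and lma_fid[1:] == self_fid[1:]


# Values taken from the log line on fs-OST0001.
print(fids_match_lenient((0x100000000, 0x145, 0x0),
                         (0x100010000, 0x145, 0x0)))   # True
```

Whether relaxing the check like this is safe elsewhere in the code is
exactly what I'm unsure about.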

More details at https://jira.whamcloud.com/projects/LU/issues/LU-13392.


Kind regards,
Knut Franke
-- 
Knut Franke
Systems Engineer
science + computing ag
Teamline: +49 7071 94 57 680
Hagellocher Weg 73
D-72070 Tübingen
Website: https://www.atos.net/

