[lustre-devel] LDISKFS-fs error: osd_iget: special inode unallocated, Remounting filesystem read-only

Cyrus Ramavarapu cramavarapu at microsoft.com
Tue Nov 28 11:47:32 PST 2023


Hello,
 
I have recently started seeing sanity-lfsck failures in tests 18g, 23b, and 23c on Ubuntu 20.04 (kernel 5.15.0-1051-azure). The MDT filesystem is remounted read-only, which either prevents LFSCK from starting or causes LFSCK operations to fail. In all cases, the logs on the MDS show the following:
 
Nov 20 20:06:59 e72f0907-59ba-4ffd-9528-2e3ad47050e4-mdsmgs-a0-vm kernel: LDISKFS-fs error (device dm-0): osd_iget:500: inode #195: comm mdt03_003: iget: special inode unallocated
Nov 20 20:06:59 e72f0907-59ba-4ffd-9528-2e3ad47050e4-mdsmgs-a0-vm kernel: Aborting journal on device dm-0-8.
Nov 20 20:06:59 e72f0907-59ba-4ffd-9528-2e3ad47050e4-mdsmgs-a0-vm kernel: LustreError: 29024:0:(osd_handler.c:1787:osd_trans_commit_cb()) transaction @0x00000000a2d278af commit error: 2
Nov 20 20:06:59 e72f0907-59ba-4ffd-9528-2e3ad47050e4-mdsmgs-a0-vm kernel: LDISKFS-fs (dm-0): Remounting filesystem read-only
 
LFSCK operations, if they start at all, fail with error code 117 (EFSCORRUPTED):
 
00000020:00000001:8.0:1700166322.660540:0:43212:0:(lu_object.c:908:lu_object_find_at()) Process leaving (rc=18446744073709551499 : -117 : ffffffffffffff8b)
00100000:00000001:8.0:1700166322.660541:0:43212:0:(lfsck_layout.c:3241:lfsck_layout_scan_orphan_one()) Process leaving via out (rc=18446744073709551499 : -117 : 0xffffffffffffff8b)
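
For anyone reading the debug log, the large rc value is just -117 printed as an unsigned 64-bit integer. A minimal Python sketch of that conversion follows; the helper name to_signed64 is made up for illustration, and the errno lookup assumes a Linux host, where errno 117 is EUCLEAN (the kernel defines EFSCORRUPTED as an alias of EUCLEAN).

```python
import errno


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# rc exactly as printed in the Lustre debug log above
rc = to_signed64(18446744073709551499)

# Look up the symbolic errno name known to this host; fall back to
# "unknown" on platforms that do not define errno 117.
name = errno.errorcode.get(-rc, "unknown")
print(rc, name)
```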
 
In both cases, the error comes from an ldiskfs_iget operation that passes the LDISKFS_IGET_SPECIAL flag to __ext4_iget. A recent ext4 patch started checking for this flag and now returns EFSCORRUPTED if the inode is unallocated (https://lkml.kernel.org/stable/20230320145452.175177331@linuxfoundation.org/).
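
To make the interaction concrete, here is a toy Python model of the behavior described above. It is only a sketch and not the kernel code: model_iget is a made-up name, and the -ESTALE result for the non-special case is my assumption about how ext4 treats an unallocated inode when the flag is not set.

```python
EFSCORRUPTED = 117  # matches the rc seen in the LFSCK debug log
ESTALE = 116        # assumption: result for an unallocated inode without the flag


def model_iget(allocated: bool, special_flag: bool) -> int:
    """Toy model of the lookup result; 0 means success, negative is -errno."""
    if allocated:
        return 0
    # Unallocated inode: with the special flag set, the lookup is treated as
    # filesystem corruption, which in turn aborts the journal and remounts
    # the device read-only.
    return -EFSCORRUPTED if special_flag else -ESTALE


# If the flag is always passed, any lookup of an unallocated inode is
# reported as corruption rather than as a stale handle.
print(model_iget(allocated=False, special_flag=True))
print(model_iget(allocated=False, special_flag=False))
```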
 
LU-13166 (https://review.whamcloud.com/c/fs/lustre-release/+/37421) changed ldiskfs_iget to always pass LDISKFS_IGET_SPECIAL, which feels too broad to me in light of the upstream ext4 change. I am currently investigating removing the LDISKFS_IGET_SPECIAL flag from ldiskfs_iget to see how that affects the LFSCK tests and to determine whether a more targeted change can satisfy the intent of LU-13166.
 
Any suggestions or thoughts on how to approach this problem would be greatly appreciated. Additional logs or debugging information can be provided if needed.
 
Thank you and best,
Cyrus Ramavarapu

