[lustre-discuss] MDT corruption
Alastair Basden
a.g.basden at durham.ac.uk
Mon Jan 26 09:50:54 PST 2026
Hi all,
We are wondering whether anyone can shed some light on this for us.
An MDT RAID controller failed, and the DRBD replica appears to be
corrupted, since we can't mount the MDT on the other node (the one it
should have been replicated to).
We are using Lustre 2.12.6.
The errors when trying to mount are:
LDISKFS-fs (drbd3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
LustreError: 114156:0:(osd_iam.c:182:iam_load_idle_blocks()) drbd3: cannot load idle blocks, blk = 1244, err = -5
LustreError: 114156:0:(osd_oi.c:324:osd_oi_table_open()) drbd3: can't open oi.16.6: rc = -5
LustreError: 114156:0:(osd_oi.c:327:osd_oi_table_open()) drbd3: expect to open total 64 OI files.
LustreError: 114156:0:(obd_config.c:559:class_setup()) setup cos8-MDT0003-osd failed (-5)
LustreError: 114156:0:(obd_mount.c:202:lustre_start_simple()) cos8-MDT0003-osd setup error -5
LustreError: 114156:0:(obd_mount_server.c:1958:server_fill_super()) Unable to start osd on /dev/drbd3: -5
LustreError: 114156:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-5)
We can mount the device as ldiskfs, and the oi.16.6 file is there;
however, we suspect it is corrupted (based on the above errors).
We are wondering whether replacing this file from a backup (or from the
failed RAID once the controller is back online) would be an option, and
would allow the system to continue, albeit with some potential loss of
recently accessed data.
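For reference, the steps we have in mind look roughly like the sketch
below. This is only an outline of what we are considering, not something
we have run to completion; the mount point and backup path are made-up
placeholders, and the restore line is commented out.

```shell
# Sketch of the proposed recovery, assuming /dev/drbd3 is the MDT device.
# /mnt/mdt_ldiskfs and /backup are hypothetical paths for illustration.

# Mount read-only as ldiskfs to inspect the object index files.
mount -t ldiskfs -o ro /dev/drbd3 /mnt/mdt_ldiskfs
ls -l /mnt/mdt_ldiskfs/oi.16.6          # the file the mount errors point at

# Preserve the suspect copy before touching anything, then (with the
# filesystem remounted read-write) restore from a known-good source:
umount /mnt/mdt_ldiskfs
mount -t ldiskfs /dev/drbd3 /mnt/mdt_ldiskfs
cp -a /mnt/mdt_ldiskfs/oi.16.6 /root/oi.16.6.suspect
# cp -a /backup/oi.16.6 /mnt/mdt_ldiskfs/oi.16.6   # restore step (untested)
umount /mnt/mdt_ldiskfs
```

We realise the OI files map FIDs to inodes, so a stale copy may not be
consistent with the rest of the MDT, which is part of what we would like
advice on.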
The failed MDT is not the primary one.
Does anyone have any ideas?
Thanks,
Alastair.