[lustre-discuss] MDT corruption

Alastair Basden a.g.basden at durham.ac.uk
Mon Jan 26 09:50:54 PST 2026


Hi all,

We are wondering whether anyone can shed some light for us.

An MDT RAID controller failed, and the DRBD replica appears to be corrupted, 
since we cannot mount the MDT on the other node (to which it should have 
been replicated).

We are using Lustre 2.12.6.

Errors are (when trying to mount):

LDISKFS-fs (drbd3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
LustreError: 114156:0:(osd_iam.c:182:iam_load_idle_blocks()) drbd3: cannot load idle blocks, blk = 1244, err = -5
LustreError: 114156:0:(osd_oi.c:324:osd_oi_table_open()) drbd3: can't open oi.16.6: rc = -5
LustreError: 114156:0:(osd_oi.c:327:osd_oi_table_open()) drbd3: expect to open total 64 OI files.
LustreError: 114156:0:(obd_config.c:559:class_setup()) setup cos8-MDT0003-osd failed (-5)
LustreError: 114156:0:(obd_mount.c:202:lustre_start_simple()) cos8-MDT0003-osd setup error -5
LustreError: 114156:0:(obd_mount_server.c:1958:server_fill_super()) Unable to start osd on /dev/drbd3: -5
LustreError: 114156:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-5)

We can mount the device as ldiskfs, and the oi.16.6 file is present; 
however, we suspect it is corrupted (based on the above errors).
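For reference, this is roughly what we did to inspect the file (mount point and paths are our own choices, not anything Lustre-specific; mounting read-only to avoid touching the device further):

```shell
# Mount the MDT block device as plain ldiskfs, read-only, to inspect it.
# /mnt/mdt-ldiskfs is an arbitrary mount point chosen for this check.
mount -t ldiskfs -o ro /dev/drbd3 /mnt/mdt-ldiskfs
ls -l /mnt/mdt-ldiskfs/oi.16.6

# Take an offline copy of the suspect OI file for later comparison
# against a backup, then unmount.
dd if=/mnt/mdt-ldiskfs/oi.16.6 of=/root/oi.16.6.copy bs=1M
umount /mnt/mdt-ldiskfs
```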

We are wondering whether replacing this file from a backup (or from the 
failed RAID once the controller is back online) would be an option and 
would allow the system to continue, albeit with some potential loss of 
recently written metadata.
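An alternative we have been considering, sketched below: rather than copying oi.16.6 by hand, check the ldiskfs layer with the Lustre-patched e2fsprogs and then let OI scrub rebuild the object index. This is not a verified recovery procedure, just the commands as we understand them (device name and the cos8-MDT0003 target are taken from the log above); we would welcome corrections:

```shell
# Read-only check of the ldiskfs layer first (-fn makes no changes);
# rerun with -fy to actually repair once the output looks sane.
e2fsck -fn /dev/drbd3

# If the MDT then mounts, trigger OI scrub to rebuild the OI files
# and monitor its progress.
lctl lfsck_start -M cos8-MDT0003 -t scrub
lctl get_param osd-ldiskfs.cos8-MDT0003.oi_scrub
```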

The failed MDT is not the primary one.

Does anyone have any ideas?

Thanks,
Alastair.
