[lustre-discuss] Recover crashing MDS/MDT (assertion "invalid llog tail")

Josh Samuelson jsld at 1up.unl.edu
Tue Aug 11 13:41:43 PDT 2015


Hello,

I'm experiencing MDS crashes after recovering from FS corruption
caused by hardware RAID failures on a system I manage.

I'm hoping a recovery process similar to the one discussed in LU-6696 could
work for me: https://jira.hpdd.intel.com/browse/LU-6696

I'm trying to get the FS back to a readable state so users can copy
files off the system, at which point the FS will likely be reformatted.

Details: Lustre servers are running 2.5.3.90 on RHEL kernel 2.6.32-431.56.1.

All was running well until a hardware failure occurred in the backing storage
array of the MDT.  The storage array has active and standby RAID controllers.
The active controller crashed and the cache "mirrored" to the standby
controller was lost.  The MDS+MDT/MGS kept running via the multipath driver
under the assumption that all issued IOs had succeeded.  :-/

Hours later the MDS kernel noticed corruption on the MDT and remounted
it read-only:

[4018464.727774] LDISKFS-fs error (device dm-2): htree_dirblock_to_tree: bad entry in directory #1967679418: rec_len % 4 != 0 - block=983894031 offset=0(0), inode=1919907631, rec_len=12139, name_len=119
[4018464.745662] Aborting journal on device dm-2-8.
[4018464.751092] LustreError: 8086:0:(osd_handler.c:860:osd_trans_commit_cb()) transaction @0xffff8803ebe4acc0 commit error: 2
[4018464.762367] LDISKFS-fs (dm-2): Remounting filesystem read-only

All Lustre clients were unmounted first, followed by the server targets in the
following order: MGS/MDT (same device), then all 36 OSTs.  The umounts
completed without issue.
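
For completeness, the shutdown sequence was roughly the following (the mount
points below are illustrative, not the exact paths on our systems):

# on every client node:
umount -a -t lustre

# on the MDS (combined MGS/MDT):
umount /mnt/lustre-mdt0

# on each OSS, for every one of its OSTs (36 in total):
umount /mnt/lustre-ost0000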

The MDT was mounted to recover the journal before running:
e2fsck -fp /dev/mdtdevice

No surprise, that reported: "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY."
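
For reference, the journal recovery step before that fsck was roughly a quick
mount/umount cycle (mount point illustrative):

# mount the MDT as plain ldiskfs once so the journal is replayed, then unmount
mount -t ldiskfs /dev/mdtdevice /mnt/mdt-tmp
umount /mnt/mdt-tmp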

I created an LVM snapshot of the MDT and progressed to:
e2fsck -fy /dev/mdtdevice

As expected, the above found numerous issues with files created prior to the
MDT going RO.
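
The snapshot and full repair were along these lines (VG/LV names and snapshot
size are illustrative; substitute the real ones):

# snapshot the MDT logical volume first, so there is a way back
lvcreate -s -L 100G -n mdt-snap vg_mdt/mdt

# destructive pass on the original device, answering yes to everything
e2fsck -fy /dev/mdtdevice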

After the e2fsck, mounting the MGS/MDT succeeded.  I am able to mount OST
indexes 0 through 6, but the MDS kernel panics as soon as OST0007 is mounted.

I've tried mounting the targets with '-o abort_recov' and also attempted
'tunefs.lustre --writeconf' on the MDT and OSTs, mounting them in index order.
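
Both attempts looked roughly like this (device paths and mount points are
illustrative, repeated per target):

# attempt 1: skip recovery at mount time
mount -t lustre -o abort_recov /dev/mdtdevice /mnt/lustre-mdt0
mount -t lustre -o abort_recov /dev/ostdevice /mnt/lustre-ostNNNN   # per OST, in index order

# attempt 2: regenerate the config logs, then mount MGS/MDT first and the OSTs in index order
tunefs.lustre --writeconf /dev/mdtdevice    # on the MDS
tunefs.lustre --writeconf /dev/ostdevice    # on each OSS, for every OST
mount -t lustre /dev/mdtdevice /mnt/lustre-mdt0
mount -t lustre /dev/ostdevice /mnt/lustre-ostNNNN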

Mounting OST index 7 always panics the MDS kernel.

I increased the 'lctl set_param' debug/subsystem_debug/printk levels in the
hope of narrowing down where the issue is, and captured messages while OST
index 7 was being mounted (the rough commands I used are shown after the
snippet).  A small snippet just prior to the LBUG:

<6>[  964.491664] Lustre: 2491:0:(osp_sync.c:949:osp_sync_llog_init()) lstrFS-OST0007-osc-MDT0000: Init llog for 7 - catid 0x10:1:0
<snip>
<6>[  964.504006] Lustre: 2493:0:(llog.c:318:llog_process_thread()) index: 63072 last_index 64767
<6>[  964.504010] Lustre: 2493:0:(llog_osd.c:551:llog_osd_next_block()) looking for log index 63072 (cur idx 32896 off 2113536)
<3>[  964.505912] LustreError: 2493:0:(llog_osd.c:638:llog_osd_next_block()) lstrFS-MDT0000-osd: invalid llog tail at log id 0x2b54:1/0 offset 4087808
<6>[  964.505923] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617ace000.
<6>[  964.505928] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff8806257854c0.
<6>[  964.505937] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617aae000.
<6>[  964.505942] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff880625a298c0.
<0>[  964.505951] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 2 in progress, 0 in flight: -22
<0>[  964.505956] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) LBUG
<4>[  964.505958] Pid: 2493, comm: osp-syn-7-0
<4>[  964.505960]
<4>[  964.505960] Call Trace:
<4>[  964.505988]  [<ffffffffa02e3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[  964.505997]  [<ffffffffa02e3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[  964.506007]  [<ffffffffa0e5a2e3>] osp_sync_thread+0x753/0x7d0 [osp]
<4>[  964.506011]  [<ffffffff81529e9e>] ? thread_return+0x4e/0x770
<4>[  964.506018]  [<ffffffffa0e59b90>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4>[  964.506021]  [<ffffffff8109abf6>] kthread+0x96/0xa0
<4>[  964.506025]  [<ffffffff8100c20a>] child_rip+0xa/0x20
<4>[  964.506027]  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4>[  964.506029]  [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>[  964.506030] 
<6>[  964.506218] LNet: 2493:0:(linux-debug.c:142:libcfs_run_upcall()) Invoked LNET upcall /usr/lib/lustre/lnet_upcall LBUG,/builddir/build/BUILD/lustre-2.5.3.90/lustre/osp/osp_sync.c,osp_sync_thread,874
<0>[  964.506221] Kernel panic - not syncing: LBUG
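
For reference, increasing the debug verbosity went roughly like this before
reproducing the panic (the '-1' masks below simply enable everything; the
exact masks I used may have differed):

# enable all debug flags and subsystems, and let every message reach the console
lctl set_param debug=-1
lctl set_param subsystem_debug=-1
lctl set_param printk=-1

# the snippet above came from the console/kernel log ('lctl dk' also dumps the Lustre debug buffer)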

Is there any way to fix or remove the (bad?) llog for OST index 7, catid
0x10:1:0?  Also, how is 0x10:1:0 decoded so it can be mapped back to a file on
the ldiskfs filesystem?

Thank you,
-Josh

