[lustre-discuss] Recover crashing MDS/MDT (assertion "invalid llog tail")

Tue Aug 11 14:35:34 PDT 2015

Josh,

FWIW, I have been through a very similar issue.
My story from where you are:

I ran fsck on all the targets until it was happy. I mounted the MGS, then the MDT. Then I mounted all the OSTs EXCEPT the one that would cause kernel panics. This let me get at all the data that did not have anything on that OST.
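
A minimal sketch of that mount order, written as a dry run that only prints the commands. Every device path, mount point, the OST count, and the skipped index below are hypothetical placeholders, not values from this system:

```shell
#!/bin/sh
# Dry run: print the mount commands in order instead of executing them.
# All device paths, mount points, and counts are hypothetical placeholders.
MGS_DEV=/dev/mapper/mgs   # hypothetical
MDT_DEV=/dev/mapper/mdt   # hypothetical
OST_COUNT=4               # hypothetical: OST indexes 0..3
SKIP_INDEX=2              # hypothetical: index of the OST that triggers the panic

plan_mounts() {
    # MGS first, then the MDT, then every OST except the bad one.
    echo "mount -t lustre $MGS_DEV /mnt/mgs"
    echo "mount -t lustre $MDT_DEV /mnt/mdt"
    i=0
    while [ "$i" -lt "$OST_COUNT" ]; do
        if [ "$i" -eq "$SKIP_INDEX" ]; then
            echo "# skipped: OST index $i"
        else
            echo "mount -t lustre /dev/mapper/ost$i /mnt/ost$i"
        fi
        i=$((i + 1))
    done
}

plan_mounts
```

Reviewing the printed plan before running anything makes it harder to mount the problem OST by accident.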

Then we took about 5 months to get a support contract, just in case.
In the meantime, I upgraded to the latest version of Lustre. Everything looked good, and when we had downtime with support in place, I mounted the 'bad' OST, which came up with no issue.

The best explanation I could come up with is that the upgrade made the difference. I'm not sure whether one of the bugs fixed in the newer release was responsible, or something else.

But, as I said, we were able to bring up the filesystem with the exception of the one OST, so that could help you some.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Josh Samuelson
Sent: Tuesday, August 11, 2015 1:42 PM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Recover crashing MDS/MDT (assertion "invalid llog tail")

Hello,

I'm experiencing MDS crashes after recovering from FS corruption caused by hardware RAID failures on a system I manage.

I'm hoping a similar recovery process discussed on LU-6696 could work for
me: https://jira.hpdd.intel.com/browse/LU-6696

I'm trying to get the FS back to a readable state so users can copy files off the system, at which point the FS will likely be reformatted.

Details: Lustre servers are running 2.5.3.90 on RHEL kernel 2.6.32-431.56.1.

All was running well till a hardware failure occurred with the backing storage arrays of the MDT.  The storage array has active and standby RAID controllers.
The active controller crashed and the "mirrored" cache to the standby controller was lost.  MDS+MDT/MGS kept running with the multipath driver under the assumption that all issued IOs had succeeded.  :-/

Hours later the MDS kernel noticed corruption on the MDT and remounted it read-only:

[4018464.727774] LDISKFS-fs error (device dm-2): htree_dirblock_to_tree: bad entry in directory #1967679418: rec_len % 4 != 0 - block=983894031offset=0(0), inode=1919907631, rec_len=12139, name_len=119
[4018464.745662] Aborting journal on device dm-2-8.
[4018464.751092] LustreError: 8086:0:(osd_handler.c:860:osd_trans_commit_cb()) transaction @0xffff8803ebe4acc0 commit error: 2
[4018464.762367] LDISKFS-fs (dm-2): Remounting filesystem read-only

All Lustre clients were unmounted followed by server targets in the following
order: MGS/MDT (same device), all 36xOSTs.  The umounts completed without issue.

The MDT was mounted to recover the journal before running:
e2fsck -fp /dev/mdtdevice

No surprise, that reported: "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY."

I created an LVM snapshot of the MDT and progressed to:
e2fsck -fy /dev/mdtdevice

As expected, the above found numerous issues with files created prior to the MDT going RO.

Mounting the MGS/MDT after the e2fsck succeeded.  I am able to mount OST indexes 0 through 6, but the MDS kernel panics when OST0007 is mounted.

I've tried mounting the targets with '-o abort_recov' and also attempted 'tunefs.lustre --writeconf' on the MDT and OSTs, mounting them in index order.

Mounting OST index 7 always panics the MDS kernel.

I raised the debug, subsystem_debug, and printk settings with 'lctl set_param' in hopes of narrowing down where the issue is, and captured the messages logged while OST index 7 was being mounted.  A small snippet prior to the LBUG:

<6>[  964.491664] Lustre: 2491:0:(osp_sync.c:949:osp_sync_llog_init()) lstrFS-OST0007-osc-MDT0000: Init llog for 7 - catid 0x10:1:0
<snip>
<6>[  964.504006] Lustre: 2493:0:(llog.c:318:llog_process_thread()) index: 63072 last_index 64767
<6>[  964.504010] Lustre: 2493:0:(llog_osd.c:551:llog_osd_next_block()) looking for log index 63072 (cur idx 32896 off 2113536)
<3>[  964.505912] LustreError: 2493:0:(llog_osd.c:638:llog_osd_next_block()) lstrFS-MDT0000-osd: invalid llog tail at log id 0x2b54:1/0 offset 4087808
<6>[  964.505923] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617ace000.
<6>[  964.505928] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff8806257854c0.
<6>[  964.505937] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617aae000.
<6>[  964.505942] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff880625a298c0.
<0>[  964.505951] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 2 in progress, 0 in flight: -22
<0>[  964.505956] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) LBUG
<4>[  964.505958] Pid: 2493, comm: osp-syn-7-0
<4>[  964.505960]
<4>[  964.505960] Call Trace:
<4>[  964.505988]  [<ffffffffa02e3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[  964.505997]  [<ffffffffa02e3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[  964.506007]  [<ffffffffa0e5a2e3>] osp_sync_thread+0x753/0x7d0 [osp]
<4>[  964.506011]  [<ffffffff81529e9e>] ? thread_return+0x4e/0x770
<4>[  964.506018]  [<ffffffffa0e59b90>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4>[  964.506021]  [<ffffffff8109abf6>] kthread+0x96/0xa0
<4>[  964.506025]  [<ffffffff8100c20a>] child_rip+0xa/0x20
<4>[  964.506027]  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4>[  964.506029]  [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>[  964.506030]
<6>[  964.506218] LNet: 2493:0:(linux-debug.c:142:libcfs_run_upcall()) Invoked LNET upcall /usr/lib/lustre/lnet_upcall LBUG,/builddir/build/BUILD/lustre-2.5.3.90/lustre/osp/osp_sync.c,osp_sync_thread,874
<0>[  964.506221] Kernel panic - not syncing: LBUG
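
Console captures like this often reach a mail archive with several entries fused onto one line. A minimal sketch for splitting such text back into one entry per line, assuming each entry starts with a "<level>[ timestamp]" prefix and that GNU sed is available (the "\n" in the replacement is a GNU extension). The function name split_klog is made up for illustration, and the sample is a fused pair of entries from the capture above:

```shell
#!/bin/sh
# Put a newline before every "<N>[ secs.usecs]" prefix that follows a space,
# so each kernel log entry ends up on its own line.
# Assumes GNU sed; split_klog is an illustrative name, not a Lustre tool.
split_klog() {
    sed -E 's/ (<[0-9]>\[ *[0-9]+\.[0-9]+\])/\n\1/g'
}

# Two entries from the capture, fused onto a single line.
sample='<0>[  964.505956] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) LBUG <4>[  964.505958] Pid: 2493, comm: osp-syn-7-0'

printf '%s\n' "$sample" | split_klog
```

Piping the result through grep for LustreError then isolates the error entries without the surrounding debug noise.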

Is there any way to fix or remove the (bad?) llog for 7 catid 0x10:1:0?
Also, how is 0x10:1:0 decoded to map back to a file on the ldiskfs?
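
One possible decoding, offered as an unconfirmed sketch: it assumes the printed id has the form <object id>:<sequence>:<generation>, and that llog objects sit on the ldiskfs MDT under O/<seq>/d<oid % 32>/<oid>, with the object id written in decimal. Neither assumption is confirmed in this thread, and llog_path is a made-up helper name:

```shell
#!/bin/sh
# ASSUMPTION (not confirmed in this thread): an llog object with id <oid>
# in sequence <seq> is stored on the ldiskfs MDT at
#   O/<seq>/d<oid % 32>/<oid>     (oid in decimal)
# llog_path is an illustrative helper, not part of Lustre.
llog_path() {
    oid=$(( $1 ))   # accepts hex input such as 0x2b54
    seq=$(( $2 ))
    printf 'O/%d/d%d/%d\n' "$seq" "$(( oid % 32 ))" "$oid"
}

llog_path 0x10 1     # from "catid 0x10:1:0"
llog_path 0x2b54 1   # from "invalid llog tail at log id 0x2b54:1/0"
```

If the assumption holds, the resulting paths can be checked against a read-only ldiskfs mount of the MDT snapshot before anything is modified, for example by inspecting the file with llog_reader.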

Thank you,
-Josh