[lustre-discuss] (LFSCK) LBUG: ASSERTION( get_current()->journal_info == ((void *)0) ) failed

Cédric Dufour - Idiap Research Institute cedric.dufour at idiap.ch
Wed Sep 14 23:05:55 PDT 2016


Hello,

On 14/09/16 20:58, Bernd Schubert wrote:
> Hi Cédric,
>
> I'm by no means familiar with Lustre code anymore, but based on the stack 
> trace and function names, it seems to be a problem with the journal. Maybe try 
> to do an 'e2fsck -f', which would replay the journal and possibly clean up the 
> file it has a problem with.

Thanks for the tip.

Unfortunately, I had already performed a filesystem check as part of my recovery attempts (and even ran a dry run afterwards, to make sure no errors were left dangling).
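
Regarding the stack trace and function names: since the log quoted below got line-wrapped by the mailer, here is a small Python sketch that pulls the call chain back out of it. This is only an illustration, not part of Lustre; the helper name `extract_frames` and the regex are my own assumptions, based on the `symbol+0xoffset/0xsize [module]` layout seen in this particular trace.

```python
import re

# Matches "symbol+0xOFF/0xSIZE", optionally followed by " [module]".
# Assumption: frames look like the ones in the trace quoted in this thread.
FRAME_RE = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def extract_frames(log_text):
    """Return (function, module) pairs in the order they appear in the log."""
    # Drop mail quoting markers ("> ", ">> ") and rejoin the wrapped lines.
    lines = [re.sub(r"^(?:>\s?)+", "", line) for line in log_text.splitlines()]
    joined = " ".join(line.strip() for line in lines)
    # Frames without a "[module]" suffix belong to the core kernel.
    return [(m.group(1), m.group(2) or "kernel")
            for m in FRAME_RE.finditer(joined)]


sample = """\
>> Sep 13 15:10:28 n00a kernel: [ 8414.619864]  [<ffffffffa0b11890>]
>> osd_trans_start+0x250/0x630 [osd_ldiskfs]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619915]  [<ffffffff810534c4>]
>> kthread+0x7b/0x83
"""

for func, module in extract_frames(sample):
    print(f"{module}: {func}")
```

Fed the whole trace, this lists the frames from innermost to outermost, which makes the lfsck -> lod -> osd_ldiskfs path into osd_trans_start easier to see.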

Cédric


>
>
> Cheers,
> Bernd
>
>
> On Wednesday, September 14, 2016 9:28:38 AM CEST Cédric Dufour - Idiap 
> Research Institute wrote:
>> Hello,
>>
>> Last Friday, during normal operations, our MDS froze with the following
>> LBUG, which recurs as soon as the MDT is mounted again:
>>
>> Sep 13 15:10:28 n00a kernel: [ 8414.600584] LustreError:
>> 11696:0:(osd_handler.c:936:osd_trans_start()) ASSERTION(
>> get_current()->journal_info == ((void *)0) ) failed:
>> Sep 13 15:10:28 n00a kernel: [ 8414.612825] LustreError:
>> 11696:0:(osd_handler.c:936:osd_trans_start()) LBUG
>> Sep 13 15:10:28 n00a kernel: [ 8414.619833] Pid: 11696, comm: lfsck
>> Sep 13 15:10:28 n00a kernel: [ 8414.619835]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619835] Call Trace:
>> Sep 13 15:10:28 n00a kernel: [ 8414.619850]  [<ffffffffa0224822>]
>> libcfs_debug_dumpstack+0x52/0x80 [libcfs]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619857]  [<ffffffffa0224db2>]
>> lbug_with_loc+0x42/0xa0 [libcfs]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619864]  [<ffffffffa0b11890>]
>> osd_trans_start+0x250/0x630 [osd_ldiskfs]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619870]  [<ffffffffa0b0e748>] ?
>> osd_declare_xattr_set+0x58/0x230 [osd_ldiskfs]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619876]  [<ffffffffa0c6ffc7>]
>> lod_trans_start+0x177/0x200 [lod]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619881]  [<ffffffffa0cbd752>]
>> lfsck_namespace_double_scan+0x1122/0x1e50 [lfsck]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619888]  [<ffffffff8136741b>] ?
>> thread_return+0x3e/0x10c
>> Sep 13 15:10:28 n00a kernel: [ 8414.619894]  [<ffffffff81038b87>] ?
>> enqueue_task_fair+0x58/0x5d
>> Sep 13 15:10:28 n00a kernel: [ 8414.619899]  [<ffffffffa0cb68ea>]
>> lfsck_double_scan+0x5a/0x70 [lfsck]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619904]  [<ffffffffa0cb7dfd>]
>> lfsck_master_engine+0x50d/0x650 [lfsck]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619909]  [<ffffffffa0cb78f0>] ?
>> lfsck_master_engine+0x0/0x650 [lfsck]
>> Sep 13 15:10:28 n00a kernel: [ 8414.619915]  [<ffffffff810534c4>]
>> kthread+0x7b/0x83
>> Sep 13 15:10:28 n00a kernel: [ 8414.619918]  [<ffffffff810369d3>] ?
>> finish_task_switch+0x48/0xb9
>> Sep 13 15:10:28 n00a kernel: [ 8414.619924]  [<ffffffff8101092a>]
>> child_rip+0xa/0x20
>> Sep 13 15:10:28 n00a kernel: [ 8414.619928]  [<ffffffff81053449>] ?
>> kthread+0x0/0x83
>> Sep 13 15:10:28 n00a kernel: [ 8414.619931]  [<ffffffff81010920>] ?
>> child_rip+0x0/0x20
>>
>>
>> I had originally launched the LFSCK in "dry-run" mode:
>>
>> lctl lfsck_start --device lustre-1-MDT0000 --dryrun on --type namespace
>>
>> The LFSCK was reported as completed (I was running 'watch -n 1' in a
>> terminal) before the LBUG popped up; now I can't get any output at all:
>>
>> cat /proc/fs/lustre/mdd/lustre-1-MDT0000/lfsck_namespace  # just hangs there indefinitely
>>
>>
>> I remember seeing a lfsck_namespace file in the MDT's underlying LDISKFS;
>> is there anything sensible I can do with it (e.g. would deleting it
>> resolve the situation)?
>> What else could I do?
>>
>>
>> Thanks for your answers and best regards,
>>
>> Cédric D.
>>
>>
>> PS: I originally posted this message on the HPDD-discuss mailing list
>> and just realized it was the wrong place; sorry for the cross-posting.
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


