[lustre-discuss] (LFSCK) LBUG: ASSERTION( get_current()->journal_info == ((void *)0) ) failed

Bernd Schubert bernd.schubert at fastmail.fm
Wed Sep 14 11:58:26 PDT 2016


Hi Cédric,

I'm by no means familiar with the Lustre code anymore, but based on the stack 
trace and function names, it seems to be a problem with the journal. Maybe try 
to run 'e2fsck -f', which would replay the journal and possibly clean up the 
file it has a problem with.
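
A rough sketch of what such a check could look like, assuming the MDT is
unmounted first. MDT_DEV and MDT_MNT are hypothetical placeholders (the
actual device and mount point are not given in this thread), and the script
only prints the commands unless RUN=1 is set:

```shell
#!/bin/sh
# Sketch only: MDT_DEV and MDT_MNT are hypothetical placeholders, not values
# from the original report. Commands are printed unless RUN=1 is set.
MDT_DEV="${MDT_DEV:-/dev/mdt_device}"
MDT_MNT="${MDT_MNT:-/mnt/mdt}"

# Execute the command when RUN=1, otherwise just show what would be run.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# The MDT must not be mounted while e2fsck works on it.
run umount "$MDT_MNT"
# Read-only pass first: -f forces a check, -n answers "no" to every question.
run e2fsck -fn "$MDT_DEV"
# Real pass: replays the journal if needed, then forces a full check.
run e2fsck -f "$MDT_DEV"
```

The read-only pass skips journal recovery, so it may report problems that
the journal replay in the second pass would resolve anyway. For an ldiskfs
target it is probably safest to use the e2fsprogs build that matches the
installed Lustre version.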


Cheers,
Bernd


On Wednesday, September 14, 2016 9:28:38 AM CEST Cédric Dufour - Idiap 
Research Institute wrote:
> Hello,
> 
> Last Friday, during normal operations, our MDS froze with the following
> LBUG, which happens again as soon as one mounts the MDT again:
> 
> Sep 13 15:10:28 n00a kernel: [ 8414.600584] LustreError:
> 11696:0:(osd_handler.c:936:osd_trans_start()) ASSERTION(
> get_current()->journal_info == ((void *)0) ) failed:
> Sep 13 15:10:28 n00a kernel: [ 8414.612825] LustreError:
> 11696:0:(osd_handler.c:936:osd_trans_start()) LBUG
> Sep 13 15:10:28 n00a kernel: [ 8414.619833] Pid: 11696, comm: lfsck
> Sep 13 15:10:28 n00a kernel: [ 8414.619835]
> Sep 13 15:10:28 n00a kernel: [ 8414.619835] Call Trace:
> Sep 13 15:10:28 n00a kernel: [ 8414.619850]  [<ffffffffa0224822>]
> libcfs_debug_dumpstack+0x52/0x80 [libcfs]
> Sep 13 15:10:28 n00a kernel: [ 8414.619857]  [<ffffffffa0224db2>]
> lbug_with_loc+0x42/0xa0 [libcfs]
> Sep 13 15:10:28 n00a kernel: [ 8414.619864]  [<ffffffffa0b11890>]
> osd_trans_start+0x250/0x630 [osd_ldiskfs]
> Sep 13 15:10:28 n00a kernel: [ 8414.619870]  [<ffffffffa0b0e748>] ?
> osd_declare_xattr_set+0x58/0x230 [osd_ldiskfs]
> Sep 13 15:10:28 n00a kernel: [ 8414.619876]  [<ffffffffa0c6ffc7>]
> lod_trans_start+0x177/0x200 [lod]
> Sep 13 15:10:28 n00a kernel: [ 8414.619881]  [<ffffffffa0cbd752>]
> lfsck_namespace_double_scan+0x1122/0x1e50 [lfsck]
> Sep 13 15:10:28 n00a kernel: [ 8414.619888]  [<ffffffff8136741b>] ?
> thread_return+0x3e/0x10c
> Sep 13 15:10:28 n00a kernel: [ 8414.619894]  [<ffffffff81038b87>] ?
> enqueue_task_fair+0x58/0x5d
> Sep 13 15:10:28 n00a kernel: [ 8414.619899]  [<ffffffffa0cb68ea>]
> lfsck_double_scan+0x5a/0x70 [lfsck]
> Sep 13 15:10:28 n00a kernel: [ 8414.619904]  [<ffffffffa0cb7dfd>]
> lfsck_master_engine+0x50d/0x650 [lfsck]
> Sep 13 15:10:28 n00a kernel: [ 8414.619909]  [<ffffffffa0cb78f0>] ?
> lfsck_master_engine+0x0/0x650 [lfsck]
> Sep 13 15:10:28 n00a kernel: [ 8414.619915]  [<ffffffff810534c4>]
> kthread+0x7b/0x83
> Sep 13 15:10:28 n00a kernel: [ 8414.619918]  [<ffffffff810369d3>] ?
> finish_task_switch+0x48/0xb9
> Sep 13 15:10:28 n00a kernel: [ 8414.619924]  [<ffffffff8101092a>]
> child_rip+0xa/0x20
> Sep 13 15:10:28 n00a kernel: [ 8414.619928]  [<ffffffff81053449>] ?
> kthread+0x0/0x83
> Sep 13 15:10:28 n00a kernel: [ 8414.619931]  [<ffffffff81010920>] ?
> child_rip+0x0/0x20
> 
> 
> I originally had the LFSCK launched in "dry-run" mode:
> 
> lctl lfsck_start --device lustre-1-MDT0000 --dryrun on --type namespace
> 
> The LFSCK was reported as completed (I was running 'watch -n 1' in a
> terminal) before the LBUG popped up; now I can't even get any output:
> 
> cat /proc/fs/lustre/mdd/lustre-1-MDT0000/lfsck_namespace  # just hangs
> there indefinitely
> 
> 
> I remember seeing an lfsck_namespace file in the MDT's underlying LDISKFS;
> is there anything sensible I can do with it (e.g. would deleting it
> solve the situation)?
> What else could I do?
> 
> 
> Thanks for your answers and best regards,
> 
> Cédric D.
> 
> 
> PS: I originally posted this message on the HPDD-discuss mailing list
> and just realized it was the wrong place; sorry for the cross-posting.
