[Lustre-discuss] 1.6.4.1 - LBUG on MDS

Niklas Edmundsson Niklas.Edmundsson at hpc2n.umu.se
Sun Jan 13 23:17:38 PST 2008


On Mon, 14 Jan 2008, Niklas Edmundsson wrote:

> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> LBUG:ed with:
>
> -------------8<--------------------
> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
> Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
> Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
> -------------8<--------------------

Ahem. It seems I got a little carried away with grep there and missed 
the stack trace. This should be more complete:
---------------8<---------------
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
Jan 12 10:39:40 ll_mdt_22     R  running task       0  6198      1          6199  6197 (L-TLB)
Jan 12 10:39:40  343836365b3e343c 0036373832382e32 0000383338373433 0000000000000246
Jan 12 10:39:40  ffff8100f0697560 0000000000000018 343836365b3e303c 3030313132382e32
Jan 12 10:39:40  ffffffffff00205d ffff8100f06976fa ffff8101706976ef ffffffff805172e0
Jan 12 10:39:40 Call Trace:
Jan 12 10:39:40  [<ffffffff80315c71>] vsnprintf+0x5b1/0x630
Jan 12 10:39:40  [<ffffffff8021d470>] physflat_send_IPI_mask+0x0/0x80
Jan 12 10:39:40  [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40  [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40  [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40  [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40  [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40  [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40  [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40  [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40  [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40  [<ffffffff8020b090>] printk_address+0xb0/0xc0
Jan 12 10:39:40  [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40  [<ffffffff80255c2a>] module_text_address+0x3a/0x50
Jan 12 10:39:40  [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40  [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40  [<ffffffff8020b4cc>] show_trace+0x21c/0x250
Jan 12 10:39:40  [<ffffffff8020b5ea>] _show_stack+0xea/0x100
Jan 12 10:39:40  [<ffffffff883f3a0a>] :libcfs:lbug_with_loc+0x7a/0xc0
Jan 12 10:39:40  [<ffffffff8871bb01>] :mds:mds_orphan_add_link+0x641/0x7e0
Jan 12 10:39:40  [<ffffffff883cabfd>] :ldiskfs:__ldiskfs_journal_stop+0x2d/0x60
Jan 12 10:39:40  [<ffffffff802cb55b>] dnotify_parent+0x2b/0xa0
Jan 12 10:39:40  [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40  [<ffffffff8871d498>] :mds:mds_reint_unlink+0x17f8/0x25f0
Jan 12 10:39:40  [<ffffffff8850ec47>] :ptlrpc:ptlrpc_prep_set+0x2c7/0x360
Jan 12 10:39:40  [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40  [<ffffffff8870f7b9>] :mds:mds_reint_rec+0x1d9/0x2b0
Jan 12 10:39:40  [<ffffffff887357cc>] :mds:mds_unlink_unpack+0x29c/0x3c0
Jan 12 10:39:40  [<ffffffff884e6f91>] :ptlrpc:ldlm_run_cp_ast_work+0x171/0x200
Jan 12 10:39:40  [<ffffffff88734624>] :mds:mds_update_unpack+0x214/0x2b0
Jan 12 10:39:40  [<ffffffff886ff971>] :mds:mds_reint+0x4b1/0x5a0
Jan 12 10:39:40  [<ffffffff885201cf>] :ptlrpc:lustre_msg_get_version+0x4f/0x100
Jan 12 10:39:40  [<ffffffff8870beea>] :mds:mds_handle+0x2fca/0x5f88
Jan 12 10:39:40  [<ffffffff884ff878>] :ptlrpc:ldlm_cli_cancel+0x298/0x2c0
Jan 12 10:39:40  [<ffffffff802899d0>] __drain_alien_cache+0x60/0x90
Jan 12 10:39:40  [<ffffffff8022e812>] find_busiest_group+0x252/0x6c0
Jan 12 10:39:40  [<ffffffff8848ae45>] :obdclass:class_handle2object+0xd5/0x160
Jan 12 10:39:40  [<ffffffff8851c480>] :ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90
Jan 12 10:39:40  [<ffffffff88521155>] :ptlrpc:lustre_swab_buf+0xc5/0xf0
Jan 12 10:39:40  [<ffffffff8852710a>] :ptlrpc:ptlrpc_server_handle_request+0xc8a/0x1460
Jan 12 10:39:40  [<ffffffff80416d20>] thread_return+0x0/0x100
Jan 12 10:39:40  [<ffffffff8020df9e>] do_gettimeofday+0x5e/0xb0
Jan 12 10:39:40  [<ffffffff883fbf06>] :libcfs:lcw_update_time+0x16/0x100
Jan 12 10:39:40  [<ffffffff8023f309>] lock_timer_base+0x29/0x60
Jan 12 10:39:40  [<ffffffff8023f7f0>] __mod_timer+0xc0/0xf0
Jan 12 10:39:40  [<ffffffff8852933c>] :ptlrpc:ptlrpc_main+0x85c/0x9e0
Jan 12 10:39:40  [<ffffffff8022f490>] default_wake_function+0x0/0x10
Jan 12 10:39:40  [<ffffffff8020ac4c>] child_rip+0xa/0x12
Jan 12 10:39:41  [<ffffffff88528ae0>] :ptlrpc:ptlrpc_main+0x0/0x9e0
Jan 12 10:39:41  [<ffffffff8020ac42>] child_rip+0x0/0x12
Jan 12 10:39:41 
Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
---------------8<---------------

> I also have the lustre-log.1200130781.6198, but it seems to contain
> binary data so I'll supply it only if it's needed.
>
> The following triggered the bug:
> - mkdir rfiles
> - in rfiles create 300000 files of random size 0-32k
> - rm -rf rfiles &
> - sleep 600 (i.e. wait until you get bored while the first rm is still running).
> - rm -rf rfiles &
>
> This suggests that something isn't locked properly, since two
> concurrent rm's in a directory definitely shouldn't cause the MDS to
> fall over...
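
For anyone who wants to try reproducing this, here is a rough sketch of the
steps above as a script. It is only an illustration, not the exact commands I
ran: the helper names (make_files, concurrent_rm), the file naming scheme and
the zero-filled file contents are assumptions made for the example. Only the
file count, the 0-32k size range, the delay and the two "rm -rf" calls come
from the description above.

```python
import os
import random
import subprocess
import time


def make_files(directory, count, max_size=32 * 1024, seed=None):
    """Create `count` files of random size 0..max_size bytes in `directory`.

    The contents are zero bytes (an assumption; the report only gives sizes).
    """
    rng = random.Random(seed)
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        size = rng.randint(0, max_size)
        with open(os.path.join(directory, f"f{i:06d}"), "wb") as fh:
            fh.write(b"\0" * size)


def concurrent_rm(directory, delay):
    """Start 'rm -rf', wait `delay` seconds, start a second one.

    Returns the exit codes of both rm processes once they have finished.
    """
    first = subprocess.Popen(["rm", "-rf", directory])
    time.sleep(delay)  # the first rm is expected to still be running
    second = subprocess.Popen(["rm", "-rf", directory])
    return first.wait(), second.wait()
```

With the numbers from the report, that would be make_files("rfiles", 300000)
followed by concurrent_rm("rfiles", 600), run from a directory on the Lustre
filesystem. Only do this on a test system, since the whole point is that it
may LBUG the MDS.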


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke at hpc2n.umu.se
---------------------------------------------------------------------------
  An Elephant Is Just A Mouse Built To Gov't Specs!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



