[Lustre-discuss] 677:ldlm_lock_decref()) LBUG on MDS

Scott Barber scott at imemories.com
Sat May 15 15:29:36 PDT 2010


All OSSs and the MDS run Lustre 1.8.3 on CentOS x86_64.
Clients are a mix of 1.6.4.2, 1.8.1.1, and 1.8.3.

I just hit an LBUG on our MDS. I searched Google and Lustre's Bugzilla
without any luck. Here is the console output:
May 15 14:48:20 mds01 kernel: LustreError: 3867:0:(ldlm_lock.c:677:ldlm_lock_decref()) ASSERTION(lock != NULL) failed: Non-existing lock: 0x4a26a46afdc46bc6
May 15 14:48:20 mds01 kernel: LustreError: 3867:0:(ldlm_lock.c:677:ldlm_lock_decref()) LBUG
May 15 14:48:20 mds01 kernel: Lustre: 3867:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 3867
May 15 14:48:20 mds01 kernel: ll_mdt_55     R  running task       0  3867      1          3868  3866 (L-TLB)
May 15 14:48:20 mds01 kernel:  ffff8100763d2d00 ffffffff80062ff8 ffff810062b8c000 0000000000000082
May 15 14:48:20 mds01 kernel:  0000008100001400 000000000000000a ffff81005be14820 ffff81007fa7c100
May 15 14:48:20 mds01 kernel:  0000000000000286 ffffffff8003dafe ffff81005bb5a4c0 ffff81007fa78000
May 15 14:48:20 mds01 kernel: Call Trace:
May 15 14:48:20 mds01 kernel:  [<ffffffff80062ff8>] thread_return+0x62/0xfe
May 15 14:48:20 mds01 kernel:  [<ffffffff8003dafe>] lock_timer_base+0x1b/0x3c
May 15 14:48:20 mds01 kernel:  [<ffffffff8001ca74>] __mod_timer+0xb0/0xbe
May 15 14:48:20 mds01 kernel:  [<ffffffff8861e408>] :ptlrpc:ptlrpc_main+0x1258/0x1420
May 15 14:48:20 mds01 kernel:  [<ffffffff8008c86b>] default_wake_function+0x0/0xe
May 15 14:48:20 mds01 kernel:  [<ffffffff800b7076>] audit_syscall_exit+0x336/0x362
May 15 14:48:20 mds01 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
May 15 14:48:20 mds01 kernel:  [<ffffffff8861d1b0>] :ptlrpc:ptlrpc_main+0x0/0x1420
May 15 14:48:20 mds01 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
May 15 14:48:20 mds01 kernel:
May 15 14:48:20 mds01 kernel: LustreError: dumping log to /tmp/lustre-log.1273960100.3867

The dump in /tmp/lustre-log.1273960100.3867 contains many messages like
these:
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0021_UUID (in recovery: DISCONN or recovery disabled: 1/0)
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0022_UUID (in recovery: DISCONN or recovery disabled: 1/0)
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0023_UUID (in recovery: DISCONN or recovery disabled: 1/0)
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0024_UUID (in recovery: DISCONN or recovery disabled: 1/0)
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0025_UUID (in recovery: DISCONN or recovery disabled: 1/0)
pinger.c^@ptlrpc_pinger_main^@not pinging sanvol06-OST0026_UUID (in recovery: DISCONN or recovery disabled: 1/0)
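For anyone reading a similar dump: the ^@ sequences are NUL bytes, because the file written when the LBUG fires is Lustre's binary debug log rather than plain text. "lctl debug_file /tmp/lustre-log.1273960100.3867 /tmp/lustre-log.txt" converts it to readable text. For a quick tally of which targets the pinger is skipping, a minimal Python sketch is below. It assumes only that the message text appears as in the excerpt above, with NUL-separated fields; the function names count_unpinged_targets and report and the regex are hypothetical illustrations, not part of Lustre's tooling.

```python
import re
from collections import Counter

# Hypothetical pattern based on the excerpt above:
# "not pinging <UUID> (in recovery: ...". It is not an official Lustre log schema.
NOT_PINGING = re.compile(rb"not pinging (\S+) \(in\s+recovery")


def count_unpinged_targets(raw: bytes) -> Counter:
    """Count 'not pinging' messages per target UUID in a raw debug-log dump.

    Fields in the dump are separated by NUL bytes (displayed as ^@ by a pager),
    so NULs are replaced with spaces before the regex runs.
    """
    text = raw.replace(b"\x00", b" ")
    return Counter(
        m.group(1).decode("ascii", errors="replace")
        for m in NOT_PINGING.finditer(text)
    )


def report(path: str) -> None:
    """Print each skipped target UUID with its message count, sorted by UUID."""
    with open(path, "rb") as f:
        counts = count_unpinged_targets(f.read())
    for uuid, n in sorted(counts.items()):
        print(f"{uuid}\t{n}")
```

Calling report("/tmp/lustre-log.1273960100.3867") would list each skipped UUID (here, presumably sanvol06-OST0021_UUID through sanvol06-OST0026_UUID) with its count. Matching on the message text rather than on byte offsets keeps the sketch independent of the binary record header layout, at the cost of possibly missing messages whose wording differs between Lustre versions.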

sanvol06-OST0021_UUID through sanvol06-OST0026_UUID are old OSTs that
I removed many months ago; they no longer exist. All of them were
deactivated with "lctl conf_param sanvol06-OST00xx.osc.active=0".

I can still access the volume and don't see any other issues. Any thoughts?

Thanks,
Scott Barber
Senior Systems Administrator
iMemories.com


