[Lustre-discuss] Lustre LBUG / NOT HEALTHY on MDT with 1.6.7 & 1.6.5.1

anil kumar anil.k.kv at gmail.com
Wed Apr 22 04:20:50 PDT 2009


Hi,

We are hitting an LBUG on our MDT, after which it reports NOT HEALTHY.
The error messages logged are below.

We first saw this after installing 1.6.7 and reverted to 1.6.5.1 because
of it, but we now see the same issue with 1.6.5.1 as well.

Apr 20 06:18:45 dadbdd01 kernel: LustreError:
15772:0:(mds_open.c:1097:mds_open())
ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild
29f8cba:16b5b5ab (ffff8105c4b50660) inode
ffff8105c71c3d48/44010682/381007275
Apr 20 06:18:45 dadbdd01 kernel: LustreError:
15772:0:(mds_open.c:1097:mds_open()) LBUG
Apr 20 06:18:45 dadbdd01 kernel: Lustre:
15772:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for
process 15772
Apr 20 06:18:45 dadbdd01 kernel: ll_mdt_50     R  running task       0
15772      1         15773 15771 (L-TLB)
Apr 20 06:18:45 dadbdd01 kernel:  ffff8107cf93de50 0000000000000046
ffff8107fe1bf1c8 ffffffff8006b6c9
Apr 20 06:18:45 dadbdd01 kernel:  ffff8107cfcb8840 ffffffff885e46c1
ffff8107fe1bf000 ffff8107fe1bf0e0
Apr 20 06:18:45 dadbdd01 kernel:  ffff8108009fdc40 ffffffff885e23d6
ffff8107fe1bf188 0000000000000000
Apr 20 06:18:45 dadbdd01 kernel: Call Trace:
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff8006b6c9>]
do_gettimeofday+0x50/0x92
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff885e23d6>]
:libcfs:lcw_update_time+0x16/0x100
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff88734efc>]
:ptlrpc:ptlrpc_main+0xdcc/0xf50
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff88734130>]
:ptlrpc:ptlrpc_main+0x0/0xf50
Apr 20 06:18:45 dadbdd01 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
Apr 20 06:18:45 dadbdd01 kernel:
Apr 20 06:18:45 dadbdd01 kernel: LustreError: dumping log to
/tmp/lustre-log.1240233525.15772
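
(Side note: the binary dump referenced above can be converted to readable
text with lctl's debug_file command, abbreviated df; the output path below
is just an example:

    lctl df /tmp/lustre-log.1240233525.15772 /tmp/lustre-log-15772.txt
)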
Apr 20 06:20:25 dadbdd01 kernel: Lustre:
15763:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:20:25 dadbdd01 kernel: Lustre:
15763:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to
0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:20:25 dadbdd01 kernel: LustreError:
15763:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff8107fa72dc00 x277188558/t0
o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens
304/200 e 0 to 0 dl 1240233725 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:20:50 dadbdd01 kernel: Lustre:
15769:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:20:50 dadbdd01 kernel: Lustre:
15769:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to
0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:20:50 dadbdd01 kernel: LustreError:
15769:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff81055a96ac00 x277188624/t0
o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens
304/200 e 0 to 0 dl 1240233750 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:15 dadbdd01 kernel: Lustre:
15741:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:21:15 dadbdd01 kernel: Lustre:
15741:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to
0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:21:15 dadbdd01 kernel: LustreError:
15741:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff81078039f800 x277188703/t0
o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens
304/200 e 0 to 0 dl 1240233775 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:40 dadbdd01 kernel: Lustre:
15953:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:21:40 dadbdd01 kernel: Lustre:
15953:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to
0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:21:40 dadbdd01 kernel: LustreError:
15953:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff810515284200 x277188781/t0
o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens
304/200 e 0 to 0 dl 1240233800 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:54 dadbdd01 kernel: LustreError:
0:0:(ldlm_lockd.c:234:waiting_locks_callback()) ### lock callback timer
expired after 7485s: evicting client at 10.229.168.36@tcp  ns:
mds-farmres-MDT0000_UUID lock: ffff8106ba114600/0x7cd8e2e59918c380 lrc:
1/0,0 mode: CR/CR res: 53842204/380418865 bits 0x3 rrc: 3 type: IBT flags:
4000020 remote: 0x6628125da30deb02 expref: 41753 pid 15774
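
(For what it's worth, the rc -16 above is -EBUSY: the MDT keeps refusing
the client's reconnect because the thread that hit the LBUG never
completed its two RPCs, and the client is finally evicted when the lock
callback timer expires. The MDS also flags this through its health file,
which is where the NOT HEALTHY in the subject comes from; on our 1.6
installs:

    cat /proc/fs/lustre/health_check
)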
Apr 20 06:22:05 dadbdd01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 15772: it was inactive for 200s
Apr 20 06:22:05 dadbdd01 kernel: Lustre:
0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process
15772
Apr 20 06:22:05 dadbdd01 kernel: ll_mdt_50     D ffff8107fdb66528     0
15772      1         15773 15771 (L-TLB)
Apr 20 06:22:05 dadbdd01 kernel:  ffff8107cf93d700 0000000000000046
ffffffff885e4026 ffffffff885e47f0
Apr 20 06:22:05 dadbdd01 kernel:  000000000000000a ffff810828fe57a0
ffff81011cb24100 000232d0fd32a434
Apr 20 06:22:05 dadbdd01 kernel:  000000000000212e ffff810828fe5988
ffff810700000007 ffffffff8003b127
Apr 20 06:22:05 dadbdd01 kernel: Call Trace:
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8003b127>]
remove_wait_queue+0x1c/0x2c
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff885d9c6b>]
:libcfs:lbug_with_loc+0xbb/0xc0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff889a48d7>]
:mds:mds_open+0x1f57/0x322b
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff887a52a1>]
:ksocklnd:ksocknal_queue_tx_locked+0x4f1/0x550
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff886e719e>]
:ptlrpc:lock_res_and_lock+0xbe/0xe0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88981bf9>]
:mds:mds_reint_rec+0x1d9/0x2b0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff889a7b43>]
:mds:mds_open_unpack+0x2f3/0x410
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff889a622f>]
:mds:mds_update_unpack+0x20f/0x2b0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8897465a>]
:mds:mds_reint+0x35a/0x420
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88973432>]
:mds:fixup_handle_for_resent_req+0x52/0x200
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88978303>]
:mds:mds_intent_policy+0x453/0xc10
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff886274ec>]
:lnet:LNetMDBind+0x2ac/0x400
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff886ef386>]
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff886ecbe6>]
:ptlrpc:ldlm_lock_enqueue+0x186/0x990
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff886e9dad>]
:ptlrpc:ldlm_lock_create+0x9ad/0x9e0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8870de20>]
:ptlrpc:ldlm_server_completion_ast+0x0/0x5b0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8870b745>]
:ptlrpc:ldlm_handle_enqueue+0xc95/0x1280
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8870e3d0>]
:ptlrpc:ldlm_server_blocking_ast+0x0/0x6b0
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8897cb07>]
:mds:mds_handle+0x4047/0x4d10
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8014090e>] __next_cpu+0x19/0x28
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff80073331>]
smp_send_reschedule+0x4e/0x53
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88686d31>]
:obdclass:class_handle2object+0xd1/0x160
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88725b7f>]
:ptlrpc:lustre_msg_get_conn_cnt+0x4f/0x100
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8872fe9a>]
:ptlrpc:ptlrpc_check_req+0x1a/0x110
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88731fe2>]
:ptlrpc:ptlrpc_server_handle_request+0x992/0x1030
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8006b6c9>]
do_gettimeofday+0x50/0x92
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff885e23d6>]
:libcfs:lcw_update_time+0x16/0x100
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88734efc>]
:ptlrpc:ptlrpc_main+0xdcc/0xf50
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff88734130>]
:ptlrpc:ptlrpc_main+0x0/0xf50
Apr 20 06:22:05 dadbdd01 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
Apr 20 06:22:05 dadbdd01 kernel:
Apr 20 06:22:05 dadbdd01 kernel: LustreError: dumping log to
/tmp/lustre-log.1240233725.15772
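
(As far as I can tell, a thread that has called lbug_with_loc() just
sleeps forever, so the only way to clear this state is to fail over or
reboot the MDS. If a panic plus crash dump would be more useful than a
hung thread, panic_on_lbug can be enabled beforehand; assuming the usual
/proc/sys/lnet location:

    echo 1 > /proc/sys/lnet/panic_on_lbug
)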

Can anyone advise on this?

Thanks,
Anil