[Lustre-discuss] LBUG not healthy

anil kumar anil.k.kv at gmail.com
Mon May 11 02:49:14 PDT 2009


Hi,

I am seeing the following issue with both 1.6.5.1 and 1.6.7.1; please let us
know if there is any workaround or fix for this.
I also notice timeouts on OSTs regularly, on different OSTs/OSSes.


May 10 08:17:57 dadbdd01 kernel: LustreError:
30058:0:(mds_open.c:1097:mds_open())
ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild
57d21b1:7baa1281 (ffff8106bf064df8) inode ffff8105d85f9668/92086705/2074743425
May 10 08:17:57 dadbdd01 kernel: LustreError:
30058:0:(mds_open.c:1097:mds_open()) LBUG
May 10 08:17:57 dadbdd01 kernel: Lustre:
30058:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for
process 30058
May 10 08:17:57 dadbdd01 kernel: ll_mdt_61     R  running task       0
30058      1         30210 30057 (L-TLB)
May 10 08:17:57 dadbdd01 kernel:  ffff810183e1fe50 0000000000000046
ffff81081fa21800 ffffffff8006b6c9
May 10 08:17:57 dadbdd01 kernel:  ffff810314fdd540 ffffffff885e06c1
ffff8105b9099c00 ffff8105b9099ce0
May 10 08:17:57 dadbdd01 kernel:  ffff81065bf8a800 ffffffff885de3d6
ffff8105b9099d88 0000000000000000
May 10 08:17:57 dadbdd01 kernel: Call Trace:
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff8006b6c9>]
do_gettimeofday+0x50/0x92
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff885de3d6>]
:libcfs:lcw_update_time+0x16/0x100
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff88730efc>]
:ptlrpc:ptlrpc_main+0xdcc/0xf50
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff88730130>]
:ptlrpc:ptlrpc_main+0x0/0xf50
May 10 08:17:57 dadbdd01 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
May 10 08:17:57 dadbdd01 kernel:
May 10 08:17:57 dadbdd01 kernel: LustreError: dumping log to
/tmp/lustre-log.1241968677.30058
May 10 08:19:37 dadbdd01 kernel: Lustre:
29978:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:19:37 dadbdd01 kernel: Lustre:
29978:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530@10.229.168.37@tcp to
0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:19:37 dadbdd01 kernel: LustreError:
29978:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff8104966c4600 x236853107/t0
o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens
304/200 e 0 to 0 dl 1241968877 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:19:37 dadbdd01 kernel: LustreError:
29978:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar
message
May 10 08:20:02 dadbdd01 kernel: Lustre:
30213:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:02 dadbdd01 kernel: Lustre:
30213:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530@10.229.168.37@tcp to
0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:02 dadbdd01 kernel: LustreError:
30213:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff8105fd78de00 x236853112/t0
o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens
304/200 e 0 to 0 dl 1241968902 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:20:27 dadbdd01 kernel: Lustre:
15599:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:27 dadbdd01 kernel: Lustre:
15599:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530@10.229.168.37@tcp to
0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:27 dadbdd01 kernel: LustreError:
15599:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff8101a4eeba00 x236853130/t0
o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens
304/200 e 0 to 0 dl 1241968927 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:20:52 dadbdd01 kernel: Lustre:
29999:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:52 dadbdd01 kernel: Lustre:
29999:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse
reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530@10.229.168.37@tcp to
0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:52 dadbdd01 kernel: LustreError:
29999:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-16)  req@ffff81026a206000 x236853142/t0
o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens
304/200 e 0 to 0 dl 1241968952 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:21:03 dadbdd01 kernel: LustreError:
0:0:(ldlm_lockd.c:234:waiting_locks_callback()) ### lock callback timer
expired after 1244s: evicting client at 10.229.168.37@tcp
ns: mds-farmres-MDT0000_UUID lock:
ffff81082085e200/0xa87c36a1c15f5013 lrc: 1/0,0 mode: CR/CR res:
93848281/2073576669 bits 0x3 rrc: 7 type: IBT
flags: 4000020 remote: 0xff97ebb5aa42e7c4 expref: 69961 pid 15621
May 10 08:21:17 dadbdd01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 30058: it was inactive for 200s
May 10 08:21:17 dadbdd01 kernel: Lustre:
0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process
30058
May 10 08:21:17 dadbdd01 kernel: ll_mdt_61     D ffff8104fcbb0528     0
30058      1         30210 30057 (L-TLB)
May 10 08:21:17 dadbdd01 kernel:  ffff810183e1f700 0000000000000046
ffffffff885e0026 ffffffff885e07f0
May 10 08:21:17 dadbdd01 kernel:  000000000000000a ffff81082b91c7a0
ffff81082fe9e100 0004b26ed485e4e0
May 10 08:21:17 dadbdd01 kernel:  000000000000293f ffff81082b91c988
ffff810100000005 ffffffff8003b127
May 10 08:21:17 dadbdd01 kernel: Call Trace:
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8003b127>]
remove_wait_queue+0x1c/0x2c
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff885d5c6b>]
:libcfs:lbug_with_loc+0xbb/0xc0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff889a28d7>]
:mds:mds_open+0x1f57/0x322b
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff887a32a1>]
:ksocklnd:ksocknal_queue_tx_locked+0x4f1/0x550
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff886e319e>]
:ptlrpc:lock_res_and_lock+0xbe/0xe0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8897fbf9>]
:mds:mds_reint_rec+0x1d9/0x2b0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff889a5b43>]
:mds:mds_open_unpack+0x2f3/0x410
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff889a422f>]
:mds:mds_update_unpack+0x20f/0x2b0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8897265a>]
:mds:mds_reint+0x35a/0x420
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88971432>]
:mds:fixup_handle_for_resent_req+0x52/0x200
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88976303>]
:mds:mds_intent_policy+0x453/0xc10
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff886eb386>]
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff886e8be6>]
:ptlrpc:ldlm_lock_enqueue+0x186/0x990
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff886e5dad>]
:ptlrpc:ldlm_lock_create+0x9ad/0x9e0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88709e20>]
:ptlrpc:ldlm_server_completion_ast+0x0/0x5b0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88707745>]
:ptlrpc:ldlm_handle_enqueue+0xc95/0x1280
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8870a3d0>]
:ptlrpc:ldlm_server_blocking_ast+0x0/0x6b0
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8897ab07>]
:mds:mds_handle+0x4047/0x4d10
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8014090e>] __next_cpu+0x19/0x28
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff80073331>]
smp_send_reschedule+0x4e/0x53
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88682d31>]
:obdclass:class_handle2object+0xd1/0x160
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88721b7f>]
:ptlrpc:lustre_msg_get_conn_cnt+0x4f/0x100
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8872be9a>]
:ptlrpc:ptlrpc_check_req+0x1a/0x110
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8872dfe2>]
:ptlrpc:ptlrpc_server_handle_request+0x992/0x1030
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8006b6c9>]
do_gettimeofday+0x50/0x92
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff885de3d6>]
:libcfs:lcw_update_time+0x16/0x100
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff800868b0>]
__wake_up_common+0x3e/0x68
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88730efc>]
:ptlrpc:ptlrpc_main+0xdcc/0xf50
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff80088432>]
default_wake_function+0x0/0xe
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff88730130>]
:ptlrpc:ptlrpc_main+0x0/0xf50
May 10 08:21:17 dadbdd01 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
May 10 08:21:17 dadbdd01 kernel:
May 10 08:21:17 dadbdd01 kernel: LustreError: dumping log to
/tmp/lustre-log.1241968877.30058
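For anyone looking at these dumps: the files Lustre writes on an LBUG are
binary debug logs, and (as far as I know) they can be converted to readable
text with lctl. A minimal sketch, using the path from the log above (the
output filename is my choice):

```shell
# Convert the binary debug dump written at LBUG time into plain text.
# Input path is taken from the "dumping log to" line in the kernel log;
# the .txt output name is arbitrary.
lctl debug_file /tmp/lustre-log.1241968877.30058 \
    /tmp/lustre-log.1241968877.30058.txt
```

The decoded text is what Lustre support usually asks for alongside the
console messages.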

Thanks
Anil