[Lustre-discuss] OSS errors

Johnlya johnlya at gmail.com
Mon Aug 4 00:17:52 PDT 2008


The Lustre version is 1.6.5.1.

[root at OSS1_MASTER ~]# uname -a
Linux OSS1_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12
22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


On Mon, 2008-08-04, at 03:06 PM, Johnlya <john... at gmail.com> wrote:
> When the client runs low on system resources, the OSS displays some
> errors:
>
> Lustre: lenovo-OST0002: haven't heard from client
> 24bdc118-cf78-9d56-190c-bb9a2836bd41 (at 192.168.1.251 at tcp) in 227
> seconds. I think it's dead, and I am evicting it.
> Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect())
> lenovo-OST0000: 24bdc118-cf78-9d56-190c-bb9a2836bd41 reconnecting
> Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1
> previous similar message
> LustreError: 5881:0:(ldlm_resource.c:767:ldlm_resource_add())
> lvbo_init failed for resource 2207359: rc -2
> LustreError: 6969:0:(ldlm_lock.c:430:__ldlm_handle2lock())
> ASSERTION(lock->l_resource != NULL) failed
> LustreError: 6969:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
> Lustre: 6969:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing
> stack for process 6969
> ldlm_cn_13    R  running task       0  6969      1          6970  6968
> (L-TLB)
> 0000000000000000 ffffffffa031b4c9 0000010005fe1a00 0000000000000000
>        00000100bffab240 ffffffffa01ee45e 0000010005eed598
> 0000000000000001
>        0000010082899ea0 0000000000000000
> Call Trace:<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request
> +2457}
>        <ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
> <ffffffff80133855>{__wake_up_common+67}
>        <ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
> <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
> <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffff80110de3>{child_rip+8}
> <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
>        <ffffffff80110ddb>{child_rip+0}
> LustreError: dumping log to /tmp/lustre-log.1216640103.6969
> Lustre: 6495:0:(ldlm_lib.c:525:target_handle_reconnect())
> lenovo-OST0002: 440eafce-9f15-16a6-4764-7f54d92f9204 reconnecting
> Lustre: 6495:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2
> previous similar messages
> Lustre: 6495:0:(ldlm_lib.c:760:target_handle_connect())
> lenovo-OST0002: refuse reconnection from
> 440eafce-9f15-16a6-4764-7f54d92f9... at 192.168.1.102@tcp to
> 0x0000010058fde000; still busy with 2 active RPCs
> LustreError: 6495:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@
> processing error (-16)  req at 0000010137f4b400 x68793117/t0 o8->440eafce-9f15-16a6-4764-7f54d92f9204 at NET_0x20000c0a80166_UUID:0/0
> lens 304/200 e 0 to 0 dl 1216640303 ref 1 fl Interpret:/0/0 rc -16/0
> Lustre: Request x103723701 sent from lenovo-OST0002 to NID
> 192.168.1.102 at tcp 20s ago has timed out (limit 20s).
> Lustre: Skipped 6 previous similar messages
> LustreError: 138-a: lenovo-OST0002: A client on nid 192.168.1.102 at tcp
> was evicted due to a lock glimpse callback to 192.168.1.102 at tcp timed
> out: rc -110
> Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 6969:
> it was inactive for 600s
> Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack
> for process 6969
> ldlm_cn_13    D 0000000000000001     0  6969      1          6970
> 6968 (L-TLB)
> 000001008289db38 0000000000000046 0000000000000000 ffffffffa0201728
>        0000000000000700 000001008289dac8 00000000000001b0
> 00000000a01e4a58
>        0000010083163030 00000000000002d9
> Call Trace:<ffffffffa01e9014>{:libcfs:libcfs_debug_dumplog+292}
>        <ffffffffa01e4bb6>{:libcfs:lbug_with_loc+182}
> <ffffffffa01ebb44>{:libcfs:libcfs_assertion_failed+84}
>        <ffffffffa02d44e8>{:ptlrpc:__ldlm_handle2lock+328}
>        <ffffffffa03141f4>{:ptlrpc:lustre_msg_set_timeout+52}
>        <ffffffffa03124c7>{:ptlrpc:lustre_msg_get_flags+87}
>        <ffffffffa02f182d>{:ptlrpc:ldlm_request_cancel+525}
>        <ffffffffa030fd79>{:ptlrpc:lustre_pack_reply+41}
> <ffffffffa0315890>{:ptlrpc:lustre_swab_ldlm_request+0}
>        <ffffffffa02f2e34>{:ptlrpc:ldlm_handle_cancel+532}
>        <ffffffffa0312dcf>{:ptlrpc:lustre_msg_get_opc+95}
> <ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
>        <ffffffffa02f53ba>{:ptlrpc:ldlm_cancel_handler+730}
>        <ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
> <ffffffffa0312baf>{:ptlrpc:lustre_msg_get_handle+79}
>        <ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
>        <ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
> <ffffffff80133855>{__wake_up_common+67}
>        <ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
> <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
> <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffff80110de3>{child_rip+8}
> <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
>        <ffffffff80110ddb>{child_rip+0}
> LustreError: dumping log to /tmp/lustre-log.1216640703.6969
> LustreError: 6701:0:(ldlm_resource.c:767:ldlm_resource_add())
> lvbo_init failed for resource 3973448: rc -2
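For anyone reading the log above, the "rc" values are negated Linux errno codes. The following is only a minimal sketch for decoding them, not part of Lustre: the helper names decode_rc and rc_values are hypothetical, and the table is hardcoded from the standard Linux errno numbering so that it covers just the three codes seen in this log.

```python
import re

# Linux errno numbers for the codes that appear in the log above.
# Hardcoded (rather than taken from the errno module) so the mapping
# does not depend on the platform running this script.
LINUX_ERRNO = {
    2: "ENOENT",       # No such file or directory
    16: "EBUSY",       # Device or resource busy
    110: "ETIMEDOUT",  # Connection timed out
}


def decode_rc(rc):
    """Return the symbolic errno name for a negative return code."""
    return LINUX_ERRNO.get(-rc, "unknown")


def rc_values(log_text):
    """Extract every 'rc -N' value from a block of log text."""
    return [int(m) for m in re.findall(r"\brc (-\d+)", log_text)]


if __name__ == "__main__":
    sample = (
        "lvbo_init failed for resource 2207359: rc -2\n"
        "ref 1 fl Interpret:/0/0 rc -16/0\n"
        "timed out: rc -110\n"
    )
    for rc in rc_values(sample):
        print(rc, decode_rc(rc))
```

Read this way, the lvbo_init failures report ENOENT, the refused reconnection reports EBUSY, and the glimpse callback eviction reports ETIMEDOUT.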


