[Lustre-discuss] Lustre clients getting evicted
Tom.Wang
Tom.Wang at Sun.COM
Fri Feb 8 11:47:53 PST 2008
Hello,
m45_amp214_om D 0000000000000000 0 2587 1 31389 2586 (NOTLB)
00000101f6b435f8 0000000000000006 000001022c7fc030 0000000000000001
00000100080f1a40 0000000000000246 00000101f6b435a8 0000000380136025
00000102270a1030 00000000000000d0
Call Trace:<ffffffffa0216e79>{:lnet:LNetPut+1689}
<ffffffff8030e45f>{__down+147}
<ffffffff80134659>{default_wake_function+0}
<ffffffff8030ff7d>{__down_failed+53}
<ffffffffa04292e1>{:lustre:.text.lock.file+5}
<ffffffffa044b12e>{:lustre:ll_mdc_blocking_ast+798}
<ffffffffa02c8eb8>{:ptlrpc:ldlm_resource_get+456}
<ffffffffa02c3bbb>{:ptlrpc:ldlm_cancel_callback+107}
<ffffffffa02da615>{:ptlrpc:ldlm_cli_cancel_local+213}
<ffffffffa02c3c48>{:ptlrpc:ldlm_lock_addref_internal_nolock+56}
<ffffffffa02c3dbc>{:ptlrpc:search_queue+284}
<ffffffffa02dbc03>{:ptlrpc:ldlm_cancel_list+99}
<ffffffffa02dc113>{:ptlrpc:ldlm_cancel_lru_local+915}
<ffffffffa02ca293>{:ptlrpc:ldlm_resource_putref+435}
<ffffffffa02dc2c9>{:ptlrpc:ldlm_prep_enqueue_req+313}
<ffffffffa0394e6f>{:mdc:mdc_enqueue+1023}
<ffffffffa02c1035>{:ptlrpc:lock_res_and_lock+53}
<ffffffffa0268730>{:obdclass:class_handle2object+224}
<ffffffffa02c5fea>{:ptlrpc:__ldlm_handle2lock+794}
<ffffffffa02c106f>{:ptlrpc:unlock_res_and_lock+31}
<ffffffffa02c5c03>{:ptlrpc:ldlm_lock_decref_internal+595}
<ffffffffa02c156c>{:ptlrpc:ldlm_lock_add_to_lru+140}
<ffffffffa02c1035>{:ptlrpc:lock_res_and_lock+53}
<ffffffffa02c6f0a>{:ptlrpc:ldlm_lock_decref+154}
<ffffffffa039617d>{:mdc:mdc_intent_lock+685}
<ffffffffa044ae10>{:lustre:ll_mdc_blocking_ast+0}
<ffffffffa02d85f0>{:ptlrpc:ldlm_completion_ast+0}
<ffffffffa044ae10>{:lustre:ll_mdc_blocking_ast+0}
<ffffffffa02d85f0>{:ptlrpc:ldlm_completion_ast+0}
<ffffffffa044b64b>{:lustre:ll_prepare_mdc_op_data+139}
<ffffffffa0418a32>{:lustre:ll_intent_file_open+450}
<ffffffffa044ae10>{:lustre:ll_mdc_blocking_ast+0}
<ffffffff80192006>{__d_lookup+287}
<ffffffffa0419724>{:lustre:ll_file_open+2100}
<ffffffffa0428a18>{:lustre:ll_inode_permission+184}
<ffffffff80179bdb>{sys_access+349}
<ffffffff8017a1ee>{__dentry_open+201}
<ffffffff8017a3a9>{filp_open+95}
<ffffffff80179bdb>{sys_access+349}
<ffffffff801f00b5>{strncpy_from_user+74}
<ffffffff8017a598>{sys_open+57}
<ffffffff8011026a>{system_call+126}
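It seems the blocking AST (ll_mdc_blocking_ast) was blocked here,
apparently waiting on a semaphore (__down / __down_failed in the trace).
Could you disassemble lustre/llite/namei.o with
objdump -S lustre/llite/namei.o and send me the output?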
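A trace like the one above is easier to read once each frame is split into module, function, and offset. Note that x86_64 kernels of this era typically print every kernel-text address they find on the stack, so some entries (for example the ones ending in +0, such as ldlm_completion_ast+0) are likely leftover function pointers rather than real callers. A minimal Python sketch, assuming the trace has been saved as plain text; parse_frames and frames_above are hypothetical helper names, not part of Lustre or any kernel tool:

```python
import re
from typing import List, NamedTuple, Optional

# One frame looks like "<ffffffffa044b12e>{:lustre:ll_mdc_blocking_ast+798}";
# core-kernel symbols carry no module prefix, e.g. "<ffffffff8030e45f>{__down+147}".
FRAME_RE = re.compile(
    r"<(?P<addr>[0-9a-f]+)>\{(?::(?P<module>[^:}]+):)?(?P<func>[^+}]+)\+(?P<offset>\d+)\}"
)


class Frame(NamedTuple):
    addr: int
    module: Optional[str]  # None for symbols in the core kernel
    func: str
    offset: int


def parse_frames(trace: str) -> List[Frame]:
    """Return the frames of a 'Call Trace' dump in printed order (innermost first)."""
    return [
        Frame(int(m["addr"], 16), m["module"], m["func"], int(m["offset"]))
        for m in FRAME_RE.finditer(trace)
    ]


def frames_above(frames: List[Frame], func: str, count: int = 3) -> List[Frame]:
    """Return up to `count` frames printed after the first frame named `func`,
    skipping +0 entries, which are usually stale function pointers."""
    for i, frame in enumerate(frames):
        if frame.func == func:
            callers = [f for f in frames[i + 1:] if f.offset != 0]
            return callers[:count]
    return []
```

For the trace above, frames_above(parse_frames(trace), "__down_failed", 2) yields .text.lock.file+5 and ll_mdc_blocking_ast+798; mapping that offset back to a source line is what the namei.o disassembly is for.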
Thanks
WangDi
Brock Palen wrote:
>>> On Feb 7, 2008, at 11:09 PM, Tom.Wang wrote:
>>>>> MDT dmesg:
>>>>>
>>>>> LustreError: 9042:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107) req@000001002b52b000 x445020/t0 o400-><?>@<?>:-1 lens 128/0 ref 0 fl Interpret:/0/0 rc -107/0
>>>>> LustreError: 0:0:(ldlm_lockd.c:210:waiting_locks_callback()) ### lock callback timer expired: evicting client 2faf3c9e-26fb-64b7-ca6c-7c5b09374e67@NET_0x200000aa4008d_UUID nid 10.164.0.141@tcp ns: mds-nobackup-MDT0000_UUID lock: 00000100476df240/0xbc269e05c512de3a lrc: 1/0,0 mode: CR/CR res: 11240142/324715850 bits 0x5 rrc: 2 type: IBT flags: 20 remote: 0x4e54bc800174cd08 expref: 372 pid 26925
>>>>>
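The eviction message packs the client UUID, its NID (written as address@network, e.g. 10.164.0.141@tcp), the lock mode, and the lock resource into a single line. When several evictions have to be matched against client-side dumps, pulling those fields out programmatically can help. A minimal Python sketch; the pattern is fitted to the message quoted here and may not match other Lustre versions, and parse_eviction is a hypothetical helper:

```python
import re
from typing import Dict, Optional

# Fitted to the 1.6-era "lock callback timer expired" message quoted above;
# other Lustre versions may word or order these fields differently.
EVICTION_RE = re.compile(
    r"evicting client (?P<client>\S+) nid (?P<nid>\S+) ns: (?P<ns>\S+)"
    r".*?mode: (?P<mode>\S+) res: (?P<res>\S+)"
    r".*?\bpid (?P<pid>\d+)"
)


def parse_eviction(line: str) -> Optional[Dict[str, object]]:
    """Extract client UUID, NID, namespace, lock mode, resource and pid, or None."""
    m = EVICTION_RE.search(line)
    if m is None:
        return None
    fields: Dict[str, object] = dict(m.groupdict())
    fields["pid"] = int(m["pid"])  # store the pid as a number
    return fields
```

In the message above the lock is a CR (concurrent read) inodebits (IBT) lock in the MDS namespace.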
>>>> The client was evicted because this lock could not be released by the
>>>> client in time. Could you provide a stack trace from the client at
>>>> that time?
>>>>
>>>> I think increasing obd_timeout may fix your problem. Otherwise, you
>>>> may want to wait for the 1.6.5 release, which includes a new feature,
>>>> adaptive_timeout, that adjusts the timeout value according to network
>>>> congestion and server load. That should help with your problem.
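The general idea behind adaptive timeouts can be illustrated with a toy estimator: keep the service times seen recently, set the timeout to the slowest of them times a safety margin, and clamp the result between a floor and a ceiling. This is a simplified sketch of that technique under those assumptions, not Lustre's implementation; the class name, window size, margin, and bounds are all hypothetical:

```python
from collections import deque


class ToyAdaptiveTimeout:
    """Illustrative estimator: slowest recent service time times a margin,
    clamped to [floor, ceiling]. Not Lustre's adaptive-timeout code."""

    def __init__(self, window: int = 8, margin: float = 1.5,
                 floor: float = 5.0, ceiling: float = 600.0) -> None:
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.margin = margin
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, service_time: float) -> None:
        """Record how long one request took to be serviced, in seconds."""
        self.samples.append(service_time)

    def timeout(self) -> float:
        """Current timeout; falls back to the ceiling until anything is observed."""
        if not self.samples:
            return self.ceiling
        estimate = max(self.samples) * self.margin
        return min(max(estimate, self.floor), self.ceiling)
```

Under load the estimate grows toward the ceiling, and once the slow samples age out of the window it shrinks again, which a single fixed obd_timeout cannot do.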
>>>
>>> Waiting for the next version of Lustre might be the best thing. I
>>> had upped the timeout a few days back, but the next day I had errors
>>> on the MDS box. I have switched it back:
>>>
>>> lctl conf_param nobackup-MDT0000.sys.timeout=300
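When the timeout is changed from a script, building the argument list explicitly avoids typos and quoting mistakes in the parameter name. A minimal Python sketch, assuming lctl is on PATH and the command is run on the MGS node, where conf_param settings are made; the helper names are hypothetical, and the parameter string simply follows the form used in the command above:

```python
import subprocess
from typing import List


def timeout_conf_param_cmd(fsname: str, seconds: int) -> List[str]:
    """Build the lctl argument list for <fsname>-MDT0000.sys.timeout=<seconds>."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ["lctl", "conf_param", f"{fsname}-MDT0000.sys.timeout={seconds}"]


def set_timeout(fsname: str, seconds: int, dry_run: bool = True) -> List[str]:
    """Run the command, or only return it when dry_run is True."""
    cmd = timeout_conf_param_cmd(fsname, seconds)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Defaulting to a dry run lets the exact command be reviewed before it changes a live file system.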
>>>
>>> I would love to give you that trace, but I don't know how to get it.
>>> Is there a debug option to enable on the clients?
>> You can get it by running echo t > /proc/sysrq-trigger on the client
>> (as root); the task dump is written to the kernel log, so collect it
>> with dmesg.
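The same step can be scripted. A minimal Python sketch; request_task_dump is a hypothetical helper, and the path is a parameter only so that it can be exercised against an ordinary file. On a real system it needs root, and the dump appears in the kernel log rather than being returned to the caller:

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def request_task_dump(trigger_path: str = SYSRQ_TRIGGER) -> None:
    """Ask the kernel to dump the state of every task ('t'), like
    `echo t > /proc/sysrq-trigger`. Read the result with dmesg afterwards."""
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
```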
>>
> Cool command; the client's output is attached. The four
> m45_amp214_om processes are the application that hung while working
> off Lustre. You can see they are stuck in I/O (state D).
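In a sysrq-t dump each task starts with a header line giving the command name and a one-letter state. D means uninterruptible sleep, usually a wait on I/O or on a lock, as with the m45_amp214_om task in the trace at the top of this message. A minimal Python sketch for counting such tasks in a saved dump; it relies only on the first columns of the header, because the remaining columns differ between kernel versions, and blocked_tasks is a hypothetical helper:

```python
import re
from collections import Counter

# Header lines look like "m45_amp214_om D 0000000000000000 0 2587 1 ...":
# command name, one-letter state, then a hex word. Raw stack words and
# "Call Trace" lines do not have a single letter in the second column.
TASK_HEADER_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[RSDTZX])\s+[0-9a-f]{8,16}\b", re.MULTILINE
)


def blocked_tasks(dump: str) -> Counter:
    """Count tasks in uninterruptible sleep (state D), keyed by command name."""
    return Counter(
        m["comm"] for m in TASK_HEADER_RE.finditer(dump) if m["state"] == "D"
    )
```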
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss