[Lustre-discuss] Lustre clients getting evicted

Tom.Wang Tom.Wang at Sun.COM
Fri Feb 8 10:19:39 PST 2008


Brock Palen wrote:
>
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> On Feb 7, 2008, at 11:09 PM, Tom.Wang wrote:
>>> MDT dmesg:
>>>
>>> LustreError: 9042:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107)  req@000001002b52b000 x445020/t0 o400-><?>@<?>:-1 lens 128/0 ref 0 fl Interpret:/0/0  rc -107/0
>>> LustreError: 0:0:(ldlm_lockd.c:210:waiting_locks_callback()) ### lock callback timer expired: evicting client 2faf3c9e-26fb-64b7-ca6c-7c5b09374e67@NET_0x200000aa4008d_UUID nid 10.164.0.141@tcp  ns: mds-nobackup-MDT0000_UUID lock: 00000100476df240/0xbc269e05c512de3a lrc: 1/0,0 mode: CR/CR res: 11240142/324715850 bits 0x5 rrc: 2 type: IBT flags: 20 remote: 0x4e54bc800174cd08 expref: 372 pid 26925
>>>
>> The client was evicted because it did not release this lock in time
>> (the lock callback timer on the MDS expired). Could you provide the
>> stack trace of the client at that time?
>>
>> I assume increasing obd_timeout could fix your problem. Otherwise, you
>> may want to wait for the 1.6.5 release, which includes a new feature,
>> adaptive_timeout, that adjusts the timeout value according to network
>> congestion and server load. That should help with your problem.
>
> Waiting for the next version of Lustre might be the best thing.  I had
> upped the timeout a few days back, but the next day I had errors on the
> MDS box.  I have switched it back:
>
> lctl conf_param nobackup-MDT0000.sys.timeout=300
>
> I would love to give you that trace but I don't know how to get it.  
> Is there a debug option to turn on in the clients? 
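You can get that by running "echo t > /proc/sysrq-trigger" as root on the
client. The kernel then dumps the stack of every task into the kernel log,
where you can read it with dmesg.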
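
As a rough sketch of that step, assuming a root shell on the affected client while the problem is occurring: the function below triggers the task dump and saves the kernel log to a file. The names SYSRQ_TRIGGER, OUT and capture_stacks are only illustrative, not Lustre settings, and the output path is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: dump every task's stack on a client and save the kernel log.
# SYSRQ_TRIGGER and OUT are illustrative variables, not Lustre settings.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
OUT="${OUT:-/tmp/client-stacks.txt}"

capture_stacks() {
    # 't' asks the kernel to log the state and stack of every task.
    echo t > "$SYSRQ_TRIGGER" || return 1
    # The dump goes to the kernel ring buffer; copy it to a file.
    dmesg > "$OUT"
}
```

Calling capture_stacks and then attaching the file named by OUT should be enough. On a busy node the dump may be larger than the kernel ring buffer, in which case the syslog file on the client is likely the more complete copy.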






