[Lustre-discuss] Too many client eviction

DEGREMONT Aurelien aurelien.degremont at cea.fr
Tue May 3 10:09:42 PDT 2011


Correct me if I'm wrong, but the Lustre manual says that the client 
adapts its timeouts, but not the server. My understanding is that 
server->client RPCs still use the old mechanism, especially in our 
case, where it seems the server is revoking a client lock (is 
ldlm_timeout used for that?) and the client does not respond.
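
To check which timeout tunables are actually in effect on a given node, 
a minimal Python sketch could look like the following. It assumes the 
values are exposed as plain integer files under /proc/sys/lustre, and 
that obd_timeout appears there as "timeout"; both the directory and the 
file names are assumptions to verify against the installed Lustre 
version. read_tunables is a hypothetical helper, not part of Lustre.

```python
import os

# Assumed tunable file names; adjust to match your Lustre version.
TUNABLES = ("timeout", "ldlm_timeout", "at_min", "at_max", "at_history")


def read_tunables(base="/proc/sys/lustre", names=TUNABLES):
    """Return {name: int or None}; None means missing or unreadable."""
    values = {}
    for name in names:
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # File absent (e.g. not a Lustre node) or not a plain integer.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        shown = value if value is not None else "n/a"
        print(f"{name:>14}: {shown}")
```

Running this on a client and on a server makes it easy to compare the 
two sides before changing anything.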

I forgot to say that we also have LNET routers involved in some cases.

Thank you

Aurélien

Andreas Dilger wrote:
> I don't think ldlm_timeout and obd_timeout have much effect when AT is enabled. I believe that LLNL has some adjusted tunables for AT that might help you (increased at_min, etc.).
>
> Hopefully Chris or someone at LLNL can comment. I think they were also documented in bugzilla, though I don't know the bug number. 
>
> Cheers, Andreas
>
> On 2011-05-03, at 6:59 AM, DEGREMONT Aurelien <aurelien.degremont at cea.fr> wrote:
>
>   
>> Hello
>>
>> We often see some of our Lustre clients being evicted for no apparent 
>> reason (the clients seem healthy).
>> The pattern is always the same:
>>
>> All of this is on Lustre 2.0, with adaptive timeouts enabled.
>>
>> 1 - A server complains about a client:
>> ### lock callback timer expired... after 25315s...
>> (nothing on client)
>>
>> (few seconds later)
>>
>> 2 - The client receives -107 in response to an obd_ping for this target
>> (server says "@@@processing error 107")
>>
>> 3 - The client realizes its connection was lost.
>> It notices that it was evicted.
>> It reconnects.
>>
>> (To be sure) When a client is evicted, all in-flight I/O is lost and 
>> no recovery is done for it, correct?
>>
>> We are thinking of increasing the timeout to give clients more time to 
>> answer the ldlm revocation
>> (maybe they are just too loaded).
>> - Is increasing ldlm_timeout enough to do so?
>> - Do we also need to change obd_timeout accordingly? Is there a risk 
>> of triggering new timeouts if we change only ldlm_timeout (cascading timeouts)?
>>
>> Any feedback in this area is welcome.
>>
>> Thank you
>>
>> Aurélien Degrémont
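
To see how often the pattern described in the quoted message occurs, 
server logs can be scanned for the "lock callback timer expired" 
message. The sketch below is a rough illustration only: it matches the 
fragment quoted above ("lock callback timer expired ... after Ns") and 
assumes nothing else about the line format, which is elided in the 
original report. expired_callbacks is a hypothetical helper name.

```python
import re

# Matches only the fragment quoted in the report; the rest of the real
# log line is unknown here and is deliberately not assumed.
PATTERN = re.compile(r"lock callback timer expired.*?after (\d+)s")


def expired_callbacks(lines):
    """Yield (seconds, line) for each line reporting an expired lock callback."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield int(match.group(1)), line.rstrip("\n")
```

Passing an open log file (for example, the server's syslog file, whose 
location depends on the site) to expired_callbacks gives the reported 
wait times, which can then be compared with the configured timeouts.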



