[Lustre-discuss] Too many client eviction
DEGREMONT Aurelien
aurelien.degremont at cea.fr
Tue May 3 05:59:40 PDT 2011
Hello
We often see some of our Lustre clients being evicted for no apparent
reason (the clients seem healthy).
All of this is on Lustre 2.0, with adaptive timeouts enabled.
The pattern is always the same:
1 - A server complains about a client:
### lock callback timer expired... after 25315s...
(nothing on client)
(few seconds later)
2 - The client receives -107 in reply to an obd_ping for this target
(server says "@@@processing error 107")
3 - The client realizes its connection was lost.
The client notices it was evicted.
It reconnects.
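
For anyone who wants to check server logs for step 1 of this pattern, here is a minimal sketch in Python. The regular expression is only modeled on the single message quoted above, and the function name expired_lock_waits is hypothetical; real log lines may carry extra fields, so treat this as a starting point rather than a tested parser.

```python
import re

# Assumed pattern, modeled on the one server message quoted in this post:
# "### lock callback timer expired... after 25315s..."
EXPIRED_RE = re.compile(r"lock callback timer expired.*?after (\d+)s")


def expired_lock_waits(lines):
    """Return the wait in seconds reported by each matching log line."""
    waits = []
    for line in lines:
        match = EXPIRED_RE.search(line)
        if match:
            waits.append(int(match.group(1)))
    return waits


sample = [
    "### lock callback timer expired... after 25315s...",
    "@@@processing error 107",
]
print(expired_lock_waits(sample))  # prints [25315]
```

Collecting these waits across evictions would show whether the reported delay is always as large as the 25315s seen here.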
(Just to be sure) When a client is evicted, all in-flight I/O is lost and
no recovery is done for it, correct?
We are thinking of increasing the timeout to give clients more time to
answer the ldlm revocation
(maybe the clients are just too loaded).
- Is changing ldlm_timeout enough to do so?
- Do we also need to change obd_timeout accordingly? Is there a risk
of triggering new timeouts (cascading timeouts) if we only change ldlm_timeout?
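
Before changing either value, it may help to record what each node currently uses. The sketch below, in Python, reads the two tunables from files. The default paths are an assumption about where this Lustre version exposes them and should be adjusted if they differ; the function name read_timeouts is hypothetical.

```python
from pathlib import Path

# Assumed locations of the tunables; adjust for your Lustre version.
DEFAULT_PATHS = {
    "obd_timeout": "/proc/sys/lustre/timeout",
    "ldlm_timeout": "/proc/sys/lustre/ldlm_timeout",
}


def read_timeouts(paths=DEFAULT_PATHS):
    """Return {name: seconds}, or None for a file that is missing or unparsable."""
    values = {}
    for name, path in paths.items():
        try:
            values[name] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


print(read_timeouts())
```

On a machine without Lustre, both entries simply come back as None.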
Any feedback in this area is welcome.
Thank you
Aurélien Degrémont