[Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.

Simon Kelley simon at thekelleys.org.uk
Wed Feb 11 01:45:51 PST 2009


Oleg Drokin wrote:
> Hello!
> 
> On Feb 10, 2009, at 12:46 PM, Simon Kelley wrote:
>> If, by "the complete event" you mean the "received cancel for unknown 
>> cookie", there's not much more to tell. Grepping through the last 
>> month's server logs shows that there are bursts of typically between 3 
>> and 7 messages, at the same time and from the same client. After a 
>> gap, the same thing but from a different client. The number can be as 
>> low as one, and up to ten. They look to be related to client workload, 
>> at a guess.
> 
> Ok, so you do not see a pattern of this unknown cookie message followed 
> by an eviction some time later, like 100 seconds? That's what my question is about.
> 
> Bye,
>     Oleg
> 

No, there are plenty of examples of the unknown cookie message and no
eviction or other problem.

It's possible that there is a pattern where there's a run of "unknown
lock cookie" for a particular client and then a couple of minutes later
"lock callback timer expired: evicting client" messages for a
_different_ client, but the signal is not strong.
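
For what it's worth, a check along these lines could be sketched as
below. This is only an illustration, not what I actually ran: it assumes
the log lines have already been parsed into (timestamp in seconds,
client) pairs, and the function name and the 180-second window are
made up for the example.

```python
def evictions_after_cookie_runs(cookie_events, evictions, window=180.0):
    """Pair each eviction with *different* clients that logged an
    "unknown lock cookie" message in the preceding `window` seconds.

    Both arguments are lists of (timestamp_seconds, client) tuples.
    This pre-parsed layout is an assumption for the sketch, not the
    format of the real server logs.
    """
    hits = []
    for ev_ts, ev_client in evictions:
        # Clients other than the evicted one with a cookie message
        # shortly before the eviction.
        others = {
            client
            for ts, client in cookie_events
            if 0 < ev_ts - ts <= window and client != ev_client
        }
        for client in sorted(others):
            hits.append((ev_ts, ev_client, client))
    return hits


if __name__ == "__main__":
    cookies = [(10.0, "A"), (12.0, "A"), (500.0, "C")]
    evicted = [(130.0, "B"), (520.0, "C"), (1000.0, "D")]
    for hit in evictions_after_cookie_runs(cookies, evicted):
        print(hit)
```

A handful of hits by itself would not show much; the count would need
comparing against how often such pairs turn up by chance given the
overall message rate.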

It does look like the clusters of "unknown lock cookie" may be related
to striping. If a file striped across several OSTs experiences the
problem then there is a cluster of the messages, all referencing the
same client node, one from each OST.
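
The grouping I have in mind could be sketched roughly as follows. It is
a hypothetical helper rather than a real tool: it assumes the messages
from all the OSS logs have been merged and parsed into (timestamp in
seconds, OST name, client) tuples, and the 5-second gap used to split
clusters is an arbitrary choice.

```python
from collections import defaultdict


def cluster_by_client(events, gap=5.0):
    """Group "unknown lock cookie" messages into per-client clusters.

    `events` is an iterable of (timestamp_seconds, ost_name, client)
    tuples; this layout is assumed for the sketch. Messages from the
    same client separated by no more than `gap` seconds fall into one
    cluster. Returns (client, start, end, sorted OST names) tuples,
    ordered by start time.
    """
    per_client = defaultdict(list)
    for ts, ost, client in events:
        per_client[client].append((ts, ost))

    clusters = []
    for client, items in per_client.items():
        items.sort()
        start = prev = items[0][0]
        osts = {items[0][1]}
        for ts, ost in items[1:]:
            if ts - prev > gap:
                # Quiet period: close the current cluster, start a new one.
                clusters.append((client, start, prev, sorted(osts)))
                start, osts = ts, set()
            osts.add(ost)
            prev = ts
        clusters.append((client, start, prev, sorted(osts)))

    clusters.sort(key=lambda c: c[1])
    return clusters
```

If the striping idea is right, the number of distinct OSTs in a cluster
should tend to match the stripe count of the file the client was
working on.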

I'm working on reproducing the problem in a controlled way and getting
the information you asked for.


Cheers,

Simon.




