[Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Simon Kelley
simon at thekelleys.org.uk
Wed Feb 11 01:45:51 PST 2009
Oleg Drokin wrote:
> Hello!
>
> On Feb 10, 2009, at 12:46 PM, Simon Kelley wrote:
>> If, by "the complete event" you mean the "received cancel for unknown
>> cookie", there's not much more to tell. Grepping through the last
>> month's server logs shows that there are bursts of typically between 3
>> and 7 messages, at the same time and from the same client. After a
>> gap, the same thing but from a different client. The number can be as
>> low as one, and up to ten. They look to be related to client workload,
>> at a guess.
>
> Ok, so you do not see a pattern of this unknown cookie message followed
> by an eviction some time later, like 100 seconds? That's what my question
> was about.
>
> Bye,
> Oleg
>
No, there are plenty of examples of the unknown cookie message and no
eviction or other problem.
It's possible that there is a pattern where there's a run of "unknown
lock cookie" for a particular client and then a couple of minutes later
"lock callback timer expired: evicting client" messages for a
_different_ client, but the signal is not strong.
It does look like the clusters of "unknown lock cookie" may be related
to striping. If a file striped across several OSTs experiences the
problem then there is a cluster of the messages all referencing the
same client node, one from each OST.
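For what it's worth, here is a minimal sketch of how such clusters could be picked out of the server logs. It assumes the relevant messages have already been parsed into (timestamp, ost, client) tuples; that tuple layout, the function names and the 5-second gap are my own illustrative assumptions, not anything taken from the Lustre log format.

```python
from collections import defaultdict


def group_bursts(events, max_gap=5.0):
    """Group (timestamp, ost, client) events into per-client bursts.

    A new burst starts whenever the gap to the previous event from the
    same client exceeds max_gap seconds (an arbitrary, assumed threshold).
    """
    per_client = defaultdict(list)
    for ts, ost, client in sorted(events):
        per_client[client].append((ts, ost))

    bursts = []
    for client, items in per_client.items():
        current = [items[0]]
        for prev, item in zip(items, items[1:]):
            if item[0] - prev[0] > max_gap:
                # Gap too large: close the current burst, start a new one.
                bursts.append((client, current))
                current = []
            current.append(item)
        bursts.append((client, current))
    return bursts


def summarize(bursts):
    """Return (client, message_count, distinct_ost_count) per burst."""
    return [
        (client, len(items), len({ost for _, ost in items}))
        for client, items in bursts
    ]
```

If the striping guess is right, a burst from one client should show a message count equal to its distinct OST count, i.e. one message from each OST holding a stripe of the file.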
I'm working on reproducing the problem in a controlled way and getting
the information you asked for.
Cheers,
Simon.