[Lustre-discuss] 1.6.4.1 - active client evicted

Niklas Edmundsson Niklas.Edmundsson at hpc2n.umu.se
Fri Jan 11 01:00:57 PST 2008


On Thu, 10 Jan 2008, Oleg Drokin wrote:

>> Ignoring prediction from 130.239.78.233@tcp of 130.239.78.238@tcp
>> down 4829687047 seconds in the future
>
> This is a harmless message that will be silenced in 1.6.5.
> You can see details in bug 14300

OK.

> As for your original message - hard to tell what caused it. We can see
> that the servers decided the client was unresponsive.
> Could it be a lost network packet, for example?
> Weren't there any other messages on the client at around 12:20
> (that's when it was evicted) or before that?
> Because 12:40 is already 20 minutes past the eviction.

That's the weird thing: there's nothing Lustre-related logged before
that on the client that day! The client seems oblivious to the fact
that it's been evicted, and this happened while it was doing I/O. Also,
the clocks are synced by NTP, and thus not off by much.

I could accept network errors as an explanation, but then I would
have expected the client to log something, try to reconnect, etc.
As it was, it was simply dead in the water until I rebooted it.

What mechanism does Lustre use to check if a peer is up? Since lctl 
ping worked between all nodes I suspect it uses something more 
involved. Can I trigger the same check using lctl?


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke at hpc2n.umu.se
---------------------------------------------------------------------------
  "Wow, Veronica, he totally wants to protect and serve you." - Meg
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



