[Lustre-discuss] freezing

Oleg Drokin Oleg.Drokin at Sun.COM
Thu Dec 6 06:31:39 PST 2007


Hello!

On Dec 6, 2007, at 3:30 AM, Papp Tamas wrote:
> This is the messages log:
> Dec  5 11:23:34 node3 heartbeat: [3166]: info: These are nothing to  
> worry about.
> Dec  5 22:33:24 node3 syslogd 1.4.1: restart.
> You can see, there is absolutely nothing for whole afternoon in logs,
> I mean it's too few, maybe could happen something to it? What?

Do you have a serial console or something similar to confirm there were
no final messages from the kernel before it died?

> Today morning I see on meta1's console an oops message, so I rebooted
> the whole cluster.

That's not an oops; it's a message from Lustre indicating that a certain
thread spent too much time doing something (in this case, waiting for a
response from an OST).
This looks like a known bug that was already fixed in a more recent
release (I cannot recall the specific bug number off the top of my head).

> Now everything seems to be working, except some oops in meta1's
> messages log like this:

Again, this is not an oops.

> What could I do with this?

At this point you should probably try upgrading to 1.6.3, and then to
1.6.4, which should be out quite soon.

Bye,
     Oleg



