[Lustre-discuss] OSS crashes

Brian J. Murrell Brian.Murrell at Sun.COM
Wed Jul 23 08:42:58 PDT 2008


On Wed, 2008-07-23 at 17:20 +0200, Thomas Roth wrote:
> Hi,

Hello,

> Well, in these cases the machine is simply dead: the jobs writing via 
> Lustre have stopped with write failed: Input/output error,

Are there any messages on the console of such a machine when it's hung?
Can you get a stack trace (i.e. sysrq-t) of the processes on the hung
machine?
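
For reference, a minimal sketch of requesting that task dump from a
script, assuming the node is still responsive enough to run anything at
all, that you are root, and that the kernel has magic SysRq enabled.
The function name is just illustrative. On a box that no longer accepts
logins, the same dump has to be requested from the console itself (on a
serial console, a BREAK followed by "t").

```python
# Sketch only: ask the kernel to dump the stack of every task (sysrq-t).
# The output goes to the kernel log and console, not to this script.

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard Linux procfs entry


def request_task_dump(trigger_path=SYSRQ_TRIGGER):
    """Write the 't' command to the sysrq trigger file.

    trigger_path is a parameter only so the function can be exercised
    against an ordinary file; on a real node leave it at the default.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")


# Usage on the OSS, as root:
#     request_task_dump()
# then read the traces from the serial console log or dmesg.
```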

> I can't get 
> into the machine via either ssh or the console; the only thing I can 
> do is a hard reset. That's why I suspected the hardware first.

Indeed.  The (serial) console is the best source of information in this
sort of case.  Hopefully you are logging it and can retrieve the
messages prior to the hang.

> Where else could I look for overloaded hardware capacities?

Not sure.  That's quite hardware specific.

> Any way to 
> find out about  the number of OST threads our hardware can handle?

Well, you could run some iokit benchmarks and find out where your
performance plateaus as you increase the number of threads to a single
OST.
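
As a rough illustration of what "finding the plateau" could look like
once you have the numbers, here is a small sketch. It is not part of the
iokit; the function name, the 5% threshold and the sample figures are
all made up for the example, and you would feed in your own
(threads, throughput) measurements.

```python
def find_plateau(samples, min_gain=0.05):
    """Return the thread count at which throughput stops improving.

    samples:  iterable of (threads, throughput) pairs, in any order.
    min_gain: relative improvement below which the next step up in
              thread count is considered not worth it (assumed 5%).

    If every step still gains at least min_gain, the largest measured
    thread count is returned, i.e. no plateau was reached.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples given")
    for (t_prev, bw_prev), (_, bw_next) in zip(ordered, ordered[1:]):
        if bw_prev > 0 and (bw_next - bw_prev) / bw_prev < min_gain:
            return t_prev
    return ordered[-1][0]


# Hypothetical measurements: (threads, MB/s) against a single OST.
example = [(1, 100), (2, 190), (4, 340), (8, 420), (16, 430), (32, 410)]
print(find_plateau(example))  # prints 8
```

With the invented figures above, going from 8 to 16 threads gains only
about 2%, so 8 is reported as the point where the curve flattens.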

b.
