[Lustre-discuss] strange slowdown

Thu Dec 13 08:36:33 PST 2007

Hello!

On Dec 13, 2007, at 11:24 AM, Aaron Knister wrote:

> Please bear with me while I try to explain this problem. It's very
> strange.
> I have a 33TB lustre file system with 5.6 underlying LUNs. The
> interconnect is infiniband. Any write I/O to the disk causes the mount
> to hang, and the underlying disk starts doing lots and lots of tiny
> reads for about 10 minutes. There is only one client mounted and this
> small I/O continues after killing the client. I am running lustre
> 1.6.3 with the latest rhel5 kernel on rhel5. I cannot find any
> suggestive error messages and am at a loss. It's a production
> filesystem and it's pretty much unusable.

This is the native client, not an NFS client, I presume?
Also, is the write an append (to the end of files) or a
rewrite (into the middle of existing files)?
How do you do the write?

You can enable some debug logging on your OST and/or clients
(echo +vfstrace +rpctrace +inode >/proc/sys/lnet/debug), reproduce
the problem, then run lctl dk >somewhere (saving the output is important,
as dumping clears the debug buffer) and inspect the file you dumped the
log to, to see what those reads are and where they come from.
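
The steps above could be wrapped in two small shell helpers, roughly as
sketched here. This is only an illustration: the variable names (DEBUG_CTL,
LCTL, FLAGS) and function names are made up for this example, and it assumes
the debug control file accepts "-flag" to switch a flag off again in the same
way it accepts "+flag" to switch it on.

```shell
#!/bin/sh
# Sketch only: DEBUG_CTL, LCTL, FLAGS and the function names are illustrative.
DEBUG_CTL="${DEBUG_CTL:-/proc/sys/lnet/debug}"   # debug mask control file
LCTL="${LCTL:-lctl}"                             # lctl binary to call
FLAGS="vfstrace rpctrace inode"                  # flags suggested above

# Write every flag with the given sign: "+" enables, "-" disables
# (the "-" form is an assumption, see the note above).
set_debug_flags() {
    sign="$1"
    line=""
    for f in $FLAGS; do
        line="$line $sign$f"
    done
    # printf instead of echo, so a leading "-flag" is never parsed as an option
    printf '%s\n' "${line# }" > "$DEBUG_CTL"
}

# Dump the kernel debug buffer into the file named by $1.
# The dump empties the buffer, so the file is the only copy.
dump_debug_log() {
    "$LCTL" dk > "$1"
}
```

Run as root on the OSS or client, the sequence would then be something like
the following (the log path is just an example):

    set_debug_flags +
    ... reproduce the hanging write ...
    dump_debug_log /tmp/lustre-debug.log
    set_debug_flags -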

Bye,
     Oleg