[Lustre-discuss] brief 'hangs' on file operations

Tina Friedrich Tina.Friedrich at diamond.ac.uk
Thu Sep 2 06:43:09 PDT 2010


Hello List,

We are trying to debug some issues - or possibly different 
manifestations of the same issue - on our file system.

Causing the most grief at the moment is that we sometimes see delays 
writing files. From the writing client's end, it simply looks as if I/O 
stops for a while (we've seen 'pauses' of up to 10 seconds). 
This appears to be independent of which client does the writing and of 
the software doing the writing. We investigated this a bit using strace 
and dd; the 'slow' calls always appear to be open, write, or close 
calls. Usually these take well below 0.001s; in around 0.5% to 1% of 
cases, they take up to several seconds. It does not seem to be 
associated with any specific OST, OSS, or client; there is 
nothing in any log files, nor any exceptional load on the MDS, the OSSs, 
or any of the clients.
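
For anyone wanting to reproduce this kind of measurement without strace, 
a small probe along the following lines might help. It is only a sketch, 
not the procedure we used: the function name, the 4 KiB block size, and 
the 0.1s threshold are assumptions chosen for illustration. It times 
open, write, and close separately and records every call slower than the 
threshold.

```python
import os
import time


def time_file_ops(directory, iterations=1000, block_size=4096, threshold=0.1):
    """Time open, write and close separately for many small files.

    Returns (counts, slow): counts maps each call name to how often it
    was timed, slow lists (iteration, call name, seconds) for every call
    that took longer than `threshold` seconds. All defaults are
    illustrative assumptions, not tuned values.
    """
    payload = b"\0" * block_size
    counts = {"open": 0, "write": 0, "close": 0}
    slow = []
    for i in range(iterations):
        path = os.path.join(directory, "latency_probe_%d.tmp" % i)
        t0 = time.monotonic()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        t1 = time.monotonic()
        os.write(fd, payload)
        t2 = time.monotonic()
        os.close(fd)
        t3 = time.monotonic()
        os.unlink(path)  # clean up the probe file; not timed
        timings = (("open", t1 - t0), ("write", t2 - t1), ("close", t3 - t2))
        for name, elapsed in timings:
            counts[name] += 1
            if elapsed > threshold:
                slow.append((i, name, elapsed))
    return counts, slow
```

Calling time_file_ops() on a directory inside the Lustre mount and 
comparing len(slow) with the counts gives a rough slow-call percentage 
that can be set against the 0.5% to 1% figure above.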

The other issue is that we frequently see delays when trying to read a 
file. It sometimes takes more than 60s for a file to become visible on a 
machine after the initial write on a different machine has completed 
(both machines being Lustre clients). Again, there is nothing in the 
logs, nor any exceptional load on any of the machines.
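
The visibility delay could be measured on the reading client with a 
polling loop such as the sketch below. The function name, timeout, and 
polling interval are assumptions for illustration. Note that it measures 
the time from the start of polling, not from the completion of the write 
on the other machine, so the poll would need to be started at about the 
moment the writer finishes.

```python
import os
import time


def wait_until_visible(path, timeout=120.0, interval=0.5):
    """Poll until `path` exists on this client.

    Returns the seconds elapsed since polling began, or None if the
    file did not appear within `timeout` seconds. The resolution is
    limited by `interval`.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```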

Any ideas what this could be? How can we debug this?

Clients and servers are using Lustre 1.6.7.2.ddn3.5.

Regards,
Tina

-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442


