[Lustre-discuss] brief 'hangs' on file operations
Tina.Friedrich at diamond.ac.uk
Thu Sep 2 06:43:09 PDT 2010
We are trying to debug some issues - or possibly different
manifestations of the same issue - on our file system.
Causing most grief at the moment is that we sometimes see delays
writing files. From the writing client's end, it simply looks as if I/O
stops for a while (we've seen 'pauses' of anything up to 10 seconds).
This appears to be independent of which client does the writing and of
the software doing the writing. We investigated this a bit using strace and
dd; the 'slow' calls appear to always be either open, write, or close
calls. Usually, these take well below 0.001s; in around 0.5% or 1% of
cases, they take up to multiple seconds. It does not seem to be
associated with any specific OST, OSS, client or anything; there is
nothing in any log files or any exceptional load on MDS or OSS or any of
the machines.
The other issue is that we frequently see delays when trying to read a
file. It sometimes takes more than 60s for a file to be visible on a
machine after the initial write on a different machine has completed
(both machines being Lustre clients). Again, there is nothing in the
logs, nor exceptional load on any of the machines.
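
To put numbers on the visibility delay, something like the following could be run on the reading client. This is only a rough sketch: the function name wait_until_visible and the timeout and interval defaults are made up for illustration, and it measures from the moment the poll starts, so it needs to be started no later than the point where the writer finishes (or the two clients need synchronised clocks).

```python
import os
import time


def wait_until_visible(path, timeout=120.0, interval=0.5):
    """Poll until `path` can be stat'ed.

    Returns the number of seconds waited, or None if the file did not
    appear within `timeout` seconds. All defaults are arbitrary choices.
    """
    start = time.monotonic()
    while True:
        try:
            os.stat(path)
            return time.monotonic() - start
        except FileNotFoundError:
            pass  # not visible on this client yet
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Logging the returned value together with the file name and the client's hostname would show whether the long delays cluster around particular times or directories.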
Any ideas what this could be? How can we debug this?
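
For the write delays, the strace/dd measurements described above could also be reproduced with a small probe that times each open, write and close call separately. This is a minimal sketch, not the test we actually ran: the names timed and probe, the file naming scheme, the 4096-byte payload and the 0.5s threshold are all assumptions.

```python
import os
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def probe(directory, iterations=1000, size=4096, threshold=0.5):
    """Create, write and close small files in `directory`.

    Returns a list of (iteration, call name, seconds) for every call
    that took at least `threshold` seconds. Probe files are removed again.
    """
    payload = b"x" * size
    slow = []
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    for i in range(iterations):
        path = os.path.join(directory, f"probe-{os.getpid()}-{i}")
        fd, t_open = timed(os.open, path, flags, 0o644)
        _, t_write = timed(os.write, fd, payload)
        _, t_close = timed(os.close, fd)
        os.unlink(path)
        for name, seconds in (("open", t_open), ("write", t_write), ("close", t_close)):
            if seconds >= threshold:
                slow.append((i, name, seconds))
    return slow
```

Calling probe() with a directory on the Lustre mount and printing the returned tuples with a timestamp gives a list of slow calls that can be lined up against server-side logs.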
Clients and servers are using Lustre 220.127.116.11.ddn3.5.
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442